Base SAS Programming 1

Date: 2018/03/14

This note collects programming examples that facilitate my learning of SAS. It is mostly based on the Base SAS certification textbook and the SAS Programming 1 lessons offered by the SAS Institute, and it works best when read alongside them.

Content:

Example 1: Options

Example 2: Proc Contents

Example 3: Proc Print

Example 4: Proc Sort + By

Example 5: SAS Format

Example 6: Data Step

Initialization


In [ ]:
libname orion "/folders/myshortcuts/SAS/SAS_Programming/ecprg193";
libname orion clear; * Clear the libname;

libname orion "/folders/myshortcuts/SAS/SAS_Programming/ecprg193";

Example 1: Options

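options nonumber nodate;          * Suppress page numbers and the date in the output;
options date;                     * Print the date again;
options number pageno=3;          * Number the pages, starting at page 3;
options pagesize=15;              * 15 lines per output page;
options linesize=64;              * 64 characters per output line;
options datastmtchk=allkeywords;  * Reject SAS keywords as one-level data set names in the DATA statement;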

This example saves the current system option settings using the OPTSAVE procedure.

  • Remember when you use a WHERE statement in the DATA step, the WHERE expression must reference only variables from the input data set.
  • If YEARCUTOFF=1920 is in effect, the 100-year span begins with 1920 and ends with 2019. Any informat or function that reads a two-digit year from 20 to 99 therefore assumes a prefix of 19, and a two-digit year from 00 to 19 assumes a prefix of 20. For example, the value 92 refers to the year 1992. See the YEARCUTOFF= system option.
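
The YEARCUTOFF= rule above can be checked with a short DATA step. This is only a minimal sketch: the macro variable saved_yc and the variables d1 and d2 are illustrative names, not part of the course material, and the snippet restores the previous setting at the end so that it does not affect the examples that follow.

```sas
/* Remember the current setting so it can be restored afterwards */
%let saved_yc = %sysfunc(getoption(yearcutoff));

options yearcutoff=1920;

data _null_;
   d1 = input('03/14/92', mmddyy8.);  /* 92 falls in 20-99, so the prefix is 19 */
   d2 = input('03/14/15', mmddyy8.);  /* 15 falls in 00-19, so the prefix is 20 */
   put d1= date9. d2= date9.;         /* write both dates to the log */
run;

/* Put the original YEARCUTOFF= value back */
options yearcutoff=&saved_yc;
```

With this setting, the log should show the year 1992 for d1 and 2015 for d2.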

In [ ]:
proc optsave out = optionsave; 
run;

/* data countries */
option date pageno = 3 pagesize = 15 linesize = 64;

data countries;
set orion.country;
run;

title "countries";
proc print data = countries;
run;

/* data countries1 with new options*/
data countries1;
set orion.country;
where Continent_ID >= 92;
run;

options yearcutoff = 1925 firstobs=4;
title "countries1 (yearcutoff option)";
proc print data = countries1;
run;

/* Restore system options */ 
proc optload data = optionsave;
run;

/* Check the difference after restoring options setting */
title "countries (option restored)";
proc print data = countries; run;

title "countries1 (option yearcutoff restored)";
proc print data = countries1; run;
title;

Example 2: Proc Contents

  • "nods" stands for "no details". It suppresses the descriptor data for each individual file in the library and can be used only together with _ALL_. Without it, SAS produces a long list of output.
  • "libref._ALL_" requests a listing of all files in the library. (Use a period (.) to append the keyword _ALL_ to the libref.)

In [ ]:
proc contents data = orion._all_ nods;
run;

proc contents data = orion._all_;
run;

By default, PROC CONTENTS and PROC DATASETS list variables alphabetically. To list variable names in the order of their logical position (or creation order) in the data set, you can specify the VARNUM option in PROC CONTENTS or in the CONTENTS statement in PROC DATASETS.

  • "varnum" lists the variables by their logical position in the data set instead of alphabetically.
  • "position" generates the "Variables in Creation Order" table.

In [ ]:
proc contents data = orion.orders varnum; 
run;

proc contents data = orion.orders position;
run;

The major difference between the CONTENTS procedure and the CONTENTS statement in PROC DATASETS is the default libref in the DATA= option. For PROC CONTENTS, the default is either Work or User. For the CONTENTS statement, the default is the libref of the procedure input library. Note also that PROC DATASETS supports RUN-group processing and uses a QUIT statement to end the procedure, whereas PROC CONTENTS requires neither a QUIT nor a RUN statement.
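
To make the libref difference concrete, here is a small sketch that requests the same listing both ways. It assumes the orion library assigned in the Initialization cell; the NOLIST option is added only to keep the directory listing out of the output.

```sas
/* CONTENTS statement: a one-level name resolves against the input library (orion) */
proc datasets library=orion nolist;
   contents data=orders varnum;
run;
quit;

/* Stand-alone procedure: a one-level name would resolve against Work (or User),
   so the libref is spelled out here */
proc contents data=orion.orders varnum;
run;
```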

Example 3: Proc Print

PROC PRINT <option(s)>;
BY <DESCENDING> variable-1 <…<DESCENDING> variable-n> <NOTSORTED>;
PAGEBY BY-variable;
SUMBY BY-variable;
ID variable(s) <option>;
SUM variable(s) <option>;
VAR variable(s) <option>;


Special WHERE Operators

BETWEEN - AND
WHERE SAME AND
IS NULL
IS MISSING
LIKE

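Wildcards for LIKE:

% any number of characters (including none)
_ exactly one character

where Name like '%N'      (Name ends with N)
where Name like 'T_m%'    (Name starts with T, then any one character, then m)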

See PRINT Procedure
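
The other special operators can be sketched in the same way. The example below assumes that orion.sales contains the Salary and Country variables used in Example 4; the salary range and the country code 'AU' are illustrative values only.

```sas
proc print data=orion.sales noobs;
   var Employee_ID First_Name Last_Name Salary Country;
   where Salary between 25000 and 30000;      /* range includes both end points */
   where same and Country = 'AU';             /* adds a condition to the WHERE above */
   where same and First_Name is not missing;  /* IS NULL works the same way */
run;
```

Without the SAME keyword, a second WHERE statement replaces the first one instead of adding to it.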


In [ ]:
title "Orion.Sales Specified";
proc print data = orion.sales noobs label split = ' ';
	label Job_Title = "Job Title";
	var Employee_ID First_Name Last_Name Job_Title;
	where First_Name like 'T_m_%' and Job_Title contains "Sales Rep.";
	format Last_Name $upcase.
	       Job_Title $quote25.;
run;

title "Orion.Sales Original";
proc print data = orion.sales;
run;
title;

Example 4: Proc Sort + By

  • The NODUPKEY option deletes observations with duplicate BY values.
  • PROC SORT replaces the original data set unless you specify an output data set in the OUT= option.

In [ ]:
proc sort data = orion.orders out = work.custorders; 
	by Customer_ID;
run;

proc sort data = orion.orders out = work.custorders nodupkey;
	by Customer_ID;
run;

  • To affect a single file only, you can use FIRSTOBS= or OBS= as data set options instead of as system options. In the code below, check that the second PROC PRINT output begins with Obs = 3.
  • Specify the keyword DESCENDING before each variable that it applies to. Placing it after the variable generates an error. For example:

      proc sort data=orion.sales          
         out=work.sales2;   
         by Country descending Salary;
      run;
  • Subsetting in the PROC SORT step is more efficient as it selects and sorts only the required observations.
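
A minimal sketch of subsetting inside PROC SORT, assuming that orion.sales stores country codes such as 'AU' in Country; the output name work.sales_au is illustrative.

```sas
proc sort data=orion.sales out=work.sales_au;
   where Country = 'AU';   /* only the matching observations are read and sorted */
   by descending Salary;
run;
```

The alternative is to sort the whole data set first and subset afterwards, which sorts observations that are then discarded.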

In [ ]:
options firstobs = 3; 

/* Customer ID 9 is the first; 70221 the last*/ 
proc print data=work.custorders;
 by Customer_ID; * BY-group printing generates long, messy output;
 id Customer_ID;
run;

proc sort data = orion.orders out=work.custorders nodupkey;
	by descending Customer_ID;
run;

/* Customer ID 70201 becomes the first; 4 the last. */ 
proc print data=work.custorders (firstobs = 3);
by descending Customer_ID;
run;

The NODUPKEY option checks for and eliminates observations with duplicate BY variable values. If you specify this option, PROC SORT compares all BY variable values for each observation to those for the previous observation written to the output data set. If an exact match using the BY variable values is found, the observation is not written to the output data set. The DUPOUT= option can be used only with the NODUPKEY option.

Compare the output below to the log above and check whether the numbers of observations match.


In [ ]:
proc sort data = orion.orders out = work.custorders nodupkey dupout = work.duplicates;
	by Customer_ID;
run;

proc print data = work.duplicates;
run;
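
The same bookkeeping can be tried on a tiny data set where the counts are easy to verify by hand. This is a sketch with made-up values: work.demo, work.demo_unique and work.demo_dups are hypothetical names, and it assumes the FIRSTOBS= system option is back at its default of 1.

```sas
/* Five observations, three distinct Customer_ID values */
data work.demo;
   input Customer_ID Order_ID;
   datalines;
5 101
9 102
5 103
9 104
12 105
;
run;

/* EQUALS keeps the input order within each BY group,
   so the first observation per Customer_ID is the one retained */
proc sort data=work.demo out=work.demo_unique
          nodupkey equals dupout=work.demo_dups;
   by Customer_ID;
run;

proc print data=work.demo_unique; run;  /* 3 observations */
proc print data=work.demo_dups;   run;  /* 2 observations */
```

The two output data sets together account for all five input observations, which is the same check suggested for orion.orders above.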

Example 5: SAS Format


In [ ]: