SAS: Part 0 - Base SAS, PROC SGPLOT


License

Copyright (c) 2015 by SAS Institute Inc., Cary, NC 27513 USA

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.


NOTE: these examples are meant for the free SAS University Edition


1. SAS Output

  • The _null_ data step lets you execute statements (such as writing to the log) or read a data set without creating a new data set
  • The put statement writes information to the log

In [1]:
data _null_;
    put 'Hello world!'; /* put command to the log */
run;


Out[1]:

11   ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
12
13 data _null_;
14 put 'Hello world!'; /* put command to the log */
15 run;
Hello world!
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

16 ods html5 close;ods listing;

17
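
The second use of _null_ mentioned above, reading a data set without creating a new one, could look like the following minimal sketch. The data set name demo_null and its variables are hypothetical and exist only for illustration.

```sas
/* build a tiny demo data set; the name demo_null is hypothetical */
data demo_null;
    do id = 1 to 3;
        squared = id**2;
        output;
    end;
run;

/* read demo_null row by row and write each row to the log */
/* because the step is named _null_, no new data set is created */
data _null_;
    set demo_null;
    put id= squared=;
run;
```

The log should show one line per row, each in the name=value form produced by the put statement.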

In [2]:
data _null_;
    x = 'Hello world!';
    put x;
    put x=; /* useful for debugging */
run;


Out[2]:

19   ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
20
21 data _null_;
22 x = 'Hello world!';
23 put x;
24 put x=; /* useful for debugging */
25 run;
Hello world!
x=Hello world!
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

26 ods html5 close;ods listing;

27
The file print statement redirects the results to HTML in a Jupyter notebook

In [3]:
data _null_;
    file print; /* redirects the results to html*/
    put 'Hello world!';
run;


Out[3]:
SAS Output

The SAS System

The DATASTEP Procedure

FilePrint1

Hello world!                                                                                                                        
Logging information levels - use these prefixes to print color-coded information to the log

In [4]:
data _null_;
    put 'NOTE: Hello world!';
    put 'WARNING: Hello world!';
    put 'ERROR: Hello world!';
run;


Out[4]:

38   ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
39
40 data _null_;
41 put 'NOTE: Hello world!';
42 put 'WARNING: Hello world!';
43 put 'ERROR: Hello world!';
44 run;
NOTE: Hello world!
WARNING: Hello world!
ERROR: Hello world!
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

45 ods html5 close;ods listing;

46
You can also use the %put macro statement
  • SAS macro statements are often used for program flow control around data step statements and SAS procedures
  • This tutorial will only use simple macro statements

In [5]:
%put Hello world!;
%put NOTE: Hello world!;
%put WARNING: Hello world!;
%put ERROR: Hello world!;


Out[5]:

48   ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
49
50 %put Hello world!;
Hello world!
51 %put NOTE: Hello world!;
NOTE: Hello world!
52 %put WARNING: Hello world!;
WARNING: Hello world!
53 %put ERROR: Hello world!;
ERROR: Hello world!
54 ods html5 close;ods listing;

55
Macro variables are ALWAYS strings

In [6]:
%put 'Hello world!'; /* single quotes will be printed */


Out[6]:

57   ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
58
59 %put 'Hello world!'; /* single quotes will be printed */
'Hello world!'
60 ods html5 close;ods listing;

61
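
Because macro variables hold text, arithmetic is not performed when a value is assigned. A minimal sketch of this behavior, using a hypothetical macro variable named a:

```sas
%let a = 1+1;               /* stored as the text 1+1, not the number 2 */
%put a resolves to &a;      /* log: a resolves to 1+1 */
%put evaluated: %eval(&a);  /* log: evaluated: 2 */
```

The %eval() macro function performs integer arithmetic on the text and returns the result, again as text.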
The macro preprocessor resolves macro variables as text literals before data step code is executed
  • Single quotes PREVENT macro resolution
  • Double quotes ALLOW macro resolution

In [7]:
%let x = Hello world!;
%put &x;
%put '&x'; /* single quotes PREVENT macro resolution */
%put "&x"; /* double quotes ALLOW macro resolution */


Out[7]:

63   ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
64
65 %let x = Hello world!;
66 %put &x;
Hello world!
67 %put '&x'; /* single quotes PREVENT macro resolution */
'&x'
68 %put "&x"; /* double quotes ALLOW macro resolution */
"Hello world!"
69 ods html5 close;ods listing;

70
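
The same quoting rules apply to character literals inside a data step. A minimal sketch, assuming a hypothetical macro variable named greeting and hypothetical variable names resolved and unresolved:

```sas
%let greeting = Hello world!;

data _null_;
    length resolved unresolved $ 20;
    resolved   = "&greeting"; /* double quotes: the macro variable is substituted */
    unresolved = '&greeting'; /* single quotes: the text is kept literally */
    put resolved= / unresolved=;
run;
/* log: resolved=Hello world!   */
/*      unresolved=&greeting    */
```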

2. Generate a sample data set

The SAS data set is the primary data structure in the SAS language

  • To simulate a sample data set for this example, macro variables are used to define the number of columns and rows
  • The size of a data set is more typically determined by the size of the SAS data set(s) from which it is created

In [8]:
%let n_rows = 1000; /* define number of rows */
%let n_vars = 2;    /* define number of character and numeric variables */


Out[8]:

72   ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
73
74 %let n_rows = 1000; /* define number of rows */
75 %let n_vars = 2; /* define number of character and numeric variables */
76 ods html5 close;ods listing;

77
The SAS data step is used to create and manipulate data sets

In [9]:
* options symbolgen; /* uncomment this line to see how macro variables resolve in the log */
data scratch;

    /* data sets can be made permanent by creating them in a library */
    /* syntax: data <library>.<table> */
    /* a library is like a database */
    /* a library is usually directly mapped to a filesystem directory */  
    /* since a permanent library was not specified for the data set */
    /* the scratch data set will be created in the temporary library, work */
    /* it will be deleted when you leave SAS */

    /* SAS is strongly typed - it is safest to declare variables */
    /* using a length statement - especially for character variables */
    /* $ denotes a character variable */

    /* arrays are a data structure that can exist during the data step */
    /* they are a reference to a group of variables */
    /* horizontally across a data set */
    /* $ denotes a character array */
    /* do loops are often used in conjunction with arrays */
    /* SAS arrays are indexed from 1, like R data structures */

    /* a key is a variable with a unique value for each row */

    /* mod() is the modulo function */
    /* the %eval() macro function performs math operations */
    /* before text substitution */

    /* the drop statement removes variables from the output data set */

    /* since you are not reading from a pre-existing data set */
    /* you must output rows explicitly using the output statement */

    length key 8 char1-char&n_vars $ 8 numeric1-numeric&n_vars 8;
    text_draw = 'AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD EEEEEEEE FFFFFFFF GGGGGGGG';
    array c $ char1-char&n_vars;
    array n numeric1-numeric&n_vars;
    do i=1 to &n_rows;
        key = i;
        do j=1 to %eval(&n_vars);
            /* assign a random value from text_draw */
            /* to each element of the array c */
            c[j] = scan(text_draw, floor(7*ranuni(12345)+1), ' ');
            /* assign a random numeric value to each element of the n array */
            /* ranuni() requires a seed value */
            n[j] = ranuni(%eval(&n_rows*&n_vars));
        end;
        if mod(i, %eval(&n_rows/10)) = 0 then put 'Processing line ' i '...';
        drop i j text_draw;
        output;
    end;
    put 'Done.';
run;


Out[9]:

79   ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
80
81 * options mprint; /* to see the macro variables resolve uncomment this line */
82 data scratch;
83
84 /* data sets can be made permanent by creating them in a library */
85 /* syntax: data <library>.<table> */
86 /* a library is like a database */
87 /* a library is usually directly mapped to a filesystem directory */
88 /* since a permanent library was not specified for the data set */
89 /* the scratch data set will be created in the temporary library, work */
90 /* it will be deleted when you leave SAS */
91
92 /* SAS is strongly typed - it is safest to declare variables */
93 /* using a length statement - especially for character variables */
94 /* $ denotes a character variable */
95
96 /* arrays are a data structure that can exist during the data step */
97 /* they are a reference to a group of variables */
98 /* horizontally across a data set */
99 /* $ denotes a character array */
100 /* do loops are often used in conjunction with arrays */
101 /* SAS arrays are indexed from 1, like R data structures */
102
103 /* a key is a variable with a unique value for each row */
104
105 /* mod() is the modulo function */
106 /* the %eval() macro function performs math operations */
107 /* before text substitution */
108
109 /* the drop statement removes variables from the output data set */
110
111 /* since you are not reading from a pre-existing data set */
112 /* you must output rows explicitly using the output statement */
113
114 length key 8 char1-char&n_vars $ 8 numeric1-numeric&n_vars 8;
115 text_draw = 'AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD EEEEEEEE FFFFFFFF GGGGGGGG';
116 array c $
116! char1-char&n_vars;
117 array n
117! numeric1-numeric&n_vars;
118 do i=1 to &n_rows;
119 key = i;
120 do j=1 to %eval(&n_vars);
121 /* assign a random value from text_draw */
122 /* to each element of the array c */
123 c[j] = scan(text_draw, floor(7*ranuni(12345)+1), ' ');
124 /* assign a random numeric value to each element of the n array */
125 /* ranuni() requires a seed value */
126 n[j] = ranuni(%eval(&n_rows*&n_vars));
127 end;
128 if mod(i, %eval(&n_rows/10)) = 0 then put 'Processing line ' i '...';
129 drop i j text_draw;
130 output;
131 end;
132 put 'Done.';
133 run;
Processing line 100 ...
Processing line 200 ...
Processing line 300 ...
Processing line 400 ...
Processing line 500 ...
Processing line 600 ...
Processing line 700 ...
Processing line 800 ...
Processing line 900 ...
Processing line 1000 ...
Done.
NOTE: The data set WORK.SCRATCH has 1000 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds

134 ods html5 close;ods listing;

135
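
The comments in the data step above describe permanent libraries. A minimal sketch of creating one, assuming the directory /folders/myfolders (the path that appears in this notebook's export log) is writable from your session. The names mylib and demo_perm are hypothetical.

```sas
/* map the library reference mylib to a filesystem directory */
/* substitute any directory that your SAS session can write to */
libname mylib '/folders/myfolders';

/* a two-level name, <library>.<table>, makes the data set permanent */
data mylib.demo_perm;
    do key = 1 to 3;
        output;
    end;
run;

/* release the library reference; the data set file stays on disk */
libname mylib clear;
```

Reassigning the libname in a later session should make mylib.demo_perm available again, unlike data sets in the work library.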
Use PROC PRINT to print the data set

Procedure output is directed to html automatically in a Jupyter notebook


In [10]:
/* (obs=) option enables setting the number of rows to print */
proc print data=scratch (obs=5); run;


Out[10]:
SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.SCRATCH

Obs key char1 char2 numeric1 numeric2
1 1 CCCCCCCC FFFFFFFF 0.74519 0.27628
2 2 BBBBBBBB AAAAAAAA 0.72888 0.73432
3 3 EEEEEEEE DDDDDDDD 0.76408 0.18159
4 4 BBBBBBBB DDDDDDDD 0.39360 0.85949
5 5 CCCCCCCC BBBBBBBB 0.18129 0.23532

3. Basic data manipulation and analysis

Use proc contents to understand basic information about a data set

In [11]:
proc contents data=scratch; run;


Out[11]:
SAS Output

The SAS System

The CONTENTS Procedure

WORK.SCRATCH

Attributes

Data Set Name WORK.SCRATCH Observations 1000
Member Type DATA Variables 5
Engine V9 Indexes 0
Created 11/28/2016 01:56:19 Observation Length 40
Last Modified 11/28/2016 01:56:19 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label      
Data Representation SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64    
Encoding utf-8 Unicode (UTF-8)    

Engine/Host Information

Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 1
First Data Page 1
Max Obs per Page 1632
Obs in First Data Page 1000
Number of Data Set Repairs 0
Filename /tmp/SAS_work0F7300002FA6_localhost.localdomain/scratch.sas7bdat
Release Created 9.0401M3
Host Created Linux
Inode Number 277735
Access Permission rw-r--r--
Owner Name sasdemo
File Size 128KB
File Size (bytes) 131072

Variables

Alphabetic List of Variables and Attributes
# Variable Type Len
2 char1 Char 8
3 char2 Char 8
1 key Num 8
4 numeric1 Num 8
5 numeric2 Num 8
Use PROC FREQ to analyze categorical data

In [12]:
proc freq
    /* nlevels counts the discrete levels in each variable */
    /* the colon operator expands to include variable names with prefix char */
    data=scratch nlevels;
    /* request frequency bar charts for each variable */
    tables char: / plots=freqplot(type=bar);
run;


Out[12]:
SAS Output

The SAS System

The FREQ Procedure

NLevels

Number of Variable Levels
Variable Levels
char1 7
char2 7

Table char1

One-Way Frequencies

char1 Frequency Percent Cumulative Frequency Cumulative Percent
AAAAAAAA 143 14.30 143 14.30
BBBBBBBB 143 14.30 286 28.60
CCCCCCCC 149 14.90 435 43.50
DDDDDDDD 142 14.20 577 57.70
EEEEEEEE 152 15.20 729 72.90
FFFFFFFF 137 13.70 866 86.60
GGGGGGGG 134 13.40 1000 100.00

[Figure: frequency bar chart for char1]

Table char2

One-Way Frequencies

char2 Frequency Percent Cumulative Frequency Cumulative Percent
AAAAAAAA 130 13.00 130 13.00
BBBBBBBB 135 13.50 265 26.50
CCCCCCCC 165 16.50 430 43.00
DDDDDDDD 141 14.10 571 57.10
EEEEEEEE 145 14.50 716 71.60
FFFFFFFF 142 14.20 858 85.80
GGGGGGGG 142 14.20 1000 100.00

[Figure: frequency bar chart for char2]

Use PROC UNIVARIATE to analyze numeric data

In [13]:
proc univariate
    data=scratch;
    /* request univariate statistics for variable names with prefix 'numeric' */
    var numeric:;
    /* request histograms for the same variables */
    histogram numeric:;
    /* inset basic statistics on the histograms */
    inset min max mean / position=ne;
run;


Out[13]:
SAS Output

The SAS System

The UNIVARIATE Procedure

Variable: numeric1

numeric1

Moments

Moments
N 1000 Sum Weights 1000
Mean 0.50633821 Sum Observations 506.33821
Std Deviation 0.28943761 Variance 0.08377413
Skewness -0.0121365 Kurtosis -1.2171282
Uncorrected SS 340.068739 Corrected SS 83.6903564
Coeff Variation 57.1629012 Std Error Mean 0.00915282

Basic Measures of Location and Variability

Basic Statistical Measures
Location Variability
Mean 0.506338 Std Deviation 0.28944
Median 0.496667 Variance 0.08377
Mode . Range 0.99590
    Interquartile Range 0.49767

Tests For Location

Tests for Location: Mu0=0
Test Statistic p Value
Student's t t 55.32045 Pr > |t| <.0001
Sign M 500 Pr >= |M| <.0001
Signed Rank S 250250 Pr >= |S| <.0001

Quantiles

Quantiles (Definition 5)
Level Quantile
100% Max 0.99999473
99% 0.98796014
95% 0.94980769
90% 0.90868125
75% Q3 0.76605127
50% Median 0.49666691
25% Q1 0.26838609
10% 0.10392146
5% 0.05273732
1% 0.01445704
0% Min 0.00409347

Extreme Observations

Extreme Observations
Lowest Highest
Value Obs Value Obs
0.00409347 215 0.994379 912
0.00433785 709 0.995267 483
0.00507468 729 0.996240 63
0.00610044 863 0.998160 191
0.00647425 498 0.999995 793

The SAS System

The UNIVARIATE Procedure

[Figure: histogram of numeric1 with inset min, max, and mean]


The SAS System

The UNIVARIATE Procedure

Variable: numeric2

numeric2

Moments

Moments
N 1000 Sum Weights 1000
Mean 0.4982058 Sum Observations 498.205796
Std Deviation 0.2863714 Variance 0.08200858
Skewness 0.03568271 Kurtosis -1.1339161
Uncorrected SS 330.135587 Corrected SS 81.9265714
Coeff Variation 57.4805441 Std Error Mean 0.00905586

Basic Measures of Location and Variability

Basic Statistical Measures
Location Variability
Mean 0.498206 Std Deviation 0.28637
Median 0.488096 Variance 0.08201
Mode . Range 0.99554
    Interquartile Range 0.48204

Tests For Location

Tests for Location: Mu0=0
Test Statistic p Value
Student's t t 55.01475 Pr > |t| <.0001
Sign M 500 Pr >= |M| <.0001
Signed Rank S 250250 Pr >= |S| <.0001

Quantiles

Quantiles (Definition 5)
Level Quantile
100% Max 0.99859106
99% 0.99375025
95% 0.95629616
90% 0.90871284
75% Q3 0.74257851
50% Median 0.48809598
25% Q1 0.26053719
10% 0.10560284
5% 0.04079423
1% 0.01021775
0% Min 0.00304634

Extreme Observations

Extreme Observations
Lowest Highest
Value Obs Value Obs
0.00304634 72 0.995486 145
0.00450423 396 0.995995 888
0.00564384 87 0.996861 213
0.00698680 53 0.997395 355
0.00797265 395 0.998591 99

The SAS System

The UNIVARIATE Procedure

[Figure: histogram of numeric2 with inset min, max, and mean]


4. Basic data manipulation

Subsetting data sets by column using data set options

In [14]:
* create scratch2 set;
data scratch2;
    /* set statement reads from a pre-existing data set */
    /* no output statement is required - this is more typical */
    /* using data set options: keep, drop, etc. is often more efficient than */
    /* corresponding data step statements */
    /* the colon suffix keeps every variable whose name begins with numeric */
    set scratch(keep=numeric:);
run;

* print first five rows;
proc print data=scratch2(obs=5); run;


Out[14]:
SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.SCRATCH2

Obs numeric1 numeric2
1 0.74519 0.27628
2 0.72888 0.73432
3 0.76408 0.18159
4 0.39360 0.85949
5 0.18129 0.23532
Subsetting data sets by column using numbered variable ranges

SAS data step supports in place overwrites of data


In [15]:
* overwrite scratch2 set;
data scratch2;
    /* ranges of vars specified using var<N> - var<M> syntax */
    set scratch(keep=char1-char&n_vars);
run;

* print first five rows;
proc print data=scratch2(obs=5); run;


Out[15]:
SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.SCRATCH2

Obs char1 char2
1 CCCCCCCC FFFFFFFF
2 BBBBBBBB AAAAAAAA
3 EEEEEEEE DDDDDDDD
4 BBBBBBBB DDDDDDDD
5 CCCCCCCC BBBBBBBB
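
The keep= data set option used above has a data step statement counterpart. A minimal sketch of the statement form, using hypothetical demo data sets (demo_wide, demo_keep_stmt) so that the notebook's own data sets are left untouched:

```sas
/* hypothetical one-row demo data set with the same kinds of columns */
data demo_wide;
    length key 8 char1-char2 $ 8 numeric1 8;
    key = 1;
    char1 = 'AAAAAAAA';
    char2 = 'BBBBBBBB';
    numeric1 = 0.5;
    output;
run;

data demo_keep_stmt;
    set demo_wide;     /* every variable is read into the data step */
    keep char1-char2;  /* only these variables are written to the output */
run;

proc print data=demo_keep_stmt; run;
```

With the statement form all variables remain available for calculations during the step, whereas the keep= option on the set statement limits what is read in the first place.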
Subsetting data sets by column using variable names

In [16]:
* overwrite scratch2 set;
data scratch2;
    /* by name */
    set scratch(keep=key numeric1 char1);
run;

* print first five rows;
proc print data=scratch2(obs=5); run;


Out[16]:
SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.SCRATCH2

Obs key char1 numeric1
1 1 CCCCCCCC 0.74519
2 2 BBBBBBBB 0.72888
3 3 EEEEEEEE 0.76408
4 4 BBBBBBBB 0.39360
5 5 CCCCCCCC 0.18129
Subset and modify columns

In [17]:
* select two columns and modify them with data step functions;
* overwrite scratch2 set;
data scratch2;
    /* use length statement to ensure correct length of trans_char1 */
    /* the lag function saves the value from the row above */
    /* lag will create a numeric missing ('.') value in the first row */
    /* tranwrd finds and replaces character values */
    set scratch(keep=key char1 numeric1
        rename=(char1=new_char1 numeric1=new_numeric1));
    length trans_char1 $8;
    lag_numeric1 = lag(new_numeric1);
    trans_char1 = tranwrd(new_char1, 'GGGGGGGG', 'foo');
run;

* print first five rows;
* notice that '.' represents numeric missing in SAS;
proc print data=scratch2(obs=5); run;


Out[17]:
SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.SCRATCH2

Obs key new_char1 new_numeric1 trans_char1 lag_numeric1
1 1 CCCCCCCC 0.74519 CCCCCCCC .
2 2 BBBBBBBB 0.72888 BBBBBBBB 0.74519
3 3 EEEEEEEE 0.76408 EEEEEEEE 0.72888
4 4 BBBBBBBB 0.39360 BBBBBBBB 0.76408
5 5 CCCCCCCC 0.18129 CCCCCCCC 0.39360
Subsetting rows using the where data set option

In [18]:
* select only the first row and impute the missing value;
* create scratch3 set;
data scratch3;
    /* the where data set option can subset rows of data sets */
    /* there are MANY other ways to do this ... */
    set scratch2 (where=(key=1));
    lag_numeric1 = 0;
run;

* print;
proc print data=scratch3; run;


Out[18]:
SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.SCRATCH3

Obs key new_char1 new_numeric1 trans_char1 lag_numeric1
1 1 CCCCCCCC 0.74519 CCCCCCCC 0
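
Two of the other ways to subset rows are sketched here: a where statement inside the data step and a where clause in PROC SQL. The data set names demo_rows, demo_where, and demo_sql are hypothetical.

```sas
/* hypothetical demo data set with five rows */
data demo_rows;
    do key = 1 to 5;
        value = key * 10;
        output;
    end;
run;

/* where statement: rows are filtered as they are read */
data demo_where;
    set demo_rows;
    where key = 1;
run;

/* PROC SQL where clause: the same subset in SQL syntax */
proc sql noprint;
    create table demo_sql as
    select *
    from demo_rows
    where key = 1;
quit;
```

Both demo_where and demo_sql should contain the single row with key equal to 1.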
Subsetting rows using data step statements

In [19]:
* remove the problematic first row containing the missing value;
* from scratch2 set;
data scratch2;
    set scratch2;
    if key > 1;
run;

* print first five rows;
proc print data=scratch2(obs=5); run;


Out[19]:
SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.SCRATCH2

Obs key new_char1 new_numeric1 trans_char1 lag_numeric1
1 2 BBBBBBBB 0.72888 BBBBBBBB 0.74519
2 3 EEEEEEEE 0.76408 EEEEEEEE 0.72888
3 4 BBBBBBBB 0.39360 BBBBBBBB 0.76408
4 5 CCCCCCCC 0.18129 CCCCCCCC 0.39360
5 6 BBBBBBBB 0.56993 BBBBBBBB 0.18129
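
A subsetting if is evaluated after the row has been read, so it can test values computed in the same step, which a where= data set option cannot. The sketch below shows this with the delete statement, the inverse of a subsetting if. The names demo_rows, demo_delete, and lag_value are hypothetical.

```sas
/* hypothetical demo data set with five rows */
data demo_rows;
    do key = 1 to 5;
        value = key * 10;
        output;
    end;
run;

data demo_delete;
    set demo_rows;
    lag_value = lag(value);        /* missing on the first row */
    if lag_value = . then delete;  /* drop rows where the lag is missing */
run;

proc print data=demo_delete; run;
```

Here demo_delete should contain the rows with key 2 through 5, because only the first row has a missing lag_value.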
Combining data sets top-to-bottom using PROC APPEND

In [20]:
* add scratch3 to the bottom of scratch2;
proc append
    base=scratch2  /* proc append does not read the base set */
    data=scratch3; /* for performance reasons base set should be largest */
run;


Out[20]:

271  ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
272
273 * add scratch3 to the bottom of scratch2;
274 proc append
275 base=scratch2 /* proc append does not read the base set */
276 data=scratch3; /* for performance reasons base set should be largest */
277 run;
NOTE: Appending WORK.SCRATCH3 to WORK.SCRATCH2.
NOTE: There were 1 observations read from the data set WORK.SCRATCH3.
NOTE: 1 observations added.
NOTE: The data set WORK.SCRATCH2 has 1000 observations and 5 variables.
NOTE: PROCEDURE APPEND used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

278 ods html5 close;ods listing;

279
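
For comparison, data sets can also be stacked by listing them in a single set statement. This is a minimal sketch with hypothetical data sets (demo_top, demo_bottom, demo_stacked). Unlike PROC APPEND, this approach reads every input data set in full and writes a new output data set.

```sas
data demo_top;      /* three rows: key = 1, 2, 3 */
    do key = 1 to 3;
        output;
    end;
run;

data demo_bottom;   /* one row: key = 4 */
    key = 4;
    output;
run;

/* rows of demo_top are written first, then rows of demo_bottom */
data demo_stacked;
    set demo_top demo_bottom;
run;
```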
Sorting data sets using PROC SORT

In [21]:
* sort scratch2 in place;
proc sort
    data=scratch2;
    by key; /* you must specify a variable to sort by */
run;

* print first five rows;
proc print data=scratch2(obs=5); run;


Out[21]:
SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.SCRATCH2

Obs key new_char1 new_numeric1 trans_char1 lag_numeric1
1 1 CCCCCCCC 0.74519 CCCCCCCC 0.00000
2 2 BBBBBBBB 0.72888 BBBBBBBB 0.74519
3 3 EEEEEEEE 0.76408 EEEEEEEE 0.72888
4 4 BBBBBBBB 0.39360 BBBBBBBB 0.76408
5 5 CCCCCCCC 0.18129 CCCCCCCC 0.39360

In [22]:
* create the new scratch4 set;
proc sort
    data=scratch2
    out=scratch4; /* specifying an out set creates a new data set */
    by new_char1 new_numeric1; /* you can sort by many variables */
run;

* print first five rows;
proc print data=scratch4(obs=5); run;


Out[22]:
SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.SCRATCH4

Obs key new_char1 new_numeric1 trans_char1 lag_numeric1
1 729 AAAAAAAA 0.005075 AAAAAAAA 0.71008
2 370 AAAAAAAA 0.012808 AAAAAAAA 0.40257
3 965 AAAAAAAA 0.029816 AAAAAAAA 0.79305
4 758 AAAAAAAA 0.043995 AAAAAAAA 0.77802
5 383 AAAAAAAA 0.064970 AAAAAAAA 0.39526
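
Each variable in the by statement sorts in ascending order unless it is preceded by the descending keyword. A minimal sketch with a hypothetical four-row data set (demo_sort) that sorts ascending by group and descending by value within each group:

```sas
data demo_sort;
    input group $ value;
    datalines;
B 2
A 1
A 3
B 1
;
run;

proc sort
    data=demo_sort
    out=demo_sorted;
    by group descending value; /* descending applies only to value */
run;

proc print data=demo_sorted; run;
```

The sorted rows should appear in the order A 3, A 1, B 2, B 1.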
Combining data sets side-by-side using the data step merge statement

In [23]:
* combining data sets side-by-side;
* to create messy scratch5 set;
data scratch5;
    /* merge simply attaches two or more data sets together side-by-side */
    /* without a by statement, rows are paired by position and common */
    /* variables are overwritten by the last data set listed - be careful */
    merge scratch scratch4;
run;

* print first five rows;
proc print data=scratch5(obs=5); run;


Out[23]:
SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.SCRATCH5

Obs key char1 char2 numeric1 numeric2 new_char1 new_numeric1 trans_char1 lag_numeric1
1 729 CCCCCCCC FFFFFFFF 0.74519 0.27628 AAAAAAAA 0.005075 AAAAAAAA 0.71008
2 370 BBBBBBBB AAAAAAAA 0.72888 0.73432 AAAAAAAA 0.012808 AAAAAAAA 0.40257
3 965 EEEEEEEE DDDDDDDD 0.76408 0.18159 AAAAAAAA 0.029816 AAAAAAAA 0.79305
4 758 BBBBBBBB DDDDDDDD 0.39360 0.85949 AAAAAAAA 0.043995 AAAAAAAA 0.77802
5 383 CCCCCCCC BBBBBBBB 0.18129 0.23532 AAAAAAAA 0.064970 AAAAAAAA 0.39526

In [24]:
* join columns to scratch from scratch2 when key variable matches;
* to create scratch6 correctly;
data scratch6;
    /* merging with a by variable is safer */
    /* it requires that both sets be sorted by the by variable */
    /* then rows are matched when key values are equal */
    /* very similar to SQL join */
    merge scratch scratch2;
    by key;
run;

* print first five rows;
proc print data=scratch6(obs=5); run;


Out[24]:
SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.SCRATCH6

Obs key char1 char2 numeric1 numeric2 new_char1 new_numeric1 trans_char1 lag_numeric1
1 1 CCCCCCCC FFFFFFFF 0.74519 0.27628 CCCCCCCC 0.74519 CCCCCCCC 0.00000
2 2 BBBBBBBB AAAAAAAA 0.72888 0.73432 BBBBBBBB 0.72888 BBBBBBBB 0.74519
3 3 EEEEEEEE DDDDDDDD 0.76408 0.18159 EEEEEEEE 0.76408 EEEEEEEE 0.72888
4 4 BBBBBBBB DDDDDDDD 0.39360 0.85949 BBBBBBBB 0.39360 BBBBBBBB 0.76408
5 5 CCCCCCCC BBBBBBBB 0.18129 0.23532 CCCCCCCC 0.18129 CCCCCCCC 0.39360
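
When the two data sets do not share every key, a merge with a by variable keeps all keys from both inputs. The in= data set option can be used to control which rows are kept. This is a minimal sketch with hypothetical data sets and flag names (demo_left, demo_right, in_left, in_right).

```sas
data demo_left;    /* keys 1 to 4 */
    do key = 1 to 4;
        left_value = key * 10;
        output;
    end;
run;

data demo_right;   /* keys 3 to 6 */
    do key = 3 to 6;
        right_value = key * 100;
        output;
    end;
run;

data demo_inner;
    /* in= creates a temporary 0/1 flag for each input data set */
    merge demo_left(in=in_left) demo_right(in=in_right);
    by key;                   /* both inputs are already ordered by key */
    if in_left and in_right;  /* keep matched keys only, like an inner join */
run;
```

Only keys 3 and 4 appear in both inputs, so demo_inner should contain two rows. Changing the subsetting if to test in_left alone would behave like a left join.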
Combining data sets side-by-side using PROC SQL

PROC SQL allows the execution of SQL statements inside a SAS session


In [25]:
* nearly all common SQL statements and functions are supported by PROC SQL;
* join columns to scratch from scratch2 when key variable matches;
* to create scratch7 correctly;
proc sql noprint; /* noprint suppresses procedure output */
    create table scratch7 as
    select *
    from scratch
    join scratch2
    on scratch.key = scratch2.key;
quit;

* print first five rows;
proc print data=scratch7(obs=5); run;


Out[25]:
SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.SCRATCH7

Obs key char1 char2 numeric1 numeric2 new_char1 new_numeric1 trans_char1 lag_numeric1
1 1 CCCCCCCC FFFFFFFF 0.74519 0.27628 CCCCCCCC 0.74519 CCCCCCCC 0.00000
2 2 BBBBBBBB AAAAAAAA 0.72888 0.73432 BBBBBBBB 0.72888 BBBBBBBB 0.74519
3 3 EEEEEEEE DDDDDDDD 0.76408 0.18159 EEEEEEEE 0.76408 EEEEEEEE 0.72888
4 4 BBBBBBBB DDDDDDDD 0.39360 0.85949 BBBBBBBB 0.39360 BBBBBBBB 0.76408
5 5 CCCCCCCC BBBBBBBB 0.18129 0.23532 CCCCCCCC 0.18129 CCCCCCCC 0.39360
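
Other join types follow standard SQL syntax. The sketch below shows a left join that lists its columns explicitly, so the key column is selected only once. All data set and column names here (demo_left, demo_right, demo_left_join, left_value, right_value) are hypothetical.

```sas
data demo_left;    /* keys 1 to 4 */
    do key = 1 to 4;
        left_value = key * 10;
        output;
    end;
run;

data demo_right;   /* keys 3 to 6 */
    do key = 3 to 6;
        right_value = key * 100;
        output;
    end;
run;

proc sql noprint;
    create table demo_left_join as
    select a.key, a.left_value, b.right_value
    from demo_left as a
    left join demo_right as b
    on a.key = b.key
    order by a.key; /* unlike a data step merge, inputs need not be pre-sorted */
quit;
```

The result should have four rows, with right_value missing for keys 1 and 2.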
Comparing data sets using PROC COMPARE

In [26]:
* results from data step merge with by variable and PROC SQL join;
* should be equal;
proc compare base=scratch6 compare=scratch7;
run;


Out[26]:
SAS Output

The SAS System

The COMPARE Procedure

Datasets

                                                       The COMPARE Procedure                                                        
                                           Comparison of WORK.SCRATCH6 with WORK.SCRATCH7                                           
                                                           (Method=EXACT)                                                           
                                                                                                                                    
                                                         Data Set Summary                                                           
                                                                                                                                    
                                  Dataset                 Created          Modified  NVar    NObs                                   
                                                                                                                                    
                                  WORK.SCRATCH6  28NOV16:01:56:24  28NOV16:01:56:24     9    1000                                   
                                  WORK.SCRATCH7  28NOV16:01:56:24  28NOV16:01:56:24     9    1000                                   
                                                                                                                                    
                                                                                                                                    
                                                         Variables Summary                                                          
                                                                                                                                    
                                               Number of Variables in Common: 9.                                                    

Summary

                                                                                                                                    
                                                                                                                                    
                                                        Observation Summary                                                         
                                                                                                                                    
                                                   Observation      Base  Compare                                                   
                                                                                                                                    
                                                   First Obs           1        1                                                   
                                                   Last  Obs        1000     1000                                                   
                                                                                                                                    
                                  Number of Observations in Common: 1000.                                                           
                                  Total Number of Observations Read from WORK.SCRATCH6: 1000.                                       
                                  Total Number of Observations Read from WORK.SCRATCH7: 1000.                                       
                                                                                                                                    
                                  Number of Observations with Some Compared Variables Unequal: 0.                                   
                                  Number of Observations with All Compared Variables Equal: 1000.                                   
                                                                                                                                    
                                  NOTE: No unequal values were found. All values compared are exactly equal.                        
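
The comparison result can also be checked programmatically. PROC COMPARE stores a return code in the automatic macro variable sysinfo, where 0 indicates that no differences were found. This is a minimal sketch with hypothetical data sets (demo_a, demo_b) and a hypothetical macro variable (compare_rc).

```sas
data demo_a;
    do key = 1 to 3;
        output;
    end;
run;

data demo_b;   /* an exact copy of demo_a */
    set demo_a;
run;

proc compare base=demo_a compare=demo_b noprint; /* noprint suppresses the report */
run;

/* capture the return code right away; later steps can overwrite sysinfo */
%let compare_rc = &sysinfo;
%put NOTE: PROC COMPARE return code is &compare_rc;
```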
                                                                                                                                    
Export data using PROC EXPORT

In [27]:
* export data set to create a csv file;
* to default directory;
proc export
    data=scratch7
    outfile='scratch.csv'
    dbms=csv
    /* replace an existing file with that name */
    replace;
run;


Out[27]:

368  ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
369
370 * export data set to create a csv file;
371 * to default directory;
372 proc export
373 data=scratch7
374 outfile='scratch.csv'
375 dbms=csv
376 /* replace an existing file with that name */
377 replace;
378 run;
ERROR: Expecting page 1, got page -1 instead.
ERROR: Page validation error while reading SASUSER.PROFILE.CATALOG.
NOTE: Unable to open SASUSER.PROFILE. WORK.PROFILE will be opened instead.
NOTE: All profile changes will be lost at the end of the session.
379 /**********************************************************************
380 * PRODUCT: SAS
381 * VERSION: 9.4
382 * CREATOR: External File Interface
383 * DATE: 28NOV16
384 * DESC: Generated SAS Datastep Code
385 * TEMPLATE SOURCE: (None Specified.)
386 ***********************************************************************/
387 data _null_;
388 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
389 %let _EFIREC_ = 0; /* clear export record count macro variable */
390 file 'scratch.csv' delimiter=',' DSD DROPOVER lrecl=32767;
391 if _n_ = 1 then /* write column names or labels */
392 do;
393 put
394 "key"
395 ','
396 "char1"
397 ','
398 "char2"
399 ','
400 "numeric1"
401 ','
402 "numeric2"
403 ','
404 "new_char1"
405 ','
406 "new_numeric1"
407 ','
408 "trans_char1"
409 ','
410 "lag_numeric1"
411 ;
412 end;
413 set SCRATCH7 end=EFIEOD;
414 format key best12. ;
415 format char1 $8. ;
416 format char2 $8. ;
417 format numeric1 best12. ;
418 format numeric2 best12. ;
419 format new_char1 $8. ;
420 format new_numeric1 best12. ;
421 format trans_char1 $8. ;
422 format lag_numeric1 best12. ;
423 do;
424 EFIOUT + 1;
425 put key @;
426 put char1 $ @;
427 put char2 $ @;
428 put numeric1 @;
429 put numeric2 @;
430 put new_char1 $ @;
431 put new_numeric1 @;
432 put trans_char1 $ @;
433 put lag_numeric1 ;
434 ;
435 end;
436 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
437 if EFIEOD then call symputx('_EFIREC_',EFIOUT);
438 run;
NOTE: The file 'scratch.csv' is:
Filename=/folders/myfolders/scratch.csv,
Owner Name=sasdemo,Group Name=sas,
Access Permission=-rw-r--r--,
Last Modified=28Nov2016:01:56:23

NOTE: 1001 records were written to the file 'scratch.csv'.
The minimum record length was 75.
The maximum record length was 92.
NOTE: There were 1000 observations read from the data set WORK.SCRATCH7.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

1000 records created in scratch.csv from SCRATCH7.


NOTE: "scratch.csv" file was successfully created.
NOTE: PROCEDURE EXPORT used (Total process time):
real time 0.04 seconds
cpu time 0.04 seconds

439 ods html5 close;ods listing;

440
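
PROC EXPORT is not limited to comma-separated files. As a minimal sketch, assuming the scratch7 set from the cell above still exists, the same procedure can write a pipe-delimited text file. The file name scratch_pipe.txt is a hypothetical name chosen for illustration.

```sas
* export scratch7 to a pipe-delimited text file;
* scratch_pipe.txt is a hypothetical file name used for illustration;
proc export
    data=scratch7
    outfile='scratch_pipe.txt'
    /* dlm - generic delimited file, the delimiter is set below */
    dbms=dlm
    /* replace an existing file with that name */
    replace;
    delimiter='|';
run;
```

As with the csv export, the log should echo the generated data step, so the delimiter actually used can be checked there.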
Import data using PROC IMPORT

In [28]:
* import data set;
* from default directory;
* from the csv file;
* to overwrite scratch7 set;
proc import
    /* import from scratch.csv */
    datafile='scratch.csv'
    /* create a sas table in the work library */
    out=scratch7
    /* from a csv file */
    dbms=csv
    /* replace an existing data set with that name */
    replace;
run;


Out[28]:

442  ods listing close;ods html5 file=stdout options(bitmap_mode='inline') device=png; ods graphics on / outputfmt=png;
NOTE: Writing HTML5 Body file: STDOUT
443
444 * import data set;
445 * from default directory;
446 * from the csv file;
447 * to overwrite scratch7 set;
448 proc import
449 /* import from scratch7.csv */
450 datafile='scratch.csv'
451 /* create a sas table in the work library */
452 out=scratch7
453 /* from a csv file */
454 dbms=csv
455 /* replace an existing data set with that name */
456 replace;
457 run;
458 /**********************************************************************
459 * PRODUCT: SAS
460 * VERSION: 9.4
461 * CREATOR: External File Interface
462 * DATE: 28NOV16
463 * DESC: Generated SAS Datastep Code
464 * TEMPLATE SOURCE: (None Specified.)
465 ***********************************************************************/
466 data WORK.SCRATCH7 ;
467 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
468 infile 'scratch.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
469 informat key best32. ;
470 informat char1 $8. ;
471 informat char2 $8. ;
472 informat numeric1 best32. ;
473 informat numeric2 best32. ;
474 informat new_char1 $8. ;
475 informat new_numeric1 best32. ;
476 informat trans_char1 $8. ;
477 informat lag_numeric1 best32. ;
478 format key best12. ;
479 format char1 $8. ;
480 format char2 $8. ;
481 format numeric1 best12. ;
482 format numeric2 best12. ;
483 format new_char1 $8. ;
484 format new_numeric1 best12. ;
485 format trans_char1 $8. ;
486 format lag_numeric1 best12. ;
487 input
488 key
489 char1 $
490 char2 $
491 numeric1
492 numeric2
493 new_char1 $
494 new_numeric1
495 trans_char1 $
496 lag_numeric1
497 ;
498 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
499 run;
NOTE: The infile 'scratch.csv' is:
Filename=/folders/myfolders/scratch.csv,
Owner Name=sasdemo,Group Name=sas,
Access Permission=-rw-r--r--,
Last Modified=28Nov2016:01:56:23,
File Size (bytes)=90834

NOTE: 1000 records were read from the infile 'scratch.csv'.
The minimum record length was 75.
The maximum record length was 92.
NOTE: The data set WORK.SCRATCH7 has 1000 observations and 9 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds

1000 rows created in WORK.SCRATCH7 from scratch.csv.



NOTE: WORK.SCRATCH7 data set was successfully created.
NOTE: The data set WORK.SCRATCH7 has 1000 observations and 9 variables.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds

500 ods html5 close;ods listing;

501
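
PROC IMPORT guesses variable types and lengths by scanning the file, which is why the log above shows generated informat and format statements. The sketch below, which assumes scratch.csv was created by the export cell, sets two of the options that control that scan and then inspects the result. The output set name scratch_check is hypothetical and is used so that scratch7 is left untouched.

```sas
* import scratch.csv into a new set named scratch_check (hypothetical name);
proc import
    datafile='scratch.csv'
    out=scratch_check
    dbms=csv
    replace;
    /* read variable names from the first row */
    getnames=yes;
    /* scan the first 1000 rows before choosing types and lengths */
    guessingrows=1000;
run;

* confirm the variable types and lengths that were chosen;
proc contents data=scratch_check; run;
```

A larger guessingrows value costs some time on big files but lowers the risk of truncated character values.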
By group processing in the data step

In [29]:
* by variables can be used in the data step;
* the data set must be sorted by the by variables;
* create scratch8 summary set;
data scratch8;
    set scratch4;
    by new_char1 new_numeric1;
    retain count 0; /* retained variables are remembered from row-to-row */
    if last.new_char1 then do; /* first. and last. can be used with by vars */
        count + 1; /* shorthand to increment a retained variable */
        output; /* output the last row of a sorted by group */
    end;
run;

* using PROC PRINT without the data= option prints the most recent set;
proc print; run;


Out[29]:
SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.SCRATCH8

Obs key new_char1 new_numeric1 trans_char1 lag_numeric1 count
1 962 AAAAAAAA 0.99264 AAAAAAAA 0.07529 1
2 201 BBBBBBBB 0.98891 BBBBBBBB 0.71665 2
3 191 CCCCCCCC 0.99816 CCCCCCCC 0.84966 3
4 597 DDDDDDDD 0.98891 DDDDDDDD 0.82614 4
5 793 EEEEEEEE 0.99999 EEEEEEEE 0.11873 5
6 63 FFFFFFFF 0.99624 FFFFFFFF 0.02960 6
7 456 GGGGGGGG 0.97751 foo 0.28907 7
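
The first. and last. flags can also be combined to summarize each group. The following is a sketch only: it assumes scratch4 is still sorted by new_char1, as the data step above requires, and the names scratch4_counts and n_rows are hypothetical.

```sas
* count the rows in each new_char1 group;
* assumes scratch4 is still sorted by new_char1;
data scratch4_counts; /* hypothetical output set name */
    set scratch4;
    by new_char1;
    if first.new_char1 then n_rows = 0; /* reset at the start of each group */
    n_rows + 1; /* sum statement, n_rows is retained automatically */
    if last.new_char1 then output; /* write one row per group */
    keep new_char1 n_rows;
run;

proc print data=scratch4_counts; run;
```

Because the sum statement retains n_rows on its own, no separate retain statement is needed in this version.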
By group processing in SAS procedures

In [30]:
* by variables can be used efficiently in most procedures;
* the data set must be sorted by the by variables;
proc univariate
    data=scratch4;
    var lag_numeric1;
    histogram lag_numeric1;
    inset min max mean / position=ne;
    by new_char1;
run;


Out[30]:
SAS Output

The SAS System

The UNIVARIATE Procedure

Variable: lag_numeric1

new_char1=AAAAAAAA

lag_numeric1

Moments

Moments
N 143 Sum Weights 143
Mean 0.49071887 Sum Observations 70.1727988
Std Deviation 0.29483857 Variance 0.08692978
Skewness 0.13096455 Kurtosis -1.2282289
Uncorrected SS 46.7791457 Corrected SS 12.3440289
Coeff Variation 60.0829894 Std Error Mean 0.02465564

Basic Measures of Location and Variability

Basic Statistical Measures
Location Variability
Mean 0.490719 Std Deviation 0.29484
Median 0.430797 Variance 0.08693
Mode . Range 0.98457
    Interquartile Range 0.52836

Tests For Location

Tests for Location: Mu0=0
Test Statistic p Value
Student's t t 19.90291 Pr > |t| <.0001
Sign M 71.5 Pr >= |M| <.0001
Signed Rank S 5148 Pr >= |S| <.0001

Quantiles

Quantiles (Definition 5)
Level Quantile
100% Max 0.98890576
99% 0.98655522
95% 0.96006550
90% 0.92620130
75% Q3 0.77802470
50% Median 0.43079665
25% Q1 0.24966709
10% 0.10271068
5% 0.06484634
1% 0.00610044
0% Min 0.00433785

Extreme Observations

Extreme Observations
Lowest Highest
Value Obs Value Obs
0.00433785 117 0.967188 34
0.00610044 75 0.974507 16
0.00647425 49 0.978504 99
0.01405615 126 0.986555 62
0.02457284 9 0.988906 18

[Figure: histogram of lag_numeric1 for new_char1=AAAAAAAA, with inset showing min, max, and mean]


The SAS System

The UNIVARIATE Procedure

Variable: lag_numeric1

new_char1=BBBBBBBB

lag_numeric1

Moments

Moments
N 143 Sum Weights 143
Mean 0.53527493 Sum Observations 76.544315
Std Deviation 0.29608721 Variance 0.08766764
Skewness -0.1741468 Kurtosis -1.2955583
Uncorrected SS 53.4210573 Corrected SS 12.4488044
Coeff Variation 55.3149782 Std Error Mean 0.02476006

Basic Measures of Location and Variability

Basic Statistical Measures
Location Variability
Mean 0.535275 Std Deviation 0.29609
Median 0.548172 Variance 0.08767
Mode . Range 0.97268
    Interquartile Range 0.55076

Tests For Location

Tests for Location: Mu0=0
Test Statistic p Value
Student's t t 21.61849 Pr > |t| <.0001
Sign M 71.5 Pr >= |M| <.0001
Signed Rank S 5148 Pr >= |S| <.0001

Quantiles

Quantiles (Definition 5)
Level Quantile
100% Max 0.9856062
99% 0.9844003
95% 0.9629953
90% 0.9016123
75% Q3 0.8079205
50% Median 0.5481721
25% Q1 0.2571620
10% 0.1145349
5% 0.0580258
1% 0.0193226
0% Min 0.0129249

Extreme Observations

Extreme Observations
Lowest Highest
Value Obs Value Obs
0.0129249 276 0.968072 222
0.0193226 248 0.973458 170
0.0296455 189 0.981268 269
0.0392363 174 0.984400 282
0.0431449 210 0.985606 278

[Figure: histogram of lag_numeric1 for new_char1=BBBBBBBB, with inset showing min, max, and mean]


The SAS System

The UNIVARIATE Procedure

Variable: lag_numeric1

new_char1=CCCCCCCC

lag_numeric1

Moments

Moments
N 149 Sum Weights 149
Mean 0.51844134 Sum Observations 77.2477597
Std Deviation 0.30300948 Variance 0.09181474
Skewness -0.0848525 Kurtosis -1.2553408
Uncorrected SS 53.637014 Corrected SS 13.5885819
Coeff Variation 58.4462411 Std Error Mean 0.0248235

Basic Measures of Location and Variability

Basic Statistical Measures
Location Variability
Mean 0.518441 Std Deviation 0.30301
Median 0.545160 Variance 0.09181
Mode . Range 0.99367
    Interquartile Range 0.52698

Tests For Location

Tests for Location: Mu0=0
Test Statistic p Value
Student's t t 20.8851 Pr > |t| <.0001
Sign M 74 Pr >= |M| <.0001
Signed Rank S 5513 Pr >= |S| <.0001

Quantiles

Quantiles (Definition 5)
Level Quantile
100% Max 0.99366507
99% 0.99113039
95% 0.97016056
90% 0.93772714
75% Q3 0.78893718
50% Median 0.54515967
25% Q1 0.26196195
10% 0.09154551
5% 0.03813221
1% 0.00507468
0% Min 0.00000000

Extreme Observations

Extreme Observations
Lowest Highest
Value Obs Value Obs
0.00000000 402 0.981746 289
0.00507468 429 0.984780 384
0.01014111 361 0.987015 303
0.01629693 291 0.991130 382
0.01898068 378 0.993665 298

[Figure: histogram of lag_numeric1 for new_char1=CCCCCCCC, with inset showing min, max, and mean]


The SAS System

The UNIVARIATE Procedure

Variable: lag_numeric1

new_char1=DDDDDDDD

lag_numeric1

Moments

Moments
N 142 Sum Weights 142
Mean 0.5209562 Sum Observations 73.9757808
Std Deviation 0.28497267 Variance 0.08120942
Skewness -0.2170349 Kurtosis -1.1551723
Uncorrected SS 49.9886703 Corrected SS 11.4505284
Coeff Variation 54.7018476 Std Error Mean 0.02391438

Basic Measures of Location and Variability

Basic Statistical Measures
Location Variability
Mean 0.520956 Std Deviation 0.28497
Median 0.578337 Variance 0.08121
Mode . Range 0.98069
    Interquartile Range 0.50251

Tests For Location

Tests for Location: Mu0=0
Test Statistic p Value
Student's t t 21.78423 Pr > |t| <.0001
Sign M 71 Pr >= |M| <.0001
Signed Rank S 5076.5 Pr >= |S| <.0001

Quantiles

Quantiles (Definition 5)
Level Quantile
100% Max 0.98477907
99% 0.97048743
95% 0.92748484
90% 0.89918963
75% Q3 0.75668994
50% Median 0.57833705
25% Q1 0.25417622
10% 0.08992516
5% 0.05752626
1% 0.01591177
0% Min 0.00409347

Extreme Observations

Extreme Observations
Lowest Highest
Value Obs Value Obs
0.00409347 530 0.935270 491
0.01591177 504 0.953484 445
0.03219571 529 0.969972 537
0.03317137 439 0.970487 559
0.04476607 572 0.984779 521

[Figure: histogram of lag_numeric1 for new_char1=DDDDDDDD, with inset showing min, max, and mean]


The SAS System

The UNIVARIATE Procedure

Variable: lag_numeric1

new_char1=EEEEEEEE

lag_numeric1

Moments

Moments
N 152 Sum Weights 152
Mean 0.50392372 Sum Observations 76.5964058
Std Deviation 0.30547624 Variance 0.09331573
Skewness 0.05104127 Kurtosis -1.359438
Uncorrected SS 52.6894218 Corrected SS 14.0906759
Coeff Variation 60.6195399 Std Error Mean 0.0247774

Basic Measures of Location and Variability

Basic Statistical Measures
Location Variability
Mean 0.503924 Std Deviation 0.30548
Median 0.472996 Variance 0.09332
Mode . Range 0.97934
    Interquartile Range 0.56091

Tests For Location

Tests for Location: Mu0=0
Test Statistic p Value
Student's t t 20.33804 Pr > |t| <.0001
Sign M 76 Pr >= |M| <.0001
Signed Rank S 5814 Pr >= |S| <.0001

Quantiles

Quantiles (Definition 5)
Level Quantile
100% Max 0.9999947
99% 0.9981595
95% 0.9686054
90% 0.9213424
75% Q3 0.7869107
50% Median 0.4729956
25% Q1 0.2260008
10% 0.1094131
5% 0.0597403
1% 0.0274655
0% Min 0.0206584

Extreme Observations

Extreme Observations
Lowest Highest
Value Obs Value Obs
0.0206584 647 0.988909 636
0.0274655 648 0.994379 674
0.0339727 609 0.995267 723
0.0349232 651 0.998160 678
0.0376134 664 0.999995 721

[Figure: histogram of lag_numeric1 for new_char1=EEEEEEEE, with inset showing min, max, and mean]


The SAS System

The UNIVARIATE Procedure

Variable: lag_numeric1

new_char1=FFFFFFFF

lag_numeric1

Moments

Moments
N 137 Sum Weights 137
Mean 0.48863056 Sum Observations 66.9423872
Std Deviation 0.27619572 Variance 0.07628408
Skewness -0.0498362 Kurtosis -1.0731154
Uncorrected SS 43.084731 Corrected SS 10.3746346
Coeff Variation 56.524447 Std Error Mean 0.02359699

Basic Measures of Location and Variability

Basic Statistical Measures
Location Variability
Mean 0.488631 Std Deviation 0.27620
Median 0.475913 Variance 0.07628
Mode . Range 0.98572
    Interquartile Range 0.44419

Tests For Location

Tests for Location: Mu0=0
Test Statistic p Value
Student's t t 20.70732 Pr > |t| <.0001
Sign M 68.5 Pr >= |M| <.0001
Signed Rank S 4726.5 Pr >= |S| <.0001

Quantiles

Quantiles (Definition 5)
Level Quantile
100% Max 0.9962396
99% 0.9734552
95% 0.9105424
90% 0.8545792
75% Q3 0.7400244
50% Median 0.4759132
25% Q1 0.2958376
10% 0.0941431
5% 0.0296041
1% 0.0128085
0% Min 0.0105241

Extreme Observations

Extreme Observations
Lowest Highest
Value Obs Value Obs
0.0105241 785 0.920413 766
0.0128085 746 0.949874 786
0.0155565 800 0.966347 845
0.0165746 838 0.973455 806
0.0184293 801 0.996240 813

[Figure: histogram of lag_numeric1 for new_char1=FFFFFFFF, with inset showing min, max, and mean]


The SAS System

The UNIVARIATE Procedure

Variable: lag_numeric1

new_char1=GGGGGGGG

lag_numeric1

Moments

Moments
N 134 Sum Weights 134
Mean 0.48282754 Sum Observations 64.69889
Std Deviation 0.26307513 Variance 0.06920853
Skewness 0.23125403 Kurtosis -0.9008902
Uncorrected SS 40.4431397 Corrected SS 9.20473394
Coeff Variation 54.4863566 Std Error Mean 0.02272623

Basic Measures of Location and Variability

Basic Statistical Measures
Location Variability
Mean 0.482828 Std Deviation 0.26308
Median 0.451759 Variance 0.06921
Mode . Range 0.97778
    Interquartile Range 0.40099

Tests For Location

Tests for Location: Mu0=0
Test Statistic p Value
Student's t t 21.24539 Pr > |t| <.0001
Sign M 67 Pr >= |M| <.0001
Signed Rank S 4522.5 Pr >= |S| <.0001

Quantiles

Quantiles (Definition 5)
Level Quantile
100% Max 0.9926352
99% 0.9775109
95% 0.9349751
90% 0.8826924
75% Q3 0.7133919
50% Median 0.4517587
25% Q1 0.3123979
10% 0.1230748
5% 0.0749330
1% 0.0296565
0% Min 0.0148579

Extreme Observations

Extreme Observations
Lowest Highest
Value Obs Value Obs
0.0148579 895 0.957998 881
0.0296565 982 0.966782 975
0.0483612 922 0.976707 986
0.0515094 889 0.977511 944
0.0538630 894 0.992635 917

[Figure: histogram of lag_numeric1 for new_char1=GGGGGGGG, with inset showing min, max, and mean]
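
The same by statement works in other procedures. The sketch below uses PROC MEANS as one example and contrasts it with the class statement, which groups rows without requiring sorted input. The set name scratch4_sorted is hypothetical; the explicit sort is included only so the example does not depend on the current order of scratch4.

```sas
* by group processing requires sorted input;
proc sort data=scratch4 out=scratch4_sorted; /* hypothetical output set name */
    by new_char1;
run;

* one block of output per by group;
proc means data=scratch4_sorted n mean min max;
    var lag_numeric1;
    by new_char1;
run;

* the class statement groups rows without requiring a sort;
proc means data=scratch4 n mean min max;
    var lag_numeric1;
    class new_char1;
run;
```

The by version prints a separate table for each group, while the class version prints a single table with one row per group.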

Transposing a table
  • Transposing a matrix simply switches its row and column values
  • Transposing a SAS data set is more complex because of the metadata associated with variable names; PROC TRANSPOSE stores the original variable names in the _NAME_ column

In [31]:
* transpose;
proc transpose 
    data=scratch
    out=scratch8;
run;

* print;
proc print; var _NAME_ col1-col5; run;


Out[31]:
SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.SCRATCH8

Obs _NAME_ COL1 COL2 COL3 COL4 COL5
1 key 1.00000 2.00000 3.00000 4.00000 5.00000
2 numeric1 0.74519 0.72888 0.76408 0.39360 0.18129
3 numeric2 0.27628 0.73432 0.18159 0.85949 0.23532

Often a simple transpose is not enough, and a data set must be reshaped with the melt/stack, column split, and cast steps described in Tidy Data by Hadley Wickham: https://www.jstatsoft.org/article/view/v059i10

See also: https://github.com/sassoftware/enlighten-apply/tree/master/SAS_UE_TidyData
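
As a rough sketch of the melt/stack step, PROC TRANSPOSE with a by statement can turn the numeric columns of scratch into one row per key and variable. This assumes that key identifies each row of scratch uniquely; the names scratch_sorted, scratch_long, variable, and value are hypothetical and are not taken from the linked material.

```sas
* the by statement requires sorted input;
proc sort data=scratch out=scratch_sorted; /* hypothetical set name */
    by key;
run;

* melt numeric1 and numeric2 into one row per key and variable;
proc transpose
    data=scratch_sorted
    out=scratch_long(rename=(_name_=variable col1=value)); /* hypothetical names */
    by key;
    var numeric1 numeric2;
run;

* print the first few rows of the long set;
proc print data=scratch_long(obs=6); run;
```

If key were not unique, each by group would produce additional COL2, COL3, ... columns, so the uniqueness assumption is worth checking first.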


5. Generating plots with PROC SGPLOT

Histograms with PROC SGPLOT

In [32]:
proc sgplot
    /* sashelp.iris is a sample data set */
    /* binwidth - bin width in units of the histogram variable */
    /* datalabel - display counts or percents for each bin */
    /* showbins - place x-axis tick marks at the bin midpoints */
    data=sashelp.iris;
    histogram petalwidth /
        binwidth=2
        datalabel=count
        showbins;
run;


Out[32]:
SAS Output

The SGPlot Procedure
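
Several plot statements can be layered in one PROC SGPLOT call. As an illustrative sketch that is not part of the original notebook, the histogram above can be overlaid with fitted density curves using the density statement.

```sas
proc sgplot
    data=sashelp.iris;
    histogram petalwidth / binwidth=2;
    /* type=normal - fitted normal curve */
    density petalwidth / type=normal;
    /* type=kernel - nonparametric kernel density estimate */
    density petalwidth / type=kernel;
run;
```

Statements are drawn in the order they appear, so the histogram is listed first and the curves are drawn on top of it.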

Bubble plots with PROC SGPLOT

In [33]:
proc sgplot
    /* group - color by a categorical variable */
    /* lineattrs - sets the bubble outline color and other outline attributes */
    data=sashelp.iris;
    bubble x=petalwidth y=petallength size=sepallength /
        group=species
        lineattrs=(color=grey);
run;


Out[33]:
SAS Output

The SGPlot Procedure
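
When a third size variable is not needed, a grouped scatter plot is a simpler alternative to the bubble plot. The sketch below is one possible variation and reuses the same x, y, and group variables.

```sas
proc sgplot
    data=sashelp.iris;
    /* group - color markers by a categorical variable */
    /* markerattrs - sets the marker symbol and other marker attributes */
    scatter x=petalwidth y=petallength /
        group=species
        markerattrs=(symbol=circlefilled);
run;
```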

Scatter plot with regression overlay using PROC SGPLOT

In [34]:
proc sgplot
    /* clm - confidence limits for mean predicted values */
    /* cli - prediction limits for individual predicted values */
    /* alpha - significance level for clm and cli; 0.1 gives 90% limits */
    data=sashelp.iris;
    reg x=petalwidth y=petallength /
        clm cli alpha=0.1;
run;


Out[34]:
SAS Output

The SGPlot Procedure
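
The reg statement is not restricted to a straight line. As a hedged sketch, the example below fits a quadratic curve with degree=2 and overlays a loess smoother for comparison; the choice of these two fits is illustrative only.

```sas
proc sgplot
    data=sashelp.iris;
    /* degree=2 - fit a quadratic polynomial instead of a line */
    reg x=petalwidth y=petallength / degree=2 clm;
    /* nomarkers - avoid plotting the data points a second time */
    loess x=petalwidth y=petallength / nomarkers;
run;
```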

Stacked bar chart with PROC SGPLOT

In [35]:
proc sgplot
    /* sashelp.cars is a sample data set */
    /* vbar - categorical variable on the x-axis */
    /* group - splits each vertical bar into stacked segments */
    /* add title */
    data=sashelp.cars;
    vbar type / group=origin;
    title 'Car Types by Country of Origin';
run;


Out[35]:
SAS Output

The SGPlot Procedure
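
Grouped bars are stacked by default. A minimal variation, shown here as a sketch, places the groups side by side with groupdisplay=cluster and then clears the title, because a title statement stays in effect for later output until it is replaced or cleared.

```sas
proc sgplot
    data=sashelp.cars;
    /* groupdisplay=cluster - draw group bars side by side */
    vbar type /
        group=origin
        groupdisplay=cluster;
    title 'Car Types by Country of Origin';
run;

* clear the title so it does not carry over to later output;
title;
```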