Code form Statistical Programming with SAS IML Language (Chapter 1 - 2)
You can call DATA step functions from your SAS/IML programs
Do not use a RUN statement at the end of a PROC IML program. In interactive mode, each statement is executed as it is submitted.
You can call functions in Base SAS software from SAS/IML programs. In most cases, these functions act on each element of a matrix.
The length of a character matrix is determined by the length of its longest element at the time it is created.
When concatenating character strings, use the STRIP function to remove leading and trailing blanks from elements of a character matrix.
Do not confuse an empty matrix with a matrix that contains missing values or with a zero matrix. An empty matrix has no rows and no columns. A matrix that contains missing values has at least one row and column, as does a matrix that contains zeros.
The index creation operator (:) has lower precedence than arithmetic operators. Therefore, a+bc:d+ef is equivalent to (a+bc):(d+ef).
Use the UNIFORM and NORMAL functions when you need to quickly generate a small random sample for an example or for testing purposes.
Data in SAS/IML matrices are stored in row-major order.
It is an error to refer to a subscript of an undefined matrix.
If you extract a subset of a matrix by specifying a single set of indices, the resulting vector is a column vector.
To print a submatrix, you must enclose the submatrix in parentheses: print (x[1:3]); This is a peculiarity of the PRINT statement. You also need to enclose matrix expressions in parentheses: print (3*x + 1);
The comparison operators return a matrix of zeros and ones.
Unlike the DATA step, the SAS/IML language does not support the mnemonic keywords EQ, NE, GT, LT, GE, or LE as a replacement for the symbols =, ^=, >, <, >=, or <=.
Unlike the DATA step, the SAS/IML language does not support the mnemonic keywords AND, OR, or NOT as a replacement for the symbols &, |, and ^.
The SAS/IML language does not support short-circuit evaluation.
The intersection of sets can be empty, so always check the matrix returned by XSECT and handle the possibility that the matrix is empty. Do the same for set differences found by using the SETDIF function.
If you need to solve the linear equation Ab = z, use the SOLVE function (b=solve(A,z)) unless there is a compelling reason to explicitly compute the inverse matrix A−1.
List of Functions and Keywords
In statistics, the median absolute deviation (MAD) is a robust measure of the variability of a univariate sample of quantitative data. It can also refer to the population parameter that is estimated by the MAD calculated from a sample.
For a univariate data set X1, X2, ..., Xn, the MAD is defined as the median of the absolute deviations from the data's median:
$$\displaystyle \operatorname {MAD} =\operatorname {median} \left(\ \left|X_{i}-\operatorname {median} (X)\right|\ \right) $$that is, starting with the residuals (deviations) from the data's median, the MAD is the median of their absolute values.
Sample Code:
In [ ]:
/* standardize data by using robust estimates of center and scale */
proc print data = sashelp.class (obs=8);
run;
proc iml;
use Sashelp.Class; /* open data set for reading */
read all var _NUM_ into x[colname=VarNames]; /* read variables */
close Sashelp.Class; /* close data set */
/* estimate centers and scales of each variable */
c = median(x); /* centers = medians of each column */
s = mad(x); /* scales = MAD of each column */
stdX = (x - c) / s; /* standardize the data */
print c[colname=varNames]; /* print statistics for each column */
print s[colname=varNames];
The scale of each variable is estimated by calling the MAD function, which computes the median absolute deviation (MAD) from the median. (The median is a robust alternative to the mean; the MAD is a robust alternative to the standard deviation.)
Conceptually, there are two main differences between the DATA step and a SAS/IML program. First, the DATA step implicitly loops over all observations, whereas a typical SAS/IML program does not. Second, the fundamental unit in the DATA step is an observation, whereas the fundamental unit in the SAS/IML language is a matrix.
In [ ]:
/* convert temperatures from Celsius to Fahrenheit scale */
proc iml;
Celsius = {-40, 0, 20, 37, 100}; /* some interesting temperatures */
Fahrenheit = 9/5 * Celsius +32; /* convert to Fahrenheit scale */
print Celsius Fahrenheit;
In [ ]:
Kelvin = Celsius + 273.15; /* convert to Kelvin scale */
print Kelvin;
In [ ]:
QUIT;
In [ ]:
Kelvin = Celsius + 273.15; /* convert to Kelvin scale */
print Kelvin;
The variable Celsius, defined in the first set of statements, remains available until you quit the procedure or free the variable. When you use the IML procedure in this way, you are essentially using it as a massively powerful calculator.
In [ ]:
/* Present a simple example program with comments. */
/* The PROC IML statement is not required in SAS/IML Studio. */
x = 3; /* 1 */ /* NUMBERS indicate steps that */
y = 2; /* are described in a list */
z = x + y; /* 2 */ /* AFTER the program. */
print z; /* display result */ /* Other statements are briefly */
/* described WITHIN the program. */
List of functions or keywords
The PRINT statement has four useful options that affect the way a matrix is displayed:
COLNAME=matrix
specifies a character matrix to be used for column headings.
FORMAT=format
specifies a valid SAS or user-defined format to use when printing matrix values.
LABEL=label
specifies a label for the matrix. If this option is not specified, the name of the matrix is used as a label.
ROWNAME=matrix
In [ ]:
/* print marital status of 24 people */
proc iml;
ageGroup = {"<= 45", " > 45"}; /* headings for rows */
status = {"Single" "Married" "Divorced"}; /* headings for columns */
counts = { 5 5 0 , /* data to print */
2 9 3 };
pct = counts / sum(counts); /* compute proportions */
print pct[colname = status
rowname = ageGroup
label = "marital status of 24 people (%)"
format=PERCENT7.1];
In [ ]:
/* determine the type of a matrix */
x = {1 2 3};
y = {"male" "female"};
z = {"male", 33};
type_x = type(x);
type_y = type(y);
type_z = type(z);
print type_x type_y type_z;
In [ ]:
/* handle an empty or undefined matrix */
type_u = type(undefined_matrix);
if type_u = 'U' then
msg = "The matrix is not defined.";
else
msg = "The matrix is defined.";
print msg;
a = length(msg);
b = nleng(msg);
print a b; /*So msg is a matrix(or character scaler in this example)*/
In [ ]:
proc iml;
seed = j(1, 10, 1); /*create a 1x10 row vector of 1s*/
print seed;
x = uniform(seed);
print x;
y = 3*x + 2 + normal(seed);
print y;
n_x = nrow(x);
p_x = ncol(x);
n_y = nrow(y);
p_y = ncol(y);
print n_x, p_x, n_y, p_y ;
In [ ]:
/* copy character strings to a matrix with a longer length */
c = {"Low" "Med" "High"}; /* maximum length is 4 characters */
d = putc(c, "$6."); /* copy into vector with length 6 */
d[2] = "Medium"; /* value fits into d without truncation */
nlen = nleng(d);
len = length(d);
print nlen, len, d;
In [ ]:
x = {1 2 3, 4 5 6, 7 8 9};
y = {"male" "female"};
constantm = j(nrow(x), ncol(y), 99); /*Create a 3x2 matrix with the same constant as elements*/
print(constantm);
In [ ]:
/* the index creation operator has low precedence */
n1 = 1;
n2 = 10;
h = n1+2:n2-3; /* equivalent to 3:7 */
print h;
There is one situation in which the index creation operator can be used with character values: the creation of variable names with a common prefix and a numerical suffix. For example, if you have 10 variables and want to name them x1, x2, ..., x10, the index creation operator can create these variable names, as shown in the following statements:
In [ ]:
proc iml;
varNames = "x1":"x10";
print varNames;
In [ ]:
/* create pseudorandom vectors (better statistical properties) */
proc iml;
call randseed(12345); /* set seed for RANDGEN */
x = j(10, 1); /* allocate 10 × 1 vector */
e = x; /* allocate 10 × 1 vector */
call randgen(x, "Uniform"); /* fill x; values from uniform dist */
call randgen(e, "Normal"); /* fill e; values from normal dist */
y = 3*x + 2 + e; /* linear response plus normal error */
print x, y;
In [ ]:
/* transpose a matrix */
s = {1 2 3, 4 5 6, 7 8 9, 10 11 12}; /* 4 × 3 matrix */
transpose = t(s); /* 3 × 4 matrix */
sPrime = s`;
print s,transpose, sPrime;
In [62]:
/* reshape a matrix */
x = 1:12; /* 1 × 12 matrix */
s = shape(x, 4, 3); /* reshape data into 4 × 3 matrix */
print x, s;
/* omit dimension; automatically determined */
s1 = shape(x, 4); /* 4 rows ==> 3 columns */
s2 = shape(x, 0, 3); /* 3 columns ==> 4 rows */
print s1, s2; /*same as s*/
/* pad data with a specified value (missing) */
s = shape(1:10, 4, 3, .); /* 4 × 3, pad with missing value */
sp = s`;
print s, sp;
Out[62]:
This is different from
Matlab:
>> x = [1:12]
>> s = reshape(x, 3, 4)
s =
1 4 7 10
2 5 8 11
3 6 9 12
R:
> mat <- matrix(1:12, nrow = 3)
> mat
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
Data in SAS/IML matrices are stored in row-major order.
In [ ]:
/* use indices on either side of assignment statements */
proc iml;
x = {1 2 3, 4 5 6};
y = x[2, 3]; /* value of 2nd row, 3rd column */
print x,y;
x[2, 3] = 7; /* changes the value of x[2,3]; y is unchanged */
print x, y;
/* ERROR: subscripting a matrix that has not been created */
*z[1] = 0; /* z is undefined */
In [ ]:
/* extract submatrix */
z = {1 2 3 4, 5 6 7 8, 9 10 11 12};
rows = {1 3};
cols = {2 4};
w = z[rows, cols]; /*cross product {1,2} {1,4}, {3, 2}, {3, 4}*/
print w;
In [ ]:
/* different ways to specify elements of a matrix */
z = {1 3 5 7 9 11 13}; /* row vector */
w = z[{2 4 6}]; /* column vector with values {3,7,11} */
t=z[ , {2 4 6}]; /* row vector with values {3 7 11} */
print w, t;
In [ ]:
/* extract all rows except those specified in a vector v */
q = {1 2, . 3, 4 5, 6 7, 8 .};
v = {2 5}; /* specify rows to exclude */
idx = setdif(1:nrow(q), v); /* start with 1:5, exclude values in v */
r = q[idx, ]; /* extract submatrix */
print idx, r;
In [ ]:
m = {1 2 3,
2 5 6,
3 6 10};
d = vecdiag(m);
print d;
In [ ]:
/* create a diagonal matrix from a vector */
vals = {5, 2, -1};
s = diag(vals);
print s;
Matlab:
>> i = eye(3)
i =
1 0 0
0 1 0
0 0 1
In [ ]:
m = {1 2 3,
2 5 6,
3 6 10};
print m;
/* index the diagonal elements of a matrix */
n = nrow(m); /*n = 3*/
p = ncol(m); /*p = 3*/
diagIdx = do(1, n*p, p+1); /* indices of the diagonal */
print diagIdx;
d2 = m[diagIdx]; /* extract the diagonal */
print d2;
Matlab:
m =
1 2 3
2 5 6
3 6 10
>> diag(m) /*Differnet form SAS's diag function*/
ans =
1
5
10
In [ ]:
/* compute B = (m - lambda*I) for lambda=1 */
/* First approach: use the formula */
B1 = m - I(n); /* I(n) gives n × n identity matrix;
n is defined above as number of rows of m, which is 3 here*/
print B1;
/* Second approach: Do not physically create the identity matrix */
B = m;
B[diagIdx] = m[diagIdx] - 1; /* modify the diagonal */
print m B;
In [ ]:
x = shape(1:1000, 0, 4); /* 4 columns, 250 rows */
print x;
print (x[1:3,]); /* print first three rows */
When SAS/IML encounters an expression enclosed in parentheses, it creates a temporary matrix to hold the result and the PRINT statement does not output a name for temporary matrices. However, as described in section "Printing a Matrix" on page 19, you can use the LABEL= and COLNAME= options in the PRINT statement to make the printed output easier to understand, as shown in the following statements:
In [ ]:
varNames = 'Col1':'Col4';
print (x[1:3,])[label="My Data" colname=varNames];
In [ ]:
x = 1:4;
print (3*x + 1); /* print expression */
In [ ]:
/* comparison operators */
x = {1 2 3, 2 1 .};
s1 = (x = 2);
print s1;
z = {1 2 3, 3 2 1};
s2 = (x<z);
print s2;
The comparison operators treat a missing value as a value that is less than any valid nonmissing value, as shown in the following example:
In [ ]:
/* missing values compare as less than any nonmissing value */
m = .;
n = 0;
r = (m<n);
print r;
In [ ]:
/* an IF-THEN/ELSE statement */
x = {1 2 3, 2 1 1};
z = {1 2 3, 3 2 1};
if x <= z then
msg = "all(x <= z)";
else
msg = "some element of x is greater than the corresponding element of z";
print msg;
In [ ]:
/* a DO-END block of statements */
if x <= z then do;
msg = "all(x <= z)";
/* more statements ... */
end;
else do;
msg = "some element of x is greater than the corresponding element of z";
/* more statements ... */
end;
print msg;
In [ ]:
/* the ALL statement */
if all(x <= z) then do;
msg = "all(x <= z)";
/* more statements ... */
end;
print msg;
In [ ]:
/* the Any statement */
if any(x <= z) then do;
msg = "any(x <= z)";
/* more statements ... */
end;
print msg;
In [ ]:
/* the iterative DO statement */
x = 1:5;
sum = 0;
do i = 1 to ncol(x);
sum = sum + x[i]; /* inefficient way to sum elements */
end;
print sum;
x = 1:5;
sum = sum(x); /* efficient; eliminate DO loop */
print sum(x);
/* an UNTIL clause */
x = 1:5;
sum = 0;
do i = 1 to ncol(x) until(sum > 8);
sum = sum + x[i];
end;
print sum;
In [ ]:
/* a WHILE clause */
x = {1 2 . 4 5};
sum = 0;
do i = 1 to ncol(x) while(x[i]^=.);
sum = sum + x[i];
end;
print sum;
x = {1 2 . 4 5};
sum = 0;
do i = 1 to ncol(x) until(x[i]^=.);
sum = sum + x[i];
end;
print sum;
Recall:
In [ ]:
proc iml;
x = 5 + . ;
print x;
In [ ]:
/* a DO-UNTIL statement */
x = 0;
dx = 0.01;
do until(sin(x) > sin(x+dx));
x = x + dx;
end;
print x;
In [ ]:
/* horizontal concatenation */
a = {1,2,3,4,5}; /* 5 × 1 column vectors */
b = {3,5,4,1,3};
c = {0,1,0,0,1};
x = a || b || c; /* 5 × 3 matrix */
print x;
In [ ]:
/* vertical concatenation */
males = {1 3 0, 2 5 1, 3 4 0}; /* 3 × 3 matrix */
females ={4 1 0,5 3 1}; /* 2 × 3 matrix */
x = males // females; /* 5 × 3 matrix */
print x [format = mmddyy10.];
In [ ]:
/* construct vector of even integers */
free x; /* make sure x is empty */
do i = 1 to 5;
x = x || 2*i; /* inefficient! 5 allocations */
end;
print x;
/* Improved algorithm: allocate the matrix to hold the
even integers. Assign values into the vector. */
x = j(1, 5, .); /* allocate; fill with missing */
do i = 1 to ncol(x);
x[i] = 2*i; /* assign value */
end;
print x;
x = 2* (1:5);
print x;
In [ ]:
proc iml;
y = {., 1, ., 2, 3, .};
v = remove(y, {1 3 6});
print v;
x = {. 1 . 2 3 .};
h = remove(y, {1 3 6});
print h;
g = {1 2 3};
i = insert(v, g[1], 0, 1);
i = insert(i, g[2], 0, 3);
i = insert(i, g[3], 0, 6);
print i;
In [ ]:
proc iml;
v= {1,2,3};
g = {1 2 3, 4 5 6, 7 8 9};
i = insert(g, v, 0, 1);
print i ;
INSERT (x, y, row<, column>) ;
The INSERT function inserts one matrix inside another. The arguments to the INSERT function are as follows:
x is the target matrix. It can be either numeric or character.
y is the matrix to be inserted into the target. It can be either numeric or character, depending on the type of the target matrix.
row is the row where the insertion is to be made.
column is the column where the insertion is to be made.
In [ ]:
/* logical operators */
proc iml;
r = {0 0 1 1};
s = {0 1 0 1};
and = (r & s);
or = (r | s);
not = ^r;
print and, or, not;
In [ ]:
proc iml;
r = {0 0 1 1};
s = {0 1 0 1};
and = (r & s);
or = (r | s);
not = ^r;
print and, or, not;
h ={1 2 3, 4 5 6, 7 8 9};
not_h = ^h;
print not_h;
d = {9 8 0, 6 0 4, 0 2 1};
handd = h & d;
hord = h | d;
print handd, hord;
i = i(3);
t = i & h;
print t;
In [ ]:
/* logical combination of criteria */
x = do(-1, 1, 0.5);
if (x>= -1) & (x<=1) then
y = sqrt(1-x##2); /* the ## operator squares each element of x */
else
y = "Unable to compute the function.";
print y;
In [ ]:
/* different ways to compute the same logical condition */
if (x< -1) | (x>1) then
y = "Unable to compute the function.";
else
y = sqrt(1-x##2);
if ^(x< -1) & ^(x>1) then
y = sqrt(1-x##2);
else
y = "Unable to compute the function.";
print y;
Unlike the DATA step, the SAS/IML language does not support the mnemonic keywords EQ, NE, GT, LT, GE, or LE as a replacement for the symbols =, ^=, >, <, >=, or <=.
Unlike the DATA step, the SAS/IML language does not support the mnemonic keywords AND, OR, or NOT as a replacement for the symbols &, |, and ^.
In [ ]:
/* union, intersection, and difference of sets */
A = 1:4;
B = do(2.5, 4.5, 0.5);
u = union(A, B); /* union */
inter = xsect(A, B); /* intersection */
dif = setdif(A, B); /* difference between sets */
print A,B,u, inter, dif;
In [ ]:
/* check whether intersection and set difference are empty */
A = 1:4;
B = do(5, 8, 0.5);
print A,B;
inter = xsect(A, B);
if ncol(inter)>0 then do;
print inter;
end;
else do;
d = "N/A";
print d;
end;
dif = setdif(A, B);
if ncol(dif)>0 then do;
print dif;
end;
else do;
f = "N/A";
print f;
end;
In [ ]:
/* elementwise matrix operations with scalar or vector */
x = {7 7, 8 9, 7 9, 5 7, 8 8};
grandmean = 7.5; /* mean of all elements */
y = x - grandmean; /* subtract 7.5 from each element */
mean ={7 8}; /* mean of each column */
xc = x - mean; /* subtract (7 8) from each row */
print y xc;
In [ ]:
/* elementwise matrix operations with vector or matrix */
/* r is row vector */
r = {1.225 1}; /* std dev of each col */
std_x = xc / r; /* divide each column (normalize) */
/* c is column vector */
c = x[,1]; /* first column of x */
y = x - c; /* subtract from each column */
/* m is matrix */
m = {6.5 7.5, 7.9 9.1, 7.5 8.5, 5.6 6.4, 7.5 8.5};
deviations = x - m; /* difference between matrices */
print std_x y deviations;
In [ ]:
/* matrix multiplication */
A = {7 7, 8 9, 7 9, 5 7, 8 8}; /* 5 × 2 matrix */
v = {1, -1}; /* 2 × 1 vector */
y = A*v; /* result is 5 × 1 vector */
print y;
In [ ]:
/* Set up the normal equations (X`X)b = X`y */
x = (1:8)`; /* X data: 8 × 1 vector */
y = {5 9 10 15 16 20 22 27}`; /* corresponding Y data */
/* Step 1: Compute X`X and X`y */
x = j(nrow(x), 1, 1) || x; /* add intercept column */
xpx = x` * x; /* cross-products */
xpy = x` * y;
In [ ]:
/* solve linear system */
/* Solution 1: compute inverse with INV (inefficient) */
xpxi = inv(xpx); /* form inverse crossproducts */
b = xpxi * xpy; /* solve for parameter estimates */
print b;
In [ ]:
/* Solution 2: compute solution with SOLVE. More efficient */
b = solve(xpx, xpy); /* solve for parameter estimates */
print b;
In [ ]:
/* display the names, dimensions, and type of matrices */
proc iml;
x = 1:3; /* define some matrices */
y = j(1e5, 100); /* large matrix: 10 million elements */
animals = {"Cat" "Dog" "Mouse",
"Cow" "Pig" "Horse"};
show names; /* show information about all matrices */
show y animals; /* show information about specified matrices */
In [ ]:
/* delete one or more matrices by listing their names */
free y;
show names;
In [ ]:
/* delete all matrices */
free /;
show names;
In [ ]:
/* store all matrices to a permanent location */
proc iml;
x = 1:3;
animals = {"Cat" "Dog" "Mouse", "Cow" "Pig" "Horse"};
libname MyLib "/folders/myshortcuts/_myfolders"; /* set directory for storage */
reset storage=MyLib.MyStorage; /* set the storage catalog */
store _all_; /* store all matrices */
show storage; /* display catalog contents */
In [ ]:
/* load matrices from a storage catalog */
proc iml;
reset storage=MyLib.MyStorage;
load _all_;
show names;
In [ ]:
/* remove all contents from a storage catalog */
remove _all_;
show storage; /* show the contents of the catalog */