License


Copyright (C) 2017 J. Patrick Hall, jphall@gwu.edu

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


SAS: Part 1 - PROC SQL


NOTE: these examples are meant for the free SAS University Edition


1. Generate sample data set

Generate some small example tables using SAS data step
  • table1 has a primary key called key and two numeric variables: x1 and x2
  • table1 is located in the SAS work library, it could be called work.table1

In [1]:
data table1; 
    do key=1 to 20;
        x1 = key * 10; 
        x2 = key + 10;
        output;
    end; 
run; 
proc print; run;


Out[1]:
SAS Output

SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.TABLE1

Obs key x1 x2
1 1 10 11
2 2 20 12
3 3 30 13
4 4 40 14
5 5 50 15
6 6 60 16
7 7 70 17
8 8 80 18
9 9 90 19
10 10 100 20
11 11 110 21
12 12 120 22
13 13 130 23
14 14 140 24
15 15 150 25
16 16 160 26
17 17 170 27
18 18 180 28
19 19 190 29
20 20 200 30
  • table2 has a primary key called key and two character variables: x3 and x4
  • table2 is located in the SAS work library, it could be called work.table2

In [2]:
data table2; 
    do key=2 to 20 by 2; 
        x3 = scan('a b c d e f g h i j', key/2);
        x4 = scan('k l m n o p q r s t', key/2);
        output;
    end; 
run;
proc print; run;


Out[2]:
SAS Output

SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.TABLE2

Obs key x3 x4
1 2 a k
2 4 b l
3 6 c m
4 8 d n
5 10 e o
6 12 f p
7 14 g q
8 16 h r
9 18 i s
10 20 j t

2. Data manipulation with PROC SQL

  • SAS PROC SQL allows users to execute valid SQL statements from a SAS session
  • In a more typical SQL environment the PROC SQL and quit statements would be unnecessary and unrecognized ##### Query x1 from table

In [3]:
proc sql;

    select x1 from work.table1; 

quit;


Out[3]:
SAS Output

SAS Output

The SAS System

The SQL Procedure

Query Results

x1
10
20
30
40
50
60
70
80
90
100
110
120
130
140
150
160
170
180
190
200

Creating a new table using PROC SQL

  • create table3 in the work library/database
  • x1 from table1 will be named x5 in the new table

In [4]:
proc sql; 

    create table table3 as 
    select key, x1 as x5
    from table1;

quit;
proc print data=table3; run;


Out[4]:
SAS Output

SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.TABLE3

Obs key x5
1 1 10
2 2 20
3 3 30
4 4 40
5 5 50
6 6 60
7 7 70
8 8 80
9 9 90
10 10 100
11 11 110
12 12 120
13 13 130
14 14 140
15 15 150
16 16 160
17 17 170
18 18 180
19 19 190
20 20 200
Subsetting tables with a where clause
  • a where clause is used to subset rows of a table
  • the order by statement sorts displayed results or created tables
  • desc refers to descending sort order

In [5]:
proc sql;

    create table table4 as 
    select key, x2 as x6 
    from table1 
    where key <= 10
    order by x6 desc;

quit;
proc print data=table4; run;


Out[5]:
SAS Output

SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.TABLE4

Obs key x6
1 10 20
2 9 19
3 8 18
4 7 17
5 6 16
6 5 15
7 4 14
8 3 13
9 2 12
10 1 11

Updating data with PROC SQL

insert statement

In [6]:
proc sql;

    * insert can be used to add data to a table;
    insert into table1
    values (21, 210, 31);

quit;
proc print data=table1; run;


Out[6]:
SAS Output

SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.TABLE1

Obs key x1 x2
1 1 10 11
2 2 20 12
3 3 30 13
4 4 40 14
5 5 50 15
6 6 60 16
7 7 70 17
8 8 80 18
9 9 90 19
10 10 100 20
11 11 110 21
12 12 120 22
13 13 130 23
14 14 140 24
15 15 150 25
16 16 160 26
17 17 170 27
18 18 180 28
19 19 190 29
20 20 200 30
21 21 210 31
PROC SQL supports in place over writing of data

update statement


In [7]:
proc sql;

    * update can be used to change the value of previously existing data;
    update table1
    set key = 6, x1 = 60, x2 = 16
    where key = 7;

quit;
proc print data=table1; run;


Out[7]:
SAS Output

SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.TABLE1

Obs key x1 x2
1 1 10 11
2 2 20 12
3 3 30 13
4 4 40 14
5 5 50 15
6 6 60 16
7 6 60 16
8 8 80 18
9 9 90 19
10 10 100 20
11 11 110 21
12 12 120 22
13 13 130 23
14 14 140 24
15 15 150 25
16 16 160 26
17 17 170 27
18 18 180 28
19 19 190 29
20 20 200 30
21 21 210 31

Joining tables using PROC SQL

Inner joins
  • An inner join only retains rows from both tables where key values match
  • Inner join is the default behavior of the join statement

In [8]:
proc sql; 

    create table table5 as
    select * 
    from table1
    join table2
    on table1.key = table2.key; 

quit;
proc print data=table5; run;


Out[8]:
SAS Output

SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.TABLE5

Obs key x1 x2 x3 x4
1 2 20 12 a k
2 4 40 14 b l
3 6 60 16 c m
4 6 60 16 c m
5 8 80 18 d n
6 10 100 20 e o
7 12 120 22 f p
8 14 140 24 g q
9 16 160 26 h r
10 18 180 28 i s
11 20 200 30 j t
Left joins
  • Left joins retain all the rows from one table, the left table
  • Left joins only retain rows where key values match from the other table, the right table
  • Aliases can also be used for tables

In [9]:
proc sql;

    create table table6 as 
    select * 
    from table1 as t1 /* left table */
    left join table2 as t2 /* right table */
    on t1.key = t2.key;

quit;
proc print data=table6; run;


Out[9]:
SAS Output

SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.TABLE6

Obs key x1 x2 x3 x4
1 1 10 11    
2 2 20 12 a k
3 3 30 13    
4 4 40 14 b l
5 5 50 15    
6 6 60 16 c m
7 6 60 16 c m
8 8 80 18 d n
9 9 90 19    
10 10 100 20 e o
11 11 110 21    
12 12 120 22 f p
13 13 130 23    
14 14 140 24 g q
15 15 150 25    
16 16 160 26 h r
17 17 170 27    
18 18 180 28 i s
19 19 190 29    
20 20 200 30 j t
21 21 210 31    

Aggregating data using PROC SQL

  • The where statement cannot be used with aggregate functions
  • Instead use the having statement
  • where sum_x1 > 100 would cause errors in this query

In [10]:
proc sql;

    create table table7 as
    select key, sum(x1) as sum_x1
    from table1 
    group by key
    having sum_x1 > 100;

quit;
proc print data=table7; run;


Out[10]:
SAS Output

SAS Output

The SAS System

The PRINT Procedure

Data Set WORK.TABLE7

Obs key sum_x1
1 6 120
2 11 110
3 12 120
4 13 130
5 14 140
6 15 150
7 16 160
8 17 170
9 18 180
10 19 190
11 20 200
12 21 210

Subqueries

A subquery is a query embedded in another query


In [11]:
proc sql print;

    select * from 
    
        /* subquery */
        (select key, x1, x2
        from table1
        where key <= 10);

quit;


Out[11]:
SAS Output

SAS Output

The SAS System

The SQL Procedure

Query Results

key x1 x2
1 10 11
2 20 12
3 30 13
4 40 14
5 50 15
6 60 16
6 60 16
8 80 18
9 90 19
10 100 20