Jupyter notebook

This Jupyter notebook provides the commands used in the article Great tip for dynamic data selection using SAS Viya and Python.

Import python packages


In [1]:
import pandas as pd
import swat

Connect to the CAS server


In [2]:
conn = swat.CAS('sasserver.demo.sas.com',8777,'sasdemo','Orion123')

Check connection


In [3]:
conn


Out[3]:
CAS('sasserver.demo.sas.com', 8777, 'sasdemo', protocol='http', name='py-session-1', session='a3000ac4-61fb-784c-a1e5-36ad27c67259')

Upload data into a DataFrame


In [4]:
df=conn.upload('https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv')


NOTE: Cloud Analytic Services made the uploaded file available as table IRIS in caslib CASUSER(sasdemo).
NOTE: The table IRIS has been created in caslib CASUSER(sasdemo) from binary data uploaded to Cloud Analytic Services.

Load data into a CAS table


In [5]:
tbl = df.casTable

Sort the stored values in a variable and change the sort order


In [8]:
sorttbl = tbl.sort_values(['sepal_length','sepal_width'],
                         ascending=[False,True])

Expression to create filter and display data


In [9]:
expr = sorttbl.petal_length > 6.5

In [10]:
newtbl = sorttbl[expr]

In [11]:
newtbl.head()


Out[11]:
Selected Rows from Table IRIS
sepal_length sepal_width petal_length petal_width species
0 7.7 2.6 6.9 2.3 virginica
1 7.7 2.8 6.7 2.0 virginica
2 7.7 3.8 6.7 2.2 virginica
3 7.6 3.0 6.6 2.1 virginica

Same expression as above writte on one line


In [12]:
newtbl = sorttbl[sorttbl.petal_length > 6.5]

In [13]:
newtbl.head()


Out[13]:
Selected Rows from Table IRIS
sepal_length sepal_width petal_length petal_width species
0 7.7 2.6 6.9 2.3 virginica
1 7.7 2.8 6.7 2.0 virginica
2 7.7 3.8 6.7 2.2 virginica
3 7.6 3.0 6.6 2.1 virginica

Another expression to change the filter criteria


In [14]:
newtbl2 = newtbl[newtbl.petal_width < 2.2]

In [15]:
newtbl2.head()


Out[15]:
Selected Rows from Table IRIS
sepal_length sepal_width petal_length petal_width species
0 7.7 2.8 6.7 2.0 virginica
1 7.6 3.0 6.6 2.1 virginica

Expression to add a bitwise comparison (&) to join the filters


In [16]:
sorttbl[(sorttbl.petal_length > 6.5) &
        (sorttbl.petal_width < 2.2)].head()


Out[16]:
Selected Rows from Table IRIS
sepal_length sepal_width petal_length petal_width species
0 7.7 2.8 6.7 2.0 virginica
1 7.6 3.0 6.6 2.1 virginica

Display the CASTable object


In [17]:
sorttbl[(sorttbl.petal_length > 6.5) & 
        (sorttbl.petal_width < 2.2)]


Out[17]:
CASTable('IRIS', caslib='CASUSER(sasdemo)', computedvars=['_gt_6_', '_and_8_', '_lt_7_'], computedvarsprogram='_gt_6_ = (petal_length > 6.5); _lt_7_ = (petal_width < 2.2); _and_8_ = (_gt_6_ and _lt_7_); ', where='(_and_8_)')[['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']].sort_values(['sepal_length', 'sepal_width'], ascending=[False, True])

Expression that includes mathematical operations


In [18]:
sorttbl[(sorttbl.petal_length + sorttbl.petal_width)
        * 2 > 17.5].head()


Out[18]:
Selected Rows from Table IRIS
sepal_length sepal_width petal_length petal_width species
0 7.7 2.6 6.9 2.3 virginica
1 7.7 3.8 6.7 2.2 virginica

Expression that uses the Python str method to make comparisions on character columns


In [19]:
sorttbl[sorttbl.species.str.upper().str.startswith('SET')
        ].head()


Out[19]:
Selected Rows from Table IRIS
sepal_length sepal_width petal_length petal_width species
0 5.8 4.0 1.2 0.2 setosa
1 5.7 3.8 1.7 0.3 setosa
2 5.7 4.4 1.5 0.4 setosa
3 5.5 3.5 1.3 0.2 setosa
4 5.5 4.2 1.4 0.2 setosa