In [27]:
import saspy
sas = saspy.SASsession(cfgname='default')
In [28]:
cars = sas.sasdata('cars', 'sashelp')
cars.head()
Out[28]:
Like the TABULATE procedure on which it relies, using the tabulate methods attached to your SASPy data sets means specifying three things:
Then you compose the table using a simple syntax, in which * indicates a nesting and | indicates elements at the same level (this is made possible by Python's operator overloading).
In [11]:
# define columns to use as classes or computational vars
by_origin, by_type = cars.tabulate.classes('origin', 'type')
horsepower, cylinders = cars.tabulate.vars('horsepower', 'cylinders')
# grab statistics of interest
mean, n = cars.tabulate.stats('mean', 'n')
# compose these elements into a table
cars.tabulate.table(
left = by_origin * by_type,
top = (horsepower | cylinders) * (mean | n)
)
In [4]:
# alternatively, you can output pure-text tables using .text_table()
cars.tabulate.text_table(
left = by_origin * by_type,
top = (horsepower | cylinders) * (mean | n)
)
In [12]:
# assign labels, formats
by_origin, by_type = cars.tabulate.classes('origin', 'type', labels=False)
horsepower, cylinders = cars.tabulate.vars('horsepower', 'cylinders')
mean, n = cars.tabulate.stats('mean', 'n', formats=['6.2', '3.'], labels=['Average', 'Count'])
# you can override or add options in composition by using .with_()
cars.tabulate.table(
left = by_origin.with_(all='Total') * by_type,
top = (horsepower.with_(label='Power!') | cylinders) * (mean.with_(format='6.2') | n)
)
In [6]:
by_origin = cars.tabulate.as_class('origin', label='', all='All Origins')
by_type = cars.tabulate.as_class('type', label='')
horsepower = cars.tabulate.as_var('horsepower')
average = cars.tabulate.stat('mean', format='8.2', label='Mean HP')
n = cars.tabulate.stat('n', label='Count', format='6.')
cars.tabulate.table(
left = by_origin,
top = by_type * (n | horsepower.with_(label='') * average)
)
In [7]:
# create some elements for reuse
by_origin = cars.tabulate.as_class('origin', label='')
by_type = cars.tabulate.as_class('type', label='')
horsepower = cars.tabulate.as_var('horsepower', label='Horses')
cylinders = cars.tabulate.as_var('cylinders', label='Cyls.')
enginesize = cars.tabulate.as_var('enginesize', label='Engine Size')
average = cars.tabulate.stat('mean', label='Avg', format='8.2')
stdev = cars.tabulate.stat('std', label='Std. Dev.', format='5.2')
n = cars.tabulate.stat('n', label='Count', format='6.0')
# create some compositional fragments
by_origin_and_type = by_origin.with_(all='All') * by_type.with_(all='All')
hpstats = horsepower * (average | stdev | n)
cylstats = cylinders * (average | stdev | n)
enginestats = enginesize * (average | stdev)
In [13]:
# draw a table
cars.tabulate.table(
left = by_origin_and_type,
top = hpstats
)
In [14]:
# draw another table
cars.tabulate.table(
left = by_type,
top = cylstats | enginestats
)
In [10]:
# grab another class when needed, draw another table
drivetrain = cars.tabulate.as_class('drivetrain', label='Drive Train')
cars.tabulate.table(
left = by_type * drivetrain.with_(all='All'),
top = cylstats | enginestats
)
One of SASPy's best features is the integration with Pandas DataFrames. Instead of drawing a presentational table in HTML or plain text, you can have the resulting nested values converted to a DataFrame using nested indices.
Note that certain presentational elements (labels, formats, etc) aren't represented, nor is the exact visual arrangement of your groupings. However, all computations are in the DataFrame and can be further accessed or sliced easily in Python.
In [21]:
# since w
my_frame = cars.tabulate.to_dataframe(
left = by_type * drivetrain * by_origin,
top = cylstats | enginestats
)
# showing an excerpt
my_frame[:10]
Out[21]:
In [23]:
# you can access portions of that nested frame by indices, from left to right
my_frame.loc[('Sedan','Asia')]
Out[23]:
In [24]:
my_frame.loc[('SUV')]
Out[24]:
In [26]:
sas.teach_me_SAS(True)
cars.tabulate.table(
left = by_type * drivetrain.with_(all='All'),
top = cylstats | enginestats
)
In [ ]: