In this notebook we'll focus on basic use of Hillmaker for analyzing occupancy in a typical hospital setting. The data is fictitious and comes from a hospital short stay unit. Patients flow through a short stay unit for a variety of procedures, tests or therapies. Let's assume patients can be classified into one of five patient types: ART (arteriogram), CAT (post cardiac-cath), MYE (myelogram), IVT (IV therapy), and OTH (other). From one of our hospital information systems we were able to get raw data about the entry and exit times of each patient. For simplicity, the data is in a csv file.
This example assumes you are already familiar with statistical occupancy analysis using the old version of Hillmaker or a similar tool. It also assumes some knowledge of using Python for analytical work.
The following blog posts are helpful if you are not familiar with occupancy analysis:
Computing occupancy statistics with Python - Part 1 of 3
Computing occupancy statistics with Python - Part 2 of 3
The new hillmaker is implemented as a Python module which can be used by importing hillmaker and then calling the main hillmaker function, make_hills() (or any component function included in the module). This new version of hillmaker is in what I'd call an alpha state. The output does match the Access version for the ShortStay database that I included in the original Hillmaker. Use at your own risk.
It is licensed under the Apache 2.0 license, a widely used permissive free software license. See https://en.wikipedia.org/wiki/Apache_License for additional information.
Whereas the old Hillmaker required MS Access, the new one requires an installation of Python 3 along with several Python modules that are widely used for analytics and data science work.
A very easy way to get Python 3 pre-configured with tons of analytical Python packages is to use the Anaconda distro for Python. From their Downloads page:
Anaconda is a completely free Python distribution (including for commercial use and redistribution). It includes more than 300 of the most popular Python packages for science, math, engineering, and data analysis. See the packages included with Anaconda and the Anaconda changelog.
Make sure you download Python 3.x (3.5 is the latest version as of January 2016).
There are several really nice reasons to use the Anaconda Python distro for data science work:
As of January 22, 2016, hillmaker is publicly available from the Python Package Index (PyPI) as well as Anaconda Cloud. They are similar to CRAN for R. Source code is also available from my GitHub site https://github.com/misken/hillmaker and it is an open-source project. If you work with Python, you should know a little bit about Python package installation. There is already a companion project on GitHub called hillmaker-examples which contains, well, examples of hillmaker use cases.
You can use either pip or conda to install hillmaker. I suggest learning about Python virtual environments and using pyenv, virtualenv or conda (preferred) to create a Python virtual environment and then install hillmaker into it. This way you avoid mixing developmental third-party packages like hillmaker with your base Anaconda Python environment.
To install using conda:
conda install -c https://conda.anaconda.org/hselab hillmaker
OR
To install using pip:
pip install hillmaker
Use the conda list command to see all the installed packages in your Anaconda3 root.
conda list
Now fire up a Python session and try to import hillmaker.
c:\Users\jerry\Documents\hillmaker>python
Python 3.5.1 |Continuum Analytics, Inc.| (default, Dec 7 2015, 11:16:01)
[MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import hillmaker
>>> help(hillmaker.make_hills)
Help on function make_hills in module hillmaker.hills:
make_hills(scenario_name, stops_df, infield, outfield, start_analysis, end_analysis, catfield='', total_str='Total', bin_size_minutes=60, cat_to_exclude=None, totals=True, export_csv=True, export_path='.', return_dataframes=False, verbose=0)
Compute occupancy, arrival, and departure statistics by time bin of day and day of week.
Main function that first calls `bydatetime.make_bydatetime` to calculate occupancy, arrival
and departure values by date by time bin and then calls `summarize.summarize_bydatetime`
to compute the summary statistics.
Parameters
----------
scenario_name : string
Used in output filenames
stops_df : DataFrame
Base data containing one row per visit
infield : string
Column name corresponding to the arrival times
... a bunch more stuff ...
If the install went well, you shouldn't get any errors when you import hillmaker and the help() command should show you the docstring for the make_hills() function.
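If "occupancy by time bin" is a fuzzy idea, here is a minimal standard-library sketch of one way to compute it for two of the sample stops. This is not hillmaker's implementation. It assumes (my assumption, for illustration only) that a stop contributes to each bin the fraction of that bin it overlaps, and the function name bin_occupancy is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

FMT = "%m/%d/%Y %H:%M"

# Two stops taken from the sample ShortStay rows (entry time, exit time)
stops = [
    ("1/1/1996 7:44", "1/1/1996 8:50"),
    ("1/1/1996 8:28", "1/1/1996 9:20"),
]


def bin_occupancy(stops, bin_minutes=60):
    """Illustrative only: sum, per time bin, the fraction of the bin each stop overlaps."""
    bin_size = timedelta(minutes=bin_minutes)
    occ = defaultdict(float)
    for in_str, out_str in stops:
        t_in = datetime.strptime(in_str, FMT)
        t_out = datetime.strptime(out_str, FMT)
        # Floor the entry time to the start of the bin that contains it
        midnight = t_in.replace(hour=0, minute=0, second=0, microsecond=0)
        bin_start = midnight + ((t_in - midnight) // bin_size) * bin_size
        # Walk bin by bin until the stop ends
        while bin_start < t_out:
            bin_end = bin_start + bin_size
            overlap = min(t_out, bin_end) - max(t_in, bin_start)
            occ[bin_start] += overlap / bin_size
            bin_start = bin_end
    return dict(occ)


occupancy = bin_occupancy(stops)
for bin_start in sorted(occupancy):
    print(bin_start, round(occupancy[bin_start], 3))
```

Summary statistics by time of day and day of week are then just aggregates (mean, percentiles, and so on) of values like these across all the dates in the analysis range.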
In [1]:
import pandas as pd
import hillmaker as hm
Here are the first few lines from our csv file containing the patient stop data:
PatID,InRoomTS,OutRoomTS,PatType
1,1/1/1996 7:44,1/1/1996 8:50,IVT
2,1/1/1996 8:28,1/1/1996 9:20,IVT
3,1/1/1996 11:44,1/1/1996 13:30,MYE
4,1/1/1996 11:51,1/1/1996 12:55,CAT
5,1/1/1996 12:10,1/1/1996 13:00,IVT
6,1/1/1996 14:16,1/1/1996 15:35,IVT
7,1/1/1996 14:40,1/1/1996 15:25,IVT
Read the short stay data from a csv file into a DataFrame and tell Pandas which fields to treat as dates.
In [2]:
file_stopdata = '../data/ShortStay.csv'
stops_df = pd.read_csv(file_stopdata, parse_dates=['InRoomTS','OutRoomTS'])
stops_df.info() # Check out the structure of the resulting DataFrame
Check out the top and bottom of stops_df.
In [3]:
stops_df.head(7)
Out[3]:
In [4]:
stops_df.tail(5)
Out[4]:
No obvious problems. We'll assume the data was all read in correctly.
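If you'd rather not just assume, a quick sanity check is easy to write. The sketch below uses only the standard library and a few of the sample rows shown earlier; the function name check_stops and the two rules it applies (exit after entry, patient type is one of the five known codes) are my own illustrative choices, not anything hillmaker does for you. The same checks could be written against stops_df with pandas.

```python
import csv
import io
from datetime import datetime

# A few of the sample rows from the ShortStay csv file
SAMPLE = """PatID,InRoomTS,OutRoomTS,PatType
1,1/1/1996 7:44,1/1/1996 8:50,IVT
2,1/1/1996 8:28,1/1/1996 9:20,IVT
3,1/1/1996 11:44,1/1/1996 13:30,MYE
4,1/1/1996 11:51,1/1/1996 12:55,CAT
"""

FMT = "%m/%d/%Y %H:%M"
KNOWN_TYPES = {"ART", "CAT", "MYE", "IVT", "OTH"}


def check_stops(csv_text):
    """Return a list of (PatID, problem) tuples; an empty list means no problems found."""
    problems = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        t_in = datetime.strptime(row["InRoomTS"], FMT)
        t_out = datetime.strptime(row["OutRoomTS"], FMT)
        if t_out <= t_in:
            problems.append((row["PatID"], "exit not after entry"))
        if row["PatType"] not in KNOWN_TYPES:
            problems.append((row["PatID"], "unknown patient type"))
    return problems


problems = check_stops(SAMPLE)
print(problems if problems else "No problems found")
```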
In [5]:
help(hm.make_hills)
Most of the parameters are similar to those in the original VBA version, though a few new ones have been added. For example, the cat_to_exclude parameter allows you to specify a list of category values for which you do not want occupancy statistics computed. Also, since the VBA version used an Access database as the container for its output, new parameters were added to control output to csv files instead.
In [16]:
# Required inputs
scenario = 'ss_example_1'
in_fld_name = 'InRoomTS'
out_fld_name = 'OutRoomTS'
cat_fld_name = 'PatType'
start = '1/1/1996'
end = '3/30/1996 23:45'
# Optional inputs
verbose = 1
output = './output'
Now we'll call the main make_hills function. We won't capture the return values but will simply take the default behavior of having the summaries exported to csv files. You'll see that the filenames will contain the scenario value.
In [17]:
hm.make_hills(scenario, stops_df, in_fld_name, out_fld_name, start, end, cat_fld_name,
export_path = output, verbose=verbose)
Here's a screenshot of the output folder containing the csv files created by Hillmaker.
If you've used the previous version of Hillmaker, you'll recognize these files. A few more statistics have been added, but otherwise they are the same. These csv files can be imported into a spreadsheet application for plot creation. Of course, we can also make plots in Python.
The files with 'cat' in their name are new. They contain overall summary statistics by category. In other words, they are NOT by time of day and day of week.
In [8]:
# Required inputs - same as Example 1 except for scenario name
scenario = 'ss_example_2'
in_fld_name = 'InRoomTS'
out_fld_name = 'OutRoomTS'
cat_fld_name = 'PatType'
start = '1/1/1996'
end = '3/30/1996 23:45'
# Optional inputs
tot_fld_name = 'CAT_IVT' # Just to make it clear that it's only these patient types
bin_mins = 30 # Half-hour time bins
exclude = ['ART','MYE','OTH'] # Tell Hillmaker to ignore these patient types
outputpath = '.'
Now we'll call make_hills and tuck the results (a dictionary of DataFrames) into a local variable. Then we can explore them a bit with Pandas.
In [9]:
results_ex2 = hm.make_hills(scenario, stops_df, in_fld_name, out_fld_name, start, end, cat_fld_name,
total_str=tot_fld_name, bin_size_minutes=bin_mins, export_path=outputpath,
cat_to_exclude=exclude, return_dataframes=True)
In [10]:
results_ex2.keys()
Out[10]:
In [11]:
occ_df = results_ex2['occupancy']
In [12]:
occ_df.head()
Out[12]:
In [13]:
occ_df.tail()
Out[13]:
In [14]:
occ_df.info()
Of course, you don't have to run Python statements through an IPython notebook. You can simply create a short Python script and run that directly in a terminal. An example, test_shortstay.py, can be found in the scripts subfolder of the hillmaker-examples project. You can run it from a command prompt like this:
python test_shortstay.py
Here's what it looks like - you can modify as necessary for your needs. There is another example in that folder as well, test_obsim_log.py, that is slightly more complex in that the input data has raw simulation times (i.e. minutes past t=0) and we need to do some datetime math to turn them into calendar based inputs.
In [15]:
import pandas as pd
import hillmaker as hm
file_stopdata = '../data/ShortStay.csv'
# Required inputs
scenario = 'sstest_120'
in_fld_name = 'InRoomTS'
out_fld_name = 'OutRoomTS'
cat_fld_name = 'PatType'
start = '1/1/1996'
end = '3/30/1996 23:45'
# Optional inputs
tot_fld_name = 'SSU'
bin_mins = 120
df = pd.read_csv(file_stopdata, parse_dates=[in_fld_name, out_fld_name])
hm.make_hills(scenario, df, in_fld_name, out_fld_name,
start, end, cat_fld_name,
tot_fld_name, bin_mins,
cat_to_exclude=None,
verbose=1)
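For the simulation log case mentioned above, the datetime math is not much more than adding a timedelta to some anchor date. Here's a minimal sketch of the idea. It is not the code in test_obsim_log.py; the base date, the function name sim_minutes_to_datetime and the sample log values are all made up for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical calendar anchor: simulation time t=0 is treated as midnight, Jan 1, 1996
BASE_DATE = datetime(1996, 1, 1)


def sim_minutes_to_datetime(minutes, base=BASE_DATE):
    """Convert a simulation clock value (minutes past t=0) into a calendar datetime."""
    return base + timedelta(minutes=minutes)


# Made-up simulation log: (arrival minutes, departure minutes) for each entity
sim_log = [(464.0, 530.0), (508.5, 560.0)]

# Calendar based (entry, exit) pairs, suitable for the in and out fields
stops = [(sim_minutes_to_datetime(a), sim_minutes_to_datetime(d)) for a, d in sim_log]
for t_in, t_out in stops:
    print(t_in, t_out)
```

Any base date works as long as the analysis start and end dates are chosen relative to it.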
More elaborate versions of scripts like test_shortstay.py can be envisioned. For example, an entire folder of input data files could be processed by simply enclosing the hm.make_hills call inside a loop over the collection of input files:
In [ ]:
import glob

for log_fn in glob.glob('logs/*.csv'):
    # Read the log file and filter by included categories
    stops_df = pd.read_csv(log_fn, parse_dates=[in_fld_name, out_fld_name])
    # Pass the DataFrame that was just read (not a leftover df from an earlier cell)
    hm.make_hills(scenario, stops_df, in_fld_name, out_fld_name, start, end, cat_fld_name)
    ...
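One thing to watch for in a loop like this: since the output filenames contain the scenario value, reusing a single scenario name for every input file would presumably leave you with only the last file's csv output. A simple fix is to derive the scenario name from each input filename. The helper below is a hypothetical sketch (the name scenario_for and the prefix are my own), using only the standard library.

```python
from pathlib import Path


def scenario_for(log_path, prefix="batch"):
    """Build a scenario name from an input file path, e.g. logs/unit_a.csv -> batch_unit_a."""
    return f"{prefix}_{Path(log_path).stem}"


# Made-up input file names, just to show the resulting scenario names
for log_fn in ["logs/unit_a.csv", "logs/unit_b.csv"]:
    print(log_fn, "->", scenario_for(log_fn))
```

Inside the loop you would then pass scenario_for(log_fn) as the first argument to hm.make_hills instead of a fixed scenario string.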
Over the years, I (and many others) have used Hillmaker in a variety of ways, including:
I'd like users to be able to use the new Python based version in a number of different ways as well. As I'll show in this IPython notebook, it can be used by importing the hillmaker module and then calling Hillmaker functions via:
While these two options provide tons of flexibility for power users, I also want to create other interfaces that don't require users to write Python code. At a minimum, I plan to create a command line interface (CLI) as well as a GUI that is similar to the old Access version.
Python has several nice tools for creating CLIs. argparse is part of the standard library, while docopt is a third-party package that builds a parser from a usage message. Higher-level third-party tools like Click are also available. See http://docs.python-guide.org/en/latest/scenarios/cli/ for more. A well designed CLI will make it easy to use Python from the command line in either Windows or Linux.
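To give a feel for what this might look like, here's a rough argparse sketch. To be clear, no such CLI exists yet: the program name, argument names and flags below are hypothetical, and the sketch only parses arguments without calling hillmaker. The defaults simply mirror those in the make_hills signature shown earlier.

```python
import argparse


def build_parser():
    """Hypothetical CLI layout; argument and flag names are illustrative only."""
    parser = argparse.ArgumentParser(
        prog="hillmaker-cli",
        description="Sketch of a command line front end for occupancy analysis.",
    )
    # Positional arguments correspond to the required make_hills inputs
    parser.add_argument("scenario", help="Used in output filenames")
    parser.add_argument("stop_data_csv", help="csv file with one row per visit")
    parser.add_argument("in_field", help="Column name for arrival times")
    parser.add_argument("out_field", help="Column name for departure times")
    parser.add_argument("start", help="Start of analysis range")
    parser.add_argument("end", help="End of analysis range")
    # Optional arguments with the same defaults as make_hills
    parser.add_argument("--cat-field", default="", help="Column name for the category")
    parser.add_argument("--bin-size", type=int, default=60, help="Bin size in minutes")
    parser.add_argument("--export-path", default=".", help="Folder for csv output")
    parser.add_argument("-v", "--verbose", action="count", default=0)
    return parser


# Example: parse a made-up command line instead of reading sys.argv
args = build_parser().parse_args(
    ["ss_example_1", "ShortStay.csv", "InRoomTS", "OutRoomTS",
     "1/1/1996", "3/30/1996 23:45", "--cat-field", "PatType", "--bin-size", "30", "-v"]
)
print(args)
```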
This is uncharted territory for me. Python has a number of frameworks/toolkits for creating GUI apps. This is not the highest priority for me but I do plan on creating a GUI for Hillmaker. If anyone wants to help with this, awesome.