2.2 Integration with pandas data frames

As shown earlier, we can use indexes to search for specific data points. Another way to operate on the indexed data is to load it into a pandas data frame.

Please note: The following steps require the pandas package.


In [1]:
import signac
import pandas as pd

project = signac.get_project(root='projects/tutorial')

Let's first create a basic index and use it to construct an index data frame:


In [2]:
df_index = pd.DataFrame(project.index())
df_index.head()


Out[2]:
          V_gas      V_liq  _id                               fluid      root                             signac_id                         statepoint
0    294.117647   0.000000  0e909ffdba496bbb590fbce31f3a4563  ideal gas  /home/johndoe/signac-example...  0e909ffdba496bbb590fbce31f3a4563  {'kT': 1.0, 'a': 0, 'p': 3.4000000000000004, '...
1  10000.000000   0.000000  10743bc8b95bffab09503bce9abbe627  ideal gas  /home/johndoe/signac-example...  10743bc8b95bffab09503bce9abbe627  {'kT': 1.0, 'a': 0, 'p': 0.1, 'N': 1000, 'b': 0}
2    416.581783  30.659767  11d8997f19b8ba53d2360ee9fb1606fa  water      /home/johndoe/signac-example...  11d8997f19b8ba53d2360ee9fb1606fa  {'kT': 1.0, 'a': 5.536, 'p': 1.200000000000000...
3    132.890365   0.000000  195c26531df979e70d8f50267f67f0e5  NaN        /home/johndoe/signac-example...  195c26531df979e70d8f50267f67f0e5  {'kT': 1.0, 'a': 0, 'p': 7.525, 'N': 1000, 'b'...
4    110.715506  32.801209  1f147aff97cbbda8aa7c4457a9b51159  argon      /home/johndoe/signac-example...  1f147aff97cbbda8aa7c4457a9b51159  {'kT': 1.0, 'a': 1.355, 'p': 4.5, 'N': 1000, '...
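
The construction above relies on pandas accepting a sequence of dictionaries: each document becomes one row, each key becomes one column, and a key missing from a document shows up as NaN (as in the fluid column of row 3). A minimal sketch of that behavior, using hypothetical documents that only mimic the shape of index entries:

```python
import pandas as pd

# Hypothetical documents, shaped like index entries (not real project data).
docs = [
    {"_id": "a1", "signac_id": "a1", "fluid": "argon", "V_gas": 110.7,
     "statepoint": {"N": 1000, "p": 4.5, "kT": 1.0}},
    # This document has no 'fluid' key; pandas fills the gap with NaN.
    {"_id": "b2", "signac_id": "b2", "V_gas": 132.9,
     "statepoint": {"N": 1000, "p": 7.525, "kT": 1.0}},
]

df_demo = pd.DataFrame(docs)

print(sorted(df_demo.columns))
print(df_demo["fluid"].isna().tolist())
# The nested statepoint stays a plain dict inside a single column.
print(type(df_demo.loc[0, "statepoint"]))
```

Note that the nested statepoint dictionary is not expanded automatically; it is stored as one object per row.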

It is a good idea to explicitly use the _id value as the index key:


In [3]:
df_index = df_index.set_index(['_id'])
df_index.head()


Out[3]:
                                         V_gas      V_liq  fluid      root                             signac_id                         statepoint
_id
0e909ffdba496bbb590fbce31f3a4563    294.117647   0.000000  ideal gas  /home/johndoe/signac-example...  0e909ffdba496bbb590fbce31f3a4563  {'kT': 1.0, 'a': 0, 'p': 3.4000000000000004, '...
10743bc8b95bffab09503bce9abbe627  10000.000000   0.000000  ideal gas  /home/johndoe/signac-example...  10743bc8b95bffab09503bce9abbe627  {'kT': 1.0, 'a': 0, 'p': 0.1, 'N': 1000, 'b': 0}
11d8997f19b8ba53d2360ee9fb1606fa    416.581783  30.659767  water      /home/johndoe/signac-example...  11d8997f19b8ba53d2360ee9fb1606fa  {'kT': 1.0, 'a': 5.536, 'p': 1.200000000000000...
195c26531df979e70d8f50267f67f0e5    132.890365   0.000000  NaN        /home/johndoe/signac-example...  195c26531df979e70d8f50267f67f0e5  {'kT': 1.0, 'a': 0, 'p': 7.525, 'N': 1000, 'b'...
1f147aff97cbbda8aa7c4457a9b51159    110.715506  32.801209  argon      /home/johndoe/signac-example...  1f147aff97cbbda8aa7c4457a9b51159  {'kT': 1.0, 'a': 1.355, 'p': 4.5, 'N': 1000, '...
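
One practical benefit of using _id as the index key is that a row can be looked up directly by its id with .loc. The following is a small sketch with made-up ids and values, not data from the tutorial project:

```python
import pandas as pd

# Hypothetical rows; the short ids stand in for real 32-character ids.
demo = pd.DataFrame([
    {"_id": "a1", "fluid": "argon", "V_gas": 110.7},
    {"_id": "b2", "fluid": "water", "V_gas": 416.6},
])

by_id = demo.set_index("_id")

# The id moves from a regular column into the row labels.
print(by_id.index.name, "_id" in by_id.columns)

# Label-based lookup of a single row by its id.
row = by_id.loc["b2"]
print(row["fluid"], row["V_gas"])
```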

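Furthermore, the index would be more useful if each statepoint parameter had its own column. To get there, we build a second data frame from the statepoint dictionaries, keyed by _id, and join it with the index data frame: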


In [4]:
statepoints = {doc['_id']: doc['statepoint'] for doc in project.index()}
df = pd.DataFrame(statepoints).T.join(df_index)
df.head()


Out[4]:
                                       N      a        b   kT      p         V_gas      V_liq  fluid      root                             signac_id                         statepoint
0e909ffdba496bbb590fbce31f3a4563  1000.0  0.000  0.00000  1.0  3.400    294.117647   0.000000  ideal gas  /home/johndoe/signac-example...  0e909ffdba496bbb590fbce31f3a4563  {'kT': 1.0, 'a': 0, 'p': 3.4000000000000004, '...
10743bc8b95bffab09503bce9abbe627  1000.0  0.000  0.00000  1.0  0.100  10000.000000   0.000000  ideal gas  /home/johndoe/signac-example...  10743bc8b95bffab09503bce9abbe627  {'kT': 1.0, 'a': 0, 'p': 0.1, 'N': 1000, 'b': 0}
11d8997f19b8ba53d2360ee9fb1606fa  1000.0  5.536  0.03049  1.0  1.200    416.581783  30.659767  water      /home/johndoe/signac-example...  11d8997f19b8ba53d2360ee9fb1606fa  {'kT': 1.0, 'a': 5.536, 'p': 1.200000000000000...
195c26531df979e70d8f50267f67f0e5  1000.0  0.000  0.00000  1.0  7.525    132.890365   0.000000  NaN        /home/johndoe/signac-example...  195c26531df979e70d8f50267f67f0e5  {'kT': 1.0, 'a': 0, 'p': 7.525, 'N': 1000, 'b'...
1f147aff97cbbda8aa7c4457a9b51159  1000.0  1.355  0.03201  1.0  4.500    110.715506  32.801209  argon      /home/johndoe/signac-example...  1f147aff97cbbda8aa7c4457a9b51159  {'kT': 1.0, 'a': 1.355, 'p': 4.5, 'N': 1000, '...
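
To make the transpose-and-join step easier to follow, here is a minimal sketch on hypothetical documents. A dictionary of dictionaries is read by pandas with the outer keys as columns and the inner keys as rows, which is why the result has to be transposed before joining on the id. It also suggests why N appears as 1000.0 in the output above: before the transpose, integer and float parameters share a column, so they are stored as floats.

```python
import pandas as pd

# Hypothetical documents; ids and values are made up for illustration.
docs = [
    {"_id": "a1", "V_gas": 110.7, "statepoint": {"N": 1000, "p": 4.5}},
    {"_id": "b2", "V_gas": 132.9, "statepoint": {"N": 1000, "p": 7.525}},
]

index_demo = pd.DataFrame(docs).set_index("_id")

# Outer keys (ids) become columns, inner keys (parameters) become rows.
sp = pd.DataFrame({doc["_id"]: doc["statepoint"] for doc in docs})
print(sp.shape)  # one row per parameter, one column per id

# Transpose: one row per id, one column per statepoint parameter.
sp = sp.T

# Both frames are indexed by id, so join() aligns them row by row.
joined = sp.join(index_demo)
print(joined[["N", "p", "V_gas"]])
```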

Now we can select specific data subsets, for example to calculate the mean gas volume of argon for pressures in the range 2.0 < p <= 5.0:


In [5]:
df[(df.fluid=='argon') & (df.p > 2.0) & (df.p <= 5.0)].V_gas.mean()


Out[5]:
158.12444608049674
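
The selection above combines several boolean masks with the & operator; each comparison needs its own parentheses because & binds more tightly than == or >. The sketch below repeats the pattern on a small made-up data frame and shows DataFrame.query as an equivalent spelling. The values are invented so that the result is easy to check by hand.

```python
import pandas as pd

# Made-up data: only the argon rows with 2.0 < p <= 5.0 should be averaged.
demo = pd.DataFrame({
    "fluid": ["argon", "argon", "argon", "water"],
    "p": [2.0, 3.0, 5.0, 3.0],
    "V_gas": [100.0, 200.0, 400.0, 800.0],
})

# Boolean mask, same pattern as in the cell above.
mask = (demo.fluid == "argon") & (demo.p > 2.0) & (demo.p <= 5.0)
mean_mask = demo.loc[mask, "V_gas"].mean()

# The same selection written as a query string.
mean_query = demo.query("fluid == 'argon' and p > 2.0 and p <= 5.0")["V_gas"].mean()

print(mean_mask, mean_query)
```

The row with p = 2.0 is excluded by the strict inequality, so both expressions average the remaining two argon rows.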

Or we can plot a p-V phase diagram for argon (requires matplotlib).


In [6]:
%matplotlib inline

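# Select the argon rows and keep only the pressure and volume columns
df_argon = df[df.fluid=='argon'][['p', 'V_liq', 'V_gas']]
# Plot both volumes against pressure on a logarithmic y axis
df_argon.sort_values('p').set_index('p').plot(logy=True)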


Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x10fce6c18>

Or we can group the data by fluid and compare the gas densities at low pressures (p < 2):


In [7]:
from matplotlib import pyplot as plt

for fluid, group in df[df.p < 2].groupby('fluid'):
    d = group.sort_values('p')
    # Number density of the gas phase: particles per unit volume
    plt.plot(d['p'], d['N'] / d['V_gas'], label=fluid)
plt.xlabel('p')
plt.ylabel(r'$\rho_{gas}$')
plt.legend(loc=0)


Out[7]:
<matplotlib.legend.Legend at 0x1130da0b8>
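
The grouping logic can be checked without plotting. The sketch below runs the same filter, groupby, and sort steps on a made-up data frame and collects the per-fluid values in a dictionary. It assumes the gas density is the number density N / V_gas, which is an assumption of this sketch. It also illustrates that groupby leaves out rows whose group key is missing, so entries without a fluid label would not appear in such a plot.

```python
import pandas as pd

# Made-up data; the third row has no fluid label.
demo = pd.DataFrame({
    "fluid": ["argon", "argon", None, "water"],
    "p": [1.5, 0.5, 1.0, 1.0],
    "N": [1000, 1000, 1000, 1000],
    "V_gas": [500.0, 2000.0, 1000.0, 800.0],
})

densities = {}
for fluid, group in demo[demo.p < 2].groupby("fluid"):
    d = group.sort_values("p")
    # Assumed definition: number density = particles per unit gas volume.
    densities[fluid] = (d["N"] / d["V_gas"]).tolist()

print(densities)
```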