This notebook explores how to compute element-wise averages across pandas DataFrames.

In our case, we know the row/column labels do not always match across the matrices we want to average.

pd.Panel (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Panel.html) looked promising when I saw it used here: http://stackoverflow.com/questions/29438585/pandas-elsement-wise-average-and-stdandard-deviation-across-multiple-dataframes


In [1]:
import pandas as pd

In [2]:
# Newer pandas keeps dict insertion order, so list keys alphabetically to match the output below
df1 = pd.DataFrame({'cat':[0,0,0], 'dog':[0,1,1], 'rat':[3,4,5]})
df1.index = df1.columns
df1


Out[2]:
cat dog rat
cat 0 0 3
dog 0 1 4
rat 0 1 5

In [3]:
df2 = pd.DataFrame({'cat':[0, 1, 1], 'dog':[0, 0, 0], 'rat':[3,4,5]})
df2.index = df2.columns
df2


Out[3]:
cat dog rat
cat 0 0 3
dog 1 0 4
rat 1 0 5

In [4]:
df3 = pd.DataFrame({'cat':[0, -1, -1], 'dog':[1, 1, 1], 'rat':[0,0,0]})
df3.index = df3.columns
df3


Out[4]:
cat dog rat
cat 0 1 0
dog -1 1 0
rat -1 1 0

Follow the example from that answer:

p = pd.Panel({n: df for n, df in enumerate([df1, df2, df3])})

What does this do? {n: df for n, df in enumerate([df1, df2, df3])}


In [5]:
{n: df for n, df in enumerate([df1, df2, df3])}


Out[5]:
{0:      cat  dog  rat
 cat    0    0    3
 dog    0    1    4
 rat    0    1    5, 1:      cat  dog  rat
 cat    0    0    3
 dog    1    0    4
 rat    1    0    5, 2:      cat  dog  rat
 cat    0    1    0
 dog   -1    1    0
 rat   -1    1    0}

It just makes a dictionary mapping the keys 0, 1, 2 to the DataFrames df1, df2, df3.


In [6]:
p = pd.Panel(data={n: df for n, df in enumerate([df1, df2, df3])})

In [7]:
p


Out[7]:
<class 'pandas.core.panel.Panel'>
Dimensions: 3 (items) x 3 (major_axis) x 3 (minor_axis)
Items axis: 0 to 2
Major_axis axis: cat to rat
Minor_axis axis: cat to rat

In [8]:
p.mean(axis=0)


Out[8]:
cat dog rat
cat 0 0.333333 2.000000
dog 0 0.666667 2.666667
rat 0 0.666667 3.333333

In [9]:
p.std(axis=0)


Out[9]:
cat dog rat
cat 0 0.57735 1.732051
dog 1 0.57735 2.309401
rat 1 0.57735 2.886751
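
Note that pd.Panel was deprecated in pandas 0.20 and removed in 0.25, so the Panel cells above only run on older versions. A rough equivalent for current pandas, as a sketch rather than the method from the linked answer: stack the frames with pd.concat and group on the original row labels. The names stacked, by_row, elementwise_mean and elementwise_std are mine.

```python
import pandas as pd

labels = ['cat', 'dog', 'rat']
df1 = pd.DataFrame({'cat': [0, 0, 0], 'dog': [0, 1, 1], 'rat': [3, 4, 5]}, index=labels)
df2 = pd.DataFrame({'cat': [0, 1, 1], 'dog': [0, 0, 0], 'rat': [3, 4, 5]}, index=labels)
df3 = pd.DataFrame({'cat': [0, -1, -1], 'dog': [1, 1, 1], 'rat': [0, 0, 0]}, index=labels)

# Stack the frames vertically; `keys` adds an outer index level (0, 1, 2),
# so each row is identified by (frame number, original row label).
stacked = pd.concat([df1, df2, df3], keys=[0, 1, 2])

# Grouping on the inner level collects the same row label from every frame,
# so the aggregation runs element-wise across frames.
by_row = stacked.groupby(level=1)
elementwise_mean = by_row.mean()
elementwise_std = by_row.std()   # sample standard deviation (ddof=1)
```

On this toy data the result should agree with p.mean(axis=0) and p.std(axis=0) above.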

Now let's do that on matrices that don't have matching shapes/labels.


In [10]:
df4 = df1.copy()

In [11]:
df4 = pd.DataFrame({'cat':[0,0,0,9], 'dog':[0,1,1,9], 'rat':[3,4,5,9], 'zebra':[9,9,9,9]})
df4.index = df4.columns
df4


Out[11]:
cat dog rat zebra
cat 0 0 3 9
dog 0 1 4 9
rat 0 1 5 9
zebra 9 9 9 9

In [12]:
p2 = pd.Panel(data={n: df for n, df in enumerate([df4, df2, df3])})

In [13]:
p2.mean(axis=0)


Out[13]:
cat dog rat zebra
cat 0 0.333333 2.000000 9
dog 0 0.666667 2.666667 9
rat 0 0.666667 3.333333 9
zebra 9 9.000000 9.000000 9

In [14]:
p2.std(axis=0)


Out[14]:
cat dog rat zebra
cat 0 0.57735 1.732051 NaN
dog 1 0.57735 2.309401 NaN
rat 1 0.57735 2.886751 NaN
zebra NaN NaN NaN NaN

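Hmm... the mean keeps species that appear in only one matrix (the zebra cells are averaged over the single matrix that has them), but the standard deviation there is NaN, because a sample standard deviation needs at least two values. Are we cool with that?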
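
To see where the NaNs come from, it helps to count how many matrices contribute to each cell. A minimal sketch, using pd.concat plus groupby rather than pd.Panel; the small/big frames and the variable names are made up for illustration.

```python
import pandas as pd

small = pd.DataFrame({'cat': [0, 1], 'dog': [0, 0]}, index=['cat', 'dog'])
big = pd.DataFrame({'cat': [0, 0, 9], 'dog': [0, 1, 9], 'zebra': [9, 9, 9]},
                   index=['cat', 'dog', 'zebra'])

# concat takes the union of the columns; cells that `small` lacks become NaN
stacked = pd.concat([small, big], keys=[0, 1])
grouped = stacked.groupby(level=1)

n_obs = grouped.count()   # non-missing values behind each cell
mean = grouped.mean()     # missing values are skipped, so one value is enough
std = grouped.std()       # ddof=1: a single value gives NaN
```

Cells with n_obs equal to 1 are exactly the ones where the mean is a single observation and the standard deviation is NaN.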

Try it with our code!


In [15]:
import network_construction as net

aggregate_adjacency_matrices() works:


In [16]:
net.aggregate_adjacency_matrices([df1, df2, df3, df4])


Out[16]:
{'mean':        cat   dog   rat  zebra
 cat      0  0.25  2.25      9
 dog      0  0.75  3.00      9
 rat      0  0.75  3.75      9
 zebra    9  9.00  9.00      9,
 'standard deviation':             cat  dog  rat  zebra
 cat    0.000000  0.5  1.5    NaN
 dog    0.816497  0.5  2.0    NaN
 rat    0.816497  0.5  2.5    NaN
 zebra       NaN  NaN  NaN    NaN}

Put the toy data into the same tuple-keyed format so we can test summarize_replicate_adjacency_matrices():


In [17]:
mock_data = {('High', 1): df1, ('High', 2): df2, ('High', 3):df3, ('High', 4):df4,
            ('Low', 1): df1, ('Low', 2): df2, ('Low', 3):df3, ('Low', 4):df4}

In [18]:
net.summarize_replicate_adjacency_matrices(mock_data)


Out[18]:
{'Low': {'mean':        cat   dog   rat  zebra
  cat      0  0.25  2.25      9
  dog      0  0.75  3.00      9
  rat      0  0.75  3.75      9
  zebra    9  9.00  9.00      9,
  'standard deviation':             cat  dog  rat  zebra
  cat    0.000000  0.5  1.5    NaN
  dog    0.816497  0.5  2.0    NaN
  rat    0.816497  0.5  2.5    NaN
  zebra       NaN  NaN  NaN    NaN},
 'high': {'mean':        cat   dog   rat  zebra
  cat      0  0.25  2.25      9
  dog      0  0.75  3.00      9
  rat      0  0.75  3.75      9
  zebra    9  9.00  9.00      9,
  'standard deviation':             cat  dog  rat  zebra
  cat    0.000000  0.5  1.5    NaN
  dog    0.816497  0.5  2.0    NaN
  rat    0.816497  0.5  2.5    NaN
  zebra       NaN  NaN  NaN    NaN}}
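
For reference, here is one way the grouping by the first element of each tuple key could be done on current pandas. This is only a sketch: aggregate and summarize_by_first_key are hypothetical helpers, not the functions in network_construction, whose internals are not shown here.

```python
from collections import defaultdict

import pandas as pd


def aggregate(frames):
    """Hypothetical helper: element-wise mean and sample std across frames."""
    stacked = pd.concat(frames, keys=list(range(len(frames))))
    grouped = stacked.groupby(level=1)  # group on the original row labels
    return {'mean': grouped.mean(), 'standard deviation': grouped.std()}


def summarize_by_first_key(data):
    """Hypothetical helper: pool frames keyed by (group, replicate) per group."""
    by_group = defaultdict(list)
    for (group, _replicate), frame in data.items():
        by_group[group].append(frame)
    return {group: aggregate(frames) for group, frames in by_group.items()}


labels = ['cat', 'dog', 'rat']
df1 = pd.DataFrame({'cat': [0, 0, 0], 'dog': [0, 1, 1], 'rat': [3, 4, 5]}, index=labels)
df2 = pd.DataFrame({'cat': [0, 1, 1], 'dog': [0, 0, 0], 'rat': [3, 4, 5]}, index=labels)
df3 = pd.DataFrame({'cat': [0, -1, -1], 'dog': [1, 1, 1], 'rat': [0, 0, 0]}, index=labels)
df4 = pd.DataFrame({'cat': [0, 0, 0, 9], 'dog': [0, 1, 1, 9],
                    'rat': [3, 4, 5, 9], 'zebra': [9, 9, 9, 9]},
                   index=labels + ['zebra'])

mock_data = {('High', 1): df1, ('High', 2): df2, ('High', 3): df3, ('High', 4): df4,
             ('Low', 1): df1, ('Low', 2): df2, ('Low', 3): df3, ('Low', 4): df4}
summary = summarize_by_first_key(mock_data)
```

The result is a dict keyed by the first tuple element, with the original spelling of each key preserved.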
