This notebook contains a code snippet that generates a pandas DataFrame summarizing a given knowledge assembly, split by a given annotation.


In [1]:
import os
import sys
import time

import pandas as pd

import pybel
from pybel.constants import VERSION as PYBEL_VERSION
from pybel_tools import selection
from pybel_tools.summary import info_json, info_list
from pybel_tools.mutation import infer_central_dogma

In [2]:
print(sys.version)


3.6.3 (default, Oct  9 2017, 09:47:56) 
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)]

In [3]:
print(time.asctime())


Thu Jan 11 14:11:38 2018

In [4]:
print(PYBEL_VERSION)


0.10.2-dev

In [5]:
bms_base = os.environ['BMS_BASE']

In this example, we'll summarize the NeuroMMSig subgraphs in the Epilepsy Knowledge Assembly (Hoyt et al., 2018).


In [6]:
graph = pybel.from_pickle(os.path.join(bms_base, 'aetionomy', 'epilepsy', 'epilepsy.gpickle'))
print(graph)


Epilepsy Knowledge Assembly v2.0.1

In [7]:
infer_central_dogma(graph)

In [8]:
subgraphs = selection.get_subgraphs_by_annotation(graph, 'Subgraph')

len(subgraphs)


Out[8]:
32

In [9]:
def fix_columns(df_):
    """Cast the count columns of the summary DataFrame to integers, in place."""
    for c in ['Authors', 'Nodes', 'Edges', 'Citations', 'Components']:
        df_[c] = df_[c].astype(int)

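The info_json function returns a dictionary with a graph's numbers of nodes, edges, components, citations, and authors, along with its average degree and network density. Calling it on each subgraph, and once on the full graph, gives one row of the summary per subgraph plus a final row of totals.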


In [10]:
data = {
    subgraph_name.capitalize(): info_json(subgraph)
    for subgraph_name, subgraph in subgraphs.items()
}
df = pd.DataFrame(data).T
fix_columns(df)

df_total = pd.DataFrame({'Total': info_json(graph)}).T
del df_total['Compilation warnings']
fix_columns(df_total)
df_total

df = pd.concat([df, df_total])
df


Out[10]:
                                                      Authors  Average degree  Citations  Components  Edges  Network density  Nodes
Adaptive immune system subgraph                             0        1.000000          5           4     12         0.090909     12
Adenosine signaling subgraph                                0        2.026316         15           3    154         0.027018     76
Apoptosis signaling subgraph                                0        2.206140        115           5    503         0.009719    228
Brain_derived neurotrophic factor signaling subgraph        0        1.893333         29           1    142         0.025586     75
Calcium dependent subgraph                                  0        2.625828         73           8    793         0.008724    302
Chromatin organization subgraph                             0        1.250000          2           2     10         0.178571      8
Energy metabolic subgraph                                   0        1.945055         24           4    177         0.021612     91
Estradiol metabolism                                        0        1.142857          1           2      8         0.190476      7
G-protein-mediated signaling                                0        1.794872         25           5    140         0.023310     78
Gaba subgraph                                               0        2.412214         56           2    632         0.009242    262
Glutamatergic subgraph                                      0        2.033058         32           5    246         0.016942    121
Hormone signaling subgraph                                  0        2.031746         16           9    256         0.016254    126
Inflammatory response subgraph                              0        1.166667         21           4     49         0.028455     42
Innate immune system subgraph                               0        1.536585         18           8     63         0.038415     41
Interleukin signaling subgraph                              0        2.413043         17           3    111         0.053623     46
Long term synaptic depression                               0        2.015625         21           2    129         0.031994     64
Long term synaptic potentiation                             0        2.016000         46           2    252         0.016258    125
Mapk-erk subgraph                                           0        2.255591         68           5    706         0.007229    313
Metabolism                                                  0        1.764706        126          14    600         0.005206    340
Mirna subgraph                                              0        0.800000          3           1      4         0.200000      5
Mossy fiber subgraph                                        0        1.736842         14           2     66         0.046942     38
Mtor signaling subgraph                                     0        2.024096         42           3    336         0.012267    166
Neurotransmitter release subgraph                           0        3.019928        131           5   1667         0.005481    552
Notch signaling subgraph                                    0        1.952381         20           3    205         0.018773    105
Protein kinase signaling subgraph                           0        2.254642         87           6    850         0.005996    377
Protein metabolism                                          0        1.434109         44           8    185         0.011204    129
Reelin signaling subgraph                                   0        2.162393         21           2    253         0.018641    117
Regulation of actin cytoskeleton subgraph                   0        0.600000          2           2      3         0.150000      5
Serotonergic subgraph                                       0        3.229730         18           2    478         0.021971    148
Thyroid hormone signaling subgraph                          0        2.150943          9           2    228         0.020485    106
Transport related subgraph                                  0        1.224490         25           7     60         0.025510     49
Wnt signaling subgraph                                      0        1.407407         15           3     38         0.054131     27
Total                                                       0        3.588557        638          16  12481         0.001032   3478
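
The "Average degree" and "Network density" columns in the table above appear to follow from the node and edge counts alone. The sketch below is not taken from pybel_tools. It assumes that the average degree is edges divided by nodes and that the density is the usual directed-graph density, edges / (nodes * (nodes - 1)), which reproduces the rows shown. The helper name summarize_counts is hypothetical.

```python
def summarize_counts(n_nodes, n_edges):
    """Return (average degree, network density) from node and edge counts.

    Assumption: average degree = edges / nodes and
    density = edges / (nodes * (nodes - 1)), i.e. a directed graph.
    """
    average_degree = n_edges / n_nodes
    # A graph with fewer than two nodes has no possible edges
    density = n_edges / (n_nodes * (n_nodes - 1)) if n_nodes > 1 else 0.0
    return average_degree, density


# Counts copied from the "Adenosine signaling subgraph" row
avg_degree, density = summarize_counts(n_nodes=76, n_edges=154)
print(round(avg_degree, 6), round(density, 6))  # prints 2.026316 0.027018
```

The printed values match the corresponding row, which makes this a quick sanity check when comparing summaries across versions of an assembly.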

The DataFrame can be written to CSV, or to a wide variety of other formats, using pandas.


In [11]:
path = os.path.join(os.path.expanduser('~'), 'Desktop', 'subgraph_summary.csv')
df[['Nodes', 'Edges', 'Components', 'Citations']].to_csv(path)
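
As a minimal sketch of the other output formats, the snippet below serializes a small stand-in DataFrame with standard pandas writers. The two rows are copied from the summary table above, and the variable names (example, csv_text, tsv_text, json_text) are illustrative only. In the notebook, the same calls would be made on df.

```python
import pandas as pd

# Two rows copied from the summary table, as a stand-in for `df`
example = pd.DataFrame(
    {
        'Nodes': [12, 76],
        'Edges': [12, 154],
        'Components': [4, 3],
        'Citations': [5, 15],
    },
    index=['Adaptive immune system subgraph', 'Adenosine signaling subgraph'],
)

# With no path argument, to_csv and to_json return strings instead of writing files
csv_text = example.to_csv()
tsv_text = example.to_csv(sep='\t')          # tab-separated variant
json_text = example.to_json(orient='index')  # one JSON object per subgraph

print(csv_text)
```

Passing a file path as the first argument to any of these methods writes to disk instead, as in the cell above.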