This notebook contains a code snippet that generates a pandas DataFrame summarizing a given knowledge assembly, split by a given annotation.



In [1]:

    
import os
import sys
import time

import pandas as pd

import pybel
from pybel.constants import VERSION as PYBEL_VERSION
from pybel_tools import selection
from pybel_tools.summary import info_json, info_list
from pybel_tools.mutation import infer_central_dogma



In [2]:

    
print(sys.version)









    



3.6.3 (default, Oct  9 2017, 09:47:56) 
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)]



In [3]:

    
print(time.asctime())









    



Thu Jan 11 14:11:38 2018



In [4]:

    
print(PYBEL_VERSION)









    



0.10.2-dev



In [5]:

    
bms_base = os.environ['BMS_BASE']

In this example, we'll summarize the NeuroMMSig subgraphs in the Epilepsy Knowledge Assembly (Hoyt et al., 2018).



In [6]:

    
graph = pybel.from_pickle(os.path.join(bms_base, 'aetionomy', 'epilepsy', 'epilepsy.gpickle'))
print(graph)









    



Epilepsy Knowledge Assembly v2.0.1



In [7]:

    
infer_central_dogma(graph)



In [8]:

    
subgraphs = selection.get_subgraphs_by_annotation(graph, 'Subgraph')

len(subgraphs)









    Out[8]:





32
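The idea of splitting an assembly by an annotation can be illustrated without PyBEL. The following is a minimal sketch, not the pybel_tools implementation: it assumes each edge is a (source, target, annotations) tuple whose annotations are a plain dictionary, and the helper name group_edges_by_annotation is hypothetical.

```python
from collections import defaultdict


def group_edges_by_annotation(edges, annotation):
    """Group (source, target, annotations) edges by the value of one annotation.

    Edges lacking the annotation are skipped. This edge layout is an
    assumption made for illustration, not PyBEL's internal format.
    """
    groups = defaultdict(list)
    for source, target, annotations in edges:
        value = annotations.get(annotation)
        if value is not None:
            groups[value].append((source, target))
    return dict(groups)


# Toy edges with made-up node names
toy_edges = [
    ('A', 'B', {'Subgraph': 'GABA subgraph'}),
    ('B', 'C', {'Subgraph': 'GABA subgraph'}),
    ('C', 'D', {'Subgraph': 'Notch signaling subgraph'}),
    ('D', 'E', {}),  # no Subgraph annotation, so it is left out
]

toy_subgraphs = group_edges_by_annotation(toy_edges, 'Subgraph')
print(len(toy_subgraphs))
```

Each key of the resulting dictionary plays the role of a subgraph name, which is how the subgraphs dictionary is used in the cells that follow.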



In [9]:

    
def fix_columns(df_):
    """Cast the count columns of a summary DataFrame to int, in place."""
    for c in ['Authors', 'Nodes', 'Edges', 'Citations', 'Components']:
        df_[c] = df_[c].astype(int)
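The integer cast is useful because building a DataFrame from dictionaries that mix counts and ratios tends to upcast every value to float, so a count such as 12 would be displayed as 12.0. A minimal sketch with made-up summary values follows; the names toy_data, toy_df, and to_int_columns are hypothetical.

```python
import pandas as pd

# Made-up summaries that mix integer counts with float ratios
toy_data = {
    'First': {'Nodes': 12, 'Edges': 12, 'Average degree': 1.0},
    'Second': {'Nodes': 76, 'Edges': 154, 'Average degree': 2.026316},
}

# One row per summary after transposing
toy_df = pd.DataFrame(toy_data).T
print(toy_df.dtypes)  # inspect the dtypes before the cast


def to_int_columns(df_, columns):
    """Cast the given columns to int, in place."""
    for c in columns:
        df_[c] = df_[c].astype(int)


to_int_columns(toy_df, ['Nodes', 'Edges'])
print(toy_df.dtypes)  # Nodes and Edges are now integer columns
```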

The info_json function returns a dictionary summarizing a graph: its counts of nodes, edges, citations, authors, and components, plus its average degree and network density.




In [10]:

    
# Summarize each subgraph; one row per subgraph after transposing
data = {
    subgraph_name.capitalize(): info_json(subgraph)
    for subgraph_name, subgraph in subgraphs.items()
}
df = pd.DataFrame(data).T
fix_columns(df)

# Summarize the full graph, dropping the field the subgraph summaries lack
df_total = pd.DataFrame({'Total': info_json(graph)}).T
del df_total['Compilation warnings']
fix_columns(df_total)

df = pd.concat([df, df_total])
df









    Out[10]:







  
    
      
	Authors	Average degree	Citations	Components	Edges	Network density	Nodes
Adaptive immune system subgraph	0	1.000000	5	4	12	0.090909	12
Adenosine signaling subgraph	0	2.026316	15	3	154	0.027018	76
Apoptosis signaling subgraph	0	2.206140	115	5	503	0.009719	228
Brain_derived neurotrophic factor signaling subgraph	0	1.893333	29	1	142	0.025586	75
Calcium dependent subgraph	0	2.625828	73	8	793	0.008724	302
Chromatin organization subgraph	0	1.250000	2	2	10	0.178571	8
Energy metabolic subgraph	0	1.945055	24	4	177	0.021612	91
Estradiol metabolism	0	1.142857	1	2	8	0.190476	7
G-protein-mediated signaling	0	1.794872	25	5	140	0.023310	78
Gaba subgraph	0	2.412214	56	2	632	0.009242	262
Glutamatergic subgraph	0	2.033058	32	5	246	0.016942	121
Hormone signaling subgraph	0	2.031746	16	9	256	0.016254	126
Inflammatory response subgraph	0	1.166667	21	4	49	0.028455	42
Innate immune system subgraph	0	1.536585	18	8	63	0.038415	41
Interleukin signaling subgraph	0	2.413043	17	3	111	0.053623	46
Long term synaptic depression	0	2.015625	21	2	129	0.031994	64
Long term synaptic potentiation	0	2.016000	46	2	252	0.016258	125
Mapk-erk subgraph	0	2.255591	68	5	706	0.007229	313
Metabolism	0	1.764706	126	14	600	0.005206	340
Mirna subgraph	0	0.800000	3	1	4	0.200000	5
Mossy fiber subgraph	0	1.736842	14	2	66	0.046942	38
Mtor signaling subgraph	0	2.024096	42	3	336	0.012267	166
Neurotransmitter release subgraph	0	3.019928	131	5	1667	0.005481	552
Notch signaling subgraph	0	1.952381	20	3	205	0.018773	105
Protein kinase signaling subgraph	0	2.254642	87	6	850	0.005996	377
Protein metabolism	0	1.434109	44	8	185	0.011204	129
Reelin signaling subgraph	0	2.162393	21	2	253	0.018641	117
Regulation of actin cytoskeleton subgraph	0	0.600000	2	2	3	0.150000	5
Serotonergic subgraph	0	3.229730	18	2	478	0.021971	148
Thyroid hormone signaling subgraph	0	2.150943	9	2	228	0.020485	106
Transport related subgraph	0	1.224490	25	7	60	0.025510	49
Wnt signaling subgraph	0	1.407407	15	3	38	0.054131	27
Total	0	3.588557	638	16	12481	0.001032	3478
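The two ratio columns can be checked against the count columns. The values in the table are consistent with average degree = edges / nodes and with the density of a directed graph without self-loops, edges / (nodes * (nodes - 1)). Whether info_json computes them exactly this way is an assumption. The sketch below checks three rows copied from the table; the function names are hypothetical.

```python
import math


def average_degree(nodes, edges):
    """Edges per node, the ratio the table values are consistent with."""
    return edges / nodes


def directed_density(nodes, edges):
    """Fraction of the possible directed edges (no self-loops) that exist."""
    return edges / (nodes * (nodes - 1))


# (nodes, edges, reported average degree, reported density), from the table
rows = {
    'Adaptive immune system subgraph': (12, 12, 1.000000, 0.090909),
    'Adenosine signaling subgraph': (76, 154, 2.026316, 0.027018),
    'Total': (3478, 12481, 3.588557, 0.001032),
}

checks = {}
for name, (nodes, edges, reported_degree, reported_density) in rows.items():
    # The table is rounded to six decimals, so compare with that tolerance
    checks[name] = (
        math.isclose(average_degree(nodes, edges), reported_degree, abs_tol=1e-6)
        and math.isclose(directed_density(nodes, edges), reported_density, abs_tol=1e-6)
    )

print(checks)
```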

The DataFrame can be written to CSV, or to a wide variety of other formats, using pandas.



In [11]:

    
path = os.path.join(os.path.expanduser('~'), 'Desktop', 'subgraph_summary.csv')
df[['Nodes', 'Edges', 'Components', 'Citations']].to_csv(path)
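As a sketch of the other output formats, the same pandas export methods work on any DataFrame. The example below uses a small made-up frame (toy_df is hypothetical) and returns strings rather than writing files, which is what to_csv and to_json do when no path is given.

```python
import json

import pandas as pd

toy_df = pd.DataFrame(
    {'Nodes': [12, 76], 'Edges': [12, 154]},
    index=['First', 'Second'],
)

# With no path argument, the export methods return the serialized text
csv_text = toy_df.to_csv()
tsv_text = toy_df.to_csv(sep='\t')
json_text = toy_df.to_json(orient='index')

# The JSON text can be read back with the standard library
records = json.loads(json_text)

print(csv_text)
```

Passing a file path as the first argument, as in the cell above, writes the same text to disk instead.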
