Summarizing Multiple Graphs Together

Author: Charles Tapley Hoyt

Estimated Run Time: 45 seconds

This notebook shows how to combine multiple graphs from different sources and summarize them together. This is useful in projects where several curators write BEL scripts that should be joined for scientific use but kept separate for provenance.

Imports


In [1]:
import os
import time
import sys

import pybel
import pybel_tools
from pybel_tools.summary import info_str

Environment


In [2]:
print(sys.version)


3.6.3 (default, Oct  9 2017, 09:47:56) 
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)]

In [3]:
print(time.asctime())


Thu Mar 15 14:37:02 2018

Dependencies


In [4]:
pybel.utils.get_version()


Out[4]:
'0.11.2-dev'

In [5]:
pybel_tools.utils.get_version()


Out[5]:
'0.5.2-dev'

Setup


In [6]:
bms_base = os.environ['BMS_BASE']

In [7]:
human_dir = os.path.join(bms_base, 'cbn', 'Human-2.0')
mouse_dir = os.path.join(bms_base, 'cbn', 'Mouse-2.0')
rat_dir = os.path.join(bms_base, 'cbn', 'Rat-2.0')

Data

In this notebook, pickled instances of networks from the Causal Biological Networks database are used.


In [8]:
%%time
graphs = []

for d in (human_dir, mouse_dir, rat_dir):
    for p in os.listdir(d):
        if not p.endswith('gpickle'):
            continue

        path = os.path.join(d, p)
        g = pybel.from_pickle(path)
        graphs.append(g)


CPU times: user 291 ms, sys: 78.2 ms, total: 369 ms
Wall time: 451 ms
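
The file-filtering part of the loop above can be separated from the loading step. The following is a minimal sketch using only the standard library; the helper name `iter_paths_with_suffix` and the demonstration file names are hypothetical and not part of PyBEL. Sorting the directory listing is an added assumption that makes the load order reproducible, since `os.listdir` returns entries in arbitrary order.

```python
import os
import tempfile


def iter_paths_with_suffix(directories, suffix='gpickle'):
    """Yield paths of files ending in ``suffix`` from each directory (hypothetical helper)."""
    for directory in directories:
        # Sort so that the order does not depend on the file system
        for name in sorted(os.listdir(directory)):
            if name.endswith(suffix):
                yield os.path.join(directory, name)


# Demonstration on a throwaway directory with made-up file names
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ('a.gpickle', 'b.gpickle', 'notes.txt'):
        open(os.path.join(demo_dir, name), 'w').close()

    demo_paths = list(iter_paths_with_suffix([demo_dir]))

demo_names = [os.path.basename(p) for p in demo_paths]
print(demo_names)
```

With such a helper, each yielded path could be passed to pybel.from_pickle exactly as in the cell above.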

In [9]:
len(graphs)


Out[9]:
138

Processing

The graphs are combined with the union function, which retains all nodes and edges from each graph.


In [10]:
%%time
combine = pybel.struct.union(graphs)


CPU times: user 42.4 s, sys: 165 ms, total: 42.5 s
Wall time: 42.7 s
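
To illustrate what "retains all nodes and edges" means, here is a simplified stand-in written in plain Python. It is not PyBEL's implementation: graphs are modeled as dictionaries of sets, the name `union_sketch` is hypothetical, and identical edges are collapsed because sets are used, which is an assumption of this sketch rather than a statement about how pybel.struct.union treats duplicates.

```python
def union_sketch(graphs):
    """Combine toy graphs, each a dict with 'nodes' and 'edges' sets (hypothetical model)."""
    combined = {'nodes': set(), 'edges': set()}
    for graph in graphs:
        # Build a new object; the input graphs are never modified
        combined['nodes'] |= graph['nodes']
        combined['edges'] |= graph['edges']
    return combined


# Two toy graphs that share node B and one edge
g1 = {'nodes': {'A', 'B'}, 'edges': {('A', 'B', 'increases')}}
g2 = {'nodes': {'B', 'C'}, 'edges': {('B', 'C', 'decreases'), ('A', 'B', 'increases')}}

merged = union_sketch([g1, g2])
print(len(merged['nodes']), len(merged['edges']))
```

Because the sketch builds a new object, the original graphs stay available separately, which mirrors the provenance motivation given at the top of this notebook.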

The info_str function creates a short text summary of the network. The same information is generated by info_json, which is more useful programmatically.


In [11]:
print(info_str(combine))


Nodes: 5343
Edges: 28766
Citations: 4580
Authors: 0
Network density: 0.001007837278459561
Components: 466
Average degree: 5.383866741530975
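
The density and average degree reported above are consistent with the node and edge counts if the network is treated as a directed graph: density is the number of edges divided by the number of possible ordered node pairs, and the average degree shown equals edges per node. The short check below reproduces both values from the counts in the summary. That these are the exact formulas used internally by info_str is an assumption inferred from the numbers.

```python
# Counts copied from the summary output above
nodes = 5343
edges = 28766

# Directed-graph density: edges / (n * (n - 1))
density = edges / (nodes * (nodes - 1))

# Average degree as reported: edges per node
average_degree = edges / nodes

print(density)
print(average_degree)
```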

Conclusion

Because networks are represented as Python objects, they can easily be combined and passed to existing functions that generate the appropriate summaries.