In [1]:
from igraph import Graph
from igraph import summary
import pandas
import numpy
Step 1: Load the SIF file (refer to the Class 6 exercise) into a data frame called sif_data, using the pandas.read_csv function, and name the columns species1, interaction_type, and species2.
In [2]:
sif_data = pandas.read_csv("shared/pathway_commons.sif",
                           sep="\t", names=["species1", "interaction_type", "species2"])
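To see what the call above expects, here is a minimal sketch on a made-up, three-line SIF snippet (the gene names A to D are hypothetical). Each SIF line is tab-separated as source, interaction type, target, with no header row, which is why the column labels are supplied through names=.

```python
import io

import pandas

# A tiny, made-up SIF snippet (hypothetical names), tab-separated:
# source <TAB> interaction type <TAB> target, with no header row.
toy_sif = (
    "A\tinteracts-with\tB\n"
    "B\tin-complex-with\tC\n"
    "C\tcontrols-expression-of\tD\n"
)

# Passing names= makes read_csv behave as if header=None,
# so the first line is read as data rather than as column labels.
toy_data = pandas.read_csv(io.StringIO(toy_sif), sep="\t",
                           names=["species1", "interaction_type", "species2"])
print(toy_data)
```

io.StringIO stands in for the file path here so the example runs without the course data.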
Step 2: Restrict the interactions to undirected protein-protein interactions ("in-complex-with" and "interacts-with") by using the isin method and then indexing rows of the data frame with [ ]. Call the returned data frame interac_ppi.
In [3]:
interaction_types_ppi = set(["interacts-with",
                             "in-complex-with"])
interac_ppi = sif_data[sif_data.interaction_type.isin(interaction_types_ppi)].copy()
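The filtering idiom in the cell above can be illustrated on a small hypothetical data frame: isin produces a boolean Series with one entry per row, and indexing the frame with that Series keeps only the True rows.

```python
import pandas

# Hypothetical toy interactions; only the two undirected PPI types should survive.
toy = pandas.DataFrame({
    "species1": ["A", "B", "C", "D"],
    "interaction_type": ["interacts-with", "in-complex-with",
                         "controls-expression-of", "interacts-with"],
    "species2": ["B", "C", "D", "A"],
})

ppi_types = {"interacts-with", "in-complex-with"}

# isin returns a boolean Series aligned with the rows of `toy`
mask = toy["interaction_type"].isin(ppi_types)
print(mask.tolist())        # [True, True, False, True]

# Indexing with the boolean Series keeps only the rows where mask is True;
# .copy() gives an independent frame, so later assignments into it do not
# trigger pandas' SettingWithCopyWarning.
toy_ppi = toy[mask].copy()
print(len(toy_ppi))         # 3
```

The .copy() matters for this notebook because Step 3 assigns into interac_ppi with .loc.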
Step 3: Restrict the data frame to only the unique interaction pairs of proteins (ignoring the interaction type), and call that data frame interac_ppi_unique. Because these interactions are undirected, a pair A-B and a pair B-A count as the same interaction. Make an igraph Graph object from interac_ppi_unique using Graph.TupleList, values, and tolist. Call summary on the Graph object. Refer to the notebooks for the in-class exercises in Class sessions 3 and 6.
In [4]:
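# Put each pair in canonical (alphabetical) order so that A-B and B-A become
# identical rows; .values stops pandas from realigning the swapped columns by label
boolean_vec = interac_ppi['species1'] > interac_ppi['species2']
interac_ppi.loc[boolean_vec, ['species1', 'species2']] = interac_ppi.loc[boolean_vec, ['species2', 'species1']].values
# Keep one row per unique unordered protein pair, ignoring interaction_type
interac_ppi_unique = interac_ppi[["species1", "species2"]].drop_duplicates()
# Build an undirected graph whose vertex names are the protein identifiers
ppi_igraph = Graph.TupleList(interac_ppi_unique.values.tolist(), directed=False)
summary(ppi_igraph)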
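The swap in the cell above canonicalizes each undirected pair before drop_duplicates. An alternative technique, not the one used in this notebook, is to sort the two names within every row with numpy.sort(..., axis=1). The sketch below applies it to hypothetical pairs in which A-B and B-C each appear in both orientations.

```python
import numpy
import pandas

# Hypothetical undirected pairs: ("A", "B")/("B", "A") are the same edge,
# and so are ("B", "C")/("C", "B").
pairs = pandas.DataFrame({
    "species1": ["A", "B", "B", "C"],
    "species2": ["B", "A", "C", "B"],
})

# Sort the two names within each row so every pair is in a canonical order
sorted_values = numpy.sort(pairs[["species1", "species2"]].values, axis=1)
canonical = pandas.DataFrame(sorted_values, columns=["species1", "species2"])

# After canonicalization, duplicate undirected edges become identical rows
unique_pairs = canonical.drop_duplicates()
print(unique_pairs.values.tolist())   # [['A', 'B'], ['B', 'C']]
```

The resulting list of two-element lists has the same shape as the input that the cell above passes to Graph.TupleList.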
Step 4: Map the components of the network using the igraph.Graph.clusters method. That method returns an igraph.clustering.VertexClustering object. Call the sizes method on that VertexClustering object to get a list of the sizes of the components. What is the giant component size (the number of vertices in the largest connected component)?
In [11]:
# call the `clusters` method on the `ppi_igraph` object, and assign the
# resulting `VertexClustering` object to have object name `ppi_components`
ppi_components = ppi_igraph.clusters()
# call the `sizes` method on the `ppi_components` object, and assign the
# resulting list object to have the name `ppi_component_sizes`.
ppi_component_sizes = ppi_components.sizes()
# make a `numpy.array` initialized by `ppi_component_sizes`, and find its
# maximum value using the `max` method of the resulting `numpy.ndarray`
numpy.array(ppi_component_sizes).max()
Out[11]:
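Conceptually, the components of an undirected graph can be found by starting a traversal from each not-yet-visited vertex; every traversal labels exactly one component. The sketch below does this with a breadth-first search over a hypothetical edge list and then takes the maximum, mirroring what ppi_component_sizes and the max call compute. It is an illustration of the general technique in plain Python, not igraph's implementation, and it only sees vertices that appear in at least one edge (as is also the case for a graph built with Graph.TupleList).

```python
from collections import defaultdict, deque


def component_sizes(edges):
    """Return the sizes of the connected components of an undirected graph
    given as a list of (u, v) pairs, using breadth-first search."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    sizes = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search from an unvisited vertex labels one component
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        sizes.append(size)
    return sizes


# Hypothetical edge list with components {A, B, C, D}, {E, F}, {G, H}
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F"), ("G", "H")]
sizes = component_sizes(toy_edges)
print(sizes)        # [4, 2, 2]
print(max(sizes))   # 4, the giant component size
```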
Advanced code-spelunking question: go to the GitHub organization for igraph (https://github.com/igraph) and find the C source file components.c. For the weakly connected components, is it doing a BFS or a DFS?
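When reading traversal code in any language, a practical tell is the container that holds the frontier: vertices taken from the front of a FIFO queue give breadth-first order, while vertices taken from the top of a LIFO stack give depth-first order. The toy sketch below, in Python rather than C and on a hypothetical five-vertex graph, contrasts the two; it is not igraph's code.

```python
from collections import deque

# Hypothetical small graph, adjacency lists in a fixed order
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "E"],
    "D": ["B"],
    "E": ["C"],
}


def visit_order(graph, start, use_queue):
    """Return the order in which vertices are visited.
    use_queue=True takes from the front (FIFO, breadth-first);
    use_queue=False takes from the back (LIFO, depth-first)."""
    frontier = deque([start])
    visited = set()
    order = []
    while frontier:
        node = frontier.popleft() if use_queue else frontier.pop()
        # A vertex may be added more than once; skip repeats when taken out
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                frontier.append(neighbor)
    return order


print(visit_order(graph, "A", use_queue=True))   # ['A', 'B', 'C', 'D', 'E']
print(visit_order(graph, "A", use_queue=False))  # ['A', 'C', 'E', 'B', 'D']
```

In C code, the equivalent question is whether the loop pushes and pops at the same end of its container or at opposite ends.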
In [ ]: