CS446/546 - Class Session 8 - Components

In this class session we will use igraph to find the number of proteins in the giant component of the (undirected) protein-protein interaction network.


In [1]:
from igraph import Graph
from igraph import summary
import pandas
import numpy

Step 1: load the SIF file (see the Class 6 exercise) into a data frame sif_data, using the pandas.read_csv function, and name the columns species1, interaction_type, and species2.


In [2]:
sif_data = pandas.read_csv("shared/pathway_commons.sif",
                           sep="\t", names=["species1","interaction_type","species2"])

Step 2: restrict the interactions to the undirected protein-protein types ("in-complex-with", "interacts-with"), by using the isin function and then using [ to index rows into the data frame. Call the returned data frame interac_ppi.


In [3]:
interaction_types_ppi = set(["interacts-with",
                             "in-complex-with"])
interac_ppi = sif_data[sif_data.interaction_type.isin(interaction_types_ppi)].copy()

Step 3: restrict the data frame to only the unique interaction pairs of proteins (ignoring the interaction type), and call that data frame interac_ppi_unique. Make an igraph Graph object from interac_ppi_unique using Graph.TupleList, values, and tolist. Call summary on the Graph object. Refer to the notebooks for the in-class exercises in Class sessions 3 and 6.


In [4]:
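# put each pair in a canonical order (species1 <= species2), so that
# A-B and B-A are recognized as the same undirected interaction
boolean_vec = interac_ppi['species1'] > interac_ppi['species2']
interac_ppi.loc[boolean_vec, ['species1', 'species2']] = interac_ppi.loc[boolean_vec, ['species2', 'species1']].values

# drop the interaction type and keep one row per unique protein pair
interac_ppi_unique = interac_ppi[["species1","species2"]].drop_duplicates()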


ppi_igraph = Graph.TupleList(interac_ppi_unique.values.tolist(), directed=False)
summary(ppi_igraph)


IGRAPH UN-- 17531 475553 -- 
+ attr: name (v)
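
The deduplication in Step 3 relies on putting each pair into a canonical order before dropping duplicates. The same idea can be sketched in plain Python on a toy edge list; the names toy_edges and unique_undirected_pairs are made up for illustration and are not part of the exercise.

```python
# Toy edge list (hypothetical data): A-B appears in both orders
# and under two different interaction types.
toy_edges = [
    ("A", "B", "interacts-with"),
    ("B", "A", "in-complex-with"),
    ("B", "C", "interacts-with"),
    ("A", "B", "interacts-with"),
]


def unique_undirected_pairs(edges):
    """Return the sorted unique (u, v) pairs with u <= v, ignoring the type."""
    seen = set()
    for u, v, _interaction_type in edges:
        # canonical order: the lexicographically smaller name goes first
        seen.add((u, v) if u <= v else (v, u))
    return sorted(seen)


toy_pairs = unique_undirected_pairs(toy_edges)
print(toy_pairs)
```

Four input rows collapse to two undirected pairs, which is the same effect that the swap plus drop_duplicates has on interac_ppi.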

Step 4: Map the components of the network using the igraph.Graph.clusters method. That method returns an igraph.clustering.VertexClustering object. (In recent python-igraph releases, clusters is deprecated in favor of the equivalent connected_components method.) Call the sizes method on that VertexClustering object to get a list of the component sizes. What is the giant component size?


In [11]:
# call the `clusters` method on the `ppi_igraph` object, and assign the 
# resulting `VertexClustering` object to have object name `ppi_components`
ppi_components = ppi_igraph.clusters()

# call the `sizes` method on the `ppi_components` object, and assign the
# resulting list object to have the name `ppi_component_sizes`.
ppi_component_sizes = ppi_components.sizes()

# make a `numpy.ndarray` from `ppi_component_sizes` using `numpy.array`, and
# find its maximum value using the `max` method of the `numpy.ndarray` class
numpy.array(ppi_component_sizes).max()


Out[11]:
17524
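
To see what the list of sizes represents, here is a minimal sketch in plain Python, assuming a membership vector in which entry i is the component label of vertex i (the VertexClustering object exposes such a vector as its membership attribute). The toy data and the helper name component_sizes are hypothetical.

```python
from collections import Counter

# Hypothetical membership vector: entry i is the component label of vertex i.
toy_membership = [0, 0, 1, 0, 2, 1, 0]


def component_sizes(membership):
    """Count how many vertices carry each component label, in label order."""
    counts = Counter(membership)
    return [counts[label] for label in sorted(counts)]


toy_sizes = component_sizes(toy_membership)
toy_giant_size = max(toy_sizes)
print(toy_sizes, toy_giant_size)

# Fraction of the network in the giant component, using the two numbers
# reported above (17524 of 17531 vertices).
giant_fraction = 17524 / 17531
print(round(giant_fraction, 4))
```

The giant component is simply the largest entry in that list. Dividing by the vertex count from the summary output shows that nearly all proteins in this network belong to it.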

Advanced code-spelunking question: go to the GitHub repo for igraph (https://github.com/igraph), and find the source file components.c. For the weakly connected components, does it use a breadth-first search (BFS) or a depth-first search (DFS)?
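
As background for reading the C code, here is a minimal sketch of how the two traversals differ. It is not igraph's implementation: the function name, the breadth_first flag, and the toy adjacency lists are all assumptions made for illustration. The only difference between the two modes is which end of the frontier is popped: the oldest entry (a queue, breadth-first) or the newest entry (a stack, depth-first style).

```python
from collections import deque


def connected_components(adjacency, breadth_first=True):
    """Return components as lists of vertices, in the order they were visited."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        frontier = deque([start])
        component = []
        while frontier:
            # queue (FIFO) gives breadth-first order; stack (LIFO) gives
            # a depth-first style order
            node = frontier.popleft() if breadth_first else frontier.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    frontier.append(neighbor)
        components.append(component)
    return components


# Hypothetical undirected graph stored as adjacency lists.
toy_adjacency = {
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"],
    "E": ["F"], "F": ["E"],
    "G": [],
}

bfs_components = connected_components(toy_adjacency, breadth_first=True)
dfs_components = connected_components(toy_adjacency, breadth_first=False)
print([len(c) for c in bfs_components])
print(bfs_components[0], dfs_components[0])
```

Both modes find the same components with the same sizes; only the visiting order within a component changes. When reading components.c, the data structure used for the frontier is therefore the thing to look for.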


In [ ]: