CS446/546 - Class Session 8 - Components

In this class session we will use igraph to find the number of proteins in the giant component of the (undirected) protein-protein interaction network.


In [1]:
suppressPackageStartupMessages(library(igraph))

Step 1: load the SIF file into a data frame named sif_data, using the read.table function.


In [2]:
sif_data <- read.table("shared/pathway_commons.sif",
                       sep="\t",
                       header=FALSE,
                       stringsAsFactors=FALSE,
                       col.names=c("species1",
                                   "interaction_type",
                                   "species2"),
                       quote="",
                       comment.char="")
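
To see what these read.table arguments do without the real file, here is a minimal sketch that parses a three-line SIF string through the `text` argument. The protein names P1 to P4 and the string itself are made up for illustration; only the column names and argument settings come from the cell above.

```r
# Hypothetical three-line SIF snippet (tab-separated, no header row).
toy_sif_text <- paste("P1\tinteracts-with\tP2",
                      "P2\tcontrols-state-change-of\tP3",
                      "P3\tin-complex-with\tP4",
                      sep = "\n")

# Same settings as for the real file. quote="" and comment.char="" switch off
# quote and comment handling, so names containing ' or # are read literally.
toy_sif <- read.table(text = toy_sif_text,
                      sep = "\t",
                      header = FALSE,
                      stringsAsFactors = FALSE,
                      col.names = c("species1",
                                    "interaction_type",
                                    "species2"),
                      quote = "",
                      comment.char = "")

str(toy_sif)
```

Each SIF line becomes one row with three character columns, which is the shape the later steps rely on.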

Step 2: restrict the interactions to the undirected protein-protein interaction types ("in-complex-with", "interacts-with"), using the %in% operator and the indexing operator [, and keep only the two species columns. The restricted data frame should be called interac_ppi.


In [3]:
interac_ppi <- sif_data[sif_data$interaction_type %in% c("in-complex-with",
                                                         "interacts-with"), c(1,3)]

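The row filter in Step 2 can be tried on a small made-up data frame. This is only a sketch: the toy rows are hypothetical, and the columns are selected by name instead of by position, which has the same effect as c(1,3) here.

```r
# Hypothetical toy interactions; only rows 1 and 3 are undirected PPI types.
toy_sif <- data.frame(species1 = c("P1", "P2", "P3", "P4"),
                      interaction_type = c("interacts-with",
                                           "controls-state-change-of",
                                           "in-complex-with",
                                           "catalysis-precedes"),
                      species2 = c("P2", "P3", "P4", "P1"),
                      stringsAsFactors = FALSE)

# %in% returns one TRUE/FALSE per row: is this row's type in the allowed set?
keep_row <- toy_sif$interaction_type %in% c("in-complex-with",
                                            "interacts-with")
keep_row  # TRUE FALSE TRUE FALSE

# Row index = logical vector; column index = the two species columns.
toy_ppi <- toy_sif[keep_row, c("species1", "species2")]
toy_ppi
```
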
Step 3: restrict the data frame to the unique interaction pairs of proteins (the interaction type was already dropped in Step 2), using the unique function; call the result interac_ppi_unique. Then make an undirected igraph Graph object named ppi_igraph from that data frame, using graph_from_data_frame.


In [4]:
interac_ppi_unique <- unique(interac_ppi)
ppi_igraph <- graph_from_data_frame(interac_ppi_unique, directed=FALSE)
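
One detail worth checking on a toy example: unique compares rows as ordered pairs, so a pair listed in both orders (A, B) and (B, A) survives as two rows and becomes a double edge in the undirected graph. The sketch below uses made-up vertex names; simplify is the igraph function that collapses multi-edges. Component sizes count vertices, so they are the same with or without the extra edge.

```r
suppressPackageStartupMessages(library(igraph))

# Hypothetical pairs: row 3 repeats row 1 exactly, row 2 is row 1 reversed.
toy_pairs <- data.frame(species1 = c("A", "B", "A", "C"),
                        species2 = c("B", "A", "B", "D"),
                        stringsAsFactors = FALSE)

toy_unique <- unique(toy_pairs)  # drops the exact repeat, keeps (B, A)
toy_graph <- graph_from_data_frame(toy_unique, directed = FALSE)

ecount(toy_graph)             # 3: A-B appears twice, plus C-D
ecount(simplify(toy_graph))   # 2: the double edge is collapsed

# Component sizes are vertex counts, so multi-edges do not affect them.
sort(components(toy_graph)$csize)            # 2 2
sort(components(simplify(toy_graph))$csize)  # 2 2
```
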

Step 4: find the connected components of the graph ppi_igraph using the igraph function components. It returns a list, which you should assign to the object name component_res_list. Get the csize member of the list, a vector containing the size (number of vertices) of each component. Call max on that vector to get the size of the giant component of the PPI network.


In [5]:
## call the igraph function `components` on the `ppi_igraph` object; name
## resulting object `component_res_list`
component_res_list <- components(ppi_igraph)

In [7]:
## obtain the list item in the slot named `csize`, and name the
## resulting object `component_sizes_vec`
component_sizes_vec <- component_res_list$csize

In [9]:
## use the `max` function to find the size of the giant component
max(component_sizes_vec)


17524
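
The same three calls can be checked by hand on a graph small enough to draw. In this sketch the five vertices and four edges are invented: A, B, C form a triangle and E, F form a separate pair. It also shows one way to list which vertices belong to the largest component, using the membership member of the result.

```r
suppressPackageStartupMessages(library(igraph))

# Hypothetical toy graph: triangle A-B-C plus a separate edge E-F.
toy_edges <- data.frame(from = c("A", "B", "C", "E"),
                        to   = c("B", "C", "A", "F"),
                        stringsAsFactors = FALSE)
toy_graph <- graph_from_data_frame(toy_edges, directed = FALSE)

toy_res <- components(toy_graph)
toy_res$no          # number of components: 2
max(toy_res$csize)  # size of the largest component: 3

# membership gives each vertex's component id; pick the id of the largest one.
giant_id <- which.max(toy_res$csize)
giant_vertices <- V(toy_graph)$name[toy_res$membership == giant_id]
sort(giant_vertices)  # "A" "B" "C"
```
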

Advanced code-spelunking question: go to the GitHub repo for igraph (https://github.com/igraph), and find the source file components.c. For the weakly connected components, is it doing a BFS or a DFS?
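
As a reference point while reading the C source, here is a minimal base R sketch of component finding with a queue-based breadth-first search. It is not igraph's implementation, and the function name bfs_component_sizes is made up for this example. The thing to look for in components.c is the data structure that holds the vertices waiting to be visited: a first-in-first-out queue means BFS, a last-in-first-out stack (or recursion) means DFS.

```r
# Sizes of the connected components of an undirected graph given as a
# two-column edge list, found by breadth-first search.
bfs_component_sizes <- function(edge_df) {
  from <- as.character(edge_df[[1]])
  to <- as.character(edge_df[[2]])
  vertices <- unique(c(from, to))

  # Adjacency list: each edge contributes a neighbour in both directions.
  adjacency <- split(c(to, from), factor(c(from, to), levels = vertices))

  membership <- setNames(rep(NA_integer_, length(vertices)), vertices)
  n_components <- 0L

  for (start in vertices) {
    if (!is.na(membership[[start]])) next  # already labelled
    n_components <- n_components + 1L
    membership[[start]] <- n_components
    queue <- start
    while (length(queue) > 0) {
      current <- queue[1]   # take from the FRONT of the queue (FIFO = BFS)
      queue <- queue[-1]
      for (nbr in adjacency[[current]]) {
        if (is.na(membership[[nbr]])) {
          membership[[nbr]] <- n_components
          queue <- c(queue, nbr)  # add to the BACK of the queue
        }
      }
    }
  }
  tabulate(membership, nbins = n_components)
}

# Hypothetical toy graph: triangle A-B-C plus a separate edge E-F.
toy_edges <- data.frame(from = c("A", "B", "C", "E"),
                        to   = c("B", "C", "A", "F"),
                        stringsAsFactors = FALSE)
toy_sizes <- bfs_component_sizes(toy_edges)
toy_sizes       # 3 2
max(toy_sizes)  # 3
```

Taking the next vertex from the back of the waiting list instead of the front (current <- queue[length(queue)]) would turn this into a stack-based traversal. The component sizes come out the same either way; only the visiting order differs.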