In [1]:
suppressPackageStartupMessages(library(igraph))
Step 1: load the SIF file into a data frame named sif_data, using the read.table function.
In [2]:
sif_data <- read.table("shared/pathway_commons.sif",
sep="\t",
header=FALSE,
stringsAsFactors=FALSE,
col.names=c("species1",
"interaction_type",
"species2"),
quote="",
comment.char="")
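To see what these read.table arguments do without the full file, here is a minimal sketch that parses three made-up SIF records from an inline string. The object names toy_sif_text and toy_sif, and the records themselves, are illustrative assumptions and are not taken from pathway_commons.sif.

```r
# Three made-up SIF records: participant, interaction type, participant,
# separated by tabs (hypothetical data, for illustration only).
toy_sif_text <- paste(
  "A\tinteracts-with\tB",
  "B\tcontrols-state-change-of\tC",
  "C\tin-complex-with\tD",
  sep = "\n"
)

# Same arguments as the real call, but reading from `text` instead of a file.
# quote=""        turns off quote processing, so names containing ' or " stay intact
# comment.char="" turns off comment handling, so a '#' inside a name is kept
toy_sif <- read.table(text = toy_sif_text,
                      sep = "\t",
                      header = FALSE,
                      stringsAsFactors = FALSE,
                      col.names = c("species1",
                                    "interaction_type",
                                    "species2"),
                      quote = "",
                      comment.char = "")

str(toy_sif)
```

Running str on the result is a quick way to confirm that all three columns came in as character vectors rather than factors.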
Step 2: restrict the interactions to undirected protein-protein interactions (types "in-complex-with" and "interacts-with"), using the %in% operator and data frame indexing with [, and keep only the two species columns. Name the restricted data frame interac_ppi.
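The filtering pattern in Step 2 can be tried on a toy data frame first. This is a sketch under stated assumptions: the rows and the names toy, keep_types, keep_rows and toy_ppi are made up for illustration. It selects columns by name, which does the same job as selecting them by position.

```r
# Hypothetical four-row interaction table (not from the real file).
toy <- data.frame(
  species1 = c("A", "B", "C", "D"),
  interaction_type = c("interacts-with",
                       "controls-state-change-of",
                       "in-complex-with",
                       "catalysis-precedes"),
  species2 = c("B", "C", "D", "E"),
  stringsAsFactors = FALSE
)

keep_types <- c("in-complex-with", "interacts-with")

# %in% returns one TRUE/FALSE per row: is this row's type in keep_types?
keep_rows <- toy$interaction_type %in% keep_types

# Row selection by the logical vector, column selection by name.
toy_ppi <- toy[keep_rows, c("species1", "species2")]
toy_ppi
```

The logical vector keep_rows has one entry per row, so it can be inspected or summed to count how many interactions survive the filter.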
In [3]:
interac_ppi <- sif_data[sif_data$interaction_type %in% c("in-complex-with",
"interacts-with"), c(1,3)]
Step 3: restrict the data frame to the unique interaction pairs of proteins (ignoring the interaction type), using the unique function. Then make an igraph Graph object from the data frame, using graph_from_data_frame.
In [4]:
interac_ppi_unique <- unique(interac_ppi)
ppi_igraph <- graph_from_data_frame(interac_ppi_unique, directed=FALSE)
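One caveat worth knowing: unique compares rows as ordered pairs, so (A, B) and (B, A) count as different rows even though they describe the same undirected interaction. Whether reversed pairs actually occur in the Pathway Commons file is not checked here. The sketch below uses made-up data and hypothetical names (toy_pairs, first, second, toy_canonical) to show one way to put each pair into a canonical order before deduplicating.

```r
# Hypothetical pairs: rows 1 and 3 are exact duplicates, row 2 is row 1 reversed.
toy_pairs <- data.frame(
  species1 = c("A", "B", "A", "C"),
  species2 = c("B", "A", "B", "D"),
  stringsAsFactors = FALSE
)

# unique() drops the exact duplicate but keeps the reversed pair.
n_unique_ordered <- nrow(unique(toy_pairs))

# Put the alphabetically smaller name first in every row, then deduplicate.
is_ordered <- toy_pairs$species1 < toy_pairs$species2
first  <- ifelse(is_ordered, toy_pairs$species1, toy_pairs$species2)
second <- ifelse(is_ordered, toy_pairs$species2, toy_pairs$species1)
toy_canonical <- unique(data.frame(species1 = first,
                                   species2 = second,
                                   stringsAsFactors = FALSE))
n_unique_unordered <- nrow(toy_canonical)

c(ordered = n_unique_ordered, unordered = n_unique_unordered)
```

Duplicate edges change edge counts and vertex degrees, but they do not change which vertices are connected, so component sizes are the same either way.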
Find the connected components of the graph ppi_igraph using the igraph function components. It returns a list, which you should assign to the object name component_res_list. Get the csize member of the list, which is a vector of the sizes of the components of the graph. Call max on that vector to get the size of the giant component of the PPI network.
In [5]:
## call the igraph function `components` on the `ppi_igraph` object; name
## resulting object `component_res_list`
component_res_list <- components(ppi_igraph)
In [7]:
## obtain the list item in the slot named `csize`, and name the
## resulting object `component_sizes_vec`
component_sizes_vec <- component_res_list$csize
In [9]:
## use the `max` function to find the size of the giant component
max(component_sizes_vec)
Advanced code-spelunking question: go to the GitHub repo for igraph (https://github.com/igraph), and find the source file components.c. For the weakly connected components, is it doing a BFS or a DFS?
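As background for that question, here is a minimal base-R sketch of what a breadth-first component search might look like. It is an illustration only and is not igraph's code; the function name bfs_components and the toy edge list are assumptions. The returned list reuses the field names membership, csize and no so that it resembles the result of components. A depth-first version would differ only in taking the next vertex from the end of the pending vector (a stack) rather than from the front (a queue).

```r
# Label connected components of an undirected edge list with BFS.
# `edges` is a data frame whose first two columns are character vertex names.
bfs_components <- function(edges) {
  vertices <- unique(c(edges[[1]], edges[[2]]))
  # Adjacency list: every edge is recorded in both directions.
  adj <- split(c(edges[[2]], edges[[1]]), c(edges[[1]], edges[[2]]))
  # 0 means "not visited yet"; otherwise the component id.
  membership <- setNames(rep(0L, length(vertices)), vertices)
  n_comp <- 0L
  for (v in vertices) {
    if (membership[[v]] != 0L) next
    n_comp <- n_comp + 1L
    membership[[v]] <- n_comp
    queue <- v
    while (length(queue) > 0) {
      current <- queue[1]          # take from the front: first in, first out
      queue <- queue[-1]
      for (nb in adj[[current]]) {
        if (membership[[nb]] == 0L) {
          membership[[nb]] <- n_comp
          queue <- c(queue, nb)    # newly seen vertices go to the back
        }
      }
    }
  }
  list(membership = membership,
       csize = tabulate(membership, nbins = n_comp),
       no = n_comp)
}

# Hypothetical toy graph: P1-P2-P3 form one component, P4-P5 another.
toy_edges <- data.frame(species1 = c("P1", "P2", "P4"),
                        species2 = c("P2", "P3", "P5"),
                        stringsAsFactors = FALSE)
toy_res <- bfs_components(toy_edges)
max(toy_res$csize)  # size of the largest component in the toy graph
```

Growing the queue with c() is fine for a toy graph but copies the vector on every append, so a real implementation would use a preallocated queue.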