In analysis of differential expression data, it is often useful to analyze properties of the local neighborhood of specific genes. I developed a simple interactive tool for this purpose, which takes as input diferential expression data, and gene interaction data (from http://www.genemania.org/). The network is then plotted in an interactive widget, where the node properties, edge properties, and layout can be mapped to different network properties. The interaction type (of the 6 options from genemania) can also be selected.
This tool will also serve as an example for how to create, modify, visualize and analyze weighted and unweighted gene interaction networks using the highly useful and flexible python package NetworkX (https://networkx.github.io/)
This tool is most useful if you have a reasonably small list of genes (~100) with differential expression data, and want to explore properties of their interconnections and their local neighborhoods.
In [154]:
# import some useful packages
import numpy as np
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
% matplotlib inline
This experiment contains fold change information for genes in an experiment studying 'alveolar macrophage response to bacterial endotoxin lipopolysaccharide exposure in vivo'. We selected a list of genes from the experiment which had high differential expression, and were enriched for 'immune response' and 'response to external biotic stimulus' in the gene ontology. This experiment and gene list were selected purely as examples for how to use this tool for an initial exploration of differential expression data.
NOTE: change paths/filenames in this cell to apply network visualizer to other datasets. Network format comes from genemania (e.g. columns are 'Entity 1', 'Entity 2', 'Weight', 'Network_group', 'Networks')
NOTE: File format is tsv, and needs to contain columns for 'IDENTIFIER', 'DiffExp', and 'absDiffExp'. Other columns are optional
In [155]:
dataDE = pd.read_csv('DE_data/DE_ayyagari_data_genename_foldchange.csv',sep='\t')
print(dataDE.head())
# genes in dataDE
gene_list = list(dataDE['IDENTIFIER'])
# only use the average fold-change (because there are multiple entries for some genes
#dataDE_mean = dataDE.DiffExp.groupby(dataDE['IDENTIFIER']).mean()
dataDE_mean = dataDE['fold_change']
dataDE_mean.index=gene_list
print(dataDE_mean)
# load the gene-gene interactions (from genemania)
#filename = 'DE_data/DE_experiment_interactions.txt'
filename = 'DE_data/DE_ayyagari_interactions.txt'
DE_network = pd.read_csv(filename, sep='\t', header=6)
DE_network.columns = ['Entity 1','Entity 2', 'Weight','Network_group','Networks']
In [156]:
# create the graph, and add some edges (and nodes)
G_DE = nx.Graph()
idxCE = DE_network['Network_group']=='Co-expression'
edge_list = zip(list(DE_network['Entity 1'][idxCE]),list(DE_network['Entity 2'][idxCE]))
G_DE.add_edges_from(edge_list)
print('number of edges = ' + str(len(G_DE.edges())))
print('number of nodes = '+ str(len(G_DE.nodes())))
In [157]:
# create version with weighted edges
G_DE_w = nx.Graph()
edge_list_w = zip(list(DE_network['Entity 1']),list(DE_network['Entity 2']),list(DE_network['Weight']))
G_DE_w.add_weighted_edges_from(edge_list_w)
In [158]:
import imp
import plot_network
imp.reload(plot_network)
Out[158]:
In [159]:
from IPython.html.widgets import interact
from IPython.html import widgets
import matplotlib.colorbar as cb
import seaborn as sns
import community
# import network plotting module
from plot_network import *
# temporary graph variable
Gtest = nx.Graph()
# check whether you have differential expression data
diff_exp_analysis=True
# replace G_DE_w with G_DE in these two lines if unweighted version is desired
Gtest.add_nodes_from(G_DE_w.nodes())
Gtest.add_edges_from(G_DE_w.edges(data=True))
# prep border colors
nodes = Gtest.nodes()
#gene_list = gene_list
if diff_exp_analysis:
diff_exp = dataDE_mean
genes_intersect = np.intersect1d(gene_list,nodes)
border_cols = Series(index=nodes)
for i in genes_intersect:
if diff_exp[i]=='Unmeasured':
border_cols[i] = np.nan
else:
border_cols[i] = diff_exp[i]
else: # if no differential expression data
border_cols = [None]
numnodes = len(Gtest)
# make these three global to feed into widget
global Gtest
global boder_cols
global DE_network
def plot_network_shell(focal_node_name,edge_thresh=.5,network_algo='spl', map_degree=True,
plot_border_col=False, draw_shortest_paths=True,
coexpression=True, colocalization=True, other=False,physical_interactions=False,
predicted_interactions=False,shared_protein_domain=False):
# this is the main plotting function, called from plot_network module
fig = plot_network(Gtest, border_cols, DE_network,
focal_node_name, edge_thresh, network_algo, map_degree, plot_border_col, draw_shortest_paths,
coexpression, colocalization, other, physical_interactions, predicted_interactions, shared_protein_domain)
return fig
# threshold slider parameters
min_thresh = np.min(DE_network['Weight'])
max_thresh = np.max(DE_network['Weight']/10)
thresh_step = (max_thresh-min_thresh)/1000.0
interact(plot_network_shell, focal_node_name=list(np.sort(nodes)),
edge_thresh=widgets.FloatSliderWidget(min=min_thresh,max=max_thresh,step=thresh_step,value=min_thresh,description='edge threshold'),
network_algo = ['community','clustering_coefficient','pagerank','spl']);
First let's look at the graph when 'spl' (shortest path length) is selected as the network algo. ADA is the focal node in this case, and it has 4 nearest neighbors (MX1, CD44, FITM1, and CD80). Note that CD44 connects the focal node ADA to many other nodes in the network, as it is an opaque blue line. Also note that there is only one gene with a negative fold change in this gene set (CCL13). The white nodes are genes included by genemania- they are the 20 genes nearest to the input genelist.
When the network_algo button is switched to 'community', the louvain modularity maximization algorithm runs on the network, and partitions the nodes into communities which maximize the modularity. In this case (with CXCL10 as the focal node), the nodes are partitioned into 5 groups, with the three largest groups indicated by red, green, and teal circles. While you can see some support for this grouping by eye, the overall graph modularity is 0.33, which is a relatively low value. This means that although groups were found in the graph, the graph itself is not very modular. As a rule of thumb, very modular graphs have modularities of about 0.5 or 0.6.
Below the graph, there is a panel showing the average fold change for the nodes in this community. Since most of the nodes in the input gene list have positive fold changes here, all communities also have positive average fold changes. Were the input gene list to have fewer large fold changes, this would enable you to see if a particular grouping of nodes had significantly higher (or lower) levels of differential expression than alternative groupings.
In [ ]:
In [ ]:
In [ ]:
In [ ]: