Note: This is not really a "network analysis"; we are only looking at the graph and seeing which cells are there. If you want to do more than zoom in and look around at the cells in the graph, I recommend using Cytoscape for visualizing networks.


In [1]:
# Interactive jupyter widgets - use IntSlider directly for more control
from ipywidgets import IntSlider, interact

# Convert RGB colors to hex for portability
from matplotlib.colors import rgb2hex

# Visualize networks
import networkx

# Numerical python
import numpy as np

# Pandas for dataframes
import pandas as pd

# K-nearest neighbors cell clustering from Dana Pe'er's lab
import phenograph

# Make color palettes
import seaborn as sns
%matplotlib inline

# Bokeh - interactive plotting in the browser
from bokeh.plotting import figure, show, output_file
from bokeh.models import HoverTool, ColumnDataSource
from bokeh.models.widgets import Panel, Tabs
from bokeh.layouts import widgetbox
from bokeh.io import output_notebook

# Local file: networkplots.py
import networkplots

# This line is required for the plots to appear in the notebooks
output_notebook()


Loading BokehJS ...

At this point, you can follow along with either the pre-baked Macosko2015 amacrine data, or you can load in your own expression matrices. For the best experience, make sure that the rows are cells and the columns are gene names.


In [2]:
import macosko2015
counts, cell_metadata, gene_metadata = macosko2015.load_big_clusters()
counts.head()


Out[2]:
2010107E04RIK 4930447C04RIK A930011O12RIK ABCA8A ABLIM1 ACSL3 AIPL1 ALDOC ANK3 APLP2 ... VEGFA VIM VSTM2B VSX1 VSX2 WIPI1 YWHAB ZBTB20 ZFP365 ZFP36L1
r1_TTCCTGCTAGGC 2 0 0 0 1 0 1 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
r1_TGGAGATACTCT 0 0 1 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 1 0 0
r1_CGTCTACATCCG 2 0 0 0 0 0 2 0 0 1 ... 0 0 0 0 0 0 0 0 0 0
r1_CAAGCTTGGCGC 0 0 11 0 1 0 6 0 0 2 ... 0 0 0 0 0 0 0 0 1 0
r1_ACTCACATAGAG 1 0 0 0 0 0 0 0 0 1 ... 0 0 0 0 0 0 0 2 0 0

5 rows × 259 columns

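Calculate the correlation between cells. Ranking the genes within each cell first and then taking the Pearson correlation of those ranks gives the Spearman correlation: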


In [3]:
correlations = counts.T.rank().corr()
print(correlations.shape)
correlations.head()


(300, 300)
Out[3]:
r1_TTCCTGCTAGGC r1_TGGAGATACTCT r1_CGTCTACATCCG r1_CAAGCTTGGCGC r1_ACTCACATAGAG r1_TAACGGACACGC r1_CGCATGGGATAC r1_TAACGACGCTTG r1_TCGGCAGCCTCT r1_TAGGATGCAAAC ... r1_AGGGTGGGTACA r1_AATGCTGCAAGA r1_GTCGGGCCTTTC r1_GGGTCAGCGGCG r1_CTGGACCTGCCC r1_AAGATATTGCTG r1_GAGACCTCATGG r1_CGGAGCGCGACA r1_AAGGACAGATCC r1_ATATGCACCCTA
r1_TTCCTGCTAGGC 1.000000 0.578489 0.592947 0.581111 0.600062 0.668730 0.562366 0.537223 0.625188 0.627728 ... -0.127396 -0.238725 -0.191087 -0.062375 -0.070431 -0.211101 0.004142 0.005390 0.028681 -0.208886
r1_TGGAGATACTCT 0.578489 1.000000 0.605171 0.668457 0.605529 0.699568 0.626681 0.619552 0.686334 0.603006 ... -0.088473 -0.164247 -0.091119 -0.012380 0.002600 -0.128525 0.110028 0.123022 0.087241 -0.151023
r1_CGTCTACATCCG 0.592947 0.605171 1.000000 0.592150 0.589383 0.616885 0.539639 0.459749 0.633616 0.563735 ... -0.110518 -0.131933 -0.131094 -0.019492 -0.019556 -0.105237 0.023963 0.057967 0.124087 -0.138839
r1_CAAGCTTGGCGC 0.581111 0.668457 0.592150 1.000000 0.614245 0.747307 0.610552 0.624505 0.670207 0.682267 ... -0.052749 -0.108256 -0.081267 -0.036022 0.048468 -0.154414 0.184313 0.051814 0.141338 -0.155600
r1_ACTCACATAGAG 0.600062 0.605529 0.589383 0.614245 1.000000 0.615884 0.642180 0.556297 0.648107 0.566039 ... -0.104368 -0.184757 -0.136784 -0.045760 0.003680 -0.183599 0.096902 0.015629 0.036012 -0.142725

5 rows × 300 columns
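To check that the rank-then-correlate recipe really is a Spearman correlation, here is a small sketch on a made-up counts table (the `toy_counts` values and names are invented for illustration and are not part of the Macosko2015 data). It compares the recipe used above with pandas' built-in `method="spearman"` option.

```python
import numpy as np
import pandas as pd

# Toy counts: rows are cells, columns are genes (made-up values)
toy_counts = pd.DataFrame(
    [[2, 0, 1, 5], [0, 3, 1, 4], [7, 1, 0, 2]],
    index=["cell_a", "cell_b", "cell_c"],
    columns=["gene_1", "gene_2", "gene_3", "gene_4"],
)

# Same recipe as above: rank genes within each cell, then Pearson on the ranks
rank_then_pearson = toy_counts.T.rank().corr()

# pandas' built-in Spearman correlation between the columns (here: cells)
spearman = toy_counts.T.corr(method="spearman")

# True when both approaches agree
print(np.allclose(rank_then_pearson, spearman))
```

Both results are cell-by-cell matrices, which is why the real `correlations` dataframe is 300 × 300.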

Correlation != distance

Correlation is not equal to distance. If two things are exactly the same, their correlation value is 1. But in space, if two things are exactly the same, the distance between them is 0. Therefore, correlation is not a distance! Correlation is a similarity metric, where bigger = more similar. But we want a dissimilarity (aka distance) metric.

Take a look for yourself. Many of the correlation values are near zero (not correlated), and there is a blip near 1 (self-correlations).


In [4]:
sns.distplot(correlations.values.flat)


Out[4]:
<matplotlib.axes._subplots.AxesSubplot at 0x117262c88>

But for building a K-nearest neighbors graph, we want the closest things (in distance space) to be actually close. So we'll convert our correlation ($\rho$) into a distance ($d$) using this equation:

$$ d = \sqrt{2(1-\rho)} $$
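One way to see where this formula comes from (a short derivation under an assumption: each cell's ranked expression vector has been centered to mean zero and scaled to unit length, written here as $\tilde{x}$ and $\tilde{y}$, a notation that is not from the original notebook). The correlation is then the dot product $\rho = \tilde{x} \cdot \tilde{y}$, and the squared Euclidean distance between the two vectors expands as:

$$ \lVert \tilde{x} - \tilde{y} \rVert^2 = \lVert \tilde{x} \rVert^2 + \lVert \tilde{y} \rVert^2 - 2\,\tilde{x} \cdot \tilde{y} = 2 - 2\rho = 2(1-\rho) $$

Taking the square root gives $d$, so identical cells ($\rho = 1$) are at distance 0 and perfectly anti-correlated cells ($\rho = -1$) are at the maximum distance of 2.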

You can look at the code for networkplots.correlation_to_distance to convince yourself that's actually what it's doing:


In [5]:
networkplots.correlation_to_distance??
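If you don't have networkplots.py at hand, the equation can be sketched in a couple of lines. This is a hypothetical stand-in (the name `correlation_to_distance_sketch` and the toy matrix are made up here), not the actual source of `networkplots.correlation_to_distance`, which may differ in its details.

```python
import numpy as np
import pandas as pd


def correlation_to_distance_sketch(corr):
    """Hypothetical stand-in for networkplots.correlation_to_distance."""
    # Clip at zero so floating point error can't produce the sqrt of a
    # tiny negative number when the correlation is (almost) exactly 1
    return np.sqrt((2 * (1 - corr)).clip(lower=0))


# Made-up correlation matrix for three items
toy_corr = pd.DataFrame(
    [[1.0, 0.0, -1.0], [0.0, 1.0, 0.5], [-1.0, 0.5, 1.0]],
    index=list("abc"),
    columns=list("abc"),
)
toy_dist = correlation_to_distance_sketch(toy_corr)
print(toy_dist)
```

The diagonal (self-correlation of 1) becomes 0, uncorrelated pairs land at $\sqrt{2}$, and perfectly anti-correlated pairs land at 2.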

Exercise 1

Create a dataframe called distances using the correlation_to_distance function from networkplots on your correlations dataframe.


In [6]:
# YOUR CODE HERE


In [7]:
distances = networkplots.correlation_to_distance(correlations)
distances.head()


Out[7]:
r1_TTCCTGCTAGGC r1_TGGAGATACTCT r1_CGTCTACATCCG r1_CAAGCTTGGCGC r1_ACTCACATAGAG r1_TAACGGACACGC r1_CGCATGGGATAC r1_TAACGACGCTTG r1_TCGGCAGCCTCT r1_TAGGATGCAAAC ... r1_AGGGTGGGTACA r1_AATGCTGCAAGA r1_GTCGGGCCTTTC r1_GGGTCAGCGGCG r1_CTGGACCTGCCC r1_AAGATATTGCTG r1_GAGACCTCATGG r1_CGGAGCGCGACA r1_AAGGACAGATCC r1_ATATGCACCCTA
r1_TTCCTGCTAGGC 0.000000 0.918162 0.902278 0.915303 0.894357 0.813966 0.935558 0.962057 0.865809 0.862869 ... 1.501597 1.573992 1.543429 1.457653 1.463168 1.556343 1.411282 1.410397 1.393785 1.554919
r1_TGGAGATACTCT 0.918162 0.000000 0.888627 0.814301 0.888224 0.775154 0.864082 0.872293 0.792043 0.891061 ... 1.475448 1.525941 1.477240 1.422941 1.412374 1.502348 1.334146 1.324370 1.351118 1.517250
r1_CGTCTACATCCG 0.902278 0.888627 0.000000 0.903161 0.906219 0.875346 0.959543 1.039472 0.856018 0.934093 ... 1.490314 1.504615 1.504057 1.427930 1.427975 1.486766 1.397166 1.372613 1.323565 1.509198
r1_CAAGCTTGGCGC 0.915303 0.814301 0.903161 0.000000 0.878356 0.710905 0.882551 0.866597 0.812149 0.797162 ... 1.451034 1.488795 1.470556 1.439459 1.379516 1.519483 1.277252 1.377088 1.310467 1.520263
r1_ACTCACATAGAG 0.894357 0.888224 0.906219 0.878356 0.000000 0.876488 0.845955 0.942022 0.838919 0.931624 ... 1.486181 1.539323 1.507835 1.446209 1.411609 1.538570 1.343948 1.403119 1.388516 1.511771

5 rows × 300 columns

Exercise 2

Let's check that most of our distance values are far away from zero. Use sns.distplot to look at the flattened values of the distances dataframe.


In [8]:
# YOUR CODE HERE


In [9]:
sns.distplot(distances.values.flat)


Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x11be4dfd0>

Now we'll run phenograph.cluster, which returns three items:

  • communities: the cluster labels of each cell
  • sparse_matrix: a sparse matrix representing the connections between cells in the graph
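  • Q: the modularity score. Higher is better, and the highest is 1.
    • A value near 0 means the communities are no more densely connected internally than you would expect in a randomly wired graph, and negative values mean they are less connected than expected by chance.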
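To make the modularity score concrete, here is a small sketch on a made-up six-node graph (two triangles joined by one edge). It uses networkx's `modularity` function, not PhenoGraph's own Louvain code, and the three partitions are invented for illustration.

```python
import networkx
from networkx.algorithms.community import modularity

# Two triangles (0-1-2 and 3-4-5) joined by a single bridge edge 2-3
toy_graph = networkx.Graph()
toy_graph.add_edges_from(
    [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
)

# A sensible partition: one community per triangle
q_good = modularity(toy_graph, [{0, 1, 2}, {3, 4, 5}])

# Everything in one community: no better than chance
q_single = modularity(toy_graph, [{0, 1, 2, 3, 4, 5}])

# Communities that never share an edge: worse than chance
q_mixed = modularity(toy_graph, [{0, 3}, {1, 4}, {2, 5}])

print(q_good, q_single, q_mixed)
```

The triangle partition scores clearly above zero, the single community scores zero, and the deliberately bad partition scores below zero.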

In [10]:
communities, sparse_matrix, Q = phenograph.cluster(distances, k=10)


Finding 10 nearest neighbors using minkowski metric and 'auto' algorithm
Neighbors computed in 0.10953593254089355 seconds
Jaccard graph constructed in 0.05929207801818848 seconds
Wrote graph to binary file in 0.005506038665771484 seconds
Running Louvain modularity optimization
After 1 runs, maximum modularity is Q = 0.862174
Louvain completed 21 runs in 0.21957707405090332 seconds
PhenoGraph complete in 0.41158509254455566 seconds

Let's take a look at each of these returned values


In [11]:
communities


Out[11]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 8,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 4, 4, 6, 4, 4, 6, 4, 4, 4, 6, 6, 4, 4, 6, 6, 4, 6, 0, 6,
       4, 0, 6, 4, 4, 6, 4, 4, 4, 4, 6, 6, 4, 6, 4, 6, 6, 4, 4, 4, 4, 6, 4,
       4, 4, 0, 4, 6, 4, 6, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 8, 1, 1, 1, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1, 8, 1,
       1, 1, 1, 8, 1, 1, 1, 1, 1, 1, 1, 1, 5, 9, 5, 9, 5, 5, 5, 9, 5, 0, 1,
       5, 5, 5, 5, 5, 5, 9, 5, 5, 9, 0, 5, 9, 9, 5, 0, 1, 5, 9, 7, 2, 2, 5,
       5, 9, 9, 5, 5, 5, 5, 5, 9, 7, 5, 9, 9, 5, 5, 9, 2, 8, 2, 2, 2, 2, 2,
       2, 2, 8, 8, 2, 2, 2, 2, 2, 2, 2, 2, 8, 2, 0, 2, 2, 2, 8, 2, 2, 2, 2,
       8, 2, 2, 2, 2, 8, 8, 2, 2, 2, 2, 2, 2, 2, 2, 8, 2, 2, 8, 2, 3, 3, 3,
       3, 3, 7, 7, 3, 7, 3, 7, 3, 3, 3, 3, 7, 7, 3, 3, 7, 3, 3, 3, 3, 3, 3,
       7, 3, 7, 3, 3, 7, 3, 3, 7, 3, 3, 3, 3, 3, 3, 3, 3, 3, 7, 3, 7, 7, 7,
       3])

In [12]:
sparse_matrix


Out[12]:
<300x300 sparse matrix of type '<class 'numpy.float64'>'
	with 2018 stored elements in COOrdinate format>

In [13]:
Q


Out[13]:
0.862174

It looks like the communities labels each cell as belonging to a particular cluster, the sparse_matrix is some data type that we can't directly investigate, and Q is the modularity value.
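Although the printed representation is opaque, a matrix in COOrdinate format can be inspected through its `row`, `col`, and `data` attributes, or converted to a dense array when it is small. A minimal sketch on a made-up 3 × 3 matrix (the values are invented, not taken from the PhenoGraph output):

```python
import numpy as np
from scipy.sparse import coo_matrix

# Made-up sparse matrix: entry (row, col) holds an edge weight
rows = np.array([0, 1, 1, 2])
cols = np.array([1, 0, 2, 1])
weights = np.array([0.5, 0.5, 0.25, 0.25])
toy_sparse = coo_matrix((weights, (rows, cols)), shape=(3, 3))

# Number of stored (non-zero) elements
print(toy_sparse.nnz)

# Each stored element is a (row, column, value) triple
for i, j, w in zip(toy_sparse.row, toy_sparse.col, toy_sparse.data):
    print(i, j, w)

# Fine for small matrices; avoid on very large ones
dense = toy_sparse.toarray()
print(dense)
```

On the real `sparse_matrix`, the same attributes give the pairs of connected cells and the weight of each connection.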

Make a graph from the sparse matrix

To be able to lay out our graph in two dimensions, we'll use the networkx Python package to build the graph and lay out the cells and edges.
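Conceptually, building the graph means adding one weighted edge per stored entry of the sparse matrix. The sketch below does this by hand on a made-up matrix, which also sidesteps version differences: as far as I know, `from_scipy_sparse_matrix` (used in the next cell) is not available in NetworkX 3.0 and later, where `from_scipy_sparse_array` takes its place.

```python
import networkx
import numpy as np
from scipy.sparse import coo_matrix

# Made-up symmetric sparse matrix of edge weights
toy_sparse = coo_matrix(
    (
        np.array([0.5, 0.5, 0.25, 0.25]),
        (np.array([0, 1, 1, 2]), np.array([1, 0, 2, 1])),
    ),
    shape=(3, 3),
)

toy_graph = networkx.Graph()
# Add every node first so isolated nodes are not lost
toy_graph.add_nodes_from(range(toy_sparse.shape[0]))
# One weighted edge per stored entry; an undirected Graph merges (i, j) and (j, i)
toy_graph.add_weighted_edges_from(
    zip(toy_sparse.row.tolist(), toy_sparse.col.tolist(), toy_sparse.data.tolist())
)

print(toy_graph.number_of_nodes(), toy_graph.number_of_edges())
```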


In [14]:
graph = networkx.from_scipy_sparse_matrix(sparse_matrix)
graph


Out[14]:
<networkx.classes.graph.Graph at 0x11c2926d8>

We'll use the "Spring layout", a force-directed layout in which the nodes (cells) repel each other while the edges act like springs that pull connected cells together. We'll use the built-in networkx function called spring_layout on our graph:


In [15]:
positions = networkx.spring_layout(graph)
positions


Out[15]:
{0: array([ 0.11858926,  0.18350153]),
 1: array([ 0.03878882,  0.52678488]),
 2: array([ 0.06187783,  0.27057012]),
 3: array([ 0.13133896,  0.82526991]),
 4: array([ 0.02870572,  0.35504813]),
 5: array([ 0.06732686,  0.7275114 ]),
 6: array([ 0.04621855,  0.36044474]),
 7: array([ 0.17361623,  0.14091893]),
 8: array([ 0.04696309,  0.33607266]),
 9: array([ 0.03083784,  0.42507337]),
 10: array([ 0.10086864,  0.78453325]),
 11: array([ 0.03378239,  0.60250719]),
 12: array([ 0.07255188,  0.25743317]),
 13: array([ 0.01940302,  0.50319039]),
 14: array([ 0.01421221,  0.39104004]),
 15: array([ 0.18200872,  0.12691829]),
 16: array([ 0.08915212,  0.77801319]),
 17: array([ 0.08524582,  0.74651175]),
 18: array([ 0.09154744,  0.69092874]),
 19: array([ 0.02699054,  0.34152587]),
 20: array([ 0.54825803,  0.35920535]),
 21: array([ 0.08758751,  0.21952919]),
 22: array([ 0.62387915,  0.65638029]),
 23: array([ 0.14092928,  0.17381448]),
 24: array([ 0.04706814,  0.67507474]),
 25: array([ 0.2020894 ,  0.12430869]),
 26: array([ 0.10260653,  0.38992709]),
 27: array([ 0.0204807 ,  0.37852492]),
 28: array([ 0.02017567,  0.36789348]),
 29: array([ 0.07375837,  0.56873045]),
 30: array([ 0.2213752 ,  0.09691239]),
 31: array([ 0.16468178,  0.15334521]),
 32: array([ 0.04187911,  0.68838894]),
 33: array([ 0.09969204,  0.2927752 ]),
 34: array([ 0.06528779,  0.64226415]),
 35: array([ 0.01308873,  0.56070617]),
 36: array([ 0.06131379,  0.59666014]),
 37: array([ 0.03768468,  0.66593983]),
 38: array([ 0.0851757 ,  0.70804583]),
 39: array([ 0.04848673,  0.29762774]),
 40: array([ 0.05314153,  0.62851654]),
 41: array([ 0.11050289,  0.80443645]),
 42: array([ 0.08373951,  0.73408128]),
 43: array([ 0.09726112,  0.5563277 ]),
 44: array([ 0.06233362,  0.30335043]),
 45: array([ 0.10438366,  0.20971344]),
 46: array([ 0.10138291,  0.43842533]),
 47: array([ 0.07491486,  0.23716008]),
 48: array([ 0.04261795,  0.31531652]),
 49: array([ 0.08419821,  0.76182709]),
 50: array([ 0.        ,  0.50135633]),
 51: array([ 0.23731881,  0.17988868]),
 52: array([ 0.80180874,  0.10983022]),
 53: array([ 0.00442117,  0.54742569]),
 54: array([ 0.01336328,  0.60135624]),
 55: array([ 0.78682577,  0.10851908]),
 56: array([ 0.00983536,  0.58592379]),
 57: array([ 0.00317729,  0.51530074]),
 58: array([ 0.01018359,  0.57057936]),
 59: array([ 0.76355457,  0.08266126]),
 60: array([ 0.61187897,  0.08831182]),
 61: array([ 0.04344781,  0.43003986]),
 62: array([ 0.00544706,  0.48391002]),
 63: array([ 0.63714297,  0.06584988]),
 64: array([ 0.74683587,  0.07806149]),
 65: array([ 0.01840027,  0.6383111 ]),
 66: array([ 0.7303398 ,  0.07807754]),
 67: array([ 0.12334463,  0.81283244]),
 68: array([ 0.7740847 ,  0.08888564]),
 69: array([ 0.1265545,  0.2445429]),
 70: array([ 0.00459482,  0.45812054]),
 71: array([ 0.56975859,  0.10588005]),
 72: array([ 0.06431486,  0.35483615]),
 73: array([ 0.0111814 ,  0.47287149]),
 74: array([ 0.69037477,  0.07613188]),
 75: array([ 0.00425981,  0.52980379]),
 76: array([ 0.11553287,  0.19416027]),
 77: array([ 0.05101942,  0.7076026 ]),
 78: array([ 0.08858468,  0.33772148]),
 79: array([ 0.51015523,  0.09267429]),
 80: array([ 0.68330104,  0.05716341]),
 81: array([ 0.00408352,  0.527812  ]),
 82: array([ 0.77642727,  0.10082551]),
 83: array([ 0.22806867,  0.20491782]),
 84: array([ 0.47912235,  0.10843628]),
 85: array([ 0.40731504,  0.13983928]),
 86: array([ 0.01428922,  0.61362392]),
 87: array([ 0.01698228,  0.40674804]),
 88: array([ 0.15437689,  0.1538087 ]),
 89: array([ 0.20847498,  0.09628633]),
 90: array([ 0.70776335,  0.0493038 ]),
 91: array([ 0.17446053,  0.24719983]),
 92: array([ 0.01190982,  0.43995485]),
 93: array([ 0.01599047,  0.45819373]),
 94: array([ 0.02166033,  0.64824161]),
 95: array([ 0.01860526,  0.62750625]),
 96: array([ 0.71821931,  0.05756627]),
 97: array([ 0.0177259 ,  0.39470809]),
 98: array([ 0.72571335,  0.06636586]),
 99: array([ 0.08967582,  0.22495115]),
 100: array([ 0.90440347,  0.37373148]),
 101: array([ 0.98837644,  0.497194  ]),
 102: array([ 0.99421628,  0.54806505]),
 103: array([ 0.97407892,  0.38541796]),
 104: array([ 0.9773887 ,  0.40222398]),
 105: array([ 0.98733707,  0.45859255]),
 106: array([ 0.9596544 ,  0.31595962]),
 107: array([ 0.86377267,  0.17692907]),
 108: array([ 0.99378177,  0.52236804]),
 109: array([ 0.90747383,  0.22506561]),
 110: array([ 0.97086967,  0.3679172 ]),
 111: array([ 0.91947173,  0.27681724]),
 112: array([ 0.92816039,  0.26233049]),
 113: array([ 0.9397754 ,  0.27196688]),
 114: array([ 0.92853404,  0.25654734]),
 115: array([ 0.91378829,  0.23495982]),
 116: array([ 0.91689686,  0.24783003]),
 117: array([ 0.95479629,  0.30710941]),
 118: array([ 0.82396014,  0.19164298]),
 119: array([ 0.98882437,  0.41382525]),
 120: array([ 0.97991477,  0.37630785]),
 121: array([ 0.88795946,  0.66236715]),
 122: array([ 0.93798847,  0.28477767]),
 123: array([ 0.98793094,  0.42588978]),
 124: array([ 0.9695464 ,  0.42562419]),
 125: array([ 0.93907463,  0.2992509 ]),
 126: array([ 0.82536446,  0.72038408]),
 127: array([ 0.89063737,  0.21991549]),
 128: array([ 0.97536851,  0.35652447]),
 129: array([ 0.95829931,  0.32634563]),
 130: array([ 0.95344059,  0.52143394]),
 131: array([ 0.89012132,  0.19544624]),
 132: array([ 0.98646658,  0.60576853]),
 133: array([ 0.89016702,  0.24989385]),
 134: array([ 0.86459339,  0.20256509]),
 135: array([ 0.98017279,  0.46970573]),
 136: array([ 0.84241553,  0.75774196]),
 137: array([ 0.98880173,  0.58591937]),
 138: array([ 0.95083716,  0.2957806 ]),
 139: array([ 0.98323523,  0.39280754]),
 140: array([ 0.95590096,  0.33775908]),
 141: array([ 0.86319056,  0.69719717]),
 142: array([ 0.99552434,  0.50660779]),
 143: array([ 0.97201276,  0.34515683]),
 144: array([ 0.98268577,  0.62486456]),
 145: array([ 0.9370174 ,  0.31812703]),
 146: array([ 0.99226899,  0.53399745]),
 147: array([ 0.99222004,  0.4461101 ]),
 148: array([ 0.99013472,  0.56301974]),
 149: array([ 0.99495157,  0.48269002]),
 150: array([ 0.90896699,  0.77189806]),
 151: array([ 0.67997352,  0.52313676]),
 152: array([ 0.90026074,  0.78238633]),
 153: array([ 0.79586975,  0.54205845]),
 154: array([ 0.86034775,  0.80829885]),
 155: array([ 0.94548914,  0.70568122]),
 156: array([ 0.92952805,  0.68695948]),
 157: array([ 0.6402237,  0.5471598]),
 158: array([ 0.79275356,  0.89497899]),
 159: array([ 0.0197928 ,  0.43237104]),
 160: array([ 0.96203807,  0.56936733]),
 161: array([ 0.81675605,  0.87302619]),
 162: array([ 0.87585549,  0.63010357]),
 163: array([ 0.88343529,  0.77976772]),
 164: array([ 0.8729441 ,  0.82506921]),
 165: array([ 0.94217691,  0.64589255]),
 166: array([ 0.83702325,  0.86190077]),
 167: array([ 0.64613269,  0.51331429]),
 168: array([ 0.9363757 ,  0.73048091]),
 169: array([ 0.91346939,  0.75792119]),
 170: array([ 0.48215994,  0.45958373]),
 171: array([ 0.08362013,  0.42027785]),
 172: array([ 0.85049408,  0.84362845]),
 173: array([ 0.7528975 ,  0.52655768]),
 174: array([ 0.54611298,  0.49816149]),
 175: array([ 0.80570781,  0.89151892]),
 176: array([ 0.0706183 ,  0.47683742]),
 177: array([ 0.8976432 ,  0.20514341]),
 178: array([ 0.9444338 ,  0.67723808]),
 179: array([ 0.53132185,  0.53567054]),
 180: array([ 0.19037035,  0.88790289]),
 181: array([ 0.25362351,  0.07314998]),
 182: array([ 0.64719718,  0.03401144]),
 183: array([ 0.77482558,  0.91216864]),
 184: array([ 0.92879871,  0.63705438]),
 185: array([ 0.77921521,  0.57386696]),
 186: array([ 0.48441444,  0.51494812]),
 187: array([ 0.87683231,  0.80212795]),
 188: array([ 0.92445702,  0.73770328]),
 189: array([ 0.85559571,  0.83295749]),
 190: array([ 0.88677724,  0.79362932]),
 191: array([ 0.75900416,  0.92347129]),
 192: array([ 0.64267299,  0.48625357]),
 193: array([ 0.25252546,  0.91460733]),
 194: array([ 0.82512586,  0.86612242]),
 195: array([ 0.41015457,  0.50837184]),
 196: array([ 0.56319261,  0.53892885]),
 197: array([ 0.93230443,  0.71730313]),
 198: array([ 0.91227493,  0.74383405]),
 199: array([ 0.55916367,  0.47606511]),
 200: array([ 0.3813931 ,  0.02300292]),
 201: array([ 0.70901658,  0.58662827]),
 202: array([ 0.39338258,  0.02832884]),
 203: array([ 0.5583593 ,  0.00323563]),
 204: array([ 0.37176388,  0.0184432 ]),
 205: array([ 0.52730117,  0.01794663]),
 206: array([ 0.51119407,  0.00843315]),
 207: array([ 0.41304334,  0.01986438]),
 208: array([ 0.61485994,  0.02027692]),
 209: array([ 0.63672765,  0.79472749]),
 210: array([ 0.76443832,  0.74614683]),
 211: array([ 0.5701922 ,  0.01338734]),
 212: array([ 0.6697132 ,  0.24201824]),
 213: array([ 0.28264041,  0.05937227]),
 214: array([ 0.52155121,  0.00343442]),
 215: array([ 0.36434653,  0.03729801]),
 216: array([ 0.44115074,  0.01214266]),
 217: array([ 0.60667485,  0.17793386]),
 218: array([ 0.40606777,  0.03540001]),
 219: array([ 0.76754114,  0.78613883]),
 220: array([ 0.29432858,  0.05408193]),
 221: array([ 0.06213309,  0.28185572]),
 222: array([ 0.58932094,  0.02081146]),
 223: array([ 0.66239477,  0.03629662]),
 224: array([ 0.54454705,  0.0128397 ]),
 225: array([ 0.6120928 ,  0.77825351]),
 226: array([ 0.47865317,  0.01692952]),
 227: array([ 0.30811342,  0.0525207 ]),
 228: array([ 0.42488747,  0.03639465]),
 229: array([ 0.5346145 ,  0.00481447]),
 230: array([ 0.7257613 ,  0.75928113]),
 231: array([ 0.46363127,  0.        ]),
 232: array([ 0.42856612,  0.0203981 ]),
 233: array([ 0.57821369,  0.01195448]),
 234: array([ 0.32324787,  0.0521844 ]),
 235: array([ 0.73460897,  0.80638286]),
 236: array([ 0.6818737 ,  0.82052141]),
 237: array([ 0.3523324 ,  0.03349241]),
 238: array([ 0.63230513,  0.0207339 ]),
 239: array([ 0.49261184,  0.00462282]),
 240: array([ 0.46140279,  0.01268595]),
 241: array([ 0.65651195,  0.02538416]),
 242: array([ 0.49784712,  0.00826196]),
 243: array([ 0.60522239,  0.01485168]),
 244: array([ 0.26778128,  0.06300987]),
 245: array([ 0.69898669,  0.75710664]),
 246: array([ 0.34063225,  0.04043316]),
 247: array([ 0.32681081,  0.03938789]),
 248: array([ 0.6832224 ,  0.79131094]),
 249: array([ 0.67434037,  0.03125217]),
 250: array([ 0.30703108,  0.96094993]),
 251: array([ 0.41960667,  0.99206416]),
 252: array([ 0.64554356,  0.96822995]),
 253: array([ 0.63449405,  0.97354638]),
 254: array([ 0.29126739,  0.95144391]),
 255: array([ 0.27020626,  0.90478704]),
 256: array([ 0.16516003,  0.87004806]),
 257: array([ 0.60686871,  0.98600139]),
 258: array([ 0.3424023,  0.9222445]),
 259: array([ 0.35883378,  0.97506399]),
 260: array([ 0.38256917,  0.93553982]),
 261: array([ 0.44187846,  0.99365616]),
 262: array([ 0.57475574,  0.97987729]),
 263: array([ 0.35991017,  0.97621658]),
 264: array([ 0.51846526,  0.978178  ]),
 265: array([ 0.45197268,  0.96352601]),
 266: array([ 0.49889408,  0.99401143]),
 267: array([ 0.48968649,  0.98680291]),
 268: array([ 0.32685562,  0.96555533]),
 269: array([ 0.39816221,  0.96641151]),
 270: array([ 0.59309661,  0.98745052]),
 271: array([ 0.56776369,  0.99089595]),
 272: array([ 0.33929992,  0.96538033]),
 273: array([ 0.51384883,  0.99606347]),
 274: array([ 0.46102988,  0.99277387]),
 275: array([ 0.3712744 ,  0.97363884]),
 276: array([ 0.38215318,  0.96936228]),
 277: array([ 0.47960387,  1.        ]),
 278: array([ 0.22322207,  0.90268936]),
 279: array([ 0.34463582,  0.97388282]),
 280: array([ 0.66791789,  0.96455161]),
 281: array([ 0.42361231,  0.9535159 ]),
 282: array([ 0.39576707,  0.986615  ]),
 283: array([ 0.28010021,  0.94675327]),
 284: array([ 0.51436653,  0.9936281 ]),
 285: array([ 0.3195008 ,  0.96414848]),
 286: array([ 0.65775055,  0.97238168]),
 287: array([ 0.6281704 ,  0.98057599]),
 288: array([ 0.47415923,  0.99180317]),
 289: array([ 0.38073448,  0.98682451]),
 290: array([ 0.26133079,  0.9368331 ]),
 291: array([ 0.61414895,  0.9788418 ]),
 292: array([ 0.4319572,  0.9903854]),
 293: array([ 0.41210286,  0.99199226]),
 294: array([ 0.32135049,  0.93487226]),
 295: array([ 0.68440526,  0.96098115]),
 296: array([ 0.2783036 ,  0.92472019]),
 297: array([ 0.22631023,  0.88372588]),
 298: array([ 0.23943028,  0.90975154]),
 299: array([ 0.55266889,  0.99504838])}
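The spring layout starts from random positions, so the coordinates change from run to run. If you want a reproducible picture, `spring_layout` accepts a `seed` argument. A minimal sketch on a made-up graph (the graph and the seed value are arbitrary choices for illustration):

```python
import networkx
import numpy as np

# Made-up graph: two triangles joined by a single edge
toy_graph = networkx.Graph()
toy_graph.add_edges_from(
    [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
)

# Same seed, same starting positions, same final layout
layout_a = networkx.spring_layout(toy_graph, seed=0)
layout_b = networkx.spring_layout(toy_graph, seed=0)

same = all(np.allclose(layout_a[node], layout_b[node]) for node in toy_graph)
print(same)
```

The result is the same kind of dictionary as above: one entry per node, each holding an array with the x and y coordinates.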

Convert positions dict to dataframe with node information

The positions variable is a dictionary mapping each node id (in this case, a number) to its $(x, y)$ position. The nodes are in exactly the same order as the rows of the distances dataframe we gave phenograph.cluster.
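Because the node ids follow the row order of the distances dataframe, they can be used to look up each cell's barcode and community label. The sketch below shows that idea on made-up inputs; the names `toy_positions`, `toy_barcodes`, and `toy_communities` are hypothetical, and this is not the actual code of `networkplots.get_nodes_specs`.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for positions, distances.index, and communities
toy_positions = {
    0: np.array([0.1, 0.2]),
    1: np.array([0.4, 0.5]),
    2: np.array([0.9, 0.3]),
}
toy_barcodes = pd.Index(["cell_a", "cell_b", "cell_c"])
toy_communities = np.array([0, 0, 1])

# One row per node, with the (x, y) position split into two columns
toy_nodes = pd.DataFrame.from_dict(
    toy_positions, orient="index", columns=["xs", "ys"]
).sort_index()

# Node id i corresponds to the i-th row of the distance matrix
node_ids = toy_nodes.index.to_numpy()
toy_nodes["barcode"] = np.asarray(toy_barcodes)[node_ids]
toy_nodes["community"] = [
    "Community #{}".format(c) for c in toy_communities[node_ids]
]

print(toy_nodes)
```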


In [16]:
networkplots.get_nodes_specs??

Looks like this function can handle clusters that are already defined in our metadata! Let's look at our cell_metadata and remind ourselves of which column we might like to use for the other_cluster_col value.


In [17]:
cell_metadata.head()


Out[17]:
cluster_id celltype cluster_n cluster_n_celltype cluster_celltype_with_id
r1_TTCCTGCTAGGC cluster_24 Rods 24 #24 (Rods) Rods (cluster_24)
r1_TGGAGATACTCT cluster_24 Rods 24 #24 (Rods) Rods (cluster_24)
r1_CGTCTACATCCG cluster_24 Rods 24 #24 (Rods) Rods (cluster_24)
r1_CAAGCTTGGCGC cluster_24 Rods 24 #24 (Rods) Rods (cluster_24)
r1_ACTCACATAGAG cluster_24 Rods 24 #24 (Rods) Rods (cluster_24)

In this case, I'd like to use the cluster_n_celltype column.

Let's take a look at the code again to see how the networkplots.get_nodes_specs function uses the metadata:


In [18]:
networkplots.get_nodes_specs??

Looks like this function uses another one, called labels_to_colors -- what does that do?


In [19]:
networkplots.labels_to_colors??
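The general idea of mapping labels to colors can be sketched as follows. This is a hypothetical stand-in (the name `labels_to_colors_sketch` is made up, and it uses matplotlib's colormaps directly), so the real `networkplots.labels_to_colors` may work differently.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import rgb2hex


def labels_to_colors_sketch(labels, palette="Set2"):
    """Hypothetical stand-in: map each unique label to a hex color."""
    cmap = plt.get_cmap(palette)
    unique_labels = sorted(set(labels))
    # cmap.N is the number of colors in the palette; cycle through them
    # if there are more labels than colors
    return {
        label: rgb2hex(cmap(i % cmap.N))
        for i, label in enumerate(unique_labels)
    }


# Made-up community labels
toy_labels = ["Community #0", "Community #1", "Community #0", "Community #2"]
toy_colors = labels_to_colors_sketch(toy_labels)
print(toy_colors)
```

Hex strings are convenient here because both matplotlib and Bokeh understand them.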

Now let's use get_nodes_specs to create a dataframe of information about nodes so we can plot them.


In [26]:
nodes_specs = networkplots.get_nodes_specs(
    positions, cell_metadata, distances.index, 
    communities, other_cluster_col='cluster_n_celltype',
    palette='Set2')
print(nodes_specs.shape)
nodes_specs.head()


(300, 11)
Out[26]:
xs ys community barcode cluster_id celltype cluster_n cluster_n_celltype cluster_celltype_with_id other_cluster_color community_color
0 0.118589 0.183502 Community #0 r1_TTCCTGCTAGGC cluster_24 Rods 24 #24 (Rods) Rods (cluster_24) #66c2a5 #66c2a5
28 0.020176 0.367893 Community #0 r1_ATGGCTCGCAAA cluster_24 Rods 24 #24 (Rods) Rods (cluster_24) #66c2a5 #66c2a5
176 0.070618 0.476837 Community #0 r1_CGATGGCTGGAC cluster_24 Rods 24 #24 (Rods) Rods (cluster_24) #66c2a5 #66c2a5
26 0.102607 0.389927 Community #0 r1_GCGTGCTACTAC cluster_24 Rods 24 #24 (Rods) Rods (cluster_24) #66c2a5 #66c2a5
3 0.131339 0.825270 Community #0 r1_GGTAAGGCGCTC cluster_24 Rods 24 #24 (Rods) Rods (cluster_24) #66c2a5 #66c2a5

Convert positions dict to dataframe with edge information

We've now created a dataframe containing the x,y positions, the community labels, and the colors for the communities and other clusters we were interested in. Now we want to do the same for the edges (lines between cells).

Let's take a look at the function we'll use:


In [21]:
networkplots.get_edges_specs??

What arguments does it take? What does it do with them? What does it return?

Exercise 3

Create a variable called edges_specs using the networkplots.get_edges_specs function and the correct inputs.


In [22]:
# YOUR CODE HERE


In [23]:
edges_specs = networkplots.get_edges_specs(graph, positions)
print(edges_specs.shape)
edges_specs.head()


(2018, 3)
Out[23]:
xs ys alphas
0 [0.118589262338, 0.173616226047] [0.183501529868, 0.140918929513] 0.283333
1 [0.118589262338, 0.0308378367884] [0.183501529868, 0.425073372097] 0.191667
2 [0.118589262338, 0.182008718558] [0.183501529868, 0.126918292813] 0.229412
3 [0.118589262338, 0.0269905395037] [0.183501529868, 0.341525865025] 0.140741
4 [0.118589262338, 0.202089396812] [0.183501529868, 0.124308686151] 0.164706
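Each row of this dataframe describes one line segment: the x coordinates of its two endpoints, the y coordinates of its two endpoints, and a transparency value. A sketch of how such a table could be built from a graph and a positions dictionary is below. It assumes the transparency comes from the edge weight, which is a guess, and it is not the actual code of `networkplots.get_edges_specs`.

```python
import networkx
import numpy as np
import pandas as pd

# Made-up weighted graph and node positions
toy_graph = networkx.Graph()
toy_graph.add_weighted_edges_from([(0, 1, 0.5), (1, 2, 0.25)])
toy_positions = {
    0: np.array([0.0, 0.0]),
    1: np.array([1.0, 0.0]),
    2: np.array([1.0, 1.0]),
}

rows = []
for u, v, weight in toy_graph.edges(data="weight"):
    rows.append({
        # x coordinates of both endpoints, then y coordinates of both endpoints
        "xs": [float(toy_positions[u][0]), float(toy_positions[v][0])],
        "ys": [float(toy_positions[u][1]), float(toy_positions[v][1])],
        # Assumption: use the edge weight as the line transparency
        "alphas": weight,
    })

toy_edges = pd.DataFrame(rows, columns=["xs", "ys", "alphas"])
print(toy_edges)
```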

To be able to use the dataframes with the Bokeh plotting library, we need to convert our dataframes into ColumnDataSource objects.


In [24]:
nodes_source = ColumnDataSource(nodes_specs)
edges_source = ColumnDataSource(edges_specs)
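To show what a ColumnDataSource buys us, here is a sketch that draws a tiny made-up graph with plain Bokeh calls. The glyph choices (`multi_line` for edges, `scatter` for nodes) are my own assumption about a reasonable way to draw it and are not necessarily what `networkplots.plot_graph` does internally.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Made-up node and edge tables with the same column names as above
toy_nodes = pd.DataFrame({
    "xs": [0.0, 1.0, 1.0],
    "ys": [0.0, 0.0, 1.0],
    "community_color": ["#66c2a5", "#66c2a5", "#fc8d62"],
})
toy_edges = pd.DataFrame({
    "xs": [[0.0, 1.0], [1.0, 1.0]],
    "ys": [[0.0, 0.0], [0.0, 1.0]],
    "alphas": [0.5, 0.25],
})

toy_nodes_source = ColumnDataSource(toy_nodes)
toy_edges_source = ColumnDataSource(toy_edges)

plot = figure(title="Toy graph")
# Glyphs refer to columns of the source by name
plot.multi_line(xs="xs", ys="ys", line_alpha="alphas", source=toy_edges_source)
plot.scatter(x="xs", y="ys", color="community_color", size=10,
             source=toy_nodes_source)
```

Calling `show(plot)` would then render the figure in the notebook.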

In [25]:
# --- First tab: KNN clustering --- #
tab1 = networkplots.plot_graph(nodes_source, edges_source, 
                               legend_col='community',
                  color_col='community_color', tab=True,
                  title='KNN Clustering')

# --- Second tab: Clusters from paper --- #
tab2 = networkplots.plot_graph(nodes_source, edges_source,
                  legend_col='cluster_n_celltype', tab=True,
                  color_col='other_cluster_color',
                  title="Clusters from paper")

tabs = Tabs(tabs=[tab1, tab2])
show(tabs)