Note: This is not really a "network analysis" - we are only looking at the graph and seeing what cells are there. If you want to do more than just zoom in and look around at the cells in the graphs, I recommend using Cytoscape for visualizing networks.
In [1]:
# Interactive jupyter widgets - use IntSlider directly for more control
from ipywidgets import IntSlider, interact
# Convert RGB colors to hex for portability
from matplotlib.colors import rgb2hex
# Visualize networks
import networkx
# Numerical python
import numpy as np
# Pandas for dataframes
import pandas as pd
# K-nearest neighbors cell clustering from Dana Pe'er's lab
import phenograph
# Make color palettes
import seaborn as sns
%matplotlib inline
# Bokeh - interactive plotting in the browser
from bokeh.plotting import figure, show, output_file
from bokeh.models import HoverTool, ColumnDataSource
from bokeh.models.widgets import Panel, Tabs
from bokeh.layouts import widgetbox
from bokeh.io import output_notebook
# Local file: networkplots.py
import networkplots
# This line is required for the plots to appear in the notebooks
output_notebook()
At this point, you can follow along with either the pre-baked Macosko2015 amacrine data, or you can load in your own expression matrices. For the best experience, make sure that the rows are cells and the columns are gene names.
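If you are loading your own data, a minimal sketch with pandas might look like the following (the filenames here are placeholders, not files shipped with this tutorial):
import pandas as pd

# Hypothetical filenames -- replace with your own files.
# Rows should be cells, columns should be gene names.
counts = pd.read_csv('expression_matrix.csv', index_col=0)
cell_metadata = pd.read_csv('cell_metadata.csv', index_col=0)
gene_metadata = pd.read_csv('gene_metadata.csv', index_col=0)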
In [2]:
import macosko2015
counts, cell_metadata, gene_metadata = macosko2015.load_big_clusters()
counts.head()
Out[2]:
Calculate the correlation between cells. Ranking each cell's genes first and then correlating the ranks gives the Spearman (rank) correlation:
In [3]:
correlations = counts.T.rank().corr()
print(correlations.shape)
correlations.head()
Out[3]:
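As an optional sanity check, you can compare one entry of this matrix against scipy.stats.spearmanr; the two should agree up to floating point error:
from scipy.stats import spearmanr

# Spearman correlation of the first two cells, computed directly
cell_a, cell_b = counts.index[:2]
rho, _ = spearmanr(counts.loc[cell_a], counts.loc[cell_b])
print(rho, correlations.loc[cell_a, cell_b])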
Correlation is not the same as distance. If two things are exactly the same, their correlation value is 1; but in space, if two things are exactly the same, the distance between them is 0. So correlation is not a distance: it is a similarity metric, where bigger means more similar, whereas we want a dissimilarity (aka distance) metric.
Take a look for yourself: many of the correlation values are near zero (not correlated), with a blip near 1 (the self-correlations).
In [4]:
sns.distplot(correlations.values.flat)
Out[4]:
But for building a K-nearest neighbors graph, we want the closest things (in distance space) to be actually close. So we'll convert our correlation ($\rho$) into a distance ($d$) using this equation:
$$ d = \sqrt{2(1-\rho)} $$
You can look at the code for networkplots.correlation_to_distance to convince yourself that's actually what it's doing:
In [5]:
networkplots.correlation_to_distance??
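As a rough sketch (an assumption about the implementation, not the actual networkplots code), the conversion boils down to applying that formula elementwise to the correlation dataframe:
import numpy as np

def correlation_to_distance_sketch(correlations):
    # Perfectly correlated cells (rho = 1) end up at distance 0,
    # perfectly anti-correlated cells (rho = -1) at distance 2
    return np.sqrt(2 * (1 - correlations))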
In [6]:
# YOUR CODE HERE
In [7]:
distances = networkplots.correlation_to_distance(correlations)
distances.head()
Out[7]:
In [8]:
# YOUR CODE HERE
In [9]:
sns.distplot(distances.values.flat)
Out[9]:
Now we'll run phenograph.cluster, which returns three items:
communities: the cluster labels of each cell
sparse_matrix: a sparse matrix representing the connections between cells in the graph
Q: the modularity score. Higher is better, and the highest possible value is 1.
In [10]:
communities, sparse_matrix, Q = phenograph.cluster(distances, k=10)
Let's take a look at each of these returned values
In [11]:
communities
Out[11]:
In [12]:
sparse_matrix
Out[12]:
In [13]:
Q
Out[13]:
It looks like communities labels each cell as belonging to a particular cluster, sparse_matrix is a compressed data type whose contents we can't read directly from its summary, and Q is the modularity value.
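If you do want to peek inside the sparse matrix, you can convert a small corner of it to a regular (dense) array. This is just an optional aside, using scipy's standard sparse-matrix methods:
# Look at the connections among the first five cells
dense_corner = sparse_matrix.tocsr()[:5, :5].toarray()
print(dense_corner)
# Total number of nonzero entries (graph connections)
print(sparse_matrix.nnz)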
To be able to lay out our graph in two dimensions, we'll use the networkx Python package to build the graph and lay out the cells and edges.
In [14]:
graph = networkx.from_scipy_sparse_matrix(sparse_matrix)
graph
Out[14]:
We'll use the "Spring layout" which is a force-directed layout that pushes cells and edges away from each other. We'll use the built-in networkx function called spring_layout
on our graph:
In [15]:
positions = networkx.spring_layout(graph)
positions
Out[15]:
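The spring layout is stochastic, so your plot will look a little different every time you run it. In recent versions of networkx, spring_layout accepts a few optional parameters if you want a reproducible or more spread-out layout (the values below are just examples, not recommendations):
# seed fixes the random starting positions; k sets the optimal distance
# between nodes; iterations controls how long the simulation runs
positions = networkx.spring_layout(graph, seed=0, k=0.1, iterations=50)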
In [16]:
networkplots.get_nodes_specs??
Looks like this function can handle the case where we already have some clusters defined in our metadata! Let's look at our cell_metadata and remind ourselves which column we might like to use for the other_cluster_col value.
In [17]:
cell_metadata.head()
Out[17]:
In this case, I'd like to use the cluster_n_celltype column.
Let's take a look at the code again to see how the networkplots.get_nodes_specs function uses the metadata:
In [18]:
networkplots.get_nodes_specs??
Looks like this function uses another one, called labels_to_colors -- what does that do?
In [19]:
networkplots.labels_to_colors??
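In case the ?? output isn't available, here's a rough sketch of what a label-to-color mapping like this typically does, using the seaborn and rgb2hex imports from the top of the notebook (the real labels_to_colors may differ in its details):
def labels_to_colors_sketch(labels, palette='Set2'):
    # One palette color per unique label, converted to hex for Bokeh
    unique_labels = sorted(set(labels))
    colors = sns.color_palette(palette, n_colors=len(unique_labels))
    label_to_hex = dict(zip(unique_labels, (rgb2hex(c) for c in colors)))
    return [label_to_hex[label] for label in labels]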
Now let's use get_nodes_specs to create a dataframe of information about nodes so we can plot them.
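Before calling the real thing, here is a rough, hypothetical sketch of the kind of dataframe such a function assembles: layout coordinates, community labels, the cluster labels from the metadata, and a hex color for each labeling. It reuses labels_to_colors_sketch from above and is only meant to show the idea; the actual networkplots.get_nodes_specs handles more details:
def get_nodes_specs_sketch(positions, cell_metadata, barcodes, communities,
                           other_cluster_col, palette='Set2'):
    # x, y coordinates from the spring layout, one row per cell
    nodes = pd.DataFrame(positions).T
    nodes.columns = ['x', 'y']
    nodes['barcode'] = list(barcodes)
    # Cluster labels from phenograph and from the paper's metadata
    nodes['community'] = communities
    nodes[other_cluster_col] = cell_metadata.loc[barcodes, other_cluster_col].values
    # Hex colors for both labelings
    nodes['community_color'] = labels_to_colors_sketch(communities, palette)
    nodes['other_cluster_color'] = labels_to_colors_sketch(
        nodes[other_cluster_col], palette)
    return nodes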
In [26]:
nodes_specs = networkplots.get_nodes_specs(
positions, cell_metadata, distances.index,
communities, other_cluster_col='cluster_n_celltype',
palette='Set2')
print(nodes_specs.shape)
nodes_specs.head()
Out[26]:
We've now created a dataframe containing the x, y positions, the community labels, and the colors for the communities and the other clusters we were interested in. Now we want to do the same for the edges (the lines between cells): convert the graph and the positions dict into a dataframe with edge information.
Let's take a look at the function we'll use:
In [21]:
networkplots.get_edges_specs??
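Purely as a hypothetical sketch of the idea (the real networkplots.get_edges_specs may differ): for every edge in the graph we need the x and y coordinates of its two endpoints, plus the edge weight so that stronger connections can be drawn more opaque:
def get_edges_specs_sketch(graph, positions):
    rows = []
    for u, v, data in graph.edges(data=True):
        rows.append({
            # Start and end coordinates of the line for this edge
            'xs': [positions[u][0], positions[v][0]],
            'ys': [positions[u][1], positions[v][1]],
            # Use the edge weight (if any) to set the line transparency
            'alphas': data.get('weight', 1.0),
        })
    return pd.DataFrame(rows)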
In [22]:
# YOUR CODE HERE
In [23]:
edges_specs = networkplots.get_edges_specs(graph, positions)
print(edges_specs.shape)
edges_specs.head()
Out[23]:
To be able to use the dataframes with the Bokeh plotting library, we need to convert them into ColumnDataSource objects.
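A ColumnDataSource is essentially a mapping from column names to columns of data that Bokeh glyphs can refer to by name. A trivial illustration, not part of the analysis:
source = ColumnDataSource(pd.DataFrame({'x': [0, 1, 2], 'y': [2, 1, 0]}))
print(source.column_names)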
In [24]:
nodes_source = ColumnDataSource(nodes_specs)
edges_source = ColumnDataSource(edges_specs)
In [25]:
# --- First tab: KNN clustering --- #
tab1 = networkplots.plot_graph(nodes_source, edges_source,
legend_col='community',
color_col='community_color', tab=True,
title='KNN Clustering')
# --- Second tab: Clusters from paper --- #
tab2 = networkplots.plot_graph(nodes_source, edges_source,
legend_col='cluster_n_celltype', tab=True,
color_col='other_cluster_color',
title="Clusters from paper")
tabs = Tabs(tabs=[tab1, tab2])
show(tabs)
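If you want to save the interactive figure as a standalone HTML file you can share, call output_file (already imported above) before show; the filename below is just an example:
output_file('knn_clustering_graph.html', title='KNN clustering of retina cells')
show(tabs)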