Applying classifiers to Macaulay2016

We're going to use the classifier knowledge that we've learned so far and apply it to the shalek2013 and macaulay2016 datasets.

For the GO analysis, we'll need a few other packages:

mygene for looking up the gene ontology categories of genes
goatools for performing gene ontology enrichment analysis
fishers_exact_test for goatools

Use the following commands at your terminal to install the packages. Some of them are on Github so it's important to get the whole command right.

$ pip install mygene
$ pip install git+git://github.com/olgabot/goatools.git
$ pip install git+https://github.com/brentp/fishers_exact_test.git



In [1]:

    
# Alphabetical order is standard
# We're doing "import superlongname as abbrev" for our laziness - this way we don't have to type out the whole thing each time.

# From python standard library
import collections

# Python plotting library
import matplotlib.pyplot as plt

# Numerical python library (pronounced "num-pie")
import numpy as np

# Dataframes in Python
import pandas as pd

# Statistical plotting library we'll use
import seaborn as sns
sns.set(style='whitegrid')

# Matrix decomposition
from sklearn.decomposition import PCA, FastICA

# Matrix decomposition
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

# Manifold learning
from sklearn.manifold import MDS, TSNE

# Gene ontology
import goatools
import mygene

# Initialize the "mygene.info" (http://mygene.info/) interface
mg = mygene.MyGeneInfo()


# This is necessary to show the plotted figures inside the notebook -- "inline" with the notebook cells
%matplotlib inline









    



/Users/olga/anaconda3/envs/single-cell-bioinformatics/lib/python3.5/site-packages/matplotlib/__init__.py:872: UserWarning: axes.color_cycle is deprecated and replaced with axes.prop_cycle; please use the latter.
  warnings.warn(self.msg_depr % (key, alt_key))



In [2]:

    
shalek2013_metadata = pd.read_csv('../data/shalek2013/metadata.csv', 
                               
                                     # Sets the first (Python starts counting from 0 not 1) column as the row names
                                      index_col=0)
shalek2013_expression = pd.read_csv('../data/shalek2013/expression.csv', 
                               
                                     # Sets the first (Python starts counting from 0 not 1) column as the row names
                                      index_col=0)
shalek2013_expression_feature = pd.read_csv('../data/shalek2013/expression_feature.csv', 
                               
                                     # Sets the first (Python starts counting from 0 not 1) column as the row names
                                      index_col=0)

Gene Ontology (GO) Enrichment

Gene ontology is a tree (directed acyclic graph) of gene annotations. The topmost node is the most general, and the bottommost nodes are the most specific. Here is an example GO graph

Perform GO enrichment analysis (GOEA)

GOEA Step 1: Download GO graph file of "obo" type (same for all species)

This will download the file "go-basic.obo" if it doesn't already exist. This only needs to be done once.



In [3]:

    
from goatools.base import download_go_basic_obo
obo_fname = download_go_basic_obo()

# Show the filename
obo_fname









    



  EXISTS: go-basic.obo






    Out[3]:





'go-basic.obo'

GOEA Step 2: Create the GO graph (same for all species)



In [4]:

    
obo_dag = goatools.obo_parser.GODag(obo_file='go-basic.obo')









    



load obo file go-basic.obo






    



go-basic.obo: format-version(1.2) data-version(releases/2016-06-15)






    



46823 nodes imported

GOEA Step 3: Get gene ID to GO id mapping (species-specific and experiment-specific)

Here we are establishing the background for our GOEA. Defining your background is very important because, for example, tehre are lots of neural genes so if you use all human genes as background in your study of which genes are upregulated in Neuron Type X vs Neuron Type Y, you'll get a bunch of neuron genes (which is true) but not the smaller differences between X and Y. Typicall, you use all expressed genes as the background.

For our data, we can access all expressed genes very simply by getting the column names in the dataframe: shalek2013_expression.columns.



In [ ]:

    
GO_KEYS = 'go.BP', 'go.MF', 'go.CC'


mygene_output = mg.querymany(shalek2013_expression.columns, 
                             scopes='symbol', fields=GO_KEYS, species='mouse', 
                             returnall=True)

def parse_mygene_output(mygene_output):
    """Convert mygene.querymany output to a gene id to go term mapping
    
    Parameters
    ----------
    mygene_output : dict or list
        Dictionary (returnall=True) or list (returnall=False) of 
        output from mygene.querymany
        
    Output
    ------
    gene_name_to_go : dict
        Mapping of gene name to a set of GO ids    
    """
    # if "returnall=True" was specified, need to get just the "out" key
    if isinstance(mygene_output, dict):
        mygene_output = mygene_output['out']

    gene_name_to_go = collections.defaultdict(set)

    for line in mygene_output:
        gene_name = line['query']
        for go_key in GO_KEYS:
            try:
                go_terms = line[go_key]
            except KeyError:
                continue
            if isinstance(go_terms, dict):
                go_ids = set([go_terms['id']])
            else:
                go_ids = set(x['id'] for x in go_terms)
            gene_name_to_go[gene_name] |= go_ids
    return gene_name_to_go

gene_name_to_go = parse_mygene_output(mygene_output)

GOEA Step 4: Create a GO enrichment calculator object `go_enricher` (species- and experiment-specific)

In this step, we are using the two objects we've created (obo_dag from Step 2 and gene_name_to_go from Step 3) plus the gene ids to create a go_enricher object



In [7]:

    
go_enricher = goatools.GOEnrichmentStudy(shalek2013_expression.columns, gene_name_to_go, obo_dag)









    



Propagating term counts to parents ..






    



 5,425 out of  6,312 population items found in association



In [54]:

    
# cl = gene_annotation.sort(col, ascending=False)[gene_annotation[col] > 5e-4].index

go_enricher = goatools.GOEnrichmentStudy(genes, gene_name_to_go, obo_dag)

results = go.run_study(genes[:5])

go_enrichment = pd.DataFrame([r.__dict__ for r in results])
go_enrichment









    



Propagating term counts to parents ..






    



     9 out of     10 population items found in association
Calculating uncorrected p-values using fisher
     4 out of      5 study items found in association
Running multitest correction: local bonferroni
Running multitest correction: local sidak
Running multitest correction: local holm
1,004 GO terms are associated with 4 of 5 study items in a population of 10






    Out[54]:






  
    
      
      GO
      NS
      _methods
      enrichment
      goterm
      name
      p_bonferroni
      p_holm
      p_sidak
      p_uncorrected
      pop_count
      pop_items
      pop_n
      ratio_in_pop
      ratio_in_study
      study_count
      study_items
      study_n
    
  
  
    
      0
      GO:0044710
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0044710\tlevel-02\tdepth-02\tsingle-organis...
      single-organism metabolic process
      1.0
      1.0
      1.0
      0.047619
      4
      {PARK2, AGPAT4, NPL, QK}
      10
      (4, 10)
      (4, 5)
      4
      {PARK2, AGPAT4, NPL, QK}
      5
    
    
      1
      GO:0045597
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0045597\tlevel-05\tdepth-05\tpositive regul...
      positive regulation of cell differentiation
      1.0
      1.0
      1.0
      0.444444
      2
      {PARK2, QK}
      10
      (2, 10)
      (2, 5)
      2
      {PARK2, QK}
      5
    
    
      2
      GO:0044707
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0044707\tlevel-02\tdepth-02\tsingle-multice...
      single-multicellular organism process
      1.0
      1.0
      1.0
      0.444444
      2
      {PARK2, QK}
      10
      (2, 10)
      (2, 5)
      2
      {PARK2, QK}
      5
    
    
      3
      GO:0009890
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0009890\tlevel-05\tdepth-05\tnegative regul...
      negative regulation of biosynthetic process
      1.0
      1.0
      1.0
      0.444444
      2
      {PARK2, QK}
      10
      (2, 10)
      (2, 5)
      2
      {PARK2, QK}
      5
    
    
      4
      GO:0031344
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0031344\tlevel-05\tdepth-05\tregulation of ...
      regulation of cell projection organization
      1.0
      1.0
      1.0
      0.444444
      2
      {PARK2, QK}
      10
      (2, 10)
      (2, 5)
      2
      {PARK2, QK}
      5
    
    
      5
      GO:0044255
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0044255\tlevel-03\tdepth-04\tcellular lipid...
      cellular lipid metabolic process
      1.0
      1.0
      1.0
      0.444444
      2
      {AGPAT4, QK}
      10
      (2, 10)
      (2, 5)
      2
      {AGPAT4, QK}
      5
    
    
      6
      GO:0009892
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0009892\tlevel-04\tdepth-04\tnegative regul...
      negative regulation of metabolic process
      1.0
      1.0
      1.0
      0.444444
      2
      {PARK2, QK}
      10
      (2, 10)
      (2, 5)
      2
      {PARK2, QK}
      5
    
    
      7
      GO:0006082
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0006082\tlevel-03\tdepth-04\torganic acid m...
      organic acid metabolic process
      1.0
      1.0
      1.0
      0.444444
      2
      {NPL, QK}
      10
      (2, 10)
      (2, 5)
      2
      {NPL, QK}
      5
    
    
      8
      GO:0010976
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0010976\tlevel-07\tdepth-09\tpositive regul...
      positive regulation of neuron projection devel...
      1.0
      1.0
      1.0
      0.444444
      2
      {PARK2, QK}
      10
      (2, 10)
      (2, 5)
      2
      {PARK2, QK}
      5
    
    
      9
      GO:0044248
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0044248\tlevel-03\tdepth-03\tcellular catab...
      cellular catabolic process
      1.0
      1.0
      1.0
      0.444444
      2
      {PARK2, NPL}
      10
      (2, 10)
      (2, 5)
      2
      {PARK2, NPL}
      5
    
    
      10
      GO:0051172
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0051172\tlevel-05\tdepth-05\tnegative regul...
      negative regulation of nitrogen compound metab...
      1.0
      1.0
      1.0
      0.444444
      2
      {PARK2, QK}
      10
      (2, 10)
      (2, 5)
      2
      {PARK2, QK}
      5
    
    
      11
      GO:0050767
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0050767\tlevel-06\tdepth-06\tregulation of ...
      regulation of neurogenesis
      1.0
      1.0
      1.0
      0.444444
      2
      {PARK2, QK}
      10
      (2, 10)
      (2, 5)
      2
      {PARK2, QK}
      5
    
    
      12
      GO:0051094
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0051094\tlevel-04\tdepth-04\tpositive regul...
      positive regulation of developmental process
      1.0
      1.0
      1.0
      0.444444
      2
      {PARK2, QK}
      10
      (2, 10)
      (2, 5)
      2
      {PARK2, QK}
      5
    
    
      13
      GO:2000113
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:2000113\tlevel-07\tdepth-07\tnegative regul...
      negative regulation of cellular macromolecule ...
      1.0
      1.0
      1.0
      0.444444
      2
      {PARK2, QK}
      10
      (2, 10)
      (2, 5)
      2
      {PARK2, QK}
      5
    
    
      14
      GO:0051960
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0051960\tlevel-05\tdepth-05\tregulation of ...
      regulation of nervous system development
      1.0
      1.0
      1.0
      0.444444
      2
      {PARK2, QK}
      10
      (2, 10)
      (2, 5)
      2
      {PARK2, QK}
      5
    
    
      15
      GO:0044281
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0044281\tlevel-03\tdepth-03\tsmall molecule...
      small molecule metabolic process
      1.0
      1.0
      1.0
      0.444444
      2
      {NPL, QK}
      10
      (2, 10)
      (2, 5)
      2
      {NPL, QK}
      5
    
    
      16
      GO:0051248
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0051248\tlevel-06\tdepth-06\tnegative regul...
      negative regulation of protein metabolic process
      1.0
      1.0
      1.0
      0.444444
      2
      {PARK2, QK}
      10
      (2, 10)
      (2, 5)
      2
      {PARK2, QK}
      5
    
    
      17
      GO:0010629
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0010629\tlevel-06\tdepth-06\tnegative regul...
      negative regulation of gene expression
      1.0
      1.0
      1.0
      0.444444
      2
      {PARK2, QK}
      10
      (2, 10)
      (2, 5)
      2
      {PARK2, QK}
      5
    
    
      18
      GO:0043436
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0043436\tlevel-04\tdepth-05\toxoacid metabo...
      oxoacid metabolic process
      1.0
      1.0
      1.0
      0.444444
      2
      {NPL, QK}
      10
      (2, 10)
      (2, 5)
      2
      {NPL, QK}
      5
    
    
      19
      GO:0032269
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0032269\tlevel-06\tdepth-07\tnegative regul...
      negative regulation of cellular protein metabo...
      1.0
      1.0
      1.0
      0.444444
      2
      {PARK2, QK}
      10
      (2, 10)
      (2, 5)
      2
      {PARK2, QK}
      5
    
    
      20
      GO:0032501
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0032501\tlevel-01\tdepth-01\tmulticellular ...
      multicellular organismal process
      1.0
      1.0
      1.0
      0.444444
      2
      {PARK2, QK}
      10
      (2, 10)
      (2, 5)
      2
      {PARK2, QK}
      5
    
    
      21
      GO:1901575
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:1901575\tlevel-03\tdepth-03\torganic substa...
      organic substance catabolic process
      1.0
      1.0
      1.0
      0.444444
      2
      {PARK2, NPL}
      10
      (2, 10)
      (2, 5)
      2
      {PARK2, NPL}
      5
    
    
      22
      GO:0010605
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0010605\tlevel-05\tdepth-05\tnegative regul...
      negative regulation of macromolecule metabolic...
      1.0
      1.0
      1.0
      0.444444
      2
      {PARK2, QK}
      10
      (2, 10)
      (2, 5)
      2
      {PARK2, QK}
      5
    
    
      23
      GO:0044712
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0044712\tlevel-03\tdepth-03\tsingle-organis...
      single-organism catabolic process
      1.0
      1.0
      1.0
      0.444444
      2
      {PARK2, NPL}
      10
      (2, 10)
      (2, 5)
      2
      {PARK2, NPL}
      5
    
    
      24
      GO:0050769
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0050769\tlevel-06\tdepth-07\tpositive regul...
      positive regulation of neurogenesis
      1.0
      1.0
      1.0
      0.444444
      2
      {PARK2, QK}
      10
      (2, 10)
      (2, 5)
      2
      {PARK2, QK}
      5
    
    
      25
      GO:0050793
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0050793\tlevel-03\tdepth-03\tregulation of ...
      regulation of developmental process
      1.0
      1.0
      1.0
      0.444444
      2
      {PARK2, QK}
      10
      (2, 10)
      (2, 5)
      2
      {PARK2, QK}
      5
    
    
      26
      GO:0051246
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0051246\tlevel-05\tdepth-05\tregulation of ...
      regulation of protein metabolic process
      1.0
      1.0
      1.0
      0.444444
      2
      {PARK2, QK}
      10
      (2, 10)
      (2, 5)
      2
      {PARK2, QK}
      5
    
    
      27
      GO:0045664
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0045664\tlevel-07\tdepth-07\tregulation of ...
      regulation of neuron differentiation
      1.0
      1.0
      1.0
      0.444444
      2
      {PARK2, QK}
      10
      (2, 10)
      (2, 5)
      2
      {PARK2, QK}
      5
    
    
      28
      GO:0006629
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0006629\tlevel-03\tdepth-03\tlipid metaboli...
      lipid metabolic process
      1.0
      1.0
      1.0
      0.444444
      2
      {AGPAT4, QK}
      10
      (2, 10)
      (2, 5)
      2
      {AGPAT4, QK}
      5
    
    
      29
      GO:0043066
      BP
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0043066\tlevel-07\tdepth-07\tnegative regul...
      negative regulation of apoptotic process
      1.0
      1.0
      1.0
      0.444444
      2
      {PARK2, QK}
      10
      (2, 10)
      (2, 5)
      2
      {PARK2, QK}
      5
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      974
      GO:0004871
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      p
      GO:0004871\tlevel-01\tdepth-01\tsignal transdu...
      signal transducer activity
      1.0
      1.0
      1.0
      1.000000
      1
      {IGF2R}
      10
      (1, 10)
      (0, 5)
      0
      {}
      5
    
    
      975
      GO:0003735
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      p
      GO:0003735\tlevel-02\tdepth-02\tstructural con...
      structural constituent of ribosome
      1.0
      1.0
      1.0
      1.000000
      1
      {MRPL18}
      10
      (1, 10)
      (0, 5)
      0
      {}
      5
    
    
      976
      GO:0019901
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0019901\tlevel-05\tdepth-05\tprotein kinase...
      protein kinase binding
      1.0
      1.0
      1.0
      1.000000
      1
      {PARK2}
      10
      (1, 10)
      (1, 5)
      1
      {PARK2}
      5
    
    
      977
      GO:0016879
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0016879\tlevel-03\tdepth-03\tligase activit...
      ligase activity, forming carbon-nitrogen bonds
      1.0
      1.0
      1.0
      1.000000
      1
      {PARK2}
      10
      (1, 10)
      (1, 5)
      1
      {PARK2}
      5
    
    
      978
      GO:0019900
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0019900\tlevel-04\tdepth-04\tkinase binding...
      kinase binding
      1.0
      1.0
      1.0
      1.000000
      1
      {PARK2}
      10
      (1, 10)
      (1, 5)
      1
      {PARK2}
      5
    
    
      979
      GO:0004521
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      p
      GO:0004521\tlevel-06\tdepth-06\tendoribonuclea...
      endoribonuclease activity
      1.0
      1.0
      1.0
      1.000000
      1
      {RNASEL}
      10
      (1, 10)
      (0, 5)
      0
      {}
      5
    
    
      980
      GO:0016881
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0016881\tlevel-04\tdepth-04\tacid-amino aci...
      acid-amino acid ligase activity
      1.0
      1.0
      1.0
      1.000000
      1
      {PARK2}
      10
      (1, 10)
      (1, 5)
      1
      {PARK2}
      5
    
    
      981
      GO:0005520
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      p
      GO:0005520\tlevel-04\tdepth-04\tinsulin-like g...
      insulin-like growth factor binding
      1.0
      1.0
      1.0
      1.000000
      1
      {IGF2R}
      10
      (1, 10)
      (0, 5)
      0
      {}
      5
    
    
      982
      GO:0042826
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0042826\tlevel-04\tdepth-04\thistone deacet...
      histone deacetylase binding
      1.0
      1.0
      1.0
      1.000000
      1
      {PARK2}
      10
      (1, 10)
      (1, 5)
      1
      {PARK2}
      5
    
    
      983
      GO:0032296
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      p
      GO:0032296\tlevel-06\tdepth-06\tdouble-strande...
      double-stranded RNA-specific ribonuclease acti...
      1.0
      1.0
      1.0
      1.000000
      1
      {RNASEL}
      10
      (1, 10)
      (0, 5)
      0
      {}
      5
    
    
      984
      GO:0001664
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0001664\tlevel-04\tdepth-04\tG-protein coup...
      G-protein coupled receptor binding
      1.0
      1.0
      1.0
      1.000000
      1
      {PARK2}
      10
      (1, 10)
      (1, 5)
      1
      {PARK2}
      5
    
    
      985
      GO:0099600
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      p
      GO:0099600\tlevel-03\tdepth-03\ttransmembrane ...
      transmembrane receptor activity
      1.0
      1.0
      1.0
      1.000000
      1
      {IGF2R}
      10
      (1, 10)
      (0, 5)
      0
      {}
      5
    
    
      986
      GO:0004672
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      p
      GO:0004672\tlevel-05\tdepth-05\tprotein kinase...
      protein kinase activity
      1.0
      1.0
      1.0
      1.000000
      1
      {RNASEL}
      10
      (1, 10)
      (0, 5)
      0
      {}
      5
    
    
      987
      GO:0018169
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0018169\tlevel-06\tdepth-06\tribosomal S6-g...
      ribosomal S6-glutamic acid ligase activity
      1.0
      1.0
      1.0
      1.000000
      1
      {PARK2}
      10
      (1, 10)
      (1, 5)
      1
      {PARK2}
      5
    
    
      988
      GO:0016893
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      p
      GO:0016893\tlevel-06\tdepth-06\tendonuclease a...
      endonuclease activity, active with either ribo...
      1.0
      1.0
      1.0
      1.000000
      1
      {RNASEL}
      10
      (1, 10)
      (0, 5)
      0
      {}
      5
    
    
      989
      GO:0031406
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      p
      GO:0031406\tlevel-04\tdepth-04\tcarboxylic aci...
      carboxylic acid binding
      1.0
      1.0
      1.0
      1.000000
      1
      {IGF2R}
      10
      (1, 10)
      (0, 5)
      0
      {}
      5
    
    
      990
      GO:0004842
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0004842\tlevel-04\tdepth-04\tubiquitin-prot...
      ubiquitin-protein transferase activity
      1.0
      1.0
      1.0
      1.000000
      1
      {PARK2}
      10
      (1, 10)
      (1, 5)
      1
      {PARK2}
      5
    
    
      991
      GO:0003729
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0003729\tlevel-06\tdepth-06\tmRNA binding [...
      mRNA binding
      1.0
      1.0
      1.0
      1.000000
      1
      {QK}
      10
      (1, 10)
      (1, 5)
      1
      {QK}
      5
    
    
      992
      GO:0019838
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      p
      GO:0019838\tlevel-03\tdepth-03\tgrowth factor ...
      growth factor binding
      1.0
      1.0
      1.0
      1.000000
      1
      {IGF2R}
      10
      (1, 10)
      (0, 5)
      0
      {}
      5
    
    
      993
      GO:0016408
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      p
      GO:0016408\tlevel-05\tdepth-05\tC-acyltransfer...
      C-acyltransferase activity
      1.0
      1.0
      1.0
      1.000000
      1
      {ACAT2}
      10
      (1, 10)
      (0, 5)
      0
      {}
      5
    
    
      994
      GO:0003690
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0003690\tlevel-05\tdepth-05\tdouble-strande...
      double-stranded DNA binding
      1.0
      1.0
      1.0
      1.000000
      1
      {PARK2}
      10
      (1, 10)
      (1, 5)
      1
      {PARK2}
      5
    
    
      995
      GO:0008747
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0008747\tlevel-05\tdepth-05\tN-acetylneuram...
      N-acetylneuraminate lyase activity
      1.0
      1.0
      1.0
      1.000000
      1
      {NPL}
      10
      (1, 10)
      (1, 5)
      1
      {NPL}
      5
    
    
      996
      GO:0016874
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0016874\tlevel-02\tdepth-02\tligase activit...
      ligase activity
      1.0
      1.0
      1.0
      1.000000
      1
      {PARK2}
      10
      (1, 10)
      (1, 5)
      1
      {PARK2}
      5
    
    
      997
      GO:0033293
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      p
      GO:0033293\tlevel-05\tdepth-05\tmonocarboxylic...
      monocarboxylic acid binding
      1.0
      1.0
      1.0
      1.000000
      1
      {IGF2R}
      10
      (1, 10)
      (0, 5)
      0
      {}
      5
    
    
      998
      GO:0070739
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0070739\tlevel-05\tdepth-05\tprotein-glutam...
      protein-glutamic acid ligase activity
      1.0
      1.0
      1.0
      1.000000
      1
      {PARK2}
      10
      (1, 10)
      (1, 5)
      1
      {PARK2}
      5
    
    
      999
      GO:0016411
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0016411\tlevel-06\tdepth-06\tacylglycerol O...
      acylglycerol O-acyltransferase activity
      1.0
      1.0
      1.0
      1.000000
      1
      {AGPAT4}
      10
      (1, 10)
      (1, 5)
      1
      {AGPAT4}
      5
    
    
      1000
      GO:0003677
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0003677\tlevel-04\tdepth-04\tDNA binding [m...
      DNA binding
      1.0
      1.0
      1.0
      1.000000
      1
      {PARK2}
      10
      (1, 10)
      (1, 5)
      1
      {PARK2}
      5
    
    
      1001
      GO:0097602
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      e
      GO:0097602\tlevel-03\tdepth-03\tcullin family ...
      cullin family protein binding
      1.0
      1.0
      1.0
      1.000000
      1
      {PARK2}
      10
      (1, 10)
      (1, 5)
      1
      {PARK2}
      5
    
    
      1002
      GO:0016453
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      p
      GO:0016453\tlevel-06\tdepth-06\tC-acetyltransf...
      C-acetyltransferase activity
      1.0
      1.0
      1.0
      1.000000
      1
      {ACAT2}
      10
      (1, 10)
      (0, 5)
      0
      {}
      5
    
    
      1003
      GO:0043177
      MF
      [(local, bonferroni, bonferroni), (local, sida...
      p
      GO:0043177\tlevel-03\tdepth-03\torganic acid b...
      organic acid binding
      1.0
      1.0
      1.0
      1.000000
      1
      {IGF2R}
      10
      (1, 10)
      (0, 5)
      0
      {}
      5
    
  

1004 rows × 18 columns



In [6]:

    
from sklearn.svm import SVC



In [ ]:



In [10]:

    
classifier = SVC(kernel='linear')



In [11]:

    
def plot_svc_decision_function(clf, ax=None):
    """Plot the decision function for a 2D SVC"""
    if ax is None:
        ax = plt.gca()
    x = np.linspace(plt.xlim()[0], plt.xlim()[1], 30)
    y = np.linspace(plt.ylim()[0], plt.ylim()[1], 30)
    Y, X = np.meshgrid(y, x)
    P = np.zeros_like(X)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            P[i, j] = clf.decision_function([[xi, yj]])
    # plot the margins
    ax.contour(X, Y, P, colors='k',
               levels=[-1, 0, 1], alpha=0.5,
               linestyles=['--', '-', '--'])



In [12]:

    
shalek2013_singles = shalek2013_expression.loc[[x for x in shalek2013_expression.index if x.startswith('S')]]
shalek2013_singles.shape









    Out[12]:





(18, 6312)



In [13]:

    
shalek2013_singles = shalek2013_singles.loc[:, (shalek2013_singles > 1).sum() >= 3]
shalek2013_singles.shape









    Out[13]:





(18, 6013)



In [14]:

    
mature = 'S12', 'S13', 'S16'
target = [1 if x in mature else 0 for x in shalek2013_singles.index]
target









    Out[14]:





[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0]



In [15]:

    
classifier.fit(shalek2013_singles, target)









    Out[15]:





SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='linear',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)



In [16]:

    
from sklearn.decomposition import PCA, FastICA
from sklearn.manifold import TSNE, MDS


smusher = FastICA(n_components=4)
reduced_data = smusher.fit_transform(shalek2013_singles+1)









    



/Users/olga/anaconda3/envs/single-cell-bioinformatics/lib/python3.5/site-packages/sklearn/decomposition/fastica_.py:116: UserWarning: FastICA did not converge. Consider increasing tolerance or the maximum number of iterations.
  warnings.warn('FastICA did not converge. Consider increasing '



In [17]:

    
reduced_data_df = pd.DataFrame(reduced_data)



In [18]:

    
reduced_data_df.min()









    Out[18]:





0   -0.556400
1   -0.643007
2   -0.299692
3   -0.382223
dtype: float64



In [19]:

    
reduced_data_df.max()









    Out[19]:





0    0.212303
1    0.387360
2    0.864751
3    0.332152
dtype: float64



In [20]:

    
reduced_data_df.quantile(1)









    Out[20]:





0    0.212303
1    0.387360
2    0.864751
3    0.332152
dtype: float64



In [21]:

    
low_d_space = pd.DataFrame(reduced_data).apply(lambda x: pd.Series(np.linspace(x.min(), x.max(), 50)))



In [22]:

    
x_min, x_max = reduced_data[:, 0].min() - 1, reduced_data[:, 0].max() + 1
y_min, y_max = reduced_data[:, 1].min() - 1, reduced_data[:, 1].max() + 1
X = np.linspace(x_min, x_max, 50)
Y = np.linspace(y_min, y_max, 50)
xx, yy = np.meshgrid(X, Y)

# Get the decision boundary
# Z = classifier.decision_function(np.c_[xx.ravel(), yy.ravel()])

# two_d_space = np.c_[xx.ravel(), yy.ravel()]
# two_d_space
low_d_space = np.concatenate([])









    



---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-22-4abe8f52bb01> in <module>()
     10 # two_d_space = np.c_[xx.ravel(), yy.ravel()]
     11 # two_d_space
---> 12 low_d_space = np.concatenate([])

ValueError: need at least one array to concatenate



In [ ]:

    
xx.shape



In [ ]:

    
np.concatenate([xx.ravel(), yy.ravel()]).reshape(50**2, 2)



In [ ]:

    
two_d_space.shape



In [ ]:

    
plt.scatter(two_d_space[:, 0], two_d_space[:, 1], color='black')



In [ ]:

    
import goatools



In [ ]:

    
shalek2013_expression.index[:10]



In [31]:

    
# Get http://geneontology.org/ontology/go-basic.obo
from goatools.base import download_go_basic_obo
obo_fname = download_go_basic_obo()
obo_fname









    



  EXISTS: go-basic.obo






    Out[31]:





'go-basic.obo'



In [ ]:

    
# Get ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz
from goatools.base import download_ncbi_associations
gene2go = download_ncbi_associations()



In [32]:

    
obo_dag = goatools.obo_parser.GODag(obo_file=obo_fname)

import goatools

# cl = gene_annotation.sort(col, ascending=False)[gene_annotation[col] > 5e-4].index
g = goatools.GOEnrichmentStudy(list(gene_annotation.index), assoc_dict, obo_dag, study=list(cl))









    



load obo file go-basic.obo






    



go-basic.obo: format-version(1.2) data-version(releases/2016-06-15)






    



46823 nodes imported






    



---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-32-183177986a6c> in <module>()
      4 
      5 # cl = gene_annotation.sort(col, ascending=False)[gene_annotation[col] > 5e-4].index
----> 6 g = goatools.GOEnrichmentStudy(list(gene_annotation.index), assoc_dict, obo_dag, study=list(cl))

NameError: name 'gene_annotation' is not defined



In [ ]:



In [3]:

    
# assoc = pd.read_table('danio-rerio-gene-ontology.txt').dropna()
# assoc_df = assoc.groupby('Ensembl Gene ID').agg(lambda s: ';'.join(s))
# assoc_s = assoc_df['GO Term Accession'].apply(lambda s: set(s.split(';')))
# assoc_dict = assoc_s.to_dict()









    



load obo file ../data/macaulay2016/go-basic.obo






    



---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-3-ec6a9e36b6e5> in <module>()
----> 1 obo_dag = goatools.obo_parser.GODag(obo_file='../data/macaulay2016/go-basic.obo')
      2 
      3 # assoc = pd.read_table('danio-rerio-gene-ontology.txt').dropna()
      4 # assoc_df = assoc.groupby('Ensembl Gene ID').agg(lambda s: ';'.join(s))
      5 # assoc_s = assoc_df['GO Term Accession'].apply(lambda s: set(s.split(';')))

/Users/olga/anaconda3/envs/single-cell-bioinformatics/lib/python3.5/site-packages/goatools/obo_parser.py in __init__(self, obo_file, optional_attrs)
    386 
    387     def __init__(self, obo_file="go-basic.obo", optional_attrs=None):
--> 388         self.load_obo_file(obo_file, optional_attrs)
    389 
    390     def load_obo_file(self, obo_file, optional_attrs):

/Users/olga/anaconda3/envs/single-cell-bioinformatics/lib/python3.5/site-packages/goatools/obo_parser.py in load_obo_file(self, obo_file, optional_attrs)
    391 
    392         print("load obo file %s" % obo_file, file=sys.stderr)
--> 393         reader = OBOReader(obo_file, optional_attrs)
    394         for rec in reader:
    395             self[rec.id] = rec

/Users/olga/anaconda3/envs/single-cell-bioinformatics/lib/python3.5/site-packages/goatools/obo_parser.py in __init__(self, obo_file, optional_attrs)
     40             # GOTerm attributes that are necessary for any operations:
     41         else:
---> 42             raise Exception("download obo file first\n "
     43                             "[http://geneontology.org/ontology/"
     44                             "go-basic.obo]")

Exception: download obo file first
 [http://geneontology.org/ontology/go-basic.obo]



In [19]:

    
import goatools

# cl = gene_annotation.sort(col, ascending=False)[gene_annotation[col] > 5e-4].index
g = goatools.GOEnrichmentStudy(list(gene_annotation.index), assoc_dict, obo_dag, study=list(cl))



In [6]:









    



/Users/olga/anaconda3/envs/single-cell-bioinformatics/lib/python3.5/site-packages/matplotlib/__init__.py:872: UserWarning: axes.color_cycle is deprecated and replaced with axes.prop_cycle; please use the latter.
  warnings.warn(self.msg_depr % (key, alt_key))



In [ ]:

    
for r in g.results[:25]:
    print r.goterm.id, '{:.2}'.format(r.p_bonferroni), r.ratio_in_study, r.goterm.name, r.goterm.namespace



In [21]:

    
unsmushed = smusher.inverse_transform(two_d_space)









    



---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-21-6cebabe76618> in <module>()
----> 1 unsmushed = smusher.inverse_transform(two_d_space)

/Users/olga/anaconda3/envs/single-cell-bioinformatics/lib/python3.5/site-packages/sklearn/decomposition/fastica_.py in inverse_transform(self, X, copy)
    566 
    567         X = check_array(X, copy=(copy and self.whiten), dtype=FLOAT_DTYPES)
--> 568         X = fast_dot(X, self.mixing_.T)
    569         if self.whiten:
    570             X += self.mean_

ValueError: shapes (2500,2) and (4,6013) not aligned: 2 (dim 1) != 4 (dim 0)



In [57]:

    
Z = classifier.decision_function(unsmushed)
Z = Z.reshape(xx.shape)









    



---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-57-2bcd1d6f9dd1> in <module>()
----> 1 Z = classifier.decision_function(unsmushed)
      2 Z = Z.reshape(xx.shape)

NameError: name 'unsmushed' is not defined



In [ ]:

    
fig, ax = plt.subplots()
ax.scatter(reduced_data[:, 0], reduced_data[:, 1], c=target, cmap='Dark2')
ax.contour(X, Y, Z, colors='k',
           levels=[-1, 0, 1], alpha=0.5,
           linestyles=['--', '-', '--'])

Exercise 1

Try the same analysis, but use ICA instead of PCA.
1. How are the results different?
2. Are the cells closer or farther from the decision boundary?
3. Is that a "better" or "worse" classification? Why?
4. Why does the reduction algorithm affect the visualization of the classification?
Try the same analysis, but use MDS or t-SNE instead of PCA.
1. How are the results different?
2. Are the cells closer or farther from the decision boundary?
3. Is that a "better" or "worse" classification? Why?
Try the same analysis, but use the "LPS Response" genes and a dimensionality reduction algorithm of your choice.
1. How are the results different?
2. Are the cells closer or farther from the decision boundary?
3. Is that a "better" or "worse" classification? Why?

Decision trees



In [ ]:

    
def visualize_tree(estimator, X, y, smusher, boundaries=True,
                   xlim=None, ylim=None):
    estimator.fit(X, y)
    smushed = smusher.fit_transform(X)

    if xlim is None:
        xlim = (smushed[:, 0].min() - 0.1, smushed[:, 0].max() + 0.1)
    if ylim is None:
        ylim = (smushed[:, 1].min() - 0.1, smushed[:, 1].max() + 0.1)

    x_min, x_max = xlim
    y_min, y_max = ylim
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                         np.linspace(y_min, y_max, 100))
    two_d_space = np.c_[xx.ravel(), yy.ravel()]
    unsmushed = smusher.inverse_transform(two_d_space)
    Z = estimator.predict(unsmushed)

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.figure()
    plt.pcolormesh(xx, yy, Z, alpha=0.2, cmap='Paired')
    plt.clim(y.min(), y.max())

    # Plot also the training points
    plt.scatter(smushed[:, 0], smushed[:, 1], c=y, s=50, cmap='Paired')
    plt.axis('off')

    plt.xlim(x_min, x_max)
    plt.ylim(y_min, y_max)        
    plt.clim(y.min(), y.max())
    
    # Plot the decision boundaries
    def plot_boundaries(i, xlim, ylim):
        if i < 0:
            return

        tree = estimator.tree_
        
        if tree.feature[i] == 0:
            plt.plot([tree.threshold[i], tree.threshold[i]], ylim, '-k')
            plot_boundaries(tree.children_left[i],
                            [xlim[0], tree.threshold[i]], ylim)
            plot_boundaries(tree.children_right[i],
                            [tree.threshold[i], xlim[1]], ylim)
        
        elif tree.feature[i] == 1:
            plt.plot(xlim, [tree.threshold[i], tree.threshold[i]], '-k')
            plot_boundaries(tree.children_left[i], xlim,
                            [ylim[0], tree.threshold[i]])
            plot_boundaries(tree.children_right[i], xlim,
                            [tree.threshold[i], ylim[1]])
            
    if boundaries:
        plot_boundaries(0, plt.xlim(), plt.ylim())



In [ ]:

    
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier()


from sklearn.decomposition import PCA, FastICA
from sklearn.manifold import TSNE, MDS


smusher = PCA(n_components=2)
# reduced_data = smusher.fit_transform(shalek2013_singles+1)

visualize_tree(classifier, shalek2013_singles, np.array(target), smusher)



In [ ]:

    
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
classifier = RandomForestClassifier()

smusher = PCA(n_components=2)
# reduced_data = smusher.fit_transform(shalek2013_singles+1)

visualize_tree(classifier, shalek2013_singles, np.array(target), smusher, boundaries=False)



In [ ]:

    
classifier = ExtraTreesClassifier()

smusher = PCA(n_components=2)
# reduced_data = smusher.fit_transform(shalek2013_singles+1)

visualize_tree(classifier, shalek2013_singles, np.array(target), smusher, boundaries=False)

Macaulay2016



In [ ]:

    
pd.options.display.max_columns = 50



In [ ]:

    
macaulay2016_metadata = pd.read_csv('../4._Case_Study/macaulay2016/sample_info_qc.csv', index_col=0)
macaulay2016_metadata.head()



In [ ]:

    
macaulay2016_cluster_names = tuple(sorted(macaulay2016_metadata['cluster'].unique()))
macaulay2016_cluster_names



In [ ]:

    
macaulay2016_target = macaulay2016_metadata['cluster'].map(lambda x: macaulay2016_cluster_names.index(x))
macaulay2016_target



In [ ]:

    
macaulay2016_expression = pd.read_csv('../4._Case_Study/macaulay2016/gene_expression_s.csv', index_col=0).T



In [ ]:

    
macaulay2016_expression.head()



In [ ]:

    
macaulay2016_expression_filtered = macaulay2016_expression[[x for x in macaulay2016_expression if x.startswith("ENS")]]
macaulay2016_expression_filtered.shape



In [ ]:

    
macaulay2016_expression_filtered = macaulay2016_expression_filtered.loc[macaulay2016_metadata.index]



In [ ]:

    
macaulay2016_expression_filtered = 1e6*macaulay2016_expression_filtered.divide(macaulay2016_expression_filtered.sum(axis=1), axis=0)
macaulay2016_expression_filtered.head()



In [ ]:

    
macaulay2016_expression_filtered = np.log10(macaulay2016_expression_filtered+1)
macaulay2016_expression_filtered.head()



In [ ]:

    
macaulay2016_expression_filtered = macaulay2016_expression_filtered.loc[:, (macaulay2016_expression_filtered > 1).sum() >=3]
macaulay2016_expression_filtered.shape



In [ ]:

    
# classifier = SVC(kernel='linear')
# classifier = DecisionTreeClassifier(max_depth=10)
classifier = ExtraTreesClassifier(n_estimators=1000)
classifier.fit(macaulay2016_expression_filtered, macaulay2016_target)



In [ ]:

    
smusher = FastICA(n_components=2, random_state=0)
smushed_data = smusher.fit_transform(macaulay2016_expression_filtered)

x_min, x_max = smushed_data[:, 0].min(), smushed_data[:, 0].max()
y_min, y_max = smushed_data[:, 1].min(), smushed_data[:, 1].max()
delta_x = 0.05 * abs(x_max - x_min)
delta_y = 0.05 * abs(x_max - x_min)

x_min -= delta_x
x_max += delta_x
y_min -= delta_y
y_max += delta_y

X = np.linspace(x_min, x_max, 100)
Y = np.linspace(y_min, y_max, 100)
xx, yy = np.meshgrid(X, Y)

two_d_space = np.c_[xx.ravel(), yy.ravel()]
two_d_space



In [ ]:

    
high_dimensional_space = smusher.inverse_transform(two_d_space)



In [ ]:

    
# Get the class boundaries
Z = classifier.predict(high_dimensional_space)



In [ ]:

    
import matplotlib as mpl
macaulay2016_metadata['cluster_color_hex'] = macaulay2016_metadata['cluster_color'].map(lambda x: mpl.colors.rgb2hex(eval(x)))



In [ ]:

    
int_to_cluster_name = dict(zip(range(len(macaulay2016_cluster_names)), macaulay2016_cluster_names))
int_to_cluster_name



In [ ]:

    
cluster_name_to_color = dict(zip(macaulay2016_metadata['cluster'], macaulay2016_metadata['cluster_color_hex']))
cluster_name_to_color



In [ ]:

    
macaulay2016_palette = [mpl.colors.hex2color(cluster_name_to_color[int_to_cluster_name[i]]) 
                        for i in range(len(macaulay2016_cluster_names))]
macaulay2016_palette



In [ ]:

    
cmap = mpl.colors.ListedColormap(macaulay2016_palette)
cmap



In [ ]:

    
x_min, x_max



In [ ]:

    
y = macaulay2016_target
# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.figure()
plt.pcolormesh(xx, yy, Z, alpha=0.2, cmap=cmap)
plt.clim(y.min(), y.max())

# Plot also the training points
plt.scatter(smushed_data[:, 0], smushed_data[:, 1], s=50, color=macaulay2016_metadata['cluster_color_hex'], 
            edgecolor='k') #c=macaulay2016_target, s=50, cmap='Set2')
plt.axis('off')

plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)        
plt.clim(y.min(), y.max())



In [ ]:

    
smusher = FastICA(n_components=4, random_state=354)
smushed_data = pd.DataFrame(smusher.fit_transform(macaulay2016_expression_filtered))

# x_min, x_max = smushed_data[:, 0].min(), smushed_data[:, 0].max()
# y_min, y_max = smushed_data[:, 1].min(), smushed_data[:, 1].max()
# delta_x = 0.05 * abs(x_max - x_min)
# delta_y = 0.05 * abs(x_max - x_min)

# x_min -= delta_x
# x_max += delta_x
# y_min -= delta_y
# y_max += delta_y

# X = np.linspace(x_min, x_max, 100)
# Y = np.linspace(y_min, y_max, 100)
# xx, yy = np.meshgrid(X, Y)

# low_dimensional_space = np.c_[xx.ravel(), yy.ravel()]
# low_dimensional_space



In [ ]:

    
smushed_data.max() - smushed_data.min()



In [ ]:

    
grid = smushed_data.apply(lambda x: pd.Series(np.linspace(x.min(), x.max(), 50)))
grid.head()
# grid = [x.ravel() for x in grid]
# grid
# low_dimensional_space = np.concatenate(grid, axis=0)
# low_dimensional_space.shape
# # low_dimensional_space = low_dimensional_space.reshape(shape)



In [ ]:

    
x1, x2, x3, x4 = np.meshgrid(*[grid[col] for col in grid])
low_dimensional_space = np.c_[x1.ravel(), x2.ravel(), x3.ravel(), x4.ravel()]



In [ ]:

    
high_dimensional_space = smusher.inverse_transform(low_dimensional_space)



In [ ]:

    
smushed_data['hue'] = macau



In [ ]:

    
sns.pairplot(smushed_data)

	GO	NS	_methods	enrichment	goterm	name	p_bonferroni	p_holm	p_sidak	p_uncorrected	pop_count	pop_items	pop_n	ratio_in_pop	ratio_in_study	study_count	study_items	study_n
0	GO:0044710	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0044710\tlevel-02\tdepth-02\tsingle-organis...	single-organism metabolic process	1.0	1.0	1.0	0.047619	4	{PARK2, AGPAT4, NPL, QK}	10	(4, 10)	(4, 5)	4	{PARK2, AGPAT4, NPL, QK}	5
1	GO:0045597	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0045597\tlevel-05\tdepth-05\tpositive regul...	positive regulation of cell differentiation	1.0	1.0	1.0	0.444444	2	{PARK2, QK}	10	(2, 10)	(2, 5)	2	{PARK2, QK}	5
2	GO:0044707	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0044707\tlevel-02\tdepth-02\tsingle-multice...	single-multicellular organism process	1.0	1.0	1.0	0.444444	2	{PARK2, QK}	10	(2, 10)	(2, 5)	2	{PARK2, QK}	5
3	GO:0009890	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0009890\tlevel-05\tdepth-05\tnegative regul...	negative regulation of biosynthetic process	1.0	1.0	1.0	0.444444	2	{PARK2, QK}	10	(2, 10)	(2, 5)	2	{PARK2, QK}	5
4	GO:0031344	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0031344\tlevel-05\tdepth-05\tregulation of ...	regulation of cell projection organization	1.0	1.0	1.0	0.444444	2	{PARK2, QK}	10	(2, 10)	(2, 5)	2	{PARK2, QK}	5
5	GO:0044255	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0044255\tlevel-03\tdepth-04\tcellular lipid...	cellular lipid metabolic process	1.0	1.0	1.0	0.444444	2	{AGPAT4, QK}	10	(2, 10)	(2, 5)	2	{AGPAT4, QK}	5
6	GO:0009892	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0009892\tlevel-04\tdepth-04\tnegative regul...	negative regulation of metabolic process	1.0	1.0	1.0	0.444444	2	{PARK2, QK}	10	(2, 10)	(2, 5)	2	{PARK2, QK}	5
7	GO:0006082	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0006082\tlevel-03\tdepth-04\torganic acid m...	organic acid metabolic process	1.0	1.0	1.0	0.444444	2	{NPL, QK}	10	(2, 10)	(2, 5)	2	{NPL, QK}	5
8	GO:0010976	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0010976\tlevel-07\tdepth-09\tpositive regul...	positive regulation of neuron projection devel...	1.0	1.0	1.0	0.444444	2	{PARK2, QK}	10	(2, 10)	(2, 5)	2	{PARK2, QK}	5
9	GO:0044248	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0044248\tlevel-03\tdepth-03\tcellular catab...	cellular catabolic process	1.0	1.0	1.0	0.444444	2	{PARK2, NPL}	10	(2, 10)	(2, 5)	2	{PARK2, NPL}	5
10	GO:0051172	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0051172\tlevel-05\tdepth-05\tnegative regul...	negative regulation of nitrogen compound metab...	1.0	1.0	1.0	0.444444	2	{PARK2, QK}	10	(2, 10)	(2, 5)	2	{PARK2, QK}	5
11	GO:0050767	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0050767\tlevel-06\tdepth-06\tregulation of ...	regulation of neurogenesis	1.0	1.0	1.0	0.444444	2	{PARK2, QK}	10	(2, 10)	(2, 5)	2	{PARK2, QK}	5
12	GO:0051094	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0051094\tlevel-04\tdepth-04\tpositive regul...	positive regulation of developmental process	1.0	1.0	1.0	0.444444	2	{PARK2, QK}	10	(2, 10)	(2, 5)	2	{PARK2, QK}	5
13	GO:2000113	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:2000113\tlevel-07\tdepth-07\tnegative regul...	negative regulation of cellular macromolecule ...	1.0	1.0	1.0	0.444444	2	{PARK2, QK}	10	(2, 10)	(2, 5)	2	{PARK2, QK}	5
14	GO:0051960	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0051960\tlevel-05\tdepth-05\tregulation of ...	regulation of nervous system development	1.0	1.0	1.0	0.444444	2	{PARK2, QK}	10	(2, 10)	(2, 5)	2	{PARK2, QK}	5
15	GO:0044281	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0044281\tlevel-03\tdepth-03\tsmall molecule...	small molecule metabolic process	1.0	1.0	1.0	0.444444	2	{NPL, QK}	10	(2, 10)	(2, 5)	2	{NPL, QK}	5
16	GO:0051248	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0051248\tlevel-06\tdepth-06\tnegative regul...	negative regulation of protein metabolic process	1.0	1.0	1.0	0.444444	2	{PARK2, QK}	10	(2, 10)	(2, 5)	2	{PARK2, QK}	5
17	GO:0010629	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0010629\tlevel-06\tdepth-06\tnegative regul...	negative regulation of gene expression	1.0	1.0	1.0	0.444444	2	{PARK2, QK}	10	(2, 10)	(2, 5)	2	{PARK2, QK}	5
18	GO:0043436	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0043436\tlevel-04\tdepth-05\toxoacid metabo...	oxoacid metabolic process	1.0	1.0	1.0	0.444444	2	{NPL, QK}	10	(2, 10)	(2, 5)	2	{NPL, QK}	5
19	GO:0032269	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0032269\tlevel-06\tdepth-07\tnegative regul...	negative regulation of cellular protein metabo...	1.0	1.0	1.0	0.444444	2	{PARK2, QK}	10	(2, 10)	(2, 5)	2	{PARK2, QK}	5
20	GO:0032501	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0032501\tlevel-01\tdepth-01\tmulticellular ...	multicellular organismal process	1.0	1.0	1.0	0.444444	2	{PARK2, QK}	10	(2, 10)	(2, 5)	2	{PARK2, QK}	5
21	GO:1901575	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:1901575\tlevel-03\tdepth-03\torganic substa...	organic substance catabolic process	1.0	1.0	1.0	0.444444	2	{PARK2, NPL}	10	(2, 10)	(2, 5)	2	{PARK2, NPL}	5
22	GO:0010605	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0010605\tlevel-05\tdepth-05\tnegative regul...	negative regulation of macromolecule metabolic...	1.0	1.0	1.0	0.444444	2	{PARK2, QK}	10	(2, 10)	(2, 5)	2	{PARK2, QK}	5
23	GO:0044712	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0044712\tlevel-03\tdepth-03\tsingle-organis...	single-organism catabolic process	1.0	1.0	1.0	0.444444	2	{PARK2, NPL}	10	(2, 10)	(2, 5)	2	{PARK2, NPL}	5
24	GO:0050769	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0050769\tlevel-06\tdepth-07\tpositive regul...	positive regulation of neurogenesis	1.0	1.0	1.0	0.444444	2	{PARK2, QK}	10	(2, 10)	(2, 5)	2	{PARK2, QK}	5
25	GO:0050793	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0050793\tlevel-03\tdepth-03\tregulation of ...	regulation of developmental process	1.0	1.0	1.0	0.444444	2	{PARK2, QK}	10	(2, 10)	(2, 5)	2	{PARK2, QK}	5
26	GO:0051246	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0051246\tlevel-05\tdepth-05\tregulation of ...	regulation of protein metabolic process	1.0	1.0	1.0	0.444444	2	{PARK2, QK}	10	(2, 10)	(2, 5)	2	{PARK2, QK}	5
27	GO:0045664	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0045664\tlevel-07\tdepth-07\tregulation of ...	regulation of neuron differentiation	1.0	1.0	1.0	0.444444	2	{PARK2, QK}	10	(2, 10)	(2, 5)	2	{PARK2, QK}	5
28	GO:0006629	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0006629\tlevel-03\tdepth-03\tlipid metaboli...	lipid metabolic process	1.0	1.0	1.0	0.444444	2	{AGPAT4, QK}	10	(2, 10)	(2, 5)	2	{AGPAT4, QK}	5
29	GO:0043066	BP	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0043066\tlevel-07\tdepth-07\tnegative regul...	negative regulation of apoptotic process	1.0	1.0	1.0	0.444444	2	{PARK2, QK}	10	(2, 10)	(2, 5)	2	{PARK2, QK}	5
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
974	GO:0004871	MF	[(local, bonferroni, bonferroni), (local, sida...	p	GO:0004871\tlevel-01\tdepth-01\tsignal transdu...	signal transducer activity	1.0	1.0	1.0	1.000000	1	{IGF2R}	10	(1, 10)	(0, 5)	0	{}	5
975	GO:0003735	MF	[(local, bonferroni, bonferroni), (local, sida...	p	GO:0003735\tlevel-02\tdepth-02\tstructural con...	structural constituent of ribosome	1.0	1.0	1.0	1.000000	1	{MRPL18}	10	(1, 10)	(0, 5)	0	{}	5
976	GO:0019901	MF	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0019901\tlevel-05\tdepth-05\tprotein kinase...	protein kinase binding	1.0	1.0	1.0	1.000000	1	{PARK2}	10	(1, 10)	(1, 5)	1	{PARK2}	5
977	GO:0016879	MF	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0016879\tlevel-03\tdepth-03\tligase activit...	ligase activity, forming carbon-nitrogen bonds	1.0	1.0	1.0	1.000000	1	{PARK2}	10	(1, 10)	(1, 5)	1	{PARK2}	5
978	GO:0019900	MF	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0019900\tlevel-04\tdepth-04\tkinase binding...	kinase binding	1.0	1.0	1.0	1.000000	1	{PARK2}	10	(1, 10)	(1, 5)	1	{PARK2}	5
979	GO:0004521	MF	[(local, bonferroni, bonferroni), (local, sida...	p	GO:0004521\tlevel-06\tdepth-06\tendoribonuclea...	endoribonuclease activity	1.0	1.0	1.0	1.000000	1	{RNASEL}	10	(1, 10)	(0, 5)	0	{}	5
980	GO:0016881	MF	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0016881\tlevel-04\tdepth-04\tacid-amino aci...	acid-amino acid ligase activity	1.0	1.0	1.0	1.000000	1	{PARK2}	10	(1, 10)	(1, 5)	1	{PARK2}	5
981	GO:0005520	MF	[(local, bonferroni, bonferroni), (local, sida...	p	GO:0005520\tlevel-04\tdepth-04\tinsulin-like g...	insulin-like growth factor binding	1.0	1.0	1.0	1.000000	1	{IGF2R}	10	(1, 10)	(0, 5)	0	{}	5
982	GO:0042826	MF	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0042826\tlevel-04\tdepth-04\thistone deacet...	histone deacetylase binding	1.0	1.0	1.0	1.000000	1	{PARK2}	10	(1, 10)	(1, 5)	1	{PARK2}	5
983	GO:0032296	MF	[(local, bonferroni, bonferroni), (local, sida...	p	GO:0032296\tlevel-06\tdepth-06\tdouble-strande...	double-stranded RNA-specific ribonuclease acti...	1.0	1.0	1.0	1.000000	1	{RNASEL}	10	(1, 10)	(0, 5)	0	{}	5
984	GO:0001664	MF	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0001664\tlevel-04\tdepth-04\tG-protein coup...	G-protein coupled receptor binding	1.0	1.0	1.0	1.000000	1	{PARK2}	10	(1, 10)	(1, 5)	1	{PARK2}	5
985	GO:0099600	MF	[(local, bonferroni, bonferroni), (local, sida...	p	GO:0099600\tlevel-03\tdepth-03\ttransmembrane ...	transmembrane receptor activity	1.0	1.0	1.0	1.000000	1	{IGF2R}	10	(1, 10)	(0, 5)	0	{}	5
986	GO:0004672	MF	[(local, bonferroni, bonferroni), (local, sida...	p	GO:0004672\tlevel-05\tdepth-05\tprotein kinase...	protein kinase activity	1.0	1.0	1.0	1.000000	1	{RNASEL}	10	(1, 10)	(0, 5)	0	{}	5
987	GO:0018169	MF	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0018169\tlevel-06\tdepth-06\tribosomal S6-g...	ribosomal S6-glutamic acid ligase activity	1.0	1.0	1.0	1.000000	1	{PARK2}	10	(1, 10)	(1, 5)	1	{PARK2}	5
988	GO:0016893	MF	[(local, bonferroni, bonferroni), (local, sida...	p	GO:0016893\tlevel-06\tdepth-06\tendonuclease a...	endonuclease activity, active with either ribo...	1.0	1.0	1.0	1.000000	1	{RNASEL}	10	(1, 10)	(0, 5)	0	{}	5
989	GO:0031406	MF	[(local, bonferroni, bonferroni), (local, sida...	p	GO:0031406\tlevel-04\tdepth-04\tcarboxylic aci...	carboxylic acid binding	1.0	1.0	1.0	1.000000	1	{IGF2R}	10	(1, 10)	(0, 5)	0	{}	5
990	GO:0004842	MF	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0004842\tlevel-04\tdepth-04\tubiquitin-prot...	ubiquitin-protein transferase activity	1.0	1.0	1.0	1.000000	1	{PARK2}	10	(1, 10)	(1, 5)	1	{PARK2}	5
991	GO:0003729	MF	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0003729\tlevel-06\tdepth-06\tmRNA binding [...	mRNA binding	1.0	1.0	1.0	1.000000	1	{QK}	10	(1, 10)	(1, 5)	1	{QK}	5
992	GO:0019838	MF	[(local, bonferroni, bonferroni), (local, sida...	p	GO:0019838\tlevel-03\tdepth-03\tgrowth factor ...	growth factor binding	1.0	1.0	1.0	1.000000	1	{IGF2R}	10	(1, 10)	(0, 5)	0	{}	5
993	GO:0016408	MF	[(local, bonferroni, bonferroni), (local, sida...	p	GO:0016408\tlevel-05\tdepth-05\tC-acyltransfer...	C-acyltransferase activity	1.0	1.0	1.0	1.000000	1	{ACAT2}	10	(1, 10)	(0, 5)	0	{}	5
994	GO:0003690	MF	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0003690\tlevel-05\tdepth-05\tdouble-strande...	double-stranded DNA binding	1.0	1.0	1.0	1.000000	1	{PARK2}	10	(1, 10)	(1, 5)	1	{PARK2}	5
995	GO:0008747	MF	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0008747\tlevel-05\tdepth-05\tN-acetylneuram...	N-acetylneuraminate lyase activity	1.0	1.0	1.0	1.000000	1	{NPL}	10	(1, 10)	(1, 5)	1	{NPL}	5
996	GO:0016874	MF	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0016874\tlevel-02\tdepth-02\tligase activit...	ligase activity	1.0	1.0	1.0	1.000000	1	{PARK2}	10	(1, 10)	(1, 5)	1	{PARK2}	5
997	GO:0033293	MF	[(local, bonferroni, bonferroni), (local, sida...	p	GO:0033293\tlevel-05\tdepth-05\tmonocarboxylic...	monocarboxylic acid binding	1.0	1.0	1.0	1.000000	1	{IGF2R}	10	(1, 10)	(0, 5)	0	{}	5
998	GO:0070739	MF	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0070739\tlevel-05\tdepth-05\tprotein-glutam...	protein-glutamic acid ligase activity	1.0	1.0	1.0	1.000000	1	{PARK2}	10	(1, 10)	(1, 5)	1	{PARK2}	5
999	GO:0016411	MF	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0016411\tlevel-06\tdepth-06\tacylglycerol O...	acylglycerol O-acyltransferase activity	1.0	1.0	1.0	1.000000	1	{AGPAT4}	10	(1, 10)	(1, 5)	1	{AGPAT4}	5
1000	GO:0003677	MF	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0003677\tlevel-04\tdepth-04\tDNA binding [m...	DNA binding	1.0	1.0	1.0	1.000000	1	{PARK2}	10	(1, 10)	(1, 5)	1	{PARK2}	5
1001	GO:0097602	MF	[(local, bonferroni, bonferroni), (local, sida...	e	GO:0097602\tlevel-03\tdepth-03\tcullin family ...	cullin family protein binding	1.0	1.0	1.0	1.000000	1	{PARK2}	10	(1, 10)	(1, 5)	1	{PARK2}	5
1002	GO:0016453	MF	[(local, bonferroni, bonferroni), (local, sida...	p	GO:0016453\tlevel-06\tdepth-06\tC-acetyltransf...	C-acetyltransferase activity	1.0	1.0	1.0	1.000000	1	{ACAT2}	10	(1, 10)	(0, 5)	0	{}	5
1003	GO:0043177	MF	[(local, bonferroni, bonferroni), (local, sida...	p	GO:0043177\tlevel-03\tdepth-03\torganic acid b...	organic acid binding	1.0	1.0	1.0	1.000000	1	{IGF2R}	10	(1, 10)	(0, 5)	0	{}	5