Note: if this notebook takes too long to run, consider using the HierarchicalClustering module (v7.3.4), available on gp-beta-ami.genepattern.org.

A note on some of the parameters we are using:


In [7]:
import genepattern
# import cuzcatlan as cusca
# import pandas as pd
from cuzcatlan import hc_genes

# Wrap hc_genes in a GenePattern notebook UI form. Each entry in "choices"
# maps the label shown in the form to the value passed as distance_metric.
genepattern.GPUIBuilder(
    hc_genes,
    name="Hierarchical Clustering of Genes (Rows).",
    description="This function performs hierarchical clustering to group genes (rows) with similar expression profiles.",
    parameters={
        "distance_metric": {
            "default": "pearson",
            "choices": {
                "Information Coefficient": "information_coefficient",
                "City Block (Manhattan or L1-norm)": "manhattan",
                "Euclidean (L2-norm)": "euclidean",
                "Pearson Correlation": "pearson",
                "Uncentered Pearson Correlation": "uncentered_pearson",
                "Uncentered Pearson Correlation, absolute value": "absolute_uncentered_pearson",
                "Spearman Correlation": "spearman",
                "Kendall's Tau": "kendall",
                "Cosine distance": "cosine",
            },
        },
    },
)
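
To make the `distance_metric` choices more concrete, here is a minimal pure-Python sketch of four of them (Pearson, Euclidean, Manhattan, cosine) applied to two toy expression profiles. The function names are hypothetical and are not part of cuzcatlan. Converting a correlation `r` into a distance as `1 - r` is a common convention that we assume here; the library's own implementation may differ.

```python
import math


def pearson_distance(x, y):
    """Assumed convention: distance = 1 - Pearson correlation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (norm_x * norm_y)


def euclidean_distance(x, y):
    """L2 norm of the difference between the two profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan_distance(x, y):
    """City block (L1 norm) distance."""
    return sum(abs(a - b) for a, b in zip(x, y))


def cosine_distance(x, y):
    """1 - cosine of the angle between the two profiles (no mean centering)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


# Toy profiles (made-up numbers): gene_b is gene_a scaled by 2,
# gene_c is gene_a reversed.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]
gene_c = [4.0, 3.0, 2.0, 1.0]

for name, fn in [("pearson", pearson_distance),
                 ("euclidean", euclidean_distance),
                 ("manhattan", manhattan_distance),
                 ("cosine", cosine_distance)]:
    print(name, round(fn(gene_a, gene_b), 6), round(fn(gene_a, gene_c), 6))
```

Note how the correlation-based distances treat `gene_a` and `gene_b` as essentially identical because they have the same shape, while Euclidean and Manhattan distances also penalize the difference in magnitude. This is one reason a correlation-based metric is a frequent choice for grouping genes by expression profile.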



In [2]:
hc_genes(
    input_gene_expression="https://datasets.genepattern.org/data/test_data/BRCA_minimal_60x19.gct",
    clustering_type="Single",
    distance_metric="pearson",
    file_basename="HC_out",
    clusters_to_highlight=3,
)


Currenty clustering_type is being ignored, only 'single' is supported.
Now we will start performing hierarchical clustering, this may take a little while.
----------------------------------------------------------------------
The PDF of this heatmap can be downloaded here:
----------------------------------------------------------------------
The CDF which is compatible with HierarchicalClusteringViewer is here:
----------------------------------------------------------------------
The GTR which is compatible with HierarchicalClusteringViewer is here:
----------------------------------------------------------------------
<matplotlib.figure.Figure at 0x7fd09f01b828>
Done with Hierarchical Clustering!
Out[2]:
AgglomerativeClustering(affinity=<function my_affinity_p at 0x7fd09fbd5378>,
            compute_full_tree='auto', connectivity=None, linkage='average',
            memory=Memory(cachedir=None), n_clusters=2,
            pooling_func=<function mean at 0x7fd0d803fa60>)
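
The returned object above is a scikit-learn `AgglomerativeClustering` estimator whose repr reports `linkage='average'` and a custom affinity function. As a rough illustration of what agglomerative clustering with average linkage and a Pearson-based distance does, here is a small pure-Python sketch on made-up data. It is not the code `hc_genes` runs: the function names and the `1 - r` distance convention are assumptions for illustration only.

```python
import math


def pearson_distance(x, y):
    """Assumed convention: distance = 1 - Pearson correlation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (norm_x * norm_y)


def average_linkage_clusters(rows, n_clusters, dist=pearson_distance):
    """Greedy bottom-up clustering: start with one cluster per row and
    repeatedly merge the two clusters with the smallest average pairwise
    distance until n_clusters remain. Returns lists of row indices."""
    n = len(rows)
    # Precompute every pairwise row distance once.
    pair = {(i, j): dist(rows[i], rows[j])
            for i in range(n) for j in range(i + 1, n)}

    def cluster_distance(a, b):
        total = sum(pair[(min(i, j), max(i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = cluster_distance(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]  # merge q into p
        del clusters[q]
    return [sorted(c) for c in clusters]


# Made-up expression profiles: rows 0-1 rise across samples, rows 2-3 fall.
toy_rows = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 9.0],
    [4.0, 3.0, 2.0, 1.0],
    [9.0, 6.0, 4.0, 2.0],
]
toy_clusters = average_linkage_clusters(toy_rows, n_clusters=2)
print(toy_clusters)
```

With these toy rows the two rising profiles end up in one cluster and the two falling profiles in the other. The nested loops make this sketch far too slow for real expression matrices; it is meant only to show the merging logic.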
