GO-PCA analysis of the DMAP dataset

Author: Florian Wagner
Email: florian.wagner@duke.edu

This notebook demonstrates the application of GO-PCA to the DMAP dataset by Novershtern et al.



In [1]:

    
%%capture output
# get information about GO-PCA package versions
!pip show genometools
!pip show goparser
!pip show xlmhg
!pip show gopca



In [2]:

    
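# print the name, version and summary of each package
lines = output.stdout.split('\r\n')
first = True
for i in range(len(lines)):
    if lines[i] == '---':
        # separate consecutive packages with a divider line
        if not first:
            print '---'
        first = False
        print '\n'.join(lines[(i+2):(i+5)])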









    



Name: genometools
Version: 1.1rc2
Summary: GenomeTools: Scripts and Functions For Working With Genomic Data.
---
Name: goparser
Version: 1.1rc2
Summary: GOparser - A Python framework for working with gene ontology (GO) terms and annotations
---
Name: xlmhg
Version: 1.1rc2
Summary: XL-mHG: A Nonparametric Test For Enrichment in Ranked Binary Lists.
---
Name: gopca
Version: 1.1rc3
Summary: GO-PCA: An Unsupervised Method to Explore Gene Expression Data Using Prior Knowledge
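
The cell above relies on fixed line offsets after each `---` separator, which depends on the exact layout of the `pip show` output. As a minimal sketch of a less layout-dependent alternative (written for Python 3, unlike the rest of this notebook; the function name `summarize_pip_show` is hypothetical and not part of any package used here), the fields can be looked up by key instead:

```python
def summarize_pip_show(text, fields=("Name", "Version", "Summary")):
    """Return one dict per package from concatenated `pip show` output."""
    packages = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "---":
            # a separator line ends the current package (if any)
            if current:
                packages.append(current)
            current = {}
            continue
        # split on the first ": " only, so values may contain colons
        key, sep, value = line.partition(": ")
        if sep and key in fields:
            current[key] = value
    if current:
        packages.append(current)
    return packages


sample = (
    "Name: gopca\nVersion: 1.1rc3\nSummary: GO-PCA\n"
    "---\n"
    "Name: xlmhg\nVersion: 1.1rc2\nSummary: XL-mHG\n"
)
for pkg in summarize_pip_show(sample):
    print(pkg["Name"], pkg["Version"])
```

In the notebook, `output.stdout` from the `%%capture` cell could be passed in place of `sample`.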

Configuration (you don't have to change this)



In [3]:

    
# location of data files
data_file = 'DMap_data.gct' # the location of the raw data
genome_annotation_file = 'Homo_sapiens.GRCh38.79.gtf.gz' # Ensembl annotations of the human genome
gene2accession_file = 'gene2accession_2015-05-26_human.tsv.gz'
ontology_file = 'go-basic_2015-05-25.obo'
association_file = 'gene_association.goa_human_2015-05-26.gz'

# location of output files
gene_file = 'protein_coding_genes_human.tsv'
expression_file = 'dmap_expression.tsv'
go_annotation_file = 'go_annotations_human.tsv'
gopca_file = 'dmap_gopca.pickle'
signature_matrix_image_file = 'dmap_signatures_matrix.png'
signature_file = 'dmap_signatures.tsv'
signature_excel_file = 'dmap_signatures.xlsx'

Downloading the data

The notebook will attempt to automatically download the data to the locations specified in the configuration section (see above) using curl. If you don't have curl installed, or you're not working on Linux, you can download the data yourself using the direct download links provided.
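
If curl is not available, the downloads can also be done from Python. The following is only a sketch under assumptions: it is written for Python 3 (the rest of this notebook uses Python 2), the helper name `download` is hypothetical, and it uses only the standard library. It writes to a temporary `.part` file first, so that an interrupted download does not leave a truncated file at the target location.

```python
import os
import shutil
from urllib.request import urlopen


def download(url, path, overwrite=False):
    """Download `url` to `path`; do nothing if the file already exists."""
    if os.path.exists(path) and not overwrite:
        return path
    tmp_path = path + ".part"
    # stream the response to disk instead of holding it in memory
    with urlopen(url) as response, open(tmp_path, "wb") as out:
        shutil.copyfileobj(response, out)
    # move the finished file into place
    os.replace(tmp_path, path)
    return path
```

Each curl cell below could then be replaced by a call such as `download(url, genome_annotation_file)`, with `url` set to the address used in that cell.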

We're first downloading the (processed) DMAP data from the Differentiation Map Portal (direct download).



In [4]:

    
!curl -o "$data_file" \
        "http://www.broadinstitute.org/dmap/downloadFile/DefaultSystemRoot/exp_1/ds_1/DMap_data.gct?downloadff=true&fileId=38"









    



  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 22.4M  100 22.4M    0     0  2660k      0  0:00:08  0:00:08 --:--:-- 2775k

Next, we're downloading the Ensembl human genome annotations (direct download).



In [5]:

    
!curl -o "$genome_annotation_file" \
        "ftp://ftp.ensembl.org/pub/release-79/gtf/homo_sapiens/Homo_sapiens.GRCh38.79.gtf.gz"









    



  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 42.3M  100 42.3M    0     0  3606k      0  0:00:12  0:00:12 --:--:-- 5707k

We also need data from the NCBI Gene database (gene2accession.gz), which contains a mapping between Entrez IDs and gene symbols. The full file is over 700 MB, contains data for multiple species, and is updated daily. The NCBI does not appear to keep snapshots of older versions. To ensure that this analysis produces consistent results, I am providing a version of this file that I downloaded on May 26, 2015 and filtered to contain only human entries (taxon ID 9606; direct download).



In [6]:

    
!curl -L -o "$gene2accession_file" \
        "https://www.dropbox.com/s/ggjrvnigtrfue3x/gene2accession_human_2015-05-26.tsv.gz?dl=1"









    



  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   496    0   496    0     0    644      0 --:--:-- --:--:-- --:--:--   680
100 15.6M  100 15.6M    0     0  5414k      0  0:00:02  0:00:02 --:--:-- 16.7M
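
For readers who want to filter a current copy of the full gene2accession file themselves, the species filter could look roughly like the sketch below. It assumes Python 3, that the taxon ID is the first tab-separated column, and that header lines start with `#`; the function name `filter_by_taxon` is hypothetical. A file filtered today will not be identical to the 2015 snapshot used in this notebook.

```python
import gzip


def filter_by_taxon(in_path, out_path, tax_id="9606"):
    """Copy the lines of a gzipped TSV file whose first column equals `tax_id`."""
    kept = 0
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        for line in src:
            if line.startswith("#"):
                continue  # skip header lines
            # compare only the first column (assumed to hold the taxon ID)
            if line.split("\t", 1)[0] == tax_id:
                dst.write(line)
                kept += 1
    return kept
```

The function returns the number of lines kept, which is a quick sanity check that the taxon ID matched anything at all.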

Next, we're downloading the Gene Ontology (direct download).



In [7]:

    
!curl -o "$ontology_file" \
        "http://viewvc.geneontology.org/viewvc/GO-SVN/ontology-releases/2015-05-25/go-basic.obo?revision=26059"









    



  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 29.1M    0 29.1M    0     0  6496k      0 --:--:--  0:00:04 --:--:-- 6850k

And finally, we're downloading the human UniProt-GOA gene association file (direct download).



In [8]:

    
!curl -o "$association_file" "ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/old/HUMAN/gene_association.goa_human.145.gz"









    



  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 6265k  100 6265k    0     0  1018k      0  0:00:06  0:00:06 --:--:-- 1333k

Filtering the DMAP data for protein-coding genes

In this step, we will extract a list of the names of all human protein-coding genes, and filter the DMAP data against this list. However, instead of relying on the gene symbols provided in the DMAP data, we will first convert the Entrez IDs from the DMAP data to gene symbols. (This results in a greater number of protein-coding genes being identified.)

First, we extract the list of protein-coding genes.



In [9]:

    
!extract_protein_coding_genes.py -s human -a "$genome_annotation_file" -o "$gene_file"









    



Regular expression used for filtering chromosome names: (?:\d\d?|MT|X|Y)$
Parsing data...
done (parsed 2720535 lines).

Gene chromosomes (25):
	1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21, 22, 3, 4, 5, 6, 7, 8, 9, MT, X, Y

Excluded chromosomes (225):
	CHR_HG126_PATCH, CHR_HG1362_PATCH, CHR_HG142_HG150_NOVEL_TEST, CHR_HG151_NOVEL_TEST, CHR_HG1832_PATCH, CHR_HG2021_PATCH, CHR_HG2030_PATCH, CHR_HG2058_PATCH, CHR_HG2066_PATCH, CHR_HG2095_PATCH, CHR_HG2104_PATCH, CHR_HG2191_PATCH, CHR_HG2217_PATCH, CHR_HG2232_PATCH, CHR_HG2233_PATCH, CHR_HG2247_PATCH, CHR_HG2288_HG2289_PATCH, CHR_HG2291_PATCH, CHR_HG986_PATCH, CHR_HSCHR10_1_CTG1, CHR_HSCHR10_1_CTG2, CHR_HSCHR10_1_CTG4, CHR_HSCHR11_1_CTG1_2, CHR_HSCHR11_1_CTG5, CHR_HSCHR11_1_CTG6, CHR_HSCHR11_1_CTG7, CHR_HSCHR11_1_CTG8, CHR_HSCHR11_2_CTG1, CHR_HSCHR11_2_CTG1_1, CHR_HSCHR11_3_CTG1, CHR_HSCHR12_1_CTG1, CHR_HSCHR12_1_CTG2_1, CHR_HSCHR12_2_CTG2, CHR_HSCHR12_3_CTG2, CHR_HSCHR12_3_CTG2_1, CHR_HSCHR12_4_CTG2, CHR_HSCHR12_5_CTG2, CHR_HSCHR12_5_CTG2_1, CHR_HSCHR12_6_CTG2_1, CHR_HSCHR13_1_CTG1, CHR_HSCHR13_1_CTG3, CHR_HSCHR14_1_CTG1, CHR_HSCHR14_2_CTG1, CHR_HSCHR14_3_CTG1, CHR_HSCHR14_7_CTG1, CHR_HSCHR15_1_CTG1, CHR_HSCHR15_1_CTG3, CHR_HSCHR15_1_CTG8, CHR_HSCHR15_2_CTG3, CHR_HSCHR15_2_CTG8, CHR_HSCHR15_3_CTG3, CHR_HSCHR15_3_CTG8, CHR_HSCHR15_4_CTG8, CHR_HSCHR15_5_CTG8, CHR_HSCHR16_1_CTG1, CHR_HSCHR16_1_CTG3_1, CHR_HSCHR16_2_CTG3_1, CHR_HSCHR16_3_CTG1, CHR_HSCHR16_4_CTG1, CHR_HSCHR16_CTG2, CHR_HSCHR17_10_CTG4, CHR_HSCHR17_1_CTG1, CHR_HSCHR17_1_CTG2, CHR_HSCHR17_1_CTG4, CHR_HSCHR17_1_CTG5, CHR_HSCHR17_1_CTG9, CHR_HSCHR17_2_CTG1, CHR_HSCHR17_2_CTG2, CHR_HSCHR17_2_CTG5, CHR_HSCHR17_3_CTG2, CHR_HSCHR17_4_CTG4, CHR_HSCHR17_5_CTG4, CHR_HSCHR17_6_CTG4, CHR_HSCHR17_7_CTG4, CHR_HSCHR17_8_CTG4, CHR_HSCHR18_1_CTG1_1, CHR_HSCHR18_2_CTG2, CHR_HSCHR18_2_CTG2_1, CHR_HSCHR18_ALT2_CTG2_1, CHR_HSCHR19KIR_ABC08_A1_HAP_CTG3_1, CHR_HSCHR19KIR_ABC08_AB_HAP_C_P_CTG3_1, CHR_HSCHR19KIR_ABC08_AB_HAP_T_P_CTG3_1, CHR_HSCHR19KIR_FH05_A_HAP_CTG3_1, CHR_HSCHR19KIR_FH05_B_HAP_CTG3_1, CHR_HSCHR19KIR_FH06_A_HAP_CTG3_1, CHR_HSCHR19KIR_FH06_BA1_HAP_CTG3_1, CHR_HSCHR19KIR_FH08_A_HAP_CTG3_1, CHR_HSCHR19KIR_FH08_BAX_HAP_CTG3_1, CHR_HSCHR19KIR_FH13_A_HAP_CTG3_1, CHR_HSCHR19KIR_FH13_BA2_HAP_CTG3_1, 
CHR_HSCHR19KIR_FH15_A_HAP_CTG3_1, CHR_HSCHR19KIR_FH15_B_HAP_CTG3_1, CHR_HSCHR19KIR_G085_A_HAP_CTG3_1, CHR_HSCHR19KIR_G085_BA1_HAP_CTG3_1, CHR_HSCHR19KIR_G248_A_HAP_CTG3_1, CHR_HSCHR19KIR_G248_BA2_HAP_CTG3_1, CHR_HSCHR19KIR_GRC212_AB_HAP_CTG3_1, CHR_HSCHR19KIR_GRC212_BA1_HAP_CTG3_1, CHR_HSCHR19KIR_LUCE_A_HAP_CTG3_1, CHR_HSCHR19KIR_LUCE_BDEL_HAP_CTG3_1, CHR_HSCHR19KIR_RP5_B_HAP_CTG3_1, CHR_HSCHR19KIR_RSH_A_HAP_CTG3_1, CHR_HSCHR19KIR_RSH_BA2_HAP_CTG3_1, CHR_HSCHR19KIR_T7526_A_HAP_CTG3_1, CHR_HSCHR19KIR_T7526_BDEL_HAP_CTG3_1, CHR_HSCHR19LRC_COX1_CTG3_1, CHR_HSCHR19LRC_COX2_CTG3_1, CHR_HSCHR19LRC_LRC_I_CTG3_1, CHR_HSCHR19LRC_LRC_J_CTG3_1, CHR_HSCHR19LRC_LRC_S_CTG3_1, CHR_HSCHR19LRC_LRC_T_CTG3_1, CHR_HSCHR19LRC_PGF1_CTG3_1, CHR_HSCHR19LRC_PGF2_CTG3_1, CHR_HSCHR19_1_CTG2, CHR_HSCHR19_1_CTG3_1, CHR_HSCHR19_2_CTG2, CHR_HSCHR19_3_CTG2, CHR_HSCHR19_3_CTG3_1, CHR_HSCHR19_4_CTG2, CHR_HSCHR19_4_CTG3_1, CHR_HSCHR19_5_CTG2, CHR_HSCHR1_1_CTG3, CHR_HSCHR1_1_CTG31, CHR_HSCHR1_1_CTG32_1, CHR_HSCHR1_2_CTG3, CHR_HSCHR1_2_CTG31, CHR_HSCHR1_2_CTG32_1, CHR_HSCHR1_3_CTG31, CHR_HSCHR1_3_CTG32_1, CHR_HSCHR1_4_CTG31, CHR_HSCHR1_ALT2_1_CTG32_1, CHR_HSCHR20_1_CTG2, CHR_HSCHR20_1_CTG3, CHR_HSCHR20_1_CTG4, CHR_HSCHR21_3_CTG1_1, CHR_HSCHR21_4_CTG1_1, CHR_HSCHR21_5_CTG2, CHR_HSCHR21_6_CTG1_1, CHR_HSCHR22_1_CTG1, CHR_HSCHR22_1_CTG2, CHR_HSCHR22_1_CTG3, CHR_HSCHR22_1_CTG4, CHR_HSCHR22_1_CTG5, CHR_HSCHR22_1_CTG7, CHR_HSCHR22_2_CTG1, CHR_HSCHR22_3_CTG1, CHR_HSCHR22_4_CTG1, CHR_HSCHR22_5_CTG1, CHR_HSCHR2_1_CTG1, CHR_HSCHR2_1_CTG5, CHR_HSCHR2_1_CTG7_2, CHR_HSCHR2_2_CTG7_2, CHR_HSCHR2_3_CTG1, CHR_HSCHR2_3_CTG15, CHR_HSCHR2_4_CTG1, CHR_HSCHR3_1_CTG1, CHR_HSCHR3_1_CTG2_1, CHR_HSCHR3_1_CTG3, CHR_HSCHR3_2_CTG2_1, CHR_HSCHR3_2_CTG3, CHR_HSCHR3_3_CTG3, CHR_HSCHR3_4_CTG3, CHR_HSCHR3_5_CTG3, CHR_HSCHR3_6_CTG3, CHR_HSCHR3_7_CTG3, CHR_HSCHR3_8_CTG3, CHR_HSCHR4_1_CTG12, CHR_HSCHR4_1_CTG4, CHR_HSCHR4_1_CTG9, CHR_HSCHR4_6_CTG12, CHR_HSCHR5_1_CTG1_1, CHR_HSCHR5_2_CTG1_1, CHR_HSCHR5_2_CTG5, CHR_HSCHR5_3_CTG1, 
CHR_HSCHR5_3_CTG5, CHR_HSCHR5_4_CTG1, CHR_HSCHR5_5_CTG1, CHR_HSCHR5_6_CTG1, CHR_HSCHR6_1_CTG4, CHR_HSCHR6_1_CTG5, CHR_HSCHR6_1_CTG8, CHR_HSCHR6_8_CTG1, CHR_HSCHR6_MHC_APD_CTG1, CHR_HSCHR6_MHC_COX_CTG1, CHR_HSCHR6_MHC_DBB_CTG1, CHR_HSCHR6_MHC_MANN_CTG1, CHR_HSCHR6_MHC_MCF_CTG1, CHR_HSCHR6_MHC_QBL_CTG1, CHR_HSCHR6_MHC_SSTO_CTG1, CHR_HSCHR7_1_CTG1, CHR_HSCHR7_1_CTG4_4, CHR_HSCHR7_1_CTG6, CHR_HSCHR7_2_CTG4_4, CHR_HSCHR7_2_CTG6, CHR_HSCHR7_3_CTG6, CHR_HSCHR8_2_CTG7, CHR_HSCHR8_3_CTG1, CHR_HSCHR8_3_CTG7, CHR_HSCHR8_4_CTG7, CHR_HSCHR8_5_CTG1, CHR_HSCHR8_5_CTG7, CHR_HSCHR8_7_CTG1, CHR_HSCHR8_8_CTG1, CHR_HSCHR8_9_CTG1, CHR_HSCHR9_1_CTG2, CHR_HSCHR9_1_CTG3, CHR_HSCHR9_1_CTG5, CHR_HSCHRX_1_CTG3, CHR_HSCHRX_2_CTG12, CHR_HSCHRX_2_CTG3, GL000009.2, GL000194.1, GL000195.1, GL000205.2, GL000213.1, GL000218.1, GL000219.1, KI270711.1, KI270713.1, KI270721.1, KI270726.1, KI270727.1, KI270728.1, KI270731.1, KI270734.1

Gene sources:
	ensembl_havana: 18873
	havana: 733
	ensembl: 236
	insdc: 13

Gene types:
	protein_coding: 19796
	polymorphic_pseudogene: 59

Genes with redundant annotations: 112

Polymorphic pseudogenes (59): AKR7L (1), C6orf183 (1), CASP12 (1), CYP2D7 (1), FBXL21 (1), FCGR2C (1), GBA3 (1), GSTT2 (1), IFNL4 (1), KIR2DS4 (1), MROH5 (1), NAT8B (1), OR10A6 (1), OR10AC1 (1), OR10C1 (1), OR10J4 (1), OR10X1 (1), OR11H7 (1), OR12D1 (1), OR12D2 (1), OR13C7 (1), OR1B1 (1), OR1S1 (1), OR2AG1 (1), OR2F1 (1), OR2J1 (1), OR2L8 (1), OR2S2 (1), OR2T11 (1), OR4A8 (1), OR4C16 (1), OR4Q2 (1), OR4X1 (1), OR4X2 (1), OR51B2 (1), OR51F1 (1), OR51G1 (1), OR52B4 (1), OR52E1 (1), OR52R1 (1), OR52Z1 (1), OR5AC1 (1), OR5AL1 (1), OR5AR1 (1), OR5D13 (1), OR5G3 (1), OR5H6 (1), OR5H8 (1), OR5L1 (1), OR5R1 (1), OR6Q1 (1), OR8B4 (1), OR8D2 (1), OR8J2 (1), OR8K3 (1), PKD1L2 (1), PNLIPRP2 (1), SERPINA2 (1), TUBB8P7 (1)

Total protein-coding genes: 19742
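
To illustrate the kind of filtering reported above, here is a simplified sketch in Python 3. It is not the implementation of `extract_protein_coding_genes.py`: it only keeps `gene` records with `gene_biotype "protein_coding"` on chromosomes matching the regular expression shown in the output, ignores polymorphic pseudogenes and redundant annotations, and therefore will not necessarily reproduce the count of 19742. The function name and the attribute parsing are assumptions based on the Ensembl GTF layout (nine tab-separated columns, with `key "value";` pairs in the last one).

```python
import re

# chromosome filter as printed by the extraction script
CHROM_PATTERN = re.compile(r"(?:\d\d?|MT|X|Y)$")
# matches key "value" pairs in the GTF attribute column
ATTR_PATTERN = re.compile(r'(\S+) "([^"]*)"')


def protein_coding_genes(gtf_lines):
    """Return the sorted names of protein-coding genes on the main chromosomes."""
    genes = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only look at gene-level records
        if not CHROM_PATTERN.match(fields[0]):
            continue  # skip patches, haplotypes and unplaced scaffolds
        attrs = dict(ATTR_PATTERN.findall(fields[8]))
        if attrs.get("gene_biotype") == "protein_coding" and "gene_name" in attrs:
            genes.add(attrs["gene_name"])
    return sorted(genes)
```

With the annotation file from above, the function could be applied to a handle opened with `gzip.open(genome_annotation_file, "rt")`.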

Now, we're using the Entrez ID->Gene Symbol conversion table in combination with our list of protein-coding genes to filter the DMAP data for protein-coding genes.



In [10]:

    
import csv
import gzip

import numpy as np

from genometools import misc
from gopca import common

def read_dmap_expression(fn):
    entrez = []
    genes = []
    expr = []
    n = None
    p = None
    with open(fn) as fh:
        reader = csv.reader(fh,dialect='excel-tab')
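        reader.next() # skip the first line of the GCT file
        n,p = [int(f) for f in reader.next()] # here: n = number of rows (genes), p = number of columns (samples)
        samples = reader.next()[2:] # header row; the first two columns hold the Entrez ID and the gene symbol
        assert len(samples) == p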
        for l in reader:
            entrez.append(l[0])
            genes.append(l[1])
            expr.append(l[2:])
    E = np.float64(expr)
    assert E.shape[0] == n
    print E.shape
    return entrez, genes,samples,E

def read_entrez2gene(fn):
    e2g = {}
    with gzip.open(fn) as fh:
        reader = csv.reader(fh,dialect='excel-tab')
        for l in reader:
            e2g[l[1]] = l[15]
    return e2g

protein_coding = misc.read_single(gene_file)
all_genes = set(protein_coding)
#print len(all_genes)
e2g = read_entrez2gene(gene2accession_file)
entrez_dmap, genes_dmap, samples_dmap, E_dmap = read_dmap_expression(data_file)

p,n = E_dmap.shape # here: p = number of genes (rows), n = number of samples (columns)
known_entrez = 0 # number of Entrez IDs found in the conversion table
genes = []
E = []
for i in range(p):
    # try to convert the Entrez ID to a gene symbol;
    # skip the gene if the ID is unknown, or if the symbol is not a protein-coding gene
    try:
        converted = e2g[entrez_dmap[i]]
    except KeyError:
        continue
    known_entrez += 1
    if converted in all_genes:
        genes.append(converted)
        E.append(E_dmap[i,:])
E = np.float64(E)

# sort genes alphabetically
a = np.lexsort([genes])
genes = [genes[i] for i in a]
E = E[a,:]

print known_entrez
print E.shape
common.write_expression(expression_file,genes,samples_dmap,E)









    



(8968, 211)
8822
(8528, 211)
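
The filtering and sorting logic of the cell above can be checked on a toy example. The sketch below is written for Python 3 and uses a hypothetical helper, `filter_and_sort`, with made-up identifiers and values. It keeps the rows whose Entrez ID maps to an allowed gene symbol and then sorts the rows alphabetically by symbol, which is what the notebook does for the DMAP data.

```python
import numpy as np


def filter_and_sort(entrez_ids, E, entrez2gene, allowed_genes):
    """Keep rows whose Entrez ID maps to an allowed symbol; sort rows by symbol."""
    keep = []
    genes = []
    for i, entrez in enumerate(entrez_ids):
        symbol = entrez2gene.get(entrez)
        if symbol is not None and symbol in allowed_genes:
            keep.append(i)
            genes.append(symbol)
    # sort the retained genes alphabetically and reorder the rows to match
    order = np.argsort(genes, kind="stable")
    genes = [genes[i] for i in order]
    return genes, E[keep][order]


# toy data (hypothetical values)
entrez_ids = ["10", "20", "30"]
E = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
entrez2gene = {"10": "TP53", "20": "NOTCODING", "30": "BRCA1"}
genes, E_filtered = filter_and_sort(entrez_ids, E, entrez2gene, {"TP53", "BRCA1"})
print(genes)
print(E_filtered)
```

As in the notebook, several Entrez IDs that map to the same symbol would all be retained; the sketch does not merge them.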

Extracting human GO annotations



In [11]:

    
min_genes = 5
max_genes = 200

evidence = ['IDA','IGI','IMP','ISO','ISS','IC','NAS','TAS']
ev_str = ' '.join(evidence)

!gopca_extract_go_annotations.py -o "$go_annotation_file" -g "$gene_file" -t "$ontology_file" -a "$association_file" \
        -e $ev_str --min-genes-per-term $min_genes --max-genes-per-term $max_genes --part-of-cc-only









    



Read 19742 genes.
go-basic_2015-05-25.obo
Parsed 43122 GO term definitions.
Adding child and part relationships... done!
Flattening ancestors... done!
Flattening descendants... done!
Read 19742 genes.
Parsing annotations... done!
Parsed 476648 positive GO annotations (264131 = 55.4% excluded based on evidence type).
Warning: 7348 annotations with 337 unkonwn gene names.
Found a total of 205169 valid annotations.
135340 unique Gene-Term associations.
Obtaining GO term associations... done.
Testing for perfect overlap... done!
# affected terms: 1114
# perfectly redundant descendant terms: 582
Selected 6675 / 7257 non-redundant GO terms.
Writing output file... done!
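
Two of the options passed to the script above are the list of accepted evidence codes and the minimum and maximum number of genes per term. The following Python 3 sketch shows what such a filter could look like on a list of (gene, term, evidence code) triples. It is a simplified illustration with a hypothetical function name, not the script's implementation: judging from the output above, the script also propagates annotations along the ontology and removes perfectly redundant descendant terms, which the sketch omits.

```python
from collections import defaultdict

# evidence codes accepted in the cell above
EVIDENCE = {"IDA", "IGI", "IMP", "ISO", "ISS", "IC", "NAS", "TAS"}


def select_terms(annotations, evidence=EVIDENCE, min_genes=5, max_genes=200):
    """Group (gene, term, evidence_code) triples by term and filter terms by size."""
    term_genes = defaultdict(set)
    for gene, term, code in annotations:
        if code in evidence:
            term_genes[term].add(gene)
    # keep only terms whose number of distinct genes lies within the bounds
    return {
        term: sorted(genes)
        for term, genes in term_genes.items()
        if min_genes <= len(genes) <= max_genes
    }
```

Using sets means that a gene annotated with the same term under several accepted evidence codes is counted once.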



In [12]:

    
!go-pca.py -L 1000 -e "$expression_file" -t "$ontology_file" -a "$go_annotation_file" -o "$gopca_file" \
        -s 123456789 --go-part-of-cc-only









    



Info: Reading expression...
Info: MD5 hash: 1729bdf9de9c98dcdd87d6850e73909b
Info: Expression matrix size: p = 8528 genes x n = 211 samples.
Info: Estimating the number of principal components (seed = 123456789)... Info: done!
Info: The estimated number of PCs is 15.
Info: Reading ontology...
Info: (MD5 hash: a5623a26da07171db485634ae214eef9)
Parsed 43122 GO term definitions.
Adding child and part relationships... done!
Flattening ancestors... done!
Flattening descendants... done!
Info: Reading annotations... Info: (MD5 hash: cd2e3ad89e31f823ece6c784c04c8ceb) Info: (217713 annotations) done!
Info: Generating gene x GO term matrix... Info: done!
Info: Performing PCA... Info: done!
Info: Cumulative fraction of variance explained by the first 15 PCs: 80.5%
Info: 
Info: ----------------------------------------------------------------------
Info: PC 1 explains 24.3% of the variance.
Info: The new cumulative fraction of variance explained is 24.3%.
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 0 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 8 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: Enrichment filter: Kept 8 / 8 enriched terms with enrichment score >= 2.0x.
Info: Filtering: Kept 2 / 8 enriched terms.
Info: Local filter: Kept 2 / 8 enriched terms.
Info: Generated 2 GO-PCA signatures based on the enriched GO terms.
Info: # signatures:
Info: Global filter: kept 2 / 2 signatures.
Info: BP: bicarbonate transport (GO:0015701) [-1:5/13, p=8.3e-08]
Info: CC: cytoplasmic membrane-bounded vesicle lumen (GO:0060205) [-1:13/48, p=8.6e-09]
Info: Total no. of signatures so far: 2
Info: 
Info: ----------------------------------------------------------------------
Info: PC 2 explains 16.9% of the variance.
Info: The new cumulative fraction of variance explained is 41.2%.
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 37 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: Enrichment filter: Kept 37 / 37 enriched terms with enrichment score >= 2.0x.
Info: Filtering: Kept 6 / 37 enriched terms.
Info: Local filter: Kept 6 / 37 enriched terms.
Info: Generated 6 GO-PCA signatures based on the enriched GO terms.
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 6 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: Enrichment filter: Kept 6 / 6 enriched terms with enrichment score >= 2.0x.
Info: Filtering: Kept 2 / 6 enriched terms.
Info: Local filter: Kept 2 / 6 enriched terms.
Info: Generated 2 GO-PCA signatures based on the enriched GO terms.
Info: # signatures:
Info: Global filter: kept 8 / 8 signatures.
Info: CC: T cell receptor complex (GO:0042101) [2:8/12, p=4.0e-08]
Info: BP: hydrogen peroxide catabolic process (GO:0042744) [-2:7/9, p=1.9e-08]
Info: BP: natural killer cell activation (GO:0030101) [2:5/11, p=1.4e-07]
Info: MF: chemokine receptor activity (GO:0004950) [2:5/12, p=5.2e-07]
Info: BP: cellular defense response (GO:0006968) [2:11/40, p=2.8e-12]
Info: BP: response to type I interferon (GO:0034340) [2:20/52, p=6.0e-09]
Info: BP: pos. regulation of T cell activation (GO:0050870) [2:24/93, p=4.3e-12]
Info: BP: G1/S transition of mitotic cell cycle (GO:0000082) [-2:38/130, p=3.0e-08]
Info: Total no. of signatures so far: 10
Info: 
Info: ----------------------------------------------------------------------
Info: PC 3 explains 11.5% of the variance.
Info: The new cumulative fraction of variance explained is 52.8%.
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 39 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: Enrichment filter: Kept 39 / 39 enriched terms with enrichment score >= 2.0x.
Info: Filtering: Kept 6 / 39 enriched terms.
Info: Local filter: Kept 6 / 39 enriched terms.
Info: Generated 6 GO-PCA signatures based on the enriched GO terms.
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 3 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: Enrichment filter: Kept 3 / 3 enriched terms with enrichment score >= 2.0x.
Info: Filtering: Kept 2 / 3 enriched terms.
Info: Local filter: Kept 2 / 3 enriched terms.
Info: Generated 2 GO-PCA signatures based on the enriched GO terms.
Info: # signatures:
Info: Global filter: kept 8 / 8 signatures.
Info: CC: MHC class II protein complex (GO:0042613) [3:9/11, p=5.6e-12]
Info: BP: response to fungus (GO:0009620) [3:5/6, p=3.6e-09]
Info: BP: neg. regulation of leukocyte prolif. (GO:0070664) [3:7/19, p=3.8e-08]
Info: BP: autophagy (GO:0006914) [-3:15/36, p=6.7e-09]
Info: BP: response to bacterium (GO:0009617) [3:35/118, p=5.8e-16]
Info: CC: cullin-RING ubiquitin ligase complex (GO:0031461) [-3:18/64, p=8.4e-07]
Info: BP: inflammatory response (GO:0006954) [3:26/104, p=2.3e-10]
Info: BP: platelet degranulation (GO:0002576) [3:19/66, p=1.3e-07]
Info: Total no. of signatures so far: 18
Info: 
Info: ----------------------------------------------------------------------
Info: PC 4 explains 8.5% of the variance.
Info: The new cumulative fraction of variance explained is 61.3%.
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 62 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: Enrichment filter: Kept 62 / 62 enriched terms with enrichment score >= 2.0x.
Info: Filtering: Kept 9 / 62 enriched terms.
Info: Local filter: Kept 9 / 62 enriched terms.
Info: Generated 9 GO-PCA signatures based on the enriched GO terms.
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 3 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: Enrichment filter: Kept 3 / 3 enriched terms with enrichment score >= 2.0x.
Info: Filtering: Kept 1 / 3 enriched terms.
Info: Local filter: Kept 1 / 3 enriched terms.
Info: Generated 1 GO-PCA signatures based on the enriched GO terms.
Info: # signatures:
Info: Global filter: kept 9 / 10 signatures.
Info: CC: endolysosome (GO:0036019) [4:5/7, p=3.6e-08]
Info: MF: MHC class II receptor activity (GO:0032395) [4:5/7, p=3.5e-09]
Info: BP: detection of external biotic stimulus (GO:0098581) [4:7/11, p=6.2e-09]
Info: BP: phagosome maturation (GO:0090382) [4:11/30, p=5.1e-10]
Info: BP: humoral immune response (GO:0006959) [4:13/45, p=3.3e-10]
Info: BP: DNA strand elongation involved in DNA replication (GO:0006271) [-4:14/29, p=1.1e-08]
Info: BP: defense response to other organism (GO:0098542) [4:23/89, p=1.5e-13]
Info: BP: response to IFN-gamma (GO:0034341) [4:22/82, p=3.8e-11]
Info: BP: MyD88-dependent toll-like receptor signal. pathway (GO:0002755) [4:17/66, p=5.6e-10]
Info: Total no. of signatures so far: 27
Info: 
Info: ----------------------------------------------------------------------
Info: PC 5 explains 5.7% of the variance.
Info: The new cumulative fraction of variance explained is 67.0%.
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 28 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: Enrichment filter: Kept 28 / 28 enriched terms with enrichment score >= 2.0x.
Info: Filtering: Kept 5 / 28 enriched terms.
Info: Local filter: Kept 5 / 28 enriched terms.
Info: Generated 5 GO-PCA signatures based on the enriched GO terms.
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 19 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: Enrichment filter: Kept 19 / 19 enriched terms with enrichment score >= 2.0x.
Info: Filtering: Kept 3 / 19 enriched terms.
Info: Local filter: Kept 3 / 19 enriched terms.
Info: Generated 3 GO-PCA signatures based on the enriched GO terms.
Info: # signatures:
Info: Global filter: kept 3 / 8 signatures.
Info: BP: B cell prolif. (GO:0042100) [-5:5/10, p=8.7e-07]
Info: BP: leukocyte aggregation (GO:0070486) [5:19/75, p=1.1e-13]
Info: BP: leukocyte migration (GO:0050900) [5:35/139, p=9.8e-10]
Info: Total no. of signatures so far: 30
Info: 
Info: ----------------------------------------------------------------------
Info: PC 6 explains 2.5% of the variance.
Info: The new cumulative fraction of variance explained is 69.5%.
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 20 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: Enrichment filter: Kept 20 / 20 enriched terms with enrichment score >= 2.0x.
Info: Filtering: Kept 5 / 20 enriched terms.
Info: Local filter: Kept 5 / 20 enriched terms.
Info: Generated 5 GO-PCA signatures based on the enriched GO terms.
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 0 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: # signatures:
Info: Global filter: kept 3 / 5 signatures.
Info: BP: regulation of B cell receptor signal. pathway (GO:0050855) [6:5/7, p=2.7e-07]
Info: BP: pos. regulation of cell killing (GO:0031343) [6:6/21, p=1.3e-07]
Info: BP: antigen receptor-mediated signal. pathway (GO:0050851) [6:21/82, p=1.7e-08]
Info: Total no. of signatures so far: 33
Info: 
Info: ----------------------------------------------------------------------
Info: PC 7 explains 2.2% of the variance.
Info: The new cumulative fraction of variance explained is 71.7%.
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 0 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 23 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: Enrichment filter: Kept 23 / 23 enriched terms with enrichment score >= 2.0x.
Info: Filtering: Kept 6 / 23 enriched terms.
Info: Local filter: Kept 6 / 23 enriched terms.
Info: Generated 6 GO-PCA signatures based on the enriched GO terms.
Info: # signatures:
Info: Global filter: kept 2 / 6 signatures.
Info: CC: platelet alpha granule membrane (GO:0031092) [-7:6/8, p=6.4e-09]
Info: BP: regulation of wound healing (GO:0061041) [-7:13/49, p=2.8e-08]
Info: Total no. of signatures so far: 35
Info: 
Info: ----------------------------------------------------------------------
Info: PC 8 explains 1.8% of the variance.
Info: The new cumulative fraction of variance explained is 73.5%.
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 66 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: Enrichment filter: Kept 66 / 66 enriched terms with enrichment score >= 2.0x.
Info: Filtering: Kept 9 / 66 enriched terms.
Info: Local filter: Kept 9 / 66 enriched terms.
Info: Generated 9 GO-PCA signatures based on the enriched GO terms.
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 1 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: Enrichment filter: Kept 1 / 1 enriched terms with enrichment score >= 2.0x.
Info: Local filter: Kept 1 / 1 enriched terms.
Info: Generated 1 GO-PCA signatures based on the enriched GO terms.
Info: # signatures:
Info: Global filter: kept 8 / 10 signatures.
Info: CC: condensed chromosome kinetochore (GO:0000777) [8:8/17, p=1.8e-13]
Info: BP: regulation of transcription involved in G1/S tr... (GO:0000083) [8:11/20, p=2.3e-08]
Info: BP: mitotic sister chromatid segregation (GO:0000070) [8:11/44, p=3.9e-10]
Info: CC: U1 snRNP (GO:0005685) [8:8/10, p=2.9e-09]
Info: BP: centromere complex assembly (GO:0034508) [8:10/24, p=1.6e-08]
Info: BP: DNA replication (GO:0006260) [8:43/112, p=8.6e-19]
Info: BP: regulation of mitotic nuclear division (GO:0007088) [8:19/75, p=2.6e-10]
Info: BP: nucleoside phosphate biosynthetic process (GO:1901293) [8:21/81, p=3.6e-07]
Info: Total no. of signatures so far: 43
Info: 
Info: ----------------------------------------------------------------------
Info: PC 9 explains 1.3% of the variance.
Info: The new cumulative fraction of variance explained is 74.8%.
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 1 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: Enrichment filter: Kept 1 / 1 enriched terms with enrichment score >= 2.0x.
Info: Local filter: Kept 1 / 1 enriched terms.
Info: Generated 1 GO-PCA signatures based on the enriched GO terms.
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 0 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: # signatures:
Info: Global filter: kept 0 / 1 signatures.
Info: Total no. of signatures so far: 43
Info: 
Info: ----------------------------------------------------------------------
Info: PC 10 explains 1.2% of the variance.
Info: The new cumulative fraction of variance explained is 76.0%.
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 0 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 0 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: # signatures: Info: Global filter: kept 0 / 0 signatures.
Info: Total no. of signatures so far: 43
Info: 
Info: ----------------------------------------------------------------------
Info: PC 11 explains 1.1% of the variance.
Info: The new cumulative fraction of variance explained is 77.1%.
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 4 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: Enrichment filter: Kept 4 / 4 enriched terms with enrichment score >= 2.0x.
Info: Filtering: Kept 2 / 4 enriched terms.
Info: Local filter: Kept 2 / 4 enriched terms.
Info: Generated 2 GO-PCA signatures based on the enriched GO terms.
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 10 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: Enrichment filter: Kept 10 / 10 enriched terms with enrichment score >= 2.0x.
Info: Filtering: Kept 3 / 10 enriched terms.
Info: Local filter: Kept 3 / 10 enriched terms.
Info: Generated 3 GO-PCA signatures based on the enriched GO terms.
Info: # signatures:
Info: Global filter: kept 2 / 5 signatures.
Info: BP: regulation of cell shape (GO:0008360) [11:11/44, p=2.5e-07]
Info: BP: cotranslational protein targeting to membrane (GO:0006613) [-11:31/100, p=1.5e-08]
Info: Total no. of signatures so far: 45
Info: 
Info: ----------------------------------------------------------------------
Info: PC 12 explains 1.0% of the variance.
Info: The new cumulative fraction of variance explained is 78.1%.
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 12 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: Enrichment filter: Kept 12 / 12 enriched terms with enrichment score >= 2.0x.
Info: Filtering: Kept 3 / 12 enriched terms.
Info: Local filter: Kept 3 / 12 enriched terms.
Info: Generated 3 GO-PCA signatures based on the enriched GO terms.
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 9 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: Enrichment filter: Kept 9 / 9 enriched terms with enrichment score >= 2.0x.
Info: Filtering: Kept 3 / 9 enriched terms.
Info: Local filter: Kept 3 / 9 enriched terms.
Info: Generated 3 GO-PCA signatures based on the enriched GO terms.
Info: # signatures:
Info: Global filter: kept 1 / 6 signatures.
Info: BP: platelet aggregation (GO:0070527) [12:8/23, p=1.7e-09]
Info: Total no. of signatures so far: 46
Info: 
Info: ----------------------------------------------------------------------
Info: PC 13 explains 0.9% of the variance.
Info: The new cumulative fraction of variance explained is 79.0%.
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 4 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: Enrichment filter: Kept 4 / 4 enriched terms with enrichment score >= 2.0x.
Info: Filtering: Kept 2 / 4 enriched terms.
Info: Local filter: Kept 2 / 4 enriched terms.
Info: Generated 2 GO-PCA signatures based on the enriched GO terms.
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 0 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: # signatures:
Info: Global filter: kept 1 / 2 signatures.
Info: BP: respiratory electron transport chain (GO:0022904) [13:21/84, p=7.6e-09]
Info: Total no. of signatures so far: 47
Info: 
Info: ----------------------------------------------------------------------
Info: PC 14 explains 0.8% of the variance.
Info: The new cumulative fraction of variance explained is 79.8%.
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 1 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: Enrichment filter: Kept 1 / 1 enriched terms with enrichment score >= 2.0x.
Info: Local filter: Kept 1 / 1 enriched terms.
Info: Generated 1 GO-PCA signatures based on the enriched GO terms.
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 2 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: Enrichment filter: Kept 2 / 2 enriched terms with enrichment score >= 2.0x.
Info: Filtering: Kept 1 / 2 enriched terms.
Info: Local filter: Kept 1 / 2 enriched terms.
Info: Generated 1 GO-PCA signatures based on the enriched GO terms.
Info: # signatures:
Info: Global filter: kept 1 / 2 signatures.
Info: MF: heparin binding (GO:0008201) [14:7/28, p=8.9e-07]
Info: Total no. of signatures so far: 48
Info: 
Info: ----------------------------------------------------------------------
Info: PC 15 explains 0.7% of the variance.
Info: The new cumulative fraction of variance explained is 80.5%.
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 18 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: Enrichment filter: Kept 18 / 18 enriched terms with enrichment score >= 2.0x.
Info: Filtering: Kept 5 / 18 enriched terms.
Info: Local filter: Kept 5 / 18 enriched terms.
Info: Generated 5 GO-PCA signatures based on the enriched GO terms.
Info: Testing 6675 terms for enrichment...
Info: (N = 8528, X_frac = 0.25, X_min = 5, L = 1000; K_max = 162)
Info: done!
Info: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms... Info: done!
Info: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
Info: 16 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
Info: Enrichment filter: Kept 16 / 16 enriched terms with enrichment score >= 2.0x.
Info: Filtering: Kept 4 / 16 enriched terms.
Info: Local filter: Kept 4 / 16 enriched terms.
Info: Generated 4 GO-PCA signatures based on the enriched GO terms.
Info: # signatures:
Info: Global filter: kept 2 / 9 signatures.
Info: CC: stress fiber (GO:0001725) [-15:6/24, p=1.0e-07]
Info: BP: neg. regulation of viral genome replication (GO:0045071) [15:9/34, p=4.2e-07]
Info: Total no. of signatures so far: 50
Info: 
Info: ======================================================================
Info: GO-PCA generated 50 signatures:
Info: CC: MHC class II protein complex (GO:0042613) [3:9/11, p=5.6e-12]
Info: CC: condensed chromosome kinetochore (GO:0000777) [8:8/17, p=1.8e-13]
Info: BP: response to fungus (GO:0009620) [3:5/6, p=3.6e-09]
Info: BP: bicarbonate transport (GO:0015701) [-1:5/13, p=8.3e-08]
Info: CC: endolysosome (GO:0036019) [4:5/7, p=3.6e-08]
Info: CC: T cell receptor complex (GO:0042101) [2:8/12, p=4.0e-08]
Info: CC: platelet alpha granule membrane (GO:0031092) [-7:6/8, p=6.4e-09]
Info: MF: MHC class II receptor activity (GO:0032395) [4:5/7, p=3.5e-09]
Info: BP: detection of external biotic stimulus (GO:0098581) [4:7/11, p=6.2e-09]
Info: BP: hydrogen peroxide catabolic process (GO:0042744) [-2:7/9, p=1.9e-08]
Info: BP: B cell prolif. (GO:0042100) [-5:5/10, p=8.7e-07]
Info: BP: regulation of transcription involved in G1/S tr... (GO:0000083) [8:11/20, p=2.3e-08]
Info: BP: regulation of B cell receptor signal. pathway (GO:0050855) [6:5/7, p=2.7e-07]
Info: BP: neg. regulation of leukocyte prolif. (GO:0070664) [3:7/19, p=3.8e-08]
Info: BP: platelet aggregation (GO:0070527) [12:8/23, p=1.7e-09]
Info: BP: pos. regulation of cell killing (GO:0031343) [6:6/21, p=1.3e-07]
Info: BP: natural killer cell activation (GO:0030101) [2:5/11, p=1.4e-07]
Info: CC: stress fiber (GO:0001725) [-15:6/24, p=1.0e-07]
Info: BP: mitotic sister chromatid segregation (GO:0000070) [8:11/44, p=3.9e-10]
Info: MF: chemokine receptor activity (GO:0004950) [2:5/12, p=5.2e-07]
Info: MF: heparin binding (GO:0008201) [14:7/28, p=8.9e-07]
Info: CC: U1 snRNP (GO:0005685) [8:8/10, p=2.9e-09]
Info: BP: phagosome maturation (GO:0090382) [4:11/30, p=5.1e-10]
Info: BP: cellular defense response (GO:0006968) [2:11/40, p=2.8e-12]
Info: BP: centromere complex assembly (GO:0034508) [8:10/24, p=1.6e-08]
Info: BP: humoral immune response (GO:0006959) [4:13/45, p=3.3e-10]
Info: BP: leukocyte aggregation (GO:0070486) [5:19/75, p=1.1e-13]
Info: BP: neg. regulation of viral genome replication (GO:0045071) [15:9/34, p=4.2e-07]
Info: BP: autophagy (GO:0006914) [-3:15/36, p=6.7e-09]
Info: BP: response to type I interferon (GO:0034340) [2:20/52, p=6.0e-09]
Info: BP: regulation of cell shape (GO:0008360) [11:11/44, p=2.5e-07]
Info: BP: DNA replication (GO:0006260) [8:43/112, p=8.6e-19]
Info: BP: DNA strand elongation involved in DNA replication (GO:0006271) [-4:14/29, p=1.1e-08]
Info: BP: defense response to other organism (GO:0098542) [4:23/89, p=1.5e-13]
Info: BP: response to IFN-gamma (GO:0034341) [4:22/82, p=3.8e-11]
Info: BP: regulation of mitotic nuclear division (GO:0007088) [8:19/75, p=2.6e-10]
Info: BP: regulation of wound healing (GO:0061041) [-7:13/49, p=2.8e-08]
Info: CC: cytoplasmic membrane-bounded vesicle lumen (GO:0060205) [-1:13/48, p=8.6e-09]
Info: BP: MyD88-dependent toll-like receptor signal. pathway (GO:0002755) [4:17/66, p=5.6e-10]
Info: BP: response to bacterium (GO:0009617) [3:35/118, p=5.8e-16]
Info: BP: respiratory electron transport chain (GO:0022904) [13:21/84, p=7.6e-09]
Info: BP: pos. regulation of T cell activation (GO:0050870) [2:24/93, p=4.3e-12]
Info: CC: cullin-RING ubiquitin ligase complex (GO:0031461) [-3:18/64, p=8.4e-07]
Info: BP: inflammatory response (GO:0006954) [3:26/104, p=2.3e-10]
Info: BP: platelet degranulation (GO:0002576) [3:19/66, p=1.3e-07]
Info: BP: nucleoside phosphate biosynthetic process (GO:1901293) [8:21/81, p=3.6e-07]
Info: BP: leukocyte migration (GO:0050900) [5:35/139, p=9.8e-10]
Info: BP: antigen receptor-mediated signal. pathway (GO:0050851) [6:21/82, p=1.7e-08]
Info: BP: cotranslational protein targeting to membrane (GO:0006613) [-11:31/100, p=1.5e-08]
Info: BP: G1/S transition of mitotic cell cycle (GO:0000082) [-2:38/130, p=3.0e-08]
Info: GO-PCA runtime: 26.63s
Info: Saving result to file "dmap_gopca.pickle"... Info: done!
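
Each signature line in the log above ends with a bracketed summary such as [8:8/17, p=1.8e-13]. The following is a minimal sketch for turning such lines into records, for example to count signatures per principal component. It assumes (as an interpretation of the log format, not something stated in this notebook) that the bracket holds the signed PC index, two gene counts (k/K), and the enrichment p-value. The names parse_signature_line, SIG_RE, and the record keys are hypothetical and not part of GO-PCA.

```python
import re
from collections import Counter

# Pattern for log lines such as:
#   "Info: BP: DNA replication (GO:0006260) [8:43/112, p=8.6e-19]"
# Assumed fields: GO domain, term name, GO ID, signed PC, k/K gene counts, p-value.
SIG_RE = re.compile(
    r'^Info: (?P<domain>BP|CC|MF): (?P<name>.+) \((?P<go_id>GO:\d{7})\) '
    r'\[(?P<pc>-?\d+):(?P<k>\d+)/(?P<K>\d+), p=(?P<pval>[0-9.eE+-]+)\]$'
)

def parse_signature_line(line):
    """Return a dict for one signature log line, or None if it does not match."""
    m = SIG_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec['pc'] = int(rec['pc'])      # sign is kept as printed in the log
    rec['k'] = int(rec['k'])
    rec['K'] = int(rec['K'])
    rec['pval'] = float(rec['pval'])
    return rec

# Three lines copied from the log above; the last one is not a signature line.
example_lines = [
    'Info: BP: DNA replication (GO:0006260) [8:43/112, p=8.6e-19]',
    'Info: CC: stress fiber (GO:0001725) [-15:6/24, p=1.0e-07]',
    'Info: Total no. of signatures so far: 50',
]
records = [r for r in map(parse_signature_line, example_lines) if r is not None]

# Number of parsed signatures per PC, ignoring the sign.
per_pc = Counter(abs(r['pc']) for r in records)
```

Applied to the full captured log, the same function could be used to sort the final list of signatures by PC or by p-value.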

Plotting the signature matrix



In [13]:

    
from IPython.display import Image

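# resolution of the output image (passed to the plotting script via -r)
dpi = 90.0

# plot the signature matrix and save it as an image file
!gopca_plot_signature_matrix.py -g "$gopca_file" -o "$signature_matrix_image_file" -r $dpi -t
# display the saved image in the notebook
Image(filename=signature_matrix_image_file,width=600)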









    



Clustering of samples... done!
Plotting... done!
Saving to file... done!






    Out[13]:

[Image: GO-PCA signature matrix, as saved to "dmap_signatures_matrix.png"]

Save signatures to a tab-delimited file



In [14]:

    
!gopca_extract_signatures.py -g "$gopca_file" -o "$signature_file"









    



Wrote 50 signatures to "dmap_signatures.tsv".
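
To check the exported file from within the notebook, it can be read back as plain tab-delimited text. This sketch makes no assumption about the columns that gopca_extract_signatures.py writes; it only assumes one record per line with tab-separated fields. The helper name read_tsv_rows is hypothetical.

```python
import os

def read_tsv_rows(path):
    """Read a tab-delimited text file; return non-empty lines as lists of fields."""
    with open(path) as fh:
        return [line.rstrip('\r\n').split('\t') for line in fh if line.strip()]

# Same value as in the configuration cell; the check avoids an error
# if the extraction step has not been run yet.
signature_file = 'dmap_signatures.tsv'
if os.path.exists(signature_file):
    rows = read_tsv_rows(signature_file)
    if rows:
        print('%d rows, %d fields in the first row' % (len(rows), len(rows[0])))
```

The row count includes a header line if the file has one, so it need not equal the number of signatures reported above.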

Save signatures to an Excel spreadsheet



In [15]:

    
!gopca_extract_signatures_excel.py -g "$gopca_file" -o "$signature_excel_file"









    



Wrote 50 signatures to "dmap_signatures.xlsx".

Copyright and License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.