GO-PCA Demo "DMAP"

Author: Florian Wagner
Email: florian.wagner@duke.edu

This notebook demonstrates the application of GO-PCA (Wagner, 2015) to the DMAP dataset (Novershtern et al., 2011). For each step of the analysis, the notebook shows the commands used, as well as the corresponding output. To run this notebook yourself, you first need to install GO-PCA.


In [44]:
# print package versions
from pkg_resources import require

print 'Package versions'
print '----------------'
print require('numpy')[0]
print require('scipy')[0]
print require('scikit-learn')[0]
print require('matplotlib')[0]
print
print require('genometools')[0]
print require('goparser')[0]
print require('xlmhg')[0]
print require('gopca')[0]


Package versions
----------------
numpy 1.10.1
scipy 0.15.1
scikit-learn 0.16.1
matplotlib 1.4.3

genometools 1.2rc4
goparser 1.1.2
xlmhg 1.1rc3
gopca 1.1rc12
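
The version check above relies on Python 2 print statements and pkg_resources. As a rough sketch only, and assuming a Python 3.8+ environment (which is not what this notebook was run on), the same report could be produced with the standard-library importlib.metadata module. The helper names package_version and version_report are made up for this illustration and are not part of GO-PCA.

```python
from importlib.metadata import PackageNotFoundError, version


def package_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def version_report(names):
    """Build one 'name version' line per distribution (hypothetical helper)."""
    lines = []
    for name in names:
        found = package_version(name)
        lines.append('%s %s' % (name, found if found is not None else '(not installed)'))
    return lines


# Same distributions as in the cell above.
packages = ['numpy', 'scipy', 'scikit-learn', 'matplotlib',
            'genometools', 'goparser', 'xlmhg', 'gopca']
for line in version_report(packages):
    print(line)
```

Packages that are missing are reported as "(not installed)" instead of raising an exception, which makes it easier to see what still needs to be installed before running the rest of the notebook.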

Configuration (you don't have to change this)


In [45]:
# location of data files
expression_file = 'dmap_expression.tsv'
gene_annotation_file = 'Homo_sapiens.GRCh38.79.gtf.gz' # Ensembl annotations of the human genome
gene_ontology_file = 'go-basic_2015-05-25.obo'
gene_association_file = 'gene_association.goa_human_2015-05-26.gz'

# location of output files
gene_file = 'protein_coding_genes_human.tsv'
go_annotation_file = 'go_annotations_human.tsv'
gopca_file = 'dmap_gopca.pickle'
signature_matrix_plot_file = 'dmap_signatures_matrix.png'
signature_file = 'dmap_signatures.tsv'
signature_excel_file = 'dmap_signatures.xlsx'
signature_plot_file = 'dmap_signature.png'
term_by_pc_plot_file = 'dmap_term_by_pc.png'
matlab_file = 'dmap_gopca.mat'

Downloading the data

The notebook, when executed, will attempt to automatically download the data to the locations specified in the configuration section (see above) using curl. If you don't have curl installed, or you're not working on Linux, you can download the data yourself using the direct download links provided.
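
If curl is not available, the downloads can also be done from Python. The following is a minimal sketch, assuming Python 3 and its standard-library urllib (the notebook itself was run under Python 2). The helper name download_if_missing is hypothetical; the URLs and file names are the ones used in the curl cells of this notebook.

```python
import os
import shutil
from urllib.request import urlopen

# URL -> local file name, copied from the curl cells of this notebook.
DOWNLOADS = {
    'https://www.dropbox.com/s/vjfovywu2omobti/dmap_expression.tsv?dl=1':
        'dmap_expression.tsv',
    'ftp://ftp.ensembl.org/pub/release-79/gtf/homo_sapiens/Homo_sapiens.GRCh38.79.gtf.gz':
        'Homo_sapiens.GRCh38.79.gtf.gz',
    'http://viewvc.geneontology.org/viewvc/GO-SVN/ontology-releases/2015-05-25/go-basic.obo?revision=26059':
        'go-basic_2015-05-25.obo',
    'ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/old/HUMAN/gene_association.goa_human.145.gz':
        'gene_association.goa_human_2015-05-26.gz',
}


def download_if_missing(url, path):
    """Download url to path unless path already exists.

    Returns True if a download took place, False if the file was already there.
    """
    if os.path.exists(path):
        return False
    partial = path + '.part'
    # Write to a temporary name first so an interrupted transfer
    # does not leave a truncated file under the final name.
    with urlopen(url) as response, open(partial, 'wb') as out:
        shutil.copyfileobj(response, out)
    os.replace(partial, path)
    return True


def download_all():
    """Fetch every file listed in DOWNLOADS (not called automatically)."""
    for url, path in DOWNLOADS.items():
        print(path, 'downloaded' if download_if_missing(url, path) else 'already present')
```

Calling download_all() fetches the four files into the current directory and skips any that already exist. Whether these servers still host the files at these addresses is not something this sketch can guarantee.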

We're first downloading a version of the DMAP dataset that I have filtered to only include genes that I could identify as known protein-coding genes (8,528/8,968). (direct download)


In [46]:
!curl -L -o "$expression_file" \
        "https://www.dropbox.com/s/vjfovywu2omobti/dmap_expression.tsv?dl=1"


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   506    0   506    0     0    822      0 --:--:-- --:--:-- --:--:--   850
100 14.6M  100 14.6M    0     0   514k      0  0:00:29  0:00:29 --:--:--  462k

Next, we're downloading the Ensembl human genome annotations (direct download).


In [47]:
!curl -o "$gene_annotation_file" \
        "ftp://ftp.ensembl.org/pub/release-79/gtf/homo_sapiens/Homo_sapiens.GRCh38.79.gtf.gz"


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 42.3M  100 42.3M    0     0  5756k      0  0:00:07  0:00:07 --:--:-- 8993k

Next, we're downloading the Gene Ontology (direct download).


In [48]:
!curl -o "$gene_ontology_file" \
        "http://viewvc.geneontology.org/viewvc/GO-SVN/ontology-releases/2015-05-25/go-basic.obo?revision=26059"


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 29.1M    0 29.1M    0     0  2867k      0 --:--:--  0:00:10 --:--:-- 4260k

And finally, we're downloading the human UniProt-GOA gene association file (direct download).


In [49]:
!curl -o "$gene_association_file" "ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/old/HUMAN/gene_association.goa_human.145.gz"


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 6265k  100 6265k    0     0  1363k      0  0:00:04  0:00:04 --:--:-- 1507k

Generating the GO annotation file for GO-PCA

We first generate a list of protein-coding genes (using the "ensembl_extract_protein_coding_genes.py" script from the genometools package), and then use the GO-PCA script "gopca_extract_go_annotations.py" to generate the GO annotation file that GO-PCA depends on.


In [50]:
# generate list of protein-coding genes using a script from the "genometools" Python package
!ensembl_extract_protein_coding_genes.py -a "$gene_annotation_file" -o "$gene_file"


[2015-12-26 11:16:47] INFO: Regular expression used for filtering chromosome names: "(?:\d\d?|MT|X|Y)$"
[2015-12-26 11:16:47] INFO: Parsing data...
[2015-12-26 11:17:28] INFO: done (parsed 2720535 lines).
[2015-12-26 11:17:28] INFO: 
[2015-12-26 11:17:28] INFO: Gene chromosomes (25):
[2015-12-26 11:17:28] INFO: 	1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21, 22, 3, 4, 5, 6, 7, 8, 9, MT, X, Y
[2015-12-26 11:17:28] INFO: 
[2015-12-26 11:17:28] INFO: Excluded chromosomes (225):
[2015-12-26 11:17:28] INFO: 	CHR_HG126_PATCH, CHR_HG1362_PATCH, CHR_HG142_HG150_NOVEL_TEST, CHR_HG151_NOVEL_TEST, CHR_HG1832_PATCH, CHR_HG2021_PATCH, CHR_HG2030_PATCH, CHR_HG2058_PATCH, CHR_HG2066_PATCH, CHR_HG2095_PATCH, CHR_HG2104_PATCH, CHR_HG2191_PATCH, CHR_HG2217_PATCH, CHR_HG2232_PATCH, CHR_HG2233_PATCH, CHR_HG2247_PATCH, CHR_HG2288_HG2289_PATCH, CHR_HG2291_PATCH, CHR_HG986_PATCH, CHR_HSCHR10_1_CTG1, CHR_HSCHR10_1_CTG2, CHR_HSCHR10_1_CTG4, CHR_HSCHR11_1_CTG1_2, CHR_HSCHR11_1_CTG5, CHR_HSCHR11_1_CTG6, CHR_HSCHR11_1_CTG7, CHR_HSCHR11_1_CTG8, CHR_HSCHR11_2_CTG1, CHR_HSCHR11_2_CTG1_1, CHR_HSCHR11_3_CTG1, CHR_HSCHR12_1_CTG1, CHR_HSCHR12_1_CTG2_1, CHR_HSCHR12_2_CTG2, CHR_HSCHR12_3_CTG2, CHR_HSCHR12_3_CTG2_1, CHR_HSCHR12_4_CTG2, CHR_HSCHR12_5_CTG2, CHR_HSCHR12_5_CTG2_1, CHR_HSCHR12_6_CTG2_1, CHR_HSCHR13_1_CTG1, CHR_HSCHR13_1_CTG3, CHR_HSCHR14_1_CTG1, CHR_HSCHR14_2_CTG1, CHR_HSCHR14_3_CTG1, CHR_HSCHR14_7_CTG1, CHR_HSCHR15_1_CTG1, CHR_HSCHR15_1_CTG3, CHR_HSCHR15_1_CTG8, CHR_HSCHR15_2_CTG3, CHR_HSCHR15_2_CTG8, CHR_HSCHR15_3_CTG3, CHR_HSCHR15_3_CTG8, CHR_HSCHR15_4_CTG8, CHR_HSCHR15_5_CTG8, CHR_HSCHR16_1_CTG1, CHR_HSCHR16_1_CTG3_1, CHR_HSCHR16_2_CTG3_1, CHR_HSCHR16_3_CTG1, CHR_HSCHR16_4_CTG1, CHR_HSCHR16_CTG2, CHR_HSCHR17_10_CTG4, CHR_HSCHR17_1_CTG1, CHR_HSCHR17_1_CTG2, CHR_HSCHR17_1_CTG4, CHR_HSCHR17_1_CTG5, CHR_HSCHR17_1_CTG9, CHR_HSCHR17_2_CTG1, CHR_HSCHR17_2_CTG2, CHR_HSCHR17_2_CTG5, CHR_HSCHR17_3_CTG2, CHR_HSCHR17_4_CTG4, CHR_HSCHR17_5_CTG4, CHR_HSCHR17_6_CTG4, CHR_HSCHR17_7_CTG4, CHR_HSCHR17_8_CTG4, CHR_HSCHR18_1_CTG1_1, CHR_HSCHR18_2_CTG2, CHR_HSCHR18_2_CTG2_1, CHR_HSCHR18_ALT2_CTG2_1, CHR_HSCHR19KIR_ABC08_A1_HAP_CTG3_1, CHR_HSCHR19KIR_ABC08_AB_HAP_C_P_CTG3_1, CHR_HSCHR19KIR_ABC08_AB_HAP_T_P_CTG3_1, CHR_HSCHR19KIR_FH05_A_HAP_CTG3_1, CHR_HSCHR19KIR_FH05_B_HAP_CTG3_1, CHR_HSCHR19KIR_FH06_A_HAP_CTG3_1, CHR_HSCHR19KIR_FH06_BA1_HAP_CTG3_1, CHR_HSCHR19KIR_FH08_A_HAP_CTG3_1, CHR_HSCHR19KIR_FH08_BAX_HAP_CTG3_1, CHR_HSCHR19KIR_FH13_A_HAP_CTG3_1, 
CHR_HSCHR19KIR_FH13_BA2_HAP_CTG3_1, CHR_HSCHR19KIR_FH15_A_HAP_CTG3_1, CHR_HSCHR19KIR_FH15_B_HAP_CTG3_1, CHR_HSCHR19KIR_G085_A_HAP_CTG3_1, CHR_HSCHR19KIR_G085_BA1_HAP_CTG3_1, CHR_HSCHR19KIR_G248_A_HAP_CTG3_1, CHR_HSCHR19KIR_G248_BA2_HAP_CTG3_1, CHR_HSCHR19KIR_GRC212_AB_HAP_CTG3_1, CHR_HSCHR19KIR_GRC212_BA1_HAP_CTG3_1, CHR_HSCHR19KIR_LUCE_A_HAP_CTG3_1, CHR_HSCHR19KIR_LUCE_BDEL_HAP_CTG3_1, CHR_HSCHR19KIR_RP5_B_HAP_CTG3_1, CHR_HSCHR19KIR_RSH_A_HAP_CTG3_1, CHR_HSCHR19KIR_RSH_BA2_HAP_CTG3_1, CHR_HSCHR19KIR_T7526_A_HAP_CTG3_1, CHR_HSCHR19KIR_T7526_BDEL_HAP_CTG3_1, CHR_HSCHR19LRC_COX1_CTG3_1, CHR_HSCHR19LRC_COX2_CTG3_1, CHR_HSCHR19LRC_LRC_I_CTG3_1, CHR_HSCHR19LRC_LRC_J_CTG3_1, CHR_HSCHR19LRC_LRC_S_CTG3_1, CHR_HSCHR19LRC_LRC_T_CTG3_1, CHR_HSCHR19LRC_PGF1_CTG3_1, CHR_HSCHR19LRC_PGF2_CTG3_1, CHR_HSCHR19_1_CTG2, CHR_HSCHR19_1_CTG3_1, CHR_HSCHR19_2_CTG2, CHR_HSCHR19_3_CTG2, CHR_HSCHR19_3_CTG3_1, CHR_HSCHR19_4_CTG2, CHR_HSCHR19_4_CTG3_1, CHR_HSCHR19_5_CTG2, CHR_HSCHR1_1_CTG3, CHR_HSCHR1_1_CTG31, CHR_HSCHR1_1_CTG32_1, CHR_HSCHR1_2_CTG3, CHR_HSCHR1_2_CTG31, CHR_HSCHR1_2_CTG32_1, CHR_HSCHR1_3_CTG31, CHR_HSCHR1_3_CTG32_1, CHR_HSCHR1_4_CTG31, CHR_HSCHR1_ALT2_1_CTG32_1, CHR_HSCHR20_1_CTG2, CHR_HSCHR20_1_CTG3, CHR_HSCHR20_1_CTG4, CHR_HSCHR21_3_CTG1_1, CHR_HSCHR21_4_CTG1_1, CHR_HSCHR21_5_CTG2, CHR_HSCHR21_6_CTG1_1, CHR_HSCHR22_1_CTG1, CHR_HSCHR22_1_CTG2, CHR_HSCHR22_1_CTG3, CHR_HSCHR22_1_CTG4, CHR_HSCHR22_1_CTG5, CHR_HSCHR22_1_CTG7, CHR_HSCHR22_2_CTG1, CHR_HSCHR22_3_CTG1, CHR_HSCHR22_4_CTG1, CHR_HSCHR22_5_CTG1, CHR_HSCHR2_1_CTG1, CHR_HSCHR2_1_CTG5, CHR_HSCHR2_1_CTG7_2, CHR_HSCHR2_2_CTG7_2, CHR_HSCHR2_3_CTG1, CHR_HSCHR2_3_CTG15, CHR_HSCHR2_4_CTG1, CHR_HSCHR3_1_CTG1, CHR_HSCHR3_1_CTG2_1, CHR_HSCHR3_1_CTG3, CHR_HSCHR3_2_CTG2_1, CHR_HSCHR3_2_CTG3, CHR_HSCHR3_3_CTG3, CHR_HSCHR3_4_CTG3, CHR_HSCHR3_5_CTG3, CHR_HSCHR3_6_CTG3, CHR_HSCHR3_7_CTG3, CHR_HSCHR3_8_CTG3, CHR_HSCHR4_1_CTG12, CHR_HSCHR4_1_CTG4, CHR_HSCHR4_1_CTG9, CHR_HSCHR4_6_CTG12, CHR_HSCHR5_1_CTG1_1, CHR_HSCHR5_2_CTG1_1, 
CHR_HSCHR5_2_CTG5, CHR_HSCHR5_3_CTG1, CHR_HSCHR5_3_CTG5, CHR_HSCHR5_4_CTG1, CHR_HSCHR5_5_CTG1, CHR_HSCHR5_6_CTG1, CHR_HSCHR6_1_CTG4, CHR_HSCHR6_1_CTG5, CHR_HSCHR6_1_CTG8, CHR_HSCHR6_8_CTG1, CHR_HSCHR6_MHC_APD_CTG1, CHR_HSCHR6_MHC_COX_CTG1, CHR_HSCHR6_MHC_DBB_CTG1, CHR_HSCHR6_MHC_MANN_CTG1, CHR_HSCHR6_MHC_MCF_CTG1, CHR_HSCHR6_MHC_QBL_CTG1, CHR_HSCHR6_MHC_SSTO_CTG1, CHR_HSCHR7_1_CTG1, CHR_HSCHR7_1_CTG4_4, CHR_HSCHR7_1_CTG6, CHR_HSCHR7_2_CTG4_4, CHR_HSCHR7_2_CTG6, CHR_HSCHR7_3_CTG6, CHR_HSCHR8_2_CTG7, CHR_HSCHR8_3_CTG1, CHR_HSCHR8_3_CTG7, CHR_HSCHR8_4_CTG7, CHR_HSCHR8_5_CTG1, CHR_HSCHR8_5_CTG7, CHR_HSCHR8_7_CTG1, CHR_HSCHR8_8_CTG1, CHR_HSCHR8_9_CTG1, CHR_HSCHR9_1_CTG2, CHR_HSCHR9_1_CTG3, CHR_HSCHR9_1_CTG5, CHR_HSCHRX_1_CTG3, CHR_HSCHRX_2_CTG12, CHR_HSCHRX_2_CTG3, GL000009.2, GL000194.1, GL000195.1, GL000205.2, GL000213.1, GL000218.1, GL000219.1, KI270711.1, KI270713.1, KI270721.1, KI270726.1, KI270727.1, KI270728.1, KI270731.1, KI270734.1
[2015-12-26 11:17:28] INFO: 
[2015-12-26 11:17:28] INFO: Gene sources:
[2015-12-26 11:17:28] INFO: 	ensembl_havana: 18873
[2015-12-26 11:17:28] INFO: 	havana: 733
[2015-12-26 11:17:28] INFO: 	ensembl: 236
[2015-12-26 11:17:28] INFO: 	insdc: 13
[2015-12-26 11:17:28] INFO: 
[2015-12-26 11:17:28] INFO: Gene types:
[2015-12-26 11:17:28] INFO: 	protein_coding: 19796
[2015-12-26 11:17:28] INFO: 	polymorphic_pseudogene: 59
[2015-12-26 11:17:28] INFO: 
[2015-12-26 11:17:28] INFO: # Genes with redundant annotations: 112
[2015-12-26 11:17:28] INFO: 
[2015-12-26 11:17:28] INFO: Polymorphic pseudogenes (59): AKR7L (1), C6orf183 (1), CASP12 (1), CYP2D7 (1), FBXL21 (1), FCGR2C (1), GBA3 (1), GSTT2 (1), IFNL4 (1), KIR2DS4 (1), MROH5 (1), NAT8B (1), OR10A6 (1), OR10AC1 (1), OR10C1 (1), OR10J4 (1), OR10X1 (1), OR11H7 (1), OR12D1 (1), OR12D2 (1), OR13C7 (1), OR1B1 (1), OR1S1 (1), OR2AG1 (1), OR2F1 (1), OR2J1 (1), OR2L8 (1), OR2S2 (1), OR2T11 (1), OR4A8 (1), OR4C16 (1), OR4Q2 (1), OR4X1 (1), OR4X2 (1), OR51B2 (1), OR51F1 (1), OR51G1 (1), OR52B4 (1), OR52E1 (1), OR52R1 (1), OR52Z1 (1), OR5AC1 (1), OR5AL1 (1), OR5AR1 (1), OR5D13 (1), OR5G3 (1), OR5H6 (1), OR5H8 (1), OR5L1 (1), OR5R1 (1), OR6Q1 (1), OR8B4 (1), OR8D2 (1), OR8J2 (1), OR8K3 (1), PKD1L2 (1), PNLIPRP2 (1), SERPINA2 (1), TUBB8P7 (1)
[2015-12-26 11:17:28] INFO: 
[2015-12-26 11:17:28] INFO: Total protein-coding genes: 19742
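
To illustrate the kind of filtering reported in the log above, here is a simplified sketch. It is not the genometools implementation. It assumes that the chromosome regular expression from the log is applied with re.match (anchored at the start of the name), that gene records carry gene_name and gene_biotype attributes as in Ensembl GTF files, and it ignores polymorphic pseudogenes and redundant annotations. The function name and the example records are made up.

```python
import re

# Regular expression quoted in the log output above.
CHROM_PATTERN = re.compile(r'(?:\d\d?|MT|X|Y)$')


def protein_coding_gene_names(gtf_lines):
    """Return sorted names of protein-coding genes on the main chromosomes."""
    names = set()
    for line in gtf_lines:
        if line.startswith('#'):
            continue  # header/comment line
        fields = line.rstrip('\n').split('\t')
        if len(fields) < 9 or fields[2] != 'gene':
            continue  # only look at gene-level records
        if CHROM_PATTERN.match(fields[0]) is None:
            continue  # patches, alternative loci, unplaced scaffolds
        # Attribute column looks like: key "value"; key "value"; ...
        attrs = dict(re.findall(r'(\S+) "([^"]*)"', fields[8]))
        if attrs.get('gene_biotype') == 'protein_coding' and 'gene_name' in attrs:
            names.add(attrs['gene_name'])
    return sorted(names)


# Made-up records for illustration (not taken from the real annotation file).
example_lines = [
    '#!genome-build GRCh38',
    '1\tensembl_havana\tgene\t100\t200\t.\t+\t.\tgene_id "ID_A"; gene_name "GENE_A"; gene_biotype "protein_coding";',
    'CHR_HG126_PATCH\thavana\tgene\t100\t200\t.\t+\t.\tgene_id "ID_B"; gene_name "GENE_B"; gene_biotype "protein_coding";',
    'X\thavana\tgene\t100\t200\t.\t-\t.\tgene_id "ID_C"; gene_name "GENE_C"; gene_biotype "lincRNA";',
    '1\tensembl_havana\ttranscript\t100\t200\t.\t+\t.\tgene_id "ID_D"; gene_name "GENE_D"; gene_biotype "protein_coding";',
]
print(protein_coding_gene_names(example_lines))  # prints ['GENE_A']
```

Only GENE_A survives: GENE_B lies on a patch sequence, GENE_C is not protein-coding, and GENE_D appears only in a transcript-level record.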

In [51]:
# extract GO annotations
# select annotations with "high-quality" evidence codes
# only keep GO terms that have 5-200 genes annotated with them
min_genes = 5 
max_genes = 200

evidence = ['IDA','IGI','IMP','ISO','ISS','IC','NAS','TAS'] # only include manually curated evidence
ev_str = ' '.join(evidence)

!gopca_extract_go_annotations.py -g "$gene_file" -t "$gene_ontology_file" -a "$gene_association_file" \
        -o "$go_annotation_file" \
        -e $ev_str --min-genes-per-term $min_genes --max-genes-per-term $max_genes --part-of-cc-only


[2015-12-26 11:17:29] INFO: Read 19742 genes.
[2015-12-26 11:17:30] INFO: Parsed 43122 GO term definitions.
[2015-12-26 11:17:30] INFO: Adding child and part relationships...
[2015-12-26 11:17:30] INFO: Flattening ancestors...
[2015-12-26 11:17:38] INFO: Flattening descendants...
[2015-12-26 11:17:46] INFO: Read 19742 genes.
[2015-12-26 11:17:46] INFO: Parsing annotations...
[2015-12-26 11:17:53] INFO: Parsed 476648 positive GO annotations (264131 = 55.4% excluded based on evidence type).
[2015-12-26 11:17:53] WARNING: Warning: 7348 annotations with 337 unkonwn gene names.
[2015-12-26 11:17:53] INFO: Found a total of 205169 valid annotations.
[2015-12-26 11:17:53] INFO: 135340 unique Gene-Term associations.
[2015-12-26 11:17:53] INFO: Obtaining GO term associations...
[2015-12-26 11:17:57] INFO: Testing for perfect overlap...
[2015-12-26 11:18:03] INFO: # affected terms: 1114
[2015-12-26 11:18:03] INFO: # perfectly redundant descendant terms: 582
[2015-12-26 11:18:03] INFO: Selected 6675 / 7257 non-redundant GO terms.
[2015-12-26 11:18:03] INFO: Writing output file...
[2015-12-26 11:18:03] INFO: done!
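
The two selection criteria used in the cell above (evidence codes and the number of genes per term) can be sketched in a few lines. This is a simplified illustration, not the code of gopca_extract_go_annotations.py. Judging from the log, the real script also propagates annotations through the ontology and removes perfectly redundant terms, both of which are omitted here. The function name select_terms and the toy annotations are made up.

```python
from collections import defaultdict

# Evidence codes from the cell above.
EVIDENCE = ['IDA', 'IGI', 'IMP', 'ISO', 'ISS', 'IC', 'NAS', 'TAS']


def select_terms(annotations, evidence=EVIDENCE, min_genes=5, max_genes=200):
    """Keep GO terms whose number of annotated genes lies in [min_genes, max_genes].

    annotations is an iterable of (gene, term, evidence_code) tuples.
    Annotations with an evidence code outside `evidence` are ignored.
    """
    allowed = set(evidence)
    genes_by_term = defaultdict(set)
    for gene, term, code in annotations:
        if code in allowed:
            genes_by_term[term].add(gene)
    return {term: genes for term, genes in genes_by_term.items()
            if min_genes <= len(genes) <= max_genes}


# Toy data with made-up gene and term identifiers.
toy = [
    ('G1', 'GO:A', 'IDA'), ('G2', 'GO:A', 'IMP'), ('G3', 'GO:A', 'IEA'),
    ('G1', 'GO:B', 'TAS'),
    ('G1', 'GO:C', 'IDA'), ('G2', 'GO:C', 'IDA'), ('G3', 'GO:C', 'ISS'), ('G4', 'GO:C', 'IC'),
]
selected = select_terms(toy, min_genes=2, max_genes=3)
print(sorted(selected))  # prints ['GO:A']
```

In the toy example, GO:A keeps two genes because the IEA annotation is not in the evidence list, GO:B has too few genes, and GO:C has too many.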

In [52]:
# run GO-PCA
!go-pca.py -L 1000 -e "$expression_file" -t "$gene_ontology_file" -a "$go_annotation_file" -o "$gopca_file" \
        -ps 123456789 --go-part-of-cc-only


[2015-12-26 11:18:04] INFO: Timestamp: 2015-12-26 16:18:04.480761
[2015-12-26 11:18:04] INFO: Expression file hash: 1729bdf9de9c98dcdd87d6850e73909b
[2015-12-26 11:18:04] INFO: Reading expression data...
[2015-12-26 11:18:06] INFO: Expression matrix size: (p = 8528 genes) x (n = 211 samples).
[2015-12-26 11:18:06] INFO: Gene ontology file hash: a5623a26da07171db485634ae214eef9
[2015-12-26 11:18:06] INFO: Reading ontology...
[2015-12-26 11:18:19] INFO: GO annotation file hash: cd2e3ad89e31f823ece6c784c04c8ceb
[2015-12-26 11:18:19] INFO: Reading GO annotations...
[2015-12-26 11:18:19] INFO: Estimating the number of principal components (seed = 123456789)...
[2015-12-26 11:18:34] INFO: The estimated number of PCs is 15.
[2015-12-26 11:18:34] INFO: Generating gene x GO term matrix...
[2015-12-26 11:18:35] INFO: Performing PCA...
[2015-12-26 11:18:36] INFO: Cumulative fraction of variance explained by the first 15 PCs: 80.5%
[2015-12-26 11:18:36] INFO: 
[2015-12-26 11:18:36] INFO: ----------------------------------------------------------------------
[2015-12-26 11:18:36] INFO: PC 1 explains 24.3% of the variance.
[2015-12-26 11:18:36] INFO: The new cumulative fraction of variance explained is 24.3%.
[2015-12-26 11:18:36] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:36] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:36] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:36] INFO: 0 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:36] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:36] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:36] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:36] INFO: 8 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:36] INFO: Kept 8 / 8 enriched terms with E-score >= 2.0
[2015-12-26 11:18:37] INFO: Local filter: Kept 2 / 8 enriched terms.
[2015-12-26 11:18:37] INFO: Generated 2 signatures based on the enriched GO terms.
[2015-12-26 11:18:37] INFO: # signatures: 2
[2015-12-26 11:18:37] INFO: Global filter: kept 2 / 2 signatures.
[2015-12-26 11:18:37] INFO: Total no. of signatures so far: 2
[2015-12-26 11:18:37] INFO: 
[2015-12-26 11:18:37] INFO: ----------------------------------------------------------------------
[2015-12-26 11:18:37] INFO: PC 2 explains 16.9% of the variance.
[2015-12-26 11:18:37] INFO: The new cumulative fraction of variance explained is 41.2%.
[2015-12-26 11:18:37] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:37] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:38] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:38] INFO: 37 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:38] INFO: Kept 37 / 37 enriched terms with E-score >= 2.0
[2015-12-26 11:18:38] INFO: Local filter: Kept 6 / 37 enriched terms.
[2015-12-26 11:18:38] INFO: Generated 6 signatures based on the enriched GO terms.
[2015-12-26 11:18:39] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:39] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:39] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:39] INFO: 6 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:39] INFO: Kept 6 / 6 enriched terms with E-score >= 2.0
[2015-12-26 11:18:39] INFO: Local filter: Kept 2 / 6 enriched terms.
[2015-12-26 11:18:39] INFO: Generated 2 signatures based on the enriched GO terms.
[2015-12-26 11:18:39] INFO: # signatures: 8
[2015-12-26 11:18:39] INFO: Global filter: kept 8 / 8 signatures.
[2015-12-26 11:18:39] INFO: Total no. of signatures so far: 10
[2015-12-26 11:18:39] INFO: 
[2015-12-26 11:18:39] INFO: ----------------------------------------------------------------------
[2015-12-26 11:18:39] INFO: PC 3 explains 11.5% of the variance.
[2015-12-26 11:18:39] INFO: The new cumulative fraction of variance explained is 52.8%.
[2015-12-26 11:18:39] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:40] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:40] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:40] INFO: 39 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:40] INFO: Kept 39 / 39 enriched terms with E-score >= 2.0
[2015-12-26 11:18:40] INFO: Local filter: Kept 6 / 39 enriched terms.
[2015-12-26 11:18:40] INFO: Generated 6 signatures based on the enriched GO terms.
[2015-12-26 11:18:41] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:41] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:41] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:41] INFO: 3 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:41] INFO: Kept 3 / 3 enriched terms with E-score >= 2.0
[2015-12-26 11:18:41] INFO: Local filter: Kept 2 / 3 enriched terms.
[2015-12-26 11:18:41] INFO: Generated 2 signatures based on the enriched GO terms.
[2015-12-26 11:18:41] INFO: # signatures: 8
[2015-12-26 11:18:41] INFO: Global filter: kept 8 / 8 signatures.
[2015-12-26 11:18:41] INFO: Total no. of signatures so far: 18
[2015-12-26 11:18:41] INFO: 
[2015-12-26 11:18:41] INFO: ----------------------------------------------------------------------
[2015-12-26 11:18:41] INFO: PC 4 explains 8.5% of the variance.
[2015-12-26 11:18:41] INFO: The new cumulative fraction of variance explained is 61.3%.
[2015-12-26 11:18:41] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:42] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:42] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:42] INFO: 62 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:42] INFO: Kept 62 / 62 enriched terms with E-score >= 2.0
[2015-12-26 11:18:44] INFO: Local filter: Kept 9 / 62 enriched terms.
[2015-12-26 11:18:44] INFO: Generated 9 signatures based on the enriched GO terms.
[2015-12-26 11:18:44] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:44] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:44] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:44] INFO: 3 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:44] INFO: Kept 3 / 3 enriched terms with E-score >= 2.0
[2015-12-26 11:18:44] INFO: Local filter: Kept 1 / 3 enriched terms.
[2015-12-26 11:18:44] INFO: Generated 1 signatures based on the enriched GO terms.
[2015-12-26 11:18:44] INFO: # signatures: 10
[2015-12-26 11:18:44] INFO: Global filter: kept 9 / 10 signatures.
[2015-12-26 11:18:44] INFO: Total no. of signatures so far: 27
[2015-12-26 11:18:44] INFO: 
[2015-12-26 11:18:44] INFO: ----------------------------------------------------------------------
[2015-12-26 11:18:44] INFO: PC 5 explains 5.7% of the variance.
[2015-12-26 11:18:44] INFO: The new cumulative fraction of variance explained is 67.0%.
[2015-12-26 11:18:44] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:45] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:45] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:45] INFO: 28 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:45] INFO: Kept 28 / 28 enriched terms with E-score >= 2.0
[2015-12-26 11:18:45] INFO: Local filter: Kept 5 / 28 enriched terms.
[2015-12-26 11:18:45] INFO: Generated 5 signatures based on the enriched GO terms.
[2015-12-26 11:18:46] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:46] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:46] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:46] INFO: 19 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:46] INFO: Kept 19 / 19 enriched terms with E-score >= 2.0
[2015-12-26 11:18:46] INFO: Local filter: Kept 3 / 19 enriched terms.
[2015-12-26 11:18:46] INFO: Generated 3 signatures based on the enriched GO terms.
[2015-12-26 11:18:46] INFO: # signatures: 8
[2015-12-26 11:18:46] INFO: Global filter: kept 3 / 8 signatures.
[2015-12-26 11:18:46] INFO: Total no. of signatures so far: 30
[2015-12-26 11:18:46] INFO: 
[2015-12-26 11:18:46] INFO: ----------------------------------------------------------------------
[2015-12-26 11:18:46] INFO: PC 6 explains 2.5% of the variance.
[2015-12-26 11:18:46] INFO: The new cumulative fraction of variance explained is 69.5%.
[2015-12-26 11:18:46] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:47] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:47] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:47] INFO: 20 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:47] INFO: Kept 20 / 20 enriched terms with E-score >= 2.0
[2015-12-26 11:18:47] INFO: Local filter: Kept 5 / 20 enriched terms.
[2015-12-26 11:18:47] INFO: Generated 5 signatures based on the enriched GO terms.
[2015-12-26 11:18:47] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:47] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:47] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:47] INFO: 0 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:47] INFO: # signatures: 5
[2015-12-26 11:18:47] INFO: Global filter: kept 3 / 5 signatures.
[2015-12-26 11:18:47] INFO: Total no. of signatures so far: 33
[2015-12-26 11:18:47] INFO: 
[2015-12-26 11:18:47] INFO: ----------------------------------------------------------------------
[2015-12-26 11:18:47] INFO: PC 7 explains 2.2% of the variance.
[2015-12-26 11:18:47] INFO: The new cumulative fraction of variance explained is 71.7%.
[2015-12-26 11:18:48] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:48] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:48] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:48] INFO: 0 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:48] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:48] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:48] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:48] INFO: 23 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:48] INFO: Kept 23 / 23 enriched terms with E-score >= 2.0
[2015-12-26 11:18:49] INFO: Local filter: Kept 6 / 23 enriched terms.
[2015-12-26 11:18:49] INFO: Generated 6 signatures based on the enriched GO terms.
[2015-12-26 11:18:49] INFO: # signatures: 6
[2015-12-26 11:18:49] INFO: Global filter: kept 2 / 6 signatures.
[2015-12-26 11:18:49] INFO: Total no. of signatures so far: 35
[2015-12-26 11:18:49] INFO: 
[2015-12-26 11:18:49] INFO: ----------------------------------------------------------------------
[2015-12-26 11:18:49] INFO: PC 8 explains 1.8% of the variance.
[2015-12-26 11:18:49] INFO: The new cumulative fraction of variance explained is 73.5%.
[2015-12-26 11:18:49] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:50] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:50] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:50] INFO: 66 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:50] INFO: Kept 66 / 66 enriched terms with E-score >= 2.0
[2015-12-26 11:18:51] INFO: Local filter: Kept 9 / 66 enriched terms.
[2015-12-26 11:18:52] INFO: Generated 9 signatures based on the enriched GO terms.
[2015-12-26 11:18:52] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:52] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:52] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:52] INFO: 1 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:52] INFO: Kept 1 / 1 enriched terms with E-score >= 2.0
[2015-12-26 11:18:52] INFO: Generated 1 signatures based on the enriched GO terms.
[2015-12-26 11:18:52] INFO: # signatures: 10
[2015-12-26 11:18:52] INFO: Global filter: kept 8 / 10 signatures.
[2015-12-26 11:18:52] INFO: Total no. of signatures so far: 43
[2015-12-26 11:18:52] INFO: 
[2015-12-26 11:18:52] INFO: ----------------------------------------------------------------------
[2015-12-26 11:18:52] INFO: PC 9 explains 1.3% of the variance.
[2015-12-26 11:18:52] INFO: The new cumulative fraction of variance explained is 74.8%.
[2015-12-26 11:18:52] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:52] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:52] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:52] INFO: 1 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:52] INFO: Kept 1 / 1 enriched terms with E-score >= 2.0
[2015-12-26 11:18:52] INFO: Generated 1 signatures based on the enriched GO terms.
[2015-12-26 11:18:52] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:52] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:52] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:52] INFO: 0 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:52] INFO: # signatures: 1
[2015-12-26 11:18:52] INFO: Global filter: kept 0 / 1 signatures.
[2015-12-26 11:18:52] INFO: Total no. of signatures so far: 43
[2015-12-26 11:18:52] INFO: 
[2015-12-26 11:18:52] INFO: ----------------------------------------------------------------------
[2015-12-26 11:18:52] INFO: PC 10 explains 1.2% of the variance.
[2015-12-26 11:18:52] INFO: The new cumulative fraction of variance explained is 76.0%.
[2015-12-26 11:18:52] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:52] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:52] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:52] INFO: 0 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:52] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:52] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:52] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:52] INFO: 0 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:53] INFO: # signatures: 0
[2015-12-26 11:18:53] INFO: Global filter: kept 0 / 0 signatures.
[2015-12-26 11:18:53] INFO: Total no. of signatures so far: 43
[2015-12-26 11:18:53] INFO: 
[2015-12-26 11:18:53] INFO: ----------------------------------------------------------------------
[2015-12-26 11:18:53] INFO: PC 11 explains 1.1% of the variance.
[2015-12-26 11:18:53] INFO: The new cumulative fraction of variance explained is 77.1%.
[2015-12-26 11:18:53] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:53] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:53] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:53] INFO: 4 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:53] INFO: Kept 4 / 4 enriched terms with E-score >= 2.0
[2015-12-26 11:18:53] INFO: Local filter: Kept 2 / 4 enriched terms.
[2015-12-26 11:18:53] INFO: Generated 2 signatures based on the enriched GO terms.
[2015-12-26 11:18:53] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:53] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:53] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:53] INFO: 10 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:53] INFO: Kept 10 / 10 enriched terms with E-score >= 2.0
[2015-12-26 11:18:53] INFO: Local filter: Kept 3 / 10 enriched terms.
[2015-12-26 11:18:54] INFO: Generated 3 signatures based on the enriched GO terms.
[2015-12-26 11:18:54] INFO: # signatures: 5
[2015-12-26 11:18:54] INFO: Global filter: kept 2 / 5 signatures.
[2015-12-26 11:18:54] INFO: Total no. of signatures so far: 45
[2015-12-26 11:18:54] INFO: 
[2015-12-26 11:18:54] INFO: ----------------------------------------------------------------------
[2015-12-26 11:18:54] INFO: PC 12 explains 1.0% of the variance.
[2015-12-26 11:18:54] INFO: The new cumulative fraction of variance explained is 78.1%.
[2015-12-26 11:18:54] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:54] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:54] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:54] INFO: 12 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:54] INFO: Kept 12 / 12 enriched terms with E-score >= 2.0
[2015-12-26 11:18:54] INFO: Local filter: Kept 3 / 12 enriched terms.
[2015-12-26 11:18:54] INFO: Generated 3 signatures based on the enriched GO terms.
[2015-12-26 11:18:54] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:55] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:55] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:55] INFO: 9 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:55] INFO: Kept 9 / 9 enriched terms with E-score >= 2.0
[2015-12-26 11:18:55] INFO: Local filter: Kept 3 / 9 enriched terms.
[2015-12-26 11:18:55] INFO: Generated 3 signatures based on the enriched GO terms.
[2015-12-26 11:18:55] INFO: # signatures: 6
[2015-12-26 11:18:55] INFO: Global filter: kept 1 / 6 signatures.
[2015-12-26 11:18:55] INFO: Total no. of signatures so far: 46
[2015-12-26 11:18:55] INFO: 
[2015-12-26 11:18:55] INFO: ----------------------------------------------------------------------
[2015-12-26 11:18:55] INFO: PC 13 explains 0.9% of the variance.
[2015-12-26 11:18:55] INFO: The new cumulative fraction of variance explained is 79.0%.
[2015-12-26 11:18:55] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:55] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:55] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:55] INFO: 4 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:55] INFO: Kept 4 / 4 enriched terms with E-score >= 2.0
[2015-12-26 11:18:55] INFO: Local filter: Kept 2 / 4 enriched terms.
[2015-12-26 11:18:55] INFO: Generated 2 signatures based on the enriched GO terms.
[2015-12-26 11:18:55] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:55] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:55] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:55] INFO: 0 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:55] INFO: # signatures: 2
[2015-12-26 11:18:55] INFO: Global filter: kept 1 / 2 signatures.
[2015-12-26 11:18:55] INFO: Total no. of signatures so far: 47
[2015-12-26 11:18:55] INFO: 
[2015-12-26 11:18:55] INFO: ----------------------------------------------------------------------
[2015-12-26 11:18:55] INFO: PC 14 explains 0.8% of the variance.
[2015-12-26 11:18:55] INFO: The new cumulative fraction of variance explained is 79.8%.
[2015-12-26 11:18:55] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:55] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:55] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:55] INFO: 1 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:55] INFO: Kept 1 / 1 enriched terms with E-score >= 2.0
[2015-12-26 11:18:55] INFO: Generated 1 signatures based on the enriched GO terms.
[2015-12-26 11:18:56] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:56] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:56] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:56] INFO: 2 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:56] INFO: Kept 2 / 2 enriched terms with E-score >= 2.0
[2015-12-26 11:18:56] INFO: Local filter: Kept 1 / 2 enriched terms.
[2015-12-26 11:18:56] INFO: Generated 1 signatures based on the enriched GO terms.
[2015-12-26 11:18:56] INFO: # signatures: 2
[2015-12-26 11:18:56] INFO: Global filter: kept 1 / 2 signatures.
[2015-12-26 11:18:56] INFO: Total no. of signatures so far: 48
[2015-12-26 11:18:56] INFO: 
[2015-12-26 11:18:56] INFO: ----------------------------------------------------------------------
[2015-12-26 11:18:56] INFO: PC 15 explains 0.7% of the variance.
[2015-12-26 11:18:56] INFO: The new cumulative fraction of variance explained is 80.5%.
[2015-12-26 11:18:56] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:56] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:56] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:56] INFO: 18 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:56] INFO: Kept 18 / 18 enriched terms with E-score >= 2.0
[2015-12-26 11:18:57] INFO: Local filter: Kept 5 / 18 enriched terms.
[2015-12-26 11:18:57] INFO: Generated 5 signatures based on the enriched GO terms.
[2015-12-26 11:18:57] INFO: Testing 6675 terms for enrichment...
[2015-12-26 11:18:57] INFO: Calculating enrichment score (using p-value threshold psi=1.0e-04) for enriched terms...
[2015-12-26 11:18:57] INFO: 1594 / 6675 GO terms (23.9%) had less than 5 genes annotated with them and were ignored.
[2015-12-26 11:18:57] INFO: 16 / 5081 tested GO terms were found to be significantly enriched (p-value <= 1.0e-06).
[2015-12-26 11:18:57] INFO: Kept 16 / 16 enriched terms with E-score >= 2.0
[2015-12-26 11:18:57] INFO: Local filter: Kept 4 / 16 enriched terms.
[2015-12-26 11:18:57] INFO: Generated 4 signatures based on the enriched GO terms.
[2015-12-26 11:18:57] INFO: # signatures: 9
[2015-12-26 11:18:57] INFO: Global filter: kept 2 / 9 signatures.
[2015-12-26 11:18:57] INFO: Total no. of signatures so far: 50
[2015-12-26 11:18:57] INFO: 
[2015-12-26 11:18:57] INFO: ======================================================================
[2015-12-26 11:18:57] INFO: GO-PCA generated 50 signatures:
[2015-12-26 11:18:57] INFO: CC: MHC class II protein complex (GO:0042613) [3:9/11, e=113.5, p=5.6e-12]
[2015-12-26 11:18:57] INFO: CC: condensed chromosome kinetochore (GO:0000777) [8:8/17, e=73.4, p=1.8e-13]
[2015-12-26 11:18:57] INFO: BP: response to fungus (GO:0009620) [3:5/6, e=65.2, p=3.6e-09]
[2015-12-26 11:18:57] INFO: BP: bicarbonate transport (GO:0015701) [-1:5/13, e=56.6, p=8.3e-08]
[2015-12-26 11:18:57] INFO: CC: endolysosome (GO:0036019) [4:5/7, e=49.1, p=3.6e-08]
[2015-12-26 11:18:57] INFO: CC: T cell receptor complex (GO:0042101) [2:8/12, e=44.4, p=4.0e-08]
[2015-12-26 11:18:57] INFO: CC: platelet alpha granule membrane (GO:0031092) [-7:6/8, e=41.6, p=6.4e-09]
[2015-12-26 11:18:57] INFO: MF: MHC class II receptor activity (GO:0032395) [4:5/7, e=35.8, p=3.5e-09]
[2015-12-26 11:18:57] INFO: BP: detection of external biotic stimulus (GO:0098581) [4:7/11, e=34.3, p=6.2e-09]
[2015-12-26 11:18:57] INFO: BP: hydrogen peroxide catabolic process (GO:0042744) [-2:7/9, e=31.8, p=1.9e-08]
[2015-12-26 11:18:57] INFO: BP: B cell prolif. (GO:0042100) [-5:5/10, e=31.1, p=8.7e-07]
[2015-12-26 11:18:57] INFO: BP: regulation of transcription involved in G1/S tr... (GO:0000083) [8:11/20, e=29.6, p=2.3e-08]
[2015-12-26 11:18:57] INFO: BP: regulation of B cell receptor signal. pathway (GO:0050855) [6:5/7, e=23.2, p=2.7e-07]
[2015-12-26 11:18:57] INFO: BP: neg. regulation of leukocyte prolif. (GO:0070664) [3:7/19, e=22.9, p=3.8e-08]
[2015-12-26 11:18:57] INFO: BP: platelet aggregation (GO:0070527) [12:8/23, e=20.5, p=1.7e-09]
[2015-12-26 11:18:57] INFO: BP: pos. regulation of cell killing (GO:0031343) [6:6/21, e=17.2, p=1.3e-07]
[2015-12-26 11:18:57] INFO: BP: natural killer cell activation (GO:0030101) [2:5/11, e=16.0, p=1.4e-07]
[2015-12-26 11:18:57] INFO: CC: stress fiber (GO:0001725) [-15:6/24, e=15.8, p=1.0e-07]
[2015-12-26 11:18:57] INFO: BP: mitotic sister chromatid segregation (GO:0000070) [8:11/44, e=15.6, p=3.9e-10]
[2015-12-26 11:18:57] INFO: MF: chemokine receptor activity (GO:0004950) [2:5/12, e=14.3, p=5.2e-07]
[2015-12-26 11:18:57] INFO: MF: heparin binding (GO:0008201) [14:7/28, e=12.3, p=8.9e-07]
[2015-12-26 11:18:57] INFO: CC: U1 snRNP (GO:0005685) [8:8/10, e=12.3, p=2.9e-09]
[2015-12-26 11:18:57] INFO: BP: phagosome maturation (GO:0090382) [4:11/30, e=11.4, p=5.1e-10]
[2015-12-26 11:18:57] INFO: BP: cellular defense response (GO:0006968) [2:11/40, e=11.0, p=2.8e-12]
[2015-12-26 11:18:57] INFO: BP: centromere complex assembly (GO:0034508) [8:10/24, e=10.8, p=1.6e-08]
[2015-12-26 11:18:57] INFO: BP: humoral immune response (GO:0006959) [4:13/45, e=9.1, p=3.3e-10]
[2015-12-26 11:18:57] INFO: BP: leukocyte aggregation (GO:0070486) [5:19/75, e=8.1, p=1.1e-13]
[2015-12-26 11:18:57] INFO: BP: neg. regulation of viral genome replication (GO:0045071) [15:9/34, e=7.6, p=4.2e-07]
[2015-12-26 11:18:57] INFO: BP: autophagy (GO:0006914) [-3:15/36, e=6.9, p=6.7e-09]
[2015-12-26 11:18:57] INFO: BP: response to type I interferon (GO:0034340) [2:20/52, e=6.9, p=6.0e-09]
[2015-12-26 11:18:57] INFO: BP: regulation of cell shape (GO:0008360) [11:11/44, e=6.5, p=2.5e-07]
[2015-12-26 11:18:57] INFO: BP: DNA replication (GO:0006260) [8:43/112, e=6.4, p=8.6e-19]
[2015-12-26 11:18:57] INFO: BP: DNA strand elongation involved in DNA replication (GO:0006271) [-4:14/29, e=6.2, p=1.1e-08]
[2015-12-26 11:18:57] INFO: BP: defense response to other organism (GO:0098542) [4:23/89, e=6.0, p=1.5e-13]
[2015-12-26 11:18:57] INFO: BP: response to IFN-gamma (GO:0034341) [4:22/82, e=6.0, p=3.8e-11]
[2015-12-26 11:18:57] INFO: BP: regulation of mitotic nuclear division (GO:0007088) [8:19/75, e=5.7, p=2.6e-10]
[2015-12-26 11:18:57] INFO: BP: regulation of wound healing (GO:0061041) [-7:13/49, e=5.7, p=2.8e-08]
[2015-12-26 11:18:57] INFO: CC: cytoplasmic membrane-bounded vesicle lumen (GO:0060205) [-1:13/48, e=5.5, p=8.6e-09]
[2015-12-26 11:18:57] INFO: BP: MyD88-dependent toll-like receptor signal. pathway (GO:0002755) [4:17/66, e=5.1, p=5.6e-10]
[2015-12-26 11:18:57] INFO: BP: response to bacterium (GO:0009617) [3:35/118, e=5.0, p=5.8e-16]
[2015-12-26 11:18:57] INFO: BP: respiratory electron transport chain (GO:0022904) [13:21/84, e=4.8, p=7.6e-09]
[2015-12-26 11:18:57] INFO: BP: pos. regulation of T cell activation (GO:0050870) [2:24/93, e=4.5, p=4.3e-12]
[2015-12-26 11:18:57] INFO: CC: cullin-RING ubiquitin ligase complex (GO:0031461) [-3:18/64, e=4.0, p=8.4e-07]
[2015-12-26 11:18:57] INFO: BP: inflammatory response (GO:0006954) [3:26/104, e=3.7, p=2.3e-10]
[2015-12-26 11:18:57] INFO: BP: platelet degranulation (GO:0002576) [3:19/66, e=3.6, p=1.3e-07]
[2015-12-26 11:18:57] INFO: BP: nucleoside phosphate biosynthetic process (GO:1901293) [8:21/81, e=3.4, p=3.6e-07]
[2015-12-26 11:18:57] INFO: BP: leukocyte migration (GO:0050900) [5:35/139, e=3.3, p=9.8e-10]
[2015-12-26 11:18:57] INFO: BP: antigen receptor-mediated signal. pathway (GO:0050851) [6:21/82, e=3.2, p=1.7e-08]
[2015-12-26 11:18:57] INFO: BP: cotranslational protein targeting to membrane (GO:0006613) [-11:31/100, e=2.9, p=1.5e-08]
[2015-12-26 11:18:57] INFO: BP: G1/S transition of mitotic cell cycle (GO:0000082) [-2:38/130, e=2.6, p=3.0e-08]
[2015-12-26 11:18:58] INFO: Total GO-PCA runtime: 53.65 s.
[2015-12-26 11:18:58] INFO: Writing GO-PCA run to pickle file "dmap_gopca.pickle"...
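
The signature summary in the log above uses one fixed layout per line: the GO domain, the term name, the GO ID, and a bracketed block of numbers. The sketch below parses such a line with a regular expression. The field meanings are an assumption inferred from the log itself (signed PC index, k of K annotated genes, E-score, p-value), and the names `SIG_RE` and `parse_signature_line` are hypothetical helpers, not part of GO-PCA.

```python
import re

# Pattern inferred from the signature lines in the log; the meaning of the
# bracketed fields (PC index, k/K genes, E-score, p-value) is an assumption.
SIG_RE = re.compile(
    r'(?P<domain>BP|CC|MF): (?P<name>.+) \((?P<go_id>GO:\d{7})\) '
    r'\[(?P<pc>-?\d+):(?P<k>\d+)/(?P<K>\d+), '
    r'e=(?P<escore>[\d.]+), p=(?P<pval>[\d.e+-]+)\]'
)


def parse_signature_line(line):
    """Return a dict describing one signature line, or None if it does not match."""
    match = SIG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        'domain': fields['domain'],
        'name': fields['name'],
        'go_id': fields['go_id'],
        'pc': int(fields['pc']),          # negative values appear in the log as well
        'k': int(fields['k']),
        'K': int(fields['K']),
        'escore': float(fields['escore']),
        'pval': float(fields['pval']),
    }


# Example with a line copied from the log above.
sig = parse_signature_line(
    '[2015-12-26 11:18:57] INFO: BP: DNA replication (GO:0006260) '
    '[8:43/112, e=6.4, p=8.6e-19]'
)
```

Applied to every line of a saved log, this makes it easy to, for example, collect all signatures associated with one PC or sort them by E-score.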

In [53]:
!gopca_print_info.py -g $gopca_file


GO-PCA Run
----------
- Version: 1.1rc12
- Timestamp: 2015-12-26 16:18:04.480761

GO-PCA Result
-------------
- Expression data: 8528 genes, 211 samples
- Number of PCs tested: 15
- Number of signatures generated: 50
- Config data:
    escore_pval_thresh=0.0001
    escore_thresh=2.0
    expression_file=dmap_expression.tsv
    expression_file_hash=1729bdf9de9c98dcdd87d6850e73909b
    gene_ontology_file=go-basic_2015-05-25.obo
    gene_ontology_file_hash=a5623a26da07171db485634ae214eef9
    go_annotation_file=go_annotations_human.tsv
    go_annotation_file_hash=cd2e3ad89e31f823ece6c784c04c8ceb
    go_part_of_cc_only=True
    mHG_L=1000
    mHG_X_frac=0.25
    mHG_X_min=5
    n_components=15
    no_global_filter=False
    no_local_filter=False
    output_file=dmap_gopca.pickle
    pc_max=0
    pc_permutations=15
    pc_seed=123456789
    pc_zscore_thresh=2.0
    pval_thresh=1e-06
    sel_var_genes=0
    sig_corr_thresh=0.5
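
Two of the configuration values listed above, `pval_thresh=1e-06` and `escore_thresh=2.0`, correspond to the cutoffs reported in the run log ("p-value <= 1.0e-06" and "E-score >= 2.0"). The following is a minimal sketch of how two such thresholds combine. It is not GO-PCA's implementation, it ignores the local and global filters, and `filter_enriched_terms` is a hypothetical helper. The first two candidates use values from the signature list in the log; the last two are made up to show a rejection.

```python
def filter_enriched_terms(terms, pval_thresh=1e-6, escore_thresh=2.0):
    """Keep (term, pval, escore) tuples that pass both thresholds."""
    # Step 1: significance cutoff on the enrichment p-value.
    significant = [t for t in terms if t[1] <= pval_thresh]
    # Step 2: effect-size cutoff on the E-score.
    return [t for t in significant if t[2] >= escore_thresh]


candidates = [
    ('GO:0006260', 8.6e-19, 6.4),        # DNA replication (values from the log)
    ('GO:0000082', 3.0e-08, 2.6),        # G1/S transition (values from the log)
    ('hypothetical term A', 5.0e-05, 4.0),  # made up: fails the p-value cutoff
    ('hypothetical term B', 1.0e-07, 1.5),  # made up: fails the E-score cutoff
]
kept = filter_enriched_terms(candidates)
```

Here `kept` contains only the two terms taken from the log, since each made-up entry violates exactly one of the two cutoffs.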

Plot the signature matrix


In [54]:
from IPython.display import Image

dpi = 90.0

!gopca_plot_signature_matrix.py -g "$gopca_file" -o "$signature_matrix_plot_file" -r $dpi -t \
        --sample-cluster-metric euclidean
Image(filename = signature_matrix_plot_file, width=800)


[2015-12-26 11:18:59] INFO: Clustering of samples...
[2015-12-26 11:18:59] INFO: Plotting...
[2015-12-26 11:19:00] INFO: Saving to file...
/datapool001/fw36/Dropbox/sandbox/env/lib/python2.7/site-packages/matplotlib/collections.py:590: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  if self._edgecolors == str('face'):
[2015-12-26 11:19:03] INFO: Done!
Out[54]:

In [55]:
# Plot the "DNA replication" signature in detail
from IPython.display import Image

dpi = 90
!gopca_plot_signature.py -g "$gopca_file" -n "DNA replication" -o "$signature_plot_file" -r $dpi -s 18 12 \
        --gene-label-size 12 --sample-cluster-metric euclidean
Image(filename = signature_plot_file, width = 800)


[2015-12-26 11:19:04] INFO: Plotting...
[2015-12-26 11:19:05] INFO: Saving to file...
/datapool001/fw36/Dropbox/sandbox/env/lib/python2.7/site-packages/matplotlib/collections.py:590: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  if self._edgecolors == str('face'):
[2015-12-26 11:19:05] INFO: Done!
Out[55]:

Plot term-by-PC matrix


In [56]:
from IPython.display import Image

dpi = 90
!gopca_plot_term_by_pc_matrix.py -g "$gopca_file" -o "$term_by_pc_plot_file" -r $dpi -s 18 12
Image(filename = term_by_pc_plot_file, width = 800)


/datapool001/fw36/Dropbox/sandbox/env/lib/python2.7/site-packages/matplotlib/collections.py:590: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  if self._edgecolors == str('face'):
Out[56]:

Save signatures to a tab-delimited text file


In [57]:
!gopca_extract_signatures.py -g "$gopca_file" -o "$signature_file"


[2015-12-26 11:19:10] INFO: Wrote 50 signatures to "dmap_signatures.tsv".

Save signatures to an Excel spreadsheet


In [58]:
!gopca_extract_signatures_excel.py -g "$gopca_file" -o "$signature_excel_file"


[2015-12-26 11:19:11] INFO: Wrote 50 signatures to "dmap_signatures.xlsx".

Save the signature matrix to a tab-delimited text file


In [59]:
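# NOTE: the output path below reuses $signature_excel_file, so the tab-delimited
# matrix overwrites the Excel spreadsheet written in the previous step (see the
# log output). To keep both files, pass a separate path ending in .tsv to -o.
!gopca_extract_signature_matrix.py -g "$gopca_file" -o "$signature_excel_file"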


[2015-12-26 11:19:11] INFO: Wrote 50 x 211 expression matrix to "dmap_signatures.xlsx".
[2015-12-26 11:19:11] INFO: Wrote 50 signatures to "dmap_signatures.xlsx".
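
To work with the exported matrix outside of GO-PCA, a tab-delimited file can be read with the standard library alone. The sketch below assumes a layout that has not been verified against the actual output: one header row holding the sample names and one leading column holding the signature labels. `read_labeled_matrix` is a hypothetical helper, and the in-memory data is made up for illustration.

```python
import csv
import io


def read_labeled_matrix(handle):
    """Read a tab-delimited matrix with a header row and a label column.

    Returns (row_labels, column_labels, rows), where rows is a list of
    lists of floats. The file layout is an assumption, see above.
    """
    reader = csv.reader(handle, delimiter='\t')
    header = next(reader)
    column_labels = header[1:]      # first header cell labels the label column
    row_labels, rows = [], []
    for record in reader:
        if not record:              # skip empty lines
            continue
        row_labels.append(record[0])
        rows.append([float(value) for value in record[1:]])
    return row_labels, column_labels, rows


# Made-up example data with two signatures and two samples.
demo = io.StringIO(u"signature\tS1\tS2\nsigA\t1.5\t-0.5\nsigB\t0\t2\n")
labels, samples, values = read_labeled_matrix(demo)
```

For the run shown here, the log reports a 50 x 211 matrix, so under the assumed layout a successful read of the real file (opened from whatever path was passed to `-o`) should yield 50 row labels and 211 sample names.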

Convert output to Matlab format


In [60]:
!gopca_convert_to_matlab.py -g "$gopca_file" -o "$matlab_file"

Copyright (c) 2015 Florian Wagner.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.