In [1]:
from nsaba.nsaba import Nsaba
from nsaba.nsaba import analysis
from nsaba.nsaba import geneinfo
import os
%matplotlib inline
/Users/simonhaxby/anaconda/envs/py27/lib/python2.7/site-packages/matplotlib/__init__.py:872: UserWarning: axes.color_cycle is deprecated and replaced with axes.prop_cycle; please use the latter.
warnings.warn(self.msg_depr % (key, alt_key))
In [2]:
data_dir = "../../data_dir"
In [3]:
# loading class DataFrames
Nsaba.aba_load(data_dir)
Nsaba.ns_load(data_dir)
This may take a minute or two ...
SampleAnnot.csv loaded.
MicroarrayExpression.csv loaded.
Probes.csv loaded.
Nsaba.aba['mni_coords'] initialized.
This may take a minute or two ...
database.txt loaded.
features.txt loaded.
Nsaba.ns['mni_coords'] initialized.
In [4]:
# Initializing instance and loading gene expression
tsaba = Nsaba()
tsaba.load_ge_pickle(path=data_dir)
This may take a minute or two ...
'ge' dictionary successfully loaded
In [5]:
term = 'reward'
tsaba.is_term(term)
Out[5]:
True
In [6]:
tsaba.get_ns_act(term, thresh=-1, search_radii=2.5)
This may take a few minutes...
In [7]:
anal = analysis.NsabaAnalysis(tsaba)
To use inline plotting functionality in Jupyter, '%matplotlib inline' must be enabled
In [8]:
anal.term_ge_ttest(term, 1813, split_method='quant')
t-value: -4.4783
p-value: 8.559E-06
Effect size: -0.4317
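The three numbers printed above (t-value, p-value, effect size) can be approximated in spirit with SciPy. The sketch below is not the nsaba implementation. It assumes that `split_method='quant'` separates samples into a low-activation and a high-activation group at a quantile cutoff of the term activation, it runs on synthetic data, and it reports Cohen's d with a pooled standard deviation. The helper name `quantile_split_ttest` and its `quant` parameter are hypothetical.

```python
import numpy as np
from scipy import stats

def quantile_split_ttest(activation, expression, quant=0.85):
    """Hypothetical helper: split expression values by an activation
    quantile, then compare the two groups (low minus high)."""
    activation = np.asarray(activation, dtype=float)
    expression = np.asarray(expression, dtype=float)

    # Samples above the cutoff form the "high activation" group.
    cutoff = np.percentile(activation, 100 * quant)
    high = expression[activation > cutoff]
    low = expression[activation <= cutoff]

    # Independent two-sample t-test (equal variances assumed).
    t, p = stats.ttest_ind(low, high)

    # Cohen's d with a pooled standard deviation.
    n1, n2 = len(low), len(high)
    pooled_sd = np.sqrt(((n1 - 1) * low.var(ddof=1) +
                         (n2 - 1) * high.var(ddof=1)) / (n1 + n2 - 2))
    d = (low.mean() - high.mean()) / pooled_sd
    return t, p, d

# Synthetic data (assumption): expression rises with activation.
rng = np.random.RandomState(0)
activation = rng.rand(500)
expression = 0.8 * activation + 0.5 * rng.randn(500)

t, p, d = quantile_split_ttest(activation, expression)
print("t-value: %.4f\np-value: %.3E\nEffect size: %.4f" % (t, p, d))
```

With the low-minus-high convention used here, a negative t-value and effect size mean that expression is higher in the high-activation group. Whether nsaba uses the same sign convention is not stated in this notebook.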
In [9]:
ttest_metr = anal.term_ge_ttest_multi(term)
This may take a couple of minutes ...
In [ ]:
df = geneinfo.load_gene_file("../../")
In [10]:
anal.fetch_gene_descriptions(ttest_metr, csv_path="../../");
Gene 353134 not found in NIH database
Gene 100008589 not found in NIH database
Gene 54874 not found in NIH database
Gene 128414 not found in NIH database
Gene 388585 not found in NIH database
Gene 641311 not found in NIH database
Gene 114885 not found in NIH database
Gene 27233 not found in NIH database
Gene 23620 not found in NIH database
Corrected Bonferroni Alpha: 2.405E-06
3250 (p = 6.222E-07; d = -0.484): [u'This gene encodes a haptoglobin-related protein that binds hemoglobin as efficiently as haptoglobin. Unlike haptoglobin, plasma concentration of this protein is unaffected in patients with sickle cell anemia and extensive intravascular hemolysis, suggesting a difference in binding between haptoglobin-hemoglobin and haptoglobin-related protein-hemoglobin complexes to CD163, the hemoglobin scavenger receptor. This protein may also be a clinically important predictor of recurrence of breast cancer. [provided by RefSeq, Oct 2011]']
4257 (p = 7.822E-07; d = -0.480): [u'The MAPEG (Membrane Associated Proteins in Eicosanoid and Glutathione metabolism) family consists of six human proteins, two of which are involved in the production of leukotrienes and prostaglandin E, important mediators of inflammation. Other family members, demonstrating glutathione S-transferase and peroxidase activities, are involved in cellular defense against toxic, carcinogenic, and pharmacologically active electrophilic compounds. This gene encodes a protein that catalyzes the conjugation of glutathione to electrophiles and the reduction of lipid hydroperoxides. This protein is localized to the endoplasmic reticulum and outer mitochondrial membrane where it is thought to protect these membranes from oxidative stress. Several transcript variants, some non-protein coding and some protein coding, have been found for this gene. [provided by RefSeq, May 2012]']
6295 (p = 8.828E-07; d = -0.480): [u'Members of arrestin/beta-arrestin protein family are thought to participate in agonist-mediated desensitization of G-protein-coupled receptors and cause specific dampening of cellular responses to stimuli such as hormones, neurotransmitters, or sensory signals. S-arrestin, also known as S-antigen, is a major soluble photoreceptor protein that is involved in desensitization of the photoactivated transduction cascade. It is expressed in the retina and the pineal gland and inhibits coupling of rhodopsin to transducin in vitro. Additionally, S-arrestin is highly antigenic, and is capable of inducing experimental autoimmune uveoretinitis. Mutations in this gene have been associated with Oguchi disease, a rare autosomal recessive form of night blindness. [provided by RefSeq, Jul 2008]']
2202 (p = 8.809E-07; d = -0.479): [u'This gene encodes a member of the fibulin family of extracellular matrix glycoproteins. Like all members of this family, the encoded protein contains tandemly repeated epidermal growth factor-like repeats followed by a C-terminus fibulin-type domain. This gene is upregulated in malignant gliomas and may play a role in the aggressive nature of these tumors. Mutations in this gene are associated with Doyne honeycomb retinal dystrophy. Alternatively spliced transcript variants that encode the same protein have been described.[provided by RefSeq, Nov 2009]']
341 (p = 9.552E-07; d = -0.476): [u'The protein encoded by this gene is a member of the apolipoprotein C1 family. This gene is expressed primarily in the liver, and it is activated when monocytes differentiate into macrophages. A pseudogene of this gene is located 4 kb downstream in the same orientation, on the same chromosome. This gene is mapped to chromosome 19, where it resides within a apolipoprotein gene cluster. Alternatively spliced transcript variants have been found for this gene, but the biological validity of some variants has not been determined. [provided by RefSeq, Jul 2008]']
5445 (p = 1.234E-06; d = -0.471): No description found
348 (p = 1.925E-06; d = -0.462): [u'The protein encoded by this gene is a major apoprotein of the chylomicron. It binds to a specific liver and peripheral cell receptor, and is essential for the normal catabolism of triglyceride-rich lipoprotein constituents. This gene maps to chromosome 19 in a cluster with the related apolipoprotein C1 and C2 genes. Mutations in this gene result in familial dysbetalipoproteinemia, or type III hyperlipoproteinemia (HLP III), in which increased plasma cholesterol and triglycerides are the consequence of impaired clearance of chylomicron and VLDL remnants. Alternative splicing results in multiple transcript variants. [provided by RefSeq, Nov 2014]']
2040 (p = 1.981E-06; d = -0.462): [u'This gene encodes a member of a highly conserved family of integral membrane proteins. The encoded protein localizes to the cell membrane of red blood cells and other cell types, where it may regulate ion channels and transporters. Loss of localization of the encoded protein is associated with hereditary stomatocytosis, a form of hemolytic anemia. There is a pseudogene for this gene on chromosome 6. Alternative splicing results in multiple transcript variants. [provided by RefSeq, Jul 2012]']
10804 (p = 2.944E-06; d = -0.454): [u'Gap junctions allow the transport of ions and metabolites between the cytoplasm of adjacent cells. They are formed by two hemichannels, made up of six connexin proteins assembled in groups. Each connexin protein has four transmembrane segments, two extracellular loops, a cytoplasmic loop formed between the two inner transmembrane segments, and the N- and C-terminus both being in the cytoplasm. The specificity of the gap junction is determined by which connexin proteins comprise the hemichannel. In the past, connexin protein names were based on their molecular weight, however the new nomenclature uses sequential numbers based on which form (alpha or beta) of the gap junction is present. This gene encodes one of the connexin proteins. Mutations in this gene have been found in some forms of deafness and in some families with hidrotic ectodermal dysplasia. [provided by RefSeq, Jul 2008]']
10455 (p = 3.059E-06; d = -0.453): [u'This gene encodes a member of the hydratase/isomerase superfamily. The protein encoded is a key mitochondrial enzyme involved in beta-oxidation of unsaturated fatty acids. It catalyzes the transformation of 3-cis and 3-trans-enoyl-CoA esters arising during the stepwise degradation of cis-, mono-, and polyunsaturated fatty acids to the 2-trans-enoyl-CoA intermediates. Alternatively spliced transcript variants have been described. [provided by RefSeq, Aug 2011]']
6383 (p = 3.391E-06; d = -0.451): [u'The protein encoded by this gene is a transmembrane (type I) heparan sulfate proteoglycan and is a member of the syndecan proteoglycan family. The syndecans mediate cell binding, cell signaling, and cytoskeletal organization and syndecan receptors are required for internalization of the HIV-1 tat protein. The syndecan-2 protein functions as an integral membrane protein and participates in cell proliferation, cell migration and cell-matrix interactions via its receptor for extracellular matrix proteins. Altered syndecan-2 expression has been detected in several different tumor types. [provided by RefSeq, Jul 2008]']
In [11]:
anal.p_val_distr(ttest_metr)
Percent Significant (Bonferroni Correction; alpha = .05): 0.072 %
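The Bonferroni figures reported in the last two cells follow from simple arithmetic: the corrected threshold is the nominal alpha divided by the number of tests. The sketch below illustrates this on made-up p-values. The helper names are hypothetical, and whether nsaba counts tests in exactly this way is an assumption.

```python
def bonferroni_alpha(alpha, n_tests):
    """Bonferroni-corrected significance threshold."""
    return alpha / float(n_tests)

def percent_significant(p_values, alpha=0.05):
    """Return the corrected alpha and the percentage of p-values below it."""
    n_tests = len(p_values)
    corrected = bonferroni_alpha(alpha, n_tests)
    n_sig = sum(1 for p in p_values if p < corrected)
    return corrected, 100.0 * n_sig / n_tests

# Illustrative p-values only; these are not taken from the analysis above.
p_values = [1e-6, 0.004, 0.03, 0.2, 0.9]
corrected, pct = percent_significant(p_values)
print("Corrected Bonferroni Alpha: %.3E" % corrected)
print("Percent Significant: %.1f %%" % pct)

# Number of tests implied by the corrected alpha printed by the notebook.
implied_tests = int(round(0.05 / 2.405e-06))
print("Implied number of tests: %d" % implied_tests)
```

The corrected alpha of 2.405E-06 printed earlier is consistent with 0.05 divided by roughly 20,790 tests, presumably one per gene.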
In [12]:
anal.effect_size_distr(ttest_metr)
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
<ipython-input-12-cbdd763c2f46> in <module>()
----> 1 anal.effect_size_distr(ttest_metr)
/Users/simonhaxby/Code/Python/nsaba/nsaba/analysis.py in effect_size_distr(self, ttest_metrics, genes_of_interest, return_fig)
448
449 if genes_of_interest is not []:
--> 450 offsetter = 450/len(genes_of_interest)
451 for rec in ttest_metrics['results']:
452 if int(rec.entrez) in genes_of_interest:
ZeroDivisionError: integer division or modulo by zero
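The traceback points at the guard `if genes_of_interest is not []:`. An identity test against a freshly created list literal is always true, so an empty `genes_of_interest` still reaches `450/len(genes_of_interest)` and divides by zero. The sketch below reproduces the problem in isolation and shows one possible fix using a truthiness check. The function names are hypothetical stand-ins, and it is an assumption (suggested but not shown by the traceback) that `genes_of_interest` defaults to an empty list.

```python
def label_offset_buggy(genes_of_interest):
    # Mirrors the guard shown in the traceback: `is not []` compares
    # identity with a brand-new list, so the branch is always taken.
    if genes_of_interest is not []:
        return 450 // len(genes_of_interest)
    return None

def label_offset_fixed(genes_of_interest):
    # Truthiness check: an empty list (or None) skips the division.
    if genes_of_interest:
        return 450 // len(genes_of_interest)
    return None

try:
    label_offset_buggy([])
except ZeroDivisionError as err:
    print("buggy guard: ZeroDivisionError (%s)" % err)

print("fixed guard, empty list: %r" % label_offset_fixed([]))
print("fixed guard, three genes: %r" % label_offset_fixed([1813, 3250, 4257]))
```

Until the guard is changed in `analysis.py`, passing a non-empty list, for example `anal.effect_size_distr(ttest_metr, genes_of_interest=[1813])`, may avoid the error; this workaround is untested here.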
In [ ]:
Content source: voytekresearch/nsaba