Analysis to see how many of the genes in my list were in the top 5% of total genes


In [1]:
# settings and modules
%config InlineBackend.figure_format = 'retina'
%pylab inline
from nsaba.nsaba import nsaba
from nsaba.nsaba import analysis


Populating the interactive namespace from numpy and matplotlib

In [2]:
ns_path = "/Users/Torben/Documents/ABI analysis/current_data_new/"
aba_path = '/Users/Torben/Documents/ABI analysis/normalized_microarray_donor9861/'
#nsaba.Nsaba.aba_load(aba_path)
nsaba.Nsaba.ns_load(ns_path)
nsaba.Nsaba.aba_load(aba_path)
N = nsaba.Nsaba()
N.load_ge_pickle(pkl_file='/Users/Torben/Documents/ABI analysis/normalized_microarray_donor9861/Nsaba_ABA_ge.pkl')


This may take a minute or two ...
database.txt loaded.
features.txt loaded.
Nsaba.ns['mni_coords'] initialized.

This may take a minute or two ...
Initializing gene data from normalized_microarray_donor9861
SampleAnnot.csv loaded.
MicroarrayExpression.csv loaded.
Probes.csv loaded.
Nsaba.aba['mni_coords'] initialized.

This may take a minute or two ...
'ge' dictionary successfully loaded

In [9]:
N.get_ns_act('depression', thresh=-1, method='knn',smoothing='not',estimation_method = 'sum',search_radii=2)
N.get_ns_act('dopamine', thresh=-1, method='knn',smoothing='not',estimation_method = 'sum',search_radii=2)
N.get_ns_act('reward', thresh=-1, method='knn',smoothing='not',estimation_method = 'sum',search_radii=2)
N.get_ns_act('serotonin', thresh=-1, method='knn',smoothing='not',estimation_method = 'sum',search_radii=2)
N.get_ns_act('anxiety', thresh=-1, method='knn',smoothing='not',estimation_method = 'sum',search_radii=2)
N.get_ns_act('schizophrenia', thresh=-1, method='knn',smoothing='not',estimation_method = 'sum',search_radii=2)


This may take a few minutes...
This may take a few minutes...
This may take a few minutes...
This may take a few minutes...
This may take a few minutes...
This may take a few minutes...

In [4]:
depression_genes = analysis.load_gene_list('/Users/Torben/Documents/ABI analysis/gene_collections/','DepressionGenes.csv')
dopamine_genes = analysis.load_gene_list('/Users/Torben/Documents/ABI analysis/gene_collections/','DopamineGenes2.csv')
reward_genes = analysis.load_gene_list('/Users/Torben/Documents/ABI analysis/gene_collections/','rewardGenes2.csv')
serotonin_genes = analysis.load_gene_list('/Users/Torben/Documents/ABI analysis/gene_collections/','SerotoninGenes.csv')
anxiety_genes = analysis.load_gene_list('/Users/Torben/Documents/ABI analysis/gene_collections/','AnxietyGenes.csv')
schizophrenia_genes = analysis.load_gene_list('/Users/Torben/Documents/ABI analysis/gene_collections/','SchizophreniaGenes.csv')

In [10]:
alpha_output = np.zeros((6,4))
A = analysis.NsabaAnalysis(N)
methods = ['pearson','spearman','regression','t_test']
for m in xrange(len(methods)):
    depression_alpha = A.validate_by_alpha('depression',depression_genes,method=methods[m],nih_only=True,gi_csv_path='/Users/Torben/Code/nsaba/')
    dopamine_alpha = A.validate_by_alpha('dopamine',dopamine_genes,method=methods[m],nih_only=True,gi_csv_path='/Users/Torben/Code/nsaba/')
    reward_alpha = A.validate_by_alpha('reward',reward_genes,method=methods[m],nih_only=True,gi_csv_path='/Users/Torben/Code/nsaba/')
    serotonin_alpha = A.validate_by_alpha('serotonin',serotonin_genes,method=methods[m],nih_only=True,gi_csv_path='/Users/Torben/Code/nsaba/')
    anxiety_alpha = A.validate_by_alpha('anxiety',anxiety_genes,method=methods[m],nih_only=True,gi_csv_path='/Users/Torben/Code/nsaba/')
    schizophrenia_alpha = A.validate_by_alpha('schizophrenia',schizophrenia_genes,method=methods[m],nih_only=True,gi_csv_path='/Users/Torben/Code/nsaba/')
    
    alpha_output[0,m] = len(depression_alpha)/float(len(depression_genes))
    alpha_output[1,m] = len(dopamine_alpha)/float(len(dopamine_genes))
    alpha_output[2,m] = len(reward_alpha)/float(len(reward_genes))
    alpha_output[3,m] = len(serotonin_alpha)/float(len(serotonin_genes))
    alpha_output[4,m] = len(anxiety_alpha)/float(len(anxiety_genes))
    alpha_output[5,m] = len(schizophrenia_alpha)/float(len(schizophrenia_genes))


To use inline plotting functionality in Jupyter, '%matplotlib inline' must be enabled
Using NIH described genes only; Entrez ID sample size now 18896
pearson's r must be > 0.043152524422
Using NIH described genes only; Entrez ID sample size now 18896
pearson's r must be > 0.0415508129963
Using NIH described genes only; Entrez ID sample size now 18896
pearson's r must be > 0.0393162119279
Using NIH described genes only; Entrez ID sample size now 18896
pearson's r must be > 0.0607031092656
Using NIH described genes only; Entrez ID sample size now 18896
pearson's r must be > 0.0631139737664
Using NIH described genes only; Entrez ID sample size now 18896
pearson's r must be > 0.0387556772115
Using NIH described genes only; Entrez ID sample size now 18896
spearman's r must be > 0.081439851449
Using NIH described genes only; Entrez ID sample size now 18896
spearman's r must be > 0.129343949876
Using NIH described genes only; Entrez ID sample size now 18896
spearman's r must be > 0.0713027518454
Using NIH described genes only; Entrez ID sample size now 18896
spearman's r must be > 0.0764923842076
Using NIH described genes only; Entrez ID sample size now 18896
spearman's r must be > 0.070088163011
Using NIH described genes only; Entrez ID sample size now 18896
spearman's r must be > 0.0512401770233
Using NIH described genes only; Entrez ID sample size now 18896
slope of linear regression must be > 0.136477615238
Using NIH described genes only; Entrez ID sample size now 18896
slope of linear regression must be > 0.0275640618706
Using NIH described genes only; Entrez ID sample size now 18896
slope of linear regression must be > 0.0162229724936
Using NIH described genes only; Entrez ID sample size now 18896
slope of linear regression must be > 0.309071634639
Using NIH described genes only; Entrez ID sample size now 18896
slope of linear regression must be > 0.267038617187
Using NIH described genes only; Entrez ID sample size now 18896
slope of linear regression must be > 0.067478536149
This may take a couple of minutes ...
Using NIH described genes only; Entrez ID sample size now 18896
This may take a couple of minutes ...
Using NIH described genes only; Entrez ID sample size now 18896
This may take a couple of minutes ...
Using NIH described genes only; Entrez ID sample size now 18896
This may take a couple of minutes ...
Using NIH described genes only; Entrez ID sample size now 18896
This may take a couple of minutes ...
Using NIH described genes only; Entrez ID sample size now 18896
This may take a couple of minutes ...
Using NIH described genes only; Entrez ID sample size now 18896

In [6]:
#print m
print alpha_output
#N.get_ns_act('dopamine', thresh=-1, method='knn',smoothing='gaussian',estimation_method = 'mean')
#A = analysis.NsabaAnalysis(N)
#dopamine_alpha = A.validate_by_alpha('dopamine',dopamine_genes)
#print dopamine_alpha


[[ 0.          0.          0.          0.        ]
 [ 0.29411765  0.23529412  0.          0.11764706]
 [ 0.23529412  0.23529412  0.11764706  0.29411765]
 [ 0.03571429  0.03571429  0.          0.        ]
 [ 0.2         0.1         0.          0.        ]
 [ 0.42857143  0.          0.          0.        ]]

In [42]:
ttest_metrics = A.t_test_multi('schizophrenia', quant = 85,nih_only=True,gi_csv_path='/Users/Torben/code/nsaba/')
t= A.fetch_gene_descriptions(ttest_metrics,gene_path='/Users/Torben/Code/nsaba/')


This may take a couple of minutes ...
Using NIH described genes only; Entrez ID sample size now 18896
Fetching NIH gene descriptions ...
gene_rec(entrez=100008589, cohen_d=-1.0883645912209226, p_value=1.2140863134363576e-12)
gene_rec(entrez=7280, cohen_d=-0.52464441507775128, p_value=1.622317197247864e-06)
gene_rec(entrez=2823, cohen_d=-0.44722597862836105, p_value=3.8693990165040951e-06)
gene_rec(entrez=353134, cohen_d=-0.43103146297937156, p_value=0.00035957270346876258)
gene_rec(entrez=84570, cohen_d=-0.41771269151276019, p_value=6.5695551962134314e-06)
gene_rec(entrez=100288366, cohen_d=-0.41133572984765954, p_value=0.00012498684945289303)
gene_rec(entrez=7223, cohen_d=-0.38100492683679849, p_value=3.8592120486950921e-05)
gene_rec(entrez=131034, cohen_d=-0.37146849580480173, p_value=5.9641043805490689e-05)
gene_rec(entrez=341350, cohen_d=-0.36881155546893668, p_value=6.7522271437266512e-05)
gene_rec(entrez=57496, cohen_d=-0.36293089795136774, p_value=0.00011851851992019055)
gene_rec(entrez=57554, cohen_d=-0.35717706628934004, p_value=0.00011276083154633666)
gene_rec(entrez=55711, cohen_d=-0.34855481997130272, p_value=0.0001642511375297992)
gene_rec(entrez=255743, cohen_d=-0.34784589969718971, p_value=0.00016972621147365202)
gene_rec(entrez=7101, cohen_d=-0.34721415392886434, p_value=0.00017523753245674058)
gene_rec(entrez=729722, cohen_d=-0.34640597356710356, p_value=0.00018615062484995964)
gene_rec(entrez=169834, cohen_d=-0.34362640829757612, p_value=0.00020295650592806299)
gene_rec(entrez=1630, cohen_d=-0.33836836738919757, p_value=0.00025328319238386278)
gene_rec(entrez=347730, cohen_d=-0.33634807999060984, p_value=0.00027517656086699729)
gene_rec(entrez=100288902, cohen_d=-0.3348412626235252, p_value=0.00030319661841425773)
gene_rec(entrez=55512, cohen_d=-0.33423365123577226, p_value=0.00030110639876663737)

Corrected Bonferroni Alpha: 2.405E-06


100008589 (p = 1.214E-12; d = -1.088): No description found


7280 (p = 1.622E-06; d = -0.525): [u'Microtubules, key participants in processes such as mitosis and intracellular transport, are composed of heterodimers of alpha- and beta-tubulins. The protein encoded by this gene is a beta-tubulin. Defects in this gene are associated with complex cortical dysplasia with other brain malformations-5. Two transcript variants encoding distinct isoforms have been found for this gene. [provided by RefSeq, Jul 2015]']


2823 (p = 3.869E-06; d = -0.447): No description found


353134 (p = 3.596E-04; d = -0.431): No description found


84570 (p = 6.570E-06; d = -0.418): [u'This gene encodes a brain-specific membrane associated collagen. A product of proteolytic processing of the encoded protein, CLAC (collagenous Alzheimer amyloid plaque component), binds to amyloid beta-peptides found in Alzheimer amyloid plaques but CLAC inhibits rather than facilitates amyloid fibril elongation (PMID: 16300410). A study of over-expression of this collagen in mice, however, found changes in pathology and behavior suggesting that the encoded protein may promote amyloid plaque formation (PMID: 19548013). Multiple transcript variants encoding different isoforms have been found for this gene. [provided by RefSeq, Dec 2011]']


100288366 (p = 1.250E-04; d = -0.411): No description found


7223 (p = 3.859E-05; d = -0.381): [u'This gene encodes a member of the canonical subfamily of transient receptor potential cation channels. The encoded protein forms a non-selective calcium-permeable cation channel that is activated by Gq-coupled receptors and tyrosine kinases, and plays a role in multiple processes including endothelial permeability, vasodilation, neurotransmitter release and cell proliferation. Single nucleotide polymorphisms in this gene may be associated with generalized epilepsy with photosensitivity. Alternatively spliced transcript variants encoding multiple isoforms have been observed for this gene. [provided by RefSeq, Aug 2011]']


131034 (p = 5.964E-05; d = -0.371): [u'This gene belongs to the highly conserved copine family. It encodes a calcium-dependent, phospholipid-binding protein, which may be involved in membrane trafficking, mitogenesis and development. Alternative splicing results in multiple transcript variants. [provided by RefSeq, Jan 2014]']


341350 (p = 6.752E-05; d = -0.369): No description found


57496 (p = 1.185E-04; d = -0.363): No description found


57554 (p = 1.128E-04; d = -0.357): No description found


55711 (p = 1.643E-04; d = -0.349): [u'This gene belongs to the short chain dehydrogenase/reductase superfamily. It encodes a reductase enzyme involved in the first step of wax biosynthesis wherein fatty acids are converted to fatty alcohols. The encoded peroxisomal protein utilizes saturated fatty acids of 16 or 18 carbons as preferred substrates. Alternatively spliced transcript variants have been observed for this gene. Related pseudogenes have been identified on chromosomes 2, 14 and 22. [provided by RefSeq, Nov 2012]']


255743 (p = 1.697E-04; d = -0.348): No description found


7101 (p = 1.752E-04; d = -0.347): [u'The protein encoded by this gene is an orphan receptor involved in retinal development. The encoded protein also regulates adult neural stem cell proliferation and may be involved in control of aggressive behavior. Two transcript variants encoding different isoforms have been found for this gene. [provided by RefSeq, Aug 2015]']


729722 (p = 1.862E-04; d = -0.346): [u'putative ankyrin repeat domain-containing protein ENSP00000383090']


169834 (p = 2.030E-04; d = -0.344): No description found


1630 (p = 2.533E-04; d = -0.338): [u'This gene encodes a netrin 1 receptor. The transmembrane protein is a member of the immunoglobulin superfamily of cell adhesion molecules, and mediates axon guidance of neuronal growth cones towards sources of netrin 1 ligand. The cytoplasmic tail interacts with the tyrosine kinases Src and focal adhesion kinase (FAK, also known as PTK2) to mediate axon attraction. The protein partially localizes to lipid rafts, and induces apoptosis in the absence of ligand. The protein functions as a tumor suppressor, and is frequently mutated or downregulated in colorectal cancer and esophageal carcinoma. [provided by RefSeq, Oct 2009]']


347730 (p = 2.752E-04; d = -0.336): No description found


100288902 (p = 3.032E-04; d = -0.335): No description found


55512 (p = 3.011E-04; d = -0.334): No description found



In [11]:
import csv

with open('/Users/Torben/Documents/ABI analysis/validation/flat_sum_r2_alpha_validation.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile)
    for a in alpha_output:
        spamwriter.writerow(a)

In [12]:
terms = ['depression','dopamine','reward','anxiety','schizophrenia']

for term in terms:
    ttest_metrics = A.t_test_multi(term, quant = 85,nih_only=True,gi_csv_path='/Users/Torben/code/nsaba/')
    t= A.fetch_gene_descriptions(ttest_metrics,gene_path='/Users/Torben/Code/nsaba/')
    with open('/Users/Torben/Documents/ABI analysis/validation/top'+term+'_genes_t85_flat2.csv', 'wb') as csvfile:
        spamwriter = csv.writer(csvfile)
        for ti in t:
            spamwriter.writerow(ti)


This may take a couple of minutes ...
Using NIH described genes only; Entrez ID sample size now 18896
Fetching NIH gene descriptions ...
gene_rec(entrez=100008589, cohen_d=-1.3178099869523801, p_value=4.4852860690082376e-15)
gene_rec(entrez=25854, cohen_d=-0.65583901967178615, p_value=3.2826465229165229e-09)
gene_rec(entrez=55512, cohen_d=-0.64662698410639841, p_value=5.3381146978178766e-09)
gene_rec(entrez=285220, cohen_d=-0.63659345672565937, p_value=9.0693457569323296e-09)
gene_rec(entrez=644150, cohen_d=-0.63540726161446404, p_value=9.6991450277588558e-09)
gene_rec(entrez=254263, cohen_d=-0.63466768060339052, p_value=1.0024299633344645e-08)
gene_rec(entrez=5582, cohen_d=-0.62267402074389644, p_value=1.8586156546395949e-08)
gene_rec(entrez=100133686, cohen_d=-0.6205910647723103, p_value=2.0765417935814287e-08)
gene_rec(entrez=54072, cohen_d=-0.59078415076952651, p_value=9.3391340461906834e-08)
gene_rec(entrez=3208, cohen_d=-0.58835036847640509, p_value=1.0405278407392498e-07)
gene_rec(entrez=816, cohen_d=-0.58751678635184257, p_value=1.0830790269784414e-07)
gene_rec(entrez=65266, cohen_d=-0.58582997947903159, p_value=1.2208846875650901e-07)
gene_rec(entrez=55061, cohen_d=-0.57492136261531801, p_value=2.0074895672183517e-07)
gene_rec(entrez=58189, cohen_d=-0.56953341535298818, p_value=2.5765188370338461e-07)
gene_rec(entrez=84570, cohen_d=-0.5662972608703013, p_value=3.0205753265647424e-07)
gene_rec(entrez=23154, cohen_d=-0.5658040212017309, p_value=3.0789388202557731e-07)
gene_rec(entrez=777, cohen_d=-0.56282435777301254, p_value=3.5325282487159912e-07)
gene_rec(entrez=9890, cohen_d=-0.553377114428105, p_value=5.4892185687521472e-07)
gene_rec(entrez=7881, cohen_d=-0.5518648216541121, p_value=5.8876795002629262e-07)
gene_rec(entrez=6483, cohen_d=-0.54966617087963188, p_value=6.5066596972796984e-07)

Corrected Bonferroni Alpha: 2.405E-06


100008589 (p = 4.485E-15; d = -1.318): No description found


25854 (p = 3.283E-09; d = -0.656): No description found


55512 (p = 5.338E-09; d = -0.647): No description found


285220 (p = 9.069E-09; d = -0.637): No description found


644150 (p = 9.699E-09; d = -0.635): No description found


254263 (p = 1.002E-08; d = -0.635): [u'The protein encoded by this gene is an auxiliary subunit of the ionotropic glutamate receptor of the AMPA subtype. AMPA receptors mediate fast synaptic neurotransmission in the central nervous system. This protein has been reported to interact with the Type I AMPA receptor regulatory protein isoform gamma-8 to control assembly of hippocampal AMPA receptor complexes, thereby modulating receptor gating and pharmacology. Alternative splicing results in multiple transcript variants. [provided by RefSeq, Aug 2012]']


5582 (p = 1.859E-08; d = -0.623): [u'Protein kinase C (PKC) is a family of serine- and threonine-specific protein kinases that can be activated by calcium and second messenger diacylglycerol. PKC family members phosphorylate a wide variety of protein targets and are known to be involved in diverse cellular signaling pathways. PKC also serve as major receptors for phorbol esters, a class of tumor promoters. Each member of the PKC family has a specific expression profile and is believed to play distinct roles in cells. The protein encoded by this gene is one of the PKC family members. This protein kinase is expressed solely in the brain and spinal cord and its localization is restricted to neurons. It has been demonstrated that several neuronal functions, including long term potentiation (LTP) and long term depression (LTD), specifically require this kinase. Knockout studies in mice also suggest that this kinase may be involved in neuropathic pain development. Defects in this protein have been associated with neurodegenerative disorder spinocerebellar ataxia-14 (SCA14). [provided by RefSeq, Jul 2008]']


100133686 (p = 2.077E-08; d = -0.621): No description found


54072 (p = 9.339E-08; d = -0.591): No description found


3208 (p = 1.041E-07; d = -0.588): [u'The protein encoded by this gene is a member of neuron-specific calcium-binding proteins family found in the retina and brain. This protein is associated with the plasma membrane. It has similarities to proteins located in the photoreceptor cells that regulate photosignal transduction in a calcium-sensitive manner. This protein displays recoverin activity and a calcium-dependent inhibition of rhodopsin kinase. It is identical to the rat and mouse hippocalcin proteins and thought to play an important role in neurons of the central nervous system in a number of species. [provided by RefSeq, Jul 2008]']


816 (p = 1.083E-07; d = -0.588): [u'The product of this gene belongs to the serine/threonine protein kinase family and to the Ca(2+)/calmodulin-dependent protein kinase subfamily. Calcium signaling is crucial for several aspects of plasticity at glutamatergic synapses. In mammalian cells, the enzyme is composed of four different chains: alpha, beta, gamma, and delta. The product of this gene is a beta chain. It is possible that distinct isoforms of this chain have different cellular localizations and interact differently with calmodulin. Alternative splicing results in multiple transcript variants. [provided by RefSeq, May 2014]']


65266 (p = 1.221E-07; d = -0.586): [u'This gene encodes a member of the WNK family of serine-threonine protein kinases. The kinase is part of the tight junction complex in kidney cells, and regulates the balance between NaCl reabsorption and K(+) secretion. The kinase regulates the activities of several types of ion channels, cotransporters, and exchangers involved in electrolyte flux in epithelial cells. Mutations in this gene result in pseudohypoaldosteronism type IIB.[provided by RefSeq, Sep 2009]']


55061 (p = 2.007E-07; d = -0.575): No description found


58189 (p = 2.577E-07; d = -0.570): [u"This gene encodes a member of the WAP-type four disulfide core domain family. The WAP-type four-disulfide core domain contains eight cysteines forming four disulfide bonds at the core of the protein, and functions as a protease inhibitor in many family members. This gene is mapped to chromosome 16q24, an area of frequent loss of heterozygosity in cancers, including prostate, breast and hepatocellular cancers and Wilms' tumor. This gene is downregulated in many cancer types and may be involved in the inhibition of cell proliferation. The encoded protein may also play a role in the susceptibility of certain CD4 memory T cells to human immunodeficiency virus infection. Alternative splicing results in multiple transcript variants. [provided by RefSeq, Sep 2013]"]


84570 (p = 3.021E-07; d = -0.566): [u'This gene encodes a brain-specific membrane associated collagen. A product of proteolytic processing of the encoded protein, CLAC (collagenous Alzheimer amyloid plaque component), binds to amyloid beta-peptides found in Alzheimer amyloid plaques but CLAC inhibits rather than facilitates amyloid fibril elongation (PMID: 16300410). A study of over-expression of this collagen in mice, however, found changes in pathology and behavior suggesting that the encoded protein may promote amyloid plaque formation (PMID: 19548013). Multiple transcript variants encoding different isoforms have been found for this gene. [provided by RefSeq, Dec 2011]']


23154 (p = 3.079E-07; d = -0.566): No description found


777 (p = 3.533E-07; d = -0.563): [u"Voltage-dependent calcium channels are multisubunit complexes consisting of alpha-1, alpha-2, beta, and delta subunits in a 1:1:1:1 ratio. These channels mediate the entry of calcium ions into excitable cells, and are also involved in a variety of calcium-dependent processes, including muscle contraction, hormone or neurotransmitter release, gene expression, cell motility, cell division and cell death. This gene encodes the alpha-1E subunit of the R-type calcium channels, which belong to the 'high-voltage activated' group that maybe involved in the modulation of firing patterns of neurons important for information processing. Alternatively spliced transcript variants encoding different isoforms have been described for this gene. [provided by RefSeq, Apr 2011]"]


9890 (p = 5.489E-07; d = -0.553): No description found


7881 (p = 5.888E-07; d = -0.552): [u'Potassium channels represent the most complex class of voltage-gated ion channels from both functional and structural standpoints. Their diverse functions include regulating neurotransmitter release, heart rate, insulin secretion, neuronal excitability, epithelial electrolyte transport, smooth muscle contraction, and cell volume. Four sequence-related potassium channel genes - shaker, shaw, shab, and shal - have been identified in Drosophila, and each has been shown to have human homolog(s). This gene encodes a member of the potassium channel, voltage-gated, shaker-related subfamily. This member includes distinct isoforms which are encoded by alternatively spliced transcript variants of this gene. Some of these isoforms are beta subunits, which form heteromultimeric complexes with alpha subunits and modulate the activity of the pore-forming alpha subunits. [provided by RefSeq, Apr 2015]']


6483 (p = 6.507E-07; d = -0.550): [u'The protein encoded by this gene is a type II membrane protein that catalyzes the transfer of sialic acid from CMP-sialic acid to galactose-containing substrates. The encoded protein is normally found in the Golgi but can be proteolytically processed to a soluble form. This protein, which is a member of glycosyltransferase family 29, can use the same acceptor substrates as does sialyltransferase 4A. [provided by RefSeq, Jul 2008]']


This may take a couple of minutes ...
Using NIH described genes only; Entrez ID sample size now 18896
Fetching NIH gene descriptions ...
gene_rec(entrez=100008589, cohen_d=-1.029973730222228, p_value=4.7000392930351963e-09)
gene_rec(entrez=353134, cohen_d=-0.84680049328781004, p_value=1.4961373420900704e-08)
gene_rec(entrez=114041, cohen_d=-0.83200034244728893, p_value=6.8927580815575928e-11)
gene_rec(entrez=4879, cohen_d=-0.80217409834011999, p_value=3.0629262171906652e-10)
gene_rec(entrez=4925, cohen_d=-0.79937789750828669, p_value=3.5063609445351844e-10)
gene_rec(entrez=1813, cohen_d=-0.79122302003246725, p_value=5.2858408305843076e-10)
gene_rec(entrez=6496, cohen_d=-0.78677477587777123, p_value=6.5228259909865101e-10)
gene_rec(entrez=2810, cohen_d=-0.77111218814734706, p_value=1.3847739034587333e-09)
gene_rec(entrez=8787, cohen_d=-0.76806943496476821, p_value=1.6268348583789661e-09)
gene_rec(entrez=84832, cohen_d=-0.76563023606035119, p_value=1.7949568396965738e-09)
gene_rec(entrez=5341, cohen_d=-0.75497646397812979, p_value=2.9823440870472192e-09)
gene_rec(entrez=135, cohen_d=-0.74897823816113651, p_value=3.9936678011222376e-09)
gene_rec(entrez=2615, cohen_d=-0.7402882755105763, p_value=5.8888591634727154e-09)
gene_rec(entrez=4043, cohen_d=-0.7399562791810681, p_value=6.0870874390095837e-09)
gene_rec(entrez=2946, cohen_d=-0.73466593018549387, p_value=7.5784189026417101e-09)
gene_rec(entrez=168002, cohen_d=-0.73300048395326689, p_value=8.231936573450037e-09)
gene_rec(entrez=152, cohen_d=-0.73210042451613477, p_value=8.6549198474701199e-09)
gene_rec(entrez=1690, cohen_d=-0.73101198868565853, p_value=9.127441525674963e-09)
gene_rec(entrez=646658, cohen_d=-0.72751582567847517, p_value=1.076222600893629e-08)
gene_rec(entrez=285097, cohen_d=-0.72120985290675566, p_value=1.3938668603676782e-08)

Corrected Bonferroni Alpha: 2.405E-06


100008589 (p = 4.700E-09; d = -1.030): No description found


353134 (p = 1.496E-08; d = -0.847): No description found


114041 (p = 6.893E-11; d = -0.832): No description found


4879 (p = 3.063E-10; d = -0.802): [u"This gene is a member of the natriuretic peptide family and encodes a secreted protein which functions as a cardiac hormone. The protein undergoes two cleavage events, one within the cell and a second after secretion into the blood. The protein's biological actions include natriuresis, diuresis, vasorelaxation, inhibition of renin and aldosterone secretion, and a key role in cardiovascular homeostasis. A high concentration of this protein in the bloodstream is indicative of heart failure. The protein also acts as an antimicrobial peptide with antibacterial and antifungal activity. Mutations in this gene have been associated with postmenopausal osteoporosis. [provided by RefSeq, Nov 2014]"]


4925 (p = 3.506E-10; d = -0.799): [u'This gene encodes a protein with a suggested role in calcium level maintenance, eating regulation in the hypothalamus, and release of tumor necrosis factor from vascular endothelial cells. This protein binds calcium and has EF-folding domains. [provided by RefSeq, Oct 2011]']


1813 (p = 5.286E-10; d = -0.791): [u'This gene encodes the D2 subtype of the dopamine receptor. This G-protein coupled receptor inhibits adenylyl cyclase activity. A missense mutation in this gene causes myoclonus dystonia; other mutations have been associated with schizophrenia. Alternative splicing of this gene results in two transcript variants encoding different isoforms. A third variant has been described, but it has not been determined whether this form is normal or due to aberrant splicing. [provided by RefSeq, Jul 2008]']


6496 (p = 6.523E-10; d = -0.787): [u'This gene encodes a member of the sine oculis homeobox transcription factor family. The encoded protein plays a role in eye development. Mutations in this gene have been associated with holoprosencephaly type 2. [provided by RefSeq, Oct 2009]']


2810 (p = 1.385E-09; d = -0.771): No description found


8787 (p = 1.627E-09; d = -0.768): [u'This gene encodes a member of the RGS family of GTPase activating proteins that function in various signaling pathways by accelerating the deactivation of G proteins. This protein is anchored to photoreceptor membranes in retinal cells and deactivates G proteins in the rod and cone phototransduction cascades. Mutations in this gene result in bradyopsia. Multiple transcript variants encoding different isoforms have been found for this gene.[provided by RefSeq, Sep 2009]']


84832 (p = 1.795E-09; d = -0.766): No description found


5341 (p = 2.982E-09; d = -0.755): No description found


135 (p = 3.994E-09; d = -0.749): [u'This gene encodes a member of the guanine nucleotide-binding protein (G protein)-coupled receptor (GPCR) superfamily, which is subdivided into classes and subtypes. The receptors are seven-pass transmembrane proteins that respond to extracellular cues and activate intracellular signal transduction pathways. This protein, an adenosine receptor of A2A subtype, uses adenosine as the preferred endogenous agonist and preferentially interacts with the G(s) and G(olf) family of G proteins to increase intracellular cAMP levels. It plays an important role in many biological functions, such as cardiac rhythm and circulation, cerebral and renal blood flow, immune function, pain regulation, and sleep. It has been implicated in pathophysiological conditions such as inflammatory diseases and neurodegenerative disorders. Alternative splicing results in multiple transcript variants. A read-through transcript composed of the upstream SPECC1L (sperm antigen with calponin homology and coiled-coil domains 1-like) and ADORA2A (adenosine A2a receptor) gene sequence has been identified, but it is thought to be non-coding. [provided by RefSeq, Jun 2013]']


2615 (p = 5.889E-09; d = -0.740): [u'This gene encodes a type I membrane protein which contains 20 leucine-rich repeats. Alterations in the chromosomal region 11q13-11q14 are involved in several pathologies. [provided by RefSeq, Jul 2008]']


4043 (p = 6.087E-09; d = -0.740): [u'This gene encodes a protein that interacts with the low density lipoprotein (LDL) receptor-related protein and facilitates its proper folding and localization by preventing the binding of ligands. Mutations in this gene have been identified in individuals with myopia 23. Alternative splicing results in multiple transcript variants. [provided by RefSeq, Dec 2013]']


2946 (p = 7.578E-09; d = -0.735): [u"Cytosolic and membrane-bound forms of glutathione S-transferase are encoded by two distinct supergene families. At present, eight distinct classes of the soluble cytoplasmic mammalian glutathione S-transferases have been identified: alpha, kappa, mu, omega, pi, sigma, theta and zeta. This gene encodes a glutathione S-transferase that belongs to the mu class. The mu class of enzymes functions in the detoxification of electrophilic compounds, including carcinogens, therapeutic drugs, environmental toxins and products of oxidative stress, by conjugation with glutathione. The genes encoding the mu class of enzymes are organized in a gene cluster on chromosome 1p13.3 and are known to be highly polymorphic. These genetic variations can change an individual's susceptibility to carcinogens and toxins as well as affect the toxicity and efficacy of certain drugs. [provided by RefSeq, Jul 2008]"]


168002 (p = 8.232E-09; d = -0.733): No description found


152 (p = 8.655E-09; d = -0.732): [u'Alpha-2-adrenergic receptors are members of the G protein-coupled receptor superfamily. They include 3 highly homologous subtypes: alpha2A, alpha2B, and alpha2C. These receptors have a critical role in regulating neurotransmitter release from sympathetic nerves and from adrenergic neurons in the central nervous system. The mouse studies revealed that both the alpha2A and alpha2C subtypes were required for normal presynaptic control of transmitter release from sympathetic nerves in the heart and from central noradrenergic neurons. The alpha2A subtype inhibited transmitter release at high stimulation frequencies, whereas the alpha2C subtype modulated neurotransmission at lower levels of nerve activity. This gene encodes the alpha2C subtype, which contains no introns in either its coding or untranslated sequences. [provided by RefSeq, Jul 2008]']


1690 (p = 9.127E-09; d = -0.731): [u'The protein encoded by this gene is highly conserved in human, mouse, and chicken, showing 94% and 79% amino acid identity of human to mouse and chicken sequences, respectively. Hybridization to this gene was detected in spindle-shaped cells located along nerve fibers between the auditory ganglion and sensory epithelium. These cells accompany neurites at the habenula perforata, the opening through which neurites extend to innervate hair cells. This and the pattern of expression of this gene in chicken inner ear paralleled the histologic findings of acidophilic deposits, consistent with mucopolysaccharide ground substance, in temporal bones from DFNA9 (autosomal dominant nonsyndromic sensorineural deafness 9) patients. Mutations that cause DFNA9 have been reported in this gene. Alternative splicing results in multiple transcript variants encoding the same protein. Additional splice variants encoding distinct isoforms have been described but their biological validities have not been demonstrated. [provided by RefSeq, Oct 2008]']


646658 (p = 1.076E-08; d = -0.728): No description found


285097 (p = 1.394E-08; d = -0.721): No description found


This may take a couple of minutes ...
Using NIH described genes only; Entrez ID sample size now 18896
Fetching NIH gene descriptions ...
gene_rec(entrez=100008589, cohen_d=-1.1197673899783784, p_value=9.2489393458171544e-13)
gene_rec(entrez=353134, cohen_d=-0.63393681844033789, p_value=6.0361056889599255e-07)
gene_rec(entrez=641311, cohen_d=-0.53893815172375736, p_value=8.0205220079527897e-06)
gene_rec(entrez=55784, cohen_d=-0.46353391141855305, p_value=1.9884878566130825e-06)
gene_rec(entrez=100288366, cohen_d=-0.45792474062225047, p_value=4.850565075808026e-05)
gene_rec(entrez=8404, cohen_d=-0.43667743470621095, p_value=1.7841215968664916e-05)
gene_rec(entrez=3670, cohen_d=-0.42405630596801624, p_value=1.3334000842656108e-05)
gene_rec(entrez=688, cohen_d=-0.42320048037314012, p_value=1.403112269816668e-05)
gene_rec(entrez=4504, cohen_d=-0.4155221216790726, p_value=3.2597006032915608e-05)
gene_rec(entrez=171019, cohen_d=-0.41071196267898191, p_value=2.4558250441321277e-05)
gene_rec(entrez=7280, cohen_d=-0.40741307622235451, p_value=0.00030452578505056594)
gene_rec(entrez=5909, cohen_d=-0.40708807129229918, p_value=2.8878485969356086e-05)
gene_rec(entrez=83550, cohen_d=-0.40591665315342623, p_value=3.2590406951764684e-05)
gene_rec(entrez=1390, cohen_d=-0.40186847474008769, p_value=3.6339215715549399e-05)
gene_rec(entrez=100287080, cohen_d=-0.40119296326705128, p_value=0.00017007804578007065)
gene_rec(entrez=3598, cohen_d=-0.39626936306108351, p_value=4.6570340766764595e-05)
gene_rec(entrez=8577, cohen_d=-0.38464728192576697, p_value=7.6765984983575173e-05)
gene_rec(entrez=144406, cohen_d=-0.38305521898453826, p_value=8.2163431194023757e-05)
gene_rec(entrez=5510, cohen_d=-0.36774525079520559, p_value=0.00015616677557194099)
gene_rec(entrez=1813, cohen_d=-0.36747823485154218, p_value=0.00015811783585324148)

Corrected Bonferroni Alpha: 2.405E-06


100008589 (p = 9.249E-13; d = -1.120): No description found


353134 (p = 6.036E-07; d = -0.634): No description found


641311 (p = 8.021E-06; d = -0.539): No description found


55784 (p = 1.988E-06; d = -0.464): No description found


100288366 (p = 4.851E-05; d = -0.458): No description found


8404 (p = 1.784E-05; d = -0.437): No description found


3670 (p = 1.333E-05; d = -0.424): [u'This gene encodes a member of the LIM/homeodomain family of transcription factors. The encoded protein binds to the enhancer region of the insulin gene, among others, and may play an important role in regulating insulin gene expression. The encoded protein is central to the development of pancreatic cell lineages and may also be required for motor neuron generation. Mutations in this gene have been associated with maturity-onset diabetes of the young. [provided by RefSeq, Jul 2008]']


688 (p = 1.403E-05; d = -0.423): [u'This gene encodes a member of the Kruppel-like factor subfamily of zinc finger proteins. The encoded protein is a transcriptional activator that binds directly to a specific recognition motif in the promoters of target genes. This protein acts downstream of multiple different signaling pathways and is regulated by post-translational modification. It may participate in both promoting and suppressing cell proliferation. Expression of this gene may be changed in a variety of different cancers and in cardiovascular disease. Alternative splicing results in multiple transcript variants. [provided by RefSeq, Nov 2013]']


4504 (p = 3.260E-05; d = -0.416): No description found


171019 (p = 2.456E-05; d = -0.411): No description found


7280 (p = 3.045E-04; d = -0.407): [u'Microtubules, key participants in processes such as mitosis and intracellular transport, are composed of heterodimers of alpha- and beta-tubulins. The protein encoded by this gene is a beta-tubulin. Defects in this gene are associated with complex cortical dysplasia with other brain malformations-5. Two transcript variants encoding distinct isoforms have been found for this gene. [provided by RefSeq, Jul 2015]']


5909 (p = 2.888E-05; d = -0.407): [u'This gene encodes a type of GTPase-activating-protein (GAP) that down-regulates the activity of the ras-related RAP1 protein. RAP1 acts as a molecular switch by cycling between an inactive GDP-bound form and an active GTP-bound form. The product of this gene, RAP1GAP, promotes the hydrolysis of bound GTP and hence returns RAP1 to the inactive state whereas other proteins, guanine nucleotide exchange factors (GEFs), act as RAP1 activators by facilitating the conversion of RAP1 from the GDP- to the GTP-bound form. In general, ras subfamily proteins, such as RAP1, play key roles in receptor-linked signaling pathways that control cell growth and differentiation. RAP1 plays a role in diverse processes such as cell proliferation, adhesion, differentiation, and embryogenesis. Alternative splicing results in multiple transcript variants encoding distinct proteins. [provided by RefSeq, Aug 2011]']


83550 (p = 3.259E-05; d = -0.406): [u'The protein encoded by this gene is an orphan G protein-coupled receptor of unknown function. The encoded protein is a member of a family of proteins that contain seven transmembrane domains and transduce extracellular signals through heterotrimeric G proteins. [provided by RefSeq, Sep 2011]']


1390 (p = 3.634E-05; d = -0.402): [u'This gene encodes a bZIP transcription factor that binds to the cAMP responsive element found in many viral and cellular promoters. It is an important component of cAMP-mediated signal transduction during the spermatogenetic cycle, as well as other complex processes. Alternative promoter and translation initiation site usage allows this gene to exert spatial and temporal specificity to cAMP responsiveness. Multiple alternatively spliced transcript variants encoding several different isoforms have been found for this gene, with some of them functioning as activators and some as repressors of transcription. [provided by RefSeq, Jul 2008]']


100287080 (p = 1.701E-04; d = -0.401): No description found


3598 (p = 4.657E-05; d = -0.396): [u'The protein encoded by this gene is closely related to Il13RA1, a subuint of the interleukin 13 receptor complex. This protein binds IL13 with high affinity, but lacks cytoplasmic domain, and does not appear to function as a signal mediator. It is reported to play a role in the internalization of IL13. [provided by RefSeq, Jul 2008]']


8577 (p = 7.677E-05; d = -0.385): No description found


144406 (p = 8.216E-05; d = -0.383): [u'This protein encoded by this gene belongs to the WD repeat-containing family of proteins, which function in the formation of protein-protein complexes in a variety of biological pathways. This family member appears to function in the determination of mean platelet volume (MPV), and polymorphisms in this gene have been associated with variance in MPV. Alternative splicing of this gene results in multiple transcript variants. [provided by RefSeq, Sep 2011]']


5510 (p = 1.562E-04; d = -0.368): [u'This gene encodes a protein subunit that regulates the activity of the serine/threonine phosphatase, protein phosphatase-1. The encoded protein is required for completion of the mitotic cycle and for targeting protein phosphatase-1 to mitotic kinetochores. Alternate splicing results in multiple transcript variants. [provided by RefSeq, Sep 2013]']


1813 (p = 1.581E-04; d = -0.367): [u'This gene encodes the D2 subtype of the dopamine receptor. This G-protein coupled receptor inhibits adenylyl cyclase activity. A missense mutation in this gene causes myoclonus dystonia; other mutations have been associated with schizophrenia. Alternative splicing of this gene results in two transcript variants encoding different isoforms. A third variant has been described, but it has not been determined whether this form is normal or due to aberrant splicing. [provided by RefSeq, Jul 2008]']


This may take a couple of minutes ...
Using NIH described genes only; Entrez ID sample size now 18896
Fetching NIH gene descriptions ...
gene_rec(entrez=100008589, cohen_d=-1.0226814934384474, p_value=4.8196629890160717e-09)
gene_rec(entrez=353134, cohen_d=-0.51499568903432125, p_value=0.0003467604049848319)
gene_rec(entrez=11024, cohen_d=-0.48131902560570361, p_value=8.5564018081500716e-05)
gene_rec(entrez=124583, cohen_d=-0.47928744194529083, p_value=9.1542451544816199e-05)
gene_rec(entrez=285097, cohen_d=-0.47285748148113887, p_value=0.00011350298276451298)
gene_rec(entrez=55199, cohen_d=-0.45274162571829202, p_value=0.00021845082100556372)
gene_rec(entrez=54873, cohen_d=-0.42951020461995754, p_value=0.0004509369252993425)
gene_rec(entrez=7100, cohen_d=-0.42950929943859373, p_value=0.00045138793094492307)
gene_rec(entrez=1945, cohen_d=-0.42617343092999388, p_value=0.00050034507183371261)
gene_rec(entrez=4283, cohen_d=-0.42199061833213874, p_value=0.00056661790971081574)
gene_rec(entrez=1137, cohen_d=-0.41973250442517734, p_value=0.00060641974232847784)
gene_rec(entrez=116085, cohen_d=-0.41950157714566416, p_value=0.00061009517655627021)
gene_rec(entrez=79148, cohen_d=-0.41192312830416189, p_value=0.00076218316200671364)
gene_rec(entrez=100288366, cohen_d=-0.40879925529769129, p_value=0.0021894571335191204)
gene_rec(entrez=190, cohen_d=-0.40834103789971343, p_value=0.00084725612695022811)
gene_rec(entrez=7280, cohen_d=-0.40805796639965358, p_value=0.002274312939159672)
gene_rec(entrez=9317, cohen_d=-0.40705044356549058, p_value=0.00087908138515181656)
gene_rec(entrez=3784, cohen_d=-0.40631913807999198, p_value=0.00089917965391949272)
gene_rec(entrez=8941, cohen_d=-0.40624232468767818, p_value=0.00090006985210069998)
gene_rec(entrez=284021, cohen_d=-0.40463251714802589, p_value=0.00094467536288119739)

Corrected Bonferroni Alpha: 2.405E-06


100008589 (p = 4.820E-09; d = -1.023): No description found


353134 (p = 3.468E-04; d = -0.515): No description found


11024 (p = 8.556E-05; d = -0.481): [u'This gene encodes an activating member of the leukocyte immunoglobulin-like receptor (LIR) family, which is found in a gene cluster at chromosomal region 19q13.4. The encoded protein is predominantly expressed in B cells, interacts with major histocompatibility complex class I ligands, and contributes to the regulation of immune responses. Alternative splicing results in multiple transcript variants encoding different isoforms. [provided by RefSeq, May 2013]']


124583 (p = 9.154E-05; d = -0.479): [u'This protein encoded by this gene belongs to the apyrase family. It functions as a calcium-dependent nucleotidase with a preference for UDP. Mutations in this gene are associated with Desbuquois dysplasia with hand anomalies. Alternatively spliced transcript variants have been noted for this gene.[provided by RefSeq, Mar 2010]']


285097 (p = 1.135E-04; d = -0.473): No description found


55199 (p = 2.185E-04; d = -0.453): No description found


54873 (p = 4.509E-04; d = -0.430): No description found


7100 (p = 4.514E-04; d = -0.430): [u'This gene encodes a member of the toll-like receptor (TLR) family, which plays a fundamental role in pathogen recognition and activation of innate immune responses. These receptors recognize distinct pathogen-associated molecular patterns that are expressed on infectious agents. The protein encoded by this gene recognizes bacterial flagellin, the principal component of bacterial flagella and a virulence factor. The activation of this receptor mobilizes the nuclear factor NF-kappaB, which in turn activates a host of inflammatory-related target genes. Mutations in this gene have been associated with both resistance and susceptibility to systemic lupus erythematosus, and susceptibility to Legionnaire disease.[provided by RefSeq, Dec 2009]']


1945 (p = 5.003E-04; d = -0.426): [u'This gene encodes a member of the ephrin (EPH) family. The ephrins and EPH-related receptors comprise the largest subfamily of receptor protein-tyrosine kinases and have been implicated in mediating developmental events, especially in the nervous system and in erythropoiesis. Based on their structures and sequence relationships, ephrins are divided into the ephrin-A (EFNA) class, which are anchored to the membrane by a glycosylphosphatidylinositol linkage, and the ephrin-B (EFNB) class, which are transmembrane proteins. This gene encodes an EFNA class ephrin. Three transcript variants that encode distinct proteins have been identified. [provided by RefSeq, Jul 2008]']


4283 (p = 5.666E-04; d = -0.422): [u'This antimicrobial gene encodes a protein thought to be involved in T cell trafficking. The encoded protein binds to C-X-C motif chemokine 3 and is a chemoattractant for lymphocytes but not for neutrophils. [provided by RefSeq, Sep 2014]']


1137 (p = 6.064E-04; d = -0.420): [u'This gene encodes a nicotinic acetylcholine receptor, which belongs to a superfamily of ligand-gated ion channels that play a role in fast signal transmission at synapses. These pentameric receptors can bind acetylcholine, which causes an extensive change in conformation that leads to the opening of an ion-conducting channel across the plasma membrane. This protein is an integral membrane receptor subunit that can interact with either nAChR beta-2 or nAChR beta-4 to form a functional receptor. Mutations in this gene cause nocturnal frontal lobe epilepsy type 1. Polymorphisms in this gene that provide protection against nicotine addiction have been described. Alternative splicing results in multiple transcript variants. [provided by RefSeq, Feb 2012]']


116085 (p = 6.101E-04; d = -0.420): [u'The protein encoded by this gene is a member of the organic anion transporter (OAT) family, and it acts as a urate transporter to regulate urate levels in blood. This protein is an integral membrane protein primarily found in epithelial cells of the proximal tubule of the kidney. An elevated level of serum urate, hyperuricemia, is associated with increased incidences of gout, and mutations in this gene cause renal hypouricemia type 1. Alternative splicing results in multiple transcript variants. [provided by RefSeq, Jan 2013]']


79148 (p = 7.622E-04; d = -0.412): [u'Proteins of the matrix metalloproteinase (MMP) family are involved in the breakdown of extracellular matrix for both normal physiological processes, such as embryonic development, reproduction and tissue remodeling, and disease processes, such as asthma and metastasis. This gene encodes a secreted enzyme that degrades casein. Its expression pattern suggests that it plays a role in tissue homeostasis and in wound repair. Alternative splicing of this gene results in multiple transcript variants. [provided by RefSeq, Apr 2014]']


100288366 (p = 2.189E-03; d = -0.409): No description found


190 (p = 8.473E-04; d = -0.408): [u'This gene encodes a protein that contains a DNA-binding domain. The encoded protein acts as a dominant-negative regulator of transcription which is mediated by the retinoic acid receptor. This protein also functions as an anti-testis gene by acting antagonistically to Sry. Mutations in this gene result in both X-linked congenital adrenal hypoplasia and hypogonadotropic hypogonadism. [provided by RefSeq, Jul 2008]']


7280 (p = 2.274E-03; d = -0.408): [u'Microtubules, key participants in processes such as mitosis and intracellular transport, are composed of heterodimers of alpha- and beta-tubulins. The protein encoded by this gene is a beta-tubulin. Defects in this gene are associated with complex cortical dysplasia with other brain malformations-5. Two transcript variants encoding distinct isoforms have been found for this gene. [provided by RefSeq, Jul 2015]']


9317 (p = 8.791E-04; d = -0.407): No description found


3784 (p = 8.992E-04; d = -0.406): [u'This gene encodes a voltage-gated potassium channel required for repolarization phase of the cardiac action potential. This protein can form heteromultimers with two other potassium channel proteins, KCNE1 and KCNE3. Mutations in this gene are associated with hereditary long QT syndrome 1 (also known as Romano-Ward syndrome), Jervell and Lange-Nielsen syndrome, and familial atrial fibrillation. This gene exhibits tissue-specific imprinting, with preferential expression from the maternal allele in some tissues, and biallelic expression in others. This gene is located in a region of chromosome 11 amongst other imprinted genes that are associated with Beckwith-Wiedemann syndrome (BWS), and itself has been shown to be disrupted by chromosomal rearrangements in patients with BWS. Alternatively spliced transcript variants have been found for this gene. [provided by RefSeq, Aug 2011]']


8941 (p = 9.001E-04; d = -0.406): [u'The protein encoded by this gene is a neuron-specific activator of CDK5 kinase. It associates with CDK5 to form an active kinase. This protein and neuron-specific CDK5 activator CDK5R1/p39NCK5A both share limited similarity to cyclins, and thus may define a distinct family of cyclin-dependent kinase activating proteins. [provided by RefSeq, Jul 2008]']


284021 (p = 9.447E-04; d = -0.405): No description found


This may take a couple of minutes ...
Using NIH described genes only; Entrez ID sample size now 18896
Fetching NIH gene descriptions ...
gene_rec(entrez=100008589, cohen_d=-1.2290953912044906, p_value=2.0241628896538788e-14)
gene_rec(entrez=7280, cohen_d=-0.53891549626328694, p_value=2.2966143346436776e-06)
gene_rec(entrez=729722, cohen_d=-0.4447538843741502, p_value=5.2513762728496695e-06)
gene_rec(entrez=2823, cohen_d=-0.42982147249174751, p_value=2.2383849352871024e-05)
gene_rec(entrez=1630, cohen_d=-0.41779744518214013, p_value=1.7743714428230885e-05)
gene_rec(entrez=8708, cohen_d=-0.41018703333832729, p_value=2.5042138167887821e-05)
gene_rec(entrez=144402, cohen_d=-0.40934949996661674, p_value=2.6037552211934581e-05)
gene_rec(entrez=1612, cohen_d=-0.40742959398615641, p_value=2.8304500939742207e-05)
gene_rec(entrez=5453, cohen_d=-0.39621089244356084, p_value=4.6819340922813228e-05)
gene_rec(entrez=142679, cohen_d=-0.39589815125082289, p_value=4.7184326922617708e-05)
gene_rec(entrez=5800, cohen_d=-0.39157349973731814, p_value=5.694699829860936e-05)
gene_rec(entrez=57496, cohen_d=-0.39087997096565952, p_value=7.9630143158153309e-05)
gene_rec(entrez=57630, cohen_d=-0.38939938697644283, p_value=6.2650727500072436e-05)
gene_rec(entrez=9495, cohen_d=-0.38735265267962882, p_value=6.8296039639428389e-05)
gene_rec(entrez=221662, cohen_d=-0.38226693906914988, p_value=8.4833443474628025e-05)
gene_rec(entrez=5100, cohen_d=-0.38052087836663101, p_value=9.1247508561446945e-05)
gene_rec(entrez=7087, cohen_d=-0.37927594117677249, p_value=9.6252457132455207e-05)
gene_rec(entrez=2904, cohen_d=-0.37894144570664301, p_value=9.752652267132724e-05)
gene_rec(entrez=152110, cohen_d=-0.37449902323213963, p_value=0.00011752964424929774)
gene_rec(entrez=4889, cohen_d=-0.37309968063731652, p_value=0.00012471414555258485)

Corrected Bonferroni Alpha: 2.405E-06


100008589 (p = 2.024E-14; d = -1.229): No description found


7280 (p = 2.297E-06; d = -0.539): [u'Microtubules, key participants in processes such as mitosis and intracellular transport, are composed of heterodimers of alpha- and beta-tubulins. The protein encoded by this gene is a beta-tubulin. Defects in this gene are associated with complex cortical dysplasia with other brain malformations-5. Two transcript variants encoding distinct isoforms have been found for this gene. [provided by RefSeq, Jul 2015]']


729722 (p = 5.251E-06; d = -0.445): [u'putative ankyrin repeat domain-containing protein ENSP00000383090']


2823 (p = 2.238E-05; d = -0.430): No description found


1630 (p = 1.774E-05; d = -0.418): [u'This gene encodes a netrin 1 receptor. The transmembrane protein is a member of the immunoglobulin superfamily of cell adhesion molecules, and mediates axon guidance of neuronal growth cones towards sources of netrin 1 ligand. The cytoplasmic tail interacts with the tyrosine kinases Src and focal adhesion kinase (FAK, also known as PTK2) to mediate axon attraction. The protein partially localizes to lipid rafts, and induces apoptosis in the absence of ligand. The protein functions as a tumor suppressor, and is frequently mutated or downregulated in colorectal cancer and esophageal carcinoma. [provided by RefSeq, Oct 2009]']


8708 (p = 2.504E-05; d = -0.410): [u'This gene is a member of the beta-1,3-galactosyltransferase (beta3GalT) gene family. This family encodes type II membrane-bound glycoproteins with diverse enzymatic functions using different donor substrates (UDP-galactose and UDP-N-acetylglucosamine) and different acceptor sugars (N-acetylglucosamine, galactose, N-acetylgalactosamine). The beta3GalT genes are distantly related to the Drosophila Brainiac gene and have the protein coding sequence contained in a single exon. The beta3GalT proteins also contain conserved sequences not found in the beta4GalT or alpha3GalT proteins. The carbohydrate chains synthesized by these enzymes are designated as type 1, whereas beta4GalT enzymes synthesize type 2 carbohydrate chains. The ratio of type 1:type 2 chains changes during embryogenesis. By sequence similarity, the beta3GalT genes fall into at least two groups: beta3GalT4 and 4 other beta3GalT genes (beta3GalT1-3, beta3GalT5). This gene is expressed exclusively in the brain. The encoded protein shows strict donor substrate specificity for UDP-galactose. [provided by RefSeq, Jul 2008]']


144402 (p = 2.604E-05; d = -0.409): No description found


1612 (p = 2.830E-05; d = -0.407): [u'Death-associated protein kinase 1 is a positive mediator of gamma-interferon induced programmed cell death. DAPK1 encodes a structurally unique 160-kD calmodulin dependent serine-threonine kinase that carries 8 ankyrin repeats and 2 putative P-loop consensus sites. It is a tumor suppressor candidate. Alternative splicing results in multiple transcript variants. [provided by RefSeq, Dec 2013]']


5453 (p = 4.682E-05; d = -0.396): No description found


142679 (p = 4.718E-05; d = -0.396): [u'Dual-specificity phosphatases (DUSPs) constitute a large heterogeneous subgroup of the type I cysteine-based protein-tyrosine phosphatase superfamily. DUSPs are characterized by their ability to dephosphorylate both tyrosine and serine/threonine residues. They have been implicated as major modulators of critical signaling pathways. DUSP19 contains a variation of the consensus DUSP C-terminal catalytic domain, with the last serine residue replaced by alanine, and lacks the N-terminal CH2 domain found in the MKP (mitogen-activated protein kinase phosphatase) class of DUSPs (see MIM 600714) (summary by Patterson et al., 2009 [PubMed 19228121]).[supplied by OMIM, Dec 2009]']


5800 (p = 5.695E-05; d = -0.392): [u'This gene encodes a member of the R3 subtype family of receptor-type protein tyrosine phosphatases. These proteins are localized to the apical surface of polarized cells and may have tissue-specific functions through activation of Src family kinases. This gene contains two distinct promoters, and alternatively spliced transcript variants encoding multiple isoforms have been observed. The encoded proteins may have multiple isoform-specific and tissue-specific functions, including the regulation of osteoclast production and activity, inhibition of cell proliferation and facilitation of apoptosis. This gene is a candidate tumor suppressor, and decreased expression of this gene has been observed in several types of cancer. [provided by RefSeq, May 2011]']


57496 (p = 7.963E-05; d = -0.391): No description found


57630 (p = 6.265E-05; d = -0.389): [u'This gene encodes a protein containing an N-terminus RING-finger, four SH3 domains, and a region implicated in binding of the Rho GTPase Rac. Via the RING-finger, the encoded protein has been shown to function as an ubiquitin-protein ligase involved in protein sorting at the trans-Golgi network. The encoded protein may also act as a scaffold for the c-Jun N-terminal kinase signaling pathway, facilitating the formation of a functional signaling module. [provided by RefSeq, Jul 2008]']


9495 (p = 6.830E-05; d = -0.387): [u'The A-kinase anchor proteins (AKAPs) are a group of structurally diverse proteins, which have the common function of binding to the regulatory subunit of protein kinase A (PKA) and confining the holoenzyme to discrete locations within the cell. This gene encodes a member of the AKAP family. The encoded protein binds to the RII-beta regulatory subunit of PKA, and also to protein kinase C and the phosphatase calcineurin. It is predominantly expressed in cerebral cortex and may anchor the PKA protein at postsynaptic densities (PSD) and be involved in the regulation of postsynaptic events. It is also expressed in T lymphocytes and may function to inhibit interleukin-2 transcription by disrupting calcineurin-dependent dephosphorylation of NFAT. [provided by RefSeq, Jul 2008]']


221662 (p = 8.483E-05; d = -0.382): No description found


5100 (p = 9.125E-05; d = -0.381): [u'This gene belongs to the protocadherin gene family, a subfamily of the cadherin superfamily. The gene encodes an integral membrane protein that is thought to function in cell adhesion in a CNS-specific manner. Unlike classical cadherins, which are generally encoded by 15-17 exons, this gene includes only 3 exons. Notable is the large first exon encoding the extracellular region, including 6 cadherin domains and a transmembrane region. Alternative splicing yields isoforms with unique cytoplasmic tails. [provided by RefSeq, Jul 2008]']


7087 (p = 9.625E-05; d = -0.379): [u'The protein encoded by this gene is a member of the intercellular adhesion molecule (ICAM) family. All ICAM proteins are type I transmembrane glycoproteins, contain 2-9 immunoglobulin-like C2-type domains, and bind to the leukocyte adhesion LFA-1 protein. This protein is expressed on the surface of telencephalic neurons and displays two types of adhesion activity, homophilic binding between neurons and heterophilic binding between neurons and leukocytes. It may be a critical component in neuron-microglial cell interactions in the course of normal development or as part of neurodegenerative diseases. [provided by RefSeq, Jul 2008]']


2904 (p = 9.753E-05; d = -0.379): [u'N-methyl-D-aspartate (NMDA) receptors are a class of ionotropic glutamate receptors. NMDA receptor channel has been shown to be involved in long-term potentiation, an activity-dependent increase in the efficiency of synaptic transmission thought to underlie certain kinds of memory and learning. NMDA receptor channels are heteromers composed of three different subunits: NR1 (GRIN1), NR2 (GRIN2A, GRIN2B, GRIN2C, or GRIN2D) and NR3 (GRIN3A or GRIN3B). The NR2 subunit acts as the agonist binding site for glutamate. This receptor is the predominant excitatory neurotransmitter receptor in the mammalian brain. [provided by RefSeq, Jul 2008]']


152110 (p = 1.175E-04; d = -0.374): No description found


4889 (p = 1.247E-04; d = -0.373): No description found



In [19]:
import scipy.stats as stats
r_vals = []
for gene in depression_genes:
    ge_mat = N.make_ge_ns_mat('depression',[gene])
    #r_vals.append(stats.spearmanr(ge_mat[:, 0], ge_mat[:, 1])[0])
    r_vals.append(np.corrcoef(ge_mat[:, ge_mat.shape[1]-1], ge_mat[:, r])[1, 0])
hist(r_vals)


/anaconda/lib/python2.7/site-packages/ipykernel/__main__.py:6: DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
Out[19]:
(array([ 3.,  0.,  3.,  0.,  2.,  2.,  1.,  0.,  1.,  1.]),
 array([-0.05428171, -0.04125796, -0.02823422, -0.01521047, -0.00218672,
         0.01083703,  0.02386077,  0.03688452,  0.04990827,  0.06293202,
         0.07595576]),
 <a list of 10 Patch objects>)

In [44]:
from nsaba.nsaba import visualizer
V = visualizer.NsabaVisualizer(N)

A.t_test('depression',100008589,graphops='violin')


t-value: -6.5180 
p-value: 5.245E-10
Effect size: -0.9478 


In [ ]:


In [ ]: