Introducing GGBweb

...for Baligans...

(that means EGRIN 2.0)

GGBweb Highlights

  • Interact with data on a genome-scale
  • Compare multiple genomic regions, simultaneously
  • Transfer data to scripting environments for iterative refinement
  • Customize how you represent your data
  • Extend your analysis with easy data upload

...packaged in a modern and aesthetic web-based GUI

Problem:

Identify and validate transcription factors that regulate nucleotide metabolism

Today you will learn how to:

  • interact with the genome browser
  • add gene modules
  • view and compare several genomic regions
  • add data tracks
  • combine with data analysis in iPython
  • upload new data tracks

Load required tools from the egrin2-tools package


In [3]:
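# explicit imports for the names used below (the wildcard import may already provide them)
import pandas as pd
from pymongo import MongoClient

from query.egrin2_query import *

# connect to the EGRIN 2.0 database
host = "primordial"
port = 27017
db = "eco_db"
client = MongoClient( "mongodb://" + host + ":" + str( port ) + "/" )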

Find genes involved in purine and pyrimidine metabolism

Nucleotide metabolism

  • GO:0009117 - nucleotide metabolic process

In [3]:
# get nucleotide gene info
gene_info = pd.DataFrame( list( client[ db ].row_info.find( { "$or": [
    { "GO": { "$regex" : "GO:0009117" } },
    { "TIGRRoles": { "$regex" : "nucleotide" } },
    { "ECDesc": { "$regex" : "nucleotide" } }
] } ) ) )
gene_info.loc[ :,[ "name","ECDesc","TIGRRoles" ] ]
# get egrin2 gene names only
genes = gene_info.egrin2_row_name.tolist()
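The query above selects a row when any one of three fields matches. As a rough illustration of that "$or" of "$regex" conditions, here is a minimal sketch that builds the same filter and applies it to in-memory records, so it can be tried without a database connection. The helper names (build_nucleotide_query, matches) and the sample records are hypothetical, and Python's re module only approximates MongoDB's regex matching.

```python
import re

def build_nucleotide_query(go_term="GO:0009117", keyword="nucleotide"):
    """Build the same $or filter used in the row_info query above."""
    return {"$or": [
        {"GO": {"$regex": go_term}},
        {"TIGRRoles": {"$regex": keyword}},
        {"ECDesc": {"$regex": keyword}},
    ]}

def matches(doc, query):
    """Local stand-in for $or/$regex: True if any listed field matches.

    Illustration only; a missing field is treated as an empty string.
    """
    return any(
        re.search(cond["$regex"], str(doc.get(field, ""))) is not None
        for clause in query["$or"]
        for field, cond in clause.items()
    )

# Hypothetical records, not taken from the EGRIN 2.0 database
sample_rows = [
    {"egrin2_row_name": "row_a", "GO": "GO:0009117,GO:0006139"},
    {"egrin2_row_name": "row_b", "ECDesc": "nucleotide kinase"},
    {"egrin2_row_name": "row_c", "GO": "GO:0000001", "TIGRRoles": "transport"},
]

query = build_nucleotide_query()
selected = [row["egrin2_row_name"] for row in sample_rows if matches(row, query)]
print(selected)
```

Because the conditions are joined with "$or", a gene annotated in only one of the three fields is still returned, which is why the query casts a wide net.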

In [4]:
# write the nucleotide genes to a GGBweb module file
# name chromosome-:1823979-1824947 
ggbwebModule( genes, outfile = "nucleotide_module.txt", host = host, port = port, db = db )


Module written to: /Users/abrooks/Documents/git/GGBWeb/docs/nucleotide_module.txt

In [5]:
# find GREs discovered upstream of these genes
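gre_candidates = agglom( genes, x_type = "gene", y_type = "gre", host = host, db = db, logic = "or", x_input_type = "egrin2_row_name" )
# rank by counts, then pval, both descending
gre_candidates = gre_candidates.sort_values( [ "counts", "pval" ], ascending = False )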

gre_candidates.head()


Using or logic
Out[5]:
counts all_counts pval qval_BH qval_bonferroni
2 262 417 1.225490e-15 8.741826e-14 5.245096e-13
11 227 231 2.100609e-77 2.247651e-75 8.990605e-75
7 157 272 1.333561e-06 5.188765e-05 5.707641e-04
12 147 211 7.512623e-15 4.593432e-13 3.215403e-12
8 138 253 2.201497e-04 4.959161e-03 9.422405e-02
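The ranking in Out[5] can be reproduced offline. The sketch below is a plain-Python illustration, not the pandas call used in the notebook: it copies the GRE id, counts and pval columns from the table above, sorts them by counts and then pval in descending order, and keeps the top five ids. The function name top_gres is hypothetical.

```python
# (gre_id, counts, pval) copied from the Out[5] table, deliberately out of order
rows = [
    (8, 138, 2.201497e-04),
    (2, 262, 1.225490e-15),
    (12, 147, 7.512623e-15),
    (11, 227, 2.100609e-77),
    (7, 157, 1.333561e-06),
]

def top_gres(rows, n=5):
    """Return the ids of the n highest-ranked GREs.

    Rows are ordered by counts, then pval, both descending,
    mirroring the sort applied to gre_candidates.
    """
    ranked = sorted(rows, key=lambda r: (r[1], r[2]), reverse=True)
    return [gre_id for gre_id, _, _ in ranked[:n]]

gres = top_gres(rows)
print(gres)  # [2, 11, 7, 12, 8]
```

Note that counts drives the order here; pval only breaks ties between GREs with equal counts.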

In [1]:
gres = gre_candidates.index[0:5].tolist()



(we won't run fimoFinder here because it takes a couple of minutes for the entire genome)


In [7]:
# fimoFinder( locusId = "NC_000913", filterby = gres, filter_type = "gre", host=host, port=port, db=db, outfile = "fimo.txt")


Start not provided. Assuming beginning of chromosome
Stop not provided. Assuming end of chromosome
WARNING: Many of these filters are not supported currently. Only GREs!!!
Filtering motifs by gre_id
Writing file GRE_8_fimo.txt
Writing file GRE_12_fimo.txt
Writing file GRE_2_fimo.txt
Writing file GRE_11_fimo.txt
Writing file GRE_7_fimo.txt
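Since the scan writes one file per GRE, it can help to confirm that every expected file exists before moving on. This is a minimal sketch that assumes the naming pattern GRE_<id>_fimo.txt inferred from the log lines above; the helper names are hypothetical and not part of egrin2-tools.

```python
from pathlib import Path

def expected_fimo_files(gre_ids, directory="."):
    """Build the per-GRE file paths, assuming the GRE_<id>_fimo.txt pattern."""
    return [Path(directory) / f"GRE_{gre_id}_fimo.txt" for gre_id in gre_ids]

def missing_fimo_files(gre_ids, directory="."):
    """Return the expected paths that are not present on disk."""
    return [p for p in expected_fimo_files(gre_ids, directory) if not p.exists()]

# GRE ids taken from the log above
gre_ids = [8, 12, 2, 11, 7]
for path in missing_fimo_files(gre_ids):
    print("missing:", path.name)
```

An empty result means all five files named in the log are in the working directory.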