Compile Probe Annotations



In [16]:

    
import NotebookImport
from Imports import *









    




importing IPython notebook from Imports

Got to do this in rpy2



In [1]:

    
%load_ext rpy2.ipython



In [2]:

    
import pandas as pd

from pandas.rpy.common import convert_robj
import rpy2.robjects as robjects
from pandas.rpy.common import convert_to_r_dataframe, convert_robj



In [3]:

    
%%R
require('IlluminaHumanMethylation450kanno.ilmn12.hg19')
ann = IlluminaHumanMethylation450kanno.ilmn12.hg19
data = ann@data









    





Loading required package: IlluminaHumanMethylation450kanno.ilmn12.hg19
Loading required package: minfi
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following object is masked from ‘package:stats’:

    xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
    duplicated, eval, evalq, Filter, Find, get, intersect, is.unsorted,
    lapply, Map, mapply, match, mget, order, paste, pmax, pmax.int,
    pmin, pmin.int, Position, rank, rbind, Reduce, rep.int, rownames,
    sapply, setdiff, sort, table, tapply, union, unique, unlist

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: lattice
Loading required package: reshape
Loading required package: GenomicRanges
Loading required package: IRanges

Attaching package: ‘IRanges’

The following objects are masked from ‘package:reshape’:

    expand, rename

Loading required package: XVector
Loading required package: Biostrings
Loading required package: bumphunter
Loading required package: foreach
foreach: simple, scalable parallel programming from Revolution Analytics
Use Revolution R for scalability, fault tolerance and more.
http://www.revolutionanalytics.com
Loading required package: iterators
Loading required package: locfit
locfit 1.5-9.1 	 2013-03-22

Attaching package: ‘locfit’

The following objects are masked from ‘package:GenomicRanges’:

    left, right



In [4]:

    
islands = convert_robj(robjects.r("data.frame(data[['Islands.UCSC']])"))
locations = convert_robj(robjects.r("data.frame(data[['Locations']])"))
other = convert_robj(robjects.r("data.frame(data[['Other']])"))
other = other.ix[:, 2:] #first two columns are probe sequences which take up a lot of space
snps = convert_robj(robjects.r("data.frame(data[['SNPs.Illumina']])"))



In [5]:

    
PATH = './Data/methylation450_annotations.csv'
probe_annotations = pd.read_csv(PATH, index_col=0)



In [7]:

    
store = pd.HDFStore(DATA_STORE)
store.append('islands', islands)
store.create_table_index('islands', optlevel=9, kind='full')
store.append('locations', locations)
store.create_table_index('locations', optlevel=9, kind='full')
store.append('other', other)
store.create_table_index('other', optlevel=9, kind='full')
store.append('snps', snps)
store.create_table_index('snps', optlevel=9, kind='full')
store.append('probe_annotations', probe_annotations)
store.create_table_index('probe_annotations', optlevel=9, kind='full')

Mapping Islands on to genes



In [11]:

    
ti = lambda s: s[s].index 
isl = islands.sort(['Islands_Name','Relation_to_Island']).dropna()
isl = isl[isl.Islands_Name.isin(ti(isl.Islands_Name.value_counts() > 7))]
isl = isl[isl.Islands_Name != '']

ot = other.Regulatory_Feature_Name
ot = ot[ot.isin(ti(ot.value_counts()> 7))]

gb = pd.concat([isl, probe_annotations], axis=1)
gb = gb[gb.Gene_Symbol.notnull() & gb.Islands_Name.notnull()]
g2 = gb.sort('Islands_Name')

top_gene = lambda s: s.Gene_Symbol.value_counts().index[0]
island_to_gene = g2.groupby('Islands_Name').apply(top_gene)
probe_to_island = isl
store['probe_to_island'] = probe_to_island
store['island_to_gene'] = island_to_gene

Read in Flow Sorted Data and Convert to More Usable Format



In [8]:

    
%%R
require(minfi)
require(minfiData)
require(doParallel)

referencePkg <- "FlowSorted.Blood.450k"
require(referencePkg, character.only = TRUE)
data(list = referencePkg)
referenceRGset <- get(referencePkg)









    





Loading required package: minfiData
Loading required package: IlluminaHumanMethylation450kmanifest
Loading required package: doParallel
Loading required package: FlowSorted.Blood.450k



In [9]:

    
%%R
Mset.quantile = preprocessQuantile(referenceRGset, removeBadSamples = TRUE)
beta = getBeta(Mset.quantile)









    





[preprocessQuantile] Mapping to genome.
[preprocessQuantile] Fixing outliers.
[preprocessQuantile] Quantile normalizing.



In [10]:

    
beta = convert_robj(robjects.r("data.frame(beta)"))



In [11]:

    
labels = convert_robj(robjects.r('referenceRGset$CellType'))
label_map = pd.Series(labels, beta.columns)



In [12]:

    
store['label_map'] = label_map



In [13]:

    
beta.to_hdf(store,'flow_sorted_data')



In [14]:

    
store.close()