Compile Probe Annotations

Got to do this in rpy2


In [ ]:
HDFS_DIR = '/cellar/users/agross/Data/SSD/'

In [1]:
import pandas as pd

In [38]:
from pandas.rpy.common import convert_robj
import rpy2.robjects as robjects
from pandas.rpy.common import convert_to_r_dataframe, convert_robj

In [3]:
%load_ext rpy2.ipython

In [4]:
%%R
require('IlluminaHumanMethylation450kanno.ilmn12.hg19')
ann = IlluminaHumanMethylation450kanno.ilmn12.hg19
data = ann@data


Loading required package: IlluminaHumanMethylation450kanno.ilmn12.hg19
Loading required package: minfi
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following object is masked from ‘package:stats’:

    xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
    duplicated, eval, evalq, Filter, Find, get, intersect, is.unsorted,
    lapply, Map, mapply, match, mget, order, paste, pmax, pmax.int,
    pmin, pmin.int, Position, rank, rbind, Reduce, rep.int, rownames,
    sapply, setdiff, sort, table, tapply, union, unique, unlist

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: lattice
Loading required package: reshape
Loading required package: GenomicRanges
Loading required package: IRanges

Attaching package: ‘IRanges’

The following objects are masked from ‘package:reshape’:

    expand, rename

Loading required package: XVector
Loading required package: Biostrings
Loading required package: bumphunter
Loading required package: foreach
foreach: simple, scalable parallel programming from Revolution Analytics
Use Revolution R for scalability, fault tolerance and more.
http://www.revolutionanalytics.com
Loading required package: iterators
Loading required package: locfit
locfit 1.5-9.1 	 2013-03-22

Attaching package: ‘locfit’

The following objects are masked from ‘package:GenomicRanges’:

    left, right


In [8]:
islands = convert_robj(robjects.r("data.frame(data[['Islands.UCSC']])"))
locations = convert_robj(robjects.r("data.frame(data[['Locations']])"))
other = convert_robj(robjects.r("data.frame(data[['Other']])"))
other = other.ix[:, 2:] #first two columns are probe sequences which take up a lot of space
snps = convert_robj(robjects.r("data.frame(data[['SNPs.Illumina']])"))

In [9]:
PATH = '/cellar/users/agross/Data/GeneSets/methylation450_annotations.csv'
probe_annotations = pd.read_csv(PATH, index_col=0)

In [26]:
store = pd.HDFStore(HDFS_DIR + 'methylation_annotation.h5')
store.append('islands', islands)
store.create_table_index('islands', optlevel=9, kind='full')
store.append('locations', locations)
store.create_table_index('locations', optlevel=9, kind='full')
store.append('other', other)
store.create_table_index('other', optlevel=9, kind='full')
store.append('snps', snps)
store.create_table_index('snps', optlevel=9, kind='full')
store.append('probe_annotations', probe_annotations)
store.create_table_index('probe_annotations', optlevel=9, kind='full')

Read in Flow Sorted Data and Convert to More Usable Format


In [27]:
%%R
require(minfi)
require(minfiData)
require(doParallel)

referencePkg <- "FlowSorted.Blood.450k"
require(referencePkg, character.only = TRUE)
data(list = referencePkg)
referenceRGset <- get(referencePkg)


Loading required package: minfiData
Loading required package: IlluminaHumanMethylation450kmanifest
Loading required package: doParallel
Loading required package: FlowSorted.Blood.450k

In [28]:
%%R
Mset.quantile = preprocessQuantile(referenceRGset, removeBadSamples = TRUE)
beta = getBeta(Mset.quantile)


[preprocessQuantile] Mapping to genome.
[preprocessQuantile] Fixing outliers.
[preprocessQuantile] Quantile normalizing.

In [29]:
beta = convert_robj(robjects.r("data.frame(beta)"))

In [30]:
labels = convert_robj(robjects.r('referenceRGset$CellType'))
label_map = pd.Series(labels, beta.columns)

In [31]:
store['label_map'] = label_map

In [ ]:
beta.to_hdf(store,'flow_sorted_data')

In [33]:
store.close()

In [34]:
robjects.r.library('WGCNA');
robjects.r.source("/cellar/users/agross/Data/MethylationAge/Horvath/NORMALIZATION.R");


Loading required package: dynamicTreeCut
Loading required package: flashClust

Attaching package: ‘flashClust’

The following object is masked from ‘package:stats’:

    hclust

==========================================================================
*
*  Package WGCNA 1.41.1 loaded.
*
*    Important note: It appears that your system supports multi-threading,
*    but it is not enabled within WGCNA in R. 
*    To allow multi-threading within WGCNA with all available cores, use 
*
*          allowWGCNAThreads()
*
*    within R. Use disableWGCNAThreads() to disable threading if necessary.
*    Alternatively, set the following environment variable on your system:
*
*          ALLOW_WGCNA_THREADS=<number_of_processors>
*
*    for example 
*
*          ALLOW_WGCNA_THREADS=8
*
*    To set the environment variable in linux bash shell, type 
*
*           export ALLOW_WGCNA_THREADS=8
*
*     before running R. Other operating systems or shells will
*     have a similar command to achieve the same aim.
*
==========================================================================



Attaching package: ‘WGCNA’

The following object is masked from ‘package:IRanges’:

    cor

The following object is masked from ‘package:stats’:

    cor

Loading required package: RPMM
Loading required package: cluster

In [35]:
gold_standard = pd.read_csv('/cellar/users/agross/Data/MethylationAge/Horvath/probeAnnotation21kdatMethUsed.csv', 
                            index_col=0)

In [39]:
def normalize_meth(df):
    df = df.ix[gold_standard.index]
    df = df.T.fillna(gold_standard.goldstandard2).T

    df_r = robjects.r.t(convert_to_r_dataframe(df))
    gs = list(gold_standard.goldstandard2.ix[df.index])
    gs_r = robjects.FloatVector(gs)
    data_n_r = robjects.r.BMIQcalibration(df_r, gs_r)
    data_n = convert_robj(data_n_r).T
    return data_n

In [40]:
data_n = normalize_meth(beta)


[1] "Fitting EM beta mixture to goldstandard probes"
[1] Inf
[1] 0.008547826
[1] 0.007005981
[1] 0.009442499
[1] 0.01056228
[1] "Done"
ii= 1
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01023027
[1] 0.009826291
[1] 0.009266958
[1] 0.008553946
[1] "Done"
[1] "Start normalising input probes"
ii= 2
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.009436482
[1] 0.009010343
[1] 0.008323679
[1] 0.007523282
[1] "Done"
[1] "Start normalising input probes"
ii= 3
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01068228
[1] 0.01005551
[1] 0.009282203
[1] 0.008314943
[1] "Done"
[1] "Start normalising input probes"
ii= 4
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.009733978
[1] 0.009811863
[1] 0.00962571
[1] 0.009045181
[1] "Done"
[1] "Start normalising input probes"
ii= 5
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.009489052
[1] 0.009409222
[1] 0.008959802
[1] 0.008283887
[1] "Done"
[1] "Start normalising input probes"
ii= 6
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.009324937
[1] 0.009161499
[1] 0.008681662
[1] 0.007952304
[1] "Done"
[1] "Start normalising input probes"
ii= 7
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01075274
[1] 0.009485142
[1] 0.008580555
[1] 0.007703105
[1] "Done"
[1] "Start normalising input probes"
ii= 8
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01047105
[1] 0.009220373
[1] 0.00818817
[1] 0.007161418
[1] "Done"
[1] "Start normalising input probes"
ii= 9
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.0113729
[1] 0.01018567
[1] 0.009027107
[1] 0.007945862
[1] "Done"
[1] "Start normalising input probes"
ii= 10
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01092163
[1] 0.01100352
[1] 0.01053349
[1] 0.009762516
[1] "Done"
[1] "Start normalising input probes"
ii= 11
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01168212
[1] 0.01128838
[1] 0.01053688
[1] 0.009547311
[1] "Done"
[1] "Start normalising input probes"
ii= 12
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01132091
[1] 0.01113542
[1] 0.01039278
[1] 0.009354893
[1] "Done"
[1] "Start normalising input probes"
ii= 13
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01020447
[1] 0.009712368
[1] 0.009231272
[1] 0.008571845
[1] "Done"
[1] "Start normalising input probes"
ii= 14
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.009610878
[1] 0.009545053
[1] 0.009228907
[1] 0.008651666
[1] "Done"
[1] "Start normalising input probes"
ii= 15
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.0100768
[1] 0.009967384
[1] 0.009595477
[1] 0.008919797
[1] "Done"
[1] "Start normalising input probes"
ii= 16
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01247362
[1] 0.01178717
[1] 0.01106416
[1] 0.01022293
[1] "Done"
[1] "Start normalising input probes"
ii= 17
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01161763
[1] 0.01072436
[1] 0.009896994
[1] 0.008976952
[1] "Done"
[1] "Start normalising input probes"
ii= 18
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01317808
[1] 0.01221463
[1] 0.01129939
[1] 0.01026172
[1] "Done"
[1] "Start normalising input probes"
ii= 19
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.0105017
[1] 0.00920218
[1] 0.008178436
[1] 0.007170861
[1] "Done"
[1] "Start normalising input probes"
ii= 20
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01092858
[1] 0.009448877
[1] 0.008215955
[1] 0.007062468
[1] "Done"
[1] "Start normalising input probes"
ii= 21
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01073185
[1] 0.009432575
[1] 0.008230508
[1] 0.007120087
[1] "Done"
[1] "Start normalising input probes"
ii= 22
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01127176
[1] 0.01071555
[1] 0.01012174
[1] 0.009302703
[1] "Done"
[1] "Start normalising input probes"
ii= 23
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.00970992
[1] 0.009528439
[1] 0.00917507
[1] 0.008521863
[1] "Done"
[1] "Start normalising input probes"
ii= 24
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01073021
[1] 0.01057176
[1] 0.01012315
[1] 0.009402837
[1] "Done"
[1] "Start normalising input probes"
ii= 25
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01039622
[1] 0.009810437
[1] 0.009078659
[1] 0.008133983
[1] "Done"
[1] "Start normalising input probes"
ii= 26
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01182434
[1] 0.0107445
[1] 0.009637753
[1] 0.008507685
[1] "Done"
[1] "Start normalising input probes"
ii= 27
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01080689
[1] 0.01018908
[1] 0.009363736
[1] 0.008401483
[1] "Done"
[1] "Start normalising input probes"
ii= 28
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.009868283
[1] 0.009657039
[1] 0.009146975
[1] 0.008362317
[1] "Done"
[1] "Start normalising input probes"
ii= 29
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.009605004
[1] 0.009230724
[1] 0.00856012
[1] 0.00776864
[1] "Done"
[1] "Start normalising input probes"
ii= 30
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.009988319
[1] 0.009912126
[1] 0.009402724
[1] 0.00864858
[1] "Done"
[1] "Start normalising input probes"
ii= 31
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01083974
[1] 0.009729307
[1] 0.008578559
[1] 0.007478666
[1] "Done"
[1] "Start normalising input probes"
ii= 32
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01006624
[1] 0.009053035
[1] 0.008061232
[1] 0.007115429
[1] "Done"
[1] "Start normalising input probes"
ii= 33
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01054159
[1] 0.00945053
[1] 0.008468364
[1] 0.007496301
[1] "Done"
[1] "Start normalising input probes"
ii= 34
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01056997
[1] 0.01049988
[1] 0.009923185
[1] 0.009035759
[1] "Done"
[1] "Start normalising input probes"
ii= 35
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01217659
[1] 0.01177213
[1] 0.01101787
[1] 0.01002794
[1] "Done"
[1] "Start normalising input probes"
ii= 36
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01171168
[1] 0.01138887
[1] 0.01074079
[1] 0.009805231
[1] "Done"
[1] "Start normalising input probes"
ii= 37
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01078004
[1] 0.01057884
[1] 0.01011943
[1] 0.009335859
[1] "Done"
[1] "Start normalising input probes"
ii= 38
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01128077
[1] 0.01110618
[1] 0.01048798
[1] 0.009592993
[1] "Done"
[1] "Start normalising input probes"
ii= 39
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.009769312
[1] 0.009578483
[1] 0.00922093
[1] 0.008591407
[1] "Done"
[1] "Start normalising input probes"
ii= 40
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01246212
[1] 0.01175322
[1] 0.01088872
[1] 0.009858786
[1] "Done"
[1] "Start normalising input probes"
ii= 41
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01225856
[1] 0.01145393
[1] 0.0106402
[1] 0.00969996
[1] "Done"
[1] "Start normalising input probes"
ii= 42
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01191813
[1] 0.01124039
[1] 0.01042536
[1] 0.009416451
[1] "Done"
[1] "Start normalising input probes"
ii= 43
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01070761
[1] 0.009787715
[1] 0.008684915
[1] 0.007565564
[1] "Done"
[1] "Start normalising input probes"
ii= 44
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01033669
[1] 0.009388771
[1] 0.008262022
[1] 0.00713916
[1] "Done"
[1] "Start normalising input probes"
ii= 45
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01125671
[1] 0.01000996
[1] 0.008767284
[1] 0.007568184
[1] "Done"
[1] "Start normalising input probes"
ii= 46
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01113371
[1] 0.0109196
[1] 0.01039837
[1] 0.009539701
[1] "Done"
[1] "Start normalising input probes"
ii= 47
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01079374
[1] 0.01069245
[1] 0.01001714
[1] 0.009154231
[1] "Done"
[1] "Start normalising input probes"
ii= 48
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01071208
[1] 0.01074448
[1] 0.01024947
[1] 0.009472749
[1] "Done"
[1] "Start normalising input probes"
ii= 49
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.009843241
[1] 0.009000582
[1] 0.008035213
[1] 0.00710348
[1] "Done"
[1] "Start normalising input probes"
ii= 50
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.0108197
[1] 0.009909336
[1] 0.008762327
[1] 0.007660298
[1] "Done"
[1] "Start normalising input probes"
ii= 51
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01026029
[1] 0.009330855
[1] 0.008190156
[1] 0.007074137
[1] "Done"
[1] "Start normalising input probes"
ii= 52
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01034261
[1] 0.009491481
[1] 0.008380126
[1] 0.007251519
[1] "Done"
[1] "Start normalising input probes"
ii= 53
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01107918
[1] 0.01008208
[1] 0.009001733
[1] 0.007909088
[1] "Done"
[1] "Start normalising input probes"
ii= 54
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01298345
[1] 0.01190344
[1] 0.01061476
[1] 0.009380944
[1] "Done"
[1] "Start normalising input probes"
ii= 55
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.009676665
[1] 0.008861619
[1] 0.007929816
[1] 0.006968677
[1] "Done"
[1] "Start normalising input probes"
ii= 56
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01009102
[1] 0.009093187
[1] 0.007956899
[1] 0.006865738
[1] "Done"
[1] "Start normalising input probes"
ii= 57
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.00976811
[1] 0.008747122
[1] 0.007736342
[1] 0.006701171
[1] "Done"
[1] "Start normalising input probes"
ii= 58
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01022828
[1] 0.009383221
[1] 0.008293084
[1] 0.007167398
[1] "Done"
[1] "Start normalising input probes"
ii= 59
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.009639483
[1] 0.008778499
[1] 0.007711985
[1] 0.006644498
[1] "Done"
[1] "Start normalising input probes"
ii= 60
[1] "Fitting EM beta mixture to input probes"
[1] Inf
[1] 0.01000023
[1] 0.009147731
[1] 0.008146869
[1] 0.007070174
[1] "Done"
[1] "Start normalising input probes"

In [42]:
data_n.to_hdf(HDFS_DIR, 'methylation_annotation.h5',
              'flow_sorted_data_horvath_norm')