Save Detection P-Values

I saved the detection p-values as .csv files in the minfi processing pipeline. Here I am just converting those files to HDF5 to make the data a bit easier to read in and manipulate.

For now I am also saving these in a sparse long format, since most of the p-values are 0: zeros are replaced with NaN and dropped when each table is stacked.
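
A minimal sketch of that sparse long format on a toy table. The probe names, sample names, and values are made up for illustration. The explicit dropna() is an addition here, since newer pandas versions may keep NaN rows when stacking:

```python
import numpy as np
import pandas as pd

# Toy detection p-value table: rows are probes, columns are samples.
# All names and values are illustrative only.
toy = pd.DataFrame(
    [[0.0, 0.0, 0.03],
     [0.0, 0.0, 0.0]],
    index=["cg0001", "cg0002"],
    columns=["sample_a", "sample_b", "sample_c"],
)

# Mask zeros as NaN, pivot to long form, and keep only the non-zero entries.
sparse = toy.replace(0, np.nan).stack().dropna()

print(sparse)
```

Only the single non-zero entry survives, indexed by (probe, sample), which is why the long form is much smaller than the full matrix when most values are 0.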



In [1]:

    
PATH = '/cellar/users/agross/TCGA_Code/Methlation/'



In [2]:

    
cd $PATH









    



/cellar/users/agross/TCGA_Code/Methlation



In [3]:

    
import NotebookImport
from Setup.Imports import *









    









    




importing IPython notebook from Setup/Imports






    



Populating the interactive namespace from numpy and matplotlib

Epic Data



In [5]:

    
epic = pd.read_csv(PATH + 'data/EPIC_ITALY/detectionP.csv',
                   index_col=0)
pData = pd.read_csv(PATH + 'data/EPIC_ITALY/pData.csv',
                    dtype='str', index_col=0)
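# Drop the first underscore-delimited token from each column name
epic.columns = epic.columns.map(lambda s: '_'.join(s.split('_')[1:]))
# Most p-values are 0; mask them as NaN so that stack() drops them
epic = epic.replace(0, nan)
epic = epic.stack()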
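
A small sketch of what the column renaming in this cell does to a single label. The example labels are hypothetical and not taken from the real file:

```python
def strip_prefix(name):
    """Drop the first underscore-delimited token from a column name."""
    return "_".join(name.split("_")[1:])

# Hypothetical column labels, for illustration only.
print(strip_prefix("X1234_R01C01"))
print(strip_prefix("a_b_c"))
```

Note that a label with no underscore at all would be mapped to an empty string, so this only works if every column name carries a prefix.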









    




Hannum



In [6]:

    
hannum = pd.read_csv(PATH + 'data/Hannum/detectionP.csv',
                     index_col=0)
pData = pd.read_csv(PATH + 'data/Hannum/pData.csv',
                    dtype='str', index_col=0)
hannum.columns = hannum.columns.map(lambda s: pData.Sample_Name[s])
hannum = hannum.replace(0, nan)
hannum = hannum.stack()

UCSD



In [7]:

    
ucsd = pd.read_csv(PATH + 'data/UCSD_Methylation/detectionP.csv',
                   index_col=0)
p = pd.read_csv(PATH + 'data/UCSD_Methylation/pData.csv',
                index_col=0)
# Positional assignment: assumes the pData rows are in the same order
# as the detectionP columns
ucsd.columns = p.Sample_Name
ucsd = ucsd.replace(0, nan)
ucsd = ucsd.stack()
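
The UCSD cell assigns sample names by position, while the Hannum cell looks them up by label. A sketch of the label-based variant on toy data, assuming the pData index matches the detectionP column labels (which may not hold for the real UCSD files). All names below are hypothetical:

```python
import pandas as pd

# Toy stand-ins for detectionP.csv and pData.csv; all names are hypothetical.
det = pd.DataFrame([[0.0, 0.02]], index=["cg0001"], columns=["arr_2", "arr_1"])
pdata = pd.DataFrame({"Sample_Name": ["patient_1", "patient_2"]},
                     index=["arr_1", "arr_2"])

# Positional assignment would mislabel the columns here because the two
# orders differ. Looking each column up by label does not depend on row order.
renamed = det.rename(columns=pdata["Sample_Name"].to_dict())

print(list(renamed.columns))
```

If the two files are known to share the same ordering, the positional form is equivalent and shorter.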



In [8]:

    
detection_p = pd.concat([ucsd, hannum, epic])



In [9]:

    
detection_p = detection_p.reset_index()



In [10]:

    
detection_p.to_hdf(HDFS_DIR + 'dx_methylation.h5', 'detection_p')









    



/cellar/users/agross/anaconda2/lib/python2.7/site-packages/pandas/io/pytables.py:2441: PerformanceWarning: 
your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed-integer,key->axis0] [items->None]

  warnings.warn(ws, PerformanceWarning)
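
The PerformanceWarning above appears to come from the column labels (axis0): after reset_index() on an unnamed Series, the value column is labeled with the integer 0 next to string labels. A sketch of one way to avoid mixed labels by naming the columns before saving. The names 'probe', 'sample', and 'detection_p' are hypothetical, and any code reading the table back would have to use the same names:

```python
import pandas as pd

# Two toy stacked Series standing in for the per-dataset results.
a = pd.Series([0.02], index=pd.MultiIndex.from_tuples([("cg0001", "s1")]))
b = pd.Series([0.30], index=pd.MultiIndex.from_tuples([("cg0002", "s2")]))

long_table = pd.concat([a, b]).reset_index()

# The labels at this point mix strings with a non-string value label.
raw_labels = list(long_table.columns)

# Hypothetical, all-string column names.
long_table.columns = ["probe", "sample", "detection_p"]

print(raw_labels)
print(list(long_table.columns))
```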