Notebook by @gregcaporaso. The Pandas Cookbook notebooks were helpful in putting this together.

This notebook can be used to inspect or clean up the sample metadata mapping files used in the analyses.


In [1]:
import pandas as pd
import matplotlib.pyplot as plt

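# The pandas 'display.mpl_style' option has been removed from newer pandas releases;
# matplotlib's built-in 'ggplot' style gives a similar look.
plt.style.use('ggplot') # Make the graphs a bit prettier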
plt.rcParams['figure.figsize'] = (15, 5)

Moving Pictures mapping file

This mapping file was extracted from the QIIME Database (April 2014) from study id 550. No clean-up is necessary.


In [2]:
moving_pictures_df = pd.read_csv('moving-pictures-map-raw.txt', sep='\t', index_col='#SampleID', parse_dates=['COLLECTION_DATE'])
print(moving_pictures_df.BODY_SITE.unique())


['UBERON:sebum' 'UBERON:saliva' 'UBERON:feces']

In [3]:
# to_csv has no index_col parameter; index_label names the index column in the output
moving_pictures_df.to_csv('moving-pictures-map.txt', sep='\t', index_label='#SampleID')
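
One way to confirm that a written mapping file keeps its #SampleID header is to read it back and compare it with the original frame. The sketch below does this in memory with a made-up two-sample table (the sample IDs, sites, and dates are hypothetical, not taken from the study) and uses `index_label`, the documented `to_csv` parameter for naming the index column.

```python
import io

import pandas as pd

# Hypothetical two-sample mapping file standing in for the real one.
raw = ("#SampleID\tBODY_SITE\tCOLLECTION_DATE\n"
       "S1\tUBERON:feces\t2009-01-20\n"
       "S2\tUBERON:saliva\t2009-01-21\n")

df = pd.read_csv(io.StringIO(raw), sep='\t', index_col='#SampleID',
                 parse_dates=['COLLECTION_DATE'])

# Write to an in-memory buffer instead of a file on disk.
buf = io.StringIO()
df.to_csv(buf, sep='\t', index_label='#SampleID')

# Read the written text back and compare it with the original frame.
round_trip = pd.read_csv(io.StringIO(buf.getvalue()), sep='\t',
                         index_col='#SampleID',
                         parse_dates=['COLLECTION_DATE'])
header = buf.getvalue().splitlines()[0]
same = df.equals(round_trip)

print(header)
print(same)
```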

88 soils mapping file

This mapping file was extracted from the QIIME Database (April 2014) from study id 550. Here I add a binned pH column, ph_bin, which splits the samples into five equal-frequency bins for categorical analysis of the pH data.


In [4]:
soil_df = pd.read_csv('soils-map-raw.txt', sep='\t', index_col='#SampleID', parse_dates=['COLLECTION_DATE'])
print(soil_df.PH.hist())


Axes(0.125,0.125;0.775x0.775)

In [5]:
soil_df['ph_bin'] = pd.qcut(soil_df.PH, 5, labels=['acidic', 'acidic/neutral','neutral','basic/neutral','basic'])

In [6]:
print(soil_df['ph_bin'])


#SampleID
IT2.141720    acidic/neutral
HI3.141676     basic/neutral
MD2.141689             basic
CA1.141704     basic/neutral
PE5.141692            acidic
CO1.141714           neutral
DF3.141696    acidic/neutral
PE1.141715            acidic
SP2.141678    acidic/neutral
CO3.141651           neutral
SA2.141687             basic
CM1.141723             basic
LQ2.141729    acidic/neutral
SR2.141673             basic
CR1.141682             basic
...
SR3.141674     basic/neutral
CF3.141691            acidic
SK2.141662           neutral
AR1.141727           neutral
BB2.141659    acidic/neutral
GB3.141652     basic/neutral
GB2.141732     basic/neutral
PE3.141731            acidic
MT1.141719     basic/neutral
SP1.141656           neutral
CL4.141667    acidic/neutral
KP4.141733     basic/neutral
VC2.141694           neutral
CL2.141671           neutral
SK1.141669    acidic/neutral
Name: ph_bin, Length: 89, dtype: object
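
Because `pd.qcut` cuts at quantiles, each label covers roughly the same number of samples, and the pH value at each boundary depends on the data. A minimal sketch of how the bin sizes and edges could be inspected, using made-up pH values rather than the real `soil_df.PH` column:

```python
import pandas as pd

# Hypothetical pH values; the real ones come from soil_df.PH.
ph = pd.Series([3.6, 4.1, 4.5, 5.0, 5.3, 5.8, 6.2, 6.9, 7.4, 8.3])
labels = ['acidic', 'acidic/neutral', 'neutral', 'basic/neutral', 'basic']

# retbins=True also returns the quantile edges used to cut the data.
ph_bin, edges = pd.qcut(ph, 5, labels=labels, retbins=True)

print(ph_bin.value_counts())  # number of samples per bin
print(edges)                  # pH value at each bin boundary
```

With the real data, the edges show which pH ranges the five labels actually cover.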

In [7]:
# to_csv has no index_col parameter; index_label names the index column in the output
soil_df.to_csv('soils-map.txt', sep='\t', index_label='#SampleID')

Whole Body mapping file

This mapping file was extracted from the QIIME Database (April 2014) from study id 550. Here I add a body_habitat_basic column, which reflects the known grouping of the samples into gut, oral, and skin/other microbial communities (as presented in Figure 1 of Costello et al. (2009)).


In [8]:
whole_body_df = pd.read_csv('whole-body-map-raw.txt', sep='\t', index_col='#SampleID', parse_dates=['COLLECTION_DATE'])

In [9]:
env_matter_to_body_habitat_basic = {'ENVO:mucus':'skin/other', 'ENVO:sebum':'skin/other', 'ENVO:sweat':'skin/other', 
                                    'ENVO:ear wax':'skin/other', 'ENVO:feces':'gut', 'ENVO:urine':'skin/other', 'ENVO:saliva':'oral'}

whole_body_df['body_habitat_basic'] = [env_matter_to_body_habitat_basic[env_matter] for env_matter in whole_body_df['ENV_MATTER']]

In [10]:
print(whole_body_df['body_habitat_basic'])


#SampleID
F33Nost.140487    skin/other
M11Plml.140620    skin/other
M12Aptr.140800    skin/other
F21Ewxr.140299    skin/other
M41Fcsp.140643           gut
F22Fcsw.140281           gut
M54Nost.140835    skin/other
F13Plml.140601    skin/other
M23Urin.140740    skin/other
M34Plml.140373    skin/other
F22Urin.140306    skin/other
M12Forl.140812    skin/other
F12Mout.140450          oral
M42Plml.140513    skin/other
M14Knee.140681    skin/other
...
F32Nose.140543    skin/other
M31Ewxr.140754    skin/other
F32Plml.140737    skin/other
M42Kner.140502    skin/other
M34Frhd.140638    skin/other
F23Nost.140510    skin/other
M32Fcsp.140470           gut
M21Urin.140335    skin/other
F33Knee.140702    skin/other
M22Hair.140456    skin/other
F24Uric.140658    skin/other
M44Uric.140688    skin/other
M22Pinl.140526    skin/other
F31Indr.140675    skin/other
F31Nstr.140608    skin/other
Name: body_habitat_basic, Length: 602, dtype: object
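
The dictionary lookup above raises a `KeyError` if ENV_MATTER contains a value that is missing from the mapping. An alternative sketch uses `Series.map`, which leaves unmapped values as NaN so they can be listed before writing the file. The sample IDs and the 'ENVO:soil' value below are hypothetical and only there to show an unmapped entry.

```python
import pandas as pd

# Hypothetical ENV_MATTER values, including one that is not in the mapping.
env_matter = pd.Series(
    ['ENVO:feces', 'ENVO:saliva', 'ENVO:sebum', 'ENVO:soil'],
    index=['S1', 'S2', 'S3', 'S4'], name='ENV_MATTER')

env_matter_to_body_habitat_basic = {
    'ENVO:mucus': 'skin/other', 'ENVO:sebum': 'skin/other',
    'ENVO:sweat': 'skin/other', 'ENVO:ear wax': 'skin/other',
    'ENVO:feces': 'gut', 'ENVO:urine': 'skin/other',
    'ENVO:saliva': 'oral'}

# Series.map looks each value up in the dict; unmapped values become NaN.
body_habitat_basic = env_matter.map(env_matter_to_body_habitat_basic)

# List the ENV_MATTER values that had no entry in the mapping.
unmapped = sorted(env_matter[body_habitat_basic.isnull()].unique())

print(body_habitat_basic)
print(unmapped)
```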

In [11]:
# to_csv has no index_col parameter; index_label names the index column in the output
whole_body_df.to_csv('whole-body-map.txt', sep='\t', index_label='#SampleID')