Notebook by @gregcaporaso. The Pandas Cookbook notebooks were helpful in putting this together.
This notebook can be used to inspect and clean up the sample metadata mapping files used in the analyses.
In [1]:
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot') # Make the graphs a bit prettier (the pandas 'display.mpl_style' option was removed in later releases)
plt.rcParams['figure.figsize'] = (15, 5)
This mapping file was extracted from the QIIME Database (April 2014) from study id 550. No clean-up is necessary.
In [2]:
moving_pictures_df = pd.read_csv('moving-pictures-map-raw.txt', sep='\t', index_col='#SampleID', parse_dates=['COLLECTION_DATE'])
print(moving_pictures_df.BODY_SITE.unique())
In [3]:
moving_pictures_df.to_csv('moving-pictures-map.txt', sep='\t', index_label='#SampleID')
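The read/write pattern used for each mapping file can be checked on a small in-memory example. The following is a minimal sketch, not part of the original analysis: the sample IDs, body sites, and dates are made up, and `io.StringIO` stands in for the files on disk. It uses `index_label`, the `to_csv` parameter that names the index column in the output header.

```python
import io
import pandas as pd

# Toy tab-separated mapping file (hypothetical sample IDs and values).
raw = (
    "#SampleID\tBODY_SITE\tCOLLECTION_DATE\n"
    "S1\tgut\t2009-01-20\n"
    "S2\ttongue\t2009-01-21\n"
)

toy_df = pd.read_csv(io.StringIO(raw), sep='\t', index_col='#SampleID',
                     parse_dates=['COLLECTION_DATE'])

# Write to an in-memory buffer; index_label sets the header of the index column.
buf = io.StringIO()
toy_df.to_csv(buf, sep='\t', index_label='#SampleID')
header_line = buf.getvalue().splitlines()[0]

# Read the written text back in the same way as the raw file.
round_trip_df = pd.read_csv(io.StringIO(buf.getvalue()), sep='\t',
                            index_col='#SampleID',
                            parse_dates=['COLLECTION_DATE'])
```

Comparing `round_trip_df` with `toy_df` (for example with `DataFrame.equals`) is a quick way to confirm that writing and re-reading leaves the table unchanged.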
This mapping file was extracted from the QIIME Database (April 2014) from study id 550. Here I add a binned pH column (five quantile-based bins) for categorical analysis of the pH data.
In [4]:
soil_df = pd.read_csv('soils-map-raw.txt', sep='\t', index_col='#SampleID', parse_dates=['COLLECTION_DATE'])
soil_df.PH.hist() # Plot a histogram of the pH values
In [5]:
# Split pH into five quantile-based bins (roughly equal numbers of samples per bin)
soil_df['ph_bin'] = pd.qcut(soil_df.PH, 5, labels=['acidic', 'acidic/neutral', 'neutral', 'basic/neutral', 'basic'])
In [6]:
print(soil_df['ph_bin'])
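Because `pd.qcut` places its bin edges at quantiles of the data, labels such as 'acidic' and 'basic' are relative to the samples at hand rather than tied to fixed pH thresholds. A minimal sketch with made-up pH values (not the soils data) shows the behavior; `retbins=True` asks `qcut` to also return the edges it chose.

```python
import pandas as pd

# Hypothetical pH values, for illustration only.
ph = pd.Series([3.6, 4.1, 4.9, 5.2, 5.8, 6.3, 6.9, 7.4, 7.9, 8.3])
labels = ['acidic', 'acidic/neutral', 'neutral', 'basic/neutral', 'basic']

# qcut splits at the 20th, 40th, 60th and 80th percentiles,
# so each of the five bins receives the same number of values here.
ph_bin, edges = pd.qcut(ph, 5, labels=labels, retbins=True)

# Number of samples that fell into each bin.
counts = ph_bin.value_counts()
```

If fixed thresholds are preferred over data-dependent ones, `pd.cut` accepts explicit bin edges instead of a number of quantiles.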
In [7]:
soil_df.to_csv('soils-map.txt', sep='\t', index_label='#SampleID')
This mapping file was extracted from the QIIME Database (April 2014) from study id 550. Here I add a body_habitat_basic column, which reflects the known grouping of the samples into gut, oral, and skin/other microbial communities (as presented in Figure 1 of Costello et al. (2009)).
In [8]:
whole_body_df = pd.read_csv('whole-body-map-raw.txt', sep='\t', index_col='#SampleID', parse_dates=['COLLECTION_DATE'])
In [9]:
env_matter_to_body_habitat_basic = {'ENVO:mucus': 'skin/other', 'ENVO:sebum': 'skin/other', 'ENVO:sweat': 'skin/other',
                                    'ENVO:ear wax': 'skin/other', 'ENVO:feces': 'gut', 'ENVO:urine': 'skin/other', 'ENVO:saliva': 'oral'}
# Look up each sample's ENV_MATTER value; an unknown value raises a KeyError
whole_body_df['body_habitat_basic'] = [env_matter_to_body_habitat_basic[env_matter] for env_matter in whole_body_df['ENV_MATTER']]
In [10]:
print(whole_body_df['body_habitat_basic'])
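The same kind of lookup can also be written with `Series.map`, which differs from a dictionary lookup in one respect: values missing from the dictionary become NaN instead of raising a `KeyError`. The sketch below uses made-up sample IDs and a shortened mapping, and deliberately includes one value ('ENVO:soil') with no entry, to show how unmapped values can be listed.

```python
import pandas as pd

# Hypothetical ENV_MATTER values; 'ENVO:soil' is deliberately absent from the mapping.
env_matter = pd.Series(['ENVO:feces', 'ENVO:saliva', 'ENVO:sebum', 'ENVO:soil'],
                       index=['S1', 'S2', 'S3', 'S4'])
mapping = {'ENVO:feces': 'gut', 'ENVO:saliva': 'oral', 'ENVO:sebum': 'skin/other'}

# Series.map looks each value up in the dict; missing keys become NaN.
habitat = env_matter.map(mapping)

# Collect the distinct values that had no entry in the mapping.
unmapped = sorted(env_matter[habitat.isna()].unique())
```

Which behavior is preferable depends on the goal: failing loudly on an unexpected value is the safer choice when cleaning a file once, while `map` plus a check on `unmapped` makes it easier to see every missing category at a glance.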
In [11]:
whole_body_df.to_csv('whole-body-map.txt', sep='\t', index_label='#SampleID')