Exploring which samples from Moving Pictures of the Human Microbiome we should re-sequence as controls in an extended time-series analysis.

The goal is to identify 8 samples that capture both a small and a large effect size.


In [1]:
import pandas as pd
from skbio import DistanceMatrix

In [2]:
md = pd.read_csv('map.txt', sep='\t', index_col=0)

In [3]:
wdm = DistanceMatrix.from_file('weighted_unifrac_dm.txt')

Small effect size: the F4 gut samples (L1S3, L1S4) are more different from the M3 gut samples (L1S134, L1S136) than the samples within each subject are from each other.


In [4]:
# Subset to the four gut samples: two from F4, two from M3
gut_wdm = wdm.filter(['L1S3', 'L1S4', 'L1S134', 'L1S136'])
print(gut_wdm)


4x4 distance matrix
IDs:
L1S3, L1S4, L1S134, L1S136
Data:
[[ 0.          0.04698566  0.44108603  0.45300514]
 [ 0.04698566  0.          0.4112005   0.42326145]
 [ 0.44108603  0.4112005   0.          0.11218037]
 [ 0.45300514  0.42326145  0.11218037  0.        ]]
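To put numbers on the "more different between subjects than within" claim, the sketch below copies the 4x4 weighted UniFrac values printed above into a NumPy array and compares mean within-subject distances to mean between-subject distances. This is an illustrative check, not part of the original analysis; it works on the printed values only, so it runs without `map.txt` or the distance-matrix file.

```python
import numpy as np

# Weighted UniFrac values copied from the 4x4 output above.
# Row/column order: L1S3, L1S4 (F4), L1S134, L1S136 (M3).
d = np.array([
    [0.0,        0.04698566, 0.44108603, 0.45300514],
    [0.04698566, 0.0,        0.4112005,  0.42326145],
    [0.44108603, 0.4112005,  0.0,        0.11218037],
    [0.45300514, 0.42326145, 0.11218037, 0.0],
])

f4 = [0, 1]  # L1S3, L1S4
m3 = [2, 3]  # L1S134, L1S136

# Within-subject: the single off-diagonal pair inside each subject.
within = np.array([d[0, 1], d[2, 3]])
# Between-subject: every F4 x M3 pair (a 2x2 block, flattened).
between = d[np.ix_(f4, m3)].ravel()

print("mean within :", within.mean())
print("mean between:", between.mean())
print("min between > max within:", between.min() > within.max())
```

Comparing the smallest between-subject distance to the largest within-subject distance is a stricter check than comparing means: it confirms that no cross-subject pair is closer than any same-subject pair.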

In [5]:
eight_samples_wdm = wdm.filter(['L1S3', 'L1S4', 'L1S134', 'L1S136', 'L5S235', 'L5S236', 'L2S235', 'L2S236'])

Large effect size: the gut samples are more similar to each other than to other sample types (L5S235 and L5S236 are oral; L2S235 and L2S236 are skin).


In [6]:
# Distances from each gut sample to all eight samples
print(eight_samples_wdm['L1S3'])
print(eight_samples_wdm['L1S4'])
print(eight_samples_wdm['L1S134'])
print(eight_samples_wdm['L1S136'])


[ 0.          0.04698566  0.44108603  0.45300514  0.69826639  0.71851624
  0.75133319  0.73974832]
[ 0.04698566  0.          0.4112005   0.42326145  0.71411905  0.73498209
  0.76974903  0.75647785]
[ 0.44108603  0.4112005   0.          0.11218037  0.80496795  0.82795422
  0.82298565  0.80010741]
[ 0.45300514  0.42326145  0.11218037  0.          0.82138334  0.84418623
  0.85982391  0.83227388]
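The same kind of check can be applied to the eight-sample rows printed above. The sketch below (an illustration, not the original analysis) copies those four rows into NumPy, separates gut-to-gut distances from gut-to-oral/skin distances, and tests whether every gut-to-other distance exceeds every gut-to-gut distance. Column order is assumed to follow the ID list passed to `wdm.filter(...)` in the previous cell.

```python
import numpy as np

# Rows printed above; columns follow the filter order:
# L1S3, L1S4, L1S134, L1S136 (gut), L5S235, L5S236 (oral), L2S235, L2S236 (skin).
rows = np.array([
    [0.0,        0.04698566, 0.44108603, 0.45300514, 0.69826639, 0.71851624, 0.75133319, 0.73974832],
    [0.04698566, 0.0,        0.4112005,  0.42326145, 0.71411905, 0.73498209, 0.76974903, 0.75647785],
    [0.44108603, 0.4112005,  0.0,        0.11218037, 0.80496795, 0.82795422, 0.82298565, 0.80010741],
    [0.45300514, 0.42326145, 0.11218037, 0.0,        0.82138334, 0.84418623, 0.85982391, 0.83227388],
])

# Gut-to-gut: upper triangle of the 4x4 gut block, excluding self-distances.
gut_block = rows[:, :4]
gut_gut = gut_block[np.triu_indices(4, k=1)]

# Gut-to-other: all entries in the oral and skin columns.
gut_other = rows[:, 4:].ravel()

print("max gut-gut  :", gut_gut.max())
print("min gut-other:", gut_other.min())
print("clean separation:", gut_other.min() > gut_gut.max())
```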

The same pattern was confirmed with unweighted UniFrac.

Miscellaneous notes below.


In [7]:
md.columns


Out[7]:
Index([u'days_since_epoch', u'Subject_BodyHabitat', u'BodyHabitat', u'Subject_SampleType', u'Subject_SampleType_Detailed', u'SexIndividual', u'Mislabeled', u'Study', u'SampleType', u'SampleType_Detailed ', u'year', u'subject', u'BodyHabitat_Study'], dtype='object')
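The column list above includes `'SampleType_Detailed '` with a trailing space, so `md['SampleType_Detailed']` would raise a `KeyError`. One way to normalize the names is pandas' `str.strip()` on the column index. The sketch below uses a hypothetical one-row frame (`demo`) to reproduce the problem; on the real metadata, only the `columns = ... .str.strip()` line would be needed.

```python
import pandas as pd

# Hypothetical stand-in reproducing the trailing space seen in the mapping file.
demo = pd.DataFrame({'SampleType': ['L_palm'],
                     'SampleType_Detailed ': ['L_palm']})

# Strip leading/trailing whitespace from every column name.
demo.columns = demo.columns.str.strip()
print(list(demo.columns))
```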

In [8]:
md[md['Subject_BodyHabitat'] == "M3_Skin"]


Out[8]:
days_since_epoch Subject_BodyHabitat BodyHabitat Subject_SampleType Subject_SampleType_Detailed SexIndividual Mislabeled Study SampleType SampleType_Detailed year subject BodyHabitat_Study
#SampleID
L2S233 14173 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
L2S234 14174 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
L2S235 14175 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
L2S236 14176 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
L2S237 14177 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
L2S238 14178 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
L2S239 14179 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
L2S240 14180 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
L2S241 14181 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
L2S242 14182 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
L2S243 14183 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
L2S244 14184 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
L2S245 14185 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
L2S246 14186 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
L2S247 14187 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
L2S248 14188 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
L2S249 14189 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
L2S250 14190 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
L2S251 14191 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
L2S252 14192 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
L2S253 14193 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
L2S254 14194 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
L2S255 14195 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
L2S256 14196 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
L2S257 14197 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
L2S258 14198 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
L2S259 14199 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
L2S260 14200 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
L2S261 14201 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
L2S262 14202 M3_Skin Skin M3_L_palm M3_L_palm M No antibiotic_timeseries L_palm L_palm 2008 M3 Skin_antibiotic_timeseries
... ... ... ... ... ... ... ... ... ... ... ... ... ...
L5S50 14560 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2009 M3 Skin_antibiotic_timeseries
L5S51 14562 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2009 M3 Skin_antibiotic_timeseries
L5S52 14563 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2009 M3 Skin_antibiotic_timeseries
L5S53 14564 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2009 M3 Skin_antibiotic_timeseries
L5S54 14565 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2009 M3 Skin_antibiotic_timeseries
L5S55 14566 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2009 M3 Skin_antibiotic_timeseries
L5S56 14567 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2009 M3 Skin_antibiotic_timeseries
L5S57 14568 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2009 M3 Skin_antibiotic_timeseries
L5S58 14569 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2009 M3 Skin_antibiotic_timeseries
L5S59 14570 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2009 M3 Skin_antibiotic_timeseries
L5S60 14571 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2009 M3 Skin_antibiotic_timeseries
L5S61 14572 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2009 M3 Skin_antibiotic_timeseries
L5S62 14573 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2009 M3 Skin_antibiotic_timeseries
L5S63 14574 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2009 M3 Skin_antibiotic_timeseries
L5S64 14575 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2009 M3 Skin_antibiotic_timeseries
L5S65 14576 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2009 M3 Skin_antibiotic_timeseries
L5S66 14578 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2009 M3 Skin_antibiotic_timeseries
L5S67 14579 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2009 M3 Skin_antibiotic_timeseries
L5S68 14583 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2009 M3 Skin_antibiotic_timeseries
L5S69 14584 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2009 M3 Skin_antibiotic_timeseries
L5S70 14585 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2009 M3 Skin_antibiotic_timeseries
L5S71 14586 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2009 M3 Skin_antibiotic_timeseries
L5S72 14588 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2009 M3 Skin_antibiotic_timeseries
L5S73 14589 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2009 M3 Skin_antibiotic_timeseries
L5S74 14590 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2009 M3 Skin_antibiotic_timeseries
L5S75 14591 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2009 M3 Skin_antibiotic_timeseries
L5S76 14593 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2009 M3 Skin_antibiotic_timeseries
L5S77 14611 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2010 M3 Skin_antibiotic_timeseries
L5S78 14612 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2010 M3 Skin_antibiotic_timeseries
L5S79 14613 M3_Skin Skin M3_R_palm M3_R_palm M No antibiotic_timeseries R_palm R_palm 2010 M3 Skin_antibiotic_timeseries

725 rows × 13 columns
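Rather than scanning the full 725-row listing, the metadata for specific candidate samples can be pulled by ID with `DataFrame.loc`. The sketch below uses a small stand-in (`demo`) built from three rows of the output above, so it runs without `map.txt`; on the real `md`, the same `.loc` call would take the full eight-sample list.

```python
import pandas as pd

# Tiny stand-in for `md`, copied from three rows of the output above.
demo = pd.DataFrame(
    {'days_since_epoch': [14175, 14176, 14560],
     'Subject_BodyHabitat': ['M3_Skin', 'M3_Skin', 'M3_Skin'],
     'SampleType': ['L_palm', 'L_palm', 'R_palm'],
     'year': [2008, 2008, 2009]},
    index=pd.Index(['L2S235', 'L2S236', 'L5S50'], name='#SampleID'))

# Select rows by sample ID and a subset of columns.
skin_controls = ['L2S235', 'L2S236']
selected = demo.loc[skin_controls,
                    ['Subject_BodyHabitat', 'SampleType', 'days_since_epoch']]
print(selected)
```

This confirms that the two skin samples chosen for the large-effect comparison are M3 left-palm samples collected on consecutive days.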

