Exploring which samples from Moving Pictures of the Human Microbiome we should re-sequence as controls in an extended time series analysis.
The goal is to identify 8 samples that together show both a small and a large effect size.
In [1]:
import pandas as pd
from skbio import DistanceMatrix
In [2]:
md = pd.read_csv('map.txt', sep='\t', index_col=0)
In [3]:
wdm = DistanceMatrix.from_file('weighted_unifrac_dm.txt')
Small effect size: the F4 gut samples (L1S3, L1S4) differ more from the M3 gut samples (L1S134, L1S136) than the samples differ within either subject.
In [4]:
gut_wdm = wdm.filter(['L1S3', 'L1S4', 'L1S134', 'L1S136'])
print(gut_wdm)
4x4 distance matrix
IDs:
L1S3, L1S4, L1S134, L1S136
Data:
[[ 0. 0.04698566 0.44108603 0.45300514]
[ 0.04698566 0. 0.4112005 0.42326145]
[ 0.44108603 0.4112005 0. 0.11218037]
[ 0.45300514 0.42326145 0.11218037 0. ]]
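One way to summarise the matrix above is to compare the mean within-subject distance with the mean between-subject distance. The sketch below is not part of the original notebook: it hard-codes the values printed above, and the helper name `mean_distance` and the `f4`/`m3` groupings are illustrative assumptions.

```python
from itertools import combinations, product

ids = ['L1S3', 'L1S4', 'L1S134', 'L1S136']
# Weighted UniFrac distances copied from the printed 4x4 matrix.
data = [
    [0.0, 0.04698566, 0.44108603, 0.45300514],
    [0.04698566, 0.0, 0.4112005, 0.42326145],
    [0.44108603, 0.4112005, 0.0, 0.11218037],
    [0.45300514, 0.42326145, 0.11218037, 0.0],
]
index = {sample_id: i for i, sample_id in enumerate(ids)}


def mean_distance(pairs):
    """Mean distance over an iterable of (id_a, id_b) pairs (hypothetical helper)."""
    values = [data[index[a]][index[b]] for a, b in pairs]
    return sum(values) / len(values)


f4 = ['L1S3', 'L1S4']      # gut samples from subject F4
m3 = ['L1S134', 'L1S136']  # gut samples from subject M3

# Within-subject: one pair per subject. Between-subject: every F4/M3 pair.
within = mean_distance(list(combinations(f4, 2)) + list(combinations(m3, 2)))
between = mean_distance(product(f4, m3))

print('mean within-subject distance:  %.3f' % within)
print('mean between-subject distance: %.3f' % between)
```

With these four samples the within-subject mean is about 0.08 and the between-subject mean is about 0.43, which is the sense in which the subject effect is present but small compared with the body-site effect.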
In [5]:
eight_samples_wdm = wdm.filter(['L1S3', 'L1S4', 'L1S134', 'L1S136', 'L5S235', 'L5S236', 'L2S235', 'L2S236'])
Large effect size: gut samples are more similar to each other than to other sample types (L5S235 and L5S236 are oral, L2S235 and L2S236 are skin).
In [6]:
print(eight_samples_wdm['L1S3'])
print(eight_samples_wdm['L1S4'])
print(eight_samples_wdm['L1S134'])
print(eight_samples_wdm['L1S136'])
[ 0. 0.04698566 0.44108603 0.45300514 0.69826639 0.71851624
0.75133319 0.73974832]
[ 0.04698566 0. 0.4112005 0.42326145 0.71411905 0.73498209
0.76974903 0.75647785]
[ 0.44108603 0.4112005 0. 0.11218037 0.80496795 0.82795422
0.82298565 0.80010741]
[ 0.45300514 0.42326145 0.11218037 0. 0.82138334 0.84418623
0.85982391 0.83227388]
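The four rows above can be checked directly: every gut-to-gut distance should be smaller than every gut-to-oral or gut-to-skin distance. The sketch below is not from the original notebook. It copies the printed values and assumes the columns follow the order of the IDs passed to `filter`, which matches the 4x4 matrix printed earlier; only the gut rows were printed, so distances among the oral and skin samples are not covered.

```python
# Assumption: column order matches the order of IDs given to filter().
ids = ['L1S3', 'L1S4', 'L1S134', 'L1S136',
       'L5S235', 'L5S236', 'L2S235', 'L2S236']
gut = ids[:4]     # F4 and M3 gut samples
other = ids[4:]   # two oral samples, then two skin samples

# Rows copied from the printed output for the four gut samples.
rows = {
    'L1S3':   [0.0, 0.04698566, 0.44108603, 0.45300514,
               0.69826639, 0.71851624, 0.75133319, 0.73974832],
    'L1S4':   [0.04698566, 0.0, 0.4112005, 0.42326145,
               0.71411905, 0.73498209, 0.76974903, 0.75647785],
    'L1S134': [0.44108603, 0.4112005, 0.0, 0.11218037,
               0.80496795, 0.82795422, 0.82298565, 0.80010741],
    'L1S136': [0.45300514, 0.42326145, 0.11218037, 0.0,
               0.82138334, 0.84418623, 0.85982391, 0.83227388],
}

# Distances between distinct gut samples, and from gut to non-gut samples.
gut_to_gut = [rows[a][ids.index(b)] for a in gut for b in gut if a != b]
gut_to_other = [rows[a][ids.index(b)] for a in gut for b in other]

print('largest gut-to-gut distance:    %.3f' % max(gut_to_gut))
print('smallest gut-to-other distance: %.3f' % min(gut_to_other))
```

The largest gut-to-gut distance (about 0.45) is well below the smallest gut-to-other distance (about 0.70), so the two ranges do not overlap for these samples.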
The above was also confirmed with unweighted UniFrac.
Just misc notes below here...
In [7]:
md.columns
Out[7]:
Index([u'days_since_epoch', u'Subject_BodyHabitat', u'BodyHabitat', u'Subject_SampleType', u'Subject_SampleType_Detailed', u'SexIndividual', u'Mislabeled', u'Study', u'SampleType', u'SampleType_Detailed ', u'year', u'subject', u'BodyHabitat_Study'], dtype='object')
In [8]:
md[md['Subject_BodyHabitat'] == "M3_Skin"]
Out[8]:
days_since_epoch
Subject_BodyHabitat
BodyHabitat
Subject_SampleType
Subject_SampleType_Detailed
SexIndividual
Mislabeled
Study
SampleType
SampleType_Detailed
year
subject
BodyHabitat_Study
#SampleID
L2S233
14173
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
L2S234
14174
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
L2S235
14175
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
L2S236
14176
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
L2S237
14177
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
L2S238
14178
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
L2S239
14179
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
L2S240
14180
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
L2S241
14181
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
L2S242
14182
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
L2S243
14183
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
L2S244
14184
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
L2S245
14185
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
L2S246
14186
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
L2S247
14187
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
L2S248
14188
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
L2S249
14189
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
L2S250
14190
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
L2S251
14191
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
L2S252
14192
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
L2S253
14193
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
L2S254
14194
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
L2S255
14195
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
L2S256
14196
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
L2S257
14197
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
L2S258
14198
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
L2S259
14199
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
L2S260
14200
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
L2S261
14201
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
L2S262
14202
M3_Skin
Skin
M3_L_palm
M3_L_palm
M
No
antibiotic_timeseries
L_palm
L_palm
2008
M3
Skin_antibiotic_timeseries
...
...
...
...
...
...
...
...
...
...
...
...
...
...
L5S50
14560
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2009
M3
Skin_antibiotic_timeseries
L5S51
14562
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2009
M3
Skin_antibiotic_timeseries
L5S52
14563
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2009
M3
Skin_antibiotic_timeseries
L5S53
14564
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2009
M3
Skin_antibiotic_timeseries
L5S54
14565
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2009
M3
Skin_antibiotic_timeseries
L5S55
14566
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2009
M3
Skin_antibiotic_timeseries
L5S56
14567
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2009
M3
Skin_antibiotic_timeseries
L5S57
14568
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2009
M3
Skin_antibiotic_timeseries
L5S58
14569
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2009
M3
Skin_antibiotic_timeseries
L5S59
14570
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2009
M3
Skin_antibiotic_timeseries
L5S60
14571
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2009
M3
Skin_antibiotic_timeseries
L5S61
14572
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2009
M3
Skin_antibiotic_timeseries
L5S62
14573
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2009
M3
Skin_antibiotic_timeseries
L5S63
14574
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2009
M3
Skin_antibiotic_timeseries
L5S64
14575
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2009
M3
Skin_antibiotic_timeseries
L5S65
14576
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2009
M3
Skin_antibiotic_timeseries
L5S66
14578
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2009
M3
Skin_antibiotic_timeseries
L5S67
14579
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2009
M3
Skin_antibiotic_timeseries
L5S68
14583
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2009
M3
Skin_antibiotic_timeseries
L5S69
14584
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2009
M3
Skin_antibiotic_timeseries
L5S70
14585
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2009
M3
Skin_antibiotic_timeseries
L5S71
14586
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2009
M3
Skin_antibiotic_timeseries
L5S72
14588
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2009
M3
Skin_antibiotic_timeseries
L5S73
14589
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2009
M3
Skin_antibiotic_timeseries
L5S74
14590
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2009
M3
Skin_antibiotic_timeseries
L5S75
14591
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2009
M3
Skin_antibiotic_timeseries
L5S76
14593
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2009
M3
Skin_antibiotic_timeseries
L5S77
14611
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2010
M3
Skin_antibiotic_timeseries
L5S78
14612
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2010
M3
Skin_antibiotic_timeseries
L5S79
14613
M3_Skin
Skin
M3_R_palm
M3_R_palm
M
No
antibiotic_timeseries
R_palm
R_palm
2010
M3
Skin_antibiotic_timeseries
725 rows × 13 columns
Content source: gregcaporaso/sketchbook