In [1]:
%run GLOBALS.py


3.5.1 (v3.5.1:37a07cee5969, Dec  5 2015, 21:12:44) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]

The goal of this notebook is to investigate, and where possible explain, why some samples look odd when abundances are summed across contigs.


In [2]:
import matplotlib as mpl
%matplotlib inline

import pandas as pd
import seaborn as sns

from IPython.display import IFrame

In [3]:
import elviz_utils

Load Data


In [4]:
reduced = pd.read_csv('../results/reduced_data--all_phylogeny_remains.csv')

In [5]:
sample_info = elviz_utils.read_sample_info('../')

In [6]:
sample_info.head()


Out[6]:
         ID  oxy  rep  week  project
0    1_LOW4  Low    1     4  1056013
1   13_LOW5  Low    1     5  1056037
2   25_LOW6  Low    1     6  1056061
3   37_LOW7  Low    1     7  1056085
4   49_LOW8  Low    1     8  1056109

Look into samples whose Burkholderiales abundance looks too high.

See Rep 1, High O2, weeks 8 and 10.


In [7]:
IFrame('./plot_copies/160330_Order-Burkholderiales_Methylophilales_Methylococcales--Phylum-Bacteroidetes--rep.pdf', 
       width=800, height=300)


Out[7]:

In [8]:
ls "../plots/mixed_phylogeny/"


Family-Bacteriovoracaceae_Myxococcaceae--rep--annotated.pdf
Family-Bacteriovoracaceae_Myxococcaceae--rep.pdf
Genus-Methylobacter_Methylovulum_Methylomonas_Methylomicrobium_Methyloglobulus_Methylococcus_Methylocaldum_Methylosarcina--rep.pdf
Genus-Methylotenera_Methylovorus_Methylophilus_Methylobacillus--rep.pdf
Order-Burkholderiales_Methylophilales_Methylococcales--Phylum-Bacteroidetes--rep.pdf
Phylum-Bacteroidetes--Order-Burkholderiales_Methylophilales_Methylococcales--rep.pdf

In [9]:
sample_info[(sample_info['rep'] == 1) & 
            (sample_info['oxy'] == 'High') & 
            (sample_info['week'].isin([8, 10]))]


Out[9]:
          ID   oxy  rep  week  project
48   55_HOW8  High    1     8  1056121
50  79_HOW10  High    1    10  1056169

http://genome.jgi.doe.gov/viz/plot?jgiProjectId=1056121
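
The link above follows a simple pattern: the JGI project ID is appended to the Elviz plot URL. A minimal sketch of building these links for the selected samples is given here. The `selected` frame is a hand-copied stand-in for the filtered `sample_info` rows, and `ELVIZ_URL` is a name introduced only for this illustration.

```python
import pandas as pd

# URL pattern taken from the link in this notebook; the constant name is hypothetical.
ELVIZ_URL = 'http://genome.jgi.doe.gov/viz/plot?jgiProjectId={}'

# Stand-in for the two rows returned by the sample_info filter above.
selected = pd.DataFrame({'ID': ['55_HOW8', '79_HOW10'],
                         'project': [1056121, 1056169]})

# Map each sample ID to its Elviz plot URL.
urls = {sample_id: ELVIZ_URL.format(project)
        for sample_id, project in zip(selected['ID'], selected['project'])}

for sample_id, url in urls.items():
    print(sample_id, url)
```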


In [10]:
reduced[(reduced['Order'] == 'Burkholderiales') & (reduced['ID'] == '55_HOW8')]


Out[10]:
Kingdom Phylum Class Order Family Genus Length abundance project ID oxy rep week
28425 Bacteria Proteobacteria Betaproteobacteria Burkholderiales NaN other 7293012 0.328695 1056121 55_HOW8 High 1 8
28429 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Acidovorax 3276838 0.051180 1056121 55_HOW8 High 1 8
28430 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae other 3066762 0.047577 1056121 55_HOW8 High 1 8
28440 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Oxalobacteraceae other 180848 0.007296 1056121 55_HOW8 High 1 8
28444 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Oxalobacteraceae Janthinobacterium 200994 0.005747 1056121 55_HOW8 High 1 8
28445 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Caldimonas 65886 0.005305 1056121 55_HOW8 High 1 8
28446 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Alcaligenaceae Azohydromonas 95295 0.005024 1056121 55_HOW8 High 1 8
28447 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Variovorax 216718 0.004889 1056121 55_HOW8 High 1 8
28448 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Polaromonas 235192 0.004613 1056121 55_HOW8 High 1 8
28451 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Curvibacter 156196 0.003728 1056121 55_HOW8 High 1 8
28453 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Burkholderiaceae Chitinimonas 91837 0.003180 1056121 55_HOW8 High 1 8
28454 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Comamonas 146042 0.003116 1056121 55_HOW8 High 1 8
28455 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Burkholderiaceae Burkholderia 141186 0.002358 1056121 55_HOW8 High 1 8
28456 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Oxalobacteraceae Massilia 80712 0.002023 1056121 55_HOW8 High 1 8
28459 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Burkholderiaceae Cupriavidus 50983 0.001542 1056121 55_HOW8 High 1 8
28461 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Oxalobacteraceae Herbaspirillum 57938 0.001331 1056121 55_HOW8 High 1 8
28464 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Alicycliphilus 52265 0.001249 1056121 55_HOW8 High 1 8
28465 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Ramlibacter 57246 0.001157 1056121 55_HOW8 High 1 8
28467 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Hydrogenophaga 34200 0.001012 1056121 55_HOW8 High 1 8
28473 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Pseudacidovorax 25347 0.000923 1056121 55_HOW8 High 1 8
28476 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Burkholderiaceae Ralstonia 47142 0.000853 1056121 55_HOW8 High 1 8
28479 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Limnohabitans 50963 0.000786 1056121 55_HOW8 High 1 8
28485 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Hylemonella 30341 0.000661 1056121 55_HOW8 High 1 8
28494 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Alcaligenaceae Derxia 20230 0.000476 1056121 55_HOW8 High 1 8
28495 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Delftia 34937 0.000471 1056121 55_HOW8 High 1 8
28496 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Burkholderiaceae other 22668 0.000455 1056121 55_HOW8 High 1 8
28500 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Oxalobacteraceae Herminiimonas 17493 0.000429 1056121 55_HOW8 High 1 8
28502 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Verminephrobacter 24390 0.000397 1056121 55_HOW8 High 1 8
28504 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Oxalobacteraceae Duganella 10154 0.000387 1056121 55_HOW8 High 1 8
28538 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Ottowia 24544 0.000269 1056121 55_HOW8 High 1 8
28548 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Alcaligenaceae Bordetella 8768 0.000238 1056121 55_HOW8 High 1 8
28594 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Alcaligenaceae Advenella 3801 0.000156 1056121 55_HOW8 High 1 8
28597 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Oxalobacteraceae Collimonas 13129 0.000155 1056121 55_HOW8 High 1 8
28619 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Alcaligenaceae Achromobacter 8489 0.000124 1056121 55_HOW8 High 1 8
28662 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Alcaligenaceae Pusillimonas 4803 0.000088 1056121 55_HOW8 High 1 8
28727 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Alcaligenaceae other 8612 0.000061 1056121 55_HOW8 High 1 8
28731 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Burkholderiaceae Pandoraea 6765 0.000059 1056121 55_HOW8 High 1 8
28906 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Brachymonas 2404 0.000026 1056121 55_HOW8 High 1 8
28926 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Burkholderiaceae Polynucleobacter 1121 0.000025 1056121 55_HOW8 High 1 8
28932 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Alcaligenaceae Alcaligenes 2955 0.000024 1056121 55_HOW8 High 1 8
28939 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Burkholderiaceae Candidatus Glomeribacter 4184 0.000023 1056121 55_HOW8 High 1 8
29110 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Sutterellaceae Parasutterella 1289 0.000012 1056121 55_HOW8 High 1 8
29174 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Burkholderiaceae Lautropia 784 0.000009 1056121 55_HOW8 High 1 8
29267 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Alcaligenaceae Taylorella 639 0.000006 1056121 55_HOW8 High 1 8
29340 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Alcaligenaceae Oligella 1263 0.000005 1056121 55_HOW8 High 1 8
29374 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Oxalobacteraceae Oxalobacter 517 0.000003 1056121 55_HOW8 High 1 8
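
The first row of the output above has no Family and Genus "other", and it carries by far the largest abundance. One way to see where the Burkholderiales signal sits is to sum abundance by family. The sketch below uses only the first five rows, copied by hand from the output above, not the full `reduced` frame. The label `unclassified` is an assumption introduced here for rows with a missing Family.

```python
import pandas as pd

# First five Burkholderiales rows for 55_HOW8, copied by hand from the output above.
rows = pd.DataFrame({
    'Family': [None, 'Comamonadaceae', 'Comamonadaceae',
               'Oxalobacteraceae', 'Oxalobacteraceae'],
    'Genus': ['other', 'Acidovorax', 'other', 'other', 'Janthinobacterium'],
    'abundance': [0.328695, 0.051180, 0.047577, 0.007296, 0.005747],
})

# groupby drops NaN keys by default, so label missing families explicitly first.
family_totals = (rows.assign(Family=rows['Family'].fillna('unclassified'))
                     .groupby('Family')['abundance']
                     .sum()
                     .sort_values(ascending=False))

print(family_totals)
```

On the real data the same expression could be applied to the filtered `reduced` frame in place of `rows`.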

http://genome.jgi.doe.gov/viz/plot?jgiProjectId=1056169


In [11]:
reduced[(reduced['Order'] == 'Burkholderiales') & (reduced['ID'] == '79_HOW10')]


Out[11]:
Kingdom Phylum Class Order Family Genus Length abundance project ID oxy rep week
40950 Bacteria Proteobacteria Betaproteobacteria Burkholderiales NaN other 20055769 0.188278 1056169 79_HOW10 High 1 10
40952 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae other 8980139 0.117398 1056169 79_HOW10 High 1 10
40954 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Acidovorax 8257013 0.091013 1056169 79_HOW10 High 1 10
40961 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Comamonas 950034 0.011405 1056169 79_HOW10 High 1 10
40962 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Polaromonas 472251 0.010333 1056169 79_HOW10 High 1 10
40963 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Curvibacter 688999 0.009526 1056169 79_HOW10 High 1 10
40964 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Variovorax 496236 0.008910 1056169 79_HOW10 High 1 10
40965 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Alcaligenaceae Azohydromonas 281903 0.006750 1056169 79_HOW10 High 1 10
40967 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Burkholderiaceae Burkholderia 289132 0.004447 1056169 79_HOW10 High 1 10
40968 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Oxalobacteraceae Janthinobacterium 226737 0.004122 1056169 79_HOW10 High 1 10
40969 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Oxalobacteraceae other 223136 0.003903 1056169 79_HOW10 High 1 10
40974 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Caldimonas 90472 0.003038 1056169 79_HOW10 High 1 10
40978 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Burkholderiaceae Cupriavidus 160249 0.002732 1056169 79_HOW10 High 1 10
40981 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Alicycliphilus 160291 0.002632 1056169 79_HOW10 High 1 10
40984 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Limnohabitans 75564 0.002478 1056169 79_HOW10 High 1 10
40985 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Burkholderiaceae Chitinimonas 120778 0.002390 1056169 79_HOW10 High 1 10
40988 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Hydrogenophaga 78541 0.001956 1056169 79_HOW10 High 1 10
40989 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Ramlibacter 72566 0.001934 1056169 79_HOW10 High 1 10
40990 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Oxalobacteraceae Massilia 95180 0.001860 1056169 79_HOW10 High 1 10
40997 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Oxalobacteraceae Herbaspirillum 67801 0.001488 1056169 79_HOW10 High 1 10
40998 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Ottowia 54297 0.001459 1056169 79_HOW10 High 1 10
41003 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Delftia 105030 0.001175 1056169 79_HOW10 High 1 10
41004 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Burkholderiaceae Ralstonia 74456 0.001155 1056169 79_HOW10 High 1 10
41006 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Pseudacidovorax 40852 0.000978 1056169 79_HOW10 High 1 10
41009 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Hylemonella 44374 0.000848 1056169 79_HOW10 High 1 10
41010 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Oxalobacteraceae Herminiimonas 36334 0.000833 1056169 79_HOW10 High 1 10
41017 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Alcaligenaceae Achromobacter 44983 0.000596 1056169 79_HOW10 High 1 10
41020 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Alcaligenaceae Bordetella 28491 0.000565 1056169 79_HOW10 High 1 10
41022 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Oxalobacteraceae Collimonas 18295 0.000541 1056169 79_HOW10 High 1 10
41024 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Alcaligenaceae Derxia 36530 0.000531 1056169 79_HOW10 High 1 10
41025 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Burkholderiaceae other 80941 0.000520 1056169 79_HOW10 High 1 10
41038 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Oxalobacteraceae Duganella 19955 0.000356 1056169 79_HOW10 High 1 10
41057 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Verminephrobacter 24364 0.000277 1056169 79_HOW10 High 1 10
41063 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Alcaligenaceae Pusillimonas 13958 0.000232 1056169 79_HOW10 High 1 10
41071 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Burkholderiaceae Pandoraea 20781 0.000215 1056169 79_HOW10 High 1 10
41095 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Alcaligenaceae other 11963 0.000166 1056169 79_HOW10 High 1 10
41107 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Brachymonas 8489 0.000150 1056169 79_HOW10 High 1 10
41108 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Alcaligenaceae Advenella 4246 0.000148 1056169 79_HOW10 High 1 10
41121 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Burkholderiaceae Lautropia 4935 0.000128 1056169 79_HOW10 High 1 10
41156 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Alcaligenaceae Alcaligenes 26796 0.000098 1056169 79_HOW10 High 1 10
41273 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Burkholderiaceae Polynucleobacter 2814 0.000044 1056169 79_HOW10 High 1 10
41274 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Burkholderiaceae Candidatus Glomeribacter 1634 0.000043 1056169 79_HOW10 High 1 10
41448 Bacteria Proteobacteria Betaproteobacteria Burkholderiales Alcaligenaceae Oligella 1443 0.000008 1056169 79_HOW10 High 1 10
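
To compare the two samples directly, the per-row abundances can be summed over the whole order for each sample ID. The sketch below assumes the column names shown in the outputs above. `order_abundance` is a hypothetical helper, not part of `elviz_utils`, and `toy` holds only the top three rows of each sample, copied by hand, so its totals are lower than the full-order sums.

```python
import pandas as pd


def order_abundance(df, order, sample_ids):
    """Sum abundance over all rows of one Order, separately for each sample ID."""
    mask = (df['Order'] == order) & (df['ID'].isin(sample_ids))
    return df.loc[mask].groupby('ID')['abundance'].sum()


# Top three Burkholderiales rows of each sample, copied from the two outputs above.
toy = pd.DataFrame({
    'Order': ['Burkholderiales'] * 6,
    'ID': ['55_HOW8'] * 3 + ['79_HOW10'] * 3,
    'abundance': [0.328695, 0.051180, 0.047577,
                  0.188278, 0.117398, 0.091013],
})

totals = order_abundance(toy, 'Burkholderiales', ['55_HOW8', '79_HOW10'])
print(totals)
```

Passing `reduced` instead of `toy` would give the full per-sample totals for the order.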
