TITLE:

Introduction

Influenza A virus (IAV) causes a respiratory illness that presents with fever, cough, muscle and joint pain, headache, sore throat, and runny nose. IAV is a member of the Orthomyxoviridae family and contains eight negative-sense, single-stranded RNA segments {Urbaniak:2014wj}{Nelson:2015cg}{Webster:2014jt}. The major viral antigens of IAV are the surface glycoproteins hemagglutinin (HA) and neuraminidase (NA); these proteins are used to classify IAV into sixteen HA subtypes and nine NA subtypes {Webster:2014jt}. IAV has numerous hosts, including waterfowl, pigs, bats, and humans {Urbaniak:2014wj}{Nelson:2015cg}{Webster:2014jt}. Various mechanisms can change HA and NA in a specific IAV strain and reduce vaccine effectiveness {Goka:2014cz}. Two of these mechanisms are antigenic drift and antigenic shift. Because IAV RNA replication lacks proofreading, mutations are easily introduced {Webster:2014jt}. These mutations cause antigenic drift by changing nucleotides in the HA and NA genes, which can make vaccines ineffective. Antigenic shift occurs when two or more IAV strains infect the same host and exchange genome segments through viral reassortment {Urbaniak:2014wj}{Nelson:2015cg}{Webster:2014jt}. Pandemic strains can arise when antigenic shift increases a strain's virulence, improves its evasion of the host's immune system, or introduces a new set of glycoproteins to which most of the population has not previously been exposed {Urbaniak:2014wj}{Nelson:2015cg}{Webster:2014jt}. Pigs are excellent hosts for antigenic shift because they express sialic acid receptors for both avian and mammalian strains {Nelson:2015cg}. With this in mind, we wanted to determine whether the host plays a role in the rate of mutation in IAV.
To determine whether the mutation rate depends on the host, we compared H1N1 strains collected in different years from both humans and pigs.

Methods

H1N1 influenza A virus sequences from humans or pigs for the years 1935, 1978, 2009, and 2014 were downloaded from the NCBI Influenza Virus Resource (http://www.ncbi.nlm.nih.gov/genomes/FLU/FLU.html). The search criteria were: type A influenza virus, host either human or swine, northern temperate region, subtype H1N1, and the aforementioned years.
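
The notebook loads a precomputed alignment (all_seq.aln) and a ClustalW2 executable path, but the alignment step itself is not shown. The following is a minimal sketch of how such an alignment could be produced, not the authors' actual command. It assumes the downloaded sequences were combined into a single FASTA file, here hypothetically named all_seq.fasta, and the helper names build_clustalw_command and run_alignment are illustrative only.

```python
import os
import subprocess


def build_clustalw_command(exe, infile, outfile):
    """Return the argument list for a ClustalW2 multiple-alignment run.

    Only the basic -INFILE, -ALIGN and -OUTFILE options are used; the
    output is left in ClustalW's default (Clustal) format.
    """
    return [exe, "-INFILE=" + infile, "-ALIGN", "-OUTFILE=" + outfile]


def run_alignment(exe, infile, outfile):
    """Run ClustalW2 if the executable exists; return True when it ran."""
    if not os.path.isfile(exe):
        return False
    subprocess.check_call(build_clustalw_command(exe, infile, outfile))
    return True


# Hypothetical file names: all_seq.fasta is an assumed input,
# all_seq.aln is the alignment file the notebook reads.
cmd = build_clustalw_command("clustalw2", "all_seq.fasta", "all_seq.aln")
print(" ".join(cmd))
```

ClustalW typically writes a guide tree (.dnd) next to the input file as well, which would be consistent with the all_seq.dnd file read in the phylogeny section.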


In [52]:
from Bio import SeqIO, AlignIO, Phylo
from Bio.Align.Applications import ClustalwCommandline
from Bio.SeqFeature import SeqFeature, FeatureLocation
from Bio.Graphics import GenomeDiagram
import os
import pandas as pd
import numpy as np
from reportlab.lib import colors
from reportlab.lib.units import cm
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import heatmap
import multigroup_barchart
import itertools


%matplotlib inline

#Make a variable point to where ClustalW2 is on your computer
clustalw_exe = r"C:\Program Files (x86)\ClustalW2\clustalw2.exe"

#Make sure you are in the same directory as the sequence FASTA files

In [59]:
#Import the aligned sequences
all_aln = AlignIO.read('all_seq.aln','clustal')
all_aln.sort()
print all_aln


SingleLetterAlphabet() alignment with 8 rows and 13379 columns
------------AATATGGAAAGAATAAAAGAACTACGAAATCT...--- h1935
--------------TATGGAAAGAATAAAAGAGCTAAGGAGTCT...--- h1978
---------------ATGGAGAGAATAAAAGAACTGAGAGATCT...--- h2009
---------------ATGGAGAGAATAAAAGAACTGAGAGATCT...TAC h2014
TCAAATATATTCAATATGGAGAGAATAAAAGAACTAAGGGATCT...--- s1935
TCAAATATATTCAATATGGAGAGAATAAAGGAACTAAGAAATCT...--- s1978
---------------ATGGAGAGAATAAAAGAACTAAGAGATCT...--- s2009
---------------ATGGAGAGAATAAAAGAACTGAGAGATCT...--- s2014

In [60]:
#collect ids and separate based on host (letter prefix preceding the year)
all_ids = [i.id for i in all_aln]
h_ids = all_ids[0:4]
s_ids = all_ids[4:]
print all_ids,'\n', h_ids, s_ids


['h1935', 'h1978', 'h2009', 'h2014', 's1935', 's1978', 's2009', 's2014'] 
['h1935', 'h1978', 'h2009', 'h2014'] ['s1935', 's1978', 's2009', 's2014']

Plot Tracks

We define the functions for creating track plots.


In [61]:
def get_diff_locations(ref,compare):
    """
    Takes 2 Biopython Seq objects of equal size (aligned).
    Returns a list of locations at which base in compare sequence != base in ref sequence.
    """
    if len(ref) != len(compare):
        raise ValueError('Seqs are not the same length!')
    
    diff_locations = []
    
    for i in range(len(ref)):
        if ref[i] != compare[i]:
            diff_locations.append(i)
        
    return diff_locations

def setup(diagram_name, ids, length):
    '''
    Inits empty track diagram (with diagram_name as title), tracks, and feature sets for all ids.
    Adds a cadet blue background feature to each track (for contrast).
    Returns the diagram, the tracks dict, and the feature_sets dict.
    '''
    #Create Genome Diagram for the given host
    track_diagram = GenomeDiagram.Diagram(diagram_name+' H1N1 Tracks Plot', tracklines = 0)

    track = {}
    feature_set = {}

    #Generate tracks and feature sets for each year
    track_count = 1
    for i in ids: 
        track['%s' %i] = track_diagram.new_track(track_count, name=i, greytrack=False)
        
        feature_set['%s'%i] = track['%s' %i].new_set(name=i)
        
        #add contrast background
        feature_set['%s'%i].add_feature(SeqFeature(FeatureLocation(0,length)), 
                                   label=True, label_angle=0, color=colors.cadetblue)
        track_count += 1
    return track_diagram, track, feature_set

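def add_features_year_compare(feature_set, diffs, hostspecies, base_year):
    '''
    Labels each track and adds one feature per difference vs. the base_year strain.
    feature_set and diffs are dicts keyed by sequence id (host prefix + year).
    '''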
    for k, features in feature_set.items():
        year = k[1:]
        if base_year in k:
            #if the base year...
            label = '%s: %s (reference)'%(hostspecies,year)

        else:
            #name bg feature (which labels the track)
            label ='%s: %s --- %i differences vs. %s'%(hostspecies, year, len(diffs[k]), base_year)
            # For each difference recorded in diffs, add as a feature to feature_set
            for i in diffs[k]:
                feature = SeqFeature(FeatureLocation(i,i))
                features.add_feature(feature, name=label, color=colors.aqua)
                
        # The background feature added in setup() is expected to be the first
        # feature in the set; renaming it is what labels the track.
        features.get_features()[0].name = label 
         
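def compare_strains_by_host(diffs):
    '''
    Inits a diagram with one track and one feature set per key (year) in diffs.
    Returns the diagram, the tracks dict, and the feature_set dict.
    '''
    #Create Genome Diagram to compare human and swine in each year
    diagram = GenomeDiagram.Diagram('Human vs. Swine H1N1 Tracks Plot')
    tracks = {}
    feature_set = {}
    
    #Sort the keys so the track order does not depend on dict ordering
    for n, k in enumerate(sorted(diffs), 1):
        tracks[k] = diagram.new_track(n, greytrack=False)
        feature_set[k] = tracks[k].new_set()
    return diagram, tracks, feature_set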

def add_features_host_compare(feature_set):
    '''
    Adds a labelled background feature plus one feature per difference to each set.
    Note: reads the module-level diffs dict, which must be defined before calling.
    '''
    for k in feature_set:
        feature_set[k].add_feature(SeqFeature(FeatureLocation(0,13378)), 
                                       name = 'Human vs. Swine: %s --- %i differences'%(k,len(diffs[k])), 
                                       label=True, label_angle=0, color=colors.cadetblue)
        for i in diffs[k]:
            feature = SeqFeature(FeatureLocation(i,i))
            feature_set[k].add_feature(feature, color=colors.aqua)

def plot_tracks(tracks,tracks_path):
    '''
    Save tracks as pngs, then load them as images and display with matplotlib.
    '''
    
    #Write each diagram to a png, then read it back in using Matplotlib
    tracks.draw(format='linear', pagesize=(14*cm,7*cm), fragments=1,
             start=0, end=13378)
    plt.show()
    #get extension from tracks path
    root, ext = os.path.splitext(tracks_path)
    if not ext:
        ext = '.png'
        tracks_path += ext

    tracks.write(tracks_path, ext.strip('.'), dpi=600)
    tracks_im = mpimg.imread(tracks_path)

    #Plot each set of tracks
    fig = plt.figure(figsize=(14,7),dpi=600)
    ax = fig.add_axes([0.025,0.025,0.95,0.95],frameon=False)
    plt.axis('off')
    plt.imshow(tracks_im)
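
Note that get_diff_locations counts every mismatching column, including columns where one sequence has an alignment gap ('-'), such as the leading gaps visible in the alignment printout. The following is a minimal sketch of a variant that reports substitutions and gap columns separately. The function name split_diff_locations is hypothetical and is not used elsewhere in this notebook; it assumes '-' is the gap character.

```python
def split_diff_locations(ref, compare, gap="-"):
    """Separate substitution columns from gap columns.

    Takes two aligned sequences of equal length (strings or Biopython Seq
    objects). Returns (substitutions, gapped): two lists of column indexes.
    """
    if len(ref) != len(compare):
        raise ValueError("Seqs are not the same length!")

    substitutions, gapped = [], []
    for i, (a, b) in enumerate(zip(ref, compare)):
        if a == b:
            continue  # identical column (including gap vs. gap)
        if a == gap or b == gap:
            gapped.append(i)  # one sequence has no base at this column
        else:
            substitutions.append(i)  # both have a base, and they differ
    return substitutions, gapped


# Toy aligned fragments, for illustration only
subs, gaps = split_diff_locations("--ATGGAGA", "TCATGGAAA")
print(subs, gaps)
```

Whether gap columns should count toward the difference totals is a judgment call; separating them simply makes it visible how much of a count comes from missing sequence ends rather than from base changes.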

In [56]:
#Compare each year to reference year strain with same host
base_year = '1978'

h_diffs = {}
h_diffs['h1935'] = get_diff_locations(all_aln[1].seq,all_aln[0].seq)
h_diffs['h2009'] = get_diff_locations(all_aln[1].seq,all_aln[2].seq)
h_diffs['h2014'] = get_diff_locations(all_aln[1].seq,all_aln[3].seq)

s_diffs = {}
s_diffs['s1935'] = get_diff_locations(all_aln[5].seq,all_aln[4].seq)
s_diffs['s2009'] = get_diff_locations(all_aln[5].seq,all_aln[6].seq)
s_diffs['s2014'] = get_diff_locations(all_aln[5].seq,all_aln[7].seq)


h_diagram, h_features, h_feature_set = setup(diagram_name='Human', ids = h_ids,
                                             length=all_aln.get_alignment_length())
s_diagram, s_features, s_feature_set = setup(diagram_name='Swine', ids = s_ids,
                                             length=all_aln.get_alignment_length())

add_features_year_compare(feature_set = h_feature_set, diffs = h_diffs, hostspecies = 'Human',base_year = base_year)
add_features_year_compare(feature_set = s_feature_set, diffs = s_diffs, hostspecies = 'Swine',base_year = base_year)
plot_tracks(h_diagram,'human_ref_1978.png')


Results

Comparing Branch Lengths

We compare the distances between terminals of a single-host phylogenetic tree from a base year (1935) to every other year. We can also compare distances between all years with a heatmap.

H1N1 sequence comparison in human or pig host from the years 1935, 1978, 2009, and 2014

Using the 1935 H1N1 sequence as the reference for the samples obtained in 1978, 2009, and 2014 from either human or pig hosts, we see a clear increase in the number of sequence changes as time progresses: the 2014 sequence has more differences than the 1978 sequence. Interestingly, when comparing the 1978 sequence to the 1935 reference, the H1N1 strain collected in pigs had more differences (1206) than the same-year sequence found in humans (850). In contrast, the 2009 sequence shows more differences in the human sample (2274) than in the pig sample (1976). Lastly, the pig and human H1N1 strains had a similar number of changes in 2014: 2374 for human and 2247 for pig. These data show that the H1N1 genome found in pigs changed more rapidly from 1978 to 2009, gaining 1846 differences. The highest number of changes also occurred between 1978 and 2009, when the H1N1 virus found in humans added 2323 changes.


In [63]:
#Compare each year to 1935 strain with same host
h_diffs = {}
s_diffs = {}
h_diffs['h1978'] = get_diff_locations(all_aln[0].seq,all_aln[1].seq)
h_diffs['h2009'] = get_diff_locations(all_aln[0].seq,all_aln[2].seq)
h_diffs['h2014'] = get_diff_locations(all_aln[0].seq,all_aln[3].seq)

s_diffs['s1978'] = get_diff_locations(all_aln[4].seq,all_aln[5].seq)
s_diffs['s2009'] = get_diff_locations(all_aln[4].seq,all_aln[6].seq)
s_diffs['s2014'] = get_diff_locations(all_aln[4].seq,all_aln[7].seq)

h_diagram, h_features, h_feature_set = setup(diagram_name='Human', ids = h_ids, length=all_aln.get_alignment_length())
s_diagram, s_features, s_feature_set = setup(diagram_name='Swine', ids = s_ids, length=all_aln.get_alignment_length())

add_features_year_compare(feature_set = h_feature_set, diffs = h_diffs, hostspecies = 'Human',base_year = '1935')
add_features_year_compare(feature_set = s_feature_set, diffs = s_diffs, hostspecies = 'Swine',base_year = '1935')

In [64]:
plot_tracks(h_diagram,'h_tracks.png')
plot_tracks(s_diagram, 's_tracks.png')



In [15]:
#Comparison of human 2009 and 2014
h_2009_2014_track = GenomeDiagram.Diagram('Human 2009 vs. 2014 H1N1 Track Plot')

h_2009_2014_features = h_2009_2014_track.new_track(1, greytrack=False)
h_2009_2014_feature_set = h_2009_2014_features.new_set()

#Background green to show similarities
h_2009_2014_feature_set.add_feature(SeqFeature(FeatureLocation(0,13378)), 
                                   name = 'Human: 2009 vs. 2014 --- %i differences'%len(diffs['h_2009_2014']), 
                                   label=True, label_angle=0);
#Add blue to show differences
for i in range(len(diffs['h_2009_2014'])):
    feature = SeqFeature(FeatureLocation(diffs['h_2009_2014'][i],diffs['h_2009_2014'][i]))
    h_2009_2014_feature_set.add_feature(feature, color=colors.blue)

h_2009_2014_track.draw(format='linear', pagesize=(14*cm,3*cm), fragments=1,
         start=0, end=13378)
h_2009_2014_track.write('h_2009_2014_track.png', 'png', dpi=600)
h_2009_2014_track_im = mpimg.imread('h_2009_2014_track.png');

fig_h_2009_2014 = plt.figure(figsize=(14,7),dpi=600)
ax_h_2009_2014 = fig_h_2009_2014.add_axes([0.025,0.025,0.95,0.95],frameon=False)
plt.axis('off')
plt.imshow(h_2009_2014_track_im);

#Comparison of human 1978 and 2009
h_1978_2009_track = GenomeDiagram.Diagram('Human 1978 vs. 2009 H1N1 Track Plot')

h_1978_2009_features = h_1978_2009_track.new_track(1, greytrack=False)
h_1978_2009_feature_set = h_1978_2009_features.new_set()

#Background green to show similarities
h_1978_2009_feature_set.add_feature(SeqFeature(FeatureLocation(0,13378)), 
                                   name = 'Human: 1978 vs. 2009 --- %i differences'%len(diffs['h_1978_2009']), 
                                   label=True, label_angle=0);
#Add blue to show differences
for i in range(len(diffs['h_1978_2009'])):
    feature = SeqFeature(FeatureLocation(diffs['h_1978_2009'][i],diffs['h_1978_2009'][i]))
    h_1978_2009_feature_set.add_feature(feature, color=colors.blue)

h_1978_2009_track.draw(format='linear', pagesize=(14*cm,3*cm), fragments=1,
         start=0, end=13378)
h_1978_2009_track.write('h_1978_2009_track.png', 'png', dpi=600)
h_1978_2009_track_im = mpimg.imread('h_1978_2009_track.png');

fig_h_1978_2009 = plt.figure(figsize=(14,7),dpi=600)
ax_h_1978_2009 = fig_h_1978_2009.add_axes([0.025,0.025,0.95,0.95],frameon=False)
plt.axis('off')
plt.imshow(h_1978_2009_track_im);



In [16]:
#Comparison of swine 2009 and 2014
s_2009_2014_track = GenomeDiagram.Diagram('Swine 2009 vs. 2014 H1N1 Track Plot')

s_2009_2014_features = s_2009_2014_track.new_track(1, greytrack=False)
s_2009_2014_feature_set = s_2009_2014_features.new_set()

#Background green to show similarities
s_2009_2014_feature_set.add_feature(SeqFeature(FeatureLocation(0,13378)), 
                                   name = 'Swine: 2009 vs. 2014 --- %i differences'%len(diffs['s_2009_2014']), 
                                   label=True, label_angle=0);
#Add blue to show differences
for i in range(len(diffs['s_2009_2014'])):
    feature = SeqFeature(FeatureLocation(diffs['s_2009_2014'][i],diffs['s_2009_2014'][i]))
    s_2009_2014_feature_set.add_feature(feature, color=colors.blue)

s_2009_2014_track.draw(format='linear', pagesize=(14*cm,3*cm), fragments=1,
         start=0, end=13378)
s_2009_2014_track.write('s_2009_2014_track.png', 'png', dpi=600)
s_2009_2014_track_im = mpimg.imread('s_2009_2014_track.png');

fig_s_2009_2014 = plt.figure(figsize=(14,7),dpi=600)
ax_s_2009_2014 = fig_s_2009_2014.add_axes([0.025,0.025,0.95,0.95],frameon=False)
plt.axis('off')
plt.imshow(s_2009_2014_track_im);

#Comparison of swine 1978 and 2009
s_1978_2009_track = GenomeDiagram.Diagram('Swine 1978 vs. 2009 H1N1 Track Plot')

s_1978_2009_features = s_1978_2009_track.new_track(1, greytrack=False)
s_1978_2009_feature_set = s_1978_2009_features.new_set()

#Background green to show similarities
s_1978_2009_feature_set.add_feature(SeqFeature(FeatureLocation(0,13378)), 
                                   name = 'Swine: 1978 vs. 2009 --- %i differences'%len(diffs['s_1978_2009']), 
                                   label=True, label_angle=0);
#Add blue to show differences
for i in range(len(diffs['s_1978_2009'])):
    feature = SeqFeature(FeatureLocation(diffs['s_1978_2009'][i],diffs['s_1978_2009'][i]))
    s_1978_2009_feature_set.add_feature(feature, color=colors.blue)

s_1978_2009_track.draw(format='linear', pagesize=(14*cm,3*cm), fragments=1,
         start=0, end=13378)
s_1978_2009_track.write('s_1978_2009_track.png', 'png', dpi=600)
s_1978_2009_track_im = mpimg.imread('s_1978_2009_track.png');

fig_s_1978_2009 = plt.figure(figsize=(14,7),dpi=600)
ax_s_1978_2009 = fig_s_1978_2009.add_axes([0.025,0.025,0.95,0.95],frameon=False)
plt.axis('off')
plt.imshow(s_1978_2009_track_im);



In [17]:
#Create Genome Diagram to compare human and swine in each year
hs_tracks = GenomeDiagram.Diagram('Human vs. Swine H1N1 Tracks Plot')

hs_features1935 = hs_tracks.new_track(1, greytrack=False)
hs_feature_set1935 = hs_features1935.new_set()

hs_features1978 = hs_tracks.new_track(2, greytrack=False)
hs_feature_set1978 = hs_features1978.new_set()

hs_features2009 = hs_tracks.new_track(3, greytrack=False)
hs_feature_set2009 = hs_features2009.new_set()

hs_features2014 = hs_tracks.new_track(4, greytrack=False)
hs_feature_set2014 = hs_features2014.new_set()

In [18]:
#Background green to show similarities
hs_feature_set1935.add_feature(SeqFeature(FeatureLocation(0,13378)), 
                                   name = 'Human vs. Swine: 1935 --- %i differences'%len(diffs['h_1935_s_1935']), 
                                   label=True, label_angle=0);
#Add blue to show differences
for i in range(len(diffs['h_1935_s_1935'])):
    feature = SeqFeature(FeatureLocation(diffs['h_1935_s_1935'][i],diffs['h_1935_s_1935'][i]))
    hs_feature_set1935.add_feature(feature, color=colors.blue)

#Repeat for each year
hs_feature_set1978.add_feature(SeqFeature(FeatureLocation(0,13378)), 
                                   name = 'Human vs. Swine: 1978 --- %i differences'%len(diffs['h_1978_s_1978']), 
                                   label=True, label_angle=0);
for i in range(len(diffs['h_1978_s_1978'])):
    feature = SeqFeature(FeatureLocation(diffs['h_1978_s_1978'][i],diffs['h_1978_s_1978'][i]))
    hs_feature_set1978.add_feature(feature, color=colors.blue)


hs_feature_set2009.add_feature(SeqFeature(FeatureLocation(0,13378)), 
                                   name = 'Human vs. Swine: 2009 --- %i differences'%len(diffs['h_2009_s_2009']), 
                                   label=True, label_angle=0);
for i in range(len(diffs['h_2009_s_2009'])):
    feature = SeqFeature(FeatureLocation(diffs['h_2009_s_2009'][i],diffs['h_2009_s_2009'][i]))
    hs_feature_set2009.add_feature(feature, color=colors.blue)


hs_feature_set2014.add_feature(SeqFeature(FeatureLocation(0,13378)), 
                                   name = 'Human vs. Swine: 2014 --- %i differences'%len(diffs['h_2014_s_2014']), 
                                   label=True, label_angle=0);
for i in range(len(diffs['h_2014_s_2014'])):
    feature = SeqFeature(FeatureLocation(diffs['h_2014_s_2014'][i],diffs['h_2014_s_2014'][i]))
    hs_feature_set2014.add_feature(feature, color=colors.blue)

In [19]:
#Write diagram to a png, then read it back in using Matplotlib
hs_tracks.draw(format='linear', pagesize=(14*cm,7*cm), fragments=1,
         start=0, end=13378)
hs_tracks.write('hs_tracks.png', 'png', dpi=600)
hs_tracks_im = mpimg.imread('hs_tracks.png');

H1N1 sequence comparison between hosts from the same year

When comparing same-year H1N1 sequences between the two hosts, we find 1349 differences between the 1935 genomes, 2004 differences in 1978, and 1112 differences in 2009. Surprisingly, in 2014 there are only 136 differences between the H1N1 genomes found in humans and pigs. This could suggest that the 2014 H1N1 strains found in humans and pigs are more closely related than the strains found in 1935, 1978, or 2009.


In [65]:
diffs = {}
diffs['1935'] = get_diff_locations(all_aln[0].seq,all_aln[4].seq)
diffs['1978'] = get_diff_locations(all_aln[1].seq,all_aln[5].seq)
diffs['2009'] = get_diff_locations(all_aln[2].seq,all_aln[6].seq)
diffs['2014'] = get_diff_locations(all_aln[3].seq,all_aln[7].seq)

diagram, tracks, feature_set = compare_strains_by_host(diffs)
add_features_host_compare(feature_set)

In [66]:
plot_tracks(diagram, 'host_compare.png')


Changes in sequences over time

Plotting the differences in the genome over time, we can see that H1N1 changed faster in pigs between 1935 and 1978, followed by a decrease in the rate between 1978 and 2009 and, finally, an increase between 2009 and 2014. In humans we see a different pattern: the highest rate of change in the H1N1 genome happened between 1978 and 2009, with a slow rate between 1935 and 1978 and the fewest changes between 2009 and 2014, when only 100 changes occurred.
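
The rates discussed above can be made explicit by dividing the net change in the difference count by the number of years in each interval. The following is a rough sketch using the counts vs. the 1935 reference quoted in the Results text. It assumes that differences vs. a fixed reference can be treated as additive between sampling years, which is only an approximation, and the function name interval_rates is illustrative.

```python
years = [1935, 1978, 2009, 2014]

# Differences vs. the 1935 reference, as quoted in the Results text
# (the reference year itself has 0 differences by definition)
human_diffs = [0, 850, 2274, 2374]
swine_diffs = [0, 1206, 1976, 2247]


def interval_rates(years, diffs):
    """Return (start_year, end_year, net differences gained per year)
    for each pair of consecutive sampling years."""
    rates = []
    for i in range(len(years) - 1):
        gained = diffs[i + 1] - diffs[i]
        span = years[i + 1] - years[i]
        rates.append((years[i], years[i + 1], gained / float(span)))
    return rates


for host, diffs in [("Human", human_diffs), ("Swine", swine_diffs)]:
    for start, end, rate in interval_rates(years, diffs):
        print("%s %i-%i: %.1f differences/year" % (host, start, end, rate))
```

Because the intervals differ greatly in length (43, 31, and 5 years), per-year figures can rank the intervals differently from raw difference counts.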


In [21]:
#Plot differences vs. 1935 over time

years = [1935,1978,2009,2014]

#Lists of human and swine diff lengths vs. 1935
hdl = [len(diffs['h_1935_%i'%years[i]]) for i in range(len(years))]
sdl = [len(diffs['s_1935_%i'%years[i]]) for i in range(len(years))]

fig_diffs_over_time = plt.figure(figsize=(6,4), dpi=600);
ax_diffs_over_time = fig_diffs_over_time.add_axes([0.025,0.025,0.95,0.95])

hline = plt.plot(years,hdl, 'bo-', label='Human Viruses')
sline = plt.plot(years,sdl, 'g^-', label='Swine Viruses')

ax_diffs_over_time.set_xlabel('Year')
ax_diffs_over_time.set_ylabel('Differences vs. 1935')

plt.legend(loc='best');



In [22]:
#Plot differences vs. 1935 as a percentage of total genome length over time

years = [1935,1978,2009,2014]

#Lists of human and swine diff lengths vs. 1935 as a percentage of total genome length
hdl_percent = [i*100./13378. for i in hdl]
sdl_percent = [i*100./13378. for i in sdl]

fig_diffs_over_time2 = plt.figure(figsize=(6,4), dpi=600);
ax_diffs_over_time2 = fig_diffs_over_time2.add_axes([0.025,0.025,0.95,0.95])

hline2 = plt.plot(years,hdl_percent, 'bo-', label='Human Viruses')
sline2 = plt.plot(years,sdl_percent, 'g^-', label='Swine Viruses')

ax_diffs_over_time2.set_xlabel('Year')
ax_diffs_over_time2.set_ylabel('% Change Since 1935')

plt.legend(loc='best');


Phylogenetic Trees

We can also compare changes with phylogenetic trees.


In [71]:
aligned_dir = os.getcwd()
tree = Phylo.read(os.path.join(aligned_dir,'all_seq.dnd'),"newick")
human_tree = Phylo.read(os.path.join(aligned_dir,'all_human.dnd'),"newick")
swine_tree = Phylo.read(os.path.join(aligned_dir,'all_swine.dnd'),"newick")
Phylo.draw(tree)
Phylo.draw(human_tree)
Phylo.draw(swine_tree)



In [72]:
def terminal_dists(tree):
    """Return a dict mapping each pair of terminal names to the distance between them."""
    def generate_pairs(tree):
        named_clades = list(tree.find_clades(terminal=True))
        return list(itertools.combinations(named_clades, 2))
    return {(i[0].name,i[1].name): tree.distance(*i) for i in generate_pairs(tree)}

def compare_terminals_to_base_element(tree, base_element_name):
    """Return two lists: the names of all terminals other than the base element, and their distances to it."""
    base_elem = tree.find_elements(name=base_element_name)
    be = next(base_elem)
    terminals = tree.get_terminals()
    
    y_list,d_list = [],[]
    for i in terminals:
        if i != be:
            y_list.append(i.name)
            d_list.append(tree.distance(be,i))
            
    return y_list, d_list

sd = terminal_dists(swine_tree)
hd = terminal_dists(human_tree)
yh,dh = compare_terminals_to_base_element(human_tree,'h1935')
ys,ds = compare_terminals_to_base_element(swine_tree,'s1935')

We compare the distances between terminals of a single-host phylogenetic tree from a base year (1935) to every other year.
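
The distance between two terminals is the sum of the branch lengths along the path that joins them, which is what the tree.distance calls above compute. The following is a small pure-Python sketch of that idea on a hypothetical toy tree. The branch lengths are made up for illustration and are not taken from the .dnd files, and the helper names path_to_root and patristic_distance are assumptions.

```python
# Hypothetical toy tree: child -> (parent, branch length to parent)
toy_tree = {
    "inner": ("root", 0.10),
    "s1935": ("root", 0.05),
    "s1978": ("inner", 0.20),
    "s2009": ("inner", 0.30),
}


def path_to_root(tree, node):
    """Return {ancestor: cumulative branch length from node}, including node itself."""
    dist = 0.0
    path = {node: 0.0}
    while node in tree:
        parent, length = tree[node]
        dist += length
        path[parent] = dist
        node = parent
    return path


def patristic_distance(tree, a, b):
    """Sum of branch lengths on the path joining nodes a and b."""
    up_a = path_to_root(tree, a)
    up_b = path_to_root(tree, b)
    # With non-negative branch lengths, the smallest total over shared
    # ancestors is reached at the most recent common ancestor.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


print(patristic_distance(toy_tree, "s1978", "s2009"))
print(patristic_distance(toy_tree, "s1935", "s1978"))
```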


In [69]:
means = zip(np.array(dh),np.array(ds))
std = [[0 for ii in i]for i in means]
group_labels = ['human','swine']
years = [i[1:] for i in ys]
multigroup_barchart.bar_chart(means,std,years,group_labels)
plt.legend(loc=2)
plt.title('Group by host')
plt.show()



In [70]:
means = [np.array(dh),np.array(ds)]
std = [np.zeros(len(i)) for i in means]
group_labels = ['human','swine']
years = [i[1:] for i in ys]
multigroup_barchart.bar_chart(means,std,group_labels,years)
plt.legend(loc=2)
plt.title('Group by year')
plt.show()


We can compare distances between all years with a heatmap.


In [73]:
mod_annotate = lambda x: heatmap.show_values(x,fmt="%.5f")

def df_from_2_key_dict(twokey_dict):
    """Convert a dict keyed by (k1, k2) tuples into a DataFrame with k1 as columns and k2 as rows."""
    temp = {}
    for (k1, k2), v in twokey_dict.items():
        #k1 selects the sub-dict (column), k2 the entry (row); create the sub-dict on first use
        temp.setdefault(k1, {})[k2] = v

    return pd.DataFrame.from_dict(temp)
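
Because terminal_dists stores each unordered pair only once, the table built by df_from_2_key_dict is filled for one ordering of each pair, and fillna(0) then shows the missing entries as 0. The following is a minimal sketch of mirroring the pairs so the table is symmetric. The function name symmetric_table and the toy distances are hypothetical.

```python
def symmetric_table(pair_dists):
    """Build a nested dict with table[a][b] == table[b][a] and 0.0 on the diagonal.

    pair_dists maps (name_a, name_b) tuples to a distance, with each
    unordered pair present once.
    """
    names = sorted({n for pair in pair_dists for n in pair})
    table = {a: {b: 0.0 for b in names} for a in names}
    for (a, b), dist in pair_dists.items():
        table[a][b] = dist
        table[b][a] = dist  # mirror the entry
    return names, table


# Toy distances, for illustration only
toy = {
    ("s1935", "s1978"): 0.35,
    ("s1935", "s2009"): 0.45,
    ("s1978", "s2009"): 0.50,
}
names, table = symmetric_table(toy)
for a in names:
    print(a, [table[a][b] for b in names])
```

The nested dict has the same outer-key/inner-key shape that df_from_2_key_dict hands to pd.DataFrame.from_dict, so it could be converted to a DataFrame in the same way.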

In [74]:
#Swine
compare = df_from_2_key_dict(sd).fillna(0)
plot,bar = heatmap.heatmap_from_dataframe(compare, annotate_function=mod_annotate)
plt.title('Distance between each node of the Phylogenetic Tree \n(Swine)')
plt.show()



In [75]:
#Human
compare = df_from_2_key_dict(hd).fillna(0)
mod_annotate = lambda x: heatmap.show_values(x,fmt="%.5f")
plot,bar = heatmap.heatmap_from_dataframe(compare, annotate_function=mod_annotate)
plt.title('Distance between each node of the Phylogenetic Tree \n(Human)')
plt.show()


