The influenza A virus (IAV) causes a respiratory illness that presents with fever, cough, muscle and joint pain, headache, sore throat, and runny nose. IAV is a member of the Orthomyxoviridae family and contains 8 negative-sense, single-stranded RNA segments {Urbaniak:2014wj}{Nelson:2015cg}{Webster:2014jt}. The major viral antigens of IAV are the surface glycoproteins hemagglutinin (HA) and neuraminidase (NA); these proteins are used to classify IAV into sixteen HA subtypes and nine NA subtypes {Webster:2014jt}. Influenza A virus has numerous hosts, including waterfowl, pigs, bats, and humans {Urbaniak:2014wj}{Nelson:2015cg}{Webster:2014jt}. Various mechanisms can change HA and NA in a specific IAV strain, reducing vaccine effectiveness {Goka:2014cz}. Two of these mechanisms are antigenic drift and antigenic shift. Because IAV RNA replication lacks a proofreading mechanism, mutations are easily introduced {Webster:2014jt}. These mutations can cause antigenic drift by changing nucleotides in the HA and NA genes, making vaccines ineffective. An IAV strain can also change by antigenic shift, which occurs when two or more IAV strains infect the same host and exchange genome segments through viral reassortment {Urbaniak:2014wj}{Nelson:2015cg}{Webster:2014jt}. Pandemic strains can arise when antigenic shift increases a strain's virulence, improves its evasion of the host's immune system, or introduces a new set of glycoproteins to which the majority of the population has not previously been exposed {Urbaniak:2014wj}{Nelson:2015cg}{Webster:2014jt}. Pigs are excellent hosts for antigenic shift because they express sialic acid receptors for both avian and mammalian strains {Nelson:2015cg}. With this information in mind, we want to determine whether the host plays a role in the rate of mutation in IAV.
To determine whether the mutation rate depends on the host, we compared H1N1 strains collected in different years from both humans and pigs.


The sequences of H1N1 influenza A virus found in humans or pigs for the years 1935, 1978, 2009, and 2014 were downloaded from the NCBI Influenza Virus Resource (http://www.ncbi.nlm.nih.gov/genomes/FLU/FLU.html). The search criteria were the following: type A influenza virus, host either human or swine, northern temperate region, subtype H1N1, and the aforementioned years.

In [30]:
from Bio import AlignIO
from Bio.SeqFeature import SeqFeature, FeatureLocation
from Bio.Graphics import GenomeDiagram
from reportlab.lib import colors
from reportlab.lib.units import cm
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

%matplotlib inline

In [31]:
#Import the aligned sequences
all_aln = AlignIO.read('all_seq.aln','clustal')
print(all_aln)

SingleLetterAlphabet() alignment with 8 rows and 13379 columns

In [32]:
h_ids = [all_aln[i].id for i in range(0,4)]
s_ids = [all_aln[i].id for i in range(4,8)]
all_ids = [all_aln[i].id for i in range(len(all_aln))]

In [33]:
#Create Genome Diagram for human
h_tracks = GenomeDiagram.Diagram('Human H1N1 Tracks Plot')

h_features = {}
h_feature_set = {}

#Generate tracks and feature sets for each year
track_count = 1
for i in h_ids: 
    h_features['%s' %i] = h_tracks.new_track(track_count, greytrack=False)
    h_feature_set['%s'%i] = h_features['%s' %i].new_set()
    track_count += 1

#Repeat for swine
s_tracks = GenomeDiagram.Diagram('Swine H1N1 Tracks Plot')

s_features = {}
s_feature_set = {}

track_count = 1
for i in s_ids:
    s_features['%s' %i] = s_tracks.new_track(track_count, greytrack=False)
    s_feature_set['%s'%i] = s_features['%s' %i].new_set()
    track_count += 1

In [34]:
def get_diff_locations(ref,compare):
    '''
    Takes 2 Biopython Seq objects of equal size (aligned).
    Returns a list of locations at which base in compare sequence != base in ref sequence.
    '''
    if len(ref) != len(compare):
        raise ValueError('Seqs are not the same length!')
    diff_locations = []
    for i in range(len(ref)):
        #Record the 0-based alignment column of every mismatch
        if ref[i] != compare[i]:
            diff_locations.append(i)
    return diff_locations
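
To illustrate what this kind of function returns, here is a minimal self-contained sketch on two short made-up strings. The helper name `diff_positions` and the toy sequences are hypothetical and not part of the notebook; plain Python strings stand in for Biopython Seq objects, which can be indexed and iterated in the same way.

```python
def diff_positions(ref, compare):
    """Return the 0-based positions at which two equal-length sequences differ."""
    if len(ref) != len(compare):
        raise ValueError('Seqs are not the same length!')
    return [i for i, (a, b) in enumerate(zip(ref, compare)) if a != b]

# Toy aligned sequences (made up for illustration); '-' is an alignment gap
toy_ref = 'ATGC-A'
toy_cmp = 'ATTCGA'

toy_diffs = diff_positions(toy_ref, toy_cmp)
print(toy_diffs)
```

Because the comparison is character by character, a gap aligned against a base is counted as a difference, so counts obtained this way include insertions and deletions as well as substitutions.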

In [35]:
years = [1935,1978,2009,2014]

diffs = {}

for i in range(len(years)):
    for ii in range(len(years)):
        diffs['h_%i_%i'% (years[i],years[ii])] = get_diff_locations(all_aln[i].seq,all_aln[ii].seq)
        diffs['s_%i_%i'% (years[i],years[ii])] = get_diff_locations(all_aln[i+4].seq,all_aln[ii+4].seq)
        diffs['h_%i_s_%i'% (years[i],years[ii])] = get_diff_locations(all_aln[i].seq,all_aln[ii+4].seq)

In [36]:
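#Baseline solid green tracks for reference sequences (1935)
#Full-length feature; the 0-13378 span is assumed from the draw range used below
h_feature_set['h1935'].add_feature(SeqFeature(FeatureLocation(0,13378)), color=colors.green,
                                   name = 'Human: 1935 (reference)', label=True, label_angle=0);

s_feature_set['s1935'].add_feature(SeqFeature(FeatureLocation(0,13378)), color=colors.green,
                                   name = 'Swine: 1935 (reference)', label=True, label_angle=0);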

In [37]:
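#Background green to show similarities (0-13378 span assumed from the draw range)
h_feature_set['h1978'].add_feature(SeqFeature(FeatureLocation(0,13378)), color=colors.green,
                                   name = 'Human: 1978 --- %i differences vs. 1935'%len(diffs['h_1935_1978']),
                                   label=True, label_angle=0);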
#Add blue to show each difference
for i in range(len(diffs['h_1935_1978'])):
    feature = SeqFeature(FeatureLocation(diffs['h_1935_1978'][i],diffs['h_1935_1978'][i]))
    h_feature_set['h1978'].add_feature(feature, color=colors.blue)

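#Repeat for swine (green background; 0-13378 span assumed from the draw range)
s_feature_set['s1978'].add_feature(SeqFeature(FeatureLocation(0,13378)), color=colors.green,
                                   name = 'Swine: 1978 --- %i differences vs. 1935'%len(diffs['s_1935_1978']),
                                   label=True, label_angle=0);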
for i in range(len(diffs['s_1935_1978'])):
    feature = SeqFeature(FeatureLocation(diffs['s_1935_1978'][i],diffs['s_1935_1978'][i]))
    s_feature_set['s1978'].add_feature(feature, color=colors.blue)

In [38]:
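#Repeat above cell for 2009 (green background; 0-13378 span assumed from the draw range)
h_feature_set['h2009'].add_feature(SeqFeature(FeatureLocation(0,13378)), color=colors.green,
                                   name = 'Human: 2009 --- %i differences vs. 1935'%len(diffs['h_1935_2009']),
                                   label=True, label_angle=0);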
for i in range(len(diffs['h_1935_2009'])):
    feature = SeqFeature(FeatureLocation(diffs['h_1935_2009'][i],diffs['h_1935_2009'][i]))
    h_feature_set['h2009'].add_feature(feature, color=colors.blue)

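#Green background for swine 2009 (0-13378 span assumed from the draw range)
s_feature_set['s2009'].add_feature(SeqFeature(FeatureLocation(0,13378)), color=colors.green,
                                   name = 'Swine: 2009 --- %i differences vs. 1935'%len(diffs['s_1935_2009']),
                                   label=True, label_angle=0);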
for i in range(len(diffs['s_1935_2009'])):
    feature = SeqFeature(FeatureLocation(diffs['s_1935_2009'][i],diffs['s_1935_2009'][i]))
    s_feature_set['s2009'].add_feature(feature, color=colors.blue)

In [39]:
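#Repeat for 2014 (green background; 0-13378 span assumed from the draw range)
h_feature_set['h2014'].add_feature(SeqFeature(FeatureLocation(0,13378)), color=colors.green,
                                   name = 'Human: 2014 --- %i differences vs. 1935'%len(diffs['h_1935_2014']),
                                   label=True, label_angle=0);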
for i in range(len(diffs['h_1935_2014'])):
    feature = SeqFeature(FeatureLocation(diffs['h_1935_2014'][i],diffs['h_1935_2014'][i]))
    h_feature_set['h2014'].add_feature(feature, color=colors.blue)

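#Green background for swine 2014 (0-13378 span assumed from the draw range)
s_feature_set['s2014'].add_feature(SeqFeature(FeatureLocation(0,13378)), color=colors.green,
                                   name = 'Swine: 2014 --- %i differences vs. 1935'%len(diffs['s_1935_2014']),
                                   label=True, label_angle=0);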
for i in range(len(diffs['s_1935_2014'])):
    feature = SeqFeature(FeatureLocation(diffs['s_1935_2014'][i],diffs['s_1935_2014'][i]))
    s_feature_set['s2014'].add_feature(feature, color=colors.blue)

In [40]:
#Write each diagram to a png, then read it back in using Matplotlib
h_tracks.draw(format='linear', pagesize=(14*cm,7*cm), fragments=1,
         start=0, end=13378)
h_tracks.write('h_tracks.png', 'png', dpi=600)
h_tracks_im = mpimg.imread('h_tracks.png');

s_tracks.draw(format='linear', pagesize=(14*cm,7*cm), fragments=1,
         start=0, end=13378)
s_tracks.write('s_tracks.png', 'png', dpi=600)
s_tracks_im = mpimg.imread('s_tracks.png');


H1N1 sequence comparison in human or pig host from the years 1935, 1978, 2009, and 2014

Using the 1935 H1N1 sequence as the reference for the samples obtained in 1978, 2009, and 2014 from either human or pig hosts, we see a clear increase in the number of sequence changes as time progresses: the 2014 sequence has more differences than the 1978 sequence. Interestingly, when the 1978 sequences are compared to the 1935 reference, the H1N1 strain collected in pig had more differences (1206) than the same-year sequence found in human (850). In contrast, the 2009 sequences show a greater number of differences in the sample found in human (2274) than in the sample found in pig (1976). Lastly, the pig and human H1N1 strains had a similar number of changes in 2014: 2374 for human and 2247 for pig. These data show that the H1N1 genome found in pig changed considerably from 1978 to 2009, gaining 1846 differences. The highest number of changes also happened between 1978 and 2009, when the H1N1 virus found in human added 2323 changes.
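
When reading counts like these, it helps to keep in mind that the number of differences found by comparing two later sequences directly need not equal the difference between their counts against a common reference, because a site can change more than once or revert. The following is a minimal sketch with made-up sequences; the function name and all data are hypothetical and only illustrate the arithmetic.

```python
def count_diffs(a, b):
    """Number of aligned positions at which sequences a and b differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

# Made-up aligned sequences standing in for a reference and two later samples
ref     = 'AAAAAA'
earlier = 'ACAAGA'   # differs from ref at positions 1 and 4
later   = 'AGTAAA'   # differs from ref at positions 1 and 2

vs_ref_earlier = count_diffs(ref, earlier)
vs_ref_later = count_diffs(ref, later)
direct = count_diffs(earlier, later)

# Difference between the two counts vs. the reference, and the direct count
print(vs_ref_later - vs_ref_earlier, direct)
```

In this toy case both samples differ from the reference at two positions, so the difference between their counts is zero, yet they differ from each other at three positions.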

In [41]:
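#Plot each set of tracks
fig_h = plt.figure(figsize=(14,7),dpi=600)
ax_h = fig_h.add_axes([0.025,0.025,0.95,0.95],frameon=False)
ax_h.imshow(h_tracks_im)  #display step assumed; shows the human image read in above

fig_s = plt.figure(figsize=(14,7),dpi=600)
ax_s = fig_s.add_axes([0.025,0.025,0.95,0.95],frameon=False)
ax_s.imshow(s_tracks_im)  #display step assumed; shows the swine image read in above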

In [42]:
#Comparison of human 2009 and 2014
h_2009_2014_track = GenomeDiagram.Diagram('Human 2009 vs. 2014 H1N1 Track Plot')

h_2009_2014_features = h_2009_2014_track.new_track(1, greytrack=False)
h_2009_2014_feature_set = h_2009_2014_features.new_set()

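#Background green to show similarities (0-13378 span assumed from the draw range)
h_2009_2014_feature_set.add_feature(SeqFeature(FeatureLocation(0,13378)), color=colors.green,
                                   name = 'Human: 2009 vs. 2014 --- %i differences'%len(diffs['h_2009_2014']),
                                   label=True, label_angle=0);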
#Add blue to show differences
for i in range(len(diffs['h_2009_2014'])):
    feature = SeqFeature(FeatureLocation(diffs['h_2009_2014'][i],diffs['h_2009_2014'][i]))
    h_2009_2014_feature_set.add_feature(feature, color=colors.blue)

h_2009_2014_track.draw(format='linear', pagesize=(14*cm,3*cm), fragments=1,
         start=0, end=13378)
h_2009_2014_track.write('h_2009_2014_track.png', 'png', dpi=600)
h_2009_2014_track_im = mpimg.imread('h_2009_2014_track.png');

fig_h_2009_2014 = plt.figure(figsize=(14,7),dpi=600)
ax_h_2009_2014 = fig_h_2009_2014.add_axes([0.025,0.025,0.95,0.95],frameon=False)
ax_h_2009_2014.imshow(h_2009_2014_track_im)  #display step assumed; shows the image read in above

#Comparison of human 1978 and 2009
h_1978_2009_track = GenomeDiagram.Diagram('Human 1978 vs. 2009 H1N1 Track Plot')

h_1978_2009_features = h_1978_2009_track.new_track(1, greytrack=False)
h_1978_2009_feature_set = h_1978_2009_features.new_set()

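#Background green to show similarities (0-13378 span assumed from the draw range)
h_1978_2009_feature_set.add_feature(SeqFeature(FeatureLocation(0,13378)), color=colors.green,
                                   name = 'Human: 1978 vs. 2009 --- %i differences'%len(diffs['h_1978_2009']),
                                   label=True, label_angle=0);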
#Add blue to show differences
for i in range(len(diffs['h_1978_2009'])):
    feature = SeqFeature(FeatureLocation(diffs['h_1978_2009'][i],diffs['h_1978_2009'][i]))
    h_1978_2009_feature_set.add_feature(feature, color=colors.blue)

h_1978_2009_track.draw(format='linear', pagesize=(14*cm,3*cm), fragments=1,
         start=0, end=13378)
h_1978_2009_track.write('h_1978_2009_track.png', 'png', dpi=600)
h_1978_2009_track_im = mpimg.imread('h_1978_2009_track.png');

fig_h_1978_2009 = plt.figure(figsize=(14,7),dpi=600)
ax_h_1978_2009 = fig_h_1978_2009.add_axes([0.025,0.025,0.95,0.95],frameon=False)
ax_h_1978_2009.imshow(h_1978_2009_track_im)  #display step assumed; shows the image read in above

In [43]:
#Comparison of swine 2009 and 2014
s_2009_2014_track = GenomeDiagram.Diagram('Swine 2009 vs. 2014 H1N1 Track Plot')

s_2009_2014_features = s_2009_2014_track.new_track(1, greytrack=False)
s_2009_2014_feature_set = s_2009_2014_features.new_set()

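#Background green to show similarities (0-13378 span assumed from the draw range)
s_2009_2014_feature_set.add_feature(SeqFeature(FeatureLocation(0,13378)), color=colors.green,
                                   name = 'Swine: 2009 vs. 2014 --- %i differences'%len(diffs['s_2009_2014']),
                                   label=True, label_angle=0);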
#Add blue to show differences
for i in range(len(diffs['s_2009_2014'])):
    feature = SeqFeature(FeatureLocation(diffs['s_2009_2014'][i],diffs['s_2009_2014'][i]))
    s_2009_2014_feature_set.add_feature(feature, color=colors.blue)

s_2009_2014_track.draw(format='linear', pagesize=(14*cm,3*cm), fragments=1,
         start=0, end=13378)
s_2009_2014_track.write('s_2009_2014_track.png', 'png', dpi=600)
s_2009_2014_track_im = mpimg.imread('s_2009_2014_track.png');

fig_s_2009_2014 = plt.figure(figsize=(14,7),dpi=600)
ax_s_2009_2014 = fig_s_2009_2014.add_axes([0.025,0.025,0.95,0.95],frameon=False)
ax_s_2009_2014.imshow(s_2009_2014_track_im)  #display step assumed; shows the image read in above

#Comparison of swine 1978 and 2009
s_1978_2009_track = GenomeDiagram.Diagram('Swine 1978 vs. 2009 H1N1 Track Plot')

s_1978_2009_features = s_1978_2009_track.new_track(1, greytrack=False)
s_1978_2009_feature_set = s_1978_2009_features.new_set()

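#Background green to show similarities (0-13378 span assumed from the draw range)
s_1978_2009_feature_set.add_feature(SeqFeature(FeatureLocation(0,13378)), color=colors.green,
                                   name = 'Swine: 1978 vs. 2009 --- %i differences'%len(diffs['s_1978_2009']),
                                   label=True, label_angle=0);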
#Add blue to show differences
for i in range(len(diffs['s_1978_2009'])):
    feature = SeqFeature(FeatureLocation(diffs['s_1978_2009'][i],diffs['s_1978_2009'][i]))
    s_1978_2009_feature_set.add_feature(feature, color=colors.blue)

s_1978_2009_track.draw(format='linear', pagesize=(14*cm,3*cm), fragments=1,
         start=0, end=13378)
s_1978_2009_track.write('s_1978_2009_track.png', 'png', dpi=600)
s_1978_2009_track_im = mpimg.imread('s_1978_2009_track.png');

fig_s_1978_2009 = plt.figure(figsize=(14,7),dpi=600)
ax_s_1978_2009 = fig_s_1978_2009.add_axes([0.025,0.025,0.95,0.95],frameon=False)
ax_s_1978_2009.imshow(s_1978_2009_track_im)  #display step assumed; shows the image read in above

In [44]:
#Create Genome Diagram to compare human and swine in each year
hs_tracks = GenomeDiagram.Diagram('Human vs. Swine H1N1 Tracks Plot')

hs_features1935 = hs_tracks.new_track(1, greytrack=False)
hs_feature_set1935 = hs_features1935.new_set()

hs_features1978 = hs_tracks.new_track(2, greytrack=False)
hs_feature_set1978 = hs_features1978.new_set()

hs_features2009 = hs_tracks.new_track(3, greytrack=False)
hs_feature_set2009 = hs_features2009.new_set()

hs_features2014 = hs_tracks.new_track(4, greytrack=False)
hs_feature_set2014 = hs_features2014.new_set()

In [45]:
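#Background green to show similarities (0-13378 span assumed from the draw range)
hs_feature_set1935.add_feature(SeqFeature(FeatureLocation(0,13378)), color=colors.green,
                                   name = 'Human vs. Swine: 1935 --- %i differences'%len(diffs['h_1935_s_1935']),
                                   label=True, label_angle=0);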
#Add blue to show differences
for i in range(len(diffs['h_1935_s_1935'])):
    feature = SeqFeature(FeatureLocation(diffs['h_1935_s_1935'][i],diffs['h_1935_s_1935'][i]))
    hs_feature_set1935.add_feature(feature, color=colors.blue)

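#Repeat for each year (green background; 0-13378 span assumed from the draw range)
hs_feature_set1978.add_feature(SeqFeature(FeatureLocation(0,13378)), color=colors.green,
                                   name = 'Human vs. Swine: 1978 --- %i differences'%len(diffs['h_1978_s_1978']),
                                   label=True, label_angle=0);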
for i in range(len(diffs['h_1978_s_1978'])):
    feature = SeqFeature(FeatureLocation(diffs['h_1978_s_1978'][i],diffs['h_1978_s_1978'][i]))
    hs_feature_set1978.add_feature(feature, color=colors.blue)

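#Green background for 2009 (0-13378 span assumed from the draw range)
hs_feature_set2009.add_feature(SeqFeature(FeatureLocation(0,13378)), color=colors.green,
                                   name = 'Human vs. Swine: 2009 --- %i differences'%len(diffs['h_2009_s_2009']),
                                   label=True, label_angle=0);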
for i in range(len(diffs['h_2009_s_2009'])):
    feature = SeqFeature(FeatureLocation(diffs['h_2009_s_2009'][i],diffs['h_2009_s_2009'][i]))
    hs_feature_set2009.add_feature(feature, color=colors.blue)

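#Green background for 2014 (0-13378 span assumed from the draw range)
hs_feature_set2014.add_feature(SeqFeature(FeatureLocation(0,13378)), color=colors.green,
                                   name = 'Human vs. Swine: 2014 --- %i differences'%len(diffs['h_2014_s_2014']),
                                   label=True, label_angle=0);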
for i in range(len(diffs['h_2014_s_2014'])):
    feature = SeqFeature(FeatureLocation(diffs['h_2014_s_2014'][i],diffs['h_2014_s_2014'][i]))
    hs_feature_set2014.add_feature(feature, color=colors.blue)

In [46]:
#Write diagram to a png, then read it back in using Matplotlib
hs_tracks.draw(format='linear', pagesize=(14*cm,7*cm), fragments=1,
         start=0, end=13378)
hs_tracks.write('hs_tracks.png', 'png', dpi=600)
hs_tracks_im = mpimg.imread('hs_tracks.png');

H1N1 sequence comparison between hosts from the same year

When comparing same-year H1N1 sequences between the two hosts, we find 1349 differences between the genomes from 1935, 2004 differences between those from 1978, and 1112 differences between those from 2009. Surprisingly, in 2014 the H1N1 genomes found in human and pig differ at only 136 positions. This could suggest that the 2014 H1N1 strains found in humans and pigs are more closely related than the strains found in 1935, 1978, or 2009.
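
To put these counts on a common scale, they can be expressed as a percentage of the alignment length. The sketch below uses the counts quoted above and the 13379-column alignment length reported when the alignment was read in; the variable names are illustrative only. Alignment columns include gaps, so these values are not strict per-nucleotide divergences.

```python
# Alignment length reported by AlignIO for all_seq.aln
alignment_length = 13379

# Human vs. swine difference counts for each sampling year (from the text)
host_diffs = {1935: 1349, 1978: 2004, 2009: 1112, 2014: 136}

# Percentage of alignment columns that differ between hosts in each year
percent_diff = {}
for year, n in host_diffs.items():
    percent_diff[year] = 100.0 * n / alignment_length

for year in sorted(percent_diff):
    print('%d: %.2f%% of alignment columns differ' % (year, percent_diff[year]))
```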

In [47]:
#Plot diagram
fig_hs = plt.figure(figsize=(14,7),dpi=600)
ax_hs = fig_hs.add_axes([0.025,0.025,0.95,0.95],frameon=False)
ax_hs.imshow(hs_tracks_im)  #display step assumed; shows the image read in above

Changes in sequences over time

Plotting the differences in the genome over time, we can see that H1N1 changed faster in pig between 1935 and 1978, followed by a decrease in the rate between 1978 and 2009. Finally, the rate of change in pig increased again between 2009 and 2014. In humans we see a different pattern: the highest rate of change in the H1N1 genome occurred between 1978 and 2009, with a slower rate between 1935 and 1978, and the smallest number of changes between 2009 and 2014, when only 100 changes occurred.
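
Since the sampling intervals differ in length (43, 31, and 5 years), a per-year rate makes the comparison more direct. The sketch below is one way to compute it from the counts against the 1935 reference quoted earlier; the function and variable names are illustrative, and each increment is the difference between two counts against 1935, which is only a proxy for the change that actually occurred within an interval.

```python
years = [1935, 1978, 2009, 2014]

# Differences vs. the 1935 reference for each year (1935 vs. itself is 0)
human_counts = [0, 850, 2274, 2374]
swine_counts = [0, 1206, 1976, 2247]

def rates_per_year(years, counts):
    """Added differences per year for each consecutive sampling interval."""
    return [(counts[i + 1] - counts[i]) / float(years[i + 1] - years[i])
            for i in range(len(years) - 1)]

human_rates = rates_per_year(years, human_counts)
swine_rates = rates_per_year(years, swine_counts)

for i in range(len(years) - 1):
    print('%d-%d: human %.1f/yr, swine %.1f/yr'
          % (years[i], years[i + 1], human_rates[i], swine_rates[i]))
```

Normalizing this way shows why interval length matters: the 100 human changes between 2009 and 2014 accrued over only 5 years, so a small absolute count does not by itself imply a low yearly rate.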

In [48]:
#Plot differences vs. 1935 over time

years = [1935,1978,2009,2014]

#Lists of human and swine diff lengths vs. 1935
hdl = [len(diffs['h_1935_%i'%years[i]]) for i in range(len(years))]
sdl = [len(diffs['s_1935_%i'%years[i]]) for i in range(len(years))]

fig_diffs_over_time = plt.figure(figsize=(6,4), dpi=600);
ax_diffs_over_time = fig_diffs_over_time.add_axes([0.025,0.025,0.95,0.95])

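hline = plt.plot(years,hdl, 'bo-', label='Human Viruses')
sline = plt.plot(years,sdl, 'g^-', label='Swine Viruses')

ax_diffs_over_time.set_xlabel('Year')
ax_diffs_over_time.set_ylabel('Differences vs. 1935')
ax_diffs_over_time.legend(loc='upper left')  #show the line labels set above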


In [49]:
#Plot differences vs. 1935 as a percentage of total genome length over time

years = [1935,1978,2009,2014]

#Lists of human and swine diff lengths vs. 1935 as a percentage of total alignment length
aln_len = float(all_aln.get_alignment_length())  #13379 columns, gaps included
hdl_percent = [i*100./aln_len for i in hdl]
sdl_percent = [i*100./aln_len for i in sdl]

fig_diffs_over_time2 = plt.figure(figsize=(6,4), dpi=600);
ax_diffs_over_time2 = fig_diffs_over_time2.add_axes([0.025,0.025,0.95,0.95])

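hline2 = plt.plot(years,hdl_percent, 'bo-', label='Human Viruses')
sline2 = plt.plot(years,sdl_percent, 'g^-', label='Swine Viruses')

ax_diffs_over_time2.set_xlabel('Year')
ax_diffs_over_time2.set_ylabel('% Change Since 1935')
ax_diffs_over_time2.legend(loc='upper left')  #show the line labels set above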

