Mutagenesis primer design

The goal here is to design primers for mutagenesis on PB2.

The specific PB2 sequence that we are going to use is A/Victoria/361/2011.


In [1]:
%load_ext autoreload
%autoreload 2

from Bio import SeqIO

In [2]:
def report(seqrecord):
    print(seqrecord)
    print('Length: {0} n.t.'.format(len(seqrecord.seq)))

Load the Victoria PB2 sequence:


In [3]:
vicpb2 = SeqIO.read('../../data/victoria-pb2.fasta', 'fasta')
report(vicpb2)


ID: A/Victoria/361/2011|KJ942687
Name: A/Victoria/361/2011|KJ942687
Description: A/Victoria/361/2011|KJ942687
Number of features: 0
Seq('AGCAAAAGCAGGTCAATTATATTCAGTATGGAAAGAATAAAAGAACTACGGAAT...TAC', SingleLetterAlphabet())
Length: 2340 n.t.

Load the pCI sequence:


In [4]:
pCI = SeqIO.read('../../data/pCI.fasta', 'fasta')
report(pCI)


ID: pCI
Name: pCI
Description: pCI
Number of features: 0
Seq('ACGCGTGGTACCTCTAGAGTCGACCCGGGCGGCCGCTTCGAGCAGACATGATAA...GAG', SingleLetterAlphabet())
Length: 4001 n.t.

Define a split at the 1/3 and 2/3 mark of the Victoria PB2 and pCI sequences. This is a backup measure just in case full amplification does not work.


In [5]:
def splits(seqrecord, frac1):
    return round(len(seqrecord.seq) * frac1)
vicpb2_split1 = splits(vicpb2, 1/3)
vicpb2_split1

vicpb2_split2 = splits(vicpb2, 2/3)
vicpb2_split2


Out[5]:
1560

In [6]:
pCI_split1 = splits(pCI, 1/3)
pCI_split1

pCI_split2 = splits(pCI, 2/3)
pCI_split2


Out[6]:
2667

Also define a 2/5 and 3/5 split mark for the Victoria PB2 sequence. (Based on the latest PCR results, the Assembly 4 split point doesn't work.)


In [7]:
vicpb2_split3 = splits(vicpb2, 2/5)
vicpb2_split4 = splits(vicpb2, 3/5)

print(vicpb2_split3, vicpb2_split4)


936 1404

In [8]:
from Bio.SeqRecord import SeqRecord

def split_seqrecord(seqrecord, split_location):
    split1 = SeqRecord(seqrecord.seq[0:split_location], id='{0}_part1'.format(seqrecord.id))
    split2 = SeqRecord(seqrecord.seq[split_location:], id='{0}_part2'.format(seqrecord.id))
    
    return split1, split2
    
# Create the split parts based on vicpb2 split 1 (this is the 1/3 mark)
vicpb2_split1_part1, vicpb2_split1_part2 = split_seqrecord(vicpb2, vicpb2_split1)
len(vicpb2_split1_part1), len(vicpb2_split1_part2)


Out[8]:
(780, 1560)

In [9]:
# Create the split parts based on vicpb2 split 2 (this is the 2/3 mark)
vicpb2_split2_part1, vicpb2_split2_part2 = split_seqrecord(vicpb2, vicpb2_split2)
len(vicpb2_split2_part1), len(vicpb2_split2_part2)


Out[9]:
(1560, 780)

In [10]:
# Create the split parts based on vicpb2 split 3 (this is the 2/5 mark)
vicpb2_split3_part1, vicpb2_split3_part2 = split_seqrecord(vicpb2, vicpb2_split3)
len(vicpb2_split3_part1), len(vicpb2_split3_part2)


Out[10]:
(936, 1404)

In [11]:
# Create the split parts based on vicpb2 split 4 (this is the 3/5 mark)
vicpb2_split4_part1, vicpb2_split4_part2 = split_seqrecord(vicpb2, vicpb2_split4)
len(vicpb2_split4_part1), len(vicpb2_split4_part2)


Out[11]:
(1404, 936)

In [12]:
# Create the split parts based on pCI split 1
pCI_split1_part1, pCI_split1_part2 = split_seqrecord(pCI, pCI_split1)
pCI_split1_part1, pCI_split1_part2


Out[12]:
(SeqRecord(seq=Seq('ACGCGTGGTACCTCTAGAGTCGACCCGGGCGGCCGCTTCGAGCAGACATGATAA...GAA', SingleLetterAlphabet()), id='pCI_part1', name='<unknown name>', description='<unknown description>', dbxrefs=[]),
 SeqRecord(seq=Seq('AGTAAAAGATGCTGAAGATCAGTTGGGTGCACGAGTGGGTTACATCGAACTGGA...GAG', SingleLetterAlphabet()), id='pCI_part2', name='<unknown name>', description='<unknown description>', dbxrefs=[]))

In [13]:
# Create the split parts based on pCI split 2
pCI_split2_part1, pCI_split2_part2 = split_seqrecord(pCI, pCI_split2)
pCI_split2_part1, pCI_split2_part2


Out[13]:
(SeqRecord(seq=Seq('ACGCGTGGTACCTCTAGAGTCGACCCGGGCGGCCGCTTCGAGCAGACATGATAA...GAG', SingleLetterAlphabet()), id='pCI_part1', name='<unknown name>', description='<unknown description>', dbxrefs=[]),
 SeqRecord(seq=Seq('CTATGAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGGCGGACAGGTATCCGGTA...GAG', SingleLetterAlphabet()), id='pCI_part2', name='<unknown name>', description='<unknown description>', dbxrefs=[]))

Steps:

  1. Compute primers for full amplification.
  2. Compute primers for amplification using split PB2 gene.
  3. Compute primers for amplification using split pCI backbone.

In [14]:
from mbtools.assembly import GibsonAssembler

In [15]:
# Assembly 1: Scenario where VicPB2 and pCI are amplified full-length.
assembly1 = GibsonAssembler([vicpb2, pCI])
for part, product in assembly1.pcr_products().items():
    print(part, len(product))


pCI 4041
A/Victoria/361/2011|KJ942687 2380
/Users/ericmjl/anaconda/envs/mbtools/lib/python3.5/site-packages/Bio/Seq.py:150: BiopythonWarning: Biopython Seq objects now use string comparison. Older versions of Biopython used object comparison. During this transition, please use hash(id(my_seq)) or my_dict[id(my_seq)] if you want the old behaviour, or use hash(str(my_seq)) or my_dict[str(my_seq)] for the new string hashing behaviour.
  "the new string hashing behaviour.", BiopythonWarning)

In [16]:
# Assembly 2: Scenario where VicPB2 is split into two parts at the 1/3 point.
assembly2 = GibsonAssembler([vicpb2_split1_part1, vicpb2_split1_part2, pCI])
for part, product in assembly2.pcr_products().items():
    print(part, len(product))


/Users/ericmjl/anaconda/envs/mbtools/lib/python3.5/site-packages/Bio/Seq.py:150: BiopythonWarning: Biopython Seq objects now use string comparison. Older versions of Biopython used object comparison. During this transition, please use hash(id(my_seq)) or my_dict[id(my_seq)] if you want the old behaviour, or use hash(str(my_seq)) or my_dict[str(my_seq)] for the new string hashing behaviour.
  "the new string hashing behaviour.", BiopythonWarning)
pCI 4041
A/Victoria/361/2011|KJ942687_part2 1600
A/Victoria/361/2011|KJ942687_part1 820

In [17]:
# Assembly 3: Scenario where VicPB2 and pCI are split each at their 1/3 points.
assembly3 = GibsonAssembler([vicpb2_split1_part1, vicpb2_split1_part2, pCI_split1_part1, pCI_split1_part2])
for part, product in assembly3.pcr_products().items():
    print(part, len(product))


/Users/ericmjl/anaconda/envs/mbtools/lib/python3.5/site-packages/Bio/Seq.py:150: BiopythonWarning: Biopython Seq objects now use string comparison. Older versions of Biopython used object comparison. During this transition, please use hash(id(my_seq)) or my_dict[id(my_seq)] if you want the old behaviour, or use hash(str(my_seq)) or my_dict[str(my_seq)] for the new string hashing behaviour.
  "the new string hashing behaviour.", BiopythonWarning)
pCI_part2 2707
pCI_part1 1374
A/Victoria/361/2011|KJ942687_part2 1600
A/Victoria/361/2011|KJ942687_part1 820

In [18]:
# Assembly 4: Scenario where VicPB2 and pCI are split each at their 2/3 points.
assembly4 = GibsonAssembler([vicpb2_split2_part1, vicpb2_split2_part2, pCI_split2_part1, pCI_split2_part2])
for part, product in assembly4.pcr_products().items():
    print(part, len(product))


/Users/ericmjl/anaconda/envs/mbtools/lib/python3.5/site-packages/Bio/Seq.py:150: BiopythonWarning: Biopython Seq objects now use string comparison. Older versions of Biopython used object comparison. During this transition, please use hash(id(my_seq)) or my_dict[id(my_seq)] if you want the old behaviour, or use hash(str(my_seq)) or my_dict[str(my_seq)] for the new string hashing behaviour.
  "the new string hashing behaviour.", BiopythonWarning)
pCI_part2 1374
pCI_part1 2707
A/Victoria/361/2011|KJ942687_part2 820
A/Victoria/361/2011|KJ942687_part1 1600

In [19]:
# Assembly 5: Scenario where we use the 2/5 split point for PB2 and 2/3 split point for pCI.
assembly5 = GibsonAssembler([vicpb2_split3_part1, vicpb2_split3_part2, pCI_split2_part1, pCI_split2_part2])
for part, product in assembly5.pcr_products().items():
    print(part, len(product))


/Users/ericmjl/anaconda/envs/mbtools/lib/python3.5/site-packages/Bio/Seq.py:150: BiopythonWarning: Biopython Seq objects now use string comparison. Older versions of Biopython used object comparison. During this transition, please use hash(id(my_seq)) or my_dict[id(my_seq)] if you want the old behaviour, or use hash(str(my_seq)) or my_dict[str(my_seq)] for the new string hashing behaviour.
  "the new string hashing behaviour.", BiopythonWarning)
pCI_part2 1374
pCI_part1 2707
A/Victoria/361/2011|KJ942687_part2 1444
A/Victoria/361/2011|KJ942687_part1 976

In [20]:
# Assembly 5: Scenario where we use the 3/5 split point for PB2 and 2/3 split point for pCI.
assembly6 = GibsonAssembler([vicpb2_split4_part1, vicpb2_split4_part2, pCI_split2_part1, pCI_split2_part2])
for part, product in assembly6.pcr_products().items():
    print(part, len(product))


/Users/ericmjl/anaconda/envs/mbtools/lib/python3.5/site-packages/Bio/Seq.py:150: BiopythonWarning: Biopython Seq objects now use string comparison. Older versions of Biopython used object comparison. During this transition, please use hash(id(my_seq)) or my_dict[id(my_seq)] if you want the old behaviour, or use hash(str(my_seq)) or my_dict[str(my_seq)] for the new string hashing behaviour.
  "the new string hashing behaviour.", BiopythonWarning)
pCI_part2 1374
pCI_part1 2707
A/Victoria/361/2011|KJ942687_part2 976
A/Victoria/361/2011|KJ942687_part1 1444

In [21]:
def gather_assembly_primers(assembly):
    assembly_primers = []
    for part, primers in assembly.primers().items():
        for primer, sequence in primers.items():
            prec = dict()
            prec['part'] = part
            prec['primer'] = primer
            prec['sequence'] = str(sequence)
            assembly_primers.append(prec)
    return assembly_primers

assembly4_primers = gather_assembly_primers(assembly4)

In [22]:
assembly5_primers = gather_assembly_primers(assembly5)
assembly6_primers = gather_assembly_primers(assembly6)

Check the primers from Assembly 4 into the database.


In [23]:
from tinydb import TinyDB, Query

primerdb = TinyDB('../../data/primers.db.json')

In [24]:
def checkin_primers(primers, database):
    """
    Checks the primers into the database.
    """
    for primer in primers:
        primer['received'] = False
        p = Query()
        if not database.contains(p.sequence == primer['sequence']):
            database.insert(primer)
            print('Inserting {primer}...'.format(primer=primer))
            
checkin_primers(assembly4_primers, primerdb)

In [25]:
checkin_primers(assembly5_primers, primerdb)

In [26]:
checkin_primers(assembly6_primers, primerdb)

In [27]:
len(primerdb)


Out[27]:
36

Some primers have already been received. Here is a function that marks them as "received".


In [28]:
def mark_unreceived_primers(database, eids):
    """
    Marks a list of primers as not received.
    """
    print('updating EIDs as being not received')
    database.update({'received': False}, eids=eids)
        
    for eid in eids:
        assert database.get(eid=eid)['received'] == False

In [29]:
def mark_primers_as_received(database, eids):
    assert isinstance(eids, list)
    print('updating EIDs as being received')
    database.update({'received':True}, eids=eids)
    
    # Very quick check to make sure it is truly received.
    for eid in eids:
        assert database.get(eid=eid)['received'] == True
    
mark_primers_as_received(primerdb, list(range(25, 36)))


updating EIDs as being received

Export the primers to be ordered.


In [30]:
import pandas as pd


q = Query()
primers_to_order = primerdb.search(q.received == False)
primers_to_order
for p in primers_to_order:
    eid = primerdb.get(Query().sequence == p['sequence']).eid
    p['primer_id'] = 'EM-{0}'.format(eid)
    print(p)
pd.DataFrame(primers_to_order).to_csv('primers.csv')


{'sequence': 'TCTGGTAATACTCCAACCATTCCCATCACACTGTCGATGTGTTCAACTCCCCAATTTTGA', 'primer_id': 'EM-36', 'part': 'A/Victoria/361/2011|KJ942687_part1', 'received': False, 'primer': 're_gibson'}

In [31]:
# Get primers out from each of the reactions.
a1_primers = assembly1.primers()
a1_primers


Out[31]:
defaultdict(dict,
            {'A/Victoria/361/2011|KJ942687': {'3p_sequencing': Seq('GTGTTGGTAATGAAACGAAA', SingleLetterAlphabet()),
              '5p_sequencing': Seq('AATTATGGCCATATGGTCCA', SingleLetterAlphabet()),
              'fw_gibson': Seq('CACTATAGGCTAGCCTCGAGAGCAAAAGCAGGTCAATTATATTCAGTATGGAAAGAATAA', SingleLetterAlphabet()),
              're_gibson': Seq('ACTCTAGAGGTACCACGCGTGTAGAAACAAGGTCGTTTTTAAACTATTCAGCATTAATTG', SingleLetterAlphabet())},
             'pCI': {'3p_sequencing': Seq('GCACCTATTGGTCTTACTGA', SingleLetterAlphabet()),
              '5p_sequencing': Seq('TTTCACAAATAAAGCATTTT', SingleLetterAlphabet()),
              'fw_gibson': Seq('AAAAACGACCTTGTTTCTACACGCGTGGTACCTCTAGAGTCGACCCGGGCGGCCGCTTCG', SingleLetterAlphabet()),
              're_gibson': Seq('ATAATTGACCTGCTTTTGCTCTCGAGGCTAGCCTATAGTGAGTCGTATTAAGTACTCTAG', SingleLetterAlphabet())}})

In [32]:
# Match up the Fw_gibson and Re_gibson primers for each template.
len(a1_primers['pCI']['fw_gibson'])


Out[32]:
60

In [33]:
def pprint_primers(assembly):
    primer_set = assembly.primers()
    
    for tmplt, prmrs in primer_set.items():
        print(tmplt)
        for prmr, sqnc in prmrs.items():
            if 'gibson' in prmr:
                # print(prmr)
                p = Query()
                eid = primerdb.get(p.sequence == str(sqnc)).eid
                print('  Primer: {0}'.format(prmr))
                print('  Primer ID:  EM-{0}'.format(eid))
                print('  Tm:         {0:.1f}ºC'.format(Tm_NN(sqnc)))
        print('  Size:       {0} bp'.format(len(assembly.pcr_products()[tmplt])))
        print('')

In [34]:
from Bio.SeqUtils.MeltingTemp import Tm_Wallace, Tm_staluc, Tm_NN
pprint_primers(assembly1)


pCI
  Primer: fw_gibson
  Primer ID:  EM-32
  Tm:         76.4ºC
  Primer: re_gibson
  Primer ID:  EM-25
  Tm:         67.5ºC
  Size:       4041 bp

A/Victoria/361/2011|KJ942687
  Primer: fw_gibson
  Primer ID:  EM-30
  Tm:         66.6ºC
  Primer: re_gibson
  Primer ID:  EM-27
  Tm:         67.3ºC
  Size:       2380 bp

We will amplify Victoria PB2 using EM-2 and EM-11 on the mutagenesis kit, and amplify pCI using EM-6 and EM-15 using Phusion GC master-mix.


In [35]:
for part, seq in assembly4.pcr_products().items():
    print(part, len(seq))


pCI_part2 1374
pCI_part1 2707
A/Victoria/361/2011|KJ942687_part2 820
A/Victoria/361/2011|KJ942687_part1 1600

In [36]:
# The PCR protocol for pCI is dependent on the length of the PCR product.

bases_per_min = 2000

len(assembly1.pcr_products()['pCI']) / bases_per_min


Out[36]:
2.0205

In [37]:
pprint_primers(assembly4)


pCI_part2
  Primer: fw_gibson
  Primer ID:  EM-26
  Tm:         75.3ºC
  Primer: re_gibson
  Primer ID:  EM-25
  Tm:         67.5ºC
  Size:       1374 bp

pCI_part1
  Primer: fw_gibson
  Primer ID:  EM-32
  Tm:         76.4ºC
  Primer: re_gibson
  Primer ID:  EM-31
  Tm:         74.6ºC
  Size:       2707 bp

A/Victoria/361/2011|KJ942687_part2
  Primer: fw_gibson
  Primer ID:  EM-28
  Tm:         69.7ºC
  Primer: re_gibson
  Primer ID:  EM-27
  Tm:         67.3ºC
  Size:       820 bp

A/Victoria/361/2011|KJ942687_part1
  Primer: fw_gibson
  Primer ID:  EM-30
  Tm:         66.6ºC
  Primer: re_gibson
  Primer ID:  EM-29
  Tm:         69.3ºC
  Size:       1600 bp


In [38]:
for part, seq in assembly4.pcr_products().items():
    print(part, len(seq))


pCI_part2 1374
pCI_part1 2707
A/Victoria/361/2011|KJ942687_part2 820
A/Victoria/361/2011|KJ942687_part1 1600

In [ ]: