Golden Gate cloning simulation using pydna

The objective is to assemble three 50 bp sequences into one circular sequence.

We will use the assembly_fragments function and the Assembly class.


In [1]:
from pydna.all import *

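The three sequences below are random 50 bp sequences with equal base frequencies (0.25 each for A, C, G and T), as stated in their FASTA headers.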


In [2]:
frags = parse('''

>1|random sequence|A: 0.25|C: 0.25|G: 0.25|T: 0.25|length: 50 bp
ccagaatacagtgccttagatctacggatcgtatctgcgatttggccgat

>2|random sequence|A: 0.25|C: 0.25|G: 0.25|T: 0.25|length: 50 bp
gccctgcttggtagatcaggcgagccaataacattctatagtgtagcctt

>3|random sequence|A: 0.25|C: 0.25|G: 0.25|T: 0.25|length: 50 bp
gagagcgctcctgtttcaatgcttgcaaactctagcagctatactgtagg ''' )

In [3]:
frags


Out[3]:
[Dseqrecord(-50), Dseqrecord(-50), Dseqrecord(-50)]

We design a primer pair for each Dseqrecord with primer_design. This gives a list of amplicons, i.e. template sequences together with their forward and reverse primers.


In [4]:
amplicons = [primer_design(f) for f in frags]

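We also need a list of Golden Gate linkers, the 4 bp sequences that become the overhangs joining the fragments. Here they are written by hand, but they could also be generated automatically. The first and last linkers are identical (GAAT) so that the last fragment can join the first and close the circle.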


In [5]:
golden_gate_linkers = [Dseqrecord(lnk) for lnk in "GAAT GATC AATT GAAT".split()]

In [6]:
golden_gate_linkers


Out[6]:
[Dseqrecord(-4), Dseqrecord(-4), Dseqrecord(-4), Dseqrecord(-4)]

In [7]:
from itertools import chain, zip_longest

We interleave the Golden Gate linkers and the amplicons into one flat list that starts and ends with a linker.
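The interleaving trick in the next cell can be seen in isolation with plain strings standing in for the Dseqrecord linkers and Amplicon objects (the names `linkers` and `fragments` are illustrative only). Because there is one more linker than fragment, zip_longest pairs the last linker with None, and the final slice removes it.

```python
from itertools import chain, zip_longest

# Plain strings stand in for the notebook's Dseqrecord and Amplicon objects.
linkers = ["GAAT", "GATC", "AATT", "GAAT"]
fragments = ["fragA", "fragB", "fragC"]

# zip_longest pairs linker i with fragment i; the fourth linker has no
# partner, so it is paired with the default fillvalue None.
pairs = list(zip_longest(linkers, fragments))

# chain.from_iterable flattens the pairs; [:-1] drops the trailing None.
flat = list(chain.from_iterable(pairs))[:-1]
print(flat)  # ['GAAT', 'fragA', 'GATC', 'fragB', 'AATT', 'fragC', 'GAAT']
```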


In [8]:
# interleave linkers and amplicons; [:-1] drops the None paired with the last linker
seqlist = list(chain.from_iterable(zip_longest(golden_gate_linkers, amplicons)))[:-1]

In [9]:
seqlist


Out[9]:
[Dseqrecord(-4),
 Amplicon(50),
 Dseqrecord(-4),
 Amplicon(50),
 Dseqrecord(-4),
 Amplicon(50),
 Dseqrecord(-4)]

The optional arguments below are important. Sequences of length maxlink or shorter are not treated as separate fragments; instead they are incorporated into the primers of the neighbouring fragments. overlap sets the length of the overlap between adjacent sequences in the assembly.
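What "incorporated into the primers" means can be sketched with a hypothetical helper (not pydna's implementation): a linker between two fragments becomes a 5' tail on the reverse primer of the fragment before it and on the forward primer of the fragment after it, so neighbouring PCR products share the linker as a 4 bp terminal overlap. The reverse primer reads the bottom strand, so it carries the reverse complement of the linker. The annealing parts used below are those of sequenceA and sequenceC, taken from the primer listings further down.

```python
# Hypothetical helpers for illustration; they are not part of pydna.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string, preserving case."""
    return seq.translate(_COMPLEMENT)[::-1]


def add_linker_tails(fw: str, rv: str, left_linker: str, right_linker: str) -> tuple[str, str]:
    """Return (forward, reverse) primers with linker tails on their 5' ends."""
    # Forward primer: top strand, so the left linker is prepended unchanged.
    # Reverse primer: bottom strand, so it gets the reverse complement of the right linker.
    return left_linker + fw, reverse_complement(right_linker) + rv


# sequenceA sits between linkers GAAT and GATC
fw_a, rv_a = add_linker_tails("ccagaatacagtgccttag", "atcggccaaatcg", "GAAT", "GATC")
# sequenceC sits between linkers AATT and GAAT (the linker that closes the circle)
fw_c, rv_c = add_linker_tails("gagagcgctcctgtt", "cctacagtatagctgctagagtt", "AATT", "GAAT")
print(fw_a, rv_a)  # GAATccagaatacagtgccttag GATCatcggccaaatcg
print(fw_c, rv_c)  # AATTgagagcgctcctgtt ATTCcctacagtatagctgctagagtt
```

GATC and AATT are their own reverse complements, which is why the tails of sequenceA's reverse primer and sequenceB's reverse primer look unchanged, while GAAT turns into ATTC on sequenceC's reverse primer.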


In [10]:
a,b,c = assembly_fragments( seqlist, maxlink=4, overlap=4 )

We get only three sequences, since the Golden Gate linkers have been incorporated into the primers. Let's give them nicer names:


In [11]:
a.locus, b.locus, c.locus = "sequenceA", "sequenceB", "sequenceC"

In [12]:
a.figure()


Out[12]:
    5ccagaatacagtgccttag...cgatttggccgat3
                           ||||||||||||| tm 48.1 (dbd) 56.2
                          3gctaaaccggctaCTAG5
5GAATccagaatacagtgccttag3
     ||||||||||||||||||| tm 53.0 (dbd) 56.6
    3ggtcttatgtcacggaatc...gctaaaccggcta5

In [13]:
b.figure()


Out[13]:
    5gccctgcttggtaga...caataacattctatagtgtagcctt3
                       ||||||||||||||||||||||||| tm 53.7 (dbd) 57.8
                      3gttattgtaagatatcacatcggaaTTAA5
5GATCgccctgcttggtaga3
     ||||||||||||||| tm 53.7 (dbd) 57.4
    3cgggacgaaccatct...gttattgtaagatatcacatcggaa5

In [14]:
c.figure()


Out[14]:
    5gagagcgctcctgtt...aactctagcagctatactgtagg3
                       ||||||||||||||||||||||| tm 56.1 (dbd) 56.9
                      3ttgagatcgtcgatatgacatccCTTA5
5AATTgagagcgctcctgtt3
     ||||||||||||||| tm 54.4 (dbd) 56.5
    3ctctcgcgaggacaa...ttgagatcgtcgatatgacatcc5

We can assemble these by setting limit to 4 and using the terminal_overlap algorithm. With such a short homology limit we must consider only terminal overlaps; otherwise we would get many irrelevant results.
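A short pure Python sketch shows why the restriction to terminal overlaps matters. The helper names are hypothetical and the code is not pydna's terminal_overlap algorithm. The first function looks only for a suffix of one sequence that equals a prefix of the next; the second lists every 4-mer two sequences share anywhere. The linker GATC, for example, also occurs inside both of the first two random sequences, so an algorithm accepting internal 4 bp matches would propose junctions at meaningless positions.

```python
def longest_terminal_overlap(left: str, right: str, limit: int) -> str:
    """Longest suffix of `left` equal to a prefix of `right`; "" if shorter than `limit`."""
    left, right = left.upper(), right.upper()
    # Try the longest candidate first; never test k == 0.
    for k in range(min(len(left), len(right)), max(limit, 1) - 1, -1):
        if left[-k:] == right[:k]:
            return left[-k:]
    return ""


def shared_kmers(a: str, b: str, k: int = 4) -> set[str]:
    """All k-mers that occur anywhere in both sequences."""
    a, b = a.upper(), b.upper()
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    return {b[i:i + k] for i in range(len(b) - k + 1)} & kmers_a


print(longest_terminal_overlap("ACGTACGATC", "GATCTTTT", limit=4))  # GATC

seq1 = "ccagaatacagtgccttagatctacggatcgtatctgcgatttggccgat"
seq2 = "gccctgcttggtagatcaggcgagccaataacattctatagtgtagcctt"
print(sorted(shared_kmers(seq1, seq2)))  # internal 4-mers shared by the random sequences
```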


In [15]:
from pydna.assembly import terminal_overlap

In [16]:
asm = Assembly((a,b,c), limit=4, algorithm=terminal_overlap)
asm


Out[16]:
Assembly
fragments..: 58bp 58bp 58bp
limit(bp)..: 4
G.nodes....: 4
algorithm..: terminal_overlap

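asm.assemble_circular() gives three circular products. The second one should be identical to the theoretical product, which we build below by concatenating everything in seqlist except the first linker (it is repeated at the end of the list) and circularizing the result: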


In [17]:
correct = Dseqrecord("")
for s in seqlist[1:]:
    correct += s
correct = correct.looped()
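As a quick sanity check on sizes, both routes should give the same circular length. The arithmetic below uses only numbers from this notebook: 50 bp templates, 4 bp linkers, 58 bp PCR products (Out[16]) and a 4 bp overlap. In a circular assembly every junction overlap is counted twice, so one overlap per junction is subtracted.

```python
n_fragments = 3
template_len = 50  # each random sequence
linker_len = 4     # each Golden Gate linker
overlap = 4        # terminal overlap used by Assembly

# Each PCR product carries one linker on each side: 4 + 50 + 4 = 58 bp.
pcr_product_len = linker_len + template_len + linker_len
# Circular assembly: three fragments, three junctions, one overlap removed per junction.
assembled_len = n_fragments * pcr_product_len - n_fragments * overlap
# Theoretical construct: three templates plus three distinct linker positions.
theoretical_len = n_fragments * template_len + n_fragments * linker_len
print(pcr_product_len, assembled_len, theoretical_len)  # 58 162 162
```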

In [18]:
correct.cseguid()


Out[18]:
3xa1SOyFzIkaq7SUZGYD5YrUzsc

In [19]:
candidate = asm.assemble_circular()[1]

In [20]:
candidate.cseguid()


Out[20]:
3xa1SOyFzIkaq7SUZGYD5YrUzsc
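The comparison works because a circular checksum is independent of where the circle is opened and of which strand is read. The sketch below illustrates that idea with hypothetical helpers: it picks a canonical form (the lexicographically smallest rotation of either strand) and hashes it with SHA-1, encoding the digest as URL-safe base64. It is an illustration of the principle and is not guaranteed to reproduce pydna's cseguid values.

```python
import base64
import hashlib

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def smallest_rotation(seq: str) -> str:
    """Lexicographically smallest rotation (quadratic, fine for a demo)."""
    if not seq:
        return seq
    return min(seq[i:] + seq[:i] for i in range(len(seq)))


def circular_checksum(seq: str) -> str:
    """Checksum that ignores the origin and the strand of a circular sequence."""
    seq = seq.upper()
    canonical = min(smallest_rotation(seq), smallest_rotation(reverse_complement(seq)))
    digest = hashlib.sha1(canonical.encode("ascii")).digest()
    # 20-byte digest -> 28 base64 characters with one "=" of padding, stripped here
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


s = "GAATCCAGATCG"
rotated = s[5:] + s[:5]
print(circular_checksum(s) == circular_checksum(reverse_complement(rotated)))  # True
```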

The candidate and the correct sequence have the same cseguid, so they represent the same circular sequence. Next we add the BsaI recognition site to the primers, plus one extra nucleotide so that BsaI cuts at the right position:


In [21]:
from Bio.Restriction import BsaI

In [22]:
BsaI.site


Out[22]:
'GGTCTC'
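BsaI cuts outside its recognition site: 1 nt after GGTCTC on the top strand and 5 nt after it on the bottom strand, leaving a 4 nt 5' overhang. That is why one spacer nucleotide ("a") goes between the site and the linker, so the overhang is exactly the linker. A minimal pure Python sketch of that rule (the helper bsai_overhang is hypothetical, and it only scans the given strand for a forward-oriented site), applied to primers listed below:

```python
BSAI_SITE = "GGTCTC"


def bsai_overhang(seq: str) -> str:
    """4 nt overhang left by BsaI downstream of the first GGTCTC on this strand."""
    start = seq.upper().find(BSAI_SITE)
    if start == -1:
        raise ValueError("no forward-oriented BsaI site on this strand")
    end = start + len(BSAI_SITE)
    # Cut 1 nt after the site on this strand and 5 nt after it on the other,
    # so the single-stranded overhang spans positions end+1 .. end+4.
    return seq[end + 1:end + 5]


print(bsai_overhang("GGTCTCaGAATccagaatacagtgccttag"))  # GAAT (sequenceA forward primer)
print(bsai_overhang("GGTCTCaGATCatcggccaaatcg"))        # GATC (sequenceA reverse primer)
print(bsai_overhang("GGTCTCGAATccag"))                  # AATc: without the spacer the overhang is shifted
```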

In [23]:
for f in (a,b,c):
    # prepend the BsaI site plus one spacer nucleotide to both primers
    f.forward_primer = BsaI.site + "a" + f.forward_primer
    f.reverse_primer = BsaI.site + "a" + f.reverse_primer
    print(f.name)
    print(f.forward_primer.format("tab"))
    print(f.reverse_primer.format("tab"))
    print(f.figure())


sequenceA
f50	GGTCTCaGAATccagaatacagtgccttag

r50	GGTCTCaGATCatcggccaaatcg

           5ccagaatacagtgccttag...cgatttggccgat3
                                  ||||||||||||| tm 48.1 (dbd) 56.2
                                 3gctaaaccggctaCTAGaCTCTGG5
5GGTCTCaGAATccagaatacagtgccttag3
            ||||||||||||||||||| tm 53.0 (dbd) 56.6
           3ggtcttatgtcacggaatc...gctaaaccggcta5
sequenceB
f50	GGTCTCaGATCgccctgcttggtaga

r50	GGTCTCaAATTaaggctacactatagaatgttattg

           5gccctgcttggtaga...caataacattctatagtgtagcctt3
                              ||||||||||||||||||||||||| tm 53.7 (dbd) 57.8
                             3gttattgtaagatatcacatcggaaTTAAaCTCTGG5
5GGTCTCaGATCgccctgcttggtaga3
            ||||||||||||||| tm 53.7 (dbd) 57.4
           3cgggacgaaccatct...gttattgtaagatatcacatcggaa5
sequenceC
f50	GGTCTCaAATTgagagcgctcctgtt

r50	GGTCTCaATTCcctacagtatagctgctagagtt

           5gagagcgctcctgtt...aactctagcagctatactgtagg3
                              ||||||||||||||||||||||| tm 56.1 (dbd) 56.9
                             3ttgagatcgtcgatatgacatccCTTAaCTCTGG5
5GGTCTCaAATTgagagcgctcctgtt3
            ||||||||||||||| tm 54.4 (dbd) 56.5
           3ctctcgcgaggacaa...ttgagatcgtcgatatgacatcc5

In [24]:
first_prod = pcr(a.forward_primer, a.reverse_primer, a.template)

In [25]:
first_prod.figure()


Out[25]:
           5ccagaatacagtgccttag...cgatttggccgat3
                                  ||||||||||||| tm 48.1 (dbd) 56.2
                                 3gctaaaccggctaCTAGaCTCTGG5
5GGTCTCaGAATccagaatacagtgccttag3
            ||||||||||||||||||| tm 53.0 (dbd) 56.6
           3ggtcttatgtcacggaatc...gctaaaccggcta5

In [26]:
first_prod.cut(BsaI)


Out[26]:
(Dseqrecord(-11), Dseqrecord(-58), Dseqrecord(-11))

In [27]:
first_prod.cut(BsaI)[1].seq


Out[27]:
Dseq(-58)
GAATccag..cgat    
    ggtc..gctaCTAG
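The BsaI-digested first product has a GAAT overhang on its left end and, on its right end, a bottom-strand overhang shown as CTAG (3' to 5'), i.e. GATC in top-strand orientation. Reading the primer tails above in the same way suggests GATC/AATT for sequenceB and AATT/GAAT for sequenceC; these are inferred from the primers, not from running cut on those products. A small hypothetical check confirms that, with junctions written in top-strand orientation, each right end matches the next left end all the way round, so the three fragments can only ligate in one order into a circle.

```python
def forms_ring(overhangs: list[tuple[str, str]]) -> bool:
    """True if each fragment's right junction matches the next fragment's left junction, cyclically."""
    n = len(overhangs)
    return n > 0 and all(
        overhangs[i][1].upper() == overhangs[(i + 1) % n][0].upper() for i in range(n)
    )


# (left, right) junction sequences in top-strand orientation
parts = [("GAAT", "GATC"),  # sequenceA
         ("GATC", "AATT"),  # sequenceB
         ("AATT", "GAAT")]  # sequenceC

print(forms_ring(parts))                        # True
print(forms_ring([parts[0], parts[2], parts[1]]))  # False: A then C do not match
```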