Gibson primer design & assembly

This notebook describes primer design for the assembly of linear DNA fragments by techniques such as homologous recombination or Gibson assembly. The goal of this experiment is to create a Saccharomyces cerevisiae vector that expresses the iso-1-cytochrome c gene CYC1 with a C-terminal GFP tag, using the yeast expression vector p426GPD. We would also like to have a unique restriction site between the promoter in p426GPD (the TDH3 promoter) and the CYC1 gene.

This notebook designs the necessary primers for this experiment. For more information on Gibson assembly, Addgene has a nice page on the topic.

The first step is to read the sequences from local files. The sequences can also be read directly from GenBank using their accession numbers.


In [1]:
from pydna.readers import read

In [2]:
cyc1 = read("cyc1.gb")

In [3]:
cyc1


Out[3]:

The cyc1.gb sequence file contains only the ORF, so we can use it directly. The sequence file can be inspected using the link above.


In [4]:
cyc1.isorf()


Out[4]:
True
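
As a rough illustration of what an ORF check like the one above might test, here is a minimal pure-Python sketch. The function name looks_like_orf and the exact criteria (ATG start, terminal stop codon, length divisible by three, no internal in-frame stop) are assumptions made for this example; pydna's isorf() may apply different rules.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def looks_like_orf(seq: str) -> bool:
    """Hypothetical helper: True if seq starts with ATG, ends with a stop
    codon, has a length divisible by three and no internal in-frame stop."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    # Split the sequence into consecutive codons in frame 0
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    # Any stop codon before the last codon disqualifies the sequence
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


print(looks_like_orf("ATGAAATTTTAA"))  # True
print(looks_like_orf("ATGTAAAAATAA"))  # False (internal stop codon)
```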

In [5]:
pUG35 = read("pUG35.gb")

In [6]:
pUG35


Out[6]:

In [7]:
p426GPD = read("p426GPD.gb")

In [8]:
p426GPD


Out[8]:

pUG35 is a plasmid containing the GFP gene. We have to find the exact DNA fragment we want. The pUG35 GenBank file contains features, one of which is the GFP ORF. Inspection in ApE showed that feature number 5 in the list below is the GFP ORF.


In [9]:
pUG35.list_features()


Out[9]:
+-----+------------------+-----+------+------+------+--------------+------+
| Ft# | Label or Note    | Dir | Sta  | End  |  Len | type         | orf? |
+-----+------------------+-----+------+------+------+--------------+------+
|   0 | N:derived from S | --> | 0    | 6231 | 6231 | source       |  no  |
|   1 | nd               | --> | 416  | 1220 |  804 | gene         | yes  |
|   2 | nd               | --> | 416  | 1220 |  804 | CDS          | yes  |
|   3 | N:from CYC1      | --> | 2003 | 2262 |  259 | terminator   |  no  |
|   4 | nd               | <-- | 2270 | 2987 |  717 | gene         | yes  |
|   5 | nd               | <-- | 2270 | 2987 |  717 | CDS          | yes  |
|   6 | N:from MET25     | --> | 3050 | 3443 |  393 | promoter     |  no  |
|   7 | nd               | --> | 3881 | 3954 |   73 | rep_origin   |  no  |
|   8 | nd               | <-- | 4656 | 5517 |  861 | gene         | yes  |
|   9 | nd               | <-- | 4656 | 5517 |  861 | CDS          | yes  |
|  10 | N:CEN6/ARSH4     | --> | 5655 | 6170 |  515 | misc_feature |  no  |
+-----+------------------+-----+------+------+------+--------------+------+

We extract the GFP sequence from Feature #5. The GFP gene is on the antisense strand, but it is returned in the correct orientation:


In [10]:
gfp = pUG35.extract_feature(5)

In [11]:
gfp.seq


Out[11]:
Dseq(-717)
ATGT..ATAA
TACA..TATT

In [12]:
gfp.isorf()


Out[12]:
True
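
Extracting a feature from the antisense strand amounts to slicing the top strand and taking the reverse complement. The sketch below shows this on a toy sequence; the helper names and the toy plasmid are made up for illustration and are not pydna's implementation.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract(seq: str, start: int, end: int, strand: int) -> str:
    """Hypothetical helper: return seq[start:end] in the orientation of the
    feature. strand is +1 for the top strand and -1 for the antisense strand."""
    sub = seq[start:end]
    return reverse_complement(sub) if strand == -1 else sub


# Toy sequence: a tiny ORF (ATGTAA) stored on the antisense strand,
# so the top strand reads TTACAT at positions 3..9.
toy_plasmid = "GGGTTACATCCC"

print(extract(toy_plasmid, 3, 9, -1))  # ATGTAA
print(extract(toy_plasmid, 3, 9, +1))  # TTACAT
```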

We need to linearize the p426GPD vector before the assembly. The SmaI restriction enzyme cuts between the promoter and the terminator.


In [13]:
from Bio.Restriction import SmaI

In [14]:
linear_vector = p426GPD.linearize(SmaI)

In [15]:
linear_vector


Out[15]:
Dseqrecord(-6606)
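
SmaI recognizes CCCGGG and cuts bluntly between CCC and GGG, so opening a circular sequence at a unique SmaI site yields a linear molecule that starts with GGG and ends with CCC. The following pure-Python sketch illustrates the idea on a toy sequence; linearize_blunt is a hypothetical helper, not the pydna linearize method, and the toy plasmid is invented.

```python
def linearize_blunt(circular: str, site: str, cut_offset: int) -> str:
    """Hypothetical helper: open a circular sequence at a unique blunt cut.

    cut_offset is the number of site bases to the left of the cut
    (3 for SmaI, CCC^GGG). Only the top strand is searched, which is
    sufficient for a palindromic site such as CCCGGG.
    """
    circular = circular.upper()
    n = len(circular)
    # Append the first len(site) - 1 bases so a site spanning the origin is found
    extended = circular + circular[:len(site) - 1]
    hits = [i for i in range(n) if extended.startswith(site, i)]
    if len(hits) != 1:
        raise ValueError(f"expected exactly one site, found {len(hits)}")
    cut = (hits[0] + cut_offset) % n
    # Rotate the circle so that the cut position becomes the new start
    return circular[cut:] + circular[:cut]


toy_plasmid = "AATTCCCGGGTTAA"  # invented sequence with one SmaI site
print(linearize_blunt(toy_plasmid, "CCCGGG", 3))  # GGGTTAAAATTCCC
```

The linear product has the same length as the circular input, since a blunt cut neither adds nor removes bases.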

In [16]:
from pydna.design import primer_design

We will amplify most of the fragments using PCR, so we have to design primers first.


In [17]:
cyc1_amplicon = primer_design(cyc1)

The primer_design function returns an Amplicon object which describes a PCR amplification:


In [18]:
cyc1_amplicon.figure()


Out[18]:
5ATGACTGAATTCAAGGC...GAAAAAAGCCTGTGAGTAA3
                     ||||||||||||||||||| tm 51.4 (dbd) 56.3
                    3CTTTTTTCGGACACTCATT5
5ATGACTGAATTCAAGGC3
 ||||||||||||||||| tm 50.2 (dbd) 55.0
3TACTGACTTAAGTTCCG...CTTTTTTCGGACACTCATT5
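
The figure shows the relationship between the primers and the template: the forward primer is identical to the 5' end of the top strand, and the reverse primer is the reverse complement of its 3' end. The sketch below reproduces this with the end sequences shown in the figure. The helper names are made up for this example, and the primer lengths are simply taken from the figure, whereas primer_design chooses them itself.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def primers_from_ends(first_bases: str, last_bases: str):
    """Hypothetical helper: return (forward, reverse) primers, both written
    5'->3', for a template whose top strand starts with first_bases and
    ends with last_bases."""
    return first_bases, reverse_complement(last_bases)


# End sequences of the CYC1 template as shown in the figure above
forward, reverse = primers_from_ends("ATGACTGAATTCAAGGC", "GAAAAAAGCCTGTGAGTAA")

print(forward)  # ATGACTGAATTCAAGGC
print(reverse)  # TTACTCACAGGCTTTTTTC
```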

In [19]:
gfp_amplicon = primer_design(gfp)

It is practical to collect all fragments to be assembled in a list or tuple. Note that below, linear_vector appears both at the beginning and at the end. We do this because we would like a circular assembly.


In [20]:
fragments = (linear_vector, cyc1_amplicon, gfp_amplicon, linear_vector)

We would like to have a unique restriction site before the CYC1 gene, so we look for an enzyme that does not cut any of the fragments:


In [21]:
from Bio.Restriction import BamHI

In [22]:
if not any(x.cut(BamHI) for x in fragments):
    print("no cut!")
else:
    print("cuts!")


cuts!

In [23]:
from Bio.Restriction import NotI

BamHI apparently cuts at least one of the fragments, so let's try NotI:


In [24]:
if not any(x.cut(NotI) for x in fragments):
    print("no cut!")
else:
    print("cuts!")


no cut!

NotI does not cut any of the fragments, so let's use it!
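
The non-cutter test can be thought of as a search for the recognition site on both strands of every fragment. The sketch below does this with plain strings. The fragments are invented toy sequences and the helper names are hypothetical. The sketch treats the fragments as linear and ignores everything else a real restriction analysis would consider.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def has_site(seq: str, site: str) -> bool:
    """True if the recognition site occurs on either strand of a linear sequence."""
    seq = seq.upper()
    return site in seq or reverse_complement(site) in seq


def is_non_cutter(site: str, fragments) -> bool:
    """True if the site is absent from every fragment."""
    return not any(has_site(fragment, site) for fragment in fragments)


BAMHI_SITE = "GGATCC"
NOTI_SITE = "GCGGCCGC"
toy_fragments = ["AAGGATCCTT", "ACGTACGTAC"]  # invented sequences

print(is_non_cutter(BAMHI_SITE, toy_fragments))  # False
print(is_non_cutter(NOTI_SITE, toy_fragments))   # True
```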


In [25]:
from pydna.dseqrecord import Dseqrecord

In [26]:
site = Dseqrecord(NotI.site)

In [27]:
site.seq


Out[27]:
Dseq(-8)
GCGGCCGC
CGCCGGCG

In [28]:
from pydna.design import assembly_fragments

In [29]:
linear_vector.locus = "p426GPD"
cyc1_amplicon.locus = "CYC1"
gfp_amplicon.locus = "GFP"

In [30]:
fragment_list = assembly_fragments((linear_vector, site, cyc1_amplicon, gfp_amplicon, linear_vector))

In [31]:
fragment_list


Out[31]:
[Dseqrecord(-6606), Amplicon(391), Amplicon(770), Dseqrecord(-6606)]

We note that the amplicons are now somewhat larger than before (391 and 770 bp; the GFP ORF alone is 717 bp). The assembly_fragments function adds tails to the primers of Amplicon objects to facilitate the assembly. The NotI site is small, so it was incorporated into the forward PCR primer of the CYC1 Amplicon. We can see that the CYC1 primers are quite a bit longer:


In [32]:
fragment_list[1].figure()


Out[32]:
                                           5ATGACTGAATTCAAGGC...GAAAAAAGCCTGTGAGTAA3
                                                                ||||||||||||||||||| tm 51.4 (dbd) 56.3
                                                               3CTTTTTTCGGACACTCATTTACAGATTTCCACTTCTT5
5TAGTTTCGACGGATTCTAGAACTAGTGGATCCCCCGCGGCCGCATGACTGAATTCAAGGC3
                                            ||||||||||||||||| tm 50.2 (dbd) 55.0
                                           3TACTGACTTAAGTTCCG...CTTTTTTCGGACACTCATT5
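
Reading the figure, each tailed primer is a tail followed by the original annealing part. The forward tail appears to consist of 35 nt (presumably the end of the linearized vector) followed by the 8-nt NotI site, and the reverse primer carries an 18-nt tail. The sketch below rebuilds both primers by concatenation. The split points are read off the figure, the variable and helper names are invented, and this is not how assembly_fragments decides on tail lengths internally.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def add_tail(tail: str, annealing: str) -> str:
    """A tailed primer is the tail followed by the annealing part, both 5'->3'."""
    return tail + annealing


# All sequences are read from the primer figure above (written 5'->3')
vector_tail = "TAGTTTCGACGGATTCTAGAACTAGTGGATCCCCC"  # 35 nt, assumed vector end
noti_site = "GCGGCCGC"
cyc1_fwd_annealing = "ATGACTGAATTCAAGGC"
cyc1_rev_annealing = "TTACTCACAGGCTTTTTTC"
rev_tail = "TTCTTCACCTTTAGACAT"  # 18 nt tail of the reverse primer

tailed_forward = add_tail(vector_tail + noti_site, cyc1_fwd_annealing)
tailed_reverse = add_tail(rev_tail, cyc1_rev_annealing)

print(len(tailed_forward), len(tailed_reverse))  # 60 37
# On the top strand the reverse tail reads as the start of the next fragment
print(reverse_complement(rev_tail))  # ATGTCTAAAGGTGAAGAA
```

The reverse complement of the reverse tail starts with ATGT, which agrees with the first bases of the GFP ORF shown earlier.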

Finally, we assemble the fragments using the Assembly class.


In [33]:
from pydna.assembly import Assembly

We remove the final fragment (the second copy of the linear vector), since we want a circular product.


In [34]:
fragment_list = fragment_list[:-1]

In [35]:
fragment_list


Out[35]:
[Dseqrecord(-6606), Amplicon(391), Amplicon(770)]

In [36]:
asm = Assembly(fragment_list)

In [37]:
asm


Out[37]:
Assembly
fragments..: 6606bp 391bp 770bp
limit(bp)..: 25
G.nodes....: 6
algorithm..: common_sub_strings

In [38]:
candidate = asm.assemble_circular()[0]

In [39]:
candidate


Out[39]:
 -|p426GPD|35
|          \/
|          /\
|          35|391bp_PCR_prod|36
|                            \/
|                            /\
|                            36|770bp_PCR_prod|35
|                                              \/
|                                              /\
|                                              35-
|                                                 |
 -------------------------------------------------
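
The figure shows each fragment sharing a stretch of sequence with the next one, with the last fragment overlapping the first to close the circle. A much simplified version of this idea is sketched below: fragments are joined through exact terminal overlaps on the top strand only. This is not the algorithm used by the Assembly class, which reports common_sub_strings with a limit of 25 bp above. The function names and toy fragments are invented for the example.

```python
def terminal_overlap(left: str, right: str, min_len: int) -> int:
    """Length of the longest suffix of left that equals a prefix of right,
    or 0 if no such overlap of at least min_len exists."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def assemble_circular(fragments, min_len: int) -> str:
    """Join fragments in the given order, closing the circle between the
    last and the first fragment. Returns the circular product written as a
    linear string that starts at the first fragment."""
    product = ""
    for i, fragment in enumerate(fragments):
        following = fragments[(i + 1) % len(fragments)]
        k = terminal_overlap(fragment, following, min_len)
        if k == 0:
            raise ValueError(f"fragment {i} does not overlap the next fragment")
        # Keep everything except the part shared with the next fragment
        product += fragment[:len(fragment) - k]
    return product


toy_fragments = ["AAAACCCC", "CCCCGGGGTTTT", "TTTTAGAGAAAA"]  # invented
circle = assemble_circular(toy_fragments, min_len=4)
print(circle, len(circle))  # AAAACCCCGGGGTTTTAGAG 20
```

The length of the product equals the sum of the fragment lengths minus the sum of the overlaps (8 + 12 + 12 - 3 * 4 = 20 in the toy case).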

In [40]:
p426GPD_CYC1_GFP = candidate

In [41]:
p426GPD_CYC1_GFP.write("p426GPD_CYC1_GFP.gb")





In [42]:
from pydna.amplicon import Amplicon

In [43]:
amplicons1 = [x for x in fragment_list if isinstance(x, Amplicon)]

In [44]:
amplicons1


Out[44]:
[Amplicon(391), Amplicon(770)]

In [45]:
# Get forward and reverse primer for each Amplicon
primers1 = [(y.forward_primer, y.reverse_primer) for y in amplicons1]

In [46]:
# print primer pairs:
for pair in primers1:
    print(pair[0].format("fasta"))
    print(pair[1].format("fasta"))
    print()


>f330 CYC1
TAGTTTCGACGGATTCTAGAACTAGTGGATCCCCCGCGGCCGCATGACTGAATTCAAGGC

>r330 CYC1
TTCTTCACCTTTAGACATTTACTCACAGGCTTTTTTC


>f717 feat_AF2987871
AAAAAAGCCTGTGAGTAAATGTCTAAAGGTGAAGAATTATT

>r717 feat_AF2987871
GTATCGATAAGCTTGATATCGAATTCCTGCAGCCCTTATTTGTACAATTCATCCATAC
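
As a sanity check on the printed primers, the CYC1 reverse primer and the GFP forward primer should describe the same junction, so the reverse complement of one should share a long stretch with the other. The sketch below computes that shared stretch with a simple longest-common-substring routine. The function names are invented for this example and the routine is not the one used by pydna.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_common_substring(a: str, b: str) -> str:
    """Dynamic programming over all pairs of end positions, O(len(a) * len(b))."""
    best_len, best_end = 0, 0
    previous = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at the previous pair of positions
                current[j] = previous[j - 1] + 1
                if current[j] > best_len:
                    best_len, best_end = current[j], i
        previous = current
    return a[best_end - best_len:best_end]


# Primer sequences copied from the FASTA output above
cyc1_reverse = "TTCTTCACCTTTAGACATTTACTCACAGGCTTTTTTC"
gfp_forward = "AAAAAAGCCTGTGAGTAAATGTCTAAAGGTGAAGAATTATT"

shared = longest_common_substring(reverse_complement(cyc1_reverse), gfp_forward)
print(len(shared))  # 36
print(shared)       # AAAAAAGCCTGTGAGTAAATGTCTAAAGGTGAAGAA
```

The 36 nt found here are consistent with the overlap of 36 shown between the two PCR products in the assembly figure.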