This pydna notebook automatically assembles the sequences of the 32 expression vectors described in
These plasmids were made from the pRS series of yeast vectors.
A XhoI - KpnI fragment of the ScCYC1 terminator was cloned into pRS416 using the same sites, resulting in p416CYC1t.
PCR-generated SacI - XbaI fragments of the ScTDH3pr, ScTEF1pr, ScADH1pr and ScCYC1pr promoters were cloned into p416CYC1t using the same sites, resulting in:
p416GPD (sequenced; SacI, XbaI, XhoI and KpnI are unique)
p416TEF (sequenced; SacI, XbaI, XhoI and KpnI are unique)
p416ADH
p416CYC (sequenced; SacI, XbaI and KpnI are unique)
The SacI - KpnI promoter + terminator fragment was transferred to other pRS vectors using a strategy that is undisclosed in the paper.
ScTDH3pr-ScCYC1t
ScTEF1pr-ScCYC1t
ScADH1pr-ScCYC1t
ScCYC1pr-ScCYC1t
Probably SacI - KpnI were used. In the case of pRS423 and pRS425, there are two KpnI sites, so maybe a partial cut was used?
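Whether a site is unique in a plasmid can be checked by counting occurrences of its recognition sequence on the circular molecule. The following is a minimal plain-Python sketch of that check. The helper `count_sites` and the toy sequence are hypothetical illustrations, not part of pydna or of the original cloning work.

```python
def count_sites(seq: str, site: str, circular: bool = True) -> int:
    """Count occurrences of a recognition site on the top strand.

    Overlapping matches are counted. For a circular molecule the sequence
    is extended by len(site) - 1 bases so that a site spanning the origin
    is also found. Palindromic sites such as GGTACC (KpnI) only need the
    top strand to be scanned.
    """
    seq, site = seq.upper(), site.upper()
    if circular:
        seq += seq[: len(site) - 1]
    return sum(
        1 for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)
    )


KPNI = "GGTACC"
# Hypothetical toy sequence (not a real vector): one internal KpnI site
# and one site that spans the origin of the circular molecule.
toy_plasmid = "ACCAAAGGTACCTTTTGGT"

print(count_sites(toy_plasmid, KPNI))                  # 2
print(count_sites(toy_plasmid, KPNI, circular=False))  # 1
```

A count above one for KpnI, as reported here for pRS423 and pRS425, means a complete digest would give more than two fragments, which is why a partial digest is one plausible explanation.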
This part is in the opposite direction in the paper (the version here is correct!)
<--------------->
mcs:
      7 SpeI      19 SmaI     31 EcoRI    43 HindIII           64 XhoI
1 XbaI      13 BamHI    25 PstI     37 EcoRV    49 ClaI  58 SalI
|     |     |     |     |     |     |     |     |        |     |
TCTAGAACTAGTGGATCCCCCGGGCTGCAGGAATTCGATATCAAGCTTATCGATACCGTCGACCTCGAG
TCTAGAACTAGTGGATCCCCCGGGCTGCAGC...|
---------
|...GAATTCGATATCAAGCTTATCGATACCGTCG
TCTAGAACTAGTGGATCCCCCGGGCTGCAGAatggttttcggtaacaggca
ccgttcggtctttagatagttaaGAATTCGATATCAAGCTTATCGATACCGTCGACCTCGAG
>primerfw
TCTAGAACTAGTGGATCCCCCGGGCTGCAGA...
>primerrev
CGACGGTATCGATAAGCTTGATATCGAATTC...
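The site positions in the MCS map can be rechecked with a few lines of plain Python. This sketch searches the MCS sequence for each recognition sequence and reports 1-based start positions. The names `MCS`, `SITES` and `positions` are chosen here for illustration only.

```python
# MCS sequence copied from the map above.
MCS = "TCTAGAACTAGTGGATCCCCCGGGCTGCAGGAATTCGATATCAAGCTTATCGATACCGTCGACCTCGAG"

# Recognition sequences of the enzymes named in the map.
SITES = {
    "XbaI": "TCTAGA", "SpeI": "ACTAGT", "BamHI": "GGATCC", "SmaI": "CCCGGG",
    "PstI": "CTGCAG", "EcoRI": "GAATTC", "EcoRV": "GATATC",
    "HindIII": "AAGCTT", "ClaI": "ATCGAT", "SalI": "GTCGAC", "XhoI": "CTCGAG",
}

# str.find returns a 0-based index of the first match; add 1 for 1-based positions.
positions = {name: MCS.find(site) + 1 for name, site in SITES.items()}

for name, pos in sorted(positions.items(), key=lambda item: item[1]):
    print(pos, name)
```

The printed positions (1, 7, 13, 19, 25, 31, 37, 43, 49, 58, 64) agree with the numbers in the map.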
vector marker XbaI SpeI BamHI SmaI PstI EcoRI EcoRV HindIII ClaI SalI XhoI
p413   HIS3   XbaI SpeI BamHI SmaI -    EcoRI EcoRV -       ClaI SalI XhoI
p414   TRP1   XbaI SpeI BamHI SmaI PstI EcoRI -     -       ClaI SalI XhoI
p415   LEU2   XbaI SpeI BamHI SmaI PstI -     -     HindIII -    SalI XhoI
p416   URA3   XbaI SpeI BamHI SmaI -    EcoRI -     HindIII ClaI SalI XhoI
p423   HIS3   -    SpeI BamHI SmaI -    EcoRI EcoRV -       ClaI SalI XhoI
p424   TRP1   -    SpeI BamHI SmaI PstI EcoRI -     -       ClaI SalI XhoI
p425   LEU2   -    SpeI BamHI SmaI PstI -     -     HindIII -    SalI XhoI
p426   URA3   -    SpeI BamHI SmaI -    EcoRI -     HindIII ClaI SalI XhoI
XbaI is unique only in p41XXXX (all centromeric vectors, but no episomal vectors).
XhoI is not unique in p4XXCYC1 (any vector with the ScCYC1 promoter).
M#   vector   ori         marker
---  -------  ----------  ------
M1   p413GPD  CEN6/ARSH4  HIS3
M2   p423GPD  2 um        HIS3
M3   p414GPD  CEN6/ARSH4  TRP1
M4   p424GPD  2 um        TRP1
M5   p415GPD  CEN6/ARSH4  LEU2
M6   p425GPD  2 um        LEU2
M7   p416GPD  CEN6/ARSH4  URA3    sequenced http://www.ncbi.nlm.nih.gov/nuccore/82548122
M8   p426GPD  2 um        URA3    sequenced http://www.ncbi.nlm.nih.gov/nuccore/76496266
M9   p413TEF  CEN6/ARSH4  HIS3
M10  p423TEF  2 um        HIS3
M11  p414TEF  CEN6/ARSH4  TRP1
M12  p424TEF  2 um        TRP1
M13  p415TEF  CEN6/ARSH4  LEU2
M14  p425TEF  2 um        LEU2
M15  p416TEF  CEN6/ARSH4  URA3    sequenced http://www.ncbi.nlm.nih.gov/nuccore/124365231
M16  p426TEF  2 um        URA3    assemb
M17  p413ADH  CEN6/ARSH4  HIS3
M18  p423ADH  2 um        HIS3
M19  p414ADH  CEN6/ARSH4  TRP1
M20  p424ADH  2 um        TRP1
M21  p415ADH  CEN6/ARSH4  LEU2
M22  p425ADH  2 um        LEU2    <-bad, not sold from ATCC anymore
M23  p416ADH  CEN6/ARSH4  URA3
M24  p426ADH  2 um        URA3
M25  p413CYC  CEN6/ARSH4  HIS3
M26  p423CYC  2 um        HIS3
M27  p414CYC  CEN6/ARSH4  TRP1
M28  p424CYC  2 um        TRP1
M29  p415CYC  CEN6/ARSH4  LEU2
M30  p425CYC  2 um        LEU2
M31  p416CYC  CEN6/ARSH4  URA3    sequenced
M32  p426CYC  2 um        URA3
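The numbering in the table follows a regular pattern: promoters change slowest, then the marker digit, then the origin digit. A minimal sketch that regenerates the 32 names from that pattern is shown below. The variable names are illustrative assumptions, and the notes column of the table is not reproduced.

```python
from itertools import product

PROMOTERS = ["GPD", "TEF", "ADH", "CYC"]
MARKERS = {3: "HIS3", 4: "TRP1", 5: "LEU2", 6: "URA3"}  # last digit of the pRS name
ORIGINS = {1: "CEN6/ARSH4", 2: "2 um"}                  # middle digit of the pRS name

vectors = {}
combos = product(PROMOTERS, MARKERS, ORIGINS)  # iterating a dict yields its keys
for number, (prom, marker_digit, ori_digit) in enumerate(combos, start=1):
    name = f"p4{ori_digit}{marker_digit}{prom}"
    vectors[f"M{number}"] = (name, ORIGINS[ori_digit], MARKERS[marker_digit])

for m_number, (name, ori, marker) in vectors.items():
    print(m_number, name, ori, marker)
```

For example, `vectors["M7"]` is `("p416GPD", "CEN6/ARSH4", "URA3")` and `vectors["M32"]` is `("p426CYC", "2 um", "URA3")`, matching the table.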
In [1]:
from pydna.all import *
gb =Genbank("bjornjobb@gmail.com")
# The DNA fragments below were obtained from Saccharomyces Genome Database
# www.sgd.org
cyc1term =Dseqrecord("ctcgagtcatgtaattagttatgtcacgcttacattcacgccctccccccacat"
"ccgctctaaccgaaaaggaaggagttagacaacctgaagtctaggtccctattt"
"atttttttatagttatgttagtattaagaacgttatttatatttcaaatttttc"
"ttttttttctgtacagacgcgtgtacgcatgtaacattatactgaaaaccttgc"
"ttgagaaggttttgggacgctcgaaggctttaatttgcggccggtaccn")
tefprom =Dseqrecord("gagctcatagcttcaaaatgtttctactccttttttactcttccagattttctcgga"
"ctccgcgcatcgccgtaccacttcaaaacacccaagcacagcatactaaatttcccc"
"tctttcttcctctagggtgtcgttaattacccgtactaaaggtttggaaaagaaaaa"
"agagaccgcctcgtttctttttcttcgtcgaaaaaggcaataaaaatttttatcacg"
"tttctttttcttgaaaattttttttttgatttttttctctttcgatgacctcccatt"
"gatatttaagttaataaacggtcttcaatttctcaagtttcagtttcatttttcttg"
"ttctattacaactttttttacttcttgctcattagaaagaaagcatagcaatctaat"
"ctaagttttctaga")
cycprom =Dseqrecord("gagctcatttggcgagcgttggttggtggatcaagcccacgcgtaggcaatcctcga"
"gcagatccgccaggcgtgtatatatagcgtggatggccaggcaactttagtgctgac"
"acatacaggcatatatatatgtgtgcgacgacacatgatcatatggcatgcatgtgc"
"tctgtatgtatataaaactcttgttttcttcttttctctaaatattctttccttata"
"cattaggacctttgcagcataaattactatacttctatagacacgcaaacacaaata"
"cacacactaatctaga")
adhprom =Dseqrecord("gagctcTAAAACAAGAAGAGGGTTGACTACATCACGATGAGGGGGATCGAAGAAATG"
"ATGGTAAATGAAATAGGAAATCAAGGAGCATGAAGGCAAAAGACAAATATAAGGGTC"
"GAACGAAAAATAAAGTGAAAAGTGTTGATATGATGTATTTGGCTTTGCGGCGCCGAA"
"AAAACGAGTTTACGCAATTGCACAATCATGCTGACTCTGTGGCGGACCCGCGCTCTT"
"GCCGGCCCGGCGATAACGCTGGGCGTGAGGCTGTGCCCGGCGGAGTTTTTTGCGCCT"
"GCATTTTCCAAGGTTTACCCTGCGCTAAGGGGCGAGATTGGAGAAGCAATAAGAATG"
"CCGGTTGGGGTTGCGATGATGACGACCACGACAACTGGTGTCATTATTTAAGTTGCC"
"GAAAGAACCTGAGTGCATTTGCAACATGAGTATACTAGAAGAATGAGCCAAGACTTG"
"CGAGACGCGAGTTTGCCGGTGGTGCGAACAATAGAGCGACCATGACCTTGAAGGTGA"
"GACGCGCATAACCGCTAGAGTACTTTGAAGAGGAAACAGCAATAGGGTTGCTACCAG"
"TATAAATAGACAGGTACATACAACACTGGAAATGGTTGTCTGTTTGAGTACGCTTTC"
"AATTCATTTGGGTGTGCACTTTATTATGTTACAATATGGAAGGGAACTTTACACTTC"
"TCCTATGCACATATATTAATTAAAGTCCAATGCTAGTAGAGAAGGGGGGTAACACCC"
"CTCCGCGCTCTTTTCCGATTTTTTTCTAAACCGTGGAATATTTCGGATATCCTTTTG"
"TTGTTTCCGGGTGTACAATATGGACTTCCTCTTTTCTGGCAACCAAACCCATACATC"
"GGGATTCCTATAATACCTTCGTTGGTCTCCCTAACATGTAGGTGGCGGAGGGGAGAT"
"ATACAATAGAACAGATACCAGACAAGACATAATGGGCTAAACAAGACTACACCAATT"
"ACACTGCCTCATTGATGGTGGTACATAACGAACTAATACTGTAGCCCTAGACTTGAT"
"AGCCATCATCATATCGAAGTTTCACTACCCTTTTTCCATTTGCCATCTATTGAAGTA"
"ATAATAGGCGCATGCAACTTCTTTTCTTTTTTTTTCTTTTCTCTCTCCCCCGTTGTT"
"GTCTCACCATATCCGCAATGACAAAAAAATGATGGAAGACACTAAAGGAAAAAATTA"
"ACGACAAAGACAGCACCAACAGATGTCGTTGTTCCAGAGCTGATGAGGGGTATCTCG"
"AAGCACACGAAACTTTTTCCTTCCTTCATTCACGCACACTACTCTCTAATGAGCAAC"
"GGTATACGGCCTTCCTTCCAGTTACTTGAATTTGAAATAAAAAAAAGTTTGCTGTCT"
"TGCTATCAAGTATAAATAGACCTGCAATTATTAATCTTTTGTTTCCTCGTCATTGTT"
"CTCGTTCCCTTTCTTCCTTGTTTCTTTTTCTGCACAATATTTCAAGCTATACCAAGC"
"ATACAATCAACTATCTCATATACAtctaga")
gpdprom =Dseqrecord("gagctcagtttatcattatcaatactcgccatttcaaagaatacgtaaataattaat"
"agtagtgattttcctaactttatttagtcaaaaaattagccttttaattctgctgta"
"acccgtacatgcccaaaatagggggcgggttacacagaatatataacatcgtaggtg"
"tctgggtgaacagtttattcctggcatccactaaatataatggagcccgctttttaa"
"gctggcatccagaaaaaaaaagaatcccagcaccaaaatattgttttcttcaccaac"
"catcagttcataggtccattctcttagcgcaactacagagaacaggggcacaaacag"
"gcaaaaaacgggcacaacctcaatggagtgatgcaacctgcctggagtaaatgatga"
"cacaaggcaattgacccacgcatgtatctatctcattttcttacaccttctattacc"
"ttctgctctctctgatttggaaaaagctgaaaaaaaaggttgaaaccagttccctga"
"aattattcccctacttgactaataagtatataaagacggtaggtattgattgtaatt"
"ctgtaaatctatttcttaaacttcttaaattctacttttatagttagtctttttttt"
"agttttaaaacaccagaacttagtttcgacggattctaga")
In [2]:
from Bio.Restriction import XhoI, KpnI, SacI, XbaI
# The Mumberg vectors depend on these pRS plasmids
pRS413 = gb.nucleotide("U03447")
pRS414 = gb.nucleotide("U03448")
pRS415 = gb.nucleotide("U03449")
pRS416 = gb.nucleotide("U03450")
pRS423 = gb.nucleotide("U03454")
pRS424 = gb.nucleotide("U03453")
pRS425 = gb.nucleotide("U03452")
pRS426 = gb.nucleotide("U03451")
In [3]:
# The CYC1 terminator was cloned in pRS416 using XhoI and KpnI
p416cyc1term = ( pRS416.cut(XhoI, KpnI)[0].rc() + cyc1term.cut(XhoI, KpnI)[1] ).looped()
In [4]:
p416cyc1term
Out[4]:
In [5]:
# The p416cyc1term was opened using SacI and XbaI
stuffer, bb = p416cyc1term.cut(SacI, XbaI)
bb, stuffer
Out[5]:
In [6]:
# Each promoter was cloned in front of the CYC1 terminator
p416GPD = ( bb + gpdprom.cut(SacI, XbaI)[1] ).looped().synced("gacgaaagggcctcgtgatacgccta")
p416TEF = ( bb + tefprom.cut(SacI, XbaI)[1] ).looped().synced("gacgaaagggcctcgtgatacgccta")
p416ADH = ( bb + adhprom.cut(SacI, XbaI)[1] ).looped().synced("gacgaaagggcctcgtgatacgccta")
p416CYC = ( bb + cycprom.cut(SacI, XbaI)[1] ).looped().synced("gacgaaagggcctcgtgatacgccta")
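Each promoter sequence starts with a SacI site and ends with an XbaI site, so a double digest of the linear fragment gives three pieces, and index `[1]` presumably picks the middle one, the promoter insert. The plain-string sketch below illustrates this on the top strand only. It is not pydna's implementation: `digest_linear`, `ENZYMES` and the toy sequence are hypothetical, and sticky-end overhangs are ignored.

```python
# Recognition sequence and top-strand cut offset for each enzyme
# (SacI: GAGCT^C, XbaI: T^CTAGA).
ENZYMES = {"SacI": ("GAGCTC", 5), "XbaI": ("TCTAGA", 1)}


def digest_linear(seq, *enzyme_names):
    """Split the top strand of a linear sequence at every cut position."""
    seq = seq.upper()
    cuts = set()
    for name in enzyme_names:
        site, offset = ENZYMES[name]
        start = seq.find(site)
        while start != -1:
            cuts.add(start + offset)
            start = seq.find(site, start + 1)
    bounds = [0, *sorted(cuts), len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical toy insert flanked by a SacI site and an XbaI site.
toy_promoter = "gagctcAAAATTTTtctaga"
fragments = digest_linear(toy_promoter, "SacI", "XbaI")
print(fragments)  # ['GAGCT', 'CAAAATTTTT', 'CTAGA']
```

The middle fragment carries the insert, while the two outer fragments are the short leftover ends of the sites.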
In [7]:
# Write the first four plasmids to files
p416GPD.write("p416GPD.gb")
p416TEF.write("p416TEF.gb")
p416ADH.write("p416ADH.gb")
p416CYC.write("p416CYC.gb")
In [8]:
# Cut out each promoter-terminator cassette with SacI and KpnI
gpdcyc = p416GPD.cut(SacI, KpnI)[0]
gpdcyc.name = "GPD"
tefcyc = p416TEF.cut(SacI, KpnI)[0]
tefcyc.name = "TEF"
adhcyc = p416ADH.cut(SacI, KpnI)[0]
adhcyc.name = "ADH"
cyccyc = p416CYC.cut(SacI, KpnI)[0]
cyccyc.name = "CYC"
In [9]:
from IPython.display import FileLink
cassettes = [ gpdcyc, tefcyc, adhcyc, cyccyc ]
pRS = [ pRS413, pRS423, pRS414, pRS424, pRS415, pRS425, pRS426 ]
for cassette in cassettes:
    for v in pRS:
        # Ligate each cassette into the SacI - KpnI backbone of each pRS vector
        pl = (v.cut(SacI, KpnI)[0].rc() + cassette).looped().synced("gacgaaagggcctcgtgatacgccta")
        new_name = "p{no}{prom}.gb".format(no=v.name[3:], prom=cassette.name)
        pl.write(new_name)
        # Keep p426GPD in memory for the comparison with Genbank
        if new_name == "p426GPD.gb":
            p426GPD = pl
In [10]:
# download four of the sequences available from genbank
p416GPD_genbank = gb.nucleotide("DQ269148")
p426GPD_genbank = gb.nucleotide("DQ019861").looped() # This Genbank record is linear although it describes a circular plasmid
p416TEF_genbank = gb.nucleotide("EF210199")
p416CYC_genbank = gb.nucleotide("EF210198")
In [11]:
# verify that the sequences made here and the ones
# available from genbank are equivalent
assert eq(p416GPD_genbank, p416GPD)
assert eq(p416TEF_genbank, p416TEF)
assert eq(p416CYC_genbank, p416CYC)
assert eq(p426GPD_genbank, p426GPD)
print("done!")
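The comparison above has to tolerate a different starting point and a different strand, because two records of the same circular plasmid may be written from any position and in either orientation. A minimal plain-Python sketch of such a comparison follows. It illustrates the idea only and is not pydna's `eq` implementation: `reverse_complement` and `circular_equal` are hypothetical helpers that handle only unambiguous A, C, G, T sequences.

```python
def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def circular_equal(a, b):
    """True if b is a rotation of a or of the reverse complement of a."""
    a, b = a.upper(), b.upper()
    if len(a) != len(b):
        return False
    doubled = b + b  # every rotation of b is a substring of b + b
    return a in doubled or reverse_complement(a) in doubled


print(circular_equal("ATGCCG", "CCGATG"))  # True: same circle, rotated
print(circular_equal("ATGCCG", "CATCGG"))  # True: rotated reverse complement
print(circular_equal("ATGCCG", "ATGCCC"))  # False: different sequence
```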