In [1]:
from DNASkittleUtils.Contigs import read_contigs, Contig, write_contigs_to_file

In [2]:
# Raw string so the backslashes in the Windows path are taken literally
contigs = read_contigs(r"D:\Genomes\Ash BATG-0.5-CLCbioSSPACE\BATG-0.5-CLCbioSSPACE.fa")

Using the five contig names you sent me, it's simplest to do this:


In [5]:
desired_contigs = ['Contig' + str(x) for x in [1131, 3182, 39106, 110, 5958]]
desired_contigs


Out[5]:
['Contig1131', 'Contig3182', 'Contig39106', 'Contig110', 'Contig5958']

If you have a genuinely big list of contig names stored in a file (one name per line), then I would do the following instead:

desired_contigs = open('Data/BATG-selects.txt', 'r').read().splitlines()
unique_desired = set(desired_contigs)
len(unique_desired), desired_contigs[:10]
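With a long name list, testing membership against the set rather than the list keeps the filtering step fast. Here is a minimal sketch of that idea, assuming each contig exposes a `name` attribute as in the cells here. The `Contig` namedtuple, the `select_contigs` helper, and the sample data are stand-ins made up for illustration, not part of DNASkittleUtils:

```python
from collections import namedtuple

# Stand-in for the real Contig class, only for this illustration
Contig = namedtuple('Contig', ['name', 'seq'])

def select_contigs(contigs, wanted_names):
    """Keep the contigs whose name appears in wanted_names."""
    wanted = set(wanted_names)  # set lookup is constant time on average
    return [c for c in contigs if c.name in wanted]

sample_contigs = [Contig('Contig1', 'ACGT'),
                  Contig('Contig2', 'GGCC'),
                  Contig('Contig3', 'TTAA')]
picked = select_contigs(sample_contigs, ['Contig3', 'Contig1', 'Contig1'])
print([c.name for c in picked])
```

The result follows the order of the contigs in the source file, not the order of the name list, and duplicate names in the list have no effect.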

In [7]:
grab = [c for c in contigs if c.name in desired_contigs]
len(grab)


Out[7]:
5

Yay! All five contigs were found.

assert len(grab) == len(unique_desired)
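If that assertion ever fails, it helps to see which names were not found. Here is a small sketch, again assuming contigs carry a `name` attribute. The `Contig` namedtuple, the `missing_names` helper, and the sample values are hypothetical, used here only for illustration:

```python
from collections import namedtuple

# Stand-in for the real Contig class, only for this illustration
Contig = namedtuple('Contig', ['name', 'seq'])

def missing_names(wanted, found):
    """Return the wanted names that have no matching contig, sorted."""
    found_names = {c.name for c in found}
    return sorted(set(wanted) - found_names)

sample_grab = [Contig('Contig110', 'ACGT'), Contig('Contig1131', 'GGCC')]
sample_wanted = ['Contig110', 'Contig1131', 'Contig9999']
print(missing_names(sample_wanted, sample_grab))
```

An empty list means every requested name matched a contig. Anything listed is usually a typo or a name that does not exist in the FASTA file.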

In [8]:
import os
print(os.getcwd())
write_contigs_to_file('data2/sequences_desired.fa', grab)


D:\josiah\Documents\Research\Barcoding-Fraxinus
Done writing  5 contigs and 682,487bp

In [9]:
[c.name for c in grab[:100]]


Out[9]:
['Contig110', 'Contig1131', 'Contig3182', 'Contig5958', 'Contig39106']

In [1]:
import os
os.path.realpath('')


Out[1]:
'D:\\josiah\\Documents\\Research\\Barcoding-Fraxinus'
