The purpose of this tool is to digest a genome file in silico using the same restriction enzymes that were used for an empirical data set to attempt to extract homologous data from the genome file. This can be a useful procedure for adding additional outgroup samples to a data set.
In [1]:
# conda install ipyrad -c bioconda
In [2]:
import ipyrad.analysis as ipa
In [3]:
genome = "/home/deren/Downloads/Ahypochondriacus_459_v2.0.fa"
You can generate single or paired-end data, and you will likely want to restrict the size of selected fragments to be within an expected size selection window, as is typically done in empirical data sets. Here I select all fragments occuring between two restriction enzymes where the intervening fragment is 300-500bp in length. I then ask that the analysis returns the digested fragments as 150bp fastq reads, and to provide 10 copies of each one.
In [4]:
digest = ipa.digest_genome(
fasta=genome,
name="amaranthus-digest",
workdir="digested_genomes",
re1="CTGCAG",
re2="AATTC",
ncopies=10,
readlen=150,
min_size=300,
max_size=500,
)
In [5]:
fio = open(genome)
scaffolds = fio.read().split(">")[1:]
In [14]:
ordered = sorted(scaffolds, key=lambda x: len(x), reverse=True)
In [16]:
len(ordered[0])
Out[16]:
In [5]:
digest.run()
In [6]:
ll digested_genomes/