Comparison between CAI and Biopython Performance

To compare how quickly Biopython and CAI compute the Codon Adaptation Index, we'll benchmark them on the same data. First, let's install the latest version of CAI from the local source tree.


In [1]:
! pip install -e ../


Obtaining file:///Users/BenjaminLee/Desktop/Python/Research/cai/CodonAdaptionIndex
Requirement already satisfied: scipy in /Users/BenjaminLee/Desktop/Python/Research/cai/env/lib/python3.6/site-packages (from CAI==0.1.8) (1.1.0)
Requirement already satisfied: biopython in /Users/BenjaminLee/Desktop/Python/Research/cai/env/lib/python3.6/site-packages (from CAI==0.1.8) (1.71)
Requirement already satisfied: click in /Users/BenjaminLee/Desktop/Python/Research/cai/env/lib/python3.6/site-packages (from CAI==0.1.8) (6.7)
Requirement already satisfied: numpy>=1.8.2 in /Users/BenjaminLee/Desktop/Python/Research/cai/env/lib/python3.6/site-packages (from scipy->CAI==0.1.8) (1.14.3)
Installing collected packages: CAI
  Found existing installation: CAI 0.1.8
    Uninstalling CAI-0.1.8:
      Successfully uninstalled CAI-0.1.8
  Running setup.py develop for CAI
Successfully installed CAI

Now, we'll import the two libraries.


In [2]:
from Bio import SeqIO
from Bio.SeqUtils import CodonUsage

from CAI import CAI, relative_adaptiveness

We'll use the highly expressed genes of E. coli (ecoli.heg.fasta) as the reference set, and a test set of 100 CDSs, each 3,000 bp long, generated with the Sequence Manipulation Site (test.fasta).


In [3]:
# Reference set: highly expressed E. coli genes; test set: the 100 generated CDSs
reference = [str(seq.seq) for seq in SeqIO.parse("ecoli.heg.fasta", "fasta")]
sequence = [str(seq.seq) for seq in SeqIO.parse("test.fasta", "fasta")]
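Both libraries use the reference set the same general way, following Sharp and Li's definition: each codon gets a relative adaptiveness weight w (its count in the reference set divided by the count of the most frequent synonymous codon), and a gene's CAI is the geometric mean of w over its codons. The block below is a minimal pure-Python sketch of that definition, not the code of either library. The names `relative_adaptiveness_sketch` and `cai_sketch` are hypothetical, and the sketch assumes the standard genetic code, skips stop codons plus ATG and TGG, and simply ignores codons whose weight is zero or unseen; the real libraries handle those edge cases in their own ways.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1; bacterial table 11 assigns the same amino acids).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons(seq):
    """Split a CDS into uppercase codons, dropping any trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness_sketch(reference):
    """w = count(codon) / count(most used synonymous codon) in the reference set."""
    counts = Counter(c for seq in reference for c in codons(seq) if c in CODE)
    best = {}
    for codon, aa in CODE.items():
        best[aa] = max(best.get(aa, 0), counts[codon])
    # Amino acids never seen in the reference get no weights at all.
    return {codon: counts[codon] / best[aa]
            for codon, aa in CODE.items() if best[aa] > 0}


def cai_sketch(seq, weights):
    """Geometric mean of w, skipping stops, ATG (Met), TGG (Trp) and zero/unseen weights."""
    logs = [math.log(weights[c]) for c in codons(seq)
            if c in CODE and CODE[c] not in "*MW" and weights.get(c, 0) > 0]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")
```

For example, with a toy reference of `["GCTGCTGCC"]`, GCT gets w = 1.0 and GCC gets w = 0.5, so `cai_sketch("GCTGCC", ...)` is the geometric mean of 1.0 and 0.5, about 0.707.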

Biopython


In [4]:
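# Build Biopython's codon index from the reference FASTA (not timed)
bp = CodonUsage.CodonAdaptationIndex()
bp.generate_index("ecoli.heg.fasta")
# Time only the CAI calculation for each test sequence
%timeit [bp.cai_for_gene(seq) for seq in sequence]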


777 ms ± 36.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
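`%timeit` is an IPython magic, so it only works inside the notebook. Outside Jupyter, a rough equivalent of the "mean ± std. dev. of 7 runs, 1 loop each" measurement can be sketched with the standard-library `timeit` module. The helper name `time_like_timeit` is hypothetical, and unlike `%timeit` it does not choose the loop count automatically; here it is fixed to match the "1 loop each" reported above.

```python
import statistics
import timeit


def time_like_timeit(func, repeat=7, number=1):
    """Call func `number` times per run, for `repeat` runs.

    Returns the mean and population standard deviation of the
    per-loop time, in seconds.
    """
    totals = timeit.repeat(func, repeat=repeat, number=number)
    per_loop = [t / number for t in totals]
    return statistics.mean(per_loop), statistics.pstdev(per_loop)


# Hypothetical usage, reusing `bp` and `sequence` from the cells above:
# mean, std = time_like_timeit(lambda: [bp.cai_for_gene(s) for s in sequence])
# print(f"{mean * 1000:.0f} ms ± {std * 1000:.1f} ms per loop")
```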

CAI


In [5]:
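# Compute weights from the same reference set Biopython indexed (not timed)
weights = relative_adaptiveness(sequences=reference)
# Time only the CAI calculation for each test sequence
%timeit [CAI(seq, weights=weights) for seq in sequence]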


469 ms ± 18.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
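The two `%timeit` means above translate into a speedup with simple arithmetic. This only restates the reported numbers for this test set on this machine; it says nothing about other inputs or hardware.

```python
# Mean times reported by %timeit above, in milliseconds, for all 100 test sequences.
biopython_ms = 777
cai_ms = 469
n_sequences = 100

speedup = biopython_ms / cai_ms
saved_per_sequence_ms = (biopython_ms - cai_ms) / n_sequences

print(f"speedup: {speedup:.2f}x")                              # speedup: 1.66x
print(f"saved per sequence: {saved_per_sequence_ms:.2f} ms")   # saved per sequence: 3.08 ms
```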