explotiv: de novo transcriptome explorer

Mar 16, 2016


Camille Scott

Lab for Data Intensive Biology

UC Davis



Sample Sunburst Plots

  • explotiv's primary output
  • These convey the "affinity" of a collection of transcripts for all clades in a base tree
  • Color is scaled 0.0-1.0 based on the affinity score


Calculating Scores

  • We start with a mapping between transcripts and proteins
  • Each transcript will likely map to many proteins
  • The protein databases are curated, and provide a species mapping


In [1]:
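from dammit.fileio import maf
import pandas as pd

# Parse the MAF alignments of the transcripts against OrthoDB and
# concatenate the parsed pieces into a single DataFrame
aln_df = pd.concat(maf.MafParser('/Users/camille/w/scratch/explotiv-data/lamprey.500.fa.dammit/lamprey.500.fa.x.orthodb.maf'))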

In [3]:
aln_df.sort_values('q_name').head()


Out[3]:
| | E | EG2 | q_aln_len | q_len | q_name | q_start | q_strand | s_aln_len | s_len | s_name | s_start | s_strand | score | bitscore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 700 | 2.000000e-32 | 3.600000e-26 | 288 | 528 | Transcript_102 | 4 | + | 96 | 296 | ASIS016320-PA | 199 | + | 269 | 144.316791 |
| 717 | 4.300000e-06 | 2.600000e+00 | 66 | 528 | Transcript_102 | 461 | - | 22 | 388 | ADAC009551-PA | 0 | + | 106 | 58.425095 |
| 716 | 2.800000e-31 | 4.600000e-25 | 297 | 528 | Transcript_102 | 230 | - | 106 | 503 | AGAP003997-PB | 88 | + | 262 | 140.628191 |
| 715 | 2.800000e-31 | 4.600000e-25 | 297 | 528 | Transcript_102 | 230 | - | 106 | 435 | AFAF019329-PA | 14 | + | 262 | 140.628191 |
| 714 | 4.800000e-36 | 1.200000e-29 | 294 | 528 | Transcript_102 | 230 | - | 99 | 393 | TCOGS2:TC005606-PA | 11 | + | 291 | 155.909535 |

  • Take the normalized alignment scores for each transcript and map them to their corresponding species
  • These will be leaf nodes
  • Then propagate these scores up the tree
  • Score for an internal node is (sum of scores of its descendants) / (sum of branch lengths)
    • Inspired by existing methods for calculating phylogenetic signal
    • Caveat: my tree is actually a taxonomy, and all branch lengths are 1.
  • Do this for every transcript to get a score distribution for each node
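
The scoring steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not explotiv's actual code: the toy taxonomy, the protein-to-species table, and the function names are all hypothetical. The slides do not say how scores are normalized, so the sketch divides each bitscore by the transcript's best bitscore and keeps the best hit per species. It also reads "descendants" as the leaves below a node and "sum of branch lengths" as the total length of the subtree below that node, with every branch set to 1 as in the caveat.

```python
from collections import defaultdict

# Hypothetical toy taxonomy, stored as child -> parent.
PARENT = {
    "Vertebrata": "root",
    "Arthropoda": "root",
    "lamprey": "Vertebrata",
    "zebrafish": "Vertebrata",
    "mosquito": "Arthropoda",
}


def leaf_scores(hits, protein_species):
    """Turn one transcript's (protein, bitscore) hits into per-species scores.

    Assumption: normalize by the transcript's best bitscore and keep the
    best normalized hit for each species.
    """
    best = max(score for _, score in hits)
    scores = defaultdict(float)
    for protein, score in hits:
        species = protein_species[protein]
        scores[species] = max(scores[species], score / best)
    return dict(scores)


def propagate(leaves, parent, root="root", branch_length=1.0):
    """Score every node as (sum of leaf scores below it) / (sum of branch lengths below it)."""
    children = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)

    result = {}

    def visit(node):
        kids = children.get(node, [])
        if not kids:
            # Leaf node: its score comes straight from the alignments.
            score = leaves.get(node, 0.0)
            result[node] = score
            return score, 0.0
        total_score, total_length = 0.0, 0.0
        for child in kids:
            child_score, child_length = visit(child)
            total_score += child_score
            total_length += child_length + branch_length
        result[node] = total_score / total_length
        return total_score, total_length

    visit(root)
    return result


# Hypothetical hits for a single transcript.
protein_species = {"P1": "lamprey", "P2": "zebrafish", "P3": "mosquito"}
hits = [("P1", 100.0), ("P2", 50.0), ("P3", 25.0)]

leaves = leaf_scores(hits, protein_species)
node_scores = propagate(leaves, PARENT)
```

Running `propagate` once per transcript and collecting the values for each node would give the per-node score distribution described in the last bullet.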

*Rhinella marina*, the cane toad

![example](Rhinella_marina.svg)

*Seriola lalandi*, a ray-finned fish

![example](Seriola_lalandi.svg)

The chickadee, a small bird / greatly diminished dinosaur

![example](chickadee.svg)

*Crella elegans*, a sponge

![example](Crella_elegans.svg)

*Homarus americanus*, a lobster

![example](Homaru_americanus.svg)

A species from *Pomacea*, a genus of freshwater snails

![example](Pomacea.svg)

A skink species in the genus *Carlia*

![example](Carlia_N.svg)

*Bactrocera tryoni*, the Queensland fruit fly

![example](Bactrocera_tryona.svg)

*Montastraea cavernosa*, a coral

![example](Montastraea_cavernosa.svg)

*Campylomormyrus compressirostris*, an electric fish

![example](Campylomormyrus_compressirostris.svg)

*Squalius pyrenaicus*, a freshwater fish

![example](Squalius_pyrenaicus.svg)

*Petromyzon marinus*, the sea lamprey

![example](lamp10.svg)

  • The original plan included a complete annotation browser to be integrated with dammit
  • This was overambitious.
    • Did a fair amount of work on it, but paused in favor of the phylogenetic view
  • Integrating with existing software is more work than starting from scratch...