Notebook 3.3: Newick Assignment

Complete the notebook, then download it as an HTML file (toolbar -> File -> Download as) and submit your assignment by emailing it to Natalie (natalie.niepoth@columbia.edu).


In [1]:
import toytree

Newick tree files

We learned in notebook 3.2 that phylogenetic trees are stored as plain text files containing a string of names within nested parentheses. When researchers publish phylogenetic results, these are the types of tree files that they produce and share. The files are often saved with names like "treefile.newick", "trees.nwk", or "birds.tree". Such files are archived in online databases, including some that specialize in tree data (like TreeBase) and general data repositories like Data Dryad.
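To make the "names within nested parentheses" idea concrete, here is a minimal sketch in plain Python that pulls the tip names out of a newick string and checks that its parentheses are balanced. The helper names `newick_tip_names` and `parens_balanced` are made up for this illustration and are not part of toytree. The sketch assumes simple, unquoted labels; it is not a full newick parser.

```python
import re

def newick_tip_names(newick):
    """Return the tip names of a simple newick string, left to right.

    Assumes unquoted labels (hypothetical helper, not a toytree function).
    """
    # Drop all whitespace and the terminating semicolon.
    text = re.sub(r"\s+", "", newick).rstrip(";")
    # A tip name is a run of label characters directly after "(" or ",".
    # Text after ")" is an internal node label or support value, and text
    # after ":" is a branch length, so neither is captured here.
    return re.findall(r"(?<=[(,])[^(),:;]+", text)

def parens_balanced(newick):
    """Return True if every "(" has a matching ")"."""
    depth = 0
    for ch in newick:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" appeared before its "("
                return False
    return depth == 0

example = "((a,b),(c, d));"
print(newick_tip_names(example))
print(parens_balanced(example))
```

Each pair of parentheses groups the names inside it into a clade, so in the example string `a` and `b` form one clade and `c` and `d` form another.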

Newick trees

The toytree.tree() function parses newick data to create a Tree object in memory. As input it can take a newick string as text, a URL pointing to a text file online, or the name of a file located on your computer.


In [16]:
newick = "((a,b),(c, d));"

In [42]:
tre = toytree.tree(newick)

In [43]:
tre.draw();


[Output: tree drawing with tip labels a, b, c, d]

An example using a URL from TreeBase


In [44]:
URL = "https://treebase.org/treebase-web/search/downloadATree.html?id=11298&treeid=31264"

In [45]:
tre = toytree.tree(URL)

In [47]:
tre.draw(tip_labels_align=True, height=800, width=600);


[Output: tree drawing with aligned tip labels; taxa include species of Boeremia, Stagonosporopsis, Phoma, Epicoccum, Peyronellaea, Leptosphaerulina, Macroventuria, Microsphaeropsis, Didymella, Atradidymella, and Ascochyta]

The NEXUS file format

The file that we accessed from TreeBase above is actually in a slightly different format called NEXUS. A NEXUS file contains a newick tree and sometimes additional information, such as the sequence data that were used to infer the tree. Toytree parses this file format the same way as newick and simply discards the extra information.
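To see how a newick tree sits inside a NEXUS file, the sketch below pulls the newick string out of a small NEXUS text using only the Python standard library. The NEXUS text is a made-up example (it is not the TreeBase file), and `newick_from_nexus` is a hypothetical helper, not a toytree function. It assumes each tree statement fits on one line and does not handle NEXUS translate tables, so it shows the idea rather than how toytree itself parses the format.

```python
import re

# Hypothetical NEXUS text written for this illustration.
nexus_text = """#NEXUS
BEGIN TAXA;
    DIMENSIONS NTAX=4;
    TAXLABELS a b c d;
END;
BEGIN TREES;
    TREE tree1 = [&R] ((a,b),(c,d));
END;
"""

def newick_from_nexus(text):
    """Return the newick strings found in the tree statements of a NEXUS text."""
    # A tree statement looks like: TREE name = [optional comment] newick;
    pattern = re.compile(
        r"^\s*tree\s+\S+\s*=\s*(?:\[[^\]]*\]\s*)?(.+?;)",
        re.IGNORECASE | re.MULTILINE,
    )
    return pattern.findall(text)

print(newick_from_nexus(nexus_text))
```

The extracted string is ordinary newick, so it could be passed to toytree.tree() like the string in the earlier example.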

Assignment

  1. Find a published tree online and plot it in this notebook.
  2. Provide a link to the published paper.
  3. Describe how the authors used the phylogeny in the study. What is its significance?
  4. Record your answers in code or markdown text below, download as HTML, and send to Natalie.

In [ ]: