03 - Integrating Data As A Reproducible Document (20min)

1. Introduction

Much of practical bioinformatics and computational biology involves the management, integration, and understanding of data.

These data can come from a wide range of local and online resources, and it can be tricky to collate them in one place and present them coherently to colleagues (you may have seen lab books with printouts of BLAST searches pasted into the pages). This is one area where reproducible research, and interactive notebooks such as these, can make your life much easier.

By performing data retrieval, analysis, and integration in a live document with explanatory text and the actual code that was used to perform the work, you can share these notebooks in the knowledge that your work is reproducible and accurately documented.

In this part of the workshop, we will go through the process of building a reproducible document from scratch, to:

  1. acquire a protein sequence from a local file
  2. build a custom BLAST database
  3. BLAST your protein query against this custom database
  4. get information about the molecular function of your sequence from the UniProt database
  5. get information about the metabolic function of your sequence from the KEGG database
  6. visualise information about your results using seaborn
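
As a rough illustration of steps 1 to 3 above, here is a minimal sketch using only the Python standard library. It is not the code used in the workshop notebooks. It assumes that the custom database is built from protein sequences, that BLAST+ (`makeblastdb`, `blastp`) is installed, and that results are written in the default tabular format (`-outfmt 6`). The function names and file names (`read_fasta`, `query.fasta`, `custom_db`, and so on) are hypothetical. The sketch only builds the command lines; running them (for example with `subprocess.run`) is left to the reader.

```python
import io
import shlex

# Default column order of BLAST+ tabular output (-outfmt 6)
BLAST_TAB_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def read_fasta(handle):
    """Return a list of (header, sequence) tuples from a FASTA-formatted handle."""
    records, header, chunks = [], None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def makeblastdb_cmd(fasta_path, db_name):
    """Build the argument list for creating a protein BLAST database (assumed dbtype)."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", "prot", "-out", db_name]


def blastp_cmd(query_path, db_name, out_path):
    """Build the argument list for a blastp search that writes tabular output."""
    return ["blastp", "-query", query_path, "-db", db_name,
            "-outfmt", "6", "-out", out_path]


def parse_blast_tab(handle):
    """Parse default -outfmt 6 lines into dictionaries keyed by column name."""
    hits = []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(BLAST_TAB_COLUMNS, line.split("\t")))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])  # numeric columns used for filtering/plotting
        hits.append(hit)
    return hits


if __name__ == "__main__":
    # Hypothetical in-memory query; in practice, open a local FASTA file instead
    example = io.StringIO(">query_1 hypothetical example\nMKTAYIAKQR\nQISFVKSHFS\n")
    print(read_fasta(example))
    print(shlex.join(makeblastdb_cmd("custom_db.fasta", "custom_db")))
    print(shlex.join(blastp_cmd("query.fasta", "custom_db", "hits.tab")))
```

Keeping the command lines as Python lists in the notebook means the exact BLAST parameters are recorded alongside the results they produced. The parsed hits can then be passed on to later steps, such as looking up the subject identifiers in UniProt or KEGG, or plotting with seaborn.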

2. Building a Reproducible Document (20min)

Please click on the link below to start the Building a Reproducible Document notebook.

This notebook contains a sketched-out framework for constructing a reproducible analysis that integrates public data sources about a candidate sequence.
We will fill out the framework and construct a reproducible document describing our analysis during the workshop.

3. An Example Reproducible Document

Please click on the link below to start the An Example Reproducible Document notebook.

This notebook is an example of what could be produced in the `Building a Reproducible Document` notebook.