This is very helpful when running the notebook with, e.g., Cell -> Run All or Kernel -> Restart & Run All from the menu bar, because all the libraries are then available throughout the document.
In [ ]:
# The line below allows the notebooks to show graphics inline
%pylab inline
import io # This lets us handle streaming data
import os # This lets us communicate with the operating system
import pandas as pd # This lets us use dataframes
import seaborn as sns # This lets us draw pretty graphics
# Biopython is a widely-used library for bioinformatics
# tasks, and integrating with software
from Bio import SeqIO # This lets us handle sequence data
from Bio.KEGG import REST # This lets us connect to the KEGG databases
# The bioservices library allows connections to common
# online bioinformatics resources
from bioservices import UniProt # This lets us connect to the UniProt databases
from IPython.display import Image # This lets us display images (.png etc) from code
The os.makedirs() function allows us to create a new directory, and the exist_ok option will prevent the notebook code from stopping and throwing an error if the directory already exists.
In [ ]:
# Create a new directory for notebook output
OUTDIR = os.path.join("data", "reproducible", "output")
os.makedirs(OUTDIR, exist_ok=True)
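To see what exist_ok changes, the short sketch below repeats the call inside a temporary directory, so it does not touch the notebook's real output folder. The names demo_dir, created and raised are hypothetical and used only for this illustration.

```python
import os
import tempfile

# Work inside a throwaway directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = os.path.join(tmp, "data", "reproducible", "output")

    # First call creates the directory, including the intermediate ones
    os.makedirs(demo_dir, exist_ok=True)
    created = os.path.isdir(demo_dir)

    # Second call is harmless because exist_ok=True
    os.makedirs(demo_dir, exist_ok=True)

    # Without exist_ok (it defaults to False) the same call raises an error
    try:
        os.makedirs(demo_dir)
        raised = False
    except FileExistsError:
        raised = True

print(created, raised)
```

This is why the cell above can be re-run as often as needed, for example with Restart & Run All, without failing on the second pass.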
The to_df() function will turn tabular data into a pandas dataframe.
In [ ]:
# A small function to return a Pandas dataframe, given tabular text
def to_df(result):
    return pd.read_table(io.StringIO(result), header=None)
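As a quick check of how to_df() behaves, the sketch below applies it to a small piece of tab-separated text. The text in example_text is made up for illustration and is not a real database response; the helper is repeated so that the cell runs on its own.

```python
import io

import pandas as pd


def to_df(result):
    """Same helper as in the notebook, repeated so this cell is self-contained."""
    return pd.read_table(io.StringIO(result), header=None)


# Hypothetical tab-separated text standing in for a tabular query result
example_text = "id1\tprotein A\nid2\tprotein B\n"

example_df = to_df(example_text)

# With header=None, pandas numbers the columns 0, 1, ... instead of
# treating the first row as column names
print(example_df)
```

Because header=None is passed, the first row is kept as data, so the columns can be given meaningful names afterwards if needed.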
Our plan is to:
UniProt
KEGG