Python and the web

As a general-purpose language, Python is very popular for server-side scripting. If JavaScript rules as the scripting language of the web client, on the web server Python is ubiquitous thanks to its fast prototyping. JavaScript has become popular on the server as well only fairly recently, with runtimes like Node.js.

Why would this matter for you?

  • You can present your research interactively.
  • Interactivity also helps you work with your own data.
  • A web interface allows anyone to inspect your data or your findings.
  • It allows you to link your data to public datasets, and vice versa.

Flask

Flask is a very capable microframework widely used for web development.

http://flask.pocoo.org/

Run the data/flasktest.py file and open the browser at http://0.0.0.0:5001/hello (if that address does not load, try http://localhost:5001/hello).


In [2]:
# Do not run this cell!
from flask import Flask
app = Flask("the_flask_module")

@app.route("/hello")
def hello_page():
    return "I'm a hello page"

@app.route("/hello/details")
def hello_deeper():
    return "I'm a details page"

app.run(host="0.0.0.0", port=5001)
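
The routes above can also be exercised without starting a server. The sketch below is only an illustration, assuming Flask is installed: it uses Flask's built-in test client, and the `/hello/<name>` route and the name "Ada" are hypothetical additions that are not part of data/flasktest.py.

```python
from flask import Flask

app = Flask("the_flask_module")

@app.route("/hello")
def hello_page():
    return "I'm a hello page"

# Hypothetical extra route: <name> captures part of the URL as an argument
@app.route("/hello/<name>")
def hello_name(name):
    return f"Hello, {name}!"

# The test client sends requests straight to the app, no network involved
client = app.test_client()

hello_response = client.get("/hello")
hello_text = hello_response.get_data(as_text=True)

name_response = client.get("/hello/Ada")
name_text = name_response.get_data(as_text=True)

# A URL with no matching route should give a 404 status
missing_response = client.get("/nothing-here")

print(hello_response.status_code, hello_text)
print(name_response.status_code, name_text)
print(missing_response.status_code)
```

Because nothing is bound to a port, this version is safe to run inside a notebook cell, unlike the `app.run(...)` call above, which blocks until the server is stopped.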

Django

Django deserves at least a mention: it is a similarly popular but more mature web framework, and it was among the first to use a model-view-controller architecture, which simplifies reuse. Entire websites can be written using only Python code and HTML templates, although complex websites generally also use JavaScript and manual database configuration.

Interaction

While it is possible to turn Jupyter into an interactive web form with buttons and other standard widgets, we will not have time to do this.

We will, however, learn how to use Python to create interactive web plots in the plotting chapter.

Python and the databases

Why would you ever need to know database interaction through Python?

  • Almost every piece of biological or even scientific data is stored in a database.
  • Relational databases can be queried with a very simple query language called SQL.
  • Most programs are mere interfaces to databases.
  • Stop pushing buttons: a bit of Python and a bit of SQL is all you need to get to the data!

SQLite

SQLite is a very simple database. Most R annotation packages do nothing more than download a SQLite database onto your computer. It is faster to query it directly through Python than to learn a package-specific set of functions.

The code below creates a test database with a table of SNPs and inserts a few records.


In [4]:
import sqlite3 as lite

# Records to insert: (Id, GeneSYM, NucleotidePos)
snps = (
    (1, 'Gene1', 52642),
    (2, 'Gene2', 57127),
    (3, 'Gene3', 9000),
    (4, 'Gene4', 29000)
)

con = lite.connect('test.db')

# "with con" commits the transaction on success and rolls back on error;
# it does not close the connection.
with con:
    cur = con.cursor()
    cur.execute("DROP TABLE IF EXISTS snps")
    cur.execute("CREATE TABLE snps(Id INT, GeneSYM TEXT, NucleotidePos INT)")
    cur.executemany("INSERT INTO snps VALUES(?, ?, ?)", snps)

con.close()

Now let us query the database:


In [6]:
import sqlite3 as lite

con = lite.connect('test.db')

with con:
    cur = con.cursor()
    cur.execute("SELECT * FROM snps")

    # fetchall() returns every row as a tuple
    rows = cur.fetchall()

    for row in rows:
        print(row)

con.close()


(1, 'Gene1', 52642)
(2, 'Gene2', 57127)
(3, 'Gene3', 9000)
(4, 'Gene4', 29000)
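
Selecting every row is rarely what you want. As a minimal sketch, the query below filters the same four records with a WHERE clause and a `?` placeholder, and reads the columns by name through `sqlite3.Row`. It uses an in-memory database so it runs on its own; the threshold `min_pos` is a hypothetical value chosen for illustration.

```python
import sqlite3

# In-memory database, so the sketch leaves no file behind
con = sqlite3.connect(":memory:")
con.row_factory = sqlite3.Row  # rows can then be indexed by column name

with con:  # commits on success, rolls back on error
    con.execute("CREATE TABLE snps(Id INT, GeneSYM TEXT, NucleotidePos INT)")
    con.executemany(
        "INSERT INTO snps VALUES(?, ?, ?)",
        [(1, "Gene1", 52642), (2, "Gene2", 57127),
         (3, "Gene3", 9000), (4, "Gene4", 29000)],
    )

min_pos = 30000  # hypothetical threshold, for illustration only

# The ? placeholder lets sqlite3 substitute the value safely,
# instead of pasting it into the SQL string by hand
cur = con.execute(
    "SELECT GeneSYM, NucleotidePos FROM snps "
    "WHERE NucleotidePos > ? ORDER BY NucleotidePos",
    (min_pos,),
)
far_snps = [(row["GeneSYM"], row["NucleotidePos"]) for row in cur]
print(far_snps)

con.close()
```

Passing values as parameters rather than building the SQL string yourself also protects against malformed or malicious input.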

SQL is a query language that can get relatively complex, and it falls outside the scope of this course. In data science, however, it is extremely useful to be able to work with databases, because relational databases allow very fast data access and operations, together with data compression. There are also many other database types, used predominantly in big data, such as document databases and graph databases, collectively known as NoSQL databases, and Python can bridge to them all.
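
To give a flavour of what slightly more complex SQL looks like, here is a small sketch of an aggregate query, where the database itself does the counting and summarising. The `Chrom` column, its values, the fifth record and the index name are all hypothetical, invented only for this illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")

with con:
    # Hypothetical table layout: the Chrom column is not in the course data
    con.execute("CREATE TABLE snps(Id INT, GeneSYM TEXT, Chrom TEXT, Pos INT)")
    con.executemany(
        "INSERT INTO snps VALUES(?, ?, ?, ?)",
        [
            (1, "Gene1", "chr1", 52642),
            (2, "Gene2", "chr1", 57127),
            (3, "Gene3", "chr2", 9000),
            (4, "Gene4", "chr2", 29000),
            (5, "Gene5", "chr2", 31000),
        ],
    )
    # An index can speed up lookups and grouping on a column in large tables
    con.execute("CREATE INDEX idx_snps_chrom ON snps(Chrom)")

# One row per chromosome: number of SNPs, smallest and largest position
summary = con.execute(
    "SELECT Chrom, COUNT(*), MIN(Pos), MAX(Pos) "
    "FROM snps GROUP BY Chrom ORDER BY Chrom"
).fetchall()

for chrom, n_snps, first_pos, last_pos in summary:
    print(chrom, n_snps, first_pos, last_pos)

con.close()
```

On a table this small the index makes no measurable difference; it is included only to show where it would go.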

Remote API calls example

Getting information into our Python data structures as fast as possible is vital. Only as a last resort should you write your own downloaders and parsers. When what you need is not available in Python, it may be possible to call libraries written in other languages, such as Perl, or to access web records through dedicated API calls. BioPython wraps a few such APIs, for example the Entrez resources. Entrez is a federated search engine over various NCBI and NIH resource databases.


In [3]:
from Bio import Entrez
Entrez.email = "your@mail.here"     # Always tell NCBI who you are
handle = Entrez.einfo()
#result = handle.read()
record = Entrez.read(handle)
print(record.keys())
print(record["DbList"])


dict_keys(['DbList'])
['pubmed', 'protein', 'nuccore', 'nucleotide', 'nucgss', 'nucest', 'structure', 'genome', 'annotinfo', 'assembly', 'bioproject', 'biosample', 'blastdbinfo', 'books', 'cdd', 'clinvar', 'clone', 'gap', 'gapplus', 'grasp', 'dbvar', 'epigenomics', 'gene', 'gds', 'geoprofiles', 'homologene', 'medgen', 'mesh', 'ncbisearch', 'nlmcatalog', 'omim', 'orgtrack', 'pmc', 'popset', 'probe', 'proteinclusters', 'pcassay', 'biosystems', 'pccompound', 'pcsubstance', 'pubmedhealth', 'seqannot', 'snp', 'sra', 'taxonomy', 'unigene', 'gencoll', 'gtr']
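
Under the hood, a remote API call of this kind is an HTTP request to a URL with query parameters. The sketch below only builds such a URL for the NCBI E-utilities service, without sending it. The helper `build_eutils_url` is a hypothetical name, and the exact parameters BioPython sends may differ from the ones shown here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base address of the NCBI E-utilities web service
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def build_eutils_url(tool, **params):
    """Return the request URL for one E-utilities tool (hypothetical helper)."""
    # urlencode escapes special characters such as '@' in the email address
    return f"{EUTILS_BASE}{tool}.fcgi?{urlencode(params)}"

url = build_eutils_url(
    "esearch",
    db="taxonomy",
    term="Synechocystis",
    retmode="xml",
    email="your@mail.here",
)
print(url)

# Splitting the URL again shows the parameters the server would receive
query = parse_qs(urlsplit(url).query)
print(query)
```

Pasting a URL like this into a browser returns the raw XML that `Entrez.read` would otherwise parse for you.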

BioPython

As an example, let us find the exact lineage of an amazing group of bacteria that changed both plants and the atmosphere in the early days of our planet... Since you are biologists learning Python, I hope you will love BioPython at least as much as I do. A number of programmers created BioPerl, which to date contains a few more modules than BioPython; however, my impression is that the Python version is more up to date. It is unfortunate that we do not have time to explore it in great detail. We will use it again over the course.

Aside from BioPython, a web API can be offered by virtually any website, and with a little effort one can either download a Python access package or write one's own. Functional annotation, for example, is weakly covered in Python, but DAVID offers an API that is independent of BioPython.

First, install with:

conda install -c https://conda.anaconda.org/anaconda biopython

In [9]:
from Bio import Entrez
Entrez.email = "your@mail.here"     # Always tell NCBI who you are
handle = Entrez.esearch(db="Taxonomy", term="Synechocystis")
record = Entrez.read(handle)
print(record["IdList"])
#assuming only one record is returned
handle = Entrez.efetch(db="Taxonomy", id=record["IdList"][0], retmode="xml")
records = Entrez.read(handle)
print(records[0].keys())
print(records[0]["Lineage"])


['1142']
dict_keys(['UpdateDate', 'ParentTaxId', 'GeneticCode', 'TaxId', 'ScientificName', 'OtherNames', 'CreateDate', 'PubDate', 'MitoGeneticCode', 'LineageEx', 'Lineage', 'Rank', 'Division'])
cellular organisms; Bacteria; Terrabacteria group; Cyanobacteria/Melainabacteria group; Cyanobacteria; Oscillatoriophycideae; Chroococcales
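
The lineage comes back as a single string with the taxa separated by semicolons. As a small sketch, it can be split into a Python list for further work. The string below is copied from the output above, so no network access is needed.

```python
# Lineage string as printed by the Entrez query above
lineage = (
    "cellular organisms; Bacteria; Terrabacteria group; "
    "Cyanobacteria/Melainabacteria group; Cyanobacteria; "
    "Oscillatoriophycideae; Chroococcales"
)

# Split on ';' and strip the surrounding spaces from each name
taxa = [name.strip() for name in lineage.split(";")]

# Print the lineage as an indented tree, from the root downwards
for depth, taxon in enumerate(taxa):
    print("  " * depth + taxon)
```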
