"The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more." (jupyter.org)

We will be using Jupyter Notebooks to analyze and plot the data that we gather in class, with Python as the underlying language.

Python is a programming language that is flexible, well supported and open source (python.org). It may be of use to you unexpectedly in the future (especially you historians and written-art majors!), so keep these notes for future reference.

Instructions

  • Press 'Shift+Enter' within each of the cells below to run it.
  • While the computations in a cell are running, you should see "In [*]" on the left.
  • When the computations have finished, a number will appear within the brackets (e.g. "In [2]").
  • '#' denotes a comment in the code.
  • Jupyter uses the Tab key for code completion.
  • Indexing in Python starts at 0.
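
As a quick illustration of the last point, here is a minimal sketch using a plain Python list. The list name and the numbers are made up for the example and are not class data.

```python
# A hypothetical list of zone-of-inhibition measurements (mm)
zones = [12, 18, 25]

first = zones[0]   # the first item sits at position 0
last = zones[2]    # the third (last) item sits at position 2
print(first, last)
```

Asking for zones[3] would raise an IndexError, because a list of three items only has positions 0, 1 and 2.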

In [ ]:
# this would be a comment
# cells like this are like an advanced calculator
# for example:
2+2

Pandas is the software package that you will use to generate "data frames", which are just Python representations of the data that you have collected. Just as in processing, you can use any of the pandas functions by writing pd.functionname.

Numpy is the software package that you will use for computations and analysis. We will just be scratching its surface in terms of capabilities.

Pygal is the software package that you will use to generate plots and graphs. Although it has limitations on data formatting, the graphs are high quality and easy to format. Pygal is also well documented.


In [ ]:
# Load the packages into memory by running this cell
import pandas as pd
import numpy as np
import pygal

In [ ]:
# Example of how to use pandas to read and load a "comma-separated values" (csv) file.
# You can create csv files in any text editor (like notepad)
# or in programs that use spreadsheets (Excel/Numbers/Google Sheets)
ecoli = pd.read_csv("kb_ecoli.csv")

In [ ]:
# You can display the data you just loaded in a table
ecoli
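
If you want to see what pd.read_csv does without touching the class files, the sketch below reads a tiny made-up table from a string instead of a file. The column names and numbers are assumptions chosen for illustration (one column per antibiotic, one row per measurement), not the contents of kb_ecoli.csv.

```python
import io
import pandas as pd

# Hypothetical file contents: a header line, then one row per measurement
csv_text = """ab1,ab2
12,20
14,22
13,21
"""

# pd.read_csv accepts a file name or any file-like object such as StringIO
demo = pd.read_csv(io.StringIO(csv_text))
print(demo.shape)           # (number of rows, number of columns)
print(list(demo.columns))   # the header line becomes the column names
```

The first line of a csv file is treated as the header by default, which is why editing that line changes the column names you see in the table.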

In [ ]:
# Start by replacing each "ab#" in the csv file with the real name of the antibiotic
# that we used in the microbiology laboratory, then reload the data.
# If you did this correctly, the table should show the correct names
ecoli = pd.read_csv("kb_ecoli.csv")
ecoli

In [ ]:
# We can extract the data from a single column using its name
antibiotic1 = ecoli.ab1

In [ ]:
# or by its location in the data frame (all rows, twelfth column, which is position 11)
antibiotic12 = ecoli.iloc[:, 11]
antibiotic12

In [ ]:
# you can also check the name of the column (remember python indexing starts at 0!)
ecoli.columns[0]
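
The ways of picking out a column can be compared side by side. This is a small sketch on a made-up data frame (the name demo and its values are assumptions). The bracket form is worth knowing because the dot form only works when the column name is a valid Python name, which may no longer be true once you rename the columns to real antibiotic names.

```python
import pandas as pd

# Hypothetical stand-in for the ecoli data frame
demo = pd.DataFrame({"ab1": [12, 14, 13], "ab2": [20, 22, 21]})

by_attribute = demo.ab1         # dot form: needs a name without spaces or symbols
by_bracket = demo["ab1"]        # bracket form: works for any column name
by_position = demo.iloc[:, 0]   # all rows of the first column (position 0)

# All three give the same column
print(by_attribute.equals(by_bracket), by_bracket.equals(by_position))
```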

In [ ]:
# Or we can directly calculate average values using numpy
# (antibiotic1 now holds the average instead of the whole column)
antibiotic1 = np.mean(ecoli.ab1)
antibiotic1

In [ ]:
antibiotic12 = np.mean(ecoli.ab12)
antibiotic12
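
Averaging one column at a time gets repetitive with twelve antibiotics. As a hedged sketch, pandas can also average every column in one call. The data frame below is made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the ecoli data frame
demo = pd.DataFrame({"ab1": [12, 14, 13], "ab2": [20, 22, 21]})

mean_ab1 = np.mean(demo["ab1"])   # average of a single column
all_means = demo.mean()           # average of every column, indexed by column name
print(mean_ab1)
print(all_means)
```

Individual averages can then be looked up by name, for example all_means["ab2"].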

In [ ]:
# and we can already create a bar graph that displays the data with pygal
bar_chart = pygal.Bar()
bar_chart.title = "Kirby Bauer results for E.coli"
bar_chart.x_labels = ['ab1', 'ab12']
bar_chart.add('name of ab1', antibiotic1)
bar_chart.add(ecoli.columns[11], antibiotic12)
bar_chart.render_to_file('kirbybauer_ecoli.svg')
# the graph was saved as an svg file in your working directory
# you can open that svg file in a new browser tab

In [ ]:
# we can use some optional settings to label the axes
bar_chart = pygal.Bar()
bar_chart.title = "Kirby Bauer results for E.coli"
bar_chart.x_title = 'Antibiotics'
bar_chart.y_title = 'Zone of inhibition (mm)'
bar_chart.add('name of ab1', antibiotic1)
bar_chart.add(ecoli.columns[11], antibiotic12)
# bar_chart.x_labels = [{'label': 'AB1','value': 1},{'label': 'AB12','value': 12}]
bar_chart.render_to_file('kirbybauer_ecoli.svg')
# reload the tab that contains the graph

Add the rest of the antibiotics to the graph.
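
One possible way to do this without typing twelve add lines is a loop over the column names. The sketch below is only a suggestion: add_all_columns is a hypothetical helper name, not part of pygal or pandas, and it assumes that every column of the data frame holds numbers.

```python
import numpy as np
import pandas as pd

def add_all_columns(chart, frame):
    """Add one bar per column of `frame` to `chart`, using the column average."""
    for name in frame.columns:
        # float() turns the numpy number into a plain Python number
        chart.add(name, float(np.mean(frame[name])))
    return chart
```

With the packages loaded, you would call add_all_columns(bar_chart, ecoli) right after setting the titles and before bar_chart.render_to_file(...).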


In [ ]:
# you could even use advanced options to add error bars,
# using numpy's standard deviation function: np.std()
bar_chart = pygal.Bar()
bar_chart.title = "Kirby Bauer results for E.coli"
bar_chart.x_title = 'Antibiotics'
bar_chart.y_title = 'Zone of inhibition (mm)'
bar_chart.add('name of ab1', antibiotic1)
bar_chart.add(ecoli.columns[11], [{
    'value': antibiotic12,
    'ci': {'low': np.mean(ecoli.ab12) - np.std(ecoli.ab12),
           'high': np.mean(ecoli.ab12) + np.std(ecoli.ab12)}}])
# bar_chart.add('Second', [{'value': np.mean(ecoli.ab2), 'ci': {'high': 5}}])
bar_chart.render_to_file('kirbybauer_ecoli.svg')
# reload the tab that contains the graph
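
The error-bar dictionary is long to type for every antibiotic, so it can be wrapped in a small function. This is a sketch under assumptions: mean_with_error is a hypothetical helper name, and the error bar is taken as one standard deviation on each side of the average, as in the cell above.

```python
import numpy as np

def mean_with_error(values):
    """Return the average and a one-standard-deviation interval as a value dictionary."""
    values = np.asarray(values, dtype=float)
    mean = float(np.mean(values))
    # ddof=0 is numpy's default (population standard deviation);
    # use ddof=1 if you want the sample standard deviation instead
    sd = float(np.std(values, ddof=0))
    return {'value': mean, 'ci': {'low': mean - sd, 'high': mean + sd}}

# Made-up measurements, only to show the shape of the result
point = mean_with_error([12, 14, 13])
print(point)
```

The result has the same layout as the dictionary in the cell above, so a call such as bar_chart.add(ecoli.columns[11], [mean_with_error(ecoli.ab12)]) should draw the same bar.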

Then use the data contained in kb_gfp.csv and kb_env.csv, following similar procedures, to generate graphs for the rest of the data we collected with the Kirby-Bauer assay.

Once you are done, let me know and we can move on to other plots.


In [ ]: