In [1]:
# Imports
import pandas as pd
from pydiffexp import DEAnalysis, volcano_plot, tsplot
import matplotlib.pyplot as plt
%matplotlib inline
In [2]:
# Load the data
test_path = "/Users/jfinkle/Documents/Northwestern/MoDyLS/Python/sprouty/data/raw_data/all_data_formatted.csv"
raw_data = pd.read_csv(test_path, index_col=0)
Here's a look at the data.
In [3]:
raw_data.head()
Out[3]:
In this example, the column labels encode all of the experiment information. We can extract it by specifying a hierarchy. We'll also do a simple microarray background correction while we're at it. The background has already been subtracted, so any zero or negative value is "meaningless" and is set to 1.
In [4]:
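# Levels encoded in each column label, in order
hierarchy = ['condition', 'well', 'time', 'replicate']
# Floor non-positive (background-subtracted) intensities at 1
raw_data[raw_data <= 0] = 1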
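To make the idea of a label hierarchy concrete, here is a minimal pandas sketch on toy data. It assumes the column labels are underscore-delimited in the order condition_well_time_replicate; the delimiter, the toy labels, and the gene names are all assumptions for illustration, and this is not necessarily how `DEAnalysis` parses labels internally.

```python
import pandas as pd

# Hypothetical toy data: labels assumed to be underscore-delimited in the
# order condition_well_time_replicate (the real file may differ).
toy = pd.DataFrame(
    [[5.0, -2.0, 7.0, 0.0], [3.0, 4.0, -1.0, 6.0]],
    index=["GENE_A", "GENE_B"],
    columns=["KO_A_0_1", "KO_A_15_1", "WT_B_0_1", "WT_B_15_1"],
)

hierarchy = ["condition", "well", "time", "replicate"]

# Floor non-positive intensities at 1, as in the cell above.
toy[toy <= 0] = 1

# Split each label into its parts and name the levels with the hierarchy.
toy.columns = pd.MultiIndex.from_tuples(
    [tuple(label.split("_")) for label in toy.columns], names=hierarchy
)

# Select every column for condition KO at time "0" (levels are strings here).
ko_time0 = toy.xs(("KO", "0"), level=["condition", "time"], axis=1)
```

Once the labels are split into named levels, selecting by condition and time no longer depends on string matching against the full label.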
Now we make our fit object, specify the contrast we want to look at, and do the fit. Here the contrast compares KO against WT at time 0. Then we take a look at some of the results of the fit.
In [5]:
dea = DEAnalysis(raw_data, index_names=hierarchy, reference_labels=['condition', 'time'])
c_string = "KO_0-WT_0"
dea.fit(c_string)
dea.results.head()
Out[5]:
To get a sense of the differential expression overall, we can make a volcano plot. Notice that we can specify how the top genes are selected. If multiple criteria are given, the union of the top_n genes under each criterion is colored.
In [6]:
volcano_plot(dea.results, top_n=10, top_by=['logFC', '-log10p'], show_labels=True)
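The union rule can be sketched in plain pandas. The results table below is made up, and `top_union` is a hypothetical helper, not part of pydiffexp. Ranking by absolute value is also an assumption, chosen so that strongly down-regulated genes count as "top" by logFC.

```python
import pandas as pd

# Hypothetical results table; column names mirror the top_by values above.
results = pd.DataFrame(
    {
        "logFC": [2.5, -3.0, 0.2, 0.1, 1.0],
        "-log10p": [1.0, 2.0, 6.0, 5.0, 0.5],
    },
    index=["g1", "g2", "g3", "g4", "g5"],
)


def top_union(df, top_n, top_by):
    """Return the union of the top_n rows under each criterion (illustrative)."""
    selected = set()
    for column in top_by:
        # Assumption: rank by magnitude, largest first.
        ranked = df[column].abs().sort_values(ascending=False)
        selected.update(ranked.index[:top_n])
    return sorted(selected)


highlighted = top_union(results, top_n=2, top_by=["logFC", "-log10p"])
```

Because the selection is a union, the number of highlighted genes can range from top_n (when the criteria agree) up to top_n times the number of criteria.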
If we want to look at the raw data, we can also plot the time series. A typical view is one time series per condition; here we plot the mean and SEM (standard error of the mean) at each timepoint. For Angptl4 there is a large and significant divergence between conditions by the end of the time course, while for Zik1 there isn't much difference at any point.
In [7]:
data = dea.data.loc['ANGPTL4']
tsplot(data)
data = dea.data.loc['ZIK1']
tsplot(data)
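For reference, the mean and SEM at each timepoint can be computed with a pandas groupby. This is a sketch on a made-up single-gene series whose index levels are assumed to follow the hierarchy above; it does not show how `tsplot` computes its summary internally.

```python
import pandas as pd

# Hypothetical single-gene series with two replicates per condition and time.
index = pd.MultiIndex.from_tuples(
    [("KO", 0, 1), ("KO", 0, 2), ("KO", 15, 1), ("KO", 15, 2),
     ("WT", 0, 1), ("WT", 0, 2), ("WT", 15, 1), ("WT", 15, 2)],
    names=["condition", "time", "replicate"],
)
gene = pd.Series([1.0, 3.0, 5.0, 9.0, 2.0, 2.0, 4.0, 6.0], index=index)

# Collapse replicates: one mean and one SEM per (condition, time) pair.
grouped = gene.groupby(level=["condition", "time"])
summary = pd.DataFrame({"mean": grouped.mean(), "sem": grouped.sem()})
```

The SEM here is the sample standard deviation (ddof=1) divided by the square root of the number of replicates, which is what pandas' `sem` computes by default.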