Example plots

Here we show some example plots that can be made with timeseries data and differential expression analysis of that data


In [1]:
# Imports
import pandas as pd
from pydiffexp import DEAnalysis, volcano_plot, tsplot
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
# Load the data
test_path = "/Users/jfinkle/Documents/Northwestern/MoDyLS/Python/sprouty/data/raw_data/all_data_formatted.csv"
raw_data = pd.read_csv(test_path, index_col=0)

Here's a look at the data.


In [3]:
raw_data.head()


Out[3]:
WT_1_0_A WT_2_0_B WT_3_0_C WT_4_15_A WT_5_15_B WT_6_15_C WT_7_60_A WT_8_60_B WT_9_60_C WT_10_120_A ... KO_22_60_A KO_23_60_B KO_24_60_C KO_25_120_A KO_26_120_B KO_27_120_C KO_28_240_A KO_29_240_B KO_30_240_C KO_32_0_D
TargetID
0610006I08RIK 244.6 234.3 272.2 280.4 236.0 263.2 224.0 271.0 246.6 265.9 ... 142.9 170.1 143.9 156.8 169.1 171.1 174.1 157.5 162.1 174.6
0610007C21RIK 1259.9 1269.9 1450.2 1363.3 1307.2 1406.4 1222.9 1283.8 1197.2 1385.5 ... 1646.7 1728.6 1495.5 1815.2 1871.5 1847.7 1367.5 1545.0 1383.0 1695.7
0610007P08RIK 29.0 23.5 26.1 27.6 23.5 26.1 22.5 23.0 17.7 13.4 ... 18.0 21.2 18.7 24.2 53.1 7.5 25.0 22.6 16.5 14.2
0610007P14RIK 607.5 540.0 589.6 643.7 724.6 638.9 437.6 409.3 545.6 561.7 ... 1245.0 1477.0 1310.1 1827.0 1928.7 1697.4 1399.4 1675.5 1269.6 1856.9
0610007P22RIK 529.4 564.8 646.0 490.9 553.9 577.6 481.7 556.6 449.6 590.8 ... 356.2 304.5 319.8 402.7 439.3 408.4 450.5 459.3 409.4 427.5

5 rows × 32 columns

In this example the column labels include all the experiment information. We can extract that by specifiying a heirarchy. We'll also do microarray background correction while we're at it. The background has already been subtracted so anything negative is "meaningless"


In [4]:
hierarchy = ['condition', 'well', 'time', 'replicate']
raw_data[raw_data <= 0] = 1

Now we just make our fit object, specify the contrast we want to look at and do the fit! We'll take a look at some of the results of the fit


In [5]:
dea = DEAnalysis(raw_data, index_names=hierarchy, reference_labels=['condition', 'time'])
c_string = "KO_0-WT_0"
dea.fit(c_string)
dea.results.head()


Out[5]:
logFC AveExpr t pval adj_pval B -log10p
PTN 9.494964 4.728120 150.377573 8.647337e-36 1.568454e-31 60.488854 30.804528
IGFBP7 7.095179 8.116964 58.855698 1.721438e-26 1.561172e-22 48.804411 21.806549
REPIN1 5.566773 2.708824 55.377563 6.877096e-26 4.157892e-22 47.721864 21.381127
DCN -3.500038 11.313522 -53.341784 1.610666e-25 6.592131e-22 47.040785 21.180974
BICC1 6.808397 3.364110 52.647519 2.168878e-25 6.592131e-22 46.799900 21.180974

To get a sense of the differential expression overall we can make a volcano plot. Notice that we can specify how we want to select the top genes. If multiple criteria are chosen, the union of the top_n of each criteria are colored.


In [6]:
volcano_plot(dea.results, top_n=10, top_by=['logFC', '-log10p'], show_labels=True)


If we want to look at the raw data time series we can also plot that. A typical view would be a plot of the timeseries of each condition. Here we plot the mean and SEM at each timepoint. For Angptl4 there is a large and significant divergence by the end of the timecourse, while for Zik1 there isn't much difference at any point.


In [7]:
data = dea.data.loc['ANGPTL4']
tsplot(data)

data = dea.data.loc['ZIK1']
tsplot(data)



In [ ]: