This data set is from a classic early microarray paper on the yeast cell cycle, Spellman et al. (1998).
In [1]:
import pods
import matplotlib.pyplot as plt
%matplotlib inline
In [2]:
data = pods.datasets.spellman_yeast()
The data is from two-colour spotted cDNA arrays and has been widely studied in computational biology. There are four different time series in the data, as well as induction experiments. The data is returned as a pandas data frame, which can be summarised as follows.
In [3]:
data['Y'].describe()
Out[3]:
The first five columns are the clb2 and cln3 induction experiments. The columns that follow are the alpha, cdc15, cdc28 and elutriation time course experiments. The index gives the gene names. The columns are named according to the experiment.
In [5]:
print(data['Y'].columns)
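Since the columns are named according to the experiment, one way to pull out a single time course is to filter the column names by prefix. The sketch below is a minimal illustration, assuming each column name begins with its experiment label (for example 'cdc15'). The helper columns_for_experiment and the example names are hypothetical, so check them against the printed column list before relying on them.

```python
def columns_for_experiment(columns, prefix):
    """Return the column names that start with the given experiment prefix."""
    return [name for name in columns if str(name).startswith(prefix)]


# Hypothetical column names, for illustration only; the real names are
# whatever print(data['Y'].columns) shows.
example_columns = ['cln3-1', 'alpha0', 'alpha7', 'cdc15_10', 'cdc15_30']
cdc15_columns = columns_for_experiment(example_columns, 'cdc15')
print(cdc15_columns)
```

With the real data, the resulting list can be passed back to the data frame, for instance data['Y'][columns_for_experiment(data['Y'].columns, 'cdc15')], provided the prefix assumption holds.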
The index is given by the gene name; there are 6178 genes in total.
In [6]:
print(data['Y'].index)
We also provide a variant of the data for just the cdc15 time course.
In [7]:
data = pods.datasets.spellman_yeast_cdc15()
This variant also provides the associated time points, under the key 't'.
In [8]:
plt.plot(data['t'], data['Y']['YAR015W'], 'rx')
plt.title('Gene YAR015W from Spellman et al. for the cdc15 Time Course')
plt.xlabel('time')
# Raw string so that the backslash in \log is not treated as an escape sequence.
plt.ylabel(r'$\log_2$ expression ratio')
Out[8]:
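The vertical axis of this plot is a log2 expression ratio, so a value of 0 means the two channels are equal, 1 means the ratio is 2, and -1 means the ratio is 0.5. As a small worked illustration of that conversion (the helper fold_change is hypothetical and not part of pods):

```python
def fold_change(log2_ratio):
    """Convert a log2 expression ratio back to a plain ratio."""
    return 2.0 ** log2_ratio


# Print the plain ratio for a few representative log2 values.
for value in (-1.0, 0.0, 1.0):
    print(value, fold_change(value))
```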
As usual, we include the citation information for the data.
In [9]:
print(data['citation'])
Extra information about the data is included, as standard, under the keys 'info' and 'details'.
In [10]:
print(data['info'])
print()
print(data['details'])