Python is an alternative scripting language that has become very popular among data analysts. In contrast to R, Python is a general-purpose scripting language for which several interesting numerical, statistical, and visualization packages have been developed, notably numpy (and the related scipy) and matplotlib. Although matplotlib is much closer to R's base graphics, it does provide a general visualization framework for Python. Note that its plotting syntax is very similar to plotting in MATLAB.
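The MATLAB-like syntax comes from matplotlib's pyplot interface, which keeps track of a "current" figure and axes. This notebook also uses the object-oriented style, where you hold explicit Figure and Axes objects (fig, axes). A minimal sketch contrasting the two, assuming a recent matplotlib and the non-interactive Agg backend so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 10)

# MATLAB-style (implicit) interface: pyplot draws on the "current" figure/axes
plt.figure()
plt.plot(x, x ** 2, "g")
plt.title("implicit interface")
implicit_ax = plt.gca()  # grab the axes pyplot has been drawing on

# Object-oriented (explicit) interface: keep references to Figure and Axes
fig, ax = plt.subplots()
ax.plot(x, x ** 2, "g")
ax.set_title("explicit interface")

plt.close("all")  # release both figures
```

The explicit style scales better once a figure has several panels, as in the subplot grid later in this notebook.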
In [1]:
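%pylab inline
# %pylab is discouraged in newer IPython versions; there, use
# %matplotlib inline instead and import numpy explicitly (import numpy as np),
# because %pylab also loaded names such as linspace into the global namespace.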
In [2]:
import numpy as np
import matplotlib.pyplot as plt
In [3]:
x = np.linspace(0, 5, 10)
y = x ** 2
fig = plt.figure()
axes = fig.add_axes([0.1, 0.1, 0.8, 0.8]) # add axes at [left, bottom, width, height], given as fractions of the figure size
axes.plot(x, y, 'g')
axes.set_xlabel('x')
axes.set_ylabel('y')
axes.set_title('title')
Here we have defined the x values as a series of 10 evenly spaced points from 0 to 5 (inclusive), and the y values as the square of x.
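Both definitions, and the rectangle passed to add_axes, can be checked directly. A small sketch, assuming NumPy and matplotlib with the Agg backend: np.linspace includes both endpoints, so the spacing is 5/9 rather than 0.5, and the add_axes rectangle is [left, bottom, width, height] in fractions of the figure, so keeping left + width at or below 1 keeps the axes inside the figure.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 10)  # 10 evenly spaced points, both endpoints included
y = x ** 2                 # element-wise square of every x value
step = x[1] - x[0]         # (5 - 0) / (10 - 1) = 5/9, not 0.5

fig = plt.figure()
# rect = [left, bottom, width, height], each a fraction of the figure size
axes = fig.add_axes([0.1, 0.1, 0.8, 0.8])
bounds = axes.get_position().bounds  # position as (left, bottom, width, height)
plt.close(fig)

print(step, bounds)
```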
In [4]:
import pandas as pd
In [5]:
metaDF = pd.read_csv("metabolomics_reshapedData.csv")
In [6]:
freshTom = metaDF.loc[(metaDF['treat'] == "fresh") & (metaDF['Species'] == 'tomatillo') & (metaDF['value'] <= 5000), 'value']
freshPum = metaDF.loc[(metaDF['treat'] == "fresh") & (metaDF['Species'] == 'pumpkin') & (metaDF['value'] <= 5000), 'value']
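Each selection keeps the rows that match one treatment, one species, and a value cap. A sketch of the same filter written as a single combined boolean mask, using a small hypothetical DataFrame with the same column names as the CSV and a hypothetical helper, select_values (not part of pandas):

```python
import pandas as pd

# Hypothetical toy data with the same column names as metabolomics_reshapedData.csv
df = pd.DataFrame({
    "value":   [120.0, 8000.0, 450.0, 3000.0, 60.0],
    "treat":   ["fresh", "fresh", "lyoph", "fresh", "fresh"],
    "Species": ["tomatillo", "tomatillo", "tomatillo", "pumpkin", "tomatillo"],
})

def select_values(frame, treat, species, max_value=5000):
    """Hypothetical helper: the 'value' column for one treatment/species pair, capped at max_value."""
    # Parentheses are required around each comparison: & binds more tightly than ==
    mask = (
        (frame["treat"] == treat)
        & (frame["Species"] == species)
        & (frame["value"] <= max_value)
    )
    return frame.loc[mask, "value"]

fresh_tom = select_values(df, "fresh", "tomatillo")
print(fresh_tom.tolist())
```

Building one mask and indexing once with .loc avoids repeatedly re-aligning full-length masks against an already filtered Series.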
In [7]:
nBin = 100
plt.hist(freshTom, bins=nBin)
In [8]:
plt.hist(freshPum, bins=nBin)
In [9]:
lyophTom = metaDF.loc[(metaDF['treat'] == "lyoph") & (metaDF['Species'] == 'tomatillo') & (metaDF['value'] <= 5000), 'value']
lyophPum = metaDF.loc[(metaDF['treat'] == "lyoph") & (metaDF['Species'] == 'pumpkin') & (metaDF['value'] <= 5000), 'value']
In [10]:
plt.hist(lyophTom, bins=nBin)
In [11]:
plt.hist(lyophPum, bins=nBin)
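Besides drawing, each plt.hist call returns a tuple of bin counts, bin edges, and the drawn bar patches. A sketch on synthetic data (the uniform sample is an assumption standing in for the metabolomics values) showing that bins=100 yields 100 counts and 101 edges:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.uniform(0, 5000, size=1000)  # synthetic stand-in for one group's values

# counts: bars' heights; edges: bin boundaries; patches: the drawn rectangles
counts, edges, patches = plt.hist(data, bins=100)
plt.close("all")
```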
That works, but is there a way to show all four histograms together in one figure?
In [12]:
fig, axes = plt.subplots(nrows=2, ncols=2, sharey=True, sharex=True, squeeze=True)
axes[0,0].hist(lyophPum, bins=nBin)
axes[0,0].set_title("Lyoph Pumpkin")
axes[0,1].hist(lyophTom, bins=nBin)
axes[0,1].set_title("Lyoph Tomatillo")
axes[1,0].hist(freshPum, bins=nBin)
axes[1,0].set_title("Fresh Pumpkin")
axes[1,1].hist(freshTom, bins=nBin)
axes[1,1].set_title("Fresh Tomatillo")
fig.tight_layout()
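plt.subplots returns the figure plus a 2 x 2 NumPy array of Axes, which is why the cell above indexes axes[row, col]. When the panels differ only in data and title, looping over axes.flat is a compact alternative. A sketch with synthetic stand-in data (the random samples and the groups dictionary are assumptions):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-ins for lyophPum, lyophTom, freshPum, freshTom
groups = {
    "Lyoph Pumpkin": rng.uniform(0, 5000, 200),
    "Lyoph Tomatillo": rng.uniform(0, 5000, 200),
    "Fresh Pumpkin": rng.uniform(0, 5000, 200),
    "Fresh Tomatillo": rng.uniform(0, 5000, 200),
}

fig, axes = plt.subplots(nrows=2, ncols=2, sharex=True, sharey=True)
# axes.flat walks the 2 x 2 array row by row: [0,0], [0,1], [1,0], [1,1]
for ax, (title, values) in zip(axes.flat, groups.items()):
    ax.hist(values, bins=100)
    ax.set_title(title)
fig.tight_layout()

titles = [ax.get_title() for ax in axes.flat]
plt.close(fig)
```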
Can we overlay the histograms, as we did in R with ggplot?
In [14]:
bins = np.linspace(0, 5000, 20) # set up 20 shared bin edges (19 bins) in advance
plt.hist(lyophPum, bins, alpha=0.5, label='Lyoph Pumpkin')
plt.hist(lyophTom, bins, alpha=0.5, label='Lyoph Tomatillo')
plt.legend()
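Passing the same bins array to both hist calls matters: with an integer bins value, each call derives its own edges from that dataset's minimum and maximum, so the overlaid bars would not line up. A sketch of the shared-edges idea with np.histogram on synthetic data (an assumption standing in for lyophPum and lyophTom); note that 20 edges define 19 bins:

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.uniform(0, 5000, 500)  # synthetic stand-in for lyophPum
b = rng.uniform(0, 5000, 500)  # synthetic stand-in for lyophTom

edges = np.linspace(0, 5000, 20)  # 20 edges -> 19 equal-width bins
counts_a, edges_a = np.histogram(a, bins=edges)
counts_b, edges_b = np.histogram(b, bins=edges)
```

Because both calls share edges, counts_a[i] and counts_b[i] describe exactly the same value range, which is what makes the semi-transparent overlay readable.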
A Python port of ggplot is available; however, it still appears to be very much beta software.
In [15]:
from ggplot import *
In [16]:
ggplot(diamonds, aes('carat', 'price')) + geom_point(alpha=1/20.) + ylim(0, 20000)
In [17]:
ggplot(diamonds, aes(x='price', color='cut')) + geom_density()
In [37]:
x = np.linspace(0, 5, 10)
y = x ** 2
fig = plt.figure()
axes = fig.add_axes([0.1, 0.1, 0.8, 0.8]) # [left, bottom, width, height] as fractions; keep left + width <= 1 so the saved file is not clipped
axes.plot(x, y, 'g')
axes.set_xlabel('x')
axes.set_ylabel('y')
axes.set_title('title');
fig.savefig("testFig.png")
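fig.savefig picks the output format from the file extension. A sketch, assuming the Agg backend and a temporary directory, that writes the same figure as PNG, PDF, and SVG and inspects the first bytes of each file; bbox_inches="tight" trims the surrounding whitespace:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 10)
fig, ax = plt.subplots()
ax.plot(x, x ** 2, "g")

out_dir = tempfile.mkdtemp()  # scratch directory so nothing lands in the notebook folder
headers = {}
for name in ("testFig.png", "testFig.pdf", "testFig.svg"):
    path = os.path.join(out_dir, name)
    fig.savefig(path, bbox_inches="tight")  # format inferred from the extension
    with open(path, "rb") as fh:
        headers[name] = fh.read(8)  # magic bytes identify the file type
plt.close(fig)
```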
In [38]:
x = np.linspace(0, 5, 10)
y = x ** 2
fig = plt.figure(dpi=400) # higher dots per inch; savefig uses the figure dpi by default
axes = fig.add_axes([0.1, 0.1, 0.8, 0.8]) # [left, bottom, width, height] as fractions; keep left + width <= 1 so the saved file is not clipped
axes.plot(x, y, 'g')
axes.set_xlabel('x')
axes.set_ylabel('y')
axes.set_title('title');
fig.savefig("hiRes.png")
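The pixel size of a saved raster image is the figure size in inches times the dpi. A sketch with a hypothetical helper, png_size, that renders to memory and reads the width and height from the PNG header; it assumes default rcParams, under which savefig uses the figure's own dpi unless an explicit dpi is passed:

```python
import io
import struct

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display needed
import matplotlib.pyplot as plt

def png_size(fig, **savefig_kwargs):
    """Hypothetical helper: render fig to an in-memory PNG and return (width, height) in pixels."""
    buf = io.BytesIO()
    fig.savefig(buf, format="png", **savefig_kwargs)
    data = buf.getvalue()
    # Bytes 16..23 of a PNG hold the IHDR width and height as big-endian uint32
    width, height = struct.unpack(">II", data[16:24])
    return width, height

fig = plt.figure(figsize=(4, 3), dpi=400)  # 4 x 3 inches at 400 dpi
default_size = png_size(fig)               # uses the figure dpi
explicit_size = png_size(fig, dpi=100)     # an explicit dpi overrides it
plt.close(fig)

print(default_size, explicit_size)
```

Passing dpi to savefig instead of plt.figure keeps the on-screen figure at a normal size while still producing a high-resolution file.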