In [1]:
import matplotlib
%matplotlib inline
In [2]:
matplotlib.__version__
Out[2]:
In [3]:
import pandas as pd
In [4]:
pd.__version__
Out[4]:
Let's start with 1D data, e.g., a time series. Time-series plots are among the most common kinds of graph. pandas provides a great toolkit for working with time series data, which appear in fields such as finance and the atmospheric and oceanic sciences.
In [5]:
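# Series.from_csv was removed in pandas 1.0; read_csv plus squeeze gives an equivalent Series (two-column file, no header row).
ts = pd.read_csv('data/coherence_timeseries.csv', header=None, index_col=0).squeeze('columns').rename_axis(None)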
In [6]:
ts.plot()
Out[6]:
In [7]:
matplotlib.style.use('ggplot')
In [8]:
ts.plot()
Out[8]:
Introduce data-ink. Touch on the principle of maximizing the data-ink ratio.
How many time scales can you see (guess)?
What does ts.plot(xlim=[0, 20]) visualize?
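The data-ink ratio mentioned above is the share of a graphic's ink that presents data. As a minimal sketch of raising it, assuming a synthetic sine series (`demo`, hypothetical and not the notebook's data), the cell below removes the top and right spines and the grid, none of which carry data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the notebook's series (hypothetical data).
x = np.linspace(0, 20, 200)
demo = pd.Series(np.sin(x), index=x)

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(demo.index, demo.values, color='black', linewidth=1)

# Remove ink that carries no data: the top/right edges of the box and the grid.
for side in ('top', 'right'):
    ax.spines[side].set_visible(False)
ax.grid(False)
```

Whether this reads better than the `ggplot` style is a judgment call; the point is only that every remaining mark should earn its place.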
ts is a time series, so it's indexed with... time.
In [9]:
ts.index[100]
Out[9]:
In [10]:
ts.loc[:20].plot()
Out[10]:
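Because the index holds time values rather than positions, `.loc` selects by label. The sketch below uses a synthetic series sampled every 0.5 time units (an assumption; the real sampling step of `ts` may differ) to contrast label-based and position-based selection.

```python
import numpy as np
import pandas as pd

# Hypothetical series sampled every 0.5 time units (not the notebook's data).
t = np.arange(0, 50, 0.5)
demo = pd.Series(np.cos(t), index=t)

by_label = demo.loc[:20]      # every sample with time <= 20 (the end label is included)
by_position = demo.iloc[:20]  # the first 20 samples, whatever their times

print(by_label.index.max(), len(by_label))        # 20.0 41
print(by_position.index.max(), len(by_position))  # 9.5 20
```

With label-based slicing the plotted window is defined in time units, so it stays the same if the series is resampled.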
Can you (comfortably) visualize both time scales on a single plot?
In [11]:
import matplotlib.pyplot as plt
Let's take a closer look at cyclical data and strong variations.
In [12]:
f, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
ax1.plot(ts.loc[:18])
ax2.plot(ts.loc[:6])
ax2.plot(ts.loc[6:12])
ax2.plot(ts.loc[12:18])
ax2.legend(['first', 'second', 'third'], loc=4)
f.set_size_inches(10, 4)
Which of the above do you prefer?
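A third option, not used in the cells above, is to fold the cycles onto a shared time axis so that they overlap. The sketch below assumes a synthetic signal with a period of 6 time units (both the data and the period are hypothetical) and shifts each cycle to start at 0.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical periodic signal with an assumed period of 6 time units.
period = 6
t = np.arange(0, 18, 0.1)
demo = pd.Series(np.sin(2 * np.pi * t / period), index=t)

fig, ax = plt.subplots(figsize=(5, 4))
for k, label in enumerate(['first', 'second', 'third']):
    start = k * period
    cycle = demo.loc[start:start + period]
    # Shift each cycle so that it starts at time 0 and the cycles overlap.
    ax.plot(cycle.index - start, cycle.values, label=label)
ax.legend(loc=4)
```

Overlapping cycles make cycle-to-cycle differences easy to compare, at the cost of hiding the long-term ordering.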
In [13]:
ts_reverse = ts.sort_index(ascending=False).reset_index()
In [14]:
ts_reverse.columns
Out[14]:
In [15]:
del ts_reverse['index']
In [16]:
ts_reverse.columns = ['signal']
In [17]:
ts_reverse.head()
Out[17]:
In [18]:
import numpy as np
In [19]:
ts_reverse['variations'] = 5 + ts_reverse['signal'] * 6 + 0.001 * np.arange(len(ts)) ** 2
In [20]:
ts_reverse['variations'].plot()
Out[20]:
Let's say we are done with exploration and are now communicating results. What does the above visualize poorly? What does it visualize well? How about the following:
In [21]:
ts_reverse['variations'].plot(logy=True)
Out[21]:
Does the logarithmic y-axis communicate the variations any better?
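The `logy=True` option draws the y-axis on a logarithmic scale. As a sketch on synthetic data (a hypothetical series, kept strictly positive because a log axis cannot show zero or negative values), the cell below plots the same series on a linear and on a log axis side by side.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical positive series: a small oscillation on top of quadratic growth.
n = np.arange(500)
demo = pd.Series(10 + 6 * np.sin(n / 5.0) + 0.001 * n ** 2)

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(10, 4))
demo.plot(ax=ax_lin)             # linear y-axis
demo.plot(ax=ax_log, logy=True)  # logarithmic y-axis

print(ax_lin.get_yscale(), ax_log.get_yscale())  # linear log
```

On the log axis, equal vertical distances correspond to equal ratios, so the oscillation looks larger where the values are small and smaller where they are large.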
In [22]:
ts_reverse['offset'] = ts_reverse['signal'] + 10
In [23]:
ts_reverse[:100].plot(figsize=(4, 4))
Out[23]:
Touch on the Shrink Principle and multivariate analysis.
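One possible way to illustrate shrinking graphics for multivariate data is a row of small multiples. The sketch below builds a hypothetical three-column DataFrame (synthetic values that only mimic the column names used above) and lets pandas draw one small panel per column with `subplots=True`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for ts_reverse[:100] (synthetic values).
n = np.arange(100)
signal = np.sin(n / 5.0)
demo = pd.DataFrame({
    'signal': signal,
    'offset': signal + 10,
    'variations': 5 + signal * 6 + 0.001 * n ** 2,
})

# One small panel per column instead of one crowded plot.
axes = demo.plot(subplots=True, layout=(1, 3), figsize=(9, 2))
```

Each panel gets its own y-axis here, so columns with very different ranges no longer flatten one another.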