Visualizing 1D data


In [1]:
import matplotlib
%matplotlib inline

In [2]:
matplotlib.__version__


Out[2]:
'1.5.dev1'

In [3]:
import pandas as pd

In [4]:
pd.__version__


Out[4]:
'0.16.2'

Let's start with 1D data, e.g., a time series. Tufte describes the time-series plot as the most frequently used form of graphic design. pandas provides an extensive toolkit for working with time series data, which are common in fields such as finance and the atmospheric and oceanic sciences.

References:

  • Tufte, Edward R. (2001), The Visual Display of Quantitative Information (2nd ed.), Cheshire, CT: Graphics Press
  • Time series and date functionality in pandas

In [5]:
ts = pd.Series.from_csv('data/coherence_timeseries.csv')
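
Series.from_csv matches the pandas 0.16.2 used here, but it was later deprecated and then removed in pandas 1.0. Below is a minimal sketch of an equivalent load with pd.read_csv. It assumes a file with no header row, with time in the first column and the value in the second, which is what from_csv expected by default. The inline CSV text and the name ts_demo are made up for illustration; the layout of the real file is not shown in this notebook.

```python
import io

import pandas as pd

# Hypothetical two-column CSV (time, value) without a header row,
# standing in for data/coherence_timeseries.csv.
csv_text = "0.0,0.5\n0.5,0.25\n1.0,-0.125\n"

# read_csv returns a DataFrame; use the first column as the index
# and take the single remaining column as a Series.
frame = pd.read_csv(io.StringIO(csv_text), header=None, index_col=0)
ts_demo = frame.iloc[:, 0]
ts_demo.index.name = None
ts_demo.name = None

print(ts_demo)
```

With a real file, the StringIO object would be replaced by the file path.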

In [6]:
ts.plot()


Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f83c9f077f0>

In [7]:
matplotlib.style.use('ggplot')

In [8]:
ts.plot()


Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f83c9c3e208>

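Introduce data-ink, the ink in a graphic that presents the data itself. Touch on Tufte's principle of maximizing the data-ink ratio, i.e., the share of a graphic's total ink that is devoted to data.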
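
As a rough illustration of erasing non-data ink, the sketch below draws a line plot and then removes the top and right frame lines and the background grid. It uses a synthetic sine wave rather than ts, and which elements count as removable is a judgment call, not a rule taken from this notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the time series (an assumption, not the real data).
x = np.linspace(0, 20, 200)
y = np.sin(x)

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(x, y, color="black", linewidth=1)

# Erase ink that carries no data: two frame lines, the grid, the tinted background.
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
ax.grid(False)
ax.set_facecolor("white")
```

Compare the result with the 'ggplot' style above: how much of each figure's ink encodes data values?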

Challenges (High-Resolution Data Graphics)

How many time scales can you see? Take a guess.

What does ts.plot(xlim=[0, 20]) visualize?

ts is a time series, so it's indexed with... time.


In [9]:
ts.index[100]


Out[9]:
40.0

In [10]:
ts.loc[:20].plot()


Out[10]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f83c9bdf080>
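
The difference between selecting by label and selecting by position matters here because the index holds times, not row numbers. A small sketch on a synthetic series (the 0.5 step and the name demo are arbitrary choices for illustration):

```python
import numpy as np
import pandas as pd

# Synthetic series on a float "time" index with a step of 0.5.
t = np.arange(0, 100.5, 0.5)
demo = pd.Series(np.sin(t), index=t)

by_label = demo.loc[:20]      # every row whose label is <= 20.0 (endpoint included)
by_position = demo.iloc[:20]  # the first 20 rows, whatever their labels are

print(len(by_label), by_label.index[-1])
print(len(by_position), by_position.index[-1])
```

In the same way, xlim=[0, 20] is expressed in the units of the index: it draws the whole series and only restricts the visible window, whereas .loc[:20] drops the rest of the data before plotting.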

Can you (comfortably) visualize both time scales on the same plot?


In [11]:
import matplotlib.pyplot as plt

Let's take a closer look at cyclical data and strong variations.


In [12]:
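# Left: the first 18 time units as one continuous line.
# Right: the same span split into three 6-unit segments, one line per segment.
f, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
ax1.plot(ts.loc[:18])
ax2.plot(ts.loc[:6])
ax2.plot(ts.loc[6:12])
ax2.plot(ts.loc[12:18])
ax2.legend(['first', 'second', 'third'], loc='lower right')
f.set_size_inches(10, 4)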


Which of the two panels above do you prefer?
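
A third option, not used in this notebook, is to fold the segments onto a common axis so that successive cycles overlap and can be compared point by point. The sketch below assumes a cycle length of 6 time units (suggested by the slices above, but not verified against the data) and uses a synthetic sine wave in place of ts.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

period = 6.0  # assumed cycle length, in the units of the index

# Synthetic stand-in for ts covering three cycles.
t = np.arange(0, 18, 0.25)
demo = pd.Series(np.sin(2 * np.pi * t / period), index=t)

fig, ax = plt.subplots(figsize=(5, 4))
for k, label in enumerate(["first", "second", "third"]):
    start = k * period
    # Select one cycle and shift its times so every cycle starts at zero.
    in_cycle = (demo.index >= start) & (demo.index < start + period)
    cycle = demo[in_cycle]
    ax.plot(cycle.index.values - start, cycle.values, label=label)

ax.set_xlabel("time since start of cycle")
ax.legend(loc="lower right")
```

On perfectly periodic synthetic data the three lines coincide; on real data, the gaps between them show how much one cycle differs from the next.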


In [13]:
ts_reverse = ts.sort_index(ascending=False).reset_index()

In [14]:
ts_reverse.columns


Out[14]:
Index(['index', 0], dtype='object')

In [15]:
del ts_reverse['index']

In [16]:
ts_reverse.columns = ['signal']

In [17]:
ts_reverse.head()


Out[17]:
signal
0 -0.063590
1 0.194785
2 0.145389
3 -0.168586
4 0.067726

In [18]:
import numpy as np

In [19]:
# Shift and rescale the signal, then add a quadratic trend over the row number
ts_reverse['variations'] = 5 + ts_reverse['signal'] * 6 + 0.001 * np.arange(len(ts_reverse)) ** 2

In [20]:
ts_reverse['variations'].plot()


Out[20]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f83c9cc1908>

Let's say we are done with exploration and are now communicating results. What does the plot above visualize poorly? What does it visualize well? How about the following:


In [21]:
ts_reverse['variations'].plot(logy=True)


Out[21]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f83c9a57ef0>

Does the logarithmic y-axis change your answers?
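
One way to reason about the two axes is to compare how tall the oscillation appears at the start and at the end of the series. The sketch below rebuilds the 'variations' formula on a synthetic signal; the length n and the sine signal are assumptions made for illustration, not properties of the real data.

```python
import numpy as np

n = 250  # hypothetical series length
k = np.arange(n)
signal = 0.2 * np.sin(k / 4.0)  # synthetic stand-in for ts_reverse['signal']

# Same structure as 'variations': offset, scaled signal, quadratic trend.
trend = 5 + 0.001 * k ** 2
amplitude = 6 * np.abs(signal).max()
upper = trend + amplitude
lower = trend - amplitude

# Height of the oscillation band as drawn on each kind of axis.
linear_band = upper - lower                    # same height everywhere
log_band = np.log10(upper) - np.log10(lower)   # shrinks as the trend grows

print(linear_band[0], linear_band[-1])
print(log_band[0], log_band[-1])
```

On a linear axis the oscillation keeps a constant height, but the growing trend claims most of the vertical range. On a logarithmic axis the same oscillation is drawn tall where the values are small and is compressed where they are large.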


In [22]:
ts_reverse['offset'] = ts_reverse['signal'] + 10

In [23]:
ts_reverse[:100].plot(figsize=(4, 4))


Out[23]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f83c9888908>

Touch on Tufte's Shrink Principle (graphics can be shrunk way down) and on multivariate analysis.
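
A possible follow-up, sketched under assumptions: when several columns with very different levels share one small figure, each can get its own shrunken panel via the subplots option of DataFrame.plot. The frame below is synthetic and only mimics the three columns of ts_reverse.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import numpy as np
import pandas as pd

# Synthetic stand-in for the first 100 rows of ts_reverse.
k = np.arange(100)
signal = 0.2 * np.sin(k / 4.0)
demo = pd.DataFrame({
    "signal": signal,
    "variations": 5 + 6 * signal + 0.001 * k ** 2,
    "offset": signal + 10,
})

# One small panel per column, sharing the x-axis but not the y-axis.
axes = demo.plot(subplots=True, sharex=True, figsize=(4, 4))
```

Each panel keeps its own y-range, so 'signal' and 'offset' no longer flatten each other, while the whole figure stays at 4 by 4 inches.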