Exercise 1a - Anscombe's Quartet


We'll now play with Anscombe's quartet data set. Try to do the following:

  • Load anscombe.csv from the current working directory

  • Calculate the following summary statistics for all 4 distributions

    • mean
    • median
    • variance
    • standard error
  • Plot all 4 using matplotlib/seaborn


In [28]:
## Import IPython and the other libraries needed for plotting and data manipulation
import IPython
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# import seaborn as sb (What is seaborn?)

%matplotlib inline

In [16]:
# read anscombe.csv into data
# hint: what library from the above is used for importing data into a data frame? how do you *read* a .csv file?
data = '''do something here'''
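
One way to approach the loading step is sketched here with pandas `read_csv`. The column names `x1`..`x4` and `y1`..`y4` are inferred from the later cells (`data.x1`, `data.y1`); the exact layout of anscombe.csv is an assumption, so the sketch parses a three-row excerpt of the standard published quartet from an in-memory string rather than the real file.

```python
import io

import pandas as pd

# Three rows of the standard Anscombe's quartet, laid out the way
# anscombe.csv is ASSUMED to be (one column per series, header row first).
csv_text = """x1,y1,x2,y2,x3,y3,x4,y4
10,8.04,10,9.14,10,7.46,8,6.58
8,6.95,8,8.14,8,6.77,8,5.76
13,7.58,13,8.74,13,12.74,8,7.71
"""

# read_csv accepts a path or any file-like object; with the real file
# this would be: data = pd.read_csv("anscombe.csv")
data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)   # (rows, columns)
print(data.head())  # first few rows, to confirm the columns parsed as numbers
```

Checking `data.shape` and `data.head()` right after loading is a cheap way to catch a wrong delimiter or a missing header row before computing any statistics.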

In [ ]:
### Calculate and print mean of y1, y2, y3 and y4
### Hint: What library handles the numerical computation for data analysis?

In [ ]:
## Hint 2: print(np.mean(data.y1))

In [19]:
# Calculate and print variance value of y1, y2, y3 and y4
# Hint: numpy
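
A possible sketch of the summary statistics for a single series follows. The `summarize` helper is hypothetical (not part of the exercise), and the `y1` values are taken from the standard published quartet on the assumption that anscombe.csv contains the same numbers. Note that `np.var` computes the population variance (`ddof=0`) unless told otherwise, whereas pandas' `.var()` defaults to the sample variance (`ddof=1`), so the two can disagree on the same data.

```python
import numpy as np

# y1 from the standard Anscombe's quartet (assumed to match anscombe.csv).
y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
               7.24, 4.26, 10.84, 4.82, 5.68])


def summarize(values):
    """Hypothetical helper: mean, median, sample variance, standard error."""
    values = np.asarray(values, dtype=float)
    n = values.size
    variance = np.var(values, ddof=1)  # ddof=1 gives the sample variance
    return {
        "mean": np.mean(values),
        "median": np.median(values),
        "variance": variance,
        # Standard error of the mean: sample standard deviation / sqrt(n).
        "std_error": np.sqrt(variance / n),
    }


stats = summarize(y1)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

With the loaded data frame, the same helper could be applied to each column in turn, for example `summarize(data.y2)`.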

In [ ]:


In [20]:
# Calculate and print mean of x1, x2, x3 and x4

In [21]:



9.0
9.0
9.0
9.0

In [24]:
# Calculate and print variance of x1, x2, x3 and x4

In [ ]:


In [25]:
# Calculate the covariance between each pair of x and y series
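
For the covariance step, a minimal sketch using `np.cov` on the first pair is shown here. The values are the standard published `x1` and `y1`, assumed to match anscombe.csv. `np.cov` returns the full 2x2 covariance matrix (sample covariance by default), so the x-y covariance is the off-diagonal entry.

```python
import numpy as np

# First pair of the standard Anscombe's quartet (assumed to match anscombe.csv).
x1 = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
               7.24, 4.26, 10.84, 4.82, 5.68])

# np.cov treats each argument as one variable and returns
# [[var(x), cov(x, y)], [cov(x, y), var(y)]], normalised by n - 1.
cov_matrix = np.cov(x1, y1)
cov_xy = cov_matrix[0, 1]

# The correlation coefficient rescales the covariance into [-1, 1].
corr_xy = np.corrcoef(x1, y1)[0, 1]

print(f"cov(x1, y1)  = {cov_xy:.3f}")
print(f"corr(x1, y1) = {corr_xy:.3f}")
```

Repeating this for the other three pairs only requires swapping in the corresponding columns, for example `np.cov(data.x2, data.y2)`.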

In [ ]:

So, these datasets are almost identical, right?


Not so fast! Let's plot the data first. For each pair of x and y series, draw a scatter plot of y against x. Matplotlib, the library we imported as plt, provides flexible and easy plotting. The first one has been done for you.


In [29]:
plt.scatter(data.x1, data.y1)


Out[29]:
<matplotlib.collections.PathCollection at 0x109a069d0>
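
One way to see all four pairs side by side is a 2x2 grid of scatter plots, sketched here. The values are the standard published quartet (assumed to match anscombe.csv), and the `quartet` dictionary is an illustrative name; with the loaded data frame, the columns `data.x1`, `data.y1`, and so on could be used instead.

```python
import matplotlib.pyplot as plt

# Standard Anscombe's quartet values (assumed to match anscombe.csv).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared by sets 1 to 3
quartet = {
    "1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                 6.13, 3.10, 9.13, 7.26, 4.74]),
    "3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                 6.08, 5.39, 8.15, 6.42, 5.73]),
    "4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
          [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
           5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Shared axes make the four panels directly comparable.
fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
for ax, (label, (x, y)) in zip(axes.flat, quartet.items()):
    ax.scatter(x, y)
    ax.set_title(f"y{label} vs x{label}")
    ax.set_xlabel(f"x{label}")
    ax.set_ylabel(f"y{label}")

fig.tight_layout()
```

Sharing the x and y axes across panels is a deliberate choice: it keeps the scales identical, so differences in shape between the four sets are not hidden by automatic rescaling.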

In [ ]: