Exercise 1a - Anscombe's Quartet

We'll now play with Anscombe's quartet data set. Try to do the following:

- Load anscombe.csv from the current directory
- Calculate the following summary statistics for the x and y series of all 4 data sets:
    - mean
    - median
    - variance
    - standard error
- Plot all 4 using matplotlib/seaborn
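To check your answers at the end, one compact option is pandas' `DataFrame.agg`, which applies several named statistics to every column at once. This is a sketch under two assumptions: the columns are named x1, y1, ..., x4, y4 (the plotting cell further down uses `data.x1` and `data.y1`), and the quartet's values are typed in directly so the snippet runs without anscombe.csv. Note that pandas computes `var` and `sem` with the sample (n - 1) denominator by default.

```python
import pandas as pd

# Anscombe's quartet, typed in so this sketch runs without anscombe.csv.
# Assumption: the CSV uses the same x1..x4 / y1..y4 column names.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x1, x2 and x3 are identical
data = pd.DataFrame({
    "x1": x123,
    "y1": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "x2": x123,
    "y2": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "x3": x123,
    "y3": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "x4": [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
    "y4": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
})

# One row per statistic, one column per series.
# "var" and "sem" use ddof=1 (sample statistics) by default in pandas.
summary = data.agg(["mean", "median", "var", "sem"])
print(summary.round(3))
```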



In [28]:

    
## Import IPython and the other libraries needed for data manipulation and plotting
import IPython
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# import seaborn as sb (What is seaborn?)

%matplotlib inline



In [16]:

    
# read anscombe.csv into data
# hint: what library from the above is used for importing data into a data frame? how do you *read* a .csv file?
data = '''do something here'''
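The hint points at pandas, whose `pd.read_csv` reads a CSV file into a DataFrame. In the notebook that would be `pd.read_csv('anscombe.csv')`; this sketch instead passes an in-memory copy of the data through `io.StringIO` so it runs on its own. The header row (x1, y1, ..., x4, y4) is an assumption about how anscombe.csv is laid out.

```python
import io

import pandas as pd

# Stand-in for the contents of anscombe.csv (assumed column layout).
csv_text = """x1,y1,x2,y2,x3,y3,x4,y4
10,8.04,10,9.14,10,7.46,8,6.58
8,6.95,8,8.14,8,6.77,8,5.76
13,7.58,13,8.74,13,12.74,8,7.71
9,8.81,9,8.77,9,7.11,8,8.84
11,8.33,11,9.26,11,7.81,8,8.47
14,9.96,14,8.10,14,8.84,8,7.04
6,7.24,6,6.13,6,6.08,8,5.25
4,4.26,4,3.10,4,5.39,19,12.50
12,10.84,12,9.13,12,8.15,8,5.56
7,4.82,7,7.26,7,6.42,8,7.91
5,5.68,5,4.74,5,5.73,8,6.89
"""

# In the notebook this line would be: data = pd.read_csv('anscombe.csv')
data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)   # (number of rows, number of columns)
print(data.head())  # first five rows, to eyeball the columns
```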



In [ ]:

    
### Calculate and print mean of y1, y2, y3 and y4
### Hint: What library handles the numerical computation for data analysis?



In [ ]:

    
## Hint 2: print(np.mean(data.y1))
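One way to follow the `np.mean` hint for all four y series is a loop over the column names, with pandas' own `.mean()` as a one-call alternative. The y values are typed in so the sketch is self-contained; in the notebook they would come from `data`.

```python
import numpy as np
import pandas as pd

# y columns of Anscombe's quartet, typed in so the sketch runs on its own.
data = pd.DataFrame({
    "y1": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "y2": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "y3": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "y4": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
})

# NumPy route: one np.mean call per column.
for col in ["y1", "y2", "y3", "y4"]:
    print(col, np.mean(data[col]))

# pandas route: a Series of means indexed by column name.
y_means = data[["y1", "y2", "y3", "y4"]].mean()
print(y_means)
```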



In [19]:

    
# Calculate and print the variance of y1, y2, y3 and y4
# Hint: numpy
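A detail worth knowing before comparing variances: `np.var` divides by n (population variance) unless you pass `ddof=1`, while pandas' `.var()` divides by n - 1 (sample variance) by default. This sketch prints both so the difference is visible; the y values are typed in so it runs without the CSV.

```python
import numpy as np
import pandas as pd

# y columns of Anscombe's quartet, typed in so the sketch runs on its own.
data = pd.DataFrame({
    "y1": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "y2": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "y3": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "y4": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
})

for col in ["y1", "y2", "y3", "y4"]:
    values = data[col].to_numpy()
    # np.var defaults to ddof=0 (divide by n); ddof=1 divides by n - 1.
    print(col, "population:", np.var(values), "sample:", np.var(values, ddof=1))

# pandas' DataFrame.var defaults to ddof=1 (sample variance).
y_vars = data[["y1", "y2", "y3", "y4"]].var()
print(y_vars)
```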




In [20]:

    
# Calculate and print mean of x1, x2, x3 and x4
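The x means follow the same pattern as the y means. In this sketch only the x columns are typed in, which is enough for the calculation.

```python
import numpy as np
import pandas as pd

# x columns of Anscombe's quartet (x1, x2 and x3 are identical).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
data = pd.DataFrame({"x1": x123, "x2": x123, "x3": x123,
                     "x4": [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]})

x_cols = ["x1", "x2", "x3", "x4"]
for col in x_cols:
    print(col, np.mean(data[col]))
```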




In [24]:

    
# Calculate and print variance of x1, x2, x3 and x4
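As with the y series, the `ddof` choice changes the answer, so this sketch reports both the sample (ddof=1) and population (ddof=0) variance of each x series. Only the x columns are typed in.

```python
import pandas as pd

# x columns of Anscombe's quartet (x1, x2 and x3 are identical).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
data = pd.DataFrame({"x1": x123, "x2": x123, "x3": x123,
                     "x4": [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]})

x_cols = ["x1", "x2", "x3", "x4"]
x_vars = data[x_cols].var()              # sample variance, divide by n - 1
x_pop_vars = data[x_cols].var(ddof=0)    # population variance, divide by n

# Side-by-side table: one row per x series, one column per convention.
print(pd.DataFrame({"sample (ddof=1)": x_vars,
                    "population (ddof=0)": x_pop_vars}))
```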




In [25]:

    
# Calculate the covariance between each pair of x and y series
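For covariance, `np.cov(x, y)` returns a 2x2 covariance matrix whose off-diagonal entry is cov(x, y), computed with the n - 1 denominator by default; pandas' `Series.cov` gives the same number directly. The sketch pairs x1 with y1, x2 with y2, and so on, and types in the full quartet so it runs on its own.

```python
import numpy as np
import pandas as pd

# Anscombe's quartet, typed in so the sketch runs without anscombe.csv.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
data = pd.DataFrame({
    "x1": x123,
    "y1": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "x2": x123,
    "y2": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "x3": x123,
    "y3": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "x4": [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
    "y4": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
})

for i in range(1, 5):
    x, y = data[f"x{i}"], data[f"y{i}"]
    # Off-diagonal entry [0, 1] of the 2x2 matrix is cov(x, y).
    cov_xy = np.cov(x, y)[0, 1]
    print(f"x{i}/y{i}: np.cov -> {cov_xy:.3f}, Series.cov -> {x.cov(y):.3f}")
```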




So, these datasets are almost identical, right?

Not so fast! Let's plot the data first. For each pair of x and y series, draw a scatter plot of y against x using matplotlib, whose pyplot module we imported as plt and which provides flexible, easy-to-use plotting. The first one has been done for you.
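Rather than three more single `plt.scatter` calls, one option is to loop over a 2x2 grid of subplots so the four data sets sit side by side. The grid layout, figure size, and shared axes are choices made for this sketch rather than requirements of the exercise, and the data is typed in so it runs without the CSV.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Anscombe's quartet, typed in so the sketch runs without anscombe.csv.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
data = pd.DataFrame({
    "x1": x123,
    "y1": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "x2": x123,
    "y2": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "x3": x123,
    "y3": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "x4": [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
    "y4": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
})

# 2x2 grid; shared axes keep all four panels on the same scale.
fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
for i, ax in enumerate(axes.flat, start=1):
    ax.scatter(data[f"x{i}"], data[f"y{i}"])
    ax.set_title(f"y{i} vs x{i}")
    ax.set_xlabel(f"x{i}")
    ax.set_ylabel(f"y{i}")
fig.tight_layout()
plt.show()
```

Because the axes are shared, any difference in shape between panels comes from the data rather than from different axis ranges.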



In [29]:

    
plt.scatter(data.x1, data.y1)











