In [ ]:
import pandas as pd
import seaborn as sbn
import statsmodels.api as sm
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('ggplot')
%matplotlib inline

In [ ]:
df = pd.read_csv("data/brain_size.csv", sep=";", index_col=0, na_values='.')
df.head()

Plotting options of pandas Series and DataFrames


In [ ]:
df.plot?

Exercise: Try different values of the kind parameter of the plot method (for example 'hist', 'box', 'kde', or 'scatter') and compare the results.
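As a starting point, here is a minimal sketch of how the kind parameter changes the output. It uses a small synthetic dataframe (the `demo` name and its random values are assumptions made for illustration, not the brain size data), so swap in `df` and its columns when you try it yourself.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in data (assumption): replace with df from the cell above
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Height": rng.normal(68, 4, size=40),
    "Weight": rng.normal(150, 20, size=40),
})

# One axes per plot kind so the plots do not overlay each other
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
demo["Height"].plot(kind="hist", bins=10, ax=axes[0], title="kind='hist'")
demo.plot(kind="box", ax=axes[1], title="kind='box'")
demo.plot(kind="scatter", x="Height", y="Weight", ax=axes[2], title="kind='scatter'")
fig.tight_layout()
```

Note that 'scatter' needs explicit x and y column names, while 'hist' and 'box' work on whichever columns are selected.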


In [ ]:
# scatter_matrix lives in pd.plotting (pd.tools.plotting was removed from pandas)
axes = pd.plotting.scatter_matrix(df[['Weight', 'Height', 'MRI_Count']])

In [ ]:
axes = pd.plotting.scatter_matrix(df[['FSIQ', 'PIQ', 'VIQ']])

Q: Do the clusters mean anything?

Exercise: Plot scatter matrices for males and females separately. What can you infer from the comparison?


In [ ]:
# Enter code here

Introducing Seaborn

Combining simple statistics with visualization


In [ ]:
df = pd.read_csv("data/wages.csv")
df.head()

In [ ]:
# pairplot returns a seaborn PairGrid, not a single matplotlib Axes
grid = sbn.pairplot(df, vars=['WAGE', 'AGE', 'EDUCATION'], kind="reg")

Q: What about categorical variables?


In [ ]:
# hue colors the points and regression lines by the categorical column
grid = sbn.pairplot(df, vars=['WAGE', 'AGE', 'EDUCATION'], kind="reg", hue="SEX")

Simple regression with lmplot


In [ ]:
# lmplot returns a seaborn FacetGrid with a fitted regression line
grid = sbn.lmplot(y="WAGE", x="EDUCATION", data=df)

Exercise:

1. Do wages depend on age or experience?

2. Answer the question above separately for men and women.


In [ ]:
# enter code here