Pandas Data Visualization Exercise


In [37]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df3 = pd.read_csv('df3')
%matplotlib inline

In [2]:
df3.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 4 columns):
a    500 non-null float64
b    500 non-null float64
c    500 non-null float64
d    500 non-null float64
dtypes: float64(4)
memory usage: 15.7 KB

In [3]:
df3.head()


Out[3]:
          a         b         c         d
0  0.336272  0.325011  0.001020  0.401402
1  0.980265  0.831835  0.772288  0.076485
2  0.480387  0.686839  0.000575  0.746758
3  0.502106  0.305142  0.768608  0.654685
4  0.856602  0.171448  0.157971  0.321231

Create a scatter plot of b vs. a.


In [8]:
# "b vs. a" puts b on the y-axis and a on the x-axis
df3.plot.scatter(x='a', y='b', figsize=(12,3), color='red')


Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x9ad6898>

Create a histogram of the 'a' column.


In [9]:
df3['a'].hist()


Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x961e278>

Use a style sheet to set the style to 'ggplot', then redo the histogram with more bins.


In [13]:
plt.style.use('ggplot')
df3['a'].plot.hist(bins=30)


Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0xb5a5dd8>
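
To see what the bins argument changes, here is a minimal pure-Python sketch of equal-width binning. It is not how pandas or matplotlib implement it internally, only the same idea; the helper name histogram_counts and the random sample standing in for df3['a'] are assumptions made for illustration.

```python
import random


def histogram_counts(values, bins=30):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; the bin width would be zero")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp the index so the maximum value lands in the last bin
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


random.seed(0)
# Hypothetical stand-in for df3['a']: 500 uniform floats in [0, 1)
sample = [random.random() for _ in range(500)]

counts_10 = histogram_counts(sample, bins=10)  # coarse, smoother bars
counts_30 = histogram_counts(sample, bins=30)  # finer, noisier bars
print(counts_10)
print(counts_30)
```

More bins means narrower intervals and fewer points per bar, so the bins=30 histogram shows more detail but also more noise than the default of 10.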

Create a boxplot comparing the a and b columns.


In [25]:
df3[['a', 'b']].plot.box()


Out[25]:
<matplotlib.axes._subplots.AxesSubplot at 0xcb31978>
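
As a rough guide to reading the box plot, the sketch below computes the numbers a box usually encodes: the quartiles, and whiskers that reach the furthest points within 1.5 times the interquartile range (matplotlib's default whis value). The helper name box_stats and the random sample are assumptions for illustration, and edge cases that matplotlib handles are ignored.

```python
import random
import statistics


def box_stats(values, whis=1.5):
    """Return the quartiles, whisker ends, and outliers for one box."""
    data = sorted(values)
    # method="inclusive" interpolates linearly between data points
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    # Whiskers stop at the most extreme data points still inside the fences
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


random.seed(0)
# Hypothetical stand-in for a column such as df3['a']
sample = [random.random() for _ in range(500)]
stats = box_stats(sample)
print(stats["q1"], stats["median"], stats["q3"])
```

Points returned in "outliers" are the ones a box plot would draw as individual markers beyond the whiskers.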

Create a KDE plot of the 'd' column.


In [26]:
df3['d'].plot.kde()


Out[26]:
<matplotlib.axes._subplots.AxesSubplot at 0xcb4aa58>
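
For intuition about what the KDE curve represents, here is a small pure-Python sketch of a one-dimensional Gaussian kernel density estimate. It assumes Scott's rule for the bandwidth, which is the documented default of plot.kde when bw_method is not given. The function name gaussian_kde_1d and the random sample are hypothetical, and the real plot is computed by SciPy rather than by this code.

```python
import math
import random
import statistics


def gaussian_kde_1d(values):
    """Return a density function built from one Gaussian bump per data point."""
    n = len(values)
    # Scott's rule in one dimension: sample std times n^(-1/5)
    h = statistics.stdev(values) * n ** (-1 / 5)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)

    return density


random.seed(0)
# Hypothetical stand-in for df3['d']
sample = [random.random() for _ in range(500)]
density = gaussian_kde_1d(sample)

# Evaluate on a grid from -1 to 2 and check that the curve integrates to about 1
step = 3 / 600
grid = [-1 + i * step for i in range(601)]
area = sum(density(x) for x in grid) * step
print(round(area, 3))
```

Because each point contributes a smooth bump, the curve extends slightly past the minimum and maximum of the data, which is why a KDE of values in [0, 1) still shows density just outside that range.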

Increase the line width and make the line style dashed.


In [29]:
df3['d'].plot.kde(lw=5, ls='--')


Out[29]:
<matplotlib.axes._subplots.AxesSubplot at 0xd5d26a0>

Create an area plot of all the columns, using only rows 0 through 30.


In [40]:
# .ix was removed in pandas 1.0; .loc keeps the same inclusive label slice
df3.loc[0:30].plot.area(alpha=0.4)


Out[40]:
<matplotlib.axes._subplots.AxesSubplot at 0xf0dfba8>

Display the legend outside the plot frame.


In [49]:
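f = plt.figure()
# .ix was removed in pandas 1.0; .loc keeps the same inclusive label slice
df3.loc[0:30].plot.area(alpha=0.4, ax=f.gca())
# Anchor the legend's center-left point at the right edge of the axes
plt.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
plt.show()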