Introduction to Python for Data Sciences
Franck Iutzeler, Fall 2018
Seaborn is a package that produces somewhat nicer and more data-oriented plots than Matplotlib. It also gives a fresher look to Matplotlib plots.
In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
In [3]:
# Create some data
rng = np.random.RandomState(0)
x = np.linspace(0, 10, 500)
y = np.cumsum(rng.randn(500, 3), 0)
In [4]:
plt.plot(x, y)
plt.legend('one two three'.split(' '));
Let us import seaborn and change the Matplotlib style with sns.set().
In [5]:
import seaborn as sns
sns.set()
In [6]:
# Same command but now seaborn is set
plt.plot(x, y)
plt.legend('one two three'.split(' '));
In [22]:
data = np.random.multivariate_normal([0, 1.5], [[1, 0.2], [0.2, 2]], size=2000)
data = pd.DataFrame(data, columns=['x', 'y'])
for col in 'xy':
    plt.hist(data[col], alpha=0.5)  # alpha=0.5 makes the histograms semi-transparent
kdeplot draws density plots from an array or a Series (shade=True produces filled ones).
In [23]:
sns.kdeplot(data['x'])
sns.kdeplot(data['y'], shade=True)
Out[23]:
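To make explicit what a density plot represents, here is a minimal sketch of a one-dimensional Gaussian kernel density estimate written with NumPy only. The helper name gaussian_kde_1d and the bandwidth rule (Scott's rule of thumb) are assumptions made for this illustration; the bandwidth that seaborn picks by default may differ.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`.

    Hypothetical helper for illustration, not seaborn's implementation.
    """
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Scott's rule of thumb in one dimension (an assumption of this sketch)
        bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump per sample, evaluated at every grid point
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    # Average the bumps and rescale so that the estimate integrates to 1
    return kernels.sum(axis=1) / (n * bandwidth)

rng = np.random.RandomState(0)
samples = rng.randn(2000)
grid = np.linspace(-6, 6, 601)
density = gaussian_kde_1d(samples, grid)

# Riemann-sum check that the estimate behaves like a probability density
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

Plotting density against grid with plt.plot should give a curve of the same kind as the one drawn by kdeplot, up to the bandwidth choice.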
distplot combines the two previous plots: it draws a histogram together with a density estimate.
In [24]:
sns.distplot(data['x'])
sns.distplot(data['y'])
Out[24]:
A two-dimensional dataset can be represented by level sets with kdeplot.
In [32]:
sns.kdeplot(data['x'], data['y'], shade=True, shade_lowest=False, cmap="Reds", cbar=True)
Out[32]:
The joint distribution and the marginal distributions can be displayed together using jointplot.
In [33]:
sns.jointplot("x", "y", data, kind='kde');
In [34]:
import pandas as pd
import numpy as np
iris = pd.read_csv('data/iris.csv')
print(iris.shape)
iris.head()
Out[34]:
In [37]:
sns.pairplot(iris, hue='species')
Out[37]:
factorplot also provides error plots, for instance box plots of a numeric column grouped by category (kind="box").
In [47]:
sns.factorplot(x="species", y="sepal_length", data=iris, kind="box")
Out[47]:
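To read such a box plot, it helps to know which numbers it encodes. Here is a minimal sketch with NumPy, assuming the common convention where the box spans the first to third quartile and the whiskers reach the most extreme points within 1.5 times the interquartile range (we believe this is the default in Matplotlib and seaborn). The values are made up for the illustration and are not taken from the iris file.

```python
import numpy as np

# Hypothetical measurements for a single group
values = np.array([4.9, 5.0, 5.1, 5.4, 5.8, 6.3, 7.9])

# The box: first quartile, median, third quartile
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# The whiskers: most extreme data points within 1.5 * IQR of the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Points beyond the fences are drawn individually as outliers
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(q1, median, q3)
print(whisker_low, whisker_high, outliers)
```

With these numbers, the largest value lies beyond the upper fence and would be drawn as an isolated point.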
For displaying classification data, it is sometimes useful to melt dataframes, that is, to separate each row into an identifier, a variable name, and a value.
The command pd.melt returns a dataframe whose columns are the id, the variable (former column) name, and the associated value.
In [50]:
irisS = pd.melt(iris, id_vars="species", value_vars=["sepal_length", "sepal_width", "petal_length", "petal_width"])
irisS.head()
Out[50]:
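The reshaping is easier to follow on a table small enough to read in full. Here is a minimal sketch on a two-row dataframe; the values and the names wide and long_df are made up for the illustration.

```python
import pandas as pd

# Hypothetical wide table: one row per flower, one column per measurement
wide = pd.DataFrame({
    "species": ["setosa", "virginica"],
    "sepal_length": [5.1, 6.3],
    "petal_length": [1.4, 6.0],
})

# Long table: one row per (flower, measurement) pair
long_df = pd.melt(wide, id_vars="species",
                  value_vars=["sepal_length", "petal_length"])
print(long_df)
```

Each original row yields one row per melted column, so the two flowers with two measurements become four rows with the columns species, variable and value.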
In [57]:
sns.factorplot(x="species", y="value", col="variable", data=irisS, kind="box")
Out[57]: