## Bivariate Kernel Density Estimation

### Bivariate Kernel Density Estimation using Seaborn, Statsmodels, and Scikit-learn Python libraries

This notebook briefly demonstrates basic usage of the three different functions, from three different python libraries, for managing bivariate kernel density estimation:

• kdeplot from seaborn
• KDEMultivariate from statsmodels
• KernelDensity from sklearn

Furthermore, triangulation with mesh refinement (from matplotlib) is employed (instead of the meshgrid approach) for the evaluation of the kernel density. This triangulation can be used with KDEMultivariate and KernelDensity functions, as demonstrated hereafter. Using trinagulation ought to be faster and more adaptable (with refinement) than the more-classical meshgrid approach. Finally, grid search and cross-validation is used with KernelDensity from sklearn, for the optimal bandwidth selection.

``````

In [1]:

from __future__ import print_function

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.tri as tri
from statsmodels.nonparametric.kernel_density import KDEMultivariate
from sklearn.neighbors import KernelDensity
from sklearn.grid_search import GridSearchCV
import seaborn as sns

# Seaborn style
sns.set(style='darkgrid', font_scale=1.2)

# Inline figures
%matplotlib inline

``````

Input data (random) for kernel density estimation

``````

In [2]:

# Generate some input data from N(0,1)
npts = 600
x = np.random.randn(npts)
y = np.random.randn(npts)

``````

#### Kernel Density Estimation using Seaborn

``````

In [3]:

# Seaborn kdeplot
fig, ax = plt.subplots(figsize=(7,7))
ax.scatter(x, y, s=10, c='darkgrey', marker='o', alpha=0.5)
ax.set_title('Bivariate Standardized Normal Distribution')
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.show()

``````
``````

``````

#### Kernel Density Estimation using Statsmodels

``````

In [4]:

# Kernel density estimation (statsmodels)
model = KDEMultivariate([x, y], 'cc', bw='cv_ml')
# cross validation maximum likelihood bandwidth selection
print('Bandwidth:', model.bw)
zi = model.pdf()

``````
``````

Bandwidth: [ 0.46000813  0.34701119]

``````
``````

In [11]:

# Triangulation with refinement
triang = tri.Triangulation(x, y)
refiner = tri.UniformTriRefiner(triang)
tri_refi, z_refi = refiner.refine_field(zi, subdiv=3)  # three levels of subdivisions

``````
``````

In [6]:

# Graphical visualization
fig, ax = plt.subplots(figsize=(7,7))
ax.set_title('Bivariate Standardized Normal Distribution')
ax.triplot(triang, color='white', lw=0.5)
cax = ax.tricontourf(tri_refi, z_refi, 16, cmap=plt.cm.GnBu)
ax.tricontour(tri_refi, z_refi, 16, colors=['0.25', '0.5', '0.5', '0.5'],
linewidths=[1.0, 0.5, 1.0, 0.5])
cbar = fig.colorbar(cax)
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.tight_layout()
plt.show()

``````
``````

``````

#### Kernel Density Estimation using scikit-learn

``````

In [7]:

# Kernel density estimation (scikit-learn)
data = np.vstack((x, y)).T

# Using grid search cross-validation to optimize the bandwidth
params = {'bandwidth': np.logspace(-1, 1, 20)}
grid = GridSearchCV(KernelDensity(), params)
grid.fit(data)
bw = grid.best_estimator_.bandwidth
print("Best bandwidth: {:g}".format(bw))

# Model
model = KernelDensity(bandwidth=bw, algorithm='auto', kernel='gaussian')
model.fit(data)

# Evaluate the density model on the data
logdata = model.score_samples(data)
zdata = np.exp(logdata)

``````
``````

Best bandwidth: 0.428133

``````
``````

In [8]:

# Triangulation
triang = tri.Triangulation(x, y)
# Triangulation refinment
refiner = tri.UniformTriRefiner(triang)
tri_refi, z_refi = refiner.refine_field(zdata, subdiv=3)

``````
``````

In [9]:

# Graphical visualization
fig, ax = plt.subplots(figsize=(7,7))
ax.set_title('Bivariate Standardized Normal Distribution')
cax = ax.tricontourf(tri_refi, z_refi, 16, cmap=plt.cm.GnBu)
ax.tricontour(tri_refi, z_refi, 16, colors=['0.25', '0.5', '0.5', '0.5'],
linewidths=[1.0, 0.5, 1.0, 0.5])
ax.scatter(x, y, s=10, c='darkgrey', marker='o', alpha=0.5)
cbar = fig.colorbar(cax)
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.tight_layout()
plt.show()

``````
``````

``````
``````

In [10]:

# Generating random samples from the Kernel model (scikit-learn)
samples = model.sample(10)
print('Random samples:\n',samples)

``````
``````

Random samples:
[[-1.6289274   1.40016955]
[ 2.77335275 -1.03678107]
[-0.13370906 -1.66372754]
[-0.50445188  1.57188034]
[ 1.7567684   0.61192732]
[ 0.08484377 -1.18525784]
[-0.8224073   0.04602298]
[ 0.59912065  0.35647967]
[ 0.43950839  0.40743642]
[-0.46963738 -0.08988496]]

``````
``````

In [ ]:

``````