lpproj
is a Python implementation of Locality Preserving Projections, built to be compatible with scikit-learn. It can be installed with pip; e.g.
pip install lpproj
For more information, see http://github.com/jakevdp/lpproj
This notebook contains a very short example showing the use of the code.
In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
In [2]:
from sklearn.datasets import make_blobs
X, y = make_blobs(1000, n_features=300, centers=4,
cluster_std=8, random_state=42)
In [3]:
fig, ax = plt.subplots(2, 2, figsize=(10, 10))
rand = np.random.RandomState(42)
for axi in ax.flat:
i, j = rand.randint(X.shape[1], size=2)
axi.scatter(X[:, i], X[:, j], c=y)
In [4]:
from lpproj import LocalityPreservingProjection
lpp = LocalityPreservingProjection(n_components=2)
X_2D = lpp.fit_transform(X)
Plotting this projection, we confirm that it has kept nearby points together, as represented by the distinct clusters visible in the projection:
In [5]:
plt.scatter(X_2D[:, 0], X_2D[:, 1], c=y)
plt.title("Projected from 500->2 dimensions");
For more information, see the Locality Preserving Projection website
In [6]:
from sklearn.decomposition import PCA
Xpca = PCA(n_components=2).fit_transform(X)
plt.scatter(Xpca[:, 0], Xpca[:, 1], c=y);
It is important to keep in mind, though, that these are two fundamentally different models: PCA finds a linear projection which maximizes the preserved variance in the data. LPP finds a linear projection which maximizes the preserved locality in the data.
The difference between the two can be made more clear by looking at data with different properties. One example is data with outliers. Because PCA is a variance-based method, it is strongly affected by the presence of outliers. LPP, on the other hand, focuses on preserving local neighborhoods and thus outliers do not have as strong an effect.
First we'll add ten outliers to our original data:
In [7]:
rand = np.random.RandomState(42)
Xnoisy = X.copy()
Xnoisy[:10] += 1000 * rand.randn(10, X.shape[1])
Now we can compute the PCA and LPP projections and view the results:
In [8]:
Xpca = PCA(n_components=2).fit_transform(Xnoisy)
Xlpp = LocalityPreservingProjection(n_components=2).fit_transform(Xnoisy)
fig, ax = plt.subplots(1, 2, figsize=(16, 5))
ax[0].scatter(Xlpp[:, 0], Xlpp[:, 1], c=y)
ax[0].set_title('LPP with outliers')
ax[1].scatter(Xpca[:, 0], Xpca[:, 1], c=y)
ax[1].set_title('PCA with outliers');
In the presence of outliers, the projection found by LPP is much more useful than the projection found by PCA.