t-SNE (t-distributed stochastic neighbor embedding) is a nonlinear dimensionality reduction technique for high-dimensional data.
More info in the usual place: https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding
In [ ]:
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
import numpy
import pickle
from dscribe.descriptors import MBTR
from visualise import view
We are going to apply this technique to a database of wine samples. The inputs are 13 chemical descriptors per wine; the output is its class index (cheap, ok, good). In principle we do not know the outputs.
In [ ]:
dataIn = numpy.genfromtxt('./data/wineInputs.txt', delimiter=',')
dataOut = numpy.genfromtxt('./data/wineOutputs.txt', delimiter=',')
# find the indices of the wines in each class
idx1 = numpy.where(dataOut == 1)[0]
idx2 = numpy.where(dataOut == 2)[0]
idx3 = numpy.where(dataOut == 3)[0]
In [ ]:
# compute the t-SNE transformation of the inputs in 2 dimensions
comp = TSNE(n_components=2).fit_transform(dataIn)
# plot the resulting 2D points
plt.plot(comp[:,0],comp[:,1],'ro')
plt.xlabel('X1')
plt.ylabel('X2')
plt.show()
The transform had no knowledge of the output classes, and yet three clusters of points can be seen. We can overlay the known correct classification to check whether the clusters correspond to what we know:
In [ ]:
plt.plot(comp[idx1,0],comp[idx1,1],'go')
plt.plot(comp[idx2,0],comp[idx2,1],'ro')
plt.plot(comp[idx3,0],comp[idx3,1],'bo')
plt.xlabel('X1')
plt.ylabel('X2')
plt.show()
In [ ]:
import ase.io
# load the database
samples = ase.io.read("data/clusters.extxyz", index=':')
# samples is now a list of ASE Atoms objects, ready to use!
# the first 55 clusters are FCC, the last 55 are BCC
# define MBTR setup
mbtr = MBTR(
    species=["Fe"],
    periodic=False,
    k2={
        "geometry": {"function": "distance"},
        "grid": {"min": 0, "max": 2, "sigma": 0.01, "n": 200},
        "weighting": {"function": "exp", "scale": 0.4, "cutoff": 1e-2},
    },
    k3={
        "geometry": {"function": "cosine"},
        "grid": {"min": -1.0, "max": 1.0, "sigma": 0.02, "n": 200},
        "weighting": {"function": "exp", "scale": 0.4, "cutoff": 1e-2},
    },
    flatten=True,
    sparse=False,
)
# calculate MBTR descriptor for each sample - takes a few secs
mbtrs = mbtr.create(samples)
print(mbtrs.shape)
Plot the t-SNE projection of the MBTR output and check whether the two classes of structures separate clearly
In [ ]:
# ...
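# One possible sketch (not the only solution): project the MBTR vectors with t-SNE
# and colour the points using the known ordering (first 55 samples FCC, last 55 BCC).
# The perplexity value is an arbitrary choice here, feel free to change it.
mbtr_comp = TSNE(n_components=2, perplexity=30).fit_transform(mbtrs)
plt.plot(mbtr_comp[:55, 0], mbtr_comp[:55, 1], 'go', label='FCC')
plt.plot(mbtr_comp[55:, 0], mbtr_comp[55:, 1], 'ro', label='BCC')
plt.xlabel('X1')
plt.ylabel('X2')
plt.legend()
plt.show()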
Plot the original MBTR descriptors and see if the structural differences are visible there
In [ ]:
# ...
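# One possible sketch: plot each flattened MBTR vector as a curve,
# FCC clusters (first 55) in green, BCC clusters (last 55) in red.
for i in range(mbtrs.shape[0]):
    plt.plot(mbtrs[i], 'g-' if i < 55 else 'r-', linewidth=0.5)
plt.xlabel('MBTR feature index')
plt.ylabel('MBTR value')
plt.show()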
Try changing the MBTR and t-SNE parameters and see how the projection changes
In [ ]:
# ...
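# One possible sketch for the t-SNE side: rerun the projection with a few
# different perplexity values (chosen arbitrarily here) and compare the maps.
# For the MBTR side, rebuild the descriptor above with e.g. a different sigma
# or grid size, recompute mbtrs, and project again.
for perp in [5, 15, 50]:
    comp_p = TSNE(n_components=2, perplexity=perp).fit_transform(mbtrs)
    plt.plot(comp_p[:55, 0], comp_p[:55, 1], 'go')
    plt.plot(comp_p[55:, 0], comp_p[55:, 1], 'ro')
    plt.title('perplexity = %d' % perp)
    plt.show()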