nltk.PunktSentenceTokenizer -> cleaned, parsed, tokenized strings
sklearn.TfidfVectorizer -> TF-IDF 2D matrix
sklearn.TruncatedSVD -> SVD N-D matrix

Now, we can:
sklearn.MiniBatchKMeans (ridiculously fast, coupled with grid search for hyperparameters)
sklearn.AffinityPropagation
sklearn.RadiusNeighborsClassifier

nltk and scikit-learn allowed rapid development and testing.
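
The pipeline above can be sketched roughly as follows. This is a minimal illustration, not the original code: the toy corpus, the choice of 2 SVD components and 2 clusters, and the fixed random seeds are all assumptions made for the example. The nltk tokenization step is left out, and TfidfVectorizer's built-in tokenizer stands in for it.

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (hypothetical data, for illustration only).
docs = [
    "the cat sat on the mat",
    "a cat and a kitten played on the mat",
    "stock markets fell sharply today",
    "investors sold stock as markets fell",
]

# Strings -> sparse TF-IDF matrix of shape (n_docs, n_terms).
# Assumption: the default tokenizer replaces the nltk step here.
tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)

# TF-IDF -> dense matrix of shape (n_docs, n_components).
svd = TruncatedSVD(n_components=2, random_state=0)
reduced = svd.fit_transform(tfidf)

# Cluster the reduced vectors; n_clusters=2 is an assumed value
# that would normally be chosen by a hyperparameter search.
km = MiniBatchKMeans(n_clusters=2, random_state=0, n_init=3)
labels = km.fit_predict(reduced)

print(tfidf.shape, reduced.shape, labels)
```

On a real corpus the number of SVD components and clusters would be much larger, and a custom `tokenizer=` callable could be passed to TfidfVectorizer to plug the nltk output in.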