Template for guiding principal component analysis (PCA) using Python's scikit-learn (sklearn) library. Here sklearn is imported under its own name and everything is called by its full path, for example sklearn.decomposition.PCA. The common convention I typically see is from sklearn import [module] or from sklearn.module import [function]. In my experience this can get pretty confusing, especially if custom tools are imported in a similar manner.
Typical imports with the common from ... import convention, shown for comparison:
from sklearn.feature_selection import SelectKBest
from sklearn.decomposition import PCA
from sklearn import preprocessing
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn import model_selection
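To make the two import styles concrete, here is a minimal sketch that reaches the same class both ways. The variable names short_pca and full_pca are only for illustration. Importing the submodule explicitly (import sklearn.decomposition) is used here because, depending on the installed version, a bare import sklearn may not load submodules on its own.

```python
# Style 1: pull the name directly into the notebook's namespace
from sklearn.decomposition import PCA

# Style 2 (full path): import the submodule, then spell out where the name lives
import sklearn.decomposition

# Both spellings refer to the same class; only the call site reads differently
short_pca = PCA(n_components=2)
full_pca = sklearn.decomposition.PCA(n_components=2)
print(PCA is sklearn.decomposition.PCA)
```

With the full path, a reader can tell at the call site that PCA comes from sklearn and not from a custom module with a similarly named tool.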
This IPython notebook demonstrates the workflow:
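import numpy as np
import sklearn.preprocessing
import sklearn.decomposition
import sklearn.pipeline
import sklearn.model_selection
scaled_pca_data = sklearn.preprocessing.MinMaxScaler().fit_transform(my_features)  # my_features: your feature matrix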
perc_var = 0.95
pca = sklearn.decomposition.PCA(n_components = perc_var)
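As a rough illustration of what a fractional n_components does: when the value lies between 0 and 1, PCA keeps the smallest number of components whose cumulative explained variance ratio exceeds that fraction. The sketch below runs on made-up data (demo_features, built from three hypothetical latent factors) and passes svd_solver='full' explicitly. Neither is part of the original template.

```python
import numpy as np
import sklearn.decomposition

rng = np.random.default_rng(0)

# Hypothetical data: 200 samples, 10 features driven by 3 latent factors plus small noise
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 10))
demo_features = latent @ mixing + 0.05 * rng.normal(size=(200, 10))

perc_var = 0.95
pca = sklearn.decomposition.PCA(n_components=perc_var, svd_solver='full')
reduced = pca.fit_transform(demo_features)

# Cumulative share of variance explained by the components that were kept
cumulative = np.cumsum(pca.explained_variance_ratio_)
print(pca.n_components_)  # number of components actually retained
print(cumulative)         # last entry is the first to exceed perc_var
```

Checking n_components_ after fitting is a quick way to see how much the feature space was actually reduced.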
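# Pipeline with the pre-configured PCA step followed by an estimator of your choice
my_pipe = sklearn.pipeline.Pipeline(steps=[('pca', pca),
                                           ('my_estimator', my_estimator)])
# Alternative: leave PCA unconfigured and let the grid search set its parameters
estimator = [('reduce_dim', sklearn.decomposition.PCA()),
             ('dec-tree', base_estimator)]
my_estimator_pipe = sklearn.pipeline.Pipeline(estimator)
# Grid keys follow the '<step name>__<parameter name>' pattern
my_params = {'reduce_dim__n_components': [perc_var],
             # 'dec-tree__<parameter name>': a list or tuple of candidate values
             }
my_grid_search = sklearn.model_selection.GridSearchCV(my_estimator_pipe,
                                                      param_grid=my_params,
                                                      scoring=my_scoring_function,
                                                      cv=my_cross_validator)
my_grid_search.fit(features, labels)
# Attributes set during fitting end with an underscore
my_best_estimator = my_grid_search.best_estimator_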
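The template above leaves the data, estimator, scorer and cross-validator as placeholders. The following is a minimal end-to-end sketch with those gaps filled by assumptions of mine: synthetic data from make_classification, a DecisionTreeClassifier as the base estimator (suggested only by the 'dec-tree' step name), 'f1' scoring, and a 5-fold StratifiedKFold. It also moves the MinMaxScaler inside the pipeline, so the scaler is refit on each training fold. That is a design choice, not something the original notebook does.

```python
import sklearn.datasets
import sklearn.decomposition
import sklearn.model_selection
import sklearn.pipeline
import sklearn.preprocessing
import sklearn.tree

# Synthetic stand-in for (features, labels); not from the original notebook
features, labels = sklearn.datasets.make_classification(
    n_samples=300, n_features=12, n_informative=5, random_state=0)

# Scale, reduce dimensionality, then classify
demo_pipe = sklearn.pipeline.Pipeline([
    ('scale', sklearn.preprocessing.MinMaxScaler()),
    ('reduce_dim', sklearn.decomposition.PCA(svd_solver='full')),
    ('dec-tree', sklearn.tree.DecisionTreeClassifier(random_state=0)),
])

# Keys use '<step name>__<parameter name>'; the values are candidate settings
demo_params = {
    'reduce_dim__n_components': [0.90, 0.95],
    'dec-tree__max_depth': [3, 5, None],
}

demo_cv = sklearn.model_selection.StratifiedKFold(
    n_splits=5, shuffle=True, random_state=0)
demo_search = sklearn.model_selection.GridSearchCV(
    demo_pipe, param_grid=demo_params, scoring='f1', cv=demo_cv)
demo_search.fit(features, labels)

# The refit pipeline with the best parameter combination
best = demo_search.best_estimator_
print(demo_search.best_params_)
print(best.named_steps['reduce_dim'].n_components_)
```

Because the grid search refits the whole pipeline on the full data by default, best can be used directly for prediction with best.predict.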