In [ ]:
# 1. Import sklearn, import pandas as pd, and pd.read_csv the CFPB CSV file into dataframe 'df'.

In [ ]:
# 2. Filter your df down to 'Product', 'Consumer Claim', 'Amount Received' using [[]] notation. Which is our target?

In [ ]:
# 3. From sklearn.model_selection import train_test_split (the old sklearn.cross_validation module was removed in scikit-learn 0.20). Make an 80/20 train/test split (we won't use it though).

In [ ]:
# 4. Assign df[['Consumer Claim', 'Amount Received']] to 'X'

In [ ]:
# 5. Convert to raw values df['Product'].values and assign to 'y'
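Steps 1 through 5 can be sketched as below. This is a minimal illustration, not the exercise's answer key: the in-memory CSV text, its values, and the extra `State` column are invented stand-ins for the real CFPB file, which is not reproduced here.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the CFPB CSV file; every value here is invented.
csv_text = """Product,Consumer Claim,Amount Received,State
Credit card,50,40,NY
Credit card,120,100,CA
Credit card,200,150,TX
Credit card,80,60,FL
Credit card,300,250,WA
Mortgage,4000,3500,NY
Mortgage,5200,4100,CA
Mortgage,6000,5000,TX
Mortgage,4500,3000,FL
Mortgage,7000,6500,WA
"""

# Step 1: with the real file you would pass its path to pd.read_csv instead.
df = pd.read_csv(io.StringIO(csv_text))

# Step 2: keep only the three columns of interest; 'Product' is the target.
df = df[['Product', 'Consumer Claim', 'Amount Received']]

# Steps 4 and 5: features as a DataFrame, target as a raw NumPy array.
X = df[['Consumer Claim', 'Amount Received']]
y = df['Product'].values

# Step 3: an 80/20 split (random_state is fixed only to make the run repeatable).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)
```

The split is done after `X` and `y` exist because `train_test_split` needs the arrays it is splitting.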

In [ ]:
# 6. From sklearn.preprocessing import StandardScaler. From sklearn.pipeline import Pipeline.

In [ ]:
# 7. From sklearn.neighbors import KNeighborsClassifier. Make a scaler/knn pipeline.

In [ ]:
# 8. Fit your pipeline with your X and y.

In [ ]:
# 9. Use your newly fitted pipeline to predict classifications for [[100, 80], [5000, 4000], [350, 900]].
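One possible shape for steps 6 through 9 is sketched here. The training rows are invented placeholders for the CFPB data, and `n_neighbors=3` and the step names `'scaler'` and `'knn'` are illustrative choices, not values given by the exercise.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Invented placeholder data: two products with clearly different dollar amounts.
X = pd.DataFrame({
    'Consumer Claim': [50, 120, 200, 80, 300, 4000, 5200, 6000, 4500, 7000],
    'Amount Received': [40, 100, 150, 60, 250, 3500, 4100, 5000, 3000, 6500],
})
y = ['Credit card'] * 5 + ['Mortgage'] * 5

# Scale first, then classify; the pipeline applies both steps in order.
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('knn', KNeighborsClassifier(n_neighbors=3)),  # assumed k, not from the exercise
])
pipe.fit(X, y)

# Wrap the new points in a DataFrame with matching column names so the
# pipeline sees the same feature names it was fitted on.
new_points = pd.DataFrame(
    [[100, 80], [5000, 4000], [350, 900]], columns=X.columns
)
predictions = pipe.predict(new_points)
print(predictions)
```

Scaling matters for k-nearest neighbors because distances are otherwise dominated by whichever column has the larger numeric range.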

In [ ]:
# 10. From sklearn.model_selection import cross_val_score (the old sklearn.cross_validation module was removed in scikit-learn 0.20). Run cross val score on your pipeline.

In [ ]:
# 11. Get the mean of cross validation scores from your pipeline.
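Steps 10 and 11 might look like the sketch below. The data is again an invented placeholder, and `cv=5` is an assumed fold count; with the real CFPB data the scores will differ.

```python
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Invented placeholder data standing in for the CFPB columns.
X = pd.DataFrame({
    'Consumer Claim': [50, 120, 200, 80, 300, 4000, 5200, 6000, 4500, 7000],
    'Amount Received': [40, 100, 150, 60, 250, 3500, 4100, 5000, 3000, 6500],
})
y = ['Credit card'] * 5 + ['Mortgage'] * 5

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('knn', KNeighborsClassifier(n_neighbors=3)),
])

# cross_val_score refits a clone of the pipeline on each training fold, so the
# scaler never sees the held-out rows. It returns one accuracy value per fold.
scores = cross_val_score(pipe, X, y, cv=5)
mean_score = scores.mean()
print(scores, mean_score)
```

Passing the whole pipeline, rather than pre-scaled data, keeps the scaling step inside each fold and avoids leaking test-fold statistics into training.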

In [ ]:
# 12. Now repeat with Support Vector Machine Classifier (sklearn.svm.SVC) pipeline. Which yields better results?
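A rough sketch of the comparison in step 12 follows. The helper `mean_cv_accuracy` is a hypothetical convenience function, the data is invented, and `SVC()` is used with its default settings; which model does better on the real CFPB data is left for you to find.

```python
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Invented placeholder data standing in for the CFPB columns.
X = pd.DataFrame({
    'Consumer Claim': [50, 120, 200, 80, 300, 4000, 5200, 6000, 4500, 7000],
    'Amount Received': [40, 100, 150, 60, 250, 3500, 4100, 5000, 3000, 6500],
})
y = ['Credit card'] * 5 + ['Mortgage'] * 5


def mean_cv_accuracy(classifier, X, y, cv=5):
    """Hypothetical helper: scale, classify, and return mean cross-validated accuracy."""
    pipe = Pipeline([('scaler', StandardScaler()), ('clf', classifier)])
    return cross_val_score(pipe, X, y, cv=cv).mean()


# Same scaler and same folds for both models, so only the classifier changes.
results = {
    'knn': mean_cv_accuracy(KNeighborsClassifier(n_neighbors=3), X, y),
    'svc': mean_cv_accuracy(SVC(), X, y),
}
for name, score in results.items():
    print(name, round(score, 3))
```

Keeping everything but the final estimator identical makes the two mean scores directly comparable.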