See also: Scikit-Learn Quick Start
See also: Scikit-Learn Tutorials
See also: Scikit-Learn User Guide
See also: Scikit-Learn API Reference
See also: Scikit-Learn Support Vector Machines
See also: Scikit-Learn Nearest Neighbors
In [ ]:
# 1. Import sklearn, import pandas as pd, and pd.read_csv the CFPB CSV file into dataframe 'df'.
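# A minimal sketch for this cell (the CSV filename is an assumption; point it at your local CFPB export):
import sklearn
import pandas as pd

df = pd.read_csv('consumer_complaints.csv')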
In [ ]:
# 2. Filter your df down to 'Product', 'Consumer Claim', 'Amount Received' using [[]] notation. Which is our target?
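# One possible answer: keep only the three columns. 'Product' is the target we want to predict.
df = df[['Product', 'Consumer Claim', 'Amount Received']]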
In [ ]:
# 3. From sklearn.model_selection import train_test_split. Make an 80/20 train/test split (we won't use it, though).
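# A minimal sketch; random_state=42 is an arbitrary choice for reproducibility:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    df[['Consumer Claim', 'Amount Received']], df['Product'],
    test_size=0.2, random_state=42)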
In [ ]:
# 4. Assign df[['Consumer Claim', 'Amount Received']] to 'X'
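# Sketch: the two dollar-amount columns become the feature matrix.
X = df[['Consumer Claim', 'Amount Received']]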
In [ ]:
# 5. Convert to raw values df['Product'].values and assign to 'y'
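# Sketch: .values gives the raw NumPy array of product labels.
y = df['Product'].values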
In [ ]:
# 6. From sklearn.preprocessing import StandardScaler. From sklearn.pipeline import Pipeline.
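# Just the two imports requested here; they are used in the next step.
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline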
In [ ]:
# 7. From sklearn.neighbors import KNeighborsClassifier. Make a scaler/KNN pipeline.
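# One way to build it (the step names 'scaler' and 'knn' are arbitrary labels):
from sklearn.neighbors import KNeighborsClassifier

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('knn', KNeighborsClassifier()),
])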
In [ ]:
# 8. Fit your pipeline with your X and y.
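# Sketch, assuming X and y from steps 4-5 and the pipeline from step 7:
pipe.fit(X, y)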
In [ ]:
# 9. Use your newly fitted pipeline to predict classifications for [[100, 80], [5000, 4000], [350, 900]].
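# Sketch: each inner list is one (Consumer Claim, Amount Received) pair to classify.
pipe.predict([[100, 80], [5000, 4000], [350, 900]])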
In [ ]:
# 10. From sklearn.model_selection import cross_val_score. Run cross_val_score on your pipeline.
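# Sketch: cross-validate the whole pipeline (cv=5 is an arbitrary choice of fold count):
from sklearn.model_selection import cross_val_score

knn_scores = cross_val_score(pipe, X, y, cv=5)
knn_scores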
In [ ]:
# 11. Get the mean of the cross-validation scores from your pipeline.
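# Sketch: average the fold scores from step 10.
knn_scores.mean()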
In [ ]:
# 12. Now repeat with Support Vector Machine Classifier (sklearn.svm.SVC) pipeline. Which yields better results?
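# Sketch: same pipeline shape with SVC swapped in; compare its mean score to the KNN mean from step 11.
from sklearn.svm import SVC

svc_pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('svc', SVC()),
])
svc_scores = cross_val_score(svc_pipe, X, y, cv=5)
svc_scores.mean()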