First Classifier

Goals

  1. Build our first classifier!
  2. Choose a machine learning algorithm

Introduction

In lesson two we transformed our Twitter sentiment data into a feature matrix. Now we can apply virtually any machine learning algorithm to our data, and the Python scikit-learn package makes it really easy to try any technique we want. The algorithms have subtle tradeoffs, and the names can be really confusing. Many people new to machine learning spend too much time looking for the perfect algorithm for their data. In reality, training data and feature extraction are almost always more important; however, choosing the wrong algorithm can still cause problems. In this notebook, we're going to go over how to pick an algorithm and evaluate whether it's working well.

Picking an Algorithm

Scikit-learn has a great flowchart for choosing an algorithm at scikit-learn.org/stable/tutorial/machine_learning_map/.

Let's walk through this flowchart on our data, starting at the "START" circle in the upper right:

  1. Do we have >50 samples? Yes.
  2. Are we predicting a category? Yes. We have four categories: Positive, Negative, No Emotion, and Can't tell.
  3. Do we have labeled data? Yes.
  4. Do we have <100K samples? Yes.
  5. We have arrived at LinearSVC.

If you are actually looking at the flowchart on the scikit-learn website, you can click on the green box to go to the LinearSVC documentation.


In [4]:
import pandas as pd
import numpy as np

df = pd.read_csv('../scikit/tweets.csv')
target = df['is_there_an_emotion_directed_at_a_brand_or_product']
text = df['tweet_text']

# We need to remove the empty rows from the text before we pass into CountVectorizer
fixed_text = text[pd.notnull(text)]
fixed_target = target[pd.notnull(text)]

# Do the feature extraction
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()              # initialize the count vectorizer
count_vect.fit(fixed_text)                  # set up the columns for the feature matrix
counts = count_vect.transform(fixed_text)   # counts is the feature matrix

from sklearn.svm import LinearSVC
# Build a classifier using the LinearSVC algorithm
clf = LinearSVC()                           # initialize our classifier
clf.fit(counts, fixed_target)               # fit our classifier to the training data
print(clf.predict(count_vect.transform(['i love my iphone'])))   # try making a prediction


['Positive emotion']

All classification algorithms in scikit-learn have three important functions:

  1. an initialization function where you pass in parameters (more on this later; see the sketch after this list)
  2. a "fit" function that learns a specific classifier on the training data.
  3. a "predict" function that makes predictions on new data.

Remember that the classifier only works on feature vectors. We use our count_vect object to turn our training data into features, and then we use it again to turn our new data into features. Together, our count_vect object and our clf object act as a classifier that decides whether tweets are positive or negative.
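
If you want a single object that goes all the way from raw text to a prediction, scikit-learn's Pipeline can bundle the two steps together. Here is a minimal sketch (we won't use pipelines elsewhere in this lesson):

In [ ]:
# Bundle the vectorizer and the classifier into one object, so that
# raw text goes in and predictions come out.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

sentiment_pipeline = make_pipeline(CountVectorizer(), LinearSVC())
sentiment_pipeline.fit(fixed_text, fixed_target)    # fits both steps at once
print(sentiment_pipeline.predict(['i love my iphone']))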

Let's try some more examples!


In [6]:
print('I hate my iphone', clf.predict(count_vect.transform(['I hate my iphone'])))  
print('my iphone is great', clf.predict(count_vect.transform(['my iphone is great'])))  
print('my iphone sucks', clf.predict(count_vect.transform(['my iphone sucks'])))   
print('I do not love my iphone', clf.predict(count_vect.transform(['I do not love my iphone'])))


I hate my iphone ['Negative emotion']
my iphone is great ['Positive emotion']
my iphone sucks ['Negative emotion']
I do not love my iphone ['Positive emotion']

Hm, this all looks promising, except for the last one. Take a second to think about why our classifier might have gotten the last one wrong (think about the feature extraction process).
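
One way to see the problem: our bag-of-words features record which words appear, but not their order, so "not" becomes just one more feature instead of a negation of "love". A quick check, reusing the count_vect we fitted above (this assumes all of these words appear in the training vocabulary):

In [ ]:
# The negated tweet shares almost all of its features with the positive one,
# because word counts discard word order.
vec_love = count_vect.transform(['i love my iphone'])
vec_not_love = count_vect.transform(['i do not love my iphone'])
shared = vec_love.minimum(vec_not_love)    # feature counts common to both tweets
print('shared tokens:', int(shared.sum()), 'of', int(vec_love.sum()))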

Try some of your own examples. How well do you think the classifier is working?

Our second classifier

Since all scikit-learn classifiers take the same type of feature input, it's easy to try a different one. If we go back to the flowchart at the top and follow the "Not Working" line coming out of the LinearSVC box, it takes us to the question "Text Data?" We are working with text data, so we find ourselves at the "Naive Bayes" node.

Let's switch our classifier to Naive Bayes. This is another common type of classifier that is extremely fast and easy to deploy.


In [8]:
# Build a classifier using the Naive Bayes algorithm

from sklearn.naive_bayes import MultinomialNB

nb = MultinomialNB()
nb.fit(counts, fixed_target)

print(nb.predict(count_vect.transform(['i love my iphone'])))   # try making a prediction


['Positive emotion']

In [9]:
print('I hate my iphone', nb.predict(count_vect.transform(['I hate my iphone'])))  
print('my iphone is great', nb.predict(count_vect.transform(['my iphone is great'])))  
print('my iphone sucks', nb.predict(count_vect.transform(['my iphone sucks'])))   
print('I do not love my iphone', nb.predict(count_vect.transform(['I do not love my iphone'])))


I hate my iphone ['Negative emotion']
my iphone is great ['Positive emotion']
my iphone sucks ['Negative emotion']
I do not love my iphone ['Positive emotion']

More on choosing an algorithm

One of the most popular machine learning websites, Kaggle, did a survey in 2017 asking data scientists which algorithms they used.

The first technique we used was a type of SVM, and the second was a type of Bayesian algorithm. These are both especially good algorithms for text data.

But before we get too fancy, we need to put in place a framework to evaluate our algorithms.
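
As a rough preview (a sketch only, not the framework itself), we could hold out part of the data and score both classifiers on the held-out piece:

In [ ]:
# Hold out 20% of the data and compare the two classifiers on it.
# random_state=0 just makes the split repeatable.
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB

train_X, test_X, train_y, test_y = train_test_split(
    counts, fixed_target, test_size=0.2, random_state=0)

for model in (LinearSVC(), MultinomialNB()):
    model.fit(train_X, train_y)
    print(type(model).__name__, accuracy_score(test_y, model.predict(test_X)))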

Takeaways

  1. There are many types of algorithms, but they all generally have the same API, so one great way to pick an algorithm is by trial and error.
  2. Algorithms generally have similar accuracy if configured properly and given good features.
  3. Speed of training and speed of prediction are really important things to consider when choosing an algorithm (see the timing sketch after this list).
  4. SVMs can work great for text data, but the runtime usually gets slower with more training data.
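
To see points 3 and 4 for yourself, here is a rough timing sketch for both classifiers on our feature matrix (exact numbers will vary from machine to machine):

In [ ]:
# Compare how long each algorithm takes to train on our feature matrix.
import time
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB

for model in (LinearSVC(), MultinomialNB()):
    start = time.time()
    model.fit(counts, fixed_target)
    print(type(model).__name__, 'trained in %.2f seconds' % (time.time() - start))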

Questions

  1. Is there a better way we could have done the feature extraction step?
  2. What happens when we see a new word that wasn't in the training data?

In [ ]: