Naive Bayes classification example from the book "Principles of Data Science"

P(spam|<sentence>) = P(<sentence>|spam) * P(spam) / P(<sentence>)
P(non-spam|<sentence>) = P(<sentence>|non-spam) * P(non-spam) / P(<sentence>)
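
Both posteriors share the denominator P(<sentence>), so deciding which class is more likely only requires comparing the numerators. Under the naive independence assumption, P(<sentence>|class) is approximated by the product of the per-word probabilities. A minimal sketch of that comparison follows; the helper name `naive_score` and all the numbers are made up for illustration and do not come from the dataset.

```python
from math import prod

def naive_score(word_likelihoods, prior):
    """Unnormalized posterior: P(w1|c) * ... * P(wn|c) * P(c)."""
    return prod(word_likelihoods) * prior

# Hypothetical per-word likelihoods and class priors, for illustration only.
spam_score = naive_score([0.2, 0.1], prior=0.3)
ham_score = naive_score([0.05, 0.02], prior=0.7)

# P(<sentence>) is identical for both classes, so it can be ignored
# when only the larger posterior matters.
label = 'spam' if spam_score > ham_score else 'non-spam'
print(label)
```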



In [1]:

    
import pandas as pd
import sklearn



In [3]:

    
df = pd.read_table('https://raw.githubusercontent.com/sinanuozdemir/sfdat22/master/data/sms.tsv', sep='\t', header=None, names=['label', 'msg'])
df









    Out[3]:






  
    
      
      label  msg
0     ham    Go until jurong point, crazy.. Available only ...
1     ham    Ok lar... Joking wif u oni...
2     spam   Free entry in 2 a wkly comp to win FA Cup fina...
3     ham    U dun say so early hor... U c already then say...
4     ham    Nah I don't think he goes to usf, he lives aro...
5     spam   FreeMsg Hey there darling it's been 3 week's n...
6     ham    Even my brother is not like to speak with me. ...
7     ham    As per your request 'Melle Melle (Oru Minnamin...
8     spam   WINNER!! As a valued network customer you have...
9     spam   Had your mobile 11 months or more? U R entitle...
10    ham    I'm gonna be home soon and i don't want to tal...
11    spam   SIX chances to win CASH! From 100 to 20,000 po...
12    spam   URGENT! You have won a 1 week FREE membership ...
13    ham    I've been searching for the right words to tha...
14    ham    I HAVE A DATE ON SUNDAY WITH WILL!!
15    spam   XXXMobileMovieClub: To use your credit, click ...
16    ham    Oh k...i'm watching here:)
17    ham    Eh u remember how 2 spell his name... Yes i di...
18    ham    Fine if thats the way u feel. Thats the way ...
19    spam   England v Macedonia - dont miss the goals/team...
20    ham    Is that seriously how you spell his name?
21    ham    I‘m going to try for 2 months ha ha only joking
22    ham    So ü pay first lar... Then when is da stock co...
23    ham    Aft i finish my lunch then i go str down lor. ...
24    ham    Ffffffffff. Alright no way I can meet up with ...
25    ham    Just forced myself to eat a slice. I'm really ...
26    ham    Lol your always so convincing.
27    ham    Did you catch the bus ? Are you frying an egg ...
28    ham    I'm back & we're packing the car now, I'll...
29    ham    Ahhh. Work. I vaguely remember that! What does...
...   ...    ...
5542  ham    Armand says get your ass over to epsilon
5543  ham    U still havent got urself a jacket ah?
5544  ham    I'm taking derek & taylor to walmart, if I...
5545  ham    Hi its in durban are you still on this number
5546  ham    Ic. There are a lotta childporn cars then.
5547  spam   Had your contract mobile 11 Mnths? Latest Moto...
5548  ham    No, I was trying it all weekend ;V
5549  ham    You know, wot people wear. T shirts, jumpers, ...
5550  ham    Cool, what time you think you can get here?
5551  ham    Wen did you get so spiritual and deep. That's ...
5552  ham    Have a safe trip to Nigeria. Wish you happines...
5553  ham    Hahaha..use your brain dear
5554  ham    Well keep in mind I've only got enough gas for...
5555  ham    Yeh. Indians was nice. Tho it did kane me off ...
5556  ham    Yes i have. So that's why u texted. Pshew...mi...
5557  ham    No. I meant the calculation is the same. That ...
5558  ham    Sorry, I'll call later
5559  ham    if you aren't here in the next  <#>  hou...
5560  ham    Anything lor. Juz both of us lor.
5561  ham    Get me out of this dump heap. My mom decided t...
5562  ham    Ok lor... Sony ericsson salesman... I ask shuh...
5563  ham    Ard 6 like dat lor.
5564  ham    Why don't you wait 'til at least wednesday to ...
5565  ham    Huh y lei...
5566  spam   REMINDER FROM O2: To get 2.50 pounds free call...
5567  spam   This is the 2nd time we have tried 2 contact u...
5568  ham    Will ü b going to esplanade fr home?
5569  ham    Pity, * was in mood for that. So...any other s...
5570  ham    The guy did some bitching but I acted like i'd...
5571  ham    Rofl. Its true to its name

5572 rows × 2 columns



In [4]:

    
df.label.value_counts()









    Out[4]:





ham     4825
spam     747
Name: label, dtype: int64



In [13]:

    
# Class priors: the fraction of messages carrying each label.
value_probability = df.label.value_counts()/len(df)
spam_probability = value_probability.spam
ham_probability = value_probability.ham
print('spam probability: {}, ham probability: {}'.format(spam_probability, ham_probability))









    



spam probability: 0.13406317300789664, ham probability: 0.8659368269921034



In [17]:

    
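# Estimate P(word|spam) as the fraction of spam messages that contain the word.
# Note: str.contains does a case-sensitive substring match.
spams = df[df.label == 'spam']
sentence = 'send cash now'
spam_words_probability = 1
for word in sentence.split():
    word_probability = spams[spams.msg.str.contains(word)].shape[0]/float(spams.shape[0])
    print("word {} probability: {}".format(word, word_probability))
    spam_words_probability *= word_probability
# Multiply by the prior P(spam) to get the numerator of Bayes' rule.
spam_words_probability *= spam_probability
print('spam words probability: {}'.format(spam_words_probability))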









    



word send probability: 0.06693440428380187
word cash probability: 0.06827309236947791
word now probability: 0.1994645247657296
spam words probability: 0.00012220082487226017
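
One caveat about the cell above: `str.contains(word)` looks for the word anywhere inside the message and is case-sensitive by default, so `now` also matches `know` and misses `NOW`. The sketch below shows the difference on a few made-up messages, using only the standard `re` module. The printed probabilities in this notebook were computed with substring matching, so switching to whole-word matching would change them.

```python
import re

# Hypothetical messages, for illustration only.
messages = ['Call now to claim', 'I know the answer', 'Snow is falling', 'NOW or never']

word = 'now'

# Substring match: the behavior of str.contains(word) for a plain word.
substring_hits = [m for m in messages if re.search(word, m)]

# Whole-word, case-insensitive match.
pattern = re.compile(r'\b{}\b'.format(re.escape(word)), re.IGNORECASE)
word_hits = [m for m in messages if pattern.search(m)]

print(substring_hits)
print(word_hits)
```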



In [19]:

    
# Same estimate for the ham class: P(word|ham) from the ham messages only.
hams = df[df.label == 'ham']
sentence = 'send cash now'
ham_words_probability = 1
for word in sentence.split():
    word_probability = hams[hams.msg.str.contains(word)].shape[0]/float(hams.shape[0])
    print("word {} probability: {}".format(word, word_probability))
    ham_words_probability *= word_probability
# Multiply by the prior P(ham) to get the numerator of Bayes' rule.
ham_words_probability *= ham_probability
print('ham words probability: {}'.format(ham_words_probability))









    



word send probability: 0.023626943005181346
word cash probability: 0.002694300518134715
word now probability: 0.10051813471502591
ham words probability: 5.540949590575691e-06



In [20]:

    
if spam_words_probability > ham_words_probability:
    print('{} is more likely a spam'.format(sentence))
else:
    print('{} is more likely NOT a spam'.format(sentence))









    



send cash now is more likely a spam
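
The two scores compared above are numerators of Bayes' rule, not probabilities. Assuming spam and ham are the only two classes, P(<sentence>) can be estimated as the sum of both numerators, which turns the scores into posteriors that add up to 1. A small sketch using the values printed earlier in this notebook (the variable names here are illustrative):

```python
# Scores copied from the outputs printed above.
spam_score = 0.00012220082487226017
ham_score = 5.540949590575691e-06

# With only two classes, the evidence P(sentence) is the sum of both numerators.
total = spam_score + ham_score
spam_posterior = spam_score / total
ham_posterior = ham_score / total

print('P(spam|sentence) ~ {:.3f}, P(ham|sentence) ~ {:.3f}'.format(spam_posterior, ham_posterior))
```

Under these assumptions the sentence comes out as spam with a posterior of roughly 0.96.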

Use scikit-learn's built-in methods (CountVectorizer and MultinomialNB)



In [31]:

    
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
x_train, x_test, y_train, y_test = train_test_split(df.msg, df.label, random_state=1)
vect = CountVectorizer()
train_dtm = vect.fit_transform(x_train)
test_dtm = vect.transform(x_test)
nb = MultinomialNB()
nb.fit(train_dtm, y_train)
predicts = nb.predict(test_dtm)
predicts









    Out[31]:





array(['ham', 'ham', 'ham', ..., 'ham', 'spam', 'ham'], 
      dtype='<U4')
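
For intuition about what the fitted model computes: a multinomial Naive Bayes classifier works on word counts rather than on "message contains word" flags, applies additive smoothing so an unseen word does not zero out the product (`MultinomialNB` defaults to `alpha=1.0`), and sums log-probabilities instead of multiplying raw ones. The following is a from-scratch sketch of that idea on a made-up toy corpus. It is not scikit-learn's implementation, and the names `train`, `log_score`, and `prediction` are illustrative.

```python
from collections import Counter
from math import log

# Hypothetical toy corpus, for illustration only.
train = [
    ('spam', 'win cash now'),
    ('spam', 'send cash to win'),
    ('ham', 'see you now'),
    ('ham', 'send me the notes'),
    ('ham', 'lunch now or later'),
]

# Count documents per class and word occurrences per class.
word_counts = {'spam': Counter(), 'ham': Counter()}
doc_counts = Counter()
for label, msg in train:
    doc_counts[label] += 1
    word_counts[label].update(msg.split())

vocab = set(w for counts in word_counts.values() for w in counts)

def log_score(sentence, label, alpha=1.0):
    """log P(label) + sum of log P(word|label), with add-alpha smoothing."""
    total_words = sum(word_counts[label].values())
    score = log(doc_counts[label] / len(train))
    for word in sentence.split():
        if word not in vocab:
            continue  # words never seen in training are skipped
        smoothed = (word_counts[label][word] + alpha) / (total_words + alpha * len(vocab))
        score += log(smoothed)
    return score

sentence = 'send cash now'
prediction = max(word_counts, key=lambda label: log_score(sentence, label))
print(prediction)
```

Working in log space avoids numeric underflow when a message has many words, since the product of many small probabilities quickly approaches zero.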



In [36]:

    
from sklearn import metrics
print('accuracy: {}, confusion matrix: {}'
      .format(metrics.accuracy_score(y_test, predicts), metrics.confusion_matrix(y_test, predicts)))









    



accuracy: 0.9885139985642498, confusion matrix: [[1203    5]
 [  11  174]]

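For the layout of the confusion matrix (rows are true labels, columns are predicted labels, in sorted label order: ham, spam), see: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html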
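
The accuracy printed above can be recovered from the confusion matrix by hand, along with precision and recall for the spam class. A short sketch, assuming scikit-learn's convention that rows are true labels and columns are predicted labels, ordered ham then spam (the variable names are illustrative):

```python
# Confusion matrix copied from the output above.
# Rows: true ham, true spam. Columns: predicted ham, predicted spam.
matrix = [[1203, 5],
          [11, 174]]

(true_ham, false_spam), (false_ham, true_spam) = matrix
total = true_ham + false_spam + false_ham + true_spam

# Accuracy: correct predictions over all test messages.
accuracy = (true_ham + true_spam) / total
# Precision: of the messages flagged as spam, how many really were spam.
spam_precision = true_spam / (true_spam + false_spam)
# Recall: of the real spam messages, how many were flagged.
spam_recall = true_spam / (true_spam + false_ham)

print('accuracy: {:.4f}, spam precision: {:.4f}, spam recall: {:.4f}'
      .format(accuracy, spam_precision, spam_recall))
```

The classes are imbalanced (far more ham than spam), so precision and recall for spam say more about the classifier than accuracy alone.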


