In this project we will work with a fake advertising data set that indicates whether or not a particular internet user clicked on an advertisement. We will try to create a model that predicts whether or not a user will click on an ad based on that user's features.
This data set contains the following features:
Import a few libraries you think you'll need (or just import them as you go along!)
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
In [2]:
ad_data = pd.read_csv('advertising.csv')
Check the head of ad_data
In [4]:
ad_data.head()
Out[4]:
Use info() and describe() on ad_data.
In [5]:
ad_data.info()
In [6]:
ad_data.describe()
Out[6]:
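The sketch below computes, for one column, the same summary that describe() reports for each numeric column. It uses only the standard library and assumes pandas' defaults as I understand them: a sample standard deviation (ddof=1) and linearly interpolated quartiles, which `statistics.quantiles` reproduces with `method="inclusive"`. The helper name `describe_column` and the sample ages are hypothetical and are not taken from advertising.csv.

```python
import statistics


def describe_column(values):
    """Return count, mean, std, min, quartiles and max for a numeric column.

    Hypothetical helper mirroring the rows of pandas' describe() output.
    """
    data = sorted(values)
    # "inclusive" interpolates linearly between sorted points (the numpy/pandas default)
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "count": len(data),
        "mean": statistics.mean(data),
        "std": statistics.stdev(data),  # sample standard deviation (ddof=1)
        "min": data[0],
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": data[-1],
    }


ages = [35, 31, 26, 29, 35, 23]  # hypothetical ages for illustration
for stat, value in describe_column(ages).items():
    print(f"{stat:>5}: {value:.2f}")
```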
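Create a histogram of the Age column.
In [15]:
# distplot is deprecated in recent seaborn; histplot draws the same count histogram (no KDE by default)
sns.histplot(ad_data['Age'], bins=30, color='blue')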
Out[15]:
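With bins=30, the Age range is cut into 30 equal-width intervals, and each bar counts the users who fall into one of them. The stdlib sketch below shows that counting step. It assumes numpy-style equal-width binning, where the last bin is closed on the right. The function name `equal_width_histogram` is hypothetical.

```python
def equal_width_histogram(values, bins):
    """Count values in `bins` equal-width intervals spanning min..max.

    Every interval is half-open [left, right) except the last, which also
    includes the maximum, as numpy.histogram does.
    """
    lo, hi = min(values), max(values)
    if lo == hi:  # widen a zero-width range so there is something to divide
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(idx, bins - 1)] += 1  # the maximum lands in the last bin
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


counts, edges = equal_width_histogram([19, 23, 25, 31, 35, 35, 42, 61], bins=4)
print(counts)  # [3, 3, 1, 1]
```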
Create a jointplot showing Area Income versus Age.
In [16]:
sns.jointplot(data=ad_data,x='Age',y='Area Income')
Out[16]:
Create a jointplot showing the KDE distributions of Daily Time Spent on Site vs. Age.
In [20]:
sns.jointplot(data=ad_data,x='Age',y='Daily Time Spent on Site',kind='kde')
Out[20]:
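The kind='kde' option replaces the scatter points with a smoothed density estimate. A kernel density estimate places a small Gaussian bump on every observation and sums the bumps. The 1-D sketch below assumes a Gaussian kernel and Scott's rule for the bandwidth (standard deviation times n^(-1/5)), which I believe matches seaborn's default bandwidth method. Seaborn itself also handles the 2-D case and the plotting. The helper name `gaussian_kde_1d` and the sample ages are hypothetical.

```python
import math
import statistics


def gaussian_kde_1d(samples):
    """Return a density function f(x) and its bandwidth for 1-D samples."""
    n = len(samples)
    # Scott's rule of thumb for one dimension: h = std * n ** (-1/5)
    h = statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))

    def density(x):
        # Sum of Gaussian bumps centred on each observation
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return density, h


ages = [23, 26, 29, 31, 35, 35, 42]  # hypothetical ages
f, bandwidth = gaussian_kde_1d(ages)
print(f"bandwidth={bandwidth:.2f}, density at 30={f(30):.4f}")
```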
Create a jointplot of 'Daily Time Spent on Site' vs. 'Daily Internet Usage'.
In [22]:
sns.jointplot(data=ad_data,x='Daily Time Spent on Site',y='Daily Internet Usage',color='green')
Out[22]:
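A jointplot shows the relationship between two columns visually. The Pearson correlation coefficient summarises how close that relationship is to a straight line, on a scale from -1 to 1. pandas computes it with ad_data[['Daily Time Spent on Site', 'Daily Internet Usage']].corr(). The stdlib sketch below shows the underlying formula. The function name `pearson_r` and the sample values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    spread = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if spread == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / spread


time_on_site = [40, 55, 60, 75, 85]       # hypothetical values
internet_use = [120, 150, 170, 200, 240]  # hypothetical values
print(round(pearson_r(time_on_site, internet_use), 3))
```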
Finally, create a pairplot with the hue defined by the 'Clicked on Ad' column feature.
In [23]:
sns.pairplot(ad_data,hue='Clicked on Ad')
Out[23]:
Split the data into a training set and a testing set using train_test_split.
In [27]:
ad_data.columns
Out[27]:
In [24]:
from sklearn.model_selection import train_test_split
In [28]:
X = ad_data[['Daily Time Spent on Site', 'Age', 'Area Income','Daily Internet Usage','Male']]
y = ad_data['Clicked on Ad']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=101)
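train_test_split shuffles the rows and holds out 30% of them for testing. The shuffle is reproducible because random_state is fixed. The stdlib sketch below applies the same idea to row indices. As I understand scikit-learn's behaviour, a fractional test_size is rounded up to a whole number of rows, and the sketch does the same. It uses Python's random module, so it will not select the same rows as train_test_split with random_state=101. The function name `split_indices` is hypothetical.

```python
import math
import random


def split_indices(n_rows, test_size=0.30, seed=101):
    """Shuffle row indices and return (train_idx, test_idx)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded, so repeat calls agree
    n_test = math.ceil(test_size * n_rows)  # round the test count up
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(1000)
print(len(train_idx), len(test_idx))  # 700 300
```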
Fit a logistic regression model on the training set.
In [29]:
from sklearn.linear_model import LogisticRegression
In [31]:
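# A higher max_iter gives the default lbfgs solver room to converge on these unscaled features
logmodel = LogisticRegression(max_iter=1000)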
In [36]:
logmodel.fit(X_train,y_train)
Out[36]:
In [37]:
logmodel.coef_
Out[37]:
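logmodel.coef_ holds one weight per feature, in the column order of X, and logmodel.intercept_ holds the bias. For a binary problem, the model scores a row as z = w·x + b and turns that score into a click probability with the logistic (sigmoid) function. As far as I know, predict() returns 1 only when z > 0, which means a probability above 0.5. The stdlib sketch below shows that computation. The weights are made-up values, not the fitted coefficients, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow in exp(-z) for very negative scores
    return e / (1.0 + e)


def predict_row(weights, bias, row):
    """Return (probability of class 1, predicted class) for one feature row."""
    z = sum(w * x for w, x in zip(weights, row)) + bias
    return sigmoid(z), int(z > 0)


# Hypothetical weights and intercept, standing in for coef_[0] and intercept_[0]
weights = [0.5, -1.0]
bias = 0.0
print(predict_row(weights, bias, [4.0, 1.0]))
```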
In [33]:
predictions = logmodel.predict(X_test)
Create a classification report for the model.
In [34]:
from sklearn.metrics import classification_report
In [38]:
print(classification_report(y_test,predictions))
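classification_report reports four numbers for each class, plus overall accuracy:

- precision: of the rows predicted as that class, the share that truly belong to it.
- recall: of the rows that truly belong to that class, the share the model found.
- f1-score: the harmonic mean of precision and recall.
- support: the number of true rows in the class.

The stdlib sketch below computes the per-class numbers. It reports 0.0 wherever a ratio has a zero denominator, which I believe matches scikit-learn's default zero_division behaviour apart from the warning that scikit-learn emits. The helper name `per_class_report` is hypothetical.

```python
def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} and overall accuracy."""
    labels = sorted(set(y_true) | set(y_pred))
    report = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = (precision, recall, f1, tp + fn)  # support = true count
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return report, accuracy


report, accuracy = per_class_report([0, 0, 1, 1, 1], [0, 1, 1, 1, 0])
for label, (p, r, f, s) in report.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f} support={s}")
print(f"accuracy={accuracy:.2f}")
```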
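Predict whether a single hypothetical user would click on an ad.
In [83]:
# Plain dicts keep insertion order (Python 3.7+), so OrderedDict is not needed; keys follow the column order of X
d = {'Daily Time Spent on Site': 500, 'Age': 18, 'Area Income': 23000, 'Daily Internet Usage': 160, 'Male': 1}
df = pd.DataFrame(d, index=[0])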
In [84]:
sample_predict = logmodel.predict(df)
In [85]:
print(sample_predict)
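A single-row prediction only makes sense if the new row carries the same features, in the same order, as the training matrix X. The stdlib sketch below builds such a row from a dict and fails loudly on missing or unexpected keys. FEATURES repeats the column list used for X above, and the helper name `to_feature_row` is hypothetical. Recent scikit-learn versions also check DataFrame column names against those seen during fit.

```python
FEATURES = ['Daily Time Spent on Site', 'Age', 'Area Income',
            'Daily Internet Usage', 'Male']


def to_feature_row(record, features=FEATURES):
    """Return the values of `record` ordered like `features`."""
    missing = [name for name in features if name not in record]
    unexpected = [name for name in record if name not in features]
    if missing or unexpected:
        raise ValueError(f"missing={missing}, unexpected={unexpected}")
    return [record[name] for name in features]


# Same hypothetical user as above, with keys deliberately out of order
user = {'Age': 18, 'Male': 1, 'Area Income': 23000,
        'Daily Internet Usage': 160, 'Daily Time Spent on Site': 500}
print(to_feature_row(user))  # [500, 18, 23000, 160, 1]
```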