In [1]:
%autosave 0
from IPython.display import HTML, display
# Let the notebook use the full browser width.
display(HTML('<style>.container { width:100% !important; }</style>'))
We import the module `pandas`. This module implements so-called data frames and is more convenient than the module `csv` when reading a CSV file.
In [2]:
import pandas as pd
The data we want to read is contained in the CSV file `cars.csv`.
In [3]:
cars = pd.read_csv('cars.csv')
cars
Out[3]:
We want to convert the columns containing `mpg` and `displacement` into NumPy arrays.
In [4]:
import numpy as np
X = np.array(cars['displacement'])
Y = np.array(cars['mpg'])
We convert cubic inches into litres.
In [5]:
X = 0.0163871 * X
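SciKit-Learn expects the features as a two-dimensional array of shape (number of samples, number of features). Therefore, we have to reshape the array `X` into a matrix with a single column.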
In [7]:
X = np.reshape(X, (len(X), 1))
X
Out[7]:
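As a minimal sketch of what this reshape does, the following cell applies it to a small made-up array `x` (illustration data, not taken from `cars.csv`). Passing `-1` as the first dimension is an alternative spelling that lets NumPy infer the number of rows.

```python
import numpy as np

# Made-up one-dimensional data, used only to illustrate the reshape.
x = np.array([1.0, 2.0, 3.0])

# Same call as above: one row per sample, one column per feature.
X_demo = np.reshape(x, (len(x), 1))

# Equivalent form: -1 tells NumPy to infer the number of rows.
X_alt = x.reshape(-1, 1)

print(x.shape)       # shape of the flat array
print(X_demo.shape)  # shape of the column matrix
```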
We convert miles per gallon into kilometres per litre.
In [8]:
Y = 1.60934 / 3.78541 * Y
We convert kilometres per litre into litres per 100 kilometres.
In [9]:
Y = 100 / Y
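The unit conversions used so far can be checked on single values. The sketch below wraps them in two helper functions; the function names and the sample values are assumptions made for illustration, while the conversion factors are the ones used in the cells above. Combining both steps for the fuel consumption shows that litres per 100 km is roughly 235.2 divided by the miles-per-gallon value.

```python
# Conversion factors as used in the notebook.
LITRES_PER_CUBIC_INCH = 0.0163871
KM_PER_MILE = 1.60934
LITRES_PER_GALLON = 3.78541

def cubic_inches_to_litres(cubic_inches):
    """Convert an engine displacement from cubic inches to litres."""
    return LITRES_PER_CUBIC_INCH * cubic_inches

def mpg_to_litres_per_100km(mpg):
    """Convert miles per gallon into litres per 100 kilometres."""
    km_per_litre = KM_PER_MILE / LITRES_PER_GALLON * mpg
    return 100 / km_per_litre

# Hypothetical sample values, chosen only for illustration.
print(cubic_inches_to_litres(350.0))
print(mpg_to_litres_per_100km(20.0))
```

Note that the last conversion inverts the quantity: a higher miles-per-gallon value corresponds to a lower fuel consumption in litres per 100 km.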
We plot fuel consumption versus engine displacement.
In [10]:
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
plt.figure(figsize=(12, 10))
sns.set(style='darkgrid')
plt.scatter(X, Y, c='b') # 'b' is blue color
plt.xlabel('engine displacement in litres')
plt.ylabel('fuel consumption in litres per 100 km')
plt.title('fuel consumption versus engine displacement')
plt.show()
We import the module `linear_model` from SciKit-Learn:
In [11]:
import sklearn.linear_model as lm
We create a linear model.
In [12]:
model = lm.LinearRegression()
We train this model using the data we have.
In [13]:
M = model.fit(X, Y)
The model `M` represents a linear relationship between `X` and `Y` of the form
$$ \texttt{Y} = \vartheta_0 + \vartheta_1 \cdot \texttt{X} $$
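`LinearRegression` chooses the coefficients by ordinary least squares. For a single feature, these coefficients have a well-known closed form: the slope is the sum of the products of the deviations of x and y from their means, divided by the sum of the squared deviations of x, and the intercept follows from the two means. The sketch below implements these formulas in plain Python; the function name and the toy data are assumptions made for illustration and are not part of the notebook.

```python
def simple_linear_regression(xs, ys):
    """Return (theta0, theta1) of the least-squares line y = theta0 + theta1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of products of deviations and sum of squared deviations of x.
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    theta1 = s_xy / s_xx
    theta0 = y_mean - theta1 * x_mean
    return theta0, theta1

# Toy data lying exactly on the line y = 1 + 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
theta0, theta1 = simple_linear_regression(xs, ys)
print(theta0, theta1)
```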
We extract the coefficients $\vartheta_0$ and $\vartheta_1$.
In [14]:
ϑ0 = M.intercept_
ϑ0
Out[14]:
In [15]:
ϑ1 = M.coef_[0]
ϑ1
Out[15]:
The values are, of course, the same values that we had already computed with the notebook `Simple-Linear-Regression.ipynb`. We plot the data together with the regression line.
In [16]:
xMax = X.max() + 0.2  # X.max() is a scalar, whereas max(X) returns a one-element array
plt.figure(figsize=(12, 10))
sns.set(style='darkgrid')
plt.scatter(X, Y, c='b')
plt.plot([0, xMax], [ϑ0, ϑ0 + ϑ1 * xMax], c='r')
plt.xlabel('engine displacement in litres')
plt.ylabel('fuel consumption in litres per 100 km')
plt.title('Fuel Consumption versus Engine Displacement')
plt.show()
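A fitted model can also be used to predict values for new inputs via the method `predict`. The following sketch demonstrates this on made-up data that lies exactly on the line y = 1 + 2x, so the expected coefficients are known in advance; the variable names and the data are assumptions for illustration and are unrelated to `cars.csv`.

```python
import numpy as np
import sklearn.linear_model as lm

# Made-up data on the line y = 1 + 2 * x; X_toy has one column per feature.
X_toy = np.array([[1.0], [2.0], [3.0], [4.0]])
Y_toy = 1.0 + 2.0 * X_toy[:, 0]

toy_model = lm.LinearRegression().fit(X_toy, Y_toy)

# predict expects a matrix as well, even for a single new input.
prediction = toy_model.predict(np.array([[5.0]]))[0]

print(toy_model.intercept_, toy_model.coef_[0], prediction)
```

The prediction agrees with evaluating the intercept plus the slope times the input by hand, which is the same computation that the regression line in the plot above visualizes.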