Both linear regression and k-nearest neighbors (KNN) are popular machine learning models used to make predictions. Linear regression predicts a continuous scalar value. It fits a linear predictor function, the best-fit line (or hyperplane, when there are several attributes), and uses it to predict the result from known attribute values. The best-fit line is chosen to minimize the sum of squared residuals. In KNN classification, a new data point is assigned to the group that is most common among the k training points whose attribute values are closest to its own. Thus, a prediction problem whose result is a continuous value is more naturally modeled with linear regression, while a problem whose job is to decide whether an object belongs to a certain group is more naturally modeled with KNN.
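To make the contrast concrete, here is a minimal sketch of both ideas on made-up toy data. The least-squares fit uses NumPy's `np.linalg.lstsq`; the KNN part is a hypothetical from-scratch helper (`knn_classify`) using Euclidean distance and a plain majority vote, which is one common formulation and not the only one.

```python
from collections import Counter

import numpy as np

# Toy data, invented for illustration only
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Ordinary least squares: prepend a column of ones for the intercept, then
# solve for the coefficients that minimize the sum of squared residuals.
A = np.hstack([np.ones((len(X), 1)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, slope = coef
print(f"best-fit line: y = {intercept:.3f} + {slope:.3f} x")


def knn_classify(train_X, train_labels, query, k=3):
    """Hypothetical helper: label `query` by majority vote of its k nearest points."""
    dists = np.linalg.norm(train_X - query, axis=1)  # Euclidean distance to each point
    nearest = np.argsort(dists)[:k]                   # indices of the k closest points
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


pts = np.array([[0.0, 0.0], [0.5, 0.2], [0.2, 0.4],
                [5.0, 5.0], [5.5, 4.8], [4.9, 5.3]])
labels = ["A", "A", "A", "B", "B", "B"]
label = knn_classify(pts, labels, np.array([0.3, 0.1]))  # query sits near the "A" cluster
print(label)
```

The regression returns a number on a continuous scale, while the classifier returns one of the existing group labels, which is the distinction the paragraph above draws.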
In the problem I chose, I want to predict the bicycle traffic on the Manhattan Bridge from the high and low temperatures and the precipitation. The bike counts on the bridge are numeric scalar values rather than group labels, so I chose linear regression.
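Written out, the model used in this notebook has one coefficient per feature column plus an intercept. This is the standard ordinary-least-squares formulation, with $x_{i1}, x_{i2}, x_{i3}$ denoting the HiTemp, LoTemp and Precip values of row $i$ and $y_i$ its Manhattan Bridge count:

```latex
\hat{y}_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3},
\qquad
(\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3)
  = \arg\min_{\beta} \sum_{i=1}^{n} \bigl( y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2} - \beta_3 x_{i3} \bigr)^2
```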
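In this prediction, I use the LinearRegression model from Python's scikit-learn library. To use the linear model, I need to load the features (HiTemp, LoTemp, Precip) and the Manhattan Bridge counts from the SQLite database, split the rows into a training set and a testing set, fit the model on the training set, and then evaluate its predictions on the testing set.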
In [137]:
import sqlite3
import numpy as np
from pandas import DataFrame
from sklearn.linear_model import LinearRegression

# Load the weather features and the Manhattan Bridge counts in one query,
# so each feature row is guaranteed to stay paired with its own count
conn = sqlite3.connect('bicycle.db')
c = conn.cursor()
c.execute('SELECT HiTemp, LoTemp, Precip, Manhattan FROM bicycle')
rows = c.fetchall()
features = [row[:3] for row in rows]   # (HiTemp, LoTemp, Precip)
results = [row[3:] for row in rows]    # (Manhattan,)
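The same loading step can also be written with pandas' `read_sql_query`, which returns a DataFrame directly. The sketch below assumes the `bicycle` table has the four columns queried above; it builds a hypothetical in-memory stand-in for `bicycle.db` with made-up rows so that it runs on its own.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory stand-in for bicycle.db; the rows are made up
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bicycle (HiTemp REAL, LoTemp REAL, Precip REAL, Manhattan INTEGER)")
conn.executemany(
    "INSERT INTO bicycle VALUES (?, ?, ?, ?)",
    [(80.0, 65.0, 0.0, 6000), (60.0, 50.0, 0.2, 4000),
     (40.0, 30.0, 0.1, 2000), (45.0, 35.0, 0.5, 1500)],
)

# One query keeps each row's features and target together
df = pd.read_sql_query("SELECT HiTemp, LoTemp, Precip, Manhattan FROM bicycle", conn)
conn.close()

X = df[["HiTemp", "LoTemp", "Precip"]]
y = df[["Manhattan"]].rename(columns={"Manhattan": "Man_Count"})
print(X.shape, y.shape)
```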
In [138]:
X = DataFrame(features, columns=['HiTemp', 'LoTemp', 'Precip'])
y = DataFrame(results, columns=['Man_Count'])
# Training set: every other row, starting at row 0
X_tr = X[::2]
y_tr = y[::2]
# Testing set: the remaining rows, starting at row 1
X_ts = X[1::2]
y_ts = y[1::2]
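The interleaved split above spreads both sets across the whole table (if the rows are stored in date order, which is an assumption here). A common alternative is scikit-learn's `train_test_split`, which shuffles before splitting. This sketch uses a small synthetic frame to show both; note that either way the test rows keep their original index labels, which matters when joining them to predictions later.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic 10-row frame, for illustration only
X = pd.DataFrame({"HiTemp": np.arange(10.0),
                  "LoTemp": np.arange(10.0) - 5,
                  "Precip": np.zeros(10)})
y = pd.DataFrame({"Man_Count": np.arange(10) * 100})

# Interleaved split: even-position rows train, odd-position rows test
X_tr, X_ts = X.iloc[::2], X.iloc[1::2]
y_tr, y_ts = y.iloc[::2], y.iloc[1::2]

# Alternative: shuffled 50/50 split, reproducible via random_state
X_tr2, X_ts2, y_tr2, y_ts2 = train_test_split(X, y, test_size=0.5, random_state=0)

print(list(X_ts.index))
print(len(X_tr2), len(X_ts2))
```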
In [139]:
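# Fit ordinary least squares on the training rows, then predict the test rows
model = LinearRegression()
model.fit(X_tr, y_tr)
prd_y = model.predict(X_ts)   # shape (n_test, 1), because y is a one-column DataFrame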
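After fitting, the learned weights can be read from the model's `coef_` and `intercept_` attributes. The sketch below uses synthetic, noise-free data generated from a known rule, so the fit should recover that rule; the numbers are invented and say nothing about real bridge traffic.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic features from a known linear rule with no noise, for illustration
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.uniform(low=[40, 30, 0], high=[95, 80, 2], size=(50, 3)),
                 columns=["HiTemp", "LoTemp", "Precip"])
y = 100 + 40 * X["HiTemp"] + 10 * X["LoTemp"] - 500 * X["Precip"]

model = LinearRegression()
model.fit(X, y)

# One learned weight per feature column, plus the intercept
for name, w in zip(X.columns, model.coef_):
    print(f"{name}: {w:.2f}")
print(f"intercept: {model.intercept_:.2f}")
```

With a one-column DataFrame as the target, as in the notebook, `coef_` has shape `(1, 3)` instead of `(3,)`, so the per-feature weights are in `model.coef_[0]`.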
In [140]:
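def getMeanAbsErr(predicted, actual):
    # Mean absolute error: the average of |predicted - actual| over all rows.
    # np.ravel turns Series, lists and (n, 1) arrays into flat arrays, so
    # positional indexing works regardless of the original index labels.
    predicted = np.ravel(predicted)
    actual = np.ravel(actual)
    n = len(predicted)
    totalerr = 0
    for i in range(n):
        totalerr = totalerr + abs(predicted[i] - actual[i])
    return totalerr / n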
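The loop can also be written as a single vectorized NumPy expression, and scikit-learn ships the same metric as `sklearn.metrics.mean_absolute_error`. In this sketch, `mean_abs_err` is a hypothetical name. The flattening matters because subtracting a shape `(n,)` array from a shape `(n, 1)` array broadcasts to an `(n, n)` matrix instead of pairing rows.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error


def mean_abs_err(predicted, actual):
    """Hypothetical vectorized MAE; flattens (n, 1) arrays and Series alike."""
    predicted = np.ravel(np.asarray(predicted, dtype=float))
    actual = np.ravel(np.asarray(actual, dtype=float))
    return np.mean(np.abs(predicted - actual))


pred = np.array([[110.0], [190.0], [305.0]])  # shaped like model.predict output
act = np.array([100.0, 200.0, 300.0])
print(mean_abs_err(pred, act))                 # (10 + 10 + 5) / 3
print(mean_absolute_error(act, pred.ravel()))  # same value from scikit-learn
```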
In [141]:
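# Pair actual and predicted counts. Using .values drops y_ts's odd-row index
# labels, so both frames share the default 0..n-1 index and join row by row.
d1 = DataFrame(y_ts.values, columns=['Actual'])
d2 = DataFrame(prd_y, columns=['Predicted'])
cmp = d1.join(d2)
print(cmp)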
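The index handling in that cell is easy to get wrong: `y_ts` keeps the labels 1, 3, 5, … while a DataFrame built from `model.predict` output is labeled 0, 1, 2, …, and `join` aligns on labels, not positions. This toy sketch (made-up numbers, with a hand-written array standing in for `prd_y`) shows the silent misalignment and one fix, plus a residual column for inspecting errors row by row.

```python
import numpy as np
import pandas as pd

# Toy test-set target that keeps odd row labels, like y[1::2] in the notebook
y = pd.DataFrame({"Man_Count": [100, 200, 300, 400, 500, 600]})
y_ts = y.iloc[1::2]                             # index labels 1, 3, 5
prd_y = np.array([[190.0], [410.0], [590.0]])   # stand-in for model.predict output

d2 = pd.DataFrame(prd_y, columns=["Predicted"])  # index labels 0, 1, 2
actual = y_ts.rename(columns={"Man_Count": "Actual"})

# Joining on mismatched labels pairs the wrong rows and leaves NaNs
bad = actual.join(d2)
# Dropping the old labels first lines the rows up by position
good = actual.reset_index(drop=True).join(d2)
good["Residual"] = good["Actual"] - good["Predicted"]
print(bad)
print(good)
```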
In [142]:
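# R^2 (coefficient of determination) of the predictions on the testing set
score = model.score(X_ts, y_ts)
print('R^2:', score)
# Mean absolute error, in the same units as the bike counts
err = getMeanAbsErr(cmp['Predicted'], cmp['Actual'])
print('Mean absolute error:', err)
conn.close()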
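For reference, `LinearRegression.score` returns the coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$. The sketch below computes it by hand on made-up numbers, checks it against `sklearn.metrics.r2_score`, and also reports the root mean squared error via `mean_squared_error`, which, like the mean absolute error, is expressed in the units of the bike counts.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up actual and predicted counts, for illustration only
actual = np.array([100.0, 200.0, 300.0, 400.0])
predicted = np.array([110.0, 190.0, 320.0, 380.0])

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((actual - predicted) ** 2)
ss_tot = np.sum((actual - actual.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# RMSE: square root of the mean squared error
rmse = np.sqrt(mean_squared_error(actual, predicted))

print(r2_manual, r2_score(actual, predicted))
print(rmse)
```

An $R^2$ near 1 means the model explains most of the variation in the counts, while an $R^2$ near 0 means it does little better than always predicting the mean count.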