This is going to be a very basic example of Linear Regression. We have generated data of total bill amounts and tips, and we would like to use this historical data to predict the tip for any given bill.

The data is going to be perfectly linear (noise-free), because I just want to show how easy it is to do Linear Regression.

These are the best examples I have found so far for calculating Linear Regression:

http://onlinestatbook.com/2/regression/intro.html

http://spin.atomicobject.com/2014/06/24/gradient-descent-linear-regression/
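Before reaching for a library, it helps to see that simple linear regression has a closed-form solution: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. Here is a minimal sketch of that formula in NumPy (the variable names `bills`, `tip_vals`, `slope_cf`, and `intercept_cf` are my own, not from the libraries used below):

```python
import numpy as np

# Closed-form least squares for a single feature:
#   slope     = cov(x, y) / var(x)
#   intercept = mean(y) - slope * mean(x)
bills = np.random.randint(100, size=1000)
tip_vals = bills * 0.10  # perfect 10% tips, matching the data used below

slope_cf = np.cov(bills, tip_vals, bias=True)[0, 1] / np.var(bills)
intercept_cf = tip_vals.mean() - slope_cf * bills.mean()
print(slope_cf, intercept_cf)  # slope should be almost exactly 0.1, intercept almost 0
```

Because the data is perfectly linear, the recovered slope is 0.1 up to floating-point error; `stats.linregress` below computes the same fit plus a few extra statistics.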

In [1]:

```python
%matplotlib inline
import pandas as pd
import numpy as np
from scipy import stats
import collections
import time
from sklearn.linear_model import SGDRegressor
```

In [3]:

```python
total_bills = np.random.randint(100, size=1000)
tips = total_bills * 0.10
```

In [4]:

```python
x = pd.Series(total_bills, name='total_bills')
y = pd.Series(tips, name='tips')
```

In [6]:

```python
df = pd.concat([x, y], axis=1)
```

In [7]:

```python
df.plot(kind='scatter', x='total_bills', y='tips');
```


In [8]:

```python
slope, intercept, r_value, p_value, std_err = stats.linregress(x=total_bills, y=tips)
print("slope is %f and intercept is %f" % (slope, intercept))
```


Let's say the customer spent $70. How much will the customer tip?

In [9]:

```python
predicted_tips = (slope * 70) + intercept
```

In [10]:

```python
print('The customer will leave a tip of $%.2f' % predicted_tips)
```


In [20]:

```python
large_total_bills = np.random.randint(10000, size=100000000)
large_tips = large_total_bills * 0.10
```

In [21]:

```python
now = time.time()
slope, intercept, r_value, p_value, std_err = stats.linregress(x=large_total_bills, y=large_tips)
predicted_tips = (slope * 700) + intercept
later = time.time()
difference = int(later - now)
print('The customer will leave a tip of $%.2f' % predicted_tips)
print('The time spent is %d seconds' % difference)
```


Now, I'm going to use Gradient Descent to find the fitted line. Gradient Descent is known to work well for large datasets. Let's see how it performs. I'm going to use the code example from https://github.com/mattnedrich/GradientDescentExample

In [17]:

```python
def compute_error_for_line_given_points(b, m, points):
    totalError = 0
    for i in range(0, len(points)):
        totalError += (points[i].y - (m * points[i].x + b)) ** 2
    return totalError / float(len(points))
```

In [18]:

```python
def step_gradient(b_current, m_current, points, learningRate):
    b_gradient = 0
    m_gradient = 0
    N = float(len(points))
    for i in range(0, len(points)):
        b_gradient += -(2/N) * (points[i].y - ((m_current * points[i].x) + b_current))
        m_gradient += -(2/N) * points[i].x * (points[i].y - ((m_current * points[i].x) + b_current))
    new_b = b_current - (learningRate * b_gradient)
    new_m = m_current - (learningRate * m_gradient)
    return [new_b, new_m]
```

In [19]:

```python
def gradient_descent_runner(points, starting_b, starting_m, learning_rate, num_iterations):
    b = starting_b
    m = starting_m
    for i in range(num_iterations):
        b, m = step_gradient(b, m, points, learning_rate)
    return [b, m]
```

In [22]:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

x = np.random.randint(100, size=1000)
y = x * 0.10
points = [Point(x[i], y[i]) for i in range(len(x))]
learning_rate = 0.0001
initial_b = 0  # initial y-intercept guess
initial_m = 0  # initial slope guess
num_iterations = 1000
print("Starting gradient descent at b = {0}, m = {1}, error = {2}".format(initial_b, initial_m, compute_error_for_line_given_points(initial_b, initial_m, points)))
print("Running...")
[b, m] = gradient_descent_runner(points, initial_b, initial_m, learning_rate, num_iterations)
print("After {0} iterations b = {1}, m = {2}, error = {3}".format(num_iterations, b, m, compute_error_for_line_given_points(b, m, points)))
```


Let's see how close we are after 1000 iterations. Pretty close, I think.

In [25]:

```python
gradient_predicted_tips = (m * 70) + b
gradient_predicted_tips
```

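The pure-Python loop over point objects gets slow as the dataset grows. As a side note (my own sketch, not part of the original GradientDescentExample code), the same gradient updates can be vectorized with NumPy arrays, which is much faster for large inputs:

```python
import numpy as np

# Same updates as step_gradient above, but over whole arrays:
#   b_grad = -2 * mean(y - (m*x + b))
#   m_grad = -2 * mean(x * (y - (m*x + b)))
x_arr = np.random.randint(100, size=1000).astype(float)
y_arr = x_arr * 0.10

b_hat, m_hat = 0.0, 0.0
learning_rate = 0.0001
for _ in range(1000):
    error = y_arr - (m_hat * x_arr + b_hat)
    b_hat -= learning_rate * (-2.0 * error.mean())
    m_hat -= learning_rate * (-2.0 * (x_arr * error).mean())

print(b_hat, m_hat)  # m_hat should end up close to 0.10
```

The per-iteration arithmetic is identical; only the Python-level loop over individual points is replaced by array operations.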

But you really don't need to write that on your own, as scikit-learn already provides it for you.

In [3]:

```python
x = np.random.randint(100, size=100000000)
y = x * 0.10
x = x[:, None]  # scikit-learn expects a 2D feature array
now = time.time()
clf = SGDRegressor()
clf.fit(x, y)
later = time.time()
difference = int(later - now)
print("Time spent for SGDRegressor is %d seconds" % difference)
print("slope is %f and intercept is %f" % (clf.coef_[0], clf.intercept_[0]))
```


In [4]:

```python
clf.predict([[70]])  # how much tip for a $70 bill
```
