Introduction

We're going to explore the Pizza Franchise data set from http://college.cengage.com/mathematics/brase/understandable_statistics/7e/students/datasets/slr/frames/frame.html

We want to know whether or not we should open the next pizza franchise.

In the following data, X = annual franchise fee ($1000) and Y = startup cost ($1000) for a pizza franchise.


In [44]:
%matplotlib inline
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

Data Exploration


In [47]:
df = pd.read_csv('slr12.csv', names=['annual', 'cost'], header=0)
df.describe()


Out[47]:
            annual         cost
count    36.000000    36.000000
mean   1134.777778  1291.055556
std     158.583211   124.058038
min     700.000000  1050.000000
25%    1080.000000  1250.000000
50%    1162.500000  1277.500000
75%    1250.000000  1300.000000
max    1375.000000  1830.000000

In [48]:
df.head()


Out[48]:
   annual  cost
0    1000  1050
1    1125  1150
2    1087  1213
3    1070  1275
4    1100  1300

In [49]:
df.annual.plot()


Out[49]:
<matplotlib.axes._subplots.AxesSubplot at 0x10e90d400>

In [50]:
df.cost.plot()


Out[50]:
<matplotlib.axes._subplots.AxesSubplot at 0x10ea1e128>

In [24]:
# Use the column names assigned in read_csv ('annual', 'cost')
df.plot(kind='scatter', x='annual', y='cost');



In [34]:
# Least-squares fit of startup cost (y) on annual fee (x)
slope, intercept, r_value, p_value, std_err = stats.linregress(df['annual'], df['cost'])
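
Before plotting, it can help to look at what linregress actually returned. The following is a minimal sketch rather than the notebook's own cell: so that it runs without slr12.csv, it uses only the five rows shown by df.head() above, which means its numbers will differ from a fit on all 36 rows.

```python
import pandas as pd
from scipy import stats

# Only the five rows displayed by df.head(); NOT the full 36-row data set.
sample = pd.DataFrame({
    'annual': [1000, 1125, 1087, 1070, 1100],
    'cost': [1050, 1150, 1213, 1275, 1300],
})

# linregress returns a result object with slope, intercept, rvalue,
# pvalue and stderr attributes.
result = stats.linregress(sample['annual'], sample['cost'])

# rvalue is the correlation coefficient; its square is the share of the
# variance in cost that the straight line accounts for.
r_squared = result.rvalue ** 2

print(f"slope={result.slope:.3f}, intercept={result.intercept:.1f}")
print(f"r^2={r_squared:.3f}, p-value={result.pvalue:.3f}")
```

A small r^2 or a large p-value would suggest the line is a weak basis for prediction, so these are worth checking before reading much into the plot.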

In [40]:
# Observed points
plt.plot(df['annual'], df['cost'], 'o', label='Original data', markersize=2)
# Fitted regression line evaluated at the observed annual fees
plt.plot(df['annual'], slope*df['annual'] + intercept, 'r', label='Fitted line')
plt.legend()
plt.show()


So from this trend we can predict that if the annual fee is high, the startup cost will tend to be high as well.
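
To make that prediction concrete, the fitted line can be evaluated at a given annual fee. This is a sketch under stated assumptions: predict_cost is a hypothetical helper and the fee of 1200 is an arbitrary illustrative value. It is fitted only on the five rows shown by df.head(), so the result is not the prediction the full data set would give.

```python
import pandas as pd
from scipy import stats

# Only the five rows displayed by df.head(); NOT the full 36-row data set.
sample = pd.DataFrame({
    'annual': [1000, 1125, 1087, 1070, 1100],
    'cost': [1050, 1150, 1213, 1275, 1300],
})

slope, intercept, r_value, p_value, std_err = stats.linregress(
    sample['annual'], sample['cost']
)


def predict_cost(annual_fee):
    """Hypothetical helper: evaluate the fitted line at a given annual fee."""
    return slope * annual_fee + intercept


example_fee = 1200  # illustrative value, in the same units as the 'annual' column
print(f"Predicted startup cost for a fee of {example_fee}: "
      f"{predict_cost(example_fee):.0f}")
```

Predictions are most trustworthy for fees inside the range the data covers; the line says little about fees far outside it.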