We're going to explore the Pizza Franchise data set from http://college.cengage.com/mathematics/brase/understandable_statistics/7e/students/datasets/slr/frames/frame.html
We want to know whether we should open the next pizza franchise.
In the following data, X = annual franchise fee ($1000) and Y = start-up cost ($1000) for a pizza franchise.
In [44]:
%matplotlib inline
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
In [47]:
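# header=0 marks the file's first row as a header; names= replaces it
# with the descriptive column names 'annual' and 'cost'
df = pd.read_csv('slr12.csv', names=['annual', 'cost'], header=0)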
df.describe()
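Passing `names=` together with `header=0` is easy to misread. It does not add names on top of the file's header; it replaces that header row. A minimal sketch on a tiny made-up CSV (not the real `slr12.csv`). The `X,Y` header here is an assumption based on the original column labels:

```python
import io

import pandas as pd

# Tiny made-up CSV (NOT the real dataset); assumed to mimic slr12.csv's layout,
# where the first row is a header naming the columns X and Y.
raw = io.StringIO("X,Y\n25,125\n30,140\n")

# header=0 says row 0 is a header; names= then replaces it, so the
# resulting columns are 'annual' and 'cost', and 'X'/'Y' no longer exist.
demo = pd.read_csv(raw, names=['annual', 'cost'], header=0)

print(demo.columns.tolist())
print(len(demo))  # the old header row is dropped, not kept as data
```

This is why the later cells have to refer to `annual` and `cost` rather than `X` and `Y`.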
In [48]:
df.head()
In [49]:
df.annual.plot()
In [50]:
df.cost.plot()
In [24]:
# Columns were renamed on load, so use 'annual' and 'cost' (not 'X'/'Y')
df.plot(kind='scatter', x='annual', y='cost');
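The scatter plot shows the relationship visually, and Pearson's correlation coefficient r puts a number on how close it is to a straight line (r near +1 means a strong upward linear trend). A minimal sketch, using made-up values rather than the real data, that computes r from its definition and checks it against pandas and NumPy:

```python
import numpy as np
import pandas as pd

# Made-up illustrative values in $1000 units; NOT taken from slr12.csv.
demo = pd.DataFrame({'annual': [20, 25, 30, 35, 40],
                     'cost':   [100, 130, 150, 190, 200]})

x = demo['annual'].to_numpy(dtype=float)
y = demo['cost'].to_numpy(dtype=float)

# Pearson r = sum of co-deviations / sqrt(product of summed squared deviations)
dx, dy = x - x.mean(), y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# The same quantity from pandas and from NumPy's correlation matrix
r_pandas = demo['annual'].corr(demo['cost'])
r_numpy = np.corrcoef(x, y)[0, 1]

print(r_manual, r_pandas, r_numpy)
```

On the real data, `df['annual'].corr(df['cost'])` gives the same value as the `r_value` returned by `stats.linregress`.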
In [34]:
# Ordinary least-squares fit of start-up cost on annual fee
slope, intercept, r_value, p_value, std_err = stats.linregress(df['annual'], df['cost'])
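`stats.linregress` fits an ordinary least-squares line y = intercept + slope * x. A minimal sketch, on made-up values rather than the real data, that computes the same slope and intercept from the closed-form formulas to show what the function is doing:

```python
import numpy as np
from scipy import stats

# Made-up illustrative values in $1000 units; NOT taken from slr12.csv.
x = np.array([20, 25, 30, 35, 40], dtype=float)
y = np.array([100, 130, 150, 190, 200], dtype=float)

# Closed-form least squares:
#   slope     = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
#   intercept = mean(y) - slope * mean(x)
dx, dy = x - x.mean(), y - y.mean()
slope_manual = (dx * dy).sum() / (dx ** 2).sum()
intercept_manual = y.mean() - slope_manual * x.mean()

# linregress returns a result with slope, intercept, rvalue, pvalue, stderr
res = stats.linregress(x, y)
print(slope_manual, res.slope)
print(intercept_manual, res.intercept)
```

For these toy numbers both approaches give a slope of 5.2 and an intercept of -2.0; on the real data the slope is read as the extra start-up cost (in $1000) per extra $1000 of annual fee.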
In [40]:
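# Observed data points plus the fitted regression line
plt.plot(df['annual'], df['cost'], 'o', label='Original data', markersize=2)
plt.plot(df['annual'], slope*df['annual'] + intercept, 'r', label='Fitted line')
plt.xlabel('Annual franchise fee ($1000)')
plt.ylabel('Start-up cost ($1000)')
plt.legend()
plt.show()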
From this upward trend we can predict that a franchise with a higher annual fee tends to have a higher start-up cost as well.
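To turn the trend into an actual prediction, plug a fee into the fitted line, and use r² and the p-value to judge how much to trust it. A minimal sketch on made-up values (not the real data); `predict_cost` is a hypothetical helper, not part of the original notebook:

```python
import numpy as np
from scipy import stats

# Made-up illustrative values in $1000 units; NOT taken from slr12.csv.
annual = np.array([20, 25, 30, 35, 40], dtype=float)
cost = np.array([100, 130, 150, 190, 200], dtype=float)

fit = stats.linregress(annual, cost)

def predict_cost(annual_fee_k):
    """Hypothetical helper: predicted start-up cost ($1000) for a given
    annual fee ($1000), read off the fitted least-squares line."""
    return fit.intercept + fit.slope * annual_fee_k

# r^2: share of the variation in cost explained by the line.
# pvalue: test of the null hypothesis that the true slope is zero.
print("predicted cost at fee = 32:", predict_cost(32))
print("r^2:", fit.rvalue ** 2, "p-value:", fit.pvalue)
```

Predictions like this are only reasonable for fees inside the range observed in the data; extrapolating far beyond it assumes the linear trend continues, which the data cannot confirm.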