# Introduction

We're going to explore the Pizza Franchise data set from http://college.cengage.com/mathematics/brase/understandable_statistics/7e/students/datasets/slr/frames/frame.html

We want to know whether we should open the next pizza franchise.

In the following data, X = annual franchise fee (\$1000) and Y = start-up cost (\$1000) for a pizza franchise. In the DataFrame used below, these are the columns `annual` and `cost`.

``````

In :

%matplotlib inline
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

``````
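The notebook does not show how `df` is created. The following is a minimal sketch of one way to load it, assuming the data has been saved as CSV with the column names `annual` and `cost` that appear in the output further down. The inline text holds only the five rows printed later in this notebook, not all 36 observations, and the file name in the comment is hypothetical.

```python
import io

import pandas as pd

# Assumption: the data set is available as CSV with the columns
# `annual` and `cost`. Only the five rows printed later in this
# notebook are included here; the full data set has 36 rows.
csv_text = """annual,cost
1000,1050
1125,1150
1087,1213
1070,1275
1100,1300
"""

# For a real file, pass its path instead, for example
# pd.read_csv("pizza_franchise.csv") (hypothetical file name).
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)
```

Reading from `io.StringIO` keeps the sketch self-contained; `pd.read_csv` accepts a file path in exactly the same way.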

# Data Exploration

``````

In :

df.describe()

``````
``````

Out:

            annual         cost
count    36.000000    36.000000
mean   1134.777778  1291.055556
std     158.583211   124.058038
min     700.000000  1050.000000
25%    1080.000000  1250.000000
50%    1162.500000  1277.500000
75%    1250.000000  1300.000000
max    1375.000000  1830.000000

``````
``````

In :

df.head()

``````
``````

Out:

   annual  cost
0    1000  1050
1    1125  1150
2    1087  1213
3    1070  1275
4    1100  1300

``````
``````

In :

df.annual.plot()

``````
``````

Out:

<matplotlib.axes._subplots.AxesSubplot at 0x10e90d400>

``````
``````

In :

df.cost.plot()

``````
``````

Out:

<matplotlib.axes._subplots.AxesSubplot at 0x10ea1e128>

``````
``````

In :

# The DataFrame columns are named 'annual' (X) and 'cost' (Y)
df.plot(kind='scatter', x='annual', y='cost');

``````
``````

In :

# Least-squares fit of start-up cost (Y) against annual fee (X)
slope, intercept, r_value, p_value, std_err = stats.linregress(df['annual'], df['cost'])

``````
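`stats.linregress` returns more than the slope and intercept: `r_value` is the correlation coefficient and `p_value` tests the hypothesis that the slope is zero. Below is a small sketch of how these could be inspected. It assumes only the five rows printed earlier in this notebook (the name `sample` is introduced here for illustration), so the printed numbers are illustrative and are not the results for the full 36-row data set.

```python
import pandas as pd
from scipy import stats

# Assumption: just the five rows shown earlier, not the full data set.
sample = pd.DataFrame({
    "annual": [1000, 1125, 1087, 1070, 1100],
    "cost": [1050, 1150, 1213, 1275, 1300],
})

slope, intercept, r_value, p_value, std_err = stats.linregress(
    sample["annual"], sample["cost"]
)

# r_value squared is the share of the variance in cost explained by the line.
r_squared = r_value ** 2

print(f"slope     = {slope:.3f}")
print(f"intercept = {intercept:.3f}")
print(f"R^2       = {r_squared:.3f}")
print(f"p-value   = {p_value:.3f}")
```

A small `p_value` together with an R² close to 1 would support reading the fitted line as a real trend; with only a handful of points, neither number should be given much weight.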
``````

In :

# Raw observations as points, fitted regression line in red
plt.plot(df['annual'], df['cost'], 'o', label='Original data', markersize=2)
plt.plot(df['annual'], slope*df['annual'] + intercept, 'r', label='Fitted line')
plt.legend()
plt.show()

``````

The fitted line shows the trend: franchises with a higher annual fee tend to have a higher start-up cost as well.
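To turn this trend into an estimate for a prospective franchise, the fitted line can be evaluated at a given fee. The sketch below uses a hypothetical helper, `predict_cost`, and a hypothetical fee of 1200; it again assumes only the five rows printed earlier, so the resulting number is illustrative rather than a result for the full data set.

```python
import pandas as pd
from scipy import stats

# Assumption: just the five rows shown earlier, not the full data set.
sample = pd.DataFrame({
    "annual": [1000, 1125, 1087, 1070, 1100],
    "cost": [1050, 1150, 1213, 1275, 1300],
})

fit = stats.linregress(sample["annual"], sample["cost"])


def predict_cost(annual_fee, slope, intercept):
    """Hypothetical helper: evaluate the fitted line at a given annual fee."""
    return slope * annual_fee + intercept


new_fee = 1200  # hypothetical fee, in the same units as the `annual` column
estimate = predict_cost(new_fee, fit.slope, fit.intercept)
print(f"Estimated start-up cost for a fee of {new_fee}: {estimate:.1f}")
```

Such an estimate is only meaningful for fees inside the observed range (700 to 1375 according to `df.describe()`); extrapolating beyond it assumes the linear trend continues.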