In [50]:
%%writefile test.py
def hello_world():
    print('Hello World')
In [12]:
import test
test.hello_world()
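If you look in the file system, you'll see we have a file called test.py. If it's in the same directory as your notebook, you can get everything from the test file using import. Here are some examples if it's somewhere else:
test.py is in the parent directory of yours: a plain import will not find it, and import ..test is not valid syntax. Inside a package you can use a relative import, from .. import test. From a notebook or script you have to add the parent directory to sys.path first.
test.py is in a subdirectory called sub: import sub.test. To do that reliably, you should have an empty file called __init__.py inside of the sub folder.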
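Here's a rough sketch of the subdirectory case. It builds a throwaway folder in a temporary directory and imports from it. The names demo_sub and demo_mod are made up for this example (they are not files from this notebook), and putting the temporary directory on sys.path stands in for running the notebook from that directory.

```python
import importlib
import pathlib
import sys
import tempfile

# Build <tmp>/demo_sub/__init__.py and <tmp>/demo_sub/demo_mod.py
# (hypothetical names, used only for this illustration).
root = pathlib.Path(tempfile.mkdtemp())
pkg = root / "demo_sub"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "demo_mod.py").write_text("def hello_world():\n    return 'Hello World'\n")

# Make the temporary directory importable, like the notebook's own directory.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_sub.demo_mod

greeting = demo_sub.demo_mod.hello_world()
print(greeting)
```

The dotted name in the import mirrors the folder structure: folder name, a dot, then the file name without .py.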
In [13]:
%%writefile test.py
pi = 3.0
def square(x):
    return x*x
def hello_world():
    print('Hello World')
In [14]:
import test
print('pi is exactly {}'.format(test.pi))
Uh-oh! It is using an outdated version of test.py, because Python only reads a module file the first time it is imported. To get Python to reload it, we can restart the kernel or use the reload command
In [15]:
from importlib import reload
reload(test)
print('pi is exactly {}'.format(test.pi))
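To see the stale-import behavior in isolation, here is a minimal sketch that writes a small module to a temporary folder, imports it, edits the file, and then reloads it. The module name demo_constants is hypothetical and only exists for this example.

```python
import importlib
import pathlib
import sys
import tempfile

# Hypothetical module, written to a temporary folder so nothing real is overwritten.
folder = pathlib.Path(tempfile.mkdtemp())
module_file = folder / "demo_constants.py"
module_file.write_text("pi = 3.0\n")

sys.path.insert(0, str(folder))
importlib.invalidate_caches()
import demo_constants

# Edit the file on disk. The already-imported module does not notice.
module_file.write_text("pi = 3.14159\n")
stale = demo_constants.pi

# reload re-executes the file and updates the existing module object.
importlib.reload(demo_constants)
fresh = demo_constants.pi
print(stale, fresh)
```

Note that reload updates the module object in place, so names reached through the module (like demo_constants.pi) see the new value, while anything copied out earlier with from ... import keeps the old one.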
In [16]:
%%writefile test.py
'''This module contains nonsense'''
pi = 3.0
def square(x):
    '''Want to square a number? This function will help'''
    return x*x
def hello_world():
    print('Hello World')
In [17]:
reload(test)
help(test)
The reason for creating a module like test.py is to write a function once and for all so you don't need to copy and paste it everywhere. Let's try this for confidence intervals of data. Here are the steps:
I'll be writing out the documentation. I'll use the NumPy docstring style, which the Sphinx Napoleon extension can parse. This is more complex than what we've seen before. We specify what the function does, examples, what it takes and what it returns. It's important to write your documentation FIRST, so you know what the code has to do
In [18]:
def conf_interval(data, interval_type='double', confidence=0.95):
    '''This function takes in the data and computes a confidence interval

    Examples
    --------
    data = [4,3,2,5]
    center, width = conf_interval(data)
    print('The mean is {} +/- {}'.format(center, width))

    Parameters
    ----------
    data : list
        The list of data points
    interval_type : str
        What kind of confidence interval. Can be double, upper, lower.
    confidence : float
        The confidence of the interval

    Returns
    -------
    center, width
        Center is the mean of the data. Width is the width of the confidence interval.
        If a lower or upper is specified, width is the upper or lower value.
    '''
In [19]:
import scipy.stats as ss
import numpy as np
data = [4,3,5,3,6, 7]
interval_type = 'double'
confidence = 0.95
center = np.mean(data)
s = np.std(data, ddof=1)
ppf = 1 - (1 - confidence) / 2
t = ss.t.ppf(ppf, len(data) - 1)  # a t interval for the mean uses n - 1 degrees of freedom
width = s / np.sqrt(len(data)) * t
print(center, width, ppf)
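As a sanity check on the hand calculation, the sketch below compares it with scipy's built-in interval method for the t-distribution. It assumes the usual n - 1 degrees of freedom for an interval on the mean, and passes the confidence level positionally because the keyword name for that argument has differed between scipy versions.

```python
import numpy as np
import scipy.stats as ss

data = [4, 3, 5, 3, 6, 7]
confidence = 0.95
n = len(data)

# Hand calculation: mean +/- t * s / sqrt(n)
center = np.mean(data)
sem = np.std(data, ddof=1) / np.sqrt(n)
t_star = ss.t.ppf(1 - (1 - confidence) / 2, n - 1)
width = sem * t_star

# Same interval from scipy, with the t-distribution shifted and scaled.
low, high = ss.t.interval(confidence, n - 1, loc=center, scale=sem)

print(center - width, center + width)
print(low, high)
```

If the two printed lines agree, the quantile and the standard error are being combined the right way.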
Now let's try adding some logic for the interval_type of confidence interval
In [20]:
interval_type = 'lower'
if interval_type == 'lower':
    ppf = confidence
    t = ss.t.ppf(ppf, len(data) - 1)  # n - 1 degrees of freedom
    top = s / np.sqrt(len(data)) * t
    print(center, top)
The lower confidence interval should run from neg-infinity to a value above the mean. We need to adjust the code.
In [21]:
interval_type = 'lower'
if interval_type == 'lower':
    ppf = confidence
    t = ss.t.ppf(ppf, len(data) - 1)  # n - 1 degrees of freedom
    top = s / np.sqrt(len(data)) * t
    print(center, center + top)
In [22]:
interval_type = 'upper'
if interval_type == 'upper':
    ppf = 1 - confidence
    t = ss.t.ppf(ppf, len(data) - 1)  # n - 1 degrees of freedom
    top = s / np.sqrt(len(data)) * t
    print(center, center + top)
We can see there is quite a bit of repeated code. Let's try to put the whole thing together without repeats
In [23]:
import scipy.stats as ss
import numpy as np
data = [4,3,5,3,6, 7]
interval_type = 'lower'
confidence = 0.95
center = np.mean(data)
s = np.std(data, ddof=1)
if interval_type == 'lower':
    ppf = confidence
elif interval_type == 'upper':
    ppf = 1 - confidence
else:
    ppf = 1 - (1 - confidence) / 2
t = ss.t.ppf(ppf, len(data) - 1)  # n - 1 degrees of freedom
width = s / np.sqrt(len(data)) * t
if interval_type == 'lower' or interval_type == 'upper':
    width = width + center
print(center, width, ppf)
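The only thing that really differs between the three cases is the cumulative probability handed to the t-distribution. One way to make that explicit is a tiny helper like the one sketched below. The name quantile_for is made up for this illustration and is not used elsewhere in the notebook.

```python
def quantile_for(interval_type, confidence):
    """Return the cumulative probability to pass to the t-distribution's ppf."""
    if interval_type == 'lower':
        # one-sided: all of the leftover probability sits above the bound
        return confidence
    if interval_type == 'upper':
        # one-sided: all of the leftover probability sits below the bound
        return 1 - confidence
    # double-sided: split the leftover probability between the two tails
    return 1 - (1 - confidence) / 2


for kind in ('double', 'lower', 'upper'):
    print(kind, quantile_for(kind, 0.95))
```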
In [24]:
%%writefile utilities.py
import scipy.stats as ss
import numpy as np
def conf_interval(data, interval_type='double', confidence=0.95):
    '''This function takes in the data and computes a confidence interval

    Examples
    --------
    data = [4,3,2,5]
    center, width = conf_interval(data)
    print('The mean is {} +/- {}'.format(center, width))

    Parameters
    ----------
    data : list
        The list of data points
    interval_type : str
        What kind of confidence interval. Can be double, upper, lower.
    confidence : float
        The confidence of the interval

    Returns
    -------
    center, width
        Center is the mean of the data. Width is the width of the confidence interval.
        If a lower or upper is specified, width is the upper or lower value.
    '''
    center = np.mean(data)
    s = np.std(data, ddof=1)
    if interval_type == 'lower':
        ppf = confidence
    elif interval_type == 'upper':
        ppf = 1 - confidence
    else:
        ppf = 1 - (1 - confidence) / 2
    # a t interval for the mean uses n - 1 degrees of freedom
    t = ss.t.ppf(ppf, len(data) - 1)
    width = s / np.sqrt(len(data)) * t
    if interval_type == 'lower' or interval_type == 'upper':
        width = center + width
    return center, width
In [25]:
import utilities
reload(utilities)
Out[25]:
I wrote some example code with the documentation. Let's see if it works
In [26]:
data = [4,3,2,5]
center, width = utilities.conf_interval(data)
print('The mean is {} +/- {}'.format(center, width))
In [27]:
#see if it recovers the correct mean
data = ss.norm.rvs(size=1000, loc=12.4)
print(utilities.conf_interval(data))
In [28]:
#see if it can handle upper/lower
print(utilities.conf_interval(data, 'upper'))
In [29]:
print(utilities.conf_interval(data, 'lower'))
In [30]:
#Check different confidence values
print(utilities.conf_interval(data, confidence=0.75))
In [31]:
utilities.conf_interval(data, confidence=95)
Out[31]:
This is a pretty common mistake: the confidence was given as a percentage instead of a probability. We should probably check that confidence is a valid probability.
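To see why this mistake is silent rather than an error, here is a small sketch of what happens to the quantile when confidence is 95. The 5 degrees of freedom are an arbitrary choice for the illustration.

```python
import numpy as np
import scipy.stats as ss

confidence = 95  # meant as a percentage, but the code expects a probability
q = 1 - (1 - confidence) / 2  # 48.0, far outside the valid range [0, 1]

# scipy does not raise for an invalid probability, it returns nan
t = ss.t.ppf(q, 5)
print(q, t, np.isnan(t))
```

Because nan propagates quietly through the rest of the arithmetic, the function returns a result that looks like a number but carries no information.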
In [32]:
utilities.conf_interval([3], confidence=0.5)
Out[32]:
Uh-oh, only one value was given. We should probably warn if there are not enough values.
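The trouble with a single value starts at the sample standard deviation, which divides by n - 1. A minimal sketch of that step alone, with NumPy's warning silenced so only the result shows:

```python
import warnings

import numpy as np

with warnings.catch_warnings():
    # NumPy emits a RuntimeWarning here because n - 1 is zero
    warnings.simplefilter('ignore')
    s = np.std([3], ddof=1)

print(s)
```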
In [33]:
%%writefile utilities.py
import scipy.stats as ss
import numpy as np
def conf_interval(data, interval_type='double', confidence=0.95):
    '''This function takes in the data and computes a confidence interval

    Examples
    --------
    data = [4,3,2,5]
    center, width = conf_interval(data)
    print('The mean is {} +/- {}'.format(center, width))

    Parameters
    ----------
    data : list
        The list of data points
    interval_type : str
        What kind of confidence interval. Can be double, upper, lower.
    confidence : float
        The confidence of the interval

    Returns
    -------
    center, width
        Center is the mean of the data. Width is the width of the confidence interval.
        If a lower or upper is specified, width is the upper or lower value.
    '''
    if(len(data) < 3):
        print('Not enough data given. Must have at least 3 values')
    center = np.mean(data)
    s = np.std(data, ddof=1)
    if interval_type == 'lower':
        ppf = confidence
    elif interval_type == 'upper':
        ppf = 1 - confidence
    else:
        ppf = 1 - (1 - confidence) / 2
    # a t interval for the mean uses n - 1 degrees of freedom
    t = ss.t.ppf(ppf, len(data) - 1)
    width = s / np.sqrt(len(data)) * t
    if interval_type == 'lower' or interval_type == 'upper':
        width = center + width
    return center, width
In [34]:
reload(utilities)
utilities.conf_interval([3])
Out[34]:
Ah, but notice it didn't actually stop the program!
In [35]:
raise RuntimeError('This is a problem')
In [36]:
raise ValueError('Your value is bad and you should feel bad')
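The difference between printing a message and raising an exception can be sketched with two toy checks. Both function names are hypothetical and exist only for this comparison.

```python
def check_with_print(data):
    # Prints a complaint, then carries on as if nothing happened.
    if len(data) < 3:
        print('Not enough data given')
    return 'kept going'


def check_with_raise(data):
    # Stops right here; the return statement is never reached for short data.
    if len(data) < 3:
        raise ValueError('Not enough data given')
    return 'kept going'


printed_result = check_with_print([3])

try:
    raised_result = check_with_raise([3])
except ValueError as err:
    raised_result = 'stopped: {}'.format(err)

print(printed_result)
print(raised_result)
```

The try/except is only there so the example can show both outcomes in one cell. Without it, the raise would halt the cell, which is exactly what we want from the real function.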
In [37]:
%%writefile utilities.py
import scipy.stats as ss
import numpy as np
def conf_interval(data, interval_type='double', confidence=0.95):
    '''This function takes in the data and computes a confidence interval

    Examples
    --------
    data = [4,3,2,5]
    center, width = conf_interval(data)
    print('The mean is {} +/- {}'.format(center, width))

    Parameters
    ----------
    data : list
        The list of data points
    interval_type : str
        What kind of confidence interval. Can be double, upper, lower.
    confidence : float
        The confidence of the interval

    Returns
    -------
    center, width
        Center is the mean of the data. Width is the width of the confidence interval.
        If a lower or upper is specified, width is the upper or lower value.
    '''
    if(len(data) < 3):
        raise ValueError('Not enough data given. Must have at least 3 values')
    center = np.mean(data)
    s = np.std(data, ddof=1)
    if interval_type == 'lower':
        ppf = confidence
    elif interval_type == 'upper':
        ppf = 1 - confidence
    else:
        ppf = 1 - (1 - confidence) / 2
    # a t interval for the mean uses n - 1 degrees of freedom
    t = ss.t.ppf(ppf, len(data) - 1)
    width = s / np.sqrt(len(data)) * t
    if interval_type == 'lower' or interval_type == 'upper':
        width = center + width
    return center, width
In [38]:
reload(utilities)
utilities.conf_interval([3])
In [39]:
%%writefile utilities.py
import scipy.stats as ss
import numpy as np
def conf_interval(data, interval_type='double', confidence=0.95):
    '''This function takes in the data and computes a confidence interval

    Examples
    --------
    data = [4,3,2,5]
    center, width = conf_interval(data)
    print('The mean is {} +/- {}'.format(center, width))

    Parameters
    ----------
    data : list
        The list of data points
    interval_type : str
        What kind of confidence interval. Can be double, upper, lower.
    confidence : float
        The confidence of the interval

    Returns
    -------
    center, width
        Center is the mean of the data. Width is the width of the confidence interval.
        If a lower or upper is specified, width is the upper or lower value.
    '''
    if(len(data) < 3):
        raise ValueError('Not enough data given. Must have at least 3 values')
    if(interval_type not in ['upper', 'lower', 'double']):
        raise ValueError('I do not know how to make a {} confidence interval'.format(interval_type))
    if(0 > confidence or confidence > 1):
        raise ValueError('Confidence must be between 0 and 1')
    center = np.mean(data)
    s = np.std(data, ddof=1)
    if interval_type == 'lower':
        ppf = confidence
    elif interval_type == 'upper':
        ppf = 1 - confidence
    else:
        ppf = 1 - (1 - confidence) / 2
    # a t interval for the mean uses n - 1 degrees of freedom
    t = ss.t.ppf(ppf, len(data) - 1)
    width = s / np.sqrt(len(data)) * t
    if interval_type == 'lower' or interval_type == 'upper':
        width = center + width
    return center, width
In [40]:
reload(utilities)
utilities.conf_interval([3])
In [41]:
utilities.conf_interval([3,4,32], confidence=95)
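Rather than trying bad inputs one cell at a time, the checks can be exercised in a loop. The sketch below uses a stand-in called validate_args (a made-up name) that copies only the three checks, so it runs without utilities.py being on disk.

```python
def validate_args(data, interval_type='double', confidence=0.95):
    # Stand-in that mirrors the three input checks and nothing else.
    if len(data) < 3:
        raise ValueError('Not enough data given. Must have at least 3 values')
    if interval_type not in ['upper', 'lower', 'double']:
        raise ValueError('I do not know how to make a {} confidence interval'.format(interval_type))
    if 0 > confidence or confidence > 1:
        raise ValueError('Confidence must be between 0 and 1')


# Each entry breaks exactly one rule.
bad_calls = [
    dict(data=[3]),
    dict(data=[3, 4, 32], interval_type='sideways'),
    dict(data=[3, 4, 32], confidence=95),
]

messages = []
for kwargs in bad_calls:
    try:
        validate_args(**kwargs)
    except ValueError as err:
        messages.append(str(err))

for message in messages:
    print(message)

# A valid call passes through without raising.
validate_args([3, 4, 32])
```

If the number of collected messages matches the number of bad calls, every check fired.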
To turn a module into a package you can install, you need to arrange your files and folders in a special way. Let's say I'm putting all my functions together into a package called che116. I need to arrange it like this:
che116-package/ <-- the top directory
    setup.py <-- the file which gives info about the package
    che116/ <-- a folder where the code is stored
        __init__.py <-- a completely empty file. The name is important
        stats.py <-- where I would put some functions related to stats
Here are the contents of the three files we need to make. NOTE: You need to create the folders above before you can run this. Change the path after %%writefile to match where you want the files to go.
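If you'd rather create the folders from Python than by hand, something like the sketch below would do it. It works inside a temporary directory as a stand-in for the folder that holds your notebook, so swap base for your own location.

```python
import pathlib
import tempfile

# Stand-in for the folder holding your notebook (an assumption for this example).
base = pathlib.Path(tempfile.mkdtemp())

# parents=True creates unit_15 and che116-package along the way;
# exist_ok=True makes it safe to run the cell twice.
code_dir = base / 'unit_15' / 'che116-package' / 'che116'
code_dir.mkdir(parents=True, exist_ok=True)

created = sorted(p.relative_to(base).as_posix() for p in base.rglob('*'))
print(created)
```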
In [43]:
%%writefile unit_15/che116-package/setup.py
from setuptools import setup
setup(name = 'che116', #the name for install purposes
author = 'Andrew White', #for your own info
description = 'Some stuff I wrote for CHE 116', #displayed when install/update
version='1.0',
packages=['che116']) #This name should match the directory where you put your code
In [44]:
%%writefile unit_15/che116-package/che116/__init__.py
'''You can put some comments in here if you want. They should describe the package.'''
In [45]:
%%writefile unit_15/che116-package/che116/stats.py
import scipy.stats as ss
import numpy as np
def conf_interval(data, interval_type='double', confidence=0.95):
    '''This function takes in the data and computes a confidence interval

    Examples
    --------
    data = [4,3,2,5]
    center, width = conf_interval(data)
    print('The mean is {} +/- {}'.format(center, width))

    Parameters
    ----------
    data : list
        The list of data points
    interval_type : str
        What kind of confidence interval. Can be double, upper, lower.
    confidence : float
        The confidence of the interval

    Returns
    -------
    center, width
        Center is the mean of the data. Width is the width of the confidence interval.
        If a lower or upper is specified, width is the upper or lower value.
    '''
    if(len(data) < 3):
        raise ValueError('Not enough data given. Must have at least 3 values')
    if(interval_type not in ['upper', 'lower', 'double']):
        raise ValueError('I do not know how to make a {} confidence interval'.format(interval_type))
    if(0 > confidence or confidence > 1):
        raise ValueError('Confidence must be between 0 and 1')
    center = np.mean(data)
    s = np.std(data, ddof=1)
    if interval_type == 'lower':
        ppf = confidence
    elif interval_type == 'upper':
        ppf = 1 - confidence
    else:
        ppf = 1 - (1 - confidence) / 2
    # a t interval for the mean uses n - 1 degrees of freedom
    t = ss.t.ppf(ppf, len(data) - 1)
    width = s / np.sqrt(len(data)) * t
    if interval_type == 'lower' or interval_type == 'upper':
        width = center + width
    return center, width
In [46]:
%system pip install -e unit_15/che116-package
Out[46]:
In [47]:
#YOU MUST RESTART KERNEL FIRST TIME THROUGH
#after install + restart, you'll always have your package available
import che116.stats as cs
cs.conf_interval([4,3,4])
Out[47]:
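If you're not sure whether the install worked, one quick check is to ask Python whether it can locate the package without actually importing it. The helper name is_installed is made up for this sketch.

```python
import importlib.util


def is_installed(name):
    # find_spec returns None when no top-level module or package by that name is found
    return importlib.util.find_spec(name) is not None


print(is_installed('json'))    # part of the standard library
print(is_installed('che116'))  # depends on whether the pip install above succeeded
```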
In [48]:
import che116  # the earlier "import che116.stats as cs" only defined the name cs
help(che116)
In [49]:
help(che116.stats)