Example of how to set up your lab notebook

Analysis in this notebook

  • [Dead end] Does year predict production?
  • Does "hours worked" correlate with production?

Tip

Standard imports at the top

Imports should be grouped in the following order:

  1. Magics
  2. Imports, in alphabetical order within each group:
    1. standard library imports
    2. related third-party imports
    3. local application/library-specific imports

In [1]:
# Magics first (server issues)
%matplotlib inline 
# Uncomment the line below instead if you want interactive matplotlib plots
# %matplotlib notebook

# https://ipython.org/ipython-doc/dev/config/extensions/autoreload.html
%load_ext autoreload
%autoreload 2

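# The extension can be installed with: pip install version_information
# (the older %install_ext route below was deprecated in IPython 4)
# %install_ext http://raw.github.com/jrjohansson/version_information/master/version_information.py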
%load_ext version_information
%version_information numpy, scipy, matplotlib, pandas


Out[1]:
Software    Version
Python      2.7.10 64bit [GCC 4.2.1 (Apple Inc. build 5577)]
IPython     3.2.1
OS          Darwin 14.4.0 x86_64 i386 64bit
numpy       1.9.2
scipy       0.15.1
matplotlib  1.4.3
pandas      0.16.2
Fri Jul 24 10:26:20 2015 PDT
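
If the version_information extension is not available, a similar report can be put together by hand. The sketch below is one possible approach, not what the extension does internally. It assumes Python 3.8+ (for importlib.metadata), and report_versions is a hypothetical helper name.

```python
import platform
from importlib import metadata


def report_versions(packages):
    """Return (name, version) rows for the interpreter, the OS, and each package."""
    rows = [
        ("Python", platform.python_version()),
        ("OS", platform.platform()),
    ]
    for name in packages:
        try:
            rows.append((name, metadata.version(name)))
        except metadata.PackageNotFoundError:
            # Keep the row so a missing dependency is visible in the report
            rows.append((name, "not installed"))
    return rows


for name, version in report_versions(["numpy", "scipy", "matplotlib", "pandas"]):
    print("{:<12}{}".format(name, version))
```

Recording this near the top of the notebook serves the same purpose as the table above: anyone rerunning the analysis can see which versions produced the results.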

In [2]:
# Standard library
import os
import sys
sys.path.append("../src/")

# Third party imports
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Local imports
from simpleexample import example_func

In [3]:
# Customizations
sns.set() # apply seaborn's default theme on top of matplotlib's defaults

# Any tweaks that normally go in matplotlibrc, etc., should explicitly go here
plt.rcParams['figure.figsize'] = (12, 12)
%config InlineBackend.figure_format='retina'

In [10]:
# Prefix for saved figures, so each figure can be traced back to the notebook that produced it
fig_prefix = "../figures/2015-07-24-jw-"
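
The prefix above is typed by hand. It could also be built from the current date, as in the minimal sketch below. The sketch assumes the "jw" part of the prefix is the author's initials. make_fig_prefix and its parameters are hypothetical names, not part of the original notebook.

```python
from datetime import date


def make_fig_prefix(initials, figures_dir="../figures", today=None):
    """Build a prefix of the form '<figures_dir>/<YYYY-MM-DD>-<initials>-'."""
    if today is None:
        today = date.today()
    return "{}/{}-{}-".format(figures_dir, today.isoformat(), initials)


# Passing the date explicitly reproduces the hand-written prefix
fig_prefix = make_fig_prefix("jw", today=date(2015, 7, 24))
print(fig_prefix)  # ../figures/2015-07-24-jw-
```

One trade-off: a prefix computed from today's date changes whenever the notebook is rerun on a later day, so a hard-coded date may be preferable if the prefix should always match the notebook's filename.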

Importing cleaned data

See ../deliver/coal_data_cleanup.ipynb for how the raw data was cleaned.


In [4]:
from IPython.display import FileLink
FileLink("../deliver/coal_data_cleanup.ipynb")





In [5]:
dframe = pd.read_csv("../data/coal_prod_cleaned.csv")

[Dead end] Does year predict production?


In [6]:
plt.scatter(dframe['Year'], dframe['Production_short_tons'])


Out[6]:
<matplotlib.collections.PathCollection at 0x102db5150>

Does "hours worked" correlate with production?


In [7]:
df2 = dframe.groupby('Mine_State').sum()

In [8]:
df2 = df2[df2.index != 'Wyoming']

In [11]:
# Pass x and y as keywords; newer seaborn releases no longer accept them positionally
sns.jointplot(x='Labor_Hours', y='Production_short_tons', data=df2, kind="reg")
plt.xlabel("Labor Hours Worked")
plt.ylabel("Total Amount Produced") 
plt.tight_layout()
plt.savefig(fig_prefix + "production-vs-hours-worked.png")



Advanced example, come back if time!


In [23]:
%load_ext autoreload
%autoreload 2

In [24]:
import sys
sys.path.append("../src/")

In [26]:
from simpleexample import example_func
example_func()


Out[26]:
'This works.'

In [27]:
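# simpleexample was edited on disk between these two cells;
# autoreload picks up the change without re-importing
example_func()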


Out[27]:
'This works, seriously you can update this.'
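
The change in output between the two cells comes from editing the source file while autoreload is active. As a rough illustration of the idea, the sketch below does a manual version of the same thing with importlib.reload from the standard library. It is not how the autoreload extension is implemented, and the throwaway module reload_demo is hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_reload():
    """Import a throwaway module, edit it on disk, and reload it by hand."""
    old_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # avoid stale cached bytecode in this demo
    with tempfile.TemporaryDirectory() as src_dir:
        module_path = Path(src_dir) / "reload_demo.py"  # hypothetical module
        module_path.write_text("def example_func():\n    return 'This works.'\n")
        sys.path.insert(0, src_dir)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("reload_demo")
            before = module.example_func()
            # Simulate editing the source file in an editor
            module_path.write_text(
                "def example_func():\n"
                "    return 'This works, seriously you can update this.'\n"
            )
            module = importlib.reload(module)
            after = module.example_func()
        finally:
            # Leave the interpreter state as it was found
            sys.path.remove(src_dir)
            sys.modules.pop("reload_demo", None)
            sys.dont_write_bytecode = old_flag
    return before, after


before, after = demo_reload()
print(before)
print(after)
```

With %autoreload 2 the reload step runs automatically before each cell executes, which is why the notebook cells above never call reload explicitly.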
