Best practices

Let's start with PEP 8 (https://www.python.org/dev/peps/pep-0008/).

Imports should be grouped in the following order:

  • standard library imports
  • related third party imports
  • local application/library specific imports

You should put a blank line between each group of imports. Put any relevant `__all__` specification after the imports.


In [1]:
%matplotlib inline
%config InlineBackend.figure_format='retina' 

# Add this to Python 2 code to make life easier
from __future__ import absolute_import, division, print_function

import numpy as np
# don't do:
# from numpy import *
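
The reason to avoid wildcard imports is name collisions. As a rough sketch, using the standard library modules `math` and `cmath` as stand-ins for any two libraries, the following lists the public names both modules define and shows that explicit namespaces keep both versions reachable:

```python
import cmath
import math

# Public names defined by both modules. A wildcard import of both would
# keep only whichever version was imported last.
shared = sorted(
    name
    for name in set(dir(math)) & set(dir(cmath))
    if not name.startswith("_")
)
print(shared)

# With explicit namespaces there is no ambiguity about which sqrt is used.
real_root = math.sqrt(4)
complex_root = cmath.sqrt(-4)
print(real_root, complex_root)
```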

In [2]:
import os
import sys
import warnings

import ipywidgets
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

sns.set()
plt.rcParams['figure.figsize'] = (12, 8)
sns.set_style("darkgrid")
sns.set_context("poster", font_scale=1.3)

warnings.filterwarnings('ignore')
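
Ignoring every warning globally is convenient in a demo but can hide real problems. A hedged alternative, sketched here with a hypothetical `noisy_function`, is to scope the filter with `warnings.catch_warnings()` so the previous filters are restored when the block exits:

```python
import warnings


def noisy_function():
    """Hypothetical function that emits a warning before returning."""
    warnings.warn("this API is deprecated", DeprecationWarning)
    return 42


# Suppress warnings only inside this block; filters are restored on exit.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    quiet_result = noisy_function()

# Record warnings instead of hiding them, to see what would have been missed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    loud_result = noisy_function()

print(quiet_result, loud_result, len(caught))
```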

Look at Pandas DataFrames

this is italicized


In [3]:
df = pd.read_csv("../data/coal_prod_cleaned.csv")

In [4]:
df.head()


Out[4]:
| | MSHA_ID | Average_Employees | Company_Type | Labor_Hours | Mine_Basin | Mine_County | Mine_Name | Mine_State | Mine_Status | Mine_Type | Operating_Company | Operating_Company_Address | Operation_Type | Production_short_tons | Union_Code | Year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 103295 | 18.0 | Independent Producer Operator | 39175.0 | Appalachia Southern | Bibb | Seymour Mine | Alabama | Active | Surface | Hope Coal Company Inc | P.O. Box 249, Maylene, AL 35114 | Mine only | 105082.0 | NaN | 2008 |
| 1 | 103117 | 19.0 | Operating Subsidiary | 29926.0 | Appalachia Southern | Cullman | Mine #2, #3, #4 | Alabama | Active, men working, not producing | Surface | Twin Pines Coal Company Inc | 1874 County Road 15, Bremen, AL 35033 | Mine only | 10419.0 | NaN | 2008 |
| 2 | 103361 | 20.0 | Operating Subsidiary | 42542.0 | Appalachia Southern | Cullman | Cold Springs West Mine | Alabama | Active | Surface | Twin Pines Coal Company | 74 Industrial Parkway, Jasper, AL 35502 | Mine only | 143208.0 | NaN | 2008 |
| 3 | 100759 | 395.0 | Operating Subsidiary | 890710.0 | Appalachia Southern | Fayette | North River # 1 Underground Mi | Alabama | Active | Underground | Chevron Mining Inc | 3114 County Road 63 S, Berry, AL 35546 | Mine and Preparation Plant | 2923261.0 | United Mine Workers of America | 2008 |
| 4 | 103246 | 22.0 | Independent Producer Operator | 55403.0 | Appalachia Southern | Franklin | Bear Creek | Alabama | Active | Surface | Birmingham Coal & Coke Co., In | 912 Edenton Street, Birmingham, AL 35242 | Mine only | 183137.0 | NaN | 2008 |

In [5]:
df.shape


Out[5]:
(9042, 16)

In [8]:
# import qgrid # Put imports at the top
# qgrid.nbinstall(overwrite=True)

# qgrid.show_grid(df[['MSHA_ID',
#                     'Year',
#                     'Mine_Name',
#                     'Mine_State',
#                     'Mine_County']], remote_js=True)
# Check out http://nbviewer.ipython.org/github/quantopian/qgrid/blob/master/qgrid_demo.ipynb for more (including demo)

In [16]:
!conda install pivottablejs -y


Fetching package metadata ...........
Solving package specifications: .

Package plan for installation in environment /Users/jonathan/miniconda3/envs/testpy3:

The following NEW packages will be INSTALLED:

    pivottablejs: 2.7.0-py36_0

pivottablejs-2 100% |################################| Time: 0:00:00   3.30 MB/s

In [9]:
df = pd.read_csv("../data/mps.csv", encoding="ISO-8859-1")
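
The `encoding` argument matters because bytes written as ISO-8859-1 are not always valid UTF-8. A small standard library sketch of the failure mode (the sample string is a hypothetical value chosen for its accents, not taken from the data file):

```python
sample = "Québécois"  # hypothetical value containing accented characters
raw_bytes = sample.encode("iso-8859-1")

# Decoding Latin-1 bytes as UTF-8 fails on the accented characters.
try:
    raw_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with the encoding the bytes were written in recovers the text.
latin1_text = raw_bytes.decode("iso-8859-1")
print(utf8_ok, latin1_text)
```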

In [10]:
df.head(10)


Out[10]:
| | Name | Party | Province | Age | Gender |
|---|---|---|---|---|---|
| 0 | Liu, Laurin | NDP | Quebec | 22.0 | Female |
| 1 | Mourani, Maria | Bloc Quebecois | Quebec | 43.0 | Female |
| 2 | Sellah, Djaouida | NDP | Quebec | NaN | Female |
| 3 | St-Denis, Lise | NDP | Quebec | 72.0 | Female |
| 4 | Fry, Hedy | Liberal | British Columbia | 71.0 | Female |
| 5 | Turmel, Nycole | NDP | Quebec | 70.0 | Female |
| 6 | Sgro, Judy | Liberal | Ontario | 68.0 | Female |
| 7 | Raynault, Francine | NDP | Quebec | 67.0 | Female |
| 8 | Davidson, Patricia | Conservative | Ontario | 66.0 | Female |
| 9 | Smith, Joy | Conservative | Manitoba | 65.0 | Female |

Enhanced Pandas DataFrame Display


In [11]:
# Province, Party, Average, Age, Heatmap

In [12]:
from pivottablejs import pivot_ui

In [13]:
pivot_ui(df)


Out[13]:

Tab


In [14]:
import numpy as np

In [14]:
np.random.


  File "<ipython-input-14-cc2fb4ff81f1>", line 1
    np.random.
              ^
SyntaxError: invalid syntax

shift-tab


In [ ]:
np.linspace(start=, )

shift-tab-tab

(equivalent in Lab to shift-tab)


In [ ]:
np.linspace(50, 150, num=100,)

shift-tab-tab-tab-tab

(doesn't work in Lab)


In [ ]:
np.linspace(start=, )

?


In [15]:
np.linspace?


Signature: np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
Docstring:
Return evenly spaced numbers over a specified interval.

Returns `num` evenly spaced samples, calculated over the
interval [`start`, `stop`].

The endpoint of the interval can optionally be excluded.

Parameters
----------
start : scalar
    The starting value of the sequence.
stop : scalar
    The end value of the sequence, unless `endpoint` is set to False.
    In that case, the sequence consists of all but the last of ``num + 1``
    evenly spaced samples, so that `stop` is excluded.  Note that the step
    size changes when `endpoint` is False.
num : int, optional
    Number of samples to generate. Default is 50. Must be non-negative.
endpoint : bool, optional
    If True, `stop` is the last sample. Otherwise, it is not included.
    Default is True.
retstep : bool, optional
    If True, return (`samples`, `step`), where `step` is the spacing
    between samples.
dtype : dtype, optional
    The type of the output array.  If `dtype` is not given, infer the data
    type from the other input arguments.

    .. versionadded:: 1.9.0

Returns
-------
samples : ndarray
    There are `num` equally spaced samples in the closed interval
    ``[start, stop]`` or the half-open interval ``[start, stop)``
    (depending on whether `endpoint` is True or False).
step : float, optional
    Only returned if `retstep` is True

    Size of spacing between samples.


See Also
--------
arange : Similar to `linspace`, but uses a step size (instead of the
         number of samples).
logspace : Samples uniformly distributed in log space.

Examples
--------
>>> np.linspace(2.0, 3.0, num=5)
array([ 2.  ,  2.25,  2.5 ,  2.75,  3.  ])
>>> np.linspace(2.0, 3.0, num=5, endpoint=False)
array([ 2. ,  2.2,  2.4,  2.6,  2.8])
>>> np.linspace(2.0, 3.0, num=5, retstep=True)
(array([ 2.  ,  2.25,  2.5 ,  2.75,  3.  ]), 0.25)

Graphical illustration:

>>> import matplotlib.pyplot as plt
>>> N = 8
>>> y = np.zeros(N)
>>> x1 = np.linspace(0, 10, N, endpoint=True)
>>> x2 = np.linspace(0, 10, N, endpoint=False)
>>> plt.plot(x1, y, 'o')
[<matplotlib.lines.Line2D object at 0x...>]
>>> plt.plot(x2, y + 0.5, 'o')
[<matplotlib.lines.Line2D object at 0x...>]
>>> plt.ylim([-0.5, 1])
(-0.5, 1)
>>> plt.show()
File:      ~/miniconda3/envs/insightpy/lib/python3.6/site-packages/numpy/core/function_base.py
Type:      function
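
Outside a notebook, roughly the same information that `?` shows can be pulled from the standard library `inspect` module. In this sketch `json.dumps` stands in for `np.linspace` so that nothing beyond the standard library is needed:

```python
import inspect
import json

# json.dumps is a stand-in target so the sketch has no third-party dependencies.
signature = inspect.signature(json.dumps)
docstring = inspect.getdoc(json.dumps)

print("Signature:", signature)
print("Docstring:", docstring.splitlines()[0])
```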

??

(Lab can scroll if you click)


In [16]:
np.linspace??


Signature: np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
Source:   
def linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None):
    """
    Return evenly spaced numbers over a specified interval.

    Returns `num` evenly spaced samples, calculated over the
    interval [`start`, `stop`].

    The endpoint of the interval can optionally be excluded.

    Parameters
    ----------
    start : scalar
        The starting value of the sequence.
    stop : scalar
        The end value of the sequence, unless `endpoint` is set to False.
        In that case, the sequence consists of all but the last of ``num + 1``
        evenly spaced samples, so that `stop` is excluded.  Note that the step
        size changes when `endpoint` is False.
    num : int, optional
        Number of samples to generate. Default is 50. Must be non-negative.
    endpoint : bool, optional
        If True, `stop` is the last sample. Otherwise, it is not included.
        Default is True.
    retstep : bool, optional
        If True, return (`samples`, `step`), where `step` is the spacing
        between samples.
    dtype : dtype, optional
        The type of the output array.  If `dtype` is not given, infer the data
        type from the other input arguments.

        .. versionadded:: 1.9.0

    Returns
    -------
    samples : ndarray
        There are `num` equally spaced samples in the closed interval
        ``[start, stop]`` or the half-open interval ``[start, stop)``
        (depending on whether `endpoint` is True or False).
    step : float, optional
        Only returned if `retstep` is True

        Size of spacing between samples.


    See Also
    --------
    arange : Similar to `linspace`, but uses a step size (instead of the
             number of samples).
    logspace : Samples uniformly distributed in log space.

    Examples
    --------
    >>> np.linspace(2.0, 3.0, num=5)
    array([ 2.  ,  2.25,  2.5 ,  2.75,  3.  ])
    >>> np.linspace(2.0, 3.0, num=5, endpoint=False)
    array([ 2. ,  2.2,  2.4,  2.6,  2.8])
    >>> np.linspace(2.0, 3.0, num=5, retstep=True)
    (array([ 2.  ,  2.25,  2.5 ,  2.75,  3.  ]), 0.25)

    Graphical illustration:

    >>> import matplotlib.pyplot as plt
    >>> N = 8
    >>> y = np.zeros(N)
    >>> x1 = np.linspace(0, 10, N, endpoint=True)
    >>> x2 = np.linspace(0, 10, N, endpoint=False)
    >>> plt.plot(x1, y, 'o')
    [<matplotlib.lines.Line2D object at 0x...>]
    >>> plt.plot(x2, y + 0.5, 'o')
    [<matplotlib.lines.Line2D object at 0x...>]
    >>> plt.ylim([-0.5, 1])
    (-0.5, 1)
    >>> plt.show()

    """
    # 2016-02-25, 1.12
    num = _index_deprecate(num)
    if num < 0:
        raise ValueError("Number of samples, %s, must be non-negative." % num)
    div = (num - 1) if endpoint else num

    # Convert float/complex array scalars to float, gh-3504
    # and make sure one can use variables that have an __array_interface__, gh-6634
    start = asanyarray(start) * 1.0
    stop  = asanyarray(stop)  * 1.0

    dt = result_type(start, stop, float(num))
    if dtype is None:
        dtype = dt

    y = _nx.arange(0, num, dtype=dt)

    delta = stop - start
    if num > 1:
        step = delta / div
        if step == 0:
            # Special handling for denormal numbers, gh-5437
            y /= div
            y = y * delta
        else:
            # One might be tempted to use faster, in-place multiplication here,
            # but this prevents step from overriding what class is produced,
            # and thus prevents, e.g., use of Quantities; see gh-7142.
            y = y * step
    else:
        # 0 and 1 item long sequences have an undefined step
        step = NaN
        # Multiply with delta to allow possible override of output class.
        y = y * delta

    y += start

    if endpoint and num > 1:
        y[-1] = stop

    if retstep:
        return y.astype(dtype, copy=False), step
    else:
        return y.astype(dtype, copy=False)
File:      ~/miniconda3/envs/insightpy/lib/python3.6/site-packages/numpy/core/function_base.py
Type:      function
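
The core of the source above is the choice of divisor: `num - 1` gaps when the endpoint is included, `num` gaps when it is not. A simplified pure-Python sketch of just that logic follows. It is an illustration only: it returns a list, skips the dtype and denormal handling, and `linspace_sketch` is a hypothetical name.

```python
def linspace_sketch(start, stop, num=50, endpoint=True):
    """Simplified sketch of the linspace step logic (no dtype handling)."""
    if num < 0:
        raise ValueError("Number of samples, %s, must be non-negative." % num)
    if num == 0:
        return []
    if num == 1:
        return [float(start)]

    # With the endpoint included there are num - 1 gaps; without it, num gaps.
    div = (num - 1) if endpoint else num
    step = (stop - start) / div

    samples = [start + i * step for i in range(num)]
    if endpoint:
        # Pin the last sample to stop, as the original source does.
        samples[-1] = float(stop)
    return samples


closed = linspace_sketch(2.0, 3.0, num=5)
half_open = linspace_sketch(2.0, 3.0, num=5, endpoint=False)
print(closed)
print(half_open)
```

The two calls mirror the first two examples in the docstring, so the values can be compared against the arrays shown there.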

Inspect everything


In [17]:
def silly_absolute_value_function(xval):
    """Takes a value and returns the value."""
    xval_sq = xval ** 2.0
    1 + 4
    xval_abs = np.sqrt(xval_sq)
    return xval_abs

In [19]:
silly_absolute_value_function(2)


Out[19]:
2.0

In [20]:
silly_absolute_value_function?


Signature: silly_absolute_value_function(xval)
Docstring: Takes a value and returns the value.
File:      ~/github/jupyter-tips-and-tricks/deliver/<ipython-input-17-5aaf162ec65f>
Type:      function

In [20]:
silly_absolute_value_function??


Signature: silly_absolute_value_function(xval)
Source:   
def silly_absolute_value_function(xval):
    """Takes a value and returns the value."""
    xval_sq = xval ** 2.0
    1 + 4
    xval_abs = np.sqrt(xval_sq)
    return xval_abs
File:      ~/github/jupyter-tips-and-tricks/deliver/<ipython-input-17-5aaf162ec65f>
Type:      function
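
In a plain Python session, `inspect.getsource` gives roughly what `??` displays. The sketch below again uses `json.dumps` as a stand-in target, on the assumption that its source file is available on disk; functions typed into an interactive session may not have retrievable source.

```python
import inspect
import json

# json.dumps is pure Python, so its source can be read from the installed file.
source_text = inspect.getsource(json.dumps)
source_file = inspect.getsourcefile(json.dumps)

print(source_text.splitlines()[0])
print(source_file)
```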

In [ ]:

Keyboard shortcuts

For help, ESC + h

h doesn't work in Lab.

l / shift L for line numbers


In [ ]:


In [ ]:
# in select mode, shift j/k (to select multiple cells at once)
# split cell with ctrl shift -

In [21]:
first = 1

In [21]:
second = 2

In [21]:
third = 3

In [ ]:
first = 1

In [ ]:
second = 2

In [ ]:
third = 3

In [71]:
# a new cell above
# b new cell below

Headings and LaTeX

With text and $\LaTeX$ support.

$$\begin{align} B'&=-\nabla \times E,\\ E'&=\nabla \times B - 4\pi j \end{align}$$

In [21]:
%%latex

If you want to get crazier...

\begin{equation}
\oint_S {E_n dA = \frac{1}{{\varepsilon _0 }}} Q_\textrm{inside}
\end{equation}


If you want to get crazier... \begin{equation} \oint_S {E_n dA = \frac{1}{{\varepsilon _0 }}} Q_\textrm{inside} \end{equation}

More markdown


In [24]:
# Indent
# Cmd + [ 
# Cmd + ]

# Comment
# Cmd + /

You can also get monospaced fonts by indenting 4 spaces:

    mkdir toc
    cd toc

Wrap with triple-backticks and language:

mkdir toc
cd toc
wget https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh

SELECT *
FROM tablename

In [22]:
# note difference w/ lab
SELECT first_name,
       last_name,
       year_of_birth
FROM presidents
WHERE year_of_birth > 1800;

In [23]:
%%bash
pwd 
for i in *.ipynb
do
    echo ${i} | awk -F . '{print $1}'
done

echo
echo "break"
echo

for i in *.ipynb
do
    echo $i | awk -F - '{print $2}'
done


/Users/jonathan/github/jupyter-tips-and-tricks/deliver
00-Overview
01-Tips-and-tricks
02-Visualization-and-code-organization
03-Pandas-and-Plotting
04-SQL-Example
05-interactive-splines
06-R-stuff
07-Some_basics
08-More_basics
09-Extras
Data_Cleaning
Introduction-to-Jupyter-Notebook-Functionality

break

Overview.ipynb
Tips
Visualization
Pandas
SQL
interactive
R
Some_basics.ipynb
More_basics.ipynb
Extras.ipynb

to
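
The two `awk` calls split each filename on a separator and print one field. A Python sketch of the same splitting, using four of the filenames from the listing above and a hypothetical helper named `field`, shows why `Data_Cleaning` produces an empty line and the last notebook produces `to`:

```python
notebooks = [
    "00-Overview.ipynb",
    "01-Tips-and-tricks.ipynb",
    "Data_Cleaning.ipynb",
    "Introduction-to-Jupyter-Notebook-Functionality.ipynb",
]


def field(text, separator, index):
    """Return the 1-based field, like awk's $index, or '' if it is missing."""
    parts = text.split(separator)
    return parts[index - 1] if index <= len(parts) else ""


# Equivalent of: awk -F . '{print $1}'
first_by_dot = [field(name, ".", 1) for name in notebooks]

# Equivalent of: awk -F - '{print $2}'
second_by_dash = [field(name, "-", 2) for name in notebooks]

print(first_by_dot)
print(second_by_dash)
```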

Other cell-magics


In [78]:
%%writefile ../scripts/temp.py
from __future__ import absolute_import, division, print_function

I promise that I'm not cheating!


Overwriting ../scripts/temp.py

In [79]:
!cat ../scripts/temp.py


from __future__ import absolute_import, division, print_function

I promise that I'm not cheating!
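
For comparison, a plain Python sketch of the same write-then-read round trip, using `pathlib` and a temporary directory instead of `../scripts` so that nothing on disk is overwritten (the temporary location is an assumption made for the example):

```python
import tempfile
from pathlib import Path

contents = "from __future__ import absolute_import, division, print_function\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "temp.py"
    target.write_text(contents)     # like %%writefile: creates or overwrites
    read_back = target.read_text()  # like !cat

print(read_back, end="")
```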

Autoreload is cool, but there isn't time here to give it the attention it deserves. For more, see:

https://gist.github.com/jbwhit/38c1035c48cdb1714fc8d47fa163bfae


In [24]:
%load_ext autoreload
%autoreload 2
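
Roughly speaking, `%autoreload 2` re-imports changed modules for you before each cell runs. A manual sketch of that idea with `importlib.reload`, using a throwaway module written to a temporary directory (the module name `scratch_autoreload_demo` and the function `current_value` are hypothetical):

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    module_file = Path(tmp_dir) / "scratch_autoreload_demo.py"
    module_file.write_text("def current_value():\n    return 1\n")

    # Make the temporary directory importable and import the first version.
    sys.path.insert(0, tmp_dir)
    importlib.invalidate_caches()
    import scratch_autoreload_demo as demo
    before = demo.current_value()

    # Edit the file on disk, then reload by hand; autoreload automates this step.
    module_file.write_text("def current_value():\n    return 12\n")
    importlib.reload(demo)
    after = demo.current_value()

    # Clean up so the throwaway module does not linger in the session.
    sys.path.remove(tmp_dir)
    del sys.modules["scratch_autoreload_demo"]

print(before, after)
```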

In [ ]:


In [25]:
example_dict = {}

In [26]:
# Indent/dedent/comment
for _ in range(5):
    example_dict["one"] = 1
    example_dict["two"] = 2
    example_dict["three"] = 3
    example_dict["four"] = 4

Multicursor magic

Hold down option, click and drag.


In [87]:
example_dict["one_better_name"] = 1
example_dict["two_better_name"] = 2
example_dict["three_better_name"] = 3
example_dict["four_better_name"] = 4

Find and replace supports regex, notebook-wide or within a single cell.
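
The same kind of pattern-based rename can be sketched in Python with `re.sub`. This example applies the key rename from the multicursor cell above to a string of source text; the pattern is an illustration, not what the notebook's find-and-replace dialog runs internally.

```python
import re

source = 'example_dict["one"] = 1\nexample_dict["two"] = 2\n'

# Capture the key between the quotes and append a suffix to it.
pattern = r'\["(\w+)"\]'
renamed = re.sub(pattern, r'["\1_better_name"]', source)

print(renamed, end="")
```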


In [27]:
import numpy as np

In [24]:
!conda install -c r rpy2 -y


Fetching package metadata .............
Solving package specifications: .

Package plan for installation in environment /Users/jonathan/miniconda3/envs/insightpy:

The following NEW packages will be INSTALLED:

    cairo:          1.14.6-4              conda-forge
    curl:           7.54.1-0              conda-forge
    fontconfig:     2.12.1-4              conda-forge
    gettext:        0.19.8.1-0            conda-forge
    glib:           2.51.4-0              conda-forge
    graphite2:      1.3.9-0               conda-forge
    gsl:            2.2.1-blas_openblas_2 conda-forge [blas_openblas]
    harfbuzz:       1.4.3-0               conda-forge
    krb5:           1.14.2-0              conda-forge
    libgcc:         4.8.5-1                          
    libiconv:       1.14-4                conda-forge
    libssh2:        1.8.0-1               conda-forge
    libxml2:        2.9.4-4               conda-forge
    pango:          1.40.4-0              conda-forge
    pcre:           8.39-0                conda-forge
    pixman:         0.34.0-0              conda-forge
    r-base:         3.4.1-0               conda-forge
    rpy2:           2.8.6-py36r3.4.1_2    r          
    singledispatch: 3.4.0.3-py36_0        conda-forge

The following packages will be DOWNGRADED:

    zlib:           1.2.11-0              conda-forge --> 1.2.8-3 conda-forge

gettext-0.19.8 100% |################################| Time: 0:00:01   2.82 MB/s
graphite2-1.3. 100% |################################| Time: 0:00:00  24.81 MB/s
krb5-1.14.2-0. 100% |################################| Time: 0:00:00   1.92 MB/s
libgcc-4.8.5-1 100% |################################| Time: 0:00:00   6.88 MB/s
libiconv-1.14- 100% |################################| Time: 0:00:00   3.54 MB/s
libssh2-1.8.0- 100% |################################| Time: 0:00:00 556.93 kB/s
pcre-8.39-0.ta 100% |################################| Time: 0:00:00   1.20 MB/s
pixman-0.34.0- 100% |################################| Time: 0:00:00   1.59 MB/s
zlib-1.2.8-3.t 100% |################################| Time: 0:00:00  12.25 MB/s
glib-2.51.4-0. 100% |################################| Time: 0:00:03   1.36 MB/s
libxml2-2.9.4- 100% |################################| Time: 0:00:02 828.35 kB/s
curl-7.54.1-0. 100% |################################| Time: 0:00:00 833.21 kB/s
fontconfig-2.1 100% |################################| Time: 0:00:00   1.13 MB/s
gsl-2.2.1-blas 100% |################################| Time: 0:00:02 873.71 kB/s
cairo-1.14.6-4 100% |################################| Time: 0:00:01 829.58 kB/s
singledispatch 100% |################################| Time: 0:00:00 187.29 kB/s
harfbuzz-1.4.3 100% |################################| Time: 0:00:01 694.29 kB/s
pango-1.40.4-0 100% |################################| Time: 0:00:00 671.23 kB/s
r-base-3.4.1-0 100% |################################| Time: 0:00:08   2.75 MB/s
rpy2-2.8.6-py3 100% |################################| Time: 0:00:00   1.69 MB/s

In [28]:
import rpy2

In [29]:
%load_ext rpy2.ipython


---------------------------------------------------------------------
ImportError                         Traceback (most recent call last)
<ipython-input-29-a69f80d0128e> in <module>()
----> 1 get_ipython().magic('load_ext rpy2.ipython')

~/miniconda3/envs/insightpy/lib/python3.6/site-packages/IPython/core/interactiveshell.py in magic(self, arg_s)
   2144         magic_name, _, magic_arg_s = arg_s.partition(' ')
   2145         magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2146         return self.run_line_magic(magic_name, magic_arg_s)
   2147 
   2148     #-------------------------------------------------------------------------

~/miniconda3/envs/insightpy/lib/python3.6/site-packages/IPython/core/interactiveshell.py in run_line_magic(self, magic_name, line)
   2065                 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
   2066             with self.builtin_trap:
-> 2067                 result = fn(*args,**kwargs)
   2068             return result
   2069 

<decorator-gen-65> in load_ext(self, module_str)

~/miniconda3/envs/insightpy/lib/python3.6/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
    185     # but it's overkill for just that one bit of state.
    186     def magic_deco(arg):
--> 187         call = lambda f, *a, **k: f(*a, **k)
    188 
    189         if callable(arg):

~/miniconda3/envs/insightpy/lib/python3.6/site-packages/IPython/core/magics/extension.py in load_ext(self, module_str)
     31         if not module_str:
     32             raise UsageError('Missing module name.')
---> 33         res = self.shell.extension_manager.load_extension(module_str)
     34 
     35         if res == 'already loaded':

~/miniconda3/envs/insightpy/lib/python3.6/site-packages/IPython/core/extensions.py in load_extension(self, module_str)
     83             if module_str not in sys.modules:
     84                 with prepended_to_syspath(self.ipython_extension_dir):
---> 85                     mod = import_module(module_str)
     86                     if mod.__file__.startswith(self.ipython_extension_dir):
     87                         print(("Loading extensions from {dir} is deprecated. "

~/miniconda3/envs/insightpy/lib/python3.6/importlib/__init__.py in import_module(name, package)
    124                 break
    125             level += 1
--> 126     return _bootstrap._gcd_import(name[level:], package, level)
    127 
    128 

~/miniconda3/envs/insightpy/lib/python3.6/importlib/_bootstrap.py in _gcd_import(name, package, level)

~/miniconda3/envs/insightpy/lib/python3.6/importlib/_bootstrap.py in _find_and_load(name, import_)

~/miniconda3/envs/insightpy/lib/python3.6/importlib/_bootstrap.py in _find_and_load_unlocked(name, import_)

~/miniconda3/envs/insightpy/lib/python3.6/importlib/_bootstrap.py in _load_unlocked(spec)

~/miniconda3/envs/insightpy/lib/python3.6/importlib/_bootstrap_external.py in exec_module(self, module)

~/miniconda3/envs/insightpy/lib/python3.6/importlib/_bootstrap.py in _call_with_frames_removed(f, *args, **kwds)

~/miniconda3/envs/insightpy/lib/python3.6/site-packages/rpy2/ipython/__init__.py in <module>()
----> 1 from .rmagic import load_ipython_extension

~/miniconda3/envs/insightpy/lib/python3.6/site-packages/rpy2/ipython/rmagic.py in <module>()
     50 # numpy and rpy2 imports
     51 
---> 52 import rpy2.rinterface as ri
     53 import rpy2.robjects as ro
     54 import rpy2.robjects.packages as rpacks

~/miniconda3/envs/insightpy/lib/python3.6/site-packages/rpy2/rinterface/__init__.py in <module>()
     90 del(os)
     91 
---> 92 from rpy2.rinterface._rinterface import (baseenv,
     93                                          emptyenv,
     94                                          endr,

ImportError: dlopen(/Users/jonathan/miniconda3/envs/insightpy/lib/python3.6/site-packages/rpy2/rinterface/_rinterface.cpython-36m-darwin.so, 2): Library not loaded: @rpath/libicuuc.54.dylib
  Referenced from: /Users/jonathan/miniconda3/envs/insightpy/lib/python3.6/site-packages/rpy2/rinterface/_rinterface.cpython-36m-darwin.so
  Reason: image not found
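
The traceback above shows that a package can be installed and still fail at import time, here because a shared library is missing. One defensive pattern, sketched with a hypothetical helper `try_import`, is to attempt the import and keep the error for reporting instead of letting it stop the session:

```python
import importlib


def try_import(module_name):
    """Return (module, None) on success or (None, error) if the import fails."""
    try:
        return importlib.import_module(module_name), None
    except ImportError as err:
        return None, err


# A module that certainly exists, and a made-up name that should not.
json_module, json_error = try_import("json")
missing_module, missing_error = try_import("surely_not_an_installed_module_xyz")

print(json_module is not None, json_error)
print(missing_module, type(missing_error).__name__)
```

Attempting the real import matters here: a check that only looks for the package on disk would not catch a failure inside a compiled extension.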

In [91]:
X = np.array([0,1,2,3,4])
Y = np.array([3,5,4,6,7])

In [30]:
%%R?


Object `%%R` not found.

In [93]:
%%R -i X,Y -o XYcoef
XYlm = lm(Y~X)
XYcoef = coef(XYlm)
print(summary(XYlm))
par(mfrow=c(2,2))
plot(XYlm)


Call:
lm(formula = Y ~ X)

Residuals:
   1    2    3    4    5 
-0.2  0.9 -1.0  0.1  0.2 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   3.2000     0.6164   5.191   0.0139 *
X             0.9000     0.2517   3.576   0.0374 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7958 on 3 degrees of freedom
Multiple R-squared:   0.81,	Adjusted R-squared:  0.7467 
F-statistic: 12.79 on 1 and 3 DF,  p-value: 0.03739


In [94]:
type(XYcoef)


Out[94]:
numpy.ndarray

In [95]:
XYcoef**2


Out[95]:
array([ 10.24,   0.81])
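
As a cross-check on the R output, the same simple linear regression can be worked by hand in plain Python with the closed-form least-squares formulas. This is a sketch for verification, not how `lm` computes its fit internally.

```python
X = [0, 1, 2, 3, 4]
Y = [3, 5, 4, 6, 7]

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

# Closed-form least squares: slope = S_xy / S_xx, intercept from the means.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
s_xx = sum((x - mean_x) ** 2 for x in X)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# R-squared from the residual and total sums of squares.
residuals = [y - (intercept + slope * x) for x, y in zip(X, Y)]
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((y - mean_y) ** 2 for y in Y)
r_squared = 1 - ss_res / ss_tot

print(round(intercept, 4), round(slope, 4), round(r_squared, 4))
```

The intercept, slope, and R-squared can be compared with the `Coefficients` table and the `Multiple R-squared` line in the R summary.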

In [34]:
thing()


Out[34]:
12