Simple start to Jupyter notebooks

A: Oh snap! The initializing cell is missing!

B: No problem! Jupyter notebooks are able to use magic


In [2]:
!ls


Simple.ipynb        __pycache__         somemodule.py
Visualization.ipynb snippets

In [3]:
!pip install --user pandas matplotlib sklearn seaborn


Requirement already satisfied: pandas in /Users/ggruben/anaconda3/lib/python3.5/site-packages
Requirement already satisfied: matplotlib in /Users/ggruben/anaconda3/lib/python3.5/site-packages
Requirement already satisfied: sklearn in /Users/ggruben/.local/lib/python3.5/site-packages
Requirement already satisfied: seaborn in /Users/ggruben/anaconda3/lib/python3.5/site-packages
Requirement already satisfied: python-dateutil>=2 in /Users/ggruben/anaconda3/lib/python3.5/site-packages (from pandas)
Requirement already satisfied: pytz>=2011k in /Users/ggruben/.local/lib/python3.5/site-packages (from pandas)
Requirement already satisfied: numpy>=1.7.0 in /Users/ggruben/anaconda3/lib/python3.5/site-packages (from pandas)
Requirement already satisfied: six>=1.10 in /Users/ggruben/anaconda3/lib/python3.5/site-packages (from matplotlib)
Requirement already satisfied: cycler>=0.10 in /Users/ggruben/anaconda3/lib/python3.5/site-packages (from matplotlib)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=1.5.6 in /Users/ggruben/anaconda3/lib/python3.5/site-packages (from matplotlib)
Requirement already satisfied: scikit-learn in /Users/ggruben/anaconda3/lib/python3.5/site-packages (from sklearn)

In [4]:
!pip install version_information


Requirement already satisfied: version_information in /Users/ggruben/anaconda3/lib/python3.5/site-packages

In [5]:
%load_ext version_information
%version_information pandas, sklearn


Out[5]:
Software  Version
Python    3.5.3 64bit [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
IPython   5.3.0
OS        Darwin 16.5.0 x86_64 i386 64bit
pandas    0.19.2
sklearn   0.18.1
Wed Jun 28 17:07:22 2017 CEST

In [1]:
!pip install watermark


Collecting watermark
  Downloading watermark-1.4.0.tar.gz
Requirement already satisfied: ipython in /Users/ggruben/anaconda3/lib/python3.5/site-packages (from watermark)
Building wheels for collected packages: watermark
  Running setup.py bdist_wheel for watermark ... done
  Stored in directory: /Users/ggruben/Library/Caches/pip/wheels/04/33/ef/a05c24dee8b3d1f21955471968bb3fbcab737890f68c6d30c4
Successfully built watermark
Installing collected packages: watermark
Successfully installed watermark-1.4.0

In [2]:
%load_ext watermark
%watermark -a "Gerrit Gruben" -d -t -v -p numpy,pandas -g


Gerrit Gruben 2017-06-29 16:35:58 

CPython 3.5.3
IPython 5.4.1

numpy 1.12.1
pandas 0.19.2
Git hash: 0f9cd5bce15c31679e45a02ba4981226798bc1b2

Importing modules


In [6]:
from somemodule import hello

In [7]:
hello()


Helo Word!

In [8]:
hello()


Helo Word!

In [25]:
del hello


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-25-60746d489e2b> in <module>()
----> 1 del hello

NameError: name 'hello' is not defined

In [20]:
%load_ext autoreload
%autoreload 2


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload

In [27]:
%aimport somemodule

In [28]:
somemodule.hello()


Helo Word!

In [30]:
somemodule.hello()


Hello World!
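
The output changes from "Helo Word!" to "Hello World!" because somemodule.py was edited between the two calls and autoreload re-imported it. A minimal sketch of doing the same reload by hand with importlib.reload follows. The module name demo_module and its contents are made up for this demonstration, and autoreload itself does more than this (for example, it also tries to update names that were imported with from ... import).

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Avoid cached bytecode so the edited source is always re-read in this demo.
sys.dont_write_bytecode = True

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical module, standing in for somemodule.py.
    module_path = Path(tmp) / "demo_module.py"
    module_path.write_text("def hello():\n    return 'Helo Word!'\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()  # make the freshly written file visible to the import system
    demo_module = importlib.import_module("demo_module")
    before = demo_module.hello()

    # "Edit" the module on disk, then reload it explicitly.
    module_path.write_text("def hello():\n    return 'Hello World!'\n")
    importlib.reload(demo_module)
    after = demo_module.hello()

    sys.path.remove(tmp)

print(before)
print(after)
```

With %autoreload 2 the explicit importlib.reload call is not needed, since modules are checked for changes before each cell is executed.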

In [31]:
from IPython.display import FileLink

In [33]:
FileLink("Simple.ipynb")


Out[33]:

Some Editing tricks

Demonstrating auto-completion.

Tab for auto-completion of identifiers,

Shift+Tab for a tooltip with the signature and docstring (parameters). Shift+Enter runs the cell.

Note to self: show merging of cells, split etc. here


In [1]:
import sklearn

In [2]:
from sklearn.datasets import load_boston

In [ ]:
df = load_boston()

In [ ]:
X, y = df.data, df.target

In [ ]:
from sklearn.model_selection import train_test_split

In [ ]:
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [ ]:
from sklearn.metrics import mean_squared_error

In [ ]:
from sklearn.ensemble import RandomForestRegressor

In [3]:
rf_reg = RandomForestRegressor(n_estimators=2)

In [29]:
rf_reg.fit(X_train, y_train)
print(mean_squared_error(y_test, rf_reg.predict(X_test)))


15.913484252

In [51]:
# Just need a DataFrame to work with
import pandas as pd
from sklearn.datasets import california_housing

cal = california_housing.fetch_california_housing()
# The target (median house value) is used as the index
df = pd.DataFrame(data=cal.data, columns=cal.feature_names, index=cal.target)

df.head(10)


Out[51]:
       MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  Longitude
4.526  8.3252      41.0  6.984127   1.023810       322.0  2.555556     37.88    -122.23
3.585  8.3014      21.0  6.238137   0.971880      2401.0  2.109842     37.86    -122.22
3.521  7.2574      52.0  8.288136   1.073446       496.0  2.802260     37.85    -122.24
3.413  5.6431      52.0  5.817352   1.073059       558.0  2.547945     37.85    -122.25
3.422  3.8462      52.0  6.281853   1.081081       565.0  2.181467     37.85    -122.25
2.697  4.0368      52.0  4.761658   1.103627       413.0  2.139896     37.85    -122.25
2.992  3.6591      52.0  4.931907   0.951362      1094.0  2.128405     37.84    -122.25
2.414  3.1200      52.0  4.797527   1.061824      1157.0  1.788253     37.84    -122.25
2.267  2.0804      42.0  4.294118   1.117647      1206.0  2.026891     37.84    -122.26
2.611  3.6912      52.0  4.970588   0.990196      1551.0  2.172269     37.84    -122.25

Very quick plotting (just for export really)


In [56]:
%matplotlib inline
import matplotlib.pyplot as plt

In [62]:
plt.scatter(df.MedInc, df.index)


Out[62]:
<matplotlib.collections.PathCollection object at 0x114c87470>

In [63]:
import seaborn as sns

In [64]:
sns.jointplot(df.MedInc, df.index)


Out[64]:
<seaborn.axisgrid.JointGrid object at 0x114c96588>

Try exporting to HTML and to PDF (the PDF export requires pandoc and a LaTeX installation).

Then try again and compare the figures with the setting:


In [65]:
%config InlineBackend.figure_format = "retina"

Some magic and multiple outputs


In [31]:
x, y = 5, 3
x
y


Out[31]:
3

In [32]:
# Show all output values
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [33]:
x, y = 5, 3
x
y


Out[33]:
5
Out[33]:
3
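
As a rough illustration of what this setting changes, the sketch below mimics the two modes with the standard ast module. It is not IPython's implementation: the function name run_cell and its return value are assumptions made for this example, and only the "last" and "all" modes are covered.

```python
import ast


def run_cell(source, interactivity="last"):
    """Run `source` and return the values that would be displayed as output."""
    tree = ast.parse(source)
    namespace = {}
    shown = []
    last_index = len(tree.body) - 1
    for i, node in enumerate(tree.body):
        is_expression = isinstance(node, ast.Expr)
        display = is_expression and (interactivity == "all" or i == last_index)
        if display:
            # Evaluate the bare expression and keep its value for display.
            code = compile(ast.Expression(node.value), "<cell>", "eval")
            shown.append(eval(code, namespace))
        else:
            # Execute everything else (assignments, imports, ...) silently.
            code = compile(ast.Module([node], type_ignores=[]), "<cell>", "exec")
            exec(code, namespace)
    return shown


cell = "x, y = 5, 3\nx\ny"
print(run_cell(cell, "last"))  # [3]
print(run_cell(cell, "all"))   # [5, 3]
```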

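Jupyter has meta-commands, called magics, that start with the percent character: line magics start with % and apply to one line, cell magics start with %% and apply to the whole cell. Some of these are useful for displaying information, such as rendering formulas with LaTeX.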


In [34]:
%lsmagic


Out[34]:
Available line magics:
%alias  %alias_magic  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %popd  %pprint  %precision  %profile  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%perl  %%prun  %%pypy  %%python  %%python2  %%python3  %%ruby  %%script  %%sh  %%svg  %%sx  %%system  %%time  %%timeit  %%writefile

Automagic is ON, % prefix IS NOT needed for line magics.

In [52]:
%whos


Variable                Type                     Data/Info
----------------------------------------------------------
InteractiveShell        MetaHasTraits            <class 'IPython.core.inte<...>eshell.InteractiveShell'>
RandomForestRegressor   ABCMeta                  <class 'sklearn.ensemble.<...>t.RandomForestRegressor'>
X                       ndarray                  506x13: 6578 elems, type `float64`, 52624 bytes
X_test                  ndarray                  127x13: 1651 elems, type `float64`, 13208 bytes
X_train                 ndarray                  379x13: 4927 elems, type `float64`, 39416 bytes
cal                     Bunch                    {'DESCR': 'California hou<...>      , -121.24      ]])}
california_housing      module                   <module 'sklearn.datasets<...>s/california_housing.py'>
df                      DataFrame                       MedInc  HouseAge  <...>n[20640 rows x 8 columns]
load_boston             function                 <function load_boston at 0x10c167ea0>
mean_squared_error      function                 <function mean_squared_error at 0x10c4980d0>
pd                      module                   <module 'pandas' from '/U<...>ages/pandas/__init__.py'>
rf_reg                  RandomForestRegressor    RandomForestRegressor(boo<...>bose=0, warm_start=False)
sklearn                 module                   <module 'sklearn' from '/<...>ges/sklearn/__init__.py'>
train_test_split        function                 <function train_test_split at 0x10c5852f0>
x                       int                      5
y                       int                      3
y_test                  ndarray                  127: 127 elems, type `float64`, 1016 bytes
y_train                 ndarray                  379: 379 elems, type `float64`, 3032 bytes

In [35]:
%%latex
$$ x^3 + C = \int{3 x^2 \; dx} \quad (C \in \mathbb{R})$$


$$ x^3 + C = \int{3 x^2 \; dx} \quad (C \in \mathbb{R})$$

In [36]:
%%system
ls -laH
du -sh .


Out[36]:
['total 7784',
 'drwxr-xr-x  11 ggruben  1566476737      374 Jun 29 21:23 .',
 'drwxr-xr-x  13 ggruben  1566476737      442 Jun 29 16:05 ..',
 'drwxr-xr-x   8 ggruben  1566476737      272 Jun 29 17:12 .ipynb_checkpoints',
 '-rw-r--r--   1 ggruben  1566476737    13452 Jun 29 21:23 1_Simple.ipynb',
 '-rw-r--r--   1 ggruben  1566476737      972 Jun 29 15:26 2_UI.ipynb',
 '-rw-r--r--   1 ggruben  1566476737     1843 Jun 29 14:41 3_Debugging_Profiling.ipynb',
 '-rw-r--r--@  1 ggruben  1566476737  3737689 Jun 29 21:23 Extra_Visualization_in_Python.ipynb',
 '-rw-r--r--   1 ggruben  1566476737   213477 Jun 28 11:17 Visualization.ipynb',
 'drwxr-xr-x   3 ggruben  1566476737      102 Jun 28 17:09 __pycache__',
 'drwxr-xr-x   6 ggruben  1566476737      204 Jun 29 17:11 snippets',
 '-rw-r--r--   1 ggruben  1566476737       39 Jun 28 17:09 somemodule.py',
 '7.6M\t.']

It is useful to know that we can also set environment variables with %env (handy, for example, for configuring Theano).


In [1]:
%env OMP_NUM_THREADS=8


env: OMP_NUM_THREADS=8
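
Outside a notebook, or in a plain script, roughly the same effect can be had with os.environ. This is a small sketch; it assumes the variable is set before the library that reads it is imported, since such settings are typically read at start-up.

```python
import os

# Environment variables are always strings, hence the quoted value.
os.environ["OMP_NUM_THREADS"] = "8"

print(os.environ["OMP_NUM_THREADS"])
```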

Use %store var_name to pass variables between notebooks!

Retrieve the variable in the other notebook with %store -r var_name
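
The idea behind %store is that a value is written to disk by one kernel and read back by another. As a rough sketch of that idea only, the example below uses plain pickle files; the helper names store and restore and the temporary directory are assumptions for this illustration, not IPython's own storage mechanism.

```python
import pickle
import tempfile
from pathlib import Path


def store(name, value, directory):
    """Persist `value` under `name` (hypothetical helper, not an IPython API)."""
    with open(Path(directory) / f"{name}.pkl", "wb") as fh:
        pickle.dump(value, fh)


def restore(name, directory):
    """Load the value that was saved under `name`."""
    with open(Path(directory) / f"{name}.pkl", "rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as shared_dir:
    # "Notebook A" saves a variable ...
    store("var_name", {"alpha": 0.1, "folds": 5}, shared_dir)
    # ... and "notebook B" reads it back.
    restored = restore("var_name", shared_dir)

print(restored)
```

As with any pickle-based approach, only picklable objects can be passed around this way.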


In [10]:
%%writefile some_code.py

import numpy as np
from scipy.stats import kendalltau
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="ticks")

rs = np.random.RandomState(11)
x = rs.gamma(2, size=1000)
y = -.5 * x + rs.normal(size=1000)

sns.jointplot(x, y, kind="hex", stat_func=kendalltau, color="#4CB391")
plt.show()


Overwriting some_code.py

No clue what kendalltau is?


In [14]:
kendalltau?
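
The trailing ? is IPython syntax for object introspection. In plain Python, a comparable look at the signature and docstring can be sketched with the inspect module. The example uses json.dumps as a stand-in so that it runs without SciPy installed; any other callable works the same way.

```python
import inspect
import json

target = json.dumps  # stand-in for kendalltau

signature = inspect.signature(target)
doc = inspect.getdoc(target)

# Print the call signature followed by the first line of the docstring.
print(f"{target.__name__}{signature}")
print(doc.splitlines()[0])
```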

In [12]:
%pycat some_code.py

In [13]:
%run some_code.py



In [6]:
%matplotlib inline

In [7]:
%run some_code.py
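
%%writefile saves the cell body to a file and %run executes that file as a script. Outside IPython, a comparable round trip can be sketched with pathlib and runpy. The file name tiny_script.py and its one-line content are made up for this example.

```python
import runpy
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Write a small script to disk, as %%writefile would.
    script = Path(tmp) / "tiny_script.py"
    script.write_text("total = sum(range(5))\n")

    # Execute it as __main__ and collect the resulting globals.
    namespace = runpy.run_path(str(script), run_name="__main__")

print(namespace["total"])
```

%run additionally loads the script's variables into the interactive namespace, which is why they are available in later cells.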

In [37]:
InteractiveShell.ast_node_interactivity = "last"