Simple start to Jupyter notebooks

A: Oh snap! The initializing cell is missing!

B: No problem! Jupyter notebooks are able to use magic



In [2]:

    
!ls









    



Simple.ipynb        __pycache__         somemodule.py
Visualization.ipynb snippets



In [3]:

    
# The PyPI name of the library is scikit-learn; the sklearn name used below is a deprecated alias
!pip install --user pandas matplotlib sklearn seaborn









    



Requirement already satisfied: pandas in /Users/ggruben/anaconda3/lib/python3.5/site-packages
Requirement already satisfied: matplotlib in /Users/ggruben/anaconda3/lib/python3.5/site-packages
Requirement already satisfied: sklearn in /Users/ggruben/.local/lib/python3.5/site-packages
Requirement already satisfied: seaborn in /Users/ggruben/anaconda3/lib/python3.5/site-packages
Requirement already satisfied: python-dateutil>=2 in /Users/ggruben/anaconda3/lib/python3.5/site-packages (from pandas)
Requirement already satisfied: pytz>=2011k in /Users/ggruben/.local/lib/python3.5/site-packages (from pandas)
Requirement already satisfied: numpy>=1.7.0 in /Users/ggruben/anaconda3/lib/python3.5/site-packages (from pandas)
Requirement already satisfied: six>=1.10 in /Users/ggruben/anaconda3/lib/python3.5/site-packages (from matplotlib)
Requirement already satisfied: cycler>=0.10 in /Users/ggruben/anaconda3/lib/python3.5/site-packages (from matplotlib)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=1.5.6 in /Users/ggruben/anaconda3/lib/python3.5/site-packages (from matplotlib)
Requirement already satisfied: scikit-learn in /Users/ggruben/anaconda3/lib/python3.5/site-packages (from sklearn)



In [4]:

    
!pip install version_information









    



Requirement already satisfied: version_information in /Users/ggruben/anaconda3/lib/python3.5/site-packages



In [5]:

    
%load_ext version_information
%version_information pandas, sklearn









    Out[5]:




Software Version
Python 3.5.3 64bit [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
IPython 5.3.0
OS Darwin 16.5.0 x86_64 i386 64bit
pandas 0.19.2
sklearn 0.18.1
Wed Jun 28 17:07:22 2017 CEST



In [1]:

    
!pip install watermark









    



Collecting watermark
  Downloading watermark-1.4.0.tar.gz
Requirement already satisfied: ipython in /Users/ggruben/anaconda3/lib/python3.5/site-packages (from watermark)
Building wheels for collected packages: watermark
  Running setup.py bdist_wheel for watermark ... done
  Stored in directory: /Users/ggruben/Library/Caches/pip/wheels/04/33/ef/a05c24dee8b3d1f21955471968bb3fbcab737890f68c6d30c4
Successfully built watermark
Installing collected packages: watermark
Successfully installed watermark-1.4.0



In [2]:

    
%load_ext watermark
%watermark -a "Gerrit Gruben" -d -t -v -p numpy,pandas -g









    



Gerrit Gruben 2017-06-29 16:35:58 

CPython 3.5.3
IPython 5.4.1

numpy 1.12.1
pandas 0.19.2
Git hash: 0f9cd5bce15c31679e45a02ba4981226798bc1b2
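
Where neither extension is installed, a similar report can be assembled in plain Python. The sketch below only approximates what %watermark prints. The helper name report_versions is made up for illustration, and it assumes each package exposes a __version__ attribute, which most but not all packages do.

```python
import importlib
import platform


def report_versions(package_names):
    """Collect interpreter and package versions into a dict (illustrative helper)."""
    report = {"Python": platform.python_version()}
    for name in package_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "not installed"
            continue
        # Assumption: the package exposes __version__; fall back if it does not.
        report[name] = getattr(module, "__version__", "unknown")
    return report


versions = report_versions(["numpy", "pandas"])
for name, version in versions.items():
    print(name, version)
```

Unlike the extensions, this records nothing about the Git commit or the date, so it is best seen as a fallback.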

Importing modules



In [6]:

    
from somemodule import hello



In [7]:

    
hello()









    



Helo Word!



In [8]:

    
hello()









    



Helo Word!



In [25]:

    
del hello









    



---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-25-60746d489e2b> in <module>()
----> 1 del hello

NameError: name 'hello' is not defined



In [20]:

    
%load_ext autoreload
%autoreload 2









    



The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload



In [27]:

    
%aimport somemodule



In [28]:

    
somemodule.hello()









    



Helo Word!



In [30]:

    
somemodule.hello()









    



Hello World!
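
The change from "Helo Word!" to "Hello World!" between the two calls comes from editing somemodule.py on disk while autoreload is active. Roughly the same effect can be had by hand with importlib.reload. The sketch below uses a throwaway module called somemodule_demo (a hypothetical stand-in, not the somemodule.py of this notebook) written to a temporary directory.

```python
import importlib
import pathlib
import sys
import tempfile

# Keep the demo free of cached bytecode so the edited source is always re-read.
sys.dont_write_bytecode = True

workdir = pathlib.Path(tempfile.mkdtemp())
module_path = workdir / "somemodule_demo.py"  # hypothetical module name
module_path.write_text('def hello():\n    return "Helo Word!"\n')

sys.path.insert(0, str(workdir))
importlib.invalidate_caches()
import somemodule_demo

before = somemodule_demo.hello()

# Simulate fixing the typo in an editor, then reload the module by hand.
module_path.write_text('def hello():\n    return "Hello World!"\n')
importlib.reload(somemodule_demo)
after = somemodule_demo.hello()

print(before)
print(after)
```

Calling the function through the module object, as in somemodule.hello(), is what lets the reloaded definition be picked up here.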



In [31]:

    
from IPython.display import FileLink



In [33]:

    
FileLink("Simple.ipynb")









    Out[33]:




Simple.ipynb

Some Editing tricks

Demonstrating auto-completion.

TAB for auto-completion of identifiers,

Shift+Tab for a tooltip with the signature and docstring (Shift+Enter runs the cell)

Note to self: show merging of cells, split etc. here



In [1]:

    
import sklearn



In [2]:

    
from sklearn.datasets import load_boston



In [ ]:

    
df = load_boston()



In [ ]:

    
X, y = df.data, df.target



In [ ]:

    
# sklearn.cross_validation is deprecated; train_test_split lives in model_selection
from sklearn.model_selection import train_test_split



In [ ]:

    
X_train, X_test, y_train, y_test = train_test_split(X, y)
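
With default arguments, train_test_split shuffles the rows and holds out a quarter of them, rounding the test count up. For the 506 rows of the Boston data that gives the 379/127 split visible in the %whos listing later in this notebook. The sketch below is a simplified stand-in written for illustration (simple_train_test_split is a made-up name), not scikit-learn's implementation.

```python
import math
import random


def simple_train_test_split(rows, test_size=0.25, seed=0):
    """Shuffle rows and split them into (train, test) lists."""
    rows = list(rows)
    # The test share is rounded up; the remainder goes to training.
    n_test = math.ceil(test_size * len(rows))
    random.Random(seed).shuffle(rows)
    return rows[n_test:], rows[:n_test]


train, test = simple_train_test_split(range(506))
print(len(train), len(test))  # 379 127
```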



In [ ]:

    
from sklearn.metrics import mean_squared_error



In [ ]:

    
from sklearn.ensemble import RandomForestRegressor



In [3]:

    
# A forest of only two trees: fast, but noisy
rf_reg = RandomForestRegressor(n_estimators=2)



In [29]:

    
rf_reg.fit(X_train, y_train)
print(mean_squared_error(y_test, rf_reg.predict(X_test)))









    



15.913484252
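
The number printed above is the mean squared error: the average of the squared differences between true and predicted targets. A minimal pure-Python version of that formula, with made-up toy values, might look like this (the function name mse is chosen for illustration only):

```python
def mse(y_true, y_pred):
    """Mean of the squared differences between two equally long sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values, not taken from the Boston data: errors are -1, 0 and 2.
toy_error = mse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
print(toy_error)
```

Because the errors are squared, the result is in squared target units and large misses dominate the average.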



In [51]:

    
# Just need a DataFrame
import pandas as pd
from sklearn.datasets import california_housing

cal = california_housing.fetch_california_housing()
# Use the target (median house value) as the index
df = pd.DataFrame(data=cal.data, columns=cal.feature_names, index=cal.target)

df.head(10)

Very quick plotting (just for export really)



In [56]:

    
%matplotlib inline
import matplotlib.pyplot as plt



In [62]:

    
plt.scatter(df.MedInc, df.index)









    Out[62]:





<matplotlib.collections.PathCollection object at 0x114c87470>



In [63]:

    
import seaborn as sns



In [64]:

    
sns.jointplot(x=df.MedInc, y=df.index)









    Out[64]:





<seaborn.axisgrid.JointGrid object at 0x114c96588>

Try exporting the notebook to HTML and to PDF (PDF export requires pandoc and a LaTeX installation).

Then try again and compare the figures with the setting:



In [65]:

    
%config InlineBackend.figure_format = "retina"

Some magic and multiple outputs



In [31]:

    
x, y = 5, 3
x
y









    Out[31]:





3



In [32]:

    
# Show all output values
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"



In [33]:

    
x, y = 5, 3
x
y









    Out[33]:





5






    Out[33]:





3
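
The difference between the two runs comes down to which statements of a cell get their value displayed. The following is a conceptual sketch of that decision using the standard ast module. It is not IPython's actual implementation, and run_cell is a hypothetical helper that only distinguishes the "all" setting from the default behaviour of showing the last expression.

```python
import ast


def run_cell(source, interactivity="last_expr"):
    """Execute source and return the values that would be displayed."""
    tree = ast.parse(source)
    namespace = {}
    shown = []
    last_index = len(tree.body) - 1
    for index, node in enumerate(tree.body):
        is_expression = isinstance(node, ast.Expr)
        display = is_expression and (interactivity == "all" or index == last_index)
        if display:
            # Evaluate the bare expression and keep its value for display.
            code = compile(ast.Expression(body=node.value), "<cell>", "eval")
            shown.append(eval(code, namespace))
        else:
            # Run the statement for its side effects only.
            code = compile(ast.Module(body=[node], type_ignores=[]), "<cell>", "exec")
            exec(code, namespace)
    return shown


cell = "x, y = 5, 3\nx\ny"
print(run_cell(cell))         # [3]
print(run_cell(cell, "all"))  # [5, 3]
```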

Jupyter has meta-commands, called magics, that start with the percent character: % for line magics and %% for cell magics. Some of them are useful for displaying information, for example rendering formulas with LaTeX.



In [34]:

    
%lsmagic









    Out[34]:





Available line magics:
%alias  %alias_magic  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %popd  %pprint  %precision  %profile  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%perl  %%prun  %%pypy  %%python  %%python2  %%python3  %%ruby  %%script  %%sh  %%svg  %%sx  %%system  %%time  %%timeit  %%writefile

Automagic is ON, % prefix IS NOT needed for line magics.



In [52]:

    
%whos









    



Variable                Type                     Data/Info
----------------------------------------------------------
InteractiveShell        MetaHasTraits            <class 'IPython.core.inte<...>eshell.InteractiveShell'>
RandomForestRegressor   ABCMeta                  <class 'sklearn.ensemble.<...>t.RandomForestRegressor'>
X                       ndarray                  506x13: 6578 elems, type `float64`, 52624 bytes
X_test                  ndarray                  127x13: 1651 elems, type `float64`, 13208 bytes
X_train                 ndarray                  379x13: 4927 elems, type `float64`, 39416 bytes
cal                     Bunch                    {'DESCR': 'California hou<...>      , -121.24      ]])}
california_housing      module                   <module 'sklearn.datasets<...>s/california_housing.py'>
df                      DataFrame                       MedInc  HouseAge  <...>n[20640 rows x 8 columns]
load_boston             function                 <function load_boston at 0x10c167ea0>
mean_squared_error      function                 <function mean_squared_error at 0x10c4980d0>
pd                      module                   <module 'pandas' from '/U<...>ages/pandas/__init__.py'>
rf_reg                  RandomForestRegressor    RandomForestRegressor(boo<...>bose=0, warm_start=False)
sklearn                 module                   <module 'sklearn' from '/<...>ges/sklearn/__init__.py'>
train_test_split        function                 <function train_test_split at 0x10c5852f0>
x                       int                      5
y                       int                      3
y_test                  ndarray                  127: 127 elems, type `float64`, 1016 bytes
y_train                 ndarray                  379: 379 elems, type `float64`, 3032 bytes
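
A listing like this can be approximated in plain Python by walking a namespace dictionary. The sketch below is a rough imitation for illustration, with whos as a made-up function name; the real %whos prints richer summaries, such as the array shapes shown above.

```python
def whos(namespace):
    """Return (name, type name, short repr) rows for the public names in a dict."""
    rows = []
    for name in sorted(namespace):
        if name.startswith("_"):
            continue  # skip private and dunder names
        value = namespace[name]
        info = repr(value)
        if len(info) > 40:
            info = info[:37] + "..."
        rows.append((name, type(value).__name__, info))
    return rows


demo = {"x": 5, "y": 3, "_hidden": None, "words": ["Hello", "World"]}
for name, type_name, info in whos(demo):
    print("{:<10}{:<10}{}".format(name, type_name, info))
```

In a script, passing globals() instead of the demo dictionary lists the module-level variables.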



In [35]:

    
%%latex
$$ x^3 + C = \int{3 x^2 \; dx} \quad (C \in \mathbb{R})$$









    





$$ x^3 + C = \int{3 x^2 \; dx} \quad (C \in \mathbb{R})$$



In [36]:

    
%%system
ls -laH
du -sh .









    Out[36]:





['total 7784',
 'drwxr-xr-x  11 ggruben  1566476737      374 Jun 29 21:23 .',
 'drwxr-xr-x  13 ggruben  1566476737      442 Jun 29 16:05 ..',
 'drwxr-xr-x   8 ggruben  1566476737      272 Jun 29 17:12 .ipynb_checkpoints',
 '-rw-r--r--   1 ggruben  1566476737    13452 Jun 29 21:23 1_Simple.ipynb',
 '-rw-r--r--   1 ggruben  1566476737      972 Jun 29 15:26 2_UI.ipynb',
 '-rw-r--r--   1 ggruben  1566476737     1843 Jun 29 14:41 3_Debugging_Profiling.ipynb',
 '-rw-r--r--@  1 ggruben  1566476737  3737689 Jun 29 21:23 Extra_Visualization_in_Python.ipynb',
 '-rw-r--r--   1 ggruben  1566476737   213477 Jun 28 11:17 Visualization.ipynb',
 'drwxr-xr-x   3 ggruben  1566476737      102 Jun 28 17:09 __pycache__',
 'drwxr-xr-x   6 ggruben  1566476737      204 Jun 29 17:11 snippets',
 '-rw-r--r--   1 ggruben  1566476737       39 Jun 28 17:09 somemodule.py',
 '7.6M\t.']
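
Note that %%system returns its output as a list of lines rather than printing it. Outside a notebook, a comparable result can be sketched with the subprocess module; system_lines is a hypothetical helper name, and the echo command is just a portable placeholder.

```python
import subprocess


def system_lines(command):
    """Run a shell command and return its standard output as a list of lines."""
    completed = subprocess.run(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode bytes to str
    )
    return completed.stdout.splitlines()


lines = system_lines("echo hello")
print(lines)
```

Having the output as a list makes it easy to post-process in Python, for example to filter file names.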

It is useful to know that we can also set environment variables from within the notebook (handy for Theano as well):



In [1]:

    
%env OMP_NUM_THREADS=8









    



env: OMP_NUM_THREADS=8
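
The plain-Python counterpart is an assignment to os.environ, shown here as a minimal sketch. One assumption worth stating: many libraries read such variables only once when they are first imported, so the variable usually has to be set before that import.

```python
import os

# Environment values are always strings, even when they look like numbers.
os.environ["OMP_NUM_THREADS"] = "8"

print(os.environ["OMP_NUM_THREADS"])
```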

Use %store var_name to pass variables between notebooks!

Retrieve them in the other notebook with %store -r var_name
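
The underlying idea is persistence: the value is pickled to disk by one kernel and read back by another. The sketch below imitates that with the pickle module. The helper names store_value and restore_value are hypothetical, and the temporary directory is only a stand-in, not the location IPython actually uses.

```python
import pathlib
import pickle
import tempfile

STORE_DIR = pathlib.Path(tempfile.mkdtemp())  # stand-in storage location


def store_value(name, value):
    """Pickle value under the given name."""
    with open(str(STORE_DIR / (name + ".pkl")), "wb") as handle:
        pickle.dump(value, handle)


def restore_value(name):
    """Load a previously stored value by name."""
    with open(str(STORE_DIR / (name + ".pkl")), "rb") as handle:
        return pickle.load(handle)


store_value("settings", {"n_estimators": 2, "seed": 11})
restored = restore_value("settings")
print(restored)
```

As with any pickle-based approach, only picklable objects survive the round trip.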



In [10]:

    
%%writefile some_code.py

import numpy as np
from scipy.stats import kendalltau
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="ticks")

rs = np.random.RandomState(11)
x = rs.gamma(2, size=1000)
y = -.5 * x + rs.normal(size=1000)

# Note: stat_func is only accepted by older seaborn releases; drop it on newer ones
sns.jointplot(x=x, y=y, kind="hex", stat_func=kendalltau, color="#4CB391")
plt.show()









    



Overwriting some_code.py

No clue what kendalltau is? Append a question mark to display its documentation:



In [14]:

    
kendalltau?
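
The question-mark syntax only works in IPython and Jupyter. In ordinary Python the same information is available through the inspect module, sketched here with statistics.mean from the standard library as a stand-in for kendalltau.

```python
import inspect
import statistics

# Signature and docstring are the two main things the ? syntax displays.
signature = inspect.signature(statistics.mean)
docstring = inspect.getdoc(statistics.mean)

print(signature)
print(docstring)
```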



In [12]:

    
%pycat some_code.py



In [13]:

    
%run some_code.py
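
Together, %%writefile and %run amount to writing a script to disk and executing it in a fresh namespace whose variables then become available. A plain-Python sketch of that pattern uses runpy.run_path; the file name some_code_demo.py and its contents are made up for the example.

```python
import pathlib
import runpy
import tempfile

script = pathlib.Path(tempfile.mkdtemp()) / "some_code_demo.py"  # hypothetical file
script.write_text("values = [1, 2, 3]\ntotal = sum(values)\n")

# run_path executes the file and returns its global variables as a dict.
result = runpy.run_path(str(script))
print(result["total"])  # 6
```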



In [6]:

    
%matplotlib inline



In [7]:

    
%run some_code.py



In [37]:

    
# Restore the default: display only the last expression of a cell
InteractiveShell.ast_node_interactivity = "last_expr"

Output of In [51], df.head(10) (rows are indexed by the target value):

       MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  Longitude
4.526  8.3252      41.0  6.984127   1.023810       322.0  2.555556     37.88    -122.23
3.585  8.3014      21.0  6.238137   0.971880      2401.0  2.109842     37.86    -122.22
3.521  7.2574      52.0  8.288136   1.073446       496.0  2.802260     37.85    -122.24
3.413  5.6431      52.0  5.817352   1.073059       558.0  2.547945     37.85    -122.25
3.422  3.8462      52.0  6.281853   1.081081       565.0  2.181467     37.85    -122.25
2.697  4.0368      52.0  4.761658   1.103627       413.0  2.139896     37.85    -122.25
2.992  3.6591      52.0  4.931907   0.951362      1094.0  2.128405     37.84    -122.25
2.414  3.1200      52.0  4.797527   1.061824      1157.0  1.788253     37.84    -122.25
2.267  2.0804      42.0  4.294118   1.117647      1206.0  2.026891     37.84    -122.26
2.611  3.6912      52.0  4.970588   0.990196      1551.0  2.172269     37.84    -122.25
