How to use notebook

The notebook is a convenient environment for writing research notes. You can combine comments and code in a single document, pass it to your colleagues, and let them check what you did.

Getting started is easy: Jupyter comes with the Anaconda distribution, so just run:

jupyter notebook

It will open your default browser, where you can open an existing notebook (files ending in *.ipynb, for IPython notebook) or create a new one from the menu File -> New Notebook.
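An .ipynb file is a JSON document, so it can be inspected without starting the notebook server. The sketch below assumes the version 4 notebook format (a top-level "cells" list whose entries carry "cell_type" and "source"); the notebook content and the helper name count_cell_types are made up for illustration.

```python
import json
from collections import Counter

# A tiny, made-up notebook in the (assumed) nbformat 4 layout.
notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["print('hello world')"]},
    ],
})

def count_cell_types(text):
    """Return how many cells of each type a notebook JSON string contains."""
    notebook = json.loads(text)
    return Counter(cell["cell_type"] for cell in notebook["cells"])

cell_counts = count_cell_types(notebook_text)
print(dict(cell_counts))
```

For a real file, the same function can be fed the text read from the .ipynb file on disk.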

Writing in a cell

A cell holds either text or code. By default, a new cell is a code cell.

You can change it into a text (Markdown) cell with Esc m and back into a code cell with Esc y.

Markdown cells

Start a line with one to six pound signs (#, ##, ...) to define a header; the more pound signs, the smaller the header.

Headers

# H1

## H2

### H3

#### H4

##### H5

###### H6

H1

H2

H3

H4

H5
H6

Emphasis

The markers must touch the text they enclose, without spaces in between.

Italic: *asterisks* or _underscores_

asterisks or underscores

Bold: **asterisks** or __underscores__

asterisks or underscores

Combined: **asterisks and __underscores__**

asterisks and underscores

Strikethrough: ~~scratch this~~

scratch this

Each of these symbols can be escaped by putting a backslash before it, such as \# or \*.

Lists

Let's learn with an example (each ⋅ stands for one space):

1. First ordered list item
2. Another item
⋅⋅* Unordered sub-list.
1. Actual numbers don't matter, just that it's a number
⋅⋅1. Ordered sub-list
4. And another item.

⋅⋅⋅You can have properly indented paragraphs within list items. Notice the blank line above, and the leading spaces (at least one, but we'll use three here to also align the raw Markdown).

⋅⋅⋅To have a line break without a paragraph, you will need to use two trailing spaces.⋅⋅
⋅⋅⋅Note that this line is separate, but within the same paragraph.⋅⋅
⋅⋅⋅(This is contrary to the typical GFM line break behaviour, where trailing spaces are not required.)

* Unordered list can use asterisks
- Or minuses
+ Or pluses


This is the result:

  1. First ordered list item
  2. Another item ⋅⋅* Unordered sub-list.
  3. Actual numbers don't matter, just that it's a number ⋅⋅1. Ordered sub-list
  4. And another item.

⋅⋅⋅You can have properly indented paragraphs within list items. Notice the blank line above, and the leading spaces (at least one, but we'll use three here to also align the raw Markdown).

⋅⋅⋅To have a line break without a paragraph, you will need to use two trailing spaces.⋅⋅ ⋅⋅⋅Note that this line is separate, but within the same paragraph.⋅⋅ ⋅⋅⋅(This is contrary to the typical GFM line break behaviour, where trailing spaces are not required.)

  • Unordered list can use asterisks
  • Or minuses
  • Or pluses

Mathematical formulae

Mathematical formulae are written by surrounding a LaTeX expression with dollar signs, $ ... $:

$ \int x^2 \, dx = x^3/3 + C $

URL

You can add a link with [linked name](http://...), with no space between the closing bracket and the opening parenthesis:

ipython

or simply by writing the link inside angle brackets:

http://ipython.org

or simply with http at the beginning:

http://ipython.org

Code highlighting

You can highlight code by surrounding it with ```python ... ```:

print('hello world')

or the same with bash:

for i in a b c;
do
echo $i
done

Inserting images

You can insert images with: ![alt](...)

To add a title that appears when hovering over the image: ![alt text](... "some text")

Tables

Tables are created using pipes "|"

Alignment inside the cells can be written using:

  • |:- to align on the left
  • |-: to align on the right
  • |:-: to center

For instance:

| Column 1 | Column 2 | Column 3 |
| :- | :-: | :- |
| column | center | left |

will give:

Column 1 Column 2 Column 3
column center left
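When a table comes from data, typing the pipes by hand gets tedious. The following is a minimal sketch of a generator for the syntax described above; markdown_table and its align codes ('l', 'r', 'c') are hypothetical names chosen for this example.

```python
def markdown_table(header, rows, align=None):
    """Build a pipe table; align holds 'l', 'r' or 'c' for each column."""
    markers = {"l": ":-", "r": "-:", "c": ":-:"}
    align = align or ["l"] * len(header)
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join(markers[a] for a in align) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)

table = markdown_table(
    ["Column 1", "Column 2", "Column 3"],
    [["column", "center", "left"]],
    align=["l", "c", "l"],
)
print(table)
```

In a code cell, the resulting string can be rendered with IPython.display.Markdown(table) instead of being printed.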

Typical shortcuts

| Command | Definition |
| :- | :- |
| Esc m | change to markdown cell |
| Esc y | change to code cell |
| Shift Enter | run cell |
| Ctrl Enter | run cell in place |
| Alt Enter | run cell and open new cell |
| Esc Enter | go back to edit mode |
| Ctrl Shift - | split a cell |
| Shift m | merge the cell below |
| Esc dd | delete the current cell |
| Esc a | insert new cell above |
| Esc b | insert new cell below |
| Enter | switch to edit mode |

Writing in a code cell

First of all, we can use the exclamation mark to run shell commands.


In [1]:
! pwd


/media/Data/workspace/Workshops/CompSkills4Astro

In [2]:
names = !ls *.py
names[:3]


Out[2]:
['foo.py', 'myfun.py', 'mymem.py']
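The !ls capture above depends on the shell. When a cell should also run outside IPython, or on a system without ls, roughly the same list can be built in plain Python. This is a sketch using only the standard library; list_python_files and the file names are made up for the demonstration.

```python
import glob
import os
import tempfile

def list_python_files(directory):
    """Return the sorted names of the *.py files in a directory."""
    pattern = os.path.join(directory, "*.py")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))

# Demonstrate on a temporary directory with a few empty, made-up files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("foo.py", "myfun.py", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    names = list_python_files(tmp)

print(names)
```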

Append a question mark to a name to see its documentation (?), or two question marks to see its source code as well (??). The information appears in a separate pane.


In [3]:
%pycat?

%load

Use %load http://matplotlib.org/mpl_examples/pylab_examples/contour_demo.py to load the content of a Python script into the current cell.

%%writefile

Writes the content of the cell to a file, overwriting the file if it already exists (here pythoncode.py).


In [23]:
%%writefile pythoncode.py

import numpy
def append_if_not_exists(arr, x):
    if x not in arr:
        arr.append(x)
        
def some_useless_slow_function():
    arr = list()
    for i in range(10000):
        x = numpy.random.randint(0, 10000)
        append_if_not_exists(arr, x)


Overwriting pythoncode.py
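Outside the notebook, the effect of %%writefile followed by an import can be approximated with pathlib and importlib. This is only a sketch of the idea: the file name pythoncode_demo.py is hypothetical, and a temporary directory is used so that nothing in the working directory is overwritten.

```python
import importlib.util
import tempfile
from pathlib import Path

source = '''\
def append_if_not_exists(arr, x):
    if x not in arr:
        arr.append(x)
'''

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "pythoncode_demo.py"   # hypothetical file name
    path.write_text(source)                    # replaces any existing content

    # Import the freshly written file as a module.
    spec = importlib.util.spec_from_file_location("pythoncode_demo", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

values = [1, 2]
module.append_if_not_exists(values, 2)   # already present, ignored
module.append_if_not_exists(values, 3)   # new, appended
print(values)
```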

%run

Runs an external script or another notebook.


In [27]:
%run ./foo.py
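In a plain Python script, the standard library module runpy offers something close to %run: it executes a file and hands back the names the file defined. The sketch below uses a made-up script, foo_demo.py, written to a temporary directory.

```python
import runpy
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "foo_demo.py"          # hypothetical script
    script.write_text("answer = 6 * 7\n")
    # run_path executes the file and returns its global namespace as a dict.
    namespace = runpy.run_path(str(script))

print(namespace["answer"])
```

One difference to keep in mind: %run places the variables of the script in the interactive namespace, while run_path only returns them in a dictionary.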

%load code

to load code directly into a cell

Notebook magics

By running %lsmagic in a cell we get a list of all magic commands available.


In [5]:
%lsmagic


Out[5]:
Available line magics:
%alias  %alias_magic  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %popd  %pprint  %precision  %profile  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%perl  %%prun  %%pypy  %%python  %%python2  %%python3  %%ruby  %%script  %%sh  %%svg  %%sx  %%system  %%time  %%timeit  %%writefile

Automagic is ON, % prefix IS NOT needed for line magics.

One magic you absolutely need to know is %matplotlib inline, which makes plots appear inside the notebook page:


In [59]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
# The trailing semicolon suppresses the printed return value of plt.hist
plt.hist(np.linspace(0, 1, 1000)**1.5);


%%bash to run a cell with bash


In [1]:
%%bash
for i in a b c;
do
echo $i
done


a
b
c

%%latex to render cell content in LaTeX


In [2]:
%%latex
\begin{align}
a = \frac{1}{2} && b = \frac{1}{3} && c = \frac{1}{4}\\
a && b && c \\
1 && 2 && 3 
\end{align}


\begin{align} a = \frac{1}{2} && b = \frac{1}{3} && c = \frac{1}{4}\\ a && b && c \\ 1 && 2 && 3 \end{align}

Collapsing windows

After embedding large figures, you can collapse or expand the output by clicking in the margin to its left: a double click collapses the output, and a single click expands it again.

Using R from Python ...

To use R from notebook, install rpy2:

conda install -c r rpy2


In [2]:
%load_ext rpy2.ipython

In [5]:
# example of R ...
%R X=c(1,4,5,7); sd(X); mean(X)


Out[5]:
array([ 4.25])
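Only the value of the last R expression, mean(X), comes back to Python, which is why the output shows 4.25 and not the standard deviation. As a cross-check that does not need R, both quantities can be computed with the standard library; statistics.stdev uses the n - 1 denominator, like sd() in R.

```python
import statistics

X = [1, 4, 5, 7]
sample_sd = statistics.stdev(X)    # sample standard deviation (n - 1)
sample_mean = statistics.mean(X)   # arithmetic mean
print(sample_sd, sample_mean)
```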

Mixing Python and R


In [14]:
import numpy as np
import pylab
X = np.array([0,1,2,3,4])
Y = np.array([3,5,4,6,7])

# Push variables in rpy2
%Rpush X Y
v1 = %R plot(X,Y); print(summary(lm(Y~X))); vv=mean(X)*mean(Y)
# Compute the fit with R and save coefficients in Python
b = %R lm(Y~X)$coef

# Or compute variables in rpy2 and then pull them from the rpy2 space
#%R a=resid(lm(Y~X))
#%Rpull a

#print(a, b)


Call:
lm(formula = Y ~ X)

Residuals:
   1    2    3    4    5 
-0.2  0.9 -1.0  0.1  0.2 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   3.2000     0.6164   5.191   0.0139 *
X             0.9000     0.2517   3.576   0.0374 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7958 on 3 degrees of freedom
Multiple R-squared:   0.81,	Adjusted R-squared:  0.7467 
F-statistic: 12.79 on 1 and 3 DF,  p-value: 0.03739

  • %Rpush X is equivalent to %R -i X
  • %Rpull X is equivalent to %R -o X
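To see what lm(Y~X) computes for these five points, the fit can be reproduced by hand with the textbook least-squares formulas. This is a plain-Python sketch for checking the numbers, not a replacement for R; fit_line is a name chosen for this example. It gives the same intercept (3.2), slope (0.9) and R-squared (0.81) as the summary above.

```python
X = [0, 1, 2, 3, 4]
Y = [3, 5, 4, 6, 7]

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(X, Y)

# Residuals and the coefficient of determination.
residuals = [y - (intercept + slope * x) for x, y in zip(X, Y)]
mean_y = sum(Y) / len(Y)
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((y - mean_y) ** 2 for y in Y)
r_squared = 1 - ss_res / ss_tot

print(round(intercept, 4), round(slope, 4), round(r_squared, 4))
print([round(r, 4) for r in residuals])
```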

In [20]:
b = %R a=resid(lm(Y~X))
%Rpull a
print("a", a)
print("b", b)


a [-0.2  0.9 -1.   0.1  0.2]
b [-0.2  0.9 -1.   0.1  0.2]

We can also write a whole cell directly in R


In [15]:
%%R -i X,Y -o XYcoef
XYlm = lm(Y~X)
XYcoef = coef(XYlm)
print(summary(XYlm))
par(mfrow=c(2,2))
plot(XYlm)


Call:
lm(formula = Y ~ X)

Residuals:
   1    2    3    4    5 
-0.2  0.9 -1.0  0.1  0.2 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   3.2000     0.6164   5.191   0.0139 *
X             0.9000     0.2517   3.576   0.0374 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7958 on 3 degrees of freedom
Multiple R-squared:   0.81,	Adjusted R-squared:  0.7467 
F-statistic: 12.79 on 1 and 3 DF,  p-value: 0.03739

Timing and profiling

The notebook provides several magics for timing and profiling code.

%time times a single statement


In [11]:
%time {1 for i in range(10*1000000)}


CPU times: user 320 ms, sys: 0 ns, total: 320 ms
Wall time: 317 ms
Out[11]:
{1}

%%timeit runs the same operation many times and reports the best time per loop (newer IPython versions report the mean and standard deviation instead)


In [12]:
%%timeit
x = range(10000)
max(x)


1000 loops, best of 3: 254 µs per loop

To set the number of loops, use -n number (the number of repeats is set with -r):


In [13]:
%%timeit -n 100
x = range(10000)
max(x)


100 loops, best of 3: 315 µs per loop
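The same measurement can be made in an ordinary script with the timeit module from the standard library. The sketch below mirrors the cell above (100 loops, 3 repeats, best result kept); the function name largest is made up, and the timing you get will depend on your machine.

```python
import timeit

def largest():
    x = range(10000)
    return max(x)

# Three repeats of 100 loops each; every entry is the total time of one repeat.
totals = timeit.repeat(largest, number=100, repeat=3)
best_per_loop = min(totals) / 100
print(f"100 loops, best of 3: {best_per_loop * 1e6:.1f} us per loop")
```

Taking the minimum rather than the average reduces the influence of other processes that happen to run during one of the repeats.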

To profile line by line (CPU time and memory), you have to install further packages:

conda install line_profiler

conda install memory_profiler

%prun profiles a statement by measuring how much time each function takes to run


In [30]:
from numpy.random import randn
def add_and_sum(x, y):
    added = x + y
    summed = added.sum(axis=1)
    return summed

x = randn(3000, 3000)
y = randn(3000, 3000)

%prun add_and_sum(x, y)


 

In [34]:
!python -m cProfile foo.py


         2 function calls in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 foo.py:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



In [36]:
!python -m cProfile -o foo.out foo.py

In [37]:
import pstats
stats = pstats.Stats('foo.out')
stats.print_stats()


Wed Oct  5 11:16:10 2016    foo.out

         2 function calls in 0.000 seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 foo.py:1(<module>)


Out[37]:
<pstats.Stats instance at 0x7f37d004e7a0>
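Writing foo.out and reading it back is not required: cProfile can also be driven from within Python, and pstats can sort the report before printing it. The following is a sketch under that assumption; slow_sum is a made-up function that gives the profiler something to measure.

```python
import cProfile
import io
import pstats

def slow_sum(n):
    """A deliberately naive loop to give the profiler something to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100000)
profiler.disable()

# Collect the report in a string instead of printing it directly.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)   # at most five entries
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time puts the functions that dominate the run at the top, which is usually more useful than the "random listing order" shown above.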

%memit measures the memory usage of a single statement


In [41]:
%load_ext memory_profiler
%memit range(1000000)
%memit list(range(1000000))


peak memory: 227.16 MiB, increment: 30.64 MiB
peak memory: 242.28 MiB, increment: 45.27 MiB
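If memory_profiler is not installed, the tracemalloc module from the standard library gives a rough alternative. It is a different technique: it counts only memory allocated by Python itself, so its numbers are not comparable with the process-level figures that %memit prints. The helper peak_memory_mib is a name invented for this sketch. Note also that in Python 3 range is lazy, whereas the recorded output above appears to come from Python 2, where range built a full list.

```python
import tracemalloc

def peak_memory_mib(func):
    """Run func() and return the peak traced allocation in MiB."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak / 2**20

lazy_peak = peak_memory_mib(lambda: range(1000000))
list_peak = peak_memory_mib(lambda: list(range(1000000)))
print(f"range: {lazy_peak:.2f} MiB, list: {list_peak:.2f} MiB")
```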

%mprun profiles the memory usage of a function line by line (here testmem, from the file mymem.py)


In [53]:
import numpy as np

def testmem():
    a = np.arange(1000000)
    b = list(range(1000000))
    del(a)
    del(b)
    
%mprun -f testmem testmem()


ERROR: Could not find file <ipython-input-53-40403981ae5f>
NOTE: %mprun can only be used on functions defined in physical files, and not in the IPython environment.
('',)

%mprun only works on functions defined in an external file, and its results appear in a separate pane. So let's first write the code to a file and then profile it.


In [64]:
%%writefile mymem.py
#mymem.py
import numpy as np

def testmem():
    a = np.arange(1000000)
    b = list(range(1000000))
    del(a)
    del(b)

testmem()


Overwriting mymem.py

In [65]:
from mymem import testmem
%mprun -f testmem testmem()


('',)

In [67]:
from pythoncode import some_useless_slow_function, append_if_not_exists

In [68]:
%prun some_useless_slow_function()


 

In [69]:
%load_ext memory_profiler
%mprun -f append_if_not_exists some_useless_slow_function()


The memory_profiler extension is already loaded. To reload it, use:
  %reload_ext memory_profiler
('',)

The line profiler magic did not work in this setup, so we use the following workaround, which calls the line_profiler package directly.


In [70]:
import line_profiler
lp = line_profiler.LineProfiler()
lp.add_function(some_useless_slow_function)
lp.runctx('some_useless_slow_function()', locals=locals(), globals=globals())
lp.print_stats()


Timer unit: 1e-06 s

Total time: 0.539637 s
File: pythoncode.py
Function: some_useless_slow_function at line 7

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     7                                           def some_useless_slow_function():
     8         1            2      2.0      0.0      arr = list()
     9     10001         3871      0.4      0.7      for i in range(10000):
    10     10000        19475      1.9      3.6          x = numpy.random.randint(0, 10000)
    11     10000       516289     51.6     95.7          append_if_not_exists(arr, x)
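The report shows that almost all the time is spent in append_if_not_exists. The likely reason is that "x not in arr" scans the whole list on every call. One common remedy, sketched here with made-up function names and the standard random module instead of numpy, is to keep a set next to the list so that the membership test becomes a hash lookup.

```python
import random

def unique_in_order_slow(values):
    """Mirror of the profiled approach: 'x not in arr' scans the whole list."""
    arr = []
    for x in values:
        if x not in arr:
            arr.append(x)
    return arr

def unique_in_order_fast(values):
    """Same result, but membership is tested against a set (hash lookup)."""
    seen = set()
    arr = []
    for x in values:
        if x not in seen:
            seen.add(x)
            arr.append(x)
    return arr

random.seed(0)  # fixed seed so both functions see the same input
values = [random.randrange(10000) for _ in range(10000)]
slow_result = unique_in_order_slow(values)
fast_result = unique_in_order_fast(values)
print(len(slow_result), slow_result == fast_result)
```

Timing both functions with %timeit is a quick way to confirm the difference on your own machine.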

Speed

Check how many cores there are in your machine


In [72]:
import multiprocessing
print(multiprocessing.cpu_count(), 'xeon cores')


8 xeon cores

Widgets

First, we have to install the package ipywidgets:

conda install ipywidgets

then, import it.


In [4]:
from ipywidgets import widgets

Text widgets


In [5]:
from IPython.display import display
text = widgets.Text()
display(text)

def handle_submit(sender):
    print(text.value)
    
text.on_submit(handle_submit)

Buttons


In [6]:
button = widgets.Button(description="Click me !")
display(button)

def on_button_clicked(b):
    print ("Yes, you clicked me !")
    
button.on_click(on_button_clicked)


Yes, you clicked me !

Progress bar

Add a bar to show the progress of a loop


In [8]:
from IPython.display import display
from ipywidgets import FloatProgress
import numpy as np

f = FloatProgress(min=0,max=1000)
f.value=0
display(f)

for i in range(1000):
    a = np.exp(np.arange(100.))
    f.value += 1

Interaction

To produce a slider:


In [9]:
from ipywidgets import interact
def f(x):
    print(x)
interact(f,x=10)


20

To create a checkbox:


In [15]:
interact(f, x=True);


True

To pass a string:


In [16]:
interact(f, x='Hi there!');


Hi there!

As a decorator of a function:


In [17]:
@interact(x=True, y=1.0)
def g(x, y):
    return (x, y)


(True, 1.0)
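The examples above rely on interact choosing a widget from the type of the keyword argument. The lookup below summarizes the documented abbreviations as plain Python. It is written for illustration only and is not the code of ipywidgets; widget_kind is a hypothetical helper.

```python
def widget_kind(abbreviation):
    """Name the widget that interact is documented to build for a value."""
    # bool must be tested before int, because True and False are ints too.
    if isinstance(abbreviation, bool):
        return "Checkbox"
    if isinstance(abbreviation, str):
        return "Text"
    if isinstance(abbreviation, int):
        return "IntSlider"
    if isinstance(abbreviation, float):
        return "FloatSlider"
    if isinstance(abbreviation, tuple):
        # (min, max) or (min, max, step): any float gives a FloatSlider.
        is_float = any(isinstance(v, float) for v in abbreviation)
        return "FloatSlider" if is_float else "IntSlider"
    if isinstance(abbreviation, list):
        return "Dropdown"
    raise TypeError(f"no widget abbreviation for {abbreviation!r}")

kinds = [widget_kind(v) for v in (10, True, "Hi there!", 1.0, (1, 10, 0.1))]
print(kinds)
```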

Finally, we can bind the input of one widget to the output of another one.


In [10]:
outputText = widgets.Text()
outputText

In [11]:
inputText = widgets.Text()

def makeUpperCase(sender):
    outputText.value = inputText.value.upper()
    
inputText.on_submit(makeUpperCase)
inputText

Interactive visualization:


In [12]:
from ipywidgets import interact
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
t = np.arange(0.0,1.0,0.01)

def pltsin(f):
    plt.plot(t,np.sin(2*np.pi*t*f))
    plt.show()
    
interact(pltsin, f=(1,10,0.1))


