Importing Jupyter Notebooks as "Objects"

Jupyter Notebooks are great for data exploration, visualization, documentation, prototyping and interacting with code, but they fall short when it comes to turning a notebook into an actual program. I often find myself copying cells from a notebook into a script so that I can run the code with command-line arguments. There is no easy way to run a notebook and return a result from its execution, pass arguments to a notebook, or run individual code cells programmatically. Have you ever wrapped a code cell in a function just so you could call it in a loop with different parameters?

I wrote a small utility, nbloader, that enables code reuse from Jupyter notebooks. With it you can import a notebook as an object, pass variables into its namespace, run code cells and pull variables out of its namespace.

This tutorial will show you how to make your notebooks reusable with nbloader.

Install nbloader with pip

pip install nbloader --upgrade

Load a Notebook


In [1]:
from nbloader import Notebook

loaded_notebook = Notebook('test.ipynb')

The above command loads a notebook as an object. This can be done inside a Jupyter notebook or a regular Python script.

Run all cells


In [2]:
loaded_notebook.run_all()


I am inside loaded_notebook!
Out[2]:
<nbloader.Notebook at 0x1049c71d0>

After loaded_notebook.run_all() is called:

  • The notebook is initialized with an empty starting namespace.
  • All cells of the loaded notebook are executed one after another, in the order they appear in the file.
  • Any print output or other stdout from the loaded notebook is displayed.
  • All warnings and errors are raised unless caught.
  • All variables in the loaded notebook's namespace become accessible.
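
The behaviour listed above can be imitated in a few lines of plain Python. This is only a sketch of the idea, not nbloader's actual implementation: the function name run_all_cells and the sample cell sources are made up for illustration, and it assumes each code cell can be treated as a string of source code executed in one shared dict.

```python
def run_all_cells(cell_sources):
    """Execute cell sources in order in a fresh namespace (illustrative only)."""
    ns = {}  # empty starting namespace
    for index, source in enumerate(cell_sources):
        # Compile each cell separately so a traceback names the failing cell.
        code = compile(source, f"<cell {index}>", "exec")
        exec(code, ns)  # errors propagate to the caller unless caught
    return ns


# Hypothetical cells standing in for a notebook's code cells.
cells = [
    "a = 5",
    "a += 1",
    "print('running the last cell')",
]

namespace = run_all_cells(cells)
print(namespace["a"])
```

After the call, every variable the cells defined lives in the returned dict, which is the same role loaded_notebook.ns plays in the rest of this tutorial.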

Here are the contents of loaded_notebook.ipynb

This is how you access the namespace of the loaded notebook


In [3]:
loaded_notebook.ns['a']


Out[3]:
6

The notebook's namespace is just a dict, so if you try to get something that's not there you will get an error.


In [4]:
loaded_notebook.ns['b']


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-4-0892bc683411> in <module>()
----> 1 loaded_notebook.ns['b']

KeyError: 'b'
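
Because the namespace is a regular dict, the usual dict idioms apply when a variable may not exist. A small sketch, using a plain dict as a stand-in for loaded_notebook.ns so that it runs without a notebook (the stored value is illustrative):

```python
# Stand-in for loaded_notebook.ns after run_all(); the contents are illustrative.
ns = {"a": 6}

# A membership test before indexing avoids the KeyError.
has_b = "b" in ns

# dict.get returns a default instead of raising.
b_or_default = ns.get("b", 0)

# Or catch the error explicitly when a missing variable needs special handling.
try:
    value = ns["b"]
except KeyError:
    value = None

print(has_b, b_or_default, value)
```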

Run individual cells if they are tagged.


In [5]:
loaded_notebook.run_tag('add_one')
print(loaded_notebook.ns['a'])
loaded_notebook.run_tag('add_one')
print(loaded_notebook.ns['a'])


7
8

If a cell has a comment on its first line, that comment becomes its tag.
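
To illustrate the first-line-comment rule, here is one way such a tag could be extracted from a cell's source. The function tag_from_source is hypothetical, and the exact rule nbloader applies (whitespace handling, allowed characters) is an assumption on my part here.

```python
def tag_from_source(source):
    """Return the text of a leading '#' comment, or None if there is none."""
    lines = source.splitlines()
    if not lines:
        return None  # empty cell, nothing to tag
    first_line = lines[0].strip()
    if not first_line.startswith("#"):
        return None  # first line is code, so the cell has no tag
    return first_line.lstrip("#").strip()


tagged_cell = "# add_one\na += 1"
untagged_cell = "a = 5"

print(tag_from_source(tagged_cell))
print(tag_from_source(untagged_cell))
```

Under this rule, a cell whose source starts with "# add_one" is the one that run_tag('add_one') would execute.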

This is how you mess with its namespace


In [6]:
loaded_notebook.ns['a'] = 0
loaded_notebook.run_tag('add_one')
print(loaded_notebook.ns['a'])


1

Example workflows:

Create a notebook that parses one file, then call it in a loop, each time setting a new value for filename in its namespace.

Create a notebook with a model and then optimize it with different parameters.

Since its namespace is just a dict, there is no performance penalty when passing large objects to the notebook. All the code from its cells is compiled and can be called in a loop with the speed of a regular function.
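
Both points can be checked with plain Python. The sketch below is not nbloader code; it assumes a cell behaves like a code object compiled once and executed against a dict, and the variable names data, scale and total are invented for the example.

```python
# Compile the "cell" once; the resulting code object can be executed repeatedly.
cell_code = compile("total = sum(data) * scale", "<cell>", "exec")

big_list = list(range(100_000))
ns = {"data": big_list}  # the dict stores a reference to the list, not a copy

results = []
for scale in (1, 2, 3):
    ns["scale"] = scale  # change one parameter between runs
    exec(cell_code, ns)  # rerun the already compiled code
    results.append(ns["total"])

print(ns["data"] is big_list)
print(results)
```

The identity check shows that the large object was never copied, and the loop is the same pattern as setting loaded_notebook.ns[...] and calling run_tag repeatedly.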

I added some magic tags to make it act more like an object:

  • If a cell has the tag __init__, it will be run at initialization and whenever the notebook is restarted.
  • If a cell has the tag __del__, it will be run when the object is deleted (or not).
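
The __init__ behaviour can be pictured with a toy class. Everything here is hypothetical (the class TaggedCells, its restart method and the cell sources); it only mirrors the description above and says nothing about how nbloader is written. The __del__ tag is left out because deletion hooks are hard to demonstrate reliably.

```python
class TaggedCells:
    """Toy object that runs tagged snippets in a shared namespace (illustrative)."""

    def __init__(self, cells):
        self.cells = cells  # mapping: tag -> source code
        self.restart()

    def restart(self):
        """Reset the namespace and rerun the '__init__' cell, if present."""
        self.ns = {}
        if "__init__" in self.cells:
            self.run_tag("__init__")

    def run_tag(self, tag):
        """Execute the snippet stored under the given tag."""
        exec(self.cells[tag], self.ns)


toy = TaggedCells({"__init__": "counter = 0", "add_one": "counter += 1"})
toy.run_tag("add_one")
toy.run_tag("add_one")
before_restart = toy.ns["counter"]

toy.restart()  # namespace is cleared and the __init__ cell runs again
after_restart = toy.ns["counter"]

print(before_restart, after_restart)
```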

[Warning] on best practices!

You may be tempted to load the current notebook and then loop a cell. I don't think this is a good practice.

Easy to run from command line

The nbloader command is just a convenience wrapper for running a notebook from the command line with the default Python.

nbloader test_notebook.ipynb --learning_rate=4

Inside test_notebook you can parse sys.argv however you want to get the learning_rate argument.
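
One way to do that parsing is with argparse from the standard library. This is a sketch under assumptions: the flag is spelled --learning_rate here, the default of 0.01 is invented, and I am assuming the notebook sees the command-line arguments in sys.argv as a regular script would.

```python
import argparse


def parse_learning_rate(argv):
    """Pull --learning_rate out of an argument list, ignoring everything else."""
    parser = argparse.ArgumentParser(add_help=False)
    # The default value is a placeholder chosen for this example.
    parser.add_argument("--learning_rate", type=float, default=0.01)
    # parse_known_args tolerates arguments this parser does not define,
    # such as the notebook filename.
    args, _unknown = parser.parse_known_args(argv)
    return args.learning_rate


# Inside a notebook cell this would typically be parse_learning_rate(sys.argv[1:]).
print(parse_learning_rate(["test_notebook.ipynb", "--learning_rate=4"]))
```

Using parse_known_args instead of parse_args keeps the cell from failing when other arguments are present on the command line.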