Installation instructions are hosted by the Jupyter project here. While this official documentation should be considered as the ultimate reference for installing the software, I will briefly describe how to install the Jupyter Notebook using the Anaconda Python distribution.
Anaconda by Continuum Analytics is a free Python distribution that contains more than 300 of the best Python packages for data analysis and scientific research. Armed with conda, the cross-platform and Python-agnostic binary package manager shipped with Anaconda, the Anaconda distribution makes it easy and quick to install Python packages without worrying about third party library requirements, compiling, or version incompatibilities.
Our own research computing support team at Boston University recommends and describes using Anaconda when using Python on the GEO/SCC cluster.
Download the Anaconda distribution from their website:
https://www.continuum.io/downloads
Select the download for your operating system (Windows, Mac, or Linux) and your computer's architecture (32-bit and 64-bit are available, but you most likely want 64-bit). I recommend the Python 3 download because it is the present and future version of the language, unless you're tied to Python 2 because some third party library is incompatible with Python 3 (e.g., QGIS). See this page for more information on the history of the versions and the difference between them.
Users on Windows must use the graphical installer, while users on Macs have the option of a graphical or terminal installation. Linux users must use the terminal-based installer.
Installation instructions from Continuum Analytics are provided on their download site and also here: http://docs.continuum.io/anaconda/install
Instructions for the terminal based installers are also included below:
Users must run the Anaconda installer using bash, even if bash is not their usual shell. Navigate to the download and execute the installer as follows:
cd Downloads/
bash Anaconda3-2.3.0-MacOSX-x86_64.sh
or
cd Downloads/
bash Anaconda3-2.3.0-Linux-x86_64.sh
Note that the version numbers in the download filename will change. You should substitute the version number of the installer you downloaded as necessary.
Read and agree to the license terms. Unless you have good reason to do otherwise, it is perfectly okay to use the default installation options.
If you did not allow the installer to prepend the Anaconda installation to your PATH by editing your .bashrc, manually prepend it to your PATH:
export PATH=/home/ceholden/anaconda3/bin:$PATH
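As a minimal sketch of the same step, the snippet below prepends the Anaconda directory only if it is not already on PATH, then reports which conda the shell resolves. The location $HOME/anaconda3/bin is an assumption based on the default install path; substitute the directory you chose during installation.

```shell
# Assumed install location; change this if you installed Anaconda elsewhere.
ANACONDA_BIN="$HOME/anaconda3/bin"

# Prepend the directory only if it is not already on PATH,
# so re-running this snippet does not add duplicate entries.
case ":$PATH:" in
    *":$ANACONDA_BIN:"*) ;;                  # already present, nothing to do
    *) export PATH="$ANACONDA_BIN:$PATH" ;;  # put Anaconda first
esac

# Show which conda executable (if any) the shell now finds.
command -v conda || echo "conda not found on PATH"
```

To make the change permanent, the same lines can be placed in your .bashrc.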
To test the conda installation, try updating conda from your terminal (or cmd on Windows):
conda update conda
Once Anaconda has been installed and configured, you can install the Jupyter notebook as follows:
conda install jupyter
With Jupyter installed, you can run a notebook session as follows:
jupyter notebook
Read the text that the notebook program prints to the console. Jupyter is a web browser based application, so what you're seeing is information about the web server that Jupyter has launched. You will most likely be assigned port 8888, but it might be different (the example log below shows port 8889). Your log should look something like this:
[I 10:48:03.225 NotebookApp] Serving notebooks from local directory: /home/ceholden/Downloads
[I 10:48:03.225 NotebookApp] 0 active kernels
[I 10:48:03.225 NotebookApp] The IPython Notebook is running at: http://localhost:8889/
The URL listed (e.g., http://localhost:8889/) is the URL of the page you want to navigate to using your web browser.
You may wish to change your directory in the terminal before launching jupyter notebook because your current directory controls which notebooks can be opened. You can access any notebook files in or below the directory from which you launched the notebook, but you cannot access any notebooks above it.
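One way to make the working directory explicit is a small wrapper function. The sketch below is only an illustration: start_notebook is a hypothetical helper name, and the directory in the usage comment is a placeholder. It changes directory inside a subshell, so your terminal stays where it was, and passes an optional port through the --port option of jupyter notebook.

```shell
# start_notebook DIR [PORT]  (hypothetical helper, not part of Jupyter)
# The parentheses run the body in a subshell, so the cd does not
# affect the terminal session that called the function.
start_notebook() (
    cd "${1:-.}" || exit 1                 # stop if the directory is missing
    jupyter notebook --port="${2:-8888}"   # default to port 8888
)

# Example usage (placeholder directory):
#   start_notebook ~/projects/my_analysis 8889
```

If the requested port is already taken, the notebook normally tries nearby ports, so still check the URL printed in the log.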
Running jupyter notebook from your local machine is fairly trivial. Starting the notebook on the GEO/SCC cluster is just as easy, but a few security considerations make it somewhat harder to run the notebook responsibly from the cluster.
The jupyter notebook works just like a web server in that anyone with the IP address and port number can reach it over the network. What makes it different from normal web servers is that the jupyter notebook has tremendous power and does not limit code execution. A malicious user could, for instance, delete your entire project folder if they had access to your notebook session. Thus, it is important to protect your notebook session with a password and encryption.
I will not try to include all information relevant to securing the notebook in this tutorial because it is likely to rapidly become out of date and, thus, insecure. Instead, please follow the official guide linked below, which explains how to generate a password for your notebooks and an SSL certificate to encrypt your communication. I am happy to help anyone follow along:
http://jupyter-notebook.readthedocs.org/en/latest/public_server.html
Once you follow the security steps listed above, it is pretty easy to connect to a head node on the cluster running the jupyter notebook session by forwarding the port of the notebook server from the remote host to your local host using SSH. For instance,
ssh -L 8888:localhost:8888 -N ceholden@geo.bu.edu
Users familiar with accessing the GEO/SCC head nodes via Remote Desktop VNC sessions will be used to this procedure. For those unfamiliar, the -L option to the ssh command makes port 8888 on the geo.bu.edu server available as port 8888 on my local machine. When I access localhost:8888 in my web browser, the ssh session forwards my request from port 8888 on my local machine to port 8888 on the geo.bu.edu server, which is hosting my jupyter notebook session.
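To make the pieces of the -L argument explicit, here is a small sketch that only prints the forwarding command rather than running it. The function name forward_cmd is hypothetical; the general form is ssh -L local_port:host:remote_port, where host is resolved from the remote machine's point of view (so localhost means the server itself).

```shell
# forward_cmd LOCAL_PORT REMOTE_PORT LOGIN  (hypothetical helper)
#   $1 = port opened on your local machine
#   $2 = port the notebook listens on at the remote host
#   $3 = login for the remote host (user@host)
# Prints the command so you can inspect it before running it yourself.
forward_cmd() {
    printf 'ssh -L %s:localhost:%s -N %s\n' "$1" "$2" "$3"
}

# Reproduces the example above.
forward_cmd 8888 8888 ceholden@geo.bu.edu
```

The two ports do not have to match. If the notebook is on port 8889 on the server, forward_cmd 8888 8889 ceholden@geo.bu.edu would still let you browse to localhost:8888.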
As all users of the GEO/SCC cluster should know, the head nodes are only to be used for lightweight tasks. Intensive computing and data visualization using the jupyter notebook are not among these tasks. Instead of running the notebook server on the head node, one should use the qsub or qsh system to run the notebook server on a compute node.
Consider the following scenario:
ceholden@geo: > qsh -V -l h_rt=24:00:00 -N jupyter_nb
...
Your job 9992339 ("jupyter_nb") has been submitted
waiting for interactive job to be scheduled ....
...
ceholden@scc-gb08: > jupyter notebook
[I 11:35:11.149 NotebookApp] Serving notebooks from local directory: /usr3/graduate/ceholden
[I 11:35:11.150 NotebookApp] 0 active kernels
[I 11:35:11.150 NotebookApp] The IPython Notebook is running at: https://[all ip addresses on your system]:8888/
I cannot directly access the compute node I've been assigned, scc-gb08, from my local machine. I have to go through a machine that can access it: one of the "head nodes" scc1.bu.edu, scc2.bu.edu, geo.bu.edu, or scc4.bu.edu.
The first step is to ssh into the compute node from the head node to confirm the RSA signature of the compute node (i.e., confirm that the machine is who it says it is):
The authenticity of host 'scc-* (IP ADDRESS)' can't be established.
RSA key fingerprint is `RSA KEY`.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'scc-*' (RSA) to the list of known hosts.
This only needs to be done once per compute node. When we connect to the compute node in the next step we will not have a chance to manually accept the RSA key, so it is important to do this first.
Next, tunnel through the head node (geo.bu.edu in this example) and into the compute node (scc-gb08.bu.edu in this example) using two SSH commands:
ssh -L 8890:localhost:8890 ceholden@geo.bu.edu ssh -L 8890:localhost:8888 -N ceholden@scc-gb08
The first SSH command is almost identical to the forwarding we would perform if we only needed to access the head node. However, instead of passing the -N option, which tells ssh not to run any remote command, we run a second ssh command on the head node that connects to the compute node.
We are forwarding port 8888 from the compute node to port 8890 on the head node. From there we are forwarding port 8890 from the head node to our local machine. This is commonly referred to as "multiple hops" or "multi-hop". Since port 8888 on the compute node is now port 8890 on our local machine, we can simply access localhost:8890 in our local web browser.
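The sketch below prints the two-hop command in two styles without running anything. The function names nested_cmd and jump_cmd are hypothetical. The first reproduces the nested form used above; the second uses the -J (ProxyJump) option, which assumes your local OpenSSH is version 7.3 or newer and is offered here only as an alternative, not as the method this tutorial relies on.

```shell
# Arguments for both helpers (hypothetical names):
#   $1 = local port, $2 = notebook port on the compute node,
#   $3 = head node login, $4 = compute node login

# Nested form: ssh to the head node, which runs a second ssh.
nested_cmd() {
    printf 'ssh -L %s:localhost:%s %s ssh -L %s:localhost:%s -N %s\n' \
        "$1" "$1" "$3" "$1" "$2" "$4"
}

# ProxyJump form: a single ssh that hops through the head node.
jump_cmd() {
    printf 'ssh -J %s -L %s:localhost:%s -N %s\n' "$3" "$1" "$2" "$4"
}

nested_cmd 8890 8888 ceholden@geo.bu.edu ceholden@scc-gb08
jump_cmd   8890 8888 ceholden@geo.bu.edu ceholden@scc-gb08
```

With the -J form no intermediate port is opened on the head node, and the host key of the compute node is checked by your local ssh client rather than by the head node.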
See the following StackOverflow question and answers for more information: