Overview of Python Visualization Toolkit

These two talks provide a pretty good overview of the Python visualization landscape.

Python environment setup instructions

Local setup

Anaconda environment

First, download Anaconda for your system (Python 3) from here. Miniconda is a minimal version on top of which you install the packages you need yourself. If you don't have much disk space or prefer to install only the necessary packages, Miniconda will suit you. Anaconda comes with a package manager called conda.

If you haven't already, you may want to install the core Python data packages.

conda install numpy scipy pandas scikit-learn matplotlib seaborn jupyter jupyterlab


You always want to use a virtual environment for each of your projects. Virtual environments are isolated from one another, so each can maintain its own set (and versions) of packages. conda has built-in support for virtual environments.

conda create -n dviz python=3.7

This command creates a virtual environment named dviz with Python 3.7.

You can activate the environment (whenever you begin to work on this course) by running

conda activate dviz

and deactivate (when you're done) by running

conda deactivate

For the full documentation, see https://conda.io/docs/user-guide/tasks/manage-environments.html
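After activating the environment, you can confirm from Python itself that you are running the interpreter conda created. This is just a sanity-check sketch; the 3.7 below matches the version requested in the conda create command above.

```python
import sys

# sys.prefix is the root directory of the active environment; for the
# environment created above it would typically end in ".../envs/dviz".
print(sys.prefix)

# The interpreter version should match what you asked conda for (3.7).
print("{}.{}".format(*sys.version_info[:2]))
```

If the prefix points at your base Anaconda directory instead of the environment, the activation step did not take effect in that shell.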

Pipenv

If you are not using Anaconda but pip, a nice option for managing virtual environments is pipenv. It is similar to conda, but of course can be used without installing Anaconda.

You can install it by running

pip install --user pipenv

Check out the full documentation about installation: https://pipenv.readthedocs.io/en/latest/install

If you want to install a new package (and create a new virtual environment), you run

pipenv install package-name

If you want to use this virtual environment, run

pipenv shell

If you want to deactivate the virtual environment, simply type exit.

Using conda/pipenv with Jupyter

In Jupyter notebook/lab, you can choose the Python kernel. For instance, if you have both Python 3.5 and Python 3.7 installed, Jupyter lets you use the version of your choice. Furthermore, by choosing a kernel, you also use the packages installed with that kernel. So if you use the dviz virtual environment that you set up with Anaconda, you can use its Python kernel and installed packages in Jupyter.

To use a virtual environment in your system Jupyter notebook or lab, you need to install the ipykernel package first.

conda install ipykernel

or

pipenv install ipykernel

Then you can install a custom Python kernel (for your virtual env) for Jupyter by running the following (replace dviz with any name you want). First activate your environment, and then:

python -m ipykernel install --user --name=dviz

After doing this, you will be able to choose the kernel you created in Jupyter. When you click "New", Jupyter shows a list of kernels, and you'll see yours (e.g. "dviz...") in the list.

Jupyter

Once you have set up your local environment, you can run

jupyter notebook

or install Jupyter lab:

conda install jupyterlab

and run:

jupyter lab

Jupyter lab is the 'next generation' system that aims to replace the Jupyter notebook, and it has many powerful features. Some packages that we use also work more nicely with Jupyter lab (although for some lab assignments you may need to use the Jupyter notebook instead of the lab).

nteract

A convenient way to use Jupyter is the nteract app, which is essentially a desktop Jupyter app. Instead of running a Jupyter server and using a web browser, you can simply open a notebook file with the nteract app and run the code as if you were using Jupyter on the web. If you use the Atom editor, you can make it interactive with https://nteract.io/atom.

Cloud setup

These are good cloud Jupyter notebook options. They do not necessarily support every package that we use, but they may be an excellent option, especially if you have a hard time installing packages. They also allow you to work on your code anywhere with internet access. The best option is Google Colaboratory: it allows installation of many packages and even lets you use GPUs (although we don't really need any).

Google Colaboratory

Google Colaboratory is Google's collaborative Jupyter notebook service. This is the recommended cloud option. You can install packages by running

!pip install packagename

Azure notebooks

Microsoft also has a cloud notebook service called Azure notebooks. This service also allows installing new packages through !pip install ....
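A common pattern for notebooks that should run both locally and in the cloud is to install a package only when its import fails, instead of always re-running !pip install. This is a sketch of that pattern, not an official Colab or Azure API; ensure is a hypothetical helper name.

```python
import importlib
import subprocess
import sys

def ensure(package, module_name=None):
    """Import a module, pip-installing its package first if it is missing.

    package: the name used by pip; module_name: the name used by import
    (they differ for some packages, e.g. scikit-learn vs. sklearn).
    """
    module_name = module_name or package
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Equivalent to running "!pip install package" in a notebook cell.
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])
        return importlib.import_module(module_name)

# Example usage in a cloud notebook cell (commented out here):
# vega_datasets = ensure("vega_datasets")
```

For an already-installed module the helper is just a plain import, so the cell stays cheap to re-run.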

CoCalc

CoCalc (https://cocalc.com/) is a service by SageMath. You can use it for free, but the free version is slow and can be shut down without warning. Most of the packages that we use are pre-installed. We may be able to provide a subscription through the school.

Kaggle Kernels

The famous machine learning / data science competition service Kaggle offers cloud-based notebooks called Kaggle kernels. Because you can directly use all the Kaggle datasets, it is an excellent option for your project if you use one of them. It allows uploading your own dataset and installing some packages, but not all packages are supported.

Lab assignment

  1. Set up your local Python environment following the instructions. You should be using a virtual environment on your local machine.
  2. Install Jupyter notebook and Jupyter lab.
  3. Launch Jupyter notebook (or lab).
  4. Create a new notebook and play with it. Print "Hello world!".
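For step 4, the first cell can be as simple as the following. In a notebook, the value of the last expression in a cell is displayed below the cell automatically, without print().

```python
# A minimal first cell for your new notebook.
print("Hello world!")

# The last expression's value is echoed below the cell on its own.
1 + 1
```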

If you want to use a cloud environment,

  1. Try out the cloud environments listed above. (Google colaboratory is recommended)
  2. Try installing the following packages.

Finally, these are the packages that we plan to use, so check out their homepages and figure out what they are about.

Install them using your package manager (conda or pip).

Once you have installed Jupyter locally or succeeded with a cloud environment, run the following import cell to make sure that every package is installed successfully. Submit the notebook on Canvas.


import numpy
import scipy
import matplotlib
import seaborn
import pandas
import altair
import vega_datasets
import sklearn
import bokeh
import datashader
import holoviews
import wordcloud
import spacy
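A variant of the import cell that also reports each package's version can be handy when asking for help with installation problems. This is an optional sketch; the package list simply mirrors the imports above.

```python
import importlib

# The packages from the import cell above.
packages = ["numpy", "scipy", "matplotlib", "seaborn", "pandas",
            "altair", "vega_datasets", "sklearn", "bokeh",
            "datashader", "holoviews", "wordcloud", "spacy"]

status = {}
for name in packages:
    try:
        mod = importlib.import_module(name)
        status[name] = getattr(mod, "__version__", "installed")
    except ImportError:
        status[name] = "NOT INSTALLED"

for name, version in status.items():
    print(name, version)
```

Any "NOT INSTALLED" line points to a package you still need to install with conda or pip.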
