These two talks provide a pretty good overview of the Python visualization landscape.
First, download Anaconda for your system (Python 3) from here. Miniconda is a minimal version on top of which you install only the packages you need; if you don't have much disk space or prefer to install only the necessary packages, Miniconda will suit you. Anaconda comes with a package manager called conda.
If you haven't already, you may want to install the core Python data packages:
conda install numpy scipy pandas scikit-learn matplotlib seaborn jupyter jupyterlab
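If you want a quick sanity check that these installed correctly (a minimal sketch, not course-specific), you can try importing a few of them from the command line:
python -c "import numpy, scipy, pandas, sklearn, matplotlib, seaborn"
If this exits without an error, the packages are importable.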
You always want to use a virtual environment for each of your projects. Virtual environments isolate projects from each other, so each one can maintain its own set (and versions) of packages. conda has built-in support for virtual environments.
conda create -n dviz python=3.7
This command creates a virtual environment named dviz with Python 3.7.
You can activate the environment (whenever you begin to work on this course) by running
conda activate dviz
and deactivate (when you're done) by running
conda deactivate
For the full documentation, see https://conda.io/docs/user-guide/tasks/manage-environments.html
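As a rough sketch of day-to-day environment management (assuming conda is on your PATH), you can list the environments you have created and remove one you no longer need:
conda env list
conda env remove -n dviz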
If you are not using Anaconda but pip, a nice option for managing virtual environments is pipenv. It is similar to conda, but of course can be used without installing Anaconda.
You can install it by running
pip install --user pipenv
Check out the full documentation about installation: https://pipenv.readthedocs.io/en/latest/install
If you want to install a new package (and create a new virtual environment), you run
pipenv install package-name
If you want to use this virtual environment, run
pipenv shell
If you want to deactivate the virtual env, you can simply type exit.
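Putting it together, a typical pipenv workflow inside a new project folder might look like the following sketch (the altair package here is just an illustration):
pipenv install altair
pipenv shell
python -c "import altair"
exit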
In Jupyter notebook/lab, you can choose the Python kernel. For instance, if you have both Python 3.5 and Python 3.7, Jupyter lets you use the version of your choice. Furthermore, by choosing a kernel, you also use the packages installed with that kernel. So if you use the dviz virtual environment that you set up with Anaconda, you can use its Python kernel and packages in Jupyter as well. To use a virtual environment's kernel in your system Jupyter notebook or lab, you need to install the ipykernel package first:
conda install ipykernel
or
pipenv install ipykernel
Then you can register a custom Python kernel for your virtual environment in Jupyter by running the following (replace dviz with any name you want). First activate your environment, and then:
python -m ipykernel install --user --name=dviz
After doing this, you will be able to choose the kernel you created from the Jupyter environment. When you click "New", it lets you choose a kernel from a list, and you'll see your kernel (e.g. "dviz") in that list.
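If you want to double-check which kernels Jupyter knows about, or clean up one you no longer need, you can use the kernelspec subcommands (dviz below is just the example name from above):
jupyter kernelspec list
jupyter kernelspec uninstall dviz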
Once you have set up your local environment, you can run
jupyter notebook
or install Jupyter lab (if you haven't already) with
conda install jupyterlab
and run:
jupyter lab
Jupyter lab is the 'next generation' system that aims to replace the Jupyter notebook, and it has many powerful features. Some of the packages that we use also work more nicely with Jupyter lab (although for some lab assignments you may need to use the Jupyter notebook instead).
A convenient way to use Jupyter is the nteract app, which is essentially a desktop Jupyter app. Instead of running a Jupyter server and using a web browser, you can simply open a notebook file with nteract and run the code as if you were using Jupyter on the web. If you use the Atom editor, you can make it interactive by using https://nteract.io/atom.
These are good cloud Jupyter notebook options. They do not necessarily support every package that we use, but they may be an excellent option, especially if you have a hard time installing packages. They also allow you to work on your code anywhere with internet access. The best option is Google Colaboratory, which allows the installation of many packages. It even lets you use GPUs (although we don't really need any).
Google Colaboratory is Google's collaborative Jupyter notebook service. This is the recommended cloud option. You can install packages by running
!pip install packagename
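For instance, to prepare a Colab notebook for this course, you could try installing the packages that are typically not pre-installed there (which ones are missing may change over time, so treat this particular list as a guess):
!pip install altair vega_datasets wordcloud datashader holoviews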
Microsoft also has a cloud notebook service called Azure Notebooks. This service also allows installing new packages through !pip install ...
CoCalc (https://cocalc.com/) is a service by SageMath. You can use it for free, but the free version is slow and can be turned off without warning. Most of the packages that we use are pre-installed. We may be able to provide a subscription through the school.
The famous machine learning / data science competition service Kaggle offers cloud-based notebooks called Kaggle kernels. Because you can directly use all the Kaggle datasets, it is an excellent option for your project if you use one of the Kaggle datasets. It allows uploading your own dataset and installing some packages, but not all packages are supported.
If you want to use a cloud environment, make sure that it supports the packages we use (listed below).
Finally, these are the packages that we plan to use, so check out their homepages and figure out what they are about.
Install them using your package manager (conda or pip).
Once you have installed Jupyter locally or succeeded with a cloud environment, run the following import cell to make sure that every package is installed successfully. Submit the notebook on Canvas.
import numpy
import scipy
import matplotlib
import seaborn
import pandas
import altair
import vega_datasets
import sklearn
import bokeh
import datashader
import holoviews
import wordcloud
import spacy
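If you also want to see which versions you have installed (handy when asking for help), here is a small sketch that prints each package's version, falling back to "unknown" when a package does not expose __version__:
import importlib

# import names used above (note: scikit-learn is imported as sklearn)
package_names = ["numpy", "scipy", "matplotlib", "seaborn", "pandas", "altair",
                 "vega_datasets", "sklearn", "bokeh", "datashader", "holoviews",
                 "wordcloud", "spacy"]

for name in package_names:
    module = importlib.import_module(name)
    print(name, getattr(module, "__version__", "unknown"))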