In this book, we will use Python 3. For a good introduction, see, e.g., the free books A Whirlwind Tour of Python by Jake VanderPlas or Dive into Python 3 by Mark Pilgrim.
This document is an example of a Jupyter notebook, which mixes code and results. When developing larger software projects, it is often better to use an IDE (integrated development environment), which keeps the code in separate files. I recommend Spyder, although many people use JupyterLab for a browser-based solution.
We will leverage many standard libraries from Python's "data science stack", listed in the table below. For a good introduction to these, see, e.g., the free book Python Data Science Handbook by Jake VanderPlas, or the class Computational Statistics in Python by Cliburn Chan at Duke University. For an excellent book on scikit-learn, see Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (2nd edition) by Aurélien Géron.
Name | Functionality |
---|---|
NumPy | Vector and matrix computations |
SciPy | Various scientific / math / stats / optimization functions |
Matplotlib | Plotting |
Seaborn | Statistical visualization built on top of Matplotlib |
Pandas | Manipulating tabular data and time series |
Scikit-learn | Implements many "classical" ML methods |
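As a minimal sketch of how two of these libraries fit together, the (made-up) snippet below uses NumPy for a matrix-vector computation and Pandas to wrap the result in a labeled table:

```python
import numpy as np
import pandas as pd

# NumPy: vector and matrix computations
A = np.array([[1.0, 2.0], [3.0, 4.0]])
x = np.array([1.0, -1.0])
y = A @ x  # matrix-vector product

# Pandas: present the results as labeled tabular data
df = pd.DataFrame({"x": x, "Ax": y}, index=["row0", "row1"])
print(df)
print("column means:", df.mean().to_dict())
```

The `@` operator is NumPy's matrix-multiplication syntax; the `DataFrame` adds row and column labels, which becomes indispensable once data is more complex than a pair of arrays.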
Deep learning is about composing differentiable functions into more complex functions, represented as a computation graph, and then using automatic differentiation ("autograd") to compute gradients, which we can pass to an optimizer, to fit the function to data. This is sometimes called "differentiable programming".
There are several libraries that can execute such computation graphs on hardware accelerators, such as GPUs. (Some libraries also support distributed computation, but we will not need this feature in this book.) We list a few popular libraries below.
Name | Functionality | More info |
---|---|---|
TensorFlow 2 | Accelerated NumPy-like library with autograd support. Keras API. | |
JAX | Accelerated NumPy, plus composable function transformations (grad, jit, vmap, etc.) | |
Pytorch | Similar to TF 2.0 | Official PyTorch tutorials |
MXNet | Similar to TF 2.0. Gluon API. | Dive into deep learning book |
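These libraries automate gradient computation via autograd. To illustrate the idea without assuming any of them is installed, the sketch below hand-codes the reverse-mode gradient of a small composed function in plain NumPy, and checks it against a finite-difference approximation (the function and weights are made up for illustration):

```python
import numpy as np

# A tiny "computation graph": f(x) = sum(tanh(W @ x)).
W = np.array([[0.5, -1.0], [2.0, 0.1]])

def f(x):
    return np.tanh(W @ x).sum()

# Hand-coded reverse-mode gradient (this is what autograd automates):
# df/dx = W^T @ (1 - tanh(W @ x)^2), by the chain rule.
def grad_f(x):
    h = np.tanh(W @ x)
    return W.T @ (1.0 - h ** 2)

x = np.array([0.3, -0.7])

# Sanity check against central finite differences.
eps = 1e-6
fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
               for e in np.eye(2)])
print(grad_f(x), fd)  # the two should agree closely
```

An autograd library derives the `grad_f` step automatically from the definition of `f`, which is what makes it practical to fit functions with millions of parameters.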
In this book, we will be focusing on probabilistic models, both supervised (conditional) models of the form $p(y|x)$ and unsupervised models of the form $p(z,x)$, where $x$ are the features, $y$ are the labels (if present), and $z$ are the latent variables. GMMs and PCA, which we discuss in the unsupervised learning notebook, are very simple examples of such latent variable models. However, to create more complex models, we need to move beyond scikit-learn. In addition, we will often need more than just gradient-based optimization, in order to handle discrete variables and dynamically shaped data structures.
There are a variety of Python libraries for probabilistic modeling, some of which build on top of deep learning libraries, and extend them to handle stochastic functions and probabilistic inference. If the model is specified declaratively, using a domain specific language (DSL) or an application programming interface (API), we will call it a "probabilistic modeling language" (PML). If the system uses a lower level interface, and allows the creation of more flexible models (e.g., using stochastic control flow), we will call it a "probabilistic programming language" (PPL). We list a few examples below.
Name | Functionality |
---|---|
Pyro | PPL built on top of PyTorch. |
NumPyro | Lightweight version of Pyro, using JAX instead of PyTorch as the backend. |
TF Probability (TFP) | PPL built on top of TensorFlow. |
PyStan | Python interface to Stan, which uses its own DSL (inspired by BUGS) for specifying PGMs. Supports MCMC and VI. Uses a custom C++ autodiff library. |
PyMC | Similar functionality to PyStan, without the C++ part. v3 uses Theano for autograd; v4 will use TensorFlow. |
pgmpy | Python API for (non-Bayesian) discrete PGMs. No support for autodiff or GPUs. |
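To make the $p(z,x)$ notation concrete before reaching for any of these libraries, here is a minimal NumPy sketch of a latent variable model: a two-component 1d Gaussian mixture $p(z,x) = p(z)\,p(x|z)$, from which we draw samples by ancestral sampling and evaluate the marginal likelihood $p(x) = \sum_z p(z)\,p(x|z)$. The component parameters are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up parameters for a 2-component 1d GMM.
pi = np.array([0.3, 0.7])      # mixing weights p(z)
mu = np.array([-2.0, 1.0])     # component means
sigma = np.array([0.5, 1.0])   # component standard deviations

def gauss_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

# Ancestral sampling: z ~ p(z), then x ~ p(x|z).
z = rng.choice(2, size=1000, p=pi)
x = rng.normal(mu[z], sigma[z])

# Marginal likelihood p(x) = sum_z p(z) p(x|z), per sample.
px = sum(pi[k] * gauss_pdf(x, mu[k], sigma[k]) for k in range(2))
avg_loglik = np.log(px).mean()
print("average log-likelihood:", avg_loglik)
```

The PPLs above automate exactly this kind of bookkeeping (sampling, density evaluation, and inference over $z$) for far richer models than a hand-coded mixture.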