In [ ]:
#@title Agreement

# Copyright (c) 2021 Kevin P. Murphy (murphyk@gmail.com) and Mahmoud Soliman (mjs@aucegypt.edu)
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.

Setup and environment sanity checks

Check the hardware specifications of the GCP VM this notebook is running on, and the software stack that is installed.


In [ ]:
#@title Imports
# Select TF 2.x before TensorFlow is imported anywhere in this session
%tensorflow_version 2.x
import os
import tensorflow as tf
from tensorflow.python.client import device_lib
from psutil import virtual_memory
import cv2
from google.colab.patches import cv2_imshow

In [ ]:
#@title Hardware check



def find_accelerator():
  """Report the accelerator (TPU or GPU) and the physical RAM of this VM."""
  mem = virtual_memory()
  devices = device_lib.list_local_devices()
  ram = "Physical RAM: {:.2f} GB".format(mem.total / (1024 ** 3))
  try:
    # A TPU runtime exposes itself through the TPUClusterResolver.
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
    device = ["TPU at " + str(tpu.cluster_spec().as_dict()['worker'])]
  except ValueError:
    # No TPU available: fall back to listing any GPUs.
    device = [d.physical_device_desc for d in devices if d.device_type == "GPU"]
  if not device:
    return None, ram
  return device, ram

a, r = find_accelerator()
print("Accelerator found:", a, r)

In [ ]:
#@title Install the extra required packages
!apt install octave  -qq > /dev/null
!apt-get install liboctave-dev -qq > /dev/null

In [ ]:
#@title Clone PyProbML repo and set environment variables
!git clone https://github.com/probml/pyprobml/ -q
os.environ["PYPROBML"]='/content/pyprobml/'

Figures


In [ ]:
#@title Helper code to display images
def display_image(image, ratio):
    """Read an image from disk, scale it by `ratio`, and show it inline."""
    img = cv2.imread(image, cv2.IMREAD_UNCHANGED)
    img = cv2.resize(img, (0, 0), fx=ratio, fy=ratio)
    cv2_imshow(img)
    print("\n")

Figure 1.1:

Boxplots of MPG (miles per gallon) vs

(a) country of origin, or

(b) year of manufacture. The dotted red line is the average.


In [ ]:
#@title Figure 1.1
%cd /content/pyprobml/scripts
%run /content/pyprobml/scripts/autompg_plot.py

Figure 1.2:

(a) Linear regression on some 1d data.

(b) The vertical lines denote the residuals between the observed output value for each input (blue circle) and its predicted value (red cross). The goal of least squares regression is to pick a line that minimizes the sum of squared residuals.


In [ ]:
#@title Figure 1.2
%cd /content/pyprobml/scripts
%run /content/pyprobml/scripts/linreg_residuals_plot.py
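
As a minimal illustration of the least squares idea (not the pyprobml script itself), the sketch below fits a line to synthetic 1d data with NumPy and draws the residuals; the data and plotting details here are placeholders.

In [ ]:
#@title Sketch: least squares residuals (illustrative, synthetic data)
import numpy as np
import matplotlib.pyplot as plt

# Synthetic 1d data; the real figure uses the script's own dataset.
rng = np.random.default_rng(0)
x = np.linspace(0, 20, 21)
y = 1.5 * x + 3 + rng.normal(scale=4, size=x.shape)

# Least squares picks the line that minimizes the sum of squared residuals.
w1, w0 = np.polyfit(x, y, deg=1)
yhat = w0 + w1 * x

plt.scatter(x, y, facecolors='none', edgecolors='b', label='data')
plt.plot(x, yhat, 'r-', label='least squares fit')
# Vertical lines show the residuals between observed and predicted values.
plt.vlines(x, np.minimum(y, yhat), np.maximum(y, yhat), colors='gray')
plt.legend()
plt.show()
print("Sum of squared residuals:", np.sum((y - yhat) ** 2))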

Figure 1.5:

Visualization of the Iris data as a pairwise scatter plot. The diagonal plots the marginal histograms of the 4 features. The off diagonals contain scatterplots of all possible pairs of features.


In [ ]:
#@title Figure 1.5
%cd /content/pyprobml/scripts
%run /content/pyprobml/scripts/iris_plot.py

Figure 1.6:

NLL loss surface for binary logistic regression applied to Iris dataset with 1 feature and 1 bias term.


In [ ]:
#@title Figure 1.6

%run /content/pyprobml/scripts/iris_logreg_loss_surface.py
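
For intuition, here is a minimal sketch of such an NLL surface using scikit-learn's Iris loader; the feature and class choices below are assumptions and may differ from those used by the script.

In [ ]:
#@title Sketch: NLL surface for 1-feature binary logistic regression (illustrative)
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Binary subset of Iris: one feature (petal width), classes 0 vs 1.
iris = load_iris()
mask = iris.target < 2
x = iris.data[mask, 3]
y = iris.target[mask].astype(float)

def nll(w0, w1):
    """Negative log likelihood of logistic regression with bias w0 and weight w1."""
    logits = w0 + w1 * x
    p = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-12
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

w0s = np.linspace(-10, 10, 100)
w1s = np.linspace(-10, 10, 100)
W0, W1 = np.meshgrid(w0s, w1s)
Z = np.vectorize(nll)(W0, W1)

plt.contourf(W0, W1, Z, levels=30)
plt.colorbar(label='NLL')
plt.xlabel('bias $w_0$'); plt.ylabel('weight $w_1$')
plt.show()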

Fig 1.7b

Illustration of a nonconvex 2d function with many local maxima.


In [ ]:
#@title Figure 1.7b
%cd /content/pyprobml/scripts/matlab
!octave -W '/content/pyprobml/scripts/matlab/maxGMMplot.m' >> _
display_image("./output1.jpg",0.4)
print("\n")
display_image("./output2.jpg",0.4)
%cd /content/

Figure 1.8:

(a-c) Polynomials of degree 1, 14 and 20 fit to 21 datapoints (the same data as in Fig. 1.2). With a degree-20 polynomial, we can perfectly interpolate all $N=21$ training points, as we see.

(d) MSE vs degree.


In [ ]:
#@title Figure 1.8
%cd /content/pyprobml/scripts/
%run /content/pyprobml/scripts/linreg_poly_vs_degree.py
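
A minimal sketch of the same under/overfitting effect using np.polyfit on synthetic data (not the script's dataset); high-degree fits may emit a conditioning warning, which is expected here.

In [ ]:
#@title Sketch: polynomial fits of increasing degree (illustrative, synthetic data)
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.linspace(0, 20, 21)
y = -1.5 * x + x ** 2 / 9.0 + rng.normal(scale=2, size=x.shape)

xs = np.linspace(0, 20, 200)
for deg in [1, 14, 20]:
    coefs = np.polyfit(x, y, deg)          # least squares polynomial fit
    mse = np.mean((np.polyval(coefs, x) - y) ** 2)
    plt.figure()
    plt.scatter(x, y)
    plt.plot(xs, np.polyval(coefs, xs), 'r-')
    plt.title(f'degree {deg}, train MSE = {mse:.3f}')
plt.show()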

Figure 1.10

Illustration of the binomial distribution with $N=10$ and

(a) $\mu=0.25$ and

(b) $\mu=0.9$.


In [ ]:
#@title Figure 1.10 
%run /content/pyprobml/scripts/binom_dist_plot.py
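
The same pmfs can be sketched directly with scipy.stats.binom (assumed available, as it is preinstalled on Colab); this is an illustrative sketch, not the pyprobml script.

In [ ]:
#@title Sketch: binomial pmf for N=10 (illustrative)
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

N = 10
ks = np.arange(N + 1)
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, mu in zip(axes, [0.25, 0.9]):
    ax.bar(ks, binom.pmf(ks, N, mu))   # P(k successes out of N) with success prob mu
    ax.set_title(f'$\\mu$ = {mu}')
plt.show()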

Figure 1.11:

(a) The sigmoid (logistic) function $\sigma(a) = (1+e^{-a})^{-1}$.

(b) The Heaviside function $I(a>0)$.

Figure 1.24:

Plots of some popular activation functions.


In [ ]:
#@title Figure 1.11 and Figure 1.24
%run /content/pyprobml/scripts/activation_fun_plot.py
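
A minimal NumPy sketch of the two functions in Fig. 1.11 (illustrative only; the script also plots the other activation functions of Fig. 1.24).

In [ ]:
#@title Sketch: sigmoid vs Heaviside (illustrative)
import numpy as np
import matplotlib.pyplot as plt

a = np.linspace(-6, 6, 200)
sigmoid = 1.0 / (1.0 + np.exp(-a))    # sigma(a) = (1 + e^{-a})^{-1}
heaviside = (a > 0).astype(float)     # I(a > 0)

plt.plot(a, sigmoid, label='sigmoid')
plt.plot(a, heaviside, label='Heaviside')
plt.legend()
plt.show()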

Figure 1.12:

Logistic regression applied to a 1-dimensional, 2-class version of the Iris dataset.

Figure 1.13b:

Visualization of the optimal linear decision boundary induced by logistic regression on a 2-class, 2-feature version of the Iris dataset.

Figure 1.17:

Logistic regression on the 3-class, 2-feature version of the Iris dataset.


In [ ]:
#@title Figure 1.12 and Figure 1.13b and 1.17
%run /content/pyprobml/scripts/iris_logreg.py
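
For reference, a hedged scikit-learn sketch of the same kinds of fits; the particular feature and class choices below are assumptions, not necessarily those used by the script, and the plots are omitted.

In [ ]:
#@title Sketch: logistic regression on Iris subsets (illustrative)
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()

# 1-feature, 2-class version: petal width, classes 1 vs 2.
X1 = iris.data[iris.target > 0, 3:4]
y1 = iris.target[iris.target > 0]
clf1 = LogisticRegression(max_iter=200).fit(X1, y1)
print("2-class train accuracy:", clf1.score(X1, y1))

# 2-feature, 3-class version: petal length and petal width.
X2 = iris.data[:, 2:4]
y2 = iris.target
clf2 = LogisticRegression(max_iter=200).fit(X2, y2)
print("3-class train accuracy:", clf2.score(X2, y2))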

Figure 1.15:

Plots of $\sigma(w_{1}x_{1}+w_{2}x_{2})$. Here $w=(w_{1},w_{2})$ defines the normal to the decision boundary. Points to the right of this boundary have $\sigma(w^{T}x)>0.5$, and points to the left have $\sigma(w^{T}x)<0.5$.


In [ ]:
#@title Figure 1.15
%run /content/pyprobml/scripts/sigmoid_2d_plot.py

Figure 1.16:

Softmax distribution $\mathcal{S}(a/T)$, where $a=(3,0,1)$, at temperatures of $T=100$, $T=2$ and $T=1$. When the temperature is high (left), the distribution is uniform, whereas when the temperature is low (right), the distribution is “spiky”, with most of its mass on the largest element.


In [ ]:
#@title Figure 1.16
%run /content/pyprobml/scripts/softmax_plot.py
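
A minimal NumPy sketch of the tempered softmax $\mathcal{S}(a/T)$ (illustrative only; the script produces the actual figure).

In [ ]:
#@title Sketch: softmax at different temperatures (illustrative)
import numpy as np
import matplotlib.pyplot as plt

def softmax(a):
    e = np.exp(a - np.max(a))   # subtract max for numerical stability
    return e / e.sum()

a = np.array([3.0, 0.0, 1.0])
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, T in zip(axes, [100, 2, 1]):
    ax.bar([0, 1, 2], softmax(a / T))
    ax.set_title(f'T = {T}')
    ax.set_ylim(0, 1)
plt.show()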

Figure 1.18:

Example of 3-class logistic regression with 2d inputs.

(a) Original features.

(b) Quadratic features.


In [ ]:
#@title Figure 1.18
%cd /content/pyprobml/scripts/
%run /content/pyprobml/scripts/logreg_multiclass_demo.py
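
A hedged sketch of the linear-vs-quadratic feature comparison using scikit-learn on synthetic blobs; the script uses its own data and plots the decision regions, which are omitted here.

In [ ]:
#@title Sketch: logistic regression with original vs quadratic features (illustrative)
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic 3-class, 2d data (a stand-in for the script's dataset).
X, y = make_blobs(n_samples=300, centers=3, random_state=0)

linear = LogisticRegression(max_iter=1000).fit(X, y)
quadratic = make_pipeline(PolynomialFeatures(degree=2),
                          LogisticRegression(max_iter=1000)).fit(X, y)
print("linear features accuracy:   ", linear.score(X, y))
print("quadratic features accuracy:", quadratic.score(X, y))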

Figure 1.19:

(a) A Gaussian pdf with mean 0 and variance 1. (This is known as the standard normal.)

(b) Visualization of the conditional density model $p(y|x,\theta)=\mathcal{N}(y \mid w_{0} + w_{1}x, \sigma^{2})$. The density falls off exponentially fast as we move away from the regression line.


In [ ]:
#@title Figure 1.19
%run /content/pyprobml/scripts/gauss_plot.py
%run /content/pyprobml/scripts/linreg_wedge_plot.py
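
The standard normal pdf of panel (a) can be sketched directly with scipy.stats.norm (illustrative; the wedge plot of panel (b) is produced only by the script).

In [ ]:
#@title Sketch: standard normal pdf (illustrative)
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

xs = np.linspace(-4, 4, 200)
plt.plot(xs, norm.pdf(xs, loc=0, scale=1))   # mean 0, variance 1
plt.title('standard normal pdf')
plt.show()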

Figure 1.20:

Polynomial regression applied to 2d data. Vertical axis is temperature, horizontal axes are location within a room. Data was collected by some remote sensing motes at Intel’s lab in Berkeley, CA (data courtesy of Romain Thibaux).

(a) The fitted plane has the form $\widehat{f}(x)=w_{0} + w_{1}x_{1} + w_{2}x_{2}$.

(b) Temperature data is fitted with a quadratic of the form $\widehat{f}(x)=w_{0} + w_{1}x_{1} + w_{2}x_{2} +w_{3}x_{1}^{2}+w_{4}x_{2}^{2}$


In [ ]:
#@title Figure 1.20
%run /content/pyprobml/scripts/linreg_2d_surface_demo.py

Figure 1.21:

(a) Contours of the RSS error surface for the example in Fig. 1.2. The blue cross represents the MLE.

(b) Corresponding surface plot.


In [ ]:
#@title Figure 1.21
%run /content/pyprobml/scripts/linreg_contours_sse_plot.py

Figure 1.23:

Linear regression using Gaussian output with mean $\mu(x)=b + wx$  and

(a) fixed variance $\sigma^{2}$ (homoskedastic) or

(b) input-dependent variance $\sigma(x)^{2}$ (heteroskedastic).


In [ ]:
#@title Figure 1.23
%run /content/pyprobml/scripts/linreg_1d_hetero_tfp.py

Figure 1.27:

Illustration of predictions from an MLP fit using MLE to a 1d regression dataset with growing noise.

(a) Output variance is input-dependent, as in Fig. 1.26.

(b) Mean is computed using the same model as in (a), but output variance is treated as a fixed parameter $\sigma^{2}$, which is estimated by MLE after training, as in Sec. 10.3.4.2.


In [ ]:
#@title Figure 1.27
%run /content/pyprobml/scripts/nonlinreg_1d_hetero_tfp.py

Figure 1.28:

(a) Visualization of the MNIST dataset [LeC+98a; YB19]. Each image is $28\times28\times1$, where the final dimension of size 1 refers to gray scale. There are 60k training examples and 10k test examples. There are 10 classes, corresponding to the digits 0–9.

(b) Visualization of the FashionMNIST dataset [XRV17]. Each image is $28\times28\times1$, where the final dimension of size 1 refers to gray scale. There are 60k training examples and 10k test examples. There are 10 classes: ’T-shirt/top’, ’Trouser’, ’Pullover’, ’Dress’, ’Coat’, ’Sandal’, ’Shirt’, ’Sneaker’, ’Bag’, ’Ankle boot’. We show the first 25 images from the training set.


In [ ]:
#@title Figure 1.28
%run /content/pyprobml/scripts/mnist_mlp_tf.py
%run /content/pyprobml/scripts/fashion_mlp_tf.py
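
A minimal sketch of loading FashionMNIST via tf.keras.datasets and showing the first 25 training images; the MLP training done by the scripts is omitted here.

In [ ]:
#@title Sketch: first 25 FashionMNIST training images (illustrative)
import matplotlib.pyplot as plt
import tensorflow as tf

# 28x28 grayscale images; 60k train / 10k test.
(X_train, y_train), _ = tf.keras.datasets.fashion_mnist.load_data()
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

plt.figure(figsize=(8, 8))
for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.imshow(X_train[i], cmap='gray')
    plt.title(class_names[y_train[i]], fontsize=8)
    plt.axis('off')
plt.show()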

Figure 1.29:

(a) Some images from the CIFAR-10 dataset. Each image is $32\times32\times3$, where the final dimension of size 3 refers to RGB. There are 50k training examples and 10k test examples. There are 10 classes: plane, car, bird, cat, deer, dog, frog, horse, ship, and truck. We show the first 25 images from the training set.


In [ ]:
#@title Figure 1.29
%run /content/pyprobml/scripts/cifar_viz_tf.py

Figure 1.37:

A simple regression tree on two inputs.


In [ ]:
#@title Figure 1.37

%cd /content/pyprobml/scripts/matlab
!octave -W '/content/pyprobml/scripts/matlab/regtreeSurfaceDemo.m' >> _
display_image("./output1.jpg",0.4)

Figure 1.38:

(a) Iris data. We only show the first two features, sepal length and sepal width, and ignore petal length and petal width.

(b) Decision boundaries learned by an unpruned decision tree.


In [ ]:
#@title Figure 1.38 and 1.39

%cd /content/pyprobml/scripts/matlab

!octave -W '/content/pyprobml/scripts/matlab/dtreeDemoIris.m' >> _
display_image("./output1.jpg",0.4)

Figure 1.40:

(a) Illustration of a K-nearest neighbors classifier in $2d$ for $K=5$. The nearest neighbors of test point $x$ have labels {1, 1, 1, 0, 0} so we predict $p(y=1|x,\mathcal{D})=3/5$.

(b) Illustration of the Voronoi tessellation induced by 1-NN.


In [ ]:
#@title Figure 1.40
%run /content/pyprobml/scripts/knn_voronoi_plot.py
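
A minimal sketch of the $K=5$ prediction rule with scikit-learn on synthetic data (the actual figure uses its own points); the predicted probability is simply the fraction of positive labels among the 5 nearest training points.

In [ ]:
#@title Sketch: K-nearest neighbors prediction with K=5 (illustrative)
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.uniform(size=(20, 2))
y_train = (X_train[:, 0] + X_train[:, 1] > 1).astype(int)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
x_test = np.array([[0.5, 0.5]])
print("p(y=1 | x, D) =", knn.predict_proba(x_test)[0, 1])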

Figure 1.41:

Decision boundaries induced by a KNN classifier.

(a) $K=1$.

(b) $K=2$.

(c) $K=5$.

(d) Train and test error vs $K$.


In [ ]:
#@title Figure 1.41
%run /content/pyprobml/scripts/knn_classify_demo.py

Figure 1.42b:

Illustration of the curse of dimensionality.

(b) We plot the edge length of a cube needed to cover a given volume of the unit cube as a function of the number of dimensions.


In [ ]:
#@title Figure 1.42
%run /content/pyprobml/scripts/curse_dimensionality_plot.py
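
The underlying relation is that a sub-cube covering a fraction $f$ of the unit cube's volume in $d$ dimensions must have edge length $f^{1/d}$; a minimal sketch (illustrative, not the script itself):

In [ ]:
#@title Sketch: edge length vs dimension (illustrative)
import numpy as np
import matplotlib.pyplot as plt

# edge^d = f  =>  edge = f**(1/d)
ds = np.arange(1, 11)
for f in [0.01, 0.1]:
    plt.plot(ds, f ** (1.0 / ds), 'o-', label=f'fraction {f}')
plt.xlabel('number of dimensions d')
plt.ylabel('edge length of sub-cube')
plt.legend()
plt.show()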

Figure 1.43:

(a-c) Ridge regression applied to a degree 14 polynomial fit to 21 datapoints.

(d) MSE vs strength of regularizer. The degree of regularization increases from left to right, so model complexity decreases from left to right.


In [ ]:
#@title Figure 1.43
%run /content/pyprobml/scripts/linreg_poly_ridge.py
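
A hedged scikit-learn sketch of ridge regression with degree-14 polynomial features on synthetic data; the inputs are rescaled to $[0,1]$ for numerical stability, and neither the data nor the regularization values match the script's.

In [ ]:
#@title Sketch: ridge regression on a degree-14 polynomial fit (illustrative)
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 21)[:, None]
y = np.cos(2 * np.pi * x.ravel()) + rng.normal(scale=0.3, size=21)

# Larger alpha = stronger regularization = lower model complexity.
for alpha in [1e-8, 1e-2, 1e2]:
    model = make_pipeline(PolynomialFeatures(degree=14), Ridge(alpha=alpha)).fit(x, y)
    mse = np.mean((model.predict(x) - y) ** 2)
    print(f"alpha = {alpha:g}: train MSE = {mse:.3f}")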

Figure 1.44:

Predictions made by a polynomial regression model fit to a small dataset.

(a) Plugin approximation to predictive density using the MLE. Black curve is posterior mean, error bars are 2 standard deviations.

(b) Bayesian posterior predictive density, obtained by integrating out the parameters. Generated by linreg_post_pred_plot.py.


In [ ]:
#@title Figure 1.44
%run /content/pyprobml/scripts/linreg_post_pred_plot.py

Figure 1.45:

Performance of a text classifier (an MLP applied to a bag of word embeddings using average pooling) vs number of training epochs on the IMDB movie sentiment dataset. Blue = train, red = validation.

(a) Cross entropy loss. Early stopping is triggered at about epoch 25.

(b) Classification accuracy.


In [ ]:
#@title Figure 1.45
%cd /content/pyprobml/scripts
%run /content/pyprobml/scripts/imdb_mlp_bow_tf.py

Figure 1.46:

MSE on training and test sets vs size of training set, for data generated from a degree-2 polynomial with Gaussian noise of variance $\sigma^{2}=4$. We fit polynomial models of varying degree to this data.


In [ ]:
#@title Figure 1.46
%run /content/pyprobml/scripts/linreg_poly_vs_n.py

Figure 1.49:

(a) A scatterplot of the petal features from the iris dataset.

(b) The result of unsupervised clustering using $K=3$.


In [ ]:
#@title Figure 1.49
%run /content/pyprobml/scripts/iris_kmeans.py
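
A minimal scikit-learn sketch of K-means with $K=3$ on the Iris petal features (illustrative; the script produces the book figure).

In [ ]:
#@title Sketch: K-means on Iris petal features (illustrative)
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data[:, 2:4]   # petal length and petal width

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

plt.scatter(X[:, 0], X[:, 1], c=km.labels_)
plt.scatter(*km.cluster_centers_.T, marker='x', s=100, c='red')  # cluster centers
plt.xlabel('petal length'); plt.ylabel('petal width')
plt.show()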

Figure 1.50:

(a) Some 3d data points.

(b) We fit a 2d linear subspace to the 3d data using PCA.


In [ ]:
#@title Figure 1.50
%run /content/pyprobml/scripts/pca_demo.py
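
A minimal scikit-learn sketch of fitting a 2d PCA subspace to synthetic 3d data; the script generates its own data and draws the 3d plot, which is omitted here.

In [ ]:
#@title Sketch: PCA of 3d data onto a 2d subspace (illustrative)
import numpy as np
from sklearn.decomposition import PCA

# Synthetic 3d points lying close to a 2d plane, plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))
W = rng.normal(size=(2, 3))
X = latent @ W + 0.1 * rng.normal(size=(100, 3))

pca = PCA(n_components=2).fit(X)
print("explained variance ratio:", pca.explained_variance_ratio_)

# Project the points onto the fitted 2d subspace and measure reconstruction error.
X_proj = pca.inverse_transform(pca.transform(X))
print("mean reconstruction error:", np.mean(np.sum((X - X_proj) ** 2, axis=1)))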