Conclusion

I hope you enjoyed the lecture and can do some crazy Bayesian stuff in your next job as a data scientist!

Checklist

  • Understand what machine learning is in terms of probabilities
  • Can get started constructing probabilistic graphical models using pgmpy
  • Can get started building Bayesian models in pymc
  • Understand the difference between frequentist and Bayesian machine learning
  • Understand regression in Bayesian settings
  • Understand when to prefer Bayesian machine learning over other approaches

Homework 1: Monty Hall problem

Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?

Write code to help you win the game show.


In [9]:
#your code here
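
If you want something to check your answer against, here is a minimal sketch, not the intended solution. It computes the posterior with Bayes' rule for the case described above (you pick door 1, the host opens door 3) and then cross-checks it by simulation. It assumes the host picks uniformly at random when he has two goat doors to choose from. The names `play_monty_hall` and `win_rate` are made up for this illustration.

```python
import random
from fractions import Fraction

# Exact posterior via Bayes' rule: you picked door 1, the host opened door 3.
prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}
# P(host opens door 3 | car location), assuming the host chooses uniformly
# at random when both remaining doors hide goats.
likelihood = {1: Fraction(1, 2), 2: Fraction(1), 3: Fraction(0)}
evidence = sum(prior[d] * likelihood[d] for d in prior)
posterior = {d: prior[d] * likelihood[d] / evidence for d in prior}

def play_monty_hall(switch, rng):
    """Play one game and return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a goat door that is not the contestant's pick.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the only door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, n_games=100000, seed=0):
    """Estimate the winning probability by repeated simulation."""
    rng = random.Random(seed)
    wins = sum(play_monty_hall(switch, rng) for _ in range(n_games))
    return wins / n_games

stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print("posterior:", posterior)
print("stay:", stay_rate, "switch:", switch_rate)
```

The simulated win rates should land close to the exact posterior, which puts probability 2/3 on door 2.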

Homework 2: Localization

Suppose you are in a moving car and can sense the noisy distance between you and fixed communication towers (using signal strength).

Can you estimate your true location?


In [26]:
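# Helpers to simulate tower positions, your position, and noisy distance readings
import numpy as np
import pymc as pm
import math
import matplotlib.pyplot as plt
import random
import scipy as sci
import scipy.stats  # load the stats submodule so sci.stats.norm is available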

%matplotlib inline

def noisyDistance(x, y, noiseSigma):
    # Euclidean distance between points x and y, plus zero-mean Gaussian noise
    return np.sqrt(((x - y) ** 2).sum()) + sci.stats.norm.rvs(0, noiseSigma)

def generateData(landscapeSize, numCommTowers, noiseSigma):
    # Place the towers and the car uniformly at random in a square landscape
    towersTrueLocation = [np.array([random.random() * landscapeSize, random.random() * landscapeSize]) for i in range(numCommTowers)]
    yourTrueLocation = np.array([random.random() * landscapeSize, random.random() * landscapeSize])
    # One noisy distance reading per tower
    noisyDistances = [noisyDistance(i, yourTrueLocation, noiseSigma) for i in towersTrueLocation]
    return (towersTrueLocation, yourTrueLocation, noisyDistances)

def drawLandscape(landscapeSize, towersTrueLocation, yourTrueLocation, estimatedLocations = None):
    plt.xlim(0, landscapeSize)
    plt.ylim(0, landscapeSize)
    for i in towersTrueLocation:
        plt.scatter(i[0], i[1],  marker='+')
    plt.scatter(yourTrueLocation[0], yourTrueLocation[1],  marker='*', color = 'red')
    if estimatedLocations is not None:
        for i in estimatedLocations:
            plt.scatter(i[0], i[1],  marker='o')

In [25]:
(towersTrueLocation, yourTrueLocation, noisyDistances) = generateData(10000, 25, 50)

drawLandscape(10000, towersTrueLocation, yourTrueLocation)

print ("True towers locations: ", towersTrueLocation, "\n")
print ("Your true location: ", yourTrueLocation, "\n")
print ("Noisy distances: ", noisyDistances)


True towers locations:  [array([ 6258.84064453,  3223.18772518]), array([ 4355.47954009,  9621.17722459]), array([ 9879.7474009 ,  5878.66898346]), array([ 2090.5262    ,  9506.56502056]), array([ 4380.04351251,  5503.48893528]), array([ 3758.10916458,  2236.200656  ]), array([ 6029.19296561,  8189.86400275]), array([  414.72789337,  1040.47618699]), array([ 5650.63481931,  1149.06079376]), array([ 7892.9867875,  4105.2534715]), array([ 8859.32361924,  6296.39078001]), array([ 8672.9470976 ,  2638.46422627]), array([ 1223.1328621 ,  9984.14179862]), array([ 8650.23044368,  6495.76780248]), array([ 1157.35694999,  3948.88606925]), array([ 5180.42569695,  4619.06268634]), array([ 4794.9107609 ,  4519.97181112]), array([ 6496.52141661,  5456.42652108]), array([ 5900.28311032,  1049.63316189]), array([ 8235.03830068,  6095.9155491 ]), array([ 6373.01102043,  5362.86797399]), array([  408.39821927,  5696.78071893]), array([ 2004.13924602,  9340.70463701]), array([ 7983.11790277,  3434.11457796]), array([ 1357.05596679,  8239.95147505])] 

Your true location:  [ 4400.42657991  9927.3871389 ] 

Noisy distances:  [6990.8266784578336, 296.96291666300556, 6897.7318675085426, 2355.1330336112756, 4446.804145042428, 7680.5298621165539, 2400.3101253670125, 9708.1166344455378, 8912.5927557843461, 6797.7502163131157, 5684.0645061008445, 8445.767036404337, 3139.5031595062851, 5479.0176598385196, 6685.2592710742447, 5400.7726985790341, 5527.6767641895012, 4967.7299203468174, 9002.9898533952273, 5456.4947447459799, 4970.6325708178283, 5838.7956458795898, 2425.1222510032089, 7433.6647245112463, 3386.7912015964707]

In [27]:
# Your code here
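
One possible starting point, as a rough sketch only: evaluate the posterior over your location on a regular grid, using a flat prior over the landscape and a Gaussian likelihood for each distance reading. It uses plain NumPy rather than pymc and regenerates its own data so it runs on its own. It also assumes the noise sigma is known, and the grid resolution of 201 points per axis is an arbitrary choice. All variable names are made up for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
landscape_size, num_towers, noise_sigma = 10000.0, 25, 50.0

# Simulated data, same setup as the homework but regenerated here
# so that the sketch is self-contained.
towers = rng.uniform(0, landscape_size, size=(num_towers, 2))
true_location = rng.uniform(0, landscape_size, size=2)
distances = (np.linalg.norm(towers - true_location, axis=1)
             + rng.normal(0, noise_sigma, num_towers))

# Candidate locations on a regular grid (flat prior over the landscape).
axis = np.linspace(0, landscape_size, 201)
gx, gy = np.meshgrid(axis, axis)
grid = np.stack([gx.ravel(), gy.ravel()], axis=1)  # shape (G, 2)

# Predicted distance from every grid point to every tower, shape (G, T).
predicted = np.linalg.norm(grid[:, None, :] - towers[None, :, :], axis=2)

# Gaussian log-likelihood of the readings at each grid point
# (additive constants dropped, noise sigma assumed known).
log_post = -0.5 * (((distances[None, :] - predicted) / noise_sigma) ** 2).sum(axis=1)

# Normalize in a numerically stable way to get a discrete posterior.
post = np.exp(log_post - log_post.max())
post /= post.sum()

map_estimate = grid[np.argmax(post)]  # most probable grid point
mean_estimate = post @ grid           # posterior mean location
error = np.linalg.norm(map_estimate - true_location)
print("true:", true_location, "MAP:", map_estimate, "error:", error)
```

A grid is easy to reason about but scales poorly. A sampler such as the ones in pymc would avoid fixing the resolution in advance.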

Homework 3: Mixture of Gaussians

In a statistics exam, the professor found two peaks in the distribution of student scores. He speculates that the cause is variation in the university's student admission standards (Elmoazi!). Can you help him rediscover the two groups of students?


In [69]:
from matplotlib.pyplot import hist
from numpy.random import normal
import random

# range works in both Python 2 and 3 (xrange exists only in Python 2)
data = [normal(55, 5) for i in range(100)]
data += [normal(85, 5) for i in range(100)]
random.shuffle(data)

hist(data, 20)


Out[69]:
(array([  3.,   8.,  20.,  13.,  17.,  20.,  14.,   3.,   2.,   1.,   1.,
          3.,   6.,  15.,  22.,  19.,  17.,  12.,   2.,   2.]),
 array([ 42.62571374,  45.40253822,  48.1793627 ,  50.95618717,
         53.73301165,  56.50983613,  59.28666061,  62.06348508,
         64.84030956,  67.61713404,  70.39395852,  73.17078299,
         75.94760747,  78.72443195,  81.50125643,  84.27808091,
         87.05490538,  89.83172986,  92.60855434,  95.38537882,  98.16220329]),
 <a list of 20 Patch objects>)

In [68]:
#your code here
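
As a rough reference point, here is a sketch that fits a two-component Gaussian mixture with expectation-maximization (EM). Note that EM gives maximum-likelihood point estimates, so it is a stand-in for, not an example of, a full Bayesian model in pymc. It regenerates data with the same parameters as above, and it assumes the number of groups (two) is known. The helper `normal_pdf` and all variable names are made up for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(55, 5, 100), rng.normal(85, 5, 100)])

def normal_pdf(x, mean, std):
    """Density of a normal distribution, broadcast over arrays."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

# Initial guesses: one component at each end of the data range.
means = np.array([data.min(), data.max()])
stds = np.array([data.std(), data.std()])
weights = np.array([0.5, 0.5])

for _ in range(200):
    # E-step: responsibility of each component for each score, shape (N, 2).
    dens = weights * normal_pdf(data[:, None], means, stds)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means, and standard deviations.
    nk = resp.sum(axis=0)
    weights = nk / len(data)
    means = (resp * data[:, None]).sum(axis=0) / nk
    stds = np.sqrt((resp * (data[:, None] - means) ** 2).sum(axis=0) / nk)

# Hard assignment of each student to the more probable group.
labels = resp.argmax(axis=1)
print("weights:", weights, "means:", means, "stds:", stds)
```

The fitted means should sit near the two peaks in the histogram, and `labels` gives a group assignment for each score.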