Exercise 1 (20 points)

Implement a function in C++ that returns $n$ samples from a multivariate Gaussian distribution, and wrap it for use in Python using pybind11. Use only standard C++ and the Eigen library. The function signature in Python is:

def mvnorm(mu, Sigma, n):
    """Returns n random samples from a multivariate Gaussian distribution.

    mu is a mean vector
    Sigma is a covariance matrix

    Returns an n by p matrix, where p is the dimension of the distribution.
    """

In [ ]:
%%file random.cpp
<%
cfg['compiler_args'] = ['-std=c++11']
cfg['include_dirs'] = ['../eigen']
setup_pybind11(cfg)
%>

#include <pybind11/pybind11.h>
#include <pybind11/eigen.h>

#include <Eigen/LU>

namespace py = pybind11;

// pybind11/eigen.h converts NumPy arrays to Eigen types (and back) automatically
Eigen::MatrixXd mvn(Eigen::VectorXd mu, Eigen::MatrixXd Sigma, int n) {
    // Placeholder body: replace with code that returns an n x p matrix of samples
    return Sigma;
}

PYBIND11_PLUGIN(random) {
    pybind11::module m("random", "auto-compiled c++ extension");
    m.def("mvnorm", &mvn);  // expose under the name required by the exercise
    return m.ptr();
}
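For orientation, here is a minimal NumPy sketch of the linear algebra the C++ function needs: draw standard normal vectors and transform them with a Cholesky factor of Sigma (Eigen's LLT class provides the analogous factorization on the C++ side). The helper name mvnorm_ref and the test covariance below are arbitrary choices for this sketch, not part of the required solution.

import numpy as np

def mvnorm_ref(mu, Sigma, n):
    """Reference sampler: each row is mu + L z, where Sigma = L L^T."""
    L = np.linalg.cholesky(Sigma)                # lower-triangular Cholesky factor
    z = np.random.normal(size=(n, len(mu)))      # n x p standard normal draws
    return mu + z @ L.T                          # n x p matrix of samples

# Rough sanity check: sample moments should approach mu and Sigma
samples = mvnorm_ref(np.zeros(2), np.array([[1.0, 0.6], [0.6, 2.0]]), 10000)
samples.mean(axis=0), np.cov(samples.T)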

Exercise 2 (20 points)

  • Consider a sequence of $n$ Bernoulli trials with success probability $p$ per trial. A string of consecutive successes is known as a success run. Write a function that returns the counts for runs of length $k$ for each $k$ observed in a dictionary.

For example: if the trials were [0, 1, 0, 1, 1, 0, 0, 0, 0, 1], the function should return

{1: 2, 2: 1}
  • What is the probability of observing at least one run of length 5 or more when $n=100$ and $p=0.5$? Estimate this from 100,000 simulated experiments. Is this more, less, or equally likely than finding runs of length 7 or more when $p=0.7$? (A sketch of the run counting and simulation follows this list.)
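A minimal sketch, assuming NumPy is available; the helper name count_runs and the simulation loop below are illustrative choices, not a prescribed solution.

import numpy as np

def count_runs(trials):
    """Return a dict mapping run length k to the number of success runs of length k."""
    counts = {}
    run = 0
    for t in trials:
        if t:
            run += 1
        elif run:
            counts[run] = counts.get(run, 0) + 1
            run = 0
    if run:                                       # close a run that ends the sequence
        counts[run] = counts.get(run, 0) + 1
    return counts

count_runs([0, 1, 0, 1, 1, 0, 0, 0, 0, 1])        # {1: 2, 2: 1}

# Estimate P(at least one run of length >= 5) for n=100, p=0.5
reps, n, p = 100000, 100, 0.5                     # reduce reps while prototyping
hits = sum(1 for _ in range(reps)
           if any(k >= 5 for k in count_runs(np.random.rand(n) < p)))
hits / reps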

In [ ]:

Exercise 3 (20 points)

  • Consider an unbiased random walk of length $n$ simulated as a sequence of -1 or +1 steps. Starting from 0, plot the distribution of last return times for 100,000 simulations with $n = 100$. The last return time is the last time the cumulative sum of the random walk is zero; this may be the starting point if the walk never returns to zero in 100 steps.

  • Do a maximum likelihood fit of a beta distribution to the set of last return times using the beta.fit() function from scipy.stats. Set the lower bound (loc) = 0 and the upper bound (scale) = 100 for plotting. Superimpose the fitted beta PDF on the normalized histogram of last return times. (A sketch of the simulation and fitting steps follows this list.)
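A sketch of the simulation and fitting steps, assuming NumPy, SciPy, and matplotlib; the bin count and variable names are arbitrary, and the last return time is recorded as 0 when the walk never revisits zero.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

reps, n = 100000, 100
steps = np.random.choice([-1, 1], size=(reps, n))
walks = np.cumsum(steps, axis=1)                  # positions at times 1..n

# Last time each walk sits at zero (0 if it never returns)
hits = walks == 0
last = np.where(hits.any(axis=1), n - np.argmax(hits[:, ::-1], axis=1), 0)

# Maximum likelihood beta fit with loc and scale held fixed at 0 and 100;
# values exactly at the endpoints can make the MLE fussy
a, b, loc, scale = beta.fit(last, floc=0, fscale=100)

xs = np.linspace(0.5, 99.5, 200)                  # avoid the endpoints, where the pdf diverges
plt.hist(last, bins=25, density=True)
plt.plot(xs, beta.pdf(xs, a, b, loc=loc, scale=scale))
plt.show()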


In [ ]:

Exercise 4 (20 points)

The Cauchy distribution is given by $$ f(x) = \frac{1}{\pi (1 + x^2)}, \ \ -\infty \lt x \lt \infty $$

  • Integrate the tail probability $P(X > 2)$ using Monte Carlo (a sketch of both sampling approaches follows this list):
    1. Sampling from the Cauchy distribution directly
    2. Sampling from the uniform distribution using an appropriate change of variables
  • Plot the 95% CI for the Monte Carlo estimates for n = 1 to 1000
    1. For sampling from the Cauchy distribution using multiple Monte Carlo sequences
    2. For sampling from the uniform distribution using bootstrap samples of a single Monte Carlo sequence
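A sketch of the two tail-probability estimators (confidence-band plotting omitted); the sample size is an arbitrary choice. The change of variables used here is $y = 1/x$, which maps $(2, \infty)$ to $(0, 1/2)$ and turns the tail integral into $\int_0^{1/2} \frac{1}{\pi(1 + y^2)}\, dy$.

import numpy as np

n = 10000

# 1. Direct sampling from the Cauchy distribution
x = np.random.standard_cauchy(n)
est_direct = np.mean(x > 2)

# 2. Uniform sampling after the change of variables y = 1/x
y = np.random.uniform(0, 0.5, n)
est_uniform = 0.5 * np.mean(1 / (np.pi * (1 + y**2)))

est_direct, est_uniform                           # both should be near 1/2 - arctan(2)/pi ≈ 0.1476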

In [ ]:

Exercise 5 (20 points)

Estimate the following integral using Monte Carlo integration

$$ \int_{-\infty}^{\infty} x^2 \frac{1}{2}e^{-|x|} dx $$

Hint: See the notes on importance sampling and the accompanying figure.
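One possible importance-sampling sketch, using a standard Cauchy proposal because its tails are heavier than those of the target density $\frac{1}{2}e^{-|x|}$, so the importance weights stay bounded; the proposal and sample size are illustrative choices. Since $\frac{1}{2}e^{-|x|}$ is the standard Laplace density, the exact value of the integral is its second moment, 2, which gives a check on the estimate.

import numpy as np

n = 100000
x = np.random.standard_cauchy(n)                  # draws from the proposal q(x) = 1 / (pi (1 + x^2))

integrand = x**2 * 0.5 * np.exp(-np.abs(x))       # x^2 times the target density
weights = np.pi * (1 + x**2)                      # 1 / q(x)
estimate = np.mean(integrand * weights)

estimate                                          # exact value is 2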


In [ ]: