Performing Maximum Likelihood Estimates (MLEs) in IPython

By Delaney Granizo-Mackenzie and Andrei Kirilenko.

This notebook developed in collaboration with Prof. Andrei Kirilenko as part of the Masters of Finance curriculum at MIT Sloan.

Part of the Quantopian Lecture Series:

Notebook released under the Creative Commons Attribution 4.0 License.

In this tutorial notebook we'll do the following things:

Compute the MLE for a normal distribution.
Compute the MLE for an exponential distribution.
Fit a normal distribution to asset returns using MLE.

First we need to import some libraries



In [13]:

    
import math
import matplotlib.pyplot as plt
import numpy as np
import scipy
import scipy.stats

Normal Distribution

We'll start by sampling some data from a normal distribution.



In [14]:

    
TRUE_MEAN = 40
TRUE_STD = 10
X = np.random.normal(TRUE_MEAN, TRUE_STD, 1000)

Now we'll define functions that given our data, will compute the MLE for the $\mu$ and $\sigma$ parameters of the normal distribution.

Recall that

$$\hat\mu = \frac{1}{T}\sum_{t=1}^{T} x_t$$$$\hat\sigma = \sqrt{\frac{1}{T}\sum_{t=1}^{T}{(x_t - \hat\mu)^2}}$$



In [15]:

    
def normal_mu_MLE(X):
    # Get the number of observations
    T = len(X)
    # Sum the observations
    s = sum(X)
    return 1.0/T * s

def normal_sigma_MLE(X):
    T = len(X)
    # Get the mu MLE
    mu = normal_mu_MLE(X)
    # Sum the square of the differences
    s = sum( np.power((X - mu), 2) )
    # Compute sigma^2
    sigma_squared = 1.0/T * s
    return math.sqrt(sigma_squared)

Now let's try our functions out on our sample data and see how they compare to the built-in np.mean and np.std



In [16]:

    
print "Mean Estimation"
print normal_mu_MLE(X)
print np.mean(X)
print "Standard Deviation Estimation"
print normal_sigma_MLE(X)
print np.std(X)









    



Mean Estimation
39.5829392522
39.5829392522
Standard Deviation Estimation
10.2920350139
10.2920350139

Now let's estimate both parameters at once with scipy's built in fit() function.



In [17]:

    
mu, std = scipy.stats.norm.fit(X)
print "mu estimate: " + str(mu)
print "std estimate: " + str(std)









    



mu estimate: 39.5829392522
std estimate: 10.2920350139

Now let's plot the distribution PDF along with the data to see how well it fits. We can do that by accessing the pdf provided in scipy.stats.norm.pdf.



In [18]:

    
pdf = scipy.stats.norm.pdf
# We would like to plot our data along an x-axis ranging from 0-80 with 80 intervals
# (increments of 1)
x = np.linspace(0, 80, 80)
plt.hist(X, bins=x, normed='true')
plt.plot(pdf(x, loc=mu, scale=std))
plt.xlabel('Value')
plt.ylabel('Observed Frequency')
plt.legend(['Fitted Distribution PDF', 'Observed Data', ]);

Exponential Distribution

Let's do the same thing, but for the exponential distribution. We'll start by sampling some data.



In [37]:

    
TRUE_LAMBDA = 5
X = np.random.exponential(TRUE_LAMBDA, 1000)

numpy defines the exponential distribution as $$\frac{1}{\lambda}e^{-\frac{x}{\lambda}}$$

So we need to invert the MLE from the lecture notes. There it is

$$\hat\lambda = \frac{T}{\sum_{t=1}^{T} x_t}$$

Here it's just the reciprocal, so

$$\hat\lambda = \frac{\sum_{t=1}^{T} x_t}{T}$$



In [38]:

    
def exp_lamda_MLE(X):
    T = len(X)
    s = sum(X)
    return s/T



In [39]:

    
print "lambda estimate: " + str(exp_lamda_MLE(X))









    



lambda estimate: 4.71513802853



In [40]:

    
# The scipy version of the exponential distribution has a location parameter
# that can skew the distribution. We ignore this by fixing the location
# parameter to 0 with floc=0
_, l = scipy.stats.expon.fit(X, floc=0)



In [41]:

    
pdf = scipy.stats.expon.pdf
x = range(0, 80)
plt.hist(X, bins=x, normed='true')
plt.plot(pdf(x, scale=l))
plt.xlabel('Value')
plt.ylabel('Observed Frequency')
plt.legend(['Fitted Distribution PDF', 'Observed Data', ]);

MLE for Asset Returns

Now we'll fetch some real returns and try to fit a normal distribution to them using MLE.



In [42]:

    
prices = get_pricing('TSLA', fields='price', start_date='2014-01-01', end_date='2015-01-01')
# This will give us the number of dollars returned each day
absolute_returns = np.diff(prices)
# This will give us the percentage return over the last day's value
# the [:-1] notation gives us all but the last item in the array
# We do this because there are no returns on the final price in the array.
returns = absolute_returns/prices[:-1]

Let's use scipy's fit function to get the $\mu$ and $\sigma$ MLEs.



In [43]:

    
mu, std = scipy.stats.norm.fit(returns)
pdf = scipy.stats.norm.pdf
x = np.linspace(-1,1, num=100)
h = plt.hist(returns, bins=x, normed='true')
l = plt.plot(x, pdf(x, loc=mu, scale=std))

Of course, this fit is meaningless unless we've tested that they obey a normal distribution first. We can test this using the Jarque-Bera normality test. The Jarque-Bera test will reject the hypothesis of a normal distribution if the p-value is under a c



In [45]:

    
from statsmodels.stats.stattools import jarque_bera
jarque_bera(returns)









    Out[45]:





(242.53773931956744,
 2.1557534018848578e-53,
 0.7405363946340933,
 7.582279228661398)



In [46]:

    
jarque_bera(np.random.normal(0, 1, 100))









    Out[46]:





(0.9856724005093672,
 0.61089132471452023,
 -0.10806345762002881,
 2.5642815902086933)



In [ ]: