How to create and populate a histogram


In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Let's generate a data array


In [2]:
data = np.random.rand(500)*1200;

Make a histogram of the data


In [3]:
fig = plt.figure();
plt.hist(data)
plt.show()



What does the hist() function return?

We can unpack the return value of hist() into a tuple of three variables to get back some information about what the histogram contains.

The whole output is:


In [4]:
n, my_bins, my_patches = plt.hist(data, bins=10);


In this case:

  • n : an array (or a list of arrays, if multiple datasets are passed) holding the values of the histogram bins. (Be careful in case the weights and/or normed options are used.)

In [5]:
n
len(n)


Out[5]:
10
  • my_bins : an array holding the edges of the bins. Its length is nbins+1 (that is, nbins left edges plus the right edge of the last bin). This is always a single array, even if more than one dataset is passed in.

In [6]:
my_bins


Out[6]:
array([  5.64547712e-03,   1.19967390e+02,   2.39929135e+02,
         3.59890880e+02,   4.79852624e+02,   5.99814369e+02,
         7.19776114e+02,   8.39737858e+02,   9.59699603e+02,
         1.07966135e+03,   1.19962309e+03])
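Since my_bins holds nbins+1 edges, bin widths and centers follow directly. A quick sketch using np.histogram, which returns the same counts and edges as plt.hist without drawing:

```python
import numpy as np

# np.histogram returns the same (counts, edges) pair that plt.hist computes
data = np.random.rand(500) * 1200
n, edges = np.histogram(data, bins=10)

widths = np.diff(edges)                    # nbins bin widths
centers = 0.5 * (edges[:-1] + edges[1:])   # nbins bin centers

print(len(edges), len(widths), len(centers))  # 11 10 10
```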
  • my_patches : a silent list of the individual patches used to create the histogram, or a list of such lists if multiple datasets are plotted.

In [7]:
my_patches


Out[7]:
<a list of 10 Patch objects>
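Because my_patches exposes the individual Rectangle artists, bins can be restyled after the fact. A sketch that highlights the fullest bin (the color choice is arbitrary):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this also runs headless
import matplotlib.pyplot as plt
import numpy as np

data = np.random.rand(500) * 1200
n, bins, patches = plt.hist(data, bins=10)

# recolor the patch belonging to the bin with the most entries
patches[int(np.argmax(n))].set_facecolor('crimson')
```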

Manipulate The Histogram Aesthetics

Number of bins

Use the bins= option.


In [8]:
plt.hist(data, bins=100);



Range of histogram

Use the range=(tuple) option


In [9]:
plt.hist(data, bins=100, range=(0,1000));



Normalizing your histogram

To normalize a histogram use the normed=True option (in newer matplotlib releases this option is called density=True).


In [10]:
plt.hist(data, normed=True);


This assures that the integral of the distribution is equal to unity.

If stacked is also True, the sum of the histograms is normalized to 1.
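The unit integral can be checked numerically: multiply each bin height by its bin width and sum. Sketched here with np.histogram's equivalent density=True option:

```python
import numpy as np

data = np.random.rand(500) * 1200
heights, edges = np.histogram(data, density=True)  # same normalization as normed=True

integral = np.sum(heights * np.diff(edges))
print(integral)  # 1.0 (up to floating point)
```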


Special Normalize

However, sometimes it is useful to have the bin heights themselves sum to unity. For this we generate weights for the histogram: each data point gets the weight 1/(number of data points).

N.B. : With this technique you MUST NOT also use the normed=True option.

This way adding up the bars gives you 1.


In [11]:
weights = np.ones_like(data)/len(data);

In [12]:
plt.hist(data, weights=weights);  ## We have NOT used the normed = True option
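A quick check that the bar heights now sum to one, using np.histogram so the heights can be inspected directly:

```python
import numpy as np

data = np.random.rand(500) * 1200
weights = np.ones_like(data) / len(data)
heights, _ = np.histogram(data, weights=weights)

print(heights.sum())  # 1.0 (up to floating point)
```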



Weights of your input

To weight your data use the weights=(array) option.

The weights array must have the same shape as the data provided. Each data point then contributes its associated weight towards the bin count (instead of 1).

If you also use normed=True the weights will be normalized so that the integral of the density over the range is unity.

As in the previous section, to make the bin heights sum to unity instead, use weights of 1/(number of data points), and do not combine them with the normed=True option.



Cumulative histogram

This creates the cumulative histogram. Use cumulative=True, so that each bin contains its own counts plus the counts of all previous bins.


In [15]:
plt.hist(data, weights=weights, cumulative=True);



Raise your histogram using bottom

You can raise your histogram by adding either a scalar (fixed) offset on the y-axis, or an array-like one (one offset per bin). To do this use the bottom=(array, scalar, or None) option.


In [16]:
plt.hist(data, weights=weights, bottom=5);



In [17]:
nbins = 10
bot = 5*np.random.rand(nbins)

In [18]:
plt.hist(data, bins=nbins, bottom=bot);



Different draw types

Use the histtype= option for other draw options of your histogram. Basics are:

  • bar -> Traditional bar type histogram

In [19]:
plt.hist(data, bins=nbins,histtype='bar');


  • barstacked -> bar-type where multiple data are stacked on-top of the other

In [20]:
plt.hist(data, bins=nbins,histtype='barstacked');


  • step -> To create the line plot only

In [21]:
plt.hist(data, bins=nbins,histtype='step');


  • stepfilled -> to create the step but also fill it (similar to bar but without the vertical lines)

In [22]:
plt.hist(data, bins=nbins,histtype='stepfilled');



Align of the histogram

One can use the align='left'|'mid'|'right' option

  • 'left' -> bars are centered on the left bin edges

In [23]:
plt.hist(data, align='left');


  • 'mid' -> bars centered between bin edges

In [24]:
plt.hist(data, align='mid');


  • 'right' -> bars are centered on the right bin edges

In [25]:
plt.hist(data, align='right');



Orientation of the bins

You can orient the histogram vertically or horizontally using the orientation option.


In [26]:
plt.hist(data, orientation="horizontal");



In [27]:
plt.hist(data, orientation="vertical");



Relative width of the bars

The option rwidth=(scalar or None) defines the relative width of the bars as a fraction of the bin width. If None (the default) the width is computed automatically.


In [28]:
plt.hist(data);



In [29]:
plt.hist(data, rwidth=0.2);



In [30]:
plt.hist(data, rwidth=0.8);



Logarithmic Scale

To enable the logarithmic scale use the log=True option. The histogram axis will be set to log scale. For logarithmic histograms, empty bins are filtered out.


In [31]:
plt.hist(data, log=True);



Color of your histogram

You can use named color presets or an array-like color specification (e.g. RGBA values).


In [32]:
plt.hist(data, color='red');



In [33]:
plt.hist(data, color=[0.2, 0.3, 0.8, 0.3]); # RGBA



Label your histogram

Use the label=string option. This takes a string or a sequence of strings.


In [34]:
plt.hist(data, label="Histogram");



Stack multiple histograms

To stack more than one histogram use the stacked=True option.

If True, multiple data are stacked on top of each other; if False, multiple data are arranged side by side (for bar-type histtypes) or drawn on top of each other (for step-type histtypes).


In [35]:
data2 = np.random.rand(500)*1300;

In [36]:
plt.hist([data, data2], stacked=True);  # pass both arrays in one call; two separate calls would just overlap
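When two datasets are stacked, the top of the stack equals the combined counts per bin. This can be checked with np.histogram over shared bin edges (a sketch; the edge grid is an arbitrary choice covering both datasets):

```python
import numpy as np

data = np.random.rand(500) * 1200
data2 = np.random.rand(500) * 1300

edges = np.linspace(0, 1300, 11)      # shared edges covering both datasets
n1, _ = np.histogram(data, bins=edges)
n2, _ = np.histogram(data2, bins=edges)

# stacked=True draws n1 and n1 + n2; the stack top is the bin-wise sum
stack_top = n1 + n2
print(stack_top.sum())  # 1000: every point from both datasets is counted once
```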


Add Info about the data on the canvas

First of all, we can compute the number of entries, the mean and the standard deviation of the plotted data and add them on the canvas.


In [37]:
entries = len(data);
mean = data.mean();
stdev = data.std();

Then create the string and add these values


In [38]:
textstr = 'Entries=$%i$\nMean=$%.2f$\n$\\sigma$=$%.2f$' % (entries, mean, stdev)

In [39]:
plt.hist(data, label=textstr);
plt.ylim(0,100);
plt.legend(loc='best',markerscale=0.01);


Or using a textbox...


In [40]:
plt.hist(data);
plt.ylim(0,100);
#plt.text(800,80,textstr);
plt.annotate(textstr, xy=(0.7, 0.8), xycoords='axes fraction')  # annotate lets us position the text as a fraction of the canvas


Out[40]:
<matplotlib.text.Annotation at 0x113a53a10>

How to fit a histogram

Let's generate a normal distribution


In [41]:
fit_data = np.random.randn(500)*200;
plt.hist(fit_data);


Assume now that, seeing these data, we think a Gaussian distribution will fit the given dataset best.

We load the gaussian (normal) distribution from scipy.stats:


In [42]:
from scipy.stats import norm

Now, looking at this function norm we see it has a loc option and a scale option.

loc is the mean value and scale is the standard deviation.

To fit a gaussian on top of the histogram, we need the normed histogram and also to get the mean and std of the gaussian that fits the data. Therefore we have


In [43]:
plt.hist(fit_data, normed=True);



In [44]:
mean, std = norm.fit(fit_data);

In [45]:
mean


Out[45]:
0.82013770997502522

In [46]:
std


Out[46]:
198.59198168540044

Then we create the curve for that using the norm.pdf in the range of fit_data.min() and fit_data.max()


In [47]:
x = np.linspace(fit_data.min(), fit_data.max(), 1000);

In [48]:
fit_gaus_func = norm.pdf(x, mean, std);

In [49]:
plt.hist(fit_data, normed=True);
plt.plot(x,fit_gaus_func, lw=4);
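As a sanity check, norm.fit should recover parameters close to those used to generate the data (mean ≈ 0, σ ≈ 200 here). A sketch with a fixed seed so the result is reproducible:

```python
import numpy as np
from scipy.stats import norm

# same construction as fit_data above, but seeded
sample = np.random.RandomState(42).randn(500) * 200
mu, sigma = norm.fit(sample)

print(mu, sigma)  # mu close to 0, sigma close to 200
```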


Fit using Kernel Density Estimation

Instead of specifying a distribution, we can estimate the probability density function directly. This can be achieved with the non-parametric technique of kernel density estimation.

KDE is a non parametric way to estimate the probability density function of a random variable.

How does it work? Suppose $(x_1, x_2, ..., x_n)$ are i.i.d. with unknown density $f$. We want to estimate the shape of this function $f$. Its kernel density estimator is

$ \hat{f}_{h}(x) = \frac{1}{n} \sum_{i=1}^{n}K_h(x-x_i) = \frac{1}{nh}\sum_{i=1}^{n}K\left(\frac{x-x_i}{h}\right)$

where $K()$ is the kernel: a non-negative function that integrates to one and has mean zero; $h>0$ is a smoothing parameter called the bandwidth. A kernel with a subscript h is called a scaled kernel and is defined as $K_h(x)=\frac{1}{h}K(\frac{x}{h})$. Usually one wants a small $h$, but there is always a trade-off between the bias of the estimator and its variance.

Kernel functions commonly used:

- uniform    
- triangular
- biweight
- triweight
- Epanechnikov
- normal 

More under https://en.wikipedia.org/wiki/Kernel_(statistics)#Kernel_functions_in_common_use
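The estimator above can be implemented directly in a few lines. This sketch uses a Gaussian kernel with a fixed, hand-picked bandwidth h (scipy's gaussian_kde below chooses h automatically, so its numbers will differ slightly):

```python
import numpy as np

def kde(x, samples, h):
    """Gaussian kernel density estimate: f_hat(x) = 1/(n*h) * sum K((x - x_i)/h)."""
    u = (x[:, None] - samples[None, :]) / h       # (len(x), n) matrix of scaled distances
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)  # standard normal kernel
    return K.sum(axis=1) / (len(samples) * h)

samples = np.random.RandomState(0).randn(300)
grid = np.linspace(-4, 4, 200)
density = kde(grid, samples, h=0.4)

# the estimate is a proper density: non-negative and integrating to ~1
integral = density.sum() * (grid[1] - grid[0])
print(integral)
```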

In Python this is done using the scipy.stats.kde submodule. For Gaussian kernel density estimation we use gaussian_kde:


In [50]:
from scipy.stats import gaussian_kde
pdf_gaus = gaussian_kde(fit_data);
pdf_gaus


Out[50]:
<scipy.stats.kde.gaussian_kde at 0x113701c50>

In [51]:
pdf_gaus = pdf_gaus.evaluate(x);  # get the "y" values from the pdf for the "x" axis, this is an array

In [52]:
pdf_gaus


Out[52]:
array([  2.71474858e-05,   2.75296549e-05,   2.79099491e-05, ...,
         2.88407951e-05,   2.85637087e-05,   2.82806218e-05])

In [53]:
plt.hist(fit_data, normed=1);
plt.plot(x, pdf_gaus, 'k', lw=3)


Out[53]:
[<matplotlib.lines.Line2D at 0x112c7f390>]

In [54]:
plt.plot(x,fit_gaus_func, lw=4, label="fit");
plt.plot(x, pdf_gaus, 'k', lw=3, label="KDE");
plt.legend();


N.B.: Notice the difference between the two fit curves! This comes from the fact that the Gaussian KDE is a mixture of normal distributions; a Gaussian mixture may be skewed, heavy- or light-tailed, or multimodal. Thus it does not assume the original distribution has any particular form.

Fit using Scipy's Optimize submodule

Scipy comes with an optimize submodule that provides several commonly used optimization algorithms.

One of the easiest is curve_fit, which uses non-linear least squares to fit a function $f$ to the data. It assumes that :

$ y_{data} = f(x_{data}, *params) + \epsilon$

The declaration of the function is :

scipy.optimize.curve_fit(f, xdata, ydata, p0=None, sigma=None, absolute_sigma=False, check_finite=True, bounds=(-inf, inf), method=None, jac=None, **kwargs)

where

  • f : model function (callable) f(x, ... ) the independent variable must be its first argument
  • xdata : sequence or array (an array of shape (k, M) for functions of k predictors)
  • ydata : sequence of dependent data
  • p0 : initial guess for the parameters
  • sigma : if not None, this provides the uncertainties in the ydata array. These are used as weights in the least-squares problem, i.e. minimizing : $\sum{\left(\frac{f(xdata, popt)-ydata}{sigma}\right)^2}$. If set to None, uncertainties are assumed to be 1.
  • absolute_sigma : (bool) When False, sigma denotes relative weights of the data points. The returned covariance matrix is based on estimated errors in the data and is not affected by the overall magnitude of the values in sigma; only their relative magnitudes matter. If True, sigma describes one-standard-deviation errors of the input data points, and the estimated covariance pcov is based on these values.
  • method : 'lm', 'trf', 'dogbox' (N.B.: lm does not work when the number of observations is less than the number of variables)

The function returns

  • popt : array of optimal values for the parameters so that the sum of the squared error of $f(xdata, popt)-ydata$ is minimized

  • pcov : the covariance matrix of popt. To compute one-standard-deviation errors on the parameters use : $$ perr = np.sqrt(np.diag(pcov)) $$

Errors raised by the module:

  • ValueError : if there are any NaN's or incompatible options used
  • RuntimeError : if least squares minimization fails
  • OptimizeWarning : if covariance of the parameters cannot be estimated.

Example of curve fit :


In [55]:
from scipy.optimize import curve_fit

## define the model function:
def func(x, a,b,c):
    return a*np.exp(-b*x)+c

## the x-points
xdata = np.linspace(0,4,50);

## get some data from the model function...
y = func(xdata, 2.5, 1.3, 0.5)

##and then add some gaussian errors to generate the "data"
ydata = y + 0.2*np.random.normal(size=len(xdata))

In [56]:
## now run the curve_fit()
popt, pcov = curve_fit(func, xdata, ydata)

In [57]:
popt


Out[57]:
array([ 2.21865978,  1.33348349,  0.58840873])

In [58]:
pcov


Out[58]:
array([[ 0.0155489 ,  0.00750671, -0.00077657],
       [ 0.00750671,  0.02584644,  0.00621311],
       [-0.00077657,  0.00621311,  0.00286014]])

In [59]:
### To constrain the optimization to the region of 0<a<3.0 , 0<b<2 and 0<c<1
popt, pcov = curve_fit(func, xdata, ydata, bounds=(0, [3., 2., 1.]))

In [60]:
popt


Out[60]:
array([ 2.21866033,  1.33348536,  0.58840918])

In [61]:
pcov


Out[61]:
array([[ 0.01554872,  0.00750645, -0.00077659],
       [ 0.00750645,  0.02584589,  0.00621318],
       [-0.00077659,  0.00621318,  0.00286022]])

Curve fit on histogram

To use curve_fit on a histogram we need to get the bin heights and model the histogram as a set of data points. One way to do this is to take each bin center together with its bin height as an (x, y) data point.


In [62]:
### Start with the histogram, but now get the values and binEdges
n,bins,patches = plt.hist(fit_data, bins=10);



In [63]:
n


Out[63]:
array([   3.,    8.,   46.,  105.,  136.,  107.,   61.,   25.,    6.,    3.])

In [64]:
bins


Out[64]:
array([-623.09381041, -490.53113569, -357.96846098, -225.40578626,
        -92.84311154,   39.71956318,  172.28223789,  304.84491261,
        437.40758733,  569.97026205,  702.53293676])

In [65]:
### Calculate the bin centers as (bins[1:]+bins[:-1])/2
binCenters = 0.5 * (bins[1:] + bins[:-1]); # average of the left and right edge of each bin
binCenters


Out[65]:
array([-556.81247305, -424.24979834, -291.68712362, -159.1244489 ,
        -26.56177418,  106.00090053,  238.56357525,  371.12624997,
        503.68892469,  636.2515994 ])

In [66]:
## function to model is a gaussian:
def func_g(x, a, b, c):
    return a*np.exp(-(x-b)**2/(2*c**2))

## xdata are the centers, ydata are the values
xdata = binCenters
ydata = n

popt, pcov = curve_fit(func_g, xdata, ydata, p0=[1, ydata.mean(), ydata.std()]) # setting some initial guesses to "help" the minimizer

In [67]:
popt


Out[67]:
array([ 134.75985608,  -11.30784948,  195.28577603])

In [68]:
pcov


Out[68]:
array([[  1.08245601e+01,   1.67054681e-03,  -1.04615549e+01],
       [  1.67054681e-03,   3.03045979e+01,  -5.29893108e-03],
       [ -1.04615549e+01,  -5.29893108e-03,   3.03217877e+01]])

In [74]:
plt.plot(xdata,ydata, "ro--", lw=2, label="Data Bin Center");
plt.hist(fit_data, bins=10, label="Data Hist");
plt.plot(np.linspace(xdata.min(),xdata.max(),100), 
         func_g(np.linspace(xdata.min(),xdata.max(),100),popt[0],popt[1],popt[2]), 
         "g-", lw=3, label="Fit");  # I increased the x axis points to have smoother curve

plt.legend();


What about the fit errors?

To get the standard deviation of the parameters, simply take the square roots of the diagonal elements of the covariance matrix.


In [77]:
errors = []
for i in range(len(popt)):
    try:
        errors.append(np.absolute(pcov[i][i])**0.5)  # sqrt of the i-th diagonal element
    except IndexError:
        errors.append(0.00)

In [78]:
for i in range(len(popt)):
    print(popt[i], "+/-", errors[i])


134.759856082 +/- 3.29006992176
-11.3078494807 +/- 5.50496121037
195.285776031 +/- 5.50652229064

Note, however, that this only works when using curve_fit.

The optimize.leastsq method returns the fractional covariance matrix. We have to multiply the elements of this matrix by the residual variance (the reduced chi-squared) and then take the square roots of the diagonal elements to get an estimate of the standard deviations of the fit parameters.
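As a sketch of this scaling step (the exponential model and the synthetic data here are illustrative, mirroring the earlier curve_fit example, not part of the notebook's own cells):

```python
import numpy as np
from scipy.optimize import leastsq

## illustrative model and synthetic data, as in the curve_fit example above
def func(x, a, b, c):
    return a * np.exp(-b * x) + c

np.random.seed(0)  # fixed seed so the example is reproducible
xdata = np.linspace(0, 4, 50)
ydata = func(xdata, 2.5, 1.3, 0.5) + 0.2 * np.random.normal(size=len(xdata))

## leastsq minimizes a residuals function, not the model itself
def residuals(p, x, y):
    return func(x, *p) - y

p_fit, cov_frac, infodict, msg, ier = leastsq(
    residuals, x0=[1.0, 1.0, 1.0], args=(xdata, ydata), full_output=True)

## scale the fractional covariance matrix by the reduced chi-squared
dof = len(xdata) - len(p_fit)                                  # degrees of freedom
red_chi2 = (residuals(p_fit, xdata, ydata) ** 2).sum() / dof   # residual variance
cov = cov_frac * red_chi2

perr = np.sqrt(np.diag(cov))  # standard deviation of each fit parameter
```

With this scaling, `perr` matches what curve_fit reports in `pcov` for the default `absolute_sigma=False` case.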

How can I be sure about my fit errors?

Getting a proper estimate of the standard error in the fitted parameters is a complicated statistical problem. In detail, the covariance matrix returned by optimize.leastsq and optimize.curve_fit is based on assumptions about the probability distribution of the errors and the interactions between parameters; such interactions may exist depending on the specific fit function $f(x)$. A good way to deal with a complicated $f(x)$ is to use the bootstrap method.
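A minimal bootstrap sketch (the model, data, and number of resamples are illustrative assumptions): refit on data resampled with replacement many times, and take the spread of the refitted parameters as the error estimate.

```python
import numpy as np
from scipy.optimize import curve_fit

## same illustrative model and synthetic data as in the curve_fit example
def func(x, a, b, c):
    return a * np.exp(-b * x) + c

np.random.seed(1)  # fixed seed so the example is reproducible
xdata = np.linspace(0, 4, 50)
ydata = func(xdata, 2.5, 1.3, 0.5) + 0.2 * np.random.normal(size=len(xdata))

n_boot = 500
params = []
for _ in range(n_boot):
    ## resample (x, y) pairs with replacement
    idx = np.random.randint(0, len(xdata), len(xdata))
    try:
        p, _ = curve_fit(func, xdata[idx], ydata[idx], p0=[2.0, 1.0, 0.5])
        params.append(p)
    except RuntimeError:
        continue  # skip resamples where the fit fails to converge

params = np.array(params)
boot_err = params.std(axis=0)  # bootstrap estimate of the parameter standard deviations
```

The bootstrap makes no assumption about the error distribution, so `boot_err` is a useful cross-check on the covariance-based errors above.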


In [ ]: