Fit a normal distribution to a histogram


In [1]:
%pylab inline


Populating the interactive namespace from numpy and matplotlib

In [2]:
from scipy.stats import norm

Create normally distributed data


In [3]:
# Simulate data: 150 samples drawn from a standard normal distribution
data = norm.rvs(loc=0.0, scale=1.0, size=150)
plt.hist(data, rwidth=0.85, facecolor='black');
plt.ylabel('Number of events');
plt.xlabel('Value');


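Fit a normal distribution to the data

norm.fit returns the maximum-likelihood estimates, which for a normal distribution are simply the sample mean and the sample standard deviation (computed with a 1/N normalisation, i.e. ddof=0).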


In [4]:
mean, stdev = norm.fit(data)
print('Mean =%f, Stdev=%f'%(mean,stdev))


Mean =0.050569, Stdev=1.022829
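As a quick check of that statement, the sketch below compares norm.fit with np.mean and np.std on a freshly generated sample. The seed and the variable names are arbitrary choices for illustration and are not part of the notebook above.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical sample; the seed is arbitrary and only makes the run repeatable
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=150)

# Maximum-likelihood fit of a normal distribution
mu_fit, sigma_fit = norm.fit(sample)

# Closed-form estimates: sample mean and standard deviation with ddof=0
mu_ml = np.mean(sample)
sigma_ml = np.std(sample)                # 1/N normalisation
sigma_unbiased = np.std(sample, ddof=1)  # 1/(N-1) normalisation, for comparison

print(np.isclose(mu_fit, mu_ml), np.isclose(sigma_fit, sigma_ml))
print(sigma_fit < sigma_unbiased)
```

The fitted scale matches the ddof=0 estimate, so it is always slightly smaller than the ddof=1 value that many statistics packages report by default.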

To overlay the normalized PDF of the normal distribution on the histogram of counts, we multiply every PDF value by the area of the histogram, which is the bin width times the number of samples.
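The area claim can be verified directly. The following is a minimal sketch, assuming equal-width bins as produced by np.histogram with an integer bins argument; the sample and names are illustrative only.

```python
import numpy as np

# Hypothetical sample; the seed is arbitrary
rng = np.random.default_rng(1)
sample = rng.normal(size=150)

counts, edges = np.histogram(sample, bins=10)
widths = np.diff(edges)  # width of every bin (all equal here)

# Area under the histogram: sum of bar height * bar width
hist_area = np.sum(counts * widths)

# The same area written as number of samples * bin width
expected_area = len(sample) * widths[0]

print(np.isclose(hist_area, expected_area))
```

Because a PDF integrates to 1, multiplying it by this area gives a curve with the same total area as the histogram of counts.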

Get the histogram data from NumPy


In [5]:
histdata = plt.hist(data, bins=10, color='black', rwidth=.85) # we set 10 bins



In [8]:
counts, binedge = np.histogram(data, bins=10);
print(binedge)


[-2.58148003 -2.0579089  -1.53433777 -1.01076664 -0.48719551  0.03637562
  0.55994674  1.08351787  1.607089    2.13066013  2.65423126]

In [9]:
# Get bin centers from bin edges
bincenter = [0.5 * (binedge[i] + binedge[i+1]) for i in range(len(binedge)-1)]

In [10]:
bincenter


Out[10]:
[-2.3196944640956341,
 -1.7961233353834976,
 -1.272552206671361,
 -0.7489810779592242,
 -0.22540994924708757,
 0.29816117946504916,
 0.82173230817718612,
 1.3453034368893229,
 1.8688745656014594,
 2.3924456943135963]

In [11]:
# All bins have the same width, so the distance between two adjacent edges is enough
binwidth = binedge[1] - binedge[0]
print(binwidth)


0.523571128712

Scale the normal PDF to the area of the histogram


In [12]:
x = np.linspace(start=-4, stop=4, num=100)
mynorm = norm(loc=mean, scale=stdev)

In [13]:
# Scale the normal PDF by the histogram area: bin width * number of samples
myfit = mynorm.pdf(x)*binwidth*len(data)
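To see why this scaling puts the curve on the same footing as the bar heights, the sketch below compares the exact expected count in each bin (from CDF differences) with the scaled PDF evaluated at the bin center. The bin edges and parameters are hypothetical values chosen for illustration, not the ones fitted above.

```python
import numpy as np
from scipy.stats import norm

n_samples = 150
edges = np.linspace(-3.0, 3.0, 11)        # 10 equal-width bins (assumed)
centers = 0.5 * (edges[:-1] + edges[1:])  # midpoint of every bin
width = edges[1] - edges[0]

dist = norm(loc=0.0, scale=1.0)

# Exact expected number of samples per bin
expected_exact = n_samples * np.diff(dist.cdf(edges))

# Midpoint approximation: the scaled PDF evaluated at the bin centers
expected_approx = n_samples * width * dist.pdf(centers)

# Largest disagreement between the two, in units of counts
print(np.max(np.abs(expected_exact - expected_approx)))
```

For these settings the two agree to within a fraction of one count per bin, which is why the scaled PDF follows the bar heights closely.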

In [14]:
# Plot everything together
plt.hist(data, bins=10, facecolor='white', histtype='stepfilled');
plt.fill(x, myfit, 'r', alpha=.5);
plt.ylabel('Number of observations');
plt.xlabel('Value');