Histogram Plots
A histogram is a graphical representation of the frequency distribution of numerical data. It consists of rectangles of equal width along the horizontal axis whose heights correspond to the frequencies.
To construct a histogram, we first divide the range of possible x values into adjacent intervals, or bins, which are usually of equal size.
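The binning step described above can be sketched with NumPy alone. This is a minimal illustration, not part of the original notebook; the variable names (rng, sample, counts, edges) and the seeded generator are assumptions made for the example.

```python
import numpy as np

# Hypothetical sample: 1000 integers drawn uniformly from 20..50 (inclusive)
rng = np.random.default_rng(0)
sample = rng.integers(20, 51, size=1000)

# Split the data range into 10 equal-width, adjacent bins and count per bin
counts, edges = np.histogram(sample, bins=10)

print("bin edges:", edges)           # 11 edges delimit 10 bins
print("counts:   ", counts)          # one frequency per bin
print("widths:   ", np.diff(edges))  # all bins have the same width
```

The counts are what a histogram plot would draw as bar heights, and the edges mark where each rectangle starts and ends.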
In [1]:
# import
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [23]:
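# generate some data points: 1000 integers between 20 and 50 (inclusive)
# np.random.random_integers is deprecated; randint excludes the upper bound, hence 51
X = np.random.randint(20, 51, 1000)
Y = np.random.randint(20, 51, 1000)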
Plotting Histogram
In [24]:
plt.hist(X)
plt.xlabel("Value of X")
plt.ylabel("Freq")
In [25]:
gaussian_numbers = np.random.normal(size=10000)
plt.hist(gaussian_numbers)
plt.title("Gaussian Histogram")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
In [30]:
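# hist returns the bin counts, the bin edges, and the drawn rectangles
n, bins, patches = plt.hist(gaussian_numbers)
print("n: ", n, np.sum(n))  # frequencies per bin and their total
print("bins: ", bins)       # bin edges
print("patches: ", patches)
for p in patches:
    print(p)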
By default, hist uses 10 equal-width bins to plot the data. We can change this number with the bins parameter, e.g. bins=100.
In [31]:
n, bins, patches = plt.hist(gaussian_numbers, bins=100)
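As a rough sketch of how the bins argument behaves, the same choices can be tried with np.histogram, which accepts either an integer or an explicit sequence of edges. The names values and custom_edges are assumptions for this example and do not appear in the cells above.

```python
import numpy as np

rng = np.random.default_rng(1)
values = rng.normal(size=10_000)

# An integer asks for that many equal-width bins over the data range
counts_10, edges_10 = np.histogram(values, bins=10)
counts_100, edges_100 = np.histogram(values, bins=100)

# A sequence gives the bin edges explicitly (here: unit-width bins from -5 to 5)
custom_edges = np.arange(-5, 6)
counts_custom, edges_custom = np.histogram(values, bins=custom_edges)

print(len(counts_10), len(counts_100), len(counts_custom))
```

With explicit edges, any value that falls outside the first and last edge is simply not counted, so the counts may sum to less than the sample size.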
Another important keyword parameter of hist is "density" (called "normed" in older Matplotlib releases; that name was removed in Matplotlib 3.1). It is optional and defaults to 'False'. If it is set to 'True', the first element of the returned tuple is the counts normalized to form a probability density,
i.e., "n / (len(x) * dbin)", where dbin is the bin width, so that the integral of the histogram is 1.
In [32]:
# density=True replaces normed=True, which was removed in Matplotlib 3.1
n, bins, patches = plt.hist(gaussian_numbers, bins=100, density=True)
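The normalization formula above can be checked numerically. The following sketch uses np.histogram rather than plt.hist so that it runs without drawing anything; the names x, dbin and manual_density are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=10_000)

# Raw counts and bin widths
counts, edges = np.histogram(x, bins=100)
dbin = np.diff(edges)

# Density computed by hand: n / (len(x) * dbin)
manual_density = counts / (len(x) * dbin)

# Density as computed by NumPy for the same bins
density, _ = np.histogram(x, bins=100, density=True)

print(np.allclose(manual_density, density))  # the two should agree
print(np.sum(density * dbin))                # area under the histogram, approximately 1
```

Note that it is the area (height times bin width) that sums to 1, not the bar heights themselves.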
If both 'density' (formerly 'normed') and 'stacked' are set to 'True', the sum of the histograms is normalized to 1.
In [34]:
plt.hist(gaussian_numbers,
         bins=100,
         density=True,  # replaces 'normed', removed in Matplotlib 3.1
         stacked=True,
         edgecolor="#6A9662",
         color="#DDFFDD")
plt.show()
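Stacking only has a visible effect when several datasets are passed at once. The sketch below uses two made-up samples (a and b, assumptions for this example) to illustrate that the combined stack, not each individual histogram, ends up with unit area.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
# Two hypothetical samples of different size and location
a = rng.normal(loc=0.0, size=1000)
b = rng.normal(loc=2.0, size=500)

# With several datasets, stacked=True piles the bars on top of each other;
# density=True then normalizes the total (top) outline to unit area
n, bins, patches = plt.hist([a, b], bins=50, density=True, stacked=True)

# n[-1] holds the heights of the full stack
total_area = np.sum(n[-1] * np.diff(bins))
print(total_area)
```

In a notebook with inline plotting the figure is rendered at the end of the cell; in a script, a call to plt.show() would be needed.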
We can also plot it as a cumulative distribution function by setting the parameter 'cumulative' to 'True'.
In [35]:
plt.hist(gaussian_numbers,
         bins=100,
         density=True,  # replaces 'normed', removed in Matplotlib 3.1
         stacked=True,
         cumulative=True)
plt.show()
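One way to see what the cumulative plot represents is to accumulate the per-bin probabilities by hand. This is a sketch with NumPy only, assuming the cumulative histogram corresponds to a running sum of density times bin width; the names x and cdf are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=10_000)

# Probability density per bin, then probability mass per bin
density, edges = np.histogram(x, bins=100, density=True)
mass = density * np.diff(edges)

# Running total of the probability mass approximates the CDF at each right bin edge
cdf = np.cumsum(mass)

print(cdf[0], cdf[-1])  # starts near 0 and ends at approximately 1
```

The resulting values never decrease and reach 1 at the last bin, which is the shape the cumulative histogram shows.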
Bar Plots
In [39]:
bars = plt.bar([1,2,3,4], [1,4,9,16])
bars[0].set_color('green')
plt.show()
In [40]:
f = plt.figure()
ax = f.add_subplot(1, 1, 1)
ax.bar([1, 2, 3, 4], [1, 4, 9, 16])
children = ax.get_children()
# which artist sits at index 3 can vary between Matplotlib versions
children[3].set_color('g')
In [41]:
years = ('2010', '2011', '2012', '2013', '2014')
# the last entry scales 296665 by 12/8, presumably extrapolating 8 months to a full year
visitors = (1241, 50927, 162242, 222093, 296665 / 8 * 12)
index = np.arange(len(visitors))
bar_width = 1.0
plt.bar(index, visitors, bar_width, color="green")
plt.xticks(index, years)  # bars are centered on index by default in Matplotlib 2.0+
plt.show()