Title: Histograms In Matplotlib
Slug: matplotlib_histogram
Summary: Histograms In Matplotlib
Date: 2016-05-01 12:00
Category: Python
Tags: Data Visualization
Authors: Chris Albon

Based on: Sebastian Raschka.

Preliminaries


In [89]:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import math

# Set pandas' maximum number of displayed rows to 1000
pd.set_option('display.max_rows', 1000)

# Set pandas' maximum number of displayed columns to 50
pd.set_option('display.max_columns', 50)

Create dataframe


In [3]:
df = pd.read_csv('https://www.dropbox.com/s/52cb7kcflr8qm2u/5kings_battles_v1.csv?dl=1')
df.head()


Out[3]:
name year battle_number attacker_king defender_king attacker_1 attacker_2 attacker_3 attacker_4 defender_1 defender_2 defender_3 defender_4 attacker_outcome battle_type major_death major_capture attacker_size defender_size attacker_commander defender_commander summer location region note
0 Battle of the Golden Tooth 298 1 Joffrey/Tommen Baratheon Robb Stark Lannister NaN NaN NaN Tully NaN NaN NaN win pitched battle 1 0 15000 4000 Jaime Lannister Clement Piper, Vance 1 Golden Tooth The Westerlands NaN
1 Battle at the Mummer's Ford 298 2 Joffrey/Tommen Baratheon Robb Stark Lannister NaN NaN NaN Baratheon NaN NaN NaN win ambush 1 0 NaN 120 Gregor Clegane Beric Dondarrion 1 Mummer's Ford The Riverlands NaN
2 Battle of Riverrun 298 3 Joffrey/Tommen Baratheon Robb Stark Lannister NaN NaN NaN Tully NaN NaN NaN win pitched battle 0 1 15000 10000 Jaime Lannister, Andros Brax Edmure Tully, Tytos Blackwood 1 Riverrun The Riverlands NaN
3 Battle of the Green Fork 298 4 Robb Stark Joffrey/Tommen Baratheon Stark NaN NaN NaN Lannister NaN NaN NaN loss pitched battle 1 1 18000 20000 Roose Bolton, Wylis Manderly, Medger Cerwyn, H... Tywin Lannister, Gregor Clegane, Kevan Lannist... 1 Green Fork The Riverlands NaN
4 Battle of the Whispering Wood 298 5 Robb Stark Joffrey/Tommen Baratheon Stark Tully NaN NaN Lannister NaN NaN NaN win ambush 1 1 1875 6000 Robb Stark, Brynden Tully Jaime Lannister 1 Whispering Wood The Riverlands NaN

Make plot with bins of fixed size


In [87]:
# Select the attacker and defender sizes, leaving out battles with
# 90,000 or more attackers (rows with a missing attacker size also drop out)
data1 = df['attacker_size'][df['attacker_size'] < 90000]
data2 = df['defender_size'][df['attacker_size'] < 90000]

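# Create bins of width 2000 spanning both series; np.arange excludes
# its stop value, so extend the stop by one bin to cover the maximum
bins = np.arange(min(data1.min(), data2.min()),
                 max(data1.max(), data2.max()) + 2000,
                 2000) # fixed bin size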

# Plot a histogram of attacker size
plt.hist(data1, 
         bins=bins, 
         alpha=0.5, 
         color='#EDD834',
         label='Attacker')

# Plot a histogram of defender size
plt.hist(data2, 
         bins=bins, 
         alpha=0.5, 
         color='#887E43',
         label='Defender')

# Set the y-axis limits of the figure
plt.ylim([0, 10])

# Set the title and labels
plt.title('Histogram of Attacker and Defender Size')
plt.xlabel('Number of troops')
plt.ylabel('Number of battles')
plt.legend(loc='upper right')

plt.show()
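
As a rough check on how fixed-width bin edges behave, the sketch below builds edges with `np.arange` for a handful of made-up troop counts (the arrays and the names `attackers`, `defenders`, `bin_width`, and `edges` are illustrative assumptions, not values from the battles data). Because `np.arange` leaves out its stop value, the stop is pushed out by one bin width so that the largest observation still falls inside the last bin.

```python
import numpy as np

# Hypothetical troop counts standing in for the attacker and defender columns
attackers = np.array([1875, 15000, 18000, 20000])
defenders = np.array([120, 4000, 6000, 10000])

bin_width = 2000

# Take the range over both arrays so every value has a bin
low = min(attackers.min(), defenders.min())
high = max(attackers.max(), defenders.max())

# np.arange excludes the stop value, so add one bin width to cover `high`
edges = np.arange(low, high + bin_width, bin_width)

# Count how many attacker values fall into each bin
counts, _ = np.histogram(attackers, bins=edges)
print(edges)
print(counts)
```

Passing the same `edges` array to both `plt.hist` calls keeps the two histograms aligned bin for bin.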


Make plot with fixed number of bins


In [108]:
# Select the attacker and defender sizes, leaving out battles with
# 90,000 or more attackers (rows with a missing attacker size also drop out)
data1 = df['attacker_size'][df['attacker_size'] < 90000]
data2 = df['defender_size'][df['attacker_size'] < 90000]

# Create 10 bins, with the minimum being
# the smallest value across data1 and data2
bins = np.linspace(min(data1.min(), data2.min()),
                   # the max being the highest value across both
                   max(data1.max(), data2.max()),
                   # and 11 evenly spaced edges, which give 10 bins
                   11)

# Plot a histogram of attacker size
plt.hist(data1, 
         # with bins defined as
         bins=bins, 
         # with alpha
         alpha=0.5, 
         # with color
         color='#EDD834',
         # labeled attacker
         label='Attacker')

# Plot a histogram of defender size
plt.hist(data2, 
         # with bins defined as
         bins=bins, 
         # with alpha
         alpha=0.5, 
         # with color
         color='#887E43',
         # labeled defender
         label='Defender')

# Set the y-axis limits of the figure
plt.ylim([0, 10])

# Set the title and labels
plt.title('Histogram of Attacker and Defender Size')
plt.xlabel('Number of troops')
plt.ylabel('Number of battles')
plt.legend(loc='upper right')

plt.show()
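
Two details of the fixed-number-of-bins approach are easy to miss, so here is a small sketch using made-up series (the values and the names `attackers`, `defenders`, `n_bins`, `edges`, and `summed` are assumptions for illustration only). First, `np.linspace` returns bin edges, so `n_bins` bins need `n_bins + 1` edges. Second, adding two pandas Series with `+` sums them element by element rather than pooling their values, so the overall minimum and maximum are better taken from each series separately.

```python
import numpy as np
import pandas as pd

# Hypothetical troop counts, with one missing attacker size
attackers = pd.Series([1875.0, 15000.0, np.nan, 20000.0])
defenders = pd.Series([120.0, 4000.0, 6000.0, 10000.0])

n_bins = 10

# Series.min() and Series.max() skip NaN values by default
low = min(attackers.min(), defenders.min())
high = max(attackers.max(), defenders.max())

# n_bins bins are delimited by n_bins + 1 evenly spaced edges
edges = np.linspace(low, high, n_bins + 1)

# `+` on two Series is element-wise addition, not concatenation
summed = attackers + defenders

print(len(edges))
print(summed.iloc[0])
```

Here the first element of `summed` is 1875 + 120, a value that appears in neither series, which is why it is not a useful basis for bin limits.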