In [1]:

    
%matplotlib inline
import pandas as pd

Ingest Wikipedia tables

Read in a whole Wikipedia page as a list of data frames



In [2]:

    
wiki_df = pd.read_html("https://en.wikipedia.org/w/index.php?title=List_of_James_Bond_films&oldid=688916363", header=0)

Pandas read_html returns every table it finds in the web page, as a list of dataframes. So despite its name, wiki_df is a list, not a single dataframe.



In [3]:

    
type(wiki_df)









    Out[3]:





list
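Since the result is a list, it helps to see what each entry holds before picking one. Here is a minimal sketch of a helper for that. The name summarize_tables and the stand-in tables are assumptions made for illustration and are not part of the notebook; in practice you would pass wiki_df.

```python
import pandas as pd

def summarize_tables(tables):
    """Return one row per DataFrame: its list position, shape, and first column names."""
    rows = []
    for i, t in enumerate(tables):
        rows.append({
            "index": i,
            "rows": t.shape[0],
            "cols": t.shape[1],
            "first_columns": ", ".join(str(c) for c in t.columns[:3]),
        })
    return pd.DataFrame(rows)

# Stand-in list with placeholder content; in the notebook, pass wiki_df instead.
demo_tables = [
    pd.DataFrame({"Notice": ["placeholder for a short message table"]}),
    pd.DataFrame({"Title": ["Dr. No", "Goldfinger"], "Box office.1": [448.8, 820.4]}),
]
summary = summarize_tables(demo_tables)
print(summary)
```

A small one-row table followed by a large one with film columns is a good hint about which list index to use.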

The table we want is the second one, at index 1 (the first is a revision message). A Python slice then keeps only the rows we want: positions 1 through 23, one per film.



In [4]:

    
df = wiki_df[1][1:24]
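A plain slice on a dataframe selects rows by position, which is why the result keeps its original index labels starting at 1. The sketch below shows this on a tiny stand-in frame; the rows marked "(unwanted row)" are hypothetical placeholders, not the actual content of the Wikipedia table.

```python
import pandas as pd

# Stand-in frame: unwanted rows at the top and bottom, films in between.
demo = pd.DataFrame({
    "Title": ["(unwanted row)", "Dr. No", "From Russia with Love", "Goldfinger", "(unwanted row)"]
})

by_slice = demo[1:4]       # plain slice: rows at positions 1, 2 and 3
by_iloc = demo.iloc[1:4]   # the same selection, written explicitly as positional

print(by_slice.equals(by_iloc))   # True
print(list(by_slice.index))       # [1, 2, 3]
```

The .iloc form does the same thing and makes the positional intent explicit to the reader.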



In [5]:

    
df[['Title','Box office.1']]









    Out[5]:






  
    
      
    Title                                    Box office.1
1   Dr. No                                   448.8
2   From Russia with Love                    543.8
3   Goldfinger                               820.4
4   Thunderball                              848.1
5   You Only Live Twice                      514.2
6   On Her Majesty's Secret Service          291.5
7   Diamonds Are Forever                     442.5
8   Live and Let Die                         460.3
9   man with !The Man with the Golden Gun    334.0
10  spy who !The Spy Who Loved Me            533.0
11  Moonraker                                535.0
12  For Your Eyes Only                       449.4
13  Octopussy                                373.8
14  view !A View to a Kill                   275.2
15  living !The Living Daylights             313.5
16  Licence to Kill                          250.9
17  GoldenEye                                518.5
18  Tomorrow Never Dies                      463.2
19  world !The World Is Not Enough           439.5
20  Die Another Day                          465.4
21  Casino Royale                            581.5
22  Quantum of Solace                        514.2
23  Skyfall                                  879.8
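A few titles in this output carry a stray prefix ending in "!", such as "man with !The Man with the Golden Gun". This looks like a hidden sort key from the Wikipedia table that was read in along with the visible text, though that is an assumption. A minimal sketch of one way to strip it, shown on a small sample of the titles above:

```python
import pandas as pd

# Three titles copied from the output above, with their original index labels.
titles = pd.Series(
    ["Dr. No", "man with !The Man with the Golden Gun", "view !A View to a Kill"],
    index=[1, 9, 14],
    name="Title",
)

# Drop everything up to and including the first "!".
# Titles without a "!" do not match the pattern and are left unchanged.
clean = titles.str.replace(r"^[^!]*!", "", regex=True)
print(clean.tolist())
# ['Dr. No', 'The Man with the Golden Gun', 'A View to a Kill']
```

This would also mangle a title that legitimately contains "!", so check the data first. None of the 23 titles here does.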

It's hard to spot the trend quickly in a table. How 'bout a pretty graph? Pandas plot might be all you need. Usually dataframe.plot() is enough, but we'll add a title, a data table below the chart, and dashed lines for the averages.



In [6]:

    
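# Plot the numeric columns, with a data table under the chart and no x ticks
ax = df.plot(table=True, xticks=[], title="Bond movies in 2005 dollars (million)", figsize=(17,11))
# Column averages; numeric_only=True skips text columns such as Title
col_means = df.mean(numeric_only=True)
ax.hlines(y=col_means.iloc[0], xmin=0, xmax=23, color='b', alpha=0.5, linestyle='dashed', label='Box office average')
ax.hlines(y=col_means.iloc[1], xmin=0, xmax=23, color='g', alpha=0.5, linestyle='dashed', label='Budget average')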









    Out[6]:





<matplotlib.collections.LineCollection at 0x10a5302b0>
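Picking the averages by position (first mean, second mean) depends on the column order of the table. A hedged alternative is to look them up by column name. The sketch below uses the first three rows of the Title and Box office.1 columns shown earlier; the variable names are illustrative.

```python
import pandas as pd

# First three films from the table above.
demo = pd.DataFrame({
    "Title": ["Dr. No", "From Russia with Love", "Goldfinger"],
    "Box office.1": [448.8, 543.8, 820.4],
})

# numeric_only=True skips the text column instead of raising an error.
means = demo.mean(numeric_only=True)

# Look the average up by name rather than by position.
box_office_avg = means["Box office.1"]
print(round(box_office_avg, 1))   # 604.3
```

The resulting value can be passed as y= to ax.hlines in the same way as in the cell above.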


