Ronica Reddick & Nick Pulito

in association with

"Those Data Bootcamp Guys" -- Professors Backus and Coleman

present

"3 Guys Named Chris"

Scene 1: "The Set-Up"

Hollywood hunks come and go, but every so often a star builds a lasting career out of blowing stuff up. Currently, there is no shortage of beefcake on the silver screen, with Chris Evans, Chris Hemsworth, and Chris Pratt all regularly starring in blockbuster films. There is no denying the bankability of the Chrises, but which Chris has staying power?

The now-defunct Grantland podcast had a “market correction” theory that it applied to Hollywood actors. The idea is that there’s only room in the market for one A-list celebrity of a particular type, and that over time the market will choose its favorite. The hosts would compare two Hollywood actors of similar “types” and predict which one would still have a career in 20 years.

Using data from Box Office Mojo, we decided to test the market correction theory on the Chrises by comparing the box office numbers of their biggest hits to those of heroes from the days of yore: Tom Cruise, Arnold Schwarzenegger, and Bruce Willis. We were looking for patterns in the box office receipts of the old guard that might shed some light on which Chris will be on top in 2035, and to see whether any of the box office heroes of yesteryear had a little more staying power than the others.


In [115]:
#This guided coding exercise requires associated .csv files: CE1.csv, CH1.csv, CP1.csv, Arnold1.csv, Bruce1.csv, and Tom1.csv
#make sure you have these supplemental materials ready to go in your working directory before proceeding

#Let's start coding! We first need to make sure our preliminary packages are in order. We imported the following... 
#some may have ended up superfluous, but we figured it was better to cover our bases!

import pandas as pd             
import sys
import matplotlib as mpl
import matplotlib.pyplot as plt 
import os                       
import datetime as dt            
import csv
import requests, io             
from bs4 import BeautifulSoup   

%matplotlib inline 

print('\nPython version: ', sys.version) 
print('Pandas version: ', pd.__version__)
print('Requests version: ', requests.__version__)
print("Today's date:", dt.date.today())


Python version:  3.5.1 |Anaconda 4.0.0 (64-bit)| (default, Feb 16 2016, 09:49:46) [MSC v.1900 64 bit (AMD64)]
Pandas version:  0.18.0
Requests version:  2.9.1
Today's date: 2016-05-10

Scene 2: "The Chris Contenders"

Methodology

To dive into which Chris will have staying power in years to come, we turned to BoxOfficeMojo.com, an authoritative source of Hollywood box office data. A bit of simple web scraping gave us film titles broken out by actor, with adjusted box office revenues in tow.

We wanted to aggregate data for our "Three Chrises" and compare it to 3 Hollywood legends who have had variable staying power over the years: Bruce Willis, Tom Cruise, and Arnold Schwarzenegger.

Digging up data on our leading gentlemen

Cells that follow show our process for scraping and organizing the data for the Chris contenders.

Chris Evans

“The All-American Hero”

Age: 34

Height: 6’

Known for: Captain America ($267,656,500); The Avengers; Fantastic Four

Legit Roles: Snowpiercer

Biggest Hit: Marvel’s The Avengers $659,640,800


In [2]:
# data scraped from Box Office Mojo, the authoritative source for Hollywood Box Office Data 

# chris evans
url = 'http://www.boxofficemojo.com/people/chart/?view=Actor&id=chrisevans.htm'
evans  = pd.read_html(url)

print('Output has type', type(evans), 'and length', len(evans))
print('First element has type', type(evans[0]))

#pd.read_html returns a list of dataframes; the table we want is the third one (index 2)

evans[2]


Output has type <class 'list'> and length 4
First element has type <class 'pandas.core.frame.DataFrame'>
Out[2]:
0 1 2 3 4 5
0 Rank Title (click to view) Studio Adjusted Gross Unadjusted Gross Release
1 1 Marvel's The Avengers BV $659,640,800 $623,357,910 5/4/12
2 2 Avengers: Age of Ultron BV $459,260,900 $459,005,868 5/1/15
3 3 Captain America: The Winter Soldier BV $267,656,500 $259,766,572 4/4/14
4 - Thor: The Dark World BV $212,276,600 $206,362,140 11/8/13
5 4 Fantastic Four (2005) Fox $207,065,900 $154,696,080 7/8/05
6 5 Captain America: The First Avenger Par. $190,920,400 $176,654,505 7/22/11
7 - Ant-Man BV $187,285,300 $180,202,163 7/17/15
8 6 Captain America: Civil War BV $179,139,100 $179,139,142 5/6/16
9 7 Fantastic Four: Rise of the Silver Surfer Fox $164,518,700 $131,921,738 6/15/07
10 - TMNT WB $67,529,000 $54,149,098 3/23/07
11 8 Not Another Teen Movie Sony $57,502,000 $38,252,284 12/14/01
12 9 Cellular NL $44,217,600 $32,003,620 9/10/04
13 10 Push Sum. $38,014,300 $31,811,527 2/6/09
14 11 Scott Pilgrim vs. the World Uni. $34,022,400 $31,524,275 8/13/10
15 12 The Nanny Diaries MGM/W $32,337,900 $25,930,652 8/24/07
16 13 Street Kings FoxS $31,569,900 $26,418,667 4/11/08
17 14 The Losers WB $25,460,900 $23,591,432 4/23/10
18 15 What's Your Number? Fox $15,140,400 $14,011,084 9/30/11
19 16 The Perfect Score Par. $14,356,600 $10,391,003 1/30/04
20 17 Snowpiercer RTWC $4,845,600 $4,563,650 6/27/14
21 18 Sunshine FoxS $4,584,000 $3,675,753 7/20/07
22 19 The Iceman (2013) MNE $2,016,200 $1,969,193 5/3/13
23 20 Fierce People ADF $106,500 $85,410 9/7/07
24 21 Puncture MNE $74,500 $68,945 9/23/11
25 22 Before We Go RTWC $38,600 $37,151 9/4/15
26 23 London IDP $26,700 $20,361 2/10/06
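
The position of the film table in the list that `pd.read_html` returns is not fixed: it is index 2 for Evans here, but index 3 for Hemsworth and Pratt further down. A minimal sketch of picking the table by its content rather than its position follows. The helper name `find_table` and the stand-in tables are hypothetical and only there for illustration.

```python
import pandas as pd

def find_table(tables, marker="Adjusted Gross"):
    """Hypothetical helper: return the first DataFrame with a cell equal to `marker`."""
    for table in tables:
        # Compare every cell as text so mixed dtypes do not get in the way.
        if (table.astype(str) == marker).any().any():
            return table
    raise ValueError("no table contains " + repr(marker))

# Stand-in for the list returned by pd.read_html: two junk tables and the real one.
tables = [
    pd.DataFrame([["Home", "News"]]),
    pd.DataFrame([["Search"]]),
    pd.DataFrame([
        ["Rank", "Title (click to view)", "Studio", "Adjusted Gross"],
        ["1", "Marvel's The Avengers", "BV", "$659,640,800"],
    ]),
]

films = find_table(tables)
print(films.shape)
```

With the real scrape this would read `ce = find_table(evans)`. `pd.read_html` also accepts a `match` argument that keeps only the tables containing a given piece of text, which may do the same job at the scraping step.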

In [7]:
ce=evans[2]
print("type=", type(ce)," ", "length=", len(ce), "shape=", ce.shape)

print(ce)


type= <class 'pandas.core.frame.DataFrame'>   length= 27 shape= (27, 6)
       0                                          1       2               3  \
0   Rank                      Title (click to view)  Studio  Adjusted Gross   
1      1                      Marvel's The Avengers      BV    $659,640,800   
2      2                    Avengers: Age of Ultron      BV    $459,260,900   
3      3        Captain America: The Winter Soldier      BV    $267,656,500   
4      -                       Thor: The Dark World      BV    $212,276,600   
5      4                      Fantastic Four (2005)     Fox    $207,065,900   
6      5         Captain America: The First Avenger    Par.    $190,920,400   
7      -                                    Ant-Man      BV    $187,285,300   
8      6                 Captain America: Civil War      BV    $179,139,100   
9      7  Fantastic Four: Rise of the Silver Surfer     Fox    $164,518,700   
10     -                                       TMNT      WB     $67,529,000   
11     8                     Not Another Teen Movie    Sony     $57,502,000   
12     9                                   Cellular      NL     $44,217,600   
13    10                                       Push    Sum.     $38,014,300   
14    11                Scott Pilgrim vs. the World    Uni.     $34,022,400   
15    12                          The Nanny Diaries   MGM/W     $32,337,900   
16    13                               Street Kings    FoxS     $31,569,900   
17    14                                 The Losers      WB     $25,460,900   
18    15                        What's Your Number?     Fox     $15,140,400   
19    16                          The Perfect Score    Par.     $14,356,600   
20    17                                Snowpiercer    RTWC      $4,845,600   
21    18                                   Sunshine    FoxS      $4,584,000   
22    19                          The Iceman (2013)     MNE      $2,016,200   
23    20                              Fierce People     ADF        $106,500   
24    21                                   Puncture     MNE         $74,500   
25    22                               Before We Go    RTWC         $38,600   
26    23                                     London     IDP         $26,700   

                   4         5  
0   Unadjusted Gross   Release  
1       $623,357,910    5/4/12  
2       $459,005,868    5/1/15  
3       $259,766,572    4/4/14  
4       $206,362,140   11/8/13  
5       $154,696,080    7/8/05  
6       $176,654,505   7/22/11  
7       $180,202,163   7/17/15  
8       $179,139,142    5/6/16  
9       $131,921,738   6/15/07  
10       $54,149,098   3/23/07  
11       $38,252,284  12/14/01  
12       $32,003,620   9/10/04  
13       $31,811,527    2/6/09  
14       $31,524,275   8/13/10  
15       $25,930,652   8/24/07  
16       $26,418,667   4/11/08  
17       $23,591,432   4/23/10  
18       $14,011,084   9/30/11  
19       $10,391,003   1/30/04  
20        $4,563,650   6/27/14  
21        $3,675,753   7/20/07  
22        $1,969,193    5/3/13  
23           $85,410    9/7/07  
24           $68,945   9/23/11  
25           $37,151    9/4/15  
26           $20,361   2/10/06  

In [4]:
ce.to_csv("ce.csv")

#since scraped dataset is small, and had a tricky double index, we decided to export to csv and do a quick cleanup there
#removed indices; cleaned titles; cleaned date
#Clean File saved as CE1.csv
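
We did this cleanup by hand in a spreadsheet, but the same steps could probably be done in pandas. The sketch below is one possible version, not what we actually ran: it promotes the first row to the header, strips the dollar signs and commas, keeps only the release year, and renumbers the ranks. The helper name `clean_mojo_table`, the shortened `Title` column name, and the three sample rows are assumptions for illustration, and the code assumes a reasonably recent pandas.

```python
import pandas as pd

# A tiny stand-in for the scraped table: the real header sits in row 0
# and the columns are labeled 0..5, just like evans[2].
raw = pd.DataFrame([
    ["Rank", "Title (click to view)", "Studio", "Adjusted Gross", "Unadjusted Gross", "Release"],
    ["1", "Marvel's The Avengers", "BV", "$659,640,800", "$623,357,910", "5/4/12"],
    ["-", "Thor: The Dark World", "BV", "$212,276,600", "$206,362,140", "11/8/13"],
    ["8", "Not Another Teen Movie", "Sony", "$57,502,000", "$38,252,284", "12/14/01"],
])

def clean_mojo_table(df):
    """Hypothetical helper: fix the header and turn the text columns into numbers."""
    out = df.iloc[1:].reset_index(drop=True)   # drop the header row from the data
    out.columns = df.iloc[0].tolist()          # ...and use it as the column names
    out = out.rename(columns={"Title (click to view)": "Title"})
    for col in ["Adjusted Gross", "Unadjusted Gross"]:
        digits = out[col].str.replace("$", "", regex=False).str.replace(",", "", regex=False)
        out[col] = digits.astype("int64")
    # Two-digit years: %y maps 69-99 to 1969-1999 and 00-68 to 2000-2068.
    out["Release Year"] = pd.to_datetime(out["Release"], format="%m/%d/%y").dt.year
    out["Rank"] = range(1, len(out) + 1)       # renumber, replacing the "-" entries
    return out.drop(columns="Release")

cleaned = clean_mojo_table(raw)
print(cleaned.dtypes)
```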

In [33]:
#this is the path for my machine; you'll have to link to the CE1.csv file that you've saved on your machine

path='C:\\Users\\Nick\\Desktop\\Data_Bootcamp\\Final Project\\CE1.csv'
CE = pd.read_csv(path)

print(type(CE), "shape is", CE.shape, "types:", CE.dtypes)

print(CE) #this is going to be much better for us to work with


<class 'pandas.core.frame.DataFrame'> shape is (26, 6) types: Rank                      int64
Title (click to view)    object
Studio                   object
Adjusted Gross            int64
Unadjusted Gross          int64
Release Year              int64
dtype: object
    Rank                      Title (click to view) Studio  Adjusted Gross  \
0      1                      Marvel's The Avengers     BV       659640800   
1      2                    Avengers: Age of Ultron     BV       459260900   
2      3        Captain America: The Winter Soldier     BV       267656500   
3      4                       Thor: The Dark World     BV       212276600   
4      5                      Fantastic Four (2005)    Fox       207065900   
5      6         Captain America: The First Avenger   Par.       190920400   
6      7                                    Ant-Man     BV       187285300   
7      8                 Captain America: Civil War     BV       181791000   
8      9  Fantastic Four: Rise of the Silver Surfer    Fox       164518700   
9     10                                       TMNT     WB        67529000   
10    11                     Not Another Teen Movie   Sony        57502000   
11    12                                   Cellular     NL        44217600   
12    13                                       Push   Sum.        38014300   
13    14                Scott Pilgrim vs. the World   Uni.        34022400   
14    15                          The Nanny Diaries  MGM/W        32337900   
15    16                               Street Kings   FoxS        31569900   
16    17                                 The Losers     WB        25460900   
17    18                        What's Your Number?    Fox        15140400   
18    19                          The Perfect Score   Par.        14356600   
19    20                                Snowpiercer   RTWC         4845600   
20    21                                   Sunshine   FoxS         4584000   
21    22                          The Iceman (2013)    MNE         2016200   
22    23                              Fierce People    ADF          106500   
23    24                                   Puncture    MNE           74500   
24    25                               Before We Go   RTWC           38600   
25    26                                     London    IDP           26700   

    Unadjusted Gross  Release Year  
0          623357910          2012  
1          459005868          2015  
2          259766572          2014  
3          206362140          2013  
4          154696080          2005  
5          176654505          2011  
6          180202163          2015  
7          181791000          2016  
8          131921738          2007  
9           54149098          2007  
10          38252284          2001  
11          32003620          2004  
12          31811527          2009  
13          31524275          2010  
14          25930652          2007  
15          26418667          2008  
16          23591432          2010  
17          14011084          2011  
18          10391003          2004  
19           4563650          2014  
20           3675753          2007  
21           1969193          2013  
22             85410          2007  
23             68945          2011  
24             37151          2015  
25             20361          2006  
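
Since the path above only works on one machine, here is a small sketch of building the file paths from a single folder setting instead. It assumes the cleaned .csv files sit in the notebook's working directory; `DATA_DIR` and `csv_path` are hypothetical names introduced for this example.

```python
import os

# Assumption: the cleaned CSVs sit in the notebook's working directory.
DATA_DIR = os.getcwd()

def csv_path(filename, data_dir=DATA_DIR):
    """Hypothetical helper: build a path to one of the cleaned CSV files."""
    return os.path.join(data_dir, filename)

print(csv_path("CE1.csv"))
```

Reading a file would then look like `CE = pd.read_csv(csv_path("CE1.csv"))`, and moving the data only means changing `DATA_DIR`.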

In [14]:
#this looks good! let's test and make sure the data makes sense with a simple plot:

CE.plot.scatter('Release Year', 'Adjusted Gross')


Out[14]:
<matplotlib.axes._subplots.AxesSubplot at 0x2391f692f28>

In [67]:
#we love what we see, let's repeat it for our other leading gentlemen
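
The three Chris URLs differ only in the actor id, so the repetition could be trimmed with a small template. This is just a sketch: `ACTOR_URL`, `actor_url`, and `chris_ids` are hypothetical names, and nothing is downloaded here.

```python
# URL pattern copied from the Chris Evans cell, with the actor id left as a placeholder.
ACTOR_URL = 'http://www.boxofficemojo.com/people/chart/?view=Actor&id={}.htm'

def actor_url(actor_id):
    """Hypothetical helper: fill the actor id into the chart URL."""
    return ACTOR_URL.format(actor_id)

chris_ids = ['chrisevans', 'chrishemsworth', 'chrispratt']
urls = {actor_id: actor_url(actor_id) for actor_id in chris_ids}

for actor_id, url in urls.items():
    print(actor_id, url)
```

Each URL could then be handed to `pd.read_html` exactly as in the cells for the individual actors.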

Chris Hemsworth

“The Heartthrob”

Age: 32

Height: 6’ 3”

Known for: Thor; The Avengers; Snow White and the Huntsman

Legit Roles: Rush

Biggest Hit: Marvel’s The Avengers $659,640,800

Biggest Thor Movie: $212,276,600


In [88]:
# same process for our second leading Chris
# chris hemsworth
url = 'http://www.boxofficemojo.com/people/chart/?view=Actor&id=chrishemsworth.htm'
hemsworth  = pd.read_html(url)

print('Output has type', type(hemsworth), 'and length', len(hemsworth))
print('First element has type', type(hemsworth[0]))

hemsworth[3]


Output has type <class 'list'> and length 5
First element has type <class 'pandas.core.frame.DataFrame'>
Out[88]:
0 1 2 3 4 5
0 Rank Title (click to view) Studio Adjusted Gross Unadjusted Gross Release
1 1 Marvel's The Avengers BV $659,640,800 $623,357,910 5/4/12
2 2 Avengers: Age of Ultron BV $459,260,900 $459,005,868 5/1/15
3 - Star Trek Par. $296,422,000 $257,730,019 5/8/09
4 3 Thor: The Dark World BV $212,276,600 $206,362,140 11/8/13
5 4 Thor Par. $192,766,600 $181,030,624 5/6/11
6 5 Snow White and the Huntsman Uni. $164,785,200 $155,332,381 6/1/12
7 6 Vacation WB (NL) $61,208,200 $58,884,188 7/29/15
8 7 Red Dawn (2012) FD $47,783,500 $44,806,783 11/21/12
9 8 The Cabin in the Woods LGF $44,466,300 $42,073,277 4/13/12
10 9 The Huntsman: Winter's War Uni. $41,014,500 $41,014,460 4/22/16
11 10 Rush (2013) Uni. $27,707,400 $26,947,624 9/20/13
12 11 In the Heart of the Sea WB $24,675,600 $25,020,758 12/11/15
13 12 A Perfect Getaway Uni. $17,844,900 $15,515,460 8/7/09
14 13 Blackhat Uni. $8,459,500 $8,005,980 1/16/15
15 14 Ca$h RAtt. $50,200 $46,488 3/26/10

In [87]:
ch=hemsworth[3]

print("type=", type(ch)," ", "length=", len(ch), "shape=", ch.shape)

print(ch)

ch.to_csv("ch.csv")

#since scraped dataset is small, and had a tricky double index, we decided to export to csv and do a quick cleanup there
#Cleaned File saved as CH1.csv

path='C:\\Users\\Nick\\Desktop\\Data_Bootcamp\\Final Project\\CH1.csv'
#again, this is the path on my machine, you'll want to make sure you adjust to wherever you saved down CH1
CH = pd.read_csv(path)

print(type(CH), "shape is", CH.shape, "types:", CH.dtypes)

CH.plot.scatter('Release Year', 'Adjusted Gross')


type= <class 'pandas.core.frame.DataFrame'>   length= 16 shape= (16, 6)
       0                            1        2               3  \
0   Rank        Title (click to view)   Studio  Adjusted Gross   
1      1        Marvel's The Avengers       BV    $659,640,800   
2      2      Avengers: Age of Ultron       BV    $459,260,900   
3      -                    Star Trek     Par.    $296,422,000   
4      3         Thor: The Dark World       BV    $212,276,600   
5      4                         Thor     Par.    $192,766,600   
6      5  Snow White and the Huntsman     Uni.    $164,785,200   
7      6                     Vacation  WB (NL)     $61,208,200   
8      7              Red Dawn (2012)       FD     $47,783,500   
9      8       The Cabin in the Woods      LGF     $44,466,300   
10     9   The Huntsman: Winter's War     Uni.     $41,014,500   
11    10                  Rush (2013)     Uni.     $27,707,400   
12    11      In the Heart of the Sea       WB     $24,675,600   
13    12            A Perfect Getaway     Uni.     $17,844,900   
14    13                     Blackhat     Uni.      $8,459,500   
15    14                         Ca$h    RAtt.         $50,200   

                   4         5  
0   Unadjusted Gross   Release  
1       $623,357,910    5/4/12  
2       $459,005,868    5/1/15  
3       $257,730,019    5/8/09  
4       $206,362,140   11/8/13  
5       $181,030,624    5/6/11  
6       $155,332,381    6/1/12  
7        $58,884,188   7/29/15  
8        $44,806,783  11/21/12  
9        $42,073,277   4/13/12  
10       $41,014,460   4/22/16  
11       $26,947,624   9/20/13  
12       $25,020,758  12/11/15  
13       $15,515,460    8/7/09  
14        $8,005,980   1/16/15  
15           $46,488   3/26/10  
<class 'pandas.core.frame.DataFrame'> shape is (15, 6) types: Rank                 int64
Title               object
Studio              object
Adjusted Gross       int64
Unadjusted Gross     int64
Release Year         int64
dtype: object
Out[87]:
<matplotlib.axes._subplots.AxesSubplot at 0x23923f7a390>

Our data looks good! The axes are a little strange, but we just want to make sure we have data we can work with!

Chris Pratt

“The Everyman”

Age: 36

Height: 6’ 2”

Known for: Guardians of the Galaxy ($353,303,500); Jurassic World (one released, one in pre-production); Parks & Rec (TV)

Legit Roles: Her, Moneyball

Biggest Hit: Jurassic World $678,242,100


In [ ]:
# Chris number three, coming through!
# chris pratt
url = 'http://www.boxofficemojo.com/people/chart/?view=Actor&id=chrispratt.htm'
pratt  = pd.read_html(url)

print('Output has type', type(pratt), 'and length', len(pratt))
print('First element has type', type(pratt[0]))

pratt[3]

In [90]:
cp=pratt[3]

print("type=", type(cp)," ", "length=", len(cp), "shape=", cp.shape)

print(cp)

cp.to_csv("cp.csv")

#since scraped dataset is small, and had a tricky double index, we decided to export to csv and do a quick cleanup there
#Cleaned File saved as CP1.csv

path='C:\\Users\\Nick\\Desktop\\Data_Bootcamp\\Final Project\\CP1.csv'
#remember to adjust path to where you've saved the .csv down
CP = pd.read_csv(path)

print(type(CP), "shape is", CP.shape, "types:", CP.dtypes)

CP.plot.scatter('Release Year', 'Adjusted Gross')


type= <class 'pandas.core.frame.DataFrame'>   length= 17 shape= (17, 6)
       0                         1       2               3                 4  \
0   Rank     Title (click to view)  Studio  Adjusted Gross  Unadjusted Gross   
1      1            Jurassic World    Uni.    $678,242,100      $652,270,625   
2      2   Guardians of the Galaxy      BV    $353,303,500      $333,176,600   
3      3            The LEGO Movie      WB    $277,265,000      $257,760,692   
4      -                    Wanted    Uni.    $160,735,800      $134,508,551   
5      -          Zero Dark Thirty    Sony    $103,420,700       $95,720,716   
6      4                 Moneyball    Sony     $82,442,600       $75,605,492   
7      5                Bride Wars     Fox     $70,164,200       $58,715,510   
8      6              Delivery Man      BV     $31,566,000       $30,664,106   
9      7  The Five-Year Engagement    Uni.     $30,469,100       $28,835,528   
10     8                Her (2013)      WB     $27,507,300       $25,568,251   
11     9           Jennifer's Body     Fox     $18,469,900       $16,204,793   
12    10       What's Your Number?     Fox     $15,140,400       $14,011,084   
13     -                  Movie 43   Rela.      $9,553,000        $8,840,453   
14    11      Take Me Home Tonight   Rela.      $7,421,100        $6,928,068   
15     -      Strangers with Candy   Think      $2,715,000        $2,072,645   
16    12                  10 Years   Anch.        $221,900          $203,373   

           5  
0    Release  
1    6/12/15  
2     8/1/14  
3     2/7/14  
4    6/27/08  
5   12/19/12  
6    9/23/11  
7     1/9/09  
8   11/22/13  
9    4/27/12  
10  12/18/13  
11   9/18/09  
12   9/30/11  
13   1/25/13  
14    3/4/11  
15   6/28/06  
16   9/14/12  
<class 'pandas.core.frame.DataFrame'> shape is (16, 6) types: Rank                 int64
Title               object
Studio              object
Adjusted Gross       int64
Unadjusted Gross     int64
Release Year         int64
dtype: object
Out[90]:
<matplotlib.axes._subplots.AxesSubplot at 0x2392407c240>

Now that we've got that sorted out, let's take a look at all three Chrises together. How do their box office titles stack up with one another over time?


In [80]:
plt.scatter(CE['Release Year'], CE['Adjusted Gross'],
           color="purple")

plt.scatter(CH['Release Year'], CH['Adjusted Gross'],
           color="red")

plt.scatter(CP['Release Year'], CP['Adjusted Gross'],
           color="orange")

plt.title('Chris Film Box Office Share Over Time')


Out[80]:
<matplotlib.text.Text at 0x23923bb5780>

In the graph above, we color coded our Chris contingent as follows:

Chris Evans: Purple

Chris Hemsworth: Red

Chris Pratt: Orange

A few things stand out. First, we can see right away that Chris Evans has, to date, had the longest career at the box office, dating back to 2001. Does this maybe suggest some longevity right off the bat? We're not so quick to draw that conclusion, especially since his biggest box office hit is shared with Chris Hemsworth in the Marvel Avengers movie.

Looking back at our raw data, we can also note that Pratt seems to have had the biggest breakout hit with 2015's Jurassic World, one of the top-grossing films of all time, in which he was the sole leading man.

This data gives us one view, but what other cuts might we want to look at?


In [108]:
fig, ax = plt.subplots(nrows=3, ncols=1, sharex=True, sharey=True)

CE['Adjusted Gross'].head(10).plot(kind="bar",ax=ax[0], color='purple', title="Evans")
CH['Adjusted Gross'].head(10).plot(kind="bar",ax=ax[1], color='red', title="Hemsworth")
CP['Adjusted Gross'].head(10).plot(kind="bar",ax=ax[2], color='orange', title="Pratt")


Out[108]:
<matplotlib.axes._subplots.AxesSubplot at 0x23925d772b0>

In the above, we look at the adjusted box office grosses of the top 10 films for each Chris. Here, we start to wonder whether Evans might be the most consistent box office performer. Of his top 10 films, 9 grossed more than $160 million and 5 topped $200 million, a stat unmatched by our other two gentlemen.
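
One way to put a number on that consistency is to count how many of each actor's top 10 films clear a given threshold. The sketch below does this with the adjusted grosses copied from the tables printed earlier (the Evans list uses the cleaned CE1.csv values). The helper `count_at_least` and the two thresholds are our own choices for illustration.

```python
import pandas as pd

# Top 10 adjusted grosses per actor, copied from the tables above.
top10 = {
    "Evans": [659640800, 459260900, 267656500, 212276600, 207065900,
              190920400, 187285300, 181791000, 164518700, 67529000],
    "Hemsworth": [659640800, 459260900, 296422000, 212276600, 192766600,
                  164785200, 61208200, 47783500, 44466300, 41014500],
    "Pratt": [678242100, 353303500, 277265000, 160735800, 103420700,
              82442600, 70164200, 31566000, 30469100, 27507300],
}

def count_at_least(grosses, threshold):
    """Hypothetical helper: number of films grossing at least `threshold` dollars."""
    return int((pd.Series(grosses) >= threshold).sum())

for name, grosses in top10.items():
    print(name,
          count_at_least(grosses, 160 * 10**6),
          count_at_least(grosses, 200 * 10**6))
```

With the cleaned dataframes, the same call would work on `CE['Adjusted Gross'].head(10)` and its counterparts.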

This is an interesting insight, but what does it look like over time?


In [89]:
plt.bar(CE['Release Year'], CE['Adjusted Gross'],
       align='center',
       color='pink')
plt.title('Chris Evans')


Out[89]:
<matplotlib.text.Text at 0x239226b1be0>

Buoyed by franchise films in the last five years, Chris Evans has been a steady player, but hasn't excelled outside the Marvel universe franchises. All his biggest hits are as a member of a franchise / ensemble. Evans's Marvel hits since 2011 have performed well, though non-Marvel titles have largely been blips on the radar.


In [85]:
plt.bar(CH['Release Year'], CH['Adjusted Gross'],
       align='center',
       color='red')
plt.title("Chris Hemsworth")


Out[85]:
<matplotlib.text.Text at 0x23923e994a8>

Hemsworth had a very rough 2015. He featured prominently in 4 films, only one of which was a box office success (another Marvel Avengers installment). After a breakout 2012, is the tide turning following major flops like In the Heart of the Sea?


In [86]:
plt.bar(CP['Release Year'], CP['Adjusted Gross'],
       align='center',
       color='orange')
plt.title("Chris Pratt")


Out[86]:
<matplotlib.text.Text at 0x23923f316d8>

Pratt may have been a slower starter than our other leading gentlemen, but his 2014 breakout Guardians of the Galaxy established him as a potential leading man, and 2015's Jurassic World broke tons of box office records. Because it is not a Marvel film (though it is a franchise reboot), Jurassic World may prove to be a standalone hit for Pratt, and everyone will be closely watching the box office performance of whatever leading-man project he chooses next.


In [120]:
plt.bar(CE['Release Year'], CE['Adjusted Gross'],
       align='center',
       color='purple')

plt.bar(CH['Release Year'], CH['Adjusted Gross'],
       align='center',
       color='red')

plt.bar(CP['Release Year'], CP['Adjusted Gross'],
       align='center',
       color='orange')

plt.title('Chris Film Box Office Share Over Time')


Out[120]:
<matplotlib.text.Text at 0x2392798ca90>

We love this data cut. Here, we take a comparative look at our Chrises over time. Keeping our colors consistent, Evans is purple, Hemsworth is red, and Pratt is orange.

One slight issue: for movies in which both Hemsworth and Evans were cast (the Avengers films), the graph shows just one color, because bars drawn later cover those drawn earlier. Here's a flipped view:


In [121]:
plt.bar(CH['Release Year'], CH['Adjusted Gross'],
       align='center',
       color='red')

plt.bar(CE['Release Year'], CE['Adjusted Gross'],
       align='center',
       color='purple')

plt.bar(CP['Release Year'], CP['Adjusted Gross'],
       align='center',
       color='orange')

plt.title('Chris Film Box Office Share Over Time')


Out[121]:
<matplotlib.text.Text at 0x23927696048>

Whoa! Where did Hemsworth go?

What these two cuts show us is that Evans and Hemsworth are both heavily reliant on their Marvel franchise hits, where they are sharing the limelight, whereas Pratt has been more of a solo vehicle, especially in more recent years.
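
The disappearing bars come from drawing every actor at the same x positions. A possible workaround, sketched below, is to stack the three dataframes with an actor label and reshape them into one column per actor, so that each Chris gets his own bar in every year. The toy rows stand in for `CE`, `CH`, and `CP`, the `Actor` column is an addition of ours, and summing the grosses within a year is a simplifying assumption.

```python
import pandas as pd

# Toy rows standing in for CE, CH and CP (values taken from the tables above).
CE = pd.DataFrame({"Release Year": [2012, 2014], "Adjusted Gross": [659640800, 267656500]})
CH = pd.DataFrame({"Release Year": [2012, 2013], "Adjusted Gross": [659640800, 212276600]})
CP = pd.DataFrame({"Release Year": [2014, 2015], "Adjusted Gross": [353303500, 678242100]})

# Label each row with its actor, then stack everything into one frame.
combined = pd.concat(
    [CE.assign(Actor="Evans"), CH.assign(Actor="Hemsworth"), CP.assign(Actor="Pratt")],
    ignore_index=True,
)

# One row per year, one column per actor; years without a film become 0.
by_year = (
    combined.groupby(["Release Year", "Actor"])["Adjusted Gross"]
    .sum()
    .unstack(fill_value=0)
)
print(by_year)
```

Calling `by_year.plot(kind="bar")` on a table shaped like this draws the actors side by side within each year instead of on top of one another.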

Scene 3: The "OGs"

In order to determine which Chris has staying power we pulled data on Hollywood stars of yore (Bruce Willis, Arnold Schwarzenegger, and Tom Cruise) for comparison. Given the volume of data on the older stars, we isolated the top ten grossing films for each hero.

Bruce Willis

Heyday: The late 80s to the late 90s

Known for: Die Hard franchise

Biggest Movie: The Sixth Sense $494,028,900

Type: Leading Man/Action Hero Hybrid


In [122]:
#Movie scraping and data arranging like we did before
#Bruce Willis
url = 'http://www.boxofficemojo.com/people/chart/?id=brucewillis.htm'
willis = pd.read_html(url)

print('Output has type', type(willis), 'and length', len(willis))
print('First element has type', type(willis[0]))

willis[2]


Output has type <class 'list'> and length 4
First element has type <class 'pandas.core.frame.DataFrame'>
Out[122]:
0 1 2 3 4 5
0 Rank Title (click to view) Studio Adjusted Gross Unadjusted Gross Release
1 1 The Sixth Sense BV $494,028,900 $293,506,292 8/6/99
2 2 Armageddon BV $368,772,000 $201,578,182 7/1/98
3 3 Look Who's Talking TriS $298,949,000 $140,088,813 10/13/89
4 4 Die Hard 2: Die Harder Fox $238,416,400 $117,540,947 7/6/90
5 5 Pulp Fiction Mira. $217,817,900 $107,928,762 10/14/94
6 6 Over the Hedge P/DW $203,063,500 $155,019,340 5/19/06
7 7 Die Hard: With A Vengeance Fox $197,266,000 $100,012,499 5/19/95
8 8 Die Hard Fox $173,288,500 $83,008,852 7/15/88
9 - Ocean's Twelve WB $172,251,500 $125,544,280 12/10/04
10 9 Live Free or Die Hard Fox $167,770,700 $134,529,403 6/27/07
11 10 Unbreakable BV $150,706,100 $95,011,339 11/22/00
12 11 G.I. Joe: Retaliation Par. $126,043,400 $122,523,060 3/28/13
13 12 The Last Boy Scout WB $122,373,600 $59,509,925 12/13/91
14 13 Death Becomes Her Uni. $120,787,100 $58,422,650 7/31/92
15 - Beavis and Butt-Head Do America Par. $120,208,300 $63,118,386 12/20/96
16 14 The Fifth Element Sony $119,297,800 $63,820,180 5/9/97
17 - The Expendables LGF $111,236,200 $103,068,524 8/13/10
18 15 Disney's The Kid BV $110,938,200 $69,691,949 7/7/00
19 16 12 Monkeys Uni. $110,921,700 $57,141,459 12/29/95
20 17 The Jackal Uni. $102,680,100 $54,930,280 11/14/97
21 18 Sin City Dim. $99,190,400 $74,103,820 4/1/05
22 19 Look Who's Talking Too TriS $97,208,200 $47,789,074 12/14/90
23 20 Red Sum. $96,811,700 $90,380,162 10/15/10
24 21 The Expendables 2 LGF $93,715,000 $85,028,192 8/17/12
25 22 The Whole Nine Yards WB $91,152,500 $57,262,492 2/18/00
26 23 Blind Date TriS $86,286,500 $39,321,715 3/27/87
27 24 Nobody's Fool Par. $77,907,800 $39,491,975 12/23/94
28 25 The Siege Fox $74,383,800 $40,981,289 11/6/98
29 26 A Good Day to Die Hard Fox $72,777,900 $67,349,198 2/14/13
... ... ... ... ... ... ...
39 35 16 Blocks WB $48,329,800 $36,895,141 3/3/06
40 36 Hostage Mira. $46,366,700 $34,639,939 3/11/05
41 37 The Story of Us Uni. $45,771,300 $27,100,031 10/15/99
42 - The Player NL $44,876,700 $21,706,101 4/10/92
43 38 Surrogates BV $43,495,100 $38,577,772 9/25/09
44 39 Color of Night BV $40,490,300 $19,726,050 8/19/94
45 40 Mortal Thoughts Col. $38,283,800 $18,784,957 4/19/91
46 41 Last Man Standing NL $35,166,200 $18,115,927 9/20/96
47 42 Hudson Hawk TriS $35,090,500 $17,218,080 5/24/91
48 - Nancy Drew WB $31,941,200 $25,612,520 6/15/07
49 43 The Bonfire of the Vanities WB $31,923,800 $15,691,192 12/22/90
50 44 Billy Bathgate BV $31,722,300 $15,565,363 11/1/91
51 - Grindhouse W/Dim. $31,224,600 $25,037,897 4/6/07
52 45 Perfect Stranger SonR $29,911,500 $23,984,949 4/13/07
53 46 Lucky Number Slevin MGM/W $29,467,300 $22,495,466 4/7/06
54 47 Hart's War MGM $28,173,200 $19,077,641 2/15/02
55 48 The Whole Ten Yards WB $22,560,100 $16,328,471 4/9/04
56 49 Alpha Dog Uni. $19,092,500 $15,309,602 1/12/07
57 50 North Col. $14,743,500 $7,182,747 7/22/94
58 51 Frank Miller's Sin City: A Dame to Kill For W/Dim. $14,607,200 $13,757,804 8/22/14
59 52 Sunset TriS $9,591,300 $4,594,452 4/29/88
60 53 Four Rooms Mira. $8,303,200 $4,257,354 12/22/95
61 54 In Country WB $7,633,300 $3,531,971 9/15/89
62 55 The Cold Light of Day LG/S $4,148,800 $3,763,583 9/7/12
63 56 Rock The Kasbah ORF $2,979,000 $3,020,664 10/23/15
64 57 What Just Happened? Magn. $1,303,700 $1,090,947 10/17/08
65 58 Breakfast of Champions BV $301,100 $178,278 9/17/99
66 - Grand Champion Inn. $75,400 $54,579 8/27/04
67 59 Lay the Favorite RTWC $22,400 $20,998 12/7/12
68 60 Extraction LGP $16,500 $16,775 12/18/15

69 rows × 6 columns


In [123]:
bruce=willis[2]

bruce.to_csv("Bruce.csv") #Converting dataframe into a csv file
#editing and cleaning as needed, resaved as Bruce1.csv

In [124]:
path='/Users/Nick/Desktop/data_bootcamp/Final Project/Bruce1.csv' 
BWillis = pd.read_csv(path)

print(type(BWillis), BWillis.shape, BWillis.dtypes)


<class 'pandas.core.frame.DataFrame'> (68, 6) Rank                object
Title               object
Studio              object
Adjusted Gross       int64
Unadjusted Gross     int64
Release Year         int64
dtype: object

In [126]:
import matplotlib as mpl
mpl.rcParams.update(mpl.rcParamsDefault)

In [127]:
BWillis.plot.scatter('Release Year', 'Adjusted Gross')


Out[127]:
<matplotlib.axes._subplots.AxesSubplot at 0x239273629b0>

In [129]:
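#That's a lot of films! Let's narrow to the top of the list.
#We take 11 rows rather than 10 because one of them (Ocean's Twelve) is unranked ("-") on Box Office Mojo

BW=BWillis.head(11)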

print(BW)


   Rank                      Title  Studio  Adjusted Gross  Unadjusted Gross  \
0     1             The Sixth Sense     BV       494028900         293506292   
1     2                  Armageddon     BV       368772000         201578182   
2     3          Look Who's Talking   TriS       298949000         140088813   
3     4      Die Hard 2: Die Harder    Fox       238416400         117540947   
4     5                Pulp Fiction  Mira.       217817900         107928762   
5     6              Over the Hedge   P/DW       203063500         155019340   
6     7  Die Hard: With A Vengeance    Fox       197266000         100012499   
7     8                    Die Hard    Fox       173288500          83008852   
8     -              Ocean's Twelve     WB       172251500         125544280   
9     9       Live Free or Die Hard    Fox       167770700         134529403   
10   10                 Unbreakable     BV       150706100          95011339   

    Release Year  
0           1999  
1           1998  
2           1989  
3           1990  
4           1994  
5           2006  
6           1995  
7           1988  
8           2004  
9           2007  
10          2000  
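
The "-" in the Rank column is also why pandas reads that column as `object` rather than `int64`. Below is a sketch of two ways the unranked rows could be handled in code: convert Rank to numbers and drop the rows that fail to convert, or ignore Rank altogether and take the largest grosses. The four sample rows come from the output above; the variable names are hypothetical.

```python
import pandas as pd

films = pd.DataFrame({
    "Rank": ["7", "8", "-", "9"],
    "Title": ["Die Hard: With A Vengeance", "Die Hard",
              "Ocean's Twelve", "Live Free or Die Hard"],
    "Adjusted Gross": [197266000, 173288500, 172251500, 167770700],
})

# "-" cannot be parsed as a number, so errors="coerce" turns it into NaN.
films["Rank"] = pd.to_numeric(films["Rank"], errors="coerce")

# Option 1: keep only the films that carry a rank.
ranked_only = films.dropna(subset=["Rank"])

# Option 2: ignore Rank and take the highest adjusted grosses, ranked or not.
top3_any = films.nlargest(3, "Adjusted Gross")

print(ranked_only["Title"].tolist())
print(top3_any["Title"].tolist())
```

Which option fits depends on whether the unranked titles should count toward an actor's top ten; we kept them in `BW`.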

In [131]:
#we'll come back to this later, but let's get our other leading men in the frame!

Arnold Schwarzenegger

Heyday: Mid 80s to the mid 90s

Known for: the Terminator franchise

Biggest Movie: Terminator 2: Judgment Day $417,471,700

Type: Beefcake w/comedic chops


In [132]:
#here we go again!

#Arnold Schwarzenegger

url = 'http://www.boxofficemojo.com/people/chart/?id=arnoldschwarzenegger.htm'
schwarz = pd.read_html(url)

print('Output has type', type(schwarz), 'and length', len(schwarz))
print('First element has type', type(schwarz[0]))

schwarz[2]


Output has type <class 'list'> and length 4
First element has type <class 'pandas.core.frame.DataFrame'>
Out[132]:
0 1 2 3 4 5
0 Rank Title (click to view) Studio Adjusted Gross Unadjusted Gross Release
1 1 Terminator 2: Judgment Day TriS $417,471,700 $204,843,345 7/3/91
2 2 True Lies Fox $300,263,900 $146,282,411 7/15/94
3 3 Total Recall Sony $242,195,900 $119,412,921 6/1/90
4 4 Twins Uni. $238,720,100 $111,938,388 12/9/88
5 5 Terminator 3: Rise of the Machines WB $213,960,900 $150,371,112 7/2/03
6 6 Batman and Robin WB $200,620,900 $107,325,195 6/20/97
7 7 Eraser WB $196,632,600 $101,295,562 6/21/96
8 8 Kindergarten Cop Uni. $186,235,700 $91,457,688 12/22/90
9 - Terminator Salvation WB $144,137,700 $125,322,469 5/21/09
10 9 Predator Fox $131,082,100 $59,735,548 6/12/87
11 10 Jingle All the Way Fox $116,877,400 $60,592,389 11/22/96
12 11 Conan the Barbarian Uni. $115,466,600 $39,565,475 5/14/82
13 12 End of Days Uni. $112,489,900 $66,889,043 11/24/99
14 - The Expendables LGF $111,236,200 $103,068,524 8/13/10
15 13 Last Action Hero Col. $103,657,200 $50,016,394 6/18/93
16 14 The Terminator Orion $97,386,800 $38,371,200 10/26/84
17 15 The Expendables 2 LGF $93,715,000 $85,028,192 8/17/12
18 16 Terminator: Genisys Par. $93,351,400 $89,760,956 7/1/15
19 17 Commando Fox $84,833,200 $35,100,000 10/4/85
20 18 The Running Man TriS $83,307,900 $38,122,105 11/13/87
21 19 Conan the Destroyer Uni. $79,268,100 $31,042,035 6/29/84
22 20 Junior Uni. $74,885,300 $36,763,355 11/23/94
23 21 Red Heat TriS $73,054,500 $34,994,648 6/17/88
24 22 Collateral Damage WB $59,184,700 $40,077,257 2/8/02
25 23 The 6th Day Sony $55,051,800 $34,604,280 11/17/00
26 24 The Expendables 3 LGF $41,744,500 $39,322,544 8/15/14
27 25 Raw Deal DEG $37,487,100 $16,209,459 6/6/86
28 - Around the World in 80 Days BV $33,170,700 $24,008,137 6/16/04
29 26 Escape Plan LG/S $25,840,800 $25,135,965 10/18/13
30 27 Red Sonja MGM $16,794,200 $6,948,633 7/5/85
31 28 The Last Stand LGF $13,021,600 $12,050,299 1/18/13
32 29 Sabotage (2014) ORF $10,823,900 $10,508,518 3/28/14
33 30 Maggie RAtt. $186,500 $187,112 5/8/15

In [133]:
arnold=schwarz[2]
print("type=", type(arnold)," ", "length=", len(arnold))

arnold.shape

print(arnold)


type= <class 'pandas.core.frame.DataFrame'>   length= 34
       0                                   1       2               3  \
0   Rank               Title (click to view)  Studio  Adjusted Gross   
1      1          Terminator 2: Judgment Day    TriS    $417,471,700   
2      2                           True Lies     Fox    $300,263,900   
3      3                        Total Recall    Sony    $242,195,900   
4      4                               Twins    Uni.    $238,720,100   
5      5  Terminator 3: Rise of the Machines      WB    $213,960,900   
6      6                    Batman and Robin      WB    $200,620,900   
7      7                              Eraser      WB    $196,632,600   
8      8                    Kindergarten Cop    Uni.    $186,235,700   
9      -                Terminator Salvation      WB    $144,137,700   
10     9                            Predator     Fox    $131,082,100   
11    10                  Jingle All the Way     Fox    $116,877,400   
12    11                 Conan the Barbarian    Uni.    $115,466,600   
13    12                         End of Days    Uni.    $112,489,900   
14     -                     The Expendables     LGF    $111,236,200   
15    13                    Last Action Hero    Col.    $103,657,200   
16    14                      The Terminator   Orion     $97,386,800   
17    15                   The Expendables 2     LGF     $93,715,000   
18    16                 Terminator: Genisys    Par.     $93,351,400   
19    17                            Commando     Fox     $84,833,200   
20    18                     The Running Man    TriS     $83,307,900   
21    19                 Conan the Destroyer    Uni.     $79,268,100   
22    20                              Junior    Uni.     $74,885,300   
23    21                            Red Heat    TriS     $73,054,500   
24    22                   Collateral Damage      WB     $59,184,700   
25    23                         The 6th Day    Sony     $55,051,800   
26    24                   The Expendables 3     LGF     $41,744,500   
27    25                            Raw Deal     DEG     $37,487,100   
28     -         Around the World in 80 Days      BV     $33,170,700   
29    26                         Escape Plan    LG/S     $25,840,800   
30    27                           Red Sonja     MGM     $16,794,200   
31    28                      The Last Stand     LGF     $13,021,600   
32    29                     Sabotage (2014)     ORF     $10,823,900   
33    30                              Maggie   RAtt.        $186,500   

                   4         5  
0   Unadjusted Gross   Release  
1       $204,843,345    7/3/91  
2       $146,282,411   7/15/94  
3       $119,412,921    6/1/90  
4       $111,938,388   12/9/88  
5       $150,371,112    7/2/03  
6       $107,325,195   6/20/97  
7       $101,295,562   6/21/96  
8        $91,457,688  12/22/90  
9       $125,322,469   5/21/09  
10       $59,735,548   6/12/87  
11       $60,592,389  11/22/96  
12       $39,565,475   5/14/82  
13       $66,889,043  11/24/99  
14      $103,068,524   8/13/10  
15       $50,016,394   6/18/93  
16       $38,371,200  10/26/84  
17       $85,028,192   8/17/12  
18       $89,760,956    7/1/15  
19       $35,100,000   10/4/85  
20       $38,122,105  11/13/87  
21       $31,042,035   6/29/84  
22       $36,763,355  11/23/94  
23       $34,994,648   6/17/88  
24       $40,077,257    2/8/02  
25       $34,604,280  11/17/00  
26       $39,322,544   8/15/14  
27       $16,209,459    6/6/86  
28       $24,008,137   6/16/04  
29       $25,135,965  10/18/13  
30        $6,948,633    7/5/85  
31       $12,050,299   1/18/13  
32       $10,508,518   3/28/14  
33          $187,112    5/8/15  

In [134]:
arnold.to_csv("Arnold.csv") #edited and cleaned as before, resaved as Arnold1.csv
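
The scraped table printed above still has its column names sitting in row 0, dollar amounts stored as text, and full release dates, while the Arnold1.csv read in the next cell has integer grosses and a Release Year column. That cleanup was done by hand outside the notebook. As a rough sketch of how the same steps might be done in pandas (the helper name `clean_chart` is hypothetical, and the two sample rows are copied from the table above):

```python
import pandas as pd
from datetime import datetime

def clean_chart(raw):
    """Hypothetical helper: tidy one scraped chart into the layout of the *1.csv files."""
    df = raw.copy()
    df.columns = df.iloc[0].tolist()           # row 0 holds the real column names
    df = df.iloc[1:].reset_index(drop=True)    # drop that row from the data
    df = df.rename(columns={"Title (click to view)": "Title"})
    for col in ["Adjusted Gross", "Unadjusted Gross"]:
        # "$417,471,700" -> 417471700
        df[col] = df[col].str.replace("[$,]", "", regex=True).astype("int64")
    # "7/3/91" -> 1991 (%y maps 69-99 to 19xx and 00-68 to 20xx)
    df["Release Year"] = df["Release"].map(
        lambda s: datetime.strptime(s, "%m/%d/%y").year
    )
    return df.drop(columns="Release")

# Two rows in the same shape pd.read_html returned them.
raw = pd.DataFrame([
    ["Rank", "Title (click to view)", "Studio", "Adjusted Gross", "Unadjusted Gross", "Release"],
    ["1", "Terminator 2: Judgment Day", "TriS", "$417,471,700", "$204,843,345", "7/3/91"],
    ["-", "Terminator Salvation", "WB", "$144,137,700", "$125,322,469", "5/21/09"],
])
cleaned = clean_chart(raw)
print(cleaned.dtypes)
```

Rank is left as text here because of the "-" entries, which matches the object dtype the cleaned files show.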

In [135]:
path='/Users/Nick/Desktop/data_bootcamp/Final Project/Arnold1.csv' 
ASchwarz = pd.read_csv(path)

print(type(ASchwarz), ASchwarz.shape, ASchwarz.dtypes)

print(ASchwarz)


<class 'pandas.core.frame.DataFrame'> (33, 6) Rank                object
Title               object
Studio              object
Adjusted Gross       int64
Unadjusted Gross     int64
Release Year         int64
dtype: object
   Rank                              Title  Studio  Adjusted Gross  \
0     1          Terminator 2: Judgment Day   TriS       417471700   
1     2                           True Lies    Fox       300263900   
2     3                        Total Recall   Sony       242195900   
3     4                               Twins   Uni.       238720100   
4     5  Terminator 3: Rise of the Machines     WB       213960900   
5     6                    Batman and Robin     WB       200620900   
6     7                              Eraser     WB       196632600   
7     8                    Kindergarten Cop   Uni.       186235700   
8     -                Terminator Salvation     WB       144137700   
9     9                            Predator    Fox       131082100   
10   10                  Jingle All the Way    Fox       116877400   
11   11                 Conan the Barbarian   Uni.       115466600   
12   12                         End of Days   Uni.       112489900   
13    -                     The Expendables    LGF       111236200   
14   13                    Last Action Hero   Col.       103657200   
15   14                      The Terminator  Orion        97386800   
16   15                   The Expendables 2    LGF        93715000   
17   16                 Terminator: Genisys   Par.        93351400   
18   17                            Commando    Fox        84833200   
19   18                     The Running Man   TriS        83307900   
20   19                 Conan the Destroyer   Uni.        79268100   
21   20                              Junior   Uni.        74885300   
22   21                            Red Heat   TriS        73054500   
23   22                   Collateral Damage     WB        59184700   
24   23                         The 6th Day   Sony        55051800   
25   24                   The Expendables 3    LGF        41744500   
26   25                            Raw Deal    DEG        37487100   
27    -         Around the World in 80 Days     BV        33170700   
28   26                         Escape Plan   LG/S        25840800   
29   27                           Red Sonja    MGM        16794200   
30   28                      The Last Stand    LGF        13021600   
31   29                     Sabotage (2014)    ORF        10823900   
32   30                              Maggie  RAtt.          186500   

    Unadjusted Gross  Release Year  
0          204843345          1991  
1          146282411          1994  
2          119412921          1990  
3          111938388          1988  
4          150371112          2003  
5          107325195          1997  
6          101295562          1996  
7           91457688          1990  
8          125322469          2009  
9           59735548          1987  
10          60592389          1996  
11          39565475          1982  
12          66889043          1999  
13         103068524          2010  
14          50016394          1993  
15          38371200          1984  
16          85028192          2012  
17          89760956          2015  
18          35100000          1985  
19          38122105          1987  
20          31042035          1984  
21          36763355          1994  
22          34994648          1988  
23          40077257          2002  
24          34604280          2000  
25          39322544          2014  
26          16209459          1986  
27          24008137          2004  
28          25135965          2013  
29           6948633          1985  
30          12050299          2013  
31          10508518          2014  
32            187112          2015  

In [136]:
ASchwarz.plot.scatter('Release Year', 'Adjusted Gross')


Out[136]:
<matplotlib.axes._subplots.AxesSubplot at 0x23927bb0fd0>

In [137]:
#let's scale back sample size again

AS=ASchwarz.head(11)

#we'll use this soon

Tom Cruise

Heyday: Mid 80s to the early aughts

Known for: Mission Impossible franchise

Biggest Movie: Top Gun $412,055,200

Type: Cocky leading man


In [138]:
#last but not least, our data for Tom Cruise

url = 'http://www.boxofficemojo.com/people/chart/?id=tomcruise.htm'
cruise = pd.read_html(url)

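print('Output has type', type(cruise), 'and length', len(cruise))
print('First element has type', type(cruise[0]))

#for Tom, the chart is the fourth table in the list (index 3)
Tom=cruise[3]

Tom.to_csv("Tom.csv") #edited and cleaned as before, resaved as Tom1.csv


Output has type <class 'list'> and length 5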
First element has type <class 'pandas.core.frame.DataFrame'>
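
The chart sat at index 2 in Arnold's list of tables but at index 3 here, so the position is not something to rely on. One alternative is to pick the table by what it contains. The snippet below is a minimal sketch under that assumption: `find_chart` is a hypothetical helper, and the two small frames stand in for the list that pd.read_html returns.

```python
import pandas as pd

def find_chart(tables, marker="Adjusted Gross"):
    """Hypothetical helper: return (index, table) for the first table that
    mentions `marker` in its column names or in its first row."""
    for i, table in enumerate(tables):
        in_columns = marker in [str(c) for c in table.columns]
        in_first_row = len(table) > 0 and marker in table.iloc[0].astype(str).tolist()
        if in_columns or in_first_row:
            return i, table
    raise ValueError(f"no table mentions {marker!r}")

# Stand-ins for the list of DataFrames that pd.read_html returns.
tables = [
    pd.DataFrame([["navigation", "links"]]),
    pd.DataFrame([["Rank", "Title (click to view)", "Adjusted Gross"],
                  ["1", "Top Gun", "$412,055,200"]]),
]
index, chart = find_chart(tables)
print(index, chart.shape)
```

pd.read_html also accepts a `match` argument that limits the result to tables containing a given string or regex, which may be enough on its own for a page like this.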

In [139]:
path='/Users/Nick/Desktop/data_bootcamp/Final Project/Tom1.csv' 
TCruise = pd.read_csv(path)

print(type(TCruise), TCruise.shape, TCruise.dtypes)

print(TCruise)


<class 'pandas.core.frame.DataFrame'> (40, 6) Rank                object
Title               object
Studio              object
Adjusted Gross       int64
Unadjusted Gross     int64
Release Year         int64
dtype: object
   Rank                                 Title   Studio  Adjusted Gross  \
0     1                               Top Gun     Par.       412055200   
1     2                              Rain Man      MGM       371442800   
2     3                   Mission: Impossible     Par.       351317700   
3     4                Mission: Impossible II     Par.       342897400   
4     5                              The Firm     Par.       328171200   
5     -           Austin Powers in Goldmember       NL       315005500   
6     6                     War of the Worlds     Par.       313592100   
7     7                        A Few Good Men     Col.       292607100   
8     8                         Jerry Maguire     Sony       291111400   
9     9  Mission: Impossible - Ghost Protocol     Par.       228123000   
10   10            Interview with the Vampire       WB       215595000   
11   11    Mission: Impossible - Rogue Nation     Par.       202710200   
12   12                       Minority Report      Fox       195040600   
13   13               Mission: Impossible III     Par.       175568800   
14   14                        Risky Business       WB       173075700   
15   15                       Days of Thunder     Par.       167686700   
16   16                              Cocktail       BV       163297200   
17   17                      The Last Samurai       WB       156260100   
18   18                           Vanilla Sky     Par.       150743800   
19   19            Born on the Fourth of July     Uni.       142048500   
20   20                            Collateral       DW       139553800   
21    -                        Tropic Thunder     P/DW       132064300   
22    -              Space Station 3-D (IMAX)     Imax       124311100   
23   21                          Far and Away     Uni.       121740600   
24   22                    The Color of Money       BV       120938700   
25   23                                  Taps      Fox       105212400   
26   24                      Edge of Tomorrow       WB       103891200   
27   25                              Valkyrie       UA        99276900   
28    -                          Endless Love     Uni.        96244200   
29   26                        Eyes Wide Shut       WB        94061100   
30   27                              Oblivion     Uni.        91233900   
31   28                          Jack Reacher     Par.        86073600   
32   29                          Knight & Day      Fox        82479200   
33   30                         The Outsiders       WB        69995500   
34   31                   All the Right Moves      Fox        46939900   
35   32                          Rock of Ages  WB (NL)        40993300   
36   33                                Legend     Uni.        35851200   
37   34                              Magnolia       NL        35805200   
38   35                       Lions for Lambs       UA        18706200   
39   36                             Losin' It      Emb         3394200   

    Unadjusted Gross  Release Year  
0          179800601          1986  
1          172825435          1988  
2          180981856          1996  
3          215409889          2000  
4          158348367          1993  
5          213307889          2002  
6          234280354          2005  
7          141340178          1992  
8          153952592          1996  
9          209397903          2011  
10         105264608          1994  
11         195042377          2015  
12         132072926          2002  
13         134029801          2006  
14          63541777          1983  
15          82670733          1990  
16          78222753          1988  
17         111127263          2003  
18         100618344          2001  
19          70001698          1989  
20         101005703          2004  
21         110515313          2008  
22          93188822          2002  
23          58883840          1992  
24          52293982          1986  
25          35856053          1981  
26         100206256          2014  
27          83077833          2008  
28          31184024          1981  
29          55691208          1999  
30          89107235          2013  
31          80070736          2012  
32          76423035          2010  
33          25697647          1983  
34          17233166          1983  
35          38518613          2012  
36          15502112          1986  
37          22455976          1999  
38          15002854          2007  
39           1246141          1983  

In [140]:
TCruise.plot.scatter('Release Year', 'Adjusted Gross')


Out[140]:
<matplotlib.axes._subplots.AxesSubplot at 0x239280db400>

In [141]:
#cutting down to the top 10 ranked films (11 rows, since one unranked "-" entry falls among them)

TC=TCruise.head(11)

Scene 4: "The Final Showdown"


In [143]:
#All of the old school action stars in one bar chart: the adjusted gross of each star's top films, plotted by release year.
plt.bar(TC['Release Year'],
        TC['Adjusted Gross'],
        align='center',
        color='Blue')

plt.bar(BW['Release Year'],
        BW['Adjusted Gross'],
        align='center',
        color='Green')

plt.bar(AS['Release Year'],
        AS['Adjusted Gross'],
        align='center',
        color='Yellow')

plt.title('"OG" Leading Box Office over Time')


Out[143]:
<matplotlib.text.Text at 0x2392939a278>

LEGEND:

Tom Cruise = Blue

Bruce Willis = Green

Arnold Schwarzenegger = Yellow
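
Two things are easy to miss in the chart above. Bars that share a release year are drawn on top of each other (Tom Cruise has two 1996 films in his top rows, and Arnold's Eraser is also from 1996), and the color key lives in text rather than in the figure. The snippet below is a minimal sketch of one alternative, not what this notebook ran: it sums each star's gross per year and passes label= so matplotlib can draw the legend itself. `yearly_total` is a hypothetical helper, summing per year is an assumption about how to resolve the overlap, and the rows are a small sample copied from the tables above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# A few rows copied from the tables above; note Tom Cruise's two 1996 releases.
tom = pd.DataFrame({"Release Year": [1986, 1996, 1996],
                    "Adjusted Gross": [412055200, 351317700, 291111400]})
arnold = pd.DataFrame({"Release Year": [1991, 1994, 1996],
                       "Adjusted Gross": [417471700, 300263900, 196632600]})

def yearly_total(df):
    """Sum adjusted gross per release year so each year yields a single bar."""
    return df.groupby("Release Year")["Adjusted Gross"].sum()

fig, ax = plt.subplots()
for name, df, color in [("Tom Cruise", tom, "blue"),
                        ("Arnold Schwarzenegger", arnold, "yellow")]:
    totals = yearly_total(df)
    # alpha lets a bar from another star show through when years coincide
    ax.bar(totals.index, totals.values, align="center",
           color=color, alpha=0.6, label=name)

ax.set_xlabel("Release Year")
ax.set_ylabel("Adjusted Gross ($)")
ax.legend()
```

Bars from different stars can still land on the same year, so the transparency only softens that overlap rather than removing it.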


In [145]:
#As a reminder, here's what we are comparing against:

fig, ax = plt.subplots(nrows=3, ncols=1, sharex=True, sharey=True)

CE['Adjusted Gross'].head(10).plot(kind="bar",ax=ax[0], color='purple', title="Evans")
CH['Adjusted Gross'].head(10).plot(kind="bar",ax=ax[1], color='red', title="Hemsworth")
CP['Adjusted Gross'].head(10).plot(kind="bar",ax=ax[2], color='orange', title="Pratt")


Out[145]:
<matplotlib.text.Text at 0x23929bdc898>

In [146]:
plt.bar(CE['Release Year'], CE['Adjusted Gross'],
       align='center',
       color='purple')

plt.bar(CH['Release Year'], CH['Adjusted Gross'],
       align='center',
       color='red')

plt.bar(CP['Release Year'], CP['Adjusted Gross'],
       align='center',
       color='orange')

plt.title('Chris Film Box Office Share Over Time')


Out[146]:
<matplotlib.text.Text at 0x239280a4550>

LEGEND:

Chris Evans = Purple

Chris Hemsworth = Red

Chris Pratt = Orange

Our Findings

Tom Cruise (blue) has obvious staying power: in the chart, his films with adjusted grosses over $200 million run from Top Gun (1986) to Mission: Impossible - Ghost Protocol (2011), a span of more than two decades. Arnold's biggest films are mostly clustered in a 10-year period, roughly 1988 to 1997. Bruce Willis also had clusters of hits, with his biggest successes in the late nineties. If our Chrises want to stay relevant in 2035, they'll need to adopt the "slow and steady wins the race" strategy of Tom Cruise (as long as slow and steady comes with strong receipts).
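
One rough way to put a number on "staying power" is the gap between a star's first and last film above some adjusted-gross bar. The snippet below is an illustration only, not part of the analysis above: the $200 million cutoff and the helper name `hit_span` are assumptions, and the (release year, adjusted gross) pairs are the films ranked 1 through 10 for each star, copied from the tables earlier in this notebook.

```python
import pandas as pd

# (Release Year, Adjusted Gross) for ranks 1-10, copied from the tables above.
films = {
    "Tom Cruise": [
        (1986, 412055200), (1988, 371442800), (1996, 351317700), (2000, 342897400),
        (1993, 328171200), (2005, 313592100), (1992, 292607100), (1996, 291111400),
        (2011, 228123000), (1994, 215595000),
    ],
    "Arnold Schwarzenegger": [
        (1991, 417471700), (1994, 300263900), (1990, 242195900), (1988, 238720100),
        (2003, 213960900), (1997, 200620900), (1996, 196632600), (1990, 186235700),
        (1987, 131082100), (1996, 116877400),
    ],
    "Bruce Willis": [
        (1999, 494028900), (1998, 368772000), (1989, 298949000), (1990, 238416400),
        (1994, 217817900), (2006, 203063500), (1995, 197266000), (1988, 173288500),
        (2007, 167770700), (2000, 150706100),
    ],
}

def hit_span(rows, cutoff=200_000_000):
    """Years between the earliest and latest film grossing at least `cutoff`."""
    df = pd.DataFrame(rows, columns=["Release Year", "Adjusted Gross"])
    hits = df[df["Adjusted Gross"] >= cutoff]
    return int(hits["Release Year"].max() - hits["Release Year"].min())

spans = {name: hit_span(rows) for name, rows in films.items()}
print(spans)
```

A single late hit (Terminator 3 in 2003, Over the Hedge in 2006) stretches the span considerably, so a measure like this is best read next to the chart rather than instead of it.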

The Verdict!

The Winner: Chris Pratt! Looking at the data, we predict that Chris Pratt is in the best position to capitalize going forward, given his strong hauls in solo vehicles over the past several years. If he can keep his popularity up over the next decade, he will be the Chris you take your grandkids to the movies to see. His upward trajectory matches that of our legends, and we like the trend we see, coupled with soft factors like his "everyman" appeal.

Dark Horse: Chris Evans, if he can successfully spin his Marvel success into leading roles in solo vehicles outside of franchises.

Throw him a lifesaver: Chris Hemsworth. The once-bright Thor star is floundering in solo projects and may go the downward route of Bruce Willis.

The End


In [ ]: