Ronica Reddick & Nick Pulito

in association with

"Those Data Bootcamp Guys" -- Professors Backus and Coleman


"3 Guys Named Chris"

Scene 1: "The Set-Up"

Hollywood hunks come and go, but every so often a star builds a lasting career out of blowing stuff up. Currently, there is no shortage of beefcake on the silver screen, with Chris Evans, Chris Hemsworth, and Chris Pratt all regularly starring in blockbuster films. There is no denying the bankability of the Chrises, but which Chris has staying power?

The now-defunct Grantland podcast had a “market correction” theory that they applied to Hollywood actors. The idea is that there’s only room in the market for one A-list celebrity of a particular type, and that over time the market will choose its favorite. The hosts would compare two Hollywood actors with similar “types” and predict which one would still have a career in 20 years.

Using data from Box Office Mojo, we decided to test the market correction theory on the Chrises by comparing the box office numbers of their biggest hits to those of heroes from the days of yore: Tom Cruise, Arnold Schwarzenegger, and Bruce Willis. We were looking for patterns in the box office receipts of the old guard that might shed some light on which Chris will be on top in 2035, and to see whether any of the box office heroes of yesteryear had a little more staying power than the others.

Scene 2: "The Chris Contenders"


To work out which Chris will have staying power in the years to come, we turned to an authoritative Hollywood data source, Box Office Mojo. A bit of simple web scraping gave us film titles broken out by actor, along with their adjusted box office revenues.

We wanted to aggregate data for our "Three Chrises" and compare it to 3 Hollywood legends who have had variable staying power over the years: Bruce Willis, Tom Cruise, and Arnold Schwarzenegger.

Digging up data on our leading gentlemen

Cells that follow show our process for scraping and organizing the data for the Chris contenders.
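The scraping cells themselves are not reproduced here, so the following is only a rough sketch of the idea: read an HTML table row by row and keep the cell text. The `TableRows` class and the `SAMPLE_HTML` markup are hypothetical stand-ins (the real page layout is an assumption); the two film rows reuse figures that appear later in this notebook.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect cell text from every <tr> row of an HTML table."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows, each a list of cell strings
        self._row = None      # row currently being read, or None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self._in_cell = False
            self._row[-1] = self._row[-1].strip()
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only text that sits inside a table cell is kept.
        if self._in_cell:
            self._row[-1] += data


# Hypothetical markup; the real page structure is an assumption.
SAMPLE_HTML = """
<table>
  <tr><th>Rank</th><th>Title</th><th>Studio</th><th>Adjusted Gross</th></tr>
  <tr><td>1</td><td>Terminator 2: Judgment Day</td><td>TriS</td><td>$417,471,700</td></tr>
  <tr><td>2</td><td>True Lies</td><td>Fox</td><td>$300,263,900</td></tr>
</table>
"""

parser = TableRows()
parser.feed(SAMPLE_HTML)
header, *films = parser.rows
print(header)
for film in films:
    print(film)
```

In practice the HTML would come from the downloaded page rather than an inline string, and the resulting rows could be handed straight to a DataFrame.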

Chris Evans

“The All American Hero”

Age: 34

Height: 6’

Known for: Captain America ($267,656,500); The Avengers; Fantastic Four

Legit Roles: Snowpiercer

Biggest Hit: Marvel’s The Avengers $659,640,800

#removed indices; cleaned titles; cleaned date
#Clean File saved as CE1.csv
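The cleanup was done by hand, but a small sketch of what "cleaned date" and usable gross figures might involve could look like this. The helper names `parse_gross` and `parse_year`, the raw column names, and the date format are all assumptions for illustration; only the two dollar amounts come from the profile above.

```python
import csv
import io
import re


def parse_gross(text):
    """Turn a string such as '$659,640,800' into the integer 659640800."""
    digits = re.sub(r"[^\d]", "", text)
    return int(digits) if digits else None


def parse_year(text):
    """Pull the first four-digit year out of a release-date string."""
    match = re.search(r"\b(?:19|20)\d{2}\b", text)
    return int(match.group(0)) if match else None


# Hypothetical raw rows: the layout and dates are assumed, not scraped.
raw = io.StringIO(
    "Title,Adjusted Gross,Release\n"
    "Marvel's The Avengers,\"$659,640,800\",5/4/2012\n"
    "Captain America,\"$267,656,500\",n/a\n"
)

cleaned = [
    {
        "Title": row["Title"].strip(),
        "Adjusted Gross": parse_gross(row["Adjusted Gross"]),
        "Release Year": parse_year(row["Release"]),
    }
    for row in csv.DictReader(raw)
]

for row in cleaned:
    print(row)
```

A missing or unparseable date simply becomes `None`, which makes gaps easy to spot before saving the cleaned file.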

In [67]:
#we love what we see, let's repeat it for our other leading gentlemen

Chris Hemsworth

“The Heartthrob”

Age: 32

Height: 6’ 3”

Known for: Thor; The Avengers; Snow White and the Huntsman

Legit Roles: Rush

Biggest Hit: Marvel’s The Avengers $659,640,800

Biggest Thor Movie: $212,276,600

#since scraped dataset is small, and had a tricky double index, we decided to export to csv and do a quick cleanup there
#Cleaned File saved as CH1.csv
Our data looks good! The axes are a little strange, but we just want to make sure we have data we can work with!

Chris Pratt

“The Everyman”

Age: 36

Height: 6’ 2”

Known for: Guardians of the Galaxy ($353,303,500); Jurassic World (one released, one in pre-production); Parks & Rec (TV)

Legit Roles: Her, Moneyball

Biggest Hit: Jurassic World $678,242,100

#since scraped dataset is small, and had a tricky double index, we decided to export to csv and do a quick cleanup there
#Cleaned File saved as CP1.csv
Now that we've got that sorted out, let's take a look at all three Chrises together. How do their box office titles stack up with one another over time?

In [80]:
#colors follow the legend listed below the chart
plt.scatter(CE['Release Year'], CE['Adjusted Gross'],
            color='purple')
plt.scatter(CH['Release Year'], CH['Adjusted Gross'],
            color='red')
plt.scatter(CP['Release Year'], CP['Adjusted Gross'],
            color='orange')

plt.title('Chris Film Box Office Share Over Time')

<matplotlib.text.Text at 0x23923bb5780>

In the graph above, we color coded our Chris contingent as follows:

Chris Evans: Purple

Chris Hemsworth: Red

Chris Pratt: Orange

A few things stand out. First, we can see right away that Chris Evans has, to date, had the longest career at the box office, dating back to 2001. Does this maybe suggest some longevity right off the bat? We're not so quick to draw that conclusion, especially since his biggest box office hit is shared with Chris Hemsworth in the Marvel Avengers movie.

Looking back at our raw data, we can also note that Pratt seems to have had the biggest breakout hit with 2015's Jurassic World, one of the top-grossing films of all time, in which he was the sole leading man.

This data gives us one view, but what other cuts might we want to look at?

In [108]:
fig, ax = plt.subplots(nrows=3, ncols=1, sharex=True, sharey=True)

CE['Adjusted Gross'].head(10).plot(kind="bar",ax=ax[0], color='purple', title="Evans")
CH['Adjusted Gross'].head(10).plot(kind="bar",ax=ax[1], color='red', title="Hemsworth")
CP['Adjusted Gross'].head(10).plot(kind="bar",ax=ax[2], color='orange', title="Pratt")

<matplotlib.axes._subplots.AxesSubplot at 0x23925d772b0>

In the above, we look at the box office grosses of each Chris's top 10 films. Here, we start to wonder whether Evans has the more consistent box office performance: of his top 10 films, 9 are in the $200 million range, a stat unmatched by our other two gentlemen.

This is an interesting insight, but what does it look like over time?

In [89]:
plt.bar(CE['Release Year'], CE['Adjusted Gross'], color='purple')  #bar chart assumed
plt.title('Chris Evans')

<matplotlib.text.Text at 0x239226b1be0>

Buoyed by franchise films in the last five years, Chris Evans has been a steady player, but hasn't excelled outside the Marvel universe franchises. All his biggest hits are as a member of a franchise / ensemble. Evans's Marvel hits since 2011 have performed well, though non-Marvel titles have largely been blips on the radar.

In [85]:
plt.bar(CH['Release Year'], CH['Adjusted Gross'], color='red')  #bar chart assumed
plt.title("Chris Hemsworth")

<matplotlib.text.Text at 0x23923e994a8>

Hemsworth had a very rough 2015. He featured prominently in 4 films, only one of which was a box office success (another Marvel Avengers installment). After a breakout 2012, are the tides turning after major flops like In the Heart of the Sea?

In [86]:
plt.bar(CP['Release Year'], CP['Adjusted Gross'], color='orange')  #bar chart assumed
plt.title("Chris Pratt")

<matplotlib.text.Text at 0x23923f316d8>

Pratt may have been a slower starter than our other leading gentlemen, but his 2014 breakout Guardians of the Galaxy cemented his status as leading man potential, and 2015's Jurassic World broke tons of box office records. As a non-Marvel film (though a franchise reboot), Jurassic World is unique in that it may be a standalone hit for Pratt, and everyone will be closely watching his box office performance in whatever leading man project he chooses next.

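In [120]:
#bar chart assumed; series drawn in the order Evans, Hemsworth, Pratt
plt.bar(CE['Release Year'], CE['Adjusted Gross'],
       color='purple')
plt.bar(CH['Release Year'], CH['Adjusted Gross'],
       color='red')
plt.bar(CP['Release Year'], CP['Adjusted Gross'],
       color='orange')

plt.title('Chris Film Box Office Share Over Time')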

<matplotlib.text.Text at 0x2392798ca90>

We love this data cut. Here, we take a comparative look at our Chrises over time. Keeping our colors consistent, Evans is purple, Hemsworth is red, and Pratt is orange.

One slight issue: for movies in which both Hemsworth and Evans were cast (the Avengers films), the graph shows just one color, because the series drawn later covers the one drawn earlier. Here's a flipped view:

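In [121]:
#bar chart assumed; same plot with Hemsworth drawn before Evans
plt.bar(CH['Release Year'], CH['Adjusted Gross'],
       color='red')
plt.bar(CE['Release Year'], CE['Adjusted Gross'],
       color='purple')
plt.bar(CP['Release Year'], CP['Adjusted Gross'],
       color='orange')

plt.title('Chris Film Box Office Share Over Time')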

<matplotlib.text.Text at 0x23927696048>

Whoa! Where did Hemsworth go?

What these two cuts show us is that Evans and Hemsworth are both heavily reliant on their Marvel franchise hits, where they are sharing the limelight, whereas Pratt has been more of a solo vehicle, especially in more recent years.
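One way to make the shared-limelight point explicit, rather than relying on which color ends up on top, is to compare the title lists directly. This is a minimal sketch: `split_shared` is a hypothetical helper, and the short lists below reuse only the "Known for" titles from the profiles above. With the real data, the inputs would be the title columns of CE and CH, assuming both frames carry a 'Title' column.

```python
def split_shared(titles_a, titles_b):
    """Return (shared, only_a, only_b) as sets of film titles."""
    a, b = set(titles_a), set(titles_b)
    return a & b, a - b, b - a


# Titles taken from the "Known for" lines in the actor profiles.
evans = ["Captain America", "The Avengers", "Fantastic Four"]
hemsworth = ["Thor", "The Avengers", "Snow White and the Huntsman"]

shared, evans_only, hemsworth_only = split_shared(evans, hemsworth)
print(sorted(shared))  # ['The Avengers']
```

Films in the shared set are the ones where one actor's marker hides the other's, so they could be plotted in a separate color or simply flagged.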

Scene 3: The "OGs"

In order to determine which Chris has staying power we pulled data on Hollywood stars of yore (Bruce Willis, Arnold Schwarzenegger, and Tom Cruise) for comparison. Given the volume of data on the older stars, we isolated the top ten grossing films for each hero.

Bruce Willis

Heyday: The late 80s to the late 90s

Known for: Die Hard franchise

Biggest Movie: The Sixth Sense $494,028,900

Type: Leading Man/Action Hero Hybrid

69 rows × 6 columns

#editing and cleaning as needed, resaved as Bruce1.csv
In [131]:
#we'll come back to this later, but let's get our other leading men in the frame!

Arnold Schwarzenegger

Heyday: Mid 80s to the mid 90s

Known for: the Terminator franchise

Biggest Movie: Terminator 2: Judgment Day $417,471,700

Type: Beefcake w/comedic chops

In [135]:
path='/Users/Nick/Desktop/data_bootcamp/Final Project/Arnold1.csv' 
ASchwarz = pd.read_csv(path)

print(type(ASchwarz), ASchwarz.shape, ASchwarz.dtypes)


<class 'pandas.core.frame.DataFrame'> (33, 6) Rank                object
Title               object
Studio              object
Adjusted Gross       int64
Unadjusted Gross     int64
Release Year         int64
dtype: object
   Rank                              Title  Studio  Adjusted Gross  \
0     1          Terminator 2: Judgment Day   TriS       417471700   
1     2                           True Lies    Fox       300263900   
2     3                        Total Recall   Sony       242195900   
3     4                               Twins   Uni.       238720100   
4     5  Terminator 3: Rise of the Machines     WB       213960900   
5     6                    Batman and Robin     WB       200620900   
6     7                              Eraser     WB       196632600   
7     8                    Kindergarten Cop   Uni.       186235700   
8     -                Terminator Salvation     WB       144137700   
9     9                            Predator    Fox       131082100   
10   10                  Jingle All the Way    Fox       116877400   
11   11                 Conan the Barbarian   Uni.       115466600   
12   12                         End of Days   Uni.       112489900   
13    -                     The Expendables    LGF       111236200   
14   13                    Last Action Hero   Col.       103657200   
15   14                      The Terminator  Orion        97386800   
16   15                   The Expendables 2    LGF        93715000   
17   16                 Terminator: Genisys   Par.        93351400   
18   17                            Commando    Fox        84833200   
19   18                     The Running Man   TriS        83307900   
20   19                 Conan the Destroyer   Uni.        79268100   
21   20                              Junior   Uni.        74885300   
22   21                            Red Heat   TriS        73054500   
23   22                   Collateral Damage     WB        59184700   
24   23                         The 6th Day   Sony        55051800   
25   24                   The Expendables 3    LGF        41744500   
26   25                            Raw Deal    DEG        37487100   
27    -         Around the World in 80 Days     BV        33170700   
28   26                         Escape Plan   LG/S        25840800   
29   27                           Red Sonja    MGM        16794200   
30   28                      The Last Stand    LGF        13021600   
31   29                     Sabotage (2014)    ORF        10823900   
32   30                              Maggie  RAtt.          186500   

    Unadjusted Gross  Release Year  
0          204843345          1991  
1          146282411          1994  
2          119412921          1990  
3          111938388          1988  
4          150371112          2003  
5          107325195          1997  
6          101295562          1996  
7           91457688          1990  
8          125322469          2009  
9           59735548          1987  
10          60592389          1996  
11          39565475          1982  
12          66889043          1999  
13         103068524          2010  
14          50016394          1993  
15          38371200          1984  
16          85028192          2012  
17          89760956          2015  
18          35100000          1985  
19          38122105          1987  
20          31042035          1984  
21          36763355          1994  
22          34994648          1988  
23          40077257          2002  
24          34604280          2000  
25          39322544          2014  
26          16209459          1986  
27          24008137          2004  
28          25135965          2013  
29           6948633          1985  
30          12050299          2013  
31          10508518          2014  
32            187112          2015  
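The dtypes above show Rank stored as object rather than a number, which follows from the "-" entries in that column. If a numeric rank is ever needed, one possible approach (a sketch, not something the original cells do) is to coerce the column with `pd.to_numeric`. The small `sample` frame copies a few rows from the printed table.

```python
import pandas as pd

# A few rows copied from the printed table above.
sample = pd.DataFrame({
    "Rank": ["8", "-", "9", "10"],
    "Title": ["Kindergarten Cop", "Terminator Salvation",
              "Predator", "Jingle All the Way"],
    "Adjusted Gross": [186235700, 144137700, 131082100, 116877400],
})

# "-" cannot be parsed as a number, so errors="coerce" turns it into NaN.
sample["Rank"] = pd.to_numeric(sample["Rank"], errors="coerce")

# Keep only the films that carry a rank, then store the rank as an integer.
ranked = sample.dropna(subset=["Rank"]).astype({"Rank": int})
print(ranked)
```

Whether the unranked rows should be dropped or kept is a judgment call; for the charts in this notebook only 'Adjusted Gross' and 'Release Year' are used, so leaving Rank untouched also works.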

In [136]:
ASchwarz.plot.scatter('Release Year', 'Adjusted Gross')

<matplotlib.axes._subplots.AxesSubplot at 0x23927bb0fd0>

In [137]:
#let's scale back sample size again


#we'll use this soon
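The cell that trims the sample is not shown, so here is a hedged sketch of one way to keep only the highest-grossing films. `top_n` is a hypothetical helper, and the five rows are copied from the Schwarzenegger table above (deliberately out of order) to keep the example short.

```python
import pandas as pd


def top_n(films, n=10, column="Adjusted Gross"):
    """Return the n rows with the largest values in `column`, best first."""
    return films.nlargest(n, column).reset_index(drop=True)


# Rows copied from the table above, shuffled to show the sort.
films = pd.DataFrame({
    "Title": ["Predator", "Terminator 2: Judgment Day", "Twins",
              "True Lies", "Total Recall"],
    "Adjusted Gross": [131082100, 417471700, 238720100,
                       300263900, 242195900],
    "Release Year": [1987, 1991, 1988, 1994, 1990],
})

top3 = top_n(films, n=3)
print(top3)
```

With the full frame, `top_n(ASchwarz)` would give the top ten by adjusted gross regardless of how the rows were ordered in the CSV.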

Tom Cruise

Heyday: Mid 80s to the early aughts

Known for: the Mission: Impossible franchise

Biggest Movie: Top Gun $412,055,200

Type: Cocky leading man

In [141]:
#cutting down to the top 10


Scene 4: "The Final Showdown"

In [143]:
#All of the old school action stars in one histogram. Representing share of box office cumulatively over time.
plt.bar(TC['Release Year'],
        TC['Adjusted Gross'],
      color='Blue')
plt.bar(BW['Release Year'],
        BW['Adjusted Gross'],
      color='Green')
plt.bar(AS['Release Year'],
        AS['Adjusted Gross'],
      color='Yellow')

plt.title('"OG" Leading Box Office over Time')

<matplotlib.text.Text at 0x2392939a278>


Tom Cruise = Blue

Bruce Willis = Green

Arnold Schwarzenegger = Yellow

In [145]:
#As a reminder, here's what we are comparing against:

fig, ax = plt.subplots(nrows=3, ncols=1, sharex=True, sharey=True)

CE['Adjusted Gross'].head(10).plot(kind="bar",ax=ax[0], color='purple', title="Evans")
CH['Adjusted Gross'].head(10).plot(kind="bar",ax=ax[1], color='red', title="Hemsworth")
CP['Adjusted Gross'].head(10).plot(kind="bar",ax=ax[2], color='orange', title="Pratt")

<matplotlib.text.Text at 0x23929bdc898>

In [146]:
#bar chart assumed; same comparison as earlier
plt.bar(CE['Release Year'], CE['Adjusted Gross'],
       color='purple')
plt.bar(CH['Release Year'], CH['Adjusted Gross'],
       color='red')
plt.bar(CP['Release Year'], CP['Adjusted Gross'],
       color='orange')

plt.title('Chris Film Box Office Share Over Time')

<matplotlib.text.Text at 0x239280a4550>


Chris Evans = Purple

Chris Hemsworth = Red

Chris Pratt = Orange

Our Findings

Tom Cruise (blue) has obvious staying power, with films raking in over $200 million across two decades. Arnold's biggest films are clustered in a 10-year period. Bruce Willis also had clusters of hits, with his biggest successes in the late nineties. If our Chrises want to stay relevant in 2035, they'll need to adopt the "slow and steady wins the race" strategy of Tom Cruise (as long as slow and steady comes with strong receipts).

The Verdict!

The Winner: Chris Pratt! Looking at the data, we predict that Chris Pratt is in the best position to capitalize going forward, given his strong hauls in solo vehicles over the past several years. If he can keep his popularity up over the next decade, he will be the Chris you take your grandkids to the movies to see. His upward trajectory matches our legends, and we like the trend we see, coupled with soft factors like his "everyman" appeal.

Dark Horse: Chris Evans, if he can successfully spin his Marvel success into solo leading roles outside of franchises.

Throw him a lifesaver: Chris Hemsworth. The once bright Thor star is floundering in solo projects, and may go the downward route of Bruce Willis.

The End

