Pixar is one of the best-known animation studios in the world, and many people worldwide faithfully watch every newly released film.

Here we use a dataset on Pixar movies gathered from multiple sources including:

  • Box Office Mojo
  • IMDB
  • Rotten Tomatoes
  • Metacritic

Here are some of the columns in our dataset PixarMovies.csv:

  • Year Released - the year the movie was released.
  • Movie - the name of the movie.
  • RT Score - the Rotten Tomatoes rating for the movie.
  • IMDB Score - the IMDB rating for the movie.
  • Metacritic Score - the Metacritic rating for the movie.
  • Opening Weekend - the amount of revenue the movie made on opening weekend (in millions of dollars).
  • Worldwide Gross - the total amount of revenue the movie has made to date (in millions of dollars).
  • Production Budget - the amount of money spent to produce the film (in millions of dollars).
  • Oscars Won - the number of Oscar awards the movie won.

In [1]:
# Set up the environment by importing the libraries we need
import pandas as pd
import matplotlib.pyplot as plt
# Note: Importing seaborn affects the styling of all matplotlib and pandas plots as well
import seaborn as sns

# Run the Jupyter magic so that plots are displayed in the notebook
%matplotlib notebook

In [2]:
# Read the dataset into a DataFrame and determine the dimensions
pixar_movies = pd.read_csv('../data/PixarMovies.csv')
pixar_movies.shape


Out[2]:
(15, 16)

In [3]:
# Display the entire dataset since it isn't too big
pixar_movies.head(15)


Out[3]:
Year Released Movie Length RT Score IMDB Score Metacritic Score Opening Weekend Worldwide Gross Domestic Gross Adjusted Domestic Gross International Gross Domestic % International % Production Budget Oscars Nominated Oscars Won
0 1995 Toy Story 81 100 8.3 92 29.14 362.0 191.8 356.21 170.2 52.98% 47.02% 30 3.0 0.0
1 1998 A Bug's Life 96 92 7.2 77 33.26 363.4 162.8 277.18 200.6 44.80% 55.20% 45 1.0 0.0
2 1999 Toy Story 2 92 100 7.9 88 57.39 485.0 245.9 388.43 239.2 50.70% 49.32% 90 1.0 0.0
3 2001 Monsters, Inc. 90 96 8.1 78 62.58 528.8 255.9 366.12 272.9 48.39% 51.61% 115 3.0 1.0
4 2003 Finding Nemo 104 99 8.2 90 70.25 895.6 339.7 457.46 555.9 37.93% 62.07% 94 4.0 1.0
5 2004 The Incredibles 115 97 8.0 90 70.47 631.4 261.4 341.28 370.0 41.40% 58.60% 92 4.0 2.0
6 2006 Cars 116 74 7.2 73 60.12 462.0 244.1 302.59 217.9 52.84% 47.16% 70 2.0 0.0
7 2007 Ratatouille 111 96 8.0 96 47.00 623.7 206.4 243.65 417.3 33.09% 66.91% 150 5.0 1.0
8 2008 WALL-E 97 96 8.4 94 63.10 521.3 223.8 253.11 297.5 42.93% 57.07% 180 6.0 1.0
9 2009 Up 96 98 8.3 88 68.11 731.3 293.0 318.90 438.3 40.07% 59.93% 175 5.0 2.0
10 2010 Toy Story 3 103 99 8.4 92 110.31 1063.2 415.0 423.88 648.2 39.03% 60.97% 200 5.0 2.0
11 2011 Cars 2 113 39 6.3 57 109.00 559.9 191.5 194.43 368.4 34.20% 65.80% 200 0.0 0.0
12 2012 Brave 100 78 7.2 69 66.30 539.0 237.3 243.39 301.7 44.03% 55.97% 185 1.0 1.0
13 2013 Monsters University 107 78 7.4 65 82.43 743.6 268.5 269.59 475.1 36.11% 63.89% 200 0.0 0.0
14 2015 Inside Out 102 98 8.8 93 90.40 677.1 340.5 340.50 336.6 50.29% 49.71% 175 NaN NaN

In [4]:
# Get the datatypes for each column
pixar_movies.dtypes


Out[4]:
Year Released                int64
Movie                       object
Length                       int64
RT Score                     int64
IMDB Score                 float64
Metacritic Score             int64
Opening Weekend            float64
Worldwide Gross            float64
Domestic Gross             float64
Adjusted Domestic Gross    float64
International Gross        float64
Domestic %                  object
International %             object
Production Budget            int64
Oscars Nominated           float64
Oscars Won                 float64
dtype: object

In [5]:
# Generate some summary statistics
pixar_movies.describe()


/home/todd/anaconda3/lib/python3.5/site-packages/numpy/lib/function_base.py:3834: RuntimeWarning: Invalid value encountered in percentile
  RuntimeWarning)
Out[5]:
Year Released Length RT Score IMDB Score Metacritic Score Opening Weekend Worldwide Gross Domestic Gross Adjusted Domestic Gross International Gross Production Budget Oscars Nominated Oscars Won
count 15.000000 15.000000 15.000000 15.000000 15.000000 15.000000 15.000000 15.000000 15.000000 15.000000 15.000000 14.000000 14.000000
mean 2006.066667 101.533333 89.333333 7.846667 82.800000 67.990667 612.486667 258.506667 318.448000 353.986667 133.400000 2.857143 0.785714
std 5.933761 9.927355 16.451950 0.655599 12.119642 23.270468 190.193934 66.518284 73.321064 135.061615 59.696614 2.032700 0.801784
min 1995.000000 81.000000 39.000000 6.300000 57.000000 29.140000 362.000000 162.800000 194.430000 170.200000 30.000000 0.000000 0.000000
25% 2002.000000 96.000000 85.000000 7.300000 75.000000 58.755000 503.150000 215.100000 261.350000 256.050000 91.000000 NaN NaN
50% 2007.000000 102.000000 96.000000 8.000000 88.000000 66.300000 559.900000 245.900000 318.900000 336.600000 150.000000 NaN NaN
75% 2010.500000 109.000000 98.500000 8.300000 92.000000 76.450000 704.200000 280.750000 361.165000 427.800000 182.500000 NaN NaN
max 2015.000000 116.000000 100.000000 8.800000 96.000000 110.310000 1063.200000 415.000000 457.460000 648.200000 200.000000 6.000000 2.000000

In [6]:
# Strip the percentage sign (%) from the end of values and convert to float
pixar_movies['Domestic %'] = pixar_movies['Domestic %'].str.rstrip('%').astype(float)
pixar_movies['International %'] = pixar_movies['International %'].str.rstrip('%').astype(float)
pixar_movies[['Domestic %', 'International %']].head()


Out[6]:
Domestic % International %
0 52.98 47.02
1 44.80 55.20
2 50.70 49.32
3 48.39 51.61
4 37.93 62.07
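The same conversion can be tried in isolation. The sketch below is only illustrative: the `sample` frame is a hypothetical stand-in holding the first three Domestic % values, and it assumes every value ends in a single percent sign.

```python
import pandas as pd

# Hypothetical stand-in for the 'Domestic %' column as read from the CSV (strings)
sample = pd.DataFrame({'Domestic %': ['52.98%', '44.80%', '50.70%']})

# .str.rstrip('%') removes the trailing percent sign; astype(float) parses the rest
sample['Domestic %'] = sample['Domestic %'].str.rstrip('%').astype(float)

print(sample['Domestic %'].dtype)
```

Once the column is numeric it can be used in describe(), corr(), and plots like any other float column.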

In [7]:
# Multiply the IMDB Score column by 10 to convert it to a 100-point scale
pixar_movies['IMDB Score'] = pixar_movies['IMDB Score'] * 10
pixar_movies.head()


Out[7]:
Year Released Movie Length RT Score IMDB Score Metacritic Score Opening Weekend Worldwide Gross Domestic Gross Adjusted Domestic Gross International Gross Domestic % International % Production Budget Oscars Nominated Oscars Won
0 1995 Toy Story 81 100 83.0 92 29.14 362.0 191.8 356.21 170.2 52.98 47.02 30 3.0 0.0
1 1998 A Bug's Life 96 92 72.0 77 33.26 363.4 162.8 277.18 200.6 44.80 55.20 45 1.0 0.0
2 1999 Toy Story 2 92 100 79.0 88 57.39 485.0 245.9 388.43 239.2 50.70 49.32 90 1.0 0.0
3 2001 Monsters, Inc. 90 96 81.0 78 62.58 528.8 255.9 366.12 272.9 48.39 51.61 115 3.0 1.0
4 2003 Finding Nemo 104 99 82.0 90 70.25 895.6 339.7 457.46 555.9 37.93 62.07 94 4.0 1.0

In [8]:
# Create a new DataFrame without rows that contain missing values (this drops the last row, Inside Out)
filtered_pixar = pixar_movies.dropna()
filtered_pixar


Out[8]:
Year Released Movie Length RT Score IMDB Score Metacritic Score Opening Weekend Worldwide Gross Domestic Gross Adjusted Domestic Gross International Gross Domestic % International % Production Budget Oscars Nominated Oscars Won
0 1995 Toy Story 81 100 83.0 92 29.14 362.0 191.8 356.21 170.2 52.98 47.02 30 3.0 0.0
1 1998 A Bug's Life 96 92 72.0 77 33.26 363.4 162.8 277.18 200.6 44.80 55.20 45 1.0 0.0
2 1999 Toy Story 2 92 100 79.0 88 57.39 485.0 245.9 388.43 239.2 50.70 49.32 90 1.0 0.0
3 2001 Monsters, Inc. 90 96 81.0 78 62.58 528.8 255.9 366.12 272.9 48.39 51.61 115 3.0 1.0
4 2003 Finding Nemo 104 99 82.0 90 70.25 895.6 339.7 457.46 555.9 37.93 62.07 94 4.0 1.0
5 2004 The Incredibles 115 97 80.0 90 70.47 631.4 261.4 341.28 370.0 41.40 58.60 92 4.0 2.0
6 2006 Cars 116 74 72.0 73 60.12 462.0 244.1 302.59 217.9 52.84 47.16 70 2.0 0.0
7 2007 Ratatouille 111 96 80.0 96 47.00 623.7 206.4 243.65 417.3 33.09 66.91 150 5.0 1.0
8 2008 WALL-E 97 96 84.0 94 63.10 521.3 223.8 253.11 297.5 42.93 57.07 180 6.0 1.0
9 2009 Up 96 98 83.0 88 68.11 731.3 293.0 318.90 438.3 40.07 59.93 175 5.0 2.0
10 2010 Toy Story 3 103 99 84.0 92 110.31 1063.2 415.0 423.88 648.2 39.03 60.97 200 5.0 2.0
11 2011 Cars 2 113 39 63.0 57 109.00 559.9 191.5 194.43 368.4 34.20 65.80 200 0.0 0.0
12 2012 Brave 100 78 72.0 69 66.30 539.0 237.3 243.39 301.7 44.03 55.97 185 1.0 1.0
13 2013 Monsters University 107 78 74.0 65 82.43 743.6 268.5 269.59 475.1 36.11 63.89 200 0.0 0.0
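Calling dropna() with no arguments removes a row if any column is missing. As a rough sketch of a more targeted alternative, the example below first counts missing values per column and then drops rows only where the Oscar column is missing. The `sample` frame is a hypothetical three-row excerpt, not the full dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row excerpt; only the last row lacks Oscar data
sample = pd.DataFrame({
    'Movie': ['Brave', 'Monsters University', 'Inside Out'],
    'RT Score': [78, 78, 98],
    'Oscars Won': [1.0, 0.0, np.nan],
})

# Count missing values per column to see where the gaps are
missing_counts = sample.isna().sum()
print(missing_counts)

# Drop rows only when 'Oscars Won' is missing; other columns are not checked
filtered = sample.dropna(subset=['Oscars Won'])
print(filtered)
```

Restricting the check with subset keeps rows that are complete in the columns a given plot actually uses.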

In [9]:
# Set the Movie column as the index for both DataFrames
pixar_movies.set_index('Movie', inplace=True)
filtered_pixar.set_index('Movie', inplace=True)
pixar_movies.head()


Out[9]:
Year Released Length RT Score IMDB Score Metacritic Score Opening Weekend Worldwide Gross Domestic Gross Adjusted Domestic Gross International Gross Domestic % International % Production Budget Oscars Nominated Oscars Won
Movie
Toy Story 1995 81 100 83.0 92 29.14 362.0 191.8 356.21 170.2 52.98 47.02 30 3.0 0.0
A Bug's Life 1998 96 92 72.0 77 33.26 363.4 162.8 277.18 200.6 44.80 55.20 45 1.0 0.0
Toy Story 2 1999 92 100 79.0 88 57.39 485.0 245.9 388.43 239.2 50.70 49.32 90 1.0 0.0
Monsters, Inc. 2001 90 96 81.0 78 62.58 528.8 255.9 366.12 272.9 48.39 51.61 115 3.0 1.0
Finding Nemo 2003 104 99 82.0 90 70.25 895.6 339.7 457.46 555.9 37.93 62.07 94 4.0 1.0

In [10]:
# Create a new DataFrame containing just the critics reviews
critics_reviews = pixar_movies[['RT Score', 'IMDB Score', 'Metacritic Score']]
critics_reviews.head()


Out[10]:
RT Score IMDB Score Metacritic Score
Movie
Toy Story 100 83.0 92
A Bug's Life 92 72.0 77
Toy Story 2 100 79.0 88
Monsters, Inc. 96 81.0 78
Finding Nemo 99 82.0 90

In [12]:
# Use the DataFrame plot() method to visualize this new DataFrame
critics_reviews.plot()


Out[12]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f1a2e69e710>

In [13]:
# The resulting plot is a little cramped, so let's tweak the figure size
critics_reviews.plot(figsize=(9,6))


Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f1a2e3d1630>

Note: Not all movie names are listed on the x-axis, and the vertical grid lines on the x-axis exist only for every other movie.
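One possible way to label every movie is to set the tick positions and labels explicitly on the Axes object that plot() returns. This is a minimal sketch, not the notebook's own approach: the `scores` frame is a hypothetical five-movie excerpt, and the Agg backend is selected only so the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed inside the notebook
import pandas as pd

# Hypothetical excerpt of the critics' scores, indexed by movie name
scores = pd.DataFrame(
    {'RT Score': [100, 92, 100, 96, 99],
     'Metacritic Score': [92, 77, 88, 78, 90]},
    index=['Toy Story', "A Bug's Life", 'Toy Story 2',
           'Monsters, Inc.', 'Finding Nemo'])

# With a string index, pandas plots each row at x = 0, 1, 2, ...
ax = scores.plot(figsize=(9, 6))

# Put one tick at every row position and label it with the movie name
ax.set_xticks(range(len(scores)))
ax.set_xticklabels(scores.index, rotation=90)

# Leave room for the rotated labels
ax.figure.tight_layout()
```

Because grid lines follow the tick positions, this also places a vertical grid line at every movie.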


In [14]:
# Box plot
critics_reviews.plot(kind='box', figsize=(9,5))


Out[14]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f1a2e41ac18>

In [15]:
# Stacked bar plot
revenue_proportions = filtered_pixar[['Domestic %', 'International %']]
revenue_proportions.plot(kind='bar', stacked=True, figsize=(9,6))


Out[15]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f1a2e3639e8>

Create a grouped bar plot to explore if there's any correlation between the number of Oscars a movie was nominated for and the number it actually won.


In [16]:
# Create a grouped bar plot
movie_oscars = filtered_pixar[['Oscars Nominated', 'Oscars Won']]
movie_oscars.plot(kind='bar')


Out[16]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f1a2c9d0a58>
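The bar plot gives a visual impression; a single number can complement it. The sketch below computes the Pearson correlation between nominations and wins with Series.corr(), using the Oscar counts for the 14 movies in filtered_pixar copied by hand from the table above into a hypothetical `oscars` frame.

```python
import pandas as pd

# Oscar counts for the 14 movies with complete data, copied from the table above
oscars = pd.DataFrame({
    'Oscars Nominated': [3, 1, 1, 3, 4, 4, 2, 5, 6, 5, 5, 0, 1, 0],
    'Oscars Won':       [0, 0, 0, 1, 1, 2, 0, 1, 1, 2, 2, 0, 1, 0],
})

# Pearson correlation coefficient between the two columns
r = oscars['Oscars Nominated'].corr(oscars['Oscars Won'])
print(round(r, 3))
```

In the notebook itself the equivalent call would be filtered_pixar['Oscars Nominated'].corr(filtered_pixar['Oscars Won']). With only 14 movies, the coefficient should be read as a rough indication rather than a firm result.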

What plots can you generate to better understand which columns correlate with the Adjusted Domestic Gross revenue column, which describes the total domestic revenue adjusted for economic and ticket price inflation?


In [17]:
# Generate plots to better understand which columns correlate with the Adjusted Domestic Gross revenue

# Compute pairwise correlation of columns to understand which columns may have interesting correlation
pixar_movies.corr()


Out[17]:
Year Released Length RT Score IMDB Score Metacritic Score Opening Weekend Worldwide Gross Domestic Gross Adjusted Domestic Gross International Gross Domestic % International % Production Budget Oscars Nominated Oscars Won
Year Released 1.000000 0.534099 -0.385842 -0.076138 -0.258043 0.740376 0.503504 0.417111 -0.373099 0.503543 -0.438875 0.438724 0.892410 -0.014485 0.296232
Length 0.534099 1.000000 -0.492301 -0.406876 -0.288763 0.471698 0.312400 0.123327 -0.310743 0.379133 -0.500744 0.500658 0.307082 -0.069799 0.125731
RT Score -0.385842 -0.492301 1.000000 0.876587 0.867997 -0.423794 0.167612 0.349368 0.654057 0.064001 0.300440 -0.300373 -0.331351 0.632767 0.438174
IMDB Score -0.076138 -0.406876 0.876587 1.000000 0.891237 -0.096896 0.347247 0.548467 0.619059 0.218876 0.240026 -0.240066 -0.033545 0.817596 0.575408
Metacritic Score -0.258043 -0.288763 0.867997 0.891237 1.000000 -0.273258 0.220493 0.341773 0.532589 0.142197 0.157633 -0.157579 -0.213031 0.838883 0.503676
Opening Weekend 0.740376 0.471698 -0.423794 -0.096896 -0.273258 1.000000 0.688564 0.624218 0.026921 0.662183 -0.421982 0.421986 0.757867 -0.056367 0.316225
Worldwide Gross 0.503504 0.312400 0.167612 0.347247 0.220493 0.688564 1.000000 0.883506 0.460129 0.973036 -0.543928 0.543915 0.551215 0.384349 0.621834
Domestic Gross 0.417111 0.123327 0.349368 0.548467 0.341773 0.624218 0.883506 1.000000 0.672332 0.751641 -0.099319 0.099302 0.385702 0.402677 0.625018
Adjusted Domestic Gross -0.373099 -0.310743 0.654057 0.619059 0.532589 0.026921 0.460129 0.672332 1.000000 0.316879 0.299128 -0.298995 -0.347640 0.325413 0.311573
International Gross 0.503543 0.379133 0.064001 0.218876 0.142197 0.662183 0.973036 0.751641 0.316879 1.000000 -0.716986 0.716976 0.586223 0.352585 0.582735
Domestic % -0.438875 -0.500744 0.300440 0.240026 0.157633 -0.421982 -0.543928 -0.099319 0.299128 -0.716986 1.000000 -1.000000 -0.581244 -0.142666 -0.333284
International % 0.438724 0.500658 -0.300373 -0.240066 -0.157579 0.421986 0.543915 0.099302 -0.298995 0.716976 -1.000000 1.000000 0.581228 0.142492 0.333149
Production Budget 0.892410 0.307082 -0.331351 -0.033545 -0.213031 0.757867 0.551215 0.385702 -0.347640 0.586223 -0.581244 0.581228 1.000000 0.078973 0.352405
Oscars Nominated -0.014485 -0.069799 0.632767 0.817596 0.838883 -0.056367 0.384349 0.402677 0.325413 0.352585 -0.142666 0.142492 0.078973 1.000000 0.734945
Oscars Won 0.296232 0.125731 0.438174 0.575408 0.503676 0.316225 0.621834 0.625018 0.311573 0.582735 -0.333284 0.333149 0.352405 0.734945 1.000000

Domestic Gross unsurprisingly has the strongest positive correlation with Adjusted Domestic Gross (0.67). It also makes sense that all of the review scores correlate positively with money made. What isn't obvious a priori is which review score correlates most strongly with box office success. Here RT Score has the strongest correlation (0.65), followed closely by IMDB Score (0.62), with Metacritic Score somewhat weaker (0.53).
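Reading one row out of a 15 x 15 correlation matrix is error-prone, so it can help to pull out the target column and sort it. The sketch below does this for the three review scores only, with values copied by hand from Out[3] into a hypothetical `reviews` frame; IMDB Score is left on its original 10-point scale, which does not change the correlation.

```python
import pandas as pd

# Review scores and adjusted domestic gross for all 15 movies, copied from Out[3]
reviews = pd.DataFrame({
    'RT Score': [100, 92, 100, 96, 99, 97, 74, 96, 96, 98, 99, 39, 78, 78, 98],
    'IMDB Score': [8.3, 7.2, 7.9, 8.1, 8.2, 8.0, 7.2, 8.0, 8.4, 8.3, 8.4,
                   6.3, 7.2, 7.4, 8.8],
    'Metacritic Score': [92, 77, 88, 78, 90, 90, 73, 96, 94, 88, 92, 57, 69,
                         65, 93],
    'Adjusted Domestic Gross': [356.21, 277.18, 388.43, 366.12, 457.46,
                                341.28, 302.59, 243.65, 253.11, 318.90,
                                423.88, 194.43, 243.39, 269.59, 340.50],
})

target = 'Adjusted Domestic Gross'

# Take the target's column of the correlation matrix, drop its
# self-correlation (always 1.0), and sort from strongest to weakest
ranked = (reviews.corr()[target]
          .drop(target)
          .sort_values(ascending=False))
print(ranked)
```

Applied to the full DataFrame, the same pattern would be pixar_movies.corr()['Adjusted Domestic Gross'].sort_values(ascending=False).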


In [18]:
# Scatter plot of RT Score against Adjusted Domestic Gross
pixar_movies.plot(x='RT Score', y='Adjusted Domestic Gross', kind='scatter')


Out[18]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f1a2c939240>

In [19]:
# Copy the DataFrame and use Adjusted Domestic Gross as the index
adjusted_gross = pixar_movies.copy()
adjusted_gross.set_index('Adjusted Domestic Gross', inplace=True)
adjusted_gross.head()


Out[19]:
Year Released Length RT Score IMDB Score Metacritic Score Opening Weekend Worldwide Gross Domestic Gross International Gross Domestic % International % Production Budget Oscars Nominated Oscars Won
Adjusted Domestic Gross
356.21 1995 81 100 83.0 92 29.14 362.0 191.8 170.2 52.98 47.02 30 3.0 0.0
277.18 1998 96 92 72.0 77 33.26 363.4 162.8 200.6 44.80 55.20 45 1.0 0.0
388.43 1999 92 100 79.0 88 57.39 485.0 245.9 239.2 50.70 49.32 90 1.0 0.0
366.12 2001 90 96 81.0 78 62.58 528.8 255.9 272.9 48.39 51.61 115 3.0 1.0
457.46 2003 104 99 82.0 90 70.25 895.6 339.7 555.9 37.93 62.07 94 4.0 1.0
