Timothy Helton
Exploring and visualizing movie data scraped from Box Office Mojo. Make sure to format your plots properly with axis labels and graph titles at the very least.
NOTE:
This notebook uses code found in the k2datascience.movies module.
To execute all the cells do one of the following items:
In [ ]:
from k2datascience import movies
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
%matplotlib inline
In [ ]:
mov = movies.BoxOffice()
print(f'Data Types:\n{mov.data.dtypes}\n\n')
print(f'Data Shape:\n{mov.data.shape}\n\n')
print(f'Missing Data:\n{mov.data.isnull().sum()}\n\n')
mov.data.head()
mov.data.tail()
mov.data.describe()
In [ ]:
mov.distribution_plot()
In [ ]:
mov.kde_plot()
In [ ]:
mov.domestic_gross_vs_release_date_plot()
In [ ]:
mov.domestic_gross_vs_runtime_plot()
mov.runtime_vs_release_plot()
The Motion Picture Association of America (MPAA) rating system includes G (general audiences: all ages admitted), PG (parental guidance suggested: may not be suitable for children), PG-13 (parents strongly cautioned: may be inappropriate for children under 13), and R (restricted: under 17 requires accompanying parent or adult guardian). Find the average Runtime and Domestic Total Gross at each Rating level. Plot both by Rating. Do you see any pattern in the data?
In [ ]:
mov.rating_plot()
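The mov.rating_plot() call keeps the aggregation inside the k2datascience.movies module. As a minimal sketch of how the per-rating averages could be computed directly with pandas (not necessarily how the module does it), the cell below groups a small toy frame by rating. The column names rating, runtime, and domestic_total_gross are assumptions and may differ from the columns in mov.data; the values are invented for illustration only.

```python
import pandas as pd

# Toy rows invented purely for illustration; they are not Box Office Mojo figures.
# The column names are assumptions and may differ from those in mov.data.
toy = pd.DataFrame({
    "rating": ["G", "PG", "PG", "PG-13", "PG-13", "R"],
    "runtime": [90, 100, 110, 120, 130, 115],
    "domestic_total_gross": [50.0, 80.0, 120.0, 200.0, 100.0, 60.0],
})

# Mean runtime and mean domestic total gross for each rating level.
rating_means = (
    toy.groupby("rating")[["runtime", "domestic_total_gross"]]
    .mean()
    .reindex(["G", "PG", "PG-13", "R"])  # keep the ratings in MPAA order
)
print(rating_means)
```

Reindexing to the MPAA order keeps the ratings in a meaningful sequence when the result is plotted, rather than relying on alphabetical sorting of the group keys.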
Plot the Domestic Total Gross by the Release Date. Segment by Rating, that is, show all four groups on the same plot. Can you spot anything out of the ordinary? Now make four separate plots (one for each Rating) as part of the same matplotlib figure. What are the benefits and liabilities of each approach?
In [ ]:
mov.domestic_gross_rating_plot()
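For the second half of the exercise, one panel per rating inside a single matplotlib figure, a rough sketch might look like the cell below. It is not the implementation behind mov.domestic_gross_rating_plot(). The toy frame, its values, and the column names release_date, rating, and domestic_total_gross are all assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy rows invented purely for illustration; they are not Box Office Mojo figures.
# The column names are assumptions and may differ from those in mov.data.
toy = pd.DataFrame({
    "release_date": pd.to_datetime([
        "2013-01-11", "2013-03-22", "2013-05-03", "2013-07-12",
        "2013-08-09", "2013-10-04", "2013-11-22", "2013-12-20",
    ]),
    "rating": ["G", "PG", "PG-13", "R", "G", "PG", "PG-13", "R"],
    "domestic_total_gross": [40.0, 90.0, 400.0, 60.0, 55.0, 120.0, 420.0, 75.0],
})

ratings = ["G", "PG", "PG-13", "R"]

# One panel per rating; shared axes keep the four panels directly comparable.
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True, figsize=(10, 8))
for ax, rating in zip(axes.flat, ratings):
    subset = toy[toy["rating"] == rating]
    ax.scatter(subset["release_date"], subset["domestic_total_gross"])
    ax.set_title(f"Rating: {rating}")
    ax.set_xlabel("Release Date")
    ax.set_ylabel("Domestic Total Gross")

fig.suptitle("Domestic Total Gross vs. Release Date by Rating")
fig.tight_layout()
```

Sharing both axes makes differences in scale between ratings visible at a glance, at the cost of compressing panels whose values are small. Dropping sharey=True would show detail within each rating but make cross-panel comparison harder.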
In [ ]:
directors = mov.director_performance()
print('Top 10 Directors')
directors.head(10)
print('Top 10 Directors with more than one release')
directors.query('qty > 1').head(10)
In [ ]:
mov.domestic_gross_vs_months()