In [48]:
%matplotlib inline
import pandas as pd
import seaborn as sbn
sbn.set()

In [3]:
from IPython.display import HTML
css = open('style-table.css').read() + open('style-notebook.css').read()
HTML('<style>{}</style>'.format(css))


Out[3]:

In [4]:
titles = pd.read_csv('data/titles.csv')
titles.head()


Out[4]:
   title                    year
0  The Rising Son           1990
1  Ashes of Kukulcan        2016
2  The Thousand Plane Raid  1969
3  Crucea de piatra         1993
4  The 86                   2015

In [5]:
cast = pd.read_csv('data/cast.csv')
cast.head()


Out[5]:
   title                              year  name       type   character  n
0  Suuri illusioni                    1985  Homo $     actor  Guests     22
1  Gangsta Rap: The Glockumentary     2007  Too $hort  actor  Himself    NaN
2  Menace II Society                  1993  Too $hort  actor  Lew-Loc    27
3  Porndogs: The Adventures of Sadie  2009  Too $hort  actor  Bosco      3
4  Stop Pepper Palmer                 2014  Too $hort  actor  Himself    NaN

In [ ]:

Define a "Superman year" as a year whose films feature more Superman characters than Batman characters. How many years in film history have been Superman years?


In [41]:
both = cast[(cast.character=='Superman') | (cast.character == 'Batman')].groupby(['year','character']).size().unstack().fillna(0)
diff = both.Superman - both.Batman
print("Superman: " + str(len(diff[diff>0])))


Superman: 12

In [ ]:

How many years have been "Batman years", with more Batman characters than Superman characters?


In [42]:
both = cast[(cast.character=='Superman') | (cast.character == 'Batman')].groupby(['year','character']).size().unstack().fillna(0)
diff = both.Batman - both.Superman
print("Batman: " + str(len(diff[diff>0])))


Batman: 24
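
The two cells above repeat the same filter and grouping. As a minimal sketch, both counts can come from one table of per-year character counts. The `toy_cast` rows below are invented for illustration and are not taken from `data/cast.csv`.

```python
import pandas as pd

# Toy stand-in for the `cast` table; these rows are made up for illustration.
toy_cast = pd.DataFrame({
    'year': [1978, 1978, 1989, 1989, 1989, 2016, 2016],
    'character': ['Superman', 'Superman', 'Batman', 'Batman',
                  'Superman', 'Batman', 'Superman'],
})

# Keep only the two characters, then count roles per (year, character).
heroes = toy_cast[toy_cast.character.isin(['Superman', 'Batman'])]
counts = heroes.groupby(['year', 'character']).size().unstack(fill_value=0)

# A year with equal counts belongs to neither group.
superman_years = int((counts.Superman > counts.Batman).sum())
batman_years = int((counts.Batman > counts.Superman).sum())
print("Superman:", superman_years, "Batman:", batman_years)
```

Passing `fill_value=0` to `unstack` fills missing year/character pairs directly, so no separate `fillna(0)` step is needed.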

In [ ]:

Plot the number of actor roles each year and the number of actress roles each year over the history of film.


In [51]:
cast.groupby(['year','type']).size().unstack().plot()


Out[51]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f71c41e4c50>

In [ ]:

Plot the number of actor roles each year and the number of actress roles each year, but this time as a kind='area' plot.


In [52]:
cast.groupby(['year','type']).size().unstack().plot(kind='area')


Out[52]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f71c4166748>

In [ ]:

Plot the difference between the number of actor roles each year and the number of actress roles each year over the history of film.


In [55]:
foo = cast.groupby(['year','type']).size().unstack().fillna(0)

In [60]:
foo['diff'] = foo['actor']-foo['actress']
foo['diff'].plot()


Out[60]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f71c40f48d0>

Plot the fraction of roles that have been 'actor' roles each year in the history of film.


In [61]:
foo['totalRoles'] = foo['actor']+foo['actress']
foo['manFrac'] = foo['actor']/foo['totalRoles']
foo['manFrac'].plot()


Out[61]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f71c408a278>
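
The fraction calculation can be checked by hand on a tiny table. The sketch below uses an invented `toy_cast` frame (not the real data) and divides the 'actor' column by the row total instead of storing intermediate columns.

```python
import pandas as pd

# Invented rows: 3 actor roles and 1 actress role in 1950, one of each in 2000.
toy_cast = pd.DataFrame({
    'year': [1950, 1950, 1950, 1950, 2000, 2000],
    'type': ['actor', 'actor', 'actor', 'actress', 'actor', 'actress'],
})

# Rows are years, columns are role types, values are role counts.
by_type = toy_cast.groupby(['year', 'type']).size().unstack(fill_value=0)

# Fraction of each year's roles that were 'actor' roles.
actor_frac = by_type['actor'] / by_type.sum(axis=1)
print(actor_frac)
```

Calling `actor_frac.plot()` would then draw the same kind of line as the cell above.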

In [ ]:

Plot the fraction of supporting (n=2) roles that have been 'actor' roles each year in the history of film.


In [68]:
support = cast[cast.n==2]
bar = support.groupby(['year','type']).size().unstack().fillna(0)
bar['totalRoles'] = bar['actor']+bar['actress']
bar['manFrac'] = bar['actor']/bar['totalRoles']
bar['manFrac'].plot()


Out[68]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f71c3fd87b8>

In [ ]:

Build a plot with a line for each rank n=1 through n=3, where the line shows what fraction of that rank's roles were 'actor' roles for each year in the history of film.


In [84]:
# One line per rank n = 1, 2, 3: the fraction of that rank's roles that were 'actor' roles each year.
ranked = cast[cast.n.isin([1, 2, 3])]
counts = ranked.groupby(['year', 'n', 'type']).size().unstack('type').fillna(0)
frac = counts['actor'] / (counts['actor'] + counts['actress'])
# Unstacking n gives one column per rank, so a single plot() call draws all three lines on shared axes.
frac.unstack('n').plot()
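
`Series.plot()` returns a matplotlib Axes object, and Axes objects do not support `+`. One way to overlay several series is to pass the same Axes to each call through the `ax=` parameter. The sketch below uses invented fractions rather than the real data, and the `rank_fracs` name is hypothetical.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Invented per-year 'actor' fractions for three ranks; not real results.
years = [1990, 2000, 2010]
rank_fracs = {
    'n=1': pd.Series([0.80, 0.75, 0.70], index=years),
    'n=2': pd.Series([0.55, 0.50, 0.50], index=years),
    'n=3': pd.Series([0.65, 0.60, 0.60], index=years),
}

ax = None
for label, series in rank_fracs.items():
    # The first call creates the Axes; later calls draw onto it via ax=.
    ax = series.plot(ax=ax, label=label)
ax.legend()
```

In a notebook with `%matplotlib inline`, the `matplotlib.use('Agg')` line can be dropped.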

In [ ]: