In [8]:
%matplotlib inline
import pandas as pd

In [9]:
from IPython.display import HTML
css = open('style-table.css').read() + open('style-notebook.css').read()
HTML('<style>{}</style>'.format(css))


Out[9]:

In [10]:
titles = pd.read_csv('data/titles.csv')
titles.head()


Out[10]:
title year
0 Ligaw na daigdig 1962
1 Sluby ulanskie 1934
2 The House of the Seven Gables 1940
3 Mandala - Il simbolo 2008
4 Shi bian 1958

In [11]:
cast = pd.read_csv('data/cast.csv')
cast.head()


Out[11]:
title year name type character n
0 Suuri illusioni 1985 Homo $ actor Guests 22
1 Gangsta Rap: The Glockumentary 2007 Too $hort actor Himself NaN
2 Menace II Society 1993 Too $hort actor Lew-Loc 27
3 Porndogs: The Adventures of Sadie 2009 Too $hort actor Bosco 3
4 Stop Pepper Palmer 2014 Too $hort actor Himself NaN

In [ ]:

Define a "Superman year" as a year whose films feature more Superman characters than Batman characters. How many years in film history have been Superman years?


In [17]:
c = cast
c = c[(c.character == 'Superman') | (c.character == 'Batman')]
c = c.groupby(['year', 'character']).size()
c = c.unstack()
c = c.fillna(0)
c.head()


Out[17]:
character Batman Superman
year
1938 1 0
1940 1 0
1943 1 0
1948 0 1
1949 2 0

In [18]:
d = c.Superman - c.Batman
print('Superman years:')
print(len(d[d > 0.0]))


Superman years:
13

How many years have been "Batman years", with more Batman characters than Superman characters?


In [19]:
d = c.Superman - c.Batman
print('Batman years:')
print(len(d[d < 0.0]))


Batman years:
23
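
The two counts above rely on the same pattern: count rows per (year, character), pivot the characters into columns, then compare the columns. The following is a minimal sketch of that pattern on a small hand-made table, so the result can be checked by eye. The `toy` frame and its values are hypothetical and are not taken from data/cast.csv.

```python
import pandas as pd

# Hypothetical toy data, not taken from data/cast.csv.
toy = pd.DataFrame({
    'year':      [1978, 1978, 1989, 1989, 1989, 1995],
    'character': ['Superman', 'Superman', 'Batman', 'Batman', 'Superman', 'Batman'],
})

# Count rows per (year, character), then pivot characters into columns.
# fill_value=0 plays the same role as the fillna(0) call used above.
counts = toy.groupby(['year', 'character']).size().unstack(fill_value=0)

# A positive difference means more Superman than Batman roles that year.
diff = counts.Superman - counts.Batman
superman_years = int((diff > 0).sum())
batman_years = int((diff < 0).sum())
print(superman_years, batman_years)
```

Summing a boolean Series, as in `(diff > 0).sum()`, is an alternative to `len(d[d > 0.0])` that should give the same count. Years where the two characters tie are counted in neither total.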

In [ ]:

Plot the number of actor roles each year and the number of actress roles each year over the history of film.


In [23]:
c = cast
c = c.groupby(['year', 'type']).size()
c = c.unstack()
c = c.fillna(0)
c.plot()


Out[23]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f8c70cef438>
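
The year-by-type count table that feeds this plot can also be built with `pd.crosstab`. The sketch below compares the two approaches on hypothetical toy data (not taken from data/cast.csv); it prints the table rather than plotting it, so it runs without matplotlib.

```python
import pandas as pd

# Hypothetical toy data, not taken from data/cast.csv.
toy = pd.DataFrame({
    'year': [1950, 1950, 1950, 1960, 1960],
    'type': ['actor', 'actor', 'actress', 'actor', 'actress'],
})

# Pattern used in the cell above: count, pivot, fill gaps with 0.
by_groupby = toy.groupby(['year', 'type']).size().unstack().fillna(0)

# pd.crosstab builds the same year-by-type count table in one call.
by_crosstab = pd.crosstab(toy['year'], toy['type'])

print(by_crosstab)

# Both tables hold the same counts in the same layout.
same = (by_groupby.values == by_crosstab.values).all()
```

Calling `.plot()` on either table should draw one line per column, as in the cell above.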

In [ ]:

Plot the number of actor roles each year and the number of actress roles each year, but this time as a kind='area' plot.


In [24]:
c.plot(kind='area')


Out[24]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f8cc19da780>

In [ ]:

Plot the difference between the number of actor roles each year and the number of actress roles each year over the history of film.


In [29]:
c = cast
c = c.groupby(['year', 'type']).size()
c = c.unstack('type').fillna(0)
(c.actor - c.actress).plot()


Out[29]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f8cc19c6748>

In [ ]:

Plot the fraction of roles that have been 'actor' roles each year in the history of film.


In [32]:
(c.actor / (c.actor + c.actress)).plot(ylim=[0, 1])


Out[32]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f8cc07f6940>
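
One detail worth checking when computing a fraction like this: a year with no rows for one type produces a missing value after `unstack` unless the gap is filled, and the fraction for that year is then missing too. The sketch below illustrates this on hypothetical toy data (not taken from data/cast.csv).

```python
import pandas as pd

# Hypothetical toy data: 1900 has no actress rows at all.
toy = pd.DataFrame({
    'year': [1900, 1900, 1950, 1950, 1950, 1950],
    'type': ['actor', 'actor', 'actor', 'actress', 'actress', 'actress'],
})

sizes = toy.groupby(['year', 'type']).size()

# Without a fill value, the missing (1900, actress) cell becomes NaN,
# and so does the fraction for 1900.
raw = sizes.unstack('type')
raw_fraction = raw.actor / (raw.actor + raw.actress)

# With fill_value=0, the same year yields a well-defined fraction.
filled = sizes.unstack('type', fill_value=0)
filled_fraction = filled.actor / (filled.actor + filled.actress)

print(raw_fraction)
print(filled_fraction)
```

Whether a gap or a value of 1.0 is the better way to show such a year on the plot is a judgment call; filling with 0 treats "no recorded actress roles" as a true zero.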

In [ ]:

Plot the fraction of supporting (n=2) roles that have been 'actor' roles each year in the history of film.


In [40]:
c = cast[cast.n == 2]
c = c.groupby(['year', 'type']).size()
c = c.unstack('type').fillna(0)
(c.actor / (c.actor + c.actress)).plot(ylim=[0, 1])


Out[40]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f8cc06262b0>
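
The `n` column contains missing values (see the NaN entries in the `cast.head()` output earlier), so it is worth confirming how a filter on `n` treats them. A minimal sketch on hypothetical toy data: comparisons against NaN evaluate to False, so rows without a rank drop out of both an equality and a range filter.

```python
import pandas as pd

# Hypothetical toy data; the names are placeholders, not real cast entries.
toy = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'n':    [1.0, 2.0, float('nan'), 5.0],
})

# Rows with a missing n never satisfy a comparison, so they are excluded.
supporting = toy[toy.n == 2]
top_three = toy[toy.n <= 3]

print(list(supporting.name))
print(list(top_three.name))
```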

In [ ]:

Build a plot with a line for each rank n=1 through n=3, where the line shows what fraction of that rank's roles were 'actor' roles for each year in the history of film.


In [62]:
c = cast
c = c[c.n <= 3]
c = c.groupby(['year', 'type', 'n']).size()
c = c.unstack('type').fillna(0)
r = c.actor / (c.actor + c.actress)
r = r.unstack('n')
r.plot(ylim=[0, 1])


Out[62]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f8cc0355ef0>
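
The cell above unstacks twice: first `type` moves into columns so the fraction can be computed, then `n` moves into columns so each rank becomes its own line. The sketch below traces those two steps on hypothetical toy data (not taken from data/cast.csv) and prints the final table instead of plotting it.

```python
import pandas as pd

# Hypothetical toy data, not taken from data/cast.csv.
toy = pd.DataFrame({
    'year': [2000, 2000, 2000, 2000, 2000, 2000],
    'type': ['actor', 'actress', 'actor', 'actor', 'actress', 'actress'],
    'n':    [1, 1, 1, 2, 3, 3],
})

# One count per (year, type, n); 'type' moves into columns first,
# leaving a (year, n) index with 'actor' and 'actress' columns.
counts = toy.groupby(['year', 'type', 'n']).size().unstack('type', fill_value=0)

# Fraction of actor roles per (year, n), then one column per rank n.
fraction = counts.actor / (counts.actor + counts.actress)
by_rank = fraction.unstack('n')
print(by_rank)
```

The resulting frame has one row per year and one column per rank, which is the shape `.plot()` needs to draw a separate line for each rank.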

In [ ]: