In [1]:
%matplotlib inline
import pandas as pd

In [2]:
from IPython.display import HTML
css = open('style-table.css').read() + open('style-notebook.css').read()
HTML('<style>{}</style>'.format(css))


Out[2]:

In [8]:
titles = pd.read_csv('data/titles.csv')  # DataFrame.from_csv was removed in pandas 1.0
titles.head()


Out[8]:
title year
0 Nijiotoko 1949
1 Rosenstrasse 2003
2 Dilruba Tangewali 1987
3 '68 1988
4 Catherine et Cie 1975

In [7]:
cast = pd.read_csv('data/cast.csv')  # DataFrame.from_csv was removed in pandas 1.0
cast.head()


Out[7]:
title year name type character n
0 Suuri illusioni 1985 Homo $ actor Guests 22
1 Gangsta Rap: The Glockumentary 2007 Too $hort actor Himself NaN
2 Menace II Society 1993 Too $hort actor Lew-Loc 27
3 Porndogs: The Adventures of Sadie 2009 Too $hort actor Bosco 3
4 Stop Pepper Palmer 2014 Too $hort actor Himself NaN

What are the ten most common movie names of all time?


In [21]:
t = titles
t = t.title.value_counts()
t.head(10)


Out[21]:
Hamlet                  19
Carmen                  14
Macbeth                 14
The Three Musketeers    12
Blood Money             11
She                     11
The Outsider            11
Maya                    11
The Promise             10
Anna Karenina           10
dtype: int64

In [ ]:

Which three years of the 1930s saw the most films released?


In [50]:
t = titles
t = t[(t.year >= 1930) & (t.year < 1940)]
t = t.year.value_counts()
t.head(3)


Out[50]:
1937    1184
1936    1121
1938    1117
dtype: int64

In [ ]:

Plot the number of films that have been released each decade over the history of cinema.


In [63]:
t = titles
t = t.groupby(t.year // 10 * 10)
t.size().plot(kind='bar')


Out[63]:
<matplotlib.axes._subplots.AxesSubplot at 0x107855668>

In [ ]:

Plot the number of "Hamlet" films made each decade.


In [83]:
t = titles
t = t[t.title == 'Hamlet']
t = t.groupby(t.year // 10 * 10)
t.size().plot(kind="bar")


Out[83]:
<matplotlib.axes._subplots.AxesSubplot at 0x109424fd0>

In [ ]:

Plot the number of "Rustler" characters in each decade of the history of film.


In [10]:
c = cast
c = c[c.character == 'Rustler']
c = c.groupby(c.year // 10 * 10)
c.size().plot(kind='bar')


Out[10]:
<matplotlib.axes._subplots.AxesSubplot at 0x10d1d8668>

In [ ]:

Plot the number of "Batman" characters each decade.
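One possible approach mirrors the "Rustler" cell: filter on the character name, then group by decade. The sketch below runs on a tiny invented stand-in for `cast` (the rows and the name `batman_per_decade` are assumptions, not part of the notebook's data) so the grouping can be checked by eye.

```python
import pandas as pd

# Toy stand-in for the real `cast` table; every row here is invented.
cast = pd.DataFrame({
    'title': ['Film A', 'Film B', 'Film C', 'Film D'],
    'year': [1943, 1966, 1989, 1992],
    'character': ['Batman', 'Batman', 'Batman', 'Joker'],
})

# Keep only exact "Batman" credits, then bucket years into decades.
c = cast[cast.character == 'Batman']
batman_per_decade = c.groupby(c.year // 10 * 10).size()
print(batman_per_decade)
# In the notebook, batman_per_decade.plot(kind='bar') would draw the chart.
```

With the real data, replacing the toy frame with the `cast` loaded earlier should be the only change needed.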


In [ ]:


In [ ]:

What are the 11 most common character names in movie history?
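A minimal sketch of one way to answer this, reusing the `value_counts()` pattern from the movie-title question. The toy `cast` frame and the name `top_characters` are invented for illustration.

```python
import pandas as pd

# Toy stand-in for the real `cast` table; these character names are invented.
cast = pd.DataFrame({
    'character': ['Himself', 'Himself', 'Himself', 'Doctor', 'Doctor', 'Nurse'],
})

# value_counts() sorts by frequency, so head(11) keeps the 11 most common
# (or fewer, when the table has fewer distinct names, as here).
top_characters = cast.character.value_counts().head(11)
print(top_characters)
```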


In [ ]:


In [ ]:

Who are the 10 people most often credited as "Herself" in film history?
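One way this might be approached: filter on the `character` column, then count how often each `name` appears. The rows below are placeholders ("Person A" and so on are not real credits), and `top_herself` is a name chosen for this sketch.

```python
import pandas as pd

# Toy stand-in for the real `cast` table; names and credits are invented.
cast = pd.DataFrame({
    'name': ['Person A', 'Person A', 'Person B', 'Person C', 'Person A'],
    'character': ['Herself', 'Herself', 'Herself', 'Himself', 'Nurse'],
})

# Keep rows credited exactly as "Herself", then count credits per person.
c = cast[cast.character == 'Herself']
top_herself = c['name'].value_counts().head(10)
print(top_herself)
```

Swapping the string for 'Himself' gives the matching query for the next question.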


In [ ]:


In [ ]:

Who are the 10 people most often credited as "Himself" in film history?


In [ ]:


In [ ]:

Which actors or actresses appeared in the most movies in the year 1945?
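A hedged sketch: since one person can hold several roles in the same film, it may be worth dropping duplicate (title, year, name) rows before counting, so that the result counts movies rather than credits. Whether the exercise intends that refinement is an assumption; the toy rows and the name `busiest_1945` are invented.

```python
import pandas as pd

# Toy stand-in for the real `cast` table; every row here is invented.
cast = pd.DataFrame({
    'title': ['Film A', 'Film A', 'Film B', 'Film C', 'Film D'],
    'year': [1945, 1945, 1945, 1945, 1946],
    'name': ['Person A', 'Person A', 'Person A', 'Person B', 'Person B'],
    'character': ['Twin 1', 'Twin 2', 'Guard', 'Clerk', 'Clerk'],
})

c = cast[cast.year == 1945]
# Count each film once per person, even if they played two roles in it.
c = c.drop_duplicates(subset=['title', 'year', 'name'])
busiest_1945 = c['name'].value_counts().head(10)
print(busiest_1945)
```

Changing the year in the filter adapts the same query to 1985.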


In [ ]:


In [ ]:

Which actors or actresses appeared in the most movies in the year 1985?


In [ ]:


In [ ]:

Plot how many roles Mammootty has played in each year of his career.
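A possible sketch: filter by `name`, then group by `year` so that the result is already in chronological order for plotting. The years and titles below are invented placeholders, not Mammootty's actual filmography, and `roles_per_year` is a name chosen here.

```python
import pandas as pd

# Toy stand-in for the real `cast` table; titles and years are invented.
cast = pd.DataFrame({
    'title': ['Film A', 'Film B', 'Film C', 'Film D'],
    'year': [1985, 1985, 1986, 1986],
    'name': ['Mammootty', 'Mammootty', 'Mammootty', 'Person B'],
})

c = cast[cast['name'] == 'Mammootty']
# groupby sorts its keys, so the years come out in ascending order.
roles_per_year = c.groupby('year').size()
print(roles_per_year)
# In the notebook, roles_per_year.plot() would draw the line chart.
```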


In [ ]:


In [ ]:

What are the 10 most frequent roles that start with the phrase "Patron in"?
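One way to express a prefix match is the `.str.startswith()` accessor. The sketch below passes `na=False` on the assumption that some `character` values may be missing; the toy rows and variable names are invented.

```python
import pandas as pd

# Toy stand-in for the real `cast` table; these role names are invented.
cast = pd.DataFrame({
    'character': ['Patron in Bar', 'Patron in Bar', 'Patron in Diner',
                  'Bar Patron', None],
})

# na=False treats missing character names as non-matches.
is_patron = cast.character.str.startswith('Patron in', na=False)
top_patron_roles = cast[is_patron].character.value_counts().head(10)
print(top_patron_roles)
```

The same mask with the prefix 'Science' would cover the following question.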


In [ ]:


In [ ]:

What are the 10 most frequent roles that start with the word "Science"?


In [ ]:


In [ ]:

Plot the n-values of the roles that Judi Dench has played over her career.
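A rough sketch of how the data for such a plot might be prepared: select the person's rows, drop credits with no `n` (the sample output above shows `NaN` values in that column), and sort by year. The rows below are invented and do not reflect Judi Dench's real credits.

```python
import pandas as pd

# Toy stand-in for the real `cast` table; every row here is invented.
cast = pd.DataFrame({
    'title': ['Film A', 'Film B', 'Film C', 'Film D'],
    'year': [1990, 1975, 2001, 1999],
    'name': ['Judi Dench', 'Judi Dench', 'Judi Dench', 'Person B'],
    'n': [2.0, 5.0, None, 1.0],
})

c = cast[cast['name'] == 'Judi Dench']
# Rows without a billing position cannot be plotted, so drop them.
c = c.dropna(subset=['n']).sort_values('year')
print(c[['year', 'n']])
# In the notebook, c.plot(x='year', y='n', kind='scatter') would draw it.
```

The same steps should apply to any other performer by changing the name in the filter.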


In [ ]:


In [ ]:

Plot the n-values of Cary Grant's roles through his career.


In [ ]:


In [ ]:

Plot the n-values of the roles that Sidney Poitier has played over the years.


In [ ]:


In [ ]:

How many leading (n=1) roles were available to actors, and how many to actresses, in the 1950s?
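A minimal sketch, assuming that "available" can be approximated by counting credited rows with `n == 1` in the 1950s and splitting them by the `type` column. The 'actress' value is assumed to exist alongside the 'actor' value shown in the sample output; the toy rows and `leading_by_type` are invented.

```python
import pandas as pd

# Toy stand-in for the real `cast` table; every row here is invented.
cast = pd.DataFrame({
    'year': [1951, 1955, 1959, 1953, 1961, 1950],
    'type': ['actor', 'actor', 'actress', 'actor', 'actress', 'actress'],
    'n': [1.0, 1.0, 1.0, 2.0, 1.0, None],
})

# 1950s rows with top billing; a NaN in n never equals 1, so it is excluded.
c = cast[(cast.year // 10 == 195) & (cast.n == 1)]
leading_by_type = c.groupby('type').size()
print(leading_by_type)
```

Changing the filter to `cast.n == 2` gives the supporting-role variant.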


In [ ]:


In [ ]:

How many supporting (n=2) roles were available to actors, and how many to actresses, in the 1950s?


In [ ]:


In [ ]: