In [68]:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt

# matplotlib ships the 'ggplot' style itself, so the separate mpltools
# package (whose style module was folded into matplotlib) is not needed.
plt.style.use('ggplot')

In [3]:
from IPython.core.display import HTML
css = open('style-table.css').read() + open('style-notebook.css').read()
HTML('<style>{}</style>'.format(css))


Out[3]:

In [4]:
# DataFrame.from_csv was deprecated and later removed; read_csv is the replacement.
titles = pd.read_csv('data/titles.csv', index_col=None)
titles.head()


Out[4]:
title year
0 Night Walker 2017
1 Black Devil Doll 2007
2 Sedmaya pulya 1973
3 The Gentleman from Louisiana 1936
4 Agente XU 777 1963

In [5]:
# DataFrame.from_csv was deprecated and later removed; read_csv is the replacement.
cast = pd.read_csv('data/cast.csv', index_col=None)
cast.head()


Out[5]:
title year name type character n
0 Suuri illusioni 1985 Homo $ actor Guests 22
1 Gangsta Rap: The Glockumentary 2007 Too $hort actor Himself NaN
2 Menace II Society 1993 Too $hort actor Lew-Loc 27
3 Porndogs: The Adventures of Sadie 2009 Too $hort actor Bosco 3
4 Stop Pepper Palmer 2014 Too $hort actor Himself NaN

What are the ten most common movie names of all time?


In [17]:
titles['title'].value_counts().head()


Out[17]:
Hamlet                  19
Macbeth                 14
Carmen                  14
The Three Musketeers    12
Maya                    11
dtype: int64
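
Note that the cell above calls head() with no argument, which returns five rows, while the question asks for ten. A minimal sketch of passing the count explicitly follows; toy_titles is a made-up stand-in for data/titles.csv, so its counts are not the real ones.

```python
import pandas as pd

# Hypothetical stand-in for data/titles.csv (15 rows, invented values)
toy_titles = pd.DataFrame({
    'title': ['Hamlet'] * 3 + ['Macbeth'] * 2 + [f'Film {i}' for i in range(10)],
    'year': list(range(1950, 1965)),
})

# head() defaults to 5 rows; pass the count explicitly to get the top ten
top_ten = toy_titles['title'].value_counts().head(10)
print(top_ten)
```

On the real data the same call would be titles['title'].value_counts().head(10).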

In [12]:
titles['title'][titles.title == 'xXx']


Out[12]:
86898    xXx
Name: title, dtype: object

Which three years of the 1930s saw the most films released?


In [63]:
moviesOf1930s = titles[(titles.year >= 1930) & (titles.year < 1940)]

moviesOf1930s.year.value_counts().sort_index().plot()


Out[63]:
<matplotlib.axes._subplots.AxesSubplot at 0x1172d17f0>
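
The plot shows the trend across the decade, but the question asks for three specific years. One way to read them off directly, sketched here on a made-up frame (toy_titles and its values are assumptions, not the real data), is to leave value_counts() in its default descending order and take the first three rows:

```python
import pandas as pd

# Hypothetical stand-in for data/titles.csv (invented values)
toy_titles = pd.DataFrame({
    'title': [f'Film {i}' for i in range(17)],
    'year': [1929] * 5 + [1931] + [1935] * 2 + [1936] * 3 + [1937] * 4 + [1940] * 2,
})

# Keep only the 1930s, exactly as in the cell above
movies_1930s = toy_titles[(toy_titles.year >= 1930) & (toy_titles.year < 1940)]

# value_counts() sorts by count, largest first, so head(3) is the top three years
top_three = movies_1930s.year.value_counts().head(3)
print(top_three)
```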

In [ ]:

Plot the number of films that have been released each decade over the history of cinema.


In [78]:
t = titles
t.groupby(t.year // 10 * 10).size().plot(kind='bar')
del t


Plot the number of "Hamlet" films made each decade.


In [77]:
hFilms = titles[titles.title == 'Hamlet']
hFilms.groupby(hFilms.year // 10 * 10).size().plot(kind='bar')
del hFilms



In [81]:
# List the style names matplotlib knows about.
# print is a function in Python 3, and the styles live at plt.style.
print(plt.style.available)


Plot the number of "Rustler" characters in each decade of the history of film.
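
One possible approach reuses the decade trick from the "Hamlet" cell, applied to the cast table instead of titles. The sketch below runs on a made-up frame (toy_cast and all of its rows are assumptions) and stops at the per-decade counts so that it does not need a plotting backend:

```python
import pandas as pd

# Hypothetical stand-in for data/cast.csv (invented rows)
toy_cast = pd.DataFrame({
    'title': ['A', 'B', 'C', 'D', 'E'],
    'year': [1934, 1938, 1941, 1947, 1952],
    'name': ['P', 'Q', 'R', 'S', 'T'],
    'type': ['actor'] * 5,
    'character': ['Rustler', 'Rustler', 'Sheriff', 'Rustler', 'Rustler'],
    'n': [5, 7, 1, 9, 12],
})

# Keep only roles whose character name is exactly "Rustler"
rustlers = toy_cast[toy_cast.character == 'Rustler']

# year // 10 * 10 maps 1934 -> 1930, 1947 -> 1940, and so on
per_decade = rustlers.groupby(rustlers.year // 10 * 10).size()
print(per_decade)
```

Calling per_decade.plot(kind='bar') would then draw the chart, as in the earlier cells. The same pattern should work for "Batman" by changing the character string.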


In [ ]:


In [ ]:

Plot the number of "Batman" characters each decade.


In [ ]:


In [ ]:

What are the 11 most common character names in movie history?


In [ ]:


In [ ]:

Who are the 10 people most often credited as "Herself" in film history?
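
A sketch of one way to answer this: filter on the character column, then count the name column. The frame below is invented for illustration (toy_cast and its names are assumptions).

```python
import pandas as pd

# Hypothetical stand-in for data/cast.csv (invented rows)
toy_cast = pd.DataFrame({
    'name': ['Ann', 'Ann', 'Ann', 'Ann', 'Bea', 'Bea', 'Cy'],
    'character': ['Herself', 'Herself', 'Herself', 'Nurse',
                  'Herself', 'Herself', 'Himself'],
})

# Rows credited exactly as "Herself"
herself = toy_cast[toy_cast.character == 'Herself']

# Count credits per person and keep the ten most frequent
top_herself = herself.name.value_counts().head(10)
print(top_herself)
```

Swapping the string for 'Himself' gives the corresponding answer for the next question.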


In [ ]:


In [ ]:

Who are the 10 people most often credited as "Himself" in film history?


In [ ]:


In [ ]:

Which actors or actresses appeared in the most movies in the year 1945?


In [ ]:


In [ ]:

Which actors or actresses appeared in the most movies in the year 1985?


In [ ]:


In [ ]:

Plot how many roles Mammootty has played in each year of his career.


In [ ]:


In [ ]:

What are the 10 most frequent roles that start with the phrase "Patron in"?
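
Prefix matching can be done with the string accessor. The sketch below uses Series.str.startswith on a made-up frame (toy_cast and its rows are assumptions); na=False is passed so that a missing character name counts as a non-match rather than producing NaN in the mask.

```python
import pandas as pd

# Hypothetical stand-in for data/cast.csv (invented rows, one missing value)
toy_cast = pd.DataFrame({
    'name': ['P', 'Q', 'R', 'S', 'T', 'U'],
    'character': ['Patron in Bar', 'Patron in Bar', 'Patron in Diner',
                  'Bar Patron', 'Patron', None],
})

# True only where the character name begins with the phrase
mask = toy_cast.character.str.startswith('Patron in', na=False)

# Count the matching role names and keep the ten most frequent
top_patrons = toy_cast[mask].character.value_counts().head(10)
print(top_patrons)
```

The same mask with 'Science' as the prefix would cover the next question.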


In [ ]:


In [ ]:

What are the 10 most frequent roles that start with the word "Science"?


In [ ]:


In [ ]:

Plot the n-values of the roles that Judi Dench has played over her career.
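
A possible starting point is a small helper that selects one performer's roles, drops credits with no n value (the cast table shows NaN for some rows), and orders them by year. The helper name n_values_by_year, the toy frame, and the performer names in it are all assumptions made for illustration.

```python
import pandas as pd


def n_values_by_year(cast, name):
    """Return the year and n columns for one performer, sorted by year."""
    roles = cast[cast.name == name]
    roles = roles.dropna(subset=['n'])  # unranked credits cannot be plotted
    return roles.sort_values('year')[['year', 'n']]


# Hypothetical stand-in for data/cast.csv (invented rows)
toy_cast = pd.DataFrame({
    'name': ['Performer X'] * 4 + ['Performer Y'],
    'year': [1990, 1985, 2001, 1995, 1990],
    'n': [3.0, None, 1.0, 2.0, 1.0],
})

result = n_values_by_year(toy_cast, 'Performer X')
print(result)
```

On the real data, n_values_by_year(cast, 'Judi Dench').plot(x='year', y='n', kind='scatter') would draw one point per ranked role; the same call with another name covers the Cary Grant and Sidney Poitier questions.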


In [ ]:


In [ ]:

Plot the n-values of Cary Grant's roles through his career.


In [ ]:


In [ ]:

Plot the n-value of the roles that Sidney Poitier has acted over the years.


In [ ]:


In [ ]:

How many leading (n=1) roles were available to actors, and how many to actresses, in the 1950s?
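
One way to approach this, assuming that a credited n=1 row is a fair proxy for an "available" leading role, is to filter to the decade and to n == 1, then count rows per value of the type column. The frame below is made up (toy_cast and its rows are assumptions).

```python
import pandas as pd

# Hypothetical stand-in for data/cast.csv (invented rows)
toy_cast = pd.DataFrame({
    'year': [1950, 1955, 1959, 1953, 1960, 1951],
    'type': ['actor', 'actress', 'actor', 'actor', 'actor', 'actor'],
    'n': [1, 1, 1, 2, 1, 1],
})

# Roles from 1950 through 1959
fifties = toy_cast[toy_cast.year // 10 == 195]

# Leading roles only, counted separately for actors and actresses
leads = fifties[fifties.n == 1]
lead_counts = leads.groupby('type').size()
print(lead_counts)
```

Changing the filter to fifties.n == 2 gives the supporting-role counts asked for in the next question.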


In [ ]:


In [ ]:

How many supporting (n=2) roles were available to actors, and how many to actresses, in the 1950s?


In [ ]:


In [82]:
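c = cast
# Keep only George Clooney's credits
c = c[c.name == "George Clooney"]
# Count credits per year, then draw a histogram of those yearly counts
# (the distribution of "credits in a year", not a year-by-year timeline)
c.groupby(['year']).size().plot(kind='hist')
del c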



In [ ]: