In [5]:
%matplotlib inline
import pandas as pd

In [6]:
from IPython.core.display import HTML
css = open('style-table.css').read() + open('style-notebook.css').read()
HTML('<style>{}</style>'.format(css))


Out[6]:

In [7]:
titles = pd.read_csv('data/titles.csv')
titles.head()


Out[7]:
title year
0 Nijiotoko 1949
1 Rosenstrasse 2003
2 Dilruba Tangewali 1987
3 '68 1988
4 Catherine et Cie 1975

In [8]:
cast = pd.read_csv('data/cast.csv')

In [49]:
cast.head()


Out[49]:
title year name type character n
0 Suuri illusioni 1985 Homo $ actor Guests 22
1 Gangsta Rap: The Glockumentary 2007 Too $hort actor Himself NaN
2 Menace II Society 1993 Too $hort actor Lew-Loc 27
3 Porndogs: The Adventures of Sadie 2009 Too $hort actor Bosco 3
4 Stop Pepper Palmer 2014 Too $hort actor Himself NaN

How many movies are listed in the titles dataframe?


In [11]:
titles.count()


Out[11]:
title    212811
year     212811
dtype: int64

212811
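
`count()` reports the number of non-null values in each column, so it equals the row count only when nothing is missing (as is the case for `titles` above). A minimal sketch of the difference, using a small made-up frame rather than the real data:

```python
import pandas as pd

# Hypothetical toy data: one title has no recorded year.
toy = pd.DataFrame({
    'title': ['A', 'B', 'C'],
    'year': [1950, None, 1960],
})

row_count = len(toy)       # number of rows, missing values included
per_column = toy.count()   # non-null values in each column

print(row_count)
print(per_column)
```

When only the number of rows matters, `len(df)` avoids having to read one figure per column.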

What are the earliest two films listed in the titles dataframe?


In [16]:
titles.sort_values('year').head()


Out[16]:
title year
211368 Miss Jerry 1894
71278 Reproduction of the Corbett and Fitzsimmons Fight 1897
137097 Sharkey-McCoy Fight Reproduced in 10 Rounds 1899
27632 Jeffries-Sharkey Contest 1899
144034 Battle of Jeffries and Sharkey for Championshi... 1899

Miss Jerry (1894) and Reproduction of the Corbett and Fitzsimmons Fight (1897)
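
Sorting the whole frame and taking the head works; `nsmallest` is an alternative that asks for the earliest rows directly. A sketch on hypothetical data (the titles below are invented for illustration):

```python
import pandas as pd

# Hypothetical toy data, deliberately out of order.
toy = pd.DataFrame({
    'title': ['Late Film', 'First Film', 'Second Film'],
    'year': [1950, 1894, 1897],
})

# Two ways to get the two earliest rows.
earliest_two = toy.sort_values('year').head(2)
same_rows = toy.nsmallest(2, 'year')

print(earliest_two)
print(same_rows)
```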

How many movies have the title "Hamlet"?


In [21]:
t = titles
t[t.title == 'Hamlet'].count()


Out[21]:
title    19
year     19
dtype: int64

19

How many movies are titled "North by Northwest"?


In [23]:
t = titles
t[t.title == "North by Northwest"]


Out[23]:
title year
18689 North by Northwest 1959

1

When was the first movie titled "Hamlet" made?


In [26]:
t = titles
t[t.title == 'Hamlet'].sort_values('year').head()


Out[26]:
title year
108190 Hamlet 1910
101859 Hamlet 1911
151019 Hamlet 1913
33577 Hamlet 1921
194828 Hamlet 1948

1910

List all of the "Treasure Island" movies from earliest to most recent.


In [30]:
t = titles
t[t.title == "Treasure Island"].sort_values('year')


Out[30]:
title year
183926 Treasure Island 1918
101232 Treasure Island 1920
202143 Treasure Island 1934
36428 Treasure Island 1950
174138 Treasure Island 1972
180535 Treasure Island 1973
188589 Treasure Island 1985
165726 Treasure Island 1999

How many movies were made in the year 1950?


In [32]:
t = titles
t[t.year == 1950].count()


Out[32]:
title    1033
year     1033
dtype: int64

1033

How many movies were made in the year 1960?


In [36]:
t = titles
t[t.year == 1960].count()


Out[36]:
title    1423
year     1423
dtype: int64

1423

How many movies were made from 1950 through 1959?


In [41]:
t = titles
t[(t.year >= 1950) & (t.year <= 1959)].count()


Out[41]:
title    12051
year     12051
dtype: int64

12051
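
The two-sided comparison above can also be written with `Series.between`, which is inclusive on both ends by default, or by rounding each year down to its decade. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical toy data straddling the 1950s.
toy = pd.DataFrame({
    'title': ['a', 'b', 'c', 'd', 'e'],
    'year': [1949, 1950, 1955, 1959, 1960],
})

# between() includes both endpoints unless told otherwise.
in_fifties = toy[toy.year.between(1950, 1959)]

# Integer-divide by 10 and multiply back to get the decade's first year.
by_decade = toy[toy.year // 10 * 10 == 1950]

print(len(in_fifties), len(by_decade))
```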

In what years has a movie titled "Batman" been released?


In [44]:
t = titles
t[t.title == 'Batman']


Out[44]:
title year
130209 Batman 1943
192955 Batman 1989

How many roles were there in the movie "Inception"?


In [103]:
c = cast
c = len(c[c.title == 'Inception'])
c


Out[103]:
72

72

How many roles in the movie "Inception" are NOT ranked by an "n" value?


In [104]:
c = cast
c = c[c.title == 'Inception']
c = c[c.n.isnull()]
len(c)


Out[104]:
21

21

But how many roles in the movie "Inception" did receive an "n" value?


In [95]:
c = cast
c = c[c.title == 'Inception']
c = c[c.n.notnull()]
len(c)


Out[95]:
51

51
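
The last three answers are consistent: 21 unranked plus 51 ranked roles gives the 72 roles counted earlier. Because `isnull()` and `notnull()` return boolean Series, summing them counts the matches without building a filtered frame. A sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy cast: two roles have no "n" rank.
toy = pd.DataFrame({
    'character': ['Lead', 'Second', 'Extra 1', 'Extra 2', 'Third'],
    'n': [1.0, 2.0, np.nan, np.nan, 3.0],
})

unranked = toy.n.isnull().sum()   # True counts as 1 when summed
ranked = toy.n.notnull().sum()

print(unranked, ranked, len(toy))
```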

Display the cast of "North by Northwest" in their correct "n"-value order, ignoring roles that did not earn a numeric "n" value.


In [110]:
c = cast
c = c[c.title == "North by Northwest"]
c = c[c.n.notnull()]
c.sort_values('n')


Out[110]:
title year name type character n
768520 North by Northwest 1959 Cary Grant actor Roger O. Thornhill 1
3064038 North by Northwest 1959 Eva Marie Saint actress Eve Kendall 2
1284677 North by Northwest 1959 James Mason actor Phillip Vandamm 3
2758169 North by Northwest 1959 Jessie Royce Landis actress Clara Thornhill 4
313200 North by Northwest 1959 Leo G. Carroll actor The Professor 5
2667606 North by Northwest 1959 Josephine Hutchinson actress Mrs. Townsend 6
1495479 North by Northwest 1959 Philip Ober actor Lester Townsend 7
1123585 North by Northwest 1959 Martin Landau actor Leonard 8
2153857 North by Northwest 1959 Adam Williams actor Valerian 9
1597034 North by Northwest 1959 Edward Platt actor Victor Larrabee 10
586825 North by Northwest 1959 Robert Ellenstein actor Licht 11
2020534 North by Northwest 1959 Les Tremayne actor Auctioneer 12
408497 North by Northwest 1959 Philip Coolidge actor Dr. Cross 13
1330774 North by Northwest 1959 Patrick McVey actor Sergeant Flamm 14
179711 North by Northwest 1959 Edward Binns actor Captain Junket 15
1220657 North by Northwest 1959 Ken Lynch actor Charley - Chicago Policeman 16

Display the entire cast, in "n"-order, of the 1972 film "Sleuth".


In [112]:
c = cast
c = c[(c.title == "Sleuth") & (c.year == 1972)]
c.sort_values('n')


Out[112]:
title year name type character n
1504207 Sleuth 1972 Laurence Olivier actor Andrew Wyke 1
286557 Sleuth 1972 Michael Caine actor Milo Tindle 2
328520 Sleuth 1972 Alec Cawthorne actor Inspector Doppler 3
1291935 Sleuth 1972 John (II) Matthews actor Detective Sergeant Tarrant 4
2391225 Sleuth 1972 Eve (III) Channing actress Marguerite Wyke 5
1277367 Sleuth 1972 Teddy Martin actor Police Constable Higgs 6

Now display the entire cast, in "n"-order, of the 2007 version of "Sleuth".


In [115]:
c = cast 
c = c[(c.title == 'Sleuth') & (c.year == 2007)]
c.sort_values('n')


Out[115]:
title year name type character n
286558 Sleuth 2007 Michael Caine actor Andrew 1
1139941 Sleuth 2007 Jude Law actor Milo 2
1592307 Sleuth 2007 Harold Pinter actor Man on T.V. 3
227168 Sleuth 2007 Kenneth Branagh actor Other Man on T.V. NaN
328521 Sleuth 2007 Alec (II) Cawthorne actor Inspector Doppler NaN
2391224 Sleuth 2007 Eve (II) Channing actress Marguerite Wyke NaN
2939217 Sleuth 2007 Carmel O'Sullivan actress Maggie NaN
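
The unranked roles appear at the bottom because `sort_values` places NaN last by default; the `na_position` argument changes that. A sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy cast with one unranked role.
toy = pd.DataFrame({
    'name': ['x', 'y', 'z'],
    'n': [2.0, np.nan, 1.0],
})

nan_last = toy.sort_values('n')                        # default: NaN at the end
nan_first = toy.sort_values('n', na_position='first')  # NaN at the start

print(nan_last)
print(nan_first)
```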

How many roles were credited in the silent 1921 version of Hamlet?


In [118]:
c = cast
c = c[(c.title == 'Hamlet') & (c.year == 1921)]
len(c.n)


Out[118]:
9

9

How many roles were credited in Branagh’s 1996 Hamlet?


In [119]:
c = cast
c = c[(c.title == 'Hamlet') & (c.year == 1996)]
len(c.n)


Out[119]:
55

55

How many "Hamlet" roles have been listed in all film credits through history?


In [122]:
c = cast
c = c[c.character == 'Hamlet']
len(c)


Out[122]:
81

81

How many people have played an "Ophelia"?


In [123]:
c = cast
c = c[c.character == 'Ophelia']
len(c)


Out[123]:
96

96
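
Note that `len(c)` counts matching rows, that is, credited roles. A performer who played the character in more than one film is counted once per film. If the question is read as asking for distinct performers, `nunique()` on the name column is one way to count them. A sketch on hypothetical data (the names are invented):

```python
import pandas as pd

# Hypothetical toy cast: "Ann" plays Ophelia in two different films.
toy = pd.DataFrame({
    'title': ['Hamlet', 'Hamlet', 'Hamlet'],
    'year': [1948, 1964, 1990],
    'name': ['Ann', 'Ann', 'Bea'],
    'character': ['Ophelia', 'Ophelia', 'Ophelia'],
})

ophelias = toy[toy.character == 'Ophelia']
roles = len(ophelias)              # one per credited role
people = ophelias.name.nunique()   # one per distinct name

print(roles, people)
```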

How many people have played a role called "The Dude"?


In [125]:
c = cast
c = c[c.character == "The Dude"]
len(c)


Out[125]:
16

16

How many people have played a role called "The Stranger"?


In [126]:
c = cast 
c = c[c.character == 'The Stranger']
len(c)


Out[126]:
190

190

How many roles has Sidney Poitier played throughout his career?


In [127]:
c = cast
c = c[c.name == "Sidney Poitier"]
len(c)


Out[127]:
43

43

How many roles has Judi Dench played?


In [6]:
c = cast 
c = c[c.name == "Judi Dench"]
len(c)


Out[6]:
51

51

List the supporting roles (having n=2) played by Cary Grant in the 1940s, in order by year.


In [15]:
c = cast
c = c[(c.name == 'Cary Grant')] 
c = c[(c.year >= 1940) & (c.year < 1950)]
c = c[c.n == 2]
c.sort_values('year')


Out[15]:
title year name type character n
768517 My Favorite Wife 1940 Cary Grant actor Nick 2
768527 Penny Serenade 1941 Cary Grant actor Roger Adams 2

List the leading roles that Cary Grant played in the 1940s in order by year.


In [20]:
c = cast
c = c[c.name == 'Cary Grant']
c = c[(c.year >= 1940) & (c.year < 1950)]
c = c[c.n == 1]
c.sort_values('year')


Out[20]:
title year name type character n
768544 The Philadelphia Story 1940 Cary Grant actor C. K. Dexter Haven 1
768542 The Howards of Virginia 1940 Cary Grant actor Matt Howard 1
768499 His Girl Friday 1940 Cary Grant actor Walter Burns 1
768532 Suspicion 1941 Cary Grant actor Johnnie 1
768546 The Talk of the Town 1942 Cary Grant actor Leopold Dilg 1
768523 Once Upon a Honeymoon 1942 Cary Grant actor Patrick 'Pat' O'Toole 1
768490 Destination Tokyo 1943 Cary Grant actor Capt. Cassidy 1
768515 Mr. Lucky 1943 Cary Grant actor Joe Adams 1
768516 Mr. Lucky 1943 Cary Grant actor Joe Bascopolous 1
768482 Arsenic and Old Lace 1944 Cary Grant actor Mortimer Brewster 1
768524 Once Upon a Time 1944 Cary Grant actor Jerry Flynn 1
768519 None But the Lonely Heart 1944 Cary Grant actor Ernie Mott 1
768518 Night and Day 1946 Cary Grant actor Cole Porter 1
768521 Notorious 1946 Cary Grant actor Devlin 1
768538 The Bachelor and the Bobby-Soxer 1947 Cary Grant actor Dick 1
768539 The Bishop's Wife 1947 Cary Grant actor Dudley 1
768514 Mr. Blandings Builds His Dream House 1948 Cary Grant actor Jim Blandings 1
768494 Every Girl Should Be Married 1948 Cary Grant actor Dr. Madison Brown 1
768503 I Was a Male War Bride 1949 Cary Grant actor Capt. Henri Rochard 1

How many roles were available for actors in the 1950s?


In [26]:
c = cast
c = c[(c.year >= 1950) & (c.year < 1960)]
c = c[c.type == 'actor']
len(c.n)


Out[26]:
147404

147404
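
Filtering to the decade once and then calling `value_counts()` on the type column gives the actor and actress totals in a single step. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical toy cast spanning 1949 to 1960.
toy = pd.DataFrame({
    'year': [1949, 1951, 1955, 1958, 1960],
    'type': ['actor', 'actor', 'actress', 'actor', 'actress'],
})

# Keep only the 1950s, then tally roles by type.
fifties = toy[(toy.year >= 1950) & (toy.year < 1960)]
counts = fifties['type'].value_counts()

print(counts)
```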

How many roles were available for actresses in the 1950s?


In [30]:
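c = cast
c = c[(c.year >= 1950) & (c.year < 1960)]
c = c[c.type == 'actress']
len(c)

The recorded run of this cell omitted the year filter and returned 1060867, which is the number of actress roles across all years; the 1950s figure is not recorded here.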

How many leading roles (n=1) were available from the beginning of film history through 1980?


In [35]:
c = cast
c = c[c.year <= 1980]
c = c[c.n == 1]
c.count()


Out[35]:
title        61285
year         61285
name         61285
type         61285
character    61285
n            61285
dtype: int64

61285

How many non-leading roles were available from the beginning of film history through 1980?


In [36]:
c = cast
c = c[c.year <= 1980]
c = c[c.n > 1]
c.count()


Out[36]:
title        630932
year         630932
name         630932
type         630932
character    630932
n            630932
dtype: int64

630932

How many roles through 1980 were minor enough that they did not warrant a numeric "n" rank?


In [43]:
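c = cast
c = c[c.year <= 1980]
c = c[c.n.isnull()]
len(c)

The recorded run of this cell omitted the year filter and returned 1229941, which is the number of unranked roles across all years; the figure through 1980 is not recorded here.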

