In [3]:
%matplotlib inline
import pandas as pd

In [4]:
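from IPython.display import HTML
# Load the notebook's custom table and page styles and inject them as a <style> block
with open('style-table.css') as f1, open('style-notebook.css') as f2:
    css = f1.read() + f2.read()
HTML('<style>{}</style>'.format(css))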


Out[4]:

In [5]:
titles = pd.read_csv('data/titles.csv')
titles.head()


Out[5]:
                           title  year
0               Ligaw na daigdig  1962
1                 Sluby ulanskie  1934
2  The House of the Seven Gables  1940
3           Mandala - Il simbolo  2008
4                       Shi bian  1958

In [6]:
cast = pd.read_csv('data/cast.csv')
cast.head()


Out[6]:
                               title  year       name   type  character    n
0                    Suuri illusioni  1985     Homo $  actor     Guests   22
1     Gangsta Rap: The Glockumentary  2007  Too $hort  actor    Himself  NaN
2                  Menace II Society  1993  Too $hort  actor    Lew-Loc   27
3  Porndogs: The Adventures of Sadie  2009  Too $hort  actor      Bosco    3
4                 Stop Pepper Palmer  2014  Too $hort  actor    Himself  NaN

In [ ]:

Using groupby(), plot the number of films that have been released each decade in the history of cinema.


In [7]:
t = titles
t.groupby(t.year // 10 * 10).size().plot(kind='bar')


Out[7]:
(bar chart: number of films released per decade)
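A minimal sketch of the decade bucketing used above, run on a hypothetical six-row stand-in for `titles` (the rows are made up, not taken from `data/titles.csv`). Floor division by 10 drops the last digit of the year, and multiplying by 10 turns it back into a decade label, so every year from 1950 through 1959 lands in the 1950 group:

```python
import pandas as pd

# Hypothetical toy stand-in for the titles table (not the real data set)
titles = pd.DataFrame({
    'title': ['A', 'B', 'C', 'D', 'E', 'F'],
    'year': [1962, 1934, 1940, 2008, 1958, 1951],
})

# 1958 // 10 -> 195, then * 10 -> 1950: the decade the film belongs to
decade = titles.year // 10 * 10

# size() counts rows per group; groupby() sorts the decade keys ascending
per_decade = titles.groupby(decade).size()
print(per_decade)
```

Both 1951 and 1958 fall into the 1950 bucket, so that decade gets a count of 2.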

In [ ]:

Use groupby() to plot the number of "Hamlet" films made each decade.


In [8]:
t = titles[titles.title == "Hamlet"]
t.groupby(t.year // 10 * 10).size().plot(kind='bar')


Out[8]:
(bar chart: number of "Hamlet" films per decade)
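Because `groupby()` only produces a group for decades that contain at least one film, decades with no "Hamlet" release are missing from the bar chart rather than shown as zero. One way to make those gaps visible, sketched here on hypothetical rows, is to `reindex()` the counts over the full decade range:

```python
import pandas as pd

# Hypothetical toy rows; the notebook filters the real `titles` for "Hamlet"
t = pd.DataFrame({
    'title': ['Hamlet', 'Hamlet', 'Hamlet', 'Hamlet'],
    'year': [1911, 1913, 1948, 1990],
})

counts = t.groupby(t.year // 10 * 10).size()

# groupby() only emits decades that have rows; reindex over every decade
# between the first and last one, filling the empty ones with 0
decades = range(counts.index.min(), counts.index.max() + 10, 10)
counts = counts.reindex(decades, fill_value=0)
print(counts)
```

Calling `counts.plot(kind='bar')` on the reindexed Series would then draw a zero-height bar for each empty decade.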

In [ ]:

How many leading (n=1) roles were available to actors, and how many to actresses, in each year of the 1950s?


In [17]:
c = cast
c = c[c.year // 10 == 195]
c = c[c.n == 1]
c.groupby(['year', 'type']).size()


Out[17]:
year  type   
1950  actor      605
      actress    278
1951  actor      636
      actress    273
1952  actor      592
      actress    284
1953  actor      635
      actress    294
1954  actor      631
      actress    298
1955  actor      614
      actress    271
1956  actor      621
      actress    294
1957  actor      711
      actress    289
1958  actor      700
      actress    278
1959  actor      685
      actress    299
dtype: int64
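The stacked output above puts actor and actress counts on alternating lines. A more compact layout, sketched here on a hypothetical miniature `cast` frame, uses `unstack()` to pivot the inner `type` level into columns so each year occupies a single row:

```python
import pandas as pd

# Hypothetical miniature cast table (the real one has far more rows)
cast = pd.DataFrame({
    'year': [1950, 1950, 1950, 1950, 1951, 1962],
    'type': ['actor', 'actor', 'actress', 'actor', 'actor', 'actor'],
    'n':    [1, 1, 1, 2, 1, 1],
})

# Leading roles (n == 1) in films from the 1950s
c = cast[(cast.year // 10 == 195) & (cast.n == 1)]

# unstack() moves the inner 'type' level into columns; fill_value=0 covers
# year/type pairs with no rows (here: no actress lead in 1951)
table = c.groupby(['year', 'type']).size().unstack(fill_value=0)
print(table)
```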

In [ ]:

In the 1950s decade taken as a whole, how many total roles were available to actors, and how many to actresses, for each value of "n" from 1 through 5?


In [18]:
c = cast
c = c[c.year // 10 == 195]
c = c[(c.n >= 1) & (c.n <= 5)]
# 'year' in the keys gives per-year counts (shown below) rather than decade totals
c.groupby(['year', 'n', 'type']).size()


Out[18]:
year  n  type   
1950  1  actor      605
         actress    278
      2  actor      425
         actress    402
      3  actor      495
         actress    307
      4  actor      522
         actress    264
      5  actor      552
         actress    219
1951  1  actor      636
         actress    273
      2  actor      440
         actress    423
      3  actor      549
         actress    286
      4  actor      566
         actress    250
      5  actor      570
         actress    249
1952  1  actor      592
         actress    284
      2  actor      429
         actress    419
      3  actor      521
         actress    296
      4  actor      513
         actress    277
      5  actor      538
         actress    221
                   ... 
1957  1  actor      711
         actress    289
      2  actor      470
         actress    493
      3  actor      592
         actress    320
      4  actor      577
         actress    305
      5  actor      568
         actress    278
1958  1  actor      700
         actress    278
      2  actor      469
         actress    475
      3  actor      580
         actress    330
      4  actor      564
         actress    302
      5  actor      572
         actress    268
1959  1  actor      685
         actress    299
      2  actor      486
         actress    458
      3  actor      544
         actress    355
      4  actor      576
         actress    293
      5  actor      547
         actress    260
dtype: int64
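The cell above keeps `year` in the grouping keys, so its counts are broken down year by year instead of pooled over the decade as the question asks. Dropping `year` from the keys sums each (`n`, `type`) pair across all ten years. A sketch on hypothetical rows, also using `Series.between()`, which includes both endpoints by default:

```python
import pandas as pd

# Hypothetical miniature cast table; the n=7 row should be filtered out
cast = pd.DataFrame({
    'year': [1950, 1950, 1951, 1951, 1951, 1958],
    'type': ['actor', 'actress', 'actor', 'actor', 'actress', 'actor'],
    'n':    [1, 2, 1, 2, 2, 7],
})

c = cast[cast.year // 10 == 195]
c = c[c.n.between(1, 5)]  # same as (c.n >= 1) & (c.n <= 5)

# Leaving 'year' out of the keys pools every year of the decade together
totals = c.groupby(['n', 'type']).size()
print(totals)
```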

In [ ]:

Use groupby() to determine how many roles are listed for each of the Pink Panther movies.


In [23]:
c = cast
c = c[c.title == 'The Pink Panther']
# Highest billing position (n) listed for each film, keyed by release year
c = c.groupby('year')[['n']].max()
c


Out[23]:
        n
year
1963   15
2006   50
2016  NaN
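The largest `n` per film equals the number of listed roles only when billing positions run from 1 without gaps, and it comes out NaN for 2016, where no row carries an `n` value. As a hedged alternative, the sketch below (on hypothetical rows) contrasts `size()`, which counts every cast row, with `count()`, which counts only rows whose `n` is present:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: the later film has no billing numbers yet
cast = pd.DataFrame({
    'title': ['The Pink Panther'] * 5,
    'year':  [1963, 1963, 1963, 2016, 2016],
    'n':     [1.0, 2.0, np.nan, np.nan, np.nan],
})

g = cast.groupby('year')['n']
summary = pd.DataFrame({
    'rows': g.size(),      # every cast row, billed or not
    'billed': g.count(),   # rows whose n is not NaN
    'max_n': g.max(),      # highest billing position; NaN if none billed
})
print(summary)
```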

In [ ]:

List, in order by year, each of the films in which Frank Oz has played more than one role.


In [77]:
c = cast
c = c[c.name == 'Frank Oz']
g = c.groupby(['year', 'title']).size()
# Keep films with more than one role; sort_index() orders them by (year, title)
g[g > 1].sort_index()


Out[77]:
year  title                                   
1979  The Muppet Movie                            8
1981  An American Werewolf in London              2
      The Great Muppet Caper                      6
1982  The Dark Crystal                            2
1984  The Muppets Take Manhattan                  7
1985  Sesame Street Presents: Follow that Bird    3
1992  The Muppet Christmas Carol                  7
1996  Muppet Treasure Island                      4
1999  Muppets from Space                          4
      The Adventures of Elmo in Grouchland        3
dtype: int64
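Older notebooks often call `Series.order()`, which was deprecated in favour of `sort_values()` and later removed from pandas. A small sketch with hypothetical rows and made-up titles, showing the two orderings relevant here: `sort_index()` for chronological order by the (`year`, `title`) keys, and `sort_values()` for ordering by number of roles:

```python
import pandas as pd

# Hypothetical credits: one performer, several roles in some films
c = pd.DataFrame({
    'year':  [1981, 1979, 1979, 1981, 1981, 1982],
    'title': ['Caper', 'Movie', 'Movie', 'Caper', 'Caper', 'Crystal'],
    'character': ['A', 'B', 'C', 'D', 'E', 'F'],
})

g = c.groupby(['year', 'title']).size()
multi = g[g > 1]                                # films with more than one role

by_year = multi.sort_index()                    # chronological order
by_count = multi.sort_values(ascending=False)   # most roles first
print(by_year)
print(by_count)
```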

In [ ]:

List each of the characters that Frank Oz has portrayed at least twice.


In [78]:
c = cast
c = c[c.name == 'Frank Oz']
g = c.groupby(['character']).size()
# ">= 2" keeps characters played exactly twice; groupby() lists them alphabetically
g[g >= 2]


Out[78]:
character
Animal            6
Bert              3
Cookie Monster    3
Fozzie Bear       4
Grover            2
Miss Piggy        6
Sam the Eagle     5
Yoda              6
dtype: int64
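"At least twice" means a count of 2 or more, so the filter needs `>= 2` (or equivalently `> 1`); `> 2` would silently drop characters played exactly twice. A sketch on hypothetical credits using `value_counts()`, which yields the same per-character counts as `groupby().size()` but sorted by frequency instead of by name:

```python
import pandas as pd

# Hypothetical credits for a single performer (one row per role)
c = pd.DataFrame({'character': ['Yoda', 'Yoda', 'Grover', 'Grover',
                                'Animal', 'Animal', 'Animal', 'Robot']})

counts = c['character'].value_counts()   # most frequent first

at_least_twice = counts[counts >= 2]     # keeps characters played exactly twice
more_than_twice = counts[counts > 2]     # drops them
print(at_least_twice)
```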

In [ ]: