Python and Data Science

Mariana Lopes

28/07/2016

Working with Jupyter

A tool for writing code, viewing results, and writing documentation in the same document (.ipynb).

Command mode: press Esc to activate; the cell cursor becomes inactive.

Edit mode: press Enter to activate; this is the insert mode.

Keyboard shortcuts (VERY useful)

To use the shortcuts below, the cell must be selected but must not be in edit mode.

Enter command mode: Esc
Create a new cell below: b (below)
Create a new cell above: a (above)
Cut a cell: x
Copy a cell: c
Paste a cell: v
Run a cell and stay on it: Ctrl + Enter
Run a cell and move to the next one: Shift + Enter
To see all shortcuts, press h

Cell types

Code: for Python code

Markdown: for documentation

Raw NBConvert and Heading cell types also exist

Pandas (http://pandas.pydata.org/)

Python library for data analysis
Provides high-performance, easy-to-use tools for data analysis

How to install

Anaconda (http://pandas.pydata.org/pandas-docs/stable/install.html#installing-pandas-with-anaconda)
- Download Anaconda: https://www.continuum.io/downloads
- Install Anaconda: https://docs.continuum.io/anaconda/install
- Available for osx-64, linux-64, linux-32, win-64, win-32 and Python 2.7, Python 3.4, and Python 3.5
- conda install pandas
Pip
- pip install pandas

Matplotlib (http://matplotlib.org/)

Python library for plotting 2D charts

How to install

Anaconda (http://pandas.pydata.org/pandas-docs/stable/install.html#installing-pandas-with-anaconda)
- Download Anaconda: https://www.continuum.io/downloads
- Install Anaconda: https://docs.continuum.io/anaconda/install
- Available for osx-64, linux-64, linux-32, win-64, win-32 and Python 2.7, Python 3.4, and Python 3.5
- conda install matplotlib
Pip
- pip install matplotlib



In [1]:

    
import pandas as pd
import matplotlib
%matplotlib inline

Loading a CSV file into a Pandas DataFrame

pd.read_csv(file_name)

If this command raises a UnicodeDecodeError, add the parameter encoding='utf-8'

cast.csv



In [2]:

    
%%time
cast = pd.read_csv('data/cast.csv', index_col=None, encoding='utf-8')









    



CPU times: user 6.83 s, sys: 284 ms, total: 7.11 s
Wall time: 9 s
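
A minimal sketch of the encoding advice above, using an in-memory CSV instead of a file. The sample bytes and the choice of `latin-1` as the fallback are assumptions made only for illustration; for a real file, pass the encoding it was actually saved in.

```python
import io

import pandas as pd

# Hypothetical sample: one row with a non-ASCII character, saved as Latin-1.
raw_bytes = "name,year\nJosé,1990\n".encode("latin-1")

try:
    # read_csv decodes as UTF-8 by default, and the byte 0xE9 is not valid UTF-8.
    people = pd.read_csv(io.BytesIO(raw_bytes))
except UnicodeDecodeError:
    # Retry with the encoding the data was really written in.
    people = pd.read_csv(io.BytesIO(raw_bytes), encoding="latin-1")

print(people)
```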

release_dates.csv



In [3]:

    
%%time
release_dates = pd.read_csv('data/release_dates.csv', index_col=None,
                            parse_dates=['date'], infer_datetime_format=True)









    



CPU times: user 768 ms, sys: 20 ms, total: 788 ms
Wall time: 1.23 s

titles



In [4]:

    
cast.columns









    Out[4]:





Index(['title', 'year', 'name', 'type', 'character', 'n'], dtype='object')



In [5]:

    
titles = cast[['title', 'year']].drop_duplicates().reset_index(drop=True)
titles.head()









    Out[5]:

                    title  year
0         Suuri illusioni  1985
1     Battle of the Sexes  2017
2    Secret in Their Eyes  2015
3              Steve Jobs  2015
4  Straight Outta Compton  2015
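
A small sketch of what the `drop_duplicates().reset_index(drop=True)` chain does, on a hypothetical miniature of the cast table (the names `cast_sample` and `titles_sample` and all values are made up for illustration).

```python
import pandas as pd

# Hypothetical miniature of the cast table: one row per (film, performer).
cast_sample = pd.DataFrame({
    "title": ["Hamlet", "Hamlet", "Hamlet", "Sleuth"],
    "year": [1948, 1948, 1996, 1972],
    "name": ["A", "B", "C", "D"],
})

# Keep one row per (title, year) pair, then renumber the index from 0.
titles_sample = (
    cast_sample[["title", "year"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
print(titles_sample)
```

Without `reset_index(drop=True)` the surviving rows would keep their original labels (0, 2, 3 here).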

df.head(n):

Show the first n rows.
Default: n = 5.



In [6]:

    
cast.head()









    Out[6]:

                    title  year     name   type                character    n
0         Suuri illusioni  1985   Homo $  actor                   Guests   22
1     Battle of the Sexes  2017  $hutter  actor          Bobby Riggs Fan   10
2    Secret in Their Eyes  2015  $hutter  actor          2002 Dodger Fan  NaN
3              Steve Jobs  2015  $hutter  actor  1988 Opera House Patron  NaN
4  Straight Outta Compton  2015  $hutter  actor              Club Patron  NaN



In [7]:

    
release_dates.head()









    Out[7]:

                    title  year      country        date
0   #73, Shaanthi Nivaasa  2007        India  2007-06-15
1                 #Beings  2015      Romania  2015-01-29
2  #Ewankosau saranghaeyo  2015  Philippines  2015-01-21
3                 #Horror  2015          USA  2015-11-20
4                 #Horror  2015           UK  2016-05-16

df.tail(n):

Show the last n rows.
Default: n = 5.
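
A quick sketch of `head` and `tail` with an explicit n, on a hypothetical 10-row frame (the name `numbers` is made up for illustration).

```python
import pandas as pd

numbers = pd.DataFrame({"value": range(10)})  # hypothetical 10-row frame

first_three = numbers.head(3)   # rows labelled 0, 1, 2
last_two = numbers.tail(2)      # rows labelled 8, 9
default_head = numbers.head()   # n defaults to 5

print(first_three)
print(last_two)
```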



In [8]:

    
cast.tail()









    Out[8]:

                                    title  year                         name     type               character    n
3523211                    Stuttur Frakki  1993    Sveinbj?rg ??rhallsd?ttir  actress              Flugfreyja   24
3523212                           Foxtrot  1988            Lilja ??risd?ttir  actress                    D?ra   24
3523213  Niceland (Population. 1.000.002)  2004    Sigr??ur J?na ??risd?ttir  actress            Woman in Bus   26
3523214                      U.S.S.S.S...  2003  Krist?n Andrea ??r?ard?ttir  actress  Afgr.dama ? bens?nst??   17
3523215                 Bye Bye Blue Bird  1999                 Rosa ? R?gvu  actress       Pensionatv?rtinde  NaN



In [9]:

    
release_dates.tail()









    Out[9]:

                                 title  year      country        date
438481  xXx: The Return of Xander Cage  2017           UK  2017-01-20
438482  xXx: The Return of Xander Cage  2017          USA  2017-01-20
438483  xXx: The Return of Xander Cage  2017    Singapore  2017-01-26
438484  xXx: The Return of Xander Cage  2017  Netherlands  2017-02-02
438485  xXx: The Return of Xander Cage  2017       Sweden  2017-02-03

How many records are in the dataset?

len(df):

Number of rows in df



In [10]:

    
len(cast), len(release_dates)









    Out[10]:





(3523216, 438486)

What are the possible values of the `type` column?

df[col]:

Select a column of df

df.col:

Works only when the column name contains no spaces or special characters and is not stored in a variable

Note: a column selected and manipulated outside its DataFrame is treated as a Series.
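
A short sketch of the two access styles, on a hypothetical frame (`films` and its "release year" column are made up to show a name containing a space).

```python
import pandas as pd

films = pd.DataFrame({
    "type": ["actor", "actress"],
    "release year": [1985, 2017],   # column name containing a space
})

by_brackets = films["type"]          # bracket access always works
by_attribute = films.type            # works: the name is a valid identifier
spaced = films["release year"]       # a space rules out attribute access

col = "type"
by_variable = films[col]             # films.col would look for a column named "col"

print(type(by_brackets).__name__)    # Series
```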



In [11]:

    
cast['type']









    Out[11]:





0            actor
1            actor
2            actor
3            actor
4            actor
5            actor
6            actor
7            actor
8            actor
9            actor
10           actor
11           actor
12           actor
13           actor
14           actor
15           actor
16           actor
17           actor
18           actor
19           actor
20           actor
21           actor
22           actor
23           actor
24           actor
25           actor
26           actor
27           actor
28           actor
29           actor
            ...   
3523186    actress
3523187    actress
3523188    actress
3523189    actress
3523190    actress
3523191    actress
3523192    actress
3523193    actress
3523194    actress
3523195    actress
3523196    actress
3523197    actress
3523198    actress
3523199    actress
3523200    actress
3523201    actress
3523202    actress
3523203    actress
3523204    actress
3523205    actress
3523206    actress
3523207    actress
3523208    actress
3523209    actress
3523210    actress
3523211    actress
3523212    actress
3523213    actress
3523214    actress
3523215    actress
Name: type, dtype: object



In [12]:

    
cast.type.head()









    Out[12]:





0    actor
1    actor
2    actor
3    actor
4    actor
Name: type, dtype: object



In [13]:

    
c = 'type'
cast[c].head()    # cast.c.head() will not work!









    Out[13]:





0    actor
1    actor
2    actor
3    actor
4    actor
Name: type, dtype: object

df[col].unique():

Show the distinct values of a column



In [14]:

    
cast['type'].unique()









    Out[14]:





array(['actor', 'actress'], dtype=object)

How many actors and how many actresses are in the dataset?

df[col].value_counts():

Counts how many records there are for each distinct value of column col (most useful when col is categorical)
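
A minimal sketch of `value_counts` on a hypothetical four-value Series, including the `normalize=True` option that returns fractions instead of counts.

```python
import pandas as pd

roles = pd.Series(["actor", "actress", "actor", "actor"], name="type")

counts = roles.value_counts()                 # absolute counts, most frequent first
shares = roles.value_counts(normalize=True)   # fractions that sum to 1

print(counts)
print(shares)
```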



In [15]:

    
cast['type'].value_counts()









    Out[15]:





actor      2382581
actress    1140635
Name: type, dtype: int64

Column operations



In [16]:

    
h = cast.head()
h









    Out[16]:

                    title  year     name   type                character    n
0         Suuri illusioni  1985   Homo $  actor                   Guests   22
1     Battle of the Sexes  2017  $hutter  actor          Bobby Riggs Fan   10
2    Secret in Their Eyes  2015  $hutter  actor          2002 Dodger Fan  NaN
3              Steve Jobs  2015  $hutter  actor  1988 Opera House Patron  NaN
4  Straight Outta Compton  2015  $hutter  actor              Club Patron  NaN

Arithmetic operations



In [17]:

    
h.year // 10 * 10    # Decade









    Out[17]:





0    1980
1    2010
2    2010
3    2010
4    2010
Name: year, dtype: int64
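
A small sketch of storing the decade as a new column. The frame `years` and its values are hypothetical.

```python
import pandas as pd

years = pd.DataFrame({"year": [1985, 2017, 2015, 1999]})  # hypothetical values

# Floor division by 10 drops the last digit; multiplying by 10 restores the scale.
years["decade"] = years["year"] // 10 * 10

print(years)
```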



In [18]:

    
h









    Out[18]:

                    title  year     name   type                character    n
0         Suuri illusioni  1985   Homo $  actor                   Guests   22
1     Battle of the Sexes  2017  $hutter  actor          Bobby Riggs Fan   10
2    Secret in Their Eyes  2015  $hutter  actor          2002 Dodger Fan  NaN
3              Steve Jobs  2015  $hutter  actor  1988 Opera House Patron  NaN
4  Straight Outta Compton  2015  $hutter  actor              Club Patron  NaN

Comparisons



In [19]:

    
h.year > 2000









    Out[19]:





0    False
1     True
2     True
3     True
4     True
Name: year, dtype: bool

Filtering

By a specific value in a column



In [20]:

    
cast[cast.character == 'Macduff Child']









    Out[20]:

           title  year                name     type      character   n
330970   Macbeth  1916   Francis Carpenter    actor  Macduff Child  17
2522775  Macbeth  1916        Thelma Burns  actress  Macduff Child  18
2672395  Macbeth  1916          Madge Dyer  actress  Macduff Child  19
3473368  Macbeth  1948  Christopher Welles  actress  Macduff Child  14

By columns



In [21]:

    
h[['title', 'year']]









    Out[21]:

                    title  year
0         Suuri illusioni  1985
1     Battle of the Sexes  2017
2    Secret in Their Eyes  2015
3              Steve Jobs  2015
4  Straight Outta Compton  2015

By null or non-null value



In [22]:

    
h[h.n.isnull()]









    Out[22]:

                    title  year     name   type                character    n
2    Secret in Their Eyes  2015  $hutter  actor          2002 Dodger Fan  NaN
3              Steve Jobs  2015  $hutter  actor  1988 Opera House Patron  NaN
4  Straight Outta Compton  2015  $hutter  actor              Club Patron  NaN



In [23]:

    
h[h.n.notnull()]









    Out[23]:

                    title  year     name   type                character    n
0         Suuri illusioni  1985   Homo $  actor                   Guests   22
1     Battle of the Sexes  2017  $hutter  actor          Bobby Riggs Fan   10



In [24]:

    
h









    Out[24]:

                    title  year     name   type                character    n
0         Suuri illusioni  1985   Homo $  actor                   Guests   22
1     Battle of the Sexes  2017  $hutter  actor          Bobby Riggs Fan   10
2    Secret in Their Eyes  2015  $hutter  actor          2002 Dodger Fan  NaN
3              Steve Jobs  2015  $hutter  actor  1988 Opera House Patron  NaN
4  Straight Outta Compton  2015  $hutter  actor              Club Patron  NaN

By boolean vector



In [25]:

    
h[[True, False, True, False, False]]









    Out[25]:

                    title  year     name   type                character    n
0         Suuri illusioni  1985   Homo $  actor                   Guests   22
2    Secret in Their Eyes  2015  $hutter  actor          2002 Dodger Fan  NaN



In [26]:

    
h.year > 2000









    Out[26]:





0    False
1     True
2     True
3     True
4     True
Name: year, dtype: bool



In [27]:

    
h[h.year > 2000]









    Out[27]:

                    title  year     name   type                character    n
1     Battle of the Sexes  2017  $hutter  actor          Bobby Riggs Fan   10
2    Secret in Their Eyes  2015  $hutter  actor          2002 Dodger Fan  NaN
3              Steve Jobs  2015  $hutter  actor  1988 Opera House Patron  NaN
4  Straight Outta Compton  2015  $hutter  actor              Club Patron  NaN



In [28]:

    
h[(h.year > 2000) & (h.year < 2016)]    # & for 'and', | for 'or'









    Out[28]:

                    title  year     name   type                character    n
2    Secret in Their Eyes  2015  $hutter  actor          2002 Dodger Fan  NaN
3              Steve Jobs  2015  $hutter  actor  1988 Opera House Patron  NaN
4  Straight Outta Compton  2015  $hutter  actor              Club Patron  NaN
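
A short sketch of combining boolean masks, on a hypothetical four-row frame. Each comparison needs its own parentheses because `&` and `|` bind more tightly than `>` and `<`; `~` negates a mask.

```python
import pandas as pd

sample = pd.DataFrame({
    "title": ["A", "B", "C", "D"],   # hypothetical titles
    "year": [1985, 2017, 2015, 2001],
})

in_range = sample[(sample["year"] > 2000) & (sample["year"] < 2016)]
outside = sample[~((sample["year"] > 2000) & (sample["year"] < 2016))]
either = sample[(sample["year"] < 1990) | (sample["year"] > 2016)]

print(in_range)
```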

Filling null values

On the whole DataFrame



In [29]:

    
h.fillna(0)









    Out[29]:

                    title  year     name   type                character   n
0         Suuri illusioni  1985   Homo $  actor                   Guests  22
1     Battle of the Sexes  2017  $hutter  actor          Bobby Riggs Fan  10
2    Secret in Their Eyes  2015  $hutter  actor          2002 Dodger Fan   0
3              Steve Jobs  2015  $hutter  actor  1988 Opera House Patron   0
4  Straight Outta Compton  2015  $hutter  actor              Club Patron   0

On a single column



In [30]:

    
h.n.fillna(0)









    Out[30]:





0    22
1    10
2     0
3     0
4     0
Name: n, dtype: float64
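
A minimal sketch showing that `fillna` returns a new object rather than changing the original, and that a dict can give each column its own fill value. The frame `credits` and the fill value "unknown" are hypothetical.

```python
import numpy as np
import pandas as pd

credits = pd.DataFrame({
    "character": ["Guests", None, "Club Patron"],
    "n": [22, np.nan, np.nan],
})

# fillna returns a new Series; credits itself still holds the NaN values.
filled_n = credits["n"].fillna(0)

# A dict maps each column name to its own replacement value.
filled_all = credits.fillna({"character": "unknown", "n": 0})

print(filled_all)
```

To keep the result, assign it back, for example `credits["n"] = credits["n"].fillna(0)`.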

How many actors performed in each year?



In [31]:

    
cast.year.value_counts()#.head(10)









    Out[31]:





2016    164640
2015    149184
2014    146374
2013    137873
2012    131953
2011    125506
2009    117216
2010    113260
2008     98559
2007     88767
2006     84437
2005     74201
2004     63637
2003     57218
2002     54358
2001     52584
2000     50681
1999     48716
1998     44270
1997     42837
1996     38230
1995     37007
1994     35432
1989     35366
1988     34587
1990     34004
1987     33543
1991     32478
1992     32444
1993     32210
         ...  
1919      7892
1916      7770
1918      7758
1929      7470
1927      7208
1925      7190
1928      7181
1926      6926
1922      6835
1921      6807
1924      6708
1923      6291
1915      5477
1914      3397
2018      1999
1913      1001
1912       392
1911       198
2019       184
2020       114
1910        68
1908        42
2021        20
1906        17
2022        16
1907         5
1909         3
1894         3
1900         2
1905         1
Name: year, dtype: int64



In [32]:

    
cast.year.value_counts().plot()









    Out[32]:





<matplotlib.axes._subplots.AxesSubplot at 0x7f0bdfd45080>



In [33]:

    
cast.year.value_counts().sort_index()#.head()









    Out[33]:





1894         3
1900         2
1905         1
1906        17
1907         5
1908        42
1909         3
1910        68
1911       198
1912       392
1913      1001
1914      3397
1915      5477
1916      7770
1917      8361
1918      7758
1919      7892
1920      8285
1921      6807
1922      6835
1923      6291
1924      6708
1925      7190
1926      6926
1927      7208
1928      7181
1929      7470
1930      9671
1931     11710
1932     14628
         ...  
1993     32210
1994     35432
1995     37007
1996     38230
1997     42837
1998     44270
1999     48716
2000     50681
2001     52584
2002     54358
2003     57218
2004     63637
2005     74201
2006     84437
2007     88767
2008     98559
2009    117216
2010    113260
2011    125506
2012    131953
2013    137873
2014    146374
2015    149184
2016    164640
2017     28329
2018      1999
2019       184
2020       114
2021        20
2022        16
Name: year, dtype: int64



In [34]:

    
cast.year.value_counts().sort_index().plot()









    Out[34]:





<matplotlib.axes._subplots.AxesSubplot at 0x7f0be0b7dbe0>



In [35]:

    
import numpy as np
bins = np.arange(1880, 2040, 2)
cast.year.hist(bins=bins)









    Out[35]:





<matplotlib.axes._subplots.AxesSubplot at 0x7f0be0c85f28>
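
A sketch for inspecting the numbers behind such a histogram without plotting, using `np.histogram` with the same two-year bin edges. Using `np.histogram` here is a swapped-in technique rather than what the notebook does, and the `years` values are hypothetical.

```python
import numpy as np
import pandas as pd

years = pd.Series([1894, 1895, 1900, 2015, 2016, 2016])  # hypothetical values

bins = np.arange(1880, 2040, 2)          # bin edges: 1880, 1882, ..., 2038
counts, edges = np.histogram(years, bins=bins)

# Pair the left edge of each non-empty bin with its count.
non_empty = {int(left): int(c) for left, c in zip(edges[:-1], counts) if c > 0}
print(non_empty)
```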

What was the difference between the number of actors and actresses who performed in each decade?



In [36]:

    
g = cast.groupby([cast.year // 10 * 10, 'type']).size()
g









    Out[36]:





year  type   
1890  actor           2
      actress         1
1900  actor          54
      actress        16
1910  actor       26900
      actress     15414
1920  actor       47501
      actress     23400
1930  actor      149205
      actress     48819
1940  actor      159827
      actress     50689
1950  actor      153417
      actress     55802
1960  actor      128524
      actress     55161
1970  actor      146749
      actress     63656
1980  actor      191094
      actress     88247
1990  actor      256294
      actress    121334
2000  actor      482052
      actress    259606
2010  actor      640863
      actress    358439
2020  actor          99
      actress        51
dtype: int64



In [37]:

    
u = g.unstack()
u



In [38]:

    
a = u['actor'] - u['actress']
a









    Out[38]:





year
1890         1
1900        38
1910     11486
1920     24101
1930    100386
1940    109138
1950     97615
1960     73363
1970     83093
1980    102847
1990    134960
2000    222446
2010    282424
2020        48
dtype: int64
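
A compact sketch of the groupby, unstack, and subtract steps on a hypothetical five-row frame. The `fill_value=0` argument is an addition for illustration: it keeps the subtraction from producing NaN when a decade has no rows for one of the two types.

```python
import pandas as pd

cast_sample = pd.DataFrame({
    "year": [1985, 1986, 1987, 1992, 2001],   # hypothetical values
    "type": ["actor", "actor", "actress", "actress", "actor"],
})

decade = cast_sample["year"] // 10 * 10
# One count per (decade, type) pair, as a Series with a two-level index.
sizes = cast_sample.groupby([decade, "type"]).size()
# Move the inner index level ("type") into columns; missing pairs become 0.
table = sizes.unstack(fill_value=0)
difference = table["actor"] - table["actress"]

print(table)
print(difference)
```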



In [39]:

    
a.plot()









    Out[39]:





<matplotlib.axes._subplots.AxesSubplot at 0x7f0be027dac8>

Dates



In [40]:

    
release_dates.head()









    Out[40]:

                    title  year      country        date
0   #73, Shaanthi Nivaasa  2007        India  2007-06-15
1                 #Beings  2015      Romania  2015-01-29
2  #Ewankosau saranghaeyo  2015  Philippines  2015-01-21
3                 #Horror  2015          USA  2015-11-20
4                 #Horror  2015           UK  2016-05-16



In [41]:

    
release_dates.date.dt.year.head()









    Out[41]:





0    2007
1    2015
2    2015
3    2015
4    2016
Name: date, dtype: int64



In [42]:

    
release_dates.date.dt.dayofyear.head() # day of the year, starting at 1 (for dt.dayofweek: Monday=0, Sunday=6)









    Out[42]:





0    166
1     29
2     21
3    324
4    137
Name: date, dtype: int64
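
A short sketch contrasting `dt.dayofyear` with `dt.dayofweek`, using three dates taken from the release_dates preview above. `dt.day_name()` is shown as an extra for readability.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2007-06-15", "2015-11-20", "2016-05-16"]))

day_of_year = dates.dt.dayofyear    # 1 for January 1st
day_of_week = dates.dt.dayofweek    # Monday=0 ... Sunday=6
day_names = dates.dt.day_name()     # English day names by default

print(pd.DataFrame({"date": dates, "dayofyear": day_of_year,
                    "dayofweek": day_of_week, "day": day_names}))
```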

What percentage of the films were released on a Friday?



In [43]:

    
len(release_dates[release_dates.date.dt.dayofweek == 4])*100/len(release_dates)









    Out[43]:





37.27963948677951
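
An alternative sketch of the same percentage: the mean of a boolean mask is the fraction of True values, so no `len` calls are needed. The four dates are a hypothetical sample.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime([
    "2007-06-15",  # Friday
    "2015-11-20",  # Friday
    "2016-05-16",  # Monday
    "2015-01-29",  # Thursday
]))

is_friday = dates.dt.dayofweek == 4
# True counts as 1 and False as 0, so the mean is the share of Fridays.
friday_percent = is_friday.mean() * 100

print(friday_percent)
```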

Merge



In [44]:

    
cast.head()









    Out[44]:

                    title  year     name   type                character    n
0         Suuri illusioni  1985   Homo $  actor                   Guests   22
1     Battle of the Sexes  2017  $hutter  actor          Bobby Riggs Fan   10
2    Secret in Their Eyes  2015  $hutter  actor          2002 Dodger Fan  NaN
3              Steve Jobs  2015  $hutter  actor  1988 Opera House Patron  NaN
4  Straight Outta Compton  2015  $hutter  actor              Club Patron  NaN



In [45]:

    
release_dates.head()









    Out[45]:

                    title  year      country        date
0   #73, Shaanthi Nivaasa  2007        India  2007-06-15
1                 #Beings  2015      Romania  2015-01-29
2  #Ewankosau saranghaeyo  2015  Philippines  2015-01-21
3                 #Horror  2015          USA  2015-11-20
4                 #Horror  2015           UK  2016-05-16



In [46]:

    
c = cast[cast.name == 'Ellen Page']
c = c.merge(release_dates)
c.head()









    Out[46]:

               title  year        name     type      character  n  country        date
0  An American Crime  2007  Ellen Page  actress  Sylvia Likens  1   Sweden  2007-12-07
1  An American Crime  2007  Ellen Page  actress  Sylvia Likens  1   Greece  2008-05-15
2  An American Crime  2007  Ellen Page  actress  Sylvia Likens  1    Spain  2008-06-13
3  An American Crime  2007  Ellen Page  actress  Sylvia Likens  1  Finland  2008-07-18
4  An American Crime  2007  Ellen Page  actress  Sylvia Likens  1   Mexico  2008-08-08
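
A sketch of what `merge` does when called with no arguments, on two hypothetical miniature frames: it joins on every column the frames share (here `title` and `year`) and keeps only matching rows. The explicit `on` and `how` arguments are shown for comparison.

```python
import pandas as pd

cast_sample = pd.DataFrame({
    "title": ["Film A", "Film B"],          # hypothetical films
    "year": [2007, 2010],
    "name": ["Performer X", "Performer X"],
})
dates_sample = pd.DataFrame({
    "title": ["Film A", "Film A", "Film C"],
    "year": [2007, 2007, 2012],
    "country": ["Sweden", "Greece", "USA"],
})

# Default: join on all shared columns, inner join.
implicit = cast_sample.merge(dates_sample)
explicit = cast_sample.merge(dates_sample, on=["title", "year"], how="inner")

# how="left" keeps every cast row; films with no release data get NaN.
left = cast_sample.merge(dates_sample, on=["title", "year"], how="left")

print(implicit)
print(left)
```

Each cast row is repeated once per matching release row, which is why one film can appear several times in the merged result.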

What are the title and year of the oldest film?



In [47]:

    
titles.sort_values('year').head(1)









    Out[47]:

             title  year
130990  Miss Jerry  1894
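
Two other ways to reach the row with the smallest year, sketched on a hypothetical three-row frame: `nsmallest` and `idxmin`. Neither is what the notebook uses; they are shown only as alternatives to sorting the whole frame.

```python
import pandas as pd

titles_sample = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C"],   # hypothetical titles
    "year": [1906, 1894, 1900],
})

# One-row DataFrame holding the smallest year.
oldest = titles_sample.nsmallest(1, "year")

# idxmin returns the index label of the minimum; loc then fetches that row as a Series.
oldest_row = titles_sample.loc[titles_sample["year"].idxmin()]

print(oldest)
```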

How many films are from 1960?



In [48]:

    
len(titles[titles.year == 1960])









    Out[48]:





1076

How many films are there for each year of the 1970s?



In [49]:

    
for y in range(1970, 1980):
    print(y, (titles.year == y).sum())



In [50]:

    
titles[titles.year // 10 == 197].year.value_counts().sort_index()









    Out[50]:





1970    1474
1971    1365
1972    1421
1973    1412
1974    1453
1975    1255
1976    1286
1977    1264
1978    1257
1979    1311
Name: year, dtype: int64



In [51]:

    
titles.groupby('year').size().loc[1970:1979]









    Out[51]:





year
1970    1474
1971    1365
1972    1421
1973    1412
1974    1453
1975    1255
1976    1286
1977    1264
1978    1257
1979    1311
dtype: int64

How many films were released from the year you were born until today?



In [52]:

    
birth = 1990
len(titles[(titles.year >= birth) & (titles.year <= 2016)])









    Out[52]:





100215

What are the titles of the films made up to 1906?



In [53]:

    
titles[titles.year <= 1906][['title']]









    Out[53]:

                                     title
114665         The Story of the Kelly Gang
130990                          Miss Jerry
140911                             Valsons
150054  The Joe Gans-Battling Nelson Fight
166249                       Lika mot lika
173596               Soldiers of the Cross



In [54]:

    
titles.year.min()









    Out[54]:





1894



In [55]:

    
titles.set_index('year').sort_index().loc[1894:1906]









    Out[55]:

                                   title
year
1894                          Miss Jerry
1900               Soldiers of the Cross
1905                             Valsons
1906         The Story of the Kelly Gang
1906                       Lika mot lika
1906  The Joe Gans-Battling Nelson Fight

What are the 15 most common film titles?



In [103]:

    
titles.title.value_counts().head(15)









    Out[103]:





Hamlet                       17
Macbeth                      15
Carmen                       13
The Outsider                 11
Maya                         11
She                          10
Anna Karenina                10
Vengeance                    10
Othello                      10
Temptation                   10
The Three Musketeers          9
Blood Money                   9
Underground                   9
A Midsummer Night's Dream     9
Rage                          9
Name: title, dtype: int64

In how many films did Judi Dench appear?



In [106]:

    
len(cast[cast.name == 'Judi Dench'])









    Out[106]:





54

List the films in which Judi Dench was the number 1 billed performer, ordered by year.



In [108]:

    
c = cast
c = c[c.name == 'Judi Dench']
c = c[c.n == 1]
c.sort_values('year')









    Out[108]:

                                  title  year        name     type             character  n
2639086                       Mrs Brown  1997  Judi Dench  actress        Queen Victoria  1
2639083              Ladies in Lavender  2004  Judi Dench  actress                Ursula  1
2639087          Mrs Henderson Presents  2005  Judi Dench  actress  Mrs. Laura Henderson  1
2639090              Notes on a Scandal  2006  Judi Dench  actress        Barbara Covett  1
2639104  The Best Exotic Marigold Hotel  2011  Judi Dench  actress     Evelyn Greenslade  1
2639091                       Philomena  2013  Judi Dench  actress             Philomena  1
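
A sketch of the same kind of filter written two other ways: a single combined mask, and a `query` string. The frame and the performer names are hypothetical; the notebook itself uses chained assignments.

```python
import pandas as pd

cast_sample = pd.DataFrame({
    "title": ["Film C", "Film A", "Film B", "Film D"],   # hypothetical rows
    "year": [2013, 1997, 2004, 2006],
    "name": ["Performer X", "Performer X", "Performer Y", "Performer X"],
    "n": [1.0, 1.0, 1.0, 4.0],
})

# One combined mask instead of a chain of reassignments.
mask = (cast_sample["name"] == "Performer X") & (cast_sample["n"] == 1)
lead_roles = cast_sample[mask].sort_values("year")

# The same filter expressed as a query string.
lead_roles_q = (
    cast_sample.query("name == 'Performer X' and n == 1").sort_values("year")
)

print(lead_roles)
```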



In [38]:

    
c = cast
c = c[c.name == 'Judi Dench']
c









    Out[38]:

                                                     title  year        name     type                      character    n
2639061                              84 Charing Cross Road  1987  Judi Dench  actress                      Nora Doel    3
2639062                                  A Handful of Dust  1988  Judi Dench  actress                    Mrs. Beaver   13
2639063                          A Midsummer Night's Dream  1968  Judi Dench  actress                        Titania   16
2639064                                 A Room with a View  1985  Judi Dench  actress     Eleanor Lavish, a novelist    7
2639065                                  A Study in Terror  1965  Judi Dench  actress                          Sally    8
2639066             Angelina Ballerina: Angelina Sets Sail  2006  Judi Dench  actress                     Miss Lilly    2
2639067                                      Casino Royale  2006  Judi Dench  actress                              M    4
2639068                                           Chocolat  2000  Judi Dench  actress                 Armande Voizin   18
2639069                                          Dead Cert  1974  Judi Dench  actress                 Laura Davidson    2
2639070                                    Die Another Day  2002  Judi Dench  actress                              M    6
2639071                                             Doogal  2006  Judi Dench  actress                       Narrator    7
2639072                                Four in the Morning  1965  Judi Dench  actress                           Wife    2
2639073                                          GoldenEye  1995  Judi Dench  actress                              M    6
2639074                                             Hamlet  1996  Judi Dench  actress                         Hecuba   12
2639075                               He Who Rides a Tiger  1965  Judi Dench  actress                         Joanne    2
2639076                                            Henry V  1989  Judi Dench  actress          Mistress Nell Quickly   26
2639077                                  Home on the Range  2004  Judi Dench  actress                   Mrs. Caloway    7
2639078                                               Iris  2001  Judi Dench  actress                   Iris Murdoch    3
2639079                                           J. Edgar  2011  Judi Dench  actress                   Annie Hoover   13
2639080                                       Jack & Sarah  1995  Judi Dench  actress                       Margaret    3
2639081                                          Jane Eyre  2011  Judi Dench  actress                   Mrs. Fairfax   18
2639082  Kenneth Branagh Theatre Company's the Winter's...  2015  Judi Dench  actress                        Paulina  NaN
2639083                                 Ladies in Lavender  2004  Judi Dench  actress                         Ursula    1
2639084                                             Luther  1974  Judi Dench  actress                      Katherine    7
2639085        Miss Peregrine's Home for Peculiar Children  2016  Judi Dench  actress                    Miss Avocet  NaN
2639086                                          Mrs Brown  1997  Judi Dench  actress                 Queen Victoria    1
2639087                             Mrs Henderson Presents  2005  Judi Dench  actress           Mrs. Laura Henderson    1
2639088                               My Week with Marilyn  2011  Judi Dench  actress           Dame Sybil Thorndike   23
2639089                                               Nine  2009  Judi Dench  actress                          Lilli    6
2639090                                 Notes on a Scandal  2006  Judi Dench  actress                 Barbara Covett    1
2639091                                          Philomena  2013  Judi Dench  actress                      Philomena    1
2639092        Pirates of the Caribbean: On Stranger Tides  2011  Judi Dench  actress                   Society Lady   16
2639093                                  Pride & Prejudice  2005  Judi Dench  actress        Lady Catherine de Bourg   20
2639094                                  Quantum of Solace  2008  Judi Dench  actress                              M    4
2639095                                               Rage  2009  Judi Dench  actress                   Mona Carvell  NaN
2639096                                  Run for Your Wife  2012  Judi Dench  actress                       Bag Lady   40
2639097                                      Schadenfreude  2016  Judi Dench  actress                   The Narrator  NaN
2639098                                Shakespeare in Love  1998  Judi Dench  actress                Queen Elizabeth   12
2639099                                            Skyfall  2012  Judi Dench  actress                              M    2
2639100                                            Spectre  2015  Judi Dench  actress                              M  NaN
2639101                                    Stars in Shorts  2012  Judi Dench  actress                           Mary   29
2639102                                 Tea with Mussolini  1999  Judi Dench  actress                       Arabella    2
2639103                           The Angelic Conversation  1985  Judi Dench  actress  Shakespeare's Sonnets read by    3
2639104                     The Best Exotic Marigold Hotel  2011  Judi Dench  actress              Evelyn Greenslade    1
2639105                          The Chronicles of Riddick  2004  Judi Dench  actress                         Aereon    4
2639106                    The Importance of Being Earnest  2002  Judi Dench  actress                 Lady Bracknell    5
2639107              The Second Best Exotic Marigold Hotel  2015  Judi Dench  actress              Evelyn Greenslade    5
2639108                                  The Shipping News  2001  Judi Dench  actress                     Agnis Hamm    3
2639109                                   The Third Secret  1964  Judi Dench  actress                 Miss Humphries   12
2639110                            The World Is Not Enough  1999  Judi Dench  actress                              M    6
2639111                                Tomorrow Never Dies  1997  Judi Dench  actress                              M    9
2639112                                        Tulip Fever  2017  Judi Dench  actress       The Abbess of St. Ursula  NaN
2639113                                 Victoria and Abdul  2017  Judi Dench  actress                 Queen Victoria  NaN
2639114                                           Wetherby  1985  Judi Dench  actress              Marcia Pilborough    3

List the cast of the 1972 version of Sleuth, ordered by billing rank n.



In [40]:

    
c = cast
c = c[c.title == 'Sleuth']  # keep only films titled "Sleuth"
c = c[c.year == 1972]       # keep only the 1972 version
c.sort_values('n')          # order by billing rank n









    Out[40]:






  
    
      
         title   year  name                type     character                   n
1605823  Sleuth  1972  Laurence Olivier    actor    Andrew Wyke                 1
307018   Sleuth  1972  Michael Caine       actor    Milo Tindle                 2
351643   Sleuth  1972  Alec Cawthorne      actor    Inspector Doppler           3
1379549  Sleuth  1972  John (II) Matthews  actor    Detective Sergeant Tarrant  4
2557108  Sleuth  1972  Eve (III) Channing  actress  Marguerite Wyke             5
1363933  Sleuth  1972  Teddy Martin        actor    Police Constable Higgs      6
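
The filtering steps in the cell above can also be written as a single boolean mask. The sketch below is only an illustration: it uses a tiny made-up DataFrame (the name `toy_cast` and all of its rows are hypothetical, not taken from cast.csv) to show the same filter-then-sort pattern.

```python
import pandas as pd

# Hypothetical miniature of the cast table; every row is invented for illustration.
toy_cast = pd.DataFrame({
    'title': ['Sleuth', 'Sleuth', 'Sleuth', 'Other Film'],
    'year': [1972, 1972, 2007, 1972],
    'name': ['Actor B', 'Actor A', 'Actor C', 'Actor D'],
    'n': [2, 1, 1, 1],
})

# Combine both conditions with & (each condition needs its own parentheses).
mask = (toy_cast['title'] == 'Sleuth') & (toy_cast['year'] == 1972)
result = toy_cast[mask].sort_values('n')

print(result['name'].tolist())  # ['Actor A', 'Actor B']
```

Both forms should select the same rows; the single mask avoids rebinding the intermediate variable several times.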

Which actors appeared in the most films in 1985?



In [111]:

    
cast[cast.year == 1985].name.value_counts().head(10)









    Out[111]:





Mammootty         21
Sukumari          19
Shakti Kapoor     19
Lou Scheimer      15
Mohanlal          15
Aruna Irani       14
Raj Babbar        14
Deven Verma       13
Rajesh Khanna     13
Pinchoo Kapoor    12
Name: name, dtype: int64
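
As a small illustration of what `value_counts()` followed by `head()` does, here is a sketch on an invented Series (the names in it are made up and unrelated to cast.csv). It counts how often each value occurs and returns the counts sorted from most to least frequent.

```python
import pandas as pd

# Invented credits: one entry per (film, actor) appearance.
names = pd.Series(['Ana', 'Bruno', 'Ana', 'Carla', 'Ana', 'Bruno'])

counts = names.value_counts()  # sorted from most to least frequent
top = counts.head(2)           # keep only the two most frequent names

print(top.index.tolist())  # ['Ana', 'Bruno']
print(top.tolist())        # [3, 2]
```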




SciKit Learn (http://scikit-learn.org)

Python library for data mining and data analysis

How to install

Anaconda (http://pandas.pydata.org/pandas-docs/stable/install.html#installing-pandas-with-anaconda)
- Download Anaconda: https://www.continuum.io/downloads
- Install Anaconda: https://docs.continuum.io/anaconda/install
- Available for osx-64, linux-64, linux-32, win-64, win-32, and for Python 2.7, Python 3.4, and Python 3.5
- conda install scikit-learn
Pip
- pip install -U scikit-learn



In [44]:

    
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix
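# train_test_split moved to sklearn.model_selection in scikit-learn 0.18;
# the old sklearn.cross_validation module was removed in 0.20
from sklearn.model_selection import train_test_split
import pickle
import time
# Timestamp used to build a unique file name when saving the model
time1 = time.strftime('%Y-%m-%d_%H-%M-%S')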

iris.csv



In [45]:

    
# DataFrame.from_csv was removed in pandas 1.0; read_csv is its replacement
iris = pd.read_csv('iris.csv', encoding='utf-8')



In [46]:

    
iris.columns









    Out[46]:





Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width',
       'species'],
      dtype='object')



In [47]:

    
target_data = iris['species']
features = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
feature_data = iris[features]



In [48]:

    
features_train, features_test, target_train, target_test = train_test_split(feature_data, target_data, test_size=0.33, random_state=42)
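
To make the effect of `test_size=0.33` concrete, here is a minimal sketch on placeholder data. It assumes scikit-learn 0.18 or later (where `train_test_split` lives in `sklearn.model_selection`) and assumes iris.csv has the usual 150 rows; the lists `X` and `y` are stand-ins, not the real measurements.

```python
from sklearn.model_selection import train_test_split

# Placeholder data with 150 samples, standing in for the iris table.
X = list(range(150))
y = [i % 3 for i in X]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# The test set size is rounded up: ceil(0.33 * 150) = 50, leaving 100 for training.
print(len(X_train), len(X_test))  # 100 50
```

Fixing `random_state` makes the shuffle reproducible, so the same rows land in the test set on every run.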

Train a decision tree model



In [49]:

    
# The features and target were already selected and split in the cells above,
# so only the classifier needs to be created and fitted here.
dt = DecisionTreeClassifier()
dt = dt.fit(features_train, target_train)
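
The fit/predict cycle can be tried on a data set small enough to check by hand. The sketch below uses invented one-feature data (not the iris measurements) and fixes `random_state` only to keep the run reproducible.

```python
from sklearn.tree import DecisionTreeClassifier

# Invented data: small values belong to class 'a', large values to class 'b'.
X = [[0.0], [1.0], [2.0], [3.0]]
y = ['a', 'a', 'b', 'b']

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)  # learns a single split between 1.0 and 2.0

pred = clf.predict([[0.2], [2.8]])
print(pred.tolist())  # ['a', 'b']
```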

Save the model



In [50]:

    
with open('iris-dt_' + time1, 'wb') as f:
    pickle.dump(dt, f)

Load the model



In [51]:

    
with open('iris-dt_' + time1, 'rb') as f:
    dt = pickle.load(f)  # assign the result; otherwise the loaded model is discarded
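
A save/load round trip with `pickle` can be checked without a trained model. In the sketch below a plain dictionary stands in for the classifier, and the file name `model.pkl` inside a temporary directory is a hypothetical choice. The point is that `pickle.load` returns a new object, which has to be bound to a name to be usable.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model: any picklable Python object behaves the same way.
model = {'kind': 'decision tree', 'max_depth': 3}

# Hypothetical file name inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'model.pkl')

with open(path, 'wb') as f:
    pickle.dump(model, f)

with open(path, 'rb') as f:
    restored = pickle.load(f)  # keep the return value, otherwise it is lost

print(restored == model)  # True
print(restored is model)  # False: it is a new, equal object
```

Unpickling can execute arbitrary code, so only load pickle files from sources you trust.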

Predictions for the test cases



In [52]:

    
predictions = dt.predict(features_test)

confusion_matrix(target_test, predictions)









    Out[52]:





array([[19,  0,  0],
       [ 0, 15,  0],
       [ 0,  1, 15]])
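
In scikit-learn's confusion matrix, row i holds the samples whose true class is i and column j holds the samples predicted as class j, with classes in sorted label order. The diagonal therefore counts the correct predictions. As a sketch, the accuracy implied by the matrix in Out[52] can be worked out in plain Python (the variable names are illustrative):

```python
# Confusion matrix copied from Out[52]: rows are true classes, columns are predictions.
cm = [[19, 0, 0],
      [0, 15, 0],
      [0, 1, 15]]

total = sum(sum(row) for row in cm)              # number of test samples
correct = sum(cm[i][i] for i in range(len(cm)))  # diagonal = correct predictions
accuracy = correct / total

print(total, correct, accuracy)  # 50 49 0.98
```

The single off-diagonal entry means that one sample of the third class was predicted as the second class. Because the tree was created without a fixed `random_state`, a rerun may give a slightly different matrix.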


