Chapter 2, 3 of PDA



In [1]:

    
%pylab --no-import-all inline









    



Populating the interactive namespace from numpy and matplotlib



In [2]:

    
import matplotlib.pyplot as plt
import numpy as np

from pylab import figure, show

from pandas import DataFrame, Series
import pandas as pd

Preliminaries: Assumed location of pydata-book files

To make it more practical for me to look at your homework, I'm again going to assume a relative placement of files. I placed the files from

https://github.com/pydata/pydata-book

in a local directory, which in my case is "/Users/raymondyee/D/Document/Working_with_Open_Data/pydata-book/"

and then symbolically linked (ln -s) to the the pydata-book from the root directory of the working-open-data folder. i.e., on OS X

cd /Users/raymondyee/D/Document/Working_with_Open_Data/working-open-data
ln -s /Users/raymondyee/D/Document/Working_with_Open_Data/pydata-book/ pydata-book

That way the files from the pydata-book repository look like they sit in the working-open-data directory -- without having to actually copy the files.

With this arrangment, I should then be able to drop your notebook into my own notebooks directory and run them without having to mess around with paths.



In [3]:

    
import os

USAGOV_BITLY_PATH = os.path.join(os.pardir, "pydata-book", "ch02", "usagov_bitly_data2012-03-16-1331923249.txt")
MOVIELENS_DIR = os.path.join(os.pardir, "pydata-book", "ch02", "movielens")
NAMES_DIR = os.path.join(os.pardir, "pydata-book", "ch02", "names")

assert os.path.exists(USAGOV_BITLY_PATH)
assert os.path.exists(MOVIELENS_DIR)
assert os.path.exists(NAMES_DIR)

Please make sure the above assertions work

usa.gov bit.ly example

(PfDA, p. 18)

What's in the data file?

http://my.safaribooksonline.com/book/programming/python/9781449323592/2dot-introductory-examples/id2802197 :

In 2011, URL shortening service bit.ly partnered with the United States government website usa.gov to provide a feed of anonymous data gathered from users who shorten links ending with .gov or .mil.

Hourly archive of data: http://bitly.measuredvoice.com/bitly_archive/?C=M;O=D



In [4]:

    
open(USAGOV_BITLY_PATH).readline()









    Out[4]:





'{ "a": "Mozilla\\/5.0 (Windows NT 6.1; WOW64) AppleWebKit\\/535.11 (KHTML, like Gecko) Chrome\\/17.0.963.78 Safari\\/535.11", "c": "US", "nk": 1, "tz": "America\\/New_York", "gr": "MA", "g": "A6qOVH", "h": "wfLQtf", "l": "orofrog", "al": "en-US,en;q=0.8", "hh": "1.usa.gov", "r": "http:\\/\\/www.facebook.com\\/l\\/7AQEFzjSi\\/1.usa.gov\\/wfLQtf", "u": "http:\\/\\/www.ncbi.nlm.nih.gov\\/pubmed\\/22415991", "t": 1331923247, "hc": 1331822918, "cy": "Danvers", "ll": [ 42.576698, -70.954903 ] }\n'



In [5]:

    
import json
records = [json.loads(line) for line in open(USAGOV_BITLY_PATH)]  # list comprehension

Counting Time Zones with pandas

Recall what records is



In [6]:

    
len(records)









    Out[6]:





3560



In [7]:

    
# list of dict -> DataFrame

frame = DataFrame(records)
frame.head()









    Out[7]:






  
    
      
      _heartbeat_
      a
      al
      c
      cy
      g
      gr
      h
      hc
      hh
      kw
      l
      ll
      nk
      r
      t
      tz
      u
    
  
  
    
      0
      NaN
       Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
       en-US,en;q=0.8
       US
          Danvers
       A6qOVH
       MA
       wfLQtf
       1331822918
       1.usa.gov
       NaN
        orofrog
        [42.576698, -70.954903]
       1
       http://www.facebook.com/l/7AQEFzjSi/1.usa.gov/...
       1331923247
        America/New_York
             http://www.ncbi.nlm.nih.gov/pubmed/22415991
    
    
      1
      NaN
                                  GoogleMaps/RochesterNY
                  NaN
       US
            Provo
       mwszkS
       UT
       mwszkS
       1308262393
            j.mp
       NaN
          bitly
       [40.218102, -111.613297]
       0
                                http://www.AwareMap.com/
       1331923249
          America/Denver
             http://www.monroecounty.gov/etc/911/rss.php
    
    
      2
      NaN
       Mozilla/4.0 (compatible; MSIE 8.0; Windows NT ...
                en-US
       US
       Washington
       xxr3Qb
       DC
       xxr3Qb
       1331919941
       1.usa.gov
       NaN
          bitly
          [38.9007, -77.043098]
       1
                                    http://t.co/03elZC4Q
       1331923250
        America/New_York
       http://boxer.senate.gov/en/press/releases/0316...
    
    
      3
      NaN
       Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8)...
                pt-br
       BR
             Braz
       zCaLwp
       27
       zUtuOu
       1331923068
       1.usa.gov
       NaN
       alelex88
       [-23.549999, -46.616699]
       0
                                                  direct
       1331923249
       America/Sao_Paulo
                 http://apod.nasa.gov/apod/ap120312.html
    
    
      4
      NaN
       Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...
       en-US,en;q=0.8
       US
       Shrewsbury
       9b6kNl
       MA
       9b6kNl
       1273672411
          bit.ly
       NaN
          bitly
        [42.286499, -71.714699]
       0
                     http://www.shrewsbury-ma.gov/selco/
       1331923251
        America/New_York
       http://www.shrewsbury-ma.gov/egov/gallery/1341...
    
  

5 rows × 18 columns

movielens dataset

PDA p. 26

http://www.grouplens.org/node/73 --> there's also a 10 million ratings dataset -- would be interesting to try out to test scalability of running IPython notebook on laptop



In [8]:

    
# let's take a look at the data

# my local dir: /Users/raymondyee/D/Document/Working_with_Open_Data/pydata-book/ch02/movielens

!head $MOVIELENS_DIR/movies.dat









    



1::Toy Story (1995)::Animation|Children's|Comedy
2::Jumanji (1995)::Adventure|Children's|Fantasy
3::Grumpier Old Men (1995)::Comedy|Romance
4::Waiting to Exhale (1995)::Comedy|Drama
5::Father of the Bride Part II (1995)::Comedy
6::Heat (1995)::Action|Crime|Thriller
7::Sabrina (1995)::Comedy|Romance
8::Tom and Huck (1995)::Adventure|Children's
9::Sudden Death (1995)::Action
10::GoldenEye (1995)::Action|Adventure|Thriller



In [26]:

    
# how many movies?
!wc $MOVIELENS_DIR/movies.dat









    



    3883   15675  171308 ../pydata-book/ch02/movielens/movies.dat



In [10]:

    
!head $MOVIELENS_DIR/users.dat









    



1::F::1::10::48067
2::M::56::16::70072
3::M::25::15::55117
4::M::45::7::02460
5::M::25::20::55455
6::F::50::9::55117
7::M::35::1::06810
8::M::25::12::11413
9::M::25::17::61614
10::F::35::1::95370



In [11]:

    
!head $MOVIELENS_DIR/ratings.dat









    



1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275
1::2355::5::978824291
1::1197::3::978302268
1::1287::5::978302039
1::2804::5::978300719
1::594::4::978302268
1::919::4::978301368



In [28]:

    
import pandas as pd
import os

os.path.join(MOVIELENS_DIR, 'users.dat')









    Out[28]:





'../pydata-book/ch02/movielens/users.dat'



In [34]:

    
unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
users = pd.read_table(os.path.join(MOVIELENS_DIR, 'users.dat'), sep='::', header=None,
  names=unames)

rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_table(os.path.join(MOVIELENS_DIR, 'ratings.dat'), sep='::', header=None,
  names=rnames)

mnames = ['movie_id', 'title', 'genres']
movies = pd.read_table(os.path.join(MOVIELENS_DIR, 'movies.dat'), sep='::', header=None,names=mnames, encoding='iso-8859-1')
movies[:5]









    Out[34]:






  
    
      
      movie_id
      title
      genres
    
  
  
    
      0
       1
                         Toy Story (1995)
        Animation|Children's|Comedy
    
    
      1
       2
                           Jumanji (1995)
       Adventure|Children's|Fantasy
    
    
      2
       3
                  Grumpier Old Men (1995)
                     Comedy|Romance
    
    
      3
       4
                 Waiting to Exhale (1995)
                       Comedy|Drama
    
    
      4
       5
       Father of the Bride Part II (1995)
                             Comedy
    
  

5 rows × 3 columns



In [35]:

    
movies = pd.read_table(os.path.join(MOVIELENS_DIR, 'movies.dat'), sep='::', header=None,names=mnames)
movies[:5]









    Out[35]:






  
    
      
      movie_id
      title
      genres
    
  
  
    
      0
       1
                         Toy Story (1995)
        Animation|Children's|Comedy
    
    
      1
       2
                           Jumanji (1995)
       Adventure|Children's|Fantasy
    
    
      2
       3
                  Grumpier Old Men (1995)
                     Comedy|Romance
    
    
      3
       4
                 Waiting to Exhale (1995)
                       Comedy|Drama
    
    
      4
       5
       Father of the Bride Part II (1995)
                             Comedy
    
  

5 rows × 3 columns



In [36]:

    
import traceback

try:
    movies[:100]
except:
    traceback.print_exc()



In [15]:

    
# explicit encoding of movies file

import pandas as pd
import codecs


unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
users = pd.read_table(os.path.join(MOVIELENS_DIR, 'users.dat'), sep='::', header=None,
  names=unames)

rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_table(os.path.join(MOVIELENS_DIR, 'ratings.dat'), sep='::', header=None,
  names=rnames)


movies_file = codecs.open(os.path.join(MOVIELENS_DIR, 'movies.dat'), encoding='iso-8859-1')

mnames = ['movie_id', 'title', 'genres']
movies = pd.read_table(movies_file, sep='::', header=None,
  names=mnames)



In [37]:

    
movies[:5]









    Out[37]:






  
    
      
      movie_id
      title
      genres
    
  
  
    
      0
       1
                         Toy Story (1995)
        Animation|Children's|Comedy
    
    
      1
       2
                           Jumanji (1995)
       Adventure|Children's|Fantasy
    
    
      2
       3
                  Grumpier Old Men (1995)
                     Comedy|Romance
    
    
      3
       4
                 Waiting to Exhale (1995)
                       Comedy|Drama
    
    
      4
       5
       Father of the Bride Part II (1995)
                             Comedy
    
  

5 rows × 3 columns



In [17]:

    
users[:5]









    Out[17]:






  
    
      
      user_id
      gender
      age
      occupation
      zip
    
  
  
    
      0
       1
       F
        1
       10
       48067
    
    
      1
       2
       M
       56
       16
       70072
    
    
      2
       3
       M
       25
       15
       55117
    
    
      3
       4
       M
       45
        7
       02460
    
    
      4
       5
       M
       25
       20
       55455
    
  

5 rows × 5 columns



In [38]:

    
movies[:5]









    Out[38]:






  
    
      
      movie_id
      title
      genres
    
  
  
    
      0
       1
                         Toy Story (1995)
        Animation|Children's|Comedy
    
    
      1
       2
                           Jumanji (1995)
       Adventure|Children's|Fantasy
    
    
      2
       3
                  Grumpier Old Men (1995)
                     Comedy|Romance
    
    
      3
       4
                 Waiting to Exhale (1995)
                       Comedy|Drama
    
    
      4
       5
       Father of the Bride Part II (1995)
                             Comedy
    
  

5 rows × 3 columns

hmmm...age 1? Where to learn about occupation types? We have zip data...so it'd be fun to map. Might be useful to look at distribution of age, gender, and zip.

check on encoding of the movie files



In [19]:

    
import codecs
from itertools import islice

fname = os.path.join(MOVIELENS_DIR, "movies.dat")

f = codecs.open(fname, encoding='iso-8859-1')
for line in islice(f,100):
    print line









    



1::Toy Story (1995)::Animation|Children's|Comedy

2::Jumanji (1995)::Adventure|Children's|Fantasy

3::Grumpier Old Men (1995)::Comedy|Romance

4::Waiting to Exhale (1995)::Comedy|Drama

5::Father of the Bride Part II (1995)::Comedy

6::Heat (1995)::Action|Crime|Thriller

7::Sabrina (1995)::Comedy|Romance

8::Tom and Huck (1995)::Adventure|Children's

9::Sudden Death (1995)::Action

10::GoldenEye (1995)::Action|Adventure|Thriller

11::American President, The (1995)::Comedy|Drama|Romance

12::Dracula: Dead and Loving It (1995)::Comedy|Horror

13::Balto (1995)::Animation|Children's

14::Nixon (1995)::Drama

15::Cutthroat Island (1995)::Action|Adventure|Romance

16::Casino (1995)::Drama|Thriller

17::Sense and Sensibility (1995)::Drama|Romance

18::Four Rooms (1995)::Thriller

19::Ace Ventura: When Nature Calls (1995)::Comedy

20::Money Train (1995)::Action

21::Get Shorty (1995)::Action|Comedy|Drama

22::Copycat (1995)::Crime|Drama|Thriller

23::Assassins (1995)::Thriller

24::Powder (1995)::Drama|Sci-Fi

25::Leaving Las Vegas (1995)::Drama|Romance

26::Othello (1995)::Drama

27::Now and Then (1995)::Drama

28::Persuasion (1995)::Romance

29::City of Lost Children, The (1995)::Adventure|Sci-Fi

30::Shanghai Triad (Yao a yao yao dao waipo qiao) (1995)::Drama

31::Dangerous Minds (1995)::Drama

32::Twelve Monkeys (1995)::Drama|Sci-Fi

33::Wings of Courage (1995)::Adventure|Romance

34::Babe (1995)::Children's|Comedy|Drama

35::Carrington (1995)::Drama|Romance

36::Dead Man Walking (1995)::Drama

37::Across the Sea of Time (1995)::Documentary

38::It Takes Two (1995)::Comedy

39::Clueless (1995)::Comedy|Romance

40::Cry, the Beloved Country (1995)::Drama

41::Richard III (1995)::Drama|War

42::Dead Presidents (1995)::Action|Crime|Drama

43::Restoration (1995)::Drama

44::Mortal Kombat (1995)::Action|Adventure

45::To Die For (1995)::Comedy|Drama

46::How to Make an American Quilt (1995)::Drama|Romance

47::Seven (Se7en) (1995)::Crime|Thriller

48::Pocahontas (1995)::Animation|Children's|Musical|Romance

49::When Night Is Falling (1995)::Drama|Romance

50::Usual Suspects, The (1995)::Crime|Thriller

51::Guardian Angel (1994)::Action|Drama|Thriller

52::Mighty Aphrodite (1995)::Comedy

53::Lamerica (1994)::Drama

54::Big Green, The (1995)::Children's|Comedy

55::Georgia (1995)::Drama

56::Kids of the Round Table (1995)::Adventure|Children's|Fantasy

57::Home for the Holidays (1995)::Drama

58::Postino, Il (The Postman) (1994)::Drama|Romance

59::Confessional, The (Le Confessionnal) (1995)::Drama|Mystery

60::Indian in the Cupboard, The (1995)::Adventure|Children's|Fantasy

61::Eye for an Eye (1996)::Drama|Thriller

62::Mr. Holland's Opus (1995)::Drama

63::Don't Be a Menace to South Central While Drinking Your Juice in the Hood (1996)::Comedy

64::Two if by Sea (1996)::Comedy|Romance

65::Bio-Dome (1996)::Comedy

66::Lawnmower Man 2: Beyond Cyberspace (1996)::Sci-Fi|Thriller

67::Two Bits (1995)::Drama

68::French Twist (Gazon maudit) (1995)::Comedy|Romance

69::Friday (1995)::Comedy

70::From Dusk Till Dawn (1996)::Action|Comedy|Crime|Horror|Thriller

71::Fair Game (1995)::Action

72::Kicking and Screaming (1995)::Comedy|Drama

73::Misérables, Les (1995)::Drama|Musical

74::Bed of Roses (1996)::Drama|Romance

75::Big Bully (1996)::Comedy|Drama

76::Screamers (1995)::Sci-Fi|Thriller

77::Nico Icon (1995)::Documentary

78::Crossing Guard, The (1995)::Drama

79::Juror, The (1996)::Drama|Thriller

80::White Balloon, The (Badkonake Sefid ) (1995)::Drama

81::Things to Do in Denver when You're Dead (1995)::Crime|Drama|Romance

82::Antonia's Line (Antonia) (1995)::Drama

83::Once Upon a Time... When We Were Colored (1995)::Drama

84::Last Summer in the Hamptons (1995)::Comedy|Drama

85::Angels and Insects (1995)::Drama|Romance

86::White Squall (1996)::Adventure|Drama

87::Dunston Checks In (1996)::Children's|Comedy

88::Black Sheep (1996)::Comedy

89::Nick of Time (1995)::Action|Thriller

90::Journey of August King, The (1995)::Drama

92::Mary Reilly (1996)::Drama|Thriller

93::Vampire in Brooklyn (1995)::Comedy|Romance

94::Beautiful Girls (1996)::Drama

95::Broken Arrow (1996)::Action|Thriller

96::In the Bleak Midwinter (1995)::Comedy

97::Hate (Haine, La) (1995)::Drama

98::Shopping (1994)::Action|Thriller

99::Heidi Fleiss: Hollywood Madam (1995)::Documentary

100::City Hall (1996)::Drama|Thriller

101::Bottle Rocket (1996)::Comedy



In [20]:

    
import pandas as pd
import codecs

movies_file = codecs.open(os.path.join(MOVIELENS_DIR, 'movies.dat'), encoding='iso-8859-1')

mnames = ['movie_id', 'title', 'genres']
movies = pd.read_table(movies_file, sep='::', header=None,
  names=mnames)

print (movies.ix[72]['title'] == u'Misérables, Les (1995)')









    



True

Baby names dataset



In [21]:

    
import pandas as pd
import codecs

names1880_file = codecs.open(os.path.join(NAMES_DIR,'yob2010.txt'), encoding='iso-8859-1')
names1880 = pd.read_csv(names1880_file, names=['name', 'sex', 'births'])

names1880









    Out[21]:






  
    
      
      name
      sex
      births
    
  
  
    
      0 
        Isabella
       F
       22731
    
    
      1 
          Sophia
       F
       20477
    
    
      2 
            Emma
       F
       17179
    
    
      3 
          Olivia
       F
       16860
    
    
      4 
             Ava
       F
       15300
    
    
      5 
           Emily
       F
       14172
    
    
      6 
         Abigail
       F
       14124
    
    
      7 
         Madison
       F
       13070
    
    
      8 
           Chloe
       F
       11656
    
    
      9 
             Mia
       F
       10541
    
    
      10
         Addison
       F
       10253
    
    
      11
       Elizabeth
       F
       10135
    
    
      12
            Ella
       F
        9796
    
    
      13
         Natalie
       F
        8715
    
    
      14
        Samantha
       F
        8334
    
    
      15
          Alexis
       F
        8181
    
    
      16
            Lily
       F
        7900
    
    
      17
           Grace
       F
        7598
    
    
      18
          Hailey
       F
        6969
    
    
      19
          Alyssa
       F
        6934
    
    
      20
         Lillian
       F
        6898
    
    
      21
          Hannah
       F
        6891
    
    
      22
           Avery
       F
        6633
    
    
      23
            Leah
       F
        6474
    
    
      24
          Nevaeh
       F
        6345
    
    
      25
           Sofia
       F
        6282
    
    
      26
          Ashley
       F
        6276
    
    
      27
            Anna
       F
        6242
    
    
      28
         Brianna
       F
        6224
    
    
      29
           Sarah
       F
        6223
    
    
      30
             Zoe
       F
        6200
    
    
      31
        Victoria
       F
        6174
    
    
      32
       Gabriella
       F
        6101
    
    
      33
        Brooklyn
       F
        6068
    
    
      34
          Kaylee
       F
        6049
    
    
      35
          Taylor
       F
        5858
    
    
      36
           Layla
       F
        5853
    
    
      37
         Allison
       F
        5829
    
    
      38
          Evelyn
       F
        5802
    
    
      39
           Riley
       F
        5506
    
    
      40
          Amelia
       F
        5417
    
    
      41
           Khloe
       F
        5369
    
    
      42
         Makayla
       F
        5348
    
    
      43
          Aubrey
       F
        5337
    
    
      44
       Charlotte
       F
        5314
    
    
      45
        Savannah
       F
        5311
    
    
      46
            Zoey
       F
        5164
    
    
      47
           Bella
       F
        5082
    
    
      48
           Kayla
       F
        5021
    
    
      49
           Alexa
       F
        5012
    
    
      50
          Peyton
       F
        4937
    
    
      51
          Audrey
       F
        4910
    
    
      52
          Claire
       F
        4885
    
    
      53
         Arianna
       F
        4797
    
    
      54
           Julia
       F
        4640
    
    
      55
         Aaliyah
       F
        4628
    
    
      56
           Kylie
       F
        4564
    
    
      57
          Lauren
       F
        4429
    
    
      58
          Sophie
       F
        4380
    
    
      59
          Sydney
       F
        4301
    
    
      
      ...
      ...
      ...
    
  

33838 rows × 3 columns



In [22]:

    
# sort by name

names1880.sort('births', ascending=False)[:10]









    Out[22]:






  
    
      
      name
      sex
      births
    
  
  
    
      0    
        Isabella
       F
       22731
    
    
      19698
           Jacob
       M
       21875
    
    
      1    
          Sophia
       F
       20477
    
    
      19699
           Ethan
       M
       17866
    
    
      2    
            Emma
       F
       17179
    
    
      19700
         Michael
       M
       17133
    
    
      19701
          Jayden
       M
       17030
    
    
      19702
         William
       M
       16870
    
    
      3    
          Olivia
       F
       16860
    
    
      19703
       Alexander
       M
       16634
    
  

10 rows × 3 columns



In [23]:

    
names1880[names1880.sex == 'F'].sort('births', ascending=False)[:10]









    Out[23]:






  
    
      
      name
      sex
      births
    
  
  
    
      0
       Isabella
       F
       22731
    
    
      1
         Sophia
       F
       20477
    
    
      2
           Emma
       F
       17179
    
    
      3
         Olivia
       F
       16860
    
    
      4
            Ava
       F
       15300
    
    
      5
          Emily
       F
       14172
    
    
      6
        Abigail
       F
       14124
    
    
      7
        Madison
       F
       13070
    
    
      8
          Chloe
       F
       11656
    
    
      9
            Mia
       F
       10541
    
  

10 rows × 3 columns



In [24]:

    
names1880['births'].plot()









    Out[24]:





<matplotlib.axes.AxesSubplot at 0x133fd410>



In [25]:

    
names1880['births'].count()









    Out[25]:





33838



In [25]:

	_heartbeat_	a	al	c	cy	g	gr	h	hc	hh	kw	l	ll	nk	r	t	tz	u
0	NaN	Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...	en-US,en;q=0.8	US	Danvers	A6qOVH	MA	wfLQtf	1331822918	1.usa.gov	NaN	orofrog	[42.576698, -70.954903]	1	http://www.facebook.com/l/7AQEFzjSi/1.usa.gov/...	1331923247	America/New_York	http://www.ncbi.nlm.nih.gov/pubmed/22415991
1	NaN	GoogleMaps/RochesterNY	NaN	US	Provo	mwszkS	UT	mwszkS	1308262393	j.mp	NaN	bitly	[40.218102, -111.613297]	0	http://www.AwareMap.com/	1331923249	America/Denver	http://www.monroecounty.gov/etc/911/rss.php
2	NaN	Mozilla/4.0 (compatible; MSIE 8.0; Windows NT ...	en-US	US	Washington	xxr3Qb	DC	xxr3Qb	1331919941	1.usa.gov	NaN	bitly	[38.9007, -77.043098]	1	http://t.co/03elZC4Q	1331923250	America/New_York	http://boxer.senate.gov/en/press/releases/0316...
3	NaN	Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8)...	pt-br	BR	Braz	zCaLwp	27	zUtuOu	1331923068	1.usa.gov	NaN	alelex88	[-23.549999, -46.616699]	0	direct	1331923249	America/Sao_Paulo	http://apod.nasa.gov/apod/ap120312.html
4	NaN	Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi...	en-US,en;q=0.8	US	Shrewsbury	9b6kNl	MA	9b6kNl	1273672411	bit.ly	NaN	bitly	[42.286499, -71.714699]	0	http://www.shrewsbury-ma.gov/selco/	1331923251	America/New_York	http://www.shrewsbury-ma.gov/egov/gallery/1341...

	movie_id	title	genres
0	1	Toy Story (1995)	Animation\|Children's\|Comedy
1	2	Jumanji (1995)	Adventure\|Children's\|Fantasy
2	3	Grumpier Old Men (1995)	Comedy\|Romance
3	4	Waiting to Exhale (1995)	Comedy\|Drama
4	5	Father of the Bride Part II (1995)	Comedy

	name	sex	births
0	Isabella	F	22731
1	Sophia	F	20477
2	Emma	F	17179
3	Olivia	F	16860
4	Ava	F	15300
5	Emily	F	14172
6	Abigail	F	14124
7	Madison	F	13070
8	Chloe	F	11656
9	Mia	F	10541
10	Addison	F	10253
11	Elizabeth	F	10135
12	Ella	F	9796
13	Natalie	F	8715
14	Samantha	F	8334
15	Alexis	F	8181
16	Lily	F	7900
17	Grace	F	7598
18	Hailey	F	6969
19	Alyssa	F	6934
20	Lillian	F	6898
21	Hannah	F	6891
22	Avery	F	6633
23	Leah	F	6474
24	Nevaeh	F	6345
25	Sofia	F	6282
26	Ashley	F	6276
27	Anna	F	6242
28	Brianna	F	6224
29	Sarah	F	6223
30	Zoe	F	6200
31	Victoria	F	6174
32	Gabriella	F	6101
33	Brooklyn	F	6068
34	Kaylee	F	6049
35	Taylor	F	5858
36	Layla	F	5853
37	Allison	F	5829
38	Evelyn	F	5802
39	Riley	F	5506
40	Amelia	F	5417
41	Khloe	F	5369
42	Makayla	F	5348
43	Aubrey	F	5337
44	Charlotte	F	5314
45	Savannah	F	5311
46	Zoey	F	5164
47	Bella	F	5082
48	Kayla	F	5021
49	Alexa	F	5012
50	Peyton	F	4937
51	Audrey	F	4910
52	Claire	F	4885
53	Arianna	F	4797
54	Julia	F	4640
55	Aaliyah	F	4628
56	Kylie	F	4564
57	Lauren	F	4429
58	Sophie	F	4380
59	Sydney	F	4301
	...	...	...

	name	sex	births
0	Isabella	F	22731
19698	Jacob	M	21875
1	Sophia	F	20477
19699	Ethan	M	17866
2	Emma	F	17179
19700	Michael	M	17133
19701	Jayden	M	17030
19702	William	M	16870
3	Olivia	F	16860
19703	Alexander	M	16634