pandas: Open source library in Python for data analysis, data manipulation, and data visualisation.
Pros:
In [37]:
import pandas as pd
Tabular data file: By default tab separated file (tsv)
In [7]:
orders = pd.read_table('http://bit.ly/chiporders')
In [9]:
orders.head()
Out[9]:
order_id
quantity
item_name
choice_description
item_price
0
1
1
Chips and Fresh Tomato Salsa
NaN
$2.39
1
1
1
Izze
[Clementine]
$3.39
2
1
1
Nantucket Nectar
[Apple]
$3.39
3
1
1
Chips and Tomatillo-Green Chili Salsa
NaN
$2.39
4
2
2
Chicken Bowl
[Tomatillo-Red Chili Salsa (Hot), [Black Beans...
$16.98
In [17]:
user_cols = ['user_id', 'age', 'gender', 'occupation', 'zip_code']
users = pd.read_table('http://bit.ly/movieusers', delimiter='|', header=None, names=user_cols)
In [18]:
users.head()
Out[18]:
user_id
age
gender
occupation
zip_code
0
1
24
M
technician
85711
1
2
53
F
other
94043
2
3
23
M
writer
32067
3
4
24
M
technician
43537
4
5
33
F
other
15213
Tip: skiprows and skipfooter params are useful to omit extra data in a file heading or ending.
Two basic data structures in pandas
In [20]:
ufo = pd.read_csv('http://bit.ly/uforeports')
In [21]:
ufo.head()
Out[21]:
City
Colors Reported
Shape Reported
State
Time
0
Ithaca
NaN
TRIANGLE
NY
6/1/1930 22:00
1
Willingboro
NaN
OTHER
NJ
6/30/1930 20:00
2
Holyoke
NaN
OVAL
CO
2/15/1931 14:00
3
Abilene
NaN
DISK
KS
6/1/1931 13:00
4
New York Worlds Fair
NaN
LIGHT
NY
4/18/1933 19:00
In [22]:
type(ufo)
Out[22]:
pandas.core.frame.DataFrame
In [24]:
type(ufo['City'])
Out[24]:
pandas.core.series.Series
In [26]:
city = ufo.City
In [27]:
city.head()
Out[27]:
0 Ithaca
1 Willingboro
2 Holyoke
3 Abilene
4 New York Worlds Fair
Name: City, dtype: object
Tip: Create a new Series in a DataFrame
In [28]:
ufo['location'] = ufo.City + ', ' + ufo.State
In [29]:
ufo.head()
Out[29]:
City
Colors Reported
Shape Reported
State
Time
location
0
Ithaca
NaN
TRIANGLE
NY
6/1/1930 22:00
Ithaca, NY
1
Willingboro
NaN
OTHER
NJ
6/30/1930 20:00
Willingboro, NJ
2
Holyoke
NaN
OVAL
CO
2/15/1931 14:00
Holyoke, CO
3
Abilene
NaN
DISK
KS
6/1/1931 13:00
Abilene, KS
4
New York Worlds Fair
NaN
LIGHT
NY
4/18/1933 19:00
New York Worlds Fair, NY
In [31]:
movies = pd.read_csv('http://bit.ly/imdbratings')
In [32]:
movies.head()
Out[32]:
star_rating
title
content_rating
genre
duration
actors_list
0
9.3
The Shawshank Redemption
R
Crime
142
[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt...
1
9.2
The Godfather
R
Crime
175
[u'Marlon Brando', u'Al Pacino', u'James Caan']
2
9.1
The Godfather: Part II
R
Crime
200
[u'Al Pacino', u'Robert De Niro', u'Robert Duv...
3
9.0
The Dark Knight
PG-13
Action
152
[u'Christian Bale', u'Heath Ledger', u'Aaron E...
4
8.9
Pulp Fiction
R
Crime
154
[u'John Travolta', u'Uma Thurman', u'Samuel L....
In [33]:
movies.describe()
Out[33]:
star_rating
duration
count
979.000000
979.000000
mean
7.889785
120.979571
std
0.336069
26.218010
min
7.400000
64.000000
25%
7.600000
102.000000
50%
7.800000
117.000000
75%
8.100000
134.000000
max
9.300000
242.000000
In [34]:
movies.shape
Out[34]:
(979, 6)
In [36]:
movies.dtypes
Out[36]:
star_rating float64
title object
content_rating object
genre object
duration int64
actors_list object
dtype: object
In [37]:
type(movies)
Out[37]:
pandas.core.frame.DataFrame
In [38]:
movies.describe(include=['object'])
Out[38]:
title
content_rating
genre
actors_list
count
979
976
979
979
unique
975
12
16
969
top
The Girl with the Dragon Tattoo
R
Drama
[u'Daniel Radcliffe', u'Emma Watson', u'Rupert...
freq
2
460
278
6
Tip: Hit "Shift+Tab" inside a method parentheses to get list of arguments
In [39]:
ufo.head()
Out[39]:
City
Colors Reported
Shape Reported
State
Time
location
0
Ithaca
NaN
TRIANGLE
NY
6/1/1930 22:00
Ithaca, NY
1
Willingboro
NaN
OTHER
NJ
6/30/1930 20:00
Willingboro, NJ
2
Holyoke
NaN
OVAL
CO
2/15/1931 14:00
Holyoke, CO
3
Abilene
NaN
DISK
KS
6/1/1931 13:00
Abilene, KS
4
New York Worlds Fair
NaN
LIGHT
NY
4/18/1933 19:00
New York Worlds Fair, NY
In [40]:
ufo.columns
Out[40]:
Index([u'City', u'Colors Reported', u'Shape Reported', u'State', u'Time',
u'location'],
dtype='object')
In [42]:
ufo.rename(columns={'Colors Reported': 'Colors_Reported', 'Shape Reported': 'Shape_Reported'}, inplace=True)
In [43]:
ufo.columns
Out[43]:
Index([u'City', u'Colors_Reported', u'Shape_Reported', u'State', u'Time',
u'location'],
dtype='object')
In [44]:
ufo_cols = ['city', 'colors reported', 'state reported', 'state', 'time', 'location']
ufo.columns = ufo_cols
In [45]:
ufo.head()
Out[45]:
city
colors reported
state reported
state
time
location
0
Ithaca
NaN
TRIANGLE
NY
6/1/1930 22:00
Ithaca, NY
1
Willingboro
NaN
OTHER
NJ
6/30/1930 20:00
Willingboro, NJ
2
Holyoke
NaN
OVAL
CO
2/15/1931 14:00
Holyoke, CO
3
Abilene
NaN
DISK
KS
6/1/1931 13:00
Abilene, KS
4
New York Worlds Fair
NaN
LIGHT
NY
4/18/1933 19:00
New York Worlds Fair, NY
In [50]:
ufo_cols = ['City', 'Colors Reported', 'State Reported', 'State', 'Time']
ufo = pd.read_csv('http://bit.ly/uforeports', names=ufo_cols, header=0)
In [51]:
ufo.head()
Out[51]:
City
Colors Reported
State Reported
State
Time
0
Ithaca
NaN
TRIANGLE
NY
6/1/1930 22:00
1
Willingboro
NaN
OTHER
NJ
6/30/1930 20:00
2
Holyoke
NaN
OVAL
CO
2/15/1931 14:00
3
Abilene
NaN
DISK
KS
6/1/1931 13:00
4
New York Worlds Fair
NaN
LIGHT
NY
4/18/1933 19:00
Tip: Use str.replace method to drop the space from columns names
In [53]:
ufo.columns = ufo.columns.str.replace(' ', '_')
In [54]:
ufo.columns
Out[54]:
Index([u'City', u'Colors_Reported', u'State_Reported', u'State', u'Time'], dtype='object')
In [56]:
ufo = pd.read_csv('http://bit.ly/uforeports')
In [57]:
ufo.head()
Out[57]:
City
Colors Reported
Shape Reported
State
Time
0
Ithaca
NaN
TRIANGLE
NY
6/1/1930 22:00
1
Willingboro
NaN
OTHER
NJ
6/30/1930 20:00
2
Holyoke
NaN
OVAL
CO
2/15/1931 14:00
3
Abilene
NaN
DISK
KS
6/1/1931 13:00
4
New York Worlds Fair
NaN
LIGHT
NY
4/18/1933 19:00
In [58]:
ufo.shape
Out[58]:
(18241, 5)
In [59]:
ufo.drop('Colors Reported', axis=1, inplace=True)
In [60]:
ufo.head()
Out[60]:
City
Shape Reported
State
Time
0
Ithaca
TRIANGLE
NY
6/1/1930 22:00
1
Willingboro
OTHER
NJ
6/30/1930 20:00
2
Holyoke
OVAL
CO
2/15/1931 14:00
3
Abilene
DISK
KS
6/1/1931 13:00
4
New York Worlds Fair
LIGHT
NY
4/18/1933 19:00
In [61]:
ufo.drop(labels=['City', 'State'], axis=1, inplace=True)
In [62]:
ufo.head()
Out[62]:
Shape Reported
Time
0
TRIANGLE
6/1/1930 22:00
1
OTHER
6/30/1930 20:00
2
OVAL
2/15/1931 14:00
3
DISK
6/1/1931 13:00
4
LIGHT
4/18/1933 19:00
Tip: To remove rows instead of columns, choose axis=0
In [63]:
ufo.drop([0, 1], axis=0, inplace=True)
In [65]:
ufo.shape
Out[65]:
(18239, 2)
In [67]:
movies = pd.read_csv('http://bit.ly/imdbratings')
In [68]:
movies.head()
Out[68]:
star_rating
title
content_rating
genre
duration
actors_list
0
9.3
The Shawshank Redemption
R
Crime
142
[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt...
1
9.2
The Godfather
R
Crime
175
[u'Marlon Brando', u'Al Pacino', u'James Caan']
2
9.1
The Godfather: Part II
R
Crime
200
[u'Al Pacino', u'Robert De Niro', u'Robert Duv...
3
9.0
The Dark Knight
PG-13
Action
152
[u'Christian Bale', u'Heath Ledger', u'Aaron E...
4
8.9
Pulp Fiction
R
Crime
154
[u'John Travolta', u'Uma Thurman', u'Samuel L....
In [69]:
movies.title.sort_values()
Out[69]:
542 (500) Days of Summer
5 12 Angry Men
201 12 Years a Slave
698 127 Hours
110 2001: A Space Odyssey
910 2046
596 21 Grams
624 25th Hour
708 28 Days Later...
60 3 Idiots
225 3-Iron
570 300
555 3:10 to Yuma
427 4 Months, 3 Weeks and 2 Days
824 42
597 50/50
203 8 1/2
170 A Beautiful Mind
941 A Bridge Too Far
571 A Bronx Tale
266 A Christmas Story
86 A Clockwork Orange
716 A Few Good Men
750 A Fish Called Wanda
276 A Fistful of Dollars
612 A Hard Day's Night
883 A History of Violence
869 A Nightmare on Elm Street
865 A Perfect World
426 A Prophet
...
207 What Ever Happened to Baby Jane?
562 What's Eating Gilbert Grape
719 When Harry Met Sally...
649 Where Eagles Dare
33 Whiplash
669 Who Framed Roger Rabbit
219 Who's Afraid of Virginia Woolf?
127 Wild Strawberries
497 Willy Wonka & the Chocolate Factory
270 Wings of Desire
483 Withnail & I
920 Witness
65 Witness for the Prosecution
970 Wonder Boys
518 Wreck-It Ralph
954 X-Men
248 X-Men: Days of Future Past
532 X-Men: First Class
871 X2
695 Y Tu Mama Tambien
403 Ying xiong
235 Yip Man
96 Yojimbo
280 Young Frankenstein
535 Zelig
955 Zero Dark Thirty
677 Zodiac
615 Zombieland
526 Zulu
864 [Rec]
Name: title, dtype: object
In [70]:
movies.sort_values('title')
Out[70]:
star_rating
title
content_rating
genre
duration
actors_list
542
7.8
(500) Days of Summer
PG-13
Comedy
95
[u'Zooey Deschanel', u'Joseph Gordon-Levitt', ...
5
8.9
12 Angry Men
NOT RATED
Drama
96
[u'Henry Fonda', u'Lee J. Cobb', u'Martin Bals...
201
8.1
12 Years a Slave
R
Biography
134
[u'Chiwetel Ejiofor', u'Michael Kenneth Willia...
698
7.6
127 Hours
R
Adventure
94
[u'James Franco', u'Amber Tamblyn', u'Kate Mara']
110
8.3
2001: A Space Odyssey
G
Mystery
160
[u'Keir Dullea', u'Gary Lockwood', u'William S...
910
7.5
2046
R
Drama
129
[u'Tony Chiu Wai Leung', u'Ziyi Zhang', u'Faye...
596
7.7
21 Grams
R
Crime
124
[u'Sean Penn', u'Benicio Del Toro', u'Naomi Wa...
624
7.7
25th Hour
R
Crime
135
[u'Edward Norton', u'Barry Pepper', u'Philip S...
708
7.6
28 Days Later...
R
Horror
113
[u'Cillian Murphy', u'Naomie Harris', u'Christ...
60
8.5
3 Idiots
PG-13
Comedy
170
[u'Aamir Khan', u'Madhavan', u'Mona Singh']
225
8.1
3-Iron
R
Crime
88
[u'Seung-yeon Lee', u'Hyun-kyoon Lee', u'Hyuk-...
570
7.8
300
R
Action
117
[u'Gerard Butler', u'Lena Headey', u'David Wen...
555
7.8
3:10 to Yuma
R
Adventure
122
[u'Russell Crowe', u'Christian Bale', u'Ben Fo...
427
7.9
4 Months, 3 Weeks and 2 Days
NOT RATED
Drama
113
[u'Anamaria Marinca', u'Vlad Ivanov', u'Laura ...
824
7.5
42
PG-13
Biography
128
[u'Chadwick Boseman', u'T.R. Knight', u'Harris...
597
7.7
50/50
R
Comedy
100
[u'Joseph Gordon-Levitt', u'Seth Rogen', u'Ann...
203
8.1
8 1/2
NOT RATED
Drama
138
[u'Marcello Mastroianni', u'Anouk Aim\xe9e', u...
170
8.2
A Beautiful Mind
PG-13
Biography
135
[u'Russell Crowe', u'Ed Harris', u'Jennifer Co...
941
7.4
A Bridge Too Far
PG
Drama
175
[u'Sean Connery', u"Ryan O'Neal", u'Michael Ca...
571
7.8
A Bronx Tale
R
Crime
121
[u'Robert De Niro', u'Chazz Palminteri', u'Lil...
266
8.1
A Christmas Story
PG
Comedy
94
[u'Peter Billingsley', u'Melinda Dillon', u'Da...
86
8.4
A Clockwork Orange
X
Crime
136
[u'Malcolm McDowell', u'Patrick Magee', u'Mich...
716
7.6
A Few Good Men
R
Drama
138
[u'Tom Cruise', u'Jack Nicholson', u'Demi Moore']
750
7.6
A Fish Called Wanda
R
Comedy
108
[u'John Cleese', u'Jamie Lee Curtis', u'Kevin ...
276
8.1
A Fistful of Dollars
R
Action
99
[u'Clint Eastwood', u'Gian Maria Volont\xe9', ...
612
7.7
A Hard Day's Night
APPROVED
Comedy
87
[u'John Lennon', u'Paul McCartney', u'George H...
883
7.5
A History of Violence
R
Crime
96
[u'Viggo Mortensen', u'Maria Bello', u'Ed Harr...
869
7.5
A Nightmare on Elm Street
R
Horror
91
[u'Heather Langenkamp', u'Johnny Depp', u'Robe...
865
7.5
A Perfect World
PG-13
Crime
138
[u'Kevin Costner', u'Clint Eastwood', u'Laura ...
426
7.9
A Prophet
R
Crime
155
[u'Tahar Rahim', u'Niels Arestrup', u'Adel Ben...
...
...
...
...
...
...
...
207
8.1
What Ever Happened to Baby Jane?
APPROVED
Drama
134
[u'Bette Davis', u'Joan Crawford', u'Victor Bu...
562
7.8
What's Eating Gilbert Grape
PG-13
Drama
118
[u'Johnny Depp', u'Leonardo DiCaprio', u'Julie...
719
7.6
When Harry Met Sally...
R
Comedy
96
[u'Billy Crystal', u'Meg Ryan', u'Carrie Fisher']
649
7.7
Where Eagles Dare
NaN
Action
158
[u'Richard Burton', u'Clint Eastwood', u'Mary ...
33
8.6
Whiplash
R
Drama
107
[u'Miles Teller', u'J.K. Simmons', u'Melissa B...
669
7.7
Who Framed Roger Rabbit
PG
Animation
104
[u'Bob Hoskins', u'Christopher Lloyd', u'Joann...
219
8.1
Who's Afraid of Virginia Woolf?
TV-MA
Drama
131
[u'Elizabeth Taylor', u'Richard Burton', u'Geo...
127
8.3
Wild Strawberries
UNRATED
Drama
91
[u'Victor Sj\xf6str\xf6m', u'Bibi Andersson', ...
497
7.8
Willy Wonka & the Chocolate Factory
G
Family
100
[u'Gene Wilder', u'Jack Albertson', u'Peter Os...
270
8.1
Wings of Desire
PG-13
Drama
128
[u'Bruno Ganz', u'Solveig Dommartin', u'Otto S...
483
7.8
Withnail & I
R
Comedy
107
[u'Richard E. Grant', u'Paul McGann', u'Richar...
920
7.5
Witness
R
Crime
112
[u'Harrison Ford', u'Kelly McGillis', u'Lukas ...
65
8.4
Witness for the Prosecution
APPROVED
Crime
116
[u'Tyrone Power', u'Marlene Dietrich', u'Charl...
970
7.4
Wonder Boys
R
Drama
107
[u'Michael Douglas', u'Tobey Maguire', u'Franc...
518
7.8
Wreck-It Ralph
PG
Animation
101
[u'John C. Reilly', u'Jack McBrayer', u'Jane L...
954
7.4
X-Men
PG-13
Action
104
[u'Patrick Stewart', u'Hugh Jackman', u'Ian Mc...
248
8.1
X-Men: Days of Future Past
PG-13
Action
131
[u'Patrick Stewart', u'Ian McKellen', u'Hugh J...
532
7.8
X-Men: First Class
PG-13
Action
132
[u'James McAvoy', u'Michael Fassbender', u'Jen...
871
7.5
X2
PG-13
Action
134
[u'Patrick Stewart', u'Hugh Jackman', u'Halle ...
695
7.7
Y Tu Mama Tambien
R
Drama
106
[u'Maribel Verd\xfa', u'Gael Garc\xeda Bernal'...
403
7.9
Ying xiong
PG-13
Action
99
[u'Jet Li', u'Tony Chiu Wai Leung', u'Maggie C...
235
8.1
Yip Man
R
Action
106
[u'Donnie Yen', u'Simon Yam', u'Siu-Wong Fan']
96
8.4
Yojimbo
UNRATED
Action
110
[u'Toshir\xf4 Mifune', u'Eijir\xf4 T\xf4no', u...
280
8.1
Young Frankenstein
PG
Comedy
106
[u'Gene Wilder', u'Madeline Kahn', u'Marty Fel...
535
7.8
Zelig
PG
Comedy
79
[u'Woody Allen', u'Mia Farrow', u'Patrick Horg...
955
7.4
Zero Dark Thirty
R
Drama
157
[u'Jessica Chastain', u'Joel Edgerton', u'Chri...
677
7.7
Zodiac
R
Crime
157
[u'Jake Gyllenhaal', u'Robert Downey Jr.', u'M...
615
7.7
Zombieland
R
Comedy
88
[u'Jesse Eisenberg', u'Emma Stone', u'Woody Ha...
526
7.8
Zulu
UNRATED
Drama
138
[u'Stanley Baker', u'Jack Hawkins', u'Ulla Jac...
864
7.5
[Rec]
R
Horror
78
[u'Manuela Velasco', u'Ferran Terraza', u'Jorg...
979 rows × 6 columns
Tip: Sort by multiple columns
In [71]:
movies.sort_values(['content_rating', 'duration'])
Out[71]:
star_rating
title
content_rating
genre
duration
actors_list
713
7.6
The Jungle Book
APPROVED
Animation
78
[u'Phil Harris', u'Sebastian Cabot', u'Louis P...
513
7.8
Invasion of the Body Snatchers
APPROVED
Horror
80
[u'Kevin McCarthy', u'Dana Wynter', u'Larry Ga...
272
8.1
The Killing
APPROVED
Crime
85
[u'Sterling Hayden', u'Coleen Gray', u'Vince E...
703
7.6
Dracula
APPROVED
Horror
85
[u'Bela Lugosi', u'Helen Chandler', u'David Ma...
612
7.7
A Hard Day's Night
APPROVED
Comedy
87
[u'John Lennon', u'Paul McCartney', u'George H...
58
8.5
Paths of Glory
APPROVED
Drama
88
[u'Kirk Douglas', u'Ralph Meeker', u'Adolphe M...
210
8.1
Laura
APPROVED
Film-Noir
88
[u'Gene Tierney', u'Dana Andrews', u'Clifton W...
656
7.7
Snow White and the Seven Dwarfs
APPROVED
Animation
88
[u'Adriana Caselotti', u'Harry Stockwell', u'L...
844
7.5
Pinocchio
APPROVED
Animation
88
[u'Dickie Jones', u'Christian Rub', u'Mel Blanc']
233
8.1
The Night of the Hunter
APPROVED
Crime
92
[u'Robert Mitchum', u'Shelley Winters', u'Lill...
287
8.1
His Girl Friday
APPROVED
Comedy
92
[u'Cary Grant', u'Rosalind Russell', u'Ralph B...
484
7.8
The Day the Earth Stood Still
APPROVED
Sci-Fi
92
[u'Michael Rennie', u'Patricia Neal', u'Hugh M...
451
7.9
Miracle on 34th Street
APPROVED
Comedy
96
[u'Edmund Gwenn', u"Maureen O'Hara", u'John Pa...
229
8.1
Strangers on a Train
APPROVED
Crime
101
[u'Farley Granger', u'Robert Walker', u'Ruth R...
237
8.1
Notorious
APPROVED
Drama
101
[u'Cary Grant', u'Ingrid Bergman', u'Claude Ra...
521
7.8
Belle de Jour
APPROVED
Drama
101
[u'Catherine Deneuve', u'Jean Sorel', u'Michel...
285
8.1
Bringing Up Baby
APPROVED
Comedy
102
[u'Katharine Hepburn', u'Cary Grant', u'Charle...
94
8.4
Singin' in the Rain
APPROVED
Comedy
103
[u'Gene Kelly', u"Donald O'Connor", u'Debbie R...
575
7.8
Kramer vs. Kramer
APPROVED
Drama
105
[u'Dustin Hoffman', u'Meryl Streep', u'Jane Al...
298
8.0
The Graduate
APPROVED
Comedy
106
[u'Dustin Hoffman', u'Anne Bancroft', u'Kathar...
866
7.5
To Catch a Thief
APPROVED
Mystery
106
[u'Cary Grant', u'Grace Kelly', u'Jessie Royce...
230
8.1
Cat on a Hot Tin Roof
APPROVED
Drama
108
[u'Elizabeth Taylor', u'Paul Newman', u'Burl I...
350
8.0
Shadow of a Doubt
APPROVED
Thriller
108
[u'Teresa Wright', u'Joseph Cotten', u'Macdona...
329
8.0
In the Heat of the Night
APPROVED
Crime
109
[u'Sidney Poitier', u'Rod Steiger', u'Warren O...
563
7.8
Goldfinger
APPROVED
Action
110
[u'Sean Connery', u'Gert Fr\xf6be', u'Honor Bl...
410
7.9
Bonnie and Clyde
APPROVED
Biography
111
[u'Warren Beatty', u'Faye Dunaway', u'Michael ...
38
8.6
Rear Window
APPROVED
Mystery
112
[u'James Stewart', u'Grace Kelly', u'Wendell C...
253
8.1
The Big Sleep
APPROVED
Crime
114
[u'Humphrey Bogart', u'Lauren Bacall', u'John ...
896
7.5
From Russia with Love
APPROVED
Action
115
[u'Sean Connery', u'Robert Shaw', u'Lotte Lenya']
65
8.4
Witness for the Prosecution
APPROVED
Crime
116
[u'Tyrone Power', u'Marlene Dietrich', u'Charl...
...
...
...
...
...
...
...
652
7.7
This Is England
UNRATED
Crime
101
[u'Thomas Turgoose', u'Stephen Graham', u'Jo H...
144
8.2
It Happened One Night
UNRATED
Comedy
105
[u'Clark Gable', u'Claudette Colbert', u'Walte...
291
8.1
The Return
UNRATED
Drama
105
[u'Vladimir Garin', u'Ivan Dobronravov', u'Kon...
523
7.8
Repulsion
UNRATED
Drama
105
[u'Catherine Deneuve', u'Ian Hendry', u'John F...
123
8.3
The General
UNRATED
Action
107
[u'Buster Keaton', u'Marion Mack', u'Glen Cave...
472
7.8
Guess Who's Coming to Dinner
UNRATED
Comedy
108
[u'Spencer Tracy', u'Sidney Poitier', u'Kathar...
757
7.6
Funny Games
UNRATED
Crime
108
[u'Susanne Lothar', u'Ulrich M\xfche', u'Arno ...
96
8.4
Yojimbo
UNRATED
Action
110
[u'Toshir\xf4 Mifune', u'Eijir\xf4 T\xf4no', u...
687
7.7
Spellbound
UNRATED
Film-Noir
111
[u'Ingrid Bergman', u'Gregory Peck', u'Michael...
582
7.8
Sabrina
UNRATED
Comedy
113
[u'Humphrey Bogart', u'Audrey Hepburn', u'Will...
823
7.5
Straw Dogs
UNRATED
Crime
113
[u'Dustin Hoffman', u'Susan George', u'Peter V...
239
8.1
Elite Squad: The Enemy Within
UNRATED
Action
115
[u'Wagner Moura', u'Irandhir Santos', u'Andr\x...
161
8.2
Diabolique
UNRATED
Horror
116
[u'Simone Signoret', u'V\xe9ra Clouzot', u'Pau...
317
8.0
Dawn of the Dead
UNRATED
Horror
127
[u'David Emge', u'Ken Foree', u'Scott H. Reini...
802
7.6
Tell No One
UNRATED
Crime
131
[u'Fran\xe7ois Cluzet', u'Marie-Jos\xe9e Croze...
217
8.1
Memories of Murder
UNRATED
Crime
132
[u'Kang-ho Song', u'Sang-kyung Kim', u'Roe-ha ...
262
8.1
The Hustler
UNRATED
Drama
134
[u'Paul Newman', u'Jackie Gleason', u'Piper La...
289
8.1
All Quiet on the Western Front
UNRATED
Drama
136
[u'Lew Ayres', u'Louis Wolheim', u'John Wray']
526
7.8
Zulu
UNRATED
Drama
138
[u'Stanley Baker', u'Jack Hawkins', u'Ulla Jac...
730
7.6
West Side Story
UNRATED
Crime
152
[u'Natalie Wood', u'George Chakiris', u'Richar...
257
8.1
Anatomy of a Murder
UNRATED
Crime
160
[u'James Stewart', u'Lee Remick', u'Ben Gazzara']
120
8.3
The Great Escape
UNRATED
Adventure
172
[u'Steve McQueen', u'James Garner', u'Richard ...
17
8.7
Seven Samurai
UNRATED
Drama
207
[u'Toshir\xf4 Mifune', u'Takashi Shimura', u'K...
486
7.8
Evil Dead II
X
Comedy
84
[u'Bruce Campbell', u'Sarah Berry', u'Dan Hicks']
913
7.5
Suspiria
X
Horror
92
[u'Jessica Harper', u'Stefania Casini', u'Flav...
387
8.0
Midnight Cowboy
X
Drama
113
[u'Dustin Hoffman', u'Jon Voight', u'Sylvia Mi...
86
8.4
A Clockwork Orange
X
Crime
136
[u'Malcolm McDowell', u'Patrick Magee', u'Mich...
187
8.2
Butch Cassidy and the Sundance Kid
NaN
Biography
110
[u'Paul Newman', u'Robert Redford', u'Katharin...
936
7.4
True Grit
NaN
Adventure
128
[u'John Wayne', u'Kim Darby', u'Glen Campbell']
649
7.7
Where Eagles Dare
NaN
Action
158
[u'Richard Burton', u'Clint Eastwood', u'Mary ...
979 rows × 6 columns
In [72]:
movies = pd.read_csv('http://bit.ly/imdbratings')
In [73]:
movies.head()
Out[73]:
star_rating
title
content_rating
genre
duration
actors_list
0
9.3
The Shawshank Redemption
R
Crime
142
[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt...
1
9.2
The Godfather
R
Crime
175
[u'Marlon Brando', u'Al Pacino', u'James Caan']
2
9.1
The Godfather: Part II
R
Crime
200
[u'Al Pacino', u'Robert De Niro', u'Robert Duv...
3
9.0
The Dark Knight
PG-13
Action
152
[u'Christian Bale', u'Heath Ledger', u'Aaron E...
4
8.9
Pulp Fiction
R
Crime
154
[u'John Travolta', u'Uma Thurman', u'Samuel L....
In [74]:
movies.shape
Out[74]:
(979, 6)
In [78]:
movies[movies.duration >= 200]
Out[78]:
star_rating
title
content_rating
genre
duration
actors_list
2
9.1
The Godfather: Part II
R
Crime
200
[u'Al Pacino', u'Robert De Niro', u'Robert Duv...
7
8.9
The Lord of the Rings: The Return of the King
PG-13
Adventure
201
[u'Elijah Wood', u'Viggo Mortensen', u'Ian McK...
17
8.7
Seven Samurai
UNRATED
Drama
207
[u'Toshir\xf4 Mifune', u'Takashi Shimura', u'K...
78
8.4
Once Upon a Time in America
R
Crime
229
[u'Robert De Niro', u'James Woods', u'Elizabet...
85
8.4
Lawrence of Arabia
PG
Adventure
216
[u"Peter O'Toole", u'Alec Guinness', u'Anthony...
142
8.3
Lagaan: Once Upon a Time in India
PG
Adventure
224
[u'Aamir Khan', u'Gracy Singh', u'Rachel Shell...
157
8.2
Gone with the Wind
G
Drama
238
[u'Clark Gable', u'Vivien Leigh', u'Thomas Mit...
204
8.1
Ben-Hur
G
Adventure
212
[u'Charlton Heston', u'Jack Hawkins', u'Stephe...
445
7.9
The Ten Commandments
APPROVED
Adventure
220
[u'Charlton Heston', u'Yul Brynner', u'Anne Ba...
476
7.8
Hamlet
PG-13
Drama
242
[u'Kenneth Branagh', u'Julie Christie', u'Dere...
630
7.7
Malcolm X
PG-13
Biography
202
[u'Denzel Washington', u'Angela Bassett', u'De...
767
7.6
It's a Mad, Mad, Mad, Mad World
APPROVED
Action
205
[u'Spencer Tracy', u'Milton Berle', u'Ethel Me...
Tip: Filtered DataFrame is also a DataFrame
In [79]:
movies.loc[movies.duration >= 200, 'genre']
Out[79]:
2 Crime
7 Adventure
17 Drama
78 Crime
85 Adventure
142 Adventure
157 Drama
204 Adventure
445 Adventure
476 Drama
630 Biography
767 Action
Name: genre, dtype: object
In [80]:
movies = pd.read_csv('http://bit.ly/imdbratings')
In [81]:
movies.head()
Out[81]:
star_rating
title
content_rating
genre
duration
actors_list
0
9.3
The Shawshank Redemption
R
Crime
142
[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt...
1
9.2
The Godfather
R
Crime
175
[u'Marlon Brando', u'Al Pacino', u'James Caan']
2
9.1
The Godfather: Part II
R
Crime
200
[u'Al Pacino', u'Robert De Niro', u'Robert Duv...
3
9.0
The Dark Knight
PG-13
Action
152
[u'Christian Bale', u'Heath Ledger', u'Aaron E...
4
8.9
Pulp Fiction
R
Crime
154
[u'John Travolta', u'Uma Thurman', u'Samuel L....
In [82]:
movies[movies.duration >= 200]
Out[82]:
star_rating
title
content_rating
genre
duration
actors_list
2
9.1
The Godfather: Part II
R
Crime
200
[u'Al Pacino', u'Robert De Niro', u'Robert Duv...
7
8.9
The Lord of the Rings: The Return of the King
PG-13
Adventure
201
[u'Elijah Wood', u'Viggo Mortensen', u'Ian McK...
17
8.7
Seven Samurai
UNRATED
Drama
207
[u'Toshir\xf4 Mifune', u'Takashi Shimura', u'K...
78
8.4
Once Upon a Time in America
R
Crime
229
[u'Robert De Niro', u'James Woods', u'Elizabet...
85
8.4
Lawrence of Arabia
PG
Adventure
216
[u"Peter O'Toole", u'Alec Guinness', u'Anthony...
142
8.3
Lagaan: Once Upon a Time in India
PG
Adventure
224
[u'Aamir Khan', u'Gracy Singh', u'Rachel Shell...
157
8.2
Gone with the Wind
G
Drama
238
[u'Clark Gable', u'Vivien Leigh', u'Thomas Mit...
204
8.1
Ben-Hur
G
Adventure
212
[u'Charlton Heston', u'Jack Hawkins', u'Stephe...
445
7.9
The Ten Commandments
APPROVED
Adventure
220
[u'Charlton Heston', u'Yul Brynner', u'Anne Ba...
476
7.8
Hamlet
PG-13
Drama
242
[u'Kenneth Branagh', u'Julie Christie', u'Dere...
630
7.7
Malcolm X
PG-13
Biography
202
[u'Denzel Washington', u'Angela Bassett', u'De...
767
7.6
It's a Mad, Mad, Mad, Mad World
APPROVED
Action
205
[u'Spencer Tracy', u'Milton Berle', u'Ethel Me...
In [88]:
movies[(movies.duration >= 200) & (movies.genre == 'Drama')]
Out[88]:
star_rating
title
content_rating
genre
duration
actors_list
17
8.7
Seven Samurai
UNRATED
Drama
207
[u'Toshir\xf4 Mifune', u'Takashi Shimura', u'K...
157
8.2
Gone with the Wind
G
Drama
238
[u'Clark Gable', u'Vivien Leigh', u'Thomas Mit...
476
7.8
Hamlet
PG-13
Drama
242
[u'Kenneth Branagh', u'Julie Christie', u'Dere...
Tip: Multiple condition on a single column
In [89]:
movies[movies.genre.isin(['Crime', 'Drama', 'Action'])]
Out[89]:
star_rating
title
content_rating
genre
duration
actors_list
0
9.3
The Shawshank Redemption
R
Crime
142
[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt...
1
9.2
The Godfather
R
Crime
175
[u'Marlon Brando', u'Al Pacino', u'James Caan']
2
9.1
The Godfather: Part II
R
Crime
200
[u'Al Pacino', u'Robert De Niro', u'Robert Duv...
3
9.0
The Dark Knight
PG-13
Action
152
[u'Christian Bale', u'Heath Ledger', u'Aaron E...
4
8.9
Pulp Fiction
R
Crime
154
[u'John Travolta', u'Uma Thurman', u'Samuel L....
5
8.9
12 Angry Men
NOT RATED
Drama
96
[u'Henry Fonda', u'Lee J. Cobb', u'Martin Bals...
9
8.9
Fight Club
R
Drama
139
[u'Brad Pitt', u'Edward Norton', u'Helena Bonh...
11
8.8
Inception
PG-13
Action
148
[u'Leonardo DiCaprio', u'Joseph Gordon-Levitt'...
12
8.8
Star Wars: Episode V - The Empire Strikes Back
PG
Action
124
[u'Mark Hamill', u'Harrison Ford', u'Carrie Fi...
13
8.8
Forrest Gump
PG-13
Drama
142
[u'Tom Hanks', u'Robin Wright', u'Gary Sinise']
16
8.7
One Flew Over the Cuckoo's Nest
R
Drama
133
[u'Jack Nicholson', u'Louise Fletcher', u'Mich...
17
8.7
Seven Samurai
UNRATED
Drama
207
[u'Toshir\xf4 Mifune', u'Takashi Shimura', u'K...
19
8.7
Star Wars
PG
Action
121
[u'Mark Hamill', u'Harrison Ford', u'Carrie Fi...
20
8.7
The Matrix
R
Action
136
[u'Keanu Reeves', u'Laurence Fishburne', u'Car...
21
8.7
City of God
R
Crime
130
[u'Alexandre Rodrigues', u'Matheus Nachtergael...
22
8.7
It's a Wonderful Life
APPROVED
Drama
130
[u'James Stewart', u'Donna Reed', u'Lionel Bar...
23
8.7
The Usual Suspects
R
Crime
106
[u'Kevin Spacey', u'Gabriel Byrne', u'Chazz Pa...
24
8.7
Se7en
R
Drama
127
[u'Morgan Freeman', u'Brad Pitt', u'Kevin Spac...
27
8.6
The Silence of the Lambs
R
Drama
118
[u'Jodie Foster', u'Anthony Hopkins', u'Lawren...
28
8.6
Leon: The Professional
R
Crime
110
[u'Jean Reno', u'Gary Oldman', u'Natalie Portm...
32
8.6
Casablanca
PG
Drama
102
[u'Humphrey Bogart', u'Ingrid Bergman', u'Paul...
33
8.6
Whiplash
R
Drama
107
[u'Miles Teller', u'J.K. Simmons', u'Melissa B...
34
8.6
American History X
R
Crime
119
[u'Edward Norton', u'Edward Furlong', u"Beverl...
36
8.6
Saving Private Ryan
R
Action
169
[u'Tom Hanks', u'Matt Damon', u'Tom Sizemore']
37
8.6
Raiders of the Lost Ark
PG
Action
115
[u'Harrison Ford', u'Karen Allen', u'Paul Free...
40
8.5
The Green Mile
R
Crime
189
[u'Tom Hanks', u'Michael Clarke Duncan', u'Dav...
41
8.5
Sunset Blvd.
NOT RATED
Drama
110
[u'William Holden', u'Gloria Swanson', u'Erich...
43
8.5
The Dark Knight Rises
PG-13
Action
165
[u'Christian Bale', u'Tom Hardy', u'Anne Hatha...
44
8.5
Gladiator
R
Action
155
[u'Russell Crowe', u'Joaquin Phoenix', u'Conni...
45
8.5
Terminator 2: Judgment Day
R
Action
137
[u'Arnold Schwarzenegger', u'Linda Hamilton', ...
...
...
...
...
...
...
...
916
7.5
Up in the Air
R
Drama
109
[u'George Clooney', u'Vera Farmiga', u'Anna Ke...
917
7.5
Begin Again
R
Drama
104
[u'Keira Knightley', u'Mark Ruffalo', u'Adam L...
918
7.5
Running Scared
R
Action
122
[u'Paul Walker', u'Cameron Bright', u'Chazz Pa...
920
7.5
Witness
R
Crime
112
[u'Harrison Ford', u'Kelly McGillis', u'Lukas ...
922
7.5
Mud
PG-13
Drama
130
[u'Matthew McConaughey', u'Tye Sheridan', u'Ja...
923
7.5
Across the Universe
PG-13
Drama
133
[u'Evan Rachel Wood', u'Jim Sturgess', u'Joe A...
924
7.5
Les Miserables
PG-13
Crime
134
[u'Liam Neeson', u'Geoffrey Rush', u'Uma Thurm...
925
7.5
Notes on a Scandal
R
Drama
92
[u'Cate Blanchett', u'Judi Dench', u'Andrew Si...
926
7.5
Inside Llewyn Davis
R
Drama
104
[u'Oscar Isaac', u'Carey Mulligan', u'John Goo...
927
7.5
Brick
R
Crime
110
[u'Joseph Gordon-Levitt', u'Lukas Haas', u'Emi...
931
7.4
Mean Streets
R
Crime
112
[u'Robert De Niro', u'Harvey Keitel', u'David ...
939
7.4
Predestination
R
Drama
97
[u'Ethan Hawke', u'Sarah Snook', u'Noah Taylor']
941
7.4
A Bridge Too Far
PG
Drama
175
[u'Sean Connery', u"Ryan O'Neal", u'Michael Ca...
945
7.4
Take Shelter
R
Drama
120
[u'Michael Shannon', u'Jessica Chastain', u'Sh...
946
7.4
Far from Heaven
PG-13
Drama
107
[u'Julianne Moore', u'Dennis Quaid', u'Dennis ...
947
7.4
Eraserhead
UNRATED
Drama
89
[u'Jack Nance', u'Charlotte Stewart', u'Allen ...
950
7.4
Bound
R
Crime
108
[u'Jennifer Tilly', u'Gina Gershon', u'Joe Pan...
951
7.4
Sleepy Hollow
R
Drama
105
[u'Johnny Depp', u'Christina Ricci', u'Miranda...
954
7.4
X-Men
PG-13
Action
104
[u'Patrick Stewart', u'Hugh Jackman', u'Ian Mc...
955
7.4
Zero Dark Thirty
R
Drama
157
[u'Jessica Chastain', u'Joel Edgerton', u'Chri...
958
7.4
My Sister's Keeper
PG-13
Drama
109
[u'Cameron Diaz', u'Abigail Breslin', u'Alec B...
963
7.4
La Femme Nikita
R
Action
118
[u'Anne Parillaud', u'Marc Duret', u'Patrick F...
967
7.4
The Rock
R
Action
136
[u'Sean Connery', u'Nicolas Cage', u'Ed Harris']
968
7.4
The English Patient
R
Drama
162
[u'Ralph Fiennes', u'Juliette Binoche', u'Will...
969
7.4
Law Abiding Citizen
R
Crime
109
[u'Gerard Butler', u'Jamie Foxx', u'Leslie Bibb']
970
7.4
Wonder Boys
R
Drama
107
[u'Michael Douglas', u'Tobey Maguire', u'Franc...
972
7.4
Blue Valentine
NC-17
Drama
112
[u'Ryan Gosling', u'Michelle Williams', u'John...
973
7.4
The Cider House Rules
PG-13
Drama
126
[u'Tobey Maguire', u'Charlize Theron', u'Micha...
976
7.4
Master and Commander: The Far Side of the World
PG-13
Action
138
[u'Russell Crowe', u'Paul Bettany', u'Billy Bo...
978
7.4
Wall Street
R
Crime
126
[u'Charlie Sheen', u'Michael Douglas', u'Tamar...
538 rows × 6 columns
In [90]:
drinks = pd.read_csv('http://bit.ly/drinksbycountry')
In [92]:
drinks.dtypes
Out[92]:
country object
beer_servings int64
spirit_servings int64
wine_servings int64
total_litres_of_pure_alcohol float64
continent object
dtype: object
In [93]:
drinks.beer_servings = drinks.beer_servings.astype(float)
In [94]:
drinks.dtypes
Out[94]:
country object
beer_servings float64
spirit_servings int64
wine_servings int64
total_litres_of_pure_alcohol float64
continent object
dtype: object
In [95]:
drinks = pd.read_csv('http://bit.ly/drinksbycountry', dtype={'beer_servings': float})
In [96]:
drinks.dtypes
Out[96]:
country object
beer_servings float64
spirit_servings int64
wine_servings int64
total_litres_of_pure_alcohol float64
continent object
dtype: object
In [97]:
orders = pd.read_table('http://bit.ly/chiporders')
In [98]:
orders.head()
Out[98]:
order_id
quantity
item_name
choice_description
item_price
0
1
1
Chips and Fresh Tomato Salsa
NaN
$2.39
1
1
1
Izze
[Clementine]
$3.39
2
1
1
Nantucket Nectar
[Apple]
$3.39
3
1
1
Chips and Tomatillo-Green Chili Salsa
NaN
$2.39
4
2
2
Chicken Bowl
[Tomatillo-Red Chili Salsa (Hot), [Black Beans...
$16.98
In [102]:
orders.dtypes
Out[102]:
order_id int64
quantity int64
item_name object
choice_description object
item_price float64
dtype: object
In [100]:
orders.item_price = orders.item_price.str.replace('$', '').astype(float)
In [101]:
orders.head()
Out[101]:
order_id
quantity
item_name
choice_description
item_price
0
1
1
Chips and Fresh Tomato Salsa
NaN
2.39
1
1
1
Izze
[Clementine]
3.39
2
1
1
Nantucket Nectar
[Apple]
3.39
3
1
1
Chips and Tomatillo-Green Chili Salsa
NaN
2.39
4
2
2
Chicken Bowl
[Tomatillo-Red Chili Salsa (Hot), [Black Beans...
16.98
In [105]:
orders.item_name.str.contains('Chicken').astype(int).head()
Out[105]:
0 0
1 0
2 0
3 0
4 1
Name: item_name, dtype: int64
In [106]:
movies = pd.read_csv('http://bit.ly/imdbratings')
In [107]:
movies.head()
Out[107]:
star_rating
title
content_rating
genre
duration
actors_list
0
9.3
The Shawshank Redemption
R
Crime
142
[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt...
1
9.2
The Godfather
R
Crime
175
[u'Marlon Brando', u'Al Pacino', u'James Caan']
2
9.1
The Godfather: Part II
R
Crime
200
[u'Al Pacino', u'Robert De Niro', u'Robert Duv...
3
9.0
The Dark Knight
PG-13
Action
152
[u'Christian Bale', u'Heath Ledger', u'Aaron E...
4
8.9
Pulp Fiction
R
Crime
154
[u'John Travolta', u'Uma Thurman', u'Samuel L....
In [108]:
movies.dtypes
Out[108]:
star_rating float64
title object
content_rating object
genre object
duration int64
actors_list object
dtype: object
In [110]:
movies.genre.describe()
Out[110]:
count 979
unique 16
top Drama
freq 278
Name: genre, dtype: object
In [111]:
movies.genre.value_counts()
Out[111]:
Drama 278
Comedy 156
Action 136
Crime 124
Biography 77
Adventure 75
Animation 62
Horror 29
Mystery 16
Western 9
Thriller 5
Sci-Fi 5
Film-Noir 3
Family 2
Fantasy 1
History 1
Name: genre, dtype: int64
In [112]:
movies.genre.value_counts(normalize=True)
Out[112]:
Drama 0.283963
Comedy 0.159346
Action 0.138917
Crime 0.126660
Biography 0.078652
Adventure 0.076609
Animation 0.063330
Horror 0.029622
Mystery 0.016343
Western 0.009193
Thriller 0.005107
Sci-Fi 0.005107
Film-Noir 0.003064
Family 0.002043
Fantasy 0.001021
History 0.001021
Name: genre, dtype: float64
In [113]:
movies.genre.unique()
Out[113]:
array(['Crime', 'Action', 'Drama', 'Western', 'Adventure', 'Biography',
'Comedy', 'Animation', 'Mystery', 'Horror', 'Film-Noir', 'Sci-Fi',
'History', 'Thriller', 'Family', 'Fantasy'], dtype=object)
In [114]:
movies.genre.nunique()
Out[114]:
16
In [116]:
pd.crosstab(movies.genre, movies.content_rating)
Out[116]:
content_rating
APPROVED
G
GP
NC-17
NOT RATED
PASSED
PG
PG-13
R
TV-MA
UNRATED
X
genre
Action
3
1
1
0
4
1
11
44
67
0
3
0
Adventure
3
2
0
0
5
1
21
23
17
0
2
0
Animation
3
20
0
0
3
0
25
5
5
0
1
0
Biography
1
2
1
0
1
0
6
29
36
0
0
0
Comedy
9
2
1
1
16
3
23
23
73
0
4
1
Crime
6
0
0
1
7
1
6
4
87
0
11
1
Drama
12
3
0
4
24
1
25
55
143
1
9
1
Family
0
1
0
0
0
0
1
0
0
0
0
0
Fantasy
0
0
0
0
0
0
0
0
1
0
0
0
Film-Noir
1
0
0
0
1
0
0
0
0
0
1
0
History
0
0
0
0
0
0
0
0
0
0
1
0
Horror
2
0
0
1
1
0
1
2
16
0
5
1
Mystery
4
1
0
0
1
0
1
2
6
0
1
0
Sci-Fi
1
0
0
0
0
0
0
1
3
0
0
0
Thriller
1
0
0
0
0
0
1
0
3
0
0
0
Western
1
0
0
0
2
0
2
1
3
0
0
0
In [117]:
movies.duration.describe()
Out[117]:
count 979.000000
mean 120.979571
std 26.218010
min 64.000000
25% 102.000000
50% 117.000000
75% 134.000000
max 242.000000
Name: duration, dtype: float64
In [119]:
movies.duration.mean()
Out[119]:
120.97957099080695
Tip: Visualisation
In [120]:
%matplotlib inline
In [121]:
movies.duration.plot(kind='hist')
Out[121]:
<matplotlib.axes._subplots.AxesSubplot at 0x116aea150>
In [122]:
movies.genre.value_counts().plot(kind='bar')
Out[122]:
<matplotlib.axes._subplots.AxesSubplot at 0x116b06c90>
In [123]:
ufo = pd.read_csv('http://bit.ly/uforeports')
In [124]:
ufo.tail()
Out[124]:
City
Colors Reported
Shape Reported
State
Time
18236
Grant Park
NaN
TRIANGLE
IL
12/31/2000 23:00
18237
Spirit Lake
NaN
DISK
IA
12/31/2000 23:00
18238
Eagle River
NaN
NaN
WI
12/31/2000 23:45
18239
Eagle River
RED
LIGHT
WI
12/31/2000 23:45
18240
Ybor
NaN
OVAL
FL
12/31/2000 23:59
In [126]:
ufo.isnull().tail()
Out[126]:
City
Colors Reported
Shape Reported
State
Time
18236
False
True
False
False
False
18237
False
True
False
False
False
18238
False
True
True
False
False
18239
False
False
False
False
False
18240
False
True
False
False
False
In [127]:
ufo.notnull().tail()
Out[127]:
City
Colors Reported
Shape Reported
State
Time
18236
True
False
True
True
True
18237
True
False
True
True
True
18238
True
False
False
True
True
18239
True
True
True
True
True
18240
True
False
True
True
True
In [129]:
ufo.isnull().sum()
Out[129]:
City 25
Colors Reported 15359
Shape Reported 2644
State 0
Time 0
dtype: int64
In [131]:
ufo[ufo.City.isnull()]
Out[131]:
City
Colors Reported
Shape Reported
State
Time
21
NaN
NaN
NaN
LA
8/15/1943 0:00
22
NaN
NaN
LIGHT
LA
8/15/1943 0:00
204
NaN
NaN
DISK
CA
7/15/1952 12:30
241
NaN
BLUE
DISK
MT
7/4/1953 14:00
613
NaN
NaN
DISK
NV
7/1/1960 12:00
1877
NaN
YELLOW
CIRCLE
AZ
8/15/1969 1:00
2013
NaN
NaN
NaN
NH
8/1/1970 9:30
2546
NaN
NaN
FIREBALL
OH
10/25/1973 23:30
3123
NaN
RED
TRIANGLE
WV
11/25/1975 23:00
4736
NaN
NaN
SPHERE
CA
6/23/1982 23:00
5269
NaN
NaN
NaN
AZ
6/30/1985 21:30
6735
NaN
NaN
FORMATION
TX
4/1/1992 2:00
7208
NaN
NaN
CIRCLE
MI
10/4/1993 17:30
8828
NaN
NaN
TRIANGLE
WA
10/30/1995 21:30
8967
NaN
NaN
VARIOUS
CA
12/8/1995 18:00
9273
NaN
NaN
TRIANGLE
OH
5/1/1996 3:00
9388
NaN
NaN
OVAL
CA
6/12/1996 12:00
9587
NaN
NaN
EGG
FL
8/24/1996 15:00
10399
NaN
NaN
TRIANGLE
IL
6/15/1997 23:00
11625
NaN
NaN
CIRCLE
TX
6/7/1998 7:00
12441
NaN
RED
FIREBALL
WA
10/26/1998 17:58
15767
NaN
NaN
RECTANGLE
NV
1/21/2000 11:30
15812
NaN
NaN
LIGHT
NV
2/2/2000 3:00
16054
NaN
GREEN
NaN
FL
3/11/2000 3:30
16608
NaN
NaN
SPHERE
NY
6/15/2000 15:00
In [132]:
ufo.shape
Out[132]:
(18241, 5)
In [133]:
ufo.dropna(how='any').shape
Out[133]:
(2486, 5)
In [134]:
ufo.dropna(how='all').shape
Out[134]:
(18241, 5)
In [135]:
ufo.dropna(subset=['City', 'Shape Reported'], how='any').shape
Out[135]:
(15576, 5)
In [137]:
ufo['Shape Reported'].value_counts(dropna=False)
Out[137]:
LIGHT 2803
NaN 2644
DISK 2122
TRIANGLE 1889
OTHER 1402
CIRCLE 1365
SPHERE 1054
FIREBALL 1039
OVAL 845
CIGAR 617
FORMATION 434
VARIOUS 333
RECTANGLE 303
CYLINDER 294
CHEVRON 248
DIAMOND 234
EGG 197
FLASH 188
TEARDROP 119
CONE 60
CROSS 36
DELTA 7
ROUND 2
CRESCENT 2
DOME 1
PYRAMID 1
HEXAGON 1
FLARE 1
Name: Shape Reported, dtype: int64
In [138]:
ufo['Shape Reported'].fillna(value='VARIOUS', inplace=True)
In [139]:
ufo['Shape Reported'].value_counts()
Out[139]:
VARIOUS 2977
LIGHT 2803
DISK 2122
TRIANGLE 1889
OTHER 1402
CIRCLE 1365
SPHERE 1054
FIREBALL 1039
OVAL 845
CIGAR 617
FORMATION 434
RECTANGLE 303
CYLINDER 294
CHEVRON 248
DIAMOND 234
EGG 197
FLASH 188
TEARDROP 119
CONE 60
CROSS 36
DELTA 7
ROUND 2
CRESCENT 2
PYRAMID 1
HEXAGON 1
DOME 1
FLARE 1
Name: Shape Reported, dtype: int64
In [3]:
train = pd.read_csv('http://bit.ly/kaggletrain')
In [4]:
train.head()
Out[4]:
PassengerId
Survived
Pclass
Name
Sex
Age
SibSp
Parch
Ticket
Fare
Cabin
Embarked
0
1
0
3
Braund, Mr. Owen Harris
male
22.0
1
0
A/5 21171
7.2500
NaN
S
1
2
1
1
Cumings, Mrs. John Bradley (Florence Briggs Th...
female
38.0
1
0
PC 17599
71.2833
C85
C
2
3
1
3
Heikkinen, Miss. Laina
female
26.0
0
0
STON/O2. 3101282
7.9250
NaN
S
3
4
1
1
Futrelle, Mrs. Jacques Heath (Lily May Peel)
female
35.0
1
0
113803
53.1000
C123
S
4
5
0
3
Allen, Mr. William Henry
male
35.0
0
0
373450
8.0500
NaN
S
In [5]:
train['Sex_num'] = train.Sex.map({'female': 0, 'male': 1})
In [6]:
train.loc[0:4, ['Sex', 'Sex_num']]
Out[6]:
Sex
Sex_num
0
male
1
1
female
0
2
female
0
3
female
0
4
male
1
In [7]:
train['Name_length'] = train.Name.apply(len)
In [9]:
train.loc[0:4, ['Name', 'Name_length']]
Out[9]:
Name
Name_length
0
Braund, Mr. Owen Harris
23
1
Cumings, Mrs. John Bradley (Florence Briggs Th...
51
2
Heikkinen, Miss. Laina
22
3
Futrelle, Mrs. Jacques Heath (Lily May Peel)
44
4
Allen, Mr. William Henry
24
In [10]:
import numpy as np
In [11]:
train['Fare_ceil'] = train.Fare.apply(np.ceil)
In [12]:
train.loc[0:4, ['Fare', 'Fare_ceil']]
Out[12]:
Fare
Fare_ceil
0
7.2500
8.0
1
71.2833
72.0
2
7.9250
8.0
3
53.1000
54.0
4
8.0500
9.0
In [14]:
train.Name.str.split(',').head()
Out[14]:
0 [Braund, Mr. Owen Harris]
1 [Cumings, Mrs. John Bradley (Florence Briggs ...
2 [Heikkinen, Miss. Laina]
3 [Futrelle, Mrs. Jacques Heath (Lily May Peel)]
4 [Allen, Mr. William Henry]
Name: Name, dtype: object
In [15]:
def get_element(my_list, position):
return my_list[position]
In [16]:
train['Last_name'] = train.Name.str.split(',').apply(get_element, position=0)
In [19]:
train['Last_name'] = train.Name.str.split(',').apply(lambda x: x[0])
In [20]:
train.loc[0:4, ['Name', 'Last_name']]
Out[20]:
Name
Last_name
0
Braund, Mr. Owen Harris
Braund
1
Cumings, Mrs. John Bradley (Florence Briggs Th...
Cumings
2
Heikkinen, Miss. Laina
Heikkinen
3
Futrelle, Mrs. Jacques Heath (Lily May Peel)
Futrelle
4
Allen, Mr. William Henry
Allen
In [22]:
drinks = pd.read_csv('http://bit.ly/drinksbycountry')
drinks.head()
Out[22]:
country
beer_servings
spirit_servings
wine_servings
total_litres_of_pure_alcohol
continent
0
Afghanistan
0
0
0
0.0
Asia
1
Albania
89
132
54
4.9
Europe
2
Algeria
25
0
14
0.7
Africa
3
Andorra
245
138
312
12.4
Europe
4
Angola
217
57
45
5.9
Africa
In [27]:
drinks.loc[:, 'beer_servings':'wine_servings'].apply(np.argmax, axis=1)
Out[27]:
0 beer_servings
1 spirit_servings
2 beer_servings
3 wine_servings
4 beer_servings
5 spirit_servings
6 wine_servings
7 spirit_servings
8 beer_servings
9 beer_servings
10 spirit_servings
11 spirit_servings
12 spirit_servings
13 beer_servings
14 spirit_servings
15 spirit_servings
16 beer_servings
17 beer_servings
18 beer_servings
19 beer_servings
20 beer_servings
21 spirit_servings
22 beer_servings
23 beer_servings
24 beer_servings
25 spirit_servings
26 beer_servings
27 beer_servings
28 beer_servings
29 beer_servings
...
163 spirit_servings
164 beer_servings
165 wine_servings
166 wine_servings
167 spirit_servings
168 spirit_servings
169 spirit_servings
170 beer_servings
171 wine_servings
172 beer_servings
173 beer_servings
174 beer_servings
175 beer_servings
176 beer_servings
177 spirit_servings
178 spirit_servings
179 beer_servings
180 spirit_servings
181 spirit_servings
182 beer_servings
183 beer_servings
184 beer_servings
185 wine_servings
186 spirit_servings
187 beer_servings
188 beer_servings
189 beer_servings
190 beer_servings
191 beer_servings
192 beer_servings
dtype: object
In [28]:
drinks.loc[:, 'beer_servings':'wine_servings'].applymap(float)
Out[28]:
beer_servings
spirit_servings
wine_servings
0
0.0
0.0
0.0
1
89.0
132.0
54.0
2
25.0
0.0
14.0
3
245.0
138.0
312.0
4
217.0
57.0
45.0
5
102.0
128.0
45.0
6
193.0
25.0
221.0
7
21.0
179.0
11.0
8
261.0
72.0
212.0
9
279.0
75.0
191.0
10
21.0
46.0
5.0
11
122.0
176.0
51.0
12
42.0
63.0
7.0
13
0.0
0.0
0.0
14
143.0
173.0
36.0
15
142.0
373.0
42.0
16
295.0
84.0
212.0
17
263.0
114.0
8.0
18
34.0
4.0
13.0
19
23.0
0.0
0.0
20
167.0
41.0
8.0
21
76.0
173.0
8.0
22
173.0
35.0
35.0
23
245.0
145.0
16.0
24
31.0
2.0
1.0
25
231.0
252.0
94.0
26
25.0
7.0
7.0
27
88.0
0.0
0.0
28
37.0
1.0
7.0
29
144.0
56.0
16.0
...
...
...
...
163
128.0
178.0
7.0
164
90.0
2.0
2.0
165
152.0
60.0
186.0
166
185.0
100.0
280.0
167
5.0
35.0
16.0
168
2.0
15.0
0.0
169
99.0
258.0
1.0
170
106.0
27.0
86.0
171
1.0
1.0
4.0
172
36.0
2.0
19.0
173
36.0
21.0
5.0
174
197.0
156.0
7.0
175
51.0
3.0
20.0
176
51.0
22.0
7.0
177
19.0
71.0
32.0
178
6.0
41.0
9.0
179
45.0
9.0
0.0
180
206.0
237.0
45.0
181
16.0
135.0
5.0
182
219.0
126.0
195.0
183
36.0
6.0
1.0
184
249.0
158.0
84.0
185
115.0
35.0
220.0
186
25.0
101.0
8.0
187
21.0
18.0
11.0
188
333.0
100.0
3.0
189
111.0
2.0
1.0
190
6.0
0.0
0.0
191
32.0
19.0
4.0
192
64.0
18.0
4.0
193 rows × 3 columns
In [31]:
movies = pd.read_csv('http://bit.ly/imdbratings')
movies.head()
Out[31]:
star_rating
title
content_rating
genre
duration
actors_list
0
9.3
The Shawshank Redemption
R
Crime
142
[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt...
1
9.2
The Godfather
R
Crime
175
[u'Marlon Brando', u'Al Pacino', u'James Caan']
2
9.1
The Godfather: Part II
R
Crime
200
[u'Al Pacino', u'Robert De Niro', u'Robert Duv...
3
9.0
The Dark Knight
PG-13
Action
152
[u'Christian Bale', u'Heath Ledger', u'Aaron E...
4
8.9
Pulp Fiction
R
Crime
154
[u'John Travolta', u'Uma Thurman', u'Samuel L....
In [33]:
movies.content_rating.isnull().sum()
Out[33]:
3
In [34]:
movies[movies.content_rating.isnull()]
Out[34]:
star_rating
title
content_rating
genre
duration
actors_list
187
8.2
Butch Cassidy and the Sundance Kid
NaN
Biography
110
[u'Paul Newman', u'Robert Redford', u'Katharin...
649
7.7
Where Eagles Dare
NaN
Action
158
[u'Richard Burton', u'Clint Eastwood', u'Mary ...
936
7.4
True Grit
NaN
Adventure
128
[u'John Wayne', u'Kim Darby', u'Glen Campbell']
In [35]:
movies.content_rating.value_counts()
Out[35]:
R 460
PG-13 189
PG 123
NOT RATED 65
APPROVED 47
UNRATED 38
G 32
PASSED 7
NC-17 7
X 4
GP 3
TV-MA 1
Name: content_rating, dtype: int64
In [38]:
movies.loc[movies.content_rating == 'NOT RATED', 'content_rating'] = np.nan
In [39]:
movies.content_rating.isnull().sum()
Out[39]:
68
In [41]:
top_movies = movies.loc[movies.star_rating >= 9, :]
top_movies
Out[41]:
star_rating
title
content_rating
genre
duration
actors_list
0
9.3
The Shawshank Redemption
R
Crime
142
[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt...
1
9.2
The Godfather
R
Crime
175
[u'Marlon Brando', u'Al Pacino', u'James Caan']
2
9.1
The Godfather: Part II
R
Crime
200
[u'Al Pacino', u'Robert De Niro', u'Robert Duv...
3
9.0
The Dark Knight
PG-13
Action
152
[u'Christian Bale', u'Heath Ledger', u'Aaron E...
In [42]:
top_movies.loc[0, 'duration'] = 150
/usr/local/lib/python2.7/site-packages/pandas/core/indexing.py:476: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s
In [43]:
top_movies
Out[43]:
star_rating
title
content_rating
genre
duration
actors_list
0
9.3
The Shawshank Redemption
R
Crime
150
[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt...
1
9.2
The Godfather
R
Crime
175
[u'Marlon Brando', u'Al Pacino', u'James Caan']
2
9.1
The Godfather: Part II
R
Crime
200
[u'Al Pacino', u'Robert De Niro', u'Robert Duv...
3
9.0
The Dark Knight
PG-13
Action
152
[u'Christian Bale', u'Heath Ledger', u'Aaron E...
In [44]:
top_movies = movies.loc[movies.star_rating >= 9, :].copy()
top_movies
Out[44]:
star_rating
title
content_rating
genre
duration
actors_list
0
9.3
The Shawshank Redemption
R
Crime
142
[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt...
1
9.2
The Godfather
R
Crime
175
[u'Marlon Brando', u'Al Pacino', u'James Caan']
2
9.1
The Godfather: Part II
R
Crime
200
[u'Al Pacino', u'Robert De Niro', u'Robert Duv...
3
9.0
The Dark Knight
PG-13
Action
152
[u'Christian Bale', u'Heath Ledger', u'Aaron E...
In [46]:
top_movies.loc[0, 'duration'] = 150
top_movies
Out[46]:
star_rating
title
content_rating
genre
duration
actors_list
0
9.3
The Shawshank Redemption
R
Crime
150
[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunt...
1
9.2
The Godfather
R
Crime
175
[u'Marlon Brando', u'Al Pacino', u'James Caan']
2
9.1
The Godfather: Part II
R
Crime
200
[u'Al Pacino', u'Robert De Niro', u'Robert Duv...
3
9.0
The Dark Knight
PG-13
Action
152
[u'Christian Bale', u'Heath Ledger', u'Aaron E...
In [48]:
drinks = pd.read_csv('http://bit.ly/drinksbycountry')
drinks
Out[48]:
country
beer_servings
spirit_servings
wine_servings
total_litres_of_pure_alcohol
continent
0
Afghanistan
0
0
0
0.0
Asia
1
Albania
89
132
54
4.9
Europe
2
Algeria
25
0
14
0.7
Africa
3
Andorra
245
138
312
12.4
Europe
4
Angola
217
57
45
5.9
Africa
5
Antigua & Barbuda
102
128
45
4.9
North America
6
Argentina
193
25
221
8.3
South America
7
Armenia
21
179
11
3.8
Europe
8
Australia
261
72
212
10.4
Oceania
9
Austria
279
75
191
9.7
Europe
10
Azerbaijan
21
46
5
1.3
Europe
11
Bahamas
122
176
51
6.3
North America
12
Bahrain
42
63
7
2.0
Asia
13
Bangladesh
0
0
0
0.0
Asia
14
Barbados
143
173
36
6.3
North America
15
Belarus
142
373
42
14.4
Europe
16
Belgium
295
84
212
10.5
Europe
17
Belize
263
114
8
6.8
North America
18
Benin
34
4
13
1.1
Africa
19
Bhutan
23
0
0
0.4
Asia
20
Bolivia
167
41
8
3.8
South America
21
Bosnia-Herzegovina
76
173
8
4.6
Europe
22
Botswana
173
35
35
5.4
Africa
23
Brazil
245
145
16
7.2
South America
24
Brunei
31
2
1
0.6
Asia
25
Bulgaria
231
252
94
10.3
Europe
26
Burkina Faso
25
7
7
4.3
Africa
27
Burundi
88
0
0
6.3
Africa
28
Cote d'Ivoire
37
1
7
4.0
Africa
29
Cabo Verde
144
56
16
4.0
Africa
...
...
...
...
...
...
...
163
Suriname
128
178
7
5.6
South America
164
Swaziland
90
2
2
4.7
Africa
165
Sweden
152
60
186
7.2
Europe
166
Switzerland
185
100
280
10.2
Europe
167
Syria
5
35
16
1.0
Asia
168
Tajikistan
2
15
0
0.3
Asia
169
Thailand
99
258
1
6.4
Asia
170
Macedonia
106
27
86
3.9
Europe
171
Timor-Leste
1
1
4
0.1
Asia
172
Togo
36
2
19
1.3
Africa
173
Tonga
36
21
5
1.1
Oceania
174
Trinidad & Tobago
197
156
7
6.4
North America
175
Tunisia
51
3
20
1.3
Africa
176
Turkey
51
22
7
1.4
Asia
177
Turkmenistan
19
71
32
2.2
Asia
178
Tuvalu
6
41
9
1.0
Oceania
179
Uganda
45
9
0
8.3
Africa
180
Ukraine
206
237
45
8.9
Europe
181
United Arab Emirates
16
135
5
2.8
Asia
182
United Kingdom
219
126
195
10.4
Europe
183
Tanzania
36
6
1
5.7
Africa
184
USA
249
158
84
8.7
North America
185
Uruguay
115
35
220
6.6
South America
186
Uzbekistan
25
101
8
2.4
Asia
187
Vanuatu
21
18
11
0.9
Oceania
188
Venezuela
333
100
3
7.7
South America
189
Vietnam
111
2
1
2.0
Asia
190
Yemen
6
0
0
0.1
Asia
191
Zambia
32
19
4
2.5
Africa
192
Zimbabwe
64
18
4
4.7
Africa
193 rows × 6 columns
In [49]:
pd.get_option('display.max_rows')
Out[49]:
60
In [50]:
pd.set_option('display.max_rows', None)
In [51]:
drinks
Out[51]:
country
beer_servings
spirit_servings
wine_servings
total_litres_of_pure_alcohol
continent
0
Afghanistan
0
0
0
0.0
Asia
1
Albania
89
132
54
4.9
Europe
2
Algeria
25
0
14
0.7
Africa
3
Andorra
245
138
312
12.4
Europe
4
Angola
217
57
45
5.9
Africa
5
Antigua & Barbuda
102
128
45
4.9
North America
6
Argentina
193
25
221
8.3
South America
7
Armenia
21
179
11
3.8
Europe
8
Australia
261
72
212
10.4
Oceania
9
Austria
279
75
191
9.7
Europe
10
Azerbaijan
21
46
5
1.3
Europe
11
Bahamas
122
176
51
6.3
North America
12
Bahrain
42
63
7
2.0
Asia
13
Bangladesh
0
0
0
0.0
Asia
14
Barbados
143
173
36
6.3
North America
15
Belarus
142
373
42
14.4
Europe
16
Belgium
295
84
212
10.5
Europe
17
Belize
263
114
8
6.8
North America
18
Benin
34
4
13
1.1
Africa
19
Bhutan
23
0
0
0.4
Asia
20
Bolivia
167
41
8
3.8
South America
21
Bosnia-Herzegovina
76
173
8
4.6
Europe
22
Botswana
173
35
35
5.4
Africa
23
Brazil
245
145
16
7.2
South America
24
Brunei
31
2
1
0.6
Asia
25
Bulgaria
231
252
94
10.3
Europe
26
Burkina Faso
25
7
7
4.3
Africa
27
Burundi
88
0
0
6.3
Africa
28
Cote d'Ivoire
37
1
7
4.0
Africa
29
Cabo Verde
144
56
16
4.0
Africa
30
Cambodia
57
65
1
2.2
Asia
31
Cameroon
147
1
4
5.8
Africa
32
Canada
240
122
100
8.2
North America
33
Central African Republic
17
2
1
1.8
Africa
34
Chad
15
1
1
0.4
Africa
35
Chile
130
124
172
7.6
South America
36
China
79
192
8
5.0
Asia
37
Colombia
159
76
3
4.2
South America
38
Comoros
1
3
1
0.1
Africa
39
Congo
76
1
9
1.7
Africa
40
Cook Islands
0
254
74
5.9
Oceania
41
Costa Rica
149
87
11
4.4
North America
42
Croatia
230
87
254
10.2
Europe
43
Cuba
93
137
5
4.2
North America
44
Cyprus
192
154
113
8.2
Europe
45
Czech Republic
361
170
134
11.8
Europe
46
North Korea
0
0
0
0.0
Asia
47
DR Congo
32
3
1
2.3
Africa
48
Denmark
224
81
278
10.4
Europe
49
Djibouti
15
44
3
1.1
Africa
50
Dominica
52
286
26
6.6
North America
51
Dominican Republic
193
147
9
6.2
North America
52
Ecuador
162
74
3
4.2
South America
53
Egypt
6
4
1
0.2
Africa
54
El Salvador
52
69
2
2.2
North America
55
Equatorial Guinea
92
0
233
5.8
Africa
56
Eritrea
18
0
0
0.5
Africa
57
Estonia
224
194
59
9.5
Europe
58
Ethiopia
20
3
0
0.7
Africa
59
Fiji
77
35
1
2.0
Oceania
60
Finland
263
133
97
10.0
Europe
61
France
127
151
370
11.8
Europe
62
Gabon
347
98
59
8.9
Africa
63
Gambia
8
0
1
2.4
Africa
64
Georgia
52
100
149
5.4
Europe
65
Germany
346
117
175
11.3
Europe
66
Ghana
31
3
10
1.8
Africa
67
Greece
133
112
218
8.3
Europe
68
Grenada
199
438
28
11.9
North America
69
Guatemala
53
69
2
2.2
North America
70
Guinea
9
0
2
0.2
Africa
71
Guinea-Bissau
28
31
21
2.5
Africa
72
Guyana
93
302
1
7.1
South America
73
Haiti
1
326
1
5.9
North America
74
Honduras
69
98
2
3.0
North America
75
Hungary
234
215
185
11.3
Europe
76
Iceland
233
61
78
6.6
Europe
77
India
9
114
0
2.2
Asia
78
Indonesia
5
1
0
0.1
Asia
79
Iran
0
0
0
0.0
Asia
80
Iraq
9
3
0
0.2
Asia
81
Ireland
313
118
165
11.4
Europe
82
Israel
63
69
9
2.5
Asia
83
Italy
85
42
237
6.5
Europe
84
Jamaica
82
97
9
3.4
North America
85
Japan
77
202
16
7.0
Asia
86
Jordan
6
21
1
0.5
Asia
87
Kazakhstan
124
246
12
6.8
Asia
88
Kenya
58
22
2
1.8
Africa
89
Kiribati
21
34
1
1.0
Oceania
90
Kuwait
0
0
0
0.0
Asia
91
Kyrgyzstan
31
97
6
2.4
Asia
92
Laos
62
0
123
6.2
Asia
93
Latvia
281
216
62
10.5
Europe
94
Lebanon
20
55
31
1.9
Asia
95
Lesotho
82
29
0
2.8
Africa
96
Liberia
19
152
2
3.1
Africa
97
Libya
0
0
0
0.0
Africa
98
Lithuania
343
244
56
12.9
Europe
99
Luxembourg
236
133
271
11.4
Europe
100
Madagascar
26
15
4
0.8
Africa
101
Malawi
8
11
1
1.5
Africa
102
Malaysia
13
4
0
0.3
Asia
103
Maldives
0
0
0
0.0
Asia
104
Mali
5
1
1
0.6
Africa
105
Malta
149
100
120
6.6
Europe
106
Marshall Islands
0
0
0
0.0
Oceania
107
Mauritania
0
0
0
0.0
Africa
108
Mauritius
98
31
18
2.6
Africa
109
Mexico
238
68
5
5.5
North America
110
Micronesia
62
50
18
2.3
Oceania
111
Monaco
0
0
0
0.0
Europe
112
Mongolia
77
189
8
4.9
Asia
113
Montenegro
31
114
128
4.9
Europe
114
Morocco
12
6
10
0.5
Africa
115
Mozambique
47
18
5
1.3
Africa
116
Myanmar
5
1
0
0.1
Asia
117
Namibia
376
3
1
6.8
Africa
118
Nauru
49
0
8
1.0
Oceania
119
Nepal
5
6
0
0.2
Asia
120
Netherlands
251
88
190
9.4
Europe
121
New Zealand
203
79
175
9.3
Oceania
122
Nicaragua
78
118
1
3.5
North America
123
Niger
3
2
1
0.1
Africa
124
Nigeria
42
5
2
9.1
Africa
125
Niue
188
200
7
7.0
Oceania
126
Norway
169
71
129
6.7
Europe
127
Oman
22
16
1
0.7
Asia
128
Pakistan
0
0
0
0.0
Asia
129
Palau
306
63
23
6.9
Oceania
130
Panama
285
104
18
7.2
North America
131
Papua New Guinea
44
39
1
1.5
Oceania
132
Paraguay
213
117
74
7.3
South America
133
Peru
163
160
21
6.1
South America
134
Philippines
71
186
1
4.6
Asia
135
Poland
343
215
56
10.9
Europe
136
Portugal
194
67
339
11.0
Europe
137
Qatar
1
42
7
0.9
Asia
138
South Korea
140
16
9
9.8
Asia
139
Moldova
109
226
18
6.3
Europe
140
Romania
297
122
167
10.4
Europe
141
Russian Federation
247
326
73
11.5
Asia
142
Rwanda
43
2
0
6.8
Africa
143
St. Kitts & Nevis
194
205
32
7.7
North America
144
St. Lucia
171
315
71
10.1
North America
145
St. Vincent & the Grenadines
120
221
11
6.3
North America
146
Samoa
105
18
24
2.6
Oceania
147
San Marino
0
0
0
0.0
Europe
148
Sao Tome & Principe
56
38
140
4.2
Africa
149
Saudi Arabia
0
5
0
0.1
Asia
150
Senegal
9
1
7
0.3
Africa
151
Serbia
283
131
127
9.6
Europe
152
Seychelles
157
25
51
4.1
Africa
153
Sierra Leone
25
3
2
6.7
Africa
154
Singapore
60
12
11
1.5
Asia
155
Slovakia
196
293
116
11.4
Europe
156
Slovenia
270
51
276
10.6
Europe
157
Solomon Islands
56
11
1
1.2
Oceania
158
Somalia
0
0
0
0.0
Africa
159
South Africa
225
76
81
8.2
Africa
160
Spain
284
157
112
10.0
Europe
161
Sri Lanka
16
104
0
2.2
Asia
162
Sudan
8
13
0
1.7
Africa
163
Suriname
128
178
7
5.6
South America
164
Swaziland
90
2
2
4.7
Africa
165
Sweden
152
60
186
7.2
Europe
166
Switzerland
185
100
280
10.2
Europe
167
Syria
5
35
16
1.0
Asia
168
Tajikistan
2
15
0
0.3
Asia
169
Thailand
99
258
1
6.4
Asia
170
Macedonia
106
27
86
3.9
Europe
171
Timor-Leste
1
1
4
0.1
Asia
172
Togo
36
2
19
1.3
Africa
173
Tonga
36
21
5
1.1
Oceania
174
Trinidad & Tobago
197
156
7
6.4
North America
175
Tunisia
51
3
20
1.3
Africa
176
Turkey
51
22
7
1.4
Asia
177
Turkmenistan
19
71
32
2.2
Asia
178
Tuvalu
6
41
9
1.0
Oceania
179
Uganda
45
9
0
8.3
Africa
180
Ukraine
206
237
45
8.9
Europe
181
United Arab Emirates
16
135
5
2.8
Asia
182
United Kingdom
219
126
195
10.4
Europe
183
Tanzania
36
6
1
5.7
Africa
184
USA
249
158
84
8.7
North America
185
Uruguay
115
35
220
6.6
South America
186
Uzbekistan
25
101
8
2.4
Asia
187
Vanuatu
21
18
11
0.9
Oceania
188
Venezuela
333
100
3
7.7
South America
189
Vietnam
111
2
1
2.0
Asia
190
Yemen
6
0
0
0.1
Asia
191
Zambia
32
19
4
2.5
Africa
192
Zimbabwe
64
18
4
4.7
Africa
In [56]:
pd.reset_option('display.max_rows')
pd.get_option('display.max_rows')
Out[56]:
60
In [64]:
train = pd.read_csv('http://bit.ly/kaggletrain')
train
Out[64]:
PassengerId
Survived
Pclass
Name
Sex
Age
SibSp
Parch
Ticket
Fare
Cabin
Embarked
0
1
0
3
Braund, Mr. Owen Harris
male
22.0
1
0
A/5 21171
7.25
NaN
S
1
2
1
1
Cumings, Mrs. John Bradley (Florence Briggs Thayer)
female
38.0
1
0
PC 17599
71.28
C85
C
2
3
1
3
Heikkinen, Miss. Laina
female
26.0
0
0
STON/O2. 3101282
7.92
NaN
S
3
4
1
1
Futrelle, Mrs. Jacques Heath (Lily May Peel)
female
35.0
1
0
113803
53.10
C123
S
4
5
0
3
Allen, Mr. William Henry
male
35.0
0
0
373450
8.05
NaN
S
5
6
0
3
Moran, Mr. James
male
NaN
0
0
330877
8.46
NaN
Q
6
7
0
1
McCarthy, Mr. Timothy J
male
54.0
0
0
17463
51.86
E46
S
7
8
0
3
Palsson, Master. Gosta Leonard
male
2.0
3
1
349909
21.07
NaN
S
8
9
1
3
Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)
female
27.0
0
2
347742
11.13
NaN
S
9
10
1
2
Nasser, Mrs. Nicholas (Adele Achem)
female
14.0
1
0
237736
30.07
NaN
C
10
11
1
3
Sandstrom, Miss. Marguerite Rut
female
4.0
1
1
PP 9549
16.70
G6
S
11
12
1
1
Bonnell, Miss. Elizabeth
female
58.0
0
0
113783
26.55
C103
S
12
13
0
3
Saundercock, Mr. William Henry
male
20.0
0
0
A/5. 2151
8.05
NaN
S
13
14
0
3
Andersson, Mr. Anders Johan
male
39.0
1
5
347082
31.27
NaN
S
14
15
0
3
Vestrom, Miss. Hulda Amanda Adolfina
female
14.0
0
0
350406
7.85
NaN
S
15
16
1
2
Hewlett, Mrs. (Mary D Kingcome)
female
55.0
0
0
248706
16.00
NaN
S
16
17
0
3
Rice, Master. Eugene
male
2.0
4
1
382652
29.12
NaN
Q
17
18
1
2
Williams, Mr. Charles Eugene
male
NaN
0
0
244373
13.00
NaN
S
18
19
0
3
Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele)
female
31.0
1
0
345763
18.00
NaN
S
19
20
1
3
Masselmani, Mrs. Fatima
female
NaN
0
0
2649
7.22
NaN
C
20
21
0
2
Fynney, Mr. Joseph J
male
35.0
0
0
239865
26.00
NaN
S
21
22
1
2
Beesley, Mr. Lawrence
male
34.0
0
0
248698
13.00
D56
S
22
23
1
3
McGowan, Miss. Anna "Annie"
female
15.0
0
0
330923
8.03
NaN
Q
23
24
1
1
Sloper, Mr. William Thompson
male
28.0
0
0
113788
35.50
A6
S
24
25
0
3
Palsson, Miss. Torborg Danira
female
8.0
3
1
349909
21.07
NaN
S
25
26
1
3
Asplund, Mrs. Carl Oscar (Selma Augusta Emilia Johansson)
female
38.0
1
5
347077
31.39
NaN
S
26
27
0
3
Emir, Mr. Farred Chehab
male
NaN
0
0
2631
7.22
NaN
C
27
28
0
1
Fortune, Mr. Charles Alexander
male
19.0
3
2
19950
263.00
C23 C25 C27
S
28
29
1
3
O'Dwyer, Miss. Ellen "Nellie"
female
NaN
0
0
330959
7.88
NaN
Q
29
30
0
3
Todoroff, Mr. Lalio
male
NaN
0
0
349216
7.90
NaN
S
...
...
...
...
...
...
...
...
...
...
...
...
...
861
862
0
2
Giles, Mr. Frederick Edward
male
21.0
1
0
28134
11.50
NaN
S
862
863
1
1
Swift, Mrs. Frederick Joel (Margaret Welles Barron)
female
48.0
0
0
17466
25.93
D17
S
863
864
0
3
Sage, Miss. Dorothy Edith "Dolly"
female
NaN
8
2
CA. 2343
69.55
NaN
S
864
865
0
2
Gill, Mr. John William
male
24.0
0
0
233866
13.00
NaN
S
865
866
1
2
Bystrom, Mrs. (Karolina)
female
42.0
0
0
236852
13.00
NaN
S
866
867
1
2
Duran y More, Miss. Asuncion
female
27.0
1
0
SC/PARIS 2149
13.86
NaN
C
867
868
0
1
Roebling, Mr. Washington Augustus II
male
31.0
0
0
PC 17590
50.50
A24
S
868
869
0
3
van Melkebeke, Mr. Philemon
male
NaN
0
0
345777
9.50
NaN
S
869
870
1
3
Johnson, Master. Harold Theodor
male
4.0
1
1
347742
11.13
NaN
S
870
871
0
3
Balkic, Mr. Cerin
male
26.0
0
0
349248
7.90
NaN
S
871
872
1
1
Beckwith, Mrs. Richard Leonard (Sallie Monypeny)
female
47.0
1
1
11751
52.55
D35
S
872
873
0
1
Carlsson, Mr. Frans Olof
male
33.0
0
0
695
5.00
B51 B53 B55
S
873
874
0
3
Vander Cruyssen, Mr. Victor
male
47.0
0
0
345765
9.00
NaN
S
874
875
1
2
Abelson, Mrs. Samuel (Hannah Wizosky)
female
28.0
1
0
P/PP 3381
24.00
NaN
C
875
876
1
3
Najib, Miss. Adele Kiamie "Jane"
female
15.0
0
0
2667
7.22
NaN
C
876
877
0
3
Gustafsson, Mr. Alfred Ossian
male
20.0
0
0
7534
9.85
NaN
S
877
878
0
3
Petroff, Mr. Nedelio
male
19.0
0
0
349212
7.90
NaN
S
878
879
0
3
Laleff, Mr. Kristo
male
NaN
0
0
349217
7.90
NaN
S
879
880
1
1
Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)
female
56.0
0
1
11767
83.16
C50
C
880
881
1
2
Shelley, Mrs. William (Imanita Parrish Hall)
female
25.0
0
1
230433
26.00
NaN
S
881
882
0
3
Markun, Mr. Johann
male
33.0
0
0
349257
7.90
NaN
S
882
883
0
3
Dahlberg, Miss. Gerda Ulrika
female
22.0
0
0
7552
10.52
NaN
S
883
884
0
2
Banfield, Mr. Frederick James
male
28.0
0
0
C.A./SOTON 34068
10.50
NaN
S
884
885
0
3
Sutehall, Mr. Henry Jr
male
25.0
0
0
SOTON/OQ 392076
7.05
NaN
S
885
886
0
3
Rice, Mrs. William (Margaret Norton)
female
39.0
0
5
382652
29.12
NaN
Q
886
887
0
2
Montvila, Rev. Juozas
male
27.0
0
0
211536
13.00
NaN
S
887
888
1
1
Graham, Miss. Margaret Edith
female
19.0
0
0
112053
30.00
B42
S
888
889
0
3
Johnston, Miss. Catherine Helen "Carrie"
female
NaN
1
2
W./C. 6607
23.45
NaN
S
889
890
1
1
Behr, Mr. Karl Howell
male
26.0
0
0
111369
30.00
C148
C
890
891
0
3
Dooley, Mr. Patrick
male
32.0
0
0
370376
7.75
NaN
Q
891 rows × 12 columns
In [58]:
pd.get_option('display.max_colwidth')
Out[58]:
50
In [61]:
pd.set_option('display.max_colwidth', 1000)
In [62]:
pd.get_option('display.precision')
Out[62]:
6
In [63]:
pd.set_option('display.precision', 2)
In [66]:
drinks.head()
Out[66]:
country
beer_servings
spirit_servings
wine_servings
total_litres_of_pure_alcohol
continent
0
Afghanistan
0
0
0
0.0
Asia
1
Albania
89
132
54
4.9
Europe
2
Algeria
25
0
14
0.7
Africa
3
Andorra
245
138
312
12.4
Europe
4
Angola
217
57
45
5.9
Africa
In [70]:
drinks['x'] = drinks.wine_servings * 1000
drinks['y'] = drinks.total_litres_of_pure_alcohol * 1000
drinks.head()
Out[70]:
country
beer_servings
spirit_servings
wine_servings
total_litres_of_pure_alcohol
continent
x
y
0
Afghanistan
0
0
0
0.0
Asia
0
0.0
1
Albania
89
132
54
4.9
Europe
54000
4,900.0
2
Algeria
25
0
14
0.7
Africa
14000
700.0
3
Andorra
245
138
312
12.4
Europe
312000
12,400.0
4
Angola
217
57
45
5.9
Africa
45000
5,900.0
In [69]:
pd.set_option('display.float_format', '{:,}'.format)
In [72]:
pd.describe_option('rows')
display.max_info_rows : int or None
df.info() will usually show null-counts for each column.
For large frames this can be quite slow. max_info_rows and max_info_cols
limit this null check only to frames with smaller dimensions than
specified.
[default: 1690785] [currently: 1690785]
display.max_rows : int
If max_rows is exceeded, switch to truncate view. Depending on
`large_repr`, objects are either centrally truncated or printed as
a summary view. 'None' value means unlimited.
In case python/IPython is running in a terminal and `large_repr`
equals 'truncate' this can be set to 0 and pandas will auto-detect
the height of the terminal and print a truncated object which fits
the screen height. The IPython notebook, IPython qtconsole, or
IDLE do not run in a terminal and hence it is not possible to do
correct auto-detection.
[default: 60] [currently: 60]
In [73]:
pd.reset_option('all')
height has been deprecated.
line_width has been deprecated, use display.width instead (currently both are
identical)
/usr/local/lib/python2.7/site-packages/ipykernel/__main__.py:1: FutureWarning:
mpl_style had been deprecated and will be removed in a future version.
Use `matplotlib.pyplot.style.use` instead.
if __name__ == '__main__':
In [77]:
df = pd.DataFrame({'id': [100, 101, 102], 'color': ['red', 'blue', 'red']}, columns=['id', 'color'], index=['A', 'B', 'C'])
In [79]:
pd.DataFrame([[100, 'red'], [101, 'blue'], [102, 'red']], columns=['id', 'color'])
Out[79]:
id
color
0
100
red
1
101
blue
2
102
red
In [81]:
import numpy as np
arr = np.random.rand(4, 2)
arr
Out[81]:
array([[ 0.2773193 , 0.62793445],
[ 0.48314833, 0.66573051],
[ 0.96897954, 0.59954869],
[ 0.33206409, 0.06175922]])
In [84]:
pd.DataFrame(arr, columns=['one', 'two'])
Out[84]:
one
two
0
0.277319
0.627934
1
0.483148
0.665731
2
0.968980
0.599549
3
0.332064
0.061759
In [86]:
pd.DataFrame({'student': np.arange(100, 110, 1), 'test': np.random.randint(60, 101, 10)}).set_index('student')
Out[86]:
test
student
100
66
101
85
102
72
103
98
104
82
105
72
106
78
107
82
108
75
109
60
In [90]:
s = pd.Series(['round', 'square'], index=['C', 'B'], name='shape')
s
Out[90]:
C round
B square
Name: shape, dtype: object
In [88]:
df
Out[88]:
id
color
A
100
red
B
101
blue
C
102
red
In [91]:
pd.concat([df, s], axis=1)
Out[91]:
id
color
shape
A
100
red
NaN
B
101
blue
square
C
102
red
round
In [108]:
train = pd.read_csv('http://bit.ly/kaggletrain')
train
Out[108]:
PassengerId
Survived
Pclass
Name
Sex
Age
SibSp
Parch
Ticket
Fare
Cabin
Embarked
0
1
0
3
Braund, Mr. Owen Harris
male
22.0
1
0
A/5 21171
7.2500
NaN
S
1
2
1
1
Cumings, Mrs. John Bradley (Florence Briggs Th...
female
38.0
1
0
PC 17599
71.2833
C85
C
2
3
1
3
Heikkinen, Miss. Laina
female
26.0
0
0
STON/O2. 3101282
7.9250
NaN
S
3
4
1
1
Futrelle, Mrs. Jacques Heath (Lily May Peel)
female
35.0
1
0
113803
53.1000
C123
S
4
5
0
3
Allen, Mr. William Henry
male
35.0
0
0
373450
8.0500
NaN
S
5
6
0
3
Moran, Mr. James
male
NaN
0
0
330877
8.4583
NaN
Q
6
7
0
1
McCarthy, Mr. Timothy J
male
54.0
0
0
17463
51.8625
E46
S
7
8
0
3
Palsson, Master. Gosta Leonard
male
2.0
3
1
349909
21.0750
NaN
S
8
9
1
3
Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)
female
27.0
0
2
347742
11.1333
NaN
S
9
10
1
2
Nasser, Mrs. Nicholas (Adele Achem)
female
14.0
1
0
237736
30.0708
NaN
C
10
11
1
3
Sandstrom, Miss. Marguerite Rut
female
4.0
1
1
PP 9549
16.7000
G6
S
11
12
1
1
Bonnell, Miss. Elizabeth
female
58.0
0
0
113783
26.5500
C103
S
12
13
0
3
Saundercock, Mr. William Henry
male
20.0
0
0
A/5. 2151
8.0500
NaN
S
13
14
0
3
Andersson, Mr. Anders Johan
male
39.0
1
5
347082
31.2750
NaN
S
14
15
0
3
Vestrom, Miss. Hulda Amanda Adolfina
female
14.0
0
0
350406
7.8542
NaN
S
15
16
1
2
Hewlett, Mrs. (Mary D Kingcome)
female
55.0
0
0
248706
16.0000
NaN
S
16
17
0
3
Rice, Master. Eugene
male
2.0
4
1
382652
29.1250
NaN
Q
17
18
1
2
Williams, Mr. Charles Eugene
male
NaN
0
0
244373
13.0000
NaN
S
18
19
0
3
Vander Planke, Mrs. Julius (Emelia Maria Vande...
female
31.0
1
0
345763
18.0000
NaN
S
19
20
1
3
Masselmani, Mrs. Fatima
female
NaN
0
0
2649
7.2250
NaN
C
20
21
0
2
Fynney, Mr. Joseph J
male
35.0
0
0
239865
26.0000
NaN
S
21
22
1
2
Beesley, Mr. Lawrence
male
34.0
0
0
248698
13.0000
D56
S
22
23
1
3
McGowan, Miss. Anna "Annie"
female
15.0
0
0
330923
8.0292
NaN
Q
23
24
1
1
Sloper, Mr. William Thompson
male
28.0
0
0
113788
35.5000
A6
S
24
25
0
3
Palsson, Miss. Torborg Danira
female
8.0
3
1
349909
21.0750
NaN
S
25
26
1
3
Asplund, Mrs. Carl Oscar (Selma Augusta Emilia...
female
38.0
1
5
347077
31.3875
NaN
S
26
27
0
3
Emir, Mr. Farred Chehab
male
NaN
0
0
2631
7.2250
NaN
C
27
28
0
1
Fortune, Mr. Charles Alexander
male
19.0
3
2
19950
263.0000
C23 C25 C27
S
28
29
1
3
O'Dwyer, Miss. Ellen "Nellie"
female
NaN
0
0
330959
7.8792
NaN
Q
29
30
0
3
Todoroff, Mr. Lalio
male
NaN
0
0
349216
7.8958
NaN
S
...
...
...
...
...
...
...
...
...
...
...
...
...
861
862
0
2
Giles, Mr. Frederick Edward
male
21.0
1
0
28134
11.5000
NaN
S
862
863
1
1
Swift, Mrs. Frederick Joel (Margaret Welles Ba...
female
48.0
0
0
17466
25.9292
D17
S
863
864
0
3
Sage, Miss. Dorothy Edith "Dolly"
female
NaN
8
2
CA. 2343
69.5500
NaN
S
864
865
0
2
Gill, Mr. John William
male
24.0
0
0
233866
13.0000
NaN
S
865
866
1
2
Bystrom, Mrs. (Karolina)
female
42.0
0
0
236852
13.0000
NaN
S
866
867
1
2
Duran y More, Miss. Asuncion
female
27.0
1
0
SC/PARIS 2149
13.8583
NaN
C
867
868
0
1
Roebling, Mr. Washington Augustus II
male
31.0
0
0
PC 17590
50.4958
A24
S
868
869
0
3
van Melkebeke, Mr. Philemon
male
NaN
0
0
345777
9.5000
NaN
S
869
870
1
3
Johnson, Master. Harold Theodor
male
4.0
1
1
347742
11.1333
NaN
S
870
871
0
3
Balkic, Mr. Cerin
male
26.0
0
0
349248
7.8958
NaN
S
871
872
1
1
Beckwith, Mrs. Richard Leonard (Sallie Monypeny)
female
47.0
1
1
11751
52.5542
D35
S
872
873
0
1
Carlsson, Mr. Frans Olof
male
33.0
0
0
695
5.0000
B51 B53 B55
S
873
874
0
3
Vander Cruyssen, Mr. Victor
male
47.0
0
0
345765
9.0000
NaN
S
874
875
1
2
Abelson, Mrs. Samuel (Hannah Wizosky)
female
28.0
1
0
P/PP 3381
24.0000
NaN
C
875
876
1
3
Najib, Miss. Adele Kiamie "Jane"
female
15.0
0
0
2667
7.2250
NaN
C
876
877
0
3
Gustafsson, Mr. Alfred Ossian
male
20.0
0
0
7534
9.8458
NaN
S
877
878
0
3
Petroff, Mr. Nedelio
male
19.0
0
0
349212
7.8958
NaN
S
878
879
0
3
Laleff, Mr. Kristo
male
NaN
0
0
349217
7.8958
NaN
S
879
880
1
1
Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)
female
56.0
0
1
11767
83.1583
C50
C
880
881
1
2
Shelley, Mrs. William (Imanita Parrish Hall)
female
25.0
0
1
230433
26.0000
NaN
S
881
882
0
3
Markun, Mr. Johann
male
33.0
0
0
349257
7.8958
NaN
S
882
883
0
3
Dahlberg, Miss. Gerda Ulrika
female
22.0
0
0
7552
10.5167
NaN
S
883
884
0
2
Banfield, Mr. Frederick James
male
28.0
0
0
C.A./SOTON 34068
10.5000
NaN
S
884
885
0
3
Sutehall, Mr. Henry Jr
male
25.0
0
0
SOTON/OQ 392076
7.0500
NaN
S
885
886
0
3
Rice, Mrs. William (Margaret Norton)
female
39.0
0
5
382652
29.1250
NaN
Q
886
887
0
2
Montvila, Rev. Juozas
male
27.0
0
0
211536
13.0000
NaN
S
887
888
1
1
Graham, Miss. Margaret Edith
female
19.0
0
0
112053
30.0000
B42
S
888
889
0
3
Johnston, Miss. Catherine Helen "Carrie"
female
NaN
1
2
W./C. 6607
23.4500
NaN
S
889
890
1
1
Behr, Mr. Karl Howell
male
26.0
0
0
111369
30.0000
C148
C
890
891
0
3
Dooley, Mr. Patrick
male
32.0
0
0
370376
7.7500
NaN
Q
891 rows × 12 columns
In [93]:
train['Sex_male'] = train.Sex.map({'female': 0, 'male': 1})
In [94]:
train.head()
Out[94]:
PassengerId
Survived
Pclass
Name
Sex
Age
SibSp
Parch
Ticket
Fare
Cabin
Embarked
Sex_male
0
1
0
3
Braund, Mr. Owen Harris
male
22.0
1
0
A/5 21171
7.2500
NaN
S
1
1
2
1
1
Cumings, Mrs. John Bradley (Florence Briggs Th...
female
38.0
1
0
PC 17599
71.2833
C85
C
0
2
3
1
3
Heikkinen, Miss. Laina
female
26.0
0
0
STON/O2. 3101282
7.9250
NaN
S
0
3
4
1
1
Futrelle, Mrs. Jacques Heath (Lily May Peel)
female
35.0
1
0
113803
53.1000
C123
S
0
4
5
0
3
Allen, Mr. William Henry
male
35.0
0
0
373450
8.0500
NaN
S
1
In [97]:
pd.get_dummies(train.Sex, prefix='Sex').iloc[:, 1:]
Out[97]:
Sex_male
0
1
1
0
2
0
3
0
4
1
5
1
6
1
7
1
8
0
9
0
10
0
11
0
12
1
13
1
14
0
15
0
16
1
17
1
18
0
19
0
20
1
21
1
22
0
23
1
24
0
25
0
26
1
27
1
28
0
29
1
...
...
861
1
862
0
863
0
864
1
865
0
866
0
867
1
868
1
869
1
870
1
871
0
872
1
873
1
874
0
875
0
876
1
877
1
878
1
879
0
880
0
881
1
882
0
883
1
884
1
885
0
886
1
887
0
888
0
889
1
890
1
891 rows × 1 columns
In [99]:
train.Embarked.value_counts()
Out[99]:
S 644
C 168
Q 77
Name: Embarked, dtype: int64
In [107]:
embarked_dummies = pd.get_dummies(train.Embarked, prefix='Embarked').iloc[:, 1:]
train = pd.concat([train, embarked_dummies], axis=1)
train.head()
Out[107]:
PassengerId
Survived
Pclass
Name
Sex
Age
SibSp
Parch
Ticket
Fare
Cabin
Embarked
Embarked_Q
Embarked_S
0
1
0
3
Braund, Mr. Owen Harris
male
22.0
1
0
A/5 21171
7.2500
NaN
S
0
1
1
2
1
1
Cumings, Mrs. John Bradley (Florence Briggs Th...
female
38.0
1
0
PC 17599
71.2833
C85
C
0
0
2
3
1
3
Heikkinen, Miss. Laina
female
26.0
0
0
STON/O2. 3101282
7.9250
NaN
S
0
1
3
4
1
1
Futrelle, Mrs. Jacques Heath (Lily May Peel)
female
35.0
1
0
113803
53.1000
C123
S
0
1
4
5
0
3
Allen, Mr. William Henry
male
35.0
0
0
373450
8.0500
NaN
S
0
1
In [110]:
pd.get_dummies(train, columns=['Sex', 'Embarked'], drop_first=True)
Out[110]:
PassengerId
Survived
Pclass
Name
Age
SibSp
Parch
Ticket
Fare
Cabin
Sex_male
Embarked_Q
Embarked_S
0
1
0
3
Braund, Mr. Owen Harris
22.0
1
0
A/5 21171
7.2500
NaN
1
0
1
1
2
1
1
Cumings, Mrs. John Bradley (Florence Briggs Th...
38.0
1
0
PC 17599
71.2833
C85
0
0
0
2
3
1
3
Heikkinen, Miss. Laina
26.0
0
0
STON/O2. 3101282
7.9250
NaN
0
0
1
3
4
1
1
Futrelle, Mrs. Jacques Heath (Lily May Peel)
35.0
1
0
113803
53.1000
C123
0
0
1
4
5
0
3
Allen, Mr. William Henry
35.0
0
0
373450
8.0500
NaN
1
0
1
5
6
0
3
Moran, Mr. James
NaN
0
0
330877
8.4583
NaN
1
1
0
6
7
0
1
McCarthy, Mr. Timothy J
54.0
0
0
17463
51.8625
E46
1
0
1
7
8
0
3
Palsson, Master. Gosta Leonard
2.0
3
1
349909
21.0750
NaN
1
0
1
8
9
1
3
Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)
27.0
0
2
347742
11.1333
NaN
0
0
1
9
10
1
2
Nasser, Mrs. Nicholas (Adele Achem)
14.0
1
0
237736
30.0708
NaN
0
0
0
10
11
1
3
Sandstrom, Miss. Marguerite Rut
4.0
1
1
PP 9549
16.7000
G6
0
0
1
11
12
1
1
Bonnell, Miss. Elizabeth
58.0
0
0
113783
26.5500
C103
0
0
1
12
13
0
3
Saundercock, Mr. William Henry
20.0
0
0
A/5. 2151
8.0500
NaN
1
0
1
13
14
0
3
Andersson, Mr. Anders Johan
39.0
1
5
347082
31.2750
NaN
1
0
1
14
15
0
3
Vestrom, Miss. Hulda Amanda Adolfina
14.0
0
0
350406
7.8542
NaN
0
0
1
15
16
1
2
Hewlett, Mrs. (Mary D Kingcome)
55.0
0
0
248706
16.0000
NaN
0
0
1
16
17
0
3
Rice, Master. Eugene
2.0
4
1
382652
29.1250
NaN
1
1
0
17
18
1
2
Williams, Mr. Charles Eugene
NaN
0
0
244373
13.0000
NaN
1
0
1
18
19
0
3
Vander Planke, Mrs. Julius (Emelia Maria Vande...
31.0
1
0
345763
18.0000
NaN
0
0
1
19
20
1
3
Masselmani, Mrs. Fatima
NaN
0
0
2649
7.2250
NaN
0
0
0
20
21
0
2
Fynney, Mr. Joseph J
35.0
0
0
239865
26.0000
NaN
1
0
1
21
22
1
2
Beesley, Mr. Lawrence
34.0
0
0
248698
13.0000
D56
1
0
1
22
23
1
3
McGowan, Miss. Anna "Annie"
15.0
0
0
330923
8.0292
NaN
0
1
0
23
24
1
1
Sloper, Mr. William Thompson
28.0
0
0
113788
35.5000
A6
1
0
1
24
25
0
3
Palsson, Miss. Torborg Danira
8.0
3
1
349909
21.0750
NaN
0
0
1
25
26
1
3
Asplund, Mrs. Carl Oscar (Selma Augusta Emilia...
38.0
1
5
347077
31.3875
NaN
0
0
1
26
27
0
3
Emir, Mr. Farred Chehab
NaN
0
0
2631
7.2250
NaN
1
0
0
27
28
0
1
Fortune, Mr. Charles Alexander
19.0
3
2
19950
263.0000
C23 C25 C27
1
0
1
28
29
1
3
O'Dwyer, Miss. Ellen "Nellie"
NaN
0
0
330959
7.8792
NaN
0
1
0
29
30
0
3
Todoroff, Mr. Lalio
NaN
0
0
349216
7.8958
NaN
1
0
1
...
...
...
...
...
...
...
...
...
...
...
...
...
...
861
862
0
2
Giles, Mr. Frederick Edward
21.0
1
0
28134
11.5000
NaN
1
0
1
862
863
1
1
Swift, Mrs. Frederick Joel (Margaret Welles Ba...
48.0
0
0
17466
25.9292
D17
0
0
1
863
864
0
3
Sage, Miss. Dorothy Edith "Dolly"
NaN
8
2
CA. 2343
69.5500
NaN
0
0
1
864
865
0
2
Gill, Mr. John William
24.0
0
0
233866
13.0000
NaN
1
0
1
865
866
1
2
Bystrom, Mrs. (Karolina)
42.0
0
0
236852
13.0000
NaN
0
0
1
866
867
1
2
Duran y More, Miss. Asuncion
27.0
1
0
SC/PARIS 2149
13.8583
NaN
0
0
0
867
868
0
1
Roebling, Mr. Washington Augustus II
31.0
0
0
PC 17590
50.4958
A24
1
0
1
868
869
0
3
van Melkebeke, Mr. Philemon
NaN
0
0
345777
9.5000
NaN
1
0
1
869
870
1
3
Johnson, Master. Harold Theodor
4.0
1
1
347742
11.1333
NaN
1
0
1
870
871
0
3
Balkic, Mr. Cerin
26.0
0
0
349248
7.8958
NaN
1
0
1
871
872
1
1
Beckwith, Mrs. Richard Leonard (Sallie Monypeny)
47.0
1
1
11751
52.5542
D35
0
0
1
872
873
0
1
Carlsson, Mr. Frans Olof
33.0
0
0
695
5.0000
B51 B53 B55
1
0
1
873
874
0
3
Vander Cruyssen, Mr. Victor
47.0
0
0
345765
9.0000
NaN
1
0
1
874
875
1
2
Abelson, Mrs. Samuel (Hannah Wizosky)
28.0
1
0
P/PP 3381
24.0000
NaN
0
0
0
875
876
1
3
Najib, Miss. Adele Kiamie "Jane"
15.0
0
0
2667
7.2250
NaN
0
0
0
876
877
0
3
Gustafsson, Mr. Alfred Ossian
20.0
0
0
7534
9.8458
NaN
1
0
1
877
878
0
3
Petroff, Mr. Nedelio
19.0
0
0
349212
7.8958
NaN
1
0
1
878
879
0
3
Laleff, Mr. Kristo
NaN
0
0
349217
7.8958
NaN
1
0
1
879
880
1
1
Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)
56.0
0
1
11767
83.1583
C50
0
0
0
880
881
1
2
Shelley, Mrs. William (Imanita Parrish Hall)
25.0
0
1
230433
26.0000
NaN
0
0
1
881
882
0
3
Markun, Mr. Johann
33.0
0
0
349257
7.8958
NaN
1
0
1
882
883
0
3
Dahlberg, Miss. Gerda Ulrika
22.0
0
0
7552
10.5167
NaN
0
0
1
883
884
0
2
Banfield, Mr. Frederick James
28.0
0
0
C.A./SOTON 34068
10.5000
NaN
1
0
1
884
885
0
3
Sutehall, Mr. Henry Jr
25.0
0
0
SOTON/OQ 392076
7.0500
NaN
1
0
1
885
886
0
3
Rice, Mrs. William (Margaret Norton)
39.0
0
5
382652
29.1250
NaN
0
1
0
886
887
0
2
Montvila, Rev. Juozas
27.0
0
0
211536
13.0000
NaN
1
0
1
887
888
1
1
Graham, Miss. Margaret Edith
19.0
0
0
112053
30.0000
B42
0
0
1
888
889
0
3
Johnston, Miss. Catherine Helen "Carrie"
NaN
1
2
W./C. 6607
23.4500
NaN
0
0
1
889
890
1
1
Behr, Mr. Karl Howell
26.0
0
0
111369
30.0000
C148
1
0
0
890
891
0
3
Dooley, Mr. Patrick
32.0
0
0
370376
7.7500
NaN
1
1
0
891 rows × 13 columns
In [112]:
user_cols = ['user_id', 'age', 'gender', 'occupation', 'zip_code']
users = pd.read_table('http://bit.ly/movieusers', delimiter='|', header=None, names=user_cols, index_col='user_id')
users.head()
Out[112]:
age
gender
occupation
zip_code
user_id
1
24
M
technician
85711
2
53
F
other
94043
3
23
M
writer
32067
4
24
M
technician
43537
5
33
F
other
15213
In [113]:
users.shape
Out[113]:
(943, 4)
In [117]:
users.zip_code.duplicated().sum()
Out[117]:
148
In [119]:
users.duplicated().sum()
Out[119]:
7
In [123]:
users.loc[users.duplicated(keep=False), :]
Out[123]:
age
gender
occupation
zip_code
user_id
67
17
M
student
60402
85
51
M
educator
20003
198
21
F
student
55414
350
32
M
student
97301
428
28
M
student
55414
437
27
F
other
20009
460
44
F
other
60630
496
21
F
student
55414
572
51
M
educator
20003
621
17
M
student
60402
684
28
M
student
55414
733
44
F
other
60630
805
27
F
other
20009
890
32
M
student
97301
In [124]:
users.drop_duplicates(keep='first').shape
Out[124]:
(936, 4)
In [125]:
users.duplicated(subset=['age', 'zip_code']).sum()
Out[125]:
16
In [126]:
drinks = pd.read_csv('http://bit.ly/drinksbycountry')
drinks.head()
Out[126]:
country
beer_servings
spirit_servings
wine_servings
total_litres_of_pure_alcohol
continent
0
Afghanistan
0
0
0
0.0
Asia
1
Albania
89
132
54
4.9
Europe
2
Algeria
25
0
14
0.7
Africa
3
Andorra
245
138
312
12.4
Europe
4
Angola
217
57
45
5.9
Africa
In [127]:
drinks.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 193 entries, 0 to 192
Data columns (total 6 columns):
country 193 non-null object
beer_servings 193 non-null int64
spirit_servings 193 non-null int64
wine_servings 193 non-null int64
total_litres_of_pure_alcohol 193 non-null float64
continent 193 non-null object
dtypes: float64(1), int64(3), object(2)
memory usage: 9.1+ KB
In [128]:
drinks.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 193 entries, 0 to 192
Data columns (total 6 columns):
country 193 non-null object
beer_servings 193 non-null int64
spirit_servings 193 non-null int64
wine_servings 193 non-null int64
total_litres_of_pure_alcohol 193 non-null float64
continent 193 non-null object
dtypes: float64(1), int64(3), object(2)
memory usage: 25.9 KB
In [130]:
drinks.memory_usage(deep=True)
Out[130]:
Index 72
country 10272
beer_servings 1544
spirit_servings 1544
wine_servings 1544
total_litres_of_pure_alcohol 1544
continent 10016
dtype: int64
In [132]:
sorted(drinks.continent.unique())
Out[132]:
['Africa', 'Asia', 'Europe', 'North America', 'Oceania', 'South America']
In [133]:
drinks['continent'] = drinks.continent.astype('category')
In [135]:
drinks.dtypes
Out[135]:
country object
beer_servings int64
spirit_servings int64
wine_servings int64
total_litres_of_pure_alcohol float64
continent category
dtype: object
In [136]:
drinks.continent.cat.codes.head()
Out[136]:
0 1
1 2
2 0
3 2
4 0
dtype: int8
In [137]:
drinks.memory_usage(deep=True)
Out[137]:
Index 72
country 10272
beer_servings 1544
spirit_servings 1544
wine_servings 1544
total_litres_of_pure_alcohol 1544
continent 512
dtype: int64
In [141]:
drinks['country'] = drinks.country.astype('category')
drinks.memory_usage(deep=True)
Out[141]:
Index 72
country 10658
beer_servings 1544
spirit_servings 1544
wine_servings 1544
total_litres_of_pure_alcohol 1544
continent 512
dtype: int64
In [143]:
df = pd.DataFrame({'ID': [100, 101, 102, 103], 'quality': ['good', 'very good', 'good', 'excellent']})
df
Out[143]:
ID
quality
0
100
good
1
101
very good
2
102
good
3
103
excellent
In [2]:
ufo = pd.read_csv('http://bit.ly/uforeports')
In [3]:
ufo.head(3)
Out[3]:
City
Colors Reported
Shape Reported
State
Time
0
Ithaca
NaN
TRIANGLE
NY
6/1/1930 22:00
1
Willingboro
NaN
OTHER
NJ
6/30/1930 20:00
2
Holyoke
NaN
OVAL
CO
2/15/1931 14:00
In [4]:
ufo.loc[0, :]
Out[4]:
City Ithaca
Colors Reported NaN
Shape Reported TRIANGLE
State NY
Time 6/1/1930 22:00
Name: 0, dtype: object
In [7]:
ufo.loc[0:2, :]
Out[7]:
City
Colors Reported
Shape Reported
State
Time
0
Ithaca
NaN
TRIANGLE
NY
6/1/1930 22:00
1
Willingboro
NaN
OTHER
NJ
6/30/1930 20:00
2
Holyoke
NaN
OVAL
CO
2/15/1931 14:00
In [11]:
ufo.loc[0, 'City':'State']
Out[11]:
City Ithaca
Colors Reported NaN
Shape Reported TRIANGLE
State NY
Name: 0, dtype: object
In [12]:
ufo.loc[ufo.City=='Oakland', :]
Out[12]:
City
Colors Reported
Shape Reported
State
Time
1694
Oakland
NaN
CIGAR
CA
7/21/1968 14:00
2144
Oakland
NaN
DISK
CA
8/19/1971 0:00
4686
Oakland
NaN
LIGHT
MD
6/1/1982 0:00
7293
Oakland
NaN
LIGHT
CA
3/28/1994 17:00
8488
Oakland
NaN
NaN
CA
8/10/1995 21:45
8768
Oakland
NaN
NaN
CA
10/10/1995 22:40
10816
Oakland
NaN
LIGHT
OR
10/1/1997 21:30
10948
Oakland
NaN
DISK
CA
11/14/1997 19:55
11045
Oakland
NaN
TRIANGLE
CA
12/10/1997 1:30
12322
Oakland
NaN
FIREBALL
CA
10/9/1998 19:40
12941
Oakland
NaN
CYLINDER
CA
1/23/1999 21:30
16803
Oakland
NaN
TRIANGLE
MD
7/4/2000 23:00
17322
Oakland
NaN
CYLINDER
CA
9/1/2000 21:35
In [14]:
ufo.iloc[:, [0, 3]]
Out[14]:
City
State
0
Ithaca
NY
1
Willingboro
NJ
2
Holyoke
CO
3
Abilene
KS
4
New York Worlds Fair
NY
5
Valley City
ND
6
Crater Lake
CA
7
Alma
MI
8
Eklutna
AK
9
Hubbard
OR
10
Fontana
CA
11
Waterloo
AL
12
Belton
SC
13
Keokuk
IA
14
Ludington
MI
15
Forest Home
CA
16
Los Angeles
CA
17
Hapeville
GA
18
Oneida
TN
19
Bering Sea
AK
20
Nebraska
NE
21
NaN
LA
22
NaN
LA
23
Owensboro
KY
24
Wilderness
WV
25
San Diego
CA
26
Wilderness
WV
27
Clovis
NM
28
Los Alamos
NM
29
Ft. Duschene
UT
...
...
...
18211
Holyoke
MA
18212
Carson
CA
18213
Pasadena
CA
18214
Austin
TX
18215
El Campo
TX
18216
Garden Grove
CA
18217
Berthoud Pass
CO
18218
Sisterdale
TX
18219
Garden Grove
CA
18220
Shasta Lake
CA
18221
Franklin
NH
18222
Albrightsville
PA
18223
Greenville
SC
18224
Eufaula
OK
18225
Simi Valley
CA
18226
San Francisco
CA
18227
San Francisco
CA
18228
Kingsville
TX
18229
Chicago
IL
18230
Pismo Beach
CA
18231
Pismo Beach
CA
18232
Lodi
WI
18233
Anchorage
AK
18234
Capitola
CA
18235
Fountain Hills
AZ
18236
Grant Park
IL
18237
Spirit Lake
IA
18238
Eagle River
WI
18239
Eagle River
WI
18240
Ybor
FL
18241 rows × 2 columns
In [38]:
train = pd.read_csv('http://bit.ly/kaggletrain')
In [39]:
train.head()
Out[39]:
PassengerId
Survived
Pclass
Name
Sex
Age
SibSp
Parch
Ticket
Fare
Cabin
Embarked
0
1
0
3
Braund, Mr. Owen Harris
male
22.0
1
0
A/5 21171
7.2500
NaN
S
1
2
1
1
Cumings, Mrs. John Bradley (Florence Briggs Th...
female
38.0
1
0
PC 17599
71.2833
C85
C
2
3
1
3
Heikkinen, Miss. Laina
female
26.0
0
0
STON/O2. 3101282
7.9250
NaN
S
3
4
1
1
Futrelle, Mrs. Jacques Heath (Lily May Peel)
female
35.0
1
0
113803
53.1000
C123
S
4
5
0
3
Allen, Mr. William Henry
male
35.0
0
0
373450
8.0500
NaN
S
In [40]:
feature_cols = ['Pclass', 'Parch']
X = train.loc[:, feature_cols]
In [41]:
y = train.Survived
In [42]:
from sklearn import linear_model
logreg = linear_model.LogisticRegression()
logreg.fit(X, y)
Out[42]:
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
verbose=0, warm_start=False)
In [46]:
test = pd.read_csv('http://bit.ly/kaggletest')
X_new = test.loc[:, feature_cols]
In [47]:
y_predict = logreg.predict(X_new)
In [48]:
y_predict.shape
Out[48]:
(418,)
In [50]:
pd.DataFrame({'PassengerId': test.PassengerId, 'Survived': y_predict}).set_index('PassengerId').to_csv('sub.csv')
In [51]:
train.to_pickle('train.pkl')
In [52]:
pd.read_pickle('train.pkl')
Out[52]:
PassengerId
Survived
Pclass
Name
Sex
Age
SibSp
Parch
Ticket
Fare
Cabin
Embarked
0
1
0
3
Braund, Mr. Owen Harris
male
22.0
1
0
A/5 21171
7.2500
NaN
S
1
2
1
1
Cumings, Mrs. John Bradley (Florence Briggs Th...
female
38.0
1
0
PC 17599
71.2833
C85
C
2
3
1
3
Heikkinen, Miss. Laina
female
26.0
0
0
STON/O2. 3101282
7.9250
NaN
S
3
4
1
1
Futrelle, Mrs. Jacques Heath (Lily May Peel)
female
35.0
1
0
113803
53.1000
C123
S
4
5
0
3
Allen, Mr. William Henry
male
35.0
0
0
373450
8.0500
NaN
S
5
6
0
3
Moran, Mr. James
male
NaN
0
0
330877
8.4583
NaN
Q
6
7
0
1
McCarthy, Mr. Timothy J
male
54.0
0
0
17463
51.8625
E46
S
7
8
0
3
Palsson, Master. Gosta Leonard
male
2.0
3
1
349909
21.0750
NaN
S
8
9
1
3
Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)
female
27.0
0
2
347742
11.1333
NaN
S
9
10
1
2
Nasser, Mrs. Nicholas (Adele Achem)
female
14.0
1
0
237736
30.0708
NaN
C
10
11
1
3
Sandstrom, Miss. Marguerite Rut
female
4.0
1
1
PP 9549
16.7000
G6
S
11
12
1
1
Bonnell, Miss. Elizabeth
female
58.0
0
0
113783
26.5500
C103
S
12
13
0
3
Saundercock, Mr. William Henry
male
20.0
0
0
A/5. 2151
8.0500
NaN
S
13
14
0
3
Andersson, Mr. Anders Johan
male
39.0
1
5
347082
31.2750
NaN
S
14
15
0
3
Vestrom, Miss. Hulda Amanda Adolfina
female
14.0
0
0
350406
7.8542
NaN
S
15
16
1
2
Hewlett, Mrs. (Mary D Kingcome)
female
55.0
0
0
248706
16.0000
NaN
S
16
17
0
3
Rice, Master. Eugene
male
2.0
4
1
382652
29.1250
NaN
Q
17
18
1
2
Williams, Mr. Charles Eugene
male
NaN
0
0
244373
13.0000
NaN
S
18
19
0
3
Vander Planke, Mrs. Julius (Emelia Maria Vande...
female
31.0
1
0
345763
18.0000
NaN
S
19
20
1
3
Masselmani, Mrs. Fatima
female
NaN
0
0
2649
7.2250
NaN
C
20
21
0
2
Fynney, Mr. Joseph J
male
35.0
0
0
239865
26.0000
NaN
S
21
22
1
2
Beesley, Mr. Lawrence
male
34.0
0
0
248698
13.0000
D56
S
22
23
1
3
McGowan, Miss. Anna "Annie"
female
15.0
0
0
330923
8.0292
NaN
Q
23
24
1
1
Sloper, Mr. William Thompson
male
28.0
0
0
113788
35.5000
A6
S
24
25
0
3
Palsson, Miss. Torborg Danira
female
8.0
3
1
349909
21.0750
NaN
S
25
26
1
3
Asplund, Mrs. Carl Oscar (Selma Augusta Emilia...
female
38.0
1
5
347077
31.3875
NaN
S
26
27
0
3
Emir, Mr. Farred Chehab
male
NaN
0
0
2631
7.2250
NaN
C
27
28
0
1
Fortune, Mr. Charles Alexander
male
19.0
3
2
19950
263.0000
C23 C25 C27
S
28
29
1
3
O'Dwyer, Miss. Ellen "Nellie"
female
NaN
0
0
330959
7.8792
NaN
Q
29
30
0
3
Todoroff, Mr. Lalio
male
NaN
0
0
349216
7.8958
NaN
S
...
...
...
...
...
...
...
...
...
...
...
...
...
861
862
0
2
Giles, Mr. Frederick Edward
male
21.0
1
0
28134
11.5000
NaN
S
862
863
1
1
Swift, Mrs. Frederick Joel (Margaret Welles Ba...
female
48.0
0
0
17466
25.9292
D17
S
863
864
0
3
Sage, Miss. Dorothy Edith "Dolly"
female
NaN
8
2
CA. 2343
69.5500
NaN
S
864
865
0
2
Gill, Mr. John William
male
24.0
0
0
233866
13.0000
NaN
S
865
866
1
2
Bystrom, Mrs. (Karolina)
female
42.0
0
0
236852
13.0000
NaN
S
866
867
1
2
Duran y More, Miss. Asuncion
female
27.0
1
0
SC/PARIS 2149
13.8583
NaN
C
867
868
0
1
Roebling, Mr. Washington Augustus II
male
31.0
0
0
PC 17590
50.4958
A24
S
868
869
0
3
van Melkebeke, Mr. Philemon
male
NaN
0
0
345777
9.5000
NaN
S
869
870
1
3
Johnson, Master. Harold Theodor
male
4.0
1
1
347742
11.1333
NaN
S
870
871
0
3
Balkic, Mr. Cerin
male
26.0
0
0
349248
7.8958
NaN
S
871
872
1
1
Beckwith, Mrs. Richard Leonard (Sallie Monypeny)
female
47.0
1
1
11751
52.5542
D35
S
872
873
0
1
Carlsson, Mr. Frans Olof
male
33.0
0
0
695
5.0000
B51 B53 B55
S
873
874
0
3
Vander Cruyssen, Mr. Victor
male
47.0
0
0
345765
9.0000
NaN
S
874
875
1
2
Abelson, Mrs. Samuel (Hannah Wizosky)
female
28.0
1
0
P/PP 3381
24.0000
NaN
C
875
876
1
3
Najib, Miss. Adele Kiamie "Jane"
female
15.0
0
0
2667
7.2250
NaN
C
876
877
0
3
Gustafsson, Mr. Alfred Ossian
male
20.0
0
0
7534
9.8458
NaN
S
877
878
0
3
Petroff, Mr. Nedelio
male
19.0
0
0
349212
7.8958
NaN
S
878
879
0
3
Laleff, Mr. Kristo
male
NaN
0
0
349217
7.8958
NaN
S
879
880
1
1
Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)
female
56.0
0
1
11767
83.1583
C50
C
880
881
1
2
Shelley, Mrs. William (Imanita Parrish Hall)
female
25.0
0
1
230433
26.0000
NaN
S
881
882
0
3
Markun, Mr. Johann
male
33.0
0
0
349257
7.8958
NaN
S
882
883
0
3
Dahlberg, Miss. Gerda Ulrika
female
22.0
0
0
7552
10.5167
NaN
S
883
884
0
2
Banfield, Mr. Frederick James
male
28.0
0
0
C.A./SOTON 34068
10.5000
NaN
S
884
885
0
3
Sutehall, Mr. Henry Jr
male
25.0
0
0
SOTON/OQ 392076
7.0500
NaN
S
885
886
0
3
Rice, Mrs. William (Margaret Norton)
female
39.0
0
5
382652
29.1250
NaN
Q
886
887
0
2
Montvila, Rev. Juozas
male
27.0
0
0
211536
13.0000
NaN
S
887
888
1
1
Graham, Miss. Margaret Edith
female
19.0
0
0
112053
30.0000
B42
S
888
889
0
3
Johnston, Miss. Catherine Helen "Carrie"
female
NaN
1
2
W./C. 6607
23.4500
NaN
S
889
890
1
1
Behr, Mr. Karl Howell
male
26.0
0
0
111369
30.0000
C148
C
890
891
0
3
Dooley, Mr. Patrick
male
32.0
0
0
370376
7.7500
NaN
Q
891 rows × 12 columns
In [53]:
orders = pd.read_table('http://bit.ly/chiporders')
orders.head()
Out[53]:
order_id
quantity
item_name
choice_description
item_price
0
1
1
Chips and Fresh Tomato Salsa
NaN
$2.39
1
1
1
Izze
[Clementine]
$3.39
2
1
1
Nantucket Nectar
[Apple]
$3.39
3
1
1
Chips and Tomatillo-Green Chili Salsa
NaN
$2.39
4
2
2
Chicken Bowl
[Tomatillo-Red Chili Salsa (Hot), [Black Beans...
$16.98
In [54]:
orders.item_name.str.upper()
Out[54]:
0 CHIPS AND FRESH TOMATO SALSA
1 IZZE
2 NANTUCKET NECTAR
3 CHIPS AND TOMATILLO-GREEN CHILI SALSA
4 CHICKEN BOWL
5 CHICKEN BOWL
6 SIDE OF CHIPS
7 STEAK BURRITO
8 STEAK SOFT TACOS
9 STEAK BURRITO
10 CHIPS AND GUACAMOLE
11 CHICKEN CRISPY TACOS
12 CHICKEN SOFT TACOS
13 CHICKEN BOWL
14 CHIPS AND GUACAMOLE
15 CHIPS AND TOMATILLO-GREEN CHILI SALSA
16 CHICKEN BURRITO
17 CHICKEN BURRITO
18 CANNED SODA
19 CHICKEN BOWL
20 CHIPS AND GUACAMOLE
21 BARBACOA BURRITO
22 NANTUCKET NECTAR
23 CHICKEN BURRITO
24 IZZE
25 CHIPS AND FRESH TOMATO SALSA
26 CHICKEN BOWL
27 CARNITAS BURRITO
28 CANNED SODA
29 CHICKEN BURRITO
...
4592 BARBACOA BURRITO
4593 CARNITAS BOWL
4594 BARBACOA BOWL
4595 CHICKEN BOWL
4596 CHIPS AND GUACAMOLE
4597 CANNED SOFT DRINK
4598 BOTTLED WATER
4599 CHICKEN BOWL
4600 CHIPS AND GUACAMOLE
4601 CANNED SOFT DRINK
4602 BARBACOA BURRITO
4603 BARBACOA BURRITO
4604 CHICKEN BOWL
4605 CHIPS AND GUACAMOLE
4606 CANNED SOFT DRINK
4607 STEAK BURRITO
4608 VEGGIE BURRITO
4609 CANNED SOFT DRINK
4610 STEAK BURRITO
4611 VEGGIE BURRITO
4612 CARNITAS BOWL
4613 CHIPS
4614 BOTTLED WATER
4615 CHICKEN SOFT TACOS
4616 CHIPS AND GUACAMOLE
4617 STEAK BURRITO
4618 STEAK BURRITO
4619 CHICKEN SALAD BOWL
4620 CHICKEN SALAD BOWL
4621 CHICKEN SALAD BOWL
Name: item_name, dtype: object
In [56]:
orders[orders.item_name.str.contains('Chicken')]
Out[56]:
order_id
quantity
item_name
choice_description
item_price
4
2
2
Chicken Bowl
[Tomatillo-Red Chili Salsa (Hot), [Black Beans...
$16.98
5
3
1
Chicken Bowl
[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...
$10.98
11
6
1
Chicken Crispy Tacos
[Roasted Chili Corn Salsa, [Fajita Vegetables,...
$8.75
12
6
1
Chicken Soft Tacos
[Roasted Chili Corn Salsa, [Rice, Black Beans,...
$8.75
13
7
1
Chicken Bowl
[Fresh Tomato Salsa, [Fajita Vegetables, Rice,...
$11.25
16
8
1
Chicken Burrito
[Tomatillo-Green Chili Salsa (Medium), [Pinto ...
$8.49
17
9
1
Chicken Burrito
[Fresh Tomato Salsa (Mild), [Black Beans, Rice...
$8.49
19
10
1
Chicken Bowl
[Tomatillo Red Chili Salsa, [Fajita Vegetables...
$8.75
23
12
1
Chicken Burrito
[[Tomatillo-Green Chili Salsa (Medium), Tomati...
$10.98
26
13
1
Chicken Bowl
[Roasted Chili Corn Salsa (Medium), [Pinto Bea...
$8.49
29
15
1
Chicken Burrito
[Tomatillo-Green Chili Salsa (Medium), [Pinto ...
$8.49
35
18
1
Chicken Soft Tacos
[Roasted Chili Corn Salsa, Rice]
$8.75
36
18
1
Chicken Soft Tacos
[Roasted Chili Corn Salsa, [Cheese, Lettuce]]
$8.75
42
20
1
Chicken Bowl
[Roasted Chili Corn Salsa, [Rice, Black Beans,...
$11.25
44
20
1
Chicken Salad Bowl
[Fresh Tomato Salsa, [Fajita Vegetables, Pinto...
$8.75
45
21
1
Chicken Burrito
[Tomatillo-Red Chili Salsa (Hot), [Black Beans...
$10.98
52
24
1
Chicken Burrito
[Roasted Chili Corn Salsa (Medium), [Black Bea...
$10.98
63
28
1
Chicken Burrito
[Fresh Tomato Salsa, [Rice, Black Beans, Cheese]]
$8.75
68
30
1
Chicken Burrito
[Tomatillo-Red Chili Salsa (Hot), [Black Beans...
$10.98
73
33
1
Chicken Burrito
[Tomatillo Red Chili Salsa, [Rice, Black Beans...
$8.75
75
34
1
Chicken Soft Tacos
[Tomatillo Red Chili Salsa, [Rice, Cheese, Sou...
$8.75
76
34
1
Chicken Bowl
[Fresh Tomato Salsa, [Rice, Black Beans, Pinto...
$8.75
78
34
1
Chicken Bowl
[Fresh Tomato Salsa, [Rice, Black Beans, Chees...
$8.75
79
35
1
Chicken Soft Tacos
[Roasted Chili Corn Salsa, [Rice, Cheese, Lett...
$11.25
82
36
1
Chicken Burrito
[Fresh Tomato Salsa, [Rice, Black Beans, Chees...
$8.75
83
37
1
Chicken Burrito
[Tomatillo Green Chili Salsa, [Rice, Black Bea...
$8.75
98
43
1
Chicken Burrito
[Tomatillo Red Chili Salsa, [Rice, Black Beans...
$8.75
99
44
1
Chicken Bowl
[Tomatillo Red Chili Salsa, [Rice, Fajita Vege...
$8.75
110
49
1
Chicken Bowl
[Tomatillo Red Chili Salsa, [Rice, Black Beans...
$8.75
114
50
1
Chicken Burrito
[Tomatillo-Green Chili Salsa (Medium), [Pinto ...
$8.49
...
...
...
...
...
...
4537
1804
1
Chicken Bowl
[Fresh Tomato Salsa, [Rice, Black Beans, Chees...
$8.75
4540
1805
1
Chicken Bowl
[Tomatillo Green Chili Salsa, [Fajita Vegetabl...
$11.25
4543
1806
1
Chicken Burrito
[Fresh Tomato Salsa, [Fajita Vegetables, Rice]]
$8.75
4550
1809
1
Chicken Soft Tacos
[Fresh Tomato Salsa, [Fajita Vegetables, Chees...
$8.75
4553
1810
1
Chicken Bowl
[Roasted Chili Corn Salsa, [Black Beans, Sour ...
$11.25
4555
1811
1
Chicken Bowl
[Roasted Chili Corn Salsa, [Fajita Vegetables,...
$8.75
4556
1811
1
Chicken Burrito
[Tomatillo Green Chili Salsa, [Fajita Vegetabl...
$11.25
4558
1812
1
Chicken Burrito
[Fresh Tomato Salsa, [Rice, Black Beans, Chees...
$8.75
4559
1812
1
Chicken Burrito
[Tomatillo Red Chili Salsa, [Rice, Cheese, Gua...
$11.25
4561
1813
2
Chicken Salad Bowl
[Fresh Tomato Salsa, [Fajita Vegetables, Pinto...
$17.50
4564
1815
1
Chicken Bowl
[Fresh Tomato Salsa, [Fajita Vegetables, Rice,...
$11.25
4566
1816
1
Chicken Burrito
[Roasted Chili Corn Salsa, [Pinto Beans, Chees...
$8.75
4569
1817
1
Chicken Burrito
[Fresh Tomato Salsa, [Fajita Vegetables, Rice]]
$8.75
4574
1819
1
Chicken Bowl
[Roasted Chili Corn Salsa, [Rice, Cheese, Lett...
$8.75
4576
1820
1
Chicken Bowl
[Fresh Tomato Salsa, [Rice, Cheese, Sour Cream...
$11.25
4580
1822
1
Chicken Salad Bowl
[Fresh Tomato Salsa, [Black Beans, Cheese, Gua...
$11.25
4581
1822
1
Chicken Bowl
[Tomatillo Red Chili Salsa, [Rice, Black Beans...
$8.75
4583
1823
1
Chicken Bowl
[Tomatillo Green Chili Salsa, [Rice, Black Bea...
$8.75
4586
1824
1
Chicken Bowl
[Fresh Tomato Salsa, [Rice, Black Beans, Chees...
$11.25
4587
1824
1
Chicken Bowl
[Fresh Tomato Salsa, [Rice, Black Beans, Chees...
$11.25
4589
1825
1
Chicken Bowl
[Fresh Tomato Salsa, [Rice, Black Beans, Sour ...
$11.25
4590
1825
1
Chicken Bowl
[Roasted Chili Corn Salsa, [Rice, Black Beans,...
$11.25
4591
1825
1
Chicken Bowl
[Tomatillo Red Chili Salsa, [Rice, Black Beans...
$8.75
4595
1826
1
Chicken Bowl
[Tomatillo Green Chili Salsa, [Rice, Black Bea...
$8.75
4599
1827
1
Chicken Bowl
[Roasted Chili Corn Salsa, [Cheese, Lettuce]]
$8.75
4604
1828
1
Chicken Bowl
[Fresh Tomato Salsa, [Rice, Black Beans, Chees...
$8.75
4615
1832
1
Chicken Soft Tacos
[Fresh Tomato Salsa, [Rice, Cheese, Sour Cream]]
$8.75
4619
1834
1
Chicken Salad Bowl
[Fresh Tomato Salsa, [Fajita Vegetables, Pinto...
$11.25
4620
1834
1
Chicken Salad Bowl
[Fresh Tomato Salsa, [Fajita Vegetables, Lettu...
$8.75
4621
1834
1
Chicken Salad Bowl
[Fresh Tomato Salsa, [Fajita Vegetables, Pinto...
$8.75
1560 rows × 5 columns
In [57]:
drinks = pd.read_csv('http://bit.ly/drinksbycountry')
drinks.head()
Out[57]:
country
beer_servings
spirit_servings
wine_servings
total_litres_of_pure_alcohol
continent
0
Afghanistan
0
0
0
0.0
Asia
1
Albania
89
132
54
4.9
Europe
2
Algeria
25
0
14
0.7
Africa
3
Andorra
245
138
312
12.4
Europe
4
Angola
217
57
45
5.9
Africa
In [59]:
drinks.drop(2, axis=0).head()
Out[59]:
country
beer_servings
spirit_servings
wine_servings
total_litres_of_pure_alcohol
continent
0
Afghanistan
0
0
0
0.0
Asia
1
Albania
89
132
54
4.9
Europe
3
Andorra
245
138
312
12.4
Europe
4
Angola
217
57
45
5.9
Africa
5
Antigua & Barbuda
102
128
45
4.9
North America
In [62]:
drinks.mean(axis=1).head()
Out[62]:
0 0.000
1 69.975
2 9.925
3 176.850
4 81.225
dtype: float64
In [63]:
ufo = pd.read_csv('http://bit.ly/uforeports')
ufo.head()
Out[63]:
City
Colors Reported
Shape Reported
State
Time
0
Ithaca
NaN
TRIANGLE
NY
6/1/1930 22:00
1
Willingboro
NaN
OTHER
NJ
6/30/1930 20:00
2
Holyoke
NaN
OVAL
CO
2/15/1931 14:00
3
Abilene
NaN
DISK
KS
6/1/1931 13:00
4
New York Worlds Fair
NaN
LIGHT
NY
4/18/1933 19:00
In [64]:
ufo.dtypes
Out[64]:
City object
Colors Reported object
Shape Reported object
State object
Time object
dtype: object
In [69]:
ufo.Time.str.slice(-5, -3).astype(int).head()
Out[69]:
0 22
1 20
2 14
3 13
4 19
Name: Time, dtype: int64
In [70]:
ufo['Time'] = pd.to_datetime(ufo.Time)
ufo.dtypes
Out[70]:
City object
Colors Reported object
Shape Reported object
State object
Time datetime64[ns]
dtype: object
In [71]:
ufo.head()
Out[71]:
City
Colors Reported
Shape Reported
State
Time
0
Ithaca
NaN
TRIANGLE
NY
1930-06-01 22:00:00
1
Willingboro
NaN
OTHER
NJ
1930-06-30 20:00:00
2
Holyoke
NaN
OVAL
CO
1931-02-15 14:00:00
3
Abilene
NaN
DISK
KS
1931-06-01 13:00:00
4
New York Worlds Fair
NaN
LIGHT
NY
1933-04-18 19:00:00
In [75]:
ufo.Time.dt.dayofyear.head()
Out[75]:
0 152
1 181
2 46
3 152
4 108
Name: Time, dtype: int64
In [76]:
ts = pd.to_datetime('1/1/1999')
In [79]:
ufo.loc[ufo.Time > ts, :].head()
Out[79]:
City
Colors Reported
Shape Reported
State
Time
12832
Loma Rica
NaN
LIGHT
CA
1999-01-01 02:30:00
12833
Bauxite
NaN
NaN
AR
1999-01-01 03:00:00
12834
Florence
NaN
CYLINDER
SC
1999-01-01 14:00:00
12835
Lake Henshaw
NaN
CIGAR
CA
1999-01-01 15:00:00
12836
Wilmington Island
NaN
LIGHT
GA
1999-01-01 17:15:00
In [80]:
%matplotlib inline
In [81]:
ufo['Year'] = ufo.Time.dt.year
ufo.head()
Out[81]:
City
Colors Reported
Shape Reported
State
Time
Year
0
Ithaca
NaN
TRIANGLE
NY
1930-06-01 22:00:00
1930
1
Willingboro
NaN
OTHER
NJ
1930-06-30 20:00:00
1930
2
Holyoke
NaN
OVAL
CO
1931-02-15 14:00:00
1931
3
Abilene
NaN
DISK
KS
1931-06-01 13:00:00
1931
4
New York Worlds Fair
NaN
LIGHT
NY
1933-04-18 19:00:00
1933
In [85]:
ufo.Year.value_counts().sort_index().plot()
Out[85]:
<matplotlib.axes._subplots.AxesSubplot at 0x10d082d10>
In [86]:
ufo.sample(3)
Out[86]:
City
Colors Reported
Shape Reported
State
Time
Year
17939
Greenup
NaN
FIREBALL
KY
2000-11-14 21:00:00
2000
4164
Wenatchee
NaN
TRIANGLE
WA
1979-08-15 22:00:00
1979
9884
Millsboro
NaN
TRIANGLE
DE
1997-01-15 18:00:00
1997
In [88]:
ufo.sample(frac=0.001)
Out[88]:
City
Colors Reported
Shape Reported
State
Time
Year
2983
Shoreline
NaN
EGG
WA
1975-07-13 19:30:00
1975
14529
Cotton
NaN
TRIANGLE
MN
1999-09-10 22:00:00
1999
5330
Pendleton
NaN
RECTANGLE
OR
1985-09-01 13:30:00
1985
10508
St. Louis
BLUE
LIGHT
MO
1997-07-04 21:15:00
1997
5241
Route 7
NaN
RECTANGLE
NY
1985-06-15 02:30:00
1985
16331
Plano
NaN
TRIANGLE
TX
2000-04-27 23:38:00
2000
17198
Beaumont
NaN
NaN
TX
2000-08-15 17:00:00
2000
7942
Everett
NaN
NaN
WA
1995-03-22 21:30:00
1995
4934
Grand Prairie
NaN
NaN
TX
1983-07-15 18:00:00
1983
9257
Langley
NaN
NaN
WA
1996-04-26 03:30:00
1996
9402
Carlsbad
NaN
DISK
NM
1996-06-15 12:00:00
1996
16322
San Rafael
NaN
LIGHT
CA
2000-04-25 23:30:00
2000
44
Blairsden
GREEN
SPHERE
CA
1946-06-30 19:00:00
1946
9834
Poulsbo
NaN
NaN
WA
1996-12-21 00:00:00
1996
9216
Phillipsburg
NaN
NaN
PA
1996-04-12 23:55:00
1996
3983
North Branch
NaN
RECTANGLE
MN
1978-11-21 23:45:00
1978
7158
Bellevue
NaN
SPHERE
WA
1993-08-10 20:30:00
1993
7029
Topeka
NaN
DISK
KS
1993-05-22 20:50:00
1993
In [89]:
drinks = pd.read_csv('http://bit.ly/drinksbycountry')
drinks.head()
Out[89]:
country
beer_servings
spirit_servings
wine_servings
total_litres_of_pure_alcohol
continent
0
Afghanistan
0
0
0
0.0
Asia
1
Albania
89
132
54
4.9
Europe
2
Algeria
25
0
14
0.7
Africa
3
Andorra
245
138
312
12.4
Europe
4
Angola
217
57
45
5.9
Africa
In [90]:
drinks.index
Out[90]:
RangeIndex(start=0, stop=193, step=1)
In [91]:
drinks.loc[23, :]
Out[91]:
country Brazil
beer_servings 245
spirit_servings 145
wine_servings 16
total_litres_of_pure_alcohol 7.2
continent South America
Name: 23, dtype: object
In [92]:
drinks.set_index('country', inplace=True)
In [93]:
drinks.head()
Out[93]:
beer_servings
spirit_servings
wine_servings
total_litres_of_pure_alcohol
continent
country
Afghanistan
0
0
0
0.0
Asia
Albania
89
132
54
4.9
Europe
Algeria
25
0
14
0.7
Africa
Andorra
245
138
312
12.4
Europe
Angola
217
57
45
5.9
Africa
In [94]:
drinks.loc['Brazil', 'beer_servings']
Out[94]:
245
In [95]:
drinks.index.name = None
In [96]:
drinks.head()
Out[96]:
beer_servings
spirit_servings
wine_servings
total_litres_of_pure_alcohol
continent
Afghanistan
0
0
0
0.0
Asia
Albania
89
132
54
4.9
Europe
Algeria
25
0
14
0.7
Africa
Andorra
245
138
312
12.4
Europe
Angola
217
57
45
5.9
Africa
In [97]:
drinks.reset_index(inplace=True)
In [98]:
drinks.head()
Out[98]:
index
beer_servings
spirit_servings
wine_servings
total_litres_of_pure_alcohol
continent
0
Afghanistan
0
0
0
0.0
Asia
1
Albania
89
132
54
4.9
Europe
2
Algeria
25
0
14
0.7
Africa
3
Andorra
245
138
312
12.4
Europe
4
Angola
217
57
45
5.9
Africa
In [99]:
drinks = pd.read_csv('http://bit.ly/drinksbycountry')
drinks.head()
Out[99]:
country
beer_servings
spirit_servings
wine_servings
total_litres_of_pure_alcohol
continent
0
Afghanistan
0
0
0
0.0
Asia
1
Albania
89
132
54
4.9
Europe
2
Algeria
25
0
14
0.7
Africa
3
Andorra
245
138
312
12.4
Europe
4
Angola
217
57
45
5.9
Africa
In [101]:
drinks.set_index('country', inplace=True)
drinks.head()
Out[101]:
beer_servings
spirit_servings
wine_servings
total_litres_of_pure_alcohol
continent
country
Afghanistan
0
0
0
0.0
Asia
Albania
89
132
54
4.9
Europe
Algeria
25
0
14
0.7
Africa
Andorra
245
138
312
12.4
Europe
Angola
217
57
45
5.9
Africa
In [103]:
drinks.continent.head()
Out[103]:
country
Afghanistan Asia
Albania Europe
Algeria Africa
Andorra Europe
Angola Africa
Name: continent, dtype: object
In [105]:
drinks.continent.value_counts()['Africa']
Out[105]:
53
In [106]:
drinks.continent.value_counts().sort_index().head()
Out[106]:
Africa 53
Asia 44
Europe 45
North America 23
Oceania 16
Name: continent, dtype: int64
In [107]:
people = pd.Series([3000000, 85000], index=['Albania', 'Andorra'], name='population')
people
Out[107]:
Albania 3000000
Andorra 85000
Name: population, dtype: int64
In [109]:
drinks.beer_servings * people
Out[109]:
Afghanistan NaN
Albania 267000000.0
Algeria NaN
Andorra 20825000.0
Angola NaN
Antigua & Barbuda NaN
Argentina NaN
Armenia NaN
Australia NaN
Austria NaN
Azerbaijan NaN
Bahamas NaN
Bahrain NaN
Bangladesh NaN
Barbados NaN
Belarus NaN
Belgium NaN
Belize NaN
Benin NaN
Bhutan NaN
Bolivia NaN
Bosnia-Herzegovina NaN
Botswana NaN
Brazil NaN
Brunei NaN
Bulgaria NaN
Burkina Faso NaN
Burundi NaN
Cabo Verde NaN
Cambodia NaN
...
Sudan NaN
Suriname NaN
Swaziland NaN
Sweden NaN
Switzerland NaN
Syria NaN
Tajikistan NaN
Tanzania NaN
Thailand NaN
Timor-Leste NaN
Togo NaN
Tonga NaN
Trinidad & Tobago NaN
Tunisia NaN
Turkey NaN
Turkmenistan NaN
Tuvalu NaN
USA NaN
Uganda NaN
Ukraine NaN
United Arab Emirates NaN
United Kingdom NaN
Uruguay NaN
Uzbekistan NaN
Vanuatu NaN
Venezuela NaN
Vietnam NaN
Yemen NaN
Zambia NaN
Zimbabwe NaN
dtype: float64
In [110]:
pd.concat([drinks, people], axis=1).head()
Out[110]:
beer_servings
spirit_servings
wine_servings
total_litres_of_pure_alcohol
continent
population
Afghanistan
0
0
0
0.0
Asia
NaN
Albania
89
132
54
4.9
Europe
3000000.0
Algeria
25
0
14
0.7
Africa
NaN
Andorra
245
138
312
12.4
Europe
85000.0
Angola
217
57
45
5.9
Africa
NaN
In [111]:
drinks = pd.read_csv('http://bit.ly/drinksbycountry')
drinks.head()
Out[111]:
country
beer_servings
spirit_servings
wine_servings
total_litres_of_pure_alcohol
continent
0
Afghanistan
0
0
0
0.0
Asia
1
Albania
89
132
54
4.9
Europe
2
Algeria
25
0
14
0.7
Africa
3
Andorra
245
138
312
12.4
Europe
4
Angola
217
57
45
5.9
Africa
In [112]:
drinks.beer_servings.mean()
Out[112]:
106.16062176165804
In [114]:
drinks.groupby('continent').beer_servings.mean()
Out[114]:
continent
Africa 61.471698
Asia 37.045455
Europe 193.777778
North America 145.434783
Oceania 89.687500
South America 175.083333
Name: beer_servings, dtype: float64
In [116]:
drinks[drinks.continent=='Africa'].beer_servings.mean()
Out[116]:
61.471698113207545
In [117]:
drinks.groupby('continent').beer_servings.agg(['count', 'min', 'max', 'mean'])
Out[117]:
count
min
max
mean
continent
Africa
53
0
376
61.471698
Asia
44
0
247
37.045455
Europe
45
0
361
193.777778
North America
23
1
285
145.434783
Oceania
16
0
306
89.687500
South America
12
93
333
175.083333
In [118]:
%matplotlib inline
In [121]:
drinks.groupby('continent').mean().plot(kind='bar')
Out[121]:
<matplotlib.axes._subplots.AxesSubplot at 0x10de95190>
In [126]:
ufo = pd.read_csv('http://bit.ly/uforeports')
ufo.head()
Out[126]:
City
Colors Reported
Shape Reported
State
Time
0
Ithaca
NaN
TRIANGLE
NY
6/1/1930 22:00
1
Willingboro
NaN
OTHER
NJ
6/30/1930 20:00
2
Holyoke
NaN
OVAL
CO
2/15/1931 14:00
3
Abilene
NaN
DISK
KS
6/1/1931 13:00
4
New York Worlds Fair
NaN
LIGHT
NY
4/18/1933 19:00
In [127]:
ufo.drop('City', axis=1, inplace=True)
In [128]:
ufo.head()
Out[128]:
Colors Reported
Shape Reported
State
Time
0
NaN
TRIANGLE
NY
6/1/1930 22:00
1
NaN
OTHER
NJ
6/30/1930 20:00
2
NaN
OVAL
CO
2/15/1931 14:00
3
NaN
DISK
KS
6/1/1931 13:00
4
NaN
LIGHT
NY
4/18/1933 19:00
In [130]:
ufo.dropna(how='any').shape
Out[130]:
(2490, 4)
In [131]:
ufo.shape
Out[131]:
(18241, 4)
Content source: reetawwsum/Jupyter-Notebooks
Similar notebooks: