For the last few weeks we have been using low-level methods to read data into Python and manipulate it. This week we will explore pandas, a library that speeds up this process considerably.
Pandas is built around the idea that arrays can be indexed flexibly, and that we can organize our data access around index labels rather than only integer positions.
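To make this concrete, here is a minimal sketch with an invented Series: elements can be looked up by label or by position, and arithmetic between two Series lines up on labels rather than on positions. The labels and values are made up for illustration.

```python
import pandas as pd

# A Series is a one-dimensional array whose elements carry index labels.
# The labels and values are invented for illustration.
floors = pd.Series([1, 3, 2], index=["garage", "office", "lab"])

# Label-based access looks elements up by their index label.
print(floors["office"])    # 3
print(floors.loc["lab"])   # 2

# Position-based access is still available through .iloc.
print(floors.iloc[0])      # 1

# Arithmetic between two Series aligns on labels, not positions;
# "office" has no partner in `extra`, so its result is NaN.
extra = pd.Series([10, 20], index=["lab", "garage"])
combined = floors + extra
print(combined)
```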
We will start out, as we often do, by applying our boilerplate setup.
In [2]:
%matplotlib inline
In [3]:
import pandas as pd
import matplotlib.pyplot as plt
Pandas provides a family of read_* functions, including read_csv, which we will use here.
One important note about read_csv in particular is that it accepts over 50 possible arguments. These give very flexible control over how data is read in and parsed, down to details such as file encoding. The flexibility is meant to eliminate the need to pre-process data files before importing them, but it can also make the import call complicated when you only need to adjust a few columns. We will only use some of its simpler options here.
Below, we read the building inventory file into an object called df (short for DataFrame).
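As a sketch of a few of those arguments, the example below reads a small CSV from an in-memory string (io.StringIO stands in for a file on disk) using usecols, dtype, and na_values. The rows are invented; only the column names follow the building inventory file.

```python
import io

import pandas as pd

# Invented CSV text; only the column names mirror the inventory file.
csv_text = """Agency Name,Zip code,Year Acquired,Square Footage
Agency A,01234,1975,144
Agency B,61501,0,2000
Agency C,62704,2004,500
"""

sample = pd.read_csv(
    io.StringIO(csv_text),                                 # any file-like object works
    usecols=["Agency Name", "Zip code", "Year Acquired"],  # read only these columns
    dtype={"Zip code": str},                               # keep leading zeros in ZIP codes
    na_values={"Year Acquired": 0},                        # treat 0 as missing, in this column only
)
print(sample)
print(sample.dtypes)
```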
In [5]:
df = pd.read_csv("data-readonly/IL_Building_Inventory.csv")
One of the first things we can do is examine the columns that pandas identified in the file.
In [6]:
df.columns
Out[6]:
Index(['Agency Name', 'Location Name', 'Address', 'City', 'Zip code', 'County',
'Congress Dist', 'Congressional Full Name', 'Rep Dist', 'Rep Full Name',
'Senate Dist', 'Senator Full Name', 'Bldg Status', 'Year Acquired',
'Year Constructed', 'Square Footage', 'Total Floors',
'Floors Above Grade', 'Floors Below Grade', 'Usage Description',
'Usage Description 2', 'Usage Description 3'],
dtype='object')
In [9]:
df.head()
Out[9]:
 | Agency Name | Location Name | Address | City | Zip code | County | Congress Dist | Congressional Full Name | Rep Dist | Rep Full Name | ... | Bldg Status | Year Acquired | Year Constructed | Square Footage | Total Floors | Floors Above Grade | Floors Below Grade | Usage Description | Usage Description 2 | Usage Description 3
0 | Department of Natural Resources | Anderson Lake Conservation Area - Fulton County | Anderson Lake C.a. | Astoria | 61501 | Fulton | 17 | Cheri Bustos | 93 | Hammond Norine K. | ... | In Use | 1975 | 1975 | 144 | 1 | 1 | 0 | Unusual | Unusual | Not provided
1 | Department of Natural Resources | Anderson Lake Conservation Area - Fulton County | Anderson Lake C.a. | Astoria | 61501 | Fulton | 17 | Cheri Bustos | 93 | Hammond Norine K. | ... | In Use | 2004 | 2004 | 144 | 1 | 1 | 0 | Unusual | Unusual | Not provided
2 | Department of Natural Resources | Anderson Lake Conservation Area - Fulton County | Anderson Lake C.a. | Astoria | 61501 | Fulton | 17 | Cheri Bustos | 93 | Hammond Norine K. | ... | In Use | 2004 | 2004 | 144 | 1 | 1 | 0 | Unusual | Unusual | Not provided
3 | Department of Natural Resources | Anderson Lake Conservation Area - Fulton County | Anderson Lake C.a. | Astoria | 61501 | Fulton | 17 | Cheri Bustos | 93 | Hammond Norine K. | ... | In Use | 2004 | 2004 | 144 | 1 | 1 | 0 | Unusual | Unusual | Not provided
4 | Department of Natural Resources | Anderson Lake Conservation Area - Fulton County | Anderson Lake C.a. | Astoria | 61501 | Fulton | 17 | Cheri Bustos | 93 | Hammond Norine K. | ... | In Use | 2004 | 2004 | 144 | 1 | 1 | 0 | Unusual | Unusual | Not provided
5 rows × 22 columns
In [11]:
df.tail()
Out[11]:
 | Agency Name | Location Name | Address | City | Zip code | County | Congress Dist | Congressional Full Name | Rep Dist | Rep Full Name | ... | Bldg Status | Year Acquired | Year Constructed | Square Footage | Total Floors | Floors Above Grade | Floors Below Grade | Usage Description | Usage Description 2 | Usage Description 3
8857 | Department of Transportation | Belvidere Maintenance Storage Facility - Boone... | 9797 Illinois Rte. 76 | Belvidere | 61008 | Boone | 16 | Adam Kinzinger | 69 | Sosnowski Joe | ... | In Use | 0 | 0 | 432 | 1 | 0 | 0 | Storage | NaN | NaN
8858 | Department of Transportation | Belvidere Maintenance Storage Facility - Boone... | 9797 Illinois Rte 76 | Belvidere | 61008 | Boone | 16 | Adam Kinzinger | 69 | Sosnowski Joe | ... | In Use | 0 | 0 | 330 | 1 | 0 | 0 | Storage | NaN | NaN
8859 | Department of Transportation | Quincy Maintenance Storage Facility | 800 Koch's Lane | Quincy | 62305 | Adams | 18 | Darin M. LaHood | 94 | Frese Randy E. | ... | In Use | 0 | 1987 | 130 | 1 | 0 | 0 | Storage | High Hazard | NaN
8860 | Illinois Community College Board | Illinois Valley Community College - Oglesby | 815 North Orlando Smith Avenue | Oglesby | 61348 | LaSalle | 16 | Adam Kinzinger | 76 | Long Jerry Lee | ... | In Use | 1971 | 1971 | 49552 | 1 | 1 | 0 | Education | Education | Not provided
8861 | Department of Military Affairs | Peoria Army Aviation Support Facility | 2323 S. Airport Rd | Peoria | 61607 | Peoria | 17 | Cheri Bustos | 92 | Gordon-Booth Jehan | ... | In Progress | 0 | 2017 | 288 | 1 | 0 | 0 | Utiility & Miscellan | Utiility & Miscellan | NaN
5 rows × 22 columns
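head() and tail() return the first and last five rows by default; both take an optional row count and behave like positional slices. A quick check on an invented frame:

```python
import pandas as pd

# Six invented rows with a default RangeIndex (0..5).
frame = pd.DataFrame({"Square Footage": [144, 2000, 432, 330, 130, 288]})

first_two = frame.head(2)   # same rows as frame.iloc[:2]
last_two = frame.tail(2)    # same rows as frame.iloc[-2:]
print(first_two)
print(last_two)
```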
In [23]:
df.describe()
Out[23]:
 | Zip code | Congress Dist | Rep Dist | Senate Dist | Year Acquired | Year Constructed | Square Footage | Total Floors | Floors Above Grade | Floors Below Grade
count | 8862.000000 | 8862.000000 | 8862.000000 | 8862.000000 | 8862.000000 | 8862.000000 | 8.862000e+03 | 8862.000000 | 8862.000000 | 8862.000000
mean | 61821.076845 | 13.404085 | 92.303318 | 46.408599 | 1972.593320 | 1906.135184 | 1.147603e+04 | 1.636087 | 1.449334 | 0.161589
std | 1095.203357 | 4.037936 | 23.568457 | 11.781038 | 27.491941 | 351.180642 | 3.817263e+04 | 1.537603 | 1.286898 | 0.392717
min | 1235.000000 | 0.000000 | 0.000000 | 0.000000 | 1753.000000 | 0.000000 | 0.000000e+00 | 0.000000 | 0.000000 | 0.000000
25% | 61105.000000 | 12.000000 | 79.000000 | 40.000000 | 1960.000000 | 1953.000000 | 2.330000e+02 | 1.000000 | 1.000000 | 0.000000
50% | 62023.000000 | 14.000000 | 97.000000 | 49.000000 | 1976.000000 | 1974.000000 | 1.600000e+03 | 1.000000 | 1.000000 | 0.000000
75% | 62650.000000 | 16.000000 | 110.000000 | 55.000000 | 1993.000000 | 1991.000000 | 6.426500e+03 | 2.000000 | 1.000000 | 0.000000
max | 68297.000000 | 18.000000 | 119.000000 | 60.000000 | 2019.000000 | 2019.000000 | 1.200000e+06 | 31.000000 | 30.000000 | 4.000000
In [24]:
df.dtypes
Out[24]:
Agency Name object
Location Name object
Address object
City object
Zip code int64
County object
Congress Dist int64
Congressional Full Name object
Rep Dist int64
Rep Full Name object
Senate Dist int64
Senator Full Name object
Bldg Status object
Year Acquired int64
Year Constructed int64
Square Footage int64
Total Floors int64
Floors Above Grade int64
Floors Below Grade int64
Usage Description object
Usage Description 2 object
Usage Description 3 object
dtype: object
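The dtypes listing shows that Zip code was parsed as int64, which is why describe() reported a mean and standard deviation for ZIP codes, numbers that carry no real meaning. With default arguments, describe() summarises only numeric columns; the sketch below, on invented rows, checks that these are the same columns select_dtypes(include="number") returns. Reading the column with dtype={"Zip code": str} would keep it out of such summaries.

```python
import pandas as pd

# Invented rows; column names follow the inventory file.
frame = pd.DataFrame({
    "City": ["Astoria", "Quincy", "Peoria"],
    "Zip code": [61501, 62305, 61607],
    "Square Footage": [144, 130, 288],
})
print(frame.dtypes)

# describe() with default arguments summarises only the numeric columns,
# the same set that select_dtypes(include="number") picks out.
numeric_cols = list(frame.select_dtypes(include="number").columns)
described_cols = list(frame.describe().columns)
print(numeric_cols)    # ['Zip code', 'Square Footage']
print(described_cols)  # ['Zip code', 'Square Footage']
```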
In [28]:
df.groupby(["Agency Name"])["Square Footage"].sum()
Out[28]:
Agency Name
Appellate Court / Fifth District 15124
Appellate Court / Fourth District 16400
Appellate Court / Second District 43330
Appellate Court / Third District 18700
Chicago State University 1219492
Department of Agriculture 2608398
Department of Central Management Services 4260911
Department of Corrections 15120750
Department of Human Services 8466774
Department of Juvenile Justice 1147982
Department of Military Affairs 4579470
Department of Natural Resources 3937319
Department of Public Health 7160
Department of Revenue 913236
Department of State Police 828851
Department of Transportation 5659737
Department of Veterans' Affairs 1483981
Eastern Illinois University 1164674
Governor's Office 45120
Governors State University 1055971
Historic Preservation Agency 1667954
IL State Board of Education 19147
Illinois Board of Higher Education 545816
Illinois Community College Board 486473
Illinois Courts 54540
Illinois Emergency Management Agency 55650
Illinois Medical District Commission 46200
Illinois State University 2960272
Northeastern Illinois University 1110103
Northern Illinois University 3751095
Office of the Attorney General 60500
Office of the Secretary of State 2273828
Southern Illinois University 8709473
University of Illinois 25018006
Western Illinois University 2348109
Name: Square Footage, dtype: int64
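groupby splits the rows by Agency Name, selects the Square Footage column, and sum() collapses each group to a single total; the result is ordered alphabetically by agency. To rank agencies by total footprint instead, we can chain sort_values. A sketch with invented agencies and values:

```python
import pandas as pd

# Invented rows; only the column names come from the inventory file.
buildings = pd.DataFrame({
    "Agency Name": ["Agency A", "Agency B", "Agency A", "Agency B", "Agency C"],
    "Square Footage": [5000, 300, 7000, 450, 900],
})

totals = (
    buildings.groupby("Agency Name")["Square Footage"]
    .sum()                           # one total per agency, ordered by label
    .sort_values(ascending=False)    # then reorder, largest footprint first
)
print(totals)
```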
In [29]:
df["Agency Name"].value_counts()
Out[29]:
Department of Natural Resources 3223
Department of Corrections 1428
Department of Transportation 1137
Department of Human Services 617
University of Illinois 525
Southern Illinois University 420
Historic Preservation Agency 284
Department of Military Affairs 231
Department of Agriculture 228
Department of Juvenile Justice 120
Department of State Police 109
Illinois State University 102
Department of Veterans' Affairs 94
Northern Illinois University 79
Department of Central Management Services 60
Western Illinois University 42
Office of the Secretary of State 41
Eastern Illinois University 35
Northeastern Illinois University 18
Chicago State University 16
Illinois Community College Board 15
Governors State University 11
Illinois Board of Higher Education 10
Illinois Medical District Commission 3
Illinois Emergency Management Agency 2
Appellate Court / Third District 2
Department of Public Health 2
Appellate Court / Fifth District 1
Department of Revenue 1
Illinois Courts 1
Appellate Court / Second District 1
Appellate Court / Fourth District 1
Governor's Office 1
Office of the Attorney General 1
IL State Board of Education 1
Name: Agency Name, dtype: int64
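value_counts counts how many rows carry each distinct value and orders the result from most to least frequent, whereas groupby(...).sum() above adds up a second column within each group. The sketch below, on invented labels, shows that grouping a column by itself and calling size() gives the same counts (ordered by label instead), and that normalize=True turns counts into fractions.

```python
import pandas as pd

# Invented agency labels.
agencies = pd.Series(["DNR", "DOT", "DNR", "DOC", "DNR", "DOT"], name="Agency Name")

counts = agencies.value_counts()                # most frequent first
shares = agencies.value_counts(normalize=True)  # fractions that sum to 1
sizes = agencies.groupby(agencies).size()       # same counts, ordered by label

print(counts)
print(shares)
print(sizes)
```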
In [31]:
df["Total Floors"].median()
Out[31]:
1.0
In [32]:
df.median(numeric_only=True)
Out[32]:
Zip code 62023.0
Congress Dist 14.0
Rep Dist 97.0
Senate Dist 49.0
Year Acquired 1976.0
Year Constructed 1974.0
Square Footage 1600.0
Total Floors 1.0
Floors Above Grade 1.0
Floors Below Grade 0.0
dtype: float64
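Calling median() on a DataFrame that also holds text columns relies on pandas skipping the non-numeric columns. As far as we know, recent pandas releases (2.0 onward) no longer do this by default and raise a TypeError instead, so passing numeric_only=True makes the intent explicit and works on older versions too. A sketch on invented rows:

```python
import pandas as pd

# Invented rows mixing a text column with numeric ones.
frame = pd.DataFrame({
    "City": ["Astoria", "Quincy", "Peoria", "Anna"],
    "Total Floors": [1, 2, 1, 4],
    "Square Footage": [144, 130, 288, 612],
})

# Restrict the reduction to numeric columns explicitly.
medians = frame.median(numeric_only=True)
print(medians)   # Total Floors 1.5, Square Footage 216.0
```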
In [35]:
df.quantile([0.1, 0.2, 0.9], numeric_only=True)
Out[35]:
 | Zip code | Congress Dist | Rep Dist | Senate Dist | Year Acquired | Year Constructed | Square Footage | Total Floors | Floors Above Grade | Floors Below Grade
0.1 | 60450.0 | 10.0 | 64.0 | 32.0 | 1935.0 | 1929.0 | 80.0 | 1.0 | 1.0 | 0.0
0.2 | 61001.0 | 12.0 | 75.0 | 38.0 | 1953.0 | 1947.0 | 150.0 | 1.0 | 1.0 | 0.0
0.9 | 62901.0 | 18.0 | 116.0 | 58.0 | 2001.0 | 2001.0 | 25568.1 | 3.0 | 3.0 | 1.0
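The 0.9 quantile of Square Footage above is 25568.1, which is not a whole number of square feet. That is because quantile, by default, interpolates linearly between the two data values on either side of the requested position. A minimal sketch on five invented values shows the effect, and how interpolation="nearest" returns an actual observation instead:

```python
import pandas as pd

# Five invented square-footage values, already sorted.
sqft = pd.Series([80, 150, 233, 1600, 6426])

# Default interpolation="linear": for quantile p the position (n - 1) * p
# is computed and the values on either side of it are blended.
linear = sqft.quantile([0.1, 0.25, 0.5])
print(linear)    # approximately 108.0, 150.0, 233.0

# interpolation="nearest" picks the closest data value instead.
nearest = sqft.quantile(0.1, interpolation="nearest")
print(nearest)   # the data value 80 rather than an interpolated 108
```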
In [38]:
df["Agency Name"].apply(lambda a: a.upper()).head()
Out[38]:
0 DEPARTMENT OF NATURAL RESOURCES
1 DEPARTMENT OF NATURAL RESOURCES
2 DEPARTMENT OF NATURAL RESOURCES
3 DEPARTMENT OF NATURAL RESOURCES
4 DEPARTMENT OF NATURAL RESOURCES
Name: Agency Name, dtype: object
In [39]:
df["Agency Name"].apply(lambda a: a).head()
Out[39]:
0 Department of Natural Resources
1 Department of Natural Resources
2 Department of Natural Resources
3 Department of Natural Resources
4 Department of Natural Resources
Name: Agency Name, dtype: object
In [46]:
"This is my string".lower()
Out[46]:
'this is my string'
In [45]:
"this is my string. here is another.".capitalize()
Out[45]:
'This is my string. here is another.'
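apply runs an ordinary Python function, such as the lambda calling upper() in In [38], once per element. Pandas also exposes the built-in string methods directly through the .str accessor, which copes with missing values: an apply calling a.upper() would raise AttributeError on a missing entry, while .str.upper() simply leaves it missing. This matters for columns such as Usage Description 2 and 3, which contain NaN. A sketch with invented values:

```python
import pandas as pd

# Invented values, including one missing entry.
names = pd.Series(["Department of Natural Resources", "Governor's Office", None])

# names.apply(lambda a: a.upper())  # would raise AttributeError on the missing entry

# The .str accessor applies Python's string methods element-wise and
# passes missing values through unchanged.
upper = names.str.upper()
lower = names.str.lower()
capitalized = names.str.capitalize()   # first character upper, rest lower
print(upper)
print(lower)
print(capitalized)
```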
In [63]:
df = pd.read_csv("data-readonly/IL_Building_Inventory.csv", na_values={'Year Acquired': 0, 'Year Constructed': 0})
In [64]:
df.count()
Out[64]:
Agency Name 8862
Location Name 8862
Address 8811
City 8862
Zip code 8862
County 8837
Congress Dist 8862
Congressional Full Name 8699
Rep Dist 8862
Rep Full Name 8839
Senate Dist 8862
Senator Full Name 8839
Bldg Status 8862
Year Acquired 8597
Year Constructed 8573
Square Footage 8862
Total Floors 8862
Floors Above Grade 8862
Floors Below Grade 8862
Usage Description 8862
Usage Description 2 8832
Usage Description 3 8774
dtype: int64
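count() reports the number of non-missing values per column, so the lower counts for Year Acquired and Year Constructed include the zeros that na_values turned into NaN. Because na_values is given as a dictionary, the substitution applies only to the named columns: a 0 in Square Footage is kept. A sketch on a few invented rows, comparing a plain read with the dictionary version:

```python
import io

import pandas as pd

# Invented rows; column names follow the inventory file.
csv_text = """Year Acquired,Year Constructed,Square Footage
1975,1975,144
0,0,432
0,1987,130
2004,2004,0
"""

raw = pd.read_csv(io.StringIO(csv_text))
cleaned = pd.read_csv(
    io.StringIO(csv_text),
    na_values={"Year Acquired": 0, "Year Constructed": 0},  # per-column missing markers
)
print(raw.count())       # 4 in every column
print(cleaned.count())   # zeros in the two year columns are now missing
```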
In [65]:
df.iloc[10]
Out[65]:
Agency Name Department of Natural Resources
Location Name Matthiessen State Park - LaSalle County
Address R. R. 178, Box 509
City Utica
Zip code 61373
County LaSalle
Congress Dist 16
Congressional Full Name Adam Kinzinger
Rep Dist 76
Rep Full Name Long Jerry Lee
Senate Dist 38
Senator Full Name Sue Rezin
Bldg Status In Use
Year Acquired 2000
Year Constructed 2000
Square Footage 144
Total Floors 1
Floors Above Grade 1
Floors Below Grade 0
Usage Description Unusual
Usage Description 2 Unusual
Usage Description 3 Not provided
Name: 10, dtype: object
In [68]:
df.loc[10, ["County", "Senate Dist"]]
Out[68]:
County LaSalle
Senate Dist 38
Name: 10, dtype: object
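With the default RangeIndex, iloc[10] and loc[10] happen to select the same row, because each row's label equals its position. The two diverge once the labels no longer match positions, as this sketch with deliberately shuffled integer labels (and invented values) shows:

```python
import pandas as pd

# Invented rows with out-of-order integer labels.
frame = pd.DataFrame(
    {"County": ["Fulton", "LaSalle", "Boone"], "Square Footage": [144, 2000, 432]},
    index=[10, 0, 5],
)

print(frame.loc[0])     # row whose *label* is 0 (LaSalle)
print(frame.iloc[0])    # *first* row by position (Fulton)
print(frame.loc[10, ["County", "Square Footage"]])  # a label plus a list of columns
```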
In [75]:
year = df.groupby("Year Acquired")
In [88]:
df.index = df["Year Acquired"]
In [89]:
df.head()
Out[89]:
Year Acquired | Agency Name | Location Name | Address | City | Zip code | County | Congress Dist | Congressional Full Name | Rep Dist | Rep Full Name | ... | Bldg Status | Year Acquired | Year Constructed | Square Footage | Total Floors | Floors Above Grade | Floors Below Grade | Usage Description | Usage Description 2 | Usage Description 3
1975.0 | Department of Natural Resources | Anderson Lake Conservation Area - Fulton County | Anderson Lake C.a. | Astoria | 61501 | Fulton | 17 | Cheri Bustos | 93 | Hammond Norine K. | ... | In Use | 1975.0 | 1975.0 | 144 | 1 | 1 | 0 | Unusual | Unusual | Not provided
2004.0 | Department of Natural Resources | Anderson Lake Conservation Area - Fulton County | Anderson Lake C.a. | Astoria | 61501 | Fulton | 17 | Cheri Bustos | 93 | Hammond Norine K. | ... | In Use | 2004.0 | 2004.0 | 144 | 1 | 1 | 0 | Unusual | Unusual | Not provided
2004.0 | Department of Natural Resources | Anderson Lake Conservation Area - Fulton County | Anderson Lake C.a. | Astoria | 61501 | Fulton | 17 | Cheri Bustos | 93 | Hammond Norine K. | ... | In Use | 2004.0 | 2004.0 | 144 | 1 | 1 | 0 | Unusual | Unusual | Not provided
2004.0 | Department of Natural Resources | Anderson Lake Conservation Area - Fulton County | Anderson Lake C.a. | Astoria | 61501 | Fulton | 17 | Cheri Bustos | 93 | Hammond Norine K. | ... | In Use | 2004.0 | 2004.0 | 144 | 1 | 1 | 0 | Unusual | Unusual | Not provided
2004.0 | Department of Natural Resources | Anderson Lake Conservation Area - Fulton County | Anderson Lake C.a. | Astoria | 61501 | Fulton | 17 | Cheri Bustos | 93 | Hammond Norine K. | ... | In Use | 2004.0 | 2004.0 | 144 | 1 | 1 | 0 | Unusual | Unusual | Not provided
5 rows × 22 columns
In [91]:
df.loc[1970].head()
Out[91]:
Year Acquired | Agency Name | Location Name | Address | City | Zip code | County | Congress Dist | Congressional Full Name | Rep Dist | Rep Full Name | ... | Bldg Status | Year Acquired | Year Constructed | Square Footage | Total Floors | Floors Above Grade | Floors Below Grade | Usage Description | Usage Description 2 | Usage Description 3
1970.0 | Governors State University | Governors State University - Will County | Governor's Hwy & Univ Pkwy | University Park | 60466 | Will | 3 | Daniel William Lipinski | 85 | Connor John | ... | In Use | 1970.0 | 1970.0 | 10000 | 2 | 2 | 0 | Storage | Storage | Not provided
1970.0 | Department of Natural Resources | Chain O'Lakes CA and SP - McHenry County | 39947 North State Park Road | Spring Grove | 60081 | McHenry | 14 | Randy Hultgren | 64 | Wheeler Barbara | ... | In Use | 1970.0 | 1970.0 | 1440 | 1 | 1 | 0 | Assembly | Assembly | Not provided
1970.0 | Office of the Secretary of State | Capitol Complex | 1st And Capitol | Springfield | 62704 | Sangamon | 13 | Rodney L. Davis | 96 | Scherer Sue | ... | In Use | 1970.0 | 1970.0 | 500 | 2 | 1 | 1 | Industrial | Industrial | Not provided
1970.0 | Department of Transportation | Dixon Springs Maintenance Storage Facility - P... | Rt. 145 1 Mi. S Of Rt. 146 | Dixon Springs | 62943 | Pope | 15 | John Shimkus | 118 | Phelps Brandon W. | ... | In Use | 1970.0 | 1970.0 | 240 | 1 | 1 | 0 | Storage | Storage | Not provided
1970.0 | Department of Transportation | Anna Maintenance Storage Facility - Union County | 215 North Lime Kiln Road | Anna | 62906 | Union | 12 | Mike Bost | 118 | Phelps Brandon W. | ... | In Use | 1970.0 | 1970.0 | 612 | 1 | 1 | 0 | Storage | Storage | Not provided
5 rows × 22 columns
In [95]:
df.loc[1974]
Out[95]:
Year Acquired | Agency Name | Location Name | Address | City | Zip code | County | Congress Dist | Congressional Full Name | Rep Dist | Rep Full Name | ... | Bldg Status | Year Acquired | Year Constructed | Square Footage | Total Floors | Floors Above Grade | Floors Below Grade | Usage Description | Usage Description 2 | Usage Description 3
1974.0 | Department of Human Services | Howe Developmental Center - Tinley Park | 7600 West 183rd Street | Tinley Park | 60477 | Cook | 0 | NaN | 38 | Riley Al | ... | In Use | 1974.0 | 1974.0 | 112 | 1 | 1 | 0 | Storage | Storage | Not provided
1974.0 | Department of Natural Resources | Union County Conservation Area | R. R. 2 | Jonesboro | 62952 | Union | 12 | Mike Bost | 115 | Bryant Terri | ... | In Use | 1974.0 | 1974.0 | 120 | 1 | 1 | 0 | Storage | Storage | Not provided
1974.0 | Department of Central Management Services | Statewide Program | 4200 North Oak Park Ave | Chicago | 60634 | Statewide | 0 | NaN | 119 | District Multiple | ... | Abandon | 1974.0 | 1974.0 | 2000 | 1 | 1 | 0 | Unusual | Unusual | Unusual
1974.0 | Department of Natural Resources | Sand Ridge Forest - Mason County | 25799 E County Road 2300 N. | Forest City | 61532 | Mason | 18 | Darin M. LaHood | 93 | Hammond Norine K. | ... | In Use | 1974.0 | 1974.0 | 1800 | 1 | 1 | 0 | Storage | Storage | Not provided
1974.0 | Department of Natural Resources | Mississippi State Fish & Wildlife Area | R. R. Box 182 | Grafton | 62037 | Jersey | 17 | Cheri Bustos | 97 | Batinick Mark | ... | In Use | 1974.0 | 1974.0 | 27 | 1 | 1 | 0 | Unusual | Unusual | Not provided
1974.0 | Department of Natural Resources | Mississippi State Fish & Wildlife Area | R. R. Box 182 | Grafton | 62037 | Jersey | 17 | Cheri Bustos | 97 | Batinick Mark | ... | In Use | 1974.0 | 1974.0 | 27 | 1 | 1 | 0 | Unusual | Unusual | Not provided
1974.0 | Department of Natural Resources | Woodford County Conservation Area | R. R. #1 | Lowpoint | 61545 | Woodford | 18 | Darin M. LaHood | 73 | Spain Ryan | ... | In Use | 1974.0 | 1974.0 | 560 | 1 | 1 | 0 | Assembly | Assembly | Not provided
1974.0 | Department of Natural Resources | Woodford County Conservation Area | R. R. #1 | Lowpoint | 61545 | Woodford | 18 | Darin M. LaHood | 73 | Spain Ryan | ... | In Use | 1974.0 | 1974.0 | 160 | 1 | 1 | 0 | Unusual | Unusual | Not provided
1974.0 | Department of Natural Resources | Woodford County Conservation Area | R. R. #1 | Lowpoint | 61545 | Woodford | 18 | Darin M. LaHood | 73 | Spain Ryan | ... | In Use | 1974.0 | 1974.0 | 160 | 1 | 1 | 0 | Unusual | Unusual | Not provided
1974.0 | Department of Natural Resources | Kaskaskia River Fish & Wildlife Area - Randolp... | Rt. 1, Box 49 | Baldwin | 62217 | Randolph | 12 | Mike Bost | 116 | Costello, II Jerry | ... | In Use | 1974.0 | 1974.0 | 1470 | 1 | 1 | 0 | Storage | Storage | Not provided
1974.0 | Department of Natural Resources | Apple River Canyon State Park - Jo Daviess County | 8763 E. Canyon Rd. | Apple River | 61001 | Jo Daviess | 17 | Cheri Bustos | 89 | Stewart Brian W. | ... | In Use | 1974.0 | 1974.0 | 380 | 1 | 1 | 0 | Unusual | Unusual | Not provided
1974.0 | Department of Natural Resources | Ferne Clyffe State Park - Johnson County | Rte 38 So | Goreville | 62939 | Johnson | 15 | John Shimkus | 118 | Phelps Brandon W. | ... | In Use | 1974.0 | 1974.0 | 18 | 1 | 1 | 0 | Unusual | Unusual | Not provided
1974.0 | Department of Natural Resources | Ferne Clyffe State Park - Johnson County | Rte 37 So | Goreville | 62939 | Johnson | 15 | John Shimkus | 118 | Phelps Brandon W. | ... | In Use | 1974.0 | 1974.0 | 18 | 1 | 1 | 0 | Unusual | Unusual | Not provided
1974.0 | Department of Natural Resources | Illinois Beach State Park - Lake County | Il Beach State Park, Ranger | Zion | 60099 | Lake | 10 | Robert Dold | 61 | Jesiel Sheri | ... | In Use | 1974.0 | 1974.0 | 2157 | 2 | 1 | 1 | Business | Business | Not provided
1974.0 | Department of Natural Resources | Kickapoo State Park - Vermilion County | Rr #1, Box 374 | Oakwood | 61858 | Vermilion | 15 | John Shimkus | 104 | Hays Chad | ... | In Use | 1974.0 | 1974.0 | 20 | 1 | 1 | 0 | Unusual | Unusual | Not provided
1974.0 | Department of Natural Resources | Kickapoo State Park - Vermilion County | Rr #1, Box 374 | Oakwood | 61858 | Vermilion | 15 | John Shimkus | 104 | Hays Chad | ... | In Use | 1974.0 | 1974.0 | 20 | 1 | 1 | 0 | Unusual | Unusual | Not provided
1974.0 | Department of Natural Resources | Kickapoo State Park - Vermilion County | Rr #1, Box 374 | Oakwood | 61858 | Vermilion | 15 | John Shimkus | 104 | Hays Chad | ... | In Use | 1974.0 | 1974.0 | 20 | 1 | 1 | 0 | Unusual | Unusual | Not provided
1974.0 | Department of Natural Resources | Kickapoo State Park - Vermilion County | Rr #1, Box 374 | Oakwood | 61858 | Vermilion | 15 | John Shimkus | 104 | Hays Chad | ... | In Use | 1974.0 | 1974.0 | 20 | 1 | 1 | 0 | Unusual | Unusual | Not provided
1974.0 | Department of Natural Resources | Kickapoo State Park - Vermilion County | Rr #1, Box 374 | Oakwood | 61858 | Vermilion | 15 | John Shimkus | 104 | Hays Chad | ... | In Use | 1974.0 | 1974.0 | 20 | 1 | 1 | 0 | Unusual | Unusual | Not provided
1974.0 | Department of Natural Resources | Kickapoo State Park - Vermilion County | Rr #1, Box 374 | Oakwood | 61858 | Vermilion | 15 | John Shimkus | 104 | Hays Chad | ... | In Use | 1974.0 | 1974.0 | 20 | 1 | 1 | 0 | Unusual | Unusual | Not provided
1974.0 | Department of Natural Resources | Kickapoo State Park - Vermilion County | Rr #1, Box 374 | Oakwood | 61858 | Vermilion | 15 | John Shimkus | 104 | Hays Chad | ... | In Use | 1974.0 | 1974.0 | 20 | 1 | 1 | 0 | Unusual | Unusual | Not provided
1974.0 | Department of Natural Resources | Kickapoo State Park - Vermilion County | Rr #1, Box 374 | Oakwood | 61858 | Vermilion | 15 | John Shimkus | 104 | Hays Chad | ... | In Use | 1974.0 | 1974.0 | 20 | 1 | 1 | 0 | Unusual | Unusual | Not provided
1974.0 | Department of Natural Resources | Kickapoo State Park - Vermilion County | Rr #1, Box 374 | Oakwood | 61858 | Vermilion | 15 | John Shimkus | 104 | Hays Chad | ... | In Use | 1974.0 | 1974.0 | 20 | 1 | 1 | 0 | Unusual | Unusual | Not provided
1974.0 | Department of Natural Resources | Kickapoo State Park - Vermilion County | Rr #1, Box 374 | Oakwood | 61858 | Vermilion | 15 | John Shimkus | 104 | Hays Chad | ... | In Use | 1974.0 | 1974.0 | 20 | 1 | 1 | 0 | Unusual | Unusual | Not provided
1974.0 | Department of Natural Resources | Kickapoo State Park - Vermilion County | Rr #1, Box 374 | Oakwood | 61858 | Vermilion | 15 | John Shimkus | 104 | Hays Chad | ... | In Use | 1974.0 | 1974.0 | 20 | 1 | 1 | 0 | Unusual | Unusual | Not provided
1974.0 | Department of Natural Resources | Kickapoo State Park - Vermilion County | Rr #1, Box 374 | Oakwood | 61858 | Vermilion | 15 | John Shimkus | 104 | Hays Chad | ... | In Use | 1974.0 | 1974.0 | 20 | 1 | 1 | 0 | Unusual | Unusual | Not provided
1974.0 | Department of Natural Resources | Kickapoo State Park - Vermilion County | Rr #1, Box 374 | Oakwood | 61858 | Vermilion | 15 | John Shimkus | 104 | Hays Chad | ... | In Use | 1974.0 | 1974.0 | 20 | 1 | 1 | 0 | Unusual | Unusual | Not provided
1974.0 | Department of Natural Resources | Kickapoo State Park - Vermilion County | Rr #1, Box 374 | Oakwood | 61858 | Vermilion | 15 | John Shimkus | 104 | Hays Chad | ... | In Use | 1974.0 | 1974.0 | 20 | 1 | 1 | 0 | Unusual | Unusual | Not provided
1974.0 | Department of Natural Resources | Lake Le-Aqua-Na State Park - Stephenson County | 8542 North Lake Road | Lena | 61048 | Stephenson | 17 | Cheri Bustos | 89 | Stewart Brian W. | ... | In Use | 1974.0 | 1974.0 | 560 | 1 | 1 | 0 | Assembly | Assembly | Not provided
1974.0 | Department of Natural Resources | Johnson-Sauk Trail State Park - Henry County | 27500 N. 1200 Avenue | Kewanee | 61443 | Henry | 17 | Cheri Bustos | 74 | Swanson Daniel | ... | In Use | 1974.0 | 1974.0 | 50 | 1 | 1 | 0 | Unusual | Unusual | Not provided
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
1974.0 | Department of Central Management Services | Statewide Program | 4200 North Oak Park Ave | Chicago | 60634 | Statewide | 0 | NaN | 119 | District Multiple | ... | Abandon | 1974.0 | 1974.0 | 14725 | 2 | 2 | 0 | Health Care | Health Care | Assembly
1974.0 | Department of Central Management Services | Statewide Program | 4200 North Oak Park Ave | Chicago | 60634 | Statewide | 0 | NaN | 119 | District Multiple | ... | Abandon | 1974.0 | 1974.0 | 14725 | 2 | 2 | 0 | Health Care | Health Care | Assembly
1974.0 | Department of Central Management Services | Statewide Program | 4200 North Oak Park Ave | Chicago | 60634 | Statewide | 0 | NaN | 119 | District Multiple | ... | In Use | 1974.0 | 1974.0 | 54600 | 4 | 3 | 1 | Education | Education | Assembly
1974.0 | Department of Central Management Services | Statewide Program | 4200 North Oak Park Ave | Chicago | 60634 | Statewide | 0 | NaN | 119 | District Multiple | ... | In Use | 1974.0 | 1974.0 | 28000 | 4 | 3 | 1 | Assembly | Assembly | Education
1974.0 | Department of Central Management Services | Statewide Program | 4200 North Oak Park Ave | Chicago | 60634 | Statewide | 0 | NaN | 119 | District Multiple | ... | In Use | 1974.0 | 1974.0 | 15000 | 2 | 1 | 1 | Industrial | Industrial | Not provided
1974.0 | Department of Corrections | Stateville Correctional Center - Joliet | Rt 53 & Division St | Joliet | 60434 | Will | 11 | Bill Foster | 86 | Walsh, Jr. Lawrence M. | ... | Abandon | 1974.0 | 1974.0 | 1050 | 1 | 1 | 0 | Detention & Correc | Education | Not provided
1974.0 | Department of Corrections | Menard Correctional Center - Randolph County | Route 3 & Rainbow Drive | Menard | 62259 | Randolph | 12 | Mike Bost | 116 | Costello, II Jerry | ... | In Use | 1974.0 | 1974.0 | 288 | 2 | 2 | 0 | Detention & Correc | Detention & Correc | Not provided
1974.0 | Department of Corrections | Menard Correctional Center - Randolph County | Route 3 & Rainbow Drive | Menard | 62259 | Randolph | 12 | Mike Bost | 116 | Costello, II Jerry | ... | In Use | 1974.0 | 1974.0 | 25 | 1 | 1 | 0 | Detention & Correc | Detention & Correc | Not provided
1974.0 | Department of Corrections | Vienna Correctional Center - Johnson County | P.o. Box 200, Hwy 146e | Vienna | 62995 | Johnson | 15 | John Shimkus | 118 | Phelps Brandon W. | ... | In Use | 1974.0 | 1974.0 | 50 | 1 | 1 | 0 | Business | Business | Not provided
1974.0 | Department of Juvenile Justice | Illinois Youth Center - Warrenville | 30w200 Ferry Road | Warrenville | 60555 | DuPage | 6 | Peter J. Roskam | 41 | Wehrli Grant | ... | In Use | 1974.0 | 1974.0 | 1000 | 1 | 1 | 0 | Storage | Storage | Not provided
1974.0 | Department of Juvenile Justice | Illinois Youth Center - Warrenville | 30w200 Ferry Road | Warrenville | 60555 | DuPage | 6 | Peter J. Roskam | 41 | Wehrli Grant | ... | In Use | 1974.0 | 1974.0 | 4295 | 1 | 1 | 0 | Detention & Correc | Detention & Correc | Residential
1974.0 | Department of Juvenile Justice | Illinois Youth Center - Warrenville | 30w200 Ferry Road | Warrenville | 60555 | DuPage | 6 | Peter J. Roskam | 41 | Wehrli Grant | ... | In Use | 1974.0 | 1974.0 | 4295 | 1 | 1 | 0 | Detention & Correc | Detention & Correc | Residential
1974.0 | Department of Juvenile Justice | Illinois Youth Center - Warrenville | 30w200 Ferry Road | Warrenville | 60555 | DuPage | 6 | Peter J. Roskam | 41 | Wehrli Grant | ... | In Use | 1974.0 | 1974.0 | 4295 | 1 | 1 | 0 | Detention & Correc | Detention & Correc | Residential
1974.0 | Department of Juvenile Justice | Illinois Youth Center - St. Charles | 38 West 060 Rte 38 | St Charles | 60174 | Kane | 14 | Randy Hultgren | 50 | Wheeler Keith R. | ... | In Use | 1974.0 | 1974.0 | 288 | 1 | 1 | 0 | Industrial | Industrial | Not provided
1974.0 | Department of Corrections | Menard Correctional Center - Randolph County | Route 3 & Rainbow Drive | Menard | 62259 | Randolph | 12 | Mike Bost | 116 | Costello, II Jerry | ... | In Use | 1974.0 | 1974.0 | 144 | 3 | 2 | 1 | Detention & Correc | Detention & Correc | Not provided
1974.0 | Department of Corrections | Menard Correctional Center - Randolph County | Route 3 & Rainbow Drive | Menard | 62259 | Randolph | 12 | Mike Bost | 116 | Costello, II Jerry | ... | In Use | 1974.0 | 1936.0 | 116 | 3 | 2 | 1 | Detention & Correc | Detention & Correc | Not provided
1974.0 | Department of Transportation | Buckley Maintenance Storage Facility - Iroquoi... | I 57 | Buckley | 60918 | Iroquois | 16 | Adam Kinzinger | 106 | Bennett Thomas M. | ... | In Use | 1974.0 | 1973.0 | 3200 | 1 | 1 | 0 | Unusual | Unusual | Not provided
1974.0 | Department of Transportation | Buckley Maintenance Storage Facility - Iroquoi... | I 57 | Buckley | 60918 | Iroquois | 16 | Adam Kinzinger | 106 | Bennett Thomas M. | ... | In Use | 1974.0 | 1973.0 | 3200 | 1 | 1 | 0 | Unusual | Unusual | Not provided
1974.0 | Department of Transportation | Wyoming Maintenance Storage Facility - Stark C... | South Seventh Street | Wyoming | 61491 | Marshall | 18 | Darin M. LaHood | 73 | Spain Ryan | ... | In Use | 1974.0 | 1974.0 | 4224 | 1 | 1 | 0 | Storage | Storage | Not provided
1974.0 | IL State Board of Education | The Philip J. Rock Center and School - Glen Ellyn | Rte 38 & 53 | Glen Ellyn | 60137 | DuPage | 6 | Peter J. Roskam | 48 | Breen Peter | ... | In Use | 1974.0 | 1974.0 | 19147 | 3 | 2 | 1 | Education | Education | Residential
1974.0 | Department of State Police | Sterling District 1 - Whiteside County | 3107 East Lincolnway | Sterling | 61081 | Whiteside | 17 | Cheri Bustos | 71 | McCombie Tony | ... | In Use | 1974.0 | 1974.0 | 192 | 1 | 1 | 0 | Storage | Storage | Not provided
1974.0 | Department of State Police | Effingham District 12 - Effingham County | 401 Industrial Ave | Effingham | 62401 | Effingham | 15 | John Shimkus | 107 | Cavaletto John | ... | In Use | 1974.0 | 1974.0 | 280 | 1 | 1 | 0 | Unusual | Unusual | Not provided
1974.0 | Office of the Secretary of State | Motor Vehicle Services Facility - Springfield | 2701 Dirksen Parkway | Springfield | 62703 | Sangamon | 13 | Rodney L. Davis | 96 | Scherer Sue | ... | In Use | 1974.0 | 1974.0 | 131400 | 2 | 2 | 0 | Business | Business | Not provided
1974.0 | University of Illinois | University of Illinois Urbana-Champaign | 50 East Gerty Drive | Champaign | 61820 | Champaign | 13 | Rodney L. Davis | 103 | Ammons Carol | ... | In Use | 1974.0 | 1974.0 | 32017 | 3 | 2 | 1 | Business | Business | Not provided
1974.0 | University of Illinois | University of Illinois Urbana-Champaign | 11 Airport Road | Savoy | 61874 | Champaign | 13 | Rodney L. Davis | 103 | Ammons Carol | ... | In Use | 1974.0 | 1974.0 | 2750 | 1 | 1 | 0 | Industrial | Storage | Not provided
1974.0 | Southern Illinois University | Southern Illinois University - Carbondale | 1000 Faner Drive | Carbondale | 62901 | Jackson | 12 | Mike Bost | 115 | Bryant Terri | ... | In Use | 1974.0 | 1974.0 | 277831 | 7 | 6 | 1 | Education | Unusual | Not provided
1974.0 | Northern Illinois University | Northern Illinois University - DeKalb | Northern Illinois University | Dekalb | 60115 | DeKalb | 16 | Adam Kinzinger | 70 | Pritchard Robert W. | ... | In Use | 1974.0 | 1974.0 | 4698 | 1 | 1 | 0 | Storage | Storage | Not provided
1974.0 | University of Illinois | University of Illinois - Springfield | County Road 1 South | Mechanicsburg | 62794 | Sangamon | 13 | Rodney L. Davis | 99 | Wojcicki Jimene Sara | ... | In Use | 1974.0 | 1974.0 | 800 | 1 | 1 | 0 | Industrial | Industrial | Not provided
1974.0 | University of Illinois | University of Illinois - Springfield | 1301 West Lake Drive | Springfield | 62794 | Sangamon | 13 | Rodney L. Davis | 99 | Wojcicki Jimene Sara | ... | In Use | 1974.0 | 1974.0 | 5594 | 4 | 3 | 1 | Residential | Residential | Not provided
1974.0 | University of Illinois | University of Illinois - Springfield | 1301 West Lake Drive | Springfield | 62794 | Sangamon | 13 | Rodney L. Davis | 99 | Wojcicki Jimene Sara | ... | In Use | 1974.0 | 1974.0 | 971 | 1 | 1 | 0 | Storage | Storage | Not provided
192 rows × 22 columns
In [96]:
df.loc[0]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py in _has_valid_type(self, key, axis)
1433 if not ax.contains(key):
-> 1434 error()
1435 except TypeError as e:
/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py in error()
1428 raise KeyError("the label [%s] is not in the [%s]" %
-> 1429 (key, self.obj._get_axis_name(axis)))
1430
KeyError: 'the label [0] is not in the [index]'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-96-549f4325178e> in <module>()
----> 1 df.loc[0]
KeyError: 'the label [0] is not in the [index]'
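The `KeyError` comes from `.loc` being label-based: it looks for a row *labelled* 0, not for the first row. One common way to lose label 0 is boolean filtering, which keeps each surviving row's original label. The toy frame below is hypothetical and only illustrates the mechanism; `.iloc` (position-based) and `reset_index(drop=True)` (relabel rows 0..n-1) are two ways around it.

```python
import pandas as pd

# Hypothetical toy frame; pandas labels its rows 0..3 by default.
df = pd.DataFrame({
    "Year Acquired": [1975, 1974, 2004, 1974],
    "City": ["Astoria", "Carbondale", "Dekalb", "Springfield"],
})

# Boolean filtering keeps the original labels (1 and 3), so label 0 is gone.
subset = df[df["Year Acquired"] == 1974]
print(subset.index.tolist())

try:
    subset.loc[0]                  # label-based: there is no row labelled 0
except KeyError as err:
    print("KeyError:", err)

first_row = subset.iloc[0]         # position-based: first row, whatever its label
renumbered = subset.reset_index(drop=True)   # relabel the rows 0..n-1
print(renumbered.loc[0, "City"])
```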
In [100]:
df = pd.read_csv("data-readonly/IL_Building_Inventory.csv",
na_values={'Year Acquired': 0, 'Year Constructed': 0})
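Passing a dict to `na_values` restricts each missing-value marker to its own column, so a 0 in `Year Acquired` or `Year Constructed` becomes `NaN` while a legitimate 0 elsewhere (such as `Floors Below Grade`) is kept. Because an integer column cannot hold `NaN`, the year columns come back as floats, which is why the years print as `1975.0` further down. A sketch on a hypothetical three-row CSV:

```python
import io
import pandas as pd

# Hypothetical three-row CSV; we assume 0 means "year unknown" in the year columns.
csv_text = (
    "Agency Name,Year Acquired,Year Constructed,Floors Below Grade\n"
    "Parks,1975,1960,0\n"
    "Police,0,1980,1\n"
    "Schools,2004,0,0\n"
)

# The dict applies the sentinel 0 only to the listed columns.
df = pd.read_csv(io.StringIO(csv_text),
                 na_values={"Year Acquired": 0, "Year Constructed": 0})
print(df)
print(df.dtypes)
```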
In [101]:
df.index
Out[101]:
RangeIndex(start=0, stop=8862, step=1)
In [104]:
df2 = df.set_index("Year Acquired")
In [105]:
df2.index
Out[105]:
Float64Index([1975.0, 2004.0, 2004.0, 2004.0, 2004.0, 2004.0, 2000.0, 2000.0,
2000.0, 2000.0,
...
2017.0, 2019.0, 2019.0, nan, nan, nan, nan, nan,
1971.0, nan],
dtype='float64', name='Year Acquired', length=8862)
In [106]:
df2.loc[1975].head()
Out[106]:
Agency Name
Location Name
Address
City
Zip code
County
Congress Dist
Congressional Full Name
Rep Dist
Rep Full Name
...
Senator Full Name
Bldg Status
Year Constructed
Square Footage
Total Floors
Floors Above Grade
Floors Below Grade
Usage Description
Usage Description 2
Usage Description 3
Year Acquired
1975.0
Department of Natural Resources
Anderson Lake Conservation Area - Fulton County
Anderson Lake C.a.
Astoria
61501
Fulton
17
Cheri Bustos
93
Hammond Norine K.
...
Jil Tracy
In Use
1975.0
144
1
1
0
Unusual
Unusual
Not provided
1975.0
Department of State Police
Effingham District 12 - Effingham County
2 Miles East Of Mason
Effingham
62454
Effingham
15
John Shimkus
107
Cavaletto John
...
Kyle McCarter
In Use
1975.0
120
1
1
0
Industrial
Industrial
Not provided
1975.0
Department of Natural Resources
Pyramid State Park - Perry County
Rr #1, Box298
Pinckneyville
62274
Perry
12
Mike Bost
116
Costello, II Jerry
...
Paul Schimpf
In Use
1975.0
2400
1
1
0
Business
Storage
Not provided
1975.0
Department of Natural Resources
Wolf Creek State Park
R.r. 1 Box 99
Windsor
62534
Shelby
15
John Shimkus
102
Halbrook Brad
...
Chapin Rose
In Use
1975.0
1860
1
1
0
Unusual
Unusual
Not provided
1975.0
Department of Natural Resources
Wolf Creek State Park
R.r. 1 Box 99
Windsor
62534
Shelby
15
John Shimkus
102
Halbrook Brad
...
Chapin Rose
In Use
1975.0
20
1
1
0
Unusual
Unusual
Not provided
5 rows × 21 columns
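After `set_index`, the former column becomes the row labels, and those labels need not be unique: `df2.loc[1975]` returns every building acquired in 1975. A minimal sketch with hypothetical toy data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: several buildings share an acquisition year,
# and a missing year is stored as NaN (which makes the index float).
df = pd.DataFrame({
    "Year Acquired": [1975.0, 2004.0, 1975.0, np.nan],
    "Square Footage": [144, 800, 2400, 50],
})

df2 = df.set_index("Year Acquired")    # the column becomes the row labels
rows_1975 = df2.loc[1975]              # non-unique label -> all matching rows
print(rows_1975)
print(df2.loc[1975, "Square Footage"].sum())
```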
In [109]:
df2.iloc[[1974, 1975]]
Out[109]:
Agency Name
Location Name
Address
City
Zip code
County
Congress Dist
Congressional Full Name
Rep Dist
Rep Full Name
...
Senator Full Name
Bldg Status
Year Constructed
Square Footage
Total Floors
Floors Above Grade
Floors Below Grade
Usage Description
Usage Description 2
Usage Description 3
Year Acquired
1963.0
Department of Natural Resources
MARION CO FISH-R-KINMUNDY
6401 Mecham Road
Kinmundy
62854
Marion
0
NaN
107
Cavaletto John
...
Kyle McCarter
In Use
1963.0
2000
2
2
0
Residential
Residential
Not provided
1965.0
Department of Natural Resources
MARION CO FISH-R-KINMUNDY
Sam Parr Biological Station
Kinmundy
62854
Marion
0
NaN
107
Cavaletto John
...
Kyle McCarter
In Use
1965.0
2560
1
1
0
Industrial
Industrial
Not provided
2 rows × 21 columns
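`.iloc` ignores the labels entirely, so `df2.iloc[[1974, 1975]]` returns the rows at *positions* 1974 and 1975, which is why their acquisition years are 1963 and 1965. Selecting by year needs `.loc`. The toy frame below (hypothetical values) contrasts the two:

```python
import pandas as pd

# Hypothetical toy frame indexed by acquisition year; labels differ from positions.
df2 = pd.DataFrame(
    {"Square Footage": [2000, 2560, 144, 120]},
    index=pd.Index([1963.0, 1965.0, 1974.0, 1975.0], name="Year Acquired"),
)

by_position = df2.iloc[[0, 1]]           # first two rows, whatever their labels
by_label = df2.loc[[1974.0, 1975.0]]     # rows whose label is 1974 or 1975

try:
    df2.iloc[[1974, 1975]]               # only 4 rows, so these positions do not exist
except IndexError as err:
    print("IndexError:", err)

print(by_position)
print(by_label)
```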
In [110]:
keith = df.set_index("City")
In [112]:
keith.loc["Kinmundy"].describe()
Out[112]:
           Zip code  Congress Dist    Rep Dist  Senate Dist  Year Acquired
count     55.000000      55.000000   55.000000    55.000000      53.000000
mean   62682.454545      12.490909  107.800000    54.381818    1981.245283
std      186.461874       5.590576    2.444949     1.224607      16.208860
min    62263.000000       0.000000  107.000000    54.000000    1950.000000
25%    62584.000000      15.000000  107.000000    54.000000    1974.000000
50%    62584.000000      15.000000  107.000000    54.000000    1974.000000
75%    62854.000000      15.000000  107.000000    54.000000    1996.000000
max    62854.000000      15.000000  117.000000    59.000000    2009.000000

       Year Constructed  Square Footage  Total Floors  Floors Above Grade  Floors Below Grade
count         53.000000       55.000000     55.000000           55.000000                55.0
mean        1981.264151      670.600000      1.036364            1.036364                 0.0
std           16.219825      998.700927      0.188919            0.188919                 0.0
min         1950.000000       16.000000      1.000000            1.000000                 0.0
25%         1974.000000       24.000000      1.000000            1.000000                 0.0
50%         1974.000000      196.000000      1.000000            1.000000                 0.0
75%         1996.000000      660.000000      1.000000            1.000000                 0.0
max         2009.000000     4200.000000      2.000000            2.000000                 0.0
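`keith.loc["Kinmundy"]` selects the same rows as a boolean mask on the `City` column; only the resulting index differs. The year columns report a `count` of 53 rather than 55 because `describe` skips missing values, presumably including the zeros turned into `NaN` by `na_values` when the file was re-read. A toy comparison on hypothetical data; note that if a label matched exactly one row, `.loc` would return a Series instead of a DataFrame.

```python
import pandas as pd

# Hypothetical toy inventory.
df = pd.DataFrame({
    "City": ["Kinmundy", "Springfield", "Kinmundy"],
    "Square Footage": [2000, 800, 2560],
})

via_index = df.set_index("City").loc["Kinmundy"]   # label lookup on a City index
via_mask = df[df["City"] == "Kinmundy"]             # boolean mask on the City column

# Same rows either way; describe() then summarises each numeric column.
print(via_index.describe())
print(via_mask.describe())
```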
In [118]:
names = ["date", "city", "state", "country", "shape", "duration_seconds", "duration_reported", "description", "report_date", "latitude", "longitude"]
In [121]:
ufo = pd.read_csv("data-readonly/ufo-scrubbed-geocoded-time-standardized.csv",
names = names, parse_dates = ["date", "report_date"])
In [122]:
ufo.dtypes
Out[122]:
date datetime64[ns]
city object
state object
country object
shape object
duration_seconds float64
duration_reported object
description object
report_date datetime64[ns]
latitude float64
longitude float64
dtype: object
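The UFO file apparently has no header row, so `names` supplies the column labels and `parse_dates` converts the listed columns to `datetime64`, as the `dtypes` output confirms. A sketch on a hypothetical two-line sample with fewer columns; ISO timestamps are assumed here, since the real file's format is not shown.

```python
import io
import pandas as pd

# Hypothetical headerless sample (columns trimmed for brevity).
raw = io.StringIO(
    "1949-10-10 20:30:00,san marcos,2700,2004-04-27\n"
    "1956-10-10 21:00:00,edna,20,2004-01-17\n"
)
names = ["date", "city", "duration_seconds", "report_date"]

# names= labels the columns; parse_dates= converts the listed ones to datetime64.
ufo = pd.read_csv(raw, names=names, parse_dates=["date", "report_date"])
print(ufo.dtypes)
```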
In [123]:
ufo.describe()
Out[123]:
       duration_seconds      latitude     longitude
count      8.033200e+04  80332.000000  80332.000000
mean       9.016889e+03     38.124416    -86.772885
std        6.202168e+05     10.469585     39.697205
min        1.000000e-03    -82.862752   -176.658056
25%        3.000000e+01     34.134722   -112.073333
50%        1.800000e+02     39.411111    -87.903611
75%        6.000000e+02     42.788333    -78.755000
max        9.783600e+07     72.700000    178.441900
In [129]:
sum_seconds = ufo.groupby("state")["duration_seconds"].sum()
In [131]:
sum_seconds.sort_values() / (365*24*3600)
Out[131]:
state
yk 0.000324
pe 0.000326
yt 0.000507
nf 0.000666
pr 0.000922
nt 0.001039
sa 0.001223
pq 0.001751
nb 0.002321
dc 0.003645
sk 0.004072
nd 0.004837
de 0.005042
qc 0.005191
ns 0.005800
vt 0.010490
wy 0.011972
ne 0.013659
sd 0.016213
ri 0.016662
id 0.017015
ab 0.018725
mb 0.019460
ia 0.021290
md 0.024086
bc 0.027567
al 0.030532
ks 0.031364
mt 0.034749
nh 0.036332
...
or 0.062261
il 0.073586
nc 0.075969
nv 0.083969
wi 0.086566
wv 0.096154
ut 0.111653
ms 0.115596
oh 0.121203
ky 0.121884
nm 0.134020
in 0.136016
co 0.144334
hi 0.215672
la 0.217719
nj 0.254109
mi 0.255138
tx 0.276491
ny 0.297516
pa 0.326689
ga 0.340368
ok 0.362630
ct 0.401774
va 0.435646
az 0.507069
ca 1.202704
fl 1.798539
wa 1.833226
ar 2.130185
on 2.684267
Name: duration_seconds, Length: 67, dtype: float64
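The cell above sums the seconds per state and divides by `365*24*3600`, the number of seconds in a 365-day year, to express each total in years. The same steps on hypothetical toy rows:

```python
import pandas as pd

SECONDS_PER_YEAR = 365 * 24 * 3600   # ignores leap years, as in the cell above

# Hypothetical toy sightings.
ufo = pd.DataFrame({
    "state": ["il", "il", "on"],
    "duration_seconds": [600.0, 1200.0, 31_536_000.0],
})

sum_seconds = ufo.groupby("state")["duration_seconds"].sum()   # one total per state
years = sum_seconds.sort_values() / SECONDS_PER_YEAR           # seconds -> years
print(years)
```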
In [132]:
states = ufo.groupby("state")
In [133]:
states["duration_seconds"].mean()
Out[133]:
state
ab 1773.351351
ak 4231.830508
al 1393.408828
ar 100867.138889
az 5946.797731
bc 1103.225660
ca 3928.374984
co 3024.394751
ct 13089.214928
dc 1161.224545
de 868.904372
fl 13504.459262
ga 7968.701633
hi 19267.543909
ia 949.626591
id 968.556498
il 877.362219
in 3094.797763
ks 1514.714395
ky 4205.386761
la 11481.575251
ma 1312.187776
mb 3959.309677
md 833.780790
me 3066.431122
mi 3885.088653
mn 1411.120564
mo 1144.388198
ms 8784.190361
mt 2148.723529
...
nm 5185.816675
ns 1279.048951
nt 1638.100000
nv 2926.016519
ny 2914.711572
oh 1576.187773
ok 14929.384204
on 53441.317677
or 1064.210450
pa 3990.113091
pe 605.117647
pq 613.494444
pr 881.515152
qc 919.660112
ri 1811.881034
sa 1285.266667
sc 1102.498885
sd 2608.640306
sk 1310.357143
tn 1610.461526
tx 2371.343283
ut 4739.033647
va 9862.555276
vt 1077.587948
wa 13545.595628
wi 2047.964966
wv 6239.355967
wy 1841.629273
yk 1459.714286
yt 1228.769231
Name: duration_seconds, Length: 67, dtype: float64
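A very large mean, such as the one for `ar`, may be driven by a handful of unusually long reports. Computing several statistics from the same `groupby` object at once makes that easier to judge. A sketch with hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy sightings: one very long report inflates "ar".
ufo = pd.DataFrame({
    "state": ["ar", "ar", "ar", "il", "il"],
    "duration_seconds": [60.0, 120.0, 86_400.0, 300.0, 600.0],
})

states = ufo.groupby("state")
# One row per state, one column per statistic.
summary = states["duration_seconds"].agg(["count", "sum", "mean", "median"])
print(summary)
```

Here the median for `ar` stays small while the mean is pulled up by the single long report.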
In [153]:
ufo.loc[ ufo["duration_seconds"] > 900 , ["state", "duration_seconds", "shape"] ].groupby("state")["duration_seconds"].sum()
Out[153]:
state
ab 538980.0
ak 1424520.0
al 833760.0
ar 67072060.0
az 15454118.0
bc 752845.0
ca 36134927.0
co 4307084.0
ct 12486902.0
dc 98874.0
de 124918.0
fl 55930974.0
ga 10503360.0
hi 6738330.0
ia 536289.0
id 446312.0
il 1788347.0
in 4053510.0
ks 880575.0
ky 3690050.0
la 6759962.0
ma 1549755.0
mb 590280.0
md 608694.0
me 1806480.0
mi 7662699.0
mn 1350930.0
mo 1528211.0
ms 3568533.0
mt 1007850.0
...
nm 4100389.0
ns 149136.0
nt 29400.0
nv 2480905.0
ny 8797885.0
oh 3359160.0
ok 11299680.0
on 84384178.0
or 1635428.0
pa 9832828.0
pe 6300.0
pq 43560.0
pr 20700.0
qc 138600.0
ri 471600.0
sa 36600.0
sc 983385.0
sd 479220.0
sk 116700.0
tn 1717685.0
tx 8110517.0
ut 3395018.0
va 13515652.0
vt 269160.0
wa 57106208.0
wi 2470803.0
wv 2940834.0
wy 346560.0
yk 9300.0
yt 12840.0
Name: duration_seconds, Length: 67, dtype: float64
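The cell above first filters rows with a boolean mask (durations over 900 seconds) and then groups what remains; the `shape` column it selects is not needed for this sum. The same pattern on hypothetical toy rows:

```python
import pandas as pd

# Hypothetical toy sightings.
ufo = pd.DataFrame({
    "state": ["il", "il", "oh", "oh"],
    "duration_seconds": [60.0, 1800.0, 1000.0, 30.0],
    "shape": ["light", "disk", "light", "orb"],
})

long_only = ufo["duration_seconds"] > 900            # one True/False per row
long_totals = (ufo.loc[long_only, ["state", "duration_seconds"]]
                  .groupby("state")["duration_seconds"]
                  .sum())
print(long_totals)
```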
In [156]:
ufo["date"].min()
Out[156]:
Timestamp('1906-11-11 00:00:00')
In [157]:
ufo["date"].max()
Out[157]:
Timestamp('2014-05-08 18:45:00')
In [161]:
first_sighting = ufo.groupby("state")["date"].min()
last_sighting = ufo.groupby("state")["date"].max()
last_sighting - first_sighting
Out[161]:
state
ab 22476 days 23:46:00
ak 28325 days 23:00:00
al 27369 days 01:00:00
ar 23385 days 13:44:00
az 24686 days 01:45:00
bc 25576 days 06:45:00
ca 28023 days 23:30:00
co 30391 days 20:30:00
ct 23103 days 23:10:00
dc 22138 days 13:30:00
de 21801 days 08:00:00
fl 25498 days 07:30:00
ga 26270 days 04:50:00
hi 19564 days 23:30:00
ia 27321 days 20:20:00
id 24400 days 11:31:00
il 32268 days 05:30:00
in 34286 days 20:00:00
ks 30278 days 10:30:00
ky 25769 days 09:15:00
la 25832 days 22:15:00
ma 22293 days 08:00:00
mb 22967 days 03:10:00
md 24403 days 02:30:00
me 24749 days 01:00:00
mi 26637 days 11:00:00
mn 24433 days 07:30:00
mo 38111 days 21:21:00
ms 23670 days 04:30:00
mt 22211 days 10:32:00
...
nm 25602 days 12:30:00
ns 15929 days 03:40:00
nt 10798 days 10:30:00
nv 24399 days 12:02:00
ny 30654 days 01:04:00
oh 24443 days 17:30:00
ok 24358 days 02:25:00
on 23063 days 11:02:00
or 30982 days 18:15:00
pa 24433 days 02:12:00
pe 27010 days 05:30:00
pq 9630 days 05:41:00
pr 18291 days 04:46:00
qc 21792 days 17:15:00
ri 25498 days 14:00:00
sa 16107 days 12:00:00
sc 27339 days 03:00:00
sd 20975 days 03:00:00
sk 14439 days 10:40:00
tn 26229 days 23:00:00
tx 37957 days 09:00:00
ut 25508 days 12:00:00
va 25133 days 21:10:00
vt 20454 days 00:20:00
wa 24782 days 00:00:00
wi 24415 days 07:15:00
wv 23743 days 07:45:00
wy 22934 days 06:00:00
yk 6349 days 20:50:00
yt 7553 days 08:56:00
Name: date, Length: 67, dtype: timedelta64[ns]
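Subtracting two datetime Series gives `timedelta64` values. `.dt.days` extracts whole days, which can then be scaled to approximate years; a state with a single sighting would show a span of zero. A sketch with hypothetical dates:

```python
import pandas as pd

# Hypothetical toy sightings; "yk" has a single report.
ufo = pd.DataFrame({
    "state": ["il", "il", "yk"],
    "date": pd.to_datetime(["1906-11-11", "2014-05-08", "2000-01-01"]),
})

dates = ufo.groupby("state")["date"]
span = dates.max() - dates.min()        # Timedelta per state
span_years = span.dt.days / 365.25      # whole days -> approximate years
print(span_years.round(1))
```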
In [165]:
first_sighting.index
Out[165]:
Index(['ab', 'ak', 'al', 'ar', 'az', 'bc', 'ca', 'co', 'ct', 'dc', 'de', 'fl',
'ga', 'hi', 'ia', 'id', 'il', 'in', 'ks', 'ky', 'la', 'ma', 'mb', 'md',
'me', 'mi', 'mn', 'mo', 'ms', 'mt', 'nb', 'nc', 'nd', 'ne', 'nf', 'nh',
'nj', 'nm', 'ns', 'nt', 'nv', 'ny', 'oh', 'ok', 'on', 'or', 'pa', 'pe',
'pq', 'pr', 'qc', 'ri', 'sa', 'sc', 'sd', 'sk', 'tn', 'tx', 'ut', 'va',
'vt', 'wa', 'wi', 'wv', 'wy', 'yk', 'yt'],
dtype='object', name='state')
In [167]:
ufo["state"].nunique()
Out[167]:
67
In [169]:
ufo["country"].unique()
Out[169]:
array(['us', nan, 'gb', 'ca', 'au', 'de'], dtype=object)
In [173]:
ufo["country"] = ufo["country"].astype("category")
ufo["shape"] = ufo["shape"].astype("category")
ufo["state"] = ufo["state"].astype("category")
In [174]:
ufo["city"].nunique()
Out[174]:
19900
In [175]:
ufo.shape
Out[175]:
(80332, 11)
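The `category` dtype stores each distinct value once and keeps a small integer code per row, so it pays off for columns like `country`, `shape` and `state`, which have few distinct values among 80,332 rows; `city`, with 19,900 distinct values, would likely gain less. A rough check on a hypothetical column:

```python
import pandas as pd

# Hypothetical column with few distinct values repeated many times.
shapes = pd.Series(["light", "circle", "triangle", "light"] * 10_000)
as_category = shapes.astype("category")

# deep=True counts the string payloads, not just the references to them.
print(shapes.memory_usage(deep=True))
print(as_category.memory_usage(deep=True))
print(as_category.cat.categories.tolist())
```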
In [179]:
ufo.groupby("city").count().nlargest(10, "date")
Out[179]:
             date  state  country  shape  duration_seconds  duration_reported  description  report_date  latitude  longitude
city
seattle       525    525      524    473               525                525          524          525       525        525
phoenix       454    454      454    438               454                454          454          454       454        454
portland      374    374      373    355               374                374          374          374       374        374
las vegas     368    368      367    357               368                368          368          368       368        368
los angeles   353    353      352    348               353                353          353          353       353        353
san diego     338    338      338    328               338                338          338          338       338        338
houston       297    297      297    293               297                297          296          297       297        297
chicago       265    265      264    257               265                265          265          265       265        265
tucson        241    241      241    237               241                241          241          241       241        241
miami         239    239      239    230               239                239          239          239       239        239
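`groupby(...).count()` counts non-missing values column by column, which is why, for example, Seattle has 525 dates but only 473 shapes. When only the number of reports per city matters, `value_counts()` or `groupby(...).size()` counts rows directly. A toy illustration on hypothetical rows:

```python
import pandas as pd

# Hypothetical toy sightings; None marks a missing shape.
ufo = pd.DataFrame({
    "city": ["seattle", "phoenix", "seattle", "chicago", "seattle", "phoenix"],
    "shape": ["light", "disk", None, "orb", "light", None],
})

shape_counts = ufo.groupby("city")["shape"].count()   # non-missing shapes only
rows_per_city = ufo.groupby("city").size()            # every row, missing or not
top = ufo["city"].value_counts().head(2)              # rows per city, largest first
print(shape_counts, rows_per_city, top, sep="\n\n")
```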
In [182]:
ufo.dtypes
Out[182]:
date datetime64[ns]
city object
state category
country category
shape category
duration_seconds float64
duration_reported object
description object
report_date datetime64[ns]
latitude float64
longitude float64
dtype: object
In [184]:
shape_times = ufo.groupby("shape")["duration_seconds"].sum()
In [186]:
shape_times.index
Out[186]:
CategoricalIndex(['changed', 'changing', 'chevron', 'cigar', 'circle', 'cone',
'crescent', 'cross', 'cylinder', 'delta', 'diamond', 'disk',
'dome', 'egg', 'fireball', 'flare', 'flash', 'formation',
'hexagon', 'light', 'other', 'oval', 'pyramid', 'rectangle',
'round', 'sphere', 'teardrop', 'triangle', 'unknown'],
categories=['changed', 'changing', 'chevron', 'cigar', 'circle', 'cone', 'crescent', 'cross', ...], ordered=False, name='shape', dtype='category')
In [185]:
shape_times.plot()
Out[185]:
[Figure: line plot of shape_times]
In [189]:
shape_times.sort_values().plot()
Out[189]:
[Figure: line plot of shape_times.sort_values()]
In [201]:
shape_times.nlargest(5)
Out[201]:
shape
light 2.181668e+08
sphere 1.173682e+08
other 1.165627e+08
circle 3.627088e+07
unknown 3.097290e+07
Name: duration_seconds, dtype: float64
In [203]:
shape_state = ufo.groupby(["state", "shape"])
In [205]:
times = shape_state["duration_seconds"].sum()
In [210]:
times.loc[ ["il", "mi", "oh"], ["sphere", "unknown"] ]
Out[210]:
state shape
il sphere 90337.00
unknown 130649.50
mi sphere 126384.30
unknown 232625.00
oh sphere 132796.50
unknown 702877.05
Name: duration_seconds, dtype: float64
In [212]:
times.loc["il":"ok", "sphere":"unknown"]
Out[212]:
state shape
il sphere 90337.00
teardrop 11508.00
triangle 319862.01
unknown 130649.50
in sphere 64863.00
teardrop 2528.00
triangle 144674.00
unknown 357716.00
ks sphere 19858.00
teardrop 6457.00
triangle 39154.00
unknown 211927.00
ky sphere 32884.00
teardrop 3420.00
triangle 84589.50
unknown 70038.00
la sphere 69414.00
teardrop 8852.00
triangle 34620.00
unknown 6356899.00
ma sphere 81274.00
teardrop 3847.00
triangle 82519.00
unknown 195458.00
mb sphere 8690.00
triangle 46193.00
unknown 23527.00
md sphere 47169.00
teardrop 1777.00
triangle 86253.00
...
nj sphere 92185.00
teardrop 3345.00
triangle 83989.00
unknown 719678.00
nm sphere 74126.00
teardrop 29955.50
triangle 82508.39
unknown 135486.20
ns sphere 16569.00
teardrop 1416.00
triangle 4020.00
unknown 7160.00
nt sphere 900.00
unknown 320.00
nv sphere 1282555.00
teardrop 8164.00
triangle 36728.50
unknown 60179.00
ny sphere 178021.00
teardrop 14063.00
triangle 212210.30
unknown 285847.00
oh sphere 132796.50
teardrop 26946.00
triangle 577885.00
unknown 702877.05
ok sphere 38560.30
teardrop 1280.00
triangle 227335.00
unknown 152659.00
Name: duration_seconds, Length: 107, dtype: float64
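Label slices on a MultiIndex are inclusive at both ends and follow the sorted order of labels within each level, which is why `"sphere":"unknown"` also picks up `teardrop` and `triangle`. `pd.IndexSlice` is another way to spell these selections, with one entry per level. A sketch on a hypothetical toy Series:

```python
import pandas as pd

# Hypothetical (state, shape) totals.
idx = pd.MultiIndex.from_tuples(
    [("il", "sphere"), ("il", "unknown"), ("mi", "sphere"), ("oh", "unknown")],
    names=["state", "shape"],
)
times = pd.Series([90.0, 130.0, 126.0, 702.0], index=idx, name="duration_seconds")

times = times.sort_index()      # label slicing on a MultiIndex expects sorted levels
ix = pd.IndexSlice
picked = times.loc[ix[["il", "oh"], "unknown"]]   # list on level 0, one label on level 1
ranged = times.loc[ix["il":"mi", :]]              # inclusive slice on level 0, all shapes
print(picked, ranged, sep="\n\n")
```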
In [217]:
unsorted_nonsense = times.sort_index()
In [219]:
unsorted_nonsense.loc["il":"ok"]
Out[219]:
state shape
il changing 104672.00
chevron 12885.00
cigar 27140.00
circle 248601.50
cone 2404.00
cross 2135.00
cylinder 21268.00
diamond 15441.00
disk 109203.50
egg 283910.00
fireball 80942.00
flash 15532.00
formation 80328.75
light 474224.81
other 111285.00
oval 87292.00
rectangle 15598.00
sphere 90337.00
teardrop 11508.00
triangle 319862.01
unknown 130649.50
in changing 24620.00
chevron 1228.00
cigar 16922.00
circle 93794.50
cone 16203.00
cross 20.00
cylinder 6960.00
delta 14400.00
diamond 13854.00
...
oh formation 32233.00
light 834661.00
other 339451.50
oval 62381.00
rectangle 9556.00
sphere 132796.50
teardrop 26946.00
triangle 577885.00
unknown 702877.05
ok changing 46870.00
chevron 778.00
cigar 10432.00
circle 10593937.00
cone 21050.00
cross 495.00
cylinder 7795.00
diamond 4868.00
disk 76342.50
egg 3785.00
fireball 20647.50
flash 566.00
formation 8672.00
light 126701.00
other 60161.00
oval 25883.00
rectangle 5088.00
sphere 38560.30
teardrop 1280.00
triangle 227335.00
unknown 152659.00
Name: duration_seconds, Length: 554, dtype: float64
In [221]:
unsorted_nonsense.loc["il":"ok"].plot()
Out[221]:
[Figure: line plot of unsorted_nonsense.loc["il":"ok"]]
In [226]:
ufo.set_index("date", inplace=True)
In [228]:
ufo.resample("A")["duration_seconds"].sum()
Out[228]:
date
1906-12-31 1.080000e+04
1907-12-31 NaN
1908-12-31 NaN
1909-12-31 NaN
1910-12-31 2.400000e+02
1911-12-31 NaN
1912-12-31 NaN
1913-12-31 NaN
1914-12-31 NaN
1915-12-31 NaN
1916-12-31 6.000000e+01
1917-12-31 NaN
1918-12-31 NaN
1919-12-31 NaN
1920-12-31 6.000000e+01
1921-12-31 NaN
1922-12-31 NaN
1923-12-31 NaN
1924-12-31 NaN
1925-12-31 6.000000e+01
1926-12-31 NaN
1927-12-31 NaN
1928-12-31 NaN
1929-12-31 6.000000e+01
1930-12-31 1.200000e+03
1931-12-31 1.860000e+03
1932-12-31 NaN
1933-12-31 1.800000e+03
1934-12-31 5.000000e+00
1935-12-31 NaN
...
1985-12-31 2.351740e+05
1986-12-31 2.206870e+05
1987-12-31 2.898583e+06
1988-12-31 5.649914e+06
1989-12-31 5.467870e+05
1990-12-31 2.728360e+05
1991-12-31 6.706685e+07
1992-12-31 2.911250e+05
1993-12-31 7.046784e+06
1994-12-31 1.106235e+07
1995-12-31 4.250211e+06
1996-12-31 3.552288e+06
1997-12-31 3.456926e+06
1998-12-31 7.337902e+06
1999-12-31 2.796181e+06
2000-12-31 2.956642e+06
2001-12-31 1.314867e+07
2002-12-31 6.844939e+07
2003-12-31 4.989784e+06
2004-12-31 2.253751e+07
2005-12-31 5.577597e+06
2006-12-31 1.137782e+07
2007-12-31 1.101497e+07
2008-12-31 2.058855e+07
2009-12-31 1.303383e+07
2010-12-31 9.210828e+07
2011-12-31 1.258026e+07
2012-12-31 6.820438e+07
2013-12-31 3.043190e+07
2014-12-31 1.930258e+06
Freq: A-DEC, Name: duration_seconds, Length: 109, dtype: float64
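In the output above, years with no sightings show `NaN`. In more recent pandas releases the sum over an empty bin is 0 instead, and `min_count=1` asks for `NaN` whenever a bin holds no values. The sketch uses hypothetical data and the daily `"D"` alias; newer releases also spell the year-end alias `"YE"` rather than `"A"`.

```python
import pandas as pd

# Hypothetical sightings on two of three days (nothing on 2014-01-02).
ufo = pd.DataFrame(
    {"duration_seconds": [60.0, 120.0, 30.0]},
    index=pd.DatetimeIndex(
        ["2014-01-01 10:00", "2014-01-01 22:00", "2014-01-03 09:00"], name="date"),
)

daily = ufo.resample("D")["duration_seconds"].sum()                 # empty day -> 0.0
daily_nan = ufo.resample("D")["duration_seconds"].sum(min_count=1)  # empty day -> NaN
print(daily, daily_nan, sep="\n\n")
```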
In [233]:
myplot = ufo.resample("10A")["duration_seconds"].sum().plot()
myplot.set_yscale('log')
In [235]:
r = ufo.resample("10A")
In [237]:
r["duration_seconds"].sum()
Out[237]:
date
1906-12-31 1.080000e+04
1916-12-31 3.000000e+02
1926-12-31 1.200000e+02
1936-12-31 6.305000e+03
1946-12-31 8.942870e+05
1956-12-31 4.835465e+05
1966-12-31 1.371158e+07
1976-12-31 3.597365e+07
1986-12-31 1.781056e+08
1996-12-31 1.026377e+08
2006-12-31 1.426284e+08
2016-12-31 2.498924e+08
Freq: 10A-DEC, Name: duration_seconds, dtype: float64
In [240]:
ufo.resample("W")["duration_seconds"].sum()
Out[240]:
date
1906-11-11 10800.00
1906-11-18 NaN
1906-11-25 NaN
1906-12-02 NaN
1906-12-09 NaN
1906-12-16 NaN
1906-12-23 NaN
1906-12-30 NaN
1907-01-06 NaN
1907-01-13 NaN
1907-01-20 NaN
1907-01-27 NaN
1907-02-03 NaN
1907-02-10 NaN
1907-02-17 NaN
1907-02-24 NaN
1907-03-03 NaN
1907-03-10 NaN
1907-03-17 NaN
1907-03-24 NaN
1907-03-31 NaN
1907-04-07 NaN
1907-04-14 NaN
1907-04-21 NaN
1907-04-28 NaN
1907-05-05 NaN
1907-05-12 NaN
1907-05-19 NaN
1907-05-26 NaN
1907-06-02 NaN
...
2013-10-20 113983.50
2013-10-27 218682.50
2013-11-03 192999.00
2013-11-10 218170.00
2013-11-17 1376755.50
2013-11-24 432983.00
2013-12-01 149513.50
2013-12-08 130273.00
2013-12-15 77970.00
2013-12-22 92732.50
2013-12-29 121536.00
2014-01-05 2408500.00
2014-01-12 93786.00
2014-01-19 96740.00
2014-01-26 75544.00
2014-02-02 65642.00
2014-02-09 49998.50
2014-02-16 254023.00
2014-02-23 81940.50
2014-03-02 81376.00
2014-03-09 103976.00
2014-03-16 114800.00
2014-03-23 73526.53
2014-03-30 78260.00
2014-04-06 84559.00
2014-04-13 162560.00
2014-04-20 69973.00
2014-04-27 259356.00
2014-05-04 76259.00
2014-05-11 26368.00
Freq: W-SUN, Name: duration_seconds, Length: 5610, dtype: float64
In [252]:
day_of_week = ufo.index.dayofweek
In [253]:
ufo["day_of_week"] = day_of_week
In [257]:
ufo.groupby("day_of_week")["duration_seconds"].sum().plot()
Out[257]:
[Figure: line plot of summed duration_seconds by day_of_week]
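`dayofweek` numbers the days with Monday as 0 and Sunday as 6. Grouping by `day_name()` gives readable labels instead, though a name-based result sorts alphabetically rather than by weekday. A sketch with hypothetical timestamps:

```python
import pandas as pd

# Hypothetical sightings indexed by timestamp.
ufo = pd.DataFrame(
    {"duration_seconds": [10.0, 20.0, 5.0]},
    index=pd.DatetimeIndex(
        ["2012-09-09 20:00", "2013-09-09 00:15", "2013-09-09 21:00"], name="date"),
)

print(ufo.index.dayofweek.tolist())   # Monday is 0, Sunday is 6
by_day = ufo.groupby(ufo.index.day_name())["duration_seconds"].sum()
print(by_day)
```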
In [260]:
ufo.groupby("state").sum().loc["tx"]
Out[260]:
duration_seconds 8.719429e+06
latitude 1.144141e+05
longitude -3.584412e+05
day_of_week 1.137300e+04
Name: tx, dtype: float64
In [261]:
ufo.reset_index()
Out[261]:
date
city
state
country
shape
duration_seconds
duration_reported
description
report_date
latitude
longitude
day_of_week
0
1949-10-10 20:30:00
san marcos
tx
us
cylinder
2700.0
45 minutes
This event took place in early fall around 194...
2004-04-27
29.883056
-97.941111
0
1
1949-10-10 21:00:00
lackland afb
tx
NaN
light
7200.0
1-2 hrs
1949 Lackland AFB, TX. Lights racing acros...
2005-12-16
29.384210
-98.581082
0
2
1955-10-10 17:00:00
chester (uk/england)
NaN
gb
circle
20.0
20 seconds
Green/Orange circular disc over Chester, En...
2008-01-21
53.200000
-2.916667
0
3
1956-10-10 21:00:00
edna
tx
us
circle
20.0
1/2 hour
My older brother and twin sister were leaving ...
2004-01-17
28.978333
-96.645833
2
4
1960-10-10 20:00:00
kaneohe
hi
us
light
900.0
15 minutes
AS a Marine 1st Lt. flying an FJ4B fighter/att...
2004-01-22
21.418056
-157.803611
0
5
1961-10-10 19:00:00
bristol
tn
us
sphere
300.0
5 minutes
My father is now 89 my brother 52 the girl wit...
2007-04-27
36.595000
-82.188889
1
6
1965-10-10 21:00:00
penarth (uk/wales)
NaN
gb
circle
180.0
about 3 mins
penarth uk circle 3mins stayed 30ft above m...
2006-02-14
51.434722
-3.180000
6
7
1965-10-10 23:45:00
norwalk
ct
us
disk
1200.0
20 minutes
A bright orange color changing to reddish colo...
1999-10-02
41.117500
-73.408333
6
8
1966-10-10 20:00:00
pell city
al
us
disk
180.0
3 minutes
Strobe Lighted disk shape object observed clos...
2009-03-19
33.586111
-86.286111
0
9
1966-10-10 21:00:00
live oak
fl
us
disk
120.0
several minutes
Saucer zaps energy from powerline as my pregna...
2005-05-11
30.294722
-82.984167
0
10
1968-10-10 13:00:00
hawthorne
ca
us
circle
300.0
5 min.
ROUND , ORANGE , WITH WHAT I WOULD SAY W...
2003-10-31
33.916389
-118.351667
3
11
1968-10-10 19:00:00
brevard
nc
us
fireball
180.0
3 minutes
silent red /orange mass of energy floated by t...
2008-06-12
35.233333
-82.734444
3
12
1970-10-10 16:00:00
bellmore
ny
us
disk
1800.0
30 min.
silver disc seen by family and neighbors
2000-05-11
40.668611
-73.527500
5
13
1970-10-10 19:00:00
manchester
ky
us
unknown
180.0
3 minutes
Slow moving, silent craft accelerated at an...
2008-02-14
37.153611
-83.761944
5
14
1971-10-10 21:00:00
lexington
nc
us
oval
30.0
30 seconds
green oval shaped light over my local church&#...
2010-02-14
35.823889
-80.253611
6
15
1972-10-10 19:00:00
harlan county
ky
us
circle
1200.0
20minutes
On october 10, 1972 myself,my 5yrs.daugh...
2005-09-15
36.843056
-83.321944
1
16
1972-10-10 22:30:00
west bloomfield
mi
us
disk
120.0
2 minutes
The UFO was so close, my battery in the car...
2007-08-14
42.537778
-83.233056
1
17
1973-10-10 19:00:00
niantic
ct
us
disk
1800.0
20-30 min
Oh, what a night ! Two (2) saucer-shape...
2003-09-24
41.325278
-72.193611
2
18
1973-10-10 23:00:00
bermuda nas
NaN
NaN
light
20.0
20 sec.
saw fast moving blip on the radar scope thin w...
2002-01-11
32.364167
-64.678611
2
19
1974-10-10 19:30:00
hudson
ma
us
other
2700.0
45 minutes
Not sure of the eact month or year of this sig...
1999-08-10
42.391667
-71.566667
3
20
1974-10-10 21:30:00
cardiff (uk/wales)
NaN
gb
disk
1200.0
20 minutes
back in 1974 I was 19 at the time and lived i...
2007-02-01
51.500000
-3.200000
3
21
1974-10-10 23:00:00
hudson
ks
us
light
1200.0
one hour?
The light chased us.
2004-07-25
38.105556
-98.659722
3
22
1975-10-10 17:00:00
north charleston
sc
us
light
360.0
5-6 minutes
Several Flashing UFO lights over Charleston Na...
2008-02-14
32.854444
-79.975000
4
23
1976-10-10 20:30:00
washougal
wa
us
oval
60.0
1 minute
Three extremely large lights hanging above nea...
2014-02-07
45.582778
-122.352222
6
24
1976-10-10 22:00:00
stoke mandeville (uk/england)
NaN
gb
cigar
3.0
3 seconds
White object over Buckinghamshire UK.
2009-12-12
51.783333
-0.783333
6
25
1977-10-10 12:00:00
san antonio
tx
us
other
30.0
30 seconds
i was about six or seven and my family and me ...
2005-02-24
29.423889
-98.493333
0
26
1977-10-10 22:00:00
louisville
ky
us
light
30.0
approx: 30 seconds
HBCCUFO CANADIAN REPORT: Pilot Sighting Of Un...
2004-03-17
38.254167
-85.759444
0
27
1978-10-10 02:00:00
elmont
ny
us
rectangle
300.0
5min
A memory I will never forget that happened men...
2007-02-01
40.700833
-73.713333
1
28
1979-10-10 00:00:00
poughkeepsie
ny
us
chevron
900.0
15 minutes
1/4 moon-like, its 'chord' or flat s...
2005-04-16
41.700278
-73.921389
2
29
1979-10-10 22:00:00
saddle lake (canada)
ab
NaN
triangle
270.0
4.5 or more min.
Lights far above, that glance; then flee f...
2005-01-19
53.970571
-111.689885
2
...
...
...
...
...
...
...
...
...
...
...
...
...
80302
2012-09-09 20:00:00
wilson
nc
us
light
10800.0
3 hours
Bright orb being chased by a jet along with se...
2012-09-24
35.721111
-77.915833
6
80303
2012-09-09 20:10:00
elmont
ny
us
circle
600.0
10 minutes
Orange lights seen in Elmont, Long Island&#...
2012-09-24
40.700833
-73.713333
6
80304
2012-09-09 20:30:00
mt. juliet
tn
us
light
120.0
2 minutes
Bright white light moving slowly across sky wi...
2012-09-24
36.200000
-86.518611
6
80305
2012-09-09 20:30:00
ventura
ca
us
chevron
900.0
15 minutes
Beautiful bright blue delta shaped aerobatics.
2012-09-24
34.278333
-119.292222
6
80306
2012-09-09 20:52:00
south jordan
ut
us
circle
10.0
10 seconds
Circular disk with blinking lights scares two ...
2012-09-24
40.562222
-111.928889
6
80307
2012-09-09 21:00:00
elkhart
in
us
oval
600.0
10 minutes
It was the night of sept 9 between 9 and 10 pm...
2012-09-24
41.681944
-85.976667
6
80308
2012-09-09 21:00:00
new york city (brooklyn)
ny
us
light
1290.0
21:30
Glowing, circular lights visible in the clo...
2012-09-24
40.714167
-74.006389
6
80309
2012-09-09 21:00:00
pawleys island
sc
us
oval
60.0
less than a minute
One large bright orange flanked by three small...
2012-09-24
33.433056
-79.121667
6
80310
2012-09-09 21:00:00
ventura
ca
us
circle
300.0
5 minutes
Bright Blue Object seen floating in sky near C...
2012-09-24
34.278333
-119.292222
6
80311
2012-09-09 21:55:00
charleston
sc
us
flash
900.0
15 minutes
Orb of light flashing reds and blues, stati...
2012-09-24
32.776389
-79.931111
6
80312
2012-09-09 23:00:00
gainesville
ga
us
light
5.0
5 seconds
Ball of light
2012-09-24
34.297778
-83.824167
6
80313
2013-09-09 00:15:00
norfolk
va
us
unknown
1.0
split second
Two or three lights shoot across sky over nava...
2013-09-30
36.846667
-76.285556
0
80314
2013-09-09 01:50:00
buffalo (west of; on highway 90 west)
ny
us
triangle
180.0
3 minutes
Massive Flat Black triangle with 3 red lights.
2013-09-30
42.886389
-78.878611
0
80315
2013-09-09 03:00:00
struthers
oh
us
unknown
120.0
2 minutes
I saw a routaing line of stares that seemed to...
2013-09-09
41.052500
-80.608056
0
80316
2013-09-09 09:51:00
san diego
ca
us
light
4.0
~4 seconds
2 white lights zig-zag over Qualcomm Stadium (...
2013-09-30
32.715278
-117.156389
0
80317
2013-09-09 12:34:00
cedar park
tx
us
cigar
8.0
5-8 seconds
Cigar Shaped Object Descending in the Directio...
2013-09-09
30.505000
-97.820000
0
80318
2013-09-09 13:10:00
calmar (canada)
ab
ca
unknown
90.0
45-90 seconds
Fastest dot I have ever seen in the sky!
2013-09-09
53.250000
-113.783333
0
80319
2013-09-09 20:15:00
clifton
nj
NaN
other
3600.0
~1hr+
Luminous line seen in New Jersey sky.
2013-09-30
40.858433
-74.163755
0
80320
2013-09-09 20:20:00
tuscaloosa
al
us
fireball
60.0
1:00
White/green object much larger than "shoo...
2013-09-30
33.209722
-87.569167
0
80321
2013-09-09 20:21:00
clarksville
tn
us
fireball
3.0
3 seconds
Green fireball like object shooting across the...
2013-09-30
36.529722
-87.359444
0
80322
2013-09-09 21:00:00
aleksandrow (poland)
NaN
NaN
light
15.0
15 seconds
Two points of light following one another in a...
2013-09-30
50.465843
22.891814
0
80323
2013-09-09 21:00:00
gainesville
fl
us
triangle
60.0
1 minute
Three lights in the sky that didn't look li...
2013-09-30
29.651389
-82.325000
0
80324
2013-09-09 21:00:00
hamstead (hollyridge)
nc
NaN
light
120.0
2 minutes
8 to ten lights bright orange in color large t...
2013-09-30
34.367594
-77.710548
0
80325
2013-09-09 21:00:00
milton (canada)
on
ca
fireball
180.0
3 minutes
Massive Bright Orange Fireball in Sky
2013-09-30
46.300000
-63.216667
0
80326
2013-09-09 21:00:00
woodstock
ga
us
sphere
20.0
20 seconds
Driving 575 at 21:00 hrs saw a white and green...
2013-09-30
34.101389
-84.519444
0
80327
2013-09-09 21:15:00
nashville
tn
us
light
600.0
10 minutes
Round from the distance/slowly changing colors...
2013-09-30
36.165833
-86.784444
0
80328
2013-09-09 22:00:00
boise
id
us
circle
1200.0
20 minutes
Boise, ID, spherical, 20 min, 10 r...
2013-09-30
43.613611
-116.202500
0
80329
2013-09-09 22:00:00
napa
ca
us
other
1200.0
hour
Napa UFO,
2013-09-30
38.297222
-122.284444
0
80330
2013-09-09 22:20:00
vienna
va
us
circle
5.0
5 seconds
Saw a five gold lit cicular craft moving fastl...
2013-09-30
38.901111
-77.265556
0
80331
2013-09-09 23:00:00
edmond
ok
us
cigar
1020.0
17 minutes
2 witnesses 2 miles apart, Red & White...
2013-09-30
35.652778
-97.477778
0
80332 rows × 12 columns
In [263]:
week = ufo.set_index( ["day_of_week", "state", "shape"] )
In [275]:
week.loc[[0, 3], 'il', 'cigar']
/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py:1325: PerformanceWarning: indexing past lexsort depth may impact performance.
return self._getitem_tuple(key)
---------------------------------------------------------------------------
UnsortedIndexError Traceback (most recent call last)
<ipython-input-275-dd41524e6dfe> in <module>()
----> 1 week.loc[[0, 3], 'il', 'cigar']
UnsortedIndexError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (3), lexsort depth (0)'
In [278]:
new_week = week.sort_index()
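The `UnsortedIndexError` above reports that slicing this MultiIndex requires it to be lexsorted, which is what `sort_index()` provides. With a sorted index, the row selection can be written as one tuple (one entry per index level) followed by `:` for the columns, which keeps the row and column parts unambiguous. A sketch on a hypothetical toy frame shaped like `week`; `ix` is just a local alias for `pd.IndexSlice`.

```python
import pandas as pd

# Hypothetical toy frame with the same three index levels as `week`.
week = pd.DataFrame({
    "day_of_week": [3, 0, 0, 3],
    "state": ["il", "il", "ab", "il"],
    "shape": ["cigar", "cigar", "cigar", "light"],
    "duration_seconds": [60.0, 120.0, 5.0, 30.0],
}).set_index(["day_of_week", "state", "shape"])

new_week = week.sort_index()    # lexsort every level so label selection is allowed
ix = pd.IndexSlice
# Rows on Monday (0) or Thursday (3), in "il", with shape "cigar"; all columns.
cigars_il = new_week.loc[ix[[0, 3], "il", "cigar"], :]
print(cigars_il)
```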
In [283]:
Out[283]:
| day_of_week | state | shape | city | country | duration_seconds | duration_reported | description | report_date | latitude | longitude |
| 0 | ab | cigar | brooks (canada) | ca | 60.00 | 1 min | Floating Barge seen near Suffield Air Base Alb... | 2012-02-10 | 50.566667 | -111.900000 |
|  |  | cigar | edmonton (canada) | ca | 2.00 | a few seconds | them object moved speedily throught the sky to... | 2005-05-11 | 53.550000 | -113.500000 |
|  |  | cigar | edmonton (canada) | ca | 360.00 | 6 mins | two crafts, one ontop of the other, flyi... | 2006-03-11 | 53.550000 | -113.500000 |
|  |  | circle | edmonton (canada) | ca | 240.00 | 4 mins | The black ball then seems to go strait up and ... | 2007-08-07 | 53.550000 | -113.500000 |
|  |  | circle | airdrie (canada) | ca | 30.00 | 30 seconds | HBCCUFO CANADIAN REPORT: Large ring of white ... | 2003-12-09 | 51.266667 | -114.016667 |
|  |  | cone | grande prairie (canada) | ca | 300.00 | 5 min. | Green Cone UFO | 2005-04-16 | 55.166667 | -118.800000 |
|  |  | cylinder | edmonton (canada) | ca | 2400.00 | approx: 40 mins | HBCCUFO CANADIAN REPORT: I noticed a black cy... | 2003-08-01 | 53.550000 | -113.500000 |
|  |  | disk | bragg creek (canada) | NaN | 28800.00 | 8 hours | At about 2am, A freind and I were awoken in... | 2006-02-14 | 50.951740 | -114.569234 |
|  |  | disk | st. albert (canada) | ca | 5.00 | 5 seconds | White/Silver Saucer with white lights hovering... | 2013-09-30 | 53.633333 | -113.633333 |
|  |  | fireball | canmore (canada) | ca | 1.00 | 1 second | Huge Green Fireball Dropping from sky by Lac D... | 2012-11-19 | 51.100000 | -115.350000 |
|  |  | fireball | calgary (canada) | ca | 240.00 | 2-4 min | 12 ufos flew over head | 2005-01-27 | 51.083333 | -114.083333 |
|  |  | fireball | edmonton (canada) | ca | 2.00 | 2 seconds | A green ball of light. | 2001-10-12 | 53.550000 | -113.500000 |
|  |  | flash | whitecourt (canada) | ca | 240.00 | 3-4 mins | I took a minute to watch the star's before ... | 2005-05-11 | 54.133333 | -115.683333 |
|  |  | flash | waiparous (west of calgary) (canada) | NaN | 1800.00 | 30 mins | The speed and color was amazing to see. | 2005-05-11 | 51.282376 | -114.841158 |
|  |  | flash | lacombe (canada) | ca | 10.00 | 10 seconds | bright blue flash followed by long blue streak | 2010-08-24 | 52.466667 | -113.733333 |
|  |  | formation | calgary (canada) | ca | 120.00 | 1-2 minutes | Spherical White-Silver objects in formation se... | 2005-07-05 | 51.083333 | -114.083333 |
|  |  | formation | edmonton (canada) | ca | 3.00 | 2-3 seconds | tight faint formation directly over Edmonton&#... | 2004-04-09 | 53.550000 | -113.500000 |
|  |  | formation | edmonton (canada) | ca | 10.00 | 5-10sec | 3 or 4 orange round lights with tails in the f... | 2004-06-04 | 53.550000 | -113.500000 |
|  |  | formation | edmonton (canada) | ca | 10.00 | 5-10sec | lights in the north | 2004-06-04 | 53.550000 | -113.500000 |
|  |  | formation | edmonton (canada) | ca | 20.00 | 20 seconds | Silent, fast, low traveling lights fly o... | 2004-05-10 | 53.550000 | -113.500000 |
|  |  | formation | edmonton (canada) | ca | 20.00 | 20 seconds | 7 objects travelling VERY FAST, changing po... | 2004-05-10 | 53.550000 | -113.500000 |
|  |  | light | barrhead (canada) | ca | 90.00 | 1.5 minutes | Bright white light by tower | 2010-11-21 | 54.116667 | -114.400000 |
|  |  | light | ponoka (canada) | ca | 3600.00 | >1 hour | Bright light, watched for an hour, watch... | 2013-11-20 | 52.683333 | -113.566667 |
|  |  | light | lethbridge (canada) | ca | 60.00 | 30-60 seconds | Light in the sky passing over head, then fa... | 2013-12-23 | 48.366667 | -53.866667 |
|  |  | light | cold lake (canada) | ca | 300.00 | 5 min | Gold color slow moving star no noise over CFB... | 2010-04-13 | 54.465000 | -110.183056 |
|  |  | light | calgary (canada) | ca | 10.00 | 10 seconds | Calgary Orb streaming across the night sky,... | 2011-03-10 | 51.083333 | -114.083333 |
|  |  | light | nanton (north of) (canada) | ca | 600.00 | 10 minutes | On March 29th 2004 I saw a bright light in the... | 2004-04-09 | 50.350000 | -113.766667 |
|  |  | light | calgary (canada) | ca | 10.00 | 10 seconds | 7-8 UFO's and a pterodactyl sighting. | 2010-04-13 | 51.083333 | -114.083333 |
|  |  | light | edmonton (canada) | ca | 1200.00 | 10-20 minutes | Bright star like object moves across sky in st... | 2008-06-12 | 53.550000 | -113.500000 |
|  |  | light | pidgeon lake (canada) | NaN | 1200.00 | 20 minutes | Did I see crop circles being made? | 2005-07-05 | 53.038633 | -114.095330 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2 | NaN | unknown | paredes de coura (portugal) | NaN | 60.00 | 1 min | A strange flying object almost crash into my car | 2010-05-12 | 41.913112 | -8.561438 |
|  |  | unknown | utrecht/amsterdam (between; utrecht, noord)... | NaN | 900.00 | 15 minutes | I was drivin on the A2 when I saw in the sunse... | 1999-06-23 | 52.370216 | 4.895168 |
|  |  | unknown | atlantic ocean (off africa) | NaN | 30.00 | 0:30 | While flying night navigational mission. We pi... | 2004-06-18 | -14.599413 | -28.673147 |
|  |  | unknown | putten (netherlands) | NaN | 15.00 | 15 sec | a strange object right above the sun in a picture | 2005-05-24 | 52.258676 | 5.605373 |
|  |  | unknown | leicester (uk/england) | gb | 180.00 | 2-3 mins | 3 bright lights and a small red light on a cir... | 2005-05-24 | 52.664913 | -1.034894 |
|  |  | unknown | zadar (croatia) | NaN | 10.00 | 10seconds | ((HOAX??)) Red light in the sky,the object... | 2011-05-29 | 44.119371 | 15.231365 |
|  |  | unknown | europe | NaN | 5.00 | 5 sec | RADAR WARNING | 2003-05-09 | 54.525961 | 15.255119 |
|  |  | unknown | scunthorpe (uk/england) | gb | 1200.00 | 20 mins | A green floating unkown object up in the sky | 2007-08-07 | 53.583333 | -0.650000 |
|  |  | unknown | axminister (uk/england) | NaN | 300.00 | 5 min | It was a round object with multicoloured light... | 2011-05-02 | 50.782727 | -2.994937 |
|  |  | unknown | london (uk/england) | gb | 60.00 | 1 minute | London 3 witnesses - nothing special just a ni... | 2003-07-23 | 51.514125 | -0.093689 |
|  |  | unknown | dirksland (netherlands) | NaN | 1800.00 | atleast 30 minutes | Extremely bright light, completely stationa... | 2010-07-28 | 51.750635 | 4.092097 |
|  |  | unknown | barra do tejuco (rio de janeiro) (brazil) | NaN | 420.00 | 5-7 minutes | Huge shapeless cluster of red and blue green l... | 2003-07-16 | -23.000371 | -43.365895 |
|  |  | unknown | broadway (uk/england) | gb | 2.00 | 2sec | bright red lights and fast | 2005-10-11 | 51.764722 | -4.472778 |
|  |  | unknown | lierskogen (norway) | NaN | 180.00 | 3 min. | It looked as a pointed horseshoe. It was full ... | 1999-04-26 | 59.819814 | 10.330936 |
|  |  | unknown | oxford (uk/england) | gb | 600.00 | 10 minutes | Scanned by something | 2001-10-12 | 51.750000 | -1.250000 |
|  |  | unknown | warnambool, vic (australia) | NaN | 0.05 | 0.05 seconds | It was very quick. We have a picture of it. | 1999-11-02 | -38.382766 | 142.484499 |
|  |  | NaN | cancun (mexico) | NaN | 1200.00 | 20 minutes | WAS JUST BEFORE SUNSET, AIRCRAFT MIDLE A... | 2003-12-09 | 21.161908 | -86.851528 |
|  |  | NaN | monterrey (mexico) (outside city, on large ... | NaN | 600.00 | 10+ minutes | There was many objects stopping traffic on a m... | 2003-03-04 | 25.686614 | -100.316113 |
|  |  | NaN | cuiaba (brazil) | NaN | 41.00 | 41 seconds | http://www.youtube.com/watch?v=7LcHGSG-0fc v&... | 2014-03-18 | -15.601411 | -56.097892 |
|  |  | NaN | glasgow (near) (uk/scotland) | gb | 900.00 | 15mins | Large white light hovered and followed train f... | 2002-12-23 | 55.833333 | -4.250000 |
|  |  | NaN | london (uk/england) | gb | 300.00 | 2 to 5 mins | square with rounded edges and a dome at the top | 2011-04-03 | 51.514125 | -0.093689 |
|  |  | NaN | jafir (jordan) | NaN | 100.00 | 1:40 | 3 Saucers land in Jordan with tall Aliens head... | 2010-04-13 | 27.330000 | 52.903333 |
|  |  | NaN | bombay (india) | NaN | 5.00 | 4to5 seconds | I and my friend were watching the star.suddenl... | 2001-08-05 | 19.075984 | 72.877656 |
|  |  | NaN | leicester (uk/england) | gb | 1200.00 | 15-20 min | Alien standing at the bottom of my bed. | 2006-10-30 | 52.664913 | -1.034894 |
|  |  | NaN | casalabate (italy) | NaN | 180.00 | 3 minutes | catena metallica dove le maglie di ferro erano... | 2002-11-20 | 40.497171 | 18.120590 |
|  |  | NaN | barnoldswick (uk/england) | gb | 60.00 | 60 seconds | Unidentified flying man | 2003-08-28 | 53.916667 | -2.183333 |
|  |  | NaN | puerto de mazarron (spain) | NaN | 1800.00 | 30 mins | two objects and a bright flash | 2001-11-20 | 37.564007 | -1.266238 |
|  |  | NaN | plymouth (uk/england) | gb | 60.00 | 1 min | i have video of a type of orb like light on my... | 2005-10-11 | 50.396389 | -4.138611 |
|  |  | NaN | ubud (indonesia) | NaN | 120.00 | ~2 minutes | Strong orange light flickering in the dark clo... | 2012-10-30 | -8.519268 | 115.263298 |
|  |  | NaN | fes (morocco) | NaN | 300.00 | 5 minutes | standing on roof of apartment, clear sunny ... | 2006-12-07 | 34.033333 | -5.000000 |
31833 rows × 8 columns
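When pandas displays a MultiIndex, it prints an outer label such as day `0` or state `ab` only on the first row of its group. The blank cells below it mean "same as above". The sketch below uses hypothetical toy rows, not the UFO data, to show the difference. `DataFrame.to_string(sparsify=False)` repeats every label on every row. The `display.multi_sparse` option controls the same behavior for notebook output.

```python
import pandas as pd

# Hypothetical rows that share the same day and, partly, the same state
frame = pd.DataFrame({
    "day_of_week": [0, 0, 0],
    "state": ["ab", "ab", "bc"],
    "shape": ["cigar", "circle", "light"],
    "city": ["edmonton", "airdrie", "victoria"],
}).set_index(["day_of_week", "state", "shape"])

sparse_text = frame.to_string()              # default: repeated outer labels are blanked
full_text = frame.to_string(sparsify=False)  # every row shows all three labels
print(sparse_text)
print(full_text)
```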
In [285]:
ufo = ufo.reset_index()
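`reset_index()` moves whatever index the frame carries back into ordinary columns and installs a default 0..n-1 `RangeIndex`. For a MultiIndex, each level becomes its own column. The sketch below runs on a hypothetical toy frame shaped like the one above, not on the real `ufo` data:

```python
import pandas as pd

# Hypothetical MultiIndexed frame with the same index levels as above
indexed = pd.DataFrame({
    "day_of_week": [0, 0, 2],
    "state": ["ab", "ab", "ca"],
    "shape": ["cigar", "light", "unknown"],
    "city": ["edmonton", "calgary", "fresno"],
}).set_index(["day_of_week", "state", "shape"])

# Each index level becomes a regular column, placed before the existing columns
flat = indexed.reset_index()
print(flat.columns.tolist())
print(flat.index)
```

To throw the old index away rather than keep it as columns, pass `drop=True`.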
In [289]:
ufo.index = ufo.date
ufo.index.dayofweek
Out[289]:
Int64Index([0, 0, 0, 2, 0, 1, 6, 6, 0, 0,
...
0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
dtype='int64', name='date', length=80332)
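`DatetimeIndex.dayofweek` numbers the days with Monday as 0 and Sunday as 6, which is why the output above is a run of small integers. The output here shows an `Int64Index`. Recent pandas versions return a plain `Index` of integers instead. The sketch below uses a hypothetical three-row frame with a `date` column that only mimics `ufo.date`. `day_name()` gives the readable labels.

```python
import pandas as pd

# Hypothetical stand-in for the UFO frame's parsed "date" column
ufo = pd.DataFrame({
    "date": pd.to_datetime(["2012-02-10", "2005-05-11", "2024-01-07"]),
    "shape": ["cigar", "cigar", "light"],
})

# Use the dates as the index, as the notebook does with ufo.date
ufo.index = pd.DatetimeIndex(ufo["date"])

days = ufo.index.dayofweek     # Monday=0 ... Sunday=6
names = ufo.index.day_name()   # e.g. "Friday"
print(days)
print(names)
```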
Content source: UIUC-iSchool-DataViz/fall2017