Basics of Pandas

Over the last few weeks we have been using low-level methods to read data into Python and manipulate it. This week we will explore pandas to speed up this process.

Pandas is based around the notion that arrays can be indexed in a flexible manner, and that we can structure our data access around the indexing labels.
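To make the idea of label-based indexing concrete, here is a minimal sketch on a tiny Series. The labels and values are made up for illustration and are not taken from the building inventory.

```python
import pandas as pd

# A small Series whose index holds string labels rather than the default 0..n-1.
# The labels and values are hypothetical.
floors = pd.Series([1, 3, 2], index=["park office", "dormitory", "garage"])

by_label = floors.loc["dormitory"]     # access by index label
by_position = floors.iloc[1]           # access by integer position
# Several labels at once; the result follows the order requested.
subset = floors.loc[["garage", "park office"]]

print(by_label, by_position)
print(subset)
```

Both lookups return the same element here, because "dormitory" happens to sit at position 1. The label-based form keeps working if the rows are later reordered.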

We will start out, as we often do, by applying our boilerplate setup.


In [2]:
%matplotlib inline

In [3]:
import pandas as pd
import matplotlib.pyplot as plt

Pandas provides a number of read_* options, including read_csv, which we will use here.

One important note about read_csv in particular is that it accepts over 50 possible arguments. This allows very flexible specification of how to read data in and how to parse it, along with detailed control over things like file encoding. This flexibility is designed to eliminate the need to pre-process data files before importing, but it can also make for a complex import call even if you only need to adjust a few columns. We will use it in some of its simpler ways here.
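As a rough illustration of a few of these arguments, the sketch below reads a small in-memory CSV rather than a file on disk. The rows are invented; only the column names are borrowed from the inventory file, and the choice of arguments (usecols, dtype, na_values) is just one possible combination.

```python
import io
import pandas as pd

# Hypothetical stand-in for a CSV file; the two data rows are made up.
csv_text = """Agency Name,Zip code,Year Acquired,Square Footage
Agency A,61501,1975,144
Agency B,61008,0,432
"""

small = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["Agency Name", "Zip code", "Year Acquired"],  # skip the other columns
    dtype={"Zip code": str},           # keep zip codes as text instead of integers
    na_values={"Year Acquired": [0]},  # treat 0 in this column as missing
)

print(small)
print(small.dtypes)
```

Because one value in "Year Acquired" is now missing, pandas stores that column as floating point, which is why years can show up as 1975.0 rather than 1975.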

Below, we read the building inventory file into an object called df (for Data Frame).


In [5]:
df = pd.read_csv("data-readonly/IL_Building_Inventory.csv")

One of the first things we can do is examine the columns that the dataframe has identified.


In [6]:
df.columns


Out[6]:
Index(['Agency Name', 'Location Name', 'Address', 'City', 'Zip code', 'County',
       'Congress Dist', 'Congressional Full Name', 'Rep Dist', 'Rep Full Name',
       'Senate Dist', 'Senator Full Name', 'Bldg Status', 'Year Acquired',
       'Year Constructed', 'Square Footage', 'Total Floors',
       'Floors Above Grade', 'Floors Below Grade', 'Usage Description',
       'Usage Description 2', 'Usage Description 3'],
      dtype='object')

In [9]:
df.head()


Out[9]:
Agency Name Location Name Address City Zip code County Congress Dist Congressional Full Name Rep Dist Rep Full Name ... Bldg Status Year Acquired Year Constructed Square Footage Total Floors Floors Above Grade Floors Below Grade Usage Description Usage Description 2 Usage Description 3
0 Department of Natural Resources Anderson Lake Conservation Area - Fulton County Anderson Lake C.a. Astoria 61501 Fulton 17 Cheri Bustos 93 Hammond Norine K. ... In Use 1975 1975 144 1 1 0 Unusual Unusual Not provided
1 Department of Natural Resources Anderson Lake Conservation Area - Fulton County Anderson Lake C.a. Astoria 61501 Fulton 17 Cheri Bustos 93 Hammond Norine K. ... In Use 2004 2004 144 1 1 0 Unusual Unusual Not provided
2 Department of Natural Resources Anderson Lake Conservation Area - Fulton County Anderson Lake C.a. Astoria 61501 Fulton 17 Cheri Bustos 93 Hammond Norine K. ... In Use 2004 2004 144 1 1 0 Unusual Unusual Not provided
3 Department of Natural Resources Anderson Lake Conservation Area - Fulton County Anderson Lake C.a. Astoria 61501 Fulton 17 Cheri Bustos 93 Hammond Norine K. ... In Use 2004 2004 144 1 1 0 Unusual Unusual Not provided
4 Department of Natural Resources Anderson Lake Conservation Area - Fulton County Anderson Lake C.a. Astoria 61501 Fulton 17 Cheri Bustos 93 Hammond Norine K. ... In Use 2004 2004 144 1 1 0 Unusual Unusual Not provided

5 rows × 22 columns


In [11]:
df.tail()


Out[11]:
Agency Name Location Name Address City Zip code County Congress Dist Congressional Full Name Rep Dist Rep Full Name ... Bldg Status Year Acquired Year Constructed Square Footage Total Floors Floors Above Grade Floors Below Grade Usage Description Usage Description 2 Usage Description 3
8857 Department of Transportation Belvidere Maintenance Storage Facility - Boone... 9797 Illinois Rte. 76 Belvidere 61008 Boone 16 Adam Kinzinger 69 Sosnowski Joe ... In Use 0 0 432 1 0 0 Storage NaN NaN
8858 Department of Transportation Belvidere Maintenance Storage Facility - Boone... 9797 Illinois Rte 76 Belvidere 61008 Boone 16 Adam Kinzinger 69 Sosnowski Joe ... In Use 0 0 330 1 0 0 Storage NaN NaN
8859 Department of Transportation Quincy Maintenance Storage Facility 800 Koch's Lane Quincy 62305 Adams 18 Darin M. LaHood 94 Frese Randy E. ... In Use 0 1987 130 1 0 0 Storage High Hazard NaN
8860 Illinois Community College Board Illinois Valley Community College - Oglesby 815 North Orlando Smith Avenue Oglesby 61348 LaSalle 16 Adam Kinzinger 76 Long Jerry Lee ... In Use 1971 1971 49552 1 1 0 Education Education Not provided
8861 Department of Military Affairs Peoria Army Aviation Support Facility 2323 S. Airport Rd Peoria 61607 Peoria 17 Cheri Bustos 92 Gordon-Booth Jehan ... In Progress 0 2017 288 1 0 0 Utiility & Miscellan Utiility & Miscellan NaN

5 rows × 22 columns


In [23]:
df.describe()


Out[23]:
           Zip code  Congress Dist     Rep Dist  Senate Dist  Year Acquired  Year Constructed  Square Footage  Total Floors  Floors Above Grade  Floors Below Grade
count   8862.000000    8862.000000  8862.000000  8862.000000    8862.000000       8862.000000    8.862000e+03   8862.000000         8862.000000         8862.000000
mean   61821.076845      13.404085    92.303318    46.408599    1972.593320       1906.135184    1.147603e+04      1.636087            1.449334            0.161589
std     1095.203357       4.037936    23.568457    11.781038      27.491941        351.180642    3.817263e+04      1.537603            1.286898            0.392717
min     1235.000000       0.000000     0.000000     0.000000    1753.000000          0.000000    0.000000e+00      0.000000            0.000000            0.000000
25%    61105.000000      12.000000    79.000000    40.000000    1960.000000       1953.000000    2.330000e+02      1.000000            1.000000            0.000000
50%    62023.000000      14.000000    97.000000    49.000000    1976.000000       1974.000000    1.600000e+03      1.000000            1.000000            0.000000
75%    62650.000000      16.000000   110.000000    55.000000    1993.000000       1991.000000    6.426500e+03      2.000000            1.000000            0.000000
max    68297.000000      18.000000   119.000000    60.000000    2019.000000       2019.000000    1.200000e+06     31.000000           30.000000            4.000000

In [24]:
df.dtypes


Out[24]:
Agency Name                object
Location Name              object
Address                    object
City                       object
Zip code                    int64
County                     object
Congress Dist               int64
Congressional Full Name    object
Rep Dist                    int64
Rep Full Name              object
Senate Dist                 int64
Senator Full Name          object
Bldg Status                object
Year Acquired               int64
Year Constructed            int64
Square Footage              int64
Total Floors                int64
Floors Above Grade          int64
Floors Below Grade          int64
Usage Description          object
Usage Description 2        object
Usage Description 3        object
dtype: object

In [28]:
df.groupby(["Agency Name"])["Square Footage"].sum()


Out[28]:
Agency Name
Appellate Court / Fifth District                15124
Appellate Court / Fourth District               16400
Appellate Court / Second District               43330
Appellate Court / Third District                18700
Chicago State University                      1219492
Department of Agriculture                     2608398
Department of Central Management Services     4260911
Department of Corrections                    15120750
Department of Human Services                  8466774
Department of Juvenile Justice                1147982
Department of Military Affairs                4579470
Department of Natural Resources               3937319
Department of Public Health                      7160
Department of Revenue                          913236
Department of State Police                     828851
Department of Transportation                  5659737
Department of Veterans' Affairs               1483981
Eastern Illinois University                   1164674
Governor's Office                               45120
Governors State University                    1055971
Historic Preservation Agency                  1667954
IL State Board of Education                     19147
Illinois Board of Higher Education             545816
Illinois Community College Board               486473
Illinois Courts                                 54540
Illinois Emergency Management Agency            55650
Illinois Medical District Commission            46200
Illinois State University                     2960272
Northeastern Illinois University              1110103
Northern Illinois University                  3751095
Office of the Attorney General                  60500
Office of the Secretary of State              2273828
Southern Illinois University                  8709473
University of Illinois                       25018006
Western Illinois University                   2348109
Name: Square Footage, dtype: int64

In [29]:
df["Agency Name"].value_counts()


Out[29]:
Department of Natural Resources              3223
Department of Corrections                    1428
Department of Transportation                 1137
Department of Human Services                  617
University of Illinois                        525
Southern Illinois University                  420
Historic Preservation Agency                  284
Department of Military Affairs                231
Department of Agriculture                     228
Department of Juvenile Justice                120
Department of State Police                    109
Illinois State University                     102
Department of Veterans' Affairs                94
Northern Illinois University                   79
Department of Central Management Services      60
Western Illinois University                    42
Office of the Secretary of State               41
Eastern Illinois University                    35
Northeastern Illinois University               18
Chicago State University                       16
Illinois Community College Board               15
Governors State University                     11
Illinois Board of Higher Education             10
Illinois Medical District Commission            3
Illinois Emergency Management Agency            2
Appellate Court / Third District                2
Department of Public Health                     2
Appellate Court / Fifth District                1
Department of Revenue                           1
Illinois Courts                                 1
Appellate Court / Second District               1
Appellate Court / Fourth District               1
Governor's Office                               1
Office of the Attorney General                  1
IL State Board of Education                     1
Name: Agency Name, dtype: int64
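The two cells above summarize the data per agency: groupby(...).sum() adds up a column within each group, while value_counts() counts how many rows carry each value. A minimal sketch on a toy frame (agency names "A" and "B" and the square footages are made up) shows the difference:

```python
import pandas as pd

# Toy data; the agency names and footages are hypothetical.
toy = pd.DataFrame({
    "Agency Name": ["A", "B", "A", "B", "A"],
    "Square Footage": [100, 250, 50, 150, 10],
})

# Split rows by agency, then sum the footage within each group.
total_by_agency = toy.groupby("Agency Name")["Square Footage"].sum()

# Count how many rows (buildings) each agency has.
count_by_agency = toy["Agency Name"].value_counts()

print(total_by_agency)
print(count_by_agency)
```

In this toy data, agency "A" has more rows but less total footage than "B", which is the kind of contrast visible in the real output between the Department of Natural Resources and the University of Illinois.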

In [31]:
df["Total Floors"].median()


Out[31]:
1.0

In [32]:
df.median(numeric_only=True)


Out[32]:
Zip code              62023.0
Congress Dist            14.0
Rep Dist                 97.0
Senate Dist              49.0
Year Acquired          1976.0
Year Constructed       1974.0
Square Footage         1600.0
Total Floors              1.0
Floors Above Grade        1.0
Floors Below Grade        0.0
dtype: float64

In [35]:
df.quantile([0.1, 0.2, 0.9], numeric_only=True)


Out[35]:
     Zip code  Congress Dist  Rep Dist  Senate Dist  Year Acquired  Year Constructed  Square Footage  Total Floors  Floors Above Grade  Floors Below Grade
0.1   60450.0           10.0      64.0         32.0         1935.0            1929.0            80.0           1.0                 1.0                 0.0
0.2   61001.0           12.0      75.0         38.0         1953.0            1947.0           150.0           1.0                 1.0                 0.0
0.9   62901.0           18.0     116.0         58.0         2001.0            2001.0         25568.1           3.0                 3.0                 1.0

In [38]:
df["Agency Name"].apply(lambda a: a.upper()).head()


Out[38]:
0    DEPARTMENT OF NATURAL RESOURCES
1    DEPARTMENT OF NATURAL RESOURCES
2    DEPARTMENT OF NATURAL RESOURCES
3    DEPARTMENT OF NATURAL RESOURCES
4    DEPARTMENT OF NATURAL RESOURCES
Name: Agency Name, dtype: object

In [39]:
df["Agency Name"].apply(lambda a: a).head()


Out[39]:
0    Department of Natural Resources
1    Department of Natural Resources
2    Department of Natural Resources
3    Department of Natural Resources
4    Department of Natural Resources
Name: Agency Name, dtype: object

In [46]:
"This is my string".lower()


Out[46]:
'this is my string'

In [45]:
"this is my string. here is another.".capitalize()


Out[45]:
'This is my string. here is another.'
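The plain Python string methods shown above can be applied to a whole column either through apply, as in the earlier cells, or through the .str accessor that pandas provides on text columns. The sketch below, on a two-element Series built just for this example, suggests that the two routes give the same result:

```python
import pandas as pd

# Two agency names copied from the output above, placed in a fresh Series.
names = pd.Series(["Department of Natural Resources", "University of Illinois"])

# Route 1: call a Python string method on each element via apply.
via_apply = names.apply(lambda a: a.upper())

# Route 2: use the vectorized string accessor.
via_str = names.str.upper()

print(via_str)
print(via_apply.equals(via_str))
```

One practical difference is that the .str methods pass missing values through as NaN, whereas the lambda would raise an error on a missing entry, since a float NaN has no upper method.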

In [63]:
df = pd.read_csv("data-readonly/IL_Building_Inventory.csv", na_values={'Year Acquired': 0, 'Year Constructed': 0})

In [64]:
df.count()


Out[64]:
Agency Name                8862
Location Name              8862
Address                    8811
City                       8862
Zip code                   8862
County                     8837
Congress Dist              8862
Congressional Full Name    8699
Rep Dist                   8862
Rep Full Name              8839
Senate Dist                8862
Senator Full Name          8839
Bldg Status                8862
Year Acquired              8597
Year Constructed           8573
Square Footage             8862
Total Floors               8862
Floors Above Grade         8862
Floors Below Grade         8862
Usage Description          8862
Usage Description 2        8832
Usage Description 3        8774
dtype: int64
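The lower counts for "Year Acquired" and "Year Constructed" reflect the zeros that are now read as missing. The following sketch, using three made-up years, illustrates why this matters for summary statistics: a zero is averaged in like any other number, whereas NaN is skipped.

```python
import numpy as np
import pandas as pd

# Hypothetical years; 0 stands in for "unknown", as in the inventory file.
years = pd.Series([0, 1970, 1990])

# Replace the placeholder with NaN so pandas treats it as missing.
cleaned = years.replace(0, np.nan)

print(years.count(), years.mean())      # 3 1320.0
print(cleaned.count(), cleaned.mean())  # 2 1980.0
```

This is consistent with the earlier describe() output, where the mean of "Year Constructed" (about 1906) sat well below its median (1974) while zeros were still counted as years.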

In [65]:
df.iloc[10]


Out[65]:
Agency Name                        Department of Natural Resources
Location Name              Matthiessen State Park - LaSalle County
Address                                         R. R. 178, Box 509
City                                                         Utica
Zip code                                                     61373
County                                                     LaSalle
Congress Dist                                                   16
Congressional Full Name                             Adam Kinzinger
Rep Dist                                                        76
Rep Full Name                                       Long Jerry Lee
Senate Dist                                                     38
Senator Full Name                                        Sue Rezin
Bldg Status                                                 In Use
Year Acquired                                                 2000
Year Constructed                                              2000
Square Footage                                                 144
Total Floors                                                     1
Floors Above Grade                                               1
Floors Below Grade                                               0
Usage Description                                          Unusual
Usage Description 2                                        Unusual
Usage Description 3                                   Not provided
Name: 10, dtype: object

In [68]:
df.loc[10, ["County", "Senate Dist"]]


Out[68]:
County         LaSalle
Senate Dist         38
Name: 10, dtype: object

In [75]:
year = df.groupby("Year Acquired")

In [88]:
df.index = df["Year Acquired"]

In [89]:
df.head()


Out[89]:
Agency Name Location Name Address City Zip code County Congress Dist Congressional Full Name Rep Dist Rep Full Name ... Bldg Status Year Acquired Year Constructed Square Footage Total Floors Floors Above Grade Floors Below Grade Usage Description Usage Description 2 Usage Description 3
Year Acquired
1975.0 Department of Natural Resources Anderson Lake Conservation Area - Fulton County Anderson Lake C.a. Astoria 61501 Fulton 17 Cheri Bustos 93 Hammond Norine K. ... In Use 1975.0 1975.0 144 1 1 0 Unusual Unusual Not provided
2004.0 Department of Natural Resources Anderson Lake Conservation Area - Fulton County Anderson Lake C.a. Astoria 61501 Fulton 17 Cheri Bustos 93 Hammond Norine K. ... In Use 2004.0 2004.0 144 1 1 0 Unusual Unusual Not provided
2004.0 Department of Natural Resources Anderson Lake Conservation Area - Fulton County Anderson Lake C.a. Astoria 61501 Fulton 17 Cheri Bustos 93 Hammond Norine K. ... In Use 2004.0 2004.0 144 1 1 0 Unusual Unusual Not provided
2004.0 Department of Natural Resources Anderson Lake Conservation Area - Fulton County Anderson Lake C.a. Astoria 61501 Fulton 17 Cheri Bustos 93 Hammond Norine K. ... In Use 2004.0 2004.0 144 1 1 0 Unusual Unusual Not provided
2004.0 Department of Natural Resources Anderson Lake Conservation Area - Fulton County Anderson Lake C.a. Astoria 61501 Fulton 17 Cheri Bustos 93 Hammond Norine K. ... In Use 2004.0 2004.0 144 1 1 0 Unusual Unusual Not provided

5 rows × 22 columns


In [91]:
df.loc[1970].head()


Out[91]:
Agency Name Location Name Address City Zip code County Congress Dist Congressional Full Name Rep Dist Rep Full Name ... Bldg Status Year Acquired Year Constructed Square Footage Total Floors Floors Above Grade Floors Below Grade Usage Description Usage Description 2 Usage Description 3
Year Acquired
1970.0 Governors State University Governors State University - Will County Governor's Hwy & Univ Pkwy University Park 60466 Will 3 Daniel William Lipinski 85 Connor John ... In Use 1970.0 1970.0 10000 2 2 0 Storage Storage Not provided
1970.0 Department of Natural Resources Chain O'Lakes CA and SP - McHenry County 39947 North State Park Road Spring Grove 60081 McHenry 14 Randy Hultgren 64 Wheeler Barbara ... In Use 1970.0 1970.0 1440 1 1 0 Assembly Assembly Not provided
1970.0 Office of the Secretary of State Capitol Complex 1st And Capitol Springfield 62704 Sangamon 13 Rodney L. Davis 96 Scherer Sue ... In Use 1970.0 1970.0 500 2 1 1 Industrial Industrial Not provided
1970.0 Department of Transportation Dixon Springs Maintenance Storage Facility - P... Rt. 145 1 Mi. S Of Rt. 146 Dixon Springs 62943 Pope 15 John Shimkus 118 Phelps Brandon W. ... In Use 1970.0 1970.0 240 1 1 0 Storage Storage Not provided
1970.0 Department of Transportation Anna Maintenance Storage Facility - Union County 215 North Lime Kiln Road Anna 62906 Union 12 Mike Bost 118 Phelps Brandon W. ... In Use 1970.0 1970.0 612 1 1 0 Storage Storage Not provided

5 rows × 22 columns
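Once the index holds years rather than unique row numbers, a single label can match many rows, which is why df.loc[1970] above returns a table rather than one record. A minimal sketch on a three-row toy frame (values chosen for illustration) shows the same behavior; set_index with drop=False is used here as an assumed equivalent of assigning the column to df.index.

```python
import pandas as pd

# Toy data with a repeated year; the numbers are illustrative only.
toy = pd.DataFrame({
    "Year Acquired": [1970, 1974, 1970],
    "Square Footage": [10000, 112, 1440],
})

# Use the year as the index while keeping it as a regular column too.
indexed = toy.set_index("Year Acquired", drop=False)

# The label 1970 appears twice, so .loc returns both matching rows.
rows_1970 = indexed.loc[1970]

print(rows_1970)
print(indexed.index.is_unique)
```

Checking index.is_unique is a quick way to find out whether a label lookup can return more than one row.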


In [95]:
df.loc[1974]


Out[95]:
Agency Name Location Name Address City Zip code County Congress Dist Congressional Full Name Rep Dist Rep Full Name ... Bldg Status Year Acquired Year Constructed Square Footage Total Floors Floors Above Grade Floors Below Grade Usage Description Usage Description 2 Usage Description 3
Year Acquired
1974.0 Department of Human Services Howe Developmental Center - Tinley Park 7600 West 183rd Street Tinley Park 60477 Cook 0 NaN 38 Riley Al ... In Use 1974.0 1974.0 112 1 1 0 Storage Storage Not provided
1974.0 Department of Natural Resources Union County Conservation Area R. R. 2 Jonesboro 62952 Union 12 Mike Bost 115 Bryant Terri ... In Use 1974.0 1974.0 120 1 1 0 Storage Storage Not provided
1974.0 Department of Central Management Services Statewide Program 4200 North Oak Park Ave Chicago 60634 Statewide 0 NaN 119 District Multiple ... Abandon 1974.0 1974.0 2000 1 1 0 Unusual Unusual Unusual
1974.0 Department of Natural Resources Sand Ridge Forest - Mason County 25799 E County Road 2300 N. Forest City 61532 Mason 18 Darin M. LaHood 93 Hammond Norine K. ... In Use 1974.0 1974.0 1800 1 1 0 Storage Storage Not provided
1974.0 Department of Natural Resources Mississippi State Fish & Wildlife Area R. R. Box 182 Grafton 62037 Jersey 17 Cheri Bustos 97 Batinick Mark ... In Use 1974.0 1974.0 27 1 1 0 Unusual Unusual Not provided
1974.0 Department of Natural Resources Mississippi State Fish & Wildlife Area R. R. Box 182 Grafton 62037 Jersey 17 Cheri Bustos 97 Batinick Mark ... In Use 1974.0 1974.0 27 1 1 0 Unusual Unusual Not provided
1974.0 Department of Natural Resources Woodford County Conservation Area R. R. #1 Lowpoint 61545 Woodford 18 Darin M. LaHood 73 Spain Ryan ... In Use 1974.0 1974.0 560 1 1 0 Assembly Assembly Not provided
1974.0 Department of Natural Resources Woodford County Conservation Area R. R. #1 Lowpoint 61545 Woodford 18 Darin M. LaHood 73 Spain Ryan ... In Use 1974.0 1974.0 160 1 1 0 Unusual Unusual Not provided
1974.0 Department of Natural Resources Woodford County Conservation Area R. R. #1 Lowpoint 61545 Woodford 18 Darin M. LaHood 73 Spain Ryan ... In Use 1974.0 1974.0 160 1 1 0 Unusual Unusual Not provided
1974.0 Department of Natural Resources Kaskaskia River Fish & Wildlife Area - Randolp... Rt. 1, Box 49 Baldwin 62217 Randolph 12 Mike Bost 116 Costello, II Jerry ... In Use 1974.0 1974.0 1470 1 1 0 Storage Storage Not provided
1974.0 Department of Natural Resources Apple River Canyon State Park - Jo Daviess County 8763 E. Canyon Rd. Apple River 61001 Jo Daviess 17 Cheri Bustos 89 Stewart Brian W. ... In Use 1974.0 1974.0 380 1 1 0 Unusual Unusual Not provided
1974.0 Department of Natural Resources Ferne Clyffe State Park - Johnson County Rte 38 So Goreville 62939 Johnson 15 John Shimkus 118 Phelps Brandon W. ... In Use 1974.0 1974.0 18 1 1 0 Unusual Unusual Not provided
1974.0 Department of Natural Resources Ferne Clyffe State Park - Johnson County Rte 37 So Goreville 62939 Johnson 15 John Shimkus 118 Phelps Brandon W. ... In Use 1974.0 1974.0 18 1 1 0 Unusual Unusual Not provided
1974.0 Department of Natural Resources Illinois Beach State Park - Lake County Il Beach State Park, Ranger Zion 60099 Lake 10 Robert Dold 61 Jesiel Sheri ... In Use 1974.0 1974.0 2157 2 1 1 Business Business Not provided
1974.0 Department of Natural Resources Kickapoo State Park - Vermilion County Rr #1, Box 374 Oakwood 61858 Vermilion 15 John Shimkus 104 Hays Chad ... In Use 1974.0 1974.0 20 1 1 0 Unusual Unusual Not provided
1974.0 Department of Natural Resources Kickapoo State Park - Vermilion County Rr #1, Box 374 Oakwood 61858 Vermilion 15 John Shimkus 104 Hays Chad ... In Use 1974.0 1974.0 20 1 1 0 Unusual Unusual Not provided
1974.0 Department of Natural Resources Kickapoo State Park - Vermilion County Rr #1, Box 374 Oakwood 61858 Vermilion 15 John Shimkus 104 Hays Chad ... In Use 1974.0 1974.0 20 1 1 0 Unusual Unusual Not provided
1974.0 Department of Natural Resources Kickapoo State Park - Vermilion County Rr #1, Box 374 Oakwood 61858 Vermilion 15 John Shimkus 104 Hays Chad ... In Use 1974.0 1974.0 20 1 1 0 Unusual Unusual Not provided
1974.0 Department of Natural Resources Kickapoo State Park - Vermilion County Rr #1, Box 374 Oakwood 61858 Vermilion 15 John Shimkus 104 Hays Chad ... In Use 1974.0 1974.0 20 1 1 0 Unusual Unusual Not provided
1974.0 Department of Natural Resources Kickapoo State Park - Vermilion County Rr #1, Box 374 Oakwood 61858 Vermilion 15 John Shimkus 104 Hays Chad ... In Use 1974.0 1974.0 20 1 1 0 Unusual Unusual Not provided
1974.0 Department of Natural Resources Kickapoo State Park - Vermilion County Rr #1, Box 374 Oakwood 61858 Vermilion 15 John Shimkus 104 Hays Chad ... In Use 1974.0 1974.0 20 1 1 0 Unusual Unusual Not provided
1974.0 Department of Natural Resources Kickapoo State Park - Vermilion County Rr #1, Box 374 Oakwood 61858 Vermilion 15 John Shimkus 104 Hays Chad ... In Use 1974.0 1974.0 20 1 1 0 Unusual Unusual Not provided
1974.0 Department of Natural Resources Kickapoo State Park - Vermilion County Rr #1, Box 374 Oakwood 61858 Vermilion 15 John Shimkus 104 Hays Chad ... In Use 1974.0 1974.0 20 1 1 0 Unusual Unusual Not provided
1974.0 Department of Natural Resources Kickapoo State Park - Vermilion County Rr #1, Box 374 Oakwood 61858 Vermilion 15 John Shimkus 104 Hays Chad ... In Use 1974.0 1974.0 20 1 1 0 Unusual Unusual Not provided
1974.0 Department of Natural Resources Kickapoo State Park - Vermilion County Rr #1, Box 374 Oakwood 61858 Vermilion 15 John Shimkus 104 Hays Chad ... In Use 1974.0 1974.0 20 1 1 0 Unusual Unusual Not provided
1974.0 Department of Natural Resources Kickapoo State Park - Vermilion County Rr #1, Box 374 Oakwood 61858 Vermilion 15 John Shimkus 104 Hays Chad ... In Use 1974.0 1974.0 20 1 1 0 Unusual Unusual Not provided
1974.0 Department of Natural Resources Kickapoo State Park - Vermilion County Rr #1, Box 374 Oakwood 61858 Vermilion 15 John Shimkus 104 Hays Chad ... In Use 1974.0 1974.0 20 1 1 0 Unusual Unusual Not provided
1974.0 Department of Natural Resources Kickapoo State Park - Vermilion County Rr #1, Box 374 Oakwood 61858 Vermilion 15 John Shimkus 104 Hays Chad ... In Use 1974.0 1974.0 20 1 1 0 Unusual Unusual Not provided
1974.0 Department of Natural Resources Lake Le-Aqua-Na State Park - Stephenson County 8542 North Lake Road Lena 61048 Stephenson 17 Cheri Bustos 89 Stewart Brian W. ... In Use 1974.0 1974.0 560 1 1 0 Assembly Assembly Not provided
1974.0 Department of Natural Resources Johnson-Sauk Trail State Park - Henry County 27500 N. 1200 Avenue Kewanee 61443 Henry 17 Cheri Bustos 74 Swanson Daniel ... In Use 1974.0 1974.0 50 1 1 0 Unusual Unusual Not provided
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1974.0 Department of Central Management Services Statewide Program 4200 North Oak Park Ave Chicago 60634 Statewide 0 NaN 119 District Multiple ... Abandon 1974.0 1974.0 14725 2 2 0 Health Care Health Care Assembly
1974.0 Department of Central Management Services Statewide Program 4200 North Oak Park Ave Chicago 60634 Statewide 0 NaN 119 District Multiple ... Abandon 1974.0 1974.0 14725 2 2 0 Health Care Health Care Assembly
1974.0 Department of Central Management Services Statewide Program 4200 North Oak Park Ave Chicago 60634 Statewide 0 NaN 119 District Multiple ... In Use 1974.0 1974.0 54600 4 3 1 Education Education Assembly
1974.0 Department of Central Management Services Statewide Program 4200 North Oak Park Ave Chicago 60634 Statewide 0 NaN 119 District Multiple ... In Use 1974.0 1974.0 28000 4 3 1 Assembly Assembly Education
1974.0 Department of Central Management Services Statewide Program 4200 North Oak Park Ave Chicago 60634 Statewide 0 NaN 119 District Multiple ... In Use 1974.0 1974.0 15000 2 1 1 Industrial Industrial Not provided
1974.0 Department of Corrections Stateville Correctional Center - Joliet Rt 53 & Division St Joliet 60434 Will 11 Bill Foster 86 Walsh, Jr. Lawrence M. ... Abandon 1974.0 1974.0 1050 1 1 0 Detention & Correc Education Not provided
1974.0 Department of Corrections Menard Correctional Center - Randolph County Route 3 & Rainbow Drive Menard 62259 Randolph 12 Mike Bost 116 Costello, II Jerry ... In Use 1974.0 1974.0 288 2 2 0 Detention & Correc Detention & Correc Not provided
1974.0 Department of Corrections Menard Correctional Center - Randolph County Route 3 & Rainbow Drive Menard 62259 Randolph 12 Mike Bost 116 Costello, II Jerry ... In Use 1974.0 1974.0 25 1 1 0 Detention & Correc Detention & Correc Not provided
1974.0 Department of Corrections Vienna Correctional Center - Johnson County P.o. Box 200, Hwy 146e Vienna 62995 Johnson 15 John Shimkus 118 Phelps Brandon W. ... In Use 1974.0 1974.0 50 1 1 0 Business Business Not provided
1974.0 Department of Juvenile Justice Illinois Youth Center - Warrenville 30w200 Ferry Road Warrenville 60555 DuPage 6 Peter J. Roskam 41 Wehrli Grant ... In Use 1974.0 1974.0 1000 1 1 0 Storage Storage Not provided
1974.0 Department of Juvenile Justice Illinois Youth Center - Warrenville 30w200 Ferry Road Warrenville 60555 DuPage 6 Peter J. Roskam 41 Wehrli Grant ... In Use 1974.0 1974.0 4295 1 1 0 Detention & Correc Detention & Correc Residential
1974.0 Department of Juvenile Justice Illinois Youth Center - Warrenville 30w200 Ferry Road Warrenville 60555 DuPage 6 Peter J. Roskam 41 Wehrli Grant ... In Use 1974.0 1974.0 4295 1 1 0 Detention & Correc Detention & Correc Residential
1974.0 Department of Juvenile Justice Illinois Youth Center - Warrenville 30w200 Ferry Road Warrenville 60555 DuPage 6 Peter J. Roskam 41 Wehrli Grant ... In Use 1974.0 1974.0 4295 1 1 0 Detention & Correc Detention & Correc Residential
1974.0 Department of Juvenile Justice Illinois Youth Center - St. Charles 38 West 060 Rte 38 St Charles 60174 Kane 14 Randy Hultgren 50 Wheeler Keith R. ... In Use 1974.0 1974.0 288 1 1 0 Industrial Industrial Not provided
1974.0 Department of Corrections Menard Correctional Center - Randolph County Route 3 & Rainbow Drive Menard 62259 Randolph 12 Mike Bost 116 Costello, II Jerry ... In Use 1974.0 1974.0 144 3 2 1 Detention & Correc Detention & Correc Not provided
1974.0 Department of Corrections Menard Correctional Center - Randolph County Route 3 & Rainbow Drive Menard 62259 Randolph 12 Mike Bost 116 Costello, II Jerry ... In Use 1974.0 1936.0 116 3 2 1 Detention & Correc Detention & Correc Not provided
1974.0 Department of Transportation Buckley Maintenance Storage Facility - Iroquoi... I 57 Buckley 60918 Iroquois 16 Adam Kinzinger 106 Bennett Thomas M. ... In Use 1974.0 1973.0 3200 1 1 0 Unusual Unusual Not provided
1974.0 Department of Transportation Buckley Maintenance Storage Facility - Iroquoi... I 57 Buckley 60918 Iroquois 16 Adam Kinzinger 106 Bennett Thomas M. ... In Use 1974.0 1973.0 3200 1 1 0 Unusual Unusual Not provided
1974.0 Department of Transportation Wyoming Maintenance Storage Facility - Stark C... South Seventh Street Wyoming 61491 Marshall 18 Darin M. LaHood 73 Spain Ryan ... In Use 1974.0 1974.0 4224 1 1 0 Storage Storage Not provided
1974.0 IL State Board of Education The Philip J. Rock Center and School - Glen Ellyn Rte 38 & 53 Glen Ellyn 60137 DuPage 6 Peter J. Roskam 48 Breen Peter ... In Use 1974.0 1974.0 19147 3 2 1 Education Education Residential
1974.0 Department of State Police Sterling District 1 - Whiteside County 3107 East Lincolnway Sterling 61081 Whiteside 17 Cheri Bustos 71 McCombie Tony ... In Use 1974.0 1974.0 192 1 1 0 Storage Storage Not provided
1974.0 Department of State Police Effingham District 12 - Effingham County 401 Industrial Ave Effingham 62401 Effingham 15 John Shimkus 107 Cavaletto John ... In Use 1974.0 1974.0 280 1 1 0 Unusual Unusual Not provided
1974.0 Office of the Secretary of State Motor Vehicle Services Facility - Springfield 2701 Dirksen Parkway Springfield 62703 Sangamon 13 Rodney L. Davis 96 Scherer Sue ... In Use 1974.0 1974.0 131400 2 2 0 Business Business Not provided
1974.0 University of Illinois University of Illinois Urbana-Champaign 50 East Gerty Drive Champaign 61820 Champaign 13 Rodney L. Davis 103 Ammons Carol ... In Use 1974.0 1974.0 32017 3 2 1 Business Business Not provided
1974.0 University of Illinois University of Illinois Urbana-Champaign 11 Airport Road Savoy 61874 Champaign 13 Rodney L. Davis 103 Ammons Carol ... In Use 1974.0 1974.0 2750 1 1 0 Industrial Storage Not provided
1974.0 Southern Illinois University Southern Illinois University - Carbondale 1000 Faner Drive Carbondale 62901 Jackson 12 Mike Bost 115 Bryant Terri ... In Use 1974.0 1974.0 277831 7 6 1 Education Unusual Not provided
1974.0 Northern Illinois University Northern Illinois University - DeKalb Northern Illinois University Dekalb 60115 DeKalb 16 Adam Kinzinger 70 Pritchard Robert W. ... In Use 1974.0 1974.0 4698 1 1 0 Storage Storage Not provided
1974.0 University of Illinois University of Illinois - Springfield County Road 1 South Mechanicsburg 62794 Sangamon 13 Rodney L. Davis 99 Wojcicki Jimene Sara ... In Use 1974.0 1974.0 800 1 1 0 Industrial Industrial Not provided
1974.0 University of Illinois University of Illinois - Springfield 1301 West Lake Drive Springfield 62794 Sangamon 13 Rodney L. Davis 99 Wojcicki Jimene Sara ... In Use 1974.0 1974.0 5594 4 3 1 Residential Residential Not provided
1974.0 University of Illinois University of Illinois - Springfield 1301 West Lake Drive Springfield 62794 Sangamon 13 Rodney L. Davis 99 Wojcicki Jimene Sara ... In Use 1974.0 1974.0 971 1 1 0 Storage Storage Not provided

192 rows × 22 columns


In [96]:
df.loc[0]


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py in _has_valid_type(self, key, axis)
   1433                 if not ax.contains(key):
-> 1434                     error()
   1435             except TypeError as e:

/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py in error()
   1428                 raise KeyError("the label [%s] is not in the [%s]" %
-> 1429                                (key, self.obj._get_axis_name(axis)))
   1430 

KeyError: 'the label [0] is not in the [index]'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-96-549f4325178e> in <module>()
----> 1 df.loc[0]

/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1326         else:
   1327             key = com._apply_if_callable(key, self.obj)
-> 1328             return self._getitem_axis(key, axis=0)
   1329 
   1330     def _is_scalar_access(self, key):

/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1549 
   1550         # fall thru to straight lookup
-> 1551         self._has_valid_type(key, axis)
   1552         return self._get_label(key, axis=axis)
   1553 

/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py in _has_valid_type(self, key, axis)
   1440                 raise
   1441             except:
-> 1442                 error()
   1443 
   1444         return True

/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py in error()
   1427                                     "key")
   1428                 raise KeyError("the label [%s] is not in the [%s]" %
-> 1429                                (key, self.obj._get_axis_name(axis)))
   1430 
   1431             try:

KeyError: 'the label [0] is not in the [index]'
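
The KeyError above arises because .loc looks rows up by index label, not by position, and the frame being queried has no row labelled 0. A minimal sketch of the distinction follows, using a made-up three-row frame (the name toy and its values are illustrative assumptions, not the building inventory itself):

```python
import pandas as pd

# Hypothetical miniature stand-in for the building inventory.
toy = pd.DataFrame({
    "Year Acquired": [1974.0, 1974.0, 1975.0],
    "Square Footage": [25, 50, 144],
}).set_index("Year Acquired")

# .loc looks up index *labels*; no row is labelled 0 here, so it raises.
try:
    toy.loc[0]
    label_found = True
except KeyError:
    label_found = False

# .iloc looks up *positions*, so position 0 is simply the first row.
first_row = toy.iloc[0]

# A label that does exist returns every row that carries it.
rows_1974 = toy.loc[1974.0]
```

Because index labels need not be unique, rows_1974 holds two rows here, while first_row is a single row.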

In [100]:
df = pd.read_csv("data-readonly/IL_Building_Inventory.csv",
                 na_values={'Year Acquired': 0, 'Year Constructed': 0})
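
Passing a dictionary to na_values marks a placeholder as missing in the named columns only. The sketch below reads a small in-memory CSV (the contents are invented for illustration) to show the effect; the 0 in the untouched column is assumed to be a real value and survives.

```python
import io
import pandas as pd

# Invented sample data: 0 stands for "unknown year" in the two year columns.
csv_text = (
    "Name,Year Acquired,Year Constructed,Floors Below Grade\n"
    "A,1975,0,0\n"
    "B,0,1980,1\n"
    "C,2004,2004,0\n"
)

toy = pd.read_csv(
    io.StringIO(csv_text),
    na_values={"Year Acquired": 0, "Year Constructed": 0},
)

# Zeros in the listed columns became NaN; other columns are unaffected.
missing_per_column = toy.isna().sum()
```

Columns that now contain NaN are stored as floats, which is consistent with the years in this notebook displaying as 1974.0 rather than 1974.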

In [101]:
df.index


Out[101]:
RangeIndex(start=0, stop=8862, step=1)

In [104]:
df2 = df.set_index("Year Acquired")

In [105]:
df2.index


Out[105]:
Float64Index([1975.0, 2004.0, 2004.0, 2004.0, 2004.0, 2004.0, 2000.0, 2000.0,
              2000.0, 2000.0,
              ...
              2017.0, 2019.0, 2019.0,    nan,    nan,    nan,    nan,    nan,
              1971.0,    nan],
             dtype='float64', name='Year Acquired', length=8862)

In [106]:
df2.loc[1975].head()


Out[106]:
Agency Name Location Name Address City Zip code County Congress Dist Congressional Full Name Rep Dist Rep Full Name ... Senator Full Name Bldg Status Year Constructed Square Footage Total Floors Floors Above Grade Floors Below Grade Usage Description Usage Description 2 Usage Description 3
Year Acquired
1975.0 Department of Natural Resources Anderson Lake Conservation Area - Fulton County Anderson Lake C.a. Astoria 61501 Fulton 17 Cheri Bustos 93 Hammond Norine K. ... Jil Tracy In Use 1975.0 144 1 1 0 Unusual Unusual Not provided
1975.0 Department of State Police Effingham District 12 - Effingham County 2 Miles East Of Mason Effingham 62454 Effingham 15 John Shimkus 107 Cavaletto John ... Kyle McCarter In Use 1975.0 120 1 1 0 Industrial Industrial Not provided
1975.0 Department of Natural Resources Pyramid State Park - Perry County Rr #1, Box298 Pinckneyville 62274 Perry 12 Mike Bost 116 Costello, II Jerry ... Paul Schimpf In Use 1975.0 2400 1 1 0 Business Storage Not provided
1975.0 Department of Natural Resources Wolf Creek State Park R.r. 1 Box 99 Windsor 62534 Shelby 15 John Shimkus 102 Halbrook Brad ... Chapin Rose In Use 1975.0 1860 1 1 0 Unusual Unusual Not provided
1975.0 Department of Natural Resources Wolf Creek State Park R.r. 1 Box 99 Windsor 62534 Shelby 15 John Shimkus 102 Halbrook Brad ... Chapin Rose In Use 1975.0 20 1 1 0 Unusual Unusual Not provided

5 rows × 21 columns


In [109]:
df2.iloc[[1974, 1975]]


Out[109]:
Agency Name Location Name Address City Zip code County Congress Dist Congressional Full Name Rep Dist Rep Full Name ... Senator Full Name Bldg Status Year Constructed Square Footage Total Floors Floors Above Grade Floors Below Grade Usage Description Usage Description 2 Usage Description 3
Year Acquired
1963.0 Department of Natural Resources MARION CO FISH-R-KINMUNDY 6401 Mecham Road Kinmundy 62854 Marion 0 NaN 107 Cavaletto John ... Kyle McCarter In Use 1963.0 2000 2 2 0 Residential Residential Not provided
1965.0 Department of Natural Resources MARION CO FISH-R-KINMUNDY Sam Parr Biological Station Kinmundy 62854 Marion 0 NaN 107 Cavaletto John ... Kyle McCarter In Use 1965.0 2560 1 1 0 Industrial Industrial Not provided

2 rows × 21 columns


In [110]:
keith = df.set_index("City")

In [112]:
keith.loc["Kinmundy"].describe()


Out[112]:
           Zip code  Congress Dist    Rep Dist  Senate Dist  Year Acquired  Year Constructed  Square Footage  Total Floors  Floors Above Grade  Floors Below Grade
count     55.000000      55.000000   55.000000    55.000000      53.000000         53.000000       55.000000     55.000000           55.000000                55.0
mean   62682.454545      12.490909  107.800000    54.381818    1981.245283       1981.264151      670.600000      1.036364            1.036364                 0.0
std      186.461874       5.590576    2.444949     1.224607      16.208860         16.219825      998.700927      0.188919            0.188919                 0.0
min    62263.000000       0.000000  107.000000    54.000000    1950.000000       1950.000000       16.000000      1.000000            1.000000                 0.0
25%    62584.000000      15.000000  107.000000    54.000000    1974.000000       1974.000000       24.000000      1.000000            1.000000                 0.0
50%    62584.000000      15.000000  107.000000    54.000000    1974.000000       1974.000000      196.000000      1.000000            1.000000                 0.0
75%    62854.000000      15.000000  107.000000    54.000000    1996.000000       1996.000000      660.000000      1.000000            1.000000                 0.0
max    62854.000000      15.000000  117.000000    59.000000    2009.000000       2009.000000     4200.000000      2.000000            2.000000                 0.0

In [118]:
names = ["date", "city", "state", "country", "shape",
         "duration_seconds", "duration_reported", "description",
         "report_date", "latitude", "longitude"]

In [121]:
ufo = pd.read_csv("data-readonly/ufo-scrubbed-geocoded-time-standardized.csv",
                  names=names, parse_dates=["date", "report_date"])

In [122]:
ufo.dtypes


Out[122]:
date                 datetime64[ns]
city                         object
state                        object
country                      object
shape                        object
duration_seconds            float64
duration_reported            object
description                  object
report_date          datetime64[ns]
latitude                    float64
longitude                   float64
dtype: object
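
The dtypes above show both date columns parsed as datetimes. Supplying names suggests the file has no header row, though that is an assumption here. The sketch below reproduces the pattern on an invented two-line CSV so that the effect of names and parse_dates can be checked in isolation.

```python
import io
import pandas as pd

# Invented header-less sample in the same spirit as the UFO file.
raw = (
    "1949-10-10 20:30:00,san marcos,2700,2004-04-27\n"
    "1955-10-10 17:00:00,chester,20,2008-01-21\n"
)

toy = pd.read_csv(
    io.StringIO(raw),
    names=["date", "city", "duration_seconds", "report_date"],  # column labels we supply
    parse_dates=["date", "report_date"],  # convert these columns to datetimes
)

first_year = toy["date"].iloc[0].year
```

Without parse_dates the two columns would come back as plain strings, and operations such as min(), max() and resampling by date would not behave as date arithmetic.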

In [123]:
ufo.describe()


Out[123]:
       duration_seconds      latitude     longitude
count      8.033200e+04  80332.000000  80332.000000
mean       9.016889e+03     38.124416    -86.772885
std        6.202168e+05     10.469585     39.697205
min        1.000000e-03    -82.862752   -176.658056
25%        3.000000e+01     34.134722   -112.073333
50%        1.800000e+02     39.411111    -87.903611
75%        6.000000e+02     42.788333    -78.755000
max        9.783600e+07     72.700000    178.441900

In [129]:
sum_seconds = ufo.groupby("state")["duration_seconds"].sum()

In [131]:
sum_seconds.sort_values() / (365*24*3600)


Out[131]:
state
yk    0.000324
pe    0.000326
yt    0.000507
nf    0.000666
pr    0.000922
nt    0.001039
sa    0.001223
pq    0.001751
nb    0.002321
dc    0.003645
sk    0.004072
nd    0.004837
de    0.005042
qc    0.005191
ns    0.005800
vt    0.010490
wy    0.011972
ne    0.013659
sd    0.016213
ri    0.016662
id    0.017015
ab    0.018725
mb    0.019460
ia    0.021290
md    0.024086
bc    0.027567
al    0.030532
ks    0.031364
mt    0.034749
nh    0.036332
        ...   
or    0.062261
il    0.073586
nc    0.075969
nv    0.083969
wi    0.086566
wv    0.096154
ut    0.111653
ms    0.115596
oh    0.121203
ky    0.121884
nm    0.134020
in    0.136016
co    0.144334
hi    0.215672
la    0.217719
nj    0.254109
mi    0.255138
tx    0.276491
ny    0.297516
pa    0.326689
ga    0.340368
ok    0.362630
ct    0.401774
va    0.435646
az    0.507069
ca    1.202704
fl    1.798539
wa    1.833226
ar    2.130185
on    2.684267
Name: duration_seconds, Length: 67, dtype: float64
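
The division by 365*24*3600 converts the per-state totals from seconds to (non-leap) years. A small sketch of the same split, aggregate, convert pattern on invented data follows; the frame toy and the constant SECONDS_PER_YEAR are names introduced only for this illustration.

```python
import pandas as pd

# Invented sightings: two in "il", three in "tx".
toy = pd.DataFrame({
    "state": ["il", "il", "tx", "tx", "tx"],
    "duration_seconds": [60.0, 120.0, 30.0, 30.0, 3600.0],
})

# Split the rows by state, then reduce one column per group.
per_state = toy.groupby("state")["duration_seconds"]
total = per_state.sum()
mean = per_state.mean()

# Convert the totals from seconds to 365-day years, smallest first.
SECONDS_PER_YEAR = 365 * 24 * 3600
total_years = total.sort_values() / SECONDS_PER_YEAR
```

The result of each aggregation is a Series indexed by state, which is why the output above can be sorted and rescaled like any other Series.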

In [132]:
states = ufo.groupby("state")

In [133]:
states["duration_seconds"].mean()


Out[133]:
state
ab      1773.351351
ak      4231.830508
al      1393.408828
ar    100867.138889
az      5946.797731
bc      1103.225660
ca      3928.374984
co      3024.394751
ct     13089.214928
dc      1161.224545
de       868.904372
fl     13504.459262
ga      7968.701633
hi     19267.543909
ia       949.626591
id       968.556498
il       877.362219
in      3094.797763
ks      1514.714395
ky      4205.386761
la     11481.575251
ma      1312.187776
mb      3959.309677
md       833.780790
me      3066.431122
mi      3885.088653
mn      1411.120564
mo      1144.388198
ms      8784.190361
mt      2148.723529
          ...      
nm      5185.816675
ns      1279.048951
nt      1638.100000
nv      2926.016519
ny      2914.711572
oh      1576.187773
ok     14929.384204
on     53441.317677
or      1064.210450
pa      3990.113091
pe       605.117647
pq       613.494444
pr       881.515152
qc       919.660112
ri      1811.881034
sa      1285.266667
sc      1102.498885
sd      2608.640306
sk      1310.357143
tn      1610.461526
tx      2371.343283
ut      4739.033647
va      9862.555276
vt      1077.587948
wa     13545.595628
wi      2047.964966
wv      6239.355967
wy      1841.629273
yk      1459.714286
yt      1228.769231
Name: duration_seconds, Length: 67, dtype: float64

In [153]:
(ufo.loc[ufo["duration_seconds"] > 900, ["state", "duration_seconds", "shape"]]
    .groupby("state")["duration_seconds"]
    .sum())


Out[153]:
state
ab      538980.0
ak     1424520.0
al      833760.0
ar    67072060.0
az    15454118.0
bc      752845.0
ca    36134927.0
co     4307084.0
ct    12486902.0
dc       98874.0
de      124918.0
fl    55930974.0
ga    10503360.0
hi     6738330.0
ia      536289.0
id      446312.0
il     1788347.0
in     4053510.0
ks      880575.0
ky     3690050.0
la     6759962.0
ma     1549755.0
mb      590280.0
md      608694.0
me     1806480.0
mi     7662699.0
mn     1350930.0
mo     1528211.0
ms     3568533.0
mt     1007850.0
         ...    
nm     4100389.0
ns      149136.0
nt       29400.0
nv     2480905.0
ny     8797885.0
oh     3359160.0
ok    11299680.0
on    84384178.0
or     1635428.0
pa     9832828.0
pe        6300.0
pq       43560.0
pr       20700.0
qc      138600.0
ri      471600.0
sa       36600.0
sc      983385.0
sd      479220.0
sk      116700.0
tn     1717685.0
tx     8110517.0
ut     3395018.0
va    13515652.0
vt      269160.0
wa    57106208.0
wi     2470803.0
wv     2940834.0
wy      346560.0
yk        9300.0
yt       12840.0
Name: duration_seconds, Length: 67, dtype: float64
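
The cell above combines three steps: a boolean mask selects the rows, a list of labels selects the columns, and groupby aggregates what is left. The sketch below separates those steps on invented data (toy, mask and the other names are illustrative only).

```python
import pandas as pd

toy = pd.DataFrame({
    "state": ["il", "il", "tx", "tx", "tx"],
    "duration_seconds": [60.0, 1200.0, 3600.0, 30.0, 1800.0],
    "shape": ["light", "disk", "oval", "light", "disk"],
})

# Step 1: a boolean Series, True where the sighting lasted over 15 minutes.
mask = toy["duration_seconds"] > 900

# Step 2: .loc takes the row mask and a list of column labels together.
long_sightings = toy.loc[mask, ["state", "duration_seconds"]]

# Step 3: aggregate only the surviving rows.
long_total = long_sightings.groupby("state")["duration_seconds"].sum()
```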

In [156]:
ufo["date"].min()


Out[156]:
Timestamp('1906-11-11 00:00:00')

In [157]:
ufo["date"].max()


Out[157]:
Timestamp('2014-05-08 18:45:00')

In [161]:
first_sighting = ufo.groupby("state")["date"].min()
last_sighting = ufo.groupby("state")["date"].max()
last_sighting - first_sighting


Out[161]:
state
ab   22476 days 23:46:00
ak   28325 days 23:00:00
al   27369 days 01:00:00
ar   23385 days 13:44:00
az   24686 days 01:45:00
bc   25576 days 06:45:00
ca   28023 days 23:30:00
co   30391 days 20:30:00
ct   23103 days 23:10:00
dc   22138 days 13:30:00
de   21801 days 08:00:00
fl   25498 days 07:30:00
ga   26270 days 04:50:00
hi   19564 days 23:30:00
ia   27321 days 20:20:00
id   24400 days 11:31:00
il   32268 days 05:30:00
in   34286 days 20:00:00
ks   30278 days 10:30:00
ky   25769 days 09:15:00
la   25832 days 22:15:00
ma   22293 days 08:00:00
mb   22967 days 03:10:00
md   24403 days 02:30:00
me   24749 days 01:00:00
mi   26637 days 11:00:00
mn   24433 days 07:30:00
mo   38111 days 21:21:00
ms   23670 days 04:30:00
mt   22211 days 10:32:00
             ...        
nm   25602 days 12:30:00
ns   15929 days 03:40:00
nt   10798 days 10:30:00
nv   24399 days 12:02:00
ny   30654 days 01:04:00
oh   24443 days 17:30:00
ok   24358 days 02:25:00
on   23063 days 11:02:00
or   30982 days 18:15:00
pa   24433 days 02:12:00
pe   27010 days 05:30:00
pq    9630 days 05:41:00
pr   18291 days 04:46:00
qc   21792 days 17:15:00
ri   25498 days 14:00:00
sa   16107 days 12:00:00
sc   27339 days 03:00:00
sd   20975 days 03:00:00
sk   14439 days 10:40:00
tn   26229 days 23:00:00
tx   37957 days 09:00:00
ut   25508 days 12:00:00
va   25133 days 21:10:00
vt   20454 days 00:20:00
wa   24782 days 00:00:00
wi   24415 days 07:15:00
wv   23743 days 07:45:00
wy   22934 days 06:00:00
yk    6349 days 20:50:00
yt    7553 days 08:56:00
Name: date, Length: 67, dtype: timedelta64[ns]
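
Subtracting the two Series works because pandas aligns them on their shared state index before subtracting, and the difference of two datetimes is a timedelta. A minimal sketch with invented dates:

```python
import pandas as pd

toy = pd.DataFrame({
    "state": ["il", "il", "tx"],
    "date": pd.to_datetime(["1950-01-01", "1950-01-11", "2000-06-01"]),
})

first = toy.groupby("state")["date"].min()
last = toy.groupby("state")["date"].max()

# Values are matched by index label ("il" with "il", "tx" with "tx").
span = last - first
```

A state with a single sighting, such as "tx" here, gets a span of zero.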

In [165]:
first_sighting.index


Out[165]:
Index(['ab', 'ak', 'al', 'ar', 'az', 'bc', 'ca', 'co', 'ct', 'dc', 'de', 'fl',
       'ga', 'hi', 'ia', 'id', 'il', 'in', 'ks', 'ky', 'la', 'ma', 'mb', 'md',
       'me', 'mi', 'mn', 'mo', 'ms', 'mt', 'nb', 'nc', 'nd', 'ne', 'nf', 'nh',
       'nj', 'nm', 'ns', 'nt', 'nv', 'ny', 'oh', 'ok', 'on', 'or', 'pa', 'pe',
       'pq', 'pr', 'qc', 'ri', 'sa', 'sc', 'sd', 'sk', 'tn', 'tx', 'ut', 'va',
       'vt', 'wa', 'wi', 'wv', 'wy', 'yk', 'yt'],
      dtype='object', name='state')

In [167]:
ufo["state"].nunique()


Out[167]:
67

In [169]:
ufo["country"].unique()


Out[169]:
array(['us', nan, 'gb', 'ca', 'au', 'de'], dtype=object)

In [173]:
ufo["country"] = ufo["country"].astype("category")
ufo["shape"] = ufo["shape"].astype("category")
ufo["state"] = ufo["state"].astype("category")
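
Converting to the category dtype stores each distinct value once and replaces the column with small integer codes, which suits columns such as state or shape that repeat a few values many times. The sketch below illustrates this on an invented Series; exact memory figures depend on the pandas version, so only the comparison is shown.

```python
import pandas as pd

# Invented column: 4000 entries but only three distinct values.
states = pd.Series(["il", "tx", "il", "ca"] * 1000)
as_cat = states.astype("category")

# The distinct values are kept once, in sorted order ...
categories = list(as_cat.cat.categories)

# ... and each row holds a small integer code pointing at one of them.
first_codes = as_cat.cat.codes[:4].tolist()

bytes_plain = states.memory_usage(deep=True)
bytes_cat = as_cat.memory_usage(deep=True)
```

A column like city, with thousands of distinct values, would likely benefit less from this conversion.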

In [174]:
ufo["city"].nunique()


Out[174]:
19900

In [175]:
ufo.shape


Out[175]:
(80332, 11)

In [179]:
ufo.groupby("city").count().nlargest(10, "date")


Out[179]:
             date  state  country  shape  duration_seconds  duration_reported  description  report_date  latitude  longitude
city
seattle       525    525      524    473               525                525          524          525       525        525
phoenix       454    454      454    438               454                454          454          454       454        454
portland      374    374      373    355               374                374          374          374       374        374
las vegas     368    368      367    357               368                368          368          368       368        368
los angeles   353    353      352    348               353                353          353          353       353        353
san diego     338    338      338    328               338                338          338          338       338        338
houston       297    297      297    293               297                297          296          297       297        297
chicago       265    265      264    257               265                265          265          265       265        265
tucson        241    241      241    237               241                241          241          241       241        241
miami         239    239      239    230               239                239          239          239       239        239
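
Note that count() reports the number of non-missing entries in each column, which is why the shape column above is smaller than date for most cities. The sketch below shows this on invented data and, as one possible alternative when only the number of rows per city is wanted, uses value_counts().

```python
import pandas as pd

toy = pd.DataFrame({
    "city": ["seattle", "seattle", "phoenix", "seattle", "phoenix", "miami"],
    "date": pd.to_datetime(["2001-01-01"] * 6),
    "shape": ["light", None, "disk", "light", "oval", "light"],  # one missing shape
})

# Non-missing entries per column, per city; keep the two busiest cities.
by_count = toy.groupby("city").count().nlargest(2, "date")

# Number of rows per city, largest first.
via_value_counts = toy["city"].value_counts().head(2)
```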

In [182]:
ufo.dtypes


Out[182]:
date                 datetime64[ns]
city                         object
state                      category
country                    category
shape                      category
duration_seconds            float64
duration_reported            object
description                  object
report_date          datetime64[ns]
latitude                    float64
longitude                   float64
dtype: object

In [184]:
shape_times = ufo.groupby("shape")["duration_seconds"].sum()

In [186]:
shape_times.index


Out[186]:
CategoricalIndex(['changed', 'changing', 'chevron', 'cigar', 'circle', 'cone',
                  'crescent', 'cross', 'cylinder', 'delta', 'diamond', 'disk',
                  'dome', 'egg', 'fireball', 'flare', 'flash', 'formation',
                  'hexagon', 'light', 'other', 'oval', 'pyramid', 'rectangle',
                  'round', 'sphere', 'teardrop', 'triangle', 'unknown'],
                 categories=['changed', 'changing', 'chevron', 'cigar', 'circle', 'cone', 'crescent', 'cross', ...], ordered=False, name='shape', dtype='category')

In [185]:
shape_times.plot()


Out[185]:
<matplotlib.axes._subplots.AxesSubplot at 0x7facde55af98>

In [189]:
shape_times.sort_values().plot()


Out[189]:
<matplotlib.axes._subplots.AxesSubplot at 0x7facdef309b0>

In [201]:
shape_times.nlargest(5)


Out[201]:
shape
light      2.181668e+08
sphere     1.173682e+08
other      1.165627e+08
circle     3.627088e+07
unknown    3.097290e+07
Name: duration_seconds, dtype: float64

In [203]:
shape_state = ufo.groupby(["state", "shape"])

In [205]:
times = shape_state["duration_seconds"].sum()

In [210]:
times.loc[ ["il", "mi", "oh"], ["sphere", "unknown"] ]


Out[210]:
state  shape  
il     sphere      90337.00
       unknown    130649.50
mi     sphere     126384.30
       unknown    232625.00
oh     sphere     132796.50
       unknown    702877.05
Name: duration_seconds, dtype: float64

In [212]:
times.loc["il":"ok", "sphere":"unknown"]


Out[212]:
state  shape   
il     sphere        90337.00
       teardrop      11508.00
       triangle     319862.01
       unknown      130649.50
in     sphere        64863.00
       teardrop       2528.00
       triangle     144674.00
       unknown      357716.00
ks     sphere        19858.00
       teardrop       6457.00
       triangle      39154.00
       unknown      211927.00
ky     sphere        32884.00
       teardrop       3420.00
       triangle      84589.50
       unknown       70038.00
la     sphere        69414.00
       teardrop       8852.00
       triangle      34620.00
       unknown     6356899.00
ma     sphere        81274.00
       teardrop       3847.00
       triangle      82519.00
       unknown      195458.00
mb     sphere         8690.00
       triangle      46193.00
       unknown       23527.00
md     sphere        47169.00
       teardrop       1777.00
       triangle      86253.00
                      ...    
nj     sphere        92185.00
       teardrop       3345.00
       triangle      83989.00
       unknown      719678.00
nm     sphere        74126.00
       teardrop      29955.50
       triangle      82508.39
       unknown      135486.20
ns     sphere        16569.00
       teardrop       1416.00
       triangle       4020.00
       unknown        7160.00
nt     sphere          900.00
       unknown         320.00
nv     sphere      1282555.00
       teardrop       8164.00
       triangle      36728.50
       unknown       60179.00
ny     sphere       178021.00
       teardrop      14063.00
       triangle     212210.30
       unknown      285847.00
oh     sphere       132796.50
       teardrop      26946.00
       triangle     577885.00
       unknown      702877.05
ok     sphere        38560.30
       teardrop       1280.00
       triangle     227335.00
       unknown      152659.00
Name: duration_seconds, Length: 107, dtype: float64
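
Grouping by two columns produces a Series with a two-level index, and .loc then accepts one selector per level: lists pick specific labels, while slices pick label ranges and include both endpoints. The sketch below uses invented data; slicing by label relies on the index being sorted, which the groupby result here is.

```python
import pandas as pd

toy = pd.DataFrame({
    "state": ["il", "il", "il", "mi", "mi", "mi", "oh", "oh", "oh"],
    "shape": ["light", "sphere", "unknown",
              "sphere", "triangle", "unknown",
              "light", "sphere", "unknown"],
    "duration_seconds": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0],
})

# Two grouping keys give a (state, shape) MultiIndex, sorted by both levels.
times = toy.groupby(["state", "shape"])["duration_seconds"].sum()

# Lists: exactly these states and exactly these shapes.
picked = times.loc[["il", "oh"], ["sphere", "unknown"]]

# Slices: every state from "il" to "mi" and every shape from "sphere" to "unknown".
sliced = times.loc["il":"mi", "sphere":"unknown"]
```

The slice on the second level is alphabetical, so "triangle" is included in sliced even though it was never named.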

In [217]:
unsorted_nonsense = times.sort_index()

In [219]:
unsorted_nonsense.loc["il":"ok"]


Out[219]:
state  shape    
il     changing       104672.00
       chevron         12885.00
       cigar           27140.00
       circle         248601.50
       cone             2404.00
       cross            2135.00
       cylinder        21268.00
       diamond         15441.00
       disk           109203.50
       egg            283910.00
       fireball        80942.00
       flash           15532.00
       formation       80328.75
       light          474224.81
       other          111285.00
       oval            87292.00
       rectangle       15598.00
       sphere          90337.00
       teardrop        11508.00
       triangle       319862.01
       unknown        130649.50
in     changing        24620.00
       chevron          1228.00
       cigar           16922.00
       circle          93794.50
       cone            16203.00
       cross              20.00
       cylinder         6960.00
       delta           14400.00
       diamond         13854.00
                       ...     
oh     formation       32233.00
       light          834661.00
       other          339451.50
       oval            62381.00
       rectangle        9556.00
       sphere         132796.50
       teardrop        26946.00
       triangle       577885.00
       unknown        702877.05
ok     changing        46870.00
       chevron           778.00
       cigar           10432.00
       circle       10593937.00
       cone            21050.00
       cross             495.00
       cylinder         7795.00
       diamond          4868.00
       disk            76342.50
       egg              3785.00
       fireball        20647.50
       flash             566.00
       formation        8672.00
       light          126701.00
       other           60161.00
       oval            25883.00
       rectangle        5088.00
       sphere          38560.30
       teardrop         1280.00
       triangle       227335.00
       unknown        152659.00
Name: duration_seconds, Length: 554, dtype: float64

In [221]:
unsorted_nonsense.loc["il":"ok"].plot()


Out[221]:
<matplotlib.axes._subplots.AxesSubplot at 0x7facde560e80>

In [226]:
ufo.set_index("date", inplace=True)

In [228]:
ufo.resample("A")["duration_seconds"].sum()


Out[228]:
date
1906-12-31    1.080000e+04
1907-12-31             NaN
1908-12-31             NaN
1909-12-31             NaN
1910-12-31    2.400000e+02
1911-12-31             NaN
1912-12-31             NaN
1913-12-31             NaN
1914-12-31             NaN
1915-12-31             NaN
1916-12-31    6.000000e+01
1917-12-31             NaN
1918-12-31             NaN
1919-12-31             NaN
1920-12-31    6.000000e+01
1921-12-31             NaN
1922-12-31             NaN
1923-12-31             NaN
1924-12-31             NaN
1925-12-31    6.000000e+01
1926-12-31             NaN
1927-12-31             NaN
1928-12-31             NaN
1929-12-31    6.000000e+01
1930-12-31    1.200000e+03
1931-12-31    1.860000e+03
1932-12-31             NaN
1933-12-31    1.800000e+03
1934-12-31    5.000000e+00
1935-12-31             NaN
                  ...     
1985-12-31    2.351740e+05
1986-12-31    2.206870e+05
1987-12-31    2.898583e+06
1988-12-31    5.649914e+06
1989-12-31    5.467870e+05
1990-12-31    2.728360e+05
1991-12-31    6.706685e+07
1992-12-31    2.911250e+05
1993-12-31    7.046784e+06
1994-12-31    1.106235e+07
1995-12-31    4.250211e+06
1996-12-31    3.552288e+06
1997-12-31    3.456926e+06
1998-12-31    7.337902e+06
1999-12-31    2.796181e+06
2000-12-31    2.956642e+06
2001-12-31    1.314867e+07
2002-12-31    6.844939e+07
2003-12-31    4.989784e+06
2004-12-31    2.253751e+07
2005-12-31    5.577597e+06
2006-12-31    1.137782e+07
2007-12-31    1.101497e+07
2008-12-31    2.058855e+07
2009-12-31    1.303383e+07
2010-12-31    9.210828e+07
2011-12-31    1.258026e+07
2012-12-31    6.820438e+07
2013-12-31    3.043190e+07
2014-12-31    1.930258e+06
Freq: A-DEC, Name: duration_seconds, Length: 109, dtype: float64

In [233]:
myplot = ufo.resample("10A")["duration_seconds"].sum().plot()
myplot.set_yscale('log')



In [235]:
r = ufo.resample("10A")

In [237]:
r["duration_seconds"].sum()


Out[237]:
date
1906-12-31    1.080000e+04
1916-12-31    3.000000e+02
1926-12-31    1.200000e+02
1936-12-31    6.305000e+03
1946-12-31    8.942870e+05
1956-12-31    4.835465e+05
1966-12-31    1.371158e+07
1976-12-31    3.597365e+07
1986-12-31    1.781056e+08
1996-12-31    1.026377e+08
2006-12-31    1.426284e+08
2016-12-31    2.498924e+08
Freq: 10A-DEC, Name: duration_seconds, dtype: float64

In [240]:
ufo.resample("W")["duration_seconds"].sum()


Out[240]:
date
1906-11-11      10800.00
1906-11-18           NaN
1906-11-25           NaN
1906-12-02           NaN
1906-12-09           NaN
1906-12-16           NaN
1906-12-23           NaN
1906-12-30           NaN
1907-01-06           NaN
1907-01-13           NaN
1907-01-20           NaN
1907-01-27           NaN
1907-02-03           NaN
1907-02-10           NaN
1907-02-17           NaN
1907-02-24           NaN
1907-03-03           NaN
1907-03-10           NaN
1907-03-17           NaN
1907-03-24           NaN
1907-03-31           NaN
1907-04-07           NaN
1907-04-14           NaN
1907-04-21           NaN
1907-04-28           NaN
1907-05-05           NaN
1907-05-12           NaN
1907-05-19           NaN
1907-05-26           NaN
1907-06-02           NaN
                 ...    
2013-10-20     113983.50
2013-10-27     218682.50
2013-11-03     192999.00
2013-11-10     218170.00
2013-11-17    1376755.50
2013-11-24     432983.00
2013-12-01     149513.50
2013-12-08     130273.00
2013-12-15      77970.00
2013-12-22      92732.50
2013-12-29     121536.00
2014-01-05    2408500.00
2014-01-12      93786.00
2014-01-19      96740.00
2014-01-26      75544.00
2014-02-02      65642.00
2014-02-09      49998.50
2014-02-16     254023.00
2014-02-23      81940.50
2014-03-02      81376.00
2014-03-09     103976.00
2014-03-16     114800.00
2014-03-23      73526.53
2014-03-30      78260.00
2014-04-06      84559.00
2014-04-13     162560.00
2014-04-20      69973.00
2014-04-27     259356.00
2014-05-04      76259.00
2014-05-11      26368.00
Freq: W-SUN, Name: duration_seconds, Length: 5610, dtype: float64
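
Resampling requires a datetime index and behaves like a groupby over regular time bins; "W" creates weekly bins that end on Sundays. The output above shows NaN for weeks with no sightings. More recent pandas releases return 0 for the sum of an empty bin unless min_count=1 is passed, which the sketch below (on invented data) demonstrates.

```python
import pandas as pd

# Invented sightings: two in the first week of 2014, one in the fourth.
toy = pd.DataFrame(
    {"duration_seconds": [60.0, 30.0, 600.0]},
    index=pd.to_datetime(
        ["2014-01-01 20:00", "2014-01-03 21:30", "2014-01-20 02:00"]
    ),
)

weekly = toy.resample("W")["duration_seconds"]

# Require at least one value per bin, so empty weeks come back as NaN.
weekly_nan = weekly.sum(min_count=1)

# Default behaviour in recent pandas: empty weeks sum to 0.
weekly_zero = weekly.sum()
```

Whether an empty week should read as NaN or as 0 seconds depends on the question being asked, so it is worth choosing deliberately before plotting.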

In [252]:
day_of_week = ufo.index.dayofweek

In [253]:
ufo["day_of_week"] = day_of_week

In [257]:
ufo.groupby("day_of_week")["duration_seconds"].sum().plot()


Out[257]:
<matplotlib.axes._subplots.AxesSubplot at 0x7facddbec860>
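
The dayofweek attribute of a datetime index numbers the days from Monday = 0 to Sunday = 6, which is how the x-axis of the plot above should be read. A short sketch with invented dates:

```python
import pandas as pd

# 2014-05-05 was a Monday, 2014-05-10 a Saturday, 2014-05-11 a Sunday.
toy = pd.DataFrame(
    {"duration_seconds": [10.0, 20.0, 30.0]},
    index=pd.to_datetime(["2014-05-05", "2014-05-10", "2014-05-11"]),
)

# Derive a grouping column from the index, then aggregate by it.
toy["day_of_week"] = toy.index.dayofweek
per_day = toy.groupby("day_of_week")["duration_seconds"].sum()
```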

In [260]:
ufo.groupby("state").sum().loc["tx"]


Out[260]:
duration_seconds    8.719429e+06
latitude            1.144141e+05
longitude          -3.584412e+05
day_of_week         1.137300e+04
Name: tx, dtype: float64

In [261]:
ufo.reset_index()


Out[261]:
date city state country shape duration_seconds duration_reported description report_date latitude longitude day_of_week
0 1949-10-10 20:30:00 san marcos tx us cylinder 2700.0 45 minutes This event took place in early fall around 194... 2004-04-27 29.883056 -97.941111 0
1 1949-10-10 21:00:00 lackland afb tx NaN light 7200.0 1-2 hrs 1949 Lackland AFB&#44 TX. Lights racing acros... 2005-12-16 29.384210 -98.581082 0
2 1955-10-10 17:00:00 chester (uk/england) NaN gb circle 20.0 20 seconds Green/Orange circular disc over Chester&#44 En... 2008-01-21 53.200000 -2.916667 0
3 1956-10-10 21:00:00 edna tx us circle 20.0 1/2 hour My older brother and twin sister were leaving ... 2004-01-17 28.978333 -96.645833 2
4 1960-10-10 20:00:00 kaneohe hi us light 900.0 15 minutes AS a Marine 1st Lt. flying an FJ4B fighter/att... 2004-01-22 21.418056 -157.803611 0
5 1961-10-10 19:00:00 bristol tn us sphere 300.0 5 minutes My father is now 89 my brother 52 the girl wit... 2007-04-27 36.595000 -82.188889 1
6 1965-10-10 21:00:00 penarth (uk/wales) NaN gb circle 180.0 about 3 mins penarth uk circle 3mins stayed 30ft above m... 2006-02-14 51.434722 -3.180000 6
7 1965-10-10 23:45:00 norwalk ct us disk 1200.0 20 minutes A bright orange color changing to reddish colo... 1999-10-02 41.117500 -73.408333 6
8 1966-10-10 20:00:00 pell city al us disk 180.0 3 minutes Strobe Lighted disk shape object observed clos... 2009-03-19 33.586111 -86.286111 0
9 1966-10-10 21:00:00 live oak fl us disk 120.0 several minutes Saucer zaps energy from powerline as my pregna... 2005-05-11 30.294722 -82.984167 0
10 1968-10-10 13:00:00 hawthorne ca us circle 300.0 5 min. ROUND &#44 ORANGE &#44 WITH WHAT I WOULD SAY W... 2003-10-31 33.916389 -118.351667 3
11 1968-10-10 19:00:00 brevard nc us fireball 180.0 3 minutes silent red /orange mass of energy floated by t... 2008-06-12 35.233333 -82.734444 3
12 1970-10-10 16:00:00 bellmore ny us disk 1800.0 30 min. silver disc seen by family and neighbors 2000-05-11 40.668611 -73.527500 5
13 1970-10-10 19:00:00 manchester ky us unknown 180.0 3 minutes Slow moving&#44 silent craft accelerated at an... 2008-02-14 37.153611 -83.761944 5
14 1971-10-10 21:00:00 lexington nc us oval 30.0 30 seconds green oval shaped light over my local church&#... 2010-02-14 35.823889 -80.253611 6
15 1972-10-10 19:00:00 harlan county ky us circle 1200.0 20minutes On october 10&#44 1972 myself&#44my 5yrs.daugh... 2005-09-15 36.843056 -83.321944 1
16 1972-10-10 22:30:00 west bloomfield mi us disk 120.0 2 minutes The UFO was so close&#44 my battery in the car... 2007-08-14 42.537778 -83.233056 1
17 1973-10-10 19:00:00 niantic ct us disk 1800.0 20-30 min Oh&#44 what a night &#33 Two (2) saucer-shape... 2003-09-24 41.325278 -72.193611 2
18 1973-10-10 23:00:00 bermuda nas NaN NaN light 20.0 20 sec. saw fast moving blip on the radar scope thin w... 2002-01-11 32.364167 -64.678611 2
19 1974-10-10 19:30:00 hudson ma us other 2700.0 45 minutes Not sure of the eact month or year of this sig... 1999-08-10 42.391667 -71.566667 3
20 1974-10-10 21:30:00 cardiff (uk/wales) NaN gb disk 1200.0 20 minutes back in 1974 I was 19 at the time and lived i... 2007-02-01 51.500000 -3.200000 3
21 1974-10-10 23:00:00 hudson ks us light 1200.0 one hour? The light chased us. 2004-07-25 38.105556 -98.659722 3
22 1975-10-10 17:00:00 north charleston sc us light 360.0 5-6 minutes Several Flashing UFO lights over Charleston Na... 2008-02-14 32.854444 -79.975000 4
23 1976-10-10 20:30:00 washougal wa us oval 60.0 1 minute Three extremely large lights hanging above nea... 2014-02-07 45.582778 -122.352222 6
24 1976-10-10 22:00:00 stoke mandeville (uk/england) NaN gb cigar 3.0 3 seconds White object over Buckinghamshire UK. 2009-12-12 51.783333 -0.783333 6
25 1977-10-10 12:00:00 san antonio tx us other 30.0 30 seconds i was about six or seven and my family and me ... 2005-02-24 29.423889 -98.493333 0
26 1977-10-10 22:00:00 louisville ky us light 30.0 approx: 30 seconds HBCCUFO CANADIAN REPORT: Pilot Sighting Of Un... 2004-03-17 38.254167 -85.759444 0
27 1978-10-10 02:00:00 elmont ny us rectangle 300.0 5min A memory I will never forget that happened men... 2007-02-01 40.700833 -73.713333 1
28 1979-10-10 00:00:00 poughkeepsie ny us chevron 900.0 15 minutes 1/4 moon-like&#44 its &#39chord&#39 or flat s... 2005-04-16 41.700278 -73.921389 2
29 1979-10-10 22:00:00 saddle lake (canada) ab NaN triangle 270.0 4.5 or more min. Lights far above&#44 that glance; then flee f... 2005-01-19 53.970571 -111.689885 2
... ... ... ... ... ... ... ... ... ... ... ... ...
80302 2012-09-09 20:00:00 wilson nc us light 10800.0 3 hours Bright orb being chased by a jet along with se... 2012-09-24 35.721111 -77.915833 6
80303 2012-09-09 20:10:00 elmont ny us circle 600.0 10 minutes Orange lights seen in Elmont&#44 Long Island&#... 2012-09-24 40.700833 -73.713333 6
80304 2012-09-09 20:30:00 mt. juliet tn us light 120.0 2 minutes Bright white light moving slowly across sky wi... 2012-09-24 36.200000 -86.518611 6
80305 2012-09-09 20:30:00 ventura ca us chevron 900.0 15 minutes Beautiful bright blue delta shaped aerobatics. 2012-09-24 34.278333 -119.292222 6
80306 2012-09-09 20:52:00 south jordan ut us circle 10.0 10 seconds Circular disk with blinking lights scares two ... 2012-09-24 40.562222 -111.928889 6
80307 2012-09-09 21:00:00 elkhart in us oval 600.0 10 minutes It was the night of sept 9 between 9 and 10 pm... 2012-09-24 41.681944 -85.976667 6
80308 2012-09-09 21:00:00 new york city (brooklyn) ny us light 1290.0 21:30 Glowing&#44 circular lights visible in the clo... 2012-09-24 40.714167 -74.006389 6
80309 2012-09-09 21:00:00 pawleys island sc us oval 60.0 less than a minute One large bright orange flanked by three small... 2012-09-24 33.433056 -79.121667 6
80310 2012-09-09 21:00:00 ventura ca us circle 300.0 5 minutes Bright Blue Object seen floating in sky near C... 2012-09-24 34.278333 -119.292222 6
80311 2012-09-09 21:55:00 charleston sc us flash 900.0 15 minutes Orb of light flashing reds and blues&#44 stati... 2012-09-24 32.776389 -79.931111 6
80312 2012-09-09 23:00:00 gainesville ga us light 5.0 5 seconds Ball of light 2012-09-24 34.297778 -83.824167 6
80313 2013-09-09 00:15:00 norfolk va us unknown 1.0 split second Two or three lights shoot across sky over nava... 2013-09-30 36.846667 -76.285556 0
80314 2013-09-09 01:50:00 buffalo (west of; on highway 90 west) ny us triangle 180.0 3 minutes Massive Flat Black triangle with 3 red lights. 2013-09-30 42.886389 -78.878611 0
80315 2013-09-09 03:00:00 struthers oh us unknown 120.0 2 minutes I saw a routaing line of stares that seemed to... 2013-09-09 41.052500 -80.608056 0
80316 2013-09-09 09:51:00 san diego ca us light 4.0 ~4 seconds 2 white lights zig-zag over Qualcomm Stadium (... 2013-09-30 32.715278 -117.156389 0
80317 2013-09-09 12:34:00 cedar park tx us cigar 8.0 5-8 seconds Cigar Shaped Object Descending in the Directio... 2013-09-09 30.505000 -97.820000 0
80318 2013-09-09 13:10:00 calmar (canada) ab ca unknown 90.0 45-90 seconds Fastest dot I have ever seen in the sky&#33 2013-09-09 53.250000 -113.783333 0
80319 2013-09-09 20:15:00 clifton nj NaN other 3600.0 ~1hr+ Luminous line seen in New Jersey sky. 2013-09-30 40.858433 -74.163755 0
80320 2013-09-09 20:20:00 tuscaloosa al us fireball 60.0 1:00 White/green object much larger than &quot;shoo... 2013-09-30 33.209722 -87.569167 0
80321 2013-09-09 20:21:00 clarksville tn us fireball 3.0 3 seconds Green fireball like object shooting across the... 2013-09-30 36.529722 -87.359444 0
80322 2013-09-09 21:00:00 aleksandrow (poland) NaN NaN light 15.0 15 seconds Two points of light following one another in a... 2013-09-30 50.465843 22.891814 0
80323 2013-09-09 21:00:00 gainesville fl us triangle 60.0 1 minute Three lights in the sky that didn&#39t look li... 2013-09-30 29.651389 -82.325000 0
80324 2013-09-09 21:00:00 hamstead (hollyridge) nc NaN light 120.0 2 minutes 8 to ten lights bright orange in color large t... 2013-09-30 34.367594 -77.710548 0
80325 2013-09-09 21:00:00 milton (canada) on ca fireball 180.0 3 minutes Massive Bright Orange Fireball in Sky 2013-09-30 46.300000 -63.216667 0
80326 2013-09-09 21:00:00 woodstock ga us sphere 20.0 20 seconds Driving 575 at 21:00 hrs saw a white and green... 2013-09-30 34.101389 -84.519444 0
80327 2013-09-09 21:15:00 nashville tn us light 600.0 10 minutes Round from the distance/slowly changing colors... 2013-09-30 36.165833 -86.784444 0
80328 2013-09-09 22:00:00 boise id us circle 1200.0 20 minutes Boise&#44 ID&#44 spherical&#44 20 min&#44 10 r... 2013-09-30 43.613611 -116.202500 0
80329 2013-09-09 22:00:00 napa ca us other 1200.0 hour Napa UFO&#44 2013-09-30 38.297222 -122.284444 0
80330 2013-09-09 22:20:00 vienna va us circle 5.0 5 seconds Saw a five gold lit cicular craft moving fastl... 2013-09-30 38.901111 -77.265556 0
80331 2013-09-09 23:00:00 edmond ok us cigar 1020.0 17 minutes 2 witnesses 2 miles apart&#44 Red &amp; White... 2013-09-30 35.652778 -97.477778 0

80332 rows × 12 columns


In [263]:
week = ufo.set_index( ["day_of_week", "state", "shape"] )
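Passing a list of column names to set_index moves those columns out of the table and into a hierarchical index (a MultiIndex). The sketch below illustrates this on a tiny made-up frame; `toy` and `toy_week` are hypothetical names and the rows are invented, only the column names follow the notebook.

```python
import pandas as pd

# Made-up rows standing in for the UFO table; only the column names match the notebook.
toy = pd.DataFrame({
    "day_of_week": [3, 0, 3],
    "state": ["il", "il", "ny"],
    "shape": ["cigar", "light", "cigar"],
    "duration_seconds": [60.0, 10.0, 30.0],
})

# The three key columns become index levels, in the order given.
toy_week = toy.set_index(["day_of_week", "state", "shape"])

print(type(toy_week.index).__name__)  # the index is now a MultiIndex
print(list(toy_week.index.names))     # names of the three levels
print(list(toy_week.columns))         # only the non-key column is left
print(toy_week.index[0])              # each row label is a tuple of level values
```

Note that set_index keeps the rows in their original order, so the resulting index is generally not sorted.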

In [275]:
week.loc[[0, 3], 'il', 'cigar']


/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py:1325: PerformanceWarning: indexing past lexsort depth may impact performance.
  return self._getitem_tuple(key)
---------------------------------------------------------------------------
UnsortedIndexError                        Traceback (most recent call last)
<ipython-input-275-dd41524e6dfe> in <module>()
----> 1 week.loc[[0, 3], 'il', 'cigar']

/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1323             except (KeyError, IndexError):
   1324                 pass
-> 1325             return self._getitem_tuple(key)
   1326         else:
   1327             key = com._apply_if_callable(key, self.obj)

/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
    834     def _getitem_tuple(self, tup):
    835         try:
--> 836             return self._getitem_lowerdim(tup)
    837         except IndexingError:
    838             pass

/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_lowerdim(self, tup)
    946         # we may have a nested tuples indexer here
    947         if self._is_nested_tuple_indexer(tup):
--> 948             return self._getitem_nested_tuple(tup)
    949 
    950         # we maybe be using a tuple to represent multiple dimensions here

/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_nested_tuple(self, tup)
   1008             # this is a series with a multi-index specified a tuple of
   1009             # selectors
-> 1010             return self._getitem_axis(tup, axis=0)
   1011 
   1012         # handle the multi-axis by taking sections and reducing

/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1543             # nested tuple slicing
   1544             if is_nested_tuple(key, labels):
-> 1545                 locs = labels.get_locs(key)
   1546                 indexer = [slice(None)] * self.ndim
   1547                 indexer[axis] = locs

/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/multi.py in get_locs(self, tup)
   2271                                      'to be fully lexsorted tuple len ({0}), '
   2272                                      'lexsort depth ({1})'
-> 2273                                      .format(len(tup), self.lexsort_depth))
   2274 
   2275         # indexer

UnsortedIndexError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (3), lexsort depth (0)'
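The message says the index has lexsort depth 0: the rows were never sorted by the index levels, so pandas cannot locate labels in it by searching. A minimal sketch of the failure and the usual remedy follows, on made-up data with hypothetical names (`toy`, `toy_week`, `toy_sorted`). Whether the unsorted lookup raises depends on the pandas version, so the sketch only records what happened rather than assuming an error. The selector is written in the explicit `(row_key, :)` form, which is an assumption of this sketch and not the notebook's spelling.

```python
import pandas as pd

# Made-up rows; day_of_week is deliberately out of order.
toy = pd.DataFrame({
    "day_of_week": [3, 0, 3, 0, 1],
    "state": ["il", "il", "ny", "il", "il"],
    "shape": ["cigar", "cigar", "cigar", "light", "cigar"],
    "duration_seconds": [60.0, 5.0, 30.0, 10.0, 120.0],
})
toy_week = toy.set_index(["day_of_week", "state", "shape"])

# An unsorted MultiIndex is not monotonic.
print(toy_week.index.is_monotonic_increasing)

# Older pandas raises UnsortedIndexError here; newer releases may accept
# list selectors on an unsorted index, so we only record the outcome.
try:
    toy_week.loc[([0, 3], "il", "cigar"), :]
    raised = False
except pd.errors.UnsortedIndexError:
    raised = True
print("raised UnsortedIndexError:", raised)

# Sorting the index makes the lookup well defined on any version.
toy_sorted = toy_week.sort_index()
picked = toy_sorted.loc[([0, 3], "il", "cigar"), :]
print(picked)
```

Sorting once up front is usually cheaper than repeated lookups against an unsorted index, which is also what the PerformanceWarning above is hinting at.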

In [278]:
new_week = week.sort_index()

In [283]:



Out[283]:
city country duration_seconds duration_reported description report_date latitude longitude
day_of_week state shape
0 ab cigar brooks (canada) ca 60.00 1 min Floating Barge seen near Suffield Air Base Alb... 2012-02-10 50.566667 -111.900000
cigar edmonton (canada) ca 2.00 a few seconds them object moved speedily throught the sky to... 2005-05-11 53.550000 -113.500000
cigar edmonton (canada) ca 360.00 6 mins two crafts&#44 one ontop of the other&#44 flyi... 2006-03-11 53.550000 -113.500000
circle edmonton (canada) ca 240.00 4 mins The black ball then seems to go strait up and ... 2007-08-07 53.550000 -113.500000
circle airdrie (canada) ca 30.00 30 seconds HBCCUFO CANADIAN REPORT: Large ring of white ... 2003-12-09 51.266667 -114.016667
cone grande prairie (canada) ca 300.00 5 min. Green Cone UFO 2005-04-16 55.166667 -118.800000
cylinder edmonton (canada) ca 2400.00 approx: 40 mins HBCCUFO CANADIAN REPORT: I noticed a black cy... 2003-08-01 53.550000 -113.500000
disk bragg creek (canada) NaN 28800.00 8 hours At about 2am&#44 A freind and I were awoken in... 2006-02-14 50.951740 -114.569234
disk st. albert (canada) ca 5.00 5 seconds White/Silver Saucer with white lights hovering... 2013-09-30 53.633333 -113.633333
fireball canmore (canada) ca 1.00 1 second Huge Green Fireball Dropping from sky by Lac D... 2012-11-19 51.100000 -115.350000
fireball calgary (canada) ca 240.00 2-4 min 12 ufos flew over head 2005-01-27 51.083333 -114.083333
fireball edmonton (canada) ca 2.00 2 seconds A green ball of light. 2001-10-12 53.550000 -113.500000
flash whitecourt (canada) ca 240.00 3-4 mins I took a minute to watch the star&#39s before ... 2005-05-11 54.133333 -115.683333
flash waiparous (west of calgary) (canada) NaN 1800.00 30 mins The speed and color was amazing to see. 2005-05-11 51.282376 -114.841158
flash lacombe (canada) ca 10.00 10 seconds bright blue flash followed by long blue streak 2010-08-24 52.466667 -113.733333
formation calgary (canada) ca 120.00 1-2 minutes Spherical White-Silver objects in formation se... 2005-07-05 51.083333 -114.083333
formation edmonton (canada) ca 3.00 2-3 seconds tight faint formation directly over Edmonton&#... 2004-04-09 53.550000 -113.500000
formation edmonton (canada) ca 10.00 5-10sec 3 or 4 orange round lights with tails in the f... 2004-06-04 53.550000 -113.500000
formation edmonton (canada) ca 10.00 5-10sec lights in the north 2004-06-04 53.550000 -113.500000
formation edmonton (canada) ca 20.00 20 seconds Silent&#44 fast&#44 low traveling lights fly o... 2004-05-10 53.550000 -113.500000
formation edmonton (canada) ca 20.00 20 seconds 7 objects travelling VERY FAST&#44 changing po... 2004-05-10 53.550000 -113.500000
light barrhead (canada) ca 90.00 1.5 minutes Bright white light by tower 2010-11-21 54.116667 -114.400000
light ponoka (canada) ca 3600.00 >1 hour Bright light&#44 watched for an hour&#44 watch... 2013-11-20 52.683333 -113.566667
light lethbridge (canada) ca 60.00 30-60 seconds Light in the sky passing over head&#44 then fa... 2013-12-23 48.366667 -53.866667
light cold lake (canada) ca 300.00 5 min Gold color slow moving star no noise over CFB... 2010-04-13 54.465000 -110.183056
light calgary (canada) ca 10.00 10 seconds Calgary Orb streaming across the night sky&#44... 2011-03-10 51.083333 -114.083333
light nanton (north of) (canada) ca 600.00 10 minutes On March 29th 2004 I saw a bright light in the... 2004-04-09 50.350000 -113.766667
light calgary (canada) ca 10.00 10 seconds 7-8 UFO&#39s and a pterodactyl sighting. 2010-04-13 51.083333 -114.083333
light edmonton (canada) ca 1200.00 10-20 minutes Bright star like object moves across sky in st... 2008-06-12 53.550000 -113.500000
light pidgeon lake (canada) NaN 1200.00 20 minutes Did I see crop circles being made? 2005-07-05 53.038633 -114.095330
... ... ... ... ... ... ... ... ... ... ...
2 NaN unknown paredes de coura (portugal) NaN 60.00 1 min A strange flying object almost crash into my car 2010-05-12 41.913112 -8.561438
unknown utrecht/amsterdam (between; utrecht&#44 noord)... NaN 900.00 15 minutes I was drivin on the A2 when I saw in the sunse... 1999-06-23 52.370216 4.895168
unknown atlantic ocean (off africa) NaN 30.00 0:30 While flying night navigational mission. We pi... 2004-06-18 -14.599413 -28.673147
unknown putten (netherlands) NaN 15.00 15 sec a strange object right above the sun in a picture 2005-05-24 52.258676 5.605373
unknown leicester (uk/england) gb 180.00 2-3 mins 3 bright lights and a small red light on a cir... 2005-05-24 52.664913 -1.034894
unknown zadar (croatia) NaN 10.00 10seconds ((HOAX??)) Red light in the sky&#44the object... 2011-05-29 44.119371 15.231365
unknown europe NaN 5.00 5 sec RADAR WARNING 2003-05-09 54.525961 15.255119
unknown scunthorpe (uk/england) gb 1200.00 20 mins A green floating unkown object up in the sky 2007-08-07 53.583333 -0.650000
unknown axminister (uk/england) NaN 300.00 5 min It was a round object with multicoloured light... 2011-05-02 50.782727 -2.994937
unknown london (uk/england) gb 60.00 1 minute London 3 witnesses - nothing special just a ni... 2003-07-23 51.514125 -0.093689
unknown dirksland (netherlands) NaN 1800.00 atleast 30 minutes Extremely bright light&#44 completely stationa... 2010-07-28 51.750635 4.092097
unknown barra do tejuco (rio de janeiro) (brazil) NaN 420.00 5-7 minutes Huge shapeless cluster of red and blue green l... 2003-07-16 -23.000371 -43.365895
unknown broadway (uk/england) gb 2.00 2sec bright red lights and fast 2005-10-11 51.764722 -4.472778
unknown lierskogen (norway) NaN 180.00 3 min. It looked as a pointed horseshoe. It was full ... 1999-04-26 59.819814 10.330936
unknown oxford (uk/england) gb 600.00 10 minutes Scanned by something 2001-10-12 51.750000 -1.250000
unknown warnambool&#44 vic (australia) NaN 0.05 0.05 seconds It was very quick. We have a picture of it. 1999-11-02 -38.382766 142.484499
NaN cancun (mexico) NaN 1200.00 20 minutes WAS JUST BEFORE SUNSET&#44 AIRCRAFT MIDLE A... 2003-12-09 21.161908 -86.851528
NaN monterrey (mexico) (outside city&#44 on large ... NaN 600.00 10+ minutes There was many objects stopping traffic on a m... 2003-03-04 25.686614 -100.316113
NaN cuiaba (brazil) NaN 41.00 41 seconds http://www.youtube.com/watch?v=7LcHGSG-0fc v&... 2014-03-18 -15.601411 -56.097892
NaN glasgow (near) (uk/scotland) gb 900.00 15mins Large white light hovered and followed train f... 2002-12-23 55.833333 -4.250000
NaN london (uk/england) gb 300.00 2 to 5 mins square with rounded edges and a dome at the top 2011-04-03 51.514125 -0.093689
NaN jafir (jordan) NaN 100.00 1:40 3 Saucers land in Jordan with tall Aliens head... 2010-04-13 27.330000 52.903333
NaN bombay (india) NaN 5.00 4to5 seconds I and my friend were watching the star.suddenl... 2001-08-05 19.075984 72.877656
NaN leicester (uk/england) gb 1200.00 15-20 min Alien standing at the bottom of my bed. 2006-10-30 52.664913 -1.034894
NaN casalabate (italy) NaN 180.00 3 minutes catena metallica dove le maglie di ferro erano... 2002-11-20 40.497171 18.120590
NaN barnoldswick (uk/england) gb 60.00 60 seconds Unidentified flying man 2003-08-28 53.916667 -2.183333
NaN puerto de mazarron (spain) NaN 1800.00 30 mins two objects and a bright flash 2001-11-20 37.564007 -1.266238
NaN plymouth (uk/england) gb 60.00 1 min i have video of a type of orb like light on my... 2005-10-11 50.396389 -4.138611
NaN ubud (indonesia) NaN 120.00 ~2 minutes Strong orange light flickering in the dark clo... 2012-10-30 -8.519268 115.263298
NaN fes (morocco) NaN 300.00 5 minutes standing on roof of apartment&#44 clear sunny ... 2006-12-07 34.033333 -5.000000

31833 rows × 8 columns
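Once the index is sorted, a range of labels on the outer level can be selected together with conditions on the inner levels. The sketch below uses `pd.IndexSlice` on made-up data; this is one possible spelling and is an assumption here, not necessarily the selector used in the notebook. `toy` and `toy_sorted` are hypothetical names.

```python
import pandas as pd

# Made-up rows; only the column names follow the notebook.
toy = pd.DataFrame({
    "day_of_week": [3, 0, 3, 0, 1],
    "state": ["il", "il", "ny", "il", "il"],
    "shape": ["cigar", "cigar", "cigar", "light", "cigar"],
    "duration_seconds": [60.0, 5.0, 30.0, 10.0, 120.0],
})
toy_sorted = toy.set_index(["day_of_week", "state", "shape"]).sort_index()

# Label slices are inclusive on both ends: days 0 through 1,
# any state, and only the "cigar" shape.
idx = pd.IndexSlice
early_cigars = toy_sorted.loc[idx[0:1, :, "cigar"], :]
print(early_cigars)
```

Slicing like this requires the sorted index; on an unsorted one it raises the same UnsortedIndexError seen earlier.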


In [285]:
ufo = ufo.reset_index()

In [289]:
ufo.index = ufo.date
ufo.index.dayofweek


Out[289]:
Int64Index([0, 0, 0, 2, 0, 1, 6, 6, 0, 0,
            ...
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
           dtype='int64', name='date', length=80332)
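The integers returned by dayofweek count from Monday = 0 through Sunday = 6. A short sketch that checks this on two timestamps taken from the table above (the frame `toy` is hypothetical, and `day_name()` assumes a reasonably recent pandas):

```python
import pandas as pd

# Two timestamps from the rows shown earlier in the notebook.
dates = pd.to_datetime(["2012-09-09 20:00:00", "2013-09-09 00:15:00"])
toy = pd.DataFrame({"date": dates, "city": ["wilson", "norfolk"]})

# Same pattern as the cell above: use the date column as the index.
toy.index = toy.date
via_index = list(toy.index.dayofweek)

# The .dt accessor gives the same numbers without touching the index.
via_column = list(toy["date"].dt.dayofweek)

# day_name() spells the numbers out.
names = list(toy.index.day_name())

print(via_index, via_column, names)
```

These values agree with the day_of_week column in the table: 6 for the 2012-09-09 rows (a Sunday) and 0 for the 2013-09-09 rows (a Monday).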
