In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
import dateutil.parser

First, I made a mistake naming the data set! It's 2015 data, not 2014 data. But yes, still use 311-2014.csv. You can rename it.

Importing and preparing your data

Import your data, but only the first 200,000 rows. You'll also want to change the index to be a datetime based on the Created Date column - you'll want to check if it's already a datetime, and parse it if not.


In [2]:
df = pd.read_csv("311-2014.csv", nrows=200000, low_memory = False)

In [3]:
df.head(3)
df.columns


Out[3]:
Index(['Unique Key', 'Created Date', 'Closed Date', 'Agency', 'Agency Name',
       'Complaint Type', 'Descriptor', 'Location Type', 'Incident Zip',
       'Incident Address', 'Street Name', 'Cross Street 1', 'Cross Street 2',
       'Intersection Street 1', 'Intersection Street 2', 'Address Type',
       'City', 'Landmark', 'Facility Type', 'Status', 'Due Date',
       'Resolution Description', 'Resolution Action Updated Date',
       'Community Board', 'Borough', 'X Coordinate (State Plane)',
       'Y Coordinate (State Plane)', 'Park Facility Name', 'Park Borough',
       'School Name', 'School Number', 'School Region', 'School Code',
       'School Phone Number', 'School Address', 'School City', 'School State',
       'School Zip', 'School Not Found', 'School or Citywide Complaint',
       'Vehicle Type', 'Taxi Company Borough', 'Taxi Pick Up Location',
       'Bridge Highway Name', 'Bridge Highway Direction', 'Road Ramp',
       'Bridge Highway Segment', 'Garage Lot Name', 'Ferry Direction',
       'Ferry Terminal Name', 'Latitude', 'Longitude', 'Location'],
      dtype='object')

In [4]:
type(df['Created Date'][0])
print(df['Created Date'][0])
dateutil.parser.parse(df['Created Date'][0])


07/06/2015 10:58:27 AM
Out[4]:
datetime.datetime(2015, 7, 6, 10, 58, 27)

In [5]:
def str_to_time(str_date):
    datetype_date = dateutil.parser.parse(str_date)
    return datetype_date

In [6]:
df['created_date'] = df['Created Date'].apply(str_to_time)

In [7]:
df.index=df['created_date']
df.head(3)


Out[7]:
Unique Key Created Date Closed Date Agency Agency Name Complaint Type Descriptor Location Type Incident Zip Incident Address ... Bridge Highway Direction Road Ramp Bridge Highway Segment Garage Lot Name Ferry Direction Ferry Terminal Name Latitude Longitude Location created_date
created_date
2015-07-06 10:58:27 31015465 07/06/2015 10:58:27 AM 07/22/2015 01:07:20 AM DCA Department of Consumer Affairs Consumer Complaint Demand for Cash NaN 11360 27-16 203 STREET ... NaN NaN NaN NaN NaN NaN 40.773540 -73.788237 (40.773539552542, -73.78823697228408) 2015-07-06 10:58:27
2015-07-03 13:26:29 30997660 07/03/2015 01:26:29 PM 07/03/2015 02:08:20 PM NYPD New York City Police Department Vending In Prohibited Area Residential Building/House 10019 200 CENTRAL PARK SOUTH ... NaN NaN NaN NaN NaN NaN 40.767021 -73.979448 (40.76702142171206, -73.97944780718524) 2015-07-03 13:26:29
2015-11-09 03:55:09 31950223 11/09/2015 03:55:09 AM 11/09/2015 08:08:57 AM NYPD New York City Police Department Blocked Driveway No Access Street/Sidewalk 10453 1993 GRAND AVENUE ... NaN NaN NaN NaN NaN NaN 40.852671 -73.910608 (40.85267061877697, -73.91060771362552) 2015-11-09 03:55:09

3 rows × 54 columns

What was the most popular type of complaint, and how many times was it filed?


In [8]:
df.groupby("Complaint Type").count()['Unique Key'].sort_values(ascending=False)


Out[8]:
Complaint Type
Blocked Driveway                       21779
Illegal Parking                        19837
HEAT/HOT WATER                         12408
Noise - Street/Sidewalk                11949
Noise - Commercial                      9603
Street Condition                        9008
Graffiti                                5993
Benefit Card Replacement                5859
Broken Muni Meter                       5659
UNSANITARY CONDITION                    4774
Derelict Vehicle                        4704
Noise - Vehicle                         4495
DOF Literature Request                  4382
PAINT/PLASTER                           4306
Consumer Complaint                      4030
Damaged Tree                            3558
Taxi Complaint                          3543
PLUMBING                                3388
HPD Literature Request                  3305
Overgrown Tree/Branches                 2949
Maintenance or Facility                 2517
DOOR/WINDOW                             2336
Rodent                                  2279
Dead Tree                               2210
Animal Abuse                            2166
WATER LEAK                              2047
Food Establishment                      1878
GENERAL                                 1875
ELECTRIC                                1865
Root/Sewer/Sidewalk Condition           1834
                                       ...  
Highway Sign - Damaged                     3
Calorie Labeling                           3
Recycling Enforcement                      3
Lifeguard                                  3
Transportation Provider Complaint          3
Highway Sign - Dangling                    3
Hazmat Storage/Use                         3
DFTA Literature Request                    3
Water Conservation                         2
BEST/Site Safety                           2
Electrical                                 2
Lead                                       2
Investigations and Discipline (IAD)        2
Sweeping/Inadequate                        2
Stalled Sites                              2
DHS Income Savings Requirement             2
Sprinkler - Mechanical                     2
Fire Alarm - Replacement                   2
Building Condition                         1
Rangehood                                  1
Industrial Waste                           1
Plant                                      1
Public Assembly - Temporary                1
Squeegee                                   1
Day Care                                   1
Unspecified                                1
Standpipe - Mechanical                     1
Tanning                                    1
DOF Property - Owner Issue                 1
X-Ray Machine/Equipment                    1
Name: Unique Key, dtype: int64

Make a horizontal bar graph of the top 5 most frequent complaint types.


In [9]:
plt.style.use('ggplot')
ax = df.groupby("Complaint Type").count()['Unique Key'].sort_values(ascending=True).tail(5).plot(kind='barh', figsize=(10,7))
ax.set_title("5 most frequent complaint types")
ax.set_xlabel("How many times was it filed?")


Out[9]:
<matplotlib.text.Text at 0x106b83b38>

Which borough has the most complaints per capita? Since it's only 5 boroughs, you can do the math manually.


In [10]:
df.groupby('Borough').count()['Unique Key'].sort_values(ascending=False)


Out[10]:
Borough
BROOKLYN         57129
QUEENS           46824
MANHATTAN        42050
BRONX            29610
Unspecified      17000
STATEN ISLAND     7387
Name: Unique Key, dtype: int64

In [11]:
#https://en.wikipedia.org/wiki/Brooklyn
inhabitants=[{'state': "BROOKLYN", 'inhabitants':2621793},
            {'state': "QUEENS", 'inhabitants': 2321580},
            {'state': "MANHATTAN", 'inhabitants': 1636268},
            {'state': "BRONX", 'inhabitants': 1438159},
            {'state': "STATEN ISLAND", 'inhabitants': 473279},]
inhabitantsdf=pd.DataFrame(inhabitants)
inhabitantsdf


Out[11]:
inhabitants state
0 2621793 BROOKLYN
1 2321580 QUEENS
2 1636268 MANHATTAN
3 1438159 BRONX
4 473279 STATEN ISLAND

In [12]:
print("Complaints per capita: Brooklyn", 57129/2621793)
print("Complaints per capita: Queens", 46824/2321580)
print("Complaints per capita: Manhattan", 42050/1636268)
print("Complaints per capita: Bronx", 29610/1438159)
print("Complaints per capita: Staten Island", 7387/473279)


Complaints per capita: Brooklyn 0.02179004978653921
Complaints per capita: Queens 0.020169022820665235
Complaints per capita: Manhattan 0.025698724169879263
Complaints per capita: Bronx 0.020588822237318682
Complaints per capita: Staten Island 0.015608129665588374

Manhattan has the most complaints per capita.

According to your selection of data, how many cases were filed in March? How about May?


In [13]:
march_cases = df["2015-03"]['Unique Key'].count()
may_cases = df["2015-05"]['Unique Key'].count()
print("Cases filed in March:", march_cases)
print("Cases filed in May:", may_cases)


Cases filed in March: 15025
Cases filed in May: 49715

I'd like to see all of the 311 complaints called in on April 1st.

Surprise! We couldn't do this in class, but it was just a limitation of our data set


In [14]:
df["2015-04-01"]


Out[14]:
Unique Key Created Date Closed Date Agency Agency Name Complaint Type Descriptor Location Type Incident Zip Incident Address ... Bridge Highway Direction Road Ramp Bridge Highway Segment Garage Lot Name Ferry Direction Ferry Terminal Name Latitude Longitude Location created_date
created_date
2015-04-01 21:37:42 30311691 04/01/2015 09:37:42 PM 04/01/2015 10:49:33 PM NYPD New York City Police Department Illegal Parking Blocked Sidewalk Street/Sidewalk 11234 NaN ... NaN NaN NaN NaN NaN NaN 40.609810 -73.922498 (40.60980966645303, -73.92249759633725) 2015-04-01 21:37:42
2015-04-01 23:12:04 30307701 04/01/2015 11:12:04 PM 04/01/2015 11:32:40 PM NYPD New York City Police Department Noise - Commercial Loud Music/Party Store/Commercial 11205 700 MYRTLE AVENUE ... NaN NaN NaN NaN NaN NaN 40.694644 -73.955504 (40.694643700748486, -73.95550356170298) 2015-04-01 23:12:04
2015-04-01 13:10:35 30313389 04/01/2015 01:10:35 PM 04/07/2015 04:01:08 PM DPR Department of Parks and Recreation Root/Sewer/Sidewalk Condition Trees and Sidewalks Program Street 11422 245-16 149 AVENUE ... NaN NaN NaN NaN NaN NaN 40.653016 -73.738626 (40.653016256598534, -73.73862588133056) 2015-04-01 13:10:35
2015-04-01 17:37:38 30314393 04/01/2015 05:37:38 PM 04/03/2015 11:40:54 AM DPR Department of Parks and Recreation Maintenance or Facility Hours of Operation Park 11211 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 2015-04-01 17:37:38
2015-04-01 12:32:40 30309207 04/01/2015 12:32:40 PM 04/17/2015 01:06:49 AM DCA Department of Consumer Affairs Consumer Complaint Installation/Work Quality NaN 11423 90-71 198 STREET ... NaN NaN NaN NaN NaN NaN 40.714299 -73.761158 (40.71429859671565, -73.76115807774032) 2015-04-01 12:32:40
2015-04-01 18:44:50 30311759 04/01/2015 06:44:50 PM 06/24/2015 11:27:00 AM DPR Department of Parks and Recreation Damaged Tree Entire Tree Has Fallen Down Street 10467 862 EAST 213 STREET ... NaN NaN NaN NaN NaN NaN 40.878028 -73.860237 (40.87802828144708, -73.86023734606933) 2015-04-01 18:44:50
2015-04-01 16:30:15 30309690 04/01/2015 04:30:15 PM 04/01/2015 11:27:22 PM NYPD New York City Police Department Animal Abuse Neglected Residential Building/House 11368 107-15 NORTHERN BOULEVARD ... NaN NaN NaN NaN NaN NaN 40.757811 -73.861677 (40.757811195752154, -73.86167714731972) 2015-04-01 16:30:15
2015-04-01 09:04:07 30307990 04/01/2015 09:04:07 AM 04/06/2015 09:17:10 AM DOF Senior Citizen Rent Increase Exemption Unit SCRIE Miscellaneous Senior Address 10027 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 2015-04-01 09:04:07
2015-04-01 07:46:58 30308253 04/01/2015 07:46:58 AM 04/01/2015 09:32:31 AM NYPD New York City Police Department Blocked Driveway No Access Street/Sidewalk 11370 32-51 80 STREET ... NaN NaN NaN NaN NaN NaN 40.756412 -73.887405 (40.75641194675221, -73.88740503059863) 2015-04-01 07:46:58
2015-04-01 17:12:17 30314214 04/01/2015 05:12:17 PM 04/09/2015 02:20:11 PM DOT Department of Transportation Highway Condition Pothole - Highway Highway NaN NaN ... West/Manhattan Bound Roadway Clearview Expwy (I-295) (Exit 27 S-N) - Utopia... NaN NaN NaN NaN NaN NaN 2015-04-01 17:12:17
2015-04-01 21:30:48 30307111 04/01/2015 09:30:48 PM NaN DOHMH Department of Health and Mental Hygiene Food Establishment Food Temperature Restaurant/Bar/Deli/Bakery 11215 709 5 AVENUE ... NaN NaN NaN NaN NaN NaN 40.660699 -73.994082 (40.660699296661825, -73.99408169463258) 2015-04-01 21:30:48
2015-04-01 15:51:04 30311571 04/01/2015 03:51:04 PM 04/14/2015 09:23:30 AM DPR Department of Parks and Recreation Maintenance or Facility Hours of Operation Park 11210 NaN ... NaN NaN NaN NaN NaN NaN 40.621474 -73.950711 (40.62147413119333, -73.95071097029123) 2015-04-01 15:51:04
2015-04-01 10:43:28 30313817 04/01/2015 10:43:28 AM NaN DPR Department of Parks and Recreation Damaged Tree Branch Cracked and Will Fall NaN 10009 620 EAST 12TH STREET ... NaN NaN NaN NaN NaN NaN 40.727725 -73.978204 (40.72772462544187, -73.97820435916094) 2015-04-01 10:43:28
2015-04-01 15:12:46 30308922 04/01/2015 03:12:46 PM 06/01/2015 06:25:48 AM DOHMH Department of Health and Mental Hygiene Food Establishment Letter Grading Restaurant/Bar/Deli/Bakery 11238 663 FRANKLIN AVENUE ... NaN NaN NaN NaN NaN NaN 40.675746 -73.956122 (40.67574618440852, -73.9561218336512) 2015-04-01 15:12:46
2015-04-01 06:15:42 30311132 04/01/2015 06:15:42 AM 04/01/2015 10:28:30 AM DOT Department of Transportation Highway Condition Pothole - Highway Highway 10304 NaN ... East/Brooklyn Bound Roadway Clove Rd/Richmond Rd (Exit 13) - Lily Pond Ave... NaN NaN NaN 40.606875 -74.085408 (40.60687536641399, -74.0854077221027) 2015-04-01 06:15:42
2015-04-01 11:28:02 30308180 04/01/2015 11:28:02 AM 04/01/2015 11:42:53 AM DOT Department of Transportation Highway Condition Pothole - Highway Highway 11432 NaN ... West/Toward Triborough Br Ramp 168th St (Exit 17) NaN NaN NaN 40.719228 -73.791963 (40.71922760413319, -73.791962929951) 2015-04-01 11:28:02
2015-04-01 17:35:18 30313207 04/01/2015 05:35:18 PM 06/01/2015 06:25:54 AM DOHMH Department of Health and Mental Hygiene Food Establishment Rodents/Insects/Garbage Restaurant/Bar/Deli/Bakery 10011 140 WEST 13 STREET ... NaN NaN NaN NaN NaN NaN 40.737182 -73.998585 (40.737182358685516, -73.99858548189518) 2015-04-01 17:35:18
2015-04-01 13:54:54 30310017 04/01/2015 01:54:54 PM 04/06/2015 10:11:11 AM DOF Senior Citizen Rent Increase Exemption Unit SCRIE Miscellaneous Senior Address 11435 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 2015-04-01 13:54:54
2015-04-01 23:49:33 30306774 04/01/2015 11:49:33 PM 04/02/2015 12:20:59 AM NYPD New York City Police Department Noise - Commercial Loud Music/Party Store/Commercial 10003 36 SAINT MARKS PLACE ... NaN NaN NaN NaN NaN NaN 40.728733 -73.988011 (40.72873338955463, -73.98801059255561) 2015-04-01 23:49:33
2015-04-01 07:50:49 30313339 04/01/2015 07:50:49 AM 07/08/2015 02:19:25 PM DOT Department of Transportation Street Condition Rough, Pitted or Cracked Roads Street 11385 NaN ... NaN NaN NaN NaN NaN NaN 40.703414 -73.862854 (40.70341423569781, -73.86285397616253) 2015-04-01 07:50:49
2015-04-01 13:50:29 30312146 04/01/2015 01:50:29 PM 06/01/2015 06:25:49 AM DOHMH Department of Health and Mental Hygiene Food Establishment Rodents/Insects/Garbage Restaurant/Bar/Deli/Bakery 10028 1291 LEXINGTON AVENUE ... NaN NaN NaN NaN NaN NaN 40.780069 -73.955158 (40.78006850471446, -73.95515761412761) 2015-04-01 13:50:29
2015-04-01 16:14:19 30313259 04/01/2015 04:14:19 PM 04/01/2015 04:21:53 PM HRA HRA Benefit Card Replacement Benefit Card Replacement Medicaid NYC Street Address NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 2015-04-01 16:14:19
2015-04-01 19:27:34 30308920 04/01/2015 07:27:34 PM 04/01/2015 08:45:17 PM NYPD New York City Police Department Noise - Street/Sidewalk Loud Music/Party Street/Sidewalk 10017 210 EAST 46 STREET ... NaN NaN NaN NaN NaN NaN 40.753104 -73.972096 (40.75310402468627, -73.97209629231209) 2015-04-01 19:27:34
2015-04-01 05:30:02 30314164 04/01/2015 05:30:02 AM 04/01/2015 02:57:31 PM DOT Department of Transportation Highway Condition Pothole - Highway Highway NaN NaN ... East/Queens Bound Roadway Williamsburg Br / Metropolitan Ave (Exit 32) -... NaN NaN NaN NaN NaN NaN 2015-04-01 05:30:02
2015-04-01 10:33:26 30311790 04/01/2015 10:33:26 AM 04/01/2015 11:19:12 AM NYPD New York City Police Department Illegal Parking Blocked Sidewalk Street/Sidewalk 10033 2284 AMSTERDAM AVENUE ... NaN NaN NaN NaN NaN NaN 40.843149 -73.934539 (40.84314882753921, -73.93453937669832) 2015-04-01 10:33:26
2015-04-01 11:47:38 30310940 04/01/2015 11:47:38 AM 04/06/2015 09:23:32 AM DOF Senior Citizen Rent Increase Exemption Unit SCRIE Miscellaneous Senior Address 11355 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 2015-04-01 11:47:38
2015-04-01 11:01:27 30310409 04/01/2015 11:01:27 AM 04/17/2015 01:06:42 AM DCA Department of Consumer Affairs Consumer Complaint Exchange/Refund/Return NaN 10455 2997 3 AVENUE ... NaN NaN NaN NaN NaN NaN 40.819111 -73.913908 (40.819110789789214, -73.91390802507868) 2015-04-01 11:01:27
2015-04-01 08:51:52 30310350 04/01/2015 08:51:52 AM 04/03/2015 04:33:46 PM DCA Department of Consumer Affairs Consumer Complaint Cars Parked on Sidewalk/Street NaN 11223 1701 WEST 8 STREET ... NaN NaN NaN NaN NaN NaN 40.605657 -73.981194 (40.60565667868274, -73.98119372058547) 2015-04-01 08:51:52
2015-04-01 14:58:55 30313106 04/01/2015 02:58:55 PM 04/06/2015 10:06:35 AM DOF Senior Citizen Rent Increase Exemption Unit SCRIE Rent Discrepancy Senior Address 11201 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 2015-04-01 14:58:55
2015-04-01 16:59:19 30309324 04/01/2015 04:59:19 PM 04/01/2015 07:48:33 PM NYPD New York City Police Department Blocked Driveway Partial Access Street/Sidewalk 11210 650 EAST 24 STREET ... NaN NaN NaN NaN NaN NaN 40.634497 -73.954167 (40.63449684441219, -73.95416735372353) 2015-04-01 16:59:19
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2015-04-01 17:12:09 30313532 04/01/2015 05:12:09 PM 04/30/2015 06:02:47 PM DOT Department of Transportation Street Condition Line/Marking - Faded Street 11207 NaN ... NaN NaN NaN NaN NaN NaN 40.679561 -73.898899 (40.67956105192572, -73.89889884573184) 2015-04-01 17:12:09
2015-04-01 17:09:29 30311473 04/01/2015 05:09:29 PM 04/30/2015 05:59:38 PM DOT Department of Transportation Street Condition Line/Marking - Faded Street 11203 NaN ... NaN NaN NaN NaN NaN NaN 40.658529 -73.939568 (40.6585289219231, -73.93956820621213) 2015-04-01 17:09:29
2015-04-01 18:30:22 30307427 04/01/2015 06:30:22 PM 05/06/2015 10:59:47 AM DOT Department of Transportation Street Condition Failed Street Repair Street 11234 J AVENUE ... NaN NaN NaN NaN NaN NaN 40.628542 -73.921838 (40.62854243316789, -73.92183818389044) 2015-04-01 18:30:22
2015-04-01 21:07:21 30314301 04/01/2015 09:07:21 PM 05/08/2015 11:30:22 AM TLC Taxi and Limousine Commission Taxi Complaint Driver Complaint NaN 10001 511 WEST 25 STREET ... NaN NaN NaN NaN NaN NaN 40.749380 -74.004169 (40.74937996228322, -74.00416853967121) 2015-04-01 21:07:21
2015-04-01 10:50:12 30312508 04/01/2015 10:50:12 AM 05/08/2015 10:21:38 AM TLC Taxi and Limousine Commission Taxi Complaint Driver Complaint NaN 10032 NaN ... NaN NaN NaN NaN NaN NaN 40.842154 -73.942278 (40.84215388602991, -73.94227827092928) 2015-04-01 10:50:12
2015-04-01 09:07:38 30310225 04/01/2015 09:07:38 AM 05/04/2015 10:43:15 AM DPR Department of Parks and Recreation Root/Sewer/Sidewalk Condition Trees and Sidewalks Program Street 10307 647 CRAIG AVENUE ... NaN NaN NaN NaN NaN NaN 40.506708 -74.252182 (40.50670803830861, -74.25218246259357) 2015-04-01 09:07:38
2015-04-01 16:18:25 30313554 04/01/2015 04:18:25 PM 05/08/2015 11:29:12 AM TLC Taxi and Limousine Commission Taxi Complaint Driver Complaint NaN 11369 22-19 93 STREET ... NaN NaN NaN NaN NaN NaN 40.769574 -73.877480 (40.769573850244676, -73.8774799367093) 2015-04-01 16:18:25
2015-04-01 10:23:09 30313061 04/01/2015 10:23:09 AM 05/07/2015 02:19:57 PM TLC Taxi and Limousine Commission Taxi Complaint Driver Complaint Street 10021 NaN ... NaN NaN NaN NaN NaN NaN 40.765546 -73.954702 (40.765545913197165, -73.95470170187454) 2015-04-01 10:23:09
2015-04-01 14:31:57 30312110 04/01/2015 02:31:57 PM 05/08/2015 06:05:44 PM DPR Department of Parks and Recreation Dead Tree Dead/Dying Tree Street 11229 2056 EAST 29 STREET ... NaN NaN NaN NaN NaN NaN 40.601403 -73.943106 (40.60140342407911, -73.94310580244269) 2015-04-01 14:31:57
2015-04-01 18:50:19 30307758 04/01/2015 06:50:19 PM 05/07/2015 07:46:48 AM DPR Department of Parks and Recreation Damaged Tree Branch Cracked and Will Fall Street NaN NaN ... NaN NaN NaN NaN NaN NaN 40.720103 -73.790376 (40.72010305201917, -73.79037648278602) 2015-04-01 18:50:19
2015-04-01 14:03:43 30313462 04/01/2015 02:03:43 PM 05/06/2015 12:48:52 PM DOT Department of Transportation Street Condition Blocked - Construction Street 11209 NaN ... NaN NaN NaN NaN NaN NaN 40.633428 -74.032876 (40.63342806685948, -74.03287604669814) 2015-04-01 14:03:43
2015-04-01 11:59:20 30310246 04/01/2015 11:59:20 AM 11/09/2015 03:58:34 PM DOT Department of Transportation Street Condition Rough, Pitted or Cracked Roads Street 11217 90 PROSPECT PLACE ... NaN NaN NaN NaN NaN NaN 40.679040 -73.974579 (40.67903998236064, -73.97457889877462) 2015-04-01 11:59:20
2015-04-01 09:17:40 30310085 04/01/2015 09:17:40 AM 05/07/2015 06:53:11 PM DOT Department of Transportation Highway Condition Graffiti - Highway Highway NaN NaN ... West/Staten Island Bound Roadway Crospey Ave Stillwell Ave (Exit 6N) - Crospey ... NaN NaN NaN NaN NaN NaN 2015-04-01 09:17:40
2015-04-01 21:13:08 30314474 04/01/2015 09:13:08 PM 05/08/2015 11:27:01 AM TLC Taxi and Limousine Commission Taxi Complaint Driver Complaint NaN 11429 214-16 110 AVENUE ... NaN NaN NaN NaN NaN NaN 40.708131 -73.743041 (40.70813050331176, -73.74304104617282) 2015-04-01 21:13:08
2015-04-01 12:59:08 30308968 04/01/2015 12:59:08 PM 04/01/2015 12:59:23 PM HRA HRA Benefit Card Replacement Benefit Card Replacement Medicaid Address Outside of NYC NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 2015-04-01 12:59:08
2015-04-01 13:31:23 30308389 04/01/2015 01:31:23 PM 04/01/2015 01:32:08 PM HRA HRA Benefit Card Replacement Benefit Card Replacement Medicaid NYC Street Address NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 2015-04-01 13:31:23
2015-04-01 16:42:15 30309377 04/01/2015 04:42:15 PM 04/01/2015 04:43:11 PM HRA HRA Benefit Card Replacement Benefit Card Replacement Food Stamp NYC Street Address NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 2015-04-01 16:42:15
2015-04-01 13:37:07 30310992 04/01/2015 01:37:07 PM 04/01/2015 01:37:28 PM HRA HRA Benefit Card Replacement Benefit Card Replacement Food Stamp NYC Street Address NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 2015-04-01 13:37:07
2015-04-01 23:44:04 30310652 04/01/2015 11:44:04 PM 04/02/2015 01:25:52 AM NYPD New York City Police Department Derelict Vehicle With License Plate Street/Sidewalk 11421 85-86 87 STREET ... NaN NaN NaN NaN NaN NaN 40.694888 -73.857927 (40.69488849346232, -73.85792744070989) 2015-04-01 23:44:04
2015-04-01 16:32:12 30309028 04/01/2015 04:32:12 PM 05/20/2015 05:36:29 PM TLC Taxi and Limousine Commission For Hire Vehicle Complaint Car Service Company Complaint Street 10451 215 EAST 161 STREET ... NaN NaN NaN NaN NaN NaN 40.826235 -73.920529 (40.8262353417949, -73.92052920426786) 2015-04-01 16:32:12
2015-04-01 08:26:06 30312622 04/01/2015 08:26:06 AM 06/01/2015 06:25:41 AM DOHMH Department of Health and Mental Hygiene Food Establishment Facility Construction Restaurant/Bar/Deli/Bakery 11234 2301 FLATBUSH AVENUE ... NaN NaN NaN NaN NaN NaN 40.613889 -73.927186 (40.61388875283825, -73.92718600732812) 2015-04-01 08:26:06
2015-04-01 15:08:20 30308371 04/01/2015 03:08:20 PM 06/01/2015 06:18:02 PM TLC Taxi and Limousine Commission Taxi Complaint Driver Complaint Street 10128 NaN ... NaN NaN NaN NaN NaN NaN 40.781533 -73.958320 (40.78153263581957, -73.9583197488706) 2015-04-01 15:08:20
2015-04-01 10:19:21 30311001 04/01/2015 10:19:21 AM 06/01/2015 06:25:39 AM DOHMH Department of Health and Mental Hygiene Food Establishment Rodents/Insects/Garbage Restaurant/Bar/Deli/Bakery 11377 59-21 ROOSEVELT AVENUE ... NaN NaN NaN NaN NaN NaN 40.745586 -73.904573 (40.74558568959288, -73.90457292624892) 2015-04-01 10:19:21
2015-04-01 20:20:13 30311341 04/01/2015 08:20:13 PM 04/01/2015 10:49:32 PM NYPD New York City Police Department Blocked Driveway Partial Access Street/Sidewalk 11691 348 BEACH 40 STREET ... NaN NaN NaN NaN NaN NaN 40.595019 -73.772153 (40.5950185756628, -73.77215306630436) 2015-04-01 20:20:13
2015-04-01 02:16:44 30308863 04/01/2015 02:16:44 AM 04/01/2015 02:54:17 AM NYPD New York City Police Department Noise - Commercial Loud Music/Party Club/Bar/Restaurant 10013 301 CHURCH STREET ... NaN NaN NaN NaN NaN NaN 40.719322 -74.004470 (40.71932215308254, -74.00446968948569) 2015-04-01 02:16:44
2015-04-01 13:12:58 30307673 04/01/2015 01:12:58 PM 04/01/2015 10:01:26 PM NYPD New York City Police Department Illegal Parking Posted Parking Sign Violation Street/Sidewalk 10306 200 ADELAIDE AVENUE ... NaN NaN NaN NaN NaN NaN 40.561690 -74.124622 (40.5616902523158, -74.12462211525013) 2015-04-01 13:12:58
2015-04-01 13:17:23 30307732 04/01/2015 01:17:23 PM 04/01/2015 01:31:22 PM NYPD New York City Police Department Traffic Congestion/Gridlock Street/Sidewalk 10013 NaN ... NaN NaN NaN NaN NaN NaN 40.720557 -74.003510 (40.72055732795014, -74.00351016018516) 2015-04-01 13:17:23
2015-04-01 21:39:04 30311958 04/01/2015 09:39:04 PM 04/01/2015 09:50:48 PM NYPD New York City Police Department Noise - Vehicle Car/Truck Music Street/Sidewalk 11207 184 JEROME STREET ... NaN NaN NaN NaN NaN NaN 40.677739 -73.887888 (40.677739297670584, -73.8878875660618) 2015-04-01 21:39:04
2015-04-01 12:53:45 30309365 04/01/2015 12:53:45 PM 04/02/2015 12:04:38 PM DCA Department of Consumer Affairs Consumer Complaint Overcharge NaN 11418 NaN ... NaN NaN NaN NaN NaN NaN 40.700108 -73.832667 (40.70010803283339, -73.83266746664873) 2015-04-01 12:53:45
2015-04-01 10:46:01 30312487 04/01/2015 10:46:01 AM 04/02/2015 03:34:31 PM DCA Department of Consumer Affairs Consumer Complaint Damaged/Defective Goods NaN 11232 807 42 STREET ... NaN NaN NaN NaN NaN NaN 40.645348 -73.998616 (40.64534787518196, -73.99861625677346) 2015-04-01 10:46:01

573 rows × 54 columns

What was the most popular type of complaint on April 1st?


In [15]:
df["2015-04-01"].groupby('Complaint Type').count()["Unique Key"].sort_values(ascending=False).head(1)


Out[15]:
Complaint Type
Illegal Parking    67
Name: Unique Key, dtype: int64

What were the most popular three types of complaint on April 1st


In [16]:
df["2015-04-01"].groupby('Complaint Type').count()["Unique Key"].sort_values(ascending=False).head(3)


Out[16]:
Complaint Type
Illegal Parking     67
Street Condition    64
Blocked Driveway    58
Name: Unique Key, dtype: int64

What month has the most reports filed? How many? Graph it.


In [17]:
ax = df.resample('m')['Unique Key'].count().plot(figsize=(10,5))
ax.set_title("Reports filed per month")
ax.set_ylabel("Number of complaints")
ax.set_xlabel("Month")


Out[17]:
<matplotlib.text.Text at 0x128f1be10>

In [18]:
df.resample('m')['Unique Key'].count().max()
print("The month with most cases is May with",df.resample('m')['Unique Key'].count().max(), "cases.")


The month with most cases is May with 49715 cases.

What week of the year has the most reports filed? How many? Graph the weekly complaints.


In [19]:
ax = df.resample('W')['Unique Key'].count().plot(figsize=(10,5))
ax.set_title("Reports filed per week")
ax.set_ylabel("Number of complaints")
ax.set_xlabel("Week")

# weeknumbers= list(range(1,52))
# ax.set_xticks(weeknumbers)


Out[19]:
<matplotlib.text.Text at 0x106be7cf8>

In [20]:
print("The week with most cases is the second in May with",df.resample('W')['Unique Key'].count().max(), "cases.")


The week with most cases is the second in May with 13559 cases.

Noise complaints are a big deal. Use .str.contains to select noise complaints, and make an chart of when they show up annually. Then make a chart about when they show up every day (cyclic).


In [21]:
noise_df = df[df["Complaint Type"].str.contains("Noise ")]

In [22]:
noiseax = noise_df.resample('W')['Unique Key'].count().plot(figsize=(10,5))
noiseax.set_title("Noise complaints filed over the year")
noiseax.set_ylabel("Number of noise-related complaints")
noiseax.set_xlabel("Time of year")


Out[22]:
<matplotlib.text.Text at 0x10ab1a1d0>

In [23]:
noisedayax = noise_df.groupby(by=noise_df.index.hour)['Unique Key'].count().plot(figsize=(10,5))
noisedayax.set_title("Hour when noise complaints are filed")
noisedayax.set_ylabel("Number of noise complaints")
noisedayax.set_xticks([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23])
noisedayax.set_xlabel("Hour of the day")


Out[23]:
<matplotlib.text.Text at 0x10b4cbcf8>

Which were the top five days of the year for filing complaints? How many on each of those days? Graph it.


In [24]:
df.resample('D')['Unique Key'].count().sort_values(ascending=False).head(5)


Out[24]:
created_date
2015-10-28    2697
2015-11-09    2529
2015-05-04    2465
2015-05-11    2293
2015-10-29    2258
Name: Unique Key, dtype: int64

In [25]:
df.resample('D')['Unique Key'].count().sort_values().tail(5).plot(kind='barh')


Out[25]:
<matplotlib.axes._subplots.AxesSubplot at 0x10ab44978>

What hour of the day are the most complaints? Graph a day of complaints.


In [26]:
df.groupby(by=df.index.hour)['Unique Key'].count()


Out[26]:
0     22427
1      3927
2      2400
3      1644
4      1460
5      1528
6      2687
7      4992
8      8157
9     11490
10    12343
11    12729
12    12469
13    11745
14    11205
15    11454
16    10966
17     9291
18     8965
19     7636
20     7322
21     7658
22     8085
23     7420
Name: Unique Key, dtype: int64

In [27]:
ax = df.groupby(by=df.index.hour)['Unique Key'].count().plot(figsize=(10,5))
ax.set_title("Hour when complaints are filed")
ax.set_ylabel("Number of complaints")
ax.set_xticks([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23])
ax.set_xlabel("Hour of the day")


Out[27]:
<matplotlib.text.Text at 0x10afd8748>

One of the hours has an odd number of complaints. What are the most common complaints at that hour, and what are the most common complaints the hour before and after?


In [28]:
# Midnight is is an outlier, complaint-number-wise
hourly_df = pd.DataFrame(df.groupby(df.index.hour)['Complaint Type'].value_counts())
hourly_df


Out[28]:
Complaint Type
Complaint Type
0 HEAT/HOT WATER 4534
Rodent 2112
PAINT/PLASTER 1946
UNSANITARY CONDITION 1820
PLUMBING 1502
Noise - Commercial 1338
Noise - Street/Sidewalk 1188
DOOR/WINDOW 1015
GENERAL 894
WATER LEAK 875
ELECTRIC 852
Blocked Driveway 701
FLOORING/STAIRS 683
Illegal Parking 588
Noise - Vehicle 336
SAFETY 296
APPLIANCE 258
Standing Water 213
Taxi Complaint 130
Unsanitary Animal Pvt Property 119
Graffiti 92
Benefit Card Replacement 83
Street Condition 74
OUTSIDE BUILDING 51
Noise - Park 50
Derelict Vehicle 49
Food Poisoning 41
ELEVATOR 32
Animal Abuse 31
Food Establishment 31
... ... ...
23 Noise 3
Public Payphone Complaint 3
SAFETY 3
Asbestos 2
Broken Parking Meter 2
Compliment 2
Ferry Inquiry 2
Illegal Animal Kept as Pet 2
Mold 2
Public Toilet 2
SCRIE 2
School Maintenance 2
Animal Facility - No Permit 1
Bike/Roller/Skate Chronic 1
City Vehicle Placard Complaint 1
Drinking Water 1
Elevator 1
Highway Sign - Damaged 1
Illegal Animal Sold 1
Litter Basket / Request 1
Misc. Comments 1
Missed Collection (All Materials) 1
Panhandling 1
Recycling Enforcement 1
Request for Information 1
Sanitation Condition 1
Senior Center Complaint 1
Sewer 1
Special Projects Inspection Team (SPIT) 1
Tattooing 1

2741 rows × 1 columns


In [29]:
#Most common complaint types at midnight
hourly_df['Complaint Type'][0].head(5)


Out[29]:
Complaint Type
HEAT/HOT WATER          4534
Rodent                  2112
PAINT/PLASTER           1946
UNSANITARY CONDITION    1820
PLUMBING                1502
Name: Complaint Type, dtype: int64

In [30]:
#Most common complaint types the hour before midnight
hourly_df['Complaint Type'][23].head(5)


Out[30]:
Complaint Type
Noise - Street/Sidewalk    1599
Noise - Commercial         1503
Blocked Driveway            973
Illegal Parking             882
Noise - Vehicle             478
Name: Complaint Type, dtype: int64

In [31]:
#Most common complaint types the hour after midnight
hourly_df['Complaint Type'][1].head(5)


Out[31]:
Complaint Type
Noise - Commercial         1025
Noise - Street/Sidewalk     897
Blocked Driveway            479
Illegal Parking             400
Noise - Vehicle             249
Name: Complaint Type, dtype: int64

So odd. What's the per-minute breakdown of complaints between 12am and 1am? You don't need to include 1am.


In [32]:
midnight_df = df[df.index.hour==0]
minutely_df = midnight_df.groupby(by=midnight_df.index.minute)

minax = minutely_df['Unique Key'].count().plot(figsize=(15,5))
minax.set_title("Complaints filed per minute during midnight hour")
minax.set_xlabel("Minutes of the hour")
minax.set_ylabel("Numbers of complaints filed")
minax.set_xticks([0,5,10,15,20,25,30,35,40,45,50,55,60])


Out[32]:
[<matplotlib.axis.XTick at 0x10ab26eb8>,
 <matplotlib.axis.XTick at 0x10b44b518>,
 <matplotlib.axis.XTick at 0x10afae550>,
 <matplotlib.axis.XTick at 0x10db707f0>,
 <matplotlib.axis.XTick at 0x10db73278>,
 <matplotlib.axis.XTick at 0x10db73cc0>,
 <matplotlib.axis.XTick at 0x10db77748>,
 <matplotlib.axis.XTick at 0x10b43cc88>,
 <matplotlib.axis.XTick at 0x10b844630>,
 <matplotlib.axis.XTick at 0x10db80588>,
 <matplotlib.axis.XTick at 0x10ab35e80>,
 <matplotlib.axis.XTick at 0x10e01ccf8>,
 <matplotlib.axis.XTick at 0x10e023780>]

Looks like midnight is a little bit of an outlier. Why might that be? Take the 5 most common agencies and graph the times they file reports at (all day, not just midnight).


In [33]:
midnight_df['Agency'].value_counts().sort_values(ascending=False).head(5)


Out[33]:
HPD      14768
NYPD      4356
DOHMH     2588
TLC        184
DOT        182
Name: Agency, dtype: int64

In [53]:
#Write test code for first agency
hpd_df = df[df['Agency'] == 'HPD']
ax = hpd_df.groupby(by=hpd_df.index.hour).count()['Unique Key'].plot(figsize=(12,7), label= 'HPD', legend=True, linewidth=2)
ax.set_title("Complaints per agency and time of the day")
ax.set_xlabel("Hour of the day")
ax.set_xticks([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23])
ax.set_ylabel("Number of complaints filed")

#Turn working code into a function:
def complaints_by_agency(agency_name):
    complaints_agency = df[df['Agency'] == agency_name]
    return complaints_agency.groupby(by=complaints_agency.index.hour).count()['Unique Key']

#run code for remaining agencies
for agency in ['NYPD', 'DOHMH', 'TLC', 'DOT']:
    complaints_by_agency(agency).plot(ax=ax, label = agency, legend=True, linewidth=2)


Graph those same agencies on an annual basis - make it weekly. When do people like to complain? When does the NYPD have an odd number of complaints?


In [70]:
#Write test code for first agency
#Copied code from above replacing groupby by resample('W')
hpd_df = df[df['Agency'] == 'HPD']
ax = hpd_df.resample('W')['Unique Key'].count().plot(figsize=(12,7), label= 'HPD', legend=True, linewidth=2)
ax.set_title("Complaints per agency and time of the year")
ax.set_xlabel("Week of the year")
ax.set_ylabel("Number of complaints filed")

#Turn working code into a function:
def agency_complaints_weekly(agency_name):
    complaints_weekly = df[df['Agency'] == agency_name]
    return complaints_weekly.resample('W')['Unique Key'].count()

#run code for remaining agencies
for agency in ['NYPD', 'DOHMH', 'TLC', 'DOT']:
    agency_complaints_weekly(agency).plot(ax=ax, label = agency, legend=True, linewidth=2)


NYPD has an odd number after the first week in May

Maybe the NYPD deals with different issues at different times? Check the most popular complaints in July and August vs the month of May. Also check the most common complaints for the Housing Preservation Bureau (HPD) in winter vs. summer.


In [105]:
def ag_complaints(agency_name, str_date_s, str_date_e):
    newdf = df[df['Agency'] == agency_name]
    newdf.resample('M')
    return newdf[str_date_s : str_date_e].groupby("Complaint Type")['Unique Key'].count().sort_values(ascending=False).head(10)

In [108]:
print("NYPD most popular complaints in July and August:")
ag_complaints('NYPD', '2015-07', '2015-08')


NYPD most popular complaints in July and August:
Out[108]:
Complaint Type
Illegal Parking            3444
Blocked Driveway           3258
Noise - Street/Sidewalk    3165
Noise - Commercial         1201
Noise - Vehicle             942
Derelict Vehicle            904
Animal Abuse                406
Noise - Park                331
Homeless Encampment         275
Vending                     205
Name: Unique Key, dtype: int64

In [107]:
print("NYPD most popular complaints in May:")
ag_complaints('NYPD', '2015-05', '2015-05')


NYPD most popular complaints in May:
Out[107]:
Complaint Type
Blocked Driveway           4114
Illegal Parking            3975
Noise - Street/Sidewalk    3385
Noise - Commercial         2263
Noise - Vehicle            1232
Derelict Vehicle           1000
Animal Abuse                417
Vending                     337
Traffic                     335
Homeless Encampment         264
Name: Unique Key, dtype: int64

In [106]:
print("Most common complaints filed at the Housing Preservation Bureau (HPD) in winter")
ag_complaints('HPD', '2015-11', '2016-02')


Most common complaints filed at the Housing Preservation Bureau (HPD) in winter
Out[106]:
Complaint Type
HEAT/HOT WATER            3189
UNSANITARY CONDITION       713
PAINT/PLASTER              598
PLUMBING                   532
DOOR/WINDOW                397
WATER LEAK                 302
FLOORING/STAIRS            290
ELECTRIC                   278
GENERAL                    272
HPD Literature Request     109
Name: Unique Key, dtype: int64

In [104]:
print("Most common complaints filed at the Housing Preservation Bureau (HPD) in summer")
ag_complaints('HPD', '2015-05', '2016-09')


Most common complaints filed at the Housing Preservation Bureau (HPD) in summer
Out[104]:
Complaint Type
HEAT/HOT WATER            12165
UNSANITARY CONDITION       4642
PAINT/PLASTER              4161
PLUMBING                   3293
HPD Literature Request     2679
DOOR/WINDOW                2285
WATER LEAK                 1985
GENERAL                    1836
ELECTRIC                   1831
FLOORING/STAIRS            1631
Name: Unique Key, dtype: int64