In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
import dateutil.parser

First, I made a mistake naming the data set! It's 2015 data, not 2014 data. But yes, still use 311-2014.csv. You can rename it.

Importing and preparing your data

Import your data, but only the first 200,000 rows. You'll also want to change the index to be a datetime based on the Created Date column - you'll want to check if it's already a datetime, and parse it if not.
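One way to handle the "check, then parse" step is sketched below. It assumes the timestamps follow the `MM/DD/YYYY hh:mm:ss AM/PM` pattern seen in the sample rows. The two-row frame is made-up stand-in data, and `sample` is a hypothetical name, not part of the assignment.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Two made-up rows in the same layout as the 311 export (stand-in data).
sample = pd.DataFrame({
    "Unique Key": [1, 2],
    "Created Date": ["07/06/2015 10:58:27 AM", "07/03/2015 01:26:29 PM"],
})

# Parse only if the column is not already a datetime dtype.
if not is_datetime64_any_dtype(sample["Created Date"]):
    # An explicit format spares pandas from guessing the layout row by row.
    sample["Created Date"] = pd.to_datetime(
        sample["Created Date"], format="%m/%d/%Y %I:%M:%S %p"
    )

# Use the parsed column as the index.
sample = sample.set_index("Created Date")
print(sample.index)
```

On the full 200,000 rows, a single `pd.to_datetime` call with a format string is usually quicker than applying a parser to each value, though either approach gives the same datetimes.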


In [2]:
df = pd.read_csv("311-2015.csv", nrows=200000)


/usr/local/lib/python3.5/site-packages/IPython/core/interactiveshell.py:2723: DtypeWarning: Columns (8,17,48) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)
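The warning says pandas found both numbers and text in a few columns. Counting from zero, column 8 appears to be Incident Zip. A minimal sketch of the fix the warning suggests, using a tiny made-up CSV in place of the real file (the names `csv_text` and `zips` are hypothetical):

```python
import io
import pandas as pd

# Tiny stand-in CSV: ZIP codes that mix digits-only and text values.
csv_text = "Unique Key,Incident Zip\n1,10458\n2,11368-1234\n3,\n"

# Reading the column as str keeps every ZIP as text; blanks stay missing.
zips = pd.read_csv(io.StringIO(csv_text), dtype={"Incident Zip": str})
print(zips["Incident Zip"].tolist())
```

The same `dtype=` argument can be passed to the real `read_csv` call; the other flagged columns would need their own entries.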

In [3]:
#total number of rows 
print("Total rows: {0}".format(len(df)))


Total rows: 200000

In [4]:
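# Alternative: read everything, then keep the first 200,000 rows by position
# (.ix is deprecated; .iloc slices by position)
#df2 = df.iloc[:200000]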

In [6]:
print("Total rows: {0}".format(len(df))) # 200,000 data rows; the header line is not counted


Total rows: 200000

In [7]:
# df.tail() shows the last five of the 200,000 rows
df.tail()


Out[7]:
Unique Key Created Date Closed Date Agency Agency Name Complaint Type Descriptor Location Type Incident Zip Incident Address ... Bridge Highway Name Bridge Highway Direction Road Ramp Bridge Highway Segment Garage Lot Name Ferry Direction Ferry Terminal Name Latitude Longitude Location
199995 30804200 06/09/2015 11:11:56 AM 06/09/2015 11:38:39 AM NYPD New York City Police Department Blocked Driveway No Access Street/Sidewalk 10458 2971 WEBSTER AVENUE ... NaN NaN NaN NaN NaN NaN NaN 40.867659 -73.882990 (40.86765943109168, -73.88299009268677)
199996 30804335 06/09/2015 12:00:00 AM 06/10/2015 12:00:00 AM HPD Department of Housing Preservation and Develop... GENERAL COOKING GAS RESIDENTIAL BUILDING 11368 40-56 JUNCTION BOULEVARD ... NaN NaN NaN NaN NaN NaN NaN 40.748051 -73.868768 (40.74805092746579, -73.86876755648586)
199997 30509208 04/29/2015 02:29:00 PM 04/23/2015 10:55:00 PM DOT Department of Transportation Street Light Condition Street Light Out NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
199998 30516417 04/30/2015 12:17:53 PM 04/30/2015 02:55:58 PM NYPD New York City Police Department Traffic Congestion/Gridlock Street/Sidewalk 10017 NaN ... NaN NaN NaN NaN NaN NaN NaN 40.755301 -73.975344 (40.755300562504196, -73.97534387376678)
199999 30804324 06/09/2015 12:48:25 PM 06/09/2015 12:48:42 PM HRA HRA Benefit Card Replacement Benefit Card Replacement Medicaid NYC Street Address NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 53 columns


In [ ]:
# Date columns such as "Created Date" and "Closed Date" are stored as strings (object dtype), not datetimes
df.info()
df.columns

In [117]:
def parse_date(str_date):    
    return dateutil.parser.parse(str_date)

parse_date("07/06/2015 10:58:27 AM")


Out[117]:
datetime.datetime(2015, 7, 6, 10, 58, 27)

In [115]:
df['Created Date'].head(2)


Out[115]:
Date
2015-07-06 10:58:27    07/06/2015 10:58:27 AM
2015-07-03 13:26:29    07/03/2015 01:26:29 PM
Name: Created Date, dtype: object

In [118]:
df['Date'] = df['Created Date'].apply(parse_date)
df.head(2)


Out[118]:
Unique Key Created Date Closed Date Agency Agency Name Complaint Type Descriptor Location Type Incident Zip Incident Address ... Bridge Highway Direction Road Ramp Bridge Highway Segment Garage Lot Name Ferry Direction Ferry Terminal Name Latitude Longitude Location Date
Date
2015-07-06 10:58:27 31015465 07/06/2015 10:58:27 AM 07/22/2015 01:07:20 AM DCA Department of Consumer Affairs Consumer Complaint Demand for Cash NaN 11360 27-16 203 STREET ... NaN NaN NaN NaN NaN NaN 40.773540 -73.788237 (40.773539552542, -73.78823697228408) 2015-07-06 10:58:27
2015-07-03 13:26:29 30997660 07/03/2015 01:26:29 PM 07/03/2015 02:08:20 PM NYPD New York City Police Department Vending In Prohibited Area Residential Building/House 10019 200 CENTRAL PARK SOUTH ... NaN NaN NaN NaN NaN NaN 40.767021 -73.979448 (40.76702142171206, -73.97944780718524) 2015-07-03 13:26:29

2 rows × 54 columns


In [10]:
#print(df['Date'])
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200000 entries, 0 to 199999
Data columns (total 54 columns):
Unique Key                        200000 non-null int64
Created Date                      200000 non-null object
Closed Date                       188913 non-null object
Agency                            200000 non-null object
Agency Name                       200000 non-null object
Complaint Type                    200000 non-null object
Descriptor                        198197 non-null object
Location Type                     179328 non-null object
Incident Zip                      181049 non-null object
Incident Address                  152173 non-null object
Street Name                       152152 non-null object
Cross Street 1                    108035 non-null object
Cross Street 2                    107583 non-null object
Intersection Street 1             24790 non-null object
Intersection Street 2             24530 non-null object
Address Type                      177091 non-null object
City                              181095 non-null object
Landmark                          127 non-null object
Facility Type                     80031 non-null object
Status                            199998 non-null object
Due Date                          152018 non-null object
Resolution Description            198936 non-null object
Resolution Action Updated Date    188529 non-null object
Community Board                   200000 non-null object
Borough                           200000 non-null object
X Coordinate (State Plane)        175825 non-null float64
Y Coordinate (State Plane)        175825 non-null float64
Park Facility Name                200000 non-null object
Park Borough                      200000 non-null object
School Name                       200000 non-null object
School Number                     199907 non-null object
School Region                     197128 non-null object
School Code                       197128 non-null object
School Phone Number               200000 non-null object
School Address                    200000 non-null object
School City                       200000 non-null object
School State                      200000 non-null object
School Zip                        199999 non-null object
School Not Found                  151897 non-null object
School or Citywide Complaint      0 non-null float64
Vehicle Type                      34 non-null object
Taxi Company Borough              434 non-null object
Taxi Pick Up Location             3680 non-null object
Bridge Highway Name               1960 non-null object
Bridge Highway Direction          1959 non-null object
Road Ramp                         1946 non-null object
Bridge Highway Segment            2134 non-null object
Garage Lot Name                   143 non-null object
Ferry Direction                   86 non-null object
Ferry Terminal Name               215 non-null object
Latitude                          175825 non-null float64
Longitude                         175825 non-null float64
Location                          175825 non-null object
Date                              200000 non-null datetime64[ns]
dtypes: datetime64[ns](1), float64(5), int64(1), object(47)
memory usage: 82.4+ MB

In [11]:
df.index = df['Date']

In [12]:
del df['Date']

In [13]:
df.head(10)


Out[13]:
Unique Key Created Date Closed Date Agency Agency Name Complaint Type Descriptor Location Type Incident Zip Incident Address ... Bridge Highway Name Bridge Highway Direction Road Ramp Bridge Highway Segment Garage Lot Name Ferry Direction Ferry Terminal Name Latitude Longitude Location
Date
2015-07-06 10:58:27 31015465 07/06/2015 10:58:27 AM 07/22/2015 01:07:20 AM DCA Department of Consumer Affairs Consumer Complaint Demand for Cash NaN 11360 27-16 203 STREET ... NaN NaN NaN NaN NaN NaN NaN 40.773540 -73.788237 (40.773539552542, -73.78823697228408)
2015-07-03 13:26:29 30997660 07/03/2015 01:26:29 PM 07/03/2015 02:08:20 PM NYPD New York City Police Department Vending In Prohibited Area Residential Building/House 10019 200 CENTRAL PARK SOUTH ... NaN NaN NaN NaN NaN NaN NaN 40.767021 -73.979448 (40.76702142171206, -73.97944780718524)
2015-11-09 03:55:09 31950223 11/09/2015 03:55:09 AM 11/09/2015 08:08:57 AM NYPD New York City Police Department Blocked Driveway No Access Street/Sidewalk 10453 1993 GRAND AVENUE ... NaN NaN NaN NaN NaN NaN NaN 40.852671 -73.910608 (40.85267061877697, -73.91060771362552)
2015-07-03 02:18:32 31000038 07/03/2015 02:18:32 AM 07/03/2015 07:54:48 AM NYPD New York City Police Department Noise - Commercial Loud Music/Party Club/Bar/Restaurant 11372 84-16 NORTHERN BOULEVARD ... NaN NaN NaN NaN NaN NaN NaN 40.755774 -73.883262 (40.755773786469966, -73.88326243225418)
2015-07-04 00:03:27 30995614 07/04/2015 12:03:27 AM 07/04/2015 03:33:09 AM NYPD New York City Police Department Noise - Street/Sidewalk Loud Talking Street/Sidewalk 11216 1057 BERGEN STREET ... NaN NaN NaN NaN NaN NaN NaN 40.676175 -73.951269 (40.67617516102934, -73.9512690004692)
2015-07-09 00:00:00 31042454 07/09/2015 12:00:00 AM 07/20/2015 12:00:00 AM DOHMH Department of Health and Mental Hygiene Standing Water Other - Explain Below Other NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2015-07-09 12:04:06 31043076 07/09/2015 12:04:06 PM NaN DPR Department of Parks and Recreation Root/Sewer/Sidewalk Condition Trees and Sidewalks Program Street 10469 3344 PEARSALL AVENUE ... NaN NaN NaN NaN NaN NaN NaN 40.873552 -73.851666 (40.8735519638795, -73.85166629554799)
2015-07-09 00:00:00 31037751 07/09/2015 12:00:00 AM NaN DOHMH Department of Health and Mental Hygiene Standing Water Puddle in Ground 3+ Family Apartment Building 10016 379 THIRD AVENUE ... NaN NaN NaN NaN NaN NaN NaN 40.741537 -73.981163 (40.741536747969185, -73.98116258383294)
2015-08-12 11:09:49 31298553 08/12/2015 11:09:49 AM 08/28/2015 01:06:41 AM DCA Department of Consumer Affairs Consumer Complaint Damaged/Defective Goods NaN 11420 127-19 111 AVENUE ... NaN NaN NaN NaN NaN NaN NaN 40.682584 -73.814056 (40.68258427168297, -73.81405649149323)
2015-09-09 21:59:03 31492526 09/09/2015 09:59:03 PM 09/09/2015 11:17:39 PM NYPD New York City Police Department Noise - Street/Sidewalk Loud Talking Street/Sidewalk 11238 238 SAINT JAMES PLACE ... NaN NaN NaN NaN NaN NaN NaN 40.683308 -73.963775 (40.68330795503152, -73.96377504548408)

10 rows × 53 columns

What was the most popular type of complaint, and how many times was it filed?


In [14]:
df['Complaint Type'].describe()
# "top" is the mode (Blocked Driveway); "freq" is how often it was filed: 21,779 times


Out[14]:
count               200000
unique                 180
top       Blocked Driveway
freq                 21779
Name: Complaint Type, dtype: object

In [15]:
# Same answer using mode() on Complaint Type
df['Complaint Type'].mode()


Out[15]:
0    Blocked Driveway
dtype: object

Make a horizontal bar graph of the top 5 most frequent complaint types.
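A shorter route to the top five is `value_counts()`, which counts and sorts in one step. The sketch below runs on made-up complaint types (`complaint_types` is a hypothetical stand-in for `df['Complaint Type']`); the plotting call is left as a comment.

```python
import pandas as pd

# Made-up complaint types standing in for df['Complaint Type'].
complaint_types = pd.Series(
    ["Blocked Driveway"] * 6 + ["Illegal Parking"] * 5 + ["HEAT/HOT WATER"] * 4
    + ["Noise - Street/Sidewalk"] * 3 + ["Noise - Commercial"] * 2
    + ["Street Condition"]
)

# value_counts() counts each type and sorts from most to least frequent.
top = complaint_types.value_counts().head(5)

# barh draws the first row at the bottom, so sort ascending to put the
# biggest bar on top: top.sort_values().plot.barh()
print(top)
```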


In [18]:
complains = df.groupby('Complaint Type').count()

In [20]:
complains.sort_values(by='Unique Key', ascending=False).head(5)


Out[20]:
Unique Key Created Date Closed Date Agency Agency Name Descriptor Location Type Incident Zip Incident Address Street Name ... Bridge Highway Name Bridge Highway Direction Road Ramp Bridge Highway Segment Garage Lot Name Ferry Direction Ferry Terminal Name Latitude Longitude Location
Complaint Type
Blocked Driveway 21779 21779 21711 21779 21779 21779 21771 21683 21410 21410 ... 0 0 0 0 0 0 0 21672 21672 21672
Illegal Parking 19837 19837 19622 19837 19837 19837 19826 19552 16248 16248 ... 0 0 0 0 0 0 0 19484 19484 19484
HEAT/HOT WATER 12408 12408 12341 12408 12408 12408 12408 12405 12408 12408 ... 0 0 0 0 0 0 0 12405 12405 12405
Noise - Street/Sidewalk 11949 11949 11805 11949 11949 11949 11947 11783 9754 9754 ... 0 0 0 0 0 0 0 11730 11730 11730
Noise - Commercial 9603 9603 9487 9603 9603 9603 9601 9478 8854 8854 ... 0 0 0 0 0 0 0 9468 9468 9468

5 rows × 52 columns


In [21]:
COM = complains.sort_values(by='Unique Key', ascending=False)

In [30]:
COM['Unique Key'].head(5)


Out[30]:
Complaint Type
Blocked Driveway           21779
Illegal Parking            19837
HEAT/HOT WATER             12408
Noise - Street/Sidewalk    11949
Noise - Commercial          9603
Name: Unique Key, dtype: int64

In [109]:
COM['Unique Key'].head(5).sort_values().plot.barh()


Out[109]:
<matplotlib.axes._subplots.AxesSubplot at 0x10fd4d4e0>

In [ ]:
# COMG was never defined; assuming the per-type counts in COM['Unique Key'] were meant
COM['Unique Key'].plot.hist()

Which borough has the most complaints per capita? Since it's only 5 boroughs, you can do the math manually.


In [61]:
# Replace missing ZIP codes with 0 (assign back instead of using inplace on a single column)
df["Incident Zip"] = df["Incident Zip"].fillna(0)
df


Out[61]:
Unique Key Created Date Closed Date Agency Agency Name Complaint Type Descriptor Location Type Incident Zip Incident Address ... Bridge Highway Name Bridge Highway Direction Road Ramp Bridge Highway Segment Garage Lot Name Ferry Direction Ferry Terminal Name Latitude Longitude Location
Date
2015-07-06 10:58:27 31015465 07/06/2015 10:58:27 AM 07/22/2015 01:07:20 AM DCA Department of Consumer Affairs Consumer Complaint Demand for Cash NaN 11360 27-16 203 STREET ... NaN NaN NaN NaN NaN NaN NaN 40.773540 -73.788237 (40.773539552542, -73.78823697228408)
2015-07-03 13:26:29 30997660 07/03/2015 01:26:29 PM 07/03/2015 02:08:20 PM NYPD New York City Police Department Vending In Prohibited Area Residential Building/House 10019 200 CENTRAL PARK SOUTH ... NaN NaN NaN NaN NaN NaN NaN 40.767021 -73.979448 (40.76702142171206, -73.97944780718524)
2015-11-09 03:55:09 31950223 11/09/2015 03:55:09 AM 11/09/2015 08:08:57 AM NYPD New York City Police Department Blocked Driveway No Access Street/Sidewalk 10453 1993 GRAND AVENUE ... NaN NaN NaN NaN NaN NaN NaN 40.852671 -73.910608 (40.85267061877697, -73.91060771362552)
2015-07-03 02:18:32 31000038 07/03/2015 02:18:32 AM 07/03/2015 07:54:48 AM NYPD New York City Police Department Noise - Commercial Loud Music/Party Club/Bar/Restaurant 11372 84-16 NORTHERN BOULEVARD ... NaN NaN NaN NaN NaN NaN NaN 40.755774 -73.883262 (40.755773786469966, -73.88326243225418)
2015-07-04 00:03:27 30995614 07/04/2015 12:03:27 AM 07/04/2015 03:33:09 AM NYPD New York City Police Department Noise - Street/Sidewalk Loud Talking Street/Sidewalk 11216 1057 BERGEN STREET ... NaN NaN NaN NaN NaN NaN NaN 40.676175 -73.951269 (40.67617516102934, -73.9512690004692)
2015-07-09 00:00:00 31042454 07/09/2015 12:00:00 AM 07/20/2015 12:00:00 AM DOHMH Department of Health and Mental Hygiene Standing Water Other - Explain Below Other 0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2015-07-09 12:04:06 31043076 07/09/2015 12:04:06 PM NaN DPR Department of Parks and Recreation Root/Sewer/Sidewalk Condition Trees and Sidewalks Program Street 10469 3344 PEARSALL AVENUE ... NaN NaN NaN NaN NaN NaN NaN 40.873552 -73.851666 (40.8735519638795, -73.85166629554799)
2015-07-09 00:00:00 31037751 07/09/2015 12:00:00 AM NaN DOHMH Department of Health and Mental Hygiene Standing Water Puddle in Ground 3+ Family Apartment Building 10016 379 THIRD AVENUE ... NaN NaN NaN NaN NaN NaN NaN 40.741537 -73.981163 (40.741536747969185, -73.98116258383294)
2015-08-12 11:09:49 31298553 08/12/2015 11:09:49 AM 08/28/2015 01:06:41 AM DCA Department of Consumer Affairs Consumer Complaint Damaged/Defective Goods NaN 11420 127-19 111 AVENUE ... NaN NaN NaN NaN NaN NaN NaN 40.682584 -73.814056 (40.68258427168297, -73.81405649149323)
2015-09-09 21:59:03 31492526 09/09/2015 09:59:03 PM 09/09/2015 11:17:39 PM NYPD New York City Police Department Noise - Street/Sidewalk Loud Talking Street/Sidewalk 11238 238 SAINT JAMES PLACE ... NaN NaN NaN NaN NaN NaN NaN 40.683308 -73.963775 (40.68330795503152, -73.96377504548408)
2015-09-09 12:12:46 31495596 09/09/2015 12:12:46 PM 12/15/2015 02:07:21 PM DPR Department of Parks and Recreation Overgrown Tree/Branches Hitting Power/Phone Lines Street 11412 197-16 LINDEN BOULEVARD ... NaN NaN NaN NaN NaN NaN NaN 40.693751 -73.754719 (40.693751390945614, -73.7547190161349)
2015-09-22 13:50:05 31593923 09/22/2015 01:50:05 PM NaN DPR Department of Parks and Recreation Root/Sewer/Sidewalk Condition Affecting Sewer or Foundation Street 11413 139-02 SOUTHGATE PLAZA ... NaN NaN NaN NaN NaN NaN NaN 40.673440 -73.758456 (40.67343967153766, -73.75845642658422)
2015-09-22 13:12:13 31593417 09/22/2015 01:12:13 PM 09/25/2015 11:20:14 AM DOB DOB Inspections - Queens Construction Initial - Construction Street Address 11692 158 SEA GRASS LANE ... NaN NaN NaN NaN NaN NaN NaN 40.590360 -73.792260 (40.59036031868076, -73.79225998389893)
2015-09-22 15:07:51 31593599 09/22/2015 03:07:51 PM NaN DPR Department of Parks and Recreation Overgrown Tree/Branches Hitting Power/Phone Lines Street 11001 86-15 262 STREET ... NaN NaN NaN NaN NaN NaN NaN 40.733555 -73.704955 (40.7335546067691, -73.704954968102)
2015-04-28 18:26:58 30502370 04/28/2015 06:26:58 PM 04/28/2015 07:29:34 PM NYPD New York City Police Department Noise - Commercial Car/Truck Music Store/Commercial 10035 1911 MADISON AVENUE ... NaN NaN NaN NaN NaN NaN NaN 40.804617 -73.941505 (40.80461674564084, -73.9415053197214)
2015-04-28 17:54:46 30498881 04/28/2015 05:54:46 PM 10/08/2015 04:30:57 PM DOT Department of Transportation Street Condition Line/Marking - Faded Street 11420 NaN ... NaN NaN NaN NaN NaN NaN NaN 40.666895 -73.805453 (40.666894598092256, -73.80545262528688)
2015-09-13 13:35:02 31524474 09/13/2015 01:35:02 PM 09/13/2015 10:02:02 PM NYPD New York City Police Department Derelict Vehicle With License Plate Street/Sidewalk 11249 84 SOUTH 10 STREET ... NaN NaN NaN NaN NaN NaN NaN 40.708154 -73.965644 (40.70815416838622, -73.96564434164706)
2015-09-13 21:04:42 31527345 09/13/2015 09:04:42 PM 09/14/2015 01:51:56 AM NYPD New York City Police Department Blocked Driveway No Access Street/Sidewalk 10461 2443 POPLAR STREET ... NaN NaN NaN NaN NaN NaN NaN 40.842972 -73.853234 (40.84297247714469, -73.85323437828721)
2015-07-04 16:57:07 31006258 07/04/2015 04:57:07 PM 09/03/2015 06:26:41 AM DOHMH Department of Health and Mental Hygiene Food Establishment Food Contaminated Restaurant/Bar/Deli/Bakery 10024 519 COLUMBUS AVENUE ... NaN NaN NaN NaN NaN NaN NaN 40.785904 -73.972588 (40.785903785020274, -73.97258810089549)
2015-05-21 19:01:52 30668699 05/21/2015 07:01:52 PM 05/21/2015 09:56:29 PM NYPD New York City Police Department Noise - Street/Sidewalk Loud Talking Street/Sidewalk 10026 8 WEST 111 STREET ... NaN NaN NaN NaN NaN NaN NaN 40.797731 -73.949399 (40.79773121644539, -73.94939942634502)
2015-07-13 01:14:41 31060994 07/13/2015 01:14:41 AM 07/13/2015 07:20:37 AM NYPD New York City Police Department Illegal Parking Blocked Hydrant Street/Sidewalk 11229 2365 EAST 13 STREET ... NaN NaN NaN NaN NaN NaN NaN 40.592868 -73.957156 (40.59286770783599, -73.9571556744473)
2015-07-28 10:16:21 31181885 07/28/2015 10:16:21 AM 07/28/2015 11:52:28 AM NYPD New York City Police Department Illegal Parking Double Parked Blocking Traffic Street/Sidewalk 10029 PARK AVENUE ... NaN NaN NaN NaN NaN NaN NaN 40.786883 -73.952307 (40.78688253597128, -73.95230743177314)
2015-05-21 20:40:35 30673749 05/21/2015 08:40:35 PM 05/21/2015 11:19:33 PM NYPD New York City Police Department Blocked Driveway Partial Access Street/Sidewalk 11201 125 BOERUM PLACE ... NaN NaN NaN NaN NaN NaN NaN 40.687708 -73.991663 (40.687707734021295, -73.99166342004337)
2015-05-21 04:43:05 30671660 05/21/2015 04:43:05 AM 05/21/2015 06:18:38 AM NYPD New York City Police Department Illegal Parking Blocked Hydrant Street/Sidewalk 11228 284 BAY 11 STREET ... NaN NaN NaN NaN NaN NaN NaN 40.605001 -74.013815 (40.605001380818734, -74.01381466478367)
2015-05-21 18:31:40 30670824 05/21/2015 06:31:40 PM 05/21/2015 09:45:49 PM NYPD New York City Police Department Vending In Prohibited Area Street/Sidewalk 10004 NaN ... NaN NaN NaN NaN NaN NaN NaN 40.704549 -74.014290 (40.70454904344675, -74.01428973798721)
2015-08-31 15:35:00 31435374 08/31/2015 03:35:00 PM 09/01/2015 12:00:00 PM DSNY P - Manhattan and Bronx Dirty Conditions E12 Illegal Dumping Surveillance Sidewalk 10474 NaN ... NaN NaN NaN NaN NaN NaN NaN 40.819269 -73.884856 (40.81926933497002, -73.88485634621742)
2015-09-22 20:51:13 31591848 09/22/2015 08:51:13 PM 02/16/2016 01:44:34 PM DPR Department of Parks and Recreation Illegal Tree Damage Unauthorized Tree Removal NaN 11378 5310 69TH ST ... NaN NaN NaN NaN NaN NaN NaN 40.731155 -73.896021 (40.731154603318835, -73.8960206506641)
2015-09-22 10:29:56 31593561 09/22/2015 10:29:56 AM 10/13/2015 11:01:42 AM DPR Department of Parks and Recreation Damaged Tree Branch Cracked and Will Fall Street 11364 53-48 BELL BOULEVARD ... NaN NaN NaN NaN NaN NaN NaN 40.751748 -73.763532 (40.75174757106758, -73.76353224937567)
2015-09-03 14:13:35 31458870 09/03/2015 02:13:35 PM 09/19/2015 01:06:53 AM DCA Department of Consumer Affairs Consumer Complaint False Advertising NaN 11434 129-24 MERRICK BOULEVARD ... NaN NaN NaN NaN NaN NaN NaN 40.680562 -73.763384 (40.68056164808672, -73.76338446963653)
2015-09-22 13:06:44 31591347 09/22/2015 01:06:44 PM 11/11/2015 01:34:37 PM DPR Department of Parks and Recreation Root/Sewer/Sidewalk Condition Trees and Sidewalks Program Street 11358 35-27 171 STREET ... NaN NaN NaN NaN NaN NaN NaN 40.764134 -73.794967 (40.764134495293504, -73.79496691492976)
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2015-06-09 07:53:15 30804422 06/09/2015 07:53:15 AM 06/26/2015 08:00:47 AM DOT Department of Transportation Street Sign - Damaged St Name - Attached to Pole Street 0 8300-8498 85TH AVE ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2015-10-18 10:19:26 31780102 10/18/2015 10:19:26 AM 10/21/2015 09:26:54 AM HPD Department of Housing Preservation and Develop... HEAT/HOT WATER ENTIRE BUILDING RESIDENTIAL BUILDING 10468 2265 DR M L KING JR BOULEVARD ... NaN NaN NaN NaN NaN NaN NaN 40.860342 -73.907123 (40.86034205445221, -73.90712324420304)
2015-06-09 10:53:13 30804353 06/09/2015 10:53:13 AM 07/07/2015 01:00:44 PM DPR Department of Parks and Recreation Illegal Tree Damage Branches Damaged Street 11372 34-36 80 STREET ... NaN NaN NaN NaN NaN NaN NaN 40.752717 -73.886758 (40.752716881470704, -73.88675799123796)
2015-04-29 09:18:00 30509207 04/29/2015 09:18:00 AM 04/27/2015 09:18:00 AM DOT Department of Transportation Street Light Condition Street Light Out NaN 0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2015-06-09 00:00:00 30803892 06/09/2015 12:00:00 AM NaN HPD Division of Alternative Management HEAT/HOT WATER ENTIRE BUILDING RESIDENTIAL BUILDING 11238 840 WASHINGTON AVENUE ... NaN NaN NaN NaN NaN NaN NaN 40.673037 -73.962854 (40.673036716903944, -73.96285414071265)
2015-01-09 21:51:45 29682375 01/09/2015 09:51:45 PM 01/14/2015 09:37:31 AM DOT Department of Transportation Street Sign - Missing Stop Street 11357 NaN ... NaN NaN NaN NaN NaN NaN NaN 40.784673 -73.802548 (40.784672939515595, -73.80254825723279)
2015-06-09 17:17:35 30803802 06/09/2015 05:17:35 PM 06/30/2015 09:14:42 AM DPR Department of Parks and Recreation Overgrown Tree/Branches Dead Branches in Tree Street 10021 49 EAST 74 STREET ... NaN NaN NaN NaN NaN NaN NaN 40.772894 -73.963965 (40.77289410750929, -73.96396460036094)
2015-06-08 15:30:05 30802790 06/08/2015 03:30:05 PM 06/09/2015 03:33:25 PM HPD Department of Housing Preservation and Develop... HPD Literature Request The ABCs of Housing NaN 0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2015-06-08 21:00:10 30802223 06/08/2015 09:00:10 PM 06/08/2015 11:00:33 PM NYPD New York City Police Department Blocked Driveway No Access Street/Sidewalk 11210 640 EAST 31 STREET ... NaN NaN NaN NaN NaN NaN NaN 40.635339 -73.946950 (40.63533913576982, -73.94695005015714)
2015-06-09 11:06:41 30803899 06/09/2015 11:06:41 AM 06/18/2015 08:16:38 AM DOF Senior Citizen Rent Increase Exemption Unit SCRIE Application Renewal Senior Address 10034 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2015-04-30 18:31:10 30518393 04/30/2015 06:31:10 PM 05/28/2015 07:54:55 PM DOT Department of Transportation Highway Condition Graffiti - Highway Highway 0 NaN ... Henry Hudson Pkwy/Rt 9A South/Downtown Ramp Cross Bronx Expwy/GWB (Exit 14) NaN NaN NaN NaN NaN NaN
2015-06-08 22:42:14 30802631 06/08/2015 10:42:14 PM 06/09/2015 02:50:42 AM NYPD New York City Police Department Derelict Vehicle With License Plate Street/Sidewalk 10040 738 WEST 189 STREET ... NaN NaN NaN NaN NaN NaN NaN 40.855431 -73.933718 (40.85543091419037, -73.93371755520158)
2015-08-17 12:43:10 31329678 08/17/2015 12:43:10 PM 08/31/2015 11:05:55 AM HPD Department of Housing Preservation and Develop... WATER LEAK HEAVY FLOW RESIDENTIAL BUILDING 11213 913 PARK PLACE ... NaN NaN NaN NaN NaN NaN NaN 40.673258 -73.946981 (40.673258313508946, -73.94698117385035)
2015-04-30 05:56:19 30515724 04/30/2015 05:56:19 AM 04/30/2015 02:34:42 PM DOT Department of Transportation Street Condition Rough, Pitted or Cracked Roads Street 10474 NaN ... NaN NaN NaN NaN NaN NaN NaN 40.809408 -73.880297 (40.809408390457705, -73.88029651562117)
2015-06-08 15:30:46 30802923 06/08/2015 03:30:46 PM 06/09/2015 06:16:42 AM NYPD New York City Police Department Blocked Driveway Partial Access Street/Sidewalk 11213 1310 PRESIDENT STREET ... NaN NaN NaN NaN NaN NaN NaN 40.667829 -73.946351 (40.66782885279248, -73.94635106011667)
2015-03-09 09:56:11 30130812 03/09/2015 09:56:11 AM 03/09/2015 09:56:37 AM HRA HRA Benefit Card Replacement Benefit Card Replacement Medicaid NYC Street Address 0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2015-08-17 09:20:48 31329672 08/17/2015 09:20:48 AM 08/31/2015 12:34:12 PM HPD Department of Housing Preservation and Develop... WATER LEAK HEAVY FLOW RESIDENTIAL BUILDING 11221 755 GATES AVENUE ... NaN NaN NaN NaN NaN NaN NaN 40.687833 -73.936596 (40.68783329691167, -73.93659571341877)
2015-08-17 09:57:54 31329661 08/17/2015 09:57:54 AM 08/31/2015 11:05:14 AM HPD Department of Housing Preservation and Develop... WATER LEAK HEAVY FLOW RESIDENTIAL BUILDING 10453 95 WEST 183 STREET ... NaN NaN NaN NaN NaN NaN NaN 40.859896 -73.908313 (40.8598956200191, -73.90831324365298)
2015-04-30 10:04:39 30514412 04/30/2015 10:04:39 AM 04/30/2015 05:27:31 PM NYPD New York City Police Department Illegal Parking Blocked Sidewalk Street/Sidewalk 11368 110-09 37 AVENUE ... NaN NaN NaN NaN NaN NaN NaN 40.754142 -73.857213 (40.75414150878806, -73.85721273588771)
2015-06-09 07:40:17 30804658 06/09/2015 07:40:17 AM 06/09/2015 02:52:00 PM NYPD New York City Police Department Illegal Parking Blocked Hydrant Street/Sidewalk 10471 4626 ARLINGTON AVENUE ... NaN NaN NaN NaN NaN NaN NaN 40.893127 -73.909957 (40.8931269073581, -73.90995651445718)
2015-10-22 21:32:33 31852899 10/22/2015 09:32:33 PM 12/17/2015 11:51:46 AM TLC Taxi and Limousine Commission Taxi Complaint Driver Complaint Street 11201 NaN ... NaN NaN NaN NaN NaN NaN NaN 40.691457 -73.987347 (40.69145669611528, -73.98734658135137)
2015-10-22 16:42:08 31841768 10/22/2015 04:42:08 PM 10/23/2015 07:35:22 AM NYPD New York City Police Department Derelict Vehicle With License Plate Street/Sidewalk 11417 90-07 107 AVENUE ... NaN NaN NaN NaN NaN NaN NaN 40.678300 -73.847760 (40.678299795854336, -73.84775982804008)
2015-04-30 21:20:00 30520027 04/30/2015 09:20:00 PM 04/30/2015 10:13:14 PM NYPD New York City Police Department Illegal Parking Blocked Hydrant Street/Sidewalk 11214 19 BAY 14 STREET ... NaN NaN NaN NaN NaN NaN NaN 40.608625 -74.006151 (40.6086251765526, -74.00615138898827)
2015-06-10 12:08:46 30812148 06/10/2015 12:08:46 PM 06/18/2015 04:50:34 PM DOF Senior Citizen Rent Increase Exemption Unit SCRIE TAC Report Senior Address 11204 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2015-04-16 21:27:44 30413818 04/16/2015 09:27:44 PM 04/16/2015 10:14:59 PM NYPD New York City Police Department Noise - Commercial Loud Talking Store/Commercial 10013 2 DESBROSSES STREET ... NaN NaN NaN NaN NaN NaN NaN 40.723310 -74.008215 (40.72331008108801, -74.0082147635032)
2015-06-09 11:11:56 30804200 06/09/2015 11:11:56 AM 06/09/2015 11:38:39 AM NYPD New York City Police Department Blocked Driveway No Access Street/Sidewalk 10458 2971 WEBSTER AVENUE ... NaN NaN NaN NaN NaN NaN NaN 40.867659 -73.882990 (40.86765943109168, -73.88299009268677)
2015-06-09 00:00:00 30804335 06/09/2015 12:00:00 AM 06/10/2015 12:00:00 AM HPD Department of Housing Preservation and Develop... GENERAL COOKING GAS RESIDENTIAL BUILDING 11368 40-56 JUNCTION BOULEVARD ... NaN NaN NaN NaN NaN NaN NaN 40.748051 -73.868768 (40.74805092746579, -73.86876755648586)
2015-04-29 14:29:00 30509208 04/29/2015 02:29:00 PM 04/23/2015 10:55:00 PM DOT Department of Transportation Street Light Condition Street Light Out NaN 0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2015-04-30 12:17:53 30516417 04/30/2015 12:17:53 PM 04/30/2015 02:55:58 PM NYPD New York City Police Department Traffic Congestion/Gridlock Street/Sidewalk 10017 NaN ... NaN NaN NaN NaN NaN NaN NaN 40.755301 -73.975344 (40.755300562504196, -73.97534387376678)
2015-06-09 12:48:25 30804324 06/09/2015 12:48:25 PM 06/09/2015 12:48:42 PM HRA HRA Benefit Card Replacement Benefit Card Replacement Medicaid NYC Street Address 0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

200000 rows × 53 columns


In [92]:
borough_counts = df['Borough'].value_counts()
borough_counts


Out[92]:
BROOKLYN         57129
QUEENS           46824
MANHATTAN        42050
BRONX            29610
Unspecified      17000
STATEN ISLAND     7387
Name: Borough, dtype: int64

In [112]:
# Borough populations for the per-capita math
# Brooklyn      2592000
# Staten Island  472621
# Bronx         1419000
# Manhattan     1626000
# Queens        2296000

bcount = pd.DataFrame(df['Borough'].value_counts())
bcount['name'] = bcount.index
bcount


Out[112]:
Borough name
BROOKLYN 57129 BROOKLYN
QUEENS 46824 QUEENS
MANHATTAN 42050 MANHATTAN
BRONX 29610 BRONX
Unspecified 17000 Unspecified
STATEN ISLAND 7387 STATEN ISLAND

In [113]:
# Still to do: combine the population figures with bcount (a pandas merge) to get complaints per capita
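Since there are only five boroughs, the per-capita math can be done without a merge. The sketch below reuses the complaint counts from the `value_counts()` output and the population notes from this notebook. Treating the 1419000 figure as the Bronx is an assumption, and "Unspecified" is left out because it has no population.

```python
import pandas as pd

# Complaint counts copied from the value_counts() output above.
counts = pd.Series({
    "BROOKLYN": 57129, "QUEENS": 46824, "MANHATTAN": 42050,
    "BRONX": 29610, "STATEN ISLAND": 7387,
})

# Populations from the notes above; the 1419000 figure is taken to be the Bronx.
population = pd.Series({
    "BROOKLYN": 2592000, "QUEENS": 2296000, "MANHATTAN": 1626000,
    "BRONX": 1419000, "STATEN ISLAND": 472621,
})

# Series division aligns on the index labels, so no explicit merge is needed.
per_capita = (counts / population).sort_values(ascending=False)
print(per_capita)
print("Highest:", per_capita.idxmax())
```

On these numbers Manhattan comes out highest, at roughly 2.6 complaints per 100 residents. That holds only for this 200,000-row selection, not for the full year.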

According to your selection of data, how many cases were filed in March? How about May?


In [97]:
# .loc with a partial date string selects every row in that month
March = df.loc['2015-03']['Unique Key'].count()
May = df.loc['2015-05']['Unique Key'].count()
print("March:", March)
print("May:", May)


March: 15025
May: 49715

I'd like to see all of the 311 complaints called in on April 1st.

Surprise! We couldn't do this in class, but it was just a limitation of our data set


In [98]:
df.loc['2015-04-01']['Unique Key'].count()


Out[98]:
573

What was the most popular type of complaint on April 1st?

What were the three most popular types of complaint on April 1st?


In [108]:
top1 = df.loc['2015-04-01']['Complaint Type'].value_counts().head(1)
top3 = df.loc['2015-04-01']['Complaint Type'].value_counts().head(3)
print("April#1:", top1)
print("----------------------")
print("April TOP3:", top3)


April#1: Illegal Parking    67
Name: Complaint Type, dtype: int64
----------------------
April TOP3: Illegal Parking     67
Street Condition    64
Blocked Driveway    58
Name: Complaint Type, dtype: int64

What month has the most reports filed? How many? Graph it.


In [119]:
df.index.month[:15]


Out[119]:
array([ 7,  7, 11,  7,  7,  7,  7,  7,  8,  9,  9,  9,  9,  9,  4], dtype=int32)

In [121]:
df.groupby(by=df.index.month)['Unique Key'].count()


Out[121]:
1      7094
2      8141
3     15025
4     20087
5     49715
6     14459
7     15047
8     12204
9     13679
10    24700
11    16476
12     3373
Name: Unique Key, dtype: int64
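To read the answer off programmatically instead of scanning the table, `idxmax()` returns the index label of the largest value. The sketch below copies the monthly totals from the output above into a Series (`monthly` is a hypothetical name).

```python
import pandas as pd

# Monthly totals copied from the groupby output above (month number -> count).
monthly = pd.Series({
    1: 7094, 2: 8141, 3: 15025, 4: 20087, 5: 49715, 6: 14459,
    7: 15047, 8: 12204, 9: 13679, 10: 24700, 11: 16476, 12: 3373,
})

busiest_month = monthly.idxmax()  # index label of the largest value
busiest_count = monthly.max()     # the largest value itself
print(busiest_month, busiest_count)
```

In this selection the busiest month is May, with 49,715 reports.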

In [122]:
df.groupby(by=df.index.month)['Unique Key'].count().plot() #all months no matter what year


Out[122]:
<matplotlib.axes._subplots.AxesSubplot at 0x110e13dd8>

In [123]:
df.resample('M')['Unique Key'].count() #resample takes care of the month of this year.


Out[123]:
Date
2015-01-31     7091
2015-02-28     8141
2015-03-31    15025
2015-04-30    20087
2015-05-31    49715
2015-06-30    14459
2015-07-31    15047
2015-08-31    12204
2015-09-30    13679
2015-10-31    24700
2015-11-30    16476
2015-12-31     3373
2016-01-31        3
Freq: M, Name: Unique Key, dtype: int64

In [124]:
df.resample('M')['Unique Key'].count().plot()


Out[124]:
<matplotlib.axes._subplots.AxesSubplot at 0x111f00358>
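
The two outputs above differ for January (7094 vs 7091) because grouping by month number lumps the three January 2016 rows in with January 2015. A small sketch on toy data shows the same effect. It groups by `to_period("M")` rather than `resample`, which is my substitution here to avoid the frequency alias; unlike `resample`, it does not create rows for empty months.

```python
import pandas as pd

# Toy index spanning two different Januaries.
idx = pd.to_datetime(["2015-01-10", "2015-01-20", "2015-02-05", "2016-01-01"])
toy = pd.DataFrame({"Unique Key": [1, 2, 3, 4]}, index=idx)

# Month number only: both Januaries land in group 1.
by_month_number = toy.groupby(toy.index.month)["Unique Key"].count()

# Year and month together: 2015-01 and 2016-01 stay separate.
by_calendar_month = toy.groupby(toy.index.to_period("M"))["Unique Key"].count()

print(by_month_number)
print(by_calendar_month)
```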

What week of the year has the most reports filed? How many? Graph the weekly complaints.


In [125]:
df.index.week


Out[125]:
array([28, 27, 46, ..., 18, 18, 24], dtype=int32)

In [126]:
df.groupby(by=df.index.week)['Unique Key'].count().plot()


Out[126]:
<matplotlib.axes._subplots.AxesSubplot at 0x10fd9dbe0>

In [127]:
df.resample('W')['Unique Key'].count().plot()


Out[127]:
<matplotlib.axes._subplots.AxesSubplot at 0x10fd7dbe0>
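
The plots above show the shape, but the question also asks which week and how many. A minimal sketch on toy data, assuming pandas 1.1 or newer, where `isocalendar()` is the replacement for the older `index.week` attribute:

```python
import pandas as pd

# Toy data: two complaints in one week, one each in two other weeks.
idx = pd.to_datetime(["2015-01-05", "2015-01-06", "2015-01-12", "2015-12-31"])
toy = pd.DataFrame({"Unique Key": [1, 2, 3, 4]}, index=idx)

# ISO week number for every row, as a plain integer array.
week_number = toy.index.isocalendar().week.astype(int).to_numpy()

per_week = toy.groupby(week_number)["Unique Key"].count()
busiest_week = per_week.idxmax()   # week number with the most complaints
busiest_count = per_week.max()     # how many complaints that week had
print(busiest_week, busiest_count)
```

Like the `groupby` on `df.index.week`, this ignores the year, so the same week number from different years is combined.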

Noise complaints are a big deal. Use .str.contains to select noise complaints, and make a chart of when they show up annually. Then make a chart about when they show up every day (cyclic).


In [130]:
noise = df['Complaint Type'].str.contains("Noise")
noise_df = df[noise]
noise_df.head(2)


Out[130]:
Unique Key Created Date Closed Date Agency Agency Name Complaint Type Descriptor Location Type Incident Zip Incident Address ... Bridge Highway Direction Road Ramp Bridge Highway Segment Garage Lot Name Ferry Direction Ferry Terminal Name Latitude Longitude Location Date
Date
2015-07-03 02:18:32 31000038 07/03/2015 02:18:32 AM 07/03/2015 07:54:48 AM NYPD New York City Police Department Noise - Commercial Loud Music/Party Club/Bar/Restaurant 11372 84-16 NORTHERN BOULEVARD ... NaN NaN NaN NaN NaN NaN 40.755774 -73.883262 (40.755773786469966, -73.88326243225418) 2015-07-03 02:18:32
2015-07-04 00:03:27 30995614 07/04/2015 12:03:27 AM 07/04/2015 03:33:09 AM NYPD New York City Police Department Noise - Street/Sidewalk Loud Talking Street/Sidewalk 11216 1057 BERGEN STREET ... NaN NaN NaN NaN NaN NaN 40.676175 -73.951269 (40.67617516102934, -73.9512690004692) 2015-07-04 00:03:27

2 rows × 54 columns


In [133]:
noise_df.resample('W')['Unique Key'].count().plot()


Out[133]:
<matplotlib.axes._subplots.AxesSubplot at 0x111472978>

In [134]:
noise_df.resample('H')['Unique Key'].count().plot() #resample remembers the day, month, hour and year. What we want is
#the hour of the day only.


Out[134]:
<matplotlib.axes._subplots.AxesSubplot at 0x111598e10>

In [135]:
noise_df.groupby(by=noise_df.index.hour)['Unique Key'].count().plot()


Out[135]:
<matplotlib.axes._subplots.AxesSubplot at 0x1115d90f0>
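
One thing to watch with `.str.contains`: rows with a missing complaint type come back as `NaN` rather than `False`, which breaks the boolean selection. A small sketch on toy data (the `toy` frame is made up) that passes `na=False` and fills in hours with no complaints so the daily cycle always has 24 points:

```python
import pandas as pd

idx = pd.to_datetime(["2015-07-03 02:18:32", "2015-07-04 00:03:27",
                      "2015-07-04 02:45:00", "2015-07-05 14:00:00"])
toy = pd.DataFrame(
    {"Complaint Type": ["Noise - Commercial", "Noise - Street/Sidewalk",
                        "Noise - Vehicle", None]},
    index=idx,
)

# na=False treats a missing complaint type as "not a noise complaint".
is_noise = toy["Complaint Type"].str.contains("Noise", na=False)
noise_toy = toy[is_noise]

# Count by hour of day, then add zero rows for hours with no complaints.
noise_by_hour = noise_toy.groupby(noise_toy.index.hour).size()
full_day = noise_by_hour.reindex(range(24), fill_value=0)
print(full_day)
```

Calling `.plot()` on `full_day` would give the cyclic chart with a complete x-axis.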

Which were the top five days of the year for filing complaints? How many on each of those days? Graph it.


In [140]:
df['Unique Key'].resample('D').count().sort_values(ascending=False).head(5).plot.barh() #descending, so head(5) is the top five days


Out[140]:
<matplotlib.axes._subplots.AxesSubplot at 0x111f8f4a8>
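
The chart alone does not show the counts for each day. As a sketch on toy data, `nlargest` gives the busiest days and their counts in one step (here the top two, since the toy frame is tiny):

```python
import pandas as pd

idx = pd.to_datetime(["2015-01-01 09:00", "2015-01-01 10:00", "2015-01-01 11:00",
                      "2015-01-02 09:00", "2015-01-04 09:00", "2015-01-04 18:00"])
toy = pd.DataFrame({"Unique Key": range(6)}, index=idx)

# One row per calendar day, including days with zero complaints.
per_day = toy["Unique Key"].resample("D").count()

# The busiest days, largest first.
top_days = per_day.nlargest(2)
print(top_days)
```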

What hour of the day are the most complaints? Graph a day of complaints.


In [ ]:
df.groupby(by=df.index.hour)['Unique Key'].count().plot() #complaints per hour of the day, all days combined

One of the hours has an odd number of complaints. What are the most common complaints at that hour, and what are the most common complaints the hour before and after?


In [152]:
midnigth_df = df[df.index.hour == 0]

In [155]:
midnigth_df.head(2)


Out[155]:
Unique Key Created Date Closed Date Agency Agency Name Complaint Type Descriptor Location Type Incident Zip Incident Address ... Bridge Highway Direction Road Ramp Bridge Highway Segment Garage Lot Name Ferry Direction Ferry Terminal Name Latitude Longitude Location Date
Date
2015-07-04 00:03:27 30995614 07/04/2015 12:03:27 AM 07/04/2015 03:33:09 AM NYPD New York City Police Department Noise - Street/Sidewalk Loud Talking Street/Sidewalk 11216 1057 BERGEN STREET ... NaN NaN NaN NaN NaN NaN 40.676175 -73.951269 (40.67617516102934, -73.9512690004692) 2015-07-04 00:03:27
2015-07-09 00:00:00 31042454 07/09/2015 12:00:00 AM 07/20/2015 12:00:00 AM DOHMH Department of Health and Mental Hygiene Standing Water Other - Explain Below Other 0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 2015-07-09 00:00:00

2 rows × 54 columns


In [ ]:

So odd. What's the per-minute breakdown of complaints between 12am and 1am? You don't need to include 1am.


In [ ]:
midnigth_df.groupby(by=midnigth_df.index.minute)['Unique Key'].count() #complaints per minute between 12am and 1am
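
A sketch of the per-minute breakdown on toy data. Rows like the `2015-07-09 00:00:00` record above suggest that some timestamps carry a date but no time of day, which would pile them up at minute 0. That is an assumption to check against the real data, not a confirmed explanation.

```python
import pandas as pd

idx = pd.to_datetime(["2015-07-09 00:00:00", "2015-07-10 00:00:00",
                      "2015-07-04 00:03:27", "2015-07-04 01:15:00"])
toy = pd.DataFrame({"Unique Key": [1, 2, 3, 4]}, index=idx)

# Keep only the rows between 12am and 1am (1am itself is excluded).
midnight = toy[toy.index.hour == 0]

# Count complaints for each minute of that hour.
per_minute = midnight.groupby(midnight.index.minute)["Unique Key"].count()

# Fraction of the midnight-hour complaints stamped at exactly minute 0.
share_at_minute_zero = per_minute.loc[0] / per_minute.sum()
print(per_minute)
print(share_at_minute_zero)
```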

Looks like midnight is a little bit of an outlier. Why might that be? Take the 5 most common agencies and graph the times they file reports at (all day, not just midnight).
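
One possible approach, sketched on toy data: take the most common agencies from `value_counts`, keep only their rows, then count by hour and agency and unstack so each agency becomes its own column. The toy frame and the choice of the top two agencies (instead of five) are just for illustration.

```python
import pandas as pd

idx = pd.to_datetime(["2015-06-09 11:11:56", "2015-06-09 00:00:00",
                      "2015-06-10 00:00:00", "2015-06-10 11:30:00",
                      "2015-06-10 23:00:00", "2015-06-11 09:00:00"])
toy = pd.DataFrame(
    {"Agency": ["NYPD", "HPD", "HPD", "NYPD", "NYPD", "DOT"]}, index=idx
)

# Names of the most common agencies (use head(5) on the real data).
top_agencies = toy["Agency"].value_counts().head(2).index
subset = toy[toy["Agency"].isin(top_agencies)]

# Rows are hours of the day, columns are agencies, values are counts.
by_hour = (
    subset.groupby([subset.index.hour, "Agency"])
    .size()
    .unstack(fill_value=0)
)
print(by_hour)
```

Calling `by_hour.plot()` would draw one line per agency across the day.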


In [ ]:

Graph those same agencies on an annual basis - make it weekly. When do people like to complain? When does the NYPD have an odd number of complaints?
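
For the weekly version, a sketch using `pd.Grouper(freq="W")` to bucket the datetime index into weeks alongside the `Agency` column. The toy data is made up for illustration.

```python
import pandas as pd

idx = pd.to_datetime(["2015-06-01", "2015-06-03", "2015-06-04",
                      "2015-06-08", "2015-06-10", "2015-06-11"])
toy = pd.DataFrame(
    {"Agency": ["NYPD", "NYPD", "HPD", "NYPD", "HPD", "HPD"]}, index=idx
)

# Rows are weeks (labelled by the week's last day), columns are agencies.
weekly = (
    toy.groupby([pd.Grouper(freq="W"), "Agency"])
    .size()
    .unstack(fill_value=0)
)
print(weekly)
```

On the real data you would first filter to the top five agencies, then call `weekly.plot()`.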


In [ ]:

Maybe the NYPD deals with different issues at different times? Check the most popular complaints in July and August vs the month of May. Also check the most common complaints for the Department of Housing Preservation and Development (HPD) in winter vs. summer.


In [157]:
df[df['Agency'] == 'NYPD']['2015-07':'2015-08']['Complaint Type'].value_counts().head(5)


Out[157]:
Illegal Parking            3444
Blocked Driveway           3258
Noise - Street/Sidewalk    3165
Noise - Commercial         1201
Noise - Vehicle             942
Name: Complaint Type, dtype: int64

In [159]:
df[df['Agency'] == 'NYPD']['2015-06':'2015-08']['Complaint Type'].value_counts().head(5)


Out[159]:
Illegal Parking            4769
Blocked Driveway           4646
Noise - Street/Sidewalk    3977
Noise - Commercial         1801
Derelict Vehicle           1260
Name: Complaint Type, dtype: int64

In [158]:
df[df['Agency'] == 'HPD']['2015-07':'2015-08']['Complaint Type'].value_counts().head(5)


Out[158]:
UNSANITARY CONDITION    193
PAINT/PLASTER           130
PLUMBING                 96
WATER LEAK               73
ELECTRIC                 65
Name: Complaint Type, dtype: int64

In [162]:
df[df['Agency'] == 'HPD']['2015-01':'2015-02']['Complaint Type'].value_counts().head(5)


Out[162]:
UNSANITARY CONDITION    8
GENERAL                 3
PAINT/PLASTER           3
WATER LEAK              2
APPLIANCE               2
Name: Complaint Type, dtype: int64

In [164]:
df[df['Agency'] == 'HPD']['2015-05':'2015-08']['Complaint Type'].value_counts()


Out[164]:
HEAT/HOT WATER            4200
HPD Literature Request    2424
UNSANITARY CONDITION      2063
PAINT/PLASTER             2015
PLUMBING                  1555
DOOR/WINDOW               1051
GENERAL                    978
WATER LEAK                 925
ELECTRIC                   917
FLOORING/STAIRS            714
APPLIANCE                  294
SAFETY                     292
OUTSIDE BUILDING            51
ELEVATOR                    36
Name: Complaint Type, dtype: int64
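
The raw counts above are hard to compare because the periods have very different numbers of rows. A sketch on toy data that puts winter and summer side by side as shares instead of counts. The season definitions (December to February, June to August) and the toy rows are assumptions for illustration.

```python
import pandas as pd

idx = pd.to_datetime(["2015-01-15", "2015-02-01", "2015-02-20",
                      "2015-07-04", "2015-07-20", "2015-08-02"])
toy = pd.DataFrame(
    {
        "Agency": ["HPD"] * 6,
        "Complaint Type": ["HEAT/HOT WATER", "HEAT/HOT WATER", "PLUMBING",
                           "UNSANITARY CONDITION", "PAINT/PLASTER",
                           "UNSANITARY CONDITION"],
    },
    index=idx,
)

hpd = toy[toy["Agency"] == "HPD"]

# Select seasons by month number (assumed season boundaries).
winter = hpd[hpd.index.month.isin([12, 1, 2])]
summer = hpd[hpd.index.month.isin([6, 7, 8])]

# normalize=True turns counts into shares, so the seasons are comparable.
comparison = pd.DataFrame({
    "winter": winter["Complaint Type"].value_counts(normalize=True),
    "summer": summer["Complaint Type"].value_counts(normalize=True),
}).fillna(0)
print(comparison)
```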

In [ ]: