In [754]:
import numpy as np 
import pandas as pd
import datetime
import matplotlib as plt
import re
%matplotlib inline

In [755]:
df = pd.read_csv('violations.csv', nrows=1000, parse_dates=['Issue Date'])

In [756]:
df.columns


Out[756]:
Index(['Summons Number', 'Plate ID', 'Registration State', 'Plate Type',
       'Issue Date', 'Violation Code', 'Vehicle Body Type', 'Vehicle Make',
       'Issuing Agency', 'Street Code1', 'Street Code2', 'Street Code3',
       'Vehicle Expiration Date', 'Violation Location', 'Violation Precinct',
       'Issuer Precinct', 'Issuer Code', 'Issuer Command', 'Issuer Squad',
       'Violation Time', 'Time First Observed', 'Violation County',
       'Violation In Front Of Or Opposite', 'House Number', 'Street Name',
       'Intersecting Street', 'Date First Observed', 'Law Section',
       'Sub Division', 'Violation Legal Code', 'Days Parking In Effect    ',
       'From Hours In Effect', 'To Hours In Effect', 'Vehicle Color',
       'Unregistered Vehicle?', 'Vehicle Year', 'Meter Number',
       'Feet From Curb', 'Violation Post Code', 'Violation Description',
       'No Standing or Stopping Violation', 'Hydrant Violation',
       'Double Parking Violation'],
      dtype='object')

In [757]:
df.dtypes


Out[757]:
Summons Number                                int64
Plate ID                                     object
Registration State                           object
Plate Type                                   object
Issue Date                           datetime64[ns]
Violation Code                                int64
Vehicle Body Type                            object
Vehicle Make                                 object
Issuing Agency                               object
Street Code1                                  int64
Street Code2                                  int64
Street Code3                                  int64
Vehicle Expiration Date                       int64
Violation Location                          float64
Violation Precinct                            int64
Issuer Precinct                               int64
Issuer Code                                   int64
Issuer Command                               object
Issuer Squad                                  int64
Violation Time                               object
Time First Observed                          object
Violation County                             object
Violation In Front Of Or Opposite            object
House Number                                 object
Street Name                                  object
Intersecting Street                          object
Date First Observed                           int64
Law Section                                   int64
Sub Division                                 object
Violation Legal Code                        float64
Days Parking In Effect                       object
From Hours In Effect                         object
To Hours In Effect                           object
Vehicle Color                                object
Unregistered Vehicle?                         int64
Vehicle Year                                  int64
Meter Number                                 object
Feet From Curb                                int64
Violation Post Code                         float64
Violation Description                       float64
No Standing or Stopping Violation           float64
Hydrant Violation                           float64
Double Parking Violation                    float64
dtype: object

1. I want to make sure my Plate ID is a string. Can't lose the leading zeroes!


In [758]:
# checks data type of each value in series Plate ID by printing if type does not equal string
# all values are strings
for x in df['Plate ID']: 
    if type(x) != str:
        print(type(x))

2. I don't think anyone's car was built in 0AD. Discard the '0's as NaN.


In [759]:
df['Vehicle Year'] = df['Vehicle Year'].replace(0, np.nan)
df['Vehicle Year'] = pd.to_datetime(df['Vehicle Year'], format='%Y', errors = 'coerce')

3. I want the dates to be dates! Read the read_csv documentation to find out how to make pandas automatically parse dates.


In [760]:
# see where I read in csv for inclusion of parse_date arg

4. "Date first observed" is a pretty weird column, but it seems like it has a date hiding inside. Using a function with .apply, transform the string (e.g. "20140324") into a Python date. Make the 0's show up as NaN.


In [761]:
df['Date First Observed'] = df['Date First Observed'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d', errors='coerce'))

5. "Violation time" is... not a time. Make it a time.


In [762]:
def remove_char(x):
    x = x[:-1]
    return x
def insert_char(x):
    x = x[:2] + ':' + x[2:]
    return x

In [763]:
# removes 'A' and 'P' from original values in Violation Time
df['Violation Time2'] = df['Violation Time'].apply(lambda x: remove_char(str(x)))
# inserts colon in to string 
df['Violation Time3'] = df['Violation Time2'].apply(lambda x: insert_char(str(x)))

In [764]:
#makes values datetime format, allowing .to_datetime to infer format from string
df['Violation Time4'] = df['Violation Time3'].apply(lambda x: pd.to_datetime(str(x), infer_datetime_format=True, errors='coerce'))
#selects only time from timestamp 
df['Violation Time'] = df['Violation Time4'].dt.time

In [765]:
# deletes temporary variables used to make violation time a time 
df=df.drop(df.columns[[43, 44, 45]], axis=1)
df.columns


Out[765]:
Index(['Summons Number', 'Plate ID', 'Registration State', 'Plate Type',
       'Issue Date', 'Violation Code', 'Vehicle Body Type', 'Vehicle Make',
       'Issuing Agency', 'Street Code1', 'Street Code2', 'Street Code3',
       'Vehicle Expiration Date', 'Violation Location', 'Violation Precinct',
       'Issuer Precinct', 'Issuer Code', 'Issuer Command', 'Issuer Squad',
       'Violation Time', 'Time First Observed', 'Violation County',
       'Violation In Front Of Or Opposite', 'House Number', 'Street Name',
       'Intersecting Street', 'Date First Observed', 'Law Section',
       'Sub Division', 'Violation Legal Code', 'Days Parking In Effect    ',
       'From Hours In Effect', 'To Hours In Effect', 'Vehicle Color',
       'Unregistered Vehicle?', 'Vehicle Year', 'Meter Number',
       'Feet From Curb', 'Violation Post Code', 'Violation Description',
       'No Standing or Stopping Violation', 'Hydrant Violation',
       'Double Parking Violation'],
      dtype='object')

6. There sure are a lot of colors of cars, too bad so many of them are the same. Make "BLK" and "BLACK", "WT" and "WHITE", and any other combinations that you notice.


In [766]:
def clean_color(x): 
    if x == 'BLK': 
        print(x)
        x = x.replace('BLK','BLACK')
        print('Now', x)
    elif x == 'BK': 
        print(x)
        x = x.replace('BK','BLACK')
        print('Now', x)
    elif x == 'W': 
        print(x)
        x = x.replace('W','WHITE')
        print('Now', x)
    elif x == 'WH': 
        print(x)
        x = x.replace('WH','WHITE')
        print('Now', x)
    elif x == 'WHT': 
        print(x)
        x = x.replace('WHT','WHITE')
        print('Now', x)
    elif x == 'WT': 
        print(x)
        x = x.replace('WT','WHITE')
        print('Now', x)
    elif x == 'GY': 
        print(x)
        x = x.replace('GY','GREY')
        print('Now', x)
    elif x == 'GRAY': 
        print(x)
        x = x.replace('GRAY','GREY')
        print('Now', x)
    elif x == 'GRY': 
        print(x)
        x = x.replace('GRY','GREY')
        print('Now', x)
    elif x == 'G/Y': 
        print(x)
        x = x.replace('G/Y','GREY')
        print('Now', x)
    elif x == 'BR': 
        print(x)
        x = x.replace('BR','BROWN')
        print('Now', x)
    elif x == 'BRW': 
        print(x)
        x = x.replace('BRW','BROWN')
        print('Now', x)
    elif x == 'TN': 
        print(x)
        x = x.replace('GY','TAN')
        print('Now', x)
    elif x == 'RD': 
        print(x)
        x = x.replace('RD','RED')
        print('Now', x)
    elif x == 'BL': 
        print(x)
        x = x.replace('BL','BLUE')
        print('Now', x)
    elif x == 'BLU': 
        print(x)
        x = x.replace('BLU','BLUE')
        print('Now', x)
    elif x == 'SIL':
        print(x)
        x = x.replace('SIL','SILVER')
        print('Now', x)  
    elif x == 'SILVE':
        print(x)
        x = x.replace('SILVE','SILVER')
        print('Now', x) 
    elif x == 'BURGA':
        print(x)
        x = x.replace('BURGA','BURGANDY')
        print('Now', x) 
    elif x == 'PURPL':
        print(x)
        x = x.replace('PURPL','PURPLE')
        print('Now', x) 
    elif x == 'PUR':
        print(x)
        x = x.replace('PUR','PURPLE')
        print('Now', x) 
    elif x == 'YELLO':
        print(x)
        x = x.replace('YELLO','YELLOW')
        print('Now', x) 
    elif x == 'YW':
        print(x)
        x = x.replace('YW','YELLOW')
        print('Now', x)
    elif x == 'OR':
        print(x)
        x = x.replace('OR','ORANGE')
        print('Now', x)
    elif x == 'GL':
        print(x)
        x = x.replace('GL','GOLD')
        print('Now', x)
    elif x == 'GR':
        print(x)
        x = x.replace('GR','GREEN')
        print('Now', x)
    elif x == 'GRN':
        print(x)
        x = x.replace('GRN','GREEN')
        print('Now', x)
    elif x == 'G':
        print(x)
        x = x.replace('G','GREEN')
        print('Now', x)
    return x

In [767]:
df['Vehicle Color'] = df['Vehicle Color'].apply(lambda x: clean_color(x))
df['Vehicle Color']


GY
Now GREY
WH
Now WHITE
WH
Now WHITE
BR
Now BROWN
RD
Now RED
GRAY
Now GREY
WH
Now WHITE
GY
Now GREY
GRAY
Now GREY
TN
Now TN
BK
Now BLACK
WHT
Now WHITE
WH
Now WHITE
GR
Now GREEN
RD
Now RED
GY
Now GREY
GY
Now GREY
BL
Now BLUE
BK
Now BLACK
BLU
Now BLUE
WHT
Now WHITE
GRAY
Now GREY
BR
Now BROWN
GY
Now GREY
BK
Now BLACK
BLK
Now BLACK
SILVE
Now SILVER
BK
Now BLACK
GRAY
Now GREY
WH
Now WHITE
GY
Now GREY
SILVE
Now SILVER
BK
Now BLACK
SILVE
Now SILVER
YELLO
Now YELLOW
GRY
Now GREY
WHT
Now WHITE
WHT
Now WHITE
SILVE
Now SILVER
SILVE
Now SILVER
GRAY
Now GREY
SILVE
Now SILVER
SILVE
Now SILVER
GRAY
Now GREY
SILVE
Now SILVER
SILVE
Now SILVER
GRAY
Now GREY
GRY
Now GREY
SILVE
Now SILVER
YELLO
Now YELLOW
BLK
Now BLACK
BL
Now BLUE
BLK
Now BLACK
BL
Now BLUE
GRAY
Now GREY
GRAY
Now GREY
GY
Now GREY
BK
Now BLACK
BK
Now BLACK
GY
Now GREY
GY
Now GREY
GRAY
Now GREY
WH
Now WHITE
GRAY
Now GREY
WH
Now WHITE
WH
Now WHITE
OR
Now ORANGE
BL
Now BLUE
W
Now WHITE
WHT
Now WHITE
GRAY
Now GREY
BK
Now BLACK
SIL
Now SILVER
WHT
Now WHITE
WHT
Now WHITE
WHT
Now WHITE
GRAY
Now GREY
BLK
Now BLACK
BLK
Now BLACK
BLK
Now BLACK
GR
Now GREEN
BK
Now BLACK
WH
Now WHITE
BLK
Now BLACK
YELLO
Now YELLOW
YELLO
Now YELLOW
WHT
Now WHITE
BL
Now BLUE
PURPL
Now PURPLE
G/Y
Now GREY
GY
Now GREY
BK
Now BLACK
WHT
Now WHITE
BRW
Now BROWN
SILVE
Now SILVER
SILVE
Now SILVER
WHT
Now WHITE
GRY
Now GREY
BLK
Now BLACK
GRY
Now GREY
GRY
Now GREY
GRY
Now GREY
BLK
Now BLACK
GRAY
Now GREY
GRAY
Now GREY
SILVE
Now SILVER
GRAY
Now GREY
GRAY
Now GREY
GRAY
Now GREY
SILVE
Now SILVER
YELLO
Now YELLOW
GRAY
Now GREY
GRAY
Now GREY
GRAY
Now GREY
SILVE
Now SILVER
PURPL
Now PURPLE
SILVE
Now SILVER
GRAY
Now GREY
SILVE
Now SILVER
BLK
Now BLACK
SILVE
Now SILVER
BLK
Now BLACK
SILVE
Now SILVER
BLK
Now BLACK
BLK
Now BLACK
BLK
Now BLACK
GRAY
Now GREY
YELLO
Now YELLOW
WH
Now WHITE
GY
Now GREY
GRAY
Now GREY
GRAY
Now GREY
GRAY
Now GREY
BLK
Now BLACK
SILVE
Now SILVER
BK
Now BLACK
BLK
Now BLACK
GY
Now GREY
GRAY
Now GREY
BL
Now BLUE
BLK
Now BLACK
SILVE
Now SILVER
BL
Now BLUE
GRY
Now GREY
WH
Now WHITE
RD
Now RED
WH
Now WHITE
BK
Now BLACK
GY
Now GREY
BLK
Now BLACK
WHT
Now WHITE
BL
Now BLUE
WH
Now WHITE
BK
Now BLACK
WHT
Now WHITE
BR
Now BROWN
WHT
Now WHITE
BLK
Now BLACK
GRY
Now GREY
BL
Now BLUE
BK
Now BLACK
GY
Now GREY
WH
Now WHITE
WHT
Now WHITE
WHT
Now WHITE
TN
Now TN
WHT
Now WHITE
WHT
Now WHITE
BLK
Now BLACK
GRY
Now GREY
WHT
Now WHITE
GRY
Now GREY
BK
Now BLACK
WH
Now WHITE
BK
Now BLACK
WH
Now WHITE
GY
Now GREY
G
Now GREEN
BK
Now BLACK
WH
Now WHITE
W
Now WHITE
GY
Now GREY
BK
Now BLACK
YELLO
Now YELLOW
G
Now GREEN
GY
Now GREY
BL
Now BLUE
GY
Now GREY
BLK
Now BLACK
BK
Now BLACK
BK
Now BLACK
GY
Now GREY
GY
Now GREY
WH
Now WHITE
BK
Now BLACK
BR
Now BROWN
WH
Now WHITE
BR
Now BROWN
BL
Now BLUE
GR
Now GREEN
BK
Now BLACK
GL
Now GOLD
GY
Now GREY
SILVE
Now SILVER
GY
Now GREY
BLK
Now BLACK
WHT
Now WHITE
BR
Now BROWN
RD
Now RED
BK
Now BLACK
BK
Now BLACK
GY
Now GREY
BLK
Now BLACK
BR
Now BROWN
BR
Now BROWN
BL
Now BLUE
BLK
Now BLACK
GY
Now GREY
BL
Now BLUE
GY
Now GREY
GY
Now GREY
BK
Now BLACK
BL
Now BLUE
BR
Now BROWN
WHT
Now WHITE
BK
Now BLACK
BLK
Now BLACK
GY
Now GREY
GY
Now GREY
BR
Now BROWN
GL
Now GOLD
BK
Now BLACK
BK
Now BLACK
GY
Now GREY
GY
Now GREY
GRAY
Now GREY
SILVE
Now SILVER
YELLO
Now YELLOW
BK
Now BLACK
SILVE
Now SILVER
SILVE
Now SILVER
SILVE
Now SILVER
BK
Now BLACK
WH
Now WHITE
GRAY
Now GREY
GRAY
Now GREY
BK
Now BLACK
BK
Now BLACK
GY
Now GREY
WH
Now WHITE
GY
Now GREY
YW
Now YELLOW
YELLO
Now YELLOW
WH
Now WHITE
BLK
Now BLACK
SILVE
Now SILVER
SILVE
Now SILVER
SILVE
Now SILVER
GRAY
Now GREY
GRAY
Now GREY
GRAY
Now GREY
GRAY
Now GREY
GRAY
Now GREY
BLK
Now BLACK
BLK
Now BLACK
BK
Now BLACK
SILVE
Now SILVER
GRAY
Now GREY
GRAY
Now GREY
GRAY
Now GREY
GRAY
Now GREY
GRAY
Now GREY
GRAY
Now GREY
SILVE
Now SILVER
SILVE
Now SILVER
GRAY
Now GREY
GRAY
Now GREY
GRAY
Now GREY
GRAY
Now GREY
GRAY
Now GREY
BLK
Now BLACK
BLK
Now BLACK
GRAY
Now GREY
BLK
Now BLACK
SILVE
Now SILVER
SILVE
Now SILVER
GRAY
Now GREY
GRAY
Now GREY
GRAY
Now GREY
GRAY
Now GREY
GRAY
Now GREY
GRAY
Now GREY
BK
Now BLACK
GY
Now GREY
BK
Now BLACK
BLK
Now BLACK
GY
Now GREY
RD
Now RED
GY
Now GREY
GRY
Now GREY
WHT
Now WHITE
RD
Now RED
TN
Now TN
GRN
Now GREEN
BK
Now BLACK
GY
Now GREY
GRN
Now GREEN
GRAY
Now GREY
WHT
Now WHITE
BK
Now BLACK
GY
Now GREY
RD
Now RED
WH
Now WHITE
RD
Now RED
BK
Now BLACK
RD
Now RED
GR
Now GREEN
BK
Now BLACK
GR
Now GREEN
GRY
Now GREY
BLK
Now BLACK
WHT
Now WHITE
PUR
Now PURPLE
BLK
Now BLACK
WHT
Now WHITE
BLK
Now BLACK
WHT
Now WHITE
WHT
Now WHITE
GRY
Now GREY
BLK
Now BLACK
GRY
Now GREY
BLK
Now BLACK
GRY
Now GREY
WHT
Now WHITE
WHT
Now WHITE
WH
Now WHITE
BK
Now BLACK
GY
Now GREY
WHT
Now WHITE
GY
Now GREY
BL
Now BLUE
BLK
Now BLACK
GRAY
Now GREY
GRAY
Now GREY
GRAY
Now GREY
GRAY
Now GREY
SILVE
Now SILVER
GRAY
Now GREY
BLK
Now BLACK
SILVE
Now SILVER
SILVE
Now SILVER
SILVE
Now SILVER
SILVE
Now SILVER
GRAY
Now GREY
GRAY
Now GREY
GRAY
Now GREY
SILVE
Now SILVER
GRAY
Now GREY
GRAY
Now GREY
SILVE
Now SILVER
BLK
Now BLACK
SILVE
Now SILVER
SILVE
Now SILVER
GRAY
Now GREY
SILVE
Now SILVER
SILVE
Now SILVER
YELLO
Now YELLOW
GRAY
Now GREY
GRAY
Now GREY
GRAY
Now GREY
GRAY
Now GREY
YELLO
Now YELLOW
G
Now GREEN
GRAY
Now GREY
GRAY
Now GREY
SILVE
Now SILVER
GRAY
Now GREY
GRAY
Now GREY
WH
Now WHITE
GRAY
Now GREY
WH
Now WHITE
WH
Now WHITE
GY
Now GREY
GY
Now GREY
GY
Now GREY
BL
Now BLUE
WH
Now WHITE
GY
Now GREY
WH
Now WHITE
GY
Now GREY
BK
Now BLACK
YELLO
Now YELLOW
GRAY
Now GREY
SILVE
Now SILVER
SILVE
Now SILVER
PURPL
Now PURPLE
BLK
Now BLACK
BL
Now BLUE
BLK
Now BLACK
SILVE
Now SILVER
BLK
Now BLACK
GRAY
Now GREY
BL
Now BLUE
BURGA
Now BURGANDY
YELLO
Now YELLOW
GRY
Now GREY
GRAY
Now GREY
SILVE
Now SILVER
Out[767]:
0          GREY
1         WHITE
2           NaN
3         WHITE
4         BROWN
5           RED
6            GN
7         WHITE
8         WHITE
9         BLACK
10         BLUE
11          TAN
12         GOLD
13        WHITE
14        GREEN
15        WHITE
16         GREY
17        GREEN
18        WHITE
19        WHITE
20          TAN
21         GREY
22         GREY
23           TN
24        BLACK
25        WHITE
26        WHITE
27        GREEN
28          RED
29         GREY
         ...   
970       BLACK
971      SILVER
972       BLACK
973        GREY
974        GREY
975        BLUE
976        GOLD
977         RED
978    BURGANDY
979       BLACK
980       WHITE
981       BLACK
982        BLUE
983        GOLD
984      YELLOW
985        BLUE
986        GREY
987        BLUE
988         RED
989       GREEN
990       GREEN
991       BLACK
992        GREY
993        GREY
994        GREY
995       WHITE
996        BLUE
997      SILVER
998       BLACK
999        GREY
Name: Vehicle Color, dtype: object

In [768]:
#check for remaining color abbreviations
for x in df['Vehicle Color']:
    if len(str(x)) <= 2: 
        print(x)


GN
TN
GN
TN
IL
TN

7. Join the data with the Parking Violations Code dataset from the NYC Open Data site.


In [769]:
df2 = pd.read_csv('DOF_Parking_Violation_Codes.csv')

In [770]:
df2


Out[770]:
CODE DEFINITION Manhattan  96th St. & below All Other Areas
0 10 Stopping, standing or parking where a sign, st... $115 $115
1 11 Hotel Loading/Unloading: Standing or parking w... $115 $115
2 12 Snow Emergency: Standing or parking where stan... $95 $95
3 13 Taxi Stand: Standing or parking where standing... $115 $115
4 14 General No Standing: Standing or parking where... $115 $115
5 16 Truck Loading/Unloading: Standing or parking w... $95 $95
6 17 Authorized Vehicles Only: Standing or parking ... $95 $95
7 18 Bus Lane: Standing or parking where standing i... $115 $115
8 19 Bus Stop: Standing or parking where standing i... $115 $115
9 20 General No Parking: No parking where parking i... $65 $60
10 21 Street Cleaning: No parking where parking is n... $65 $45
11 22 Hotel Loading/Unloading: No parking where park... $60 $60
12 23 Taxi Stand: No parking where parking is not al... $65 $60
13 24 Authorized Vehicles Only: No parking where par... $65 $60
14 25 Standing at a commuter van stop, other than te... $115 $115
15 26 Standing at a for-hire vehicle stop, other tha... $115 $115
16 27 No parking in a handicapped zone (off-street o... $180 $180
17 28 Overtime standing (diplomat) $95 $95
18 31 Standing of a non-commercial vehicle in a comm... $115 $115
19 32 Parking at a broken or missing meter for longe... $65 $35
20 33 "Feeding Meter" -- parking in a metered space ... $65 $35
21 34 Expired Meter -- parking in a metered space wh... $65 $35
22 35 Parking in a meter space for the purpose of di... $65 $35
23 37-38 Muni Meter --\n(37) Parking in excess of the a... $65 $35
24 39 Parking for longer than the maximum time permi... $65 $60
25 69 Failing to show a muni-meter receipt, commerci... $65 $65
26 40 Stopping, standing or parking closer than 15 f... $115 $115
27 42 Parking in a Muni Metered space in a commercia... $65 $35
28 43 Parking in a commercial metered zone in which ... $65 $35
29 44 Parking in a commercial metered zone for longe... $65 $35
... ... ... ... ...
54 71 Standing or parking a vehicle without showing ... $65 $65
55 72 Standing or parking a vehicle with NY Plates a... $65 $65
56 73 Standing or parking a vehicle showing an expir... $65 $65
57 74 Standing or parking a vehicle without properly... $65 $65
58 75 Standing or parking a vehicle in which the Lic... $65 $65
59 77 Parking a bus, unless allowed by signs. A char... $65 $45
60 78 Parking a Commercial Vehicle on a residential ... $65 $65
61 79 For a bus without passengers, waiting at a cur... $115 $115
62 80 Standing or parking a vehicle without head lam... $60 $45
63 81 No standing except diplomat $95 $95
64 82 Standing or parking a Commercial Vehicle unles... $115 $115
65 83 Standing or parking a vehicle which is not pro... $65 $65
66 84 Parking a Commercial Vehicle on any city stree... $65 $45
67 85 Parking a Commercial Vehicle more than 3 hours... $65 $65
68 86 Standing or parking a vehicle to make pickups,... $115 $115
69 89 Standing or parking a vehicle in the Garment D... $115 $115
70 91 Parking in order to sell a vehicle by a person... $65 $45
71 92 Parking in order to wash, grease, or repair a ... $65 $45
72 93 Stopping, standing or parking on paved roadway... $65 $65
73 94 Vehicle Release Agreement code associated with... 100\n(Regular Tow, plus violation fine) $200 (Heavy Tow, plus violation fine)
74 96 Standing or parking within 50 feet of the near... $95 $95
75 97 Parking in a vacant lot. A vehicle may be park... $65 $45
76 98 Standing or parking in front of a public or pr... $95 $95
77 99 All other parking, standing or stopping violat... vary vary
78 04 Vehicles parked illegally south of Houston Str... $115 NaN
79 05 Failure to make a right turn from a bus lane. $115 $115
80 06 Parking a tractor-trailer on a residential str... 1st Offense - $265\n2nd Offense - $515 2nd Offense - $265\n2nd Offense - $515
81 07 Vehicles photographed going through a red ligh... $50 $50
82 08 Vehicle idling in a restricted area. $115 $115
83 09 Blocking an Intersection: Obstructing traffic ... $115 $115

84 rows × 4 columns


In [771]:
# change format to string to match violation codes data set format 
df['Violation Code'] = df['Violation Code'].apply(lambda x: str(x).strip())

In [772]:
# check formatting and string length
for x in df['Violation Code']:
    if type(x) != str: 
        print(type(x))
    if len(x) > 2: 
        print(x)

In [773]:
df3 = pd.merge(df, df2, left_on='Violation Code', right_on='CODE', how='left', indicator=True)

In [774]:
# unsuccessful merges bc Violation Code 41 does not exist in Violation codes data.
# and bc 38 in documented at 37-38 in Violation codes data 
df3[df3['_merge']=='left_only']


Out[774]:
Summons Number Plate ID Registration State Plate Type Issue Date Violation Code Vehicle Body Type Vehicle Make Issuing Agency Street Code1 ... Violation Post Code Violation Description No Standing or Stopping Violation Hydrant Violation Double Parking Violation CODE DEFINITION Manhattan  96th St. & below All Other Areas _merge
4 1283294187 91648MC NY COM 2013-08-08 41 TRLR GMC P 37240 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN left_only
37 1288360800 HYM4303 PA PAS 2013-08-06 41 SDN MITSU P 29090 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN left_only
87 1294713218 GFC8215 NY PAS 2013-08-13 38 SUBN CHEVR P 0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN left_only
88 1294713220 6417CO CT COM 2013-08-13 38 VAN CHEVR P 0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN left_only
217 1321932730 900ZIH CA PAS 2013-08-12 41 SUBN AUDI S 0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN left_only
501 1295572450 GBJ3633 NY PAS 2013-07-20 37 SDN NISSA P 0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN left_only
581 1309169688 98981 NY COM 2013-08-01 38 VAN DODGE P 34770 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN left_only
885 1334596554 JFE4755 PA PAS 2013-08-03 41 SUBN CHEVR X 11710 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN left_only
930 1334738531 U3261 99 COM 2013-08-22 6 TRLR STRIC S 73570 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN left_only
981 1335087930 GGB3579 NY PAS 2013-07-27 41 SDN HYUND P 73110 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN left_only

10 rows × 48 columns

8. How much money did NYC make off of parking violations?


In [775]:
# removes unsuccessful merges
df4 = df3[df3['_merge'] !='left_only']

In [776]:
# Create boolean variable to indicate 0/1: Parking violation 
df4['Parking Violation'] = df4['DEFINITION'].str.contains('[Pp]arking') | df4['DEFINITION'].str.contains('[Pp]arked')
df4['Parking Violation'].value_counts()
df4[df4['Parking Violation'] == False]


/Users/mercyemelike/.virtualenvs/data_analysis/lib/python3.5/site-packages/ipykernel/__main__.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  from ipykernel import kernelapp as app
Out[776]:
Summons Number Plate ID Registration State Plate Type Issue Date Violation Code Vehicle Body Type Vehicle Make Issuing Agency Street Code1 ... Violation Description No Standing or Stopping Violation Hydrant Violation Double Parking Violation CODE DEFINITION Manhattan  96th St. & below All Other Areas _merge Parking Violation
32 1287518126 WMI CT PAS 2013-07-24 31 SUBN ME/BE T 34510 ... NaN NaN NaN NaN 31 Standing of a non-commercial vehicle in a comm... $115 $115 both False
51 6020309034 GEW4836 99 PAS 2013-07-18 94 SDN ACURA X 0 ... NaN NaN NaN NaN 94 Vehicle Release Agreement code associated with... 100\n(Regular Tow, plus violation fine) $200 (Heavy Tow, plus violation fine) both False
57 1293094389 GDS9810 NJ PAS 2013-06-20 31 SDN BMW P 24890 ... NaN NaN NaN NaN 31 Standing of a non-commercial vehicle in a comm... $115 $115 both False
378 1333131951 947WT5 MA PAS 2013-07-22 31 SDN NISSA T 17790 ... NaN NaN NaN NaN 31 Standing of a non-commercial vehicle in a comm... $115 $115 both False
384 1333509005 447ZRW CT PAS 2013-07-20 31 SDN FORD T 34570 ... NaN NaN NaN NaN 31 Standing of a non-commercial vehicle in a comm... $115 $115 both False
390 6020310784 447ZRW CT PAS 2013-07-20 94 SDN FORD X 0 ... NaN NaN NaN NaN 94 Vehicle Release Agreement code associated with... 100\n(Regular Tow, plus violation fine) $200 (Heavy Tow, plus violation fine) both False
402 1333711438 ZTD55E NJ PAS 2013-07-06 31 SDN CHEVR T 34570 ... NaN NaN NaN NaN 31 Standing of a non-commercial vehicle in a comm... $115 $115 both False
520 6020314583 87336JV PA PAS 2013-07-26 94 SDN TOYOT X 0 ... NaN NaN NaN NaN 94 Vehicle Release Agreement code associated with... 100\n(Regular Tow, plus violation fine) $200 (Heavy Tow, plus violation fine) both False
625 6020342748 HWD923 SC PAS 2013-07-22 94 SUBN KIA X 0 ... NaN NaN NaN NaN 94 Vehicle Release Agreement code associated with... 100\n(Regular Tow, plus violation fine) $200 (Heavy Tow, plus violation fine) both False
746 1333717192 402YBY CT PAS 2013-07-13 31 SDN NISSA T 34570 ... NaN NaN NaN NaN 31 Standing of a non-commercial vehicle in a comm... $115 $115 both False
747 1333717209 H77CXW NJ PAS 2013-07-13 31 SDN HONDA T 34570 ... NaN NaN NaN NaN 31 Standing of a non-commercial vehicle in a comm... $115 $115 both False

11 rows × 49 columns


In [869]:
# df4.groupby('Parking Violation')[''].sum()[True]
df4['Street Name'].value_counts()
above_96_lst=[]
for x in df4['Street Name']: 
    if re.search(r'\S \d \S\w ', str(x)):
        above_96 =  False
        above_96_lst.append(above_96)
    elif re.search(r'\d[[Aa]-[Zz]][[Aa]-[Zz]] ', str(x)):
        above_96 =  False
        above_96_lst.append(above_96)
#         print(x, above_96)
    elif re.search(r'\w\s\d\d\s', str(x)):
        above_96 =  False
        above_96_lst.append(above_96)
#         print(x, above_96)
    elif re.search(r'[A-Z]\s\d\d[A-Z][A-Z] [A-Z][A-Z]', str(x)):
        above_96 =  False
        above_96_lst.append(above_96)
#         print(x, above_96)
    elif re.search(r'[0-2][A-Z][A-Z] AVE', str(x)):
        above_96 =  False
        above_96_lst.append(above_96)
#         print(x, above_96)
    else: 
        above_96 =  True
        above_96_lst.append(above_96)
#         print(x, above_96)

# above_96_lst.count(False)
df4['Above_96'] = above_96_lst


/Users/mercyemelike/.virtualenvs/data_analysis/lib/python3.5/site-packages/ipykernel/__main__.py:30: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

In [875]:
df4 .h


Out[875]:
Summons Number Plate ID Registration State Plate Type Issue Date Violation Code Vehicle Body Type Vehicle Make Issuing Agency Street Code1 ... No Standing or Stopping Violation Hydrant Violation Double Parking Violation CODE DEFINITION Manhattan  96th St. & below All Other Areas _merge Parking Violation Above_96
0 1283294138 GBB9093 NY PAS 2013-08-04 46 SUBN AUDI P 37250 ... NaN NaN NaN 46 Standing or parking on the roadway side of a v... $115 $115 both True True
1 1283294151 62416MB NY COM 2013-08-04 46 VAN FORD P 37290 ... NaN NaN NaN 46 Standing or parking on the roadway side of a v... $115 $115 both True True
2 1283294163 78755JZ NY COM 2013-08-05 46 P-U CHEVR P 37030 ... NaN NaN NaN 46 Standing or parking on the roadway side of a v... $115 $115 both True True
3 1283294175 63009MA NY COM 2013-08-05 46 VAN FORD P 37270 ... NaN NaN NaN 46 Standing or parking on the roadway side of a v... $115 $115 both True True
5 1283294217 T60DAR NJ PAS 2013-08-11 14 P-U DODGE P 37250 ... NaN NaN NaN 14 General No Standing: Standing or parking where... $115 $115 both True True
6 1283294229 GCR2838 NY PAS 2013-08-11 14 VAN NaN P 37250 ... NaN NaN NaN 14 General No Standing: Standing or parking where... $115 $115 both True True
7 1283983620 XZ764G NJ PAS 2013-08-07 24 DELV FORD X 63430 ... NaN NaN NaN 24 Authorized Vehicles Only: No parking where par... $65 $60 both True True
8 1283983631 GBH9379 NY PAS 2013-08-07 24 SDN TOYOT X 63430 ... NaN NaN NaN 24 Authorized Vehicles Only: No parking where par... $65 $60 both True True
9 1283983667 MCL78B NJ PAS 2013-07-18 24 SDN SUBAR H 0 ... NaN NaN NaN 24 Authorized Vehicles Only: No parking where par... $65 $60 both True True
10 1283983679 M367CN NY PAS 2013-07-18 24 SDN HYUND H 0 ... NaN NaN NaN 24 Authorized Vehicles Only: No parking where par... $65 $60 both True True
11 1283983734 GAR6813 NY PAS 2013-07-18 24 SDN TOYOT H 0 ... NaN NaN NaN 24 Authorized Vehicles Only: No parking where par... $65 $60 both True True
12 1283983771 GEN8674 NY PAS 2013-07-31 24 SDN AUDI X 0 ... NaN NaN NaN 24 Authorized Vehicles Only: No parking where par... $65 $60 both True True
13 1283983825 GAC2703 NY PAS 2013-08-12 24 SDN NISSA X 23230 ... NaN NaN NaN 24 Authorized Vehicles Only: No parking where par... $65 $60 both True True
14 1286036800 40793JY NY COM 2013-07-05 14 VAN CHEVR P 34190 ... NaN NaN NaN 14 General No Standing: Standing or parking where... $115 $115 both True False
15 1286123550 GAD1485 NY PAS 2013-08-12 20 SDN VOLKS T 28930 ... NaN NaN NaN 20 General No Parking: No parking where parking i... $65 $60 both True True
16 1286246398 GFC5338 NY PAS 2013-07-26 14 SDN TOYOT T 0 ... NaN NaN NaN 14 General No Standing: Standing or parking where... $115 $115 both True True
17 1286246416 815M342 MD PAS 2013-07-30 20 SUBN SATUR T 0 ... NaN NaN NaN 20 General No Parking: No parking where parking i... $65 $60 both True True
18 1286248000 GJA3452 NY PAS 2013-07-23 14 SDN KIA T 73690 ... NaN NaN NaN 14 General No Standing: Standing or parking where... $115 $115 both True True
19 1286282330 YZY6476 NC PAS 2013-07-29 20 SDN NISSA T 32030 ... NaN NaN NaN 20 General No Parking: No parking where parking i... $65 $60 both True True
20 1286282342 WBJ819 LA PAS 2013-08-07 17 SUBN HONDA T 0 ... NaN NaN NaN 17 Authorized Vehicles Only: Standing or parking ... $95 $95 both True True
21 1286289841 GAV9235 NY PAS 2013-07-20 50 SDN HONDA T 0 ... NaN NaN NaN 50 Stopping, standing or parking in a crosswalk. ... $115 $115 both True True
22 1286654920 ZTR66R NJ PAS 2013-07-18 50 SDN N/S T 0 ... NaN NaN NaN 50 Stopping, standing or parking in a crosswalk. ... $115 $115 both True True
23 1286799648 GDE3973 NY PAS 2013-07-20 40 SDN TOYOT T 0 ... NaN NaN NaN 40 Stopping, standing or parking closer than 15 f... $115 $115 both True True
24 1286807062 P193871 IL PAS 2013-08-10 40 SDN SCIO T 26630 ... NaN NaN NaN 40 Stopping, standing or parking closer than 15 f... $115 $115 both True True
25 1286807475 GGL6608 NY PAS 2013-07-26 40 SDN NISSA T 73650 ... NaN NaN NaN 40 Stopping, standing or parking closer than 15 f... $115 $115 both True True
26 1286807633 84301JU NY COM 2013-07-19 14 P-U DODGE X 63030 ... NaN NaN NaN 14 General No Standing: Standing or parking where... $115 $115 both True True
27 1286808807 HHG3545 PA PAS 2013-08-06 20 SDN TOYOT T 21130 ... NaN NaN NaN 20 General No Parking: No parking where parking i... $65 $60 both True True
28 1287320491 50097JX NY COM 2013-08-07 40 VAN MERCU T 23190 ... NaN NaN NaN 40 Stopping, standing or parking closer than 15 f... $115 $115 both True True
29 1287320570 GEJ9154 NY PAS 2013-08-06 51 VAN TOYOT X 93230 ... NaN NaN NaN 51 Stopping, standing or parking on a sidewalk. $115 $115 both True True
30 1287517304 6B22924 ID PAS 2013-07-22 20 SDN VOLKS T 36250 ... NaN NaN NaN 20 General No Parking: No parking where parking i... $65 $60 both True True
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
969 1334886817 T600324C NY SRF 2013-07-12 17 VAN FORD X 37290 ... NaN NaN NaN 17 Authorized Vehicles Only: Standing or parking ... $95 $95 both True True
970 1334893391 GEF8039 NY PAS 2013-08-08 14 SDN HONDA X 0 ... NaN NaN NaN 14 General No Standing: Standing or parking where... $115 $115 both True True
971 1334899332 GAR5973 NY PAS 2013-05-28 20 SUBN FORD H 24240 ... NaN NaN NaN 20 General No Parking: No parking where parking i... $65 $60 both True True
972 1334899356 76860 NY MED 2013-07-11 20 SUBN TOYOT H 24240 ... NaN NaN NaN 20 General No Parking: No parking where parking i... $65 $60 both True True
973 1334900190 G44DBB NJ PAS 2013-06-06 20 SDN FORD H 24240 ... NaN NaN NaN 20 General No Parking: No parking where parking i... $65 $60 both True True
974 1334960409 GCP7978 NY PAS 2013-07-16 20 SDN ME/BE P 31820 ... NaN NaN NaN 20 General No Parking: No parking where parking i... $65 $60 both True True
975 1334965547 T606011C NY SRF 2013-08-18 40 SDN LINCO P 68720 ... NaN NaN NaN 40 Stopping, standing or parking closer than 15 f... $115 $115 both True True
976 1335085889 GEB4077 NY PAS 2013-08-05 24 SUBN BMW P 58730 ... NaN NaN NaN 24 Authorized Vehicles Only: No parking where par... $65 $60 both True True
977 1335086067 408861 RI PAS 2013-08-04 24 SDN MAZDA P 10810 ... NaN NaN NaN 24 Authorized Vehicles Only: No parking where par... $65 $60 both True True
978 1335087308 505GSB OK PAS 2013-08-21 20 SDN NISSA P 63030 ... NaN NaN NaN 20 General No Parking: No parking where parking i... $65 $60 both True True
979 1335087436 GGF7010 NY PAS 2013-08-14 24 SUBN JEEP P 69230 ... NaN NaN NaN 24 Authorized Vehicles Only: No parking where par... $65 $60 both True True
980 1335087618 XH208F 99 COM 2013-07-27 20 VAN FORD P 69230 ... NaN NaN NaN 20 General No Parking: No parking where parking i... $65 $60 both True True
982 1335089433 49SD67 NY MOT 2013-08-05 51 MC YAMAH P 48830 ... NaN NaN NaN 51 Stopping, standing or parking on a sidewalk. $115 $115 both True True
983 1335089445 GDM5570 NY PAS 2013-08-20 20 SUBN FORD P 95030 ... NaN NaN NaN 20 General No Parking: No parking where parking i... $65 $60 both True True
984 1335089731 55RG48 NY MOT 2013-08-17 51 MOTO HONDA P 28430 ... NaN NaN NaN 51 Stopping, standing or parking on a sidewalk. $115 $115 both True True
985 1335090538 GDF7589 NY PAS 2013-07-21 24 SUBN BMW P 0 ... NaN NaN NaN 24 Authorized Vehicles Only: No parking where par... $65 $60 both True True
986 1335091440 PC613253 99 PAS 2013-07-27 24 SUBN CHEVR P 39930 ... NaN NaN NaN 24 Authorized Vehicles Only: No parking where par... $65 $60 both True True
987 1335092730 82GN94 NY MOT 2013-08-17 51 MOTO KAWAS P 34630 ... NaN NaN NaN 51 Stopping, standing or parking on a sidewalk. $115 $115 both True True
988 1335092742 47RE79 NY MOT 2013-08-17 51 MOTO KAWAS P 34630 ... NaN NaN NaN 51 Stopping, standing or parking on a sidewalk. $115 $115 both True True
989 1335092754 78GX62 NY MOT 2013-08-17 51 MOTO KAWAS P 34630 ... NaN NaN NaN 51 Stopping, standing or parking on a sidewalk. $115 $115 both True True
990 1335092766 49SD33 NY MOT 2013-08-20 51 MOTO KAWAS P 23930 ... NaN NaN NaN 51 Stopping, standing or parking on a sidewalk. $115 $115 both True True
991 1335092778 GDU8731 NY PAS 2013-08-21 24 SDN INFIN P 0 ... NaN NaN NaN 24 Authorized Vehicles Only: No parking where par... $65 $60 both True False
992 1335093072 GEN7535 NY PAS 2013-07-23 24 SDN NISSA P 56230 ... NaN NaN NaN 24 Authorized Vehicles Only: No parking where par... $65 $60 both True True
993 1335093199 T529952C NY SRF 2013-08-01 50 SDN LINCO P 0 ... NaN NaN NaN 50 Stopping, standing or parking in a crosswalk. ... $115 $115 both True True
994 1335093205 GBJ4073 NY PAS 2013-08-01 14 SUBN FORD P 56230 ... NaN NaN NaN 14 General No Standing: Standing or parking where... $115 $115 both True True
995 1335095342 JHA1050 PA PAS 2013-07-21 40 SUBN GMC P 19830 ... NaN NaN NaN 40 Stopping, standing or parking closer than 15 f... $115 $115 both True True
996 1335096139 72416R VA PAS 2013-08-11 24 SUBN ME/BE P 34330 ... NaN NaN NaN 24 Authorized Vehicles Only: No parking where par... $65 $60 both True True
997 1335096140 HPJ538 MS PAS 2013-08-11 24 SDN HONDA P 34330 ... NaN NaN NaN 24 Authorized Vehicles Only: No parking where par... $65 $60 both True True
998 1335096152 HYW9707 PA PAS 2013-08-19 24 SUBN VOLKS P 34330 ... NaN NaN NaN 24 Authorized Vehicles Only: No parking where par... $65 $60 both True True
999 1335096395 HZC1957 PA PAS 2013-08-20 24 VAN HONDA P 68630 ... NaN NaN NaN 24 Authorized Vehicles Only: No parking where par... $65 $60 both True True

990 rows × 50 columns

9. What's the most lucrative kind of parking violation? The most frequent?

10. New Jersey has bad drivers, but does it have bad parkers, too? How much money does NYC make off of all non-New York vehicles?

11. Make a chart of the top few.

12. What time of day do people usually get their tickets? You can break the day up into several blocks - for example 12am-6am, 6am-12pm, 12pm-6pm, 6pm-12am.

13. What's the average ticket cost in NYC?

14. Make a graph of the number of tickets per day.

15. Make a graph of the amount of revenue collected per day.

16. Manually construct a dataframe out of https://dmv.ny.gov/statistic/2015licinforce-web.pdf (only NYC boroughts - bronx, queens, manhattan, staten island, brooklyn), having columns for borough name, abbreviation, and number of licensed drivers.

17. What's the parking-ticket-$-per-licensed-driver in each borough of NYC? Do this with pandas and the dataframe you just made, not with your head!


In [ ]: