1. I want to make sure my Plate ID is a string. Can't lose the leading zeroes!
  2. I don't think anyone's car was built in 0AD. Discard the '0's as NaN.
  3. I want the dates to be dates! Read the read_csv documentation to find out how to make pandas automatically parse dates.
  4. "Date first observed" is a pretty weird column, but it seems like it has a date hiding inside. Using a function with .apply, transform the string (e.g. "20140324") into a Python date. Make the 0's show up as NaN.
  5. "Violation time" is... not a time. Make it a time.
  6. There sure are a lot of colors of cars, too bad so many of them are the same. Make "BLK" and "BLACK", "WT" and "WHITE", and any other combinations that you notice.
  7. Join the data with the Parking Violations Code dataset from the NYC Open Data site.
  8. How much money did NYC make off of parking violations?
  9. What's the most lucrative kind of parking violation? The most frequent?
  10. New Jersey has bad drivers, but does it have bad parkers, too? How much money does NYC make off of all non-New York vehicles?
  11. Make a chart of the top few.
  12. What time of day do people usually get their tickets? You can break the day up into several blocks - for example 12am-6am, 6am-12pm, 12pm-6pm, 6pm-12am.
  13. What's the average ticket cost in NYC?
  14. Make a graph of the number of tickets per day.
  15. Make a graph of the amount of revenue collected per day.
  16. Manually construct a dataframe out of https://dmv.ny.gov/statistic/2015licinforce-web.pdf (only NYC boroughts - bronx, queens, manhattan, staten island, brooklyn), having columns for borough name, abbreviation, and number of licensed drivers.
  17. What's the parking-ticket-$-per-licensed-driver in each borough of NYC? Do this with pandas and the dataframe you just made, not with your head!

In [1]:
import dateutils

In [2]:
import dateutil.parser

In [3]:
import pandas as pd

In [4]:
parking_df = pd.read_csv("small-violations.csv")
parking_df


Out[4]:
Summons Number Plate ID Registration State Plate Type Issue Date Violation Code Vehicle Body Type Vehicle Make Issuing Agency Street Code1 ... Vehicle Color Unregistered Vehicle? Vehicle Year Meter Number Feet From Curb Violation Post Code Violation Description No Standing or Stopping Violation Hydrant Violation Double Parking Violation
0 1283294138 GBB9093 NY PAS 08/04/2013 46 SUBN AUDI P 37250 ... GY 0 2013 - 0 NaN NaN NaN NaN NaN
1 1283294151 62416MB NY COM 08/04/2013 46 VAN FORD P 37290 ... WH 0 2012 - 0 NaN NaN NaN NaN NaN
2 1283294163 78755JZ NY COM 08/05/2013 46 P-U CHEVR P 37030 ... NaN 0 0 - 0 NaN NaN NaN NaN NaN
3 1283294175 63009MA NY COM 08/05/2013 46 VAN FORD P 37270 ... WH 0 2010 - 0 NaN NaN NaN NaN NaN
4 1283294187 91648MC NY COM 08/08/2013 41 TRLR GMC P 37240 ... BR 0 2012 - 0 NaN NaN NaN NaN NaN
5 1283294217 T60DAR NJ PAS 08/11/2013 14 P-U DODGE P 37250 ... RD 0 0 - 0 NaN NaN NaN NaN NaN
6 1283294229 GCR2838 NY PAS 08/11/2013 14 VAN NaN P 37250 ... GN 0 2011 - 0 NaN NaN NaN NaN NaN
7 1283983620 XZ764G NJ PAS 08/07/2013 24 DELV FORD X 63430 ... WHITE 0 0 - 0 NaN NaN NaN NaN NaN
8 1283983631 GBH9379 NY PAS 08/07/2013 24 SDN TOYOT X 63430 ... WHITE 0 2001 - 0 NaN NaN NaN NaN NaN
9 1283983667 MCL78B NJ PAS 07/18/2013 24 SDN SUBAR H 0 ... BLACK 0 2005 - 0 NaN NaN NaN NaN NaN
10 1283983679 M367CN NY PAS 07/18/2013 24 SDN HYUND H 0 ... BLUE 0 2010 - 0 NaN NaN NaN NaN NaN
11 1283983734 GAR6813 NY PAS 07/18/2013 24 SDN TOYOT H 0 ... TAN 0 1998 - 0 NaN NaN NaN NaN NaN
12 1283983771 GEN8674 NY PAS 07/31/2013 24 SDN AUDI X 0 ... GOLD 0 0 - 0 NaN NaN NaN NaN NaN
13 1283983825 GAC2703 NY PAS 08/12/2013 24 SDN NISSA X 23230 ... WHITE 0 2012 - 0 NaN NaN NaN NaN NaN
14 1286036800 40793JY NY COM 07/05/2013 14 VAN CHEVR P 34190 ... GREEN 0 1995 - 0 NaN NaN NaN NaN NaN
15 1286123550 GAD1485 NY PAS 08/12/2013 20 SDN VOLKS T 28930 ... WHITE 0 2012 - 0 NaN NaN NaN NaN NaN
16 1286246398 GFC5338 NY PAS 07/26/2013 14 SDN TOYOT T 0 ... GRAY 0 2010 - 0 NaN NaN NaN NaN NaN
17 1286246416 815M342 MD PAS 07/30/2013 20 SUBN SATUR T 0 ... GREEN 0 0 - 0 NaN NaN NaN NaN NaN
18 1286248000 GJA3452 NY PAS 07/23/2013 14 SDN KIA T 73690 ... WH 0 2013 - 0 NaN NaN NaN NaN NaN
19 1286282330 YZY6476 NC PAS 07/29/2013 20 SDN NISSA T 32030 ... WHITE 0 0 - 0 NaN NaN NaN NaN NaN
20 1286282342 WBJ819 LA PAS 08/07/2013 17 SUBN HONDA T 0 ... TAN 0 0 - 0 NaN NaN NaN NaN NaN
21 1286289841 GAV9235 NY PAS 07/20/2013 50 SDN HONDA T 0 ... GY 0 2012 - 0 NaN NaN NaN NaN NaN
22 1286654920 ZTR66R NJ PAS 07/18/2013 50 SDN N/S T 0 ... GRAY 0 0 - 0 NaN NaN NaN NaN NaN
23 1286799648 GDE3973 NY PAS 07/20/2013 40 SDN TOYOT T 0 ... TN 0 2003 - 9 NaN NaN NaN NaN NaN
24 1286807062 P193871 IL PAS 08/10/2013 40 SDN SCIO T 26630 ... BK 0 0 - 6 NaN NaN NaN NaN NaN
25 1286807475 GGL6608 NY PAS 07/26/2013 40 SDN NISSA T 73650 ... WHT 0 2013 - 1 NaN NaN NaN NaN NaN
26 1286807633 84301JU NY COM 07/19/2013 14 P-U DODGE X 63030 ... WH 0 2001 - 0 NaN NaN NaN NaN NaN
27 1286808807 HHG3545 PA PAS 08/06/2013 20 SDN TOYOT T 21130 ... GR 0 0 - 0 NaN NaN NaN NaN NaN
28 1287320491 50097JX NY COM 08/07/2013 40 VAN MERCU T 23190 ... RD 0 2003 - 0 NaN NaN NaN NaN NaN
29 1287320570 GEJ9154 NY PAS 08/06/2013 51 VAN TOYOT X 93230 ... GY 0 2013 - 0 NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9969 1349992501 GFZ7651 NY PAS 07/23/2013 71 SDN TOYOT P 25870 ... GREY 0 2007 - 0 NaN NaN NaN NaN NaN
9970 1349992975 LBS219 TX PAS 08/13/2013 91 SUBN FORD P 0 ... BK 0 1996 - 0 NaN NaN NaN NaN NaN
9971 1349993244 RG2C42 TX PAS 07/16/2013 78 TRAC PETER X 20440 ... BLUE 0 2004 - 0 NaN NaN NaN NaN NaN
9972 1349993270 GAA1625 NY PAS 07/16/2013 71 SDN HYUND X 26040 ... BLUE 0 2013 - 0 NaN NaN NaN NaN NaN
9973 1349993335 ZEB4706 NY COM 08/15/2013 78 VAN FORD P 63290 ... WHITE 0 2006 - 0 NaN NaN NaN NaN NaN
9974 1349993463 GFD7394 NY PAS 08/13/2013 74 SUBN ROVER P 25340 ... BLK 0 2008 - 0 NaN NaN NaN NaN NaN
9975 1349993505 MSVEE NY PAS 08/07/2013 71 SUBN ME/BE P 25440 ... BLK 0 2002 - 0 NaN NaN NaN NaN NaN
9976 1349993682 73143JK NY COM 07/31/2013 78 VAN FORD P 53890 ... BLK 0 1999 - 0 NaN NaN NaN NaN NaN
9977 1349993797 GDP1969 NY PAS 07/27/2013 71 SUBN FORD P 25870 ... BLACK 0 0 - 0 NaN NaN NaN NaN NaN
9978 1349994250 GHL3117 NY PAS 07/17/2013 98 SDN NISSA X 22390 ... GOLD 0 0 - 0 NaN NaN NaN NaN NaN
9979 1349994406 GEF7027 NY PAS 07/15/2013 52 SDN NISSA X 0 ... GRY 0 2003 - 0 NaN NaN NaN NaN NaN
9980 1349994571 EUX4225 NY PAS 07/21/2013 40 SDN NISSA P 0 ... BLUE 0 2003 - 0 NaN NaN NaN NaN NaN
9981 1349994716 GHF9746 NY PAS 08/10/2013 40 SDN HONDA P 0 ... BLACK 0 2013 - 2 NaN NaN NaN NaN NaN
9982 1349994741 HX55885 NY PAS 08/09/2013 70 SDN VOLKS P 21290 ... GOLD 0 1974 - 0 NaN NaN NaN NaN NaN
9983 1349994753 HX55885 NY PAS 08/09/2013 71 SDN VOLKS P 21290 ... GOLD 0 1974 - 0 NaN NaN NaN NaN NaN
9984 1349994807 TME33R NJ PAS 08/09/2013 78 TRLR NaN P 0 ... WHITE 0 0 - 0 NaN NaN NaN NaN NaN
9985 1349994819 98166JT 99 COM 08/09/2013 78 VAN FORD P 0 ... WHITE 0 2008 - 0 NaN NaN NaN NaN NaN
9986 1349995514 EUE8098 99 PAS 07/23/2013 27 SUBN MERCU P 0 ... GRN 0 1999 - 0 NaN NaN NaN NaN NaN
9987 1349995526 54726MC NY COM 07/23/2013 78 VAN CHEVR P 0 ... WH 0 2008 - 0 NaN NaN NaN NaN NaN
9988 1349995538 7AB6161 MD PAS 07/23/2013 78 SUBN GMC P 0 ... GRY 0 0 - 0 NaN NaN NaN NaN NaN
9989 1349995551 66247MC NY COM 07/25/2013 78 DUMP MACK P 44690 ... RD 0 2001 - 0 NaN NaN NaN NaN NaN
9990 1349995654 GBZ2160 NY PAS 08/14/2013 71 SDN NISSA P 24490 ... BLACK 0 1999 - 0 NaN NaN NaN NaN NaN
9991 1349995666 HST2419 PA PAS 08/14/2013 40 SDN BMW P 24490 ... BLK 0 0 - 10 NaN NaN NaN NaN NaN
9992 1349995678 GHL3189 NY PAS 08/04/2013 71 SUBN CHRYS P 23090 ... BLUE 0 2009 - 0 NaN NaN NaN NaN NaN
9993 1349995691 GAX3159 NY PAS 08/04/2013 71 SDN NISSA P 23090 ... GY 0 2006 - 0 NaN NaN NaN NaN NaN
9994 1349995721 GBJ4159 NY PAS 07/27/2013 71 SDN TOYOT P 25935 ... BLUE 0 2004 - 0 NaN NaN NaN NaN NaN
9995 1349995733 JDD3205 PA PAS 07/13/2013 40 SDN NISSA P 24740 ... GY 0 0 - 1 NaN NaN NaN NaN NaN
9996 1349995745 GFD4344 NY PAS 07/13/2013 40 SDN CHRYS P 29090 ... GY 0 2013 - 5 NaN NaN NaN NaN NaN
9997 1349995836 GCG2966 NY PAS 08/07/2013 74 SDN NISSA P 24090 ... BLACK 0 2005 - 0 NaN NaN NaN NaN NaN
9998 1349995850 GGH5393 NY PAS 08/04/2013 74 SDN HYUND P 0 ... RED 0 1998 - 0 NaN NaN NaN NaN NaN

9999 rows × 43 columns


In [5]:
parking_df.dtypes


Out[5]:
Summons Number                         int64
Plate ID                              object
Registration State                    object
Plate Type                            object
Issue Date                            object
Violation Code                         int64
Vehicle Body Type                     object
Vehicle Make                          object
Issuing Agency                        object
Street Code1                           int64
Street Code2                           int64
Street Code3                           int64
Vehicle Expiration Date                int64
Violation Location                   float64
Violation Precinct                     int64
Issuer Precinct                        int64
Issuer Code                            int64
Issuer Command                        object
Issuer Squad                           int64
Violation Time                        object
Time First Observed                   object
Violation County                      object
Violation In Front Of Or Opposite     object
House Number                          object
Street Name                           object
Intersecting Street                   object
Date First Observed                    int64
Law Section                            int64
Sub Division                          object
Violation Legal Code                 float64
Days Parking In Effect                object
From Hours In Effect                  object
To Hours In Effect                    object
Vehicle Color                         object
Unregistered Vehicle?                  int64
Vehicle Year                           int64
Meter Number                          object
Feet From Curb                         int64
Violation Post Code                  float64
Violation Description                float64
No Standing or Stopping Violation    float64
Hydrant Violation                    float64
Double Parking Violation             float64
dtype: object

In [6]:
import datetime

In [7]:
parking_df.head()['Issue Date'].astype(datetime.datetime)


Out[7]:
0    08/04/2013
1    08/04/2013
2    08/05/2013
3    08/05/2013
4    08/08/2013
Name: Issue Date, dtype: object

In [8]:
import pandas as pd
parking_df = pd.read_csv("small-violations.csv")
parking_df


Out[8]:
Summons Number Plate ID Registration State Plate Type Issue Date Violation Code Vehicle Body Type Vehicle Make Issuing Agency Street Code1 ... Vehicle Color Unregistered Vehicle? Vehicle Year Meter Number Feet From Curb Violation Post Code Violation Description No Standing or Stopping Violation Hydrant Violation Double Parking Violation
0 1283294138 GBB9093 NY PAS 08/04/2013 46 SUBN AUDI P 37250 ... GY 0 2013 - 0 NaN NaN NaN NaN NaN
1 1283294151 62416MB NY COM 08/04/2013 46 VAN FORD P 37290 ... WH 0 2012 - 0 NaN NaN NaN NaN NaN
2 1283294163 78755JZ NY COM 08/05/2013 46 P-U CHEVR P 37030 ... NaN 0 0 - 0 NaN NaN NaN NaN NaN
3 1283294175 63009MA NY COM 08/05/2013 46 VAN FORD P 37270 ... WH 0 2010 - 0 NaN NaN NaN NaN NaN
4 1283294187 91648MC NY COM 08/08/2013 41 TRLR GMC P 37240 ... BR 0 2012 - 0 NaN NaN NaN NaN NaN
5 1283294217 T60DAR NJ PAS 08/11/2013 14 P-U DODGE P 37250 ... RD 0 0 - 0 NaN NaN NaN NaN NaN
6 1283294229 GCR2838 NY PAS 08/11/2013 14 VAN NaN P 37250 ... GN 0 2011 - 0 NaN NaN NaN NaN NaN
7 1283983620 XZ764G NJ PAS 08/07/2013 24 DELV FORD X 63430 ... WHITE 0 0 - 0 NaN NaN NaN NaN NaN
8 1283983631 GBH9379 NY PAS 08/07/2013 24 SDN TOYOT X 63430 ... WHITE 0 2001 - 0 NaN NaN NaN NaN NaN
9 1283983667 MCL78B NJ PAS 07/18/2013 24 SDN SUBAR H 0 ... BLACK 0 2005 - 0 NaN NaN NaN NaN NaN
10 1283983679 M367CN NY PAS 07/18/2013 24 SDN HYUND H 0 ... BLUE 0 2010 - 0 NaN NaN NaN NaN NaN
11 1283983734 GAR6813 NY PAS 07/18/2013 24 SDN TOYOT H 0 ... TAN 0 1998 - 0 NaN NaN NaN NaN NaN
12 1283983771 GEN8674 NY PAS 07/31/2013 24 SDN AUDI X 0 ... GOLD 0 0 - 0 NaN NaN NaN NaN NaN
13 1283983825 GAC2703 NY PAS 08/12/2013 24 SDN NISSA X 23230 ... WHITE 0 2012 - 0 NaN NaN NaN NaN NaN
14 1286036800 40793JY NY COM 07/05/2013 14 VAN CHEVR P 34190 ... GREEN 0 1995 - 0 NaN NaN NaN NaN NaN
15 1286123550 GAD1485 NY PAS 08/12/2013 20 SDN VOLKS T 28930 ... WHITE 0 2012 - 0 NaN NaN NaN NaN NaN
16 1286246398 GFC5338 NY PAS 07/26/2013 14 SDN TOYOT T 0 ... GRAY 0 2010 - 0 NaN NaN NaN NaN NaN
17 1286246416 815M342 MD PAS 07/30/2013 20 SUBN SATUR T 0 ... GREEN 0 0 - 0 NaN NaN NaN NaN NaN
18 1286248000 GJA3452 NY PAS 07/23/2013 14 SDN KIA T 73690 ... WH 0 2013 - 0 NaN NaN NaN NaN NaN
19 1286282330 YZY6476 NC PAS 07/29/2013 20 SDN NISSA T 32030 ... WHITE 0 0 - 0 NaN NaN NaN NaN NaN
20 1286282342 WBJ819 LA PAS 08/07/2013 17 SUBN HONDA T 0 ... TAN 0 0 - 0 NaN NaN NaN NaN NaN
21 1286289841 GAV9235 NY PAS 07/20/2013 50 SDN HONDA T 0 ... GY 0 2012 - 0 NaN NaN NaN NaN NaN
22 1286654920 ZTR66R NJ PAS 07/18/2013 50 SDN N/S T 0 ... GRAY 0 0 - 0 NaN NaN NaN NaN NaN
23 1286799648 GDE3973 NY PAS 07/20/2013 40 SDN TOYOT T 0 ... TN 0 2003 - 9 NaN NaN NaN NaN NaN
24 1286807062 P193871 IL PAS 08/10/2013 40 SDN SCIO T 26630 ... BK 0 0 - 6 NaN NaN NaN NaN NaN
25 1286807475 GGL6608 NY PAS 07/26/2013 40 SDN NISSA T 73650 ... WHT 0 2013 - 1 NaN NaN NaN NaN NaN
26 1286807633 84301JU NY COM 07/19/2013 14 P-U DODGE X 63030 ... WH 0 2001 - 0 NaN NaN NaN NaN NaN
27 1286808807 HHG3545 PA PAS 08/06/2013 20 SDN TOYOT T 21130 ... GR 0 0 - 0 NaN NaN NaN NaN NaN
28 1287320491 50097JX NY COM 08/07/2013 40 VAN MERCU T 23190 ... RD 0 2003 - 0 NaN NaN NaN NaN NaN
29 1287320570 GEJ9154 NY PAS 08/06/2013 51 VAN TOYOT X 93230 ... GY 0 2013 - 0 NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9969 1349992501 GFZ7651 NY PAS 07/23/2013 71 SDN TOYOT P 25870 ... GREY 0 2007 - 0 NaN NaN NaN NaN NaN
9970 1349992975 LBS219 TX PAS 08/13/2013 91 SUBN FORD P 0 ... BK 0 1996 - 0 NaN NaN NaN NaN NaN
9971 1349993244 RG2C42 TX PAS 07/16/2013 78 TRAC PETER X 20440 ... BLUE 0 2004 - 0 NaN NaN NaN NaN NaN
9972 1349993270 GAA1625 NY PAS 07/16/2013 71 SDN HYUND X 26040 ... BLUE 0 2013 - 0 NaN NaN NaN NaN NaN
9973 1349993335 ZEB4706 NY COM 08/15/2013 78 VAN FORD P 63290 ... WHITE 0 2006 - 0 NaN NaN NaN NaN NaN
9974 1349993463 GFD7394 NY PAS 08/13/2013 74 SUBN ROVER P 25340 ... BLK 0 2008 - 0 NaN NaN NaN NaN NaN
9975 1349993505 MSVEE NY PAS 08/07/2013 71 SUBN ME/BE P 25440 ... BLK 0 2002 - 0 NaN NaN NaN NaN NaN
9976 1349993682 73143JK NY COM 07/31/2013 78 VAN FORD P 53890 ... BLK 0 1999 - 0 NaN NaN NaN NaN NaN
9977 1349993797 GDP1969 NY PAS 07/27/2013 71 SUBN FORD P 25870 ... BLACK 0 0 - 0 NaN NaN NaN NaN NaN
9978 1349994250 GHL3117 NY PAS 07/17/2013 98 SDN NISSA X 22390 ... GOLD 0 0 - 0 NaN NaN NaN NaN NaN
9979 1349994406 GEF7027 NY PAS 07/15/2013 52 SDN NISSA X 0 ... GRY 0 2003 - 0 NaN NaN NaN NaN NaN
9980 1349994571 EUX4225 NY PAS 07/21/2013 40 SDN NISSA P 0 ... BLUE 0 2003 - 0 NaN NaN NaN NaN NaN
9981 1349994716 GHF9746 NY PAS 08/10/2013 40 SDN HONDA P 0 ... BLACK 0 2013 - 2 NaN NaN NaN NaN NaN
9982 1349994741 HX55885 NY PAS 08/09/2013 70 SDN VOLKS P 21290 ... GOLD 0 1974 - 0 NaN NaN NaN NaN NaN
9983 1349994753 HX55885 NY PAS 08/09/2013 71 SDN VOLKS P 21290 ... GOLD 0 1974 - 0 NaN NaN NaN NaN NaN
9984 1349994807 TME33R NJ PAS 08/09/2013 78 TRLR NaN P 0 ... WHITE 0 0 - 0 NaN NaN NaN NaN NaN
9985 1349994819 98166JT 99 COM 08/09/2013 78 VAN FORD P 0 ... WHITE 0 2008 - 0 NaN NaN NaN NaN NaN
9986 1349995514 EUE8098 99 PAS 07/23/2013 27 SUBN MERCU P 0 ... GRN 0 1999 - 0 NaN NaN NaN NaN NaN
9987 1349995526 54726MC NY COM 07/23/2013 78 VAN CHEVR P 0 ... WH 0 2008 - 0 NaN NaN NaN NaN NaN
9988 1349995538 7AB6161 MD PAS 07/23/2013 78 SUBN GMC P 0 ... GRY 0 0 - 0 NaN NaN NaN NaN NaN
9989 1349995551 66247MC NY COM 07/25/2013 78 DUMP MACK P 44690 ... RD 0 2001 - 0 NaN NaN NaN NaN NaN
9990 1349995654 GBZ2160 NY PAS 08/14/2013 71 SDN NISSA P 24490 ... BLACK 0 1999 - 0 NaN NaN NaN NaN NaN
9991 1349995666 HST2419 PA PAS 08/14/2013 40 SDN BMW P 24490 ... BLK 0 0 - 10 NaN NaN NaN NaN NaN
9992 1349995678 GHL3189 NY PAS 08/04/2013 71 SUBN CHRYS P 23090 ... BLUE 0 2009 - 0 NaN NaN NaN NaN NaN
9993 1349995691 GAX3159 NY PAS 08/04/2013 71 SDN NISSA P 23090 ... GY 0 2006 - 0 NaN NaN NaN NaN NaN
9994 1349995721 GBJ4159 NY PAS 07/27/2013 71 SDN TOYOT P 25935 ... BLUE 0 2004 - 0 NaN NaN NaN NaN NaN
9995 1349995733 JDD3205 PA PAS 07/13/2013 40 SDN NISSA P 24740 ... GY 0 0 - 1 NaN NaN NaN NaN NaN
9996 1349995745 GFD4344 NY PAS 07/13/2013 40 SDN CHRYS P 29090 ... GY 0 2013 - 5 NaN NaN NaN NaN NaN
9997 1349995836 GCG2966 NY PAS 08/07/2013 74 SDN NISSA P 24090 ... BLACK 0 2005 - 0 NaN NaN NaN NaN NaN
9998 1349995850 GGH5393 NY PAS 08/04/2013 74 SDN HYUND P 0 ... RED 0 1998 - 0 NaN NaN NaN NaN NaN

9999 rows × 43 columns

1. I want to make sure my Plate ID is a string. Can't lose the leading zeroes!


In [9]:
col_plateid = { 'Plate ID': 'str',  }
violations_df = pd.read_csv("small-violations.csv", dtype=col_plateid)
violations_df.head(20)


Out[9]:
Summons Number Plate ID Registration State Plate Type Issue Date Violation Code Vehicle Body Type Vehicle Make Issuing Agency Street Code1 ... Vehicle Color Unregistered Vehicle? Vehicle Year Meter Number Feet From Curb Violation Post Code Violation Description No Standing or Stopping Violation Hydrant Violation Double Parking Violation
0 1283294138 GBB9093 NY PAS 08/04/2013 46 SUBN AUDI P 37250 ... GY 0 2013 - 0 NaN NaN NaN NaN NaN
1 1283294151 62416MB NY COM 08/04/2013 46 VAN FORD P 37290 ... WH 0 2012 - 0 NaN NaN NaN NaN NaN
2 1283294163 78755JZ NY COM 08/05/2013 46 P-U CHEVR P 37030 ... NaN 0 0 - 0 NaN NaN NaN NaN NaN
3 1283294175 63009MA NY COM 08/05/2013 46 VAN FORD P 37270 ... WH 0 2010 - 0 NaN NaN NaN NaN NaN
4 1283294187 91648MC NY COM 08/08/2013 41 TRLR GMC P 37240 ... BR 0 2012 - 0 NaN NaN NaN NaN NaN
5 1283294217 T60DAR NJ PAS 08/11/2013 14 P-U DODGE P 37250 ... RD 0 0 - 0 NaN NaN NaN NaN NaN
6 1283294229 GCR2838 NY PAS 08/11/2013 14 VAN NaN P 37250 ... GN 0 2011 - 0 NaN NaN NaN NaN NaN
7 1283983620 XZ764G NJ PAS 08/07/2013 24 DELV FORD X 63430 ... WHITE 0 0 - 0 NaN NaN NaN NaN NaN
8 1283983631 GBH9379 NY PAS 08/07/2013 24 SDN TOYOT X 63430 ... WHITE 0 2001 - 0 NaN NaN NaN NaN NaN
9 1283983667 MCL78B NJ PAS 07/18/2013 24 SDN SUBAR H 0 ... BLACK 0 2005 - 0 NaN NaN NaN NaN NaN
10 1283983679 M367CN NY PAS 07/18/2013 24 SDN HYUND H 0 ... BLUE 0 2010 - 0 NaN NaN NaN NaN NaN
11 1283983734 GAR6813 NY PAS 07/18/2013 24 SDN TOYOT H 0 ... TAN 0 1998 - 0 NaN NaN NaN NaN NaN
12 1283983771 GEN8674 NY PAS 07/31/2013 24 SDN AUDI X 0 ... GOLD 0 0 - 0 NaN NaN NaN NaN NaN
13 1283983825 GAC2703 NY PAS 08/12/2013 24 SDN NISSA X 23230 ... WHITE 0 2012 - 0 NaN NaN NaN NaN NaN
14 1286036800 40793JY NY COM 07/05/2013 14 VAN CHEVR P 34190 ... GREEN 0 1995 - 0 NaN NaN NaN NaN NaN
15 1286123550 GAD1485 NY PAS 08/12/2013 20 SDN VOLKS T 28930 ... WHITE 0 2012 - 0 NaN NaN NaN NaN NaN
16 1286246398 GFC5338 NY PAS 07/26/2013 14 SDN TOYOT T 0 ... GRAY 0 2010 - 0 NaN NaN NaN NaN NaN
17 1286246416 815M342 MD PAS 07/30/2013 20 SUBN SATUR T 0 ... GREEN 0 0 - 0 NaN NaN NaN NaN NaN
18 1286248000 GJA3452 NY PAS 07/23/2013 14 SDN KIA T 73690 ... WH 0 2013 - 0 NaN NaN NaN NaN NaN
19 1286282330 YZY6476 NC PAS 07/29/2013 20 SDN NISSA T 32030 ... WHITE 0 0 - 0 NaN NaN NaN NaN NaN

20 rows × 43 columns


In [10]:
print("The data type is",(type(violations_df['Plate ID'][0])))


The data type is <class 'str'>

2. I don't think anyone's car was built in 0AD. Discard the '0's as NaN.


In [11]:
type(parking_df['Vehicle Year'][0])


Out[11]:
numpy.int64

In [12]:
# DISCOVERY - pass value as [0] rather than 0
col_types = { 'Vehicle Year': [0] }
test_df = pd.read_csv("violations.csv", na_values=col_types, nrows=10)
test_df.head(10)


Out[12]:
Summons Number Plate ID Registration State Plate Type Issue Date Violation Code Vehicle Body Type Vehicle Make Issuing Agency Street Code1 ... Vehicle Color Unregistered Vehicle? Vehicle Year Meter Number Feet From Curb Violation Post Code Violation Description No Standing or Stopping Violation Hydrant Violation Double Parking Violation
0 1283294138 GBB9093 NY PAS 08/04/2013 46 SUBN AUDI P 37250 ... GY 0 2013.0 - 0 NaN NaN NaN NaN NaN
1 1283294151 62416MB NY COM 08/04/2013 46 VAN FORD P 37290 ... WH 0 2012.0 - 0 NaN NaN NaN NaN NaN
2 1283294163 78755JZ NY COM 08/05/2013 46 P-U CHEVR P 37030 ... NaN 0 NaN - 0 NaN NaN NaN NaN NaN
3 1283294175 63009MA NY COM 08/05/2013 46 VAN FORD P 37270 ... WH 0 2010.0 - 0 NaN NaN NaN NaN NaN
4 1283294187 91648MC NY COM 08/08/2013 41 TRLR GMC P 37240 ... BR 0 2012.0 - 0 NaN NaN NaN NaN NaN
5 1283294217 T60DAR NJ PAS 08/11/2013 14 P-U DODGE P 37250 ... RD 0 NaN - 0 NaN NaN NaN NaN NaN
6 1283294229 GCR2838 NY PAS 08/11/2013 14 VAN NaN P 37250 ... GN 0 2011.0 - 0 NaN NaN NaN NaN NaN
7 1283983620 XZ764G NJ PAS 08/07/2013 24 DELV FORD X 63430 ... WHITE 0 NaN - 0 NaN NaN NaN NaN NaN
8 1283983631 GBH9379 NY PAS 08/07/2013 24 SDN TOYOT X 63430 ... WHITE 0 2001.0 - 0 NaN NaN NaN NaN NaN
9 1283983667 MCL78B NJ PAS 07/18/2013 24 SDN SUBAR H 0 ... BLACK 0 2005.0 - 0 NaN NaN NaN NaN NaN

10 rows × 43 columns


In [13]:
violations_df['Vehicle Year'] = violations_df['Vehicle Year'].replace("0","NaN")
violations_df.head(10)


Out[13]:
Summons Number Plate ID Registration State Plate Type Issue Date Violation Code Vehicle Body Type Vehicle Make Issuing Agency Street Code1 ... Vehicle Color Unregistered Vehicle? Vehicle Year Meter Number Feet From Curb Violation Post Code Violation Description No Standing or Stopping Violation Hydrant Violation Double Parking Violation
0 1283294138 GBB9093 NY PAS 08/04/2013 46 SUBN AUDI P 37250 ... GY 0 2013 - 0 NaN NaN NaN NaN NaN
1 1283294151 62416MB NY COM 08/04/2013 46 VAN FORD P 37290 ... WH 0 2012 - 0 NaN NaN NaN NaN NaN
2 1283294163 78755JZ NY COM 08/05/2013 46 P-U CHEVR P 37030 ... NaN 0 NaN - 0 NaN NaN NaN NaN NaN
3 1283294175 63009MA NY COM 08/05/2013 46 VAN FORD P 37270 ... WH 0 2010 - 0 NaN NaN NaN NaN NaN
4 1283294187 91648MC NY COM 08/08/2013 41 TRLR GMC P 37240 ... BR 0 2012 - 0 NaN NaN NaN NaN NaN
5 1283294217 T60DAR NJ PAS 08/11/2013 14 P-U DODGE P 37250 ... RD 0 NaN - 0 NaN NaN NaN NaN NaN
6 1283294229 GCR2838 NY PAS 08/11/2013 14 VAN NaN P 37250 ... GN 0 2011 - 0 NaN NaN NaN NaN NaN
7 1283983620 XZ764G NJ PAS 08/07/2013 24 DELV FORD X 63430 ... WHITE 0 NaN - 0 NaN NaN NaN NaN NaN
8 1283983631 GBH9379 NY PAS 08/07/2013 24 SDN TOYOT X 63430 ... WHITE 0 2001 - 0 NaN NaN NaN NaN NaN
9 1283983667 MCL78B NJ PAS 07/18/2013 24 SDN SUBAR H 0 ... BLACK 0 2005 - 0 NaN NaN NaN NaN NaN

10 rows × 43 columns

3. I want the dates to be dates! Read the read_csv documentation to find out how to make pandas automatically parse dates.


In [14]:
type(violations_df['Issue Date'][0])


Out[14]:
str

In [15]:
violate_df = pd.read_csv("small-violations.csv", parse_dates=True, infer_datetime_format=True, keep_date_col=True, date_parser=True, dayfirst=True, nrows=10)

In [16]:
#violate_df['Vehicle Year'] = test1_df['Vehicle Year'].replace("0","NaN")

In [17]:
violate_df.head()


Out[17]:
Summons Number Plate ID Registration State Plate Type Issue Date Violation Code Vehicle Body Type Vehicle Make Issuing Agency Street Code1 ... Vehicle Color Unregistered Vehicle? Vehicle Year Meter Number Feet From Curb Violation Post Code Violation Description No Standing or Stopping Violation Hydrant Violation Double Parking Violation
0 1283294138 GBB9093 NY PAS 08/04/2013 46 SUBN AUDI P 37250 ... GY 0 2013 - 0 NaN NaN NaN NaN NaN
1 1283294151 62416MB NY COM 08/04/2013 46 VAN FORD P 37290 ... WH 0 2012 - 0 NaN NaN NaN NaN NaN
2 1283294163 78755JZ NY COM 08/05/2013 46 P-U CHEVR P 37030 ... NaN 0 0 - 0 NaN NaN NaN NaN NaN
3 1283294175 63009MA NY COM 08/05/2013 46 VAN FORD P 37270 ... WH 0 2010 - 0 NaN NaN NaN NaN NaN
4 1283294187 91648MC NY COM 08/08/2013 41 TRLR GMC P 37240 ... BR 0 2012 - 0 NaN NaN NaN NaN NaN

5 rows × 43 columns


In [18]:
yourdate = dateutil.parser.parse(violate_df['Issue Date'][0])
yourdate


Out[18]:
datetime.datetime(2013, 8, 4, 0, 0)

In [19]:
violate_df.head()['Issue Date'].astype(datetime.datetime)


Out[19]:
0    08/04/2013
1    08/04/2013
2    08/05/2013
3    08/05/2013
4    08/08/2013
Name: Issue Date, dtype: object

4. "Date first observed" is a pretty weird column, but it seems like it has a date hiding inside. Using a function with .apply, transform the string (e.g. "20140324") into a Python date. Make the 0's show up as NaN.


In [20]:
violate_df.columns


Out[20]:
Index(['Summons Number', 'Plate ID', 'Registration State', 'Plate Type',
       'Issue Date', 'Violation Code', 'Vehicle Body Type', 'Vehicle Make',
       'Issuing Agency', 'Street Code1', 'Street Code2', 'Street Code3',
       'Vehicle Expiration Date', 'Violation Location', 'Violation Precinct',
       'Issuer Precinct', 'Issuer Code', 'Issuer Command', 'Issuer Squad',
       'Violation Time', 'Time First Observed', 'Violation County',
       'Violation In Front Of Or Opposite', 'House Number', 'Street Name',
       'Intersecting Street', 'Date First Observed', 'Law Section',
       'Sub Division', 'Violation Legal Code', 'Days Parking In Effect    ',
       'From Hours In Effect', 'To Hours In Effect', 'Vehicle Color',
       'Unregistered Vehicle?', 'Vehicle Year', 'Meter Number',
       'Feet From Curb', 'Violation Post Code', 'Violation Description',
       'No Standing or Stopping Violation', 'Hydrant Violation',
       'Double Parking Violation'],
      dtype='object')

In [21]:
# changing it to string because it later needs to be converted into Python time.
col_observ = { 'Date First Observed': 'str',  }
test2_df = pd.read_csv("violations.csv", dtype=col_observ, nrows=10)
test2_df.head()


Out[21]:
Summons Number Plate ID Registration State Plate Type Issue Date Violation Code Vehicle Body Type Vehicle Make Issuing Agency Street Code1 ... Vehicle Color Unregistered Vehicle? Vehicle Year Meter Number Feet From Curb Violation Post Code Violation Description No Standing or Stopping Violation Hydrant Violation Double Parking Violation
0 1283294138 GBB9093 NY PAS 08/04/2013 46 SUBN AUDI P 37250 ... GY 0 2013 - 0 NaN NaN NaN NaN NaN
1 1283294151 62416MB NY COM 08/04/2013 46 VAN FORD P 37290 ... WH 0 2012 - 0 NaN NaN NaN NaN NaN
2 1283294163 78755JZ NY COM 08/05/2013 46 P-U CHEVR P 37030 ... NaN 0 0 - 0 NaN NaN NaN NaN NaN
3 1283294175 63009MA NY COM 08/05/2013 46 VAN FORD P 37270 ... WH 0 2010 - 0 NaN NaN NaN NaN NaN
4 1283294187 91648MC NY COM 08/08/2013 41 TRLR GMC P 37240 ... BR 0 2012 - 0 NaN NaN NaN NaN NaN

5 rows × 43 columns


In [22]:
# defining conversion into python time
def to_date(num):
      
    if num == "0":
        return num.replace("0","NaN")
    else:
        yourdate = dateutil.parser.parse(num)
        date_in_py = yourdate.strftime("%Y %B %d")
        return date_in_py

to_date("20140324")


Out[22]:
'2014 March 24'

In [23]:
# confirming its string.
type(test2_df['Date First Observed'][0])


Out[23]:
str

In [24]:
test2_df['Date First Observed'].apply(to_date)


Out[24]:
0    NaN
1    NaN
2    NaN
3    NaN
4    NaN
5    NaN
6    NaN
7    NaN
8    NaN
9    NaN
Name: Date First Observed, dtype: object

In [25]:
#replacing Date First Observed with Date First Observed column as already there are so many columns.
test2_df['Date First Observed'] = test2_df['Date First Observed'].apply(to_date)

5. "Violation time" is... not a time. Make it a time


In [26]:
violate_df['Violation Time'].head(5)


Out[26]:
0    0752A
1    1240P
2    1243P
3    0232P
4    1239P
Name: Violation Time, dtype: object

In [27]:
type(violate_df['Violation Time'][0])


Out[27]:
str

In [28]:
# am replacing A and P with AM and PM to 
def str_to_time(time_str):
    s = time_str.replace("P"," PM").replace("A"," AM")
    x = x = s[:2] + ":" + s[2:]
    return x
str_to_time("1239P")


Out[28]:
'12:39 PM'

In [29]:
test2_df['Violation Time'] = test2_df['Violation Time'].apply(str_to_time)

In [30]:
def vio_date(time_str):
    parsed_date = dateutil.parser.parse(time_str)
    date_vio = parsed_date.strftime("%H:%M %p")
    return date_vio
    #return parsed_date.hour
print(vio_date("12:32 PM"))


12:32 PM

In [31]:
test2_df['Violation Time'].apply(vio_date)
#replacing Violation Time with Date Violation Time column as already there are so many columns.
test2_df['Violation Time'] = test2_df['Violation Time'].apply(vio_date)

In [32]:
test2_df['Violation Time']


Out[32]:
0    07:52 AM
1    12:40 PM
2    12:43 PM
3    14:32 PM
4    12:39 PM
5    18:17 PM
6    19:41 PM
7    04:25 AM
8    04:37 AM
9    08:39 AM
Name: Violation Time, dtype: object

6. There sure are a lot of colors of cars, too bad so many of them are the same. Make "BLK" and "BLACK", "WT" and "WHITE", and any other combinations that you notice.


In [33]:
#violate_df['Vehicle Color'].count_values()
violate_df.groupby('Vehicle Color').describe()


c:\users\radhika\appdata\local\programs\python\python35-32\lib\site-packages\numpy\lib\function_base.py:3834: RuntimeWarning: Invalid value encountered in percentile
  RuntimeWarning)
Out[33]:
Date First Observed Double Parking Violation Feet From Curb House Number Hydrant Violation Issuer Code Issuer Command Issuer Precinct Issuer Squad Law Section ... Time First Observed Unregistered Vehicle? Vehicle Expiration Date Vehicle Year Violation Code Violation Description Violation Legal Code Violation Location Violation Post Code Violation Precinct
Vehicle Color
BLACK count 1.0 0.0 1.0 0.000000 0.0 1.0 1.0 1.0 1.0 1.0 ... 0.0 1.0 1.000000e+00 1.000000 1.0 0.0 0.0 1.0 0.0 1.0
mean 0.0 NaN 0.0 NaN NaN 101043.0 976.0 976.0 0.0 408.0 ... NaN 0.0 0.000000e+00 2005.000000 24.0 NaN NaN 79.0 NaN 79.0
std NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
min 0.0 NaN 0.0 NaN NaN 101043.0 976.0 976.0 0.0 408.0 ... NaN 0.0 0.000000e+00 2005.000000 24.0 NaN NaN 79.0 NaN 79.0
25% 0.0 NaN 0.0 NaN NaN 101043.0 976.0 976.0 0.0 408.0 ... NaN 0.0 0.000000e+00 2005.000000 24.0 NaN NaN 79.0 NaN 79.0
50% 0.0 NaN 0.0 NaN NaN 101043.0 976.0 976.0 0.0 408.0 ... NaN 0.0 0.000000e+00 2005.000000 24.0 NaN NaN 79.0 NaN 79.0
75% 0.0 NaN 0.0 NaN NaN 101043.0 976.0 976.0 0.0 408.0 ... NaN 0.0 0.000000e+00 2005.000000 24.0 NaN NaN 79.0 NaN 79.0
max 0.0 NaN 0.0 NaN NaN 101043.0 976.0 976.0 0.0 408.0 ... NaN 0.0 0.000000e+00 2005.000000 24.0 NaN NaN 79.0 NaN 79.0
BR count 1.0 0.0 1.0 1.000000 0.0 1.0 1.0 1.0 1.0 1.0 ... 0.0 1.0 1.000000e+00 1.000000 1.0 0.0 0.0 1.0 0.0 1.0
mean 0.0 NaN 0.0 590.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 0.000000e+00 2012.000000 41.0 NaN NaN 33.0 NaN 33.0
std NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
min 0.0 NaN 0.0 590.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 0.000000e+00 2012.000000 41.0 NaN NaN 33.0 NaN 33.0
25% 0.0 NaN 0.0 590.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 0.000000e+00 2012.000000 41.0 NaN NaN 33.0 NaN 33.0
50% 0.0 NaN 0.0 590.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 0.000000e+00 2012.000000 41.0 NaN NaN 33.0 NaN 33.0
75% 0.0 NaN 0.0 590.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 0.000000e+00 2012.000000 41.0 NaN NaN 33.0 NaN 33.0
max 0.0 NaN 0.0 590.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 0.000000e+00 2012.000000 41.0 NaN NaN 33.0 NaN 33.0
GN count 1.0 0.0 1.0 1.000000 0.0 1.0 1.0 1.0 1.0 1.0 ... 0.0 1.0 1.000000e+00 1.000000 1.0 0.0 0.0 1.0 0.0 1.0
mean 0.0 NaN 0.0 551.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 2.014122e+07 2011.000000 14.0 NaN NaN 33.0 NaN 33.0
std NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
min 0.0 NaN 0.0 551.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 2.014122e+07 2011.000000 14.0 NaN NaN 33.0 NaN 33.0
25% 0.0 NaN 0.0 551.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 2.014122e+07 2011.000000 14.0 NaN NaN 33.0 NaN 33.0
50% 0.0 NaN 0.0 551.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 2.014122e+07 2011.000000 14.0 NaN NaN 33.0 NaN 33.0
75% 0.0 NaN 0.0 551.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 2.014122e+07 2011.000000 14.0 NaN NaN 33.0 NaN 33.0
max 0.0 NaN 0.0 551.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 2.014122e+07 2011.000000 14.0 NaN NaN 33.0 NaN 33.0
GY count 1.0 0.0 1.0 1.000000 0.0 1.0 1.0 1.0 1.0 1.0 ... 0.0 1.0 1.000000e+00 1.000000 1.0 0.0 0.0 1.0 0.0 1.0
mean 0.0 NaN 0.0 712.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 2.014083e+07 2013.000000 46.0 NaN NaN 33.0 NaN 33.0
std NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
min 0.0 NaN 0.0 712.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 2.014083e+07 2013.000000 46.0 NaN NaN 33.0 NaN 33.0
25% 0.0 NaN 0.0 712.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 2.014083e+07 2013.000000 46.0 NaN NaN 33.0 NaN 33.0
50% 0.0 NaN 0.0 712.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 2.014083e+07 2013.000000 46.0 NaN NaN 33.0 NaN 33.0
75% 0.0 NaN 0.0 712.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 2.014083e+07 2013.000000 46.0 NaN NaN 33.0 NaN 33.0
max 0.0 NaN 0.0 712.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 2.014083e+07 2013.000000 46.0 NaN NaN 33.0 NaN 33.0
RD count 1.0 0.0 1.0 1.000000 0.0 1.0 1.0 1.0 1.0 1.0 ... 0.0 1.0 1.000000e+00 1.000000 1.0 0.0 0.0 1.0 0.0 1.0
mean 0.0 NaN 0.0 525.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 0.000000e+00 0.000000 14.0 NaN NaN 33.0 NaN 33.0
std NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
min 0.0 NaN 0.0 525.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 0.000000e+00 0.000000 14.0 NaN NaN 33.0 NaN 33.0
25% 0.0 NaN 0.0 525.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 0.000000e+00 0.000000 14.0 NaN NaN 33.0 NaN 33.0
50% 0.0 NaN 0.0 525.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 0.000000e+00 0.000000 14.0 NaN NaN 33.0 NaN 33.0
75% 0.0 NaN 0.0 525.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 0.000000e+00 0.000000 14.0 NaN NaN 33.0 NaN 33.0
max 0.0 NaN 0.0 525.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 0.000000e+00 0.000000 14.0 NaN NaN 33.0 NaN 33.0
WH count 2.0 0.0 2.0 2.000000 0.0 2.0 2.0 2.0 2.0 2.0 ... 0.0 2.0 2.000000e+00 2.000000 2.0 0.0 0.0 2.0 0.0 2.0
mean 0.0 NaN 0.0 359.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 2.014073e+07 2011.000000 46.0 NaN NaN 33.0 NaN 33.0
std 0.0 NaN 0.0 223.445743 NaN 0.0 0.0 0.0 0.0 0.0 ... NaN 0.0 4.249712e+02 1.414214 0.0 NaN NaN 0.0 NaN 0.0
min 0.0 NaN 0.0 201.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 2.014043e+07 2010.000000 46.0 NaN NaN 33.0 NaN 33.0
25% 0.0 NaN 0.0 280.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 2.014058e+07 2010.500000 46.0 NaN NaN 33.0 NaN 33.0
50% 0.0 NaN 0.0 359.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 2.014073e+07 2011.000000 46.0 NaN NaN 33.0 NaN 33.0
75% 0.0 NaN 0.0 438.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 2.014088e+07 2011.500000 46.0 NaN NaN 33.0 NaN 33.0
max 0.0 NaN 0.0 517.000000 NaN 921043.0 33.0 33.0 0.0 408.0 ... NaN 0.0 2.014103e+07 2012.000000 46.0 NaN NaN 33.0 NaN 33.0
WHITE count 2.0 0.0 2.0 2.000000 0.0 2.0 2.0 2.0 2.0 2.0 ... 0.0 2.0 2.000000e+00 2.000000 2.0 0.0 0.0 2.0 0.0 2.0
mean 0.0 NaN 0.0 100.000000 NaN 101079.0 976.0 976.0 0.0 408.0 ... NaN 0.0 1.007036e+07 1000.500000 24.0 NaN NaN 88.0 NaN 88.0
std 0.0 NaN 0.0 0.000000 NaN 0.0 0.0 0.0 0.0 0.0 ... NaN 0.0 1.424164e+07 1414.920669 0.0 NaN NaN 0.0 NaN 0.0
min 0.0 NaN 0.0 100.000000 NaN 101079.0 976.0 976.0 0.0 408.0 ... NaN 0.0 0.000000e+00 0.000000 24.0 NaN NaN 88.0 NaN 88.0
25% 0.0 NaN 0.0 100.000000 NaN 101079.0 976.0 976.0 0.0 408.0 ... NaN 0.0 5.035180e+06 500.250000 24.0 NaN NaN 88.0 NaN 88.0
50% 0.0 NaN 0.0 100.000000 NaN 101079.0 976.0 976.0 0.0 408.0 ... NaN 0.0 1.007036e+07 1000.500000 24.0 NaN NaN 88.0 NaN 88.0
75% 0.0 NaN 0.0 100.000000 NaN 101079.0 976.0 976.0 0.0 408.0 ... NaN 0.0 1.510554e+07 1500.750000 24.0 NaN NaN 88.0 NaN 88.0
max 0.0 NaN 0.0 100.000000 NaN 101079.0 976.0 976.0 0.0 408.0 ... NaN 0.0 2.014072e+07 2001.000000 24.0 NaN NaN 88.0 NaN 88.0

56 rows × 25 columns


In [34]:
def to_color(color_str):
    if color_str == "WH":
        return str(color_str.replace("WH","White"))
    if color_str == "WHT":
        return str(color_str.replace("WHT","White"))
    if color_str == "RD":
        return str(color_str.replace("RD","Red"))
    if color_str == "BLK":
        return str(color_str.replace("BLK","BLACK"))
    if color_str == "BK":
        return str(color_str.replace("BK","BLACK"))
    if color_str == "BR":
        return str(color_str.replace("BR","Brown"))
    if color_str == "BRW":
        return str(color_str.replace("BRW","Brown"))
    if color_str == "GN":
        return str(color_str.replace("GN","Green"))
    if color_str == "GRY":
        return str(color_str.replace("GRY","Gray"))
    if color_str == "GY":
        return str(color_str.replace("GY","Gray"))
    if color_str == "BL":
        return str(color_str.replace("BL","Blue"))
    if color_str == "SILVR":
        return str(color_str.replace("SILVR","Silver"))
    if color_str == "SILVE":
        return str(color_str.replace("SILVE","Silver"))
    if color_str == "MAROO":
        return str(color_str.replace("MAROO","Maroon"))
to_color("WHT")


Out[34]:
'White'

In [35]:
test2_df['Vehicle Color'].apply(to_color)


Out[35]:
0     Gray
1    White
2     None
3    White
4    Brown
5      Red
6    Green
7     None
8     None
9     None
Name: Vehicle Color, dtype: object

In [36]:
#replacing Vehicle Color with Vehicle Color column as already there are so many columns.
test2_df['Vehicle Color'] = test2_df['Vehicle Color'].apply(to_color)

In [37]:
test2_df['Vehicle Color'].head()


Out[37]:
0     Gray
1    White
2     None
3    White
4    Brown
Name: Vehicle Color, dtype: object

7. Join the data with the Parking Violations Code dataset from the NYC Open Data site.


In [38]:
df_code = pd.read_csv("DOF_Parking_Violation_Codes.csv")
df_code.head(10)


Out[38]:
CODE DEFINITION Manhattan  96th St. & below All Other Areas
0 10 Stopping, standing or parking where a sign, st... $115 $115
1 11 Hotel Loading/Unloading: Standing or parking w... $115 $115
2 12 Snow Emergency: Standing or parking where stan... $95 $95
3 13 Taxi Stand: Standing or parking where standing... $115 $115
4 14 General No Standing: Standing or parking where... $115 $115
5 16 Truck Loading/Unloading: Standing or parking w... $95 $95
6 17 Authorized Vehicles Only: Standing or parking ... $95 $95
7 18 Bus Lane: Standing or parking where standing i... $115 $115
8 19 Bus Stop: Standing or parking where standing i... $115 $115
9 20 General No Parking: No parking where parking i... $65 $60

In [39]:
violate_df['Violation Legal Code'].head()


Out[39]:
0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
Name: Violation Legal Code, dtype: float64

In [40]:
test2_df.join(df_code, on='Violation Code', how='left')


Out[40]:
Summons Number Plate ID Registration State Plate Type Issue Date Violation Code Vehicle Body Type Vehicle Make Issuing Agency Street Code1 ... Feet From Curb Violation Post Code Violation Description No Standing or Stopping Violation Hydrant Violation Double Parking Violation CODE DEFINITION Manhattan  96th St. & below All Other Areas
0 1283294138 GBB9093 NY PAS 08/04/2013 46 SUBN AUDI P 37250 ... 0 NaN NaN NaN NaN NaN 62 Standing or parking a vehicle beyond markings ... $65 $45
1 1283294151 62416MB NY COM 08/04/2013 46 VAN FORD P 37290 ... 0 NaN NaN NaN NaN NaN 62 Standing or parking a vehicle beyond markings ... $65 $45
2 1283294163 78755JZ NY COM 08/05/2013 46 P-U CHEVR P 37030 ... 0 NaN NaN NaN NaN NaN 62 Standing or parking a vehicle beyond markings ... $65 $45
3 1283294175 63009MA NY COM 08/05/2013 46 VAN FORD P 37270 ... 0 NaN NaN NaN NaN NaN 62 Standing or parking a vehicle beyond markings ... $65 $45
4 1283294187 91648MC NY COM 08/08/2013 41 TRLR GMC P 37240 ... 0 NaN NaN NaN NaN NaN 57 Parking a vehicle within the area designated a... $65 $65
5 1283294217 T60DAR NJ PAS 08/11/2013 14 P-U DODGE P 37250 ... 0 NaN NaN NaN NaN NaN 25 Standing at a commuter van stop, other than te... $115 $115
6 1283294229 GCR2838 NY PAS 08/11/2013 14 VAN NaN P 37250 ... 0 NaN NaN NaN NaN NaN 25 Standing at a commuter van stop, other than te... $115 $115
7 1283983620 XZ764G NJ PAS 08/07/2013 24 DELV FORD X 63430 ... 0 NaN NaN NaN NaN NaN 39 Parking for longer than the maximum time permi... $65 $60
8 1283983631 GBH9379 NY PAS 08/07/2013 24 SDN TOYOT X 63430 ... 0 NaN NaN NaN NaN NaN 39 Parking for longer than the maximum time permi... $65 $60
9 1283983667 MCL78B NJ PAS 07/18/2013 24 SDN SUBAR H 0 ... 0 NaN NaN NaN NaN NaN 39 Parking for longer than the maximum time permi... $65 $60

10 rows × 47 columns

8. How much money did NYC make off of parking violations?


In [41]:
for ammount in df_code["All Other Areas"]:
    try:
        money_to_int(ammount)
    except:
        print(ammount)
        print(type(ammount))


$115 
<class 'str'>
$115 
<class 'str'>
$95 
<class 'str'>
$115 
<class 'str'>
$115 
<class 'str'>
$95 
<class 'str'>
$95 
<class 'str'>
$115 
<class 'str'>
$115 
<class 'str'>
$60 
<class 'str'>
$45 
<class 'str'>
$60 
<class 'str'>
$60 
<class 'str'>
$60 
<class 'str'>
$115 
<class 'str'>
$115 
<class 'str'>
$180 
<class 'str'>
$95 
<class 'str'>
$115 
<class 'str'>
$35 
<class 'str'>
$35 
<class 'str'>
$35 
<class 'str'>
$35 
<class 'str'>
$35 
<class 'str'>
$60 
<class 'str'>
$65 
<class 'str'>
$115 
<class 'str'>
$35 
<class 'str'>
$35 
<class 'str'>
$35 
<class 'str'>
$115 
<class 'str'>
$115 
<class 'str'>
$115 
<class 'str'>
$115 
<class 'str'>
$95 
<class 'str'>
$115 
<class 'str'>
$115 
<class 'str'>
$115 
<class 'str'>
$115 
<class 'str'>
$115 
<class 'str'>
$115 
<class 'str'>
$65 
<class 'str'>
$45 
<class 'str'>
$115 
<class 'str'>
$45 
<class 'str'>
$45 
<class 'str'>
$45 
<class 'str'>
$95 
<class 'str'>
$95 
<class 'str'>
$95 
<class 'str'>
$45 
<class 'str'>
$165 
<class 'str'>
$65 
<class 'str'>
$65 
<class 'str'>
$65 
<class 'str'>
$65 
<class 'str'>
$65 
<class 'str'>
$65 
<class 'str'>
$65 
<class 'str'>
$45 
<class 'str'>
$65 
<class 'str'>
$115 
<class 'str'>
$45 
<class 'str'>
$95 
<class 'str'>
$115 
<class 'str'>
$65 
<class 'str'>
$45 
<class 'str'>
$65 
<class 'str'>
$115 
<class 'str'>
$115 
<class 'str'>
$45 
<class 'str'>
$45 
<class 'str'>
$65 
<class 'str'>
$200 (Heavy Tow, plus violation fine)
<class 'str'>
$95 
<class 'str'>
$45 
<class 'str'>
$95 
<class 'str'>
vary
<class 'str'>
nan
<class 'float'>
$115 
<class 'str'>
2nd Offense - $265
2nd Offense - $515
<class 'str'>
$50  
<class 'str'>
$115 
<class 'str'>
$115 
<class 'str'>

In [42]:
def money_to_int(money_str):
    return int(money_str.replace("$","").replace(",",""))

print(money_to_int("$115"))


115

In [43]:
import re
ammount_list = []
other_area = df_code["All Other Areas"]
for ammount in other_area:
    try:
        x = money_to_int(ammount)
        ammount_list.append(x)
        #print(amount_list)
    except:
        print("made it to except")
        if isinstance(ammount,str):
            print("is a string!")
            clean = re.findall(r"\d{3}", ammount)
            z = [int(i) for i in clean]
            #print(type(z[0]))
            #print(clean)
            if len(z) > 1:
                print("z is greater than 1")
                avg = int(sum(z) / len(z))
                print(type(avg))
                #print(avg)
                ammount_list.append(avg)
            elif len(z) == 1:
                print("only one item in list!")
                print("Let's append", str(z[0]))
                ammount_list.append(z[0])
                #print(amount_list)
            else:
                ammount_list.append(None)
        else: 
            ammount_list.append(None)
len(ammount_list) 
df_code['new_areas'] = ammount_list


made it to except
is a string!
only one item in list!
Let's append 200
made it to except
is a string!
made it to except
made it to except
is a string!
z is greater than 1
<class 'int'>

In [44]:
df_code


Out[44]:
CODE DEFINITION Manhattan  96th St. & below All Other Areas new_areas
0 10 Stopping, standing or parking where a sign, st... $115 $115 115.0
1 11 Hotel Loading/Unloading: Standing or parking w... $115 $115 115.0
2 12 Snow Emergency: Standing or parking where stan... $95 $95 95.0
3 13 Taxi Stand: Standing or parking where standing... $115 $115 115.0
4 14 General No Standing: Standing or parking where... $115 $115 115.0
5 16 Truck Loading/Unloading: Standing or parking w... $95 $95 95.0
6 17 Authorized Vehicles Only: Standing or parking ... $95 $95 95.0
7 18 Bus Lane: Standing or parking where standing i... $115 $115 115.0
8 19 Bus Stop: Standing or parking where standing i... $115 $115 115.0
9 20 General No Parking: No parking where parking i... $65 $60 60.0
10 21 Street Cleaning: No parking where parking is n... $65 $45 45.0
11 22 Hotel Loading/Unloading: No parking where park... $60 $60 60.0
12 23 Taxi Stand: No parking where parking is not al... $65 $60 60.0
13 24 Authorized Vehicles Only: No parking where par... $65 $60 60.0
14 25 Standing at a commuter van stop, other than te... $115 $115 115.0
15 26 Standing at a for-hire vehicle stop, other tha... $115 $115 115.0
16 27 No parking in a handicapped zone (off-street o... $180 $180 180.0
17 28 Overtime standing (diplomat) $95 $95 95.0
18 31 Standing of a non-commercial vehicle in a comm... $115 $115 115.0
19 32 Parking at a broken or missing meter for longe... $65 $35 35.0
20 33 "Feeding Meter" -- parking in a metered space ... $65 $35 35.0
21 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
22 35 Parking in a meter space for the purpose of di... $65 $35 35.0
23 37-38 Muni Meter --\n(37) Parking in excess of the a... $65 $35 35.0
24 39 Parking for longer than the maximum time permi... $65 $60 60.0
25 69 Failing to show a muni-meter receipt, commerci... $65 $65 65.0
26 40 Stopping, standing or parking closer than 15 f... $115 $115 115.0
27 42 Parking in a Muni Metered space in a commercia... $65 $35 35.0
28 43 Parking in a commercial metered zone in which ... $65 $35 35.0
29 44 Parking in a commercial metered zone for longe... $65 $35 35.0
... ... ... ... ... ...
54 71 Standing or parking a vehicle without showing ... $65 $65 65.0
55 72 Standing or parking a vehicle with NY Plates a... $65 $65 65.0
56 73 Standing or parking a vehicle showing an expir... $65 $65 65.0
57 74 Standing or parking a vehicle without properly... $65 $65 65.0
58 75 Standing or parking a vehicle in which the Lic... $65 $65 65.0
59 77 Parking a bus, unless allowed by signs. A char... $65 $45 45.0
60 78 Parking a Commercial Vehicle on a residential ... $65 $65 65.0
61 79 For a bus without passengers, waiting at a cur... $115 $115 115.0
62 80 Standing or parking a vehicle without head lam... $60 $45 45.0
63 81 No standing except diplomat $95 $95 95.0
64 82 Standing or parking a Commercial Vehicle unles... $115 $115 115.0
65 83 Standing or parking a vehicle which is not pro... $65 $65 65.0
66 84 Parking a Commercial Vehicle on any city stree... $65 $45 45.0
67 85 Parking a Commercial Vehicle more than 3 hours... $65 $65 65.0
68 86 Standing or parking a vehicle to make pickups,... $115 $115 115.0
69 89 Standing or parking a vehicle in the Garment D... $115 $115 115.0
70 91 Parking in order to sell a vehicle by a person... $65 $45 45.0
71 92 Parking in order to wash, grease, or repair a ... $65 $45 45.0
72 93 Stopping, standing or parking on paved roadway... $65 $65 65.0
73 94 Vehicle Release Agreement code associated with... 100\n(Regular Tow, plus violation fine) $200 (Heavy Tow, plus violation fine) 200.0
74 96 Standing or parking within 50 feet of the near... $95 $95 95.0
75 97 Parking in a vacant lot. A vehicle may be park... $65 $45 45.0
76 98 Standing or parking in front of a public or pr... $95 $95 95.0
77 99 All other parking, standing or stopping violat... vary vary NaN
78 04 Vehicles parked illegally south of Houston Str... $115 NaN NaN
79 05 Failure to make a right turn from a bus lane. $115 $115 115.0
80 06 Parking a tractor-trailer on a residential str... 1st Offense - $265\n2nd Offense - $515 2nd Offense - $265\n2nd Offense - $515 390.0
81 07 Vehicles photographed going through a red ligh... $50 $50 50.0
82 08 Vehicle idling in a restricted area. $115 $115 115.0
83 09 Blocking an Intersection: Obstructing traffic ... $115 $115 115.0

84 rows × 5 columns


In [45]:
#df_code['new_areas'].sum()

In [46]:
#since I am unable to read the entire data set using the subset to calculate the sum.
test3_df = pd.read_csv("small-violations.csv", dtype=col_observ)
test3_df


Out[46]:
Summons Number Plate ID Registration State Plate Type Issue Date Violation Code Vehicle Body Type Vehicle Make Issuing Agency Street Code1 ... Vehicle Color Unregistered Vehicle? Vehicle Year Meter Number Feet From Curb Violation Post Code Violation Description No Standing or Stopping Violation Hydrant Violation Double Parking Violation
0 1283294138 GBB9093 NY PAS 08/04/2013 46 SUBN AUDI P 37250 ... GY 0 2013 - 0 NaN NaN NaN NaN NaN
1 1283294151 62416MB NY COM 08/04/2013 46 VAN FORD P 37290 ... WH 0 2012 - 0 NaN NaN NaN NaN NaN
2 1283294163 78755JZ NY COM 08/05/2013 46 P-U CHEVR P 37030 ... NaN 0 0 - 0 NaN NaN NaN NaN NaN
3 1283294175 63009MA NY COM 08/05/2013 46 VAN FORD P 37270 ... WH 0 2010 - 0 NaN NaN NaN NaN NaN
4 1283294187 91648MC NY COM 08/08/2013 41 TRLR GMC P 37240 ... BR 0 2012 - 0 NaN NaN NaN NaN NaN
5 1283294217 T60DAR NJ PAS 08/11/2013 14 P-U DODGE P 37250 ... RD 0 0 - 0 NaN NaN NaN NaN NaN
6 1283294229 GCR2838 NY PAS 08/11/2013 14 VAN NaN P 37250 ... GN 0 2011 - 0 NaN NaN NaN NaN NaN
7 1283983620 XZ764G NJ PAS 08/07/2013 24 DELV FORD X 63430 ... WHITE 0 0 - 0 NaN NaN NaN NaN NaN
8 1283983631 GBH9379 NY PAS 08/07/2013 24 SDN TOYOT X 63430 ... WHITE 0 2001 - 0 NaN NaN NaN NaN NaN
9 1283983667 MCL78B NJ PAS 07/18/2013 24 SDN SUBAR H 0 ... BLACK 0 2005 - 0 NaN NaN NaN NaN NaN
10 1283983679 M367CN NY PAS 07/18/2013 24 SDN HYUND H 0 ... BLUE 0 2010 - 0 NaN NaN NaN NaN NaN
11 1283983734 GAR6813 NY PAS 07/18/2013 24 SDN TOYOT H 0 ... TAN 0 1998 - 0 NaN NaN NaN NaN NaN
12 1283983771 GEN8674 NY PAS 07/31/2013 24 SDN AUDI X 0 ... GOLD 0 0 - 0 NaN NaN NaN NaN NaN
13 1283983825 GAC2703 NY PAS 08/12/2013 24 SDN NISSA X 23230 ... WHITE 0 2012 - 0 NaN NaN NaN NaN NaN
14 1286036800 40793JY NY COM 07/05/2013 14 VAN CHEVR P 34190 ... GREEN 0 1995 - 0 NaN NaN NaN NaN NaN
15 1286123550 GAD1485 NY PAS 08/12/2013 20 SDN VOLKS T 28930 ... WHITE 0 2012 - 0 NaN NaN NaN NaN NaN
16 1286246398 GFC5338 NY PAS 07/26/2013 14 SDN TOYOT T 0 ... GRAY 0 2010 - 0 NaN NaN NaN NaN NaN
17 1286246416 815M342 MD PAS 07/30/2013 20 SUBN SATUR T 0 ... GREEN 0 0 - 0 NaN NaN NaN NaN NaN
18 1286248000 GJA3452 NY PAS 07/23/2013 14 SDN KIA T 73690 ... WH 0 2013 - 0 NaN NaN NaN NaN NaN
19 1286282330 YZY6476 NC PAS 07/29/2013 20 SDN NISSA T 32030 ... WHITE 0 0 - 0 NaN NaN NaN NaN NaN
20 1286282342 WBJ819 LA PAS 08/07/2013 17 SUBN HONDA T 0 ... TAN 0 0 - 0 NaN NaN NaN NaN NaN
21 1286289841 GAV9235 NY PAS 07/20/2013 50 SDN HONDA T 0 ... GY 0 2012 - 0 NaN NaN NaN NaN NaN
22 1286654920 ZTR66R NJ PAS 07/18/2013 50 SDN N/S T 0 ... GRAY 0 0 - 0 NaN NaN NaN NaN NaN
23 1286799648 GDE3973 NY PAS 07/20/2013 40 SDN TOYOT T 0 ... TN 0 2003 - 9 NaN NaN NaN NaN NaN
24 1286807062 P193871 IL PAS 08/10/2013 40 SDN SCIO T 26630 ... BK 0 0 - 6 NaN NaN NaN NaN NaN
25 1286807475 GGL6608 NY PAS 07/26/2013 40 SDN NISSA T 73650 ... WHT 0 2013 - 1 NaN NaN NaN NaN NaN
26 1286807633 84301JU NY COM 07/19/2013 14 P-U DODGE X 63030 ... WH 0 2001 - 0 NaN NaN NaN NaN NaN
27 1286808807 HHG3545 PA PAS 08/06/2013 20 SDN TOYOT T 21130 ... GR 0 0 - 0 NaN NaN NaN NaN NaN
28 1287320491 50097JX NY COM 08/07/2013 40 VAN MERCU T 23190 ... RD 0 2003 - 0 NaN NaN NaN NaN NaN
29 1287320570 GEJ9154 NY PAS 08/06/2013 51 VAN TOYOT X 93230 ... GY 0 2013 - 0 NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9969 1349992501 GFZ7651 NY PAS 07/23/2013 71 SDN TOYOT P 25870 ... GREY 0 2007 - 0 NaN NaN NaN NaN NaN
9970 1349992975 LBS219 TX PAS 08/13/2013 91 SUBN FORD P 0 ... BK 0 1996 - 0 NaN NaN NaN NaN NaN
9971 1349993244 RG2C42 TX PAS 07/16/2013 78 TRAC PETER X 20440 ... BLUE 0 2004 - 0 NaN NaN NaN NaN NaN
9972 1349993270 GAA1625 NY PAS 07/16/2013 71 SDN HYUND X 26040 ... BLUE 0 2013 - 0 NaN NaN NaN NaN NaN
9973 1349993335 ZEB4706 NY COM 08/15/2013 78 VAN FORD P 63290 ... WHITE 0 2006 - 0 NaN NaN NaN NaN NaN
9974 1349993463 GFD7394 NY PAS 08/13/2013 74 SUBN ROVER P 25340 ... BLK 0 2008 - 0 NaN NaN NaN NaN NaN
9975 1349993505 MSVEE NY PAS 08/07/2013 71 SUBN ME/BE P 25440 ... BLK 0 2002 - 0 NaN NaN NaN NaN NaN
9976 1349993682 73143JK NY COM 07/31/2013 78 VAN FORD P 53890 ... BLK 0 1999 - 0 NaN NaN NaN NaN NaN
9977 1349993797 GDP1969 NY PAS 07/27/2013 71 SUBN FORD P 25870 ... BLACK 0 0 - 0 NaN NaN NaN NaN NaN
9978 1349994250 GHL3117 NY PAS 07/17/2013 98 SDN NISSA X 22390 ... GOLD 0 0 - 0 NaN NaN NaN NaN NaN
9979 1349994406 GEF7027 NY PAS 07/15/2013 52 SDN NISSA X 0 ... GRY 0 2003 - 0 NaN NaN NaN NaN NaN
9980 1349994571 EUX4225 NY PAS 07/21/2013 40 SDN NISSA P 0 ... BLUE 0 2003 - 0 NaN NaN NaN NaN NaN
9981 1349994716 GHF9746 NY PAS 08/10/2013 40 SDN HONDA P 0 ... BLACK 0 2013 - 2 NaN NaN NaN NaN NaN
9982 1349994741 HX55885 NY PAS 08/09/2013 70 SDN VOLKS P 21290 ... GOLD 0 1974 - 0 NaN NaN NaN NaN NaN
9983 1349994753 HX55885 NY PAS 08/09/2013 71 SDN VOLKS P 21290 ... GOLD 0 1974 - 0 NaN NaN NaN NaN NaN
9984 1349994807 TME33R NJ PAS 08/09/2013 78 TRLR NaN P 0 ... WHITE 0 0 - 0 NaN NaN NaN NaN NaN
9985 1349994819 98166JT 99 COM 08/09/2013 78 VAN FORD P 0 ... WHITE 0 2008 - 0 NaN NaN NaN NaN NaN
9986 1349995514 EUE8098 99 PAS 07/23/2013 27 SUBN MERCU P 0 ... GRN 0 1999 - 0 NaN NaN NaN NaN NaN
9987 1349995526 54726MC NY COM 07/23/2013 78 VAN CHEVR P 0 ... WH 0 2008 - 0 NaN NaN NaN NaN NaN
9988 1349995538 7AB6161 MD PAS 07/23/2013 78 SUBN GMC P 0 ... GRY 0 0 - 0 NaN NaN NaN NaN NaN
9989 1349995551 66247MC NY COM 07/25/2013 78 DUMP MACK P 44690 ... RD 0 2001 - 0 NaN NaN NaN NaN NaN
9990 1349995654 GBZ2160 NY PAS 08/14/2013 71 SDN NISSA P 24490 ... BLACK 0 1999 - 0 NaN NaN NaN NaN NaN
9991 1349995666 HST2419 PA PAS 08/14/2013 40 SDN BMW P 24490 ... BLK 0 0 - 10 NaN NaN NaN NaN NaN
9992 1349995678 GHL3189 NY PAS 08/04/2013 71 SUBN CHRYS P 23090 ... BLUE 0 2009 - 0 NaN NaN NaN NaN NaN
9993 1349995691 GAX3159 NY PAS 08/04/2013 71 SDN NISSA P 23090 ... GY 0 2006 - 0 NaN NaN NaN NaN NaN
9994 1349995721 GBJ4159 NY PAS 07/27/2013 71 SDN TOYOT P 25935 ... BLUE 0 2004 - 0 NaN NaN NaN NaN NaN
9995 1349995733 JDD3205 PA PAS 07/13/2013 40 SDN NISSA P 24740 ... GY 0 0 - 1 NaN NaN NaN NaN NaN
9996 1349995745 GFD4344 NY PAS 07/13/2013 40 SDN CHRYS P 29090 ... GY 0 2013 - 5 NaN NaN NaN NaN NaN
9997 1349995836 GCG2966 NY PAS 08/07/2013 74 SDN NISSA P 24090 ... BLACK 0 2005 - 0 NaN NaN NaN NaN NaN
9998 1349995850 GGH5393 NY PAS 08/04/2013 74 SDN HYUND P 0 ... RED 0 1998 - 0 NaN NaN NaN NaN NaN

9999 rows × 43 columns


In [47]:
test3_df.join(df_code, on='Violation Code', how='left')


Out[47]:
Summons Number Plate ID Registration State Plate Type Issue Date Violation Code Vehicle Body Type Vehicle Make Issuing Agency Street Code1 ... Violation Post Code Violation Description No Standing or Stopping Violation Hydrant Violation Double Parking Violation CODE DEFINITION Manhattan  96th St. & below All Other Areas new_areas
0 1283294138 GBB9093 NY PAS 08/04/2013 46 SUBN AUDI P 37250 ... NaN NaN NaN NaN NaN 62 Standing or parking a vehicle beyond markings ... $65 $45 45.0
1 1283294151 62416MB NY COM 08/04/2013 46 VAN FORD P 37290 ... NaN NaN NaN NaN NaN 62 Standing or parking a vehicle beyond markings ... $65 $45 45.0
2 1283294163 78755JZ NY COM 08/05/2013 46 P-U CHEVR P 37030 ... NaN NaN NaN NaN NaN 62 Standing or parking a vehicle beyond markings ... $65 $45 45.0
3 1283294175 63009MA NY COM 08/05/2013 46 VAN FORD P 37270 ... NaN NaN NaN NaN NaN 62 Standing or parking a vehicle beyond markings ... $65 $45 45.0
4 1283294187 91648MC NY COM 08/08/2013 41 TRLR GMC P 37240 ... NaN NaN NaN NaN NaN 57 Parking a vehicle within the area designated a... $65 $65 65.0
5 1283294217 T60DAR NJ PAS 08/11/2013 14 P-U DODGE P 37250 ... NaN NaN NaN NaN NaN 25 Standing at a commuter van stop, other than te... $115 $115 115.0
6 1283294229 GCR2838 NY PAS 08/11/2013 14 VAN NaN P 37250 ... NaN NaN NaN NaN NaN 25 Standing at a commuter van stop, other than te... $115 $115 115.0
7 1283983620 XZ764G NJ PAS 08/07/2013 24 DELV FORD X 63430 ... NaN NaN NaN NaN NaN 39 Parking for longer than the maximum time permi... $65 $60 60.0
8 1283983631 GBH9379 NY PAS 08/07/2013 24 SDN TOYOT X 63430 ... NaN NaN NaN NaN NaN 39 Parking for longer than the maximum time permi... $65 $60 60.0
9 1283983667 MCL78B NJ PAS 07/18/2013 24 SDN SUBAR H 0 ... NaN NaN NaN NaN NaN 39 Parking for longer than the maximum time permi... $65 $60 60.0
10 1283983679 M367CN NY PAS 07/18/2013 24 SDN HYUND H 0 ... NaN NaN NaN NaN NaN 39 Parking for longer than the maximum time permi... $65 $60 60.0
11 1283983734 GAR6813 NY PAS 07/18/2013 24 SDN TOYOT H 0 ... NaN NaN NaN NaN NaN 39 Parking for longer than the maximum time permi... $65 $60 60.0
12 1283983771 GEN8674 NY PAS 07/31/2013 24 SDN AUDI X 0 ... NaN NaN NaN NaN NaN 39 Parking for longer than the maximum time permi... $65 $60 60.0
13 1283983825 GAC2703 NY PAS 08/12/2013 24 SDN NISSA X 23230 ... NaN NaN NaN NaN NaN 39 Parking for longer than the maximum time permi... $65 $60 60.0
14 1286036800 40793JY NY COM 07/05/2013 14 VAN CHEVR P 34190 ... NaN NaN NaN NaN NaN 25 Standing at a commuter van stop, other than te... $115 $115 115.0
15 1286123550 GAD1485 NY PAS 08/12/2013 20 SDN VOLKS T 28930 ... NaN NaN NaN NaN NaN 33 "Feeding Meter" -- parking in a metered space ... $65 $35 35.0
16 1286246398 GFC5338 NY PAS 07/26/2013 14 SDN TOYOT T 0 ... NaN NaN NaN NaN NaN 25 Standing at a commuter van stop, other than te... $115 $115 115.0
17 1286246416 815M342 MD PAS 07/30/2013 20 SUBN SATUR T 0 ... NaN NaN NaN NaN NaN 33 "Feeding Meter" -- parking in a metered space ... $65 $35 35.0
18 1286248000 GJA3452 NY PAS 07/23/2013 14 SDN KIA T 73690 ... NaN NaN NaN NaN NaN 25 Standing at a commuter van stop, other than te... $115 $115 115.0
19 1286282330 YZY6476 NC PAS 07/29/2013 20 SDN NISSA T 32030 ... NaN NaN NaN NaN NaN 33 "Feeding Meter" -- parking in a metered space ... $65 $35 35.0
20 1286282342 WBJ819 LA PAS 08/07/2013 17 SUBN HONDA T 0 ... NaN NaN NaN NaN NaN 28 Overtime standing (diplomat) $95 $95 95.0
21 1286289841 GAV9235 NY PAS 07/20/2013 50 SDN HONDA T 0 ... NaN NaN NaN NaN NaN 66 Parking a trailer or semi-trailer which is not... $65 $45 45.0
22 1286654920 ZTR66R NJ PAS 07/18/2013 50 SDN N/S T 0 ... NaN NaN NaN NaN NaN 66 Parking a trailer or semi-trailer which is not... $65 $45 45.0
23 1286799648 GDE3973 NY PAS 07/20/2013 40 SDN TOYOT T 0 ... NaN NaN NaN NaN NaN 56 Stopping, standing or parking alongside a barr... $115 $115 115.0
24 1286807062 P193871 IL PAS 08/10/2013 40 SDN SCIO T 26630 ... NaN NaN NaN NaN NaN 56 Stopping, standing or parking alongside a barr... $115 $115 115.0
25 1286807475 GGL6608 NY PAS 07/26/2013 40 SDN NISSA T 73650 ... NaN NaN NaN NaN NaN 56 Stopping, standing or parking alongside a barr... $115 $115 115.0
26 1286807633 84301JU NY COM 07/19/2013 14 P-U DODGE X 63030 ... NaN NaN NaN NaN NaN 25 Standing at a commuter van stop, other than te... $115 $115 115.0
27 1286808807 HHG3545 PA PAS 08/06/2013 20 SDN TOYOT T 21130 ... NaN NaN NaN NaN NaN 33 "Feeding Meter" -- parking in a metered space ... $65 $35 35.0
28 1287320491 50097JX NY COM 08/07/2013 40 VAN MERCU T 23190 ... NaN NaN NaN NaN NaN 56 Stopping, standing or parking alongside a barr... $115 $115 115.0
29 1287320570 GEJ9154 NY PAS 08/06/2013 51 VAN TOYOT X 93230 ... NaN NaN NaN NaN NaN 67 Parking in front of a pedestrian ramp $165 $165 165.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9969 1349992501 GFZ7651 NY PAS 07/23/2013 71 SDN TOYOT P 25870 ... NaN NaN NaN NaN NaN 92 Parking in order to wash, grease, or repair a ... $65 $45 45.0
9970 1349992975 LBS219 TX PAS 08/13/2013 91 SUBN FORD P 0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
9971 1349993244 RG2C42 TX PAS 07/16/2013 78 TRAC PETER X 20440 ... NaN NaN NaN NaN NaN 04 Vehicles parked illegally south of Houston Str... $115 NaN NaN
9972 1349993270 GAA1625 NY PAS 07/16/2013 71 SDN HYUND X 26040 ... NaN NaN NaN NaN NaN 92 Parking in order to wash, grease, or repair a ... $65 $45 45.0
9973 1349993335 ZEB4706 NY COM 08/15/2013 78 VAN FORD P 63290 ... NaN NaN NaN NaN NaN 04 Vehicles parked illegally south of Houston Str... $115 NaN NaN
9974 1349993463 GFD7394 NY PAS 08/13/2013 74 SUBN ROVER P 25340 ... NaN NaN NaN NaN NaN 96 Standing or parking within 50 feet of the near... $95 $95 95.0
9975 1349993505 MSVEE NY PAS 08/07/2013 71 SUBN ME/BE P 25440 ... NaN NaN NaN NaN NaN 92 Parking in order to wash, grease, or repair a ... $65 $45 45.0
9976 1349993682 73143JK NY COM 07/31/2013 78 VAN FORD P 53890 ... NaN NaN NaN NaN NaN 04 Vehicles parked illegally south of Houston Str... $115 NaN NaN
9977 1349993797 GDP1969 NY PAS 07/27/2013 71 SUBN FORD P 25870 ... NaN NaN NaN NaN NaN 92 Parking in order to wash, grease, or repair a ... $65 $45 45.0
9978 1349994250 GHL3117 NY PAS 07/17/2013 98 SDN NISSA X 22390 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
9979 1349994406 GEF7027 NY PAS 07/15/2013 52 SDN NISSA X 0 ... NaN NaN NaN NaN NaN 68 Not parking as marked on a posted sign $65 $65 65.0
9980 1349994571 EUX4225 NY PAS 07/21/2013 40 SDN NISSA P 0 ... NaN NaN NaN NaN NaN 56 Stopping, standing or parking alongside a barr... $115 $115 115.0
9981 1349994716 GHF9746 NY PAS 08/10/2013 40 SDN HONDA P 0 ... NaN NaN NaN NaN NaN 56 Stopping, standing or parking alongside a barr... $115 $115 115.0
9982 1349994741 HX55885 NY PAS 08/09/2013 70 SDN VOLKS P 21290 ... NaN NaN NaN NaN NaN 91 Parking in order to sell a vehicle by a person... $65 $45 45.0
9983 1349994753 HX55885 NY PAS 08/09/2013 71 SDN VOLKS P 21290 ... NaN NaN NaN NaN NaN 92 Parking in order to wash, grease, or repair a ... $65 $45 45.0
9984 1349994807 TME33R NJ PAS 08/09/2013 78 TRLR NaN P 0 ... NaN NaN NaN NaN NaN 04 Vehicles parked illegally south of Houston Str... $115 NaN NaN
9985 1349994819 98166JT 99 COM 08/09/2013 78 VAN FORD P 0 ... NaN NaN NaN NaN NaN 04 Vehicles parked illegally south of Houston Str... $115 NaN NaN
9986 1349995514 EUE8098 99 PAS 07/23/2013 27 SUBN MERCU P 0 ... NaN NaN NaN NaN NaN 42 Parking in a Muni Metered space in a commercia... $65 $35 35.0
9987 1349995526 54726MC NY COM 07/23/2013 78 VAN CHEVR P 0 ... NaN NaN NaN NaN NaN 04 Vehicles parked illegally south of Houston Str... $115 NaN NaN
9988 1349995538 7AB6161 MD PAS 07/23/2013 78 SUBN GMC P 0 ... NaN NaN NaN NaN NaN 04 Vehicles parked illegally south of Houston Str... $115 NaN NaN
9989 1349995551 66247MC NY COM 07/25/2013 78 DUMP MACK P 44690 ... NaN NaN NaN NaN NaN 04 Vehicles parked illegally south of Houston Str... $115 NaN NaN
9990 1349995654 GBZ2160 NY PAS 08/14/2013 71 SDN NISSA P 24490 ... NaN NaN NaN NaN NaN 92 Parking in order to wash, grease, or repair a ... $65 $45 45.0
9991 1349995666 HST2419 PA PAS 08/14/2013 40 SDN BMW P 24490 ... NaN NaN NaN NaN NaN 56 Stopping, standing or parking alongside a barr... $115 $115 115.0
9992 1349995678 GHL3189 NY PAS 08/04/2013 71 SUBN CHRYS P 23090 ... NaN NaN NaN NaN NaN 92 Parking in order to wash, grease, or repair a ... $65 $45 45.0
9993 1349995691 GAX3159 NY PAS 08/04/2013 71 SDN NISSA P 23090 ... NaN NaN NaN NaN NaN 92 Parking in order to wash, grease, or repair a ... $65 $45 45.0
9994 1349995721 GBJ4159 NY PAS 07/27/2013 71 SDN TOYOT P 25935 ... NaN NaN NaN NaN NaN 92 Parking in order to wash, grease, or repair a ... $65 $45 45.0
9995 1349995733 JDD3205 PA PAS 07/13/2013 40 SDN NISSA P 24740 ... NaN NaN NaN NaN NaN 56 Stopping, standing or parking alongside a barr... $115 $115 115.0
9996 1349995745 GFD4344 NY PAS 07/13/2013 40 SDN CHRYS P 29090 ... NaN NaN NaN NaN NaN 56 Stopping, standing or parking alongside a barr... $115 $115 115.0
9997 1349995836 GCG2966 NY PAS 08/07/2013 74 SDN NISSA P 24090 ... NaN NaN NaN NaN NaN 96 Standing or parking within 50 feet of the near... $95 $95 95.0
9998 1349995850 GGH5393 NY PAS 08/04/2013 74 SDN HYUND P 0 ... NaN NaN NaN NaN NaN 96 Standing or parking within 50 feet of the near... $95 $95 95.0

9999 rows × 48 columns


In [48]:
# joining with the violation dataset

new_data = test3_df.join(df_code, on='Violation Code', how='left')

In [49]:
new_data['new_areas'].sum()


Out[49]:
614500.0

9. What's the most lucrative kind of parking violation? The most frequent?


In [50]:
new_data.columns


Out[50]:
Index(['Summons Number', 'Plate ID', 'Registration State', 'Plate Type',
       'Issue Date', 'Violation Code', 'Vehicle Body Type', 'Vehicle Make',
       'Issuing Agency', 'Street Code1', 'Street Code2', 'Street Code3',
       'Vehicle Expiration Date', 'Violation Location', 'Violation Precinct',
       'Issuer Precinct', 'Issuer Code', 'Issuer Command', 'Issuer Squad',
       'Violation Time', 'Time First Observed', 'Violation County',
       'Violation In Front Of Or Opposite', 'House Number', 'Street Name',
       'Intersecting Street', 'Date First Observed', 'Law Section',
       'Sub Division', 'Violation Legal Code', 'Days Parking In Effect    ',
       'From Hours In Effect', 'To Hours In Effect', 'Vehicle Color',
       'Unregistered Vehicle?', 'Vehicle Year', 'Meter Number',
       'Feet From Curb', 'Violation Post Code', 'Violation Description',
       'No Standing or Stopping Violation', 'Hydrant Violation',
       'Double Parking Violation', 'CODE', 'DEFINITION',
       'Manhattan  96th St. & below', 'All Other Areas', 'new_areas'],
      dtype='object')

In [51]:
new_data['Violation Code'].value_counts()


Out[51]:
21    1894
46    1366
14     987
20     745
40     634
19     567
71     487
78     382
24     329
98     272
17     271
66     254
74     205
70     143
67     138
80     129
45     109
51      91
16      86
27      82
50      81
38      79
99      74
41      67
48      57
85      48
84      43
37      41
83      34
61      29
      ... 
31      15
63      14
72       9
34       9
77       8
60       7
75       6
52       6
65       6
94       6
6        5
39       4
86       4
95       3
18       3
82       3
79       3
13       3
73       2
90       2
89       1
49       1
4        1
76       1
92       1
56       1
5        1
29       1
55       1
8        1
Name: Violation Code, dtype: int64

In [52]:
new_data[new_data['Violation Code'] == 21]


Out[52]:
Summons Number Plate ID Registration State Plate Type Issue Date Violation Code Vehicle Body Type Vehicle Make Issuing Agency Street Code1 ... Violation Post Code Violation Description No Standing or Stopping Violation Hydrant Violation Double Parking Violation CODE DEFINITION Manhattan  96th St. & below All Other Areas new_areas
67 1294364133 4BXK89 MD PAS 08/13/2013 21 SDN CHEVR S 56590 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
68 1294364145 IHR526 SC PAS 08/13/2013 21 SDN INFIN S 50303 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
104 1295357240 T624858C NY PAS 07/22/2013 21 SUBN TOYOT X 28790 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
144 1305348461 WVE5014 VA PAS 08/16/2013 21 SDN TOYOT S 54130 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
155 1306755700 Y309CZ NY PAS 08/02/2013 21 SUBN NaN S 52490 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
156 1306757605 T506622C NY SRF 08/06/2013 21 SDN LINCO S 54580 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
157 1306757629 561CVB NJ PAS 08/06/2013 21 SUBN BMW S 45990 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
158 1306757850 HFB706 SC PAS 08/02/2013 21 SDN TOYOT S 53190 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
159 1306757861 5CA14E MI PAS 08/02/2013 21 SDN HONDA S 53190 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
160 1306759237 GHN2704 NY PAS 08/02/2013 21 SUBN JEEP S 61090 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
161 1306759249 GEK4279 NY PAS 08/02/2013 21 SDN BMW S 23180 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
162 1306759250 G652391 IL PAS 08/16/2013 21 SDN NISSA S 23290 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
163 1306759262 GDL7143 NY PAS 08/16/2013 21 SDN BMW S 23180 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
164 1306759353 GFH7323 NY PAS 07/22/2013 21 SDN NISSA S 24340 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
165 1306759432 JLE4572 PA PAS 07/23/2013 21 SDN NISSA S 24540 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
166 1306759444 GHT3778 NY PAS 07/23/2013 21 SDN AUDI S 24540 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
167 1306759481 HSW7971 PA PAS 07/24/2013 21 SUBN MAZDA S 24040 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
168 1306771560 T615425C NY SRF 08/19/2013 21 SDN FORD S 40930 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
169 1306771572 WUV6389 VA PAS 08/19/2013 21 SUBN GMC S 12330 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
170 1306771584 XAPL92 NJ PAS 08/19/2013 21 VAN FORD S 14080 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
171 1306771596 GFF5181 NY PAS 08/19/2013 21 SUBN CHEVR S 12330 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
172 1306771614 62JH904 CA PAS 08/19/2013 21 SUBN ROVER S 12330 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
174 1306816531 GBJ1802 NY PAS 07/23/2013 21 NHL HONDA S 38690 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
176 1307045157 H892515 IL PAS 07/26/2013 21 VAN FORD P 44330 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
178 1307373021 X6672Z NJ PAS 07/26/2013 21 DELV NaN S 54070 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
208 1321434807 48686MA NY COM 08/16/2013 21 VAN CHEVR X 34820 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
213 1321931773 609ZLS CO PAS 08/21/2013 21 SDN TOYOT S 42430 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
214 1321931785 NNT925 TX PAS 08/21/2013 21 SUBN TOYOT S 58330 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
215 1321931797 YVF78P NJ PAS 08/21/2013 21 SDN TOYOT S 58330 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
216 1321931920 GDR9503 NY PAS 08/20/2013 21 SDN CHRYS S 19230 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9181 1349174725 GFN9532 NY PAS 07/26/2013 21 SUBN ACURA P 10620 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
9187 1349175481 546089 RI PAS 07/29/2013 21 SDN NaN P 0 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
9188 1349175500 6XPA903 CA PAS 07/29/2013 21 SDN BMW P 0 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
9189 1349175559 GHV3037 NY PAS 08/12/2013 21 SDN NISSA P 57720 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
9191 1349176539 GEX3395 NY PAS 07/22/2013 21 SDN HONDA P 0 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
9192 1349176734 JGM7318 PA PAS 07/22/2013 21 NaN NaN P 10620 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
9193 1349176746 GCX3539 NY PAS 07/22/2013 21 SUBN TOYOT P 10620 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
9194 1349176771 9AS9241 MD PAS 07/22/2013 21 NaN FORD P 13320 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
9197 1349177301 UDM77D NJ PAS 08/12/2013 21 SDN NISSA P 57720 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
9224 1349201790 J66C5C NJ PAS 08/12/2013 21 SDN DODGE P 11995 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
9237 1349207240 GEX4660 NY PAS 07/25/2013 21 SDN DODGE P 77150 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
9405 1350134090 GGF4586 99 PAS 08/23/2013 21 SDN HONDA P 59890 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
9430 1350153771 6V79H NY SRF 07/30/2013 21 TAXI TOYOT X 33720 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
9444 1350156024 W16BWP NJ PAS 07/30/2013 21 SDN ACURA X 24830 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
9563 1349356372 T61721ZC NY PAS 07/18/2013 21 SDN FORD P 0 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
9564 1349356384 GFC6745 NY PAS 07/25/2013 21 NaN DODGE P 23230 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
9565 1349356402 GJC7385 NY PAS 07/25/2013 21 SDN DODGE P 23230 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
9573 1349364691 GCK7266 NY PAS 08/05/2013 21 SDN VOLKS P 73110 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
9574 1349365300 GHS1062 NY PAS 08/05/2013 21 SDN FORD P 0 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
9575 1349365336 GFR9539 NY PAS 08/05/2013 21 SDN INFIN P 0 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
9576 1349365397 EXP6385 NY 999 07/23/2013 21 SDN SUBAR P 58030 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
9670 1349981850 GEF7552 NY PAS 07/25/2013 21 SDN CHRYS P 64790 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
9698 1349983901 EJR6466 NY PAS 07/19/2013 21 NaN NISSA P 64790 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
9777 1350163430 GAR3599 NY PAS 07/16/2013 21 NaN HONDA X 0 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
9778 1350163442 GCG9716 NY PAS 07/16/2013 21 SUBN TOYOT X 5680 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
9779 1350163480 P717BL NY PAS 08/06/2013 21 SDN CHEVR P 5380 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
9796 1350165463 GBE2667 NY PAS 07/30/2013 21 SDN AUDI X 24230 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
9802 1350166145 GGM9359 NY PAS 08/01/2013 21 SDN FORD P 0 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
9860 1349503988 51208KA NY COM 07/18/2013 21 VAN GMC P 41330 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0
9893 1349517197 GAR7221 NY PAS 07/30/2013 21 SDN ME/BE P 23230 ... NaN NaN NaN NaN NaN 34 Expired Meter -- parking in a metered space wh... $65 $35 35.0

1894 rows × 48 columns


In [53]:
most_frequent = new_data[new_data['Violation Code'] == 21].head(1)

In [54]:
print("The most frequent violation is", most_frequent['DEFINITION'])


The most frequent violation is 67    Expired Meter -- parking in a metered space wh...
Name: DEFINITION, dtype: object

In [55]:
columns_to_show = ['Violation Code','new_areas']
new_data[columns_to_show]


Out[55]:
Violation Code new_areas
0 46 45.0
1 46 45.0
2 46 45.0
3 46 45.0
4 41 65.0
5 14 115.0
6 14 115.0
7 24 60.0
8 24 60.0
9 24 60.0
10 24 60.0
11 24 60.0
12 24 60.0
13 24 60.0
14 14 115.0
15 20 35.0
16 14 115.0
17 20 35.0
18 14 115.0
19 20 35.0
20 17 95.0
21 50 45.0
22 50 45.0
23 40 115.0
24 40 115.0
25 40 115.0
26 14 115.0
27 20 35.0
28 40 115.0
29 51 165.0
... ... ...
9969 71 45.0
9970 91 NaN
9971 78 NaN
9972 71 45.0
9973 78 NaN
9974 74 95.0
9975 71 45.0
9976 78 NaN
9977 71 45.0
9978 98 NaN
9979 52 65.0
9980 40 115.0
9981 40 115.0
9982 70 45.0
9983 71 45.0
9984 78 NaN
9985 78 NaN
9986 27 35.0
9987 78 NaN
9988 78 NaN
9989 78 NaN
9990 71 45.0
9991 40 115.0
9992 71 45.0
9993 71 45.0
9994 71 45.0
9995 40 115.0
9996 40 115.0
9997 74 95.0
9998 74 95.0

9999 rows × 2 columns


In [56]:
lucrative_df = new_data[columns_to_show]

In [57]:
freq_df = new_data
#df.sort_values('length', ascending=False).head(3)

In [58]:
lucrative_df.groupby('Violation Code')['new_areas'].sum().sort_values(ascending=False)


Out[58]:
Violation Code
14    113505.0
40     72910.0
21     66290.0
46     61470.0
80     50310.0
20     26075.0
17     25745.0
71     21915.0
19     19845.0
24     19740.0
74     19475.0
16     15480.0
51     15015.0
66     11430.0
38      9085.0
67      8970.0
70      6435.0
48      5415.0
45      4905.0
37      4715.0
41      4355.0
83      3910.0
50      3645.0
61      3335.0
27      2870.0
68      1955.0
47      1805.0
31      1725.0
63      1330.0
62      1260.0
        ...   
60       455.0
73       400.0
65       390.0
52       390.0
18       345.0
79       345.0
82       345.0
75       270.0
13       180.0
8        115.0
4        115.0
76        95.0
49        95.0
5         95.0
56        65.0
55        65.0
29        35.0
77         NaN
78         NaN
84         NaN
85         NaN
86         NaN
89         NaN
90         NaN
91         NaN
92         NaN
94         NaN
95         NaN
98         NaN
99         NaN
Name: new_areas, dtype: float64

In [59]:
new_data[new_data['Violation Code'] == 14].head(1)


Out[59]:
Summons Number Plate ID Registration State Plate Type Issue Date Violation Code Vehicle Body Type Vehicle Make Issuing Agency Street Code1 ... Violation Post Code Violation Description No Standing or Stopping Violation Hydrant Violation Double Parking Violation CODE DEFINITION Manhattan  96th St. & below All Other Areas new_areas
5 1283294217 T60DAR NJ PAS 08/11/2013 14 P-U DODGE P 37250 ... NaN NaN NaN NaN NaN 25 Standing at a commuter van stop, other than te... $115 $115 115.0

1 rows × 48 columns


In [60]:
most_lucrative = new_data[new_data['Violation Code'] == 14].head(1)

In [61]:
print("The most lucrative is Violation Code 14 which corresponds to", most_frequent['DEFINITION'])


The most lucrative is Violation Code 14 which corresponds to 67    Expired Meter -- parking in a metered space wh...
Name: DEFINITION, dtype: object

10. New Jersey has bad drivers, but does it have bad parkers, too? How much money does NYC make off of all non-New York vehicles?


In [62]:
columns_to_show = ['Registration State','new_areas']
new_data[columns_to_show]


Out[62]:
Registration State new_areas
0 NY 45.0
1 NY 45.0
2 NY 45.0
3 NY 45.0
4 NY 65.0
5 NJ 115.0
6 NY 115.0
7 NJ 60.0
8 NY 60.0
9 NJ 60.0
10 NY 60.0
11 NY 60.0
12 NY 60.0
13 NY 60.0
14 NY 115.0
15 NY 35.0
16 NY 115.0
17 MD 35.0
18 NY 115.0
19 NC 35.0
20 LA 95.0
21 NY 45.0
22 NJ 45.0
23 NY 115.0
24 IL 115.0
25 NY 115.0
26 NY 115.0
27 PA 35.0
28 NY 115.0
29 NY 165.0
... ... ...
9969 NY 45.0
9970 TX NaN
9971 TX NaN
9972 NY 45.0
9973 NY NaN
9974 NY 95.0
9975 NY 45.0
9976 NY NaN
9977 NY 45.0
9978 NY NaN
9979 NY 65.0
9980 NY 115.0
9981 NY 115.0
9982 NY 45.0
9983 NY 45.0
9984 NJ NaN
9985 99 NaN
9986 99 35.0
9987 NY NaN
9988 MD NaN
9989 NY NaN
9990 NY 45.0
9991 PA 115.0
9992 NY 45.0
9993 NY 45.0
9994 NY 45.0
9995 PA 115.0
9996 NY 115.0
9997 NY 95.0
9998 NY 95.0

9999 rows × 2 columns


In [63]:
df_reg = new_data[columns_to_show]

In [64]:
df_reg[df_reg['Registration State'] != "NY"]


Out[64]:
Registration State new_areas
5 NJ 115.0
7 NJ 60.0
9 NJ 60.0
17 MD 35.0
19 NC 35.0
20 LA 95.0
22 NJ 45.0
24 IL 115.0
27 PA 35.0
30 ID 35.0
31 ME 115.0
32 CT 115.0
33 99 NaN
34 IL NaN
37 PA 65.0
38 VA 115.0
43 NB 35.0
44 PA 95.0
47 NJ 115.0
51 99 NaN
52 CT 35.0
53 99 35.0
54 IL 35.0
57 NJ 115.0
66 NJ 45.0
67 MD 35.0
68 SC 35.0
69 NJ 95.0
70 99 95.0
71 CT 95.0
... ... ...
9902 AZ 115.0
9903 WI 115.0
9904 NJ 45.0
9910 PA 115.0
9915 AL NaN
9916 NJ NaN
9919 PA 45.0
9920 PA NaN
9924 IN 35.0
9925 TN 115.0
9926 NJ 115.0
9930 TX NaN
9938 99 45.0
9943 99 45.0
9944 99 45.0
9945 99 45.0
9948 99 NaN
9951 99 NaN
9952 MD 35.0
9957 99 35.0
9965 CA 45.0
9968 IL NaN
9970 TX NaN
9971 TX NaN
9984 NJ NaN
9985 99 NaN
9986 99 35.0
9988 MD NaN
9991 PA 115.0
9995 PA 115.0

3540 rows × 2 columns


In [65]:
df_nonNY = df_reg[df_reg['Registration State'] != "NY"]

In [66]:
print("The total money that NYC make off of all non-New York vehicles is", df_nonNY['new_areas'].sum())


The total money that NYC make off of all non-New York vehicles is 236795.0

11. Make a chart of the top few.


In [67]:
df_nonNY.groupby('Registration State')['new_areas'].sum().sort_values(ascending=False).head(10)


Out[67]:
Registration State
NJ    57650.0
PA    36965.0
99    16400.0
TN    14655.0
CT    12085.0
IL    11400.0
VA    10965.0
FL    10610.0
CA     9785.0
MA     7435.0
Name: new_areas, dtype: float64

In [68]:
import matplotlib.pyplot as plt

In [69]:
%matplotlib inline

In [70]:
df_nonNY.groupby('Registration State')['new_areas'].sum().sort_values(ascending=False).head(10).plot(kind='bar',x='Registration State', color='green')


Out[70]:
<matplotlib.axes._subplots.AxesSubplot at 0x8010b50>

12. What time of day do people usually get their tickets? You can break the day up into several blocks - for example 12am-6am, 6am-12pm, 12pm-6pm, 6pm-12am.


In [71]:
new_data.columns


Out[71]:
Index(['Summons Number', 'Plate ID', 'Registration State', 'Plate Type',
       'Issue Date', 'Violation Code', 'Vehicle Body Type', 'Vehicle Make',
       'Issuing Agency', 'Street Code1', 'Street Code2', 'Street Code3',
       'Vehicle Expiration Date', 'Violation Location', 'Violation Precinct',
       'Issuer Precinct', 'Issuer Code', 'Issuer Command', 'Issuer Squad',
       'Violation Time', 'Time First Observed', 'Violation County',
       'Violation In Front Of Or Opposite', 'House Number', 'Street Name',
       'Intersecting Street', 'Date First Observed', 'Law Section',
       'Sub Division', 'Violation Legal Code', 'Days Parking In Effect    ',
       'From Hours In Effect', 'To Hours In Effect', 'Vehicle Color',
       'Unregistered Vehicle?', 'Vehicle Year', 'Meter Number',
       'Feet From Curb', 'Violation Post Code', 'Violation Description',
       'No Standing or Stopping Violation', 'Hydrant Violation',
       'Double Parking Violation', 'CODE', 'DEFINITION',
       'Manhattan  96th St. & below', 'All Other Areas', 'new_areas'],
      dtype='object')

In [72]:
test2_df['Violation Time'].head()


Out[72]:
0    07:52 AM
1    12:40 PM
2    12:43 PM
3    14:32 PM
4    12:39 PM
Name: Violation Time, dtype: object

In [73]:
#new_data['Violation Time'] = test2_df['Violation Time']

In [74]:
violate_df['Violation Time']


Out[74]:
0    0752A
1    1240P
2    1243P
3    0232P
4    1239P
5    0617P
6    0741P
7    0425A
8    0437A
9    0839A
Name: Violation Time, dtype: object

In [ ]:


In [75]:
type(new_data['Violation Time'][0])


Out[75]:
str

In [76]:
v_time = new_data['Violation Time']

In [77]:
v_time.head(10)


Out[77]:
0    0752A
1    1240P
2    1243P
3    0232P
4    1239P
5    0617P
6    0741P
7    0425A
8    0437A
9    0839A
Name: Violation Time, dtype: object

In [78]:
def vio_date(time_str):
    parsed_date = dateutil.parser.parse(time_str)
    date_vio = parsed_date.strftime("%H:%M %p")
    return date_vio
    #return parsed_date.hour
print(vio_date("12:32 PM"))


12:32 PM

In [79]:
# 12am-6am, 6am-12pm, 12pm-6pm, 6pm-12am
count1 = []
count2 = []
count3 = []
count4 = []
count5 = []
z = [i for i in v_time]
#print(z)
for i in z:
    if i != None:
        #print(i)
        try:
            vio_date(i)
            #print(type(i))
            #print("finished printing i")
        except:
            pass
        

#print(type(z[0]))
for item in z:
    item = str(item)
    #print(type(item))
    
    if item < "06.00 AM":
        count1.append(item)
        #print(len(count1))
    if item < "12.00 PM":
        count2.append(item)
        
    if item < "06.00 PM":
        count3.append(item)
        
    if item < "12.00 AM":
        count4.append(item)
        

    #else:
        #count5.append(item)
#print(len(count5))
print(len(count4))
print(len(count3))
print(len(count2))
print(len(count1))


9101
3045
9101
3045

13. What's the average ticket cost in NYC?


In [80]:
#gives the Registration State wise ticket cost (new_areas) 
df_reg


Out[80]:
Registration State new_areas
0 NY 45.0
1 NY 45.0
2 NY 45.0
3 NY 45.0
4 NY 65.0
5 NJ 115.0
6 NY 115.0
7 NJ 60.0
8 NY 60.0
9 NJ 60.0
10 NY 60.0
11 NY 60.0
12 NY 60.0
13 NY 60.0
14 NY 115.0
15 NY 35.0
16 NY 115.0
17 MD 35.0
18 NY 115.0
19 NC 35.0
20 LA 95.0
21 NY 45.0
22 NJ 45.0
23 NY 115.0
24 IL 115.0
25 NY 115.0
26 NY 115.0
27 PA 35.0
28 NY 115.0
29 NY 165.0
... ... ...
9969 NY 45.0
9970 TX NaN
9971 TX NaN
9972 NY 45.0
9973 NY NaN
9974 NY 95.0
9975 NY 45.0
9976 NY NaN
9977 NY 45.0
9978 NY NaN
9979 NY 65.0
9980 NY 115.0
9981 NY 115.0
9982 NY 45.0
9983 NY 45.0
9984 NJ NaN
9985 99 NaN
9986 99 35.0
9987 NY NaN
9988 MD NaN
9989 NY NaN
9990 NY 45.0
9991 PA 115.0
9992 NY 45.0
9993 NY 45.0
9994 NY 45.0
9995 PA 115.0
9996 NY 115.0
9997 NY 95.0
9998 NY 95.0

9999 rows × 2 columns


In [81]:
df_reg.describe()


c:\users\radhika\appdata\local\programs\python\python35-32\lib\site-packages\numpy\lib\function_base.py:3834: RuntimeWarning: Invalid value encountered in percentile
  RuntimeWarning)
Out[81]:
new_areas
count 9134.000000
mean 67.276111
std 52.092458
min 35.000000
25% NaN
50% NaN
75% NaN
max 390.000000

14. Make a graph of the number of tickets per day.


In [82]:
new_data.columns


Out[82]:
Index(['Summons Number', 'Plate ID', 'Registration State', 'Plate Type',
       'Issue Date', 'Violation Code', 'Vehicle Body Type', 'Vehicle Make',
       'Issuing Agency', 'Street Code1', 'Street Code2', 'Street Code3',
       'Vehicle Expiration Date', 'Violation Location', 'Violation Precinct',
       'Issuer Precinct', 'Issuer Code', 'Issuer Command', 'Issuer Squad',
       'Violation Time', 'Time First Observed', 'Violation County',
       'Violation In Front Of Or Opposite', 'House Number', 'Street Name',
       'Intersecting Street', 'Date First Observed', 'Law Section',
       'Sub Division', 'Violation Legal Code', 'Days Parking In Effect    ',
       'From Hours In Effect', 'To Hours In Effect', 'Vehicle Color',
       'Unregistered Vehicle?', 'Vehicle Year', 'Meter Number',
       'Feet From Curb', 'Violation Post Code', 'Violation Description',
       'No Standing or Stopping Violation', 'Hydrant Violation',
       'Double Parking Violation', 'CODE', 'DEFINITION',
       'Manhattan  96th St. & below', 'All Other Areas', 'new_areas'],
      dtype='object')

In [83]:
# parsing to the daytime format. did earlier (dateutil.parser.parse(violate_df['Issue Date'][0])
new_data['Issue Date'] = test3_df['Issue Date']

In [84]:
new_data['Issue Date'].value_counts().head(10)


Out[84]:
07/30/2013    435
08/05/2013    381
07/29/2013    369
07/23/2013    365
08/02/2013    338
07/31/2013    338
08/01/2013    334
07/25/2013    331
07/22/2013    329
07/26/2013    329
Name: Issue Date, dtype: int64

In [85]:
new_data['Issue Date'].value_counts().head(10).plot(kind='bar',x='Issue Date', color='orange')


Out[85]:
<matplotlib.axes._subplots.AxesSubplot at 0x90b9a70>

15. Make a graph of the amount of revenue collected per day.


In [86]:
# since new data issue date is showing so many nan. i am going back to old data
new_data['Issue Date'] = test3_df['Issue Date']

In [87]:
new_data['Issue Date']


Out[87]:
0       08/04/2013
1       08/04/2013
2       08/05/2013
3       08/05/2013
4       08/08/2013
5       08/11/2013
6       08/11/2013
7       08/07/2013
8       08/07/2013
9       07/18/2013
10      07/18/2013
11      07/18/2013
12      07/31/2013
13      08/12/2013
14      07/05/2013
15      08/12/2013
16      07/26/2013
17      07/30/2013
18      07/23/2013
19      07/29/2013
20      08/07/2013
21      07/20/2013
22      07/18/2013
23      07/20/2013
24      08/10/2013
25      07/26/2013
26      07/19/2013
27      08/06/2013
28      08/07/2013
29      08/06/2013
           ...    
9969    07/23/2013
9970    08/13/2013
9971    07/16/2013
9972    07/16/2013
9973    08/15/2013
9974    08/13/2013
9975    08/07/2013
9976    07/31/2013
9977    07/27/2013
9978    07/17/2013
9979    07/15/2013
9980    07/21/2013
9981    08/10/2013
9982    08/09/2013
9983    08/09/2013
9984    08/09/2013
9985    08/09/2013
9986    07/23/2013
9987    07/23/2013
9988    07/23/2013
9989    07/25/2013
9990    08/14/2013
9991    08/14/2013
9992    08/04/2013
9993    08/04/2013
9994    07/27/2013
9995    07/13/2013
9996    07/13/2013
9997    08/07/2013
9998    08/04/2013
Name: Issue Date, dtype: object

In [88]:
columns_to_show = ['Issue Date','new_areas']
new_data[columns_to_show].head()


Out[88]:
Issue Date new_areas
0 08/04/2013 45.0
1 08/04/2013 45.0
2 08/05/2013 45.0
3 08/05/2013 45.0
4 08/08/2013 65.0

In [89]:
new_data.groupby('Issue Date')['new_areas'].sum().sort_values(ascending=False).head(10).plot(kind='bar',x='Issue Date', color='green')


Out[89]:
<matplotlib.axes._subplots.AxesSubplot at 0x9114410>

16. Manually construct a dataframe out of https://dmv.ny.gov/statistic/2015licinforce-web.pdf (only NYC boroughts - bronx, queens, manhattan, staten island, brooklyn), having columns for borough name, abbreviation, and number of licensed drivers.


In [90]:
df = pd.read_csv("borough.csv")
df


Out[90]:
County Male Female Total
0 ALBANY 103989 107643 211632
1 ALLEGANY 16339 16245 32584
2 BROOME 67743 70054 137797
3 CATTARAUGUS 27550 27894 55444
4 CAYUGA 26028 26582 52610
5 CHAUTAUQUA 45279 46457 91736
6 CHEMUNG 29343 31379 60722
7 CHENANGO 18576 18434 37010
8 CLINTON 28429 29179 57608
9 COLUMBIA 24110 23843 47953
10 CORTLAND 15662 16169 31831
11 DELAWARE 17213 16425 33638
12 DUTCHESS 105240 108922 214162
13 ERIE 320845 337476 658321
14 ESSEX 13601 13660 27261
15 FRANKLIN 16898 17025 33923
16 FULTON 19350 19520 38870
17 GENESEE 21612 22203 43815
18 GREENE 18952 18636 37588
19 HAMILTON 2260 2180 4440
20 HERKIMER 22473 22628 45101
21 JEFFERSON 36600 37400 74000
22 LEWIS 9232 9098 18330
23 LIVINGSTON 21865 22768 44633
24 MADISON 24166 24965 49131
25 MONROE 249920 269103 519023
26 MONTGOMERY 18126 18595 36721
27 NASSAU 506652 515853 1022505
28 NIAGARA 77994 81298 159292
29 ONEIDA 78643 80965 159608
... ... ... ... ...
35 OTSEGO 21376 21980 43356
36 PUTNAM 40111 39833 79944
37 RENSSELAER 56498 58010 114508
38 ROCKLAND 107645 104571 212216
39 ST LAWRENCE 36279 36966 73245
40 SARATOGA 87076 90711 177787
41 SCHENECTADY 53935 55750 109685
42 SCHOHARIE 10800 10410 21210
43 SCHUYLER 7220 7209 14429
44 SENECA 11760 12207 23967
45 STEUBEN 35396 35773 71169
46 SUFFOLK 553668 564736 1118404
47 SULLIVAN 28560 26844 55404
48 TIOGA 19246 19613 38859
49 TOMPKINS 30826 32310 63136
50 ULSTER 67684 68551 136235
51 WARREN 25622 26690 52312
52 WASHINGTON 21466 21747 43213
53 WAYNE 34483 35363 69846
54 WESTCHESTER 327814 335904 663718
55 WYOMING 14758 14799 29557
56 YATES 8158 8330 16488
57 Outside NYC 3968875 4075599 8044474
58 BRONX 259845 197002 456847
59 KINGS 555917 427852 983769
60 NEW YORK 399606 348977 748583
61 QUEENS 631912 502152 1134064
62 RICHMOND 156339 147911 304250
63 NYC 2003619 1623894 3627513
64 NYS 5972494 5699493 11671987

65 rows × 4 columns


In [91]:
# bronx, queens, manhattan, staten island, brooklyn o

df[58: 63]


Out[91]:
County Male Female Total
58 BRONX 259845 197002 456847
59 KINGS 555917 427852 983769
60 NEW YORK 399606 348977 748583
61 QUEENS 631912 502152 1134064
62 RICHMOND 156339 147911 304250

In [92]:
NYC = df[58: 63]

In [93]:
NYC['code'] = ["BX", "K", "NYC", "Q", "R"]


c:\users\radhika\appdata\local\programs\python\python35-32\lib\site-packages\ipykernel\__main__.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':

In [94]:
NYC


Out[94]:
County Male Female Total code
58 BRONX 259845 197002 456847 BX
59 KINGS 555917 427852 983769 K
60 NEW YORK 399606 348977 748583 NYC
61 QUEENS 631912 502152 1134064 Q
62 RICHMOND 156339 147911 304250 R

17. What's the parking-ticket-$-per-licensed-driver in each borough of NYC? Do this with pandas and the dataframe you just made, not with your head!


In [95]:
columns_to_show = ['Violation County','new_areas']
new_data[columns_to_show]


Out[95]:
Violation County new_areas
0 NaN 45.0
1 NY 45.0
2 NY 45.0
3 NY 45.0
4 NY 65.0
5 NY 115.0
6 NY 115.0
7 K 60.0
8 K 60.0
9 K 60.0
10 K 60.0
11 K 60.0
12 K 60.0
13 K 60.0
14 NaN 115.0
15 K 35.0
16 K 115.0
17 K 35.0
18 K 115.0
19 K 35.0
20 K 95.0
21 K 45.0
22 NY 45.0
23 NY 115.0
24 K 115.0
25 K 115.0
26 K 115.0
27 K 35.0
28 NY 115.0
29 K 165.0
... ... ...
9969 Q 45.0
9970 Q NaN
9971 Q NaN
9972 Q 45.0
9973 Q NaN
9974 Q 95.0
9975 Q 45.0
9976 Q NaN
9977 Q 45.0
9978 Q NaN
9979 Q 65.0
9980 Q 115.0
9981 Q 115.0
9982 Q 45.0
9983 Q 45.0
9984 Q NaN
9985 Q NaN
9986 NaN 35.0
9987 Q NaN
9988 Q NaN
9989 Q NaN
9990 Q 45.0
9991 Q 115.0
9992 Q 45.0
9993 Q 45.0
9994 Q 45.0
9995 Q 115.0
9996 Q 115.0
9997 Q 95.0
9998 Q 95.0

9999 rows × 2 columns


In [96]:
columns_to_show = ['Violation County','new_areas']
new_data[columns_to_show]


Out[96]:
Violation County new_areas
0 NaN 45.0
1 NY 45.0
2 NY 45.0
3 NY 45.0
4 NY 65.0
5 NY 115.0
6 NY 115.0
7 K 60.0
8 K 60.0
9 K 60.0
10 K 60.0
11 K 60.0
12 K 60.0
13 K 60.0
14 NaN 115.0
15 K 35.0
16 K 115.0
17 K 35.0
18 K 115.0
19 K 35.0
20 K 95.0
21 K 45.0
22 NY 45.0
23 NY 115.0
24 K 115.0
25 K 115.0
26 K 115.0
27 K 35.0
28 NY 115.0
29 K 165.0
... ... ...
9969 Q 45.0
9970 Q NaN
9971 Q NaN
9972 Q 45.0
9973 Q NaN
9974 Q 95.0
9975 Q 45.0
9976 Q NaN
9977 Q 45.0
9978 Q NaN
9979 Q 65.0
9980 Q 115.0
9981 Q 115.0
9982 Q 45.0
9983 Q 45.0
9984 Q NaN
9985 Q NaN
9986 NaN 35.0
9987 Q NaN
9988 Q NaN
9989 Q NaN
9990 Q 45.0
9991 Q 115.0
9992 Q 45.0
9993 Q 45.0
9994 Q 45.0
9995 Q 115.0
9996 Q 115.0
9997 Q 95.0
9998 Q 95.0

9999 rows × 2 columns


In [97]:
county = new_data[columns_to_show]

In [98]:
county.groupby('Violation County')['new_areas'].sum().sort_values(ascending=False).head(10)


Out[98]:
Violation County
K     172915.0
Q     148235.0
BX    134670.0
NY    129845.0
R       4875.0
Name: new_areas, dtype: float64

In [ ]: