Hybrid data representation using Blaze

Import the Blaze.



In [1]:

    
import blaze as bl

Abstract data

Working with NumPy array

Let's create a simple NumPy array: we first load NumPy and then create a matrix with two rows and three columns.



In [2]:

    
import numpy as np

simpleArray = np.array([
        [1,2,3],
        [4,5,6]
    ])

Now that we have an array we can abstract it with Blaze's Data structure.



In [3]:

    
simpleData_np = bl.Data(simpleArray)

In order to peek inside the structure you can either use .peek() method



In [4]:

    
simpleData_np.peek()









    Out[4]:





array([[1, 2, 3],
       [4, 5, 6]])

or use (familiar to those of you versed in pandas' syntax) the .head(...) method



In [5]:

    
simpleData_np.head(1)









    Out[5]:




array([[1, 2, 3]])

If you want to retrieve the first column you can use indexing.



In [6]:

    
simpleData_np[0]

If you want to retrieve columns you have to transpose your DataShape.



In [7]:

    
simpleData_np.T[0]

Let's specify names of our fields.



In [8]:

    
simpleData_np = bl.Data(simpleArray, fields=['a', 'b', 'c'])

You can now retrieve the data simply by calling the column by it's name; let's retrieve column 'b'



In [9]:

    
simpleData_np['b']

Working with pandas DataFrame

Start by importing pandas.



In [10]:

    
import pandas as pd

Next, we create a DataFrame.



In [11]:

    
simpleDf = pd.DataFrame([
        [1,2,3],
        [4,5,6]
    ], columns=['a','b','c'])

and transform it into a DataShape.



In [12]:

    
simpleData_df = bl.Data(simpleDf)

You can retrieve data in the same manner as with the DataShape created from the NumPy array.



In [13]:

    
simpleData_df['a']

Working with files

DataShapes can be created directly from a CSV file.



In [3]:

    
import odo

traffic = bl.Data('../Data/TrafficViolations.csv')

To save the data into multiple archives (for each year of traffic violation) use this.



In [8]:

    
for year in traffic.Stop_year.distinct().sort():
    odo.odo(traffic[traffic.Stop_year == year], 
        '../Data/Years/TrafficViolations_{0}.csv.gz'\
        .format(year))

If you do not know the names of columns in any dataset, you can get these from the DataShape.



In [15]:

    
print(traffic.fields)









    



['Stop_month', 'Stop_day', 'Stop_year', 'Stop_hr', 'Stop_min', 'Stop_sec', 'Agency', 'SubAgency', 'Description', 'Location', 'Latitude', 'Longitude', 'Accident', 'Belts', 'Personal_Injury', 'Property_Damage', 'Fatal', 'Commercial_License', 'HAZMAT', 'Commercial_Vehicle', 'Alcohol', 'Work_Zone', 'State', 'VehicleType', 'Year', 'Make', 'Model', 'Color', 'Violation_Type', 'Charge', 'Article', 'Contributed_To_Accident', 'Race', 'Gender', 'Driver_City', 'Driver_State', 'DL_State', 'Arrest_Type', 'Geolocation']

Blaze can also read directly from a GZipped archives.



In [16]:

    
traffic_gz = bl.Data('../Data/TrafficViolations.csv.gz')

To compare we get exactly the same data let's retrieve the first two records from each structure.



In [17]:

    
traffic.head(2)









    Out[17]:





  
    
      
      Stop_month
      Stop_day
      Stop_year
      Stop_hr
      Stop_min
      Stop_sec
      Agency
      SubAgency
      Description
      Location
      Latitude
      Longitude
      Accident
      Belts
      Personal_Injury
      Property_Damage
      Fatal
      Commercial_License
      HAZMAT
      Commercial_Vehicle
      Alcohol
      Work_Zone
      State
      VehicleType
      Year
      Make
      Model
      Color
      Violation_Type
      Charge
      Article
      Contributed_To_Accident
      Race
      Gender
      Driver_City
      Driver_State
      DL_State
      Arrest_Type
      Geolocation
    
  
  
    
      0
      9
      30
      2014
      23
      51
      0
      MCP
      1st district, Rockville
      DRIVER FAILURE TO STOP AT STEADY CIRCULAR RED ...
      PARK RD AT HUNGERFORD DR
      NaN
      NaN
      No
      No
      No
      No
      No
      No
      No
      No
      No
      No
      MD
      02 - Automobile
      2014.0
      FORD
      MUSTANG
      BLACK
      Citation
      21-202(h1)
      Transportation Article
      No
      BLACK
      M
      ROCKVILLE
      MD
      MD
      A - Marked Patrol
      NaN
    
    
      1
      3
      31
      2015
      23
      59
      0
      MCP
      2nd district, Bethesda
      HEADLIGHTS (*)
      CONNECTICUT AT METROPOLITAN AVE
      NaN
      NaN
      No
      No
      No
      No
      No
      No
      No
      No
      No
      No
      MD
      02 - Automobile
      2003.0
      HONDA
      2S
      BLUE
      ESERO
      55*
      NaN
      No
      HISPANIC
      M
      SILVER SPRING
      MD
      MD
      A - Marked Patrol
      NaN



In [18]:

    
traffic_gz.head(2)









    Out[18]:





  
    
      
      Stop_month
      Stop_day
      Stop_year
      Stop_hr
      Stop_min
      Stop_sec
      Agency
      SubAgency
      Description
      Location
      Latitude
      Longitude
      Accident
      Belts
      Personal_Injury
      Property_Damage
      Fatal
      Commercial_License
      HAZMAT
      Commercial_Vehicle
      Alcohol
      Work_Zone
      State
      VehicleType
      Year
      Make
      Model
      Color
      Violation_Type
      Charge
      Article
      Contributed_To_Accident
      Race
      Gender
      Driver_City
      Driver_State
      DL_State
      Arrest_Type
      Geolocation
    
  
  
    
      0
      9
      30
      2014
      23
      51
      0
      MCP
      1st district, Rockville
      DRIVER FAILURE TO STOP AT STEADY CIRCULAR RED ...
      PARK RD AT HUNGERFORD DR
      NaN
      NaN
      No
      No
      No
      No
      No
      No
      No
      No
      No
      No
      MD
      02 - Automobile
      2014.0
      FORD
      MUSTANG
      BLACK
      Citation
      21-202(h1)
      Transportation Article
      No
      BLACK
      M
      ROCKVILLE
      MD
      MD
      A - Marked Patrol
      NaN
    
    
      1
      3
      31
      2015
      23
      59
      0
      MCP
      2nd district, Bethesda
      HEADLIGHTS (*)
      CONNECTICUT AT METROPOLITAN AVE
      NaN
      NaN
      No
      No
      No
      No
      No
      No
      No
      No
      No
      No
      MD
      02 - Automobile
      2003.0
      HONDA
      2S
      BLUE
      ESERO
      55*
      NaN
      No
      HISPANIC
      M
      SILVER SPRING
      MD
      MD
      A - Marked Patrol
      NaN

To read from multiple files you can use the asterisk *.



In [19]:

    
traffic_multiple = bl.Data(
    '../Data/Years/TrafficViolations_*.csv.gz')
traffic_multiple.head(2)









    Out[19]:





  
    
      
      Stop_month
      Stop_day
      Stop_year
      Stop_hr
      Stop_min
      Stop_sec
      Agency
      SubAgency
      Description
      Location
      Latitude
      Longitude
      Accident
      Belts
      Personal_Injury
      Property_Damage
      Fatal
      Commercial_License
      HAZMAT
      Commercial_Vehicle
      Alcohol
      Work_Zone
      State
      VehicleType
      Year
      Make
      Model
      Color
      Violation_Type
      Charge
      Article
      Contributed_To_Accident
      Race
      Gender
      Driver_City
      Driver_State
      DL_State
      Arrest_Type
      Geolocation
      Age_of_car
    
  
  
    
      0
      3
      29
      2013
      17
      34
      0
      MCP
      1st district, Rockville
      FAILURE TO DISPLAY REGISTRATION CARD UPON DEMA...
      WOOTTON PKWAY AND DARNSTOWN ROAD
      39.092008
      -77.19444
      No
      No
      No
      No
      No
      No
      No
      No
      No
      No
      MD
      02 - Automobile
      2003.0
      NISSAN
      350Z
      SILVER
      Citation
      13-409(b)
      Transportation Article
      No
      HISPANIC
      M
      MONTGOMERY VILLAGE
      MD
      MD
      A - Marked Patrol
      (39.0920083333333, -77.19444)
      10.0
    
    
      1
      8
      12
      2013
      8
      41
      0
      MCP
      5th district, Germantown
      DRIVER CHANGING LANES WHEN UNSAFE
      RIDGE RD / OBSERVATION DR
      NaN
      NaN
      No
      No
      No
      No
      No
      No
      No
      No
      No
      No
      MD
      02 - Automobile
      2003.0
      GMC
      YUKON
      BLACK
      Citation
      21-309(b)
      Transportation Article
      No
      BLACK
      M
      DAMASCUS
      MD
      MD
      B - Unmarked Patrol
      NaN
      10.0

In order to save traffic data for year 2013 you can call odo like this



In [20]:

    
odo.odo(traffic[traffic.Stop_year == 2013], 
    '../Data/Years/TrafficViolations_2013.csv.gz')









    Out[20]:





<odo.backends.csv.CSV at 0x1247e8f60>

Working with databases

Interacting with relational databases

Let's read the data from the PostGRE SQL database now. The URI for accessing PostGRE SQL database has the following syntax postgresql://<user_name>:<password>@<server>:<port>/<database>::<table>.



In [47]:

    
traffic_psql = bl.Data(
    'postgresql://{0}:{1}@localhost:5432/drabast::traffic'\
    .format('drabast', 'pAck7!B0ok')
)
traffic_psql.head(2)









    Out[47]:





  
    
      
      index
      Stop_month
      Stop_day
      Stop_year
      Stop_hr
      Stop_min
      Stop_sec
      Agency
      SubAgency
      Description
      Location
      Latitude
      Longitude
      Accident
      Belts
      Personal_Injury
      Property_Damage
      Fatal
      Commercial_License
      HAZMAT
      Commercial_Vehicle
      Alcohol
      Work_Zone
      State
      VehicleType
      Year
      Make
      Model
      Color
      Violation_Type
      Charge
      Article
      Contributed_To_Accident
      Race
      Gender
      Driver_City
      Driver_State
      DL_State
      Arrest_Type
      Geolocation
    
  
  
    
      0
      0
      9
      30
      2014
      23
      51
      0
      MCP
      1st district, Rockville
      DRIVER FAILURE TO STOP AT STEADY CIRCULAR RED ...
      PARK RD AT HUNGERFORD DR
      None
      None
      No
      No
      No
      No
      No
      No
      No
      No
      No
      No
      MD
      02 - Automobile
      2014.0
      FORD
      MUSTANG
      BLACK
      Citation
      21-202(h1)
      Transportation Article
      No
      BLACK
      M
      ROCKVILLE
      MD
      MD
      A - Marked Patrol
      None
    
    
      1
      1
      3
      31
      2015
      23
      59
      0
      MCP
      2nd district, Bethesda
      HEADLIGHTS (*)
      CONNECTICUT AT METROPOLITAN AVE
      None
      None
      No
      No
      No
      No
      No
      No
      No
      No
      No
      No
      MD
      02 - Automobile
      2003.0
      HONDA
      2S
      BLUE
      ESERO
      55*
      None
      No
      HISPANIC
      M
      SILVER SPRING
      MD
      MD
      A - Marked Patrol
      None

We will output traffic violations that involved cars manufactured in 2016 to both, PostGRE SQL and SQLite databases. We will use odo to manage the transfers.



In [25]:

    
traffic_2016 = traffic_psql[traffic_psql['Year'] == 2016]

# odo.drop('sqlite:///traffic_local.sqlite::traffic2016')
# odo.drop('postgresql://drabast:pAck7!B0ok@localhost:5432/drabast::traffic_2016')
odo.odo(traffic_2016, 'sqlite:///traffic_local.sqlite::traffic2016')
odo.odo(traffic_2016, 'postgresql://drabast:pAck7!B0ok@localhost:5432/drabast::traffic_2016')









    Out[25]:





Table('traffic_2016', MetaData(bind=Engine(postgresql://drabast:***@localhost:5432/drabast)), Column('index', BigInteger(), table=<traffic_2016>), Column('Stop_month', BigInteger(), table=<traffic_2016>), Column('Stop_day', BigInteger(), table=<traffic_2016>), Column('Stop_year', BigInteger(), table=<traffic_2016>), Column('Stop_hr', BigInteger(), table=<traffic_2016>), Column('Stop_min', BigInteger(), table=<traffic_2016>), Column('Stop_sec', BigInteger(), table=<traffic_2016>), Column('Agency', Text(), table=<traffic_2016>), Column('SubAgency', Text(), table=<traffic_2016>), Column('Description', Text(), table=<traffic_2016>), Column('Location', Text(), table=<traffic_2016>), Column('Latitude', FLOAT(), table=<traffic_2016>), Column('Longitude', FLOAT(), table=<traffic_2016>), Column('Accident', Text(), table=<traffic_2016>), Column('Belts', Text(), table=<traffic_2016>), Column('Personal_Injury', Text(), table=<traffic_2016>), Column('Property_Damage', Text(), table=<traffic_2016>), Column('Fatal', Text(), table=<traffic_2016>), Column('Commercial_License', Text(), table=<traffic_2016>), Column('HAZMAT', Text(), table=<traffic_2016>), Column('Commercial_Vehicle', Text(), table=<traffic_2016>), Column('Alcohol', Text(), table=<traffic_2016>), Column('Work_Zone', Text(), table=<traffic_2016>), Column('State', Text(), table=<traffic_2016>), Column('VehicleType', Text(), table=<traffic_2016>), Column('Year', FLOAT(), table=<traffic_2016>), Column('Make', Text(), table=<traffic_2016>), Column('Model', Text(), table=<traffic_2016>), Column('Color', Text(), table=<traffic_2016>), Column('Violation_Type', Text(), table=<traffic_2016>), Column('Charge', Text(), table=<traffic_2016>), Column('Article', Text(), table=<traffic_2016>), Column('Contributed_To_Accident', Text(), table=<traffic_2016>), Column('Race', Text(), table=<traffic_2016>), Column('Gender', Text(), table=<traffic_2016>), Column('Driver_City', Text(), table=<traffic_2016>), Column('Driver_State', Text(), table=<traffic_2016>), Column('DL_State', Text(), table=<traffic_2016>), Column('Arrest_Type', Text(), table=<traffic_2016>), Column('Geolocation', Text(), table=<traffic_2016>), schema=None)

Reading data from the SQLite database should be trivial by now.



In [26]:

    
traffic_sqlt = bl.Data('sqlite:///traffic_local.sqlite::traffic2016')
traffic_sqlt.head(2)









    Out[26]:





  
    
      
      index
      Stop_month
      Stop_day
      Stop_year
      Stop_hr
      Stop_min
      Stop_sec
      Agency
      SubAgency
      Description
      Location
      Latitude
      Longitude
      Accident
      Belts
      Personal_Injury
      Property_Damage
      Fatal
      Commercial_License
      HAZMAT
      Commercial_Vehicle
      Alcohol
      Work_Zone
      State
      VehicleType
      Year
      Make
      Model
      Color
      Violation_Type
      Charge
      Article
      Contributed_To_Accident
      Race
      Gender
      Driver_City
      Driver_State
      DL_State
      Arrest_Type
      Geolocation
    
  
  
    
      0
      2692
      4
      2
      2015
      8
      31
      0
      MCP
      2nd district, Bethesda
      DRIVER FAILURE TO OBEY PROPERLY PLACED TRAFFIC...
      POOKS HILL RD AT LINDEN AV
      39.065777
      -77.159442
      No
      No
      No
      No
      No
      No
      No
      No
      No
      No
      MD
      02 - Automobile
      2016.0
      AUDI
      4S
      SILVER
      Warning
      21-201(a1)
      Transportation Article
      No
      WHITE
      M
      NORTH BETHESDA
      MD
      MD
      A - Marked Patrol
      (39.0657766666667, -77.1594416666667)
    
    
      1
      8862
      4
      11
      2015
      19
      27
      0
      MCP
      2nd district, Bethesda
      FAILURE TO STOP AT STOP SIGN
      NORFOLK AVE / DEL RAY AVE
      38.989567
      -77.098130
      No
      No
      No
      No
      No
      No
      No
      No
      No
      No
      MD
      02 - Automobile
      2016.0
      AUDI
      A7
      BLACK
      Warning
      21-707(a)
      Transportation Article
      No
      WHITE
      M
      BETHESDA
      MD
      MD
      A - Marked Patrol
      (38.9895666666667, -77.09813)

Interacting with MongoDB database

Reading from MongoDB is very similar to reading from a PostGRE SQL or SQLite databases.



In [27]:

    
traffic_mongo = bl.Data('mongodb://localhost:27017/packt::traffic')
traffic_mongo.head(2)









    Out[27]:





  
    
      
      Accident
      Agency
      Alcohol
      Arrest_Type
      Article
      Belts
      Charge
      Color
      Commercial_License
      Commercial_Vehicle
      Contributed_To_Accident
      DL_State
      Description
      Driver_City
      Driver_State
      Fatal
      Gender
      Geolocation
      HAZMAT
      Latitude
      Location
      Longitude
      Make
      Model
      Personal_Injury
      Property_Damage
      Race
      State
      Stop_day
      Stop_hr
      Stop_min
      Stop_month
      Stop_sec
      Stop_year
      SubAgency
      VehicleType
      Violation_Type
      Work_Zone
      Year
    
  
  
    
      0
      No
      MCP
      No
      A - Marked Patrol
      Transportation Article
      No
      21-201(a1)
      RED
      No
      No
      No
      MD
      DRIVER FAILURE TO OBEY PROPERLY PLACED TRAFFIC...
      TAKOMA PARK
      MD
      No
      M
      (39.00438825, -77.0061869666667)
      No
      39.004388
      FLOWER AVE AT WAYNE AVE
      -77.006187
      VOLK
      4S
      No
      No
      WHITE
      MD
      16
      21
      46
      7
      0
      2013
      3rd district, Silver Spring
      02 - Automobile
      Citation
      No
      2011.0
    
    
      1
      No
      MCP
      No
      A - Marked Patrol
      Transportation Article
      No
      13-401(h)
      BLACK
      No
      No
      No
      MD
      DRIVING VEHICLE ON HIGHWAY WITH SUSPENDED REGI...
      GERMANTOWN
      MD
      No
      M
      (39.0821319333333, -77.1463659333333)
      No
      39.082132
      355 / VEIRS MILL
      -77.146366
      CADILAC
      DEVILLE
      No
      No
      BLACK
      MD
      13
      17
      55
      1
      0
      2013
      1st district, Rockville
      02 - Automobile
      Citation
      No
      2004.0

Data operations

Accessing columns

There are two ways of accessing columns: you can get a single column at a time by accessing them as if they were a DataShape attribute



In [28]:

    
traffic.Year.head(2)

or indexing; this allows to select more than one column at a time



In [29]:

    
(traffic[['Location', 'Year', 'Accident', 'Fatal', 'Alcohol']]
    .head(2))









    Out[29]:





  
    
      
      Location
      Year
      Accident
      Fatal
      Alcohol
    
  
  
    
      0
      PARK RD AT HUNGERFORD DR
      2014.0
      No
      No
      No
    
    
      1
      CONNECTICUT AT METROPOLITAN AVE
      2003.0
      No
      No
      No

Symbolic transformations

If we could not reflect the schema from an already existing object, we would have to specify the schema manually.



In [30]:

    
schema_example = bl.symbol('schema_exampl', 
                           '{id: int, name: string}')

Since we already have an existing dataset traffic, we can reuse the schema by calling traffic.dshape and specify our transformations directly on it.



In [31]:

    
traffic_s = bl.symbol('traffic', traffic.dshape)
traffic_2013 = traffic_s[traffic_s['Stop_year'] == 2013][
    ['Stop_year', 'Arrest_Type','Color', 'Charge']
]

To present how this works, let's read our dataset into pandas' DataFrame.



In [32]:

    
traffic_pd = pd.read_csv('../Data/TrafficViolations.csv')

You can now pass the dataset directly to traffic_2013 object and perform the computation using the .compute(...) method of Blaze.



In [33]:

    
bl.compute(traffic_2013, traffic_pd).head(2)









    Out[33]:






  
    
      
      Stop_year
      Arrest_Type
      Color
      Charge
    
  
  
    
      73
      2013
      A - Marked Patrol
      SILVER
      13-409(b)
    
    
      215
      2013
      B - Unmarked Patrol
      BLACK
      21-309(b)

You can also pass a list of lists or a list of NumPy arrays.



In [51]:

    
bl.compute(traffic_2013, traffic_pd.values)[0:2]









    Out[51]:





array([[2013, 'A - Marked Patrol', 'SILVER', '13-409(b)'],
       [2013, 'B - Unmarked Patrol', 'BLACK', '21-309(b)']], dtype=object)

Operations on columns

You can check that by getting all the distinct values for the Stop_year column.



In [35]:

    
traffic['Stop_year'].distinct().sort()

We can subtract 2000 from the Stop_year column as we do not need a greater detail.



In [36]:

    
traffic['Stop_year'].head(2) - 2000

If you wanted to log-transform the Stop_year you need to



In [37]:

    
bl.log(traffic['Stop_year']).head(2)

Reducing data

Some reduction methods are also available, like .mean() (that calculates the average), .std (that calculates standard deviation) or .max() (that returns the maximum from the list).



In [38]:

    
traffic['Stop_year'].max()









    Out[38]:




2016

You can calculate the age of the car (in years) at the time when the violation occured



In [9]:

    
traffic = bl.transform(traffic, 
             Age_of_car = traffic.Stop_year - traffic.Year)

traffic.head(2)[['Stop_year', 'Year', 'Age_of_car']]

and to calculate the average age of the car involved in a fatal trafic violation and count the number of occurences you can use the by operation.



In [40]:

    
bl.by(traffic['Fatal'], 
      Fatal_AvgAge=traffic.Age_of_car.mean(),
      Fatal_Count =traffic.Age_of_car.count()
)









    Out[40]:





  
    
      
      Fatal
      Fatal_AvgAge
      Fatal_Count
    
  
  
    
      0
      No
      9.580998
      404418
    
    
      1
      Yes
      8.798246
      116

Joins

We first select all the traffic violations by violation type (the violation object) and the traffic violations involving belts (the belts object).



In [41]:

    
violation = traffic[
    ['Stop_month','Stop_day','Stop_year',
     'Stop_hr','Stop_min','Stop_sec','Violation_Type']]

belts = traffic[
    ['Stop_month','Stop_day','Stop_year',
     'Stop_hr','Stop_min','Stop_sec','Belts']]

Now, we join the two objects on the six date and time columns.



In [42]:

    
violation_belts = bl.join(violation, belts, 
      ['Stop_month','Stop_day','Stop_year',
       'Stop_hr','Stop_min','Stop_sec'])

Once we have the full dataset in place, let's check how many traffic violations involved belts and what sort of punishment was issued to the driver.



In [43]:

    
bl.by(violation_belts[['Violation_Type', 'Belts']],
      Violation_count=violation_belts.Belts.count()
).sort('Violation_count', ascending=False)









    Out[43]:





  
    
      
      Violation_Type
      Belts
      Violation_count
    
  
  
    
      0
      Citation
      No
      989728
    
    
      5
      Warning
      No
      439490
    
    
      2
      ESERO
      No
      56447
    
    
      1
      Citation
      Yes
      35596
    
    
      6
      Warning
      Yes
      12245
    
    
      3
      ESERO
      Yes
      1327
    
    
      4
      SERO
      No
      3

	Stop_month	Stop_day	Stop_year	Stop_hr	Stop_min	Stop_sec	Agency	SubAgency	Description	Location	Latitude	Longitude	Accident	Belts	Personal_Injury	Property_Damage	Fatal	Commercial_License	HAZMAT	Commercial_Vehicle	Alcohol	Work_Zone	State	VehicleType	Year	Make	Model	Color	Violation_Type	Charge	Article	Contributed_To_Accident	Race	Gender	Driver_City	Driver_State	DL_State	Arrest_Type	Geolocation
0	9	30	2014	23	51	0	MCP	1st district, Rockville	DRIVER FAILURE TO STOP AT STEADY CIRCULAR RED ...	PARK RD AT HUNGERFORD DR	NaN	NaN	No	No	No	No	No	No	No	No	No	No	MD	02 - Automobile	2014.0	FORD	MUSTANG	BLACK	Citation	21-202(h1)	Transportation Article	No	BLACK	M	ROCKVILLE	MD	MD	A - Marked Patrol	NaN
1	3	31	2015	23	59	0	MCP	2nd district, Bethesda	HEADLIGHTS (*)	CONNECTICUT AT METROPOLITAN AVE	NaN	NaN	No	No	No	No	No	No	No	No	No	No	MD	02 - Automobile	2003.0	HONDA	2S	BLUE	ESERO	55*	NaN	No	HISPANIC	M	SILVER SPRING	MD	MD	A - Marked Patrol	NaN

	Stop_month	Stop_day	Stop_year	Stop_hr	Stop_min	Stop_sec	Agency	SubAgency	Description	Location	Latitude	Longitude	Accident	Belts	Personal_Injury	Property_Damage	Fatal	Commercial_License	HAZMAT	Commercial_Vehicle	Alcohol	Work_Zone	State	VehicleType	Year	Make	Model	Color	Violation_Type	Charge	Article	Contributed_To_Accident	Race	Gender	Driver_City	Driver_State	DL_State	Arrest_Type	Geolocation	Age_of_car
0	3	29	2013	17	34	0	MCP	1st district, Rockville	FAILURE TO DISPLAY REGISTRATION CARD UPON DEMA...	WOOTTON PKWAY AND DARNSTOWN ROAD	39.092008	-77.19444	No	No	No	No	No	No	No	No	No	No	MD	02 - Automobile	2003.0	NISSAN	350Z	SILVER	Citation	13-409(b)	Transportation Article	No	HISPANIC	M	MONTGOMERY VILLAGE	MD	MD	A - Marked Patrol	(39.0920083333333, -77.19444)	10.0
1	8	12	2013	8	41	0	MCP	5th district, Germantown	DRIVER CHANGING LANES WHEN UNSAFE	RIDGE RD / OBSERVATION DR	NaN	NaN	No	No	No	No	No	No	No	No	No	No	MD	02 - Automobile	2003.0	GMC	YUKON	BLACK	Citation	21-309(b)	Transportation Article	No	BLACK	M	DAMASCUS	MD	MD	B - Unmarked Patrol	NaN	10.0

	index	Stop_month	Stop_day	Stop_year	Stop_hr	Stop_min	Stop_sec	Agency	SubAgency	Description	Location	Latitude	Longitude	Accident	Belts	Personal_Injury	Property_Damage	Fatal	Commercial_License	HAZMAT	Commercial_Vehicle	Alcohol	Work_Zone	State	VehicleType	Year	Make	Model	Color	Violation_Type	Charge	Article	Contributed_To_Accident	Race	Gender	Driver_City	Driver_State	DL_State	Arrest_Type	Geolocation
0	2692	4	2	2015	8	31	0	MCP	2nd district, Bethesda	DRIVER FAILURE TO OBEY PROPERLY PLACED TRAFFIC...	POOKS HILL RD AT LINDEN AV	39.065777	-77.159442	No	No	No	No	No	No	No	No	No	No	MD	02 - Automobile	2016.0	AUDI	4S	SILVER	Warning	21-201(a1)	Transportation Article	No	WHITE	M	NORTH BETHESDA	MD	MD	A - Marked Patrol	(39.0657766666667, -77.1594416666667)
1	8862	4	11	2015	19	27	0	MCP	2nd district, Bethesda	FAILURE TO STOP AT STOP SIGN	NORFOLK AVE / DEL RAY AVE	38.989567	-77.098130	No	No	No	No	No	No	No	No	No	No	MD	02 - Automobile	2016.0	AUDI	A7	BLACK	Warning	21-707(a)	Transportation Article	No	WHITE	M	BETHESDA	MD	MD	A - Marked Patrol	(38.9895666666667, -77.09813)

	Violation_Type	Belts	Violation_count
0	Citation	No	989728
5	Warning	No	439490
2	ESERO	No	56447
1	Citation	Yes	35596
6	Warning	Yes	12245
3	ESERO	Yes	1327
4	SERO	No	3