Exploring NYC Traffic Accident Data

New York City is one of the busiest cities in the world. With a population of over 8 million people, we can expect to see some interesting trends if we were to observe some data.

NYC Open Data offers a large list of data reports. I decided to download the "NYPD_Motor_Vehicle_Collisions" data set, which is 230 Mb and can be found online at: https://data.cityofnewyork.us/Public-Safety/NYPD-Motor-Vehicle-Collisions/h9gi-nx95/data. This data set contains information regarding every traffic accident entered into the NYPD system.

I will first try to make a few graphics that can describe and let us visualize the data set. Next, I will turn my focus to visualizing Taxi Cab data, and finally, run a regression to determine whether Taxi drivers are safer than ordinary drivers.



In [27]:

    
#Import the packages we will be using

import pandas as pd
import matplotlib 
import matplotlib.pyplot as plt

%matplotlib inline



In [28]:

    
data = pd.read_csv("/Users/Eli/Desktop/NYPD_Motor_Vehicle_Collisions.csv")









    



/Users/eli/anaconda/lib/python3.5/site-packages/IPython/core/interactiveshell.py:2717: DtypeWarning: Columns (3) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)



In [29]:

    
data.dtypes    #Show the data types of each column









    Out[29]:





DATE                              object
TIME                              object
BOROUGH                           object
ZIP CODE                          object
LATITUDE                         float64
LONGITUDE                        float64
LOCATION                          object
ON STREET NAME                    object
CROSS STREET NAME                 object
OFF STREET NAME                   object
NUMBER OF PERSONS INJURED          int64
NUMBER OF PERSONS KILLED           int64
NUMBER OF PEDESTRIANS INJURED      int64
NUMBER OF PEDESTRIANS KILLED       int64
NUMBER OF CYCLIST INJURED          int64
NUMBER OF CYCLIST KILLED           int64
NUMBER OF MOTORIST INJURED         int64
NUMBER OF MOTORIST KILLED          int64
CONTRIBUTING FACTOR VEHICLE 1     object
CONTRIBUTING FACTOR VEHICLE 2     object
CONTRIBUTING FACTOR VEHICLE 3     object
CONTRIBUTING FACTOR VEHICLE 4     object
CONTRIBUTING FACTOR VEHICLE 5     object
UNIQUE KEY                         int64
VEHICLE TYPE CODE 1               object
VEHICLE TYPE CODE 2               object
VEHICLE TYPE CODE 3               object
VEHICLE TYPE CODE 4               object
VEHICLE TYPE CODE 5               object
dtype: object



In [30]:

    
data.head() #Shows us what the data frame looks like









    Out[30]:






  
    
      
      DATE
      TIME
      BOROUGH
      ZIP CODE
      LATITUDE
      LONGITUDE
      LOCATION
      ON STREET NAME
      CROSS STREET NAME
      OFF STREET NAME
      ...
      CONTRIBUTING FACTOR VEHICLE 2
      CONTRIBUTING FACTOR VEHICLE 3
      CONTRIBUTING FACTOR VEHICLE 4
      CONTRIBUTING FACTOR VEHICLE 5
      UNIQUE KEY
      VEHICLE TYPE CODE 1
      VEHICLE TYPE CODE 2
      VEHICLE TYPE CODE 3
      VEHICLE TYPE CODE 4
      VEHICLE TYPE CODE 5
    
  
  
    
      0
      04/16/2017
      0:00
      BRONX
      10472
      40.831287
      -73.881730
      (40.831287, -73.88173)
      EAST 172 STREET
      BRONX RIVER AVENUE
      NaN
      ...
      Unspecified
      NaN
      NaN
      NaN
      3651861
      PASSENGER VEHICLE
      PICK-UP TRUCK
      NaN
      NaN
      NaN
    
    
      1
      04/16/2017
      0:00
      BROOKLYN
      11201
      40.686726
      -73.990470
      (40.686726, -73.99047)
      BERGEN STREET
      SMITH STREET
      NaN
      ...
      Unspecified
      NaN
      NaN
      NaN
      3651519
      TAXI
      PASSENGER VEHICLE
      NaN
      NaN
      NaN
    
    
      2
      04/16/2017
      0:00
      BROOKLYN
      11201
      40.702560
      -73.991140
      (40.70256, -73.99114)
      NaN
      NaN
      57        FRONT STREET
      ...
      Unspecified
      NaN
      NaN
      NaN
      3651522
      SPORT UTILITY / STATION WAGON
      SPORT UTILITY / STATION WAGON
      NaN
      NaN
      NaN
    
    
      3
      04/16/2017
      0:00
      QUEENS
      11429
      40.712590
      -73.733154
      (40.71259, -73.733154)
      NaN
      NaN
      220-20    HEMPSTEAD AVENUE
      ...
      Passing or Lane Usage Improper
      NaN
      NaN
      NaN
      3651540
      PASSENGER VEHICLE
      SPORT UTILITY / STATION WAGON
      NaN
      NaN
      NaN
    
    
      4
      04/16/2017
      0:00
      NaN
      NaN
      40.701440
      -73.984770
      (40.70144, -73.98477)
      BRIDGE STREET
      NaN
      NaN
      ...
      Unspecified
      NaN
      NaN
      NaN
      3651521
      PASSENGER VEHICLE
      SPORT UTILITY / STATION WAGON
      NaN
      NaN
      NaN
    
  

5 rows × 29 columns

As we can see, there is a lot of information on this data set. Some of the relevant columns of information we can use include: Borough, Street Name, Longitude, Latitude, Injured, Killed, Date and Time.

Vehicle Type Code 1 - 5 tells us what types of vehicle were involved in one observation.

For example, if one car and one taxi were involved, then "Vehicle Type Code 1" = Passenger Vehicle, and "Vehicle Type Code 2" = Taxi.



In [31]:

    
data['BOROUGH'] = data['BOROUGH'].astype('category')

The first thing I want to know about the data set is the distribution of car accidents by borough. To easily visualize these results, we can make a simple pie chart:



In [32]:

    
# Graph total number of accidents by borough
data['BOROUGH'].value_counts().plot(kind = "pie", title = "Car Crashes by Borough", )









    Out[32]:





<matplotlib.axes._subplots.AxesSubplot at 0x1326dd048>

Next, to give you an idea of how large the data set is and just how ubiquitous accidents are, I was able to reconstruct a map of NYC just by plotting the Longitude and Latitude of each accident on a scatter plot.

To separate the buroughs by color, I had to break the data frame down by boroughs, so they can each be graphed separatley.

There is a similar idea of a scatter plot map on this blog post: http://www.danielforsyth.me/mapping-nyc-taxi-data/

I found this while researching my data set and thought that I should try to make my own version. Here I used matplotlib.pyplot to overlay the graphs of each borough on to one scatter plot.



In [33]:

    
# Create 4 new dataframes based on their boroughs  - Staten Island is too far
bronx = data[data['BOROUGH'] == 'BRONX']              
manhattan = data[data['BOROUGH'] == 'MANHATTAN']             
brooklyn = data[data['BOROUGH'] == 'BROOKLYN']
queens = data[data['BOROUGH'] == 'QUEENS']


# Graph the scatter plot for the longitude and latitude for every accident in each burough on 1 graph
# The idea for the scatter plot map: http://www.danielforsyth.me/mapping-nyc-taxi-data/
plt.figure(figsize=(8.2,8))                 
plt.plot(bronx['LONGITUDE'],bronx['LATITUDE'],'.',color = 'red',ms= .63)      
plt.plot(manhattan['LONGITUDE'],manhattan['LATITUDE'],'.',color = 'green',ms= .63)
plt.plot(brooklyn['LONGITUDE'],brooklyn['LATITUDE'],'.',color = 'blue',ms= .63)
plt.plot(queens['LONGITUDE'],queens['LATITUDE'],'.',color = 'orange',ms= .63)

plt.xlabel('Longitude')
plt.ylabel('Latidude')
plt.axis([-74.04,-73.77,40.56,40.93,])  
plt.title('Lat & Long Scatter Plot - All Accidents by Borough')

# Coordinates for axis from http://boundingbox.klokantech.com/









    Out[33]:





<matplotlib.text.Text at 0x119cf3860>

Another interesting graph to look at would be the streets with the most car accidents. There are thousands of streets so I will look at only the top 15.



In [34]:

    
# Count the number of times each street appears, show the top 15, plot the results.
# Slice used to remove the top result, becuase it was a blank street name (missing information)

plt.figure(figsize=(6.2,6))                
data['ON STREET NAME'].value_counts().head(15)[1:].plot(kind = 'bar')









    Out[34]:





<matplotlib.axes._subplots.AxesSubplot at 0x19e318b38>

Here I will try to take a closer look at the types of vehicles involved in an accident. I will first show the graph for the whole data set, and then I will try to compare that with the data from only Manhattan.

We can expect to see a different distribution of vehicles for Manhattan because it has a very unique demographic. One thing to notice in the following bar charts is the level of Taxi accident in Manhattan relative to the rest of the data set. This shouldn't be a surprise.

*Note - this only graphs single vehicle collisions.



In [35]:

    
plt.figure(figsize=(6.2,6))                
data["VEHICLE TYPE CODE 1"].value_counts().head(10).plot(kind = "bar", title = "Accident Frequancy by Vehicle Type - NYC")









    Out[35]:





<matplotlib.axes._subplots.AxesSubplot at 0x11b543320>



In [36]:

    
plt.figure(figsize=(6.2,6))
manhattan["VEHICLE TYPE CODE 1"].value_counts().head(10).plot(kind = "bar", title = "Accidents by Vehicle Type - Manhattan")









    Out[36]:





<matplotlib.axes._subplots.AxesSubplot at 0x11d7f8978>

Although we can safely assume there are more taxis in Manhattan, I wanted to be able to create a visualization of where taxis get into accidents.

To do this, I will make a scatter plot of the lat/long of taxi accidents in red, and the lat/long of passenger vehicle accidents in black. This is very similar to what I did earlier.

Due to the organization of the data set, Vehicle Types can be found in up to 5 different columns. We have to create 2 new data frames where one captures every time "taxi" is mentioned in any of the five columns, and another dataframe for every time "passenger vehicle" is mentioned. I used the df.append() method to aggregate the relevant data into 2 different data frames.



In [37]:

    
#Create data frame only for accidents involving passenger vehicle
passenger = data[data['VEHICLE TYPE CODE 1'] == 'PASSENGER VEHICLE']
passenger = passenger.append(data[data['VEHICLE TYPE CODE 2'] == 'PASSENGER VEHICLE'])
passenger = passenger.append(data[data['VEHICLE TYPE CODE 3'] == 'PASSENGER VEHICLE'])
passenger = passenger.append(data[data['VEHICLE TYPE CODE 4'] == 'PASSENGER VEHICLE'])
passenger = passenger.append(data[data['VEHICLE TYPE CODE 5'] == 'PASSENGER VEHICLE'])


#Create data frame only for accidents involving taxis
taxis = data[data['VEHICLE TYPE CODE 1'] == 'TAXI']
taxis = taxis.append(data[data['VEHICLE TYPE CODE 2'] == 'TAXI'])
taxis = taxis.append(data[data['VEHICLE TYPE CODE 3'] == 'TAXI'])
taxis = taxis.append(data[data['VEHICLE TYPE CODE 4'] == 'TAXI'])
taxis = taxis.append(data[data['VEHICLE TYPE CODE 5'] == 'TAXI'])



In [38]:

    
#Graph the lat and long of passengers and of taxis
plt.figure(figsize=(8.2,8))
plt.plot(passenger['LONGITUDE'],passenger['LATITUDE'],'k.',ms= .8,alpha=.3)
plt.plot(taxis['LONGITUDE'],taxis['LATITUDE'],'r.',ms= .8)
plt.axis([-74.04,-73.77,40.56,40.93,]) 

plt.xlabel('Longitude')
plt.ylabel('Latidude')
plt.title('Taxi Vs. Passenger Vehicle - Location Scatter Plot')









    Out[38]:





<matplotlib.text.Text at 0x11cde3128>

As you can see, there is a large concentration of taxi accidents in and around the Manhattan area compared to passenger vehicles. This makes a lot of sense; there are many more taxis driving in Manhattan than anywhere else in NYC. There also seems to be a concentration of accidents in the Financial District. Be careful down there.

Another interesting chart for single-vehicle taxi incidents.



In [39]:

    
taxis['CONTRIBUTING FACTOR VEHICLE 1'].value_counts().head(10)[1:].plot(kind = 'bar', title = 'Why Do Taxis Crash?')









    Out[39]:





<matplotlib.axes._subplots.AxesSubplot at 0x11cddb0f0>

Regression Preperation

I want to be able to run a regression of the total casualties on the time of day, season, and involvement of a taxi driver. In order to do this I need to create 3 new columns in my data frame: time of day, season, and taxi involvement.

Instead of using the hour for the "Time of Day" column, I will group certain hours together into Morning, Afternoon, Night, and Late Night. This way we will have a regression of 4 variables instead of 24.

Instead of using the the Month value for the "Season" column I will group the appropriate months into Winter, Spring, Summer, and Fall. Similar concept as above.

For the "Taxi Involvement" column, I will record whether a Taxi was involved as either True or False for every accident.

But before we can make these groups, we need to collect "Hour", "Month", and "True/False" for every accident.

In order to do this, I will define 3 functions that will create a new column with the information we need.



In [40]:

    
# This function will return the "hour" value for any given row
def Hour(row):   
    time = str(row['TIME'])
    return time[0:time.index(":")]

# .apply method applies the above function to every row and puts the result in a new column called "HOUR"
data['HOUR'] = data.apply(Hour, axis = 1)



In [41]:

    
# This function will return "True" for a row if a Taxi is involved and "False" if not.
def taxiInvolved(row):
    if row['VEHICLE TYPE CODE 1'] == 'TAXI' or row['VEHICLE TYPE CODE 2'] == 'TAXI' or row['VEHICLE TYPE CODE 3'] == 'TAXI' or row['VEHICLE TYPE CODE 4'] == 'TAXI' or row['VEHICLE TYPE CODE 5'] == 'TAXI':
        return True
    else:
        return False

#  Applies the above function to every row and puts the result in a new column called "Taxi_Involvement"
data['Taxi_Involvement'] = data.apply(taxiInvolved, axis = 1)



In [42]:

    
# This function will return the "Month" value for any given row - in other words 1 - 12.
def Month(row):
    date = str(row['DATE'])
    return date[0:date.index("/")]

# Applies the above function to every row and puts the result in a new column called "MONTH"
data['MONTH'] = data.apply(Month, axis =1)



In [43]:

    
# I want to include injuries and deaths in my regression
data['TOTAL_CASUALTIES'] = data['NUMBER OF PERSONS INJURED'] + data['NUMBER OF PERSONS KILLED']

The data frame now looks like this:



In [44]:

    
data.head()









    Out[44]:






  
    
      
      DATE
      TIME
      BOROUGH
      ZIP CODE
      LATITUDE
      LONGITUDE
      LOCATION
      ON STREET NAME
      CROSS STREET NAME
      OFF STREET NAME
      ...
      UNIQUE KEY
      VEHICLE TYPE CODE 1
      VEHICLE TYPE CODE 2
      VEHICLE TYPE CODE 3
      VEHICLE TYPE CODE 4
      VEHICLE TYPE CODE 5
      HOUR
      Taxi_Involvement
      MONTH
      TOTAL_CASUALTIES
    
  
  
    
      0
      04/16/2017
      0:00
      BRONX
      10472
      40.831287
      -73.881730
      (40.831287, -73.88173)
      EAST 172 STREET
      BRONX RIVER AVENUE
      NaN
      ...
      3651861
      PASSENGER VEHICLE
      PICK-UP TRUCK
      NaN
      NaN
      NaN
      0
      False
      04
      0
    
    
      1
      04/16/2017
      0:00
      BROOKLYN
      11201
      40.686726
      -73.990470
      (40.686726, -73.99047)
      BERGEN STREET
      SMITH STREET
      NaN
      ...
      3651519
      TAXI
      PASSENGER VEHICLE
      NaN
      NaN
      NaN
      0
      True
      04
      0
    
    
      2
      04/16/2017
      0:00
      BROOKLYN
      11201
      40.702560
      -73.991140
      (40.70256, -73.99114)
      NaN
      NaN
      57        FRONT STREET
      ...
      3651522
      SPORT UTILITY / STATION WAGON
      SPORT UTILITY / STATION WAGON
      NaN
      NaN
      NaN
      0
      False
      04
      0
    
    
      3
      04/16/2017
      0:00
      QUEENS
      11429
      40.712590
      -73.733154
      (40.71259, -73.733154)
      NaN
      NaN
      220-20    HEMPSTEAD AVENUE
      ...
      3651540
      PASSENGER VEHICLE
      SPORT UTILITY / STATION WAGON
      NaN
      NaN
      NaN
      0
      False
      04
      0
    
    
      4
      04/16/2017
      0:00
      NaN
      NaN
      40.701440
      -73.984770
      (40.70144, -73.98477)
      BRIDGE STREET
      NaN
      NaN
      ...
      3651521
      PASSENGER VEHICLE
      SPORT UTILITY / STATION WAGON
      NaN
      NaN
      NaN
      0
      False
      04
      0
    
  

5 rows × 33 columns

4 New columns have been created: "Total Casualties", "With Taxi", "Hour", and "Month". Now I can use the "Hour" and "Month" columns that I made to create a "Season" and "Time of Day" column that will actually be used in the regression.

"Total Casualties" is the dependent variable.

The "With Taxi" column can be used as-is for a independent variable.

The "Season" and "Time of Day" columns will be the other two independant variables.

The aim of the regression is to figure out which factors might contribute positively and negatively to the number of casualties in a given accident.



In [45]:

    
# This function will return the appropriate season for any given row.
def Season(row):
    winter = [12,1,2]
    spring = [3,4,5]
    summer = [6,7,8]
    fall = [9,10,11]
    
    if int(row['MONTH']) in winter: return "winter"
    if int(row['MONTH']) in spring: return "spring"
    if int(row['MONTH']) in summer: return "summer"
    if int(row['MONTH']) in fall: return "fall"

# Applies the above function to every row and puts the result in a new column called "SEASON"
data['SEASON'] = data.apply(Season, axis = 1)



In [46]:

    
# This function will return the appropriate "time of day" for any given row
def TimeOfDay(row):
    morning = [7,8,9,10,11,12]
    afternoon = [13,14,15,16,17,18]
    night = [19,20,21,22,23,0]
    late_night = [1,2,3,4,5,6]
    
    if int(row['HOUR']) in morning: return "morning"
    if int(row['HOUR']) in afternoon: return "afternoon"
    if int(row['HOUR']) in night: return "night"
    if int(row['HOUR']) in late_night: return "late_night"

# Applies the above function to every row and puts the result in a new column called "TIME_OF_DAY"   
data['TIME_OF_DAY'] = data.apply(TimeOfDay, axis = 1)



In [47]:

    
#imports the statistics package that will be used to run the regression

import statsmodels.formula.api as sm
lm = sm.ols('TOTAL_CASUALTIES ~ TIME_OF_DAY + SEASON + Taxi_Involvement', data = data).fit()
lm.summary()

#Documentation found on this Stack Exchange: http://stackoverflow.com/questions/19991445/run-an-ols-regression-with-pandas-data-frame









    Out[47]:





OLS Regression Results

  Dep. Variable:     TOTAL_CASUALTIES    R-squared:              0.004  


  Model:                    OLS          Adj. R-squared:         0.004  


  Method:              Least Squares     F-statistic:            599.9  


  Date:              Fri, 12 May 2017    Prob (F-statistic):      0.00   


  Time:                  15:56:23        Log-Likelihood:     -1.0127e+06


  No. Observations:       1016502        AIC:                 2.026e+06 


  Df Residuals:           1016494        BIC:                 2.026e+06 


  Df Model:                    7                                        


  Covariance Type:       nonrobust                                      




                               coef      std err       t       P>|t|  [95.0% Conf. Int.] 


  Intercept                      0.2522      0.002    167.661   0.000      0.249     0.255


  TIME_OF_DAY[T.late_night]      0.0927      0.002     37.466   0.000      0.088     0.098


  TIME_OF_DAY[T.morning]        -0.0238      0.002    -15.155   0.000     -0.027    -0.021


  TIME_OF_DAY[T.night]           0.0639      0.002     36.701   0.000      0.061     0.067


  SEASON[T.spring]              -0.0111      0.002     -6.053   0.000     -0.015    -0.008


  SEASON[T.summer]               0.0055      0.002      3.019   0.003      0.002     0.009


  SEASON[T.winter]              -0.0272      0.002    -14.966   0.000     -0.031    -0.024


  Taxi_Involvement[T.True]      -0.0403      0.003    -15.034   0.000     -0.046    -0.035




  Omnibus:        1056180.030    Durbin-Watson:           1.763   


  Prob(Omnibus):     0.000       Jarque-Bera (JB):   180567270.924


  Skew:              4.865       Prob(JB):                 0.00   


  Kurtosis:         67.564       Cond. No.                 5.17



In [48]:

    
lm.params[1:].plot(kind = 'bar', color = ['blue', 'red', 'blue', 'red','blue','red', 'red'])









    Out[48]:





<matplotlib.axes._subplots.AxesSubplot at 0x11bba4d30>

Interpretting Regression Results

The regression shows us that certain factors do have statistically significant effects on the target variable - the total casualties in a given accident. The p values displayed in the summary show that every coefficient is statistically significant well beyond p < 0.05.

The coefficinets show us the degree to which the predictor variables influence our target variable, as well as the direction (positive or negative impact). As you can see from the color-coded bar graph above, the strongest factors contributing towards a higher rate of casualties in the data are the fact that it took place at night or during the late night. This seems intuitively logical, as low visibility and higher rates of drunk driving might play a role. It also shows that the regression results are resonable.

Another interesting conclusion of the regression is that the involvement of a taxi in an accident seems to be the strongest factor among the predictor variables in producing a lower number of casualties in a given accident. Interpretting this result is not as straight forward. Are taxi driver inherently better drivers because of their occupation? Or do they just tend to have less people in the car, causing low number of casualties? Further analysis is required to determine whether or not Taxi drivers are better drivers; but for now it seems that if your're going to crash, you should crash into a taxi.

	DATE	TIME	BOROUGH	ZIP CODE	LATITUDE	LONGITUDE	LOCATION	ON STREET NAME	CROSS STREET NAME	OFF STREET NAME	...	CONTRIBUTING FACTOR VEHICLE 2	CONTRIBUTING FACTOR VEHICLE 3	CONTRIBUTING FACTOR VEHICLE 4	CONTRIBUTING FACTOR VEHICLE 5	UNIQUE KEY	VEHICLE TYPE CODE 1	VEHICLE TYPE CODE 2	VEHICLE TYPE CODE 3	VEHICLE TYPE CODE 4	VEHICLE TYPE CODE 5
0	04/16/2017	0:00	BRONX	10472	40.831287	-73.881730	(40.831287, -73.88173)	EAST 172 STREET	BRONX RIVER AVENUE	NaN	...	Unspecified	NaN	NaN	NaN	3651861	PASSENGER VEHICLE	PICK-UP TRUCK	NaN	NaN	NaN
1	04/16/2017	0:00	BROOKLYN	11201	40.686726	-73.990470	(40.686726, -73.99047)	BERGEN STREET	SMITH STREET	NaN	...	Unspecified	NaN	NaN	NaN	3651519	TAXI	PASSENGER VEHICLE	NaN	NaN	NaN
2	04/16/2017	0:00	BROOKLYN	11201	40.702560	-73.991140	(40.70256, -73.99114)	NaN	NaN	57 FRONT STREET	...	Unspecified	NaN	NaN	NaN	3651522	SPORT UTILITY / STATION WAGON	SPORT UTILITY / STATION WAGON	NaN	NaN	NaN
3	04/16/2017	0:00	QUEENS	11429	40.712590	-73.733154	(40.71259, -73.733154)	NaN	NaN	220-20 HEMPSTEAD AVENUE	...	Passing or Lane Usage Improper	NaN	NaN	NaN	3651540	PASSENGER VEHICLE	SPORT UTILITY / STATION WAGON	NaN	NaN	NaN
4	04/16/2017	0:00	NaN	NaN	40.701440	-73.984770	(40.70144, -73.98477)	BRIDGE STREET	NaN	NaN	...	Unspecified	NaN	NaN	NaN	3651521	PASSENGER VEHICLE	SPORT UTILITY / STATION WAGON	NaN	NaN	NaN

Dep. Variable:	TOTAL_CASUALTIES	R-squared:	0.004
Model:	OLS	Adj. R-squared:	0.004
Method:	Least Squares	F-statistic:	599.9
Date:	Fri, 12 May 2017	Prob (F-statistic):	0.00
Time:	15:56:23	Log-Likelihood:	-1.0127e+06
No. Observations:	1016502	AIC:	2.026e+06
Df Residuals:	1016494	BIC:	2.026e+06
Df Model:	7
Covariance Type:	nonrobust

	coef	std err	t	P>\|t\|	[95.0% Conf. Int.]
Intercept	0.2522	0.002	167.661	0.000	0.249 0.255
TIME_OF_DAY[T.late_night]	0.0927	0.002	37.466	0.000	0.088 0.098
TIME_OF_DAY[T.morning]	-0.0238	0.002	-15.155	0.000	-0.027 -0.021
TIME_OF_DAY[T.night]	0.0639	0.002	36.701	0.000	0.061 0.067
SEASON[T.spring]	-0.0111	0.002	-6.053	0.000	-0.015 -0.008
SEASON[T.summer]	0.0055	0.002	3.019	0.003	0.002 0.009
SEASON[T.winter]	-0.0272	0.002	-14.966	0.000	-0.031 -0.024
Taxi_Involvement[T.True]	-0.0403	0.003	-15.034	0.000	-0.046 -0.035

Omnibus:	1056180.030	Durbin-Watson:	1.763
Prob(Omnibus):	0.000	Jarque-Bera (JB):	180567270.924
Skew:	4.865	Prob(JB):	0.00
Kurtosis:	67.564	Cond. No.	5.17