Swiss road accidents 2011 - 2015

Analysing every road accident in Switzerland from 2011 to 2015 using Pandas.


In [2]:
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use("ggplot")
%matplotlib inline

Importing files on accidents, .csv Files, organised by year


In [3]:
unfaelle2011 = pd.read_csv("Export_1_Unfallblatt_2011.csv", delimiter = ";", encoding = "latin-1")
unfaelle2012 = pd.read_csv("Export_1_Unfallblatt_2012.csv", delimiter = ";", encoding = "latin-1")
unfaelle2013 = pd.read_csv("Export_1_Unfallblatt_2013.csv", delimiter = ";", encoding = "latin-1")
unfaelle2014 = pd.read_csv("Export_1_Unfallblatt_2014.csv", delimiter = ";", encoding = "latin-1")
unfaelle2015 = pd.read_csv("Export_1_Unfallblatt_2015.csv", delimiter = "\t", encoding = "UTF-8")

Importing files on people involved in accidents, .csv Files, organised by year


In [4]:
personen2011 = pd.read_csv("Export_3_Personenblatt_2011.csv", delimiter = ";", encoding = "latin-1")
personen2012 = pd.read_csv("Export_3_Personenblatt_2012.csv", delimiter = ";", encoding = "latin-1")
personen2013 = pd.read_csv("Export_3_Personenblatt_2013.csv", delimiter = ";", encoding = "latin-1")
personen2014 = pd.read_csv("Export_3_Personenblatt_2014.csv", delimiter = ";", encoding = "latin-1")
personen2015 = pd.read_csv("Export_3_Personenblatt_2015.csv", delimiter = "\t", encoding = "UTF-8")

Importing files on vehicles involved in accidents, .csv Files, organised by year


In [5]:
fahrzeuge2011 = pd.read_csv("Export_2_Objektblatt_2011.csv", delimiter = ";", encoding = "latin-1")
fahrzeuge2012 = pd.read_csv("Export_2_Objektblatt_2012.csv", delimiter = ";", encoding = "latin-1")
fahrzeuge2013 = pd.read_csv("Export_2_Objektblatt_2013.csv", delimiter = ";", encoding = "latin-1")
fahrzeuge2014 = pd.read_csv("Export_2_Objektblatt_2014.csv", delimiter = ";", encoding = "latin-1")
fahrzeuge2015 = pd.read_csv("Export_2_Objektblatt_2015.csv", delimiter = "\t", encoding = "UTF-8")

Concatenating the files


In [6]:
df_unfaelle = pd.concat([unfaelle2011, unfaelle2012, unfaelle2013, unfaelle2014, unfaelle2015], ignore_index=True)
df_personen = pd.concat([personen2011, personen2012, personen2013, personen2014, personen2015], ignore_index=True)
df_fahrzeuge = pd.concat([fahrzeuge2011, fahrzeuge2012, fahrzeuge2013, fahrzeuge2014, fahrzeuge2015], ignore_index=True)

In [7]:
#df_unfaelle.columns

In [8]:
#df_personen.columns

In [9]:
#df_fahrzeuge.columns
#contains the "Hauptverursacher UAP" (ja/nein) category

Contents

OVERVIEW

  1. Total number of road accidents in Switzerland since 2011? How have they developed over the years? Make a bar graph.
  2. On which day of the week do the most accidents occur? And at what time of day? Make a graph of that as well.
  3. Compare the various vehicle categories involved in accidents. Total and by year.
  4. What was the most expensive crash in the past five years? Where was it? And which crash one was the most deadly?
  5. How many accidents occured on roads without ligthing?
  6. How have drug and alcohol related accidents developed in the past 5 years?
  7. Combine as much information about Breatheliser record as possible.
  8. What ist the average age of crash drivers? Overall? And how has it changed over the years?
  9. What about female vs. male drivers?
  10. List the 10 accident hotspots in Switzerland?

1. Total number of road accidents in Switzerland since 2011? How have they developed over the years? Make a bar graph.


In [10]:
df_unfaelle['Unfall-UID'].count()


Out[10]:
266483

In [11]:
df_unfaelle.groupby('Jahr')['Unfall-UID'].count()


Out[11]:
Jahr
2011    54269
2012    54171
2013    53052
2014    51756
2015    53235
Name: Unfall-UID, dtype: int64

In [12]:
df_unfaelle.groupby('Jahr')['Unfall-UID'].count().plot(kind='bar')
plt.savefig("Unfaelle_pro_Jahr.svg")
plt.savefig("Unfaelle_pro_Jahr.png")


Most recently there has been a slight increase in road accidents in Switzerland.

2. On which day of the week do the most accidents occur? And at what time of day? Make graph of that as well.


In [13]:
df_unfaelle['Wochentag-Nr'].value_counts()


Out[13]:
5    43687
4    39969
3    39209
2    38617
6    38540
1    37634
7    28827
Name: Wochentag-Nr, dtype: int64

In [14]:
df_unfaelle['Wochentag-Nr'].astype(str).str.replace('6', "Saturday").str.replace('4', "Thursday").str.replace('1', "Monday").str.replace('2', "Tuesday").str.replace('3', "Wednesday").str.replace('5', "Friday").str.replace('7', "Sunday").value_counts(ascending=True).plot(kind='barh')
plt.savefig("Unfaelle_pro_Tag.svg")
plt.savefig("Unfaelle_pro_Tag.png")



In [15]:
plt.savefig("Unfaelle_pro_Tag.svg")
plt.savefig("Unfaelle_pro_Tag.png")


<matplotlib.figure.Figure at 0x105ad7cc0>

Transforming the data into times is proving a problem. They need to be transformed, so I can make a histogram of the times. With time as X-axis and count of accidents corresponding to the time on Y-xis. This post on timestamps in Pandas may be of some help.


In [16]:
#df_unfaelle['Unfallzeit'].value_counts(ascending=False).hist(kind='scatter', x=)
#df_unfaelle['Unfallzeit'].value_counts().plot(kind='barh', x=['Unfallzeit'])
#df_unfaelle['Unfallzeit_Double'] = df_unfaelle['Unfallzeit']
df_unfaelle['Unfallzeit'].value_counts().head(4)


Out[16]:
unbekannt    3024
17:00        2420
18:00        2297
17:30        2290
Name: Unfallzeit, dtype: int64

3. Compare the top ten various vehicle categories involved in accidents. Total and by year.


In [17]:
df_fahrzeuge['Fahrzeugart UAP'].value_counts().head(5)
#money_players = nba[nba['2013 $'] != 'n/a']


Out[17]:
|710|    310440
|712|     22099
|733|     21517
|730|     14451
|725|      9724
Name: Fahrzeugart UAP, dtype: int64

The "|" sign ist causing a lot of problems as it doesn't allow me to change the names of the numbers in one go. And I haven't figures out how to do it in two goes.


In [18]:
#df_fahrzeuge['Fahrzeugart UAP'].astype(str).str.replace('|', "").value_counts(ascending=False)
#vehicles_without_pipes.astype(str).str.replace('710', "Cars").value_counts(ascending=False).head(10)
test = df_fahrzeuge['Fahrzeugart UAP'].astype(str).str.replace('|', "").value_counts(ascending=False)
pd.DataFrame(test).head(10)
#test['Fahrzeugart UAP']
#test['Fahrzeugart UAP'].astype(str).str.replace('712', "Vans")
#test.columns()

#.str.replace('712', "Vans").str.replace('733', "n/a").str.replace('730', "Bicycles").str.replace('725', "Motorbikes, above 25kw").str.replace('718', "Trucks, above 7,4t").str.replace('722', "Mopeds").str.replace('724', "Motorbike, to 25kw").str.replace('720', "Semi trailer, above 7,5t").


Out[18]:
Fahrzeugart UAP
710 310440
712 22099
733 21517
730 14451
nan 13253
725 9724
718 7061
723 6171
722 4723
724 4636

4. What was the most expensive crash in the past five years? Where was it? And which crash was the most deadly?


In [19]:
df_unfaelle.sort_values('Total geschätzter Sachschaden (in 1000 CHF)', ascending=False)[['Total geschätzter Sachschaden (in 1000 CHF)', 'Datum', 'x-Koordinate', 'y-Koordinate', 'Kanton Kürzel', 'Aktuelle BFS Gemeinde-Nr']].head(5)


Out[19]:
Total geschätzter Sachschaden (in 1000 CHF) Datum x-Koordinate y-Koordinate Kanton Kürzel Aktuelle BFS Gemeinde-Nr
102850 3097.0 2012/11/26 00:00:00 733188 264855 TG 4501.0
125924 2113.8 2013/05/06 00:00:00 753599 261346 SG 3213.0
14218 2000.0 2011/04/14 00:00:00 577759 183068 FR 2196.0
232322 1801.0 20.05.15 772188 201408 GR NaN
150309 1730.0 2013/10/17 00:00:00 685914 252395 ZH 69.0

The accident causing the greatest damage, 3 Million Swiss Francs, happened on 26 October 2012 in Canton Thurgau, here, or here on Google Maps.


In [20]:
df_unfaelle.sort_values('Getötete', ascending=False)[['Getötete', 'Datum', 'x-Koordinate', 'y-Koordinate', 'Kanton Kürzel', 'Aktuelle BFS Gemeinde-Nr']].head(5)


Out[20]:
Getötete Datum x-Koordinate y-Koordinate Kanton Kürzel Aktuelle BFS Gemeinde-Nr
64358 28 2012/03/13 00:00:00 607075 125901 VS 6248.0
252202 5 26.09.15 627775 267992 AG NaN
90121 4 2012/09/03 00:00:00 570220 186565 FR 2024.0
139561 3 2013/08/04 00:00:00 663813 199798 OW 1401.0
235187 3 07.06.15 747298 159066 GR NaN

The most deadly accident was a bus accident involving mostly school kids from Belgien. And this one more recently.

5. How many accidents occured on roads without ligthing? (Maybe even work out an accident hotspot?


In [21]:
Strassenbeleuchtung UAP 640 keine 641 ausser Betrieb


  File "<ipython-input-21-d5aee852fb4f>", line 1
    Strassenbeleuchtung UAP 640 keine 641 ausser Betrieb
                          ^
SyntaxError: invalid syntax

In [ ]:


In [ ]:
df_unfaelle[(df_unfaelle['Lichtverhältnis UAP'] == 622) & (df_unfaelle['Strassenbeleuchtung UAP'] == 640)]


#df[(df['animal'] == 'cat') & (df['inches'] > 12)]

#df[df['feet'] > 6.5]

In [ ]:
df_unfaelle[(df_unfaelle['Lichtverhältnis UAP'] == 622) & (df_unfaelle['Strassenbeleuchtung UAP'] == 641)]

Including the over 1000 locations were the lights were out of order, over 10% of accidents happened in poorly lit locations.

6. How have drug and alcohol related accidents developed in the past 5 years?

ALCOHOL

In [57]:
df_unfaelle[df_unfaelle['Hauptursache UAP'] == 1101].groupby("Jahr")["Hauptursache UAP"].value_counts().plot(kind='bar')


Out[57]:
<matplotlib.axes._subplots.AxesSubplot at 0x11b5d5898>

DRUGS


In [58]:
df_unfaelle[df_unfaelle['Hauptursache UAP'] == 1102].groupby("Jahr")["Hauptursache UAP"].value_counts().plot(kind='bar')


Out[58]:
<matplotlib.axes._subplots.AxesSubplot at 0x125899cf8>

7. Combine as much information about breatheliser record as possible.

MERGING: Look at this again!!!*


In [78]:
df_fahrzeuge["Atemtest"].describe()


/usr/local/lib/python3.5/site-packages/numpy/lib/function_base.py:3823: RuntimeWarning: Invalid value encountered in percentile
  RuntimeWarning)
Out[78]:
count    115710.000000
mean          0.260254
std           0.582384
min           0.000000
25%                NaN
50%                NaN
75%                NaN
max           9.000000
Name: Atemtest, dtype: float64

In [103]:
df_fahrzeuge["Atemtest"].value_counts()

Alcohol_record = df_fahrzeuge[df_fahrzeuge['Atemtest'] == 9.0]

In [104]:
Alcohol_record[['BUM Probe angeordnet UAP', 'Blutalkoholtest', 'Datum', 'Vertrautheit Strecke UAP', 'Kennzeichen Fahrzeug Kanton']]


Out[104]:
BUM Probe angeordnet UAP Blutalkoholtest Datum Vertrautheit Strecke UAP Kennzeichen Fahrzeug Kanton
15936 901 NaN 2011/01/05 00:00:00 830 Waadt

In [107]:
df_merged = df_personen.merge(df_fahrzeuge, how = 'left', left_on = 'Unfall-UID', right_on ='Unfall-UID')

In [108]:
df = df_merged.merge(df_unfaelle, how = 'left', left_on = 'Unfall-UID', right_on ='Unfall-UID')

In [ ]:


In [ ]:


In [ ]:


In [70]:
df_fahrzeuge.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 442007 entries, 0 to 442006
Data columns (total 48 columns):
Abblend-/Tagfahrlicht UAP                   408604 non-null float64
Ablenkung UAP                               442007 non-null int64
Angabe Führerausweis UAP                    428646 non-null float64
Angabe Führerausweis Zusatz UAP             22133 non-null object
Angaben Fahrzeug UAP                        162825 non-null object
Anhänger / Auflieger UAP                    417774 non-null float64
Anprall UAP                                 106171 non-null object
Atemtest                                    115710 non-null float64
BUM Probe angeordnet UAP                    442007 non-null int64
BUM Probe angeordnet Zusatz UAP             194 non-null object
Bauartbedingte Höchstgeschwindigkeit UAP    395153 non-null float64
Blutalkoholtest                             19394 non-null float64
Blutprobe Alkohol angeordnet UAP            442007 non-null int64
Blutprobe Alkohol angeordnet Zusatz UAP     1169 non-null object
Datum                                       442007 non-null object
Eigentumsverhältnis UAP                     428727 non-null float64
Fahrzeugart UAP                             428754 non-null object
Fahrzeugart Zusatz UAP                      1410 non-null object
Fahrzweck UAP                               442007 non-null int64
Führerausweis Kategorie                     373702 non-null object
Führerausweis Land UAP                      423208 non-null object
Führerausweis seit                          372490 non-null object
Getötete                                    442007 non-null int64
Hauptverursacher UAP                        442007 non-null int64
Jahr                                        442007 non-null int64
Kennzeichen Fahrzeug Kanton                 348204 non-null object
Kennzeichen Fahrzeug Land UAP               427616 non-null object
Kennzeichenart UAP                          411540 non-null float64
Leichtverletzte                             442007 non-null int64
Lenker UAP                                  428754 non-null float64
Meldepflicht UAP                            55762 non-null object
Objekt-Kategorie UAP                        442007 non-null int64
Objekt-Nr                                   442007 non-null object
Objekt-UID                                  442007 non-null object
Schwerverletzte                             442007 non-null int64
Total Personen                              442007 non-null int64
Unfall-UID                                  442007 non-null object
Ursache 1 UAP                               286988 non-null float64
Ursache 2 UAP                               80336 non-null float64
Ursache 3 UAP                               14747 non-null float64
Verdacht Alkohol UAP                        423200 non-null float64
Verdacht Betäubungsmittel UAP               438622 non-null float64
Verdacht Betäubungsmittel Zusatz UAP        1531 non-null object
Verdacht Betääubungsmittel Zusatz UAP       404 non-null object
Verdacht auf Arzneimittel UAP               438659 non-null float64
Verdacht auf Arzneimittel Zusatz UAP        1030 non-null object
Vertrautheit Strecke UAP                    442007 non-null int64
unverletzte Personen                        442007 non-null int64
dtypes: float64(15), int64(13), object(20)
memory usage: 161.9+ MB

In [68]:
df_unfaelle.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 266483 entries, 0 to 266482
Data columns (total 42 columns):
Aktuelle BFS Gemeinde-Nr                       213248 non-null float64
Bahnübergang UAP                               266483 non-null int64
Bahnübergang Zusatz UAP                        287 non-null object
Datum                                          266483 non-null object
Getötete                                       266483 non-null int64
Hauptursache UAP                               266483 non-null int64
Höchstgeschwindigkeit                          266483 non-null int64
Höchstgeschwindigkeit Zusatz UAP               6364 non-null object
Innerorts / Ausserorts UAP                     266483 non-null int64
Jahr                                           266483 non-null int64
Kanton Kürzel                                  266483 non-null object
Leichtverletzte                                266483 non-null int64
Lichtverhältnis UAP                            266483 non-null int64
Schwerverletzte                                266483 non-null int64
Sicht UAP                                      266483 non-null int64
Strassenanlage UAP                             266483 non-null int64
Strassenanlage Zusatz UAP                      12218 non-null object
Strassenart UAP                                266483 non-null int64
Strassenart Zusatz UAP                         8807 non-null object
Strassenbeleuchtung UAP                        266483 non-null int64
Strassenkategorie                              266483 non-null object
Strassenzustand UAP                            266483 non-null int64
Strassenzustand Zusatz UAP                     1770 non-null object
Total Objekte                                  266483 non-null int64
Total Personen                                 266483 non-null int64
Total geschätzter Sachschaden (in 1000 CHF)    266482 non-null float64
Unfall-UID                                     266483 non-null object
Unfallstelle UAP                               266483 non-null int64
Unfallstelle Zusatz UAP                        56871 non-null object
Unfalltyp UAP                                  266483 non-null int64
Unfallzeit                                     266483 non-null object
Verkehrsbedingungen UAP                        266483 non-null int64
Verkehrsregelung UAP                           266483 non-null int64
Verkehrsregelung Zusatz UAP                    179 non-null object
Vortrittsregelung UAP                          266483 non-null int64
Witterung UAP                                  266483 non-null int64
Witterung Zusatz UAP                           4828 non-null object
Wochentag-Nr                                   266483 non-null int64
Zonensignalisation UAP                         266483 non-null int64
unverletzte Personen                           266483 non-null int64
x-Koordinate                                   266483 non-null int64
y-Koordinate                                   266483 non-null int64
dtypes: float64(2), int64(27), object(13)
memory usage: 85.4+ MB

In [69]:
df_personen.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 550892 entries, 0 to 550891
Data columns (total 15 columns):
Jahr                       550892 non-null int64
Datum                      550892 non-null object
Unfall-UID                 550892 non-null object
Objekt-UID                 550892 non-null object
Objekt-Nr                  550892 non-null object
Person-UID                 550892 non-null object
Person-Nr                  550892 non-null int64
Geschlecht UAP             550892 non-null int64
Alter                      550892 non-null object
Führerausweisalter         550892 non-null object
Personenart UAP            550892 non-null int64
Personenart Zusatz UAP     229669 non-null object
Unfallfolgen UAP           550892 non-null int64
Schutzsystem UAP           550892 non-null int64
Schutzsystem Zusatz UAP    28730 non-null object
dtypes: int64(6), object(9)
memory usage: 63.0+ MB