Swiss road accidents 2011 - 2015

Data source: Federal Roads Office (FEDRO)

Analysing every road accident in Switzerland from 2011 to 2015 using Pandas.



In [2]:

    
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use("ggplot")
%matplotlib inline

Importing files on accidents, .csv Files, organised by year



In [3]:

    
unfaelle2011 = pd.read_csv("Export_1_Unfallblatt_2011.csv", delimiter = ";", encoding = "latin-1")
unfaelle2012 = pd.read_csv("Export_1_Unfallblatt_2012.csv", delimiter = ";", encoding = "latin-1")
unfaelle2013 = pd.read_csv("Export_1_Unfallblatt_2013.csv", delimiter = ";", encoding = "latin-1")
unfaelle2014 = pd.read_csv("Export_1_Unfallblatt_2014.csv", delimiter = ";", encoding = "latin-1")
unfaelle2015 = pd.read_csv("Export_1_Unfallblatt_2015.csv", delimiter = "\t", encoding = "UTF-8")

Importing files on people involved in accidents, .csv Files, organised by year



In [4]:

    
personen2011 = pd.read_csv("Export_3_Personenblatt_2011.csv", delimiter = ";", encoding = "latin-1")
personen2012 = pd.read_csv("Export_3_Personenblatt_2012.csv", delimiter = ";", encoding = "latin-1")
personen2013 = pd.read_csv("Export_3_Personenblatt_2013.csv", delimiter = ";", encoding = "latin-1")
personen2014 = pd.read_csv("Export_3_Personenblatt_2014.csv", delimiter = ";", encoding = "latin-1")
personen2015 = pd.read_csv("Export_3_Personenblatt_2015.csv", delimiter = "\t", encoding = "UTF-8")

Importing files on vehicles involved in accidents, .csv Files, organised by year



In [5]:

    
fahrzeuge2011 = pd.read_csv("Export_2_Objektblatt_2011.csv", delimiter = ";", encoding = "latin-1")
fahrzeuge2012 = pd.read_csv("Export_2_Objektblatt_2012.csv", delimiter = ";", encoding = "latin-1")
fahrzeuge2013 = pd.read_csv("Export_2_Objektblatt_2013.csv", delimiter = ";", encoding = "latin-1")
fahrzeuge2014 = pd.read_csv("Export_2_Objektblatt_2014.csv", delimiter = ";", encoding = "latin-1")
fahrzeuge2015 = pd.read_csv("Export_2_Objektblatt_2015.csv", delimiter = "\t", encoding = "UTF-8")

Concatenating the files



In [6]:

    
df_unfaelle = pd.concat([unfaelle2011, unfaelle2012, unfaelle2013, unfaelle2014, unfaelle2015], ignore_index=True)
df_personen = pd.concat([personen2011, personen2012, personen2013, personen2014, personen2015], ignore_index=True)
df_fahrzeuge = pd.concat([fahrzeuge2011, fahrzeuge2012, fahrzeuge2013, fahrzeuge2014, fahrzeuge2015], ignore_index=True)



In [7]:

    
#df_unfaelle.columns



In [8]:

    
#df_personen.columns



In [9]:

    
#df_fahrzeuge.columns
#contains the "Hauptverursacher UAP" (ja/nein) category

OVERVIEW

Total number of road accidents in Switzerland since 2011? How have they developed over the years? Make a bar graph.
On which day of the week do the most accidents occur? And at what time of day? Make a graph of that as well.
Compare the various vehicle categories involved in accidents. Total and by year.
What was the most expensive crash in the past five years? Where was it? And which crash one was the most deadly?
How many accidents occured on roads without ligthing?
How have drug and alcohol related accidents developed in the past 5 years?
Combine as much information about Breatheliser record as possible.
What ist the average age of crash drivers? Overall? And how has it changed over the years?
What about female vs. male drivers?
List the 10 accident hotspots in Switzerland?

1. Total number of road accidents in Switzerland since 2011? How have they developed over the years? Make a bar graph.



In [10]:

    
df_unfaelle['Unfall-UID'].count()









    Out[10]:





266483



In [11]:

    
df_unfaelle.groupby('Jahr')['Unfall-UID'].count()









    Out[11]:





Jahr
2011    54269
2012    54171
2013    53052
2014    51756
2015    53235
Name: Unfall-UID, dtype: int64



In [12]:

    
df_unfaelle.groupby('Jahr')['Unfall-UID'].count().plot(kind='bar')
plt.savefig("Unfaelle_pro_Jahr.svg")
plt.savefig("Unfaelle_pro_Jahr.png")

Most recently there has been a slight increase in road accidents in Switzerland.

2. On which day of the week do the most accidents occur? And at what time of day? Make graph of that as well.



In [13]:

    
df_unfaelle['Wochentag-Nr'].value_counts()









    Out[13]:





5    43687
4    39969
3    39209
2    38617
6    38540
1    37634
7    28827
Name: Wochentag-Nr, dtype: int64



In [14]:

    
df_unfaelle['Wochentag-Nr'].astype(str).str.replace('6', "Saturday").str.replace('4', "Thursday").str.replace('1', "Monday").str.replace('2', "Tuesday").str.replace('3', "Wednesday").str.replace('5', "Friday").str.replace('7', "Sunday").value_counts(ascending=True).plot(kind='barh')
plt.savefig("Unfaelle_pro_Tag.svg")
plt.savefig("Unfaelle_pro_Tag.png")



In [15]:

    
plt.savefig("Unfaelle_pro_Tag.svg")
plt.savefig("Unfaelle_pro_Tag.png")









    





<matplotlib.figure.Figure at 0x105ad7cc0>

Transforming the data into times is proving a problem. They need to be transformed, so I can make a histogram of the times. With time as X-axis and count of accidents corresponding to the time on Y-xis. This post on timestamps in Pandas may be of some help.



In [16]:

    
#df_unfaelle['Unfallzeit'].value_counts(ascending=False).hist(kind='scatter', x=)
#df_unfaelle['Unfallzeit'].value_counts().plot(kind='barh', x=['Unfallzeit'])
#df_unfaelle['Unfallzeit_Double'] = df_unfaelle['Unfallzeit']
df_unfaelle['Unfallzeit'].value_counts().head(4)









    Out[16]:





unbekannt    3024
17:00        2420
18:00        2297
17:30        2290
Name: Unfallzeit, dtype: int64

3. Compare the top ten various vehicle categories involved in accidents. Total and by year.



In [17]:

    
df_fahrzeuge['Fahrzeugart UAP'].value_counts().head(5)
#money_players = nba[nba['2013 $'] != 'n/a']









    Out[17]:





|710|    310440
|712|     22099
|733|     21517
|730|     14451
|725|      9724
Name: Fahrzeugart UAP, dtype: int64

The "|" sign ist causing a lot of problems as it doesn't allow me to change the names of the numbers in one go. And I haven't figures out how to do it in two goes.



In [18]:

    
#df_fahrzeuge['Fahrzeugart UAP'].astype(str).str.replace('|', "").value_counts(ascending=False)
#vehicles_without_pipes.astype(str).str.replace('710', "Cars").value_counts(ascending=False).head(10)
test = df_fahrzeuge['Fahrzeugart UAP'].astype(str).str.replace('|', "").value_counts(ascending=False)
pd.DataFrame(test).head(10)
#test['Fahrzeugart UAP']
#test['Fahrzeugart UAP'].astype(str).str.replace('712', "Vans")
#test.columns()

#.str.replace('712', "Vans").str.replace('733', "n/a").str.replace('730', "Bicycles").str.replace('725', "Motorbikes, above 25kw").str.replace('718', "Trucks, above 7,4t").str.replace('722', "Mopeds").str.replace('724', "Motorbike, to 25kw").str.replace('720', "Semi trailer, above 7,5t").









    Out[18]:






  
    
      
      Fahrzeugart UAP
    
  
  
    
      710
      310440
    
    
      712
      22099
    
    
      733
      21517
    
    
      730
      14451
    
    
      nan
      13253
    
    
      725
      9724
    
    
      718
      7061
    
    
      723
      6171
    
    
      722
      4723
    
    
      724
      4636

4. What was the most expensive crash in the past five years? Where was it? And which crash was the most deadly?



In [19]:

    
df_unfaelle.sort_values('Total geschätzter Sachschaden (in 1000 CHF)', ascending=False)[['Total geschätzter Sachschaden (in 1000 CHF)', 'Datum', 'x-Koordinate', 'y-Koordinate', 'Kanton Kürzel', 'Aktuelle BFS Gemeinde-Nr']].head(5)









    Out[19]:






  
    
      
      Total geschätzter Sachschaden (in 1000 CHF)
      Datum
      x-Koordinate
      y-Koordinate
      Kanton Kürzel
      Aktuelle BFS Gemeinde-Nr
    
  
  
    
      102850
      3097.0
      2012/11/26 00:00:00
      733188
      264855
      TG
      4501.0
    
    
      125924
      2113.8
      2013/05/06 00:00:00
      753599
      261346
      SG
      3213.0
    
    
      14218
      2000.0
      2011/04/14 00:00:00
      577759
      183068
      FR
      2196.0
    
    
      232322
      1801.0
      20.05.15
      772188
      201408
      GR
      NaN
    
    
      150309
      1730.0
      2013/10/17 00:00:00
      685914
      252395
      ZH
      69.0

The accident causing the greatest damage, 3 Million Swiss Francs, happened on 26 October 2012 in Canton Thurgau, here, or here on Google Maps.



In [20]:

    
df_unfaelle.sort_values('Getötete', ascending=False)[['Getötete', 'Datum', 'x-Koordinate', 'y-Koordinate', 'Kanton Kürzel', 'Aktuelle BFS Gemeinde-Nr']].head(5)









    Out[20]:






  
    
      
      Getötete
      Datum
      x-Koordinate
      y-Koordinate
      Kanton Kürzel
      Aktuelle BFS Gemeinde-Nr
    
  
  
    
      64358
      28
      2012/03/13 00:00:00
      607075
      125901
      VS
      6248.0
    
    
      252202
      5
      26.09.15
      627775
      267992
      AG
      NaN
    
    
      90121
      4
      2012/09/03 00:00:00
      570220
      186565
      FR
      2024.0
    
    
      139561
      3
      2013/08/04 00:00:00
      663813
      199798
      OW
      1401.0
    
    
      235187
      3
      07.06.15
      747298
      159066
      GR
      NaN

The most deadly accident was a bus accident involving mostly school kids from Belgien. And this one more recently.

5. How many accidents occured on roads without ligthing? (Maybe even work out an accident hotspot?



In [21]:

    
Strassenbeleuchtung UAP 640 keine 641 ausser Betrieb









    



  File "<ipython-input-21-d5aee852fb4f>", line 1
    Strassenbeleuchtung UAP 640 keine 641 ausser Betrieb
                          ^
SyntaxError: invalid syntax



In [ ]:



In [ ]:

    
df_unfaelle[(df_unfaelle['Lichtverhältnis UAP'] == 622) & (df_unfaelle['Strassenbeleuchtung UAP'] == 640)]


#df[(df['animal'] == 'cat') & (df['inches'] > 12)]

#df[df['feet'] > 6.5]



In [ ]:

    
df_unfaelle[(df_unfaelle['Lichtverhältnis UAP'] == 622) & (df_unfaelle['Strassenbeleuchtung UAP'] == 641)]

Including the over 1000 locations were the lights were out of order, over 10% of accidents happened in poorly lit locations.

6. How have drug and alcohol related accidents developed in the past 5 years?

ALCOHOL



In [57]:

    
df_unfaelle[df_unfaelle['Hauptursache UAP'] == 1101].groupby("Jahr")["Hauptursache UAP"].value_counts().plot(kind='bar')









    Out[57]:





<matplotlib.axes._subplots.AxesSubplot at 0x11b5d5898>

DRUGS



In [58]:

    
df_unfaelle[df_unfaelle['Hauptursache UAP'] == 1102].groupby("Jahr")["Hauptursache UAP"].value_counts().plot(kind='bar')









    Out[58]:





<matplotlib.axes._subplots.AxesSubplot at 0x125899cf8>

7. Combine as much information about breatheliser record as possible.

MERGING: Look at this again!!!*



In [78]:

    
df_fahrzeuge["Atemtest"].describe()









    



/usr/local/lib/python3.5/site-packages/numpy/lib/function_base.py:3823: RuntimeWarning: Invalid value encountered in percentile
  RuntimeWarning)






    Out[78]:





count    115710.000000
mean          0.260254
std           0.582384
min           0.000000
25%                NaN
50%                NaN
75%                NaN
max           9.000000
Name: Atemtest, dtype: float64



In [103]:

    
df_fahrzeuge["Atemtest"].value_counts()

Alcohol_record = df_fahrzeuge[df_fahrzeuge['Atemtest'] == 9.0]



In [104]:

    
Alcohol_record[['BUM Probe angeordnet UAP', 'Blutalkoholtest', 'Datum', 'Vertrautheit Strecke UAP', 'Kennzeichen Fahrzeug Kanton']]









    Out[104]:






  
    
      
      BUM Probe angeordnet UAP
      Blutalkoholtest
      Datum
      Vertrautheit Strecke UAP
      Kennzeichen Fahrzeug Kanton
    
  
  
    
      15936
      901
      NaN
      2011/01/05 00:00:00
      830
      Waadt



In [107]:

    
df_merged = df_personen.merge(df_fahrzeuge, how = 'left', left_on = 'Unfall-UID', right_on ='Unfall-UID')



In [108]:

    
df = df_merged.merge(df_unfaelle, how = 'left', left_on = 'Unfall-UID', right_on ='Unfall-UID')



In [ ]:



In [ ]:



In [ ]:



In [70]:

    
df_fahrzeuge.info()









    



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 442007 entries, 0 to 442006
Data columns (total 48 columns):
Abblend-/Tagfahrlicht UAP                   408604 non-null float64
Ablenkung UAP                               442007 non-null int64
Angabe Führerausweis UAP                    428646 non-null float64
Angabe Führerausweis Zusatz UAP             22133 non-null object
Angaben Fahrzeug UAP                        162825 non-null object
Anhänger / Auflieger UAP                    417774 non-null float64
Anprall UAP                                 106171 non-null object
Atemtest                                    115710 non-null float64
BUM Probe angeordnet UAP                    442007 non-null int64
BUM Probe angeordnet Zusatz UAP             194 non-null object
Bauartbedingte Höchstgeschwindigkeit UAP    395153 non-null float64
Blutalkoholtest                             19394 non-null float64
Blutprobe Alkohol angeordnet UAP            442007 non-null int64
Blutprobe Alkohol angeordnet Zusatz UAP     1169 non-null object
Datum                                       442007 non-null object
Eigentumsverhältnis UAP                     428727 non-null float64
Fahrzeugart UAP                             428754 non-null object
Fahrzeugart Zusatz UAP                      1410 non-null object
Fahrzweck UAP                               442007 non-null int64
Führerausweis Kategorie                     373702 non-null object
Führerausweis Land UAP                      423208 non-null object
Führerausweis seit                          372490 non-null object
Getötete                                    442007 non-null int64
Hauptverursacher UAP                        442007 non-null int64
Jahr                                        442007 non-null int64
Kennzeichen Fahrzeug Kanton                 348204 non-null object
Kennzeichen Fahrzeug Land UAP               427616 non-null object
Kennzeichenart UAP                          411540 non-null float64
Leichtverletzte                             442007 non-null int64
Lenker UAP                                  428754 non-null float64
Meldepflicht UAP                            55762 non-null object
Objekt-Kategorie UAP                        442007 non-null int64
Objekt-Nr                                   442007 non-null object
Objekt-UID                                  442007 non-null object
Schwerverletzte                             442007 non-null int64
Total Personen                              442007 non-null int64
Unfall-UID                                  442007 non-null object
Ursache 1 UAP                               286988 non-null float64
Ursache 2 UAP                               80336 non-null float64
Ursache 3 UAP                               14747 non-null float64
Verdacht Alkohol UAP                        423200 non-null float64
Verdacht Betäubungsmittel UAP               438622 non-null float64
Verdacht Betäubungsmittel Zusatz UAP        1531 non-null object
Verdacht Betääubungsmittel Zusatz UAP       404 non-null object
Verdacht auf Arzneimittel UAP               438659 non-null float64
Verdacht auf Arzneimittel Zusatz UAP        1030 non-null object
Vertrautheit Strecke UAP                    442007 non-null int64
unverletzte Personen                        442007 non-null int64
dtypes: float64(15), int64(13), object(20)
memory usage: 161.9+ MB



In [68]:

    
df_unfaelle.info()









    



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 266483 entries, 0 to 266482
Data columns (total 42 columns):
Aktuelle BFS Gemeinde-Nr                       213248 non-null float64
Bahnübergang UAP                               266483 non-null int64
Bahnübergang Zusatz UAP                        287 non-null object
Datum                                          266483 non-null object
Getötete                                       266483 non-null int64
Hauptursache UAP                               266483 non-null int64
Höchstgeschwindigkeit                          266483 non-null int64
Höchstgeschwindigkeit Zusatz UAP               6364 non-null object
Innerorts / Ausserorts UAP                     266483 non-null int64
Jahr                                           266483 non-null int64
Kanton Kürzel                                  266483 non-null object
Leichtverletzte                                266483 non-null int64
Lichtverhältnis UAP                            266483 non-null int64
Schwerverletzte                                266483 non-null int64
Sicht UAP                                      266483 non-null int64
Strassenanlage UAP                             266483 non-null int64
Strassenanlage Zusatz UAP                      12218 non-null object
Strassenart UAP                                266483 non-null int64
Strassenart Zusatz UAP                         8807 non-null object
Strassenbeleuchtung UAP                        266483 non-null int64
Strassenkategorie                              266483 non-null object
Strassenzustand UAP                            266483 non-null int64
Strassenzustand Zusatz UAP                     1770 non-null object
Total Objekte                                  266483 non-null int64
Total Personen                                 266483 non-null int64
Total geschätzter Sachschaden (in 1000 CHF)    266482 non-null float64
Unfall-UID                                     266483 non-null object
Unfallstelle UAP                               266483 non-null int64
Unfallstelle Zusatz UAP                        56871 non-null object
Unfalltyp UAP                                  266483 non-null int64
Unfallzeit                                     266483 non-null object
Verkehrsbedingungen UAP                        266483 non-null int64
Verkehrsregelung UAP                           266483 non-null int64
Verkehrsregelung Zusatz UAP                    179 non-null object
Vortrittsregelung UAP                          266483 non-null int64
Witterung UAP                                  266483 non-null int64
Witterung Zusatz UAP                           4828 non-null object
Wochentag-Nr                                   266483 non-null int64
Zonensignalisation UAP                         266483 non-null int64
unverletzte Personen                           266483 non-null int64
x-Koordinate                                   266483 non-null int64
y-Koordinate                                   266483 non-null int64
dtypes: float64(2), int64(27), object(13)
memory usage: 85.4+ MB



In [69]:

    
df_personen.info()









    



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 550892 entries, 0 to 550891
Data columns (total 15 columns):
Jahr                       550892 non-null int64
Datum                      550892 non-null object
Unfall-UID                 550892 non-null object
Objekt-UID                 550892 non-null object
Objekt-Nr                  550892 non-null object
Person-UID                 550892 non-null object
Person-Nr                  550892 non-null int64
Geschlecht UAP             550892 non-null int64
Alter                      550892 non-null object
Führerausweisalter         550892 non-null object
Personenart UAP            550892 non-null int64
Personenart Zusatz UAP     229669 non-null object
Unfallfolgen UAP           550892 non-null int64
Schutzsystem UAP           550892 non-null int64
Schutzsystem Zusatz UAP    28730 non-null object
dtypes: int64(6), object(9)
memory usage: 63.0+ MB

	Fahrzeugart UAP
710	310440
712	22099
733	21517
730	14451
nan	13253
725	9724
718	7061
723	6171
722	4723
724	4636

	Total geschätzter Sachschaden (in 1000 CHF)	Datum	x-Koordinate	y-Koordinate	Kanton Kürzel	Aktuelle BFS Gemeinde-Nr
102850	3097.0	2012/11/26 00:00:00	733188	264855	TG	4501.0
125924	2113.8	2013/05/06 00:00:00	753599	261346	SG	3213.0
14218	2000.0	2011/04/14 00:00:00	577759	183068	FR	2196.0
232322	1801.0	20.05.15	772188	201408	GR	NaN
150309	1730.0	2013/10/17 00:00:00	685914	252395	ZH	69.0

	Getötete	Datum	x-Koordinate	y-Koordinate	Kanton Kürzel	Aktuelle BFS Gemeinde-Nr
64358	28	2012/03/13 00:00:00	607075	125901	VS	6248.0
252202	5	26.09.15	627775	267992	AG	NaN
90121	4	2012/09/03 00:00:00	570220	186565	FR	2024.0
139561	3	2013/08/04 00:00:00	663813	199798	OW	1401.0
235187	3	07.06.15	747298	159066	GR	NaN

Swiss road accidents 2011 - 2015

Contents

ALCOHOL

DRUGS

MERGING: Look at this again!!!*