Data source: Federal Roads Office (FEDRO)
Analysing every road accident in Switzerland from 2011 to 2015 using Pandas.
In [2]:
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use("ggplot")
%matplotlib inline
Importing files on accidents, .csv Files, organised by year
In [3]:
unfaelle2011 = pd.read_csv("Export_1_Unfallblatt_2011.csv", delimiter = ";", encoding = "latin-1")
unfaelle2012 = pd.read_csv("Export_1_Unfallblatt_2012.csv", delimiter = ";", encoding = "latin-1")
unfaelle2013 = pd.read_csv("Export_1_Unfallblatt_2013.csv", delimiter = ";", encoding = "latin-1")
unfaelle2014 = pd.read_csv("Export_1_Unfallblatt_2014.csv", delimiter = ";", encoding = "latin-1")
unfaelle2015 = pd.read_csv("Export_1_Unfallblatt_2015.csv", delimiter = "\t", encoding = "UTF-8")
Importing files on people involved in accidents, .csv Files, organised by year
In [4]:
personen2011 = pd.read_csv("Export_3_Personenblatt_2011.csv", delimiter = ";", encoding = "latin-1")
personen2012 = pd.read_csv("Export_3_Personenblatt_2012.csv", delimiter = ";", encoding = "latin-1")
personen2013 = pd.read_csv("Export_3_Personenblatt_2013.csv", delimiter = ";", encoding = "latin-1")
personen2014 = pd.read_csv("Export_3_Personenblatt_2014.csv", delimiter = ";", encoding = "latin-1")
personen2015 = pd.read_csv("Export_3_Personenblatt_2015.csv", delimiter = "\t", encoding = "UTF-8")
Importing files on vehicles involved in accidents, .csv Files, organised by year
In [5]:
fahrzeuge2011 = pd.read_csv("Export_2_Objektblatt_2011.csv", delimiter = ";", encoding = "latin-1")
fahrzeuge2012 = pd.read_csv("Export_2_Objektblatt_2012.csv", delimiter = ";", encoding = "latin-1")
fahrzeuge2013 = pd.read_csv("Export_2_Objektblatt_2013.csv", delimiter = ";", encoding = "latin-1")
fahrzeuge2014 = pd.read_csv("Export_2_Objektblatt_2014.csv", delimiter = ";", encoding = "latin-1")
fahrzeuge2015 = pd.read_csv("Export_2_Objektblatt_2015.csv", delimiter = "\t", encoding = "UTF-8")
Concatenating the files
In [6]:
df_unfaelle = pd.concat([unfaelle2011, unfaelle2012, unfaelle2013, unfaelle2014, unfaelle2015], ignore_index=True)
df_personen = pd.concat([personen2011, personen2012, personen2013, personen2014, personen2015], ignore_index=True)
df_fahrzeuge = pd.concat([fahrzeuge2011, fahrzeuge2012, fahrzeuge2013, fahrzeuge2014, fahrzeuge2015], ignore_index=True)
In [7]:
#df_unfaelle.columns
In [8]:
#df_personen.columns
In [9]:
#df_fahrzeuge.columns
#contains the "Hauptverursacher UAP" (ja/nein) category
OVERVIEW
1. Total number of road accidents in Switzerland since 2011? How have they developed over the years? Make a bar graph.
In [10]:
df_unfaelle['Unfall-UID'].count()
Out[10]:
In [11]:
df_unfaelle.groupby('Jahr')['Unfall-UID'].count()
Out[11]:
In [12]:
df_unfaelle.groupby('Jahr')['Unfall-UID'].count().plot(kind='bar')
plt.savefig("Unfaelle_pro_Jahr.svg")
plt.savefig("Unfaelle_pro_Jahr.png")
Most recently there has been a slight increase in road accidents in Switzerland.
2. On which day of the week do the most accidents occur? And at what time of day? Make graph of that as well.
In [13]:
df_unfaelle['Wochentag-Nr'].value_counts()
Out[13]:
In [14]:
df_unfaelle['Wochentag-Nr'].astype(str).str.replace('6', "Saturday").str.replace('4', "Thursday").str.replace('1', "Monday").str.replace('2', "Tuesday").str.replace('3', "Wednesday").str.replace('5', "Friday").str.replace('7', "Sunday").value_counts(ascending=True).plot(kind='barh')
plt.savefig("Unfaelle_pro_Tag.svg")
plt.savefig("Unfaelle_pro_Tag.png")
In [15]:
plt.savefig("Unfaelle_pro_Tag.svg")
plt.savefig("Unfaelle_pro_Tag.png")
Transforming the data into times is proving a problem. They need to be transformed, so I can make a histogram of the times. With time as X-axis and count of accidents corresponding to the time on Y-xis. This post on timestamps in Pandas may be of some help.
In [16]:
#df_unfaelle['Unfallzeit'].value_counts(ascending=False).hist(kind='scatter', x=)
#df_unfaelle['Unfallzeit'].value_counts().plot(kind='barh', x=['Unfallzeit'])
#df_unfaelle['Unfallzeit_Double'] = df_unfaelle['Unfallzeit']
df_unfaelle['Unfallzeit'].value_counts().head(4)
Out[16]:
3. Compare the top ten various vehicle categories involved in accidents. Total and by year.
In [17]:
df_fahrzeuge['Fahrzeugart UAP'].value_counts().head(5)
#money_players = nba[nba['2013 $'] != 'n/a']
Out[17]:
The "|" sign ist causing a lot of problems as it doesn't allow me to change the names of the numbers in one go. And I haven't figures out how to do it in two goes.
In [18]:
#df_fahrzeuge['Fahrzeugart UAP'].astype(str).str.replace('|', "").value_counts(ascending=False)
#vehicles_without_pipes.astype(str).str.replace('710', "Cars").value_counts(ascending=False).head(10)
test = df_fahrzeuge['Fahrzeugart UAP'].astype(str).str.replace('|', "").value_counts(ascending=False)
pd.DataFrame(test).head(10)
#test['Fahrzeugart UAP']
#test['Fahrzeugart UAP'].astype(str).str.replace('712', "Vans")
#test.columns()
#.str.replace('712', "Vans").str.replace('733', "n/a").str.replace('730', "Bicycles").str.replace('725', "Motorbikes, above 25kw").str.replace('718', "Trucks, above 7,4t").str.replace('722', "Mopeds").str.replace('724', "Motorbike, to 25kw").str.replace('720', "Semi trailer, above 7,5t").
Out[18]:
4. What was the most expensive crash in the past five years? Where was it? And which crash was the most deadly?
In [19]:
df_unfaelle.sort_values('Total geschätzter Sachschaden (in 1000 CHF)', ascending=False)[['Total geschätzter Sachschaden (in 1000 CHF)', 'Datum', 'x-Koordinate', 'y-Koordinate', 'Kanton Kürzel', 'Aktuelle BFS Gemeinde-Nr']].head(5)
Out[19]:
In [20]:
df_unfaelle.sort_values('Getötete', ascending=False)[['Getötete', 'Datum', 'x-Koordinate', 'y-Koordinate', 'Kanton Kürzel', 'Aktuelle BFS Gemeinde-Nr']].head(5)
Out[20]:
5. How many accidents occured on roads without ligthing? (Maybe even work out an accident hotspot?
In [21]:
Strassenbeleuchtung UAP 640 keine 641 ausser Betrieb
In [ ]:
In [ ]:
df_unfaelle[(df_unfaelle['Lichtverhältnis UAP'] == 622) & (df_unfaelle['Strassenbeleuchtung UAP'] == 640)]
#df[(df['animal'] == 'cat') & (df['inches'] > 12)]
#df[df['feet'] > 6.5]
In [ ]:
df_unfaelle[(df_unfaelle['Lichtverhältnis UAP'] == 622) & (df_unfaelle['Strassenbeleuchtung UAP'] == 641)]
Including the over 1000 locations were the lights were out of order, over 10% of accidents happened in poorly lit locations.
6. How have drug and alcohol related accidents developed in the past 5 years?
In [57]:
df_unfaelle[df_unfaelle['Hauptursache UAP'] == 1101].groupby("Jahr")["Hauptursache UAP"].value_counts().plot(kind='bar')
Out[57]:
In [58]:
df_unfaelle[df_unfaelle['Hauptursache UAP'] == 1102].groupby("Jahr")["Hauptursache UAP"].value_counts().plot(kind='bar')
Out[58]:
7. Combine as much information about breatheliser record as possible.
In [78]:
df_fahrzeuge["Atemtest"].describe()
Out[78]:
In [103]:
df_fahrzeuge["Atemtest"].value_counts()
Alcohol_record = df_fahrzeuge[df_fahrzeuge['Atemtest'] == 9.0]
In [104]:
Alcohol_record[['BUM Probe angeordnet UAP', 'Blutalkoholtest', 'Datum', 'Vertrautheit Strecke UAP', 'Kennzeichen Fahrzeug Kanton']]
Out[104]:
In [107]:
df_merged = df_personen.merge(df_fahrzeuge, how = 'left', left_on = 'Unfall-UID', right_on ='Unfall-UID')
In [108]:
df = df_merged.merge(df_unfaelle, how = 'left', left_on = 'Unfall-UID', right_on ='Unfall-UID')
In [ ]:
In [ ]:
In [ ]:
In [70]:
df_fahrzeuge.info()
In [68]:
df_unfaelle.info()
In [69]:
df_personen.info()