Motivation

What is your dataset?

We chose this London Fire Brigade dataset to investigate patterns in fire incidents. Since the London Fire Brigade is one of the largest and busiest fire services in the world, we figured it would be a great place to start. The dataset covers the years 2013, 2014 and 2015, as well as the first quarter of 2016.

We switched to this project from our previous one because that project's data was corrupted and our preliminary analysis found nothing worth discussing.

Why did you choose this/these particular dataset(s)?

The motivation for choosing this dataset was the fact that many people die in fires every single year. One team member has even lost a relative in a fire.

What was your goal for the end user's experience?

To present the data in a visually appealing and explanatory way, so that the user leaves the site enlightened.

Basic stats. Let's understand the dataset better

Write about your choices in data cleaning and preprocessing

The data itself was very clean. One thing we did not like, though, was the Easting/Northing format used for the locations. We are used to latitude/longitude, and to use the locations in D3.js we would have to convert them. The conversion was done using the UTM package in Python.

We also did some initial processing of the date stamps since, for some reason, they used Roman numerals for the months, so that 1.1.2013 would appear as 1.I.2013. This was handled with a simple conversion dictionary.
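A conversion dictionary of this kind could look roughly like the sketch below. The names ROMAN_MONTHS and normalise_date are hypothetical and not taken from the project code; the stamp format day.month.year is assumed from the example above.

```python
# Hypothetical mapping from Roman-numeral months to month numbers.
ROMAN_MONTHS = {
    "I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6,
    "VII": 7, "VIII": 8, "IX": 9, "X": 10, "XI": 11, "XII": 12,
}


def normalise_date(stamp):
    """Turn a stamp such as '1.I.2013' into '1.1.2013'."""
    day, month, year = stamp.split(".")
    return f"{day}.{ROMAN_MONTHS[month.upper()]}.{year}"


print(normalise_date("1.I.2013"))
```

An unknown month string raises a KeyError here, which makes unexpected values in the data visible instead of silently passing them through.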

We ran some light scripts on the data to answer some very basic questions, such as whether there was an obvious pattern in which months, days or times of day the incidents occurred. The results can be seen on the site.
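A script of this kind might be sketched as follows. It assumes the timestamps are available as "day.month.year hour:minute" strings; the sample values and the helper name count_by are made up for illustration and do not come from the dataset.

```python
from collections import Counter
from datetime import datetime

# Made-up timestamps for illustration only; not taken from the dataset.
sample_calls = [
    "1.1.2013 00:15",
    "1.1.2013 18:40",
    "14.7.2014 18:05",
    "3.7.2015 09:30",
]


def count_by(calls, key):
    """Count calls grouped by whatever key(datetime) returns."""
    parsed = (datetime.strptime(c, "%d.%m.%Y %H:%M") for c in calls)
    return Counter(key(dt) for dt in parsed)


by_month = count_by(sample_calls, lambda dt: dt.month)
by_weekday = count_by(sample_calls, lambda dt: dt.weekday())  # Monday is 0
by_hour = count_by(sample_calls, lambda dt: dt.hour)

print(by_month.most_common())
print(by_weekday.most_common())
print(by_hour.most_common())
```

Each Counter maps a month, weekday or hour to the number of calls, which is the shape of data a bar chart needs.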

Write a short section that discusses the dataset stats (here you can recycle the work you did for Project Assignment A)

We switched to this project from our previous one and thus cannot recycle work from Project Assignment A.

The dataset consists of 26 columns/parameters that we can put to the test, including:

Time
Date
Location
Type of incident
Property type
Easting/northing location pointer

Theory. Which theoretical tools did you use?

Describe which machine learning tools you use and why the tools you've chosen are right for the problem you're solving.

Based on our data we decided to go with k-Nearest Neighbours and k-Means.

Talk about your model selection. How did you split the data in to test/training. Did you use cross validation?

k-Nearest Neighbours

kNN, a classification method, was used to predict whether an incident call is a real fire or not. We chose this model because each incident comes with many features, such as the time or the weekday of the call. The labels were simply fire or not fire.
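The idea behind kNN can be illustrated with a small dependency-free sketch. This is not the project's code: the function knn_predict, the two features (hour of day, weekday) and the training points are all made up for illustration.

```python
from collections import Counter
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of query by majority vote among its k nearest training points."""
    by_distance = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up (hour of day, weekday) features; label 1 = fire, 0 = not fire.
train_X = [(2, 5), (3, 6), (1, 5), (14, 1), (15, 2), (13, 0)]
train_y = [1, 1, 1, 0, 0, 0]

print(knn_predict(train_X, train_y, (2, 6)))   # a late-night weekend call
print(knn_predict(train_X, train_y, (14, 2)))  # a weekday afternoon call
```

Because kNN relies on distances, features on different scales are usually normalised first so that no single feature dominates the vote.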

k-Means

k-Means, a clustering method, was used to find the optimal positions for fire stations when only a limited number can be built. We found k-Means well suited to this task because it is an unsupervised learning method, so the only information we had to provide was latitude and longitude.
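As a rough illustration of how k-Means turns incident locations into candidate station positions, here is a minimal pure-Python version of Lloyd's algorithm. It is a sketch under assumptions, not the project's implementation: the function kmeans, its parameters and the sample coordinates are made up, and latitude/longitude are treated as flat coordinates, which is only an approximation.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm on 2-D points; returns the k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids


# Made-up (latitude, longitude) pairs forming two loose groups.
incidents = [
    (51.50, -0.12), (51.51, -0.13), (51.52, -0.11),
    (51.40, -0.30), (51.41, -0.31), (51.39, -0.29),
]
stations = kmeans(incidents, k=2)
print(sorted(stations))
```

Each returned centroid is the mean position of the incidents assigned to it, which is why the centroids can be read as candidate station locations.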

Splitting data

In both cases the data was split into 80% for training and 20% for testing.
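An 80/20 split can be sketched as below. The helper train_test_split_80_20 and its seed parameter are hypothetical; the integers stand in for incident records.

```python
import random


def train_test_split_80_20(rows, seed=0):
    """Shuffle rows reproducibly and return (train, test) with an 80/20 split."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5  # first 80% goes to training
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))  # stand-in for incident records
train, test = train_test_split_80_20(rows)
print(len(train), len(test))
```

Shuffling before cutting matters for data ordered by date: without it, the test set would contain only the most recent incidents.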

Explain the model performance. How did you measure it? Are your results what you expected?

k-Nearest Neighbours

We used sklearn's metrics module to measure precision and recall.

1 - Fire
- Precision 0.45
- Recall 0.18
0 - Not fire
- Precision 0.78
- Recall 0.93

The results were not what we expected. We had assumed the model would be better at predicting fires, but with a recall of 0.18 it misses most of them. It is, however, quite good at predicting non-fire incidents, with a recall of 0.93.
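For readers unfamiliar with the two metrics, the sketch below computes them from scratch. It does not reproduce the project's numbers: the function precision_recall and the label lists are made up for illustration.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels: 1 = fire, 0 = not fire.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]

print(precision_recall(y_true, y_pred, positive=1))
print(precision_recall(y_true, y_pred, positive=0))
```

Precision answers "of the calls predicted as this class, how many really were?", while recall answers "of the calls that really were this class, how many did we catch?".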

k-Means

We did not measure performance here because the goal was only to create clusters.

Visualizations

Explain the visualizations you've chosen.

We have chosen:

Bar charts showing the distribution of incidents throughout the year, month and day
A map with clusters so users can see the optimal locations for fire stations

Why are they right for the story you want to tell?

The chosen visualizations are quite descriptive and do not take much time to understand.

In bar charts one can easily spot trends
Maps are exciting to look at

Discussion. Think critically about your creation

What went well?

The machine learning part went pretty well, and once we had a better dataset everything became a lot easier. The collaboration also went well: all team members were experienced with git, so working on the code together was smooth and free of conflicts.

What is still missing? What could be improved? Why?

We would have loved to have better, stronger, deeper visualisations of the data, but time was short due to the switch of dataset. It would also be interesting to combine the fire data with data on estates in different neighbourhoods of London and investigate whether there is any correlation between them.

Appendix

A - machine learning code

http://nbviewer.jupyter.org/url/euronails.org/london/notebooks/thang-notebook.ipynb
