We chose this dataset in order to investigate the patterns of fire incidents. Since the London Fire Brigade is one of the largest and busiest fire departments in the world, we figured this would be a great place to start. The dataset covers the years 2013, 2014 and 2015, as well as the first quarter of 2016.
We switched to this project from our previous one because we found that the data was corrupted and our preliminary analysis could not find anything worth discussing.
The motivation for choosing this dataset was the fact that many people die in fires every single year. One team member has even lost a relative in a fire.
Our goal is to present the data in a beautiful and explanatory way, such that the user leaves the site enlightened.
The data itself was very clean. One thing we did not like, though, was the Easting/Northing format used for the locations. We are used to latitude/longitude, and to use the locations in D3.js we had to convert them. The conversion was done using the UTM package in Python.
We also did some initial processing of the date stamps, since for some reason they used Roman numerals for the months, so that 1.1.2013 would appear as 1.I.2013. This was handled with a simple conversion dictionary.
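The conversion dictionary idea can be sketched as follows. This is a minimal illustration rather than our exact script: the names `ROMAN_MONTHS` and `parse_roman_date` are made up for this example, and the `day.MONTH.year` layout is assumed from the `1.I.2013` sample above.

```python
from datetime import date

# Conversion dictionary: Roman-numeral month -> month number.
ROMAN_MONTHS = {
    "I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6,
    "VII": 7, "VIII": 8, "IX": 9, "X": 10, "XI": 11, "XII": 12,
}


def parse_roman_date(stamp: str) -> date:
    """Parse a 'day.ROMAN_MONTH.year' stamp such as '1.I.2013'."""
    day, roman_month, year = stamp.strip().split(".")
    return date(int(year), ROMAN_MONTHS[roman_month.upper()], int(day))


print(parse_roman_date("1.I.2013"))  # 2013-01-01
```

An unknown month string raises a `KeyError`, which is useful here because it surfaces malformed stamps instead of silently guessing.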
We ran some light scripts on the data in order to answer some very basic questions, such as whether there was an obvious pattern in which months, days or times of day the incidents occurred. The results can be seen on the site.
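A light script of this kind might look like the sketch below. The `calls` list is a made-up sample used only to show the counting; the real timestamps come from the dataset, and `count_by` is a hypothetical helper name.

```python
from collections import Counter
from datetime import datetime

# Made-up sample of call times, for illustration only.
calls = [
    datetime(2013, 1, 1, 0, 15),
    datetime(2013, 1, 1, 18, 40),
    datetime(2013, 7, 14, 18, 5),
    datetime(2014, 7, 20, 3, 30),
]


def count_by(timestamps, key):
    """Count incidents per bucket, where `key` maps a datetime to a bucket."""
    return Counter(key(t) for t in timestamps)


by_month = count_by(calls, lambda t: t.month)
by_weekday = count_by(calls, lambda t: t.weekday())  # 0 = Monday
by_hour = count_by(calls, lambda t: t.hour)

print(by_month.most_common(3))
print(by_hour.most_common(3))
```

Plotting each counter as a bar chart is usually enough to see whether any month, weekday or hour stands out.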
We switched to this project from our previous one and therefore cannot reuse anything from project A.
The set is made of 26 columns/parameters that we can put to the test, including:
Based on our data we decided to go with k-Nearest Neighbours and k-Means.
The classification method kNN was used to predict whether an incident call is a real fire or not. We chose this model because each incident has many features, such as the time or the weekday of the call. The labels were simply either fire or not fire.
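To make the idea concrete, here is a minimal pure-Python sketch of kNN classification. It is not the code we ran: the feature choice (hour of call, weekday), the value of `k` and the training rows are assumptions invented for this example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points closest to `query`."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up training rows: (hour of call, weekday with 0 = Monday).
train_X = [(2, 5), (3, 6), (1, 5), (14, 1), (15, 2), (13, 0)]
train_y = ["fire", "fire", "fire", "not fire", "not fire", "not fire"]

print(knn_predict(train_X, train_y, (2, 6)))   # fire
print(knn_predict(train_X, train_y, (14, 2)))  # not fire
```

Because kNN relies on distances, features on very different scales (for example hour versus weekday) would normally be rescaled first so that no single feature dominates.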
The clustering method k-Means was used to find the optimal positions of fire stations when only a limited number can be built. We found k-Means well suited to this task because it is an unsupervised learning method, so the only information we provided was latitude and longitude.
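The sketch below shows the plain k-Means loop (Lloyd's algorithm) on latitude/longitude pairs, with each resulting centroid read as a candidate station position. The coordinates are made up, `kmeans` is a hypothetical helper rather than our actual code, and straight-line distance on latitude/longitude is only an approximation that is assumed to be acceptable over an area the size of a city.

```python
import math
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm: return k centroids for 2-D points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                lat = sum(p[0] for p in cluster) / len(cluster)
                lon = sum(p[1] for p in cluster) / len(cluster)
                new_centroids.append((lat, lon))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments no longer change
        centroids = new_centroids
    return centroids


# Made-up incident locations (latitude, longitude) forming two groups.
points = [
    (51.50, -0.10), (51.51, -0.11), (51.52, -0.12),
    (51.60, 0.10), (51.61, 0.11), (51.62, 0.12),
]
stations = kmeans(points, k=2)
print(sorted(stations))
```

Here `k` plays the role of the number of stations that can be built.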
In both cases the data was split into 80% for training and 20% for testing.
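An 80/20 split can be sketched as below. This is an illustrative stand-in with a hypothetical function name and seed; scikit-learn's `train_test_split` with `test_size=0.2` does the same job.

```python
import random


def split_80_20(rows, seed=42):
    """Shuffle a copy of `rows` and split it into 80% training / 20% test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


train_rows, test_rows = split_80_20(range(10))
print(len(train_rows), len(test_rows))  # 8 2
```

Shuffling before cutting matters when the rows are ordered by date, since otherwise the test set would contain only the most recent incidents.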
We used sklearn metrics for measuring the precision and recall.
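For reference, precision and recall for one class can be computed by hand as in the sketch below. The labels are made up and `precision_recall` is a hypothetical helper; in scikit-learn the corresponding functions are `precision_score` and `recall_score` in `sklearn.metrics`.

```python
def precision_recall(y_true, y_pred, positive):
    """Compute precision and recall, treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels, for illustration only.
y_true = ["fire", "fire", "not fire", "not fire", "fire"]
y_pred = ["fire", "not fire", "not fire", "fire", "fire"]

print(precision_recall(y_true, y_pred, positive="fire"))
print(precision_recall(y_true, y_pred, positive="not fire"))  # (0.5, 0.5)
```

Computing both metrics once per class, as done here, shows separately how well fires and non-fires are predicted.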
The results were unexpected for both fire and not fire. We had assumed we would do better at predicting a fire; instead, the model was fairly good at predicting a non-fire incident.
We did not take any measurements here because the goal was just to create clusters.
We have chosen:
The chosen visualizations are quite descriptive and do not require much time to understand.
The machine learning part went pretty well, and once we had a better dataset everything became a lot easier. The collaboration also went well: all team members were experienced with git, so working on the code together went smoothly and without conflicts.
We would have loved to have better, stronger, deeper visualisations of the data, but time was short due to the switch of dataset. It would also be interesting to combine the fire data with data on estates in different neighbourhoods of London and investigate whether there is any correlation between them.