Chen Yang yangcnju@gmail.com
In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import Image
%matplotlib inline
The goal is to build a model that predicts blighted buildings from real data published at data.detroitmi.gov, as provided by Coursera.
Demolishing blighted buildings is an important part of the city's effort to turn around and revive its economy, but it is no easy task. Accurate predictions can point to potentially blighted buildings and help avoid complications at an early stage.
The buildings were defined as described below:
In [3]:
# The resulting buildings:
Image("./data/buildings_distribution.png")
Out[3]:
In the end, the features were incident counts of three kinds (311 calls, blight violations, and crimes) plus normalized coordinates. I also tried generating more features in this notebook by splitting the counts by type of crime or type of violation. However, these finer-grained features led to lower AUC scores.
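As a rough illustration of this feature set, the sketch below builds one row per building with the three incident counts and normalized coordinates. It is not the notebook's actual code: the column names (`building_id`, `kind`, `lat`, `lon`), the helper `build_features`, the assumption that each incident has already been matched to a building, and the choice of min-max scaling for the coordinates are all hypothetical.

```python
import pandas as pd

# Hypothetical labels for the three incident kinds described above.
KINDS = ["311_calls", "blight_violations", "crimes"]

def build_features(buildings, incidents):
    """Return one row per building: incident counts plus scaled coordinates.

    buildings: columns building_id, lat, lon (assumed names)
    incidents: columns building_id, kind (assumed names)
    """
    # Count incidents per building and kind, one column per kind.
    counts = (
        incidents.groupby(["building_id", "kind"])
        .size()
        .unstack(fill_value=0)
        .reindex(columns=KINDS, fill_value=0)
    )
    features = buildings.set_index("building_id").join(counts)
    # Buildings with no incidents at all get a count of zero.
    features[KINDS] = features[KINDS].fillna(0).astype(int)
    # Min-max scaling is an assumption; the source only says "normalized".
    for col in ("lat", "lon"):
        low, high = features[col].min(), features[col].max()
        features[col] = (features[col] - low) / (high - low)
    return features

buildings = pd.DataFrame({
    "building_id": [1, 2, 3],
    "lat": [42.0, 42.5, 43.0],
    "lon": [-83.0, -83.5, -84.0],
})
incidents = pd.DataFrame({
    "building_id": [1, 1, 2, 1],
    "kind": ["crimes", "crimes", "311_calls", "blight_violations"],
})
features = build_features(buildings, incidents)
print(features)
```

Splitting the counts by crime or violation type would amount to using a finer-grained `kind` column in the same pivot, which multiplies the number of count columns.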
In [4]:
Image('./data/train_process.png')
Out[4]:
In [5]:
Image('./data/feature_f_scores.png')
Out[5]:
The location coordinates were the most important features in this model. Although I tried adding more features by splitting the counts into different kinds of crimes or violations, the AUC scores did not improve.
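One way to run this kind of comparison is to score each candidate feature set with cross-validated AUC. The sketch below is illustrative only: it uses synthetic data rather than the Detroit data, scikit-learn's `GradientBoostingClassifier` as a stand-in for whatever boosted-tree model the notebook trained, and a hypothetical helper `mean_cv_auc`. The numbers it prints say nothing about the real results.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

def mean_cv_auc(X, y, folds=5):
    """Mean ROC AUC over stratified cross-validation folds."""
    model = GradientBoostingClassifier(n_estimators=50, random_state=0)
    return cross_val_score(model, X, y, cv=folds, scoring="roc_auc").mean()

# Synthetic stand-in data: the label depends mostly on location.
rng = np.random.default_rng(0)
n = 400
coords = rng.random((n, 2))              # normalized coordinates
counts = rng.poisson(1.0, size=(n, 3))   # three incident counts
y = (coords.sum(axis=1) + 0.2 * rng.normal(size=n) > 1.0).astype(int)

# Extra columns mimic finer-grained counts that carry no signal here.
extra = rng.poisson(0.3, size=(n, 6))

base = np.hstack([coords, counts])
extended = np.hstack([base, extra])

auc_base = mean_cv_auc(base, y)
auc_extended = mean_cv_auc(extended, y)
print(f"base features:     AUC = {auc_base:.3f}")
print(f"extended features: AUC = {auc_extended:.3f}")
```

Scoring both feature sets with the same folds and the same model settings keeps the comparison fair, so a drop in AUC can be attributed to the added features rather than to a different split.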
In [6]:
Image('./data/bst_tree.png')
Out[6]:
In [7]:
Image('./data/ROC_Curve_combined.png')
Out[7]:
Several things worth trying: