Start by importing whatever you need to import in order to make this lab work:
In [ ]:
# .. your code here ..
In the Primary Type column, click on the Menu button next to the info button, and select Filter This Column. It might take a second for the filter option to show up, since it has to load the entire list first. Filter on GAMBLING. Then click the Export button next to the Filter button, and select Download As CSV.
Now that you have the dataset stored as a CSV, load it up, being careful to double-check the headers, as per usual:
In [ ]:
# .. your code here ..
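A minimal sketch of the load step, assuming the export parses with pandas and that its first line holds the column names. The inline sample, its values, and its reduced column set are made up for illustration and stand in for the downloaded file:

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for the exported CSV.
# The real export has more columns and many more rows.
sample_csv = io.StringIO(
    "ID,Date,Primary Type,Latitude,Longitude\n"
    "1,01/15/2010 10:30:00 AM,GAMBLING,41.88,-87.63\n"
    "2,03/02/2012 08:05:00 PM,GAMBLING,41.75,-87.60\n"
    "3,07/21/2013 01:45:00 PM,GAMBLING,,\n"
)

# header=0 treats the first line as column names (the pandas default).
df = pd.read_csv(sample_csv, header=0)

# Inspect the parsed headers and the first rows before going further.
print(df.columns.tolist())
print(df.head())
```

With the real file, the `io.StringIO` object would be replaced by the path to the downloaded CSV.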
Get rid of any rows that have NaNs in them:
In [ ]:
# .. your code here ..
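One way to drop incomplete rows is `DataFrame.dropna`. The sketch below runs on a made-up frame rather than the lab dataset:

```python
import numpy as np
import pandas as pd

# Made-up frame in which the second row is missing its Latitude.
df = pd.DataFrame({
    "Latitude": [41.88, np.nan, 41.75],
    "Longitude": [-87.63, -87.70, -87.60],
})

# how="any" (the default) drops a row if any of its values is NaN.
# reset_index(drop=True) renumbers the surviving rows from 0.
df = df.dropna(axis=0, how="any").reset_index(drop=True)

print(len(df))
```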
Display the dtypes of your dataset:
In [ ]:
# .. your code here ..
Coerce the Date feature (which is currently a string object) into a real date, and confirm by displaying the dtypes again. This might be a slow executing process...
In [ ]:
# .. your code here ..
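A sketch of the conversion with `pd.to_datetime`, shown on two made-up rows. The format string is an assumption about how the export writes timestamps; check it against your own file. Supplying an explicit format usually speeds this step up, because pandas no longer has to infer the layout:

```python
import pandas as pd

# Made-up rows; Date starts out as a string column.
df = pd.DataFrame({"Date": ["01/15/2010 10:30:00 AM", "03/02/2012 08:05:00 PM"]})
print(df.dtypes)

# Assumed timestamp layout: month/day/year, 12-hour clock, AM/PM marker.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y %I:%M:%S %p")

# Date should now be reported as a datetime64 dtype.
print(df.dtypes)
```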
In [ ]:
def doKMeans(df):
    # Let's plot your data with a '.' marker, a 0.3 alpha at the Longitude,
    # and Latitude locations in your dataset. Longitude = x, Latitude = y
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(df.Longitude, df.Latitude, marker='.', alpha=0.3)

    # TODO: Filter `df` using indexing so it only contains Longitude and Latitude,
    # since the remaining columns aren't really applicable for this lab:
    #
    # .. your code here ..

    # TODO: Use K-Means to try and find seven cluster centers in this df.
    # Be sure to name your kmeans model `model` so that the printing works.
    #
    # .. your code here ..

    # Now we can print and plot the centroids:
    centroids = model.cluster_centers_
    print(centroids)
    ax.scatter(centroids[:,0], centroids[:,1], marker='x', c='red', alpha=0.5, linewidths=3, s=169)
    plt.show()
In [ ]:
# Print & Plot your data
doKMeans(df)
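The two TODO steps inside doKMeans could look roughly like the sketch below, which uses scikit-learn's `KMeans` on synthetic coordinates rather than the real crime data. The `Arrest` column, the `coords` name, and the `n_init` and `random_state` values are illustrative choices, not part of the lab:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for the crime data: random points in a small
# longitude/latitude box, plus one extra column to be discarded.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Longitude": rng.uniform(-87.9, -87.5, 200),
    "Latitude": rng.uniform(41.6, 42.0, 200),
    "Arrest": rng.integers(0, 2, 200),
})

# Keep only the coordinates, ordered (Longitude, Latitude) so that
# column 0 of the centroids is x and column 1 is y when plotted.
coords = df[["Longitude", "Latitude"]]

# Seven clusters, as the lab asks; random_state fixes the initialization.
model = KMeans(n_clusters=7, n_init=10, random_state=0)
model.fit(coords)

centroids = model.cluster_centers_
print(centroids.shape)  # (7, 2)
```

The column order matters because doKMeans plots `centroids[:,0]` on the x axis and `centroids[:,1]` on the y axis.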
Filter out the data so that it only contains samples that have a Date > '2011-01-01', using indexing. Then, in a new figure, plot the crime incidents, as well as a new K-Means run's centroids.
In [ ]:
# .. your code here ..
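Boolean indexing is one way to express this filter. The sketch below assumes Date has already been converted to a datetime dtype and uses made-up rows:

```python
import pandas as pd

# Made-up rows with an already-converted Date column.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2010-01-15", "2011-06-01", "2013-07-21"]),
    "Latitude": [41.88, 41.75, 41.90],
    "Longitude": [-87.63, -87.60, -87.70],
})

# The comparison yields a boolean mask; indexing with it keeps the True rows.
cutoff = pd.Timestamp("2011-01-01")
df = df[df["Date"] > cutoff]

print(len(df))  # 2
```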
In [ ]:
# Print & Plot your data
doKMeans(df)