In [22]:
from IPython.display import IFrame
IFrame(
    'https://www.sunfrog.com/Geek-Tech/First-solve-the-problem-Then-write-the-code.html',
    width=800,
    height=350,
)
Out[22]:
This Jupyter notebook is a place to keep my thoughts organized on how to best present Fort Lauderdale Police Department data obtained at the 2016 Fort Lauderdale Civic Hackathon. I blogged about it here.
Just prior to my participation in the Fort Lauderdale Civic Hackathon, I experimented with the MapBox GL JS API. You can see my simple demonstration here, where I created a bunch of fake points and cluster mapped them.
That experiment is what inspired me to suggest to my hackathon partner David Karim that we heat map the data. See that map here.
I know little about databases, though I respect their power to handle data efficiently. I struggle with the cognitive overhead of SQL databases, with their normalized data and join commands.
When I heard that MongoDB organizes its data without requiring normalization, I knew I had to investigate, because the hackathon's CSV file of data was not normalized.
MongoDB's behavior fit my mental model of how handling data easily should work. While I have little experience to compare it with, MongoDB appears to handle over 92,000 citation documents with ease.
A more difficult question: Can I write code to make MongoDB do its thing most efficiently?!
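For what it's worth, loading a denormalized CSV into MongoDB can be about this direct. Below is a minimal pymongo sketch, not the actual hackathon code: the file name 'citations.csv' is an assumption, and each CSV row simply becomes one document.
import csv

from pymongo import MongoClient

# a minimal sketch, not the hackathon code: 'citations.csv' is an assumed
# file name, and each CSV row becomes one MongoDB document as-is
client = MongoClient()
citations = client.app.citations

with open('citations.csv') as csv_file:
    # DictReader yields one dict per row; insert_many accepts the iterable
    citations.insert_many(csv.DictReader(csv_file))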
The MapBox API works well with GeoJSON data. A quick search on Google reveals that MongoDB has built-in support for GeoJSON!
In [23]:
from IPython.display import IFrame
IFrame(
    'https://docs.mongodb.com/manual/reference/geojson/',
    width=800,
    height=350,
)
Out[23]:
Create a new collection called 'citations_geojson'.
Create some new documents in 'citations_geojson' as GeoJSON Point objects.
What a point looks like in MongoDB:
{ type: "Point", coordinates: [ 40, 5 ] }
The PyMongo driver documentation has some info on geospatial indexing that might be relevant.
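Something like the following pymongo sketch is what I have in mind. The 'latitude' and 'longitude' field names on the citation documents are assumptions, and note that MongoDB wants the GeoJSON geometry stored under a document field (here 'location') so that a 2dsphere index can be built on it.
from pymongo import MongoClient

client = MongoClient()
db = client.app

# sketch only: 'latitude'/'longitude' are assumed field names on the
# source documents in the 'citations' collection
for citation in db.citations.find():
    db.citations_geojson.insert_one({
        'location': {
            'type': 'Point',
            # GeoJSON coordinate order is [longitude, latitude]
            'coordinates': [citation['longitude'], citation['latitude']],
        },
    })

# a 2dsphere index lets MongoDB run geospatial queries against the points
db.citations_geojson.create_index([('location', '2dsphere')])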
Can the MapBox API utilize GeoJSON data served up directly from MongoDB?
While the MongoDB objects look similar to the GeoJSON I was used to seeing when building MapBox maps, they do not appear to be exactly the same.
Theory: the MapBox API may be able to handle the data directly from MongoDB. I have to figure out how to make that connection.
GeoJSON-VT blog post from MapBox
❝If you’re using Mapbox GL-based tools (either GL JS or Mapbox Mobile), you’re already using GeoJSON-VT under the hood.❞
So it is possible to feed large amounts of data to a map. That does not answer the question of whether I have to munge the data coming from MongoDB first to make it valid MapBox GeoJSON.
This package from npm definitely looks promising!
❝GeoJSON normalization for mongoDB. Convert an array of documents with geospatial information (2dsphere only) into a GeoJSON feature collection.❞
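That package is written for Node, but the idea translates directly. Here is a rough Python sketch of the same normalization step, reusing the hypothetical 'location' geometry field from the sketch above:
def to_feature_collection(documents, geometry_field='location'):
    """Convert MongoDB documents into a GeoJSON FeatureCollection.

    Sketch only: assumes each document stores its geometry under
    `geometry_field`; every other field becomes a Feature property.
    """
    features = [{
        'type': 'Feature',
        'geometry': document[geometry_field],
        # everything except the geometry and the ObjectId becomes a property
        'properties': {key: value
                       for key, value in document.items()
                       if key not in (geometry_field, '_id')},
    } for document in documents]
    return {'type': 'FeatureCollection', 'features': features}
In principle, to_feature_collection(db.citations_geojson.find()) would then hand MapBox a FeatureCollection it already understands.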
The repository for this data in a Dockerized MongoDB instance is here: dm-wyncode/docker-mongo-flpd-hackathon-data
In [3]:
# a module for importing values so I do not have to expose them in this Jupyter notebook
from meta import dockerized_mongo_path
In [5]:
# the '!' preceding the command allows me to access the shell from the Jupyter notebook
# in which I am writing this blog post
# ./expect-up-daemon calls a /usr/bin/expect script
# in the $dockerized_mongo_path to bring up the dockerized MongoDB
!cd $dockerized_mongo_path && ./expect-up-daemon
Verify that the database is running and responding to the pymongo driver.
In [47]:
from pymongo import MongoClient
client = MongoClient()
db = client.app
collection_names = sorted(db.collection_names())
print(collection_names)
In [51]:
collections = accidents, citations = [db.get_collection(collection)
                                      for collection
                                      in collection_names]
info = [{collection_name: format(collection.count(), ',')}
        for collection_name, collection
        in zip(collection_names, collections)]
print("document counts")
for item in info:
    print(item)
I must have somehow loaded the data into the database twice when I wrote that post, because there the document counts were double the numbers I am getting now. I modified the Docker Compose file so that it uses a local file system directory to store the MongoDB data: see the Docker Compose file here.
After some further investigation, I have written a continuation…