T-shirt inspiration

❝first solve the problem then write the code❞


In [22]:
from IPython.display import IFrame
IFrame(
    'https://www.sunfrog.com/Geek-Tech/First-solve-the-problem-Then-write-the-code.html', 
    width=800, 
    height=350,
)


Out[22]:

Introduction

This Jupyter notebook is a place to keep my thoughts organized on how to best present Fort Lauderdale Police department data obtained at the 2016 Fort Lauderdale Civic Hackathon. I blogged about it here.

Just prior to my participation in the Fort Lauderdale Civic Hackathon, I experimented with a MapBox GL JS API. You can see my simple demonstration here where I created a bunch of fake points and cluster mapped them.

That experiment is what inspired me to suggest to my hackathon partner David Karim that we heat map the data. See that map here.

MongoDB

I know little about databases, though I respect their power to handle data efficiently. I struggle with the cognitive overhead of SQL databases, with their normalized data and join commands.

When I heard that MongoDB organized its data without a requirement of normalization, I knew I had to investigate because that CSV file of data was not normalized.

MongoDB's behavior fit my mental model of how handling data should feel. While I have little experience to compare against, MongoDB appears to handle over 92,000 citation documents with ease.

A more difficult question: Can I write code to make MongoDB do its thing most efficiently?!

Creating geojson data

The MapBox API works well with geojson data. A quick search on Google reveals that MongoDB has built-in support for geojson!


In [23]:
from IPython.display import IFrame
IFrame(
    'https://docs.mongodb.com/manual/reference/geojson/', 
    width=800, 
    height=350,
)


Out[23]:
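As a reminder of what that built-in support looks like: MongoDB stores GeoJSON as an ordinary sub-document with `type` and `coordinates` fields, longitude first. A minimal sketch of what a citation document with an embedded GeoJSON Point might look like; the field names here are my own guesses for illustration, not the actual schema of the hackathon data:

```python
# A hypothetical citation document with an embedded GeoJSON Point.
# MongoDB's convention (following the GeoJSON spec) is coordinates
# in [longitude, latitude] order.
citation = {
    "citation_number": "2016-000001",  # made-up field names for illustration
    "violation": "SPEEDING",
    "location": {
        "type": "Point",
        "coordinates": [-80.1373, 26.1224],  # Fort Lauderdale, lng/lat
    },
}

# With a 2dsphere index on "location", MongoDB can answer geospatial
# queries against this field, e.g. (requires a live connection):
# db.citations.create_index([("location", "2dsphere")])
print(citation["location"]["type"])  # Point
```

The geometry is just a plain dictionary, which is what makes the "is this already valid geojson?" question below worth asking.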

Tasks

question

  • Can the MapBox API utilize geojson data served up directly from MongoDB?

    While the MongoDB objects look similar to the geojson I was used to seeing when building MapBox maps, they do not appear to be exactly the same.

    Theory: a MapBox API may be able to handle the data directly from MongoDB. I have to figure out how to make that connection.

possible sources of answers

  • geojson vt blog post from MapBox

    ❝If you’re using Mapbox GL-based tools (either GL JS or Mapbox Mobile), you’re already using GeoJSON-VT under the hood.❞

    So it is possible to feed large amounts of data to a map. It does not answer the question of whether I have to munge the data coming from MongoDB first to make it valid MapBox geojson.

  • This definitely looks promising from the npm domain!

    ❝GeoJSON normalization for mongoDB. Convert an array of documents with geospatial information (2dsphere only) into a GeoJSON feature collection.❞

Some words I recognize from trying out MapBox: ❝GeoJSON feature collection❞
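That npm package is for Node, but the same normalization is easy to sketch in Python: wrap each document's geometry in a Feature and collect the Features into a FeatureCollection. This assumes the documents keep their geometry in a `location` field, which is my guess, not the actual schema:

```python
def to_feature_collection(documents, geometry_field="location"):
    """Wrap MongoDB documents into a GeoJSON FeatureCollection.

    Each document's geometry sub-document becomes the Feature's
    geometry; the remaining fields become its properties.
    """
    features = []
    for doc in documents:
        props = {k: v for k, v in doc.items()
                 if k not in (geometry_field, "_id")}
        features.append({
            "type": "Feature",
            "geometry": doc[geometry_field],
            "properties": props,
        })
    return {"type": "FeatureCollection", "features": features}

# A made-up document shaped like the ones discussed above:
docs = [{"violation": "SPEEDING",
         "location": {"type": "Point", "coordinates": [-80.14, 26.12]}}]
fc = to_feature_collection(docs)
print(fc["type"])                     # FeatureCollection
print(fc["features"][0]["geometry"])  # the Point passes through unchanged
```

If this is all the munging required, the connection between MongoDB and MapBox may be a very thin layer.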

Create a collection in MongoDB using PyMongo.

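A minimal sketch of how that might go: shape each CSV row into a document with an embedded GeoJSON Point, then insert the documents with PyMongo. The file contents, column names, and helper below are my own assumptions for illustration; only the row-to-document shaping is shown end to end, with the actual insert hedged in a comment since it needs a live database:

```python
import csv
import io

def rows_to_documents(csv_text):
    """Turn CSV rows into MongoDB-ready documents with a GeoJSON Point."""
    docs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Pop the flat lat/lng columns and fold them into a GeoJSON
        # sub-document, longitude first.
        row["location"] = {
            "type": "Point",
            "coordinates": [float(row.pop("lng")), float(row.pop("lat"))],
        }
        docs.append(row)
    return docs

# Made-up sample rows standing in for the hackathon CSV:
sample = "violation,lat,lng\nSPEEDING,26.1224,-80.1373\n"
docs = rows_to_documents(sample)
print(docs[0]["location"]["coordinates"])  # [-80.1373, 26.1224]

# With a running database, the documents would then be loaded with e.g.:
# from pymongo import MongoClient
# MongoClient().app.citations.insert_many(docs)
```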

Start the MongoDB Docker container

The repository for this data in a Dockerized MongoDB instance is here: dm-wyncode/docker-mongo-flpd-hackathon-data


In [3]:
# a module for importing values so I do not have to expose them in this Jupyter notebook
from meta import dockerized_mongo_path

In [5]:
# the '!' preceding the command allows me to access the shell from the Jupyter notebook
# in which I am writing this blog post
# ./expect-up-daemon calls a /usr/bin/expect script 
# in the $dockerized_mongo_path to bring up the dockerized MongoDB
!cd $dockerized_mongo_path && ./expect-up-daemon


spawn sudo docker-compose up -d
[sudo] password for dmmmd: 
dockerflaskmongo_mongodb_1 is up-to-date

Verify that the database is running and responding to the pymongo driver.


In [47]:
from pymongo import MongoClient

client = MongoClient()
db = client.app
collection_names = sorted(db.collection_names())
print(collection_names)


['accidents', 'citations']

In [51]:
collections = accidents, citations = [db.get_collection(collection) 
                                      for collection 
                                      in collection_names]
info = [{collection_name: format(collection.count(), ',')}
        for collection_name, collection 
        in zip(collection_names, collections)]
print("document counts")
for item in info:
    print(item)


document counts
{'accidents': '13,472'}
{'citations': '46,456'}

I must have somehow loaded the data into the database twice when I wrote that post, because the document counts there were double the numbers I am getting now. I modified the Docker Compose file so that it uses a local file-system directory to store the MongoDB data: see the Docker Compose file here.

Time for a break. To be continued…

After some further investigation, I have written a continuation.