OpenStreetMap is an open project: it is free, and anyone can use and edit it. It is a direct competitor of Google Maps. How can OpenStreetMap compete with such a giant? It depends entirely on crowdsourcing. Many people around the world willingly update the map, most of them fixing the map of their own country.
OpenStreetMap is powerful, but it relies heavily on human input, and that strength is also its weakness. Wherever there is human input, there will be human error.
Take street names, for example. People like to abbreviate the street type: 'Street' becomes 'St.' or 'st.'. In Indonesia, 'Jalan' (English: 'Street') is likewise abbreviated as Jln, jln, or jl. Most map users will hardly notice, but a data scientist or web developer expects street names to follow one consistent format.
'Jalan Sudirman' -> Jalan <name> -> name = Sudirman
'Jln Sudirman' -> Jalan <name> -> ERROR!
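The expansion illustrated above can be sketched in a few lines of Python. This is only a minimal illustration, not the code in audit.py: the mapping `STREET_TYPE_FIXES` and the function name `normalize_street_name` are hypothetical, and the sketch assumes the street type is always the first word of the name, as it is for 'Jalan'.

```python
import re

# Hypothetical mapping from abbreviations to the full street type.
# Keys are lowercase, without any trailing dot.
STREET_TYPE_FIXES = {
    'jl': 'Jalan',
    'jln': 'Jalan',
}

# The first word of the name, plus an optional dot and following spaces.
FIRST_WORD_RE = re.compile(r'^\s*(\w+)\.?\s*')


def normalize_street_name(name):
    """Expand a leading abbreviation such as 'Jln' or 'jl.' to 'Jalan'."""
    match = FIRST_WORD_RE.match(name)
    if match is None:
        return name  # empty or non-word start: leave untouched
    full = STREET_TYPE_FIXES.get(match.group(1).lower())
    if full is None:
        return name  # first word is not a known abbreviation
    return (full + ' ' + name[match.end():]).strip()
```

With this sketch, both 'Jln Sudirman' and 'jl. Sudirman' are rewritten with the full word 'Jalan', while a name that already starts with 'Jalan' is returned unchanged.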
This project intends to fix that: it expands abbreviated street names so they follow a more general format. This benefits not only professionals; everyone gets more consistently structured names.
In this project, I want to show how to fix one type of error, the street address. I chose the whole area of Jakarta, the capital of Indonesia. The dataset is huge, with over 250,000 examples. Jakarta is my hometown, and I would like to help the community. I will also show how to load the audited data into a MongoDB instance, and we will use MongoDB's Aggregation Framework to get an overview and analysis of the data.
In [8]:
OSM_FILE = 'jakarta.osm'
In [11]:
%load mapparser.py
To audit the OSM file, we first need an overview of the data. To get one, we count how often each tag occurs in the file.
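As a rough sketch of such a tag count (the contents of mapparser.py are not shown here, so the function name `count_tags` is an assumption), the file can be streamed with `iterparse` and tallied with a `Counter`:

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(source):
    """Return a Counter mapping each XML tag name to how often it occurs.

    `source` may be a filename such as 'jakarta.osm' or a binary file object.
    """
    counts = Counter()
    # iterparse reads the file incrementally instead of parsing it in one go
    for _event, elem in ET.iterparse(source, events=('end',)):
        counts[elem.tag] += 1
        elem.clear()  # drop children and attributes we no longer need
    return counts


# Tiny in-memory example standing in for the real OSM file
sample = io.BytesIO(
    b'<osm><node id="1"><tag k="a" v="b"/></node><node id="2"/><way id="3"/></osm>'
)
sample_counts = count_tags(sample)
```

On the real data this would be called as `count_tags(OSM_FILE)`; the resulting counts for node, way, tag and so on give a first impression of the size and shape of the dataset.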
In [14]:
%load users.py
In [2]:
import xml.etree.ElementTree as ET
In [1]:
%load audit.py
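The audit script itself is not reproduced here. As a hedged sketch of what an audit pass of this kind might look like, the code below rewrites every `addr:street` value in an OSM tree and writes the result to a new file. The names `fix_street`, `audit_tree` and `audit_file` are hypothetical, and the only abbreviations handled are 'Jln' and 'Jl' at the start of the name.

```python
import re
import xml.etree.ElementTree as ET

# Assumed abbreviations of 'Jalan' at the start of a street name
JALAN_ABBREV_RE = re.compile(r'^(jln|jl)\b\.?\s*', re.IGNORECASE)


def fix_street(name):
    """Replace a leading 'Jln'/'Jl' (any case, optional dot) with 'Jalan'."""
    return JALAN_ABBREV_RE.sub('Jalan ', name).strip()


def audit_tree(tree):
    """Fix every addr:street value in place; return how many were changed."""
    changed = 0
    for tag in tree.getroot().iter('tag'):
        value = tag.get('v')
        if tag.get('k') != 'addr:street' or value is None:
            continue
        fixed = fix_street(value)
        if fixed != value:
            tag.set('v', fixed)
            changed += 1
    return changed


def audit_file(in_path, out_path):
    """Parse in_path, fix street names, and write the result to out_path."""
    tree = ET.parse(in_path)  # loads the whole file into memory
    changed = audit_tree(tree)
    tree.write(out_path, encoding='utf-8', xml_declaration=True)
    return changed
```

Under these assumptions, `audit_file('jakarta.osm', 'jakarta_audit.osm')` would produce the audited file. Parsing the whole tree at once keeps the sketch short; for a very large extract, a streaming approach would use less memory.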
This saves the audited Jakarta OSM data to jakarta_audit.osm. Now let's prepare the audited file as input for the MongoDB instance.
In [ ]:
%load submission/data.py
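The conversion script is not shown in full here. As an illustrative sketch only, the function below turns one OSM element into a dictionary of the shape that the later queries rely on (`created.user`, `address.street`, `amenity`, `name`). The function name `shape_element`, the `CREATED_FIELDS` tuple and the exact field layout are assumptions inferred from those queries, not the actual contents of data.py.

```python
import xml.etree.ElementTree as ET

# Attributes grouped under 'created' (assumed from the 'created.user' queries)
CREATED_FIELDS = ('version', 'changeset', 'timestamp', 'user', 'uid')


def shape_element(element):
    """Turn a <node> or <way> element into a dict; return None for other tags."""
    if element.tag not in ('node', 'way'):
        return None
    doc = {'type': element.tag, 'id': element.get('id')}
    created = {k: element.get(k) for k in CREATED_FIELDS
               if element.get(k) is not None}
    if created:
        doc['created'] = created
    address = {}
    for tag in element.iter('tag'):
        key, value = tag.get('k'), tag.get('v')
        if key is None or value is None:
            continue
        if key.startswith('addr:'):
            # 'addr:street' becomes address.street
            address[key[len('addr:'):]] = value
        else:
            # e.g. amenity, name, cuisine, phone; never overwrite type/id
            doc.setdefault(key, value)
    if address:
        doc['address'] = address
    return doc
```

Nesting the `addr:*` tags under a single `address` key is what makes a query such as `{'address.street': {'$exists': 1}}` possible once the documents are in MongoDB.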
The processed map has been saved to jakarta_audit.osm.json. Now that we have processed the audited map file into an array of JSON documents, let's put it into the MongoDB instance. This uses the map that we have audited. First, we load the script used to insert the map.
In [3]:
from data import process_map
import pprint
In [39]:
data = process_map('jakarta_audit.osm')
Okay, let's check that the data looks the way we expect.
In [33]:
pprint.pprint(data[0:6])
The data looks about right. Now that we have verified the data is ready, let's put it into MongoDB.
In [4]:
from pymongo import MongoClient
In [5]:
client = MongoClient('mongodb://localhost:27017')
db = client.examples
In [ ]:
# insert_many sends the documents in bulk (insert() was removed in PyMongo 4)
db.jktosm.insert_many(data)
Okay, it seems that we have successfully inserted all of our data into the MongoDB instance. Let's test this.
In [6]:
# First six documents in the collection.
# aggregate() returns a cursor in PyMongo 3+, so list() materializes it.
pipeline = [
    {'$limit': 6}
]
pprint.pprint(list(db.jktosm.aggregate(pipeline)))
In [7]:
# Five documents that have a street address
pipeline = [
    {'$match': {'address.street': {'$exists': 1}}},
    {'$limit': 5}
]
result = list(db.jktosm.aggregate(pipeline))
pprint.pprint(result)
In [45]:
# Top five contributors by number of documents created
pipeline = [
    {'$match': {'created.user': {'$exists': 1}}},
    {'$group': {'_id': '$created.user',
                'count': {'$sum': 1}}},
    {'$sort': {'count': -1}},
    {'$limit': 5}
]
result = list(db.jktosm.aggregate(pipeline))
pprint.pprint(result)
In [9]:
# Named restaurants with their cuisine and phone number
pipeline = [
    {'$match': {'amenity': 'restaurant',
                'name': {'$exists': 1}}},
    {'$project': {'_id': '$name',
                  'cuisine': '$cuisine',
                  'contact': '$phone'}}
]
result = list(db.jktosm.aggregate(pipeline))
pprint.pprint(result)
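For readers less familiar with aggregation stages, the contributor-count pipeline above ($match, $group, $sort, $limit) computes roughly what the following plain-Python sketch computes on a list of documents. The function name `top_users` is hypothetical; the sketch exists only to explain the stages and is not a replacement for running the query in MongoDB.

```python
from collections import Counter


def top_users(docs, limit=5):
    """Count documents per created.user and return the most frequent users."""
    counts = Counter(
        doc['created']['user']                  # $group on created.user
        for doc in docs
        if 'user' in doc.get('created', {})     # $match: the field must exist
    )
    # most_common sorts by count, descending ($sort), and truncates ($limit)
    return [{'_id': user, 'count': n} for user, n in counts.most_common(limit)]
```

Calling `top_users(data)` on the list returned by `process_map` should give the same kind of ranking as the pipeline, although users with equal counts may appear in a different order.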
The changeset is available at http://osmhv.openstreetmap.de/changeset.jsp?id=26730562