Data used: MapZen Weekly OpenStreetMap Metro Extracts
Map Areas: I selected these two maps because I currently live in Hoodi, Bengaluru, and my dream is to do a master's in robotics in Japan, so I also selected the locality of the University of Tokyo in Bunkyo. I wanted to explore the differences between the two regions.
Working Code:
In [1]:
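def isEnglish(string):
    # True when the string contains only ASCII characters.
    try:
        string.encode('ascii')
    # UnicodeError covers both UnicodeEncodeError and the UnicodeDecodeError
    # that Python 2 raises for non-ASCII byte strings.
    except UnicodeError:
        return False
    else:
        return True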
In [ ]:
# the city below can be hoodi or bunkyo
for st_type, ways in city_types.iteritems():
    for name in ways:
        better_name = update_name(name, mapping)
        if name != better_name:
            print name, "=>", better_name
In [ ]:
#few examples
Bunkyo:
Meidai Jr. High Sch. => Meidai Junior High School
St. Mary's Cathedral => Saint Mary's Cathedral
Shinryukei brdg. E. => Shinryukei Bridge East
Iidabashi Sta. E. => Iidabashi Station East
...
Hoodi:
St. Thomas School => Saint Thomas School
Opp. Jagrithi Apartment => Opposite Jagrithi Apartment
...
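The cleaning cell above calls `update_name` and `mapping`, which are not shown in this write-up. Here is a minimal sketch of what they might look like, assuming a simple token-by-token lookup. The `mapping` entries are reconstructed only from the examples above, so the real table is probably longer.

```python
# Hypothetical mapping, reconstructed only from the example output above.
mapping = {
    "St.": "Saint",
    "Jr.": "Junior",
    "Sch.": "School",
    "Sta.": "Station",
    "brdg.": "Bridge",
    "E.": "East",
    "Opp.": "Opposite",
}


def update_name(name, mapping):
    """Replace each abbreviated token in `name` with its full form."""
    words = name.split()
    return " ".join(mapping.get(word, word) for word in words)
```

A token lookup like this always expands "St." to "Saint", so it would need an extra rule for names where "St." means "Street".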
I need to add a key named "city" to every document so that I can tell the two areas apart in the database.
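One way the "city" key could be added before loading is sketched below. The helper name `tag_city` and the sample documents are made up for illustration; the real documents come from the parsed .osm files.

```python
def tag_city(documents, city):
    """Return copies of the parsed OSM documents with a 'city' key added."""
    tagged = []
    for doc in documents:
        doc = dict(doc)      # shallow copy so the input list is not modified
        doc["city"] = city
        tagged.append(doc)
    return tagged


# Hypothetical stand-ins for the documents parsed from each .osm file.
hoodi_docs = [{"type": "node", "id": "1"}]
bunkyo_docs = [{"type": "way", "id": "2"}]

all_docs = tag_city(hoodi_docs, "hoodi") + tag_city(bunkyo_docs, "bunkyo")
# all_docs could then be loaded into the single "cities" collection.
```

With the key in place, every query can filter on `'city'`, as the counts in the following cells do.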
This section contains basic statistics about the dataset and the MongoDB queries used to gather them.
In [ ]:
bangalore.osm - 40 MB
bangalore.osm.json - 51 MB
tokyo1.osm - 82 MB
tokyo1.osm.json - 102.351 MB
In [6]:
print "Bunkyo:",mongo_db.cities.find({'city':'bunkyo'}).count()
print "Hoodi:",mongo_db.cities.find({'city':'hoodi'}).count()
In [ ]:
print "Bunkyo:",mongo_db.cities.find({"type":"node",
'city':'bunkyo'}).count()
print "Hoodi:",mongo_db.cities.find({"type":"node",
'city':'hoodi'}).count()
In [1]:
Bunkyo: 1051170
Hoodi: 548862
In [1]:
print "Bunkyo:",mongo_db.cities.find({'type':'way',
'city':'bunkyo'}).count()
print "Hoodi:",mongo_db.cities.find({'type':'way',
'city':'hoodi'}).count()
In [1]:
Bunkyo: 217122
Hoodi: 118980
In [1]:
print "Contributors:", len(mongo_db.cities.distinct("created.user"))
In [1]:
Contributors: 858
In [ ]:
def pipeline(city):
    p = [{"$match": {"created.user": {"$exists": 1},
                     "city": city}},
         {"$group": {"_id": {"City": "$city",
                             "User": "$created.user"},
                     "contribution": {"$sum": 1}}},
         {"$project": {'_id': 0,
                       "City": "$_id.City",
                       "User_Name": "$_id.User",
                       "Total_contribution": "$contribution"}},
         {"$sort": {"Total_contribution": -1}},
         {"$limit": 5}]
    return p

result1 = mongo_db["cities"].aggregate(pipeline('bunkyo'))
for each in result1:
    print(each)
print("\n")
result2 = mongo_db["cities"].aggregate(pipeline('hoodi'))
for each in result2:
    print(each)
The top contributors for Hoodi come nowhere near those for Bunkyo. Bunkyo is a more compact region than Hoodi, so there are more places to contribute.
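To make the stages of the contributor pipeline easier to follow, here is a plain-Python sketch of the same match, group, sort and limit steps. It is only an illustration of the logic, not the code used for the results; the sample documents and user names are made up.

```python
from collections import Counter


def top_contributors(documents, city, limit=5):
    """Plain-Python equivalent of the $match / $group / $sort / $limit stages."""
    counts = Counter(
        doc["created"]["user"]
        for doc in documents
        if doc.get("city") == city and "user" in doc.get("created", {})
    )
    return [
        {"City": city, "User_Name": user, "Total_contribution": n}
        for user, n in counts.most_common(limit)
    ]


# Made-up sample documents, only to show the shape of the result.
sample = [
    {"city": "bunkyo", "created": {"user": "alice"}},
    {"city": "bunkyo", "created": {"user": "alice"}},
    {"city": "bunkyo", "created": {"user": "bob"}},
    {"city": "hoodi", "created": {"user": "carol"}},
]
print(top_contributors(sample, "bunkyo"))
```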
In [ ]:
pipeline=[{"$match":{"Additional Information.amenity":{"$exists":1},
"city":city}},
{"$group": {"_id": {"City":"$city",
"Amenity":"$Additional Information.amenity"},
"count": {"$sum": 1}}},
{"$project": {'_id':0,
"City":"$_id.City",
"Amenity":"$_id.Amenity",
"Count":"$count"}},
{"$sort": {"Count": -1}},
{"$limit" : 10 }]
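The cell above uses `city` without defining it, and the same five-stage shape comes back in the later queries. A small builder function could remove the repetition. This is a sketch under that assumption; the name `top_values_pipeline` and its parameters are not part of the original notebook.

```python
def top_values_pipeline(city, field, label, limit=10, extra_match=None):
    """Build a 'top N values of a field for one city' aggregation pipeline.

    `field` is a dotted path such as "Additional Information.amenity";
    `label` is the output column name; `extra_match` adds more filters.
    """
    match = {field: {"$exists": 1}, "city": city}
    if extra_match:
        match.update(extra_match)
    return [
        {"$match": match},
        {"$group": {"_id": {"City": "$city", label: "$" + field},
                    "count": {"$sum": 1}}},
        {"$project": {"_id": 0,
                      "City": "$_id.City",
                      label: "$_id." + label,
                      "Count": "$count"}},
        {"$sort": {"Count": -1}},
        {"$limit": limit},
    ]


amenity_pipeline = top_values_pipeline(
    "bunkyo", "Additional Information.amenity", "Amenity")
```

The returned list can be passed to `mongo_db["cities"].aggregate(...)` in the same way as the contributor pipeline.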
In [ ]:
p = [{"$match":{"Additional Information.amenity":"place_of_worship",
"city":city}},
{"$group":{"_id": {"City":"$city",
"Religion":"$Additional Information.religion"},
"count":{"$sum":1}}},
{"$project":{"_id":0,
"City":"$_id.City",
"Religion":"$_id.Religion",
"Count":"$count"}},
{"$sort":{"Count":-1}},
{"$limit":6}]
As expected, Buddhism is the most common religion among places of worship in Bunkyo.
India is a secular country, so Hoodi has places of worship for most religions, with Hinduism in the majority.
In [ ]:
p = [{"$match":{"Additional Information.amenity":"restaurant",
"city":city}},
{"$group":{"_id":{"City":"$city",
"Food":"$Additional Information.cuisine"},
"count":{"$sum":1}}},
{"$project":{"_id":0,
"City":"$_id.City",
"Food":"$_id.Food",
"Count":"$count"}},
{"$sort":{"Count":-1}},
{"$limit":6}]
{u'Count': 582, u'City': u'bunkyo'}
{u'Food': u'japanese', u'City': u'bunkyo', u'Count': 192}
{u'Food': u'chinese', u'City': u'bunkyo', u'Count': 126}
{u'Food': u'italian', u'City': u'bunkyo', u'Count': 69}
{u'Food': u'indian', u'City': u'bunkyo', u'Count': 63}
{u'Food': u'sushi', u'City': u'bunkyo', u'Count': 63}
{u'Count': 213, u'City': u'hoodi'}
{u'Food': u'regional', u'City': u'hoodi', u'Count': 75}
{u'Food': u'indian', u'City': u'hoodi', u'Count': 69}
{u'Food': u'chinese', u'City': u'hoodi', u'Count': 36}
{u'Food': u'international', u'City': u'hoodi', u'Count': 24}
{u'Food': u'Andhra', u'City': u'hoodi', u'Count': 21}
Indian cuisine seems well represented in Bunkyo (63 restaurants), which will be welcome if I go to Japan for my higher studies.
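The first row for each city in the restaurant output has no `Food` key, which presumably means those restaurants carry no cuisine tag. A short sketch of separating them from the tagged rows, using the Bunkyo numbers from the output above:

```python
# Rows copied from the Bunkyo restaurant output above.
bunkyo_rows = [
    {"Count": 582, "City": "bunkyo"},
    {"Food": "japanese", "City": "bunkyo", "Count": 192},
    {"Food": "chinese", "City": "bunkyo", "Count": 126},
    {"Food": "italian", "City": "bunkyo", "Count": 69},
    {"Food": "indian", "City": "bunkyo", "Count": 63},
    {"Food": "sushi", "City": "bunkyo", "Count": 63},
]

# Separate restaurants with no cuisine tag from the tagged ones.
untagged = sum(row["Count"] for row in bunkyo_rows if "Food" not in row)
tagged = [row for row in bunkyo_rows if "Food" in row]

print("untagged restaurants:", untagged)
for row in tagged:
    print(row["Food"], row["Count"])
```

The same rows could be left out at query time by adding `"Additional Information.cuisine": {"$exists": 1}` to the `$match` stage.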
In [ ]:
p = [{"$match":{"Additional Information.amenity":"fast_food",
"city":city}},
{"$group":{"_id":{"City":"$city",
"Food":"$Additional Information.cuisine"},
"count":{"$sum":1}}},
{"$project":{"_id":0,
"City":"$_id.City",
"Food":"$_id.Food",
"Count":"$count"}},
{"$sort":{"Count":-1}},
{"$limit":6}]
Burgers seem to be the most popular fast food in Bunkyo; I was expecting ramen to be more popular.
In Hoodi, pizza is really common, which fits a metropolitan city.
In [ ]:
p = [{"$match":{"Additional Information.amenity":"atm",
"city":city}},
{"$group":{"_id":{"City":"$city",
"Name":"$Additional Information.name:en"},
"count":{"$sum":1}}},
{"$project":{"_id":0,
"City":"$_id.City",
"Name":"$_id.Name",
"Count":"$count"}},
{"$sort":{"Count":-1}},
{"$limit":4}]
Bunkyo has considerably more ATMs than Hoodi.
In [ ]:
## Martial arts or Dojo Center near locality
import re
pat = re.compile(r'dojo', re.I)
d = mongo_db.cities.aggregate([
    {"$match": {"$or": [{"Additional Information.name": {'$regex': pat}},
                        {"Additional Information.amenity": {'$regex': pat}}]}},
    {"$group": {"_id": {"City": "$city",
                        "Sport": "$Additional Information.name"}}}])
for each in d:
    print(each)
In [ ]:
bunkyo:
{u'_id': {u'City': u'bunkyo', u'Sport': u'Aikikai Hombu Dojo'}}
{u'_id': {u'City': u'bunkyo', u'Sport': u'Kodokan Dojo'}}
hoodi:
{u'_id': {u'City': u'hoodi', u'Sport': u"M S Gurukkal's Kalari Academy"}}
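The Hoodi result does not have "dojo" in its name, so it was presumably matched through the amenity branch of the `$or`. A plain-Python sketch of that matching logic follows; the names are taken from the output above and from the supermarket results, but the amenity values are assumptions made for illustration.

```python
import re

pat = re.compile(r'dojo', re.I)


def mentions_dojo(doc):
    """Mirror the $or stage: match 'dojo' in the name or in the amenity tag."""
    info = doc.get("Additional Information", {})
    return any(pat.search(info.get(key, "")) for key in ("name", "amenity"))


# Names come from the notebook output; the amenity values are assumptions.
docs = [
    {"city": "bunkyo", "Additional Information": {"name": "Kodokan Dojo"}},
    {"city": "hoodi", "Additional Information": {
        "name": "M S Gurukkal's Kalari Academy", "amenity": "dojo"}},
    {"city": "hoodi", "Additional Information": {"name": "Nilgiri's"}},
]
matches = [d["Additional Information"]["name"] for d in docs if mentions_dojo(d)]
```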
I want to learn martial arts. Japan is known for aikido, ninjutsu and other martial arts, and I can find some dojos in Bunkyo, whereas in Hoodi, India, there is Kalaripayattu, one of the most ancient martial arts in existence.
In [ ]:
p = [{"$match":{"Additional Information.shop":{"$exists":1},
"city":city}},
{"$group":{"_id":{"City":"$city",
"Shop":"$Additional Information.shop"},
"count":{"$sum":1}}},
{"$project": {'_id':0,
"City":"$_id.City",
"Shop":"$_id.Shop",
"Count":"$count"}},
{"$sort":{"Count":-1}},
{"$limit":10}]
In [ ]:
{u'Shop': u'convenience', u'City': u'bunkyo', u'Count': 1035}
{u'Shop': u'clothes', u'City': u'bunkyo', u'Count': 282}
{u'Shop': u'books', u'City': u'bunkyo', u'Count': 225}
{u'Shop': u'mobile_phone', u'City': u'bunkyo', u'Count': 186}
{u'Shop': u'confectionery', u'City': u'bunkyo', u'Count': 156}
{u'Shop': u'supermarket', u'City': u'bunkyo', u'Count': 150}
{u'Shop': u'computer', u'City': u'bunkyo', u'Count': 126}
{u'Shop': u'hairdresser', u'City': u'bunkyo', u'Count': 90}
{u'Shop': u'electronics', u'City': u'bunkyo', u'Count': 90}
{u'Shop': u'anime', u'City': u'bunkyo', u'Count': 90}
{u'Shop': u'clothes', u'City': u'hoodi', u'Count': 342}
{u'Shop': u'supermarket', u'City': u'hoodi', u'Count': 129}
{u'Shop': u'bakery', u'City': u'hoodi', u'Count': 120}
{u'Shop': u'shoes', u'City': u'hoodi', u'Count': 72}
{u'Shop': u'furniture', u'City': u'hoodi', u'Count': 72}
{u'Shop': u'sports', u'City': u'hoodi', u'Count': 66}
{u'Shop': u'electronics', u'City': u'hoodi', u'Count': 60}
{u'Shop': u'beauty', u'City': u'hoodi', u'Count': 54}
{u'Shop': u'car', u'City': u'hoodi', u'Count': 36}
{u'Shop': u'convenience', u'City': u'hoodi', u'Count': 36}
In [ ]:
General stores are quite common in both places.
In [ ]:
p = [{"$match":{"city":city,
"Additional Information.shop":"supermarket"}},
{"$group":{"_id":{"City":"$city",
"Supermarket":"$Additional Information.name"},
"count":{"$sum":1}}},
{"$project": {'_id':0,
"City":"$_id.City",
"Supermarket":"$_id.Supermarket",
"Count":"$count"}},
{"$sort":{"Count":-1}},
{"$limit":5}]
In [ ]:
{u'Count': 120, u'City': u'bunkyo'}
{u'Count': 9, u'City': u'bunkyo', u'Supermarket': u'Maruetsu'}
{u'Count': 3, u'City': u'bunkyo', u'Supermarket': u"Y's Mart"}
{u'Count': 3, u'City': u'bunkyo', u'Supermarket': u'SainE'}
{u'Count': 3, u'City': u'bunkyo', u'Supermarket': u'DAIMARU Peacock'}
{u'Count': 9, u'City': u'hoodi', u'Supermarket': u'Reliance Fresh'}
{u'Count': 9, u'City': u'hoodi'}
{u'Count': 6, u'City': u'hoodi', u'Supermarket': u"Nilgiri's"}
{u'Count': 3, u'City': u'hoodi', u'Supermarket': u'Royal Mart Supermarket'}
{u'Count': 3, u'City': u'hoodi', u'Supermarket': u'Safal'}
These are a few common supermarket brands in both cities, and Nilgiri's is about 500 meters from my home.
After this investigation of the data, I think I have become familiar with the Bunkyo region.
I was expecting difficulty in merging both cities' data into a single database, but it seems a simple key like city is enough to differentiate them.
More robust cleaning algorithms could yield a better, cleaner database, since most of the data that goes into OpenStreetMap.org comes from GPS and needs to be cleaned regularly.
From the comparison, the two cities are quite similar, and the Bunkyo region interests me even more as a place to pursue higher studies.