Google Maps bicycling directions are "in beta", i.e. they aren't very good, at least compared to driving directions.
I hope to improve one aspect of these directions, the time estimates, by comparing ride durations from bike share data to the Google estimates for the same endpoints.
Here are the bike share data sources I have found:
Data from Google Maps is probably best accessed through the Distance Matrix API (https://developers.google.com/maps/documentation/distance-matrix/). The big limitation is that I can only access data for 2500 trips per day without paying. With this limitation, it will take me a month or so just to get data for each system.
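To make the quota arithmetic and the comparison concrete, here is a minimal sketch of how one bicycling query could be built and how the Google estimate could be read back. The helper names (`build_request_url`, `google_duration_seconds`, `days_to_fetch`) and the sample values are my own placeholders, and I am assuming one request per bike share trip; nothing here is sent over the network.

```python
import math
from urllib.parse import urlencode

# JSON endpoint of the Distance Matrix API.
BASE_URL = "https://maps.googleapis.com/maps/api/distancematrix/json"
FREE_TRIPS_PER_DAY = 2500  # free daily limit quoted above


def build_request_url(origin, destination, api_key):
    """Build a bicycling request for one (lat, lng) origin/destination pair."""
    params = {
        "origins": f"{origin[0]},{origin[1]}",
        "destinations": f"{destination[0]},{destination[1]}",
        "mode": "bicycling",
        "key": api_key,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def google_duration_seconds(response):
    """Return the estimated duration from a parsed JSON response, or None."""
    element = response["rows"][0]["elements"][0]
    if element.get("status") != "OK":
        return None
    return element["duration"]["value"]


def days_to_fetch(n_trips, per_day=FREE_TRIPS_PER_DAY):
    """Days needed to query n_trips endpoint pairs within the free limit."""
    return math.ceil(n_trips / per_day)
```

For example, `days_to_fetch(75000)` gives 30, which is the "month or so" scale mentioned above. Dividing an observed bike share ride duration by `google_duration_seconds(...)` for the same endpoints would give a per-trip ratio to study.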
A friend of mine made an iPhone app that shows the connections via movies between actors, directors, and films. Entering an actor, for example, gives a list of the movies that actor has been in and the directors they have worked with, while entering a director gives the movies the director has directed and the actors they have worked with. The major issue is this: IMDB does not allow commercial use of their data, and this app is on the Apple iTunes Store. Instead, the data is based on looking at the Wikipedia pages of the actors, movies, and directors, and finding all the links to actors or directors on each page. This means that there are a large number of incorrect data points, most significantly false positives for connections.
Using the text of the Wikipedia page and possibly IMDB data (http://www.imdb.com/interfaces) for some supervision of learning, I hope to come up with an algorithm that gives a probability of a false positive for a connection.
In order to commercialize this product, I must be able to build an algorithm that doesn't directly use IMDB data, i.e. I can't deliver a predictor that was trained directly on IMDB data.
Instead, I hope to use IMDB data to learn what is necessary to write a basic algorithm that will be commercially acceptable.
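One shape such a basic algorithm could take is a hand-written score over features of the page text. The sketch below is only an illustration under my own assumptions: the two features (mention count and position of the first mention) and the weights are placeholders that I would choose by hand after comparing against IMDB, not values fitted to IMDB data.

```python
import math
import re


def link_features(page_text, linked_name):
    """Two toy features: how often the name appears, and how early."""
    mentions = len(
        re.findall(re.escape(linked_name), page_text, flags=re.IGNORECASE)
    )
    first = page_text.lower().find(linked_name.lower())
    # Position of the first mention as a fraction of the page length;
    # 1.0 if the name never appears in the text.
    position = first / max(len(page_text), 1) if first >= 0 else 1.0
    return mentions, position


def false_positive_probability(page_text, linked_name,
                               weights=(1.0, -1.5, 2.0)):
    """Logistic score; weights are (bias, per mention, position) placeholders."""
    bias, w_mentions, w_position = weights
    mentions, position = link_features(page_text, linked_name)
    z = bias + w_mentions * mentions + w_position * position
    return 1.0 / (1.0 + math.exp(-z))
```

With these placeholder weights, a name that is linked but rarely mentioned in the page text scores closer to 1 (likely a false positive), while a name mentioned repeatedly and early scores closer to 0.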
The bad news about data access is that the APIs for accessing Wikipedia data have limitations on access, so it will take about two weeks to get the text information.
Using demographic data, such as:
I hope to be able to predict the number of taxi trip drop-offs in a given area. Taxi trip data is available from NYC government websites: http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml or (new data only) https://data.cityofnewyork.us/data?agency=Taxi+and+Limousine+Commission+%28TLC%29&cat=&type=new_view&browseSearch=&scope=
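The quantity to predict, drop-offs per area, first has to be computed from the raw trip records. Here is a minimal sketch of that aggregation, assuming each record carries drop-off coordinates under the keys `dropoff_latitude` and `dropoff_longitude` (the actual column names may differ) and using a simple latitude/longitude grid as a stand-in for whatever area definition the demographic data uses.

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_deg=0.01):
    """Map a coordinate to an integer (row, col) cell of cell_deg degrees."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def dropoffs_per_cell(trips, cell_deg=0.01):
    """Count drop-offs per grid cell; each trip is a dict of assumed keys."""
    counts = Counter()
    for trip in trips:
        cell = grid_cell(
            trip["dropoff_latitude"], trip["dropoff_longitude"], cell_deg
        )
        counts[cell] += 1
    return counts
```

The resulting counts would be the target values, with the demographic features for each area as the inputs.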