There are a number of services out there that make Machine Learning accessible to the masses by abstracting away the complexities of creating predictive models from data. Here I want to show you how to use one of them, BigML, through its API, to create a real estate pricing model.
The idea is that you're given the characteristics of a real estate property (e.g. number of bedrooms, surface area, year of construction, etc.) and you input these into a "model" that predicts the property's value. To create this model, we just need some example real estate data, which I've scraped from realtor.com using Import.io. The data contains 4776 rows (one per example property) and is available to download as a CSV file or to browse on Google Spreadsheets.
In the following, we'll see how to upload the data to BigML, which will automatically create a predictive model, and how to query this model with any given set of real estate property characteristics. Check out this blog post if you want to understand what happens behind the scenes, how Machine Learning works, and when it fails to work (http://louisdorard.com/blog/when-machine-learning-fails).
The following is an IPython notebook to show you how to use the BigML API to...
IPython notebooks act as interactive web-based code tutorials. They are web pages containing blocks of code that you can edit and run. The code is run on the same server that serves the page, and the output is displayed on the page. You'll be able to edit and run the blocks of code below by positioning your cursor inside them and pressing Shift+Enter.
First of all, you should create a free BigML account at https://bigml.com/accounts/register/ (it takes 2 minutes, literally).
Authentication is performed using your BigML username and API key, which can be found at https://bigml.com/account/apikey
In [ ]:
BIGML_USERNAME = '' # fill in your username between the quotes
BIGML_API_KEY = '' # fill in your API key
BIGML_AUTH = 'username=' + BIGML_USERNAME + ';api_key=' + BIGML_API_KEY # leave as it is
print("Authentication variables set!")
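If you would rather not type your API key into the notebook, the same two values can be read from shell environment variables. The snippet below is a minimal sketch of that idea; the helper names `load_credentials` and `build_auth` are hypothetical (they are not part of the BigML bindings), and it assumes the variables are called BIGML_USERNAME and BIGML_API_KEY.

```python
import os

def load_credentials(env=os.environ):
    """Return (username, api_key) read from an environment-like mapping."""
    username = env.get("BIGML_USERNAME", "")
    api_key = env.get("BIGML_API_KEY", "")
    if not username or not api_key:
        raise ValueError("Set BIGML_USERNAME and BIGML_API_KEY first")
    return username, api_key

def build_auth(username, api_key):
    """Build the same 'username=...;api_key=...' string as the cell above."""
    return "username=" + username + ";api_key=" + api_key

# Example usage (only works once both variables are set in your shell):
# BIGML_USERNAME, BIGML_API_KEY = load_credentials()
# BIGML_AUTH = build_auth(BIGML_USERNAME, BIGML_API_KEY)
```

Failing early with an explicit error message makes a missing variable easier to spot than a rejected API call later on.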
In [ ]:
# Uncomment the line below if the import fails because the 'bigml' package is missing
#!pip install bigml
from bigml.api import BigML
# BIGML_USERNAME and BIGML_API_KEY were set in the previous cell, so we pass them explicitly.
# If they are defined as shell environment variables instead, api = BigML(dev_mode=True) is enough.
api = BigML(BIGML_USERNAME, BIGML_API_KEY, dev_mode=True) # use BigML in development mode for unlimited usage
print("Wrapper ready to use!")
In [ ]:
source = api.create_source('s3://bml-data/realtor-las-vegas.csv', {"name": "Realtor LV"})
API calls are asynchronous, so we use api.ok to make sure that the request has finished before we move on to the rest.
In [ ]:
api.ok(source) # shows "True" when source has been created
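To make "waiting for an asynchronous request" more concrete, here is a minimal sketch of the general polling pattern: ask for the status repeatedly until the resource is ready or has failed. This is only an illustration of the idea, not how api.ok is implemented; `wait_until_ready`, its parameters and the status names are all made up for the example.

```python
import time

def wait_until_ready(get_status, finished="finished", failed="faulty",
                     interval=0.5, max_checks=20):
    """Call get_status() until it reports success or failure.

    Returns True if the 'finished' status was seen, False if the
    'failed' status was seen or if we ran out of checks.
    """
    for _ in range(max_checks):
        status = get_status()
        if status == finished:
            return True
        if status == failed:
            return False
        time.sleep(interval)  # wait a little before asking again
    return False

# Simulated resource that becomes ready on the third check.
_states = iter(["queued", "in progress", "finished"])
print(wait_until_ready(lambda: next(_states), interval=0))
```

Capping the number of checks keeps the notebook from hanging forever if a request never completes.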
The source can be found on the BigML.com web interface at the following URL:
In [ ]:
# BIGML_AUTH was defined in the first cell
print("https://bigml.com/dashboard/" + str(source['resource']) + "?" + BIGML_AUTH)
Open the link in a new tab. If it doesn't work, check that you're logged in on the BigML.com web interface and make sure that the toggle on the right is at "development" (and not "production").
We now create a dataset.
In [ ]:
dataset = api.create_dataset(source, {"name": "Realtor LV dataset"})
api.ok(dataset)
print("Dataset ready and available at https://bigml.com/dashboard/" + str(dataset['resource']) + "?" + BIGML_AUTH)
If you click on the link printed above, it will take you to a histogram view of the data on the BigML dashboard.
In [ ]:
model = api.create_model(dataset)
print("'model' object created!")
BigML uses decision tree models. The tree that's been learnt from your data can be seen at:
In [ ]:
api.ok(model) # making sure the model is ready
print("Model ready and available at https://bigml.com/dashboard/" + str(model['resource']) + "?" + BIGML_AUTH)
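To give an intuition of how a decision tree turns an input into a value, here is a toy sketch in plain Python. The tree below is written by hand: its thresholds and prices are invented for illustration and were not learnt from the realtor data, and the dictionary layout is my own, not BigML's model format.

```python
# A hand-written toy tree; thresholds and prices are invented for illustration.
toy_tree = {
    "field": "size_sqft", "threshold": 2000,
    "if_below": {
        "field": "bedrooms", "threshold": 3,
        "if_below": {"value": 120000},
        "otherwise": {"value": 180000},
    },
    "otherwise": {"value": 300000},
}

def predict(tree, inputs):
    """Walk from the root to a leaf, answering one question per node."""
    node = tree
    while "value" not in node:
        if inputs[node["field"]] < node["threshold"]:
            node = node["if_below"]
        else:
            node = node["otherwise"]
    return node["value"]

print(predict(toy_tree, {"bedrooms": 4, "size_sqft": 1500}))
```

Each internal node asks a single question about one field, and the leaf you end up in holds the predicted value. A tree learnt from data works the same way, only with many more nodes.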
Let's say we want to predict the value (in USD) of a real estate property characterized by the following attributes (go ahead and edit the values if you want):
In [ ]:
# the strings below correspond to headers of the realtor-las-vegas.csv file we used to create the model
new_input = {"bedrooms": 4, "full_bathrooms": 2, "type": "Single Family Home", "size_sqft": 1500}
print("'new_input' object created!")
Let's make a prediction for this new input against the model we created:
In [ ]:
prediction = api.create_prediction(model, new_input)
print("Prediction: " + str(prediction['object']['output']))
Here's the same thing in a single line:
In [ ]:
print("Value: " + str(api.create_prediction(model, {"bedrooms": 4, "full_bathrooms": 2, "type": "Single Family Home", "size_sqft": 1500})['object']['output']) + " USD")
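If you want to compare several properties, the same call can be wrapped in a small loop. This is a sketch only: `predict_many` is a hypothetical helper of mine, and it assumes that `api` and `model` are the objects created in the cells above and that each prediction exposes its value under ['object']['output'], as in the previous cells.

```python
def predict_many(api, model, inputs):
    """Return one predicted value per input dict, in the same order."""
    values = []
    for new_input in inputs:
        # One API request per property
        prediction = api.create_prediction(model, new_input)
        values.append(prediction['object']['output'])
    return values

# Example usage, assuming 'api' and 'model' exist from the cells above:
# candidates = [
#     {"bedrooms": 3, "full_bathrooms": 2, "type": "Single Family Home", "size_sqft": 1200},
#     {"bedrooms": 4, "full_bathrooms": 2, "type": "Single Family Home", "size_sqft": 1500},
# ]
# print(predict_many(api, model, candidates))
```

Passing `api` and `model` as arguments keeps the helper independent of the notebook's global variables, so it is easy to reuse elsewhere.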
This was just an overview of the basics of Machine Learning and of BigML's core functionalities. Check out Bootstrapping Machine Learning (http://louisdorard.com/machine-learning-book) to learn more about Prediction APIs, how to apply ML to your domain, how to prepare your data CSV file and how to integrate predictions in your app or in your business.