5 minutes to creating your first Machine Learning model

There are a number of services out there that make Machine Learning accessible to the masses by abstracting away the complexities of creating predictive models from data. Here I want to show you how to use one of them, BigML, through its API, in order to create a real estate pricing model.

The idea is that you're given the characteristics of a real estate property (e.g. number of bedrooms, surface area, year of construction, etc.) and you feed these into a "model" that predicts the property's value. To create this model, we'll just need some example real-estate data, which I scraped from realtor.com using Import.io. The data contains 4,776 rows (one per example property) and is available to download as a CSV file or to browse on Google Spreadsheets.
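To give you an idea of the data's shape, here's a hedged sketch of what a couple of rows might look like when parsed with Python's csv module. The column names match the ones we'll use for predictions further down; the two rows and the price column header are invented for illustration, not taken from the real file:

```python
import csv, io

# Two invented example rows; the real file has 4,776 of them
sample_csv = """bedrooms,full_bathrooms,type,size_sqft,price
4,2,Single Family Home,1500,250000
3,2,Condo,1100,180000
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))
print(rows[0]["type"])  # → Single Family Home
```

Each row describes one property; the last column is the value we want the model to learn to predict.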

In the following, we'll see how to upload the data to BigML, which will automatically create a predictive model, and how to query this model with any given set of real estate property characteristics. Check out this blog post if you want to understand what happens behind the scenes, how Machine Learning works, and when it fails to work (http://louisdorard.com/blog/when-machine-learning-fails).

This page is interactive

The following is an IPython notebook to show you how to use the BigML API to...

  1. create a model from data
  2. make predictions with this model.

IPython notebooks act as interactive web-based code tutorials. They are web pages containing blocks of code that you can edit and run. The code is run on the same server that serves the page, and the output is displayed on the page. You'll be able to edit and run the blocks of code below by positioning your cursor inside them and pressing Shift+Enter.

0. Initialize the BigML API

First of all, you should create a free BigML account at https://bigml.com/accounts/register/ (it takes 2 minutes, literally).

Authentication variables

Authentication is performed using your BigML username and API key, which can be found at https://bigml.com/account/apikey.


In [ ]:
BIGML_USERNAME = '' # fill in your username between the quotes
BIGML_API_KEY = '' # fill in your API key
BIGML_AUTH = 'username=' + BIGML_USERNAME + ';api_key=' + BIGML_API_KEY # leave as is
print("Authentication variables set!")
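Alternatively, rather than hardcoding credentials in the notebook, you could read them from environment variables. This is a sketch that assumes you've already exported BIGML_USERNAME and BIGML_API_KEY in your shell:

```python
import os

# Read credentials from the environment instead of hardcoding them
BIGML_USERNAME = os.environ.get('BIGML_USERNAME', '')
BIGML_API_KEY = os.environ.get('BIGML_API_KEY', '')
BIGML_AUTH = 'username=' + BIGML_USERNAME + ';api_key=' + BIGML_API_KEY
print("Authentication variables set!")
```

Keeping credentials out of the notebook makes it safer to share.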

API wrapper

We create an api object which will be used to communicate with the BigML API.

Note that BigML can run in one of two modes: production or development. Here we use the latter, since it's free!


In [ ]:
# Uncomment the line below in case the import fails
# (it installs the BigML Python bindings from within the notebook)
# !pip install bigml

from bigml.api import BigML

# Assuming you installed the BigML Python wrappers (with the 'pip install bigml' command, see above)
# Assuming BIGML_USERNAME and BIGML_API_KEY were defined as shell environment variables
# otherwise: api=BigML('your username here','your API key here',dev_mode=True)

api=BigML(dev_mode=True) # use BigML in development mode for unlimited usage
print("Wrapper ready to use!")

1. Create a predictive model

Specify training data to use

BigML makes a distinction between the origin of the data (the "source") and the actual data that's being used for training (the "dataset"). We first create a data source by specifying a csv file to use (hosted on Amazon S3 in this example).


In [ ]:
source = api.create_source('s3://bml-data/realtor-las-vegas.csv', {"name": "Realtor LV"})

API calls are asynchronous, so we use api.ok to make sure that the request has finished before moving on.


In [ ]:
api.ok(source) # shows "True" when source has been created
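Under the hood, this kind of call just polls the resource's status until it reaches BigML's FINISHED status code (5). Here's a conceptual sketch of that pattern, using a stub in place of a real API call (the function and variable names are mine, not the wrapper's):

```python
import time

FINISHED = 5  # BigML's status code for a fully created resource

def wait_until_ready(get_status, interval=0.0, max_tries=10):
    """Poll get_status() until it returns FINISHED (or give up)."""
    for _ in range(max_tries):
        if get_status() == FINISHED:
            return True
        time.sleep(interval)
    return False

# Simulated resource that becomes ready on the third poll
statuses = iter([1, 1, 5])
print(wait_until_ready(lambda: next(statuses)))  # → True
```

The real api.ok does the polling (and error handling) for you, so you rarely need to write this loop yourself.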

The source can be found on the BigML.com web interface at the following URL:


In [ ]:
BIGML_AUTH = %env BIGML_AUTH
print("https://bigml.com/dashboard/" + str(source['resource']) + "?" + BIGML_AUTH)

Open the link in a new tab. If it doesn't work, check that you're logged in on the BigML.com web interface and make sure the toggle on the right is set to "development" (and not "production").

We now create a dataset.


In [ ]:
dataset = api.create_dataset(source, {"name": "Realtor LV dataset"})
api.ok(dataset)
print("Dataset ready and available at https://bigml.com/dashboard/" + str(dataset['resource']) + "?" + BIGML_AUTH)

Clicking the link printed above takes you to a histogram view of the data on the BigML dashboard.

Learn a model from the data

This takes just one command; there are no parameters to set whatsoever.


In [ ]:
model = api.create_model(dataset)
print("'model' object created!")

BigML uses decision tree models. The tree that's been learnt from your data can be seen at:


In [ ]:
api.ok(model) # making sure the model is ready
print("Model ready and available at https://bigml.com/dashboard/" + str(model['resource']) + "?" + BIGML_AUTH)

2. Make predictions

Let's say we want to predict the value (in USD) of a real estate property characterized by the following attributes (go on and edit the values if you want):


In [ ]:
# the strings below correspond to headers of the realtor-las-vegas.csv file we used to create the model
new_input = {"bedrooms": 4, "full_bathrooms": 2, "type": "Single Family Home", "size_sqft": 1500}
print("'new_input' object created!")

Let's make a prediction for this new input against the model we created:


In [ ]:
prediction = api.create_prediction(model, new_input)
print("Prediction: ", prediction['object']['output'])

Here's the same thing on a single line:


In [ ]:
print("Value: ", api.create_prediction(model, {"bedrooms": 4, "full_bathrooms": 2, "type": "Single Family Home", "size_sqft": 1500})['object']['output'], " USD")

Learn more

This was just an overview of the basics of Machine Learning and of BigML's core functionality. Check out Bootstrapping Machine Learning (http://louisdorard.com/machine-learning-book) to learn more about Prediction APIs, how to apply ML to your domain, how to prepare your data CSV file, and how to integrate predictions in your app or in your business.