Food is an important part of everybody's life. Not only does food influence how we look and feel, but agriculture also has a profound impact on ecosystems, economies, and politics. Yet when we go to the grocery store, it can be difficult to know exactly what we're purchasing and where it comes from.
Inspired by this problem, a few of us decided to build an application that provides information on a packaged food product based on an image taken with a smartphone. In a future blog post, we will share what we built and how we built it. In this notebook, however, we delve deeper into the actual implementation.
This notebook is divided into 5 main parts:
1. Downloading and deduplicating the openfood dataset
2. Finding similar foods from textual features with the autotagger
3. Extracting deep visual features from the product images
4. Matching a photo against the catalog with a nearest neighbors model
5. Deploying the matching pipeline as a predictive service
Below are two screenshots of what the app looks like. The first image is a photo taken on the phone, and the second shows the matched results. Pretty good! To learn what it takes to build an app like this, read on!
First we download the SFrame containing the openfood dataset, which is a listing of packaged food items from around the world. Although the dataset contains about 30,000 packaged foods, we filter for items from English-speaking countries, which leaves us with about 2,000 packaged food items in the SFrame.
In [1]:
import graphlab
import base64
In [2]:
openfood_sf = graphlab.SFrame('https://static.turi.com/datasets/food/filtered_openfood_sf')
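Note that the SFrame loaded above has already been filtered. If you were starting from the full Open Food Facts dump instead, the country filter might look roughly like the commented sketch below (the source path and the 'countries' column name are assumptions about the raw data, not something this notebook uses):
# Hypothetical sketch only -- the SFrame loaded above is already filtered.
# full_sf = graphlab.SFrame('path/to/full_openfood_dump')
# english_speaking = ['United States', 'United Kingdom', 'Canada', 'Australia']
# openfood_sf = full_sf.filter_by(english_speaking, 'countries')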
Inspect the products in the dataset
In [3]:
openfood_sf['product_name']
Out[3]:
The openfood dataset contains some repeated product names (for instance, 'Intensely Fruity Christmas Pudding'), so we use the deduplication toolkit to remove copies. Here, we only want to deduplicate names that are very similar, so we set the radius parameter to be fairly small. The radius is the maximum distance between two points for them to still be considered duplicates.
In [4]:
dedup = graphlab.nearest_neighbor_deduplication.create(openfood_sf, features=['product_name'], radius=0.25)
We select one element from each entity group to be our cleaned dataset.
In [5]:
dedup_sf = dedup['entities'].groupby(key_columns="__entity", operations = {'row_number' : graphlab.aggregate.SELECT_ONE('row_number')})
openfood_sf = openfood_sf.add_row_number('row_number').filter_by(dedup_sf['row_number'], 'row_number')
Now, there are many fewer duplicates.
In [6]:
openfood_sf['product_name']
Out[6]:
We start by using the autotagger to find similar foods within the dataset, based on textual information like ingredients and categories. The autotagger toolkit pre-processes the query and tags to extract character 4-grams, unigrams, and bigrams as features, then employs the nearest neighbors toolkit to do the similarity search with a weighted Jaccard distance. In this case both queries and tags are entries in the dataset, and the result is a matching between each entry and other similar entries.
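To make that featurization concrete, here is a rough, hand-rolled sketch of the kind of preprocessing described above, done directly with the text analytics and nearest neighbors toolkits. This is illustrative only; the autotagger's internal pipeline may differ in detail, and the manual_* names are ours.
# Illustrative sketch only: a hand-rolled version of the featurization described above.
manual_sf = graphlab.SFrame({'text': openfood_sf['product_name']})
manual_sf['char_4grams'] = graphlab.text_analytics.count_ngrams(manual_sf['text'], n=4, method='character')
manual_sf['unigrams'] = graphlab.text_analytics.count_ngrams(manual_sf['text'], n=1, method='word')
manual_sf['bigrams'] = graphlab.text_analytics.count_ngrams(manual_sf['text'], n=2, method='word')
# Weighted Jaccard distance on each bag of n-grams, combined with equal weights
manual_nn = graphlab.nearest_neighbors.create(
    manual_sf,
    distance=[[['char_4grams'], 'weighted_jaccard', 1],
              [['unigrams'], 'weighted_jaccard', 1],
              [['bigrams'], 'weighted_jaccard', 1]])
manual_nn.query(manual_sf, k=4)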
We start by concatenating several features to create the query and tag.
In [7]:
openfood_sf['product_features'] = (openfood_sf['product_name'] + ', ' +
                                   openfood_sf['generic_name'] + ' ' +
                                   openfood_sf['categories'] + ' ' +
                                   openfood_sf['packaging'] + ' ' +
                                   openfood_sf['brands'] + ' ' +
                                   openfood_sf['labels'] + ' ' +
                                   openfood_sf['ingredients_text'] + ' ' +
                                   openfood_sf['allergens'] + ' ' +
                                   openfood_sf['additives'])
Next, we make the tags (which serve as the reference set) identical to the query features. Now we'll find the set of food items that are most similar to each item in the openfood database.
In [8]:
openfood_sf['product_tags'] = openfood_sf['product_features']
tagger = graphlab.autotagger.create(openfood_sf, tag_name='product_tags')
Now we query the tagger. Here, we choose k=4 so we can retrieve 4 tags, or nearest neighbors. Note that the nearest tag will be the same as the query since the query set and tag set are identical.
In [9]:
openfood_similar_products = tagger.tag(openfood_sf, query_name='product_features', k=4)
Remove tags that equal the query, and do some operations to make the results human-readable.
In [10]:
# Drop self-matches, then recover each tag's product name (the text before the first comma)
openfood_similar_products = openfood_similar_products[openfood_similar_products['product_features'] != openfood_similar_products['product_tags']]
openfood_similar_products['product_name'] = openfood_similar_products['product_tags'].apply(lambda x: x.split(',')[0])
# Collect the similar product names for each query row
openfood_similar_products_groupby = openfood_similar_products.groupby(key_columns='product_features_id', operations={'similar_foods': graphlab.aggregate.CONCAT('product_name')})
# Join the lists of similar foods back onto the original SFrame, filling rows with no matches
openfood_sf = openfood_sf.add_row_number('row_id')
openfood_sf = openfood_sf.join(openfood_similar_products_groupby, how='left', on={'row_id':'product_features_id'})
openfood_sf = openfood_sf.fillna('similar_foods', ['','',''])
Inspect the results! They look good!
In [11]:
openfood_sf.select_columns(['product_name','similar_foods']).unpack('similar_foods')
Out[11]:
We will be using deep visual features to match our personal photos of food to the catalog provided by openfood. In order to do that, we need to load our pre-trained ImageNet neural network model to use as a feature extractor, and extract features from the images in the dataset. To learn more about feature extraction, read this blog post. Note: This code block assumes that you do not have a GPU, and loads pre-extracted features. If you want to extract the features yourself, uncomment the lines below.
In [12]:
visual_features = graphlab.SFrame()
visual_features['visual_features'] = graphlab.SArray('https://static.turi.com/datasets/food/openfood_extracted_features')
visual_features = visual_features.add_row_number('row_number').filter_by(dedup_sf['row_number'], 'row_number')
openfood_sf['visual_features'] = visual_features['visual_features']
#pretrained_model = graphlab.load_model('http://s3.amazonaws.com/GraphLab-Datasets/deeplearning/imagenet_model_iter45')
#openfood_sf['visual_features'] = pretrained_model.extract_features(openfood_sf)
Now we need to build the nearest neighbors model that we will search with our own personal photos of food items. Note that we were previously using the autotagger, while here we use the nearest neighbors model directly. This is because previously we were measuring similarity between text features, and the autotagger handles the featurization of text data. With images, however, we have already extracted deep features, so we can use the nearest neighbors model directly.
In [13]:
m = graphlab.nearest_neighbors.create(openfood_sf, features=['visual_features'])
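As a side note, the nearest neighbors toolkit also lets you choose the distance explicitly. The model above uses the toolkit's default; if you want to experiment, a cosine-distance variant (a common choice for comparing deep feature vectors, but not what the rest of this notebook uses) would look like this:
# Optional variant, not used elsewhere in this notebook: compare deep feature
# vectors with cosine distance instead of the default.
m_cosine = graphlab.nearest_neighbors.create(openfood_sf,
                                             features=['visual_features'],
                                             distance='cosine')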
Next, we take a photo of our own (a jar of Jif peanut butter), resize it to 256x256, and extract the same deep features with the pre-trained model.
In [14]:
peanut_butter = graphlab.SFrame({'image':[graphlab.Image('https://static.turi.com/datasets/food/pb.jpg')]})
pretrained_model = graphlab.load_model('http://s3.amazonaws.com/GraphLab-Datasets/deeplearning/imagenet_model_iter45')
peanut_butter['image'] = graphlab.image_analysis.resize(peanut_butter['image'], 256, 256, 3)
peanut_butter['visual_features'] = pretrained_model.extract_features(peanut_butter)
Let us take a look at the query image. It is clearly Jif peanut butter.
In [16]:
peanut_butter.show()
Out[16]:
And now we query the model. This finds the nearest items in the catalog to our photo.
In [17]:
pb_ans = m.query(peanut_butter)
Now, we extract the nearest neighbors out of the original SFrame with a join on the row id, and use GraphLab Canvas (which opens in a browser window) to explore foods similar to the Jif. For easiest viewing, switch to the Table view. Sure enough, Jif peanut butter is there, and its similar items are other peanut butters!
In [18]:
pb_ans.join(openfood_sf,on={"reference_label":"row_id"}).select_columns(['image','similar_foods']).show()
Out[18]:
Now we deploy the matching pipeline as a predictive service on EC2 so it can be queried from the app. First, set up the EC2 configuration with your own AWS credentials.
In [18]:
env = graphlab.deploy.Ec2Config(region='us-west-1',
                                instance_type='m3.large',
                                aws_access_key_id=YOUR_ACCESS_KEY,
                                aws_secret_access_key=YOUR_SECRET_KEY)
In [19]:
deployment = graphlab.deploy.predictive_service.create('food-app-notebook',env, 's3://gl-internal-test/food_app/predictive_service_notebook')
Select useful columns.
In [20]:
openfood_sf_ps = openfood_sf.select_columns(['image_url', 'product_name', 'generic_name', 'proteins_100g', 'fat_100g', 'carbohydrates_100g', 'energy_100g','similar_foods','row_id'])
Define a function to construct a GraphLab Image from a base64-encoded bytestring.
In [21]:
def image_from_bytestring(image_data):
    import cStringIO as StringIO
    from PIL import Image as _PIL_image
    decoded_image_data = base64.b64decode(image_data)
    stream = StringIO.StringIO(decoded_image_data)
    pil_img = _PIL_image.open(stream)
    width = pil_img.size[0]
    height = pil_img.size[1]
    format_to_num = {'JPG': 0, 'PNG': 1, 'RAW': 2}
    if pil_img.mode == 'L':
        channels = 1
    elif pil_img.mode == 'RGB':
        channels = 3
    else:
        channels = 4
    format_enum = format_to_num['RAW']
    if pil_img.format == 'JPEG' or pil_img.format == 'JPG':
        format_enum = format_to_num['JPG']
    if pil_img.format == 'PNG':
        format_enum = format_to_num['PNG']
    image_data_size = len(decoded_image_data)
    # set the appropriate attributes on the GraphLab Image
    img = graphlab.Image()
    img._image_data = decoded_image_data
    img._height = height
    img._width = width
    img._channels = channels
    img._format_enum = format_enum
    img._image_data_size = image_data_size
    return img
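Before wiring this helper into the predictive service, it can be handy to sanity-check it locally. A minimal sketch, assuming you have a JPEG on disk (the 'pb.jpg' path below is just an illustration):
# Local sanity check for image_from_bytestring; the file path is illustrative.
with open('pb.jpg', 'rb') as f:
    encoded = base64.b64encode(f.read())
test_img = image_from_bytestring(encoded)
print test_img.width, test_img.height, test_img.channels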
Define a function that queries the nearest neighbors model to return matching foods from the database, along with the products similar to them.
In [22]:
import base64
from graphlab.deploy import required_packages

@required_packages(["Pillow==2.7.0"])
def match_image(image_bytestring):
    image_sf = graphlab.SFrame({'image': [image_from_bytestring(image_bytestring)]})
    image_sf['image'] = graphlab.image_analysis.resize(image_sf['image'], 256, 256, 3)
    image_sf['visual_features'] = pretrained_model.extract_features(image_sf)
    ans = m.query(image_sf)
    ret_sf = ans.join(openfood_sf_ps, on={"reference_label": "row_id"})
    return ret_sf
Add the match_image function to the predictive service
In [23]:
deployment.add('get_similar_food', match_image, description='Get closest neighbors to image in openfood facts dataset')
Issue a local test query
In [24]:
deployment.test_query('get_similar_food', image_bytestring = base64.b64encode(str(graphlab.Image('https://static.turi.com/datasets/food/pb.jpg')._image_data)))
Out[24]:
It worked, so deploy the change and perform a real query!
In [28]:
deployment.apply_changes()
deployment.query('get_similar_food', image_bytestring = base64.b64encode(str(graphlab.Image('https://static.turi.com/datasets/food/pb.jpg')._image_data)))
Out[28]:
Don't forget to terminate the service once you are done!
In [29]:
deployment.terminate_service()
Go ahead and take some food photos at the grocery store, and try this out for yourself!