In [1]:
import graphlab as gl
In [2]:
train = gl.SFrame.read_csv("../data/train.csv")
In [3]:
test = gl.SFrame.read_csv("../data/test.csv")
In [4]:
desc = gl.SFrame.read_csv("../data/product_descriptions.csv")
In [5]:
# merge train with description
train = train.join(desc, on = 'product_uid', how = 'left')
In [6]:
# merge test with description
test = test.join(desc, on = 'product_uid', how = 'left')
Let's examine three different queries and the products they were matched with:
In [7]:
first_doc = train[0]
first_doc
Out[7]:
The search term 'angle bracket' is not contained in the product description: 'angle' would match after stemming, but 'bracket' does not appear at all.
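To sanity-check that stemming intuition, a stemmer can be run over the terms. The snippet below uses NLTK's PorterStemmer, which is an assumption here and not a dependency of the rest of this notebook:

# Illustrative only: NLTK is not used anywhere else in this notebook.
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()
print(stemmer.stem('angles'))    # 'angl' -- shares a stem with 'angle'
print(stemmer.stem('angle'))     # 'angl'
print(stemmer.stem('bracket'))   # 'bracket' -- still absent from the description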
In [8]:
middle_doc = train[37033]
middle_doc
Out[8]:
Only 'wood' from the search term is present.
In [9]:
last_doc = train[-1]
last_doc
Out[9]:
Only 'sheer' and 'courtain' from the search term are present.
In [10]:
train['search_term_word_count'] = gl.text_analytics.count_words(train['search_term'])
# keep only the rows with the highest relevance score (3)
ranked3doc = train[train['relevance'] == 3]
print ranked3doc.head()
len(ranked3doc)
Out[10]:
In [11]:
words_search = gl.text_analytics.tokenize(ranked3doc['search_term'], to_lower = True)
words_description = gl.text_analytics.tokenize(ranked3doc['product_description'], to_lower = True)
words_title = gl.text_analytics.tokenize(ranked3doc['product_title'], to_lower = True)
wordsdiff_desc = []
wordsdiff_title = []
puid = []
search_term = []
ws_count = []
ws_count_used_desc = []
ws_count_used_title = []
for item in xrange(len(ranked3doc)):
    ws = words_search[item]
    pd = words_description[item]
    pt = words_title[item]
    # search-term words that do NOT appear in the description / title,
    # stored as lists so they can become SFrame columns
    diff = set(ws) - set(pd)
    wordsdiff_desc.append(list(diff))
    diff2 = set(ws) - set(pt)
    wordsdiff_title.append(list(diff2))
    puid.append(ranked3doc[item]['product_uid'])
    search_term.append(ranked3doc[item]['search_term'])
    ws_count.append(len(ws))
    ws_count_used_desc.append(len(ws) - len(diff))
    ws_count_used_title.append(len(ws) - len(diff2))
differences = gl.SFrame({"puid" : puid,
                         "search term" : search_term,
                         "diff desc" : wordsdiff_desc,
                         "diff title" : wordsdiff_title,
                         "ws count" : ws_count,
                         "ws count used desc" : ws_count_used_desc,
                         "ws count used title" : ws_count_used_title})
In [12]:
differences.sort(['ws count used desc', 'ws count used title'])
Out[12]:
In [13]:
print "No terms used in description : " + str(len(differences[differences['ws count used desc'] == 0]))
print "No terms used in title : " + str(len(differences[differences['ws count used title'] == 0]))
print "No terms used in description and title : " + str(len(differences[(differences['ws count used desc'] == 0) &
(differences['ws count used title'] == 0)]))
In [14]:
import matplotlib.pyplot as plt
%matplotlib inline
In [15]:
train_search_tfidf = gl.text_analytics.tf_idf(train['search_term_word_count'])
In [16]:
train['search_tfidf'] = train_search_tfidf
In [17]:
train['product_desc_word_count'] = gl.text_analytics.count_words(train['product_description'])
train_desc_tfidf = gl.text_analytics.tf_idf(train['product_desc_word_count'])
In [18]:
train['desc_tfidf'] = train_desc_tfidf
In [19]:
train['product_title_word_count'] = gl.text_analytics.count_words(train['product_title'])
train_title_tfidf = gl.text_analytics.tf_idf(train['product_title_word_count'])
train['title_tfidf'] = train_title_tfidf
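For intuition, tf-idf multiplies each word's count in a document by a penalty for how common the word is across all documents. Below is a minimal pure-Python sketch of the usual tf × log(N / df) weighting over bag-of-words dictionaries; the exact smoothing used inside gl.text_analytics.tf_idf may differ, and the toy documents are made up:

import math

def tf_idf_sketch(docs):
    # docs: a list of {word: count} dictionaries, like the *_word_count columns above
    n_docs = float(len(docs))
    df = {}
    for doc in docs:
        for word in doc:
            df[word] = df.get(word, 0) + 1
    # weight each count by log(number of documents / documents containing the word)
    return [dict((w, c * math.log(n_docs / df[w])) for w, c in doc.items())
            for doc in docs]

print(tf_idf_sketch([{'angle': 1, 'bracket': 2}, {'angle': 1, 'wood': 3}]))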
In [20]:
train['distance'] = train.apply(lambda x: gl.distances.cosine(x['search_tfidf'],x['desc_tfidf']))
train['distance2'] = train.apply(lambda x: gl.distances.cosine(x['search_tfidf'],x['title_tfidf']))
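gl.distances.cosine on two sparse dictionaries should return a cosine distance, i.e. one minus the cosine similarity of the two tf-idf vectors. A hand-rolled version for intuition (the helper name is made up, and the zero-norm handling is a choice of this sketch, not necessarily GraphLab's):

import math

def cosine_distance_sketch(a, b):
    # a, b: {word: tf-idf weight} dictionaries
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0   # no shared support; treat as maximally distant
    return 1.0 - dot / (norm_a * norm_b)

# should roughly match train['distance'][0]
print(cosine_distance_sketch(train[0]['search_tfidf'], train[0]['desc_tfidf']))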
In [21]:
model1 = gl.linear_regression.create(train, target = 'relevance', features = ['distance', 'distance2'], validation_set = None)
In [23]:
#let's take a look at the weights before we plot
model1.get("coefficients")
Out[23]:
In [25]:
test['search_term_word_count'] = gl.text_analytics.count_words(test['search_term'])
test_search_tfidf = gl.text_analytics.tf_idf(test['search_term_word_count'])
test['search_tfidf'] = test_search_tfidf
test['product_desc_word_count'] = gl.text_analytics.count_words(test['product_description'])
test_desc_tfidf = gl.text_analytics.tf_idf(test['product_desc_word_count'])
test['desc_tfidf'] = test_desc_tfidf
test['product_title_word_count'] = gl.text_analytics.count_words(test['product_title'])
test_title_tfidf = gl.text_analytics.tf_idf(test['product_title_word_count'])
test['title_tfidf'] = test_title_tfidf
test['distance'] = test.apply(lambda x: gl.distances.cosine(x['search_tfidf'],x['desc_tfidf']))
test['distance2'] = test.apply(lambda x: gl.distances.cosine(x['search_tfidf'],x['title_tfidf']))
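The cell above repeats the train feature pipeline verbatim on test. A small helper (the name add_text_features is hypothetical) would keep the two in sync; note that, exactly as in the cells above, tf-idf is fit separately on each SFrame, so train and test end up with different document frequencies:

def add_text_features(sf):
    # reproduce the word-count, tf-idf and cosine-distance columns built above
    sf['search_term_word_count'] = gl.text_analytics.count_words(sf['search_term'])
    sf['search_tfidf'] = gl.text_analytics.tf_idf(sf['search_term_word_count'])
    sf['product_desc_word_count'] = gl.text_analytics.count_words(sf['product_description'])
    sf['desc_tfidf'] = gl.text_analytics.tf_idf(sf['product_desc_word_count'])
    sf['product_title_word_count'] = gl.text_analytics.count_words(sf['product_title'])
    sf['title_tfidf'] = gl.text_analytics.tf_idf(sf['product_title_word_count'])
    sf['distance'] = sf.apply(lambda x: gl.distances.cosine(x['search_tfidf'], x['desc_tfidf']))
    sf['distance2'] = sf.apply(lambda x: gl.distances.cosine(x['search_tfidf'], x['title_tfidf']))
    return sf

# train = add_text_features(train); test = add_text_features(test)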
In [27]:
# The Kaggle test set has no 'relevance' column, so this check cannot run on it:
'''
predictions_test = model1.predict(test)
test_errors = predictions_test - test['relevance']
RSS_test = sum(test_errors * test_errors)
print RSS_test
'''
Out[27]:
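The block above is left commented out because the test set has no relevance labels. For an offline error estimate, one option (a sketch, not part of the original notebook) is to hold out part of train and evaluate there:

# Sketch only: re-split train and measure error on the held-out rows.
train_part, valid_part = train.random_split(0.8, seed = 0)
model_check = gl.linear_regression.create(train_part, target = 'relevance',
                                          features = ['distance', 'distance2'],
                                          validation_set = None)
print(model_check.evaluate(valid_part))   # GraphLab's regression evaluate reports rmse / max_error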
In [ ]:
output = model1.predict(test)
In [ ]:
submission = gl.SFrame(test['id'])
In [ ]:
# columns built from bare SArrays default to 'X1'/'X2'; rename them for the submission format
submission.add_column(output)
submission.rename({'X1': 'id', 'X2': 'relevance'})
In [ ]:
# clamp predictions to the valid relevance range [1.0, 3.0]
submission['relevance'] = submission.apply(lambda x: 3.0 if x['relevance'] > 3.0 else x['relevance'])
submission['relevance'] = submission.apply(lambda x: 1.0 if x['relevance'] < 1.0 else x['relevance'])
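The same clamping can be done in one pass with SArray.clip, assuming the standard GraphLab Create API:

# equivalent to the two apply calls above
submission['relevance'] = submission['relevance'].clip(lower = 1.0, upper = 3.0)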
In [ ]:
submission['relevance'] = submission.apply(lambda x: str(x['relevance']))
In [ ]:
submission.export_csv('../data/submission.csv', quote_level = 3)
In [ ]:
#gl.canvas.set_target('ipynb')