In [2]:
import graphlab
In [3]:
hotel = graphlab.SFrame.read_csv('hotels.csv', column_type_hints={'Airport_Code': str})
dest = graphlab.SFrame.read_csv('dest.csv', column_type_hints={'Airport_Code': str})
PROGRESS: Finished parsing file /home/anil/Downloads/metripping/hotels.csv
PROGRESS: Parsing completed. Parsed 100 lines in 0.686312 secs.
PROGRESS: Finished parsing file /home/anil/Downloads/metripping/hotels.csv
PROGRESS: Parsing completed. Parsed 84327 lines in 0.340961 secs.
PROGRESS: Finished parsing file /home/anil/Downloads/metripping/dest.csv
PROGRESS: Parsing completed. Parsed 100 lines in 0.544257 secs.
PROGRESS: Finished parsing file /home/anil/Downloads/metripping/dest.csv
PROGRESS: Parsing completed. Parsed 88069 lines in 0.37135 secs.
In [4]:
hotel.dtype()
Out[4]:
[str, int, str, int, int, float, float]
In [5]:
dest.head(5)
Out[5]:
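+--------------+------------------+----------------------+---------------+-------------+
| Airport_Code | Category         | Sub_Category         | Total_Reviews | Star_Rating |
+--------------+------------------+----------------------+---------------+-------------+
| IAD          | Sights Landmarks | Historic Sites       | 1             | 5.0         |
| DCA          | Sights Landmarks | Historic Sites       | 1             | 5.0         |
| IAD          | Shopping         | Gift Specialty Shops | 4             | 4.5         |
| DCA          | Shopping         | Gift Specialty Shops | 4             | 4.5         |
| IAD          | Nightlife        | Bars Clubs           | 2             | 4.0         |
+--------------+------------------+----------------------+---------------+-------------+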
[5 rows x 5 columns]
In [6]:
graphlab.canvas.set_target('ipynb')
dest['Airport_Code'].show()
In [7]:
dest['Total_Category'] = dest['Category'] + " " + dest['Sub_Category']
In [8]:
dest['word_count'] = graphlab.text_analytics.count_words(dest['Total_Category'])
dest.remove_columns(['Category', 'Sub_Category'])
Out[8]:
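+--------------+---------------+-------------+-------------------------------------+-------------------------------------------------+
| Airport_Code | Total_Reviews | Star_Rating | Total_Category                      | word_count                                      |
+--------------+---------------+-------------+-------------------------------------+-------------------------------------------------+
| IAD          | 1             | 5.0         | Sights Landmarks Historic Sites ... | {'landmarks': 1, 'historic': 1, 'sights': ...   |
| DCA          | 1             | 5.0         | Sights Landmarks Historic Sites ... | {'landmarks': 1, 'historic': 1, 'sights': ...   |
| IAD          | 4             | 4.5         | Shopping Gift Specialty Shops ...   | {'shops': 1, 'shopping': 1, 'specialty': 1, ... |
| DCA          | 4             | 4.5         | Shopping Gift Specialty Shops ...   | {'shops': 1, 'shopping': 1, 'specialty': 1, ... |
| IAD          | 2             | 4.0         | Nightlife Bars Clubs                | {'clubs': 1, 'bars': 1, 'nightlife': 1} ...     |
| DCA          | 2             | 4.0         | Nightlife Bars Clubs                | {'clubs': 1, 'bars': 1, 'nightlife': 1} ...     |
| IAD          | 4             | 5.0         | Concerts Shows Theaters             | {'theaters': 1, 'concerts': 1, 'shows': ...     |
| DCA          | 4             | 5.0         | Concerts Shows Theaters             | {'theaters': 1, 'concerts': 1, 'shows': ...     |
| IAD          | 388           | 4.0         | Concerts Shows Arenas Stadiums ...  | {'stadiums': 1, 'arenas': 1, 'concerts': 1, ... |
| DCA          | 388           | 4.0         | Concerts Shows Arenas Stadiums ...  | {'stadiums': 1, 'arenas': 1, 'concerts': 1, ... |
+--------------+---------------+-------------+-------------------------------------+-------------------------------------------------+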
[88069 rows x 5 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
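As a rough illustration of what the `word_count` column holds, the sketch below builds the same kind of bag-of-words dictionary in plain Python. It is not GraphLab's implementation: the helper name `count_words_sketch` is hypothetical, and the lowercase-then-split-on-whitespace behaviour is an assumption inferred from the output above.

```python
from collections import Counter


def count_words_sketch(text):
    """Hypothetical stand-in for count_words on one string:
    lowercase the text, split on whitespace, count each word."""
    return dict(Counter(text.lower().split()))


# Same concatenation as dest['Category'] + " " + dest['Sub_Category'],
# applied to a single row instead of a whole column.
total_category = "Nightlife" + " " + "Bars Clubs"
word_count = count_words_sketch(total_category)
print(word_count)
```

Each distinct word becomes a key and its number of occurrences the value, which is why every count in the table above is 1.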
In [9]:
dest.head(5)
Out[9]:
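+--------------+---------------+-------------+-------------------------------------+-------------------------------------------------+
| Airport_Code | Total_Reviews | Star_Rating | Total_Category                      | word_count                                      |
+--------------+---------------+-------------+-------------------------------------+-------------------------------------------------+
| IAD          | 1             | 5.0         | Sights Landmarks Historic Sites ... | {'landmarks': 1, 'historic': 1, 'sights': ...   |
| DCA          | 1             | 5.0         | Sights Landmarks Historic Sites ... | {'landmarks': 1, 'historic': 1, 'sights': ...   |
| IAD          | 4             | 4.5         | Shopping Gift Specialty Shops ...   | {'shops': 1, 'shopping': 1, 'specialty': 1, ... |
| DCA          | 4             | 4.5         | Shopping Gift Specialty Shops ...   | {'shops': 1, 'shopping': 1, 'specialty': 1, ... |
| IAD          | 2             | 4.0         | Nightlife Bars Clubs                | {'clubs': 1, 'bars': 1, 'nightlife': 1} ...     |
+--------------+---------------+-------------+-------------------------------------+-------------------------------------------------+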
[5 rows x 5 columns]
In [10]:
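# Each category string plays the role of a "user" and each airport the role of an "item".
m = graphlab.recommender.ranking_factorization_recommender.create(
    dest,
    user_id='Total_Category',
    item_id='Airport_Code')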
PROGRESS: Recsys training: model = ranking_factorization_recommender
PROGRESS: Preparing data set.
PROGRESS: Data has 88069 observations with 185 users and 353 items.
PROGRESS: Data prepared in: 1.00394s
PROGRESS: Training ranking_factorization_recommender for recommendations.
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | Parameter | Description | Value |
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | num_factors | Factor Dimension | 32 |
PROGRESS: | regularization | L2 Regularization on Factors | 1e-09 |
PROGRESS: | solver | Solver used for training | adagrad |
PROGRESS: | linear_regularization | L2 Regularization on Linear Coefficients | 1e-09 |
PROGRESS: | binary_target | Assume Binary Targets | True |
PROGRESS: | max_iterations | Maximum Number of Iterations | 25 |
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: Optimizing model using SGD; tuning step size.
PROGRESS: Using 11008 / 88069 points for tuning the step size.
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | Attempt | Initial Step Size | Estimated Objective Value |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | 0 | 10 | Not Viable |
PROGRESS: | 1 | 2.5 | Not Viable |
PROGRESS: | 2 | 0.625 | Not Viable |
PROGRESS: | 3 | 0.15625 | 0.684611 |
PROGRESS: | 4 | 0.078125 | 1.17557 |
PROGRESS: | 5 | 0.0390625 | 1.32726 |
PROGRESS: | 6 | 0.0195312 | No Decrease (1.411 >= 1.38644) |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | Final | 0.15625 | 0.684611 |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: Starting Optimization.
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: | Iter. | Elapsed Time | Approx. Objective | Approx. Training Predictive Error | Step Size |
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: | Initial | 222us | 1.38642 | 0.693089 | |
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: | 1 | 700.801ms | 0.32705 | 0.143126 | 0.15625 |
PROGRESS: | 2 | 1.44s | 0.227982 | 0.106854 | 0.15625 |
PROGRESS: | 3 | 2.20s | 0.202419 | 0.0963944 | 0.15625 |
PROGRESS: | 4 | 3.07s | 0.182567 | 0.0872886 | 0.15625 |
PROGRESS: | 5 | 3.89s | 0.16963 | 0.0819986 | 0.15625 |
PROGRESS: | 6 | 4.53s | 0.158894 | 0.0768928 | 0.15625 |
PROGRESS: | 10 | 7.18s | 0.131314 | 0.0638484 | 0.15625 |
PROGRESS: | 11 | 7.81s | 0.126126 | 0.0614172 | 0.15625 |
PROGRESS: | 15 | 10.42s | 0.113745 | 0.0556437 | 0.15625 |
PROGRESS: | 20 | 13.76s | 0.102346 | 0.0503294 | 0.15625 |
PROGRESS: | 25 | 17.05s | 0.0935713 | 0.0456336 | 0.15625 |
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: Optimization Complete: Maximum number of passes through the data reached.
PROGRESS: Computing final objective value and training Predictive Error.
PROGRESS: Final objective value: 0.517848
PROGRESS: Final training Predictive Error: 0.0439802
In [247]:
# most similar destination for 'IAD'
m.get_similar_items(['IAD'])
PROGRESS: Getting similar items completed in 0.000989
Out[247]:
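+--------------+---------+---------------+------+
| Airport_Code | similar | distance      | rank |
+--------------+---------+---------------+------+
| IAD          | AUH     | 1.45231065154 | 1    |
| IAD          | VLI     | 1.4506803751  | 2    |
| IAD          | CXR     | 1.403313905   | 3    |
| IAD          | ADD     | 1.38968846202 | 4    |
| IAD          | KWI     | 1.38657107949 | 5    |
| IAD          | CEB     | 1.37112155557 | 6    |
| IAD          | SKP     | 1.36449471116 | 7    |
| IAD          | KBV     | 1.35761326551 | 8    |
| IAD          | PMV     | 1.34921184182 | 9    |
| IAD          | ADB     | 1.34175089002 | 10   |
+--------------+---------+---------------+------+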
[10 rows x 4 columns]
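`get_similar_items` ranks the other airports against the query airport. A minimal sketch of that idea follows, assuming each item has a learned factor vector and that similarity is the cosine between vectors. The vectors and the item codes `AAA` and `BBB` are made up for illustration, the helper names are hypothetical, and GraphLab's actual scoring may differ from plain cosine similarity.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similar_items_sketch(factors, query, k=10):
    """Rank every other item by cosine similarity to `query`, highest first.
    Returns (item, score, rank) tuples, mirroring the columns shown above."""
    scores = [(item, cosine_similarity(factors[query], vec))
              for item, vec in factors.items() if item != query]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [(item, score, rank)
            for rank, (item, score) in enumerate(scores[:k], start=1)]


# Made-up 3-dimensional factors; the model above uses num_factors = 32.
toy_factors = {
    "IAD": [1.0, 0.0, 1.0],
    "AAA": [1.0, 0.1, 0.9],  # hypothetical item pointing almost the same way as IAD
    "BBB": [0.0, 1.0, 0.0],  # hypothetical item orthogonal to IAD
}
ranked = similar_items_sketch(toy_factors, "IAD")
for item, score, rank in ranked:
    print(item, score, rank)
```

Rank 1 goes to the item whose vector points most nearly in the same direction as the query's vector.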
In [249]:
hotel.head(5)
Out[249]:
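+--------------+----------+-----------------+--------------+---------------+-------------+---------------+
| Airport_Code | Hotel_ID | Property_Type   | Star_Ranking | Total_Reviews | Hotel_Score | Average_Price |
+--------------+----------+-----------------+--------------+---------------+-------------+---------------+
| DXB          | 275      | Apartment Hotel | 4            | 3403          | 8.0         | 27318.2575    |
| DXB          | 276      | Resort          | 5            | 4321          | 8.5         | 130560.0875   |
| DXB          | 277      | Hotel           | 4            | 243           | 6.3         | 28930.3675    |
| DXB          | 278      | Hotel           | 4            | 1010          | 7.7         | 21239.75      |
| DXB          | 279      | Hotel           | 4            | 1857          | 7.6         | 26382.0       |
+--------------+----------+-----------------+--------------+---------------+-------------+---------------+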
[5 rows x 7 columns]
In [250]:
hotel['word_count'] = graphlab.text_analytics.count_words(hotel['Property_Type'])
In [251]:
hotel['Reviews_Trust_Label'] = (hotel['Total_Reviews']/hotel['Hotel_Score'])
In [252]:
hotel.head(2)
Out[252]:
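+--------------+----------+-----------------+--------------+---------------+-------------+---------------+
| Airport_Code | Hotel_ID | Property_Type   | Star_Ranking | Total_Reviews | Hotel_Score | Average_Price |
+--------------+----------+-----------------+--------------+---------------+-------------+---------------+
| DXB          | 275      | Apartment Hotel | 4            | 3403          | 8.0         | 27318.2575    |
| DXB          | 276      | Resort          | 5            | 4321          | 8.5         | 130560.0875   |
+--------------+----------+-----------------+--------------+---------------+-------------+---------------+
+----------------------------------+---------------------+
| word_count                       | Reviews_Trust_Label |
+----------------------------------+---------------------+
| {'apartment': 1, 'hotel': 1} ... | 425.375             |
| {'resort': 1}                    | 508.352941176       |
+----------------------------------+---------------------+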
[2 rows x 9 columns]
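`Reviews_Trust_Label` divides the review count by the hotel score, so the ratio is undefined whenever a score is 0 or missing. A small sketch of a guarded version in plain Python is shown below. The helper name `reviews_per_score` and the choice to return `None` for unusable scores are assumptions for illustration, not part of the original notebook.

```python
def reviews_per_score(total_reviews, hotel_score):
    """Total_Reviews / Hotel_Score, or None when the score is missing or zero."""
    if hotel_score is None or hotel_score == 0:
        return None
    return total_reviews / float(hotel_score)


# The two rows shown above, plus one hypothetical row with a zero score.
print(reviews_per_score(3403, 8.0))  # 425.375
print(reviews_per_score(4321, 8.5))
print(reviews_per_score(10, 0.0))    # None
```

On an SFrame, the same guard could be applied row by row before the column is used as a feature.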
In [253]:
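# Each hotel plays the role of a "user" and each airport the role of an "item";
# the destination table is passed as side information about the items.
mm = graphlab.recommender.ranking_factorization_recommender.create(
    hotel,
    user_id='Hotel_ID',
    item_id='Airport_Code',
    item_data=dest)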
PROGRESS: Recsys training: model = ranking_factorization_recommender
PROGRESS: Preparing data set.
PROGRESS: Data has 84327 observations with 84327 users and 354 items.
PROGRESS: Data prepared in: 1.50649s
PROGRESS: Training ranking_factorization_recommender for recommendations.
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | Parameter | Description | Value |
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | num_factors | Factor Dimension | 32 |
PROGRESS: | regularization | L2 Regularization on Factors | 1e-09 |
PROGRESS: | solver | Solver used for training | adagrad |
PROGRESS: | linear_regularization | L2 Regularization on Linear Coefficients | 1e-09 |
PROGRESS: | binary_target | Assume Binary Targets | True |
PROGRESS: | side_data_factorization | Assign Factors for Side Data | True |
PROGRESS: | max_iterations | Maximum Number of Iterations | 25 |
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: Optimizing model using SGD; tuning step size.
PROGRESS: Using 10540 / 84327 points for tuning the step size.
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | Attempt | Initial Step Size | Estimated Objective Value |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | 0 | 3.84615 | Not Viable |
PROGRESS: | 1 | 0.961538 | Not Viable |
PROGRESS: | 2 | 0.240385 | Not Viable |
PROGRESS: | 3 | 0.0600962 | Not Viable |
PROGRESS: | 4 | 0.015024 | Not Viable |
PROGRESS: | 5 | 0.00375601 | Not Viable |
PROGRESS: | 6 | 0.000939002 | Not Viable |
PROGRESS: | 7 | 0.000234751 | Not Viable |
PROGRESS: | 8 | 5.86877e-05 | Not Viable |
PROGRESS: | 9 | 1.46719e-05 | Not Viable |
PROGRESS: | 10 | 3.66798e-06 | Not Viable |
PROGRESS: | 11 | 9.16995e-07 | Not Viable |
PROGRESS: | 12 | 2.29249e-07 | Not Viable |
PROGRESS: | 13 | 5.73122e-08 | Not Viable |
PROGRESS: | 14 | 1.4328e-08 | Not Viable |
PROGRESS: | 15 | 3.58201e-09 | Not Viable |
PROGRESS: | 16 | 8.95502e-10 | Not Viable |
PROGRESS: | 17 | 2.23876e-10 | Not Viable |
PROGRESS: | 18 | 5.59689e-11 | Not Viable |
PROGRESS: | 19 | 1.39922e-11 | Not Viable |
PROGRESS: | 20 | 3.49806e-12 | Not Viable |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | Final | 0.005 | Unknown |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: WARNING: Having difficulty finding viable stepsize; Model may be at optimum. Continuing with small step size.
PROGRESS: Starting Optimization.
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: | Iter. | Elapsed Time | Approx. Objective | Approx. Training Predictive Error | Step Size |
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: | Initial | 207us | 1.79769e+308 | 1.79769e+308 | |
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: | 1 | 3.681ms | 1.79769e+308 | 1.79769e+308 | 0.005 |
PROGRESS: | 2 | 13.549ms | 1.79769e+308 | 1.79769e+308 | 0.005 |
PROGRESS: | 3 | 23.681ms | 1.79769e+308 | 1.79769e+308 | 0.005 |
PROGRESS: | 4 | 27.055ms | 1.79769e+308 | 1.79769e+308 | 0.005 |
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: Optimization Complete: Convergence on objective within bounds.
PROGRESS: Computing final objective value and training Predictive Error.
PROGRESS: Final objective value: 1.79769e+308
PROGRESS: Final training Predictive Error: 1.79769e+308
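The value 1.79769e+308 reported above is the largest finite double-precision number, and every step size in the tuning table was marked "Not Viable", so this run never reached a usable objective. One possible contributor (an assumption, not something the log confirms) is numeric side features on very different scales, such as `Average_Price` and `Reviews_Trust_Label`. The sketch below shows a plain-Python z-score rescaling of one numeric column; the helper name `standardize` is hypothetical.

```python
import math
import sys

# 1.79769e+308 in the log matches the largest finite double.
print(sys.float_info.max)  # 1.7976931348623157e+308


def standardize(values):
    """Rescale a list of numbers to zero mean and unit variance (z-scores).
    A constant column is mapped to all zeros."""
    n = len(values)
    mean = sum(values) / float(n)
    variance = sum((v - mean) ** 2 for v in values) / float(n)
    std = math.sqrt(variance)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Average_Price values from hotel.head(5) above.
prices = [27318.2575, 130560.0875, 28930.3675, 21239.75, 26382.0]
scaled = standardize(prices)
print(scaled)
```

The rescaled values could be written back as a new column before training. Whether that resolves the divergence seen here is untested.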
In [254]:
# most similar places to "DXB" after adding hotel data
mm.get_similar_items(["DXB"])
PROGRESS: Getting similar items completed in 0.007532
Out[254]:
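+--------------+---------+---------------+------+
| Airport_Code | similar | distance      | rank |
+--------------+---------+---------------+------+
| DXB          | BIO     | 1.70740884542 | 1    |
| DXB          | CEI     | 1.61200469732 | 2    |
| DXB          | TPE     | 1.61046296358 | 3    |
| DXB          | AMS     | 1.59781980515 | 4    |
| DXB          | LHE     | 1.51933383942 | 5    |
| DXB          | GRU     | 1.47608801723 | 6    |
| DXB          | PTY     | 1.42839789391 | 7    |
| DXB          | NAN     | 1.38500064611 | 8    |
| DXB          | AAN     | 1.37855488062 | 9    |
| DXB          | BUF     | 1.37443852425 | 10   |
+--------------+---------+---------------+------+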
[10 rows x 4 columns]
In [255]:
# most similar places to "SEZ"
mm.get_similar_items(["SEZ"])
PROGRESS: Getting similar items completed in 0.001321
Out[255]:
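+--------------+---------+---------------+------+
| Airport_Code | similar | distance      | rank |
+--------------+---------+---------------+------+
| SEZ          | GDL     | 1.53258872032 | 1    |
| SEZ          | CUN     | 1.51108670235 | 2    |
| SEZ          | AKL     | 1.4861626327  | 3    |
| SEZ          | ACC     | 1.45462328196 | 4    |
| SEZ          | LAX     | 1.42601782084 | 5    |
| SEZ          | DKR     | 1.40009027719 | 6    |
| SEZ          | BTH     | 1.38972005248 | 7    |
| SEZ          | CDG     | 1.38469335437 | 8    |
| SEZ          | ABJ     | 1.38048645854 | 9    |
| SEZ          | OPO     | 1.37913474441 | 10   |
+--------------+---------+---------------+------+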
[10 rows x 4 columns]
In [256]:
# most similar places to "MXP"
mm.get_similar_items(['MXP'])
PROGRESS: Getting similar items completed in 0.000954
Out[256]:
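+--------------+---------+---------------+------+
| Airport_Code | similar | distance      | rank |
+--------------+---------+---------------+------+
| MXP          | CEB     | 1.70715749264 | 1    |
| MXP          | MFM     | 1.58472329378 | 2    |
| MXP          | SAN     | 1.57058191299 | 3    |
| MXP          | VTE     | 1.54280465841 | 4    |
| MXP          | JDH     | 1.46457794309 | 5    |
| MXP          | LIM     | 1.46417695284 | 6    |
| MXP          | KTM     | 1.45444214344 | 7    |
| MXP          | CPT     | 1.44507962465 | 8    |
| MXP          | FAO     | 1.42304974794 | 9    |
| MXP          | AUA     | 1.39572271705 | 10   |
+--------------+---------+---------------+------+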
[10 rows x 4 columns]
Content source: anilcs13m/Projects