It seems the current high-scoring script is written in R using H2O, so let us build one in Python using XGBoost.

Thanks to this script for the feature engineering ideas.

We shall start by importing the necessary modules.


In [1]:
import os
import sys
import operator
import numpy as np
import pandas as pd
from scipy import sparse
import xgboost as xgb
from sklearn import model_selection, preprocessing, ensemble
from sklearn.metrics import log_loss
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer

Now let us write a custom function to run the XGBoost model.


In [2]:
def runXGB(train_X, train_y, test_X, test_y=None, feature_names=None, seed_val=0, num_rounds=1000):
    # multi:softprob gives one probability per class, which is what log loss needs
    param = {}
    param['objective'] = 'multi:softprob'
    param['eta'] = 0.1
    param['max_depth'] = 6
    param['silent'] = 1
    param['num_class'] = 3
    param['eval_metric'] = "mlogloss"
    param['min_child_weight'] = 1
    param['subsample'] = 0.7
    param['colsample_bytree'] = 0.7
    param['seed'] = seed_val

    plst = list(param.items())
    xgtrain = xgb.DMatrix(train_X, label=train_y)

    if test_y is not None:
        # with validation labels available, watch the eval set and stop early on mlogloss
        xgtest = xgb.DMatrix(test_X, label=test_y)
        watchlist = [ (xgtrain,'train'), (xgtest, 'test') ]
        model = xgb.train(plst, xgtrain, num_rounds, watchlist, early_stopping_rounds=20)
    else:
        # without labels, just train for the full num_rounds
        xgtest = xgb.DMatrix(test_X)
        model = xgb.train(plst, xgtrain, num_rounds)

    pred_test_y = model.predict(xgtest)
    return pred_test_y, model
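
One caveat on the early stopping branch: when training stops early, model.predict may still use all trained trees rather than the best iteration, depending on your XGBoost version. If you want predictions capped at the best round, you could replace the final predict with something like the line below (best_ntree_limit/ntree_limit exist on older XGBoost releases; newer ones use iteration_range instead, so check your version):

    pred_test_y = model.predict(xgtest, ntree_limit=model.best_ntree_limit)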

Let us read the train and test files and store them as dataframes.


In [3]:
data_path = "../input/"
train_file = data_path + "train.json"
test_file = data_path + "test.json"
train_df = pd.read_json(train_file)
test_df = pd.read_json(test_file)
print(train_df.shape)
print(test_df.shape)


(49352, 15)
(74659, 14)

We do not need any pre-processing for the numerical features, so let us start the feature list with those.


In [4]:
features_to_use  = ["bathrooms", "bedrooms", "latitude", "longitude", "price"]
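
As a quick optional sanity check (purely illustrative, assuming the JSON loaded these columns with numeric dtypes), we can confirm the dtypes and look for missing values:

    print(train_df[features_to_use].dtypes)
    print(train_df[features_to_use].isnull().sum())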

Now let us create some new features from the given features.


In [5]:
# count of photos #
train_df["num_photos"] = train_df["photos"].apply(len)
test_df["num_photos"] = test_df["photos"].apply(len)

# count of "features" #
train_df["num_features"] = train_df["features"].apply(len)
test_df["num_features"] = test_df["features"].apply(len)

# count of words present in description column #
train_df["num_description_words"] = train_df["description"].apply(lambda x: len(x.split(" ")))
test_df["num_description_words"] = test_df["description"].apply(lambda x: len(x.split(" ")))

# convert the created column to datetime object so as to extract more features 
train_df["created"] = pd.to_datetime(train_df["created"])
test_df["created"] = pd.to_datetime(test_df["created"])

# Let us extract some features like year, month, day, hour from date columns #
train_df["created_year"] = train_df["created"].dt.year
test_df["created_year"] = test_df["created"].dt.year
train_df["created_month"] = train_df["created"].dt.month
test_df["created_month"] = test_df["created"].dt.month
train_df["created_day"] = train_df["created"].dt.day
test_df["created_day"] = test_df["created"].dt.day
train_df["created_hour"] = train_df["created"].dt.hour
test_df["created_hour"] = test_df["created"].dt.hour

# adding all these new features to use list #
features_to_use.extend(["num_photos", "num_features", "num_description_words","created_year", "created_month", "created_day", "listing_id", "created_hour"])
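
Since every transform above is applied to train and test identically, a tidier, behavior-equivalent pattern is to loop over both frames; this sketch just restates the cell above with less repetition:

    for df in (train_df, test_df):
        df["num_photos"] = df["photos"].apply(len)
        df["num_features"] = df["features"].apply(len)
        df["num_description_words"] = df["description"].apply(lambda x: len(x.split(" ")))
        df["created"] = pd.to_datetime(df["created"])
        df["created_year"] = df["created"].dt.year
        df["created_month"] = df["created"].dt.month
        df["created_day"] = df["created"].dt.day
        df["created_hour"] = df["created"].dt.hour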

We have 4 categorical features in our data:

  • display_address
  • manager_id
  • building_id
  • street_address

So let us label encode these features.


In [6]:
categorical = ["display_address", "manager_id", "building_id", "street_address"]
for f in categorical:
    if train_df[f].dtype == 'object':
        # fit on the union of train and test values so transform never sees an unknown label
        lbl = preprocessing.LabelEncoder()
        lbl.fit(list(train_df[f].values) + list(test_df[f].values))
        train_df[f] = lbl.transform(list(train_df[f].values))
        test_df[f] = lbl.transform(list(test_df[f].values))
        features_to_use.append(f)
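
For intuition, LabelEncoder simply maps each distinct string to an integer index (in sorted order). A standalone toy illustration, reusing the preprocessing import from above:

    enc = preprocessing.LabelEncoder()
    enc.fit(["a st", "b ave", "a st", "c blvd"])
    print(list(enc.classes_))                 # ['a st', 'b ave', 'c blvd']
    print(enc.transform(["b ave", "a st"]))   # [1 0]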

The features column holds a list of strings for each listing. So we first replace the spaces inside each feature with underscores (so that a multi-word amenity like "Hardwood Floors" stays a single token), join the list into one space-separated string, and then apply a count vectorizer on top of it.


In [7]:
train_df['features'] = train_df["features"].apply(lambda x: " ".join(["_".join(i.split(" ")) for i in x]))
test_df['features'] = test_df["features"].apply(lambda x: " ".join(["_".join(i.split(" ")) for i in x]))
print(train_df["features"].head())
# note: despite the variable name, this is a plain CountVectorizer (raw counts), not TF-IDF
tfidf = CountVectorizer(stop_words='english', max_features=200)
tr_sparse = tfidf.fit_transform(train_df["features"])
te_sparse = tfidf.transform(test_df["features"])


10                                                         
10000     Doorman Elevator Fitness_Center Cats_Allowed D...
100004    Laundry_In_Building Dishwasher Hardwood_Floors...
100007                               Hardwood_Floors No_Fee
100013                                              Pre-War
Name: features, dtype: object
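
To see what the vectorizer produces, here is a standalone toy example: one column per token, counts as values. (get_feature_names is the pre-1.0 scikit-learn method; newer releases call it get_feature_names_out.)

    toy = CountVectorizer()
    mat = toy.fit_transform(["Doorman Elevator Cats_Allowed", "Doorman No_Fee"])
    print(toy.get_feature_names())   # ['cats_allowed', 'doorman', 'elevator', 'no_fee']
    print(mat.toarray())             # [[1 1 1 0] [0 1 0 1]]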

Now let us stack the dense and sparse features into a single dataset and also get the target variable. The 17 dense features plus the 200 count-vectorizer columns give 217 columns in total.


In [8]:
train_X = sparse.hstack([train_df[features_to_use], tr_sparse]).tocsr()
test_X = sparse.hstack([test_df[features_to_use], te_sparse]).tocsr()

target_num_map = {'high':0, 'medium':1, 'low':2}
train_y = np.array(train_df['interest_level'].apply(lambda x: target_num_map[x]))
print(train_X.shape, test_X.shape)


(49352, 217) (74659, 217)

Now let us do some cross validation to check the scores.

To keep the kernel fast we break out after the first fold; run it locally without the break (averaging cv_scores across all 5 folds) to get the full CV estimate.


In [9]:
cv_scores = []
kf = model_selection.KFold(n_splits=5, shuffle=True, random_state=2016)
for dev_index, val_index in kf.split(range(train_X.shape[0])):
    dev_X, val_X = train_X[dev_index, :], train_X[val_index, :]
    dev_y, val_y = train_y[dev_index], train_y[val_index]
    preds, model = runXGB(dev_X, dev_y, val_X, val_y)
    cv_scores.append(log_loss(val_y, preds))
    print(cv_scores)
    break


[0]	train-mlogloss:1.04114	test-mlogloss:1.04219
Multiple eval metrics have been passed: 'test-mlogloss' will be used for early stopping.

Will train until test-mlogloss hasn't improved in 20 rounds.
[1]	train-mlogloss:0.988799	test-mlogloss:0.990721
[2]	train-mlogloss:0.944048	test-mlogloss:0.94691
[3]	train-mlogloss:0.90518	test-mlogloss:0.908812
[4]	train-mlogloss:0.8718	test-mlogloss:0.876215
[5]	train-mlogloss:0.841498	test-mlogloss:0.847057
[6]	train-mlogloss:0.815614	test-mlogloss:0.821795
[7]	train-mlogloss:0.79312	test-mlogloss:0.799993
[8]	train-mlogloss:0.773194	test-mlogloss:0.780815
[9]	train-mlogloss:0.754598	test-mlogloss:0.763247
[10]	train-mlogloss:0.738162	test-mlogloss:0.747594
[11]	train-mlogloss:0.724634	test-mlogloss:0.734739
[12]	train-mlogloss:0.711331	test-mlogloss:0.722318
[13]	train-mlogloss:0.699821	test-mlogloss:0.711481
[14]	train-mlogloss:0.689142	test-mlogloss:0.701381
[15]	train-mlogloss:0.678446	test-mlogloss:0.691482
[16]	train-mlogloss:0.669268	test-mlogloss:0.683158
[17]	train-mlogloss:0.66185	test-mlogloss:0.67647
[18]	train-mlogloss:0.654386	test-mlogloss:0.669772
[19]	train-mlogloss:0.648071	test-mlogloss:0.664241
[20]	train-mlogloss:0.642589	test-mlogloss:0.659292
[21]	train-mlogloss:0.637133	test-mlogloss:0.654492
[22]	train-mlogloss:0.632064	test-mlogloss:0.650024
[23]	train-mlogloss:0.627592	test-mlogloss:0.646221
[24]	train-mlogloss:0.622447	test-mlogloss:0.641828
[25]	train-mlogloss:0.618027	test-mlogloss:0.638092
[26]	train-mlogloss:0.614181	test-mlogloss:0.635053
[27]	train-mlogloss:0.61114	test-mlogloss:0.632717
[28]	train-mlogloss:0.607278	test-mlogloss:0.629888
[29]	train-mlogloss:0.603595	test-mlogloss:0.627116
[30]	train-mlogloss:0.600566	test-mlogloss:0.624912
[31]	train-mlogloss:0.597396	test-mlogloss:0.622441
[32]	train-mlogloss:0.594581	test-mlogloss:0.620373
[33]	train-mlogloss:0.591807	test-mlogloss:0.618497
[34]	train-mlogloss:0.589131	test-mlogloss:0.616384
[35]	train-mlogloss:0.586585	test-mlogloss:0.614496
[36]	train-mlogloss:0.583978	test-mlogloss:0.612716
[37]	train-mlogloss:0.582015	test-mlogloss:0.611317
[38]	train-mlogloss:0.579514	test-mlogloss:0.609588
[39]	train-mlogloss:0.576912	test-mlogloss:0.607814
[40]	train-mlogloss:0.574746	test-mlogloss:0.606454
[41]	train-mlogloss:0.572975	test-mlogloss:0.605284
[42]	train-mlogloss:0.570366	test-mlogloss:0.603354
[43]	train-mlogloss:0.568138	test-mlogloss:0.602107
[44]	train-mlogloss:0.565862	test-mlogloss:0.600475
[45]	train-mlogloss:0.564646	test-mlogloss:0.599563
[46]	train-mlogloss:0.562649	test-mlogloss:0.598221
[47]	train-mlogloss:0.560823	test-mlogloss:0.597094
[48]	train-mlogloss:0.559184	test-mlogloss:0.596101
[49]	train-mlogloss:0.557642	test-mlogloss:0.595268
[50]	train-mlogloss:0.555695	test-mlogloss:0.594217
[51]	train-mlogloss:0.553391	test-mlogloss:0.593256
[52]	train-mlogloss:0.551141	test-mlogloss:0.592129
[53]	train-mlogloss:0.549666	test-mlogloss:0.591489
[54]	train-mlogloss:0.547321	test-mlogloss:0.590389
[55]	train-mlogloss:0.546197	test-mlogloss:0.589846
[56]	train-mlogloss:0.544658	test-mlogloss:0.589096
[57]	train-mlogloss:0.543389	test-mlogloss:0.588546
[58]	train-mlogloss:0.541408	test-mlogloss:0.58737
[59]	train-mlogloss:0.540229	test-mlogloss:0.586951
[60]	train-mlogloss:0.538715	test-mlogloss:0.58633
[61]	train-mlogloss:0.537227	test-mlogloss:0.585638
[62]	train-mlogloss:0.535932	test-mlogloss:0.585132
[63]	train-mlogloss:0.534624	test-mlogloss:0.584407
[64]	train-mlogloss:0.533186	test-mlogloss:0.58367
[65]	train-mlogloss:0.531767	test-mlogloss:0.582788
[66]	train-mlogloss:0.530367	test-mlogloss:0.582063
[67]	train-mlogloss:0.529023	test-mlogloss:0.581331
[68]	train-mlogloss:0.527781	test-mlogloss:0.58068
[69]	train-mlogloss:0.526511	test-mlogloss:0.580342
[70]	train-mlogloss:0.525392	test-mlogloss:0.579888
[71]	train-mlogloss:0.52422	test-mlogloss:0.579319
[72]	train-mlogloss:0.523065	test-mlogloss:0.578852
[73]	train-mlogloss:0.522163	test-mlogloss:0.578434
[74]	train-mlogloss:0.520843	test-mlogloss:0.577687
[75]	train-mlogloss:0.520055	test-mlogloss:0.577254
[76]	train-mlogloss:0.519149	test-mlogloss:0.576857
[77]	train-mlogloss:0.517909	test-mlogloss:0.57638
[78]	train-mlogloss:0.516506	test-mlogloss:0.575721
[79]	train-mlogloss:0.515361	test-mlogloss:0.575472
[80]	train-mlogloss:0.514641	test-mlogloss:0.575183
[81]	train-mlogloss:0.513579	test-mlogloss:0.574743
[82]	train-mlogloss:0.512622	test-mlogloss:0.574371
[83]	train-mlogloss:0.511446	test-mlogloss:0.574089
[84]	train-mlogloss:0.510372	test-mlogloss:0.573719
[85]	train-mlogloss:0.509183	test-mlogloss:0.573575
[86]	train-mlogloss:0.508148	test-mlogloss:0.573277
[87]	train-mlogloss:0.50706	test-mlogloss:0.572957
[88]	train-mlogloss:0.50622	test-mlogloss:0.572635
[89]	train-mlogloss:0.505219	test-mlogloss:0.572276
[90]	train-mlogloss:0.504375	test-mlogloss:0.571933
[91]	train-mlogloss:0.503762	test-mlogloss:0.571746
[92]	train-mlogloss:0.502992	test-mlogloss:0.571413
[93]	train-mlogloss:0.502076	test-mlogloss:0.571129
[94]	train-mlogloss:0.500902	test-mlogloss:0.570822
[95]	train-mlogloss:0.500169	test-mlogloss:0.570567
[96]	train-mlogloss:0.499278	test-mlogloss:0.570131
[97]	train-mlogloss:0.498181	test-mlogloss:0.569639
[98]	train-mlogloss:0.497191	test-mlogloss:0.569336
[99]	train-mlogloss:0.496139	test-mlogloss:0.569146
[100]	train-mlogloss:0.495544	test-mlogloss:0.56896
[101]	train-mlogloss:0.494762	test-mlogloss:0.568668
[102]	train-mlogloss:0.493763	test-mlogloss:0.568456
[103]	train-mlogloss:0.492945	test-mlogloss:0.568271
[104]	train-mlogloss:0.491708	test-mlogloss:0.567905
[105]	train-mlogloss:0.490897	test-mlogloss:0.567701
[106]	train-mlogloss:0.490114	test-mlogloss:0.567514
[107]	train-mlogloss:0.48894	test-mlogloss:0.567149
[108]	train-mlogloss:0.488131	test-mlogloss:0.566846
[109]	train-mlogloss:0.487414	test-mlogloss:0.566577
[110]	train-mlogloss:0.486545	test-mlogloss:0.566364
[111]	train-mlogloss:0.485623	test-mlogloss:0.566043
[112]	train-mlogloss:0.484816	test-mlogloss:0.565925
[113]	train-mlogloss:0.484138	test-mlogloss:0.565711
[114]	train-mlogloss:0.483216	test-mlogloss:0.56544
[115]	train-mlogloss:0.482588	test-mlogloss:0.565323
[116]	train-mlogloss:0.481523	test-mlogloss:0.565
[117]	train-mlogloss:0.48092	test-mlogloss:0.564753
[118]	train-mlogloss:0.480238	test-mlogloss:0.564586
[119]	train-mlogloss:0.47942	test-mlogloss:0.564378
[120]	train-mlogloss:0.478738	test-mlogloss:0.564245
[121]	train-mlogloss:0.478011	test-mlogloss:0.56409
[122]	train-mlogloss:0.476949	test-mlogloss:0.56384
[123]	train-mlogloss:0.476118	test-mlogloss:0.563467
[124]	train-mlogloss:0.475843	test-mlogloss:0.563276
[125]	train-mlogloss:0.474954	test-mlogloss:0.562983
[126]	train-mlogloss:0.474088	test-mlogloss:0.562882
[127]	train-mlogloss:0.473533	test-mlogloss:0.562699
[128]	train-mlogloss:0.472967	test-mlogloss:0.562539
[129]	train-mlogloss:0.472171	test-mlogloss:0.562386
[130]	train-mlogloss:0.471264	test-mlogloss:0.562188
[131]	train-mlogloss:0.470706	test-mlogloss:0.562049
[132]	train-mlogloss:0.469903	test-mlogloss:0.561895
[133]	train-mlogloss:0.469176	test-mlogloss:0.561649
[134]	train-mlogloss:0.468483	test-mlogloss:0.561359
[135]	train-mlogloss:0.467675	test-mlogloss:0.561175
[136]	train-mlogloss:0.466944	test-mlogloss:0.560943
[137]	train-mlogloss:0.466573	test-mlogloss:0.560931
[138]	train-mlogloss:0.465994	test-mlogloss:0.560789
[139]	train-mlogloss:0.465236	test-mlogloss:0.560444
[140]	train-mlogloss:0.464364	test-mlogloss:0.560345
[141]	train-mlogloss:0.463396	test-mlogloss:0.560242
[142]	train-mlogloss:0.46274	test-mlogloss:0.560137
[143]	train-mlogloss:0.462101	test-mlogloss:0.55996
[144]	train-mlogloss:0.461377	test-mlogloss:0.559821
[145]	train-mlogloss:0.460638	test-mlogloss:0.559611
[146]	train-mlogloss:0.459958	test-mlogloss:0.559478
[147]	train-mlogloss:0.459362	test-mlogloss:0.559354
[148]	train-mlogloss:0.458515	test-mlogloss:0.559138
[149]	train-mlogloss:0.457808	test-mlogloss:0.559009
[150]	train-mlogloss:0.45738	test-mlogloss:0.558911
[151]	train-mlogloss:0.456855	test-mlogloss:0.55884
[152]	train-mlogloss:0.456063	test-mlogloss:0.558697
[153]	train-mlogloss:0.455421	test-mlogloss:0.558521
[154]	train-mlogloss:0.454662	test-mlogloss:0.558377
[155]	train-mlogloss:0.454117	test-mlogloss:0.558296
[156]	train-mlogloss:0.453326	test-mlogloss:0.558084
[157]	train-mlogloss:0.452753	test-mlogloss:0.557905
[158]	train-mlogloss:0.452359	test-mlogloss:0.557868
[159]	train-mlogloss:0.451707	test-mlogloss:0.557636
[160]	train-mlogloss:0.451068	test-mlogloss:0.557454
[161]	train-mlogloss:0.450408	test-mlogloss:0.557361
[162]	train-mlogloss:0.449685	test-mlogloss:0.557289
[163]	train-mlogloss:0.448961	test-mlogloss:0.557146
[164]	train-mlogloss:0.448501	test-mlogloss:0.557029
[165]	train-mlogloss:0.447691	test-mlogloss:0.556853
[166]	train-mlogloss:0.446992	test-mlogloss:0.556806
[167]	train-mlogloss:0.446296	test-mlogloss:0.556598
[168]	train-mlogloss:0.445686	test-mlogloss:0.556577
[169]	train-mlogloss:0.444956	test-mlogloss:0.556382
[170]	train-mlogloss:0.444435	test-mlogloss:0.556329
[171]	train-mlogloss:0.443592	test-mlogloss:0.556008
[172]	train-mlogloss:0.442805	test-mlogloss:0.555822
[173]	train-mlogloss:0.442412	test-mlogloss:0.555704
[174]	train-mlogloss:0.441773	test-mlogloss:0.555605
[175]	train-mlogloss:0.441135	test-mlogloss:0.555466
[176]	train-mlogloss:0.440742	test-mlogloss:0.555388
[177]	train-mlogloss:0.44027	test-mlogloss:0.555334
[178]	train-mlogloss:0.439462	test-mlogloss:0.555133
[179]	train-mlogloss:0.43881	test-mlogloss:0.554992
[180]	train-mlogloss:0.438174	test-mlogloss:0.554753
[181]	train-mlogloss:0.437383	test-mlogloss:0.554644
[182]	train-mlogloss:0.436838	test-mlogloss:0.554575
[183]	train-mlogloss:0.436125	test-mlogloss:0.554404
[184]	train-mlogloss:0.435588	test-mlogloss:0.554327
[185]	train-mlogloss:0.435114	test-mlogloss:0.55427
[186]	train-mlogloss:0.434355	test-mlogloss:0.554231
[187]	train-mlogloss:0.43382	test-mlogloss:0.554011
[188]	train-mlogloss:0.433208	test-mlogloss:0.553862
[189]	train-mlogloss:0.43253	test-mlogloss:0.553751
[190]	train-mlogloss:0.432027	test-mlogloss:0.553633
[191]	train-mlogloss:0.43148	test-mlogloss:0.553609
[192]	train-mlogloss:0.431025	test-mlogloss:0.553599
[193]	train-mlogloss:0.430441	test-mlogloss:0.553502
[194]	train-mlogloss:0.429787	test-mlogloss:0.553418
[195]	train-mlogloss:0.429262	test-mlogloss:0.553465
[196]	train-mlogloss:0.42865	test-mlogloss:0.553342
[197]	train-mlogloss:0.428045	test-mlogloss:0.553264
[198]	train-mlogloss:0.427341	test-mlogloss:0.553197
[199]	train-mlogloss:0.426563	test-mlogloss:0.552965
[200]	train-mlogloss:0.426066	test-mlogloss:0.552906
[201]	train-mlogloss:0.42541	test-mlogloss:0.552713
[202]	train-mlogloss:0.424861	test-mlogloss:0.552693
[203]	train-mlogloss:0.42421	test-mlogloss:0.552601
[204]	train-mlogloss:0.423567	test-mlogloss:0.552647
[205]	train-mlogloss:0.422962	test-mlogloss:0.552553
[206]	train-mlogloss:0.422326	test-mlogloss:0.552551
[207]	train-mlogloss:0.421518	test-mlogloss:0.55258
[208]	train-mlogloss:0.420897	test-mlogloss:0.552612
[209]	train-mlogloss:0.420392	test-mlogloss:0.552503
[210]	train-mlogloss:0.420065	test-mlogloss:0.552369
[211]	train-mlogloss:0.419603	test-mlogloss:0.55221
[212]	train-mlogloss:0.41903	test-mlogloss:0.552108
[213]	train-mlogloss:0.418522	test-mlogloss:0.551998
[214]	train-mlogloss:0.417667	test-mlogloss:0.551873
[215]	train-mlogloss:0.417187	test-mlogloss:0.551808
[216]	train-mlogloss:0.416637	test-mlogloss:0.551775
[217]	train-mlogloss:0.41618	test-mlogloss:0.55173
[218]	train-mlogloss:0.415826	test-mlogloss:0.55165
[219]	train-mlogloss:0.415501	test-mlogloss:0.551587
[220]	train-mlogloss:0.415265	test-mlogloss:0.551546
[221]	train-mlogloss:0.414692	test-mlogloss:0.551359
[222]	train-mlogloss:0.414234	test-mlogloss:0.551307
[223]	train-mlogloss:0.413624	test-mlogloss:0.551199
[224]	train-mlogloss:0.41308	test-mlogloss:0.551012
[225]	train-mlogloss:0.41247	test-mlogloss:0.550941
[226]	train-mlogloss:0.411947	test-mlogloss:0.550983
[227]	train-mlogloss:0.411371	test-mlogloss:0.550967
[228]	train-mlogloss:0.41081	test-mlogloss:0.550876
[229]	train-mlogloss:0.410216	test-mlogloss:0.550737
[230]	train-mlogloss:0.409747	test-mlogloss:0.550653
[231]	train-mlogloss:0.409131	test-mlogloss:0.550562
[232]	train-mlogloss:0.408654	test-mlogloss:0.55062
[233]	train-mlogloss:0.408119	test-mlogloss:0.550529
[234]	train-mlogloss:0.407361	test-mlogloss:0.550505
[235]	train-mlogloss:0.406824	test-mlogloss:0.550482
[236]	train-mlogloss:0.406348	test-mlogloss:0.55042
[237]	train-mlogloss:0.406023	test-mlogloss:0.550356
[238]	train-mlogloss:0.405309	test-mlogloss:0.550179
[239]	train-mlogloss:0.404664	test-mlogloss:0.55013
[240]	train-mlogloss:0.404285	test-mlogloss:0.550085
[241]	train-mlogloss:0.403685	test-mlogloss:0.55006
[242]	train-mlogloss:0.403308	test-mlogloss:0.549991
[243]	train-mlogloss:0.402697	test-mlogloss:0.549962
[244]	train-mlogloss:0.402272	test-mlogloss:0.549869
[245]	train-mlogloss:0.401685	test-mlogloss:0.549878
[246]	train-mlogloss:0.401243	test-mlogloss:0.549921
[247]	train-mlogloss:0.400637	test-mlogloss:0.549932
[248]	train-mlogloss:0.400319	test-mlogloss:0.549812
[249]	train-mlogloss:0.399861	test-mlogloss:0.549876
[250]	train-mlogloss:0.399276	test-mlogloss:0.549815
[251]	train-mlogloss:0.398666	test-mlogloss:0.549829
[252]	train-mlogloss:0.398211	test-mlogloss:0.549989
[253]	train-mlogloss:0.397705	test-mlogloss:0.549932
[254]	train-mlogloss:0.397121	test-mlogloss:0.550049
[255]	train-mlogloss:0.396528	test-mlogloss:0.550022
[256]	train-mlogloss:0.396249	test-mlogloss:0.550033
[257]	train-mlogloss:0.395951	test-mlogloss:0.549966
[258]	train-mlogloss:0.395331	test-mlogloss:0.549948
[259]	train-mlogloss:0.394668	test-mlogloss:0.549957
[260]	train-mlogloss:0.394171	test-mlogloss:0.549973
[261]	train-mlogloss:0.39384	test-mlogloss:0.549985
[262]	train-mlogloss:0.393273	test-mlogloss:0.550006
[263]	train-mlogloss:0.392843	test-mlogloss:0.5499
[264]	train-mlogloss:0.392273	test-mlogloss:0.549908
[265]	train-mlogloss:0.391828	test-mlogloss:0.549826
[266]	train-mlogloss:0.391468	test-mlogloss:0.549805
[267]	train-mlogloss:0.390976	test-mlogloss:0.549758
[268]	train-mlogloss:0.390481	test-mlogloss:0.549727
[269]	train-mlogloss:0.390038	test-mlogloss:0.549707
[270]	train-mlogloss:0.389536	test-mlogloss:0.549714
[271]	train-mlogloss:0.388936	test-mlogloss:0.549652
[272]	train-mlogloss:0.388576	test-mlogloss:0.549666
[273]	train-mlogloss:0.388062	test-mlogloss:0.549731
[274]	train-mlogloss:0.387869	test-mlogloss:0.549754
[275]	train-mlogloss:0.387572	test-mlogloss:0.549816
[276]	train-mlogloss:0.387073	test-mlogloss:0.549819
[277]	train-mlogloss:0.386474	test-mlogloss:0.54963
[278]	train-mlogloss:0.385841	test-mlogloss:0.549673
[279]	train-mlogloss:0.385482	test-mlogloss:0.549606
[280]	train-mlogloss:0.385114	test-mlogloss:0.549587
[281]	train-mlogloss:0.384674	test-mlogloss:0.54955
[282]	train-mlogloss:0.384137	test-mlogloss:0.549542
[283]	train-mlogloss:0.38372	test-mlogloss:0.549528
[284]	train-mlogloss:0.383234	test-mlogloss:0.549464
[285]	train-mlogloss:0.38272	test-mlogloss:0.549434
[286]	train-mlogloss:0.382295	test-mlogloss:0.549465
[287]	train-mlogloss:0.381834	test-mlogloss:0.549379
[288]	train-mlogloss:0.38132	test-mlogloss:0.54934
[289]	train-mlogloss:0.380894	test-mlogloss:0.549264
[290]	train-mlogloss:0.380498	test-mlogloss:0.549247
[291]	train-mlogloss:0.380062	test-mlogloss:0.549205
[292]	train-mlogloss:0.37965	test-mlogloss:0.549201
[293]	train-mlogloss:0.379019	test-mlogloss:0.549211
[294]	train-mlogloss:0.378508	test-mlogloss:0.549221
[295]	train-mlogloss:0.378046	test-mlogloss:0.549091
[296]	train-mlogloss:0.377815	test-mlogloss:0.549071
[297]	train-mlogloss:0.377491	test-mlogloss:0.549019
[298]	train-mlogloss:0.377001	test-mlogloss:0.549037
[299]	train-mlogloss:0.376494	test-mlogloss:0.549011
[300]	train-mlogloss:0.376066	test-mlogloss:0.548946
[301]	train-mlogloss:0.375527	test-mlogloss:0.548929
[302]	train-mlogloss:0.375013	test-mlogloss:0.54892
[303]	train-mlogloss:0.374521	test-mlogloss:0.549
[304]	train-mlogloss:0.373935	test-mlogloss:0.549171
[305]	train-mlogloss:0.373428	test-mlogloss:0.549223
[306]	train-mlogloss:0.373039	test-mlogloss:0.54916
[307]	train-mlogloss:0.372686	test-mlogloss:0.549035
[308]	train-mlogloss:0.37216	test-mlogloss:0.548995
[309]	train-mlogloss:0.371648	test-mlogloss:0.548941
[310]	train-mlogloss:0.371155	test-mlogloss:0.548814
[311]	train-mlogloss:0.370729	test-mlogloss:0.548765
[312]	train-mlogloss:0.37032	test-mlogloss:0.548888
[313]	train-mlogloss:0.369891	test-mlogloss:0.548985
[314]	train-mlogloss:0.369316	test-mlogloss:0.548926
[315]	train-mlogloss:0.368816	test-mlogloss:0.548971
[316]	train-mlogloss:0.368333	test-mlogloss:0.548876
[317]	train-mlogloss:0.368004	test-mlogloss:0.548885
[318]	train-mlogloss:0.367705	test-mlogloss:0.548927
[319]	train-mlogloss:0.367121	test-mlogloss:0.548788
[320]	train-mlogloss:0.366641	test-mlogloss:0.548706
[321]	train-mlogloss:0.366203	test-mlogloss:0.548571
[322]	train-mlogloss:0.365932	test-mlogloss:0.548489
[323]	train-mlogloss:0.365446	test-mlogloss:0.548531
[324]	train-mlogloss:0.365172	test-mlogloss:0.548617
[325]	train-mlogloss:0.364779	test-mlogloss:0.548644
[326]	train-mlogloss:0.364241	test-mlogloss:0.548594
[327]	train-mlogloss:0.363824	test-mlogloss:0.548602
[328]	train-mlogloss:0.3634	test-mlogloss:0.548548
[329]	train-mlogloss:0.363085	test-mlogloss:0.548491
[330]	train-mlogloss:0.362653	test-mlogloss:0.548437
[331]	train-mlogloss:0.362338	test-mlogloss:0.548367
[332]	train-mlogloss:0.361838	test-mlogloss:0.548419
[333]	train-mlogloss:0.361572	test-mlogloss:0.548516
[334]	train-mlogloss:0.361207	test-mlogloss:0.548434
[335]	train-mlogloss:0.360795	test-mlogloss:0.548389
[336]	train-mlogloss:0.360272	test-mlogloss:0.548249
[337]	train-mlogloss:0.359874	test-mlogloss:0.548235
[338]	train-mlogloss:0.359489	test-mlogloss:0.54823
[339]	train-mlogloss:0.358986	test-mlogloss:0.548271
[340]	train-mlogloss:0.358536	test-mlogloss:0.548283
[341]	train-mlogloss:0.358192	test-mlogloss:0.5482
[342]	train-mlogloss:0.357849	test-mlogloss:0.548229
[343]	train-mlogloss:0.357487	test-mlogloss:0.54821
[344]	train-mlogloss:0.356953	test-mlogloss:0.548181
[345]	train-mlogloss:0.356421	test-mlogloss:0.548106
[346]	train-mlogloss:0.355903	test-mlogloss:0.548063
[347]	train-mlogloss:0.355627	test-mlogloss:0.548068
[348]	train-mlogloss:0.355334	test-mlogloss:0.54803
[349]	train-mlogloss:0.354875	test-mlogloss:0.548005
[350]	train-mlogloss:0.354477	test-mlogloss:0.547958
[351]	train-mlogloss:0.354084	test-mlogloss:0.547862
[352]	train-mlogloss:0.353584	test-mlogloss:0.54775
[353]	train-mlogloss:0.353249	test-mlogloss:0.547744
[354]	train-mlogloss:0.35303	test-mlogloss:0.547778
[355]	train-mlogloss:0.352646	test-mlogloss:0.547696
[356]	train-mlogloss:0.352297	test-mlogloss:0.54783
[357]	train-mlogloss:0.351894	test-mlogloss:0.547775
[358]	train-mlogloss:0.351425	test-mlogloss:0.54786
[359]	train-mlogloss:0.350943	test-mlogloss:0.547774
[360]	train-mlogloss:0.350602	test-mlogloss:0.547771
[361]	train-mlogloss:0.350357	test-mlogloss:0.547768
[362]	train-mlogloss:0.34985	test-mlogloss:0.547881
[363]	train-mlogloss:0.349465	test-mlogloss:0.547835
[364]	train-mlogloss:0.348895	test-mlogloss:0.547832
[365]	train-mlogloss:0.348455	test-mlogloss:0.548
[366]	train-mlogloss:0.348064	test-mlogloss:0.547948
[367]	train-mlogloss:0.347629	test-mlogloss:0.548026
[368]	train-mlogloss:0.347153	test-mlogloss:0.547928
[369]	train-mlogloss:0.346734	test-mlogloss:0.547903
[370]	train-mlogloss:0.346251	test-mlogloss:0.547871
[371]	train-mlogloss:0.345869	test-mlogloss:0.547909
[372]	train-mlogloss:0.345424	test-mlogloss:0.547937
[373]	train-mlogloss:0.34505	test-mlogloss:0.548001
[374]	train-mlogloss:0.344615	test-mlogloss:0.547982
[375]	train-mlogloss:0.344206	test-mlogloss:0.54803
Stopping. Best iteration:
[355]	train-mlogloss:0.352646	test-mlogloss:0.547696

[0.54803037236074925]
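
Since runXGB also returns the trained booster, we can peek at which features drive the model. A quick sketch using the operator import from the top (the names are XGBoost's default f0, f1, ... here, because we trained on a raw sparse matrix without feature names):

    importance = model.get_fscore()
    top = sorted(importance.items(), key=operator.itemgetter(1), reverse=True)[:10]
    print(top)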

Now let us build the final model and get the predictions on the test set.


In [10]:
# 400 rounds is close to the best early-stopping iteration (~355) observed in cross validation
preds, model = runXGB(train_X, train_y, test_X, num_rounds=400)
out_df = pd.DataFrame(preds)
# multi:softprob outputs probabilities in class-index order, matching target_num_map: 0=high, 1=medium, 2=low
out_df.columns = ["high", "medium", "low"]
out_df["listing_id"] = test_df.listing_id.values
out_df.to_csv("xgb_starter2.csv", index=False)
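
As an optional last sanity check before submitting (purely illustrative), the three class probabilities should sum to one for every row:

    row_sums = out_df[["high", "medium", "low"]].sum(axis=1)
    assert np.allclose(row_sums, 1.0), "class probabilities should sum to 1 per row"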

Hope this helps Python users as a good starting point.