TalkingData -- Feature Importance for the Subclass of Devices without Events

In this notebook, we explore the predictive power of the features via XGBoost. We restrict ourselves to devices for which no event information is available.
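As context for how large this subclass is, the share of devices without any recorded events can be checked directly against the raw competition files. The following is a minimal sketch (not part of the pipeline below); it assumes that events.csv from the competition data is available under the same input directory used later in this notebook:

import pandas as pd

#devices that appear at least once in the event log
event_devices = set(pd.read_csv('../../../input/events.csv', usecols = ['device_id'])['device_id'])

#training devices without any events
train = pd.read_csv('../../../input/gender_age_train.csv')
no_events = ~train['device_id'].isin(event_devices)
print('fraction of training devices without events: {0:.2f}'.format(no_events.mean()))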


In [1]:
%matplotlib inline
import matplotlib.patches as mpatches
import matplotlib.pylab as plt

import numpy as np
import operator
import pandas as pd
import pickle
import xgboost as xgb

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from xgboost.sklearn import XGBClassifier

#paths to data and features
DATA_PATH = "../../../input/"
FEATURE_PATH = "../../../features/"

#seed for randomness
SEED = 1747

############################################
###XGB PARAMETERS
############################################

#parameters for xgb fitting
FIT_PARAMS = {
'verbose_eval': 100, 
'early_stopping_rounds': 10,
'num_boost_round': 700,
}

#params for xgb
HYPER_PARAMS = {
"objective": "multi:softprob",
"num_class": 12,
'eval_metric': 'mlogloss', 
}

#tuned hyperparameters; 'n_estimators' belongs to the sklearn wrapper and is ignored by xgb.train,
#and the 'multi:softprob' objective above is kept so that the booster predicts class probabilities
HYPER_PARAMS.update({'learning_rate': 0.032, 'max_depth': 6, 'min_child_weight': 2.144646982692753e-06,
                     'max_delta_step': 4.5, 'gamma': 5.0751583955640074e-08,
                     'subsample': 0.35000000000000003, 'colsample_bytree': 0.6000000000000001,
                     'reg_alpha': 1.223491951048497e-10, 'reg_lambda': 2.9837914959522025e-10,
                     'n_estimators': 280, 'nthread': 2, 'seed': SEED})
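The tuned values above appear to come from a search over the sklearn wrapper, which is why an 'n_estimators' key shows up even though xgb.train only looks at 'num_boost_round'. Purely as a reference (this object is not used anywhere else in the notebook), a roughly equivalent setup with the imported XGBClassifier might look as follows:

clf = XGBClassifier(objective = 'multi:softprob', n_estimators = 280, learning_rate = 0.032,
                    max_depth = 6, min_child_weight = 2.144646982692753e-06, max_delta_step = 4.5,
                    gamma = 5.0751583955640074e-08, subsample = 0.35, colsample_bytree = 0.6,
                    reg_alpha = 1.223491951048497e-10, reg_lambda = 2.9837914959522025e-10)
#the seed argument is omitted here since its name ('seed' vs. 'random_state') depends on the xgboost version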

We load the group labels and the features generated in previous notebooks.


In [10]:
train = pd.read_csv('{0}gender_age_train.csv'.format(DATA_PATH))
features = pickle.load(open('{0}phone_model_features.p'.format(FEATURE_PATH), 'rb'))
names = pickle.load(open('{0}phone_model_features_names.p'.format(FEATURE_PATH), 'rb'))

The group labels are encoded, the features are restricted to the rows belonging to the training devices, and a stratified validation set is split off.


In [11]:
labels = LabelEncoder().fit_transform(train['group'])

X_train, X_val, y_train, y_val = train_test_split(features[:train.shape[0], :], labels, 
                                                  stratify = labels, train_size = 0.8, random_state = SEED)
X_train.shape


Out[11]:
(59715, 39)

This training data is wrapped in DMatrix objects and used as input for the xgb classifier, with the validation set monitored for early stopping.


In [12]:
dtrain = xgb.DMatrix(X_train, y_train)
dvalid = xgb.DMatrix(X_val, y_val)

gbm = xgb.train(HYPER_PARAMS, dtrain, evals = [(dtrain, 'train'), (dvalid, 'eval')], **FIT_PARAMS )


Will train until eval error hasn't decreased in 10 rounds.
[0]	train-mlogloss:2.479140	eval-mlogloss:2.479334
[100]	train-mlogloss:2.365048	eval-mlogloss:2.389616
Stopping. Best iteration:
[111]	train-mlogloss:2.362857	eval-mlogloss:2.389399
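As a quick sanity check, the reported validation score can be reproduced from the booster's probability output. This is a minimal sketch (log_loss from sklearn.metrics is assumed to be available, and best_ntree_limit exists because early stopping was triggered):

from sklearn.metrics import log_loss

#class probabilities for the validation devices at the best iteration found by early stopping
val_probs = gbm.predict(dvalid, ntree_limit = gbm.best_ntree_limit)
print('validation logloss: {0:.5f}'.format(log_loss(y_val, val_probs)))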

Now, we use the importance scores (xgboost's f-scores, i.e. the number of times a feature is used in a split) to identify powerful predictors. We see that the device model has substantially higher predictive power than the other features.


In [9]:
df = pd.Series(gbm.get_fscore(), name = 'fscore').sort_values(ascending = False)

#add true column names
feat_num = [int(col_idx.replace('f','')) for col_idx in df.index]
df.index = names[feat_num]

#set color and draw chart
cols = np.where(['device_model' in word for word in df.index], 'r', 'w')
ax = df[::-1].plot(kind = 'barh', figsize = (20, 15), color = cols[::-1])

ax.set_title('Importance Scores')
red_patch = mpatches.Patch(color='r', label='device-model')
white_patch = mpatches.Patch(color='w', label='phone-brand/device-pref')
ax.legend(handles=[red_patch, white_patch], bbox_to_anchor=(1, 0.9))


Out[9]:
<matplotlib.legend.Legend at 0x7fd66b5791d0>
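For comparison, xgboost also ships a built-in helper that draws essentially the same ranking directly from the booster, albeit with the raw f0, f1, ... feature labels instead of the readable names used above:

#built-in importance bar chart based on the same split counts
xgb.plot_importance(gbm, height = 0.5)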

In [ ]: