To do for 07282017:

  1. User_filter: Filter users based on certain features, e.g., consistency with a theme, a certain time of viewing, or a certain time interval before each item view.
  2. Recommendation core: This will basically be collaborative filtering (CF), but instead of using the real items, I'd like to use features extracted by a CNN and reduced with tSNE to maybe 20 dimensions.
  3. Processor: Inputs are (a) the log of user history and (b) the item features. Output is (a) the top-N ranked recommended items for each user.
  4. Evaluator: Evaluate whether the item the user buys is within the top N recommended items.
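
The steps above can be sketched end to end on toy data. This is only an illustration under stated assumptions: the user vector is the plain mean of the viewed item vectors, cosine similarity is the ranking score, and every name and value below (`cosine_scores`, `top_n`, `hit_rate`, the toy items) is made up for the example rather than taken from the real pipeline.

```python
import numpy as np

def cosine_scores(user_vec, item_matrix):
    """Cosine similarity between one user vector and every row of item_matrix."""
    denom = np.linalg.norm(user_vec) * np.linalg.norm(item_matrix, axis=1)
    # Zero vectors get a similarity of 0 instead of a division error.
    safe = np.where(denom == 0, 1.0, denom)
    return np.where(denom == 0, 0.0, item_matrix @ user_vec / safe)

def top_n(user_vec, item_ids, item_matrix, n):
    """Return the ids of the n items most similar to the user vector."""
    scores = cosine_scores(user_vec, item_matrix)
    order = np.argsort(-scores, kind="stable")[:n]
    return [item_ids[i] for i in order]

def hit_rate(recommendations, purchases):
    """Fraction of users whose purchased item is in their top-N list."""
    hits = sum(purchases[u] in recs for u, recs in recommendations.items())
    return hits / len(recommendations)

# Toy item features (hypothetical, 2-D for readability).
item_ids = ["a", "b", "c", "d"]
item_matrix = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
viewed = {"u1": [0, 1], "u2": [2, 3]}   # row indices of viewed items
purchases = {"u1": "b", "u2": "a"}      # item each user bought

recommendations = {}
for user, rows in viewed.items():
    user_vec = item_matrix[rows].mean(axis=0)   # unweighted user profile
    recommendations[user] = top_n(user_vec, item_ids, item_matrix, n=2)

rate = hit_rate(recommendations, purchases)
```

Here `u1` bought an item inside its top 2 and `u2` did not, so `rate` is 0.5. A view-time weight could replace the plain mean when building `user_vec`.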

After trial run:

  • tSNE may not be feasible for this sample size and the dimensionality we want. Need to try a small portion and time it, or try PCA instead.

In [1]:
import pandas as pd
import numpy as np
import os
from sklearn.manifold import TSNE
from sklearn.decomposition import PCA

In [2]:
os.chdir('/Users/Walkon302/Desktop/deep-learning-models-master/view2buy')

In [3]:
# Read the preprocessed file, containing the user profile and item features from view2buy folder
df = pd.read_pickle('user_fea_for_eval.pkl')

In [4]:
df.head()


Out[4]:
0 user_id buy_spu buy_sn buy_ct3 view_spu view_sn view_ct3 time_interval view_cnt view_secondes view_features buy_features
0 2469583035\t4199682998971011301\t10013436\t334... 2469583035 4199682998971011301 10013436 334 220189917005230097 10013861 334 37496 7 45 [0.621, 0.542, 0.0, 0.369, 0.062, 0.039, 0.103... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757...
1 2469583035\t4199682998971011301\t10013436\t334... 2469583035 4199682998971011301 10013436 334 234826617504419925 10003862 334 170826 2 23 [0.15, 0.98, 0.104, 1.295, 0.111, 0.0, 0.0, 0.... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757...
2 2469583035\t4199682998971011301\t10013436\t334... 2469583035 4199682998971011301 10013436 334 235671027621670949 10003862 334 426968 2 11 [0.106, 0.027, 0.0, 1.398, 0.096, 0.021, 0.072... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757...
3 1488725183\t4199682998971011301\t10013436\t334... 1488725183 4199682998971011301 10013436 334 235671027621670949 10003862 334 180564 1 22 [0.106, 0.027, 0.0, 1.398, 0.096, 0.021, 0.072... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757...
4 2469583035\t4199682998971011301\t10013436\t334... 2469583035 4199682998971011301 10013436 334 245522675097001998 10026364 334 83993 2 7 [0.019, 1.415, 0.007, 0.088, 0.055, 0.015, 0.0... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757...

In [5]:
# Drop the first column, which is the original data format.
df.drop('0', axis = 1, inplace = True)

In [6]:
# Check the data
df.head()


Out[6]:
user_id buy_spu buy_sn buy_ct3 view_spu view_sn view_ct3 time_interval view_cnt view_secondes view_features buy_features
0 2469583035 4199682998971011301 10013436 334 220189917005230097 10013861 334 37496 7 45 [0.621, 0.542, 0.0, 0.369, 0.062, 0.039, 0.103... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757...
1 2469583035 4199682998971011301 10013436 334 234826617504419925 10003862 334 170826 2 23 [0.15, 0.98, 0.104, 1.295, 0.111, 0.0, 0.0, 0.... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757...
2 2469583035 4199682998971011301 10013436 334 235671027621670949 10003862 334 426968 2 11 [0.106, 0.027, 0.0, 1.398, 0.096, 0.021, 0.072... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757...
3 1488725183 4199682998971011301 10013436 334 235671027621670949 10003862 334 180564 1 22 [0.106, 0.027, 0.0, 1.398, 0.096, 0.021, 0.072... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757...
4 2469583035 4199682998971011301 10013436 334 245522675097001998 10026364 334 83993 2 7 [0.019, 1.415, 0.007, 0.088, 0.055, 0.015, 0.0... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757...

In [7]:
# Remove rows where the bought spu and the viewed spu are the same, to remove the bias
#df = df.query('buy_spu != view_spu')

In [8]:
# Keep only the first 10k rows
df = df.iloc[0:10000, :]

In [9]:
# Calculate the average view time (view_secondes) over all viewed items per user and purchased item
avg_view_fea = pd.DataFrame(df.groupby(['user_id', 'buy_spu'])['view_secondes'].mean())

In [10]:
# Reset the index and rename the column
avg_view_fea.reset_index(inplace=True)
avg_view_fea.rename(columns = {'view_secondes':'avg_view_fea'}, inplace=True)

In [11]:
# Check the data
avg_view_fea.head()


Out[11]:
user_id buy_spu avg_view_fea
0 814009 77763563263074335 13.436364
1 1165283 77200616039542809 21.625000
2 9873479 77200616039542809 34.863636
3 63236390 292247525162119174 19.736842
4 76700950 95777984703225857 155.000000

In [12]:
# Merge avg item view into data
df = pd.merge(df, avg_view_fea, on=['user_id', 'buy_spu'])

In [13]:
# Calculate the weights for view item vec
df['weight_of_view'] = df['view_secondes']/df['avg_view_fea']
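
The weighting in the cells above divides each view time by the mean view time of its (user_id, buy_spu) group. A minimal pure-Python sketch of the same arithmetic on made-up rows, not the real data:

```python
from collections import defaultdict

# Toy rows: (user_id, buy_spu, view_seconds). All values are hypothetical.
rows = [("u1", "p1", 45.0), ("u1", "p1", 15.0), ("u2", "p2", 10.0)]

# Accumulate [sum of seconds, row count] per (user, purchased item) group.
totals = defaultdict(lambda: [0.0, 0])
for user, buy, seconds in rows:
    totals[(user, buy)][0] += seconds
    totals[(user, buy)][1] += 1

group_mean = {key: s / n for key, (s, n) in totals.items()}

# Weight = this row's view time relative to the group's mean view time.
weights = [seconds / group_mean[(user, buy)] for user, buy, seconds in rows]
```

Within each group the weights average to 1, so a weight above 1 marks an item viewed longer than usual for that user. A group whose view times are all zero would divide by zero and needs a guard.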

In [14]:
df.head()


Out[14]:
user_id buy_spu buy_sn buy_ct3 view_spu view_sn view_ct3 time_interval view_cnt view_secondes view_features buy_features avg_view_fea weight_of_view
0 2469583035 4199682998971011301 10013436 334 220189917005230097 10013861 334 37496 7 45 [0.621, 0.542, 0.0, 0.369, 0.062, 0.039, 0.103... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 1.299352
1 2469583035 4199682998971011301 10013436 334 234826617504419925 10003862 334 170826 2 23 [0.15, 0.98, 0.104, 1.295, 0.111, 0.0, 0.0, 0.... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.664113
2 2469583035 4199682998971011301 10013436 334 235671027621670949 10003862 334 426968 2 11 [0.106, 0.027, 0.0, 1.398, 0.096, 0.021, 0.072... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.317619
3 2469583035 4199682998971011301 10013436 334 245522675097001998 10026364 334 83993 2 7 [0.019, 1.415, 0.007, 0.088, 0.055, 0.015, 0.0... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.202121
4 2469583035 4199682998971011301 10013436 334 296751124749754369 10005367 334 427866 2 12 [0.066, 0.328, 0.043, 0.0, 0.062, 0.016, 0.303... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.346494

In [34]:
# Stack view_features and buy_features into one Series (all view rows first, then all buy rows)
view_buy_item_fea = pd.concat([df['view_features'], df['buy_features']], axis = 0)

In [36]:
view_buy_item_fea.shape


Out[36]:
(20000,)

Try TSNE and time it

  • It turns out that TSNE is too time-consuming even for a small set of data. Part of the cost also comes from how I transformed the data (growing a DataFrame one row at a time). Thus, for PCA, I collected the vectors in a list first and then converted all the data into a numpy array at once, which is much faster.
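
The difference between the two ways of building the input matrix can be sketched as follows. This is an illustration on random toy vectors, not a benchmark of the real data; the sizes and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-in for a column of per-item feature lists.
vectors = [rng.random(8).tolist() for _ in range(200)]

# Slow pattern: grow the matrix one row at a time.
# Every concatenate call copies all rows accumulated so far.
grown = np.empty((0, 8))
for v in vectors:
    grown = np.concatenate([grown, np.asarray(v)[None, :]], axis=0)

# Fast pattern: keep the rows in a list and convert once at the end.
stacked = np.asarray(vectors)

# Both approaches produce the same matrix.
same = np.array_equal(grown, stacked)
```

The row-by-row version does work that grows roughly quadratically with the number of rows, while the single conversion copies each row once.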

In [ ]:
# Generate TSNE model
model = TSNE(n_components=10, random_state=0)

In [121]:
%%time
# Time the tSNE with 250 samples (the %%time cell magic must be the first line of the cell)
a = pd.DataFrame()
for i, j in enumerate(view_item_vec.iloc[0:250]):
    a = pd.concat([a, pd.DataFrame(j).transpose()], axis = 0)
vt = model.fit_transform(a)


CPU times: user 22.3 s, sys: 501 ms, total: 22.8 s
Wall time: 22.8 s

In [114]:
%%time
# Time the tSNE with 500 samples (the %%time cell magic must be the first line of the cell)
a = pd.DataFrame()
for i, j in enumerate(view_item_vec.iloc[0:500]):
    a = pd.concat([a, pd.DataFrame(j).transpose()], axis = 0)
vt = model.fit_transform(a)


CPU times: user 1min 23s, sys: 2.57 s, total: 1min 25s
Wall time: 1min 31s

In [113]:
%%time
# Time the tSNE with 1000 samples (the %%time cell magic must be the first line of the cell)
a = pd.DataFrame()
for i, j in enumerate(view_item_vec.iloc[0:1000]):
    a = pd.concat([a, pd.DataFrame(j).transpose()], axis = 0)
vt = model.fit_transform(a)


CPU times: user 4min 25s, sys: 6.05 s, total: 4min 31s
Wall time: 4min 33s

Try PCA instead

  • PCA looks reasonable. We should be able to process 300k rows in around 30 secs, if it does not blow up my RAM. I will proceed with this setting for the first try.

In [37]:
# Generate PCA model
model = PCA(n_components=200, random_state=0)

Append all view_items for PCA processing


In [38]:
%%time
view_item = []
for i in view_buy_item_fea:
    view_item.append(i)
view_item = np.array(view_item)


CPU times: user 1.17 s, sys: 542 ms, total: 1.72 s
Wall time: 1.77 s

In [39]:
%%time
pca_view_vec = model.fit_transform(view_item)


CPU times: user 11 s, sys: 789 ms, total: 11.8 s
Wall time: 6.75 s

In [40]:
# 200 PCA dimensions explain about 91% of the variance (see the output of this cell). Beyond that, e.g., 300 D, my computer runs out of memory (8 GB)
sum(model.explained_variance_ratio_)


Out[40]:
0.90980608640901406
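
The summed `explained_variance_ratio_` above can also be turned around to ask how many components are needed for a target share of variance. A minimal sketch using a plain NumPy SVD on toy data; the helper names `explained_variance_ratio` and `components_for` are hypothetical, and the toy matrix is not related to the real features.

```python
import numpy as np

def explained_variance_ratio(X):
    """Share of total variance carried by each principal component."""
    centered = X - X.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()

def components_for(ratio, target):
    """Smallest number of leading components whose cumulative share reaches target."""
    cumulative = np.cumsum(ratio)
    k = int(np.searchsorted(cumulative, target)) + 1
    return min(k, len(ratio))   # guard against floating-point shortfall

# Toy matrix with per-axis sums of squares 18, 8 and 2.
X = np.array([[3.0, 0.0, 0.0], [-3.0, 0.0, 0.0],
              [0.0, 2.0, 0.0], [0.0, -2.0, 0.0],
              [0.0, 0.0, 1.0], [0.0, 0.0, -1.0]])

ratio = explained_variance_ratio(X)
k_90 = components_for(ratio, 0.90)
```

On this toy matrix the first two components carry 26/28 of the variance, so `k_90` is 2.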

Append all buy_items for PCA processing


In [41]:
# Insert the PCA results into data (the first 10000 rows are the view items, the next 10000 the buy items)
df['pca_view'] = pca_view_vec[0:10000].tolist()
df['pca_buy'] = pca_view_vec[10000:20000].tolist()

In [42]:
# Check the data
df.head()


Out[42]:
user_id buy_spu buy_sn buy_ct3 view_spu view_sn view_ct3 time_interval view_cnt view_secondes view_features buy_features avg_view_fea weight_of_view pca_view pca_buy
0 2469583035 4199682998971011301 10013436 334 220189917005230097 10013861 334 37496 7 45 [0.621, 0.542, 0.0, 0.369, 0.062, 0.039, 0.103... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 1.299352 [-4.18441352754, -4.98522684557, 7.40010898649... [-2.45874875255, 0.950284632032, 5.98234076728...
1 2469583035 4199682998971011301 10013436 334 234826617504419925 10003862 334 170826 2 23 [0.15, 0.98, 0.104, 1.295, 0.111, 0.0, 0.0, 0.... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.664113 [3.83780037191, 7.88132568231, 0.937903291471,... [-2.45874875255, 0.950284632032, 5.98234076728...
2 2469583035 4199682998971011301 10013436 334 235671027621670949 10003862 334 426968 2 11 [0.106, 0.027, 0.0, 1.398, 0.096, 0.021, 0.072... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.317619 [7.77165030033, 7.62261939761, -0.895622345806... [-2.45874875255, 0.950284632032, 5.98234076728...
3 2469583035 4199682998971011301 10013436 334 245522675097001998 10026364 334 83993 2 7 [0.019, 1.415, 0.007, 0.088, 0.055, 0.015, 0.0... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.202121 [-3.22576025396, -4.16373835223, 5.30225410798... [-2.45874875255, 0.950284632032, 5.98234076728...
4 2469583035 4199682998971011301 10013436 334 296751124749754369 10005367 334 427866 2 12 [0.066, 0.328, 0.043, 0.0, 0.062, 0.016, 0.303... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.346494 [1.43794255292, 10.9324726458, -2.09193256963,... [-2.45874875255, 0.950284632032, 5.98234076728...

In [44]:
# Check the data
df.head()


Out[44]:
user_id buy_spu buy_sn buy_ct3 view_spu view_sn view_ct3 time_interval view_cnt view_secondes view_features buy_features avg_view_fea weight_of_view pca_view pca_buy weighted_view_pca
0 2469583035 4199682998971011301 10013436 334 220189917005230097 10013861 334 37496 7 45 [0.621, 0.542, 0.0, 0.369, 0.062, 0.039, 0.103... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 1.299352 [-4.18441352754, -4.98522684557, 7.40010898649... [-2.45874875255, 0.950284632032, 5.98234076728... [-5.43702523761, -6.47756346168, 9.61534491173...
1 2469583035 4199682998971011301 10013436 334 234826617504419925 10003862 334 170826 2 23 [0.15, 0.98, 0.104, 1.295, 0.111, 0.0, 0.0, 0.... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.664113 [3.83780037191, 7.88132568231, 0.937903291471,... [-2.45874875255, 0.950284632032, 5.98234076728... [2.5487336589, 5.23409195283, 0.6228739007, -1...
2 2469583035 4199682998971011301 10013436 334 235671027621670949 10003862 334 426968 2 11 [0.106, 0.027, 0.0, 1.398, 0.096, 0.021, 0.072... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.317619 [7.77165030033, 7.62261939761, -0.895622345806... [-2.45874875255, 0.950284632032, 5.98234076728... [2.4684263476, 2.42109125239, -0.284466967819,...
3 2469583035 4199682998971011301 10013436 334 245522675097001998 10026364 334 83993 2 7 [0.019, 1.415, 0.007, 0.088, 0.055, 0.015, 0.0... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.202121 [-3.22576025396, -4.16373835223, 5.30225410798... [-2.45874875255, 0.950284632032, 5.98234076728... [-0.651995148561, -0.841580586219, 1.071698974...
4 2469583035 4199682998971011301 10013436 334 296751124749754369 10005367 334 427866 2 12 [0.066, 0.328, 0.043, 0.0, 0.062, 0.016, 0.303... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.346494 [1.43794255292, 10.9324726458, -2.09193256963,... [-2.45874875255, 0.950284632032, 5.98234076728... [0.498238197476, 3.78803412829, -0.72484169177...

Save the file for further processing


In [45]:
#df.to_pickle('top10k_user_pca.pkl')

In [46]:
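# Helper functions for vector similarity and averaging
def dot(K, L):
    # Dot product of two vectors; returns 0 if their lengths differ
    if len(K) != len(L): return 0
    return sum(i[0]*i[1] for i in zip(K, L))

def similarity(item_1, item_2):
    # Cosine similarity between two vectors
    return dot(item_1, item_2) / np.sqrt(dot(item_1, item_1)*dot(item_2, item_2))

def average(lists):
    # Element-wise mean across a collection of equal-length vectors
    return [np.mean(i) for i in zip(*lists)]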

In [109]:
df = pd.read_pickle('top10k_user_pca.pkl')

In [128]:
# Calculate the weighted pca_view (each PCA component of the viewed item scaled by weight_of_view)
df['weighted_view_pca'] = df.apply(lambda x: [y*x['weight_of_view'] for y in x['pca_view']], axis=1)

In [99]:
df.head()


Out[99]:
user_id buy_spu buy_sn buy_ct3 view_spu view_sn view_ct3 time_interval view_cnt view_secondes view_features buy_features avg_view_fea weight_of_view pca_view pca_buy weighted_view_pca
0 2469583035 4199682998971011301 10013436 334 220189917005230097 10013861 334 37496 7 45 [0.621, 0.542, 0.0, 0.369, 0.062, 0.039, 0.103... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 1.299352 [-4.18441352754, -4.98522684557, 7.40010898649... [-2.45874875255, 0.950284632032, 5.98234076728... [-5.43702523761, -6.47756346168, 9.61534491173...
1 2469583035 4199682998971011301 10013436 334 234826617504419925 10003862 334 170826 2 23 [0.15, 0.98, 0.104, 1.295, 0.111, 0.0, 0.0, 0.... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.664113 [3.83780037191, 7.88132568231, 0.937903291471,... [-2.45874875255, 0.950284632032, 5.98234076728... [2.5487336589, 5.23409195283, 0.6228739007, -1...
2 2469583035 4199682998971011301 10013436 334 235671027621670949 10003862 334 426968 2 11 [0.106, 0.027, 0.0, 1.398, 0.096, 0.021, 0.072... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.317619 [7.77165030033, 7.62261939761, -0.895622345806... [-2.45874875255, 0.950284632032, 5.98234076728... [2.4684263476, 2.42109125239, -0.284466967819,...
3 2469583035 4199682998971011301 10013436 334 245522675097001998 10026364 334 83993 2 7 [0.019, 1.415, 0.007, 0.088, 0.055, 0.015, 0.0... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.202121 [-3.22576025396, -4.16373835223, 5.30225410798... [-2.45874875255, 0.950284632032, 5.98234076728... [-0.651995148561, -0.841580586219, 1.071698974...
4 2469583035 4199682998971011301 10013436 334 296751124749754369 10005367 334 427866 2 12 [0.066, 0.328, 0.043, 0.0, 0.062, 0.016, 0.303... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.346494 [1.43794255292, 10.9324726458, -2.09193256963,... [-2.45874875255, 0.950284632032, 5.98234076728... [0.498238197476, 3.78803412829, -0.72484169177...

In [110]:
ori_user_fea = df.groupby(['user_id'])['view_features'].apply(lambda x: average(x))

In [111]:
ori_user_fea = pd.DataFrame(ori_user_fea)

In [112]:
ori_user_fea=ori_user_fea.reset_index()

In [113]:
df = pd.merge(df, ori_user_fea, on='user_id')

In [114]:
df.head()


Out[114]:
user_id buy_spu buy_sn buy_ct3 view_spu view_sn view_ct3 time_interval view_cnt view_secondes view_features_x buy_features avg_view_fea weight_of_view pca_view pca_buy weighted_view_pca view_features_y
0 2469583035 4199682998971011301 10013436 334 220189917005230097 10013861 334 37496 7 45 [0.621, 0.542, 0.0, 0.369, 0.062, 0.039, 0.103... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 1.299352 [-4.18441352754, -4.98522684557, 7.40010898649... [-2.45874875255, 0.950284632032, 5.98234076728... [-5.43702523761, -6.47756346168, 9.61534491173... [0.195346938776, 0.549204081633, 0.08559183673...
1 2469583035 4199682998971011301 10013436 334 234826617504419925 10003862 334 170826 2 23 [0.15, 0.98, 0.104, 1.295, 0.111, 0.0, 0.0, 0.... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.664113 [3.83780037191, 7.88132568231, 0.937903291471,... [-2.45874875255, 0.950284632032, 5.98234076728... [2.5487336589, 5.23409195283, 0.6228739007, -1... [0.195346938776, 0.549204081633, 0.08559183673...
2 2469583035 4199682998971011301 10013436 334 235671027621670949 10003862 334 426968 2 11 [0.106, 0.027, 0.0, 1.398, 0.096, 0.021, 0.072... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.317619 [7.77165030033, 7.62261939761, -0.895622345806... [-2.45874875255, 0.950284632032, 5.98234076728... [2.4684263476, 2.42109125239, -0.284466967819,... [0.195346938776, 0.549204081633, 0.08559183673...
3 2469583035 4199682998971011301 10013436 334 245522675097001998 10026364 334 83993 2 7 [0.019, 1.415, 0.007, 0.088, 0.055, 0.015, 0.0... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.202121 [-3.22576025396, -4.16373835223, 5.30225410798... [-2.45874875255, 0.950284632032, 5.98234076728... [-0.651995148561, -0.841580586219, 1.071698974... [0.195346938776, 0.549204081633, 0.08559183673...
4 2469583035 4199682998971011301 10013436 334 296751124749754369 10005367 334 427866 2 12 [0.066, 0.328, 0.043, 0.0, 0.062, 0.016, 0.303... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.346494 [1.43794255292, 10.9324726458, -2.09193256963,... [-2.45874875255, 0.950284632032, 5.98234076728... [0.498238197476, 3.78803412829, -0.72484169177... [0.195346938776, 0.549204081633, 0.08559183673...

In [115]:
df.rename(columns = {'view_features_y':'user_features'}, inplace = True)

In [106]:
df.head()


Out[106]:
user_id buy_spu buy_sn buy_ct3 view_spu view_sn view_ct3 time_interval view_cnt view_secondes view_features buy_features avg_view_fea weight_of_view pca_view pca_buy weighted_view_pca_x user_features
0 2469583035 4199682998971011301 10013436 334 220189917005230097 10013861 334 37496 7 45 [0.621, 0.542, 0.0, 0.369, 0.062, 0.039, 0.103... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 1.299352 [-4.18441352754, -4.98522684557, 7.40010898649... [-2.45874875255, 0.950284632032, 5.98234076728... [-5.43702523761, -6.47756346168, 9.61534491173... [-0.488648716682, 2.68093043688, 0.09369500442...
1 2469583035 4199682998971011301 10013436 334 234826617504419925 10003862 334 170826 2 23 [0.15, 0.98, 0.104, 1.295, 0.111, 0.0, 0.0, 0.... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.664113 [3.83780037191, 7.88132568231, 0.937903291471,... [-2.45874875255, 0.950284632032, 5.98234076728... [2.5487336589, 5.23409195283, 0.6228739007, -1... [-0.488648716682, 2.68093043688, 0.09369500442...
2 2469583035 4199682998971011301 10013436 334 235671027621670949 10003862 334 426968 2 11 [0.106, 0.027, 0.0, 1.398, 0.096, 0.021, 0.072... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.317619 [7.77165030033, 7.62261939761, -0.895622345806... [-2.45874875255, 0.950284632032, 5.98234076728... [2.4684263476, 2.42109125239, -0.284466967819,... [-0.488648716682, 2.68093043688, 0.09369500442...
3 2469583035 4199682998971011301 10013436 334 245522675097001998 10026364 334 83993 2 7 [0.019, 1.415, 0.007, 0.088, 0.055, 0.015, 0.0... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.202121 [-3.22576025396, -4.16373835223, 5.30225410798... [-2.45874875255, 0.950284632032, 5.98234076728... [-0.651995148561, -0.841580586219, 1.071698974... [-0.488648716682, 2.68093043688, 0.09369500442...
4 2469583035 4199682998971011301 10013436 334 296751124749754369 10005367 334 427866 2 12 [0.066, 0.328, 0.043, 0.0, 0.062, 0.016, 0.303... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.346494 [1.43794255292, 10.9324726458, -2.09193256963,... [-2.45874875255, 0.950284632032, 5.98234076728... [0.498238197476, 3.78803412829, -0.72484169177... [-0.488648716682, 2.68093043688, 0.09369500442...

In [118]:
df['sim'] = df.apply(lambda x: similarity(x['buy_features'], x['user_features']), axis=1)

In [119]:
df.head()


Out[119]:
user_id buy_spu buy_sn buy_ct3 view_spu view_sn view_ct3 time_interval view_cnt view_secondes view_features_x buy_features avg_view_fea weight_of_view pca_view pca_buy weighted_view_pca user_features sim
0 2469583035 4199682998971011301 10013436 334 220189917005230097 10013861 334 37496 7 45 [0.621, 0.542, 0.0, 0.369, 0.062, 0.039, 0.103... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 1.299352 [-4.18441352754, -4.98522684557, 7.40010898649... [-2.45874875255, 0.950284632032, 5.98234076728... [-5.43702523761, -6.47756346168, 9.61534491173... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427
1 2469583035 4199682998971011301 10013436 334 234826617504419925 10003862 334 170826 2 23 [0.15, 0.98, 0.104, 1.295, 0.111, 0.0, 0.0, 0.... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.664113 [3.83780037191, 7.88132568231, 0.937903291471,... [-2.45874875255, 0.950284632032, 5.98234076728... [2.5487336589, 5.23409195283, 0.6228739007, -1... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427
2 2469583035 4199682998971011301 10013436 334 235671027621670949 10003862 334 426968 2 11 [0.106, 0.027, 0.0, 1.398, 0.096, 0.021, 0.072... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.317619 [7.77165030033, 7.62261939761, -0.895622345806... [-2.45874875255, 0.950284632032, 5.98234076728... [2.4684263476, 2.42109125239, -0.284466967819,... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427
3 2469583035 4199682998971011301 10013436 334 245522675097001998 10026364 334 83993 2 7 [0.019, 1.415, 0.007, 0.088, 0.055, 0.015, 0.0... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.202121 [-3.22576025396, -4.16373835223, 5.30225410798... [-2.45874875255, 0.950284632032, 5.98234076728... [-0.651995148561, -0.841580586219, 1.071698974... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427
4 2469583035 4199682998971011301 10013436 334 296751124749754369 10005367 334 427866 2 12 [0.066, 0.328, 0.043, 0.0, 0.062, 0.016, 0.303... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.346494 [1.43794255292, 10.9324726458, -2.09193256963,... [-2.45874875255, 0.950284632032, 5.98234076728... [0.498238197476, 3.78803412829, -0.72484169177... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427

In [120]:
df['rank'] = df.groupby('user_id')['sim'].rank(ascending=False)

In [121]:
df


Out[121]:
user_id buy_spu buy_sn buy_ct3 view_spu view_sn view_ct3 time_interval view_cnt view_secondes view_features_x buy_features avg_view_fea weight_of_view pca_view pca_buy weighted_view_pca user_features sim rank
0 2469583035 4199682998971011301 10013436 334 220189917005230097 10013861 334 37496 7 45 [0.621, 0.542, 0.0, 0.369, 0.062, 0.039, 0.103... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 1.299352 [-4.18441352754, -4.98522684557, 7.40010898649... [-2.45874875255, 0.950284632032, 5.98234076728... [-5.43702523761, -6.47756346168, 9.61534491173... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
1 2469583035 4199682998971011301 10013436 334 234826617504419925 10003862 334 170826 2 23 [0.15, 0.98, 0.104, 1.295, 0.111, 0.0, 0.0, 0.... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.664113 [3.83780037191, 7.88132568231, 0.937903291471,... [-2.45874875255, 0.950284632032, 5.98234076728... [2.5487336589, 5.23409195283, 0.6228739007, -1... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
2 2469583035 4199682998971011301 10013436 334 235671027621670949 10003862 334 426968 2 11 [0.106, 0.027, 0.0, 1.398, 0.096, 0.021, 0.072... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.317619 [7.77165030033, 7.62261939761, -0.895622345806... [-2.45874875255, 0.950284632032, 5.98234076728... [2.4684263476, 2.42109125239, -0.284466967819,... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
3 2469583035 4199682998971011301 10013436 334 245522675097001998 10026364 334 83993 2 7 [0.019, 1.415, 0.007, 0.088, 0.055, 0.015, 0.0... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.202121 [-3.22576025396, -4.16373835223, 5.30225410798... [-2.45874875255, 0.950284632032, 5.98234076728... [-0.651995148561, -0.841580586219, 1.071698974... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
4 2469583035 4199682998971011301 10013436 334 296751124749754369 10005367 334 427866 2 12 [0.066, 0.328, 0.043, 0.0, 0.062, 0.016, 0.303... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.346494 [1.43794255292, 10.9324726458, -2.09193256963,... [-2.45874875255, 0.950284632032, 5.98234076728... [0.498238197476, 3.78803412829, -0.72484169177... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
5 2469583035 4199682998971011301 10013436 334 317580251858771991 10013436 334 79637 1 2 [0.001, 1.924, 0.067, 2.464, 0.0, 0.0, 0.157, ... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.057749 [-0.367162703196, -2.91958595455, 0.5847105535... [-2.45874875255, 0.950284632032, 5.98234076728... [-0.0212032674798, -0.168603078106, 0.03376643... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
6 2469583035 4199682998971011301 10013436 334 36105301270949891 10026364 334 84018 3 16 [0.274, 0.376, 0.0, 0.004, 0.052, 0.074, 0.161... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.461992 [-1.46307008459, -3.68472813598, 5.32224960831... [-2.45874875255, 0.950284632032, 5.98234076728... [-0.675926308966, -1.70231400036, 2.4588354112... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
7 2469583035 4199682998971011301 10013436 334 437770064827043954 10013861 334 80933 2 32 [0.038, 0.239, 0.0, 0.253, 0.196, 0.081, 0.278... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.923984 [-1.82415989318, -1.23091214842, 5.06093770573... [-2.45874875255, 0.950284632032, 5.98234076728... [-1.6854936432, -1.13734251545, 4.67622293611,... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
8 2469583035 4199682998971011301 10013436 334 452688234602614802 10021072 334 427802 2 6 [0.006, 0.723, 0.004, 2.523, 0.212, 0.039, 0.3... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.173247 [-6.22685705728, 2.40710692796, 0.261954686567... [-2.45874875255, 0.950284632032, 5.98234076728... [-1.07878372118, 0.417023828415, 0.04538283903... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
9 2469583035 4199682998971011301 10013436 334 453251182129659946 10013861 334 80226 2 6 [0.208, 0.378, 0.114, 1.377, 0.022, 0.108, 0.0... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.173247 [4.00748782169, -0.241093712069, 4.0440555338,... [-2.45874875255, 0.950284632032, 5.98234076728... [0.694284867163, -0.0417687397455, 0.700620110... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
10 2469583035 4199682998971011301 10013436 334 72415488038302024 10005711 334 102609 4 21 [0.333, 0.451, 0.014, 0.82, 0.022, 0.0, 0.313,... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.606364 [3.50097392161, -2.14662424013, 2.79187793981,... [-2.45874875255, 0.950284632032, 5.98234076728... [2.12286515341, -1.30163603011, 1.69289475549,... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
11 2469583035 4199682998971011301 10013436 334 76919168096198689 10025260 334 171131 4 16 [0.075, 1.102, 0.192, 1.189, 0.054, 0.0, 0.018... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.461992 [1.80311468372, -2.35211510197, 3.84788577135,... [-2.45874875255, 0.950284632032, 5.98234076728... [0.833024108446, -1.08665777251, 1.77769148187... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
12 2469583035 4199682998971011301 10013436 334 8239247893762572 10013861 334 102580 2 19 [0.272, 0.416, 0.0, 0.606, 0.133, 0.068, 0.166... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.548615 [2.40601665774, -3.67632215044, -5.17040716982... [-2.45874875255, 0.950284632032, 5.98234076728... [1.31997731783, -2.01688622396, -2.83656398061... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
13 2469583035 4199682998971011301 10013436 334 950054293748604928 10008016 334 102400 2 8 [0.02, 0.638, 0.0, 0.134, 0.416, 0.0, 0.277, 0... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.230996 [-2.76193822699, -3.19007653548, 7.9538861513,... [-2.45874875255, 0.950284632032, 5.98234076728... [-0.637996337643, -0.736894520864, 1.837314891... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
14 2469583035 4199682998971011301 10013436 334 220752886051713030 10020640 334 427976 1 4 [0.0, 0.0, 0.0, 0.005, 0.263, 0.0, 0.22, 0.174... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.115498 [7.85679054587, -0.165140911316, 5.05460776737... [-2.45874875255, 0.950284632032, 5.98234076728... [0.907443103706, -0.01907343466, 0.58379677218... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
15 2469583035 4199682998971011301 10013436 334 28786936017735750 10003862 334 170588 2 13 [0.302, 0.72, 0.012, 2.366, 0.589, 0.036, 0.08... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.375368 [1.28446799655, 8.87881763408, -0.945953957059... [-2.45874875255, 0.950284632032, 5.98234076728... [0.482148564409, 3.33282665463, -0.35508112589... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
16 2469583035 4199682998971011301 10013436 334 4762914438493188103 10005949 334 427982 2 8 [0.044, 0.008, 0.194, 0.694, 0.969, 0.0, 0.071... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.230996 [-4.21465256809, 6.35749004706, -4.96431279687... [-2.45874875255, 0.950284632032, 5.98234076728... [-0.973567358097, 1.46855397669, -1.1467357786... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
17 2469583035 4199682998971011301 10013436 334 438895960837050506 10003862 334 170697 2 24 [0.182, 0.911, 0.052, 1.607, 0.12, 0.015, 0.30... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.692988 [3.79007144015, 8.87173025452, -1.55532755486,... [-2.45874875255, 0.950284632032, 5.98234076728... [2.62647260673, 6.14799928068, -1.07782274868,... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
18 2469583035 4199682998971011301 10013436 334 454095613458173960 10026544 334 427923 2 8 [0.672, 0.48, 0.071, 0.712, 0.86, 0.312, 1.107... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.230996 [-3.03985729056, 13.1633366306, 2.79320556286,... [-2.45874875255, 0.950284632032, 5.98234076728... [-0.702194494931, 3.04067646387, 0.64521896325... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
19 2469583035 4199682998971011301 10013436 334 225819416539443204 10013861 334 37792 13 127 [0.073, 0.066, 0.314, 2.166, 1.301, 0.0, 0.517... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 3.667060 [-8.33775080655, -2.19965376914, -1.5935272987... [-2.45874875255, 0.950284632032, 5.98234076728... [-30.5750284438, -8.06626128777, -5.8435594459... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
20 2469583035 4199682998971011301 10013436 334 34134967298666501 10013861 334 37823 11 123 [0.034, 0.006, 0.033, 0.098, 0.046, 0.047, 0.6... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 3.551562 [-3.37173229725, -5.93734062946, -4.5080672179... [-2.45874875255, 0.950284632032, 5.98234076728... [-11.9749148825, -21.0868308626, -16.010678328... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
21 2469583035 4199682998971011301 10013436 334 454940062104215564 10020205 334 483661 2 56 [0.845, 0.239, 0.0, 2.8, 0.044, 0.521, 0.267, ... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 1.616971 [3.98194478229, 5.55701743187, -3.36777741249,... [-2.45874875255, 0.950284632032, 5.98234076728... [6.43868973636, 8.98553673131, -5.44559883316,... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
22 2469583035 4199682998971011301 10013436 334 12179884428992908 10000351 334 54882 2 7 [0.089, 1.367, 0.023, 0.654, 0.146, 0.0, 0.019... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.202121 [3.57062987594, -0.44335464274, 3.70895070819,... [-2.45874875255, 0.950284632032, 5.98234076728... [0.721700676162, -0.0896114569593, 0.749658275... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
23 2469583035 4199682998971011301 10013436 334 88178162066800642 10013861 334 80970 3 21 [0.125, 0.185, 0.021, 0.942, 0.224, 0.081, 0.3... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.606364 [-6.31059509845, 7.59024212546, -1.91023405497... [-2.45874875255, 0.950284632032, 5.98234076728... [-3.82651877213, 4.6024508822, -1.1582974912, ... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
24 2469583035 4199682998971011301 10013436 334 462539836183756981 10004555 334 84173 2 6 [0.325, 0.102, 0.076, 0.366, 0.164, 0.312, 0.0... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.173247 [-1.82994997828, 3.76151673254, 3.13311658097,... [-2.45874875255, 0.950284632032, 5.98234076728... [-0.317033172431, 0.651671136928, 0.5428027547... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
25 2469583035 4199682998971011301 10013436 334 320394998366306352 10008016 334 50597 2 8 [0.307, 0.0, 0.048, 1.174, 0.056, 0.354, 0.031... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.230996 [8.1096725664, -2.23454573936, -1.93492549683,... [-2.45874875255, 0.950284632032, 5.98234076728... [1.87330091104, -0.516170848455, -0.4469598083... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
26 2469583035 4199682998971011301 10013436 334 5706006606757903 10013436 334 84371 2 8 [0.689, 0.268, 0.006, 1.339, 0.039, 0.0, 0.077... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.230996 [2.69615942529, 1.00275749583, 0.486848790327,... [-2.45874875255, 0.950284632032, 5.98234076728... [0.622801705784, 0.231632845237, 0.11246006235... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
27 2469583035 4199682998971011301 10013436 334 3454179992723457 10014872 334 83838 5 59 [0.152, 0.624, 0.044, 0.675, 0.164, 0.124, 0.3... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 1.703595 [0.206020878981, 6.6807386033, -6.95796511638,... [-2.45874875255, 0.950284632032, 5.98234076728... [0.350976052524, 11.3812700661, -11.8535516508... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
28 2469583035 4199682998971011301 10013436 334 290277198631026692 10013861 334 80378 2 9 [0.045, 0.257, 0.012, 0.426, 0.603, 0.441, 0.0... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 0.259870 [-10.1439861524, 0.772793521962, 11.3013917106... [-2.45874875255, 0.950284632032, 5.98234076728... [-2.63612132775, 0.200826130339, 2.9368967262,... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
29 2469583035 4199682998971011301 10013436 334 311950754375856148 10013861 334 81080 2 50 [0.107, 1.376, 0.065, 0.861, 0.911, 0.089, 0.2... [0.091, 0.805, 0.0, 0.591, 0.981, 0.026, 0.757... 34.632653 1.443724 [-5.66954022489, 0.26036710884, 6.40483324181,... [-2.45874875255, 0.950284632032, 5.98234076728... [-8.18525253446, 0.375898300918, 9.2468128712,... [0.195346938776, 0.549204081633, 0.08559183673... 0.801427 25.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9970 871432929 290558656145858680 10013861 334 83956022528651293 10026544 334 15331 1 3 [0.354, 0.382, 0.256, 1.829, 0.925, 0.172, 0.3... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 17.000000 0.176471 [3.49047539536, 16.5840682219, -1.15512756602,... [-3.7937365243, -5.60668119482, 0.832048406295... [0.615966246239, 2.92660027445, -0.20384604106... [0.167333333333, 0.52675, 0.274833333333, 0.87... 0.815213 6.5
9971 871432929 290558656145858680 10013861 334 440021864176185352 10010280 334 314303 4 11 [0.097, 0.13, 0.008, 0.391, 0.12, 0.0, 0.593, ... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 17.000000 0.647059 [-1.67681618892, -2.60049841111, -0.6523886136... [-3.7937365243, -5.60668119482, 0.832048406295... [-1.08499871048, -1.68267544249, -0.4221338088... [0.167333333333, 0.52675, 0.274833333333, 0.87... 0.815213 6.5
9972 871432929 290558656145858680 10013861 334 290277189414735877 10010280 334 314249 1 3 [0.071, 0.984, 0.985, 0.337, 0.922, 0.031, 0.0... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 17.000000 0.176471 [-1.89712396572, 2.11717990689, -3.69516929565... [-3.7937365243, -5.60668119482, 0.832048406295... [-0.334786582186, 0.373619983568, -0.652088699... [0.167333333333, 0.52675, 0.274833333333, 0.87... 0.815213 6.5
9973 871432929 290558656145858680 10013861 334 290558656145858680 10013861 334 0 10 62 [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 17.000000 3.647059 [-3.7937365243, -5.60668119482, 0.832048406295... [-3.7937365243, -5.60668119482, 0.832048406295... [-13.8359802651, -20.4478961223, 3.03452948178... [0.167333333333, 0.52675, 0.274833333333, 0.87... 0.815213 6.5
9974 2563906467 290558656145858680 10013861 334 292528995948396605 10012320 334 86176 1 13 [0.085, 1.649, 0.138, 1.288, 0.212, 0.425, 0.0... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 27.636364 0.470395 [-6.23595393298, 10.8776077311, -1.80801795356... [-3.7937365243, -5.60668119482, 0.832048406295... [-2.93335990926, 5.11676942614, -0.85048212946... [0.223818181818, 0.567909090909, 0.02490909090... 0.880700 6.0
9975 2563906467 290558656145858680 10013861 334 303225012826706301 10003609 334 86679 4 54 [0.005, 0.316, 0.0, 0.025, 0.056, 0.059, 0.081... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 27.636364 1.953947 [-5.62900849928, -3.00239824039, 9.54033875126... [-3.7937365243, -5.60668119482, 0.832048406295... [-10.998786344, -5.86652814076, 18.6413197969,... [0.223818181818, 0.567909090909, 0.02490909090... 0.880700 6.0
9976 2563906467 290558656145858680 10013861 334 452125314765950984 10003609 334 86641 2 21 [0.176, 0.32, 0.0, 0.067, 0.012, 0.109, 0.0, 0... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 27.636364 0.759868 [0.444018282158, -2.53729495793, 10.5607251398... [-3.7937365243, -5.60668119482, 0.832048406295... [0.337395470982, -1.92801031343, 8.02476153715... [0.223818181818, 0.567909090909, 0.02490909090... 0.880700 6.0
9977 2563906467 290558656145858680 10013861 334 441147765855838216 10011214 334 520122 2 13 [0.135, 0.399, 0.055, 0.148, 0.786, 0.171, 0.1... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 27.636364 0.470395 [-3.03838901262, 0.4517757802, 1.36351455112, ... [-3.7937365243, -5.60668119482, 0.832048406295... [-1.42924220002, 0.212512949239, 0.64139006845... [0.223818181818, 0.567909090909, 0.02490909090... 0.880700 6.0
9978 2563906467 290558656145858680 10013861 334 88178162066800642 10013861 334 89369 1 5 [0.125, 0.185, 0.021, 0.942, 0.224, 0.081, 0.3... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 27.636364 0.180921 [-6.31059509845, 7.59024212546, -1.91023405497... [-3.7937365243, -5.60668119482, 0.832048406295... [-1.14171950794, 1.37323459507, -0.34560155599... [0.223818181818, 0.567909090909, 0.02490909090... 0.880700 6.0
9979 2563906467 290558656145858680 10013861 334 2046804444680242 10014085 334 520233 4 17 [0.51, 0.095, 0.003, 1.169, 0.013, 0.012, 0.21... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 27.636364 0.615132 [-5.70323632693, 3.36945929692, 4.60129316391,... [-3.7937365243, -5.60668119482, 0.832048406295... [-3.50824076689, 2.07266081751, 2.83040072912,... [0.223818181818, 0.567909090909, 0.02490909090... 0.880700 6.0
9980 2563906467 290558656145858680 10013861 334 81704231277211648 10013861 334 89311 1 3 [0.094, 0.806, 0.0, 1.224, 0.249, 0.119, 0.486... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 27.636364 0.108553 [-3.70562903689, -4.80629183326, 1.59928371259... [-3.7937365243, -5.60668119482, 0.832048406295... [-0.40225578361, -0.521735626637, 0.1736064556... [0.223818181818, 0.567909090909, 0.02490909090... 0.880700 6.0
9981 2563906467 290558656145858680 10013861 334 309980448694890506 10013861 334 89166 1 4 [0.437, 0.241, 0.0, 0.13, 0.041, 0.367, 0.057,... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 27.636364 0.144737 [-2.13300347547, -0.379708994452, 6.6075365902... [-3.7937365243, -5.60668119482, 0.832048406295... [-0.308724187239, -0.054957880776, 0.956353980... [0.223818181818, 0.567909090909, 0.02490909090... 0.880700 6.0
9982 2563906467 290558656145858680 10013861 334 303506523124518925 10012320 334 86107 1 3 [0.283, 0.602, 0.026, 0.549, 0.069, 0.427, 1.0... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 27.636364 0.108553 [0.399318855038, 10.8723704107, 3.27472079495,... [-3.7937365243, -5.60668119482, 0.832048406295... [0.0433471125535, 1.18022441958, 0.35547955997... [0.223818181818, 0.567909090909, 0.02490909090... 0.880700 6.0
9983 2563906467 290558656145858680 10013861 334 4718441391384219663 10003609 334 88457 6 72 [0.523, 0.76, 0.01, 0.015, 0.062, 0.203, 0.038... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 27.636364 2.605263 [-0.983176096713, -3.06501367504, 10.016504079... [-3.7937365243, -5.60668119482, 0.832048406295... [-2.56143246249, -7.98516720602, 26.0956290503... [0.223818181818, 0.567909090909, 0.02490909090... 0.880700 6.0
9984 2563906467 290558656145858680 10013861 334 290558656145858680 10013861 334 0 8 99 [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 27.636364 3.582237 [-3.7937365243, -5.60668119482, 0.832048406295... [-3.7937365243, -5.60668119482, 0.832048406295... [-13.5900627466, -20.084459938, 2.98059445545,... [0.223818181818, 0.567909090909, 0.02490909090... 0.880700 6.0
9985 2137081044 290558656145858680 10013861 334 308291544650145820 10005711 334 3477 2 93 [0.083, 0.616, 0.117, 0.453, 0.262, 0.017, 0.1... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 52.000000 1.788462 [-3.94442103567, -7.0736364084, 2.15259415416,... [-3.7937365243, -5.60668119482, 0.832048406295... [-7.05444531379, -12.6509266535, 3.84983185263... [0.086, 0.745, 0.069, 0.2395, 0.214, 0.0085, 0... 0.958882 1.5
9986 2137081044 290558656145858680 10013861 334 290558656145858680 10013861 334 0 8 11 [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 52.000000 0.211538 [-3.7937365243, -5.60668119482, 0.832048406295... [-3.7937365243, -5.60668119482, 0.832048406295... [-0.802521187833, -1.18602871429, 0.1760102397... [0.086, 0.745, 0.069, 0.2395, 0.214, 0.0085, 0... 0.958882 1.5
9987 2558955643 290558656145858680 10013861 334 81141277469483041 10013861 334 191252 2 44 [0.066, 0.231, 0.0, 0.0, 0.017, 0.055, 0.191, ... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 71.250000 0.617544 [-5.79231649849, -4.72559454869, 8.24826488342... [-3.7937365243, -5.60668119482, 0.832048406295... [-3.57700948679, -2.91826189674, 5.09366533152... [0.12675, 0.532125, 0.034, 0.567625, 0.213125,... 0.897744 4.5
9988 2558955643 290558656145858680 10013861 334 301817659561656323 10014085 334 180904 1 27 [0.266, 1.038, 0.009, 0.366, 0.004, 0.001, 0.1... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 71.250000 0.378947 [-5.13747346736, 2.90166587351, -7.56054486264... [-3.7937365243, -5.60668119482, 0.832048406295... [-1.94683205079, 1.0995786468, -2.86504857953,... [0.12675, 0.532125, 0.034, 0.567625, 0.213125,... 0.897744 4.5
9989 2558955643 290558656145858680 10013861 334 24283335231729669 10016029 334 17848 1 6 [0.096, 0.914, 0.03, 1.673, 0.269, 0.017, 0.19... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 71.250000 0.084211 [-7.06578461154, -1.58672575577, 0.11495284367... [-3.7937365243, -5.60668119482, 0.832048406295... [-0.595013440971, -0.133619011012, 0.009680239... [0.12675, 0.532125, 0.034, 0.567625, 0.213125,... 0.897744 4.5
9990 2558955643 290558656145858680 10013861 334 221034358932148238 10014085 334 189866 1 8 [0.068, 0.0, 0.008, 0.495, 0.121, 0.041, 0.0, ... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 71.250000 0.112281 [-3.38283097744, 7.29873248241, -4.54115096363... [-3.7937365243, -5.60668119482, 0.832048406295... [-0.379826636063, 0.819506805042, -0.509883616... [0.12675, 0.532125, 0.034, 0.567625, 0.213125,... 0.897744 4.5
9991 2558955643 290558656145858680 10013861 334 30475791522332674 10005367 334 18051 1 51 [0.178, 0.475, 0.1, 0.531, 0.249, 0.0, 0.273, ... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 71.250000 0.715789 [-8.13617044063, 8.87591309674, 6.88162797842,... [-3.7937365243, -5.60668119482, 0.832048406295... [-5.82378515751, 6.35328516399, 4.92579686876,... [0.12675, 0.532125, 0.034, 0.567625, 0.213125,... 0.897744 4.5
9992 2558955643 290558656145858680 10013861 334 106192561257627661 10021212 334 189765 1 10 [0.0, 0.274, 0.099, 0.942, 0.429, 0.0, 1.077, ... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 71.250000 0.140351 [-3.12090338812, 7.67401078584, 4.87773895866,... [-3.7937365243, -5.60668119482, 0.832048406295... [-0.438021528158, 1.07705414538, 0.68459494156... [0.12675, 0.532125, 0.034, 0.567625, 0.213125,... 0.897744 4.5
9993 2558955643 290558656145858680 10013861 334 3735669113737219 10013861 334 17893 1 7 [0.251, 0.451, 0.005, 0.508, 0.45, 0.115, 0.0,... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 71.250000 0.098246 [-7.38894768244, -2.20228332419, 4.18104699726... [-3.7937365243, -5.60668119482, 0.832048406295... [-0.725931702135, -0.216364677464, 0.410769529... [0.12675, 0.532125, 0.034, 0.567625, 0.213125,... 0.897744 4.5
9994 2558955643 290558656145858680 10013861 334 290558656145858680 10013861 334 0 17 417 [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 71.250000 5.852632 [-3.7937365243, -5.60668119482, 0.832048406295... [-3.7937365243, -5.60668119482, 0.832048406295... [-22.2033421843, -32.8138394139, 4.8696727779,... [0.12675, 0.532125, 0.034, 0.567625, 0.213125,... 0.897744 4.5
9995 1167581542 290558656145858680 10013861 334 11335471174639631 10005367 334 3581 1 21 [0.163, 0.804, 0.019, 1.469, 0.0, 0.183, 0.41,... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 51.000000 0.411765 [3.59619857085, 13.2789304134, -0.170385541312... [-3.7937365243, -5.60668119482, 0.832048406295... [1.48078764682, 5.46779487609, -0.070158752305... [0.117333333333, 0.723666666667, 0.01833333333... 0.898082 2.0
9996 1167581542 290558656145858680 10013861 334 95777984703225857 10013861 334 2951 2 24 [0.1, 0.493, 0.015, 0.392, 0.275, 0.099, 0.175... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 51.000000 0.470588 [-3.93677408066, -6.67276770352, -8.7396192115... [-3.7937365243, -5.60668119482, 0.832048406295... [-1.85259956737, -3.14012597813, -4.1127619818... [0.117333333333, 0.723666666667, 0.01833333333... 0.898082 2.0
9997 1167581542 290558656145858680 10013861 334 290558656145858680 10013861 334 0 9 108 [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 51.000000 2.117647 [-3.7937365243, -5.60668119482, 0.832048406295... [-3.7937365243, -5.60668119482, 0.832048406295... [-8.03379499264, -11.872971942, 1.76198486039,... [0.117333333333, 0.723666666667, 0.01833333333... 0.898082 2.0
9998 2405799965 290558656145858680 10013861 334 290558656145858680 10013861 334 0 12 193 [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 101.000000 1.910891 [-3.7937365243, -5.60668119482, 0.832048406295... [-3.7937365243, -5.60668119482, 0.832048406295... [-7.24941731872, -10.7137571347, 1.5899538853,... [0.0885, 1.0, 0.024, 0.2125, 0.274, 0.007, 0.0... 0.951220 1.5
9999 2405799965 290558656145858680 10013861 334 238204322164195330 10012320 334 20434 1 9 [0.088, 1.126, 0.027, 0.399, 0.382, 0.014, 0.0... [0.089, 0.874, 0.021, 0.026, 0.166, 0.0, 0.097... 101.000000 0.089109 [-5.98737706378, -3.02645432026, 2.05082193244... [-3.7937365243, -5.60668119482, 0.832048406295... [-0.533528649247, -0.26968404834, 0.1827465088... [0.0885, 1.0, 0.024, 0.2125, 0.274, 0.007, 0.0... 0.951220 1.5

10000 rows × 20 columns


In [122]:
# Percentage of purchased items (rows where buy_spu == view_spu) that rank within the top 6
float(len(df.query('buy_spu == view_spu & rank <= 6')))/float(len(df.query('buy_spu == view_spu'))) * 100


Out[122]:
35.76642335766424

It seems that PCA did not alter the rank, and the similarity is also lower than when using the original features directly. The weighted PCA features did not alter the rank either.
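
To make it easier to compare feature sets at different cutoffs, the hit-rate calculation above can be wrapped in a small helper. This is a minimal sketch, not the notebook's actual code: the function name `hit_rate_at_n` and the toy DataFrame are hypothetical, and it assumes only the `buy_spu`, `view_spu` and `rank` columns used in the query above.

```python
import pandas as pd


def hit_rate_at_n(frame, n):
    """Percentage of purchased rows (buy_spu == view_spu) whose rank is <= n."""
    bought = frame[frame['buy_spu'] == frame['view_spu']]
    if len(bought) == 0:
        # No purchased item appears among the viewed items: rate is undefined
        return float('nan')
    hits = bought[bought['rank'] <= n]
    return 100.0 * len(hits) / len(bought)


# Hypothetical toy data, only to show the call; not taken from the real log
toy = pd.DataFrame({
    'buy_spu':  [1, 1, 1, 2, 2, 2],
    'view_spu': [1, 5, 6, 2, 2, 7],
    'rank':     [3, 1, 2, 8, 5, 1],
})
rate_top6 = hit_rate_at_n(toy, 6)
rate_top2 = hit_rate_at_n(toy, 2)
```

With a helper like this, the same cutoff can be applied to a ranking built from the original features and to one built from the PCA features, so the two percentages are directly comparable.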

Need to proceed to CF and other methods.
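
As a starting point for the CF step, a plain item-based collaborative filter could look roughly like the sketch below. It is an assumption-laden illustration rather than the planned implementation: it uses a small user-by-item matrix of implicit feedback (for example view counts) instead of the CNN features, and the names `item_cf_scores` and `top_n` are hypothetical.

```python
import numpy as np


def item_cf_scores(interactions):
    """Score every item for every user with item-item cosine similarity.

    interactions: (n_users, n_items) array of implicit feedback,
    e.g. view counts (an assumption made for this sketch).
    """
    interactions = np.asarray(interactions, dtype=float)
    norms = np.linalg.norm(interactions, axis=0)
    norms[norms == 0] = 1.0                 # avoid division by zero for unseen items
    normalized = interactions / norms
    sim = normalized.T.dot(normalized)      # item-item cosine similarity
    np.fill_diagonal(sim, 0.0)              # an item should not recommend itself
    return interactions.dot(sim)


def top_n(scores, interactions, n):
    """Indices of the n best-scoring items per user, skipping items already seen."""
    interactions = np.asarray(interactions, dtype=float)
    masked = np.where(interactions > 0, -np.inf, scores)
    return np.argsort(-masked, axis=1)[:, :n]


# Hypothetical toy matrix: 3 users x 4 items
views = np.array([[3, 1, 0, 0],
                  [2, 0, 1, 0],
                  [0, 0, 4, 1]])
recs = top_n(item_cf_scores(views), views, 2)
```

The resulting top-N lists could then be evaluated the same way as above, by checking whether the purchased item falls within the top N for each user.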


In [ ]: