Customer segmentation

Suppose we want to cluster our customers according to their buying behavior, e.g. their purchase frequency. The Kmeans function in scikit is one way to do that.


In [2]:
import pandas as pd
import contiamo
import datetime

import seaborn as sns
import numpy as np

from sklearn.cluster import KMeans
from sklearn import datasets

Query data into Contiamo


In [3]:
transactions = %contiamo query query:sql:48590597:411:g71GXzJjsx4Uvad11ouKjoYbQUNNPy-qRMKkBNZfyx4
customers = %contiamo query query:sql:48590597:441:MG5W2dMjXzYgsHsgdQYzmhv44dxEQX2Lodu5Uh2Hx_s
applications = %contiamo query query:sql:48590597:442:-gz3nbw1fdmtSXkD4zGNA-cVa7s6sQtRn8upCSn6uys

In [4]:
df = pd.DataFrame({
        'customer_id' : transactions['Field customer id'],
        'revenue' : pd.to_numeric(transactions['Field app price']),
        'n_of_purchases' : [1]*len(transactions)
    })
df.head()


Out[4]:
customer_id n_of_purchases revenue
0 2F65fA7Cd10cbFFD 1 4.99
1 2F65fA7Cd10cbFFD 1 0.19
2 2F65fA7Cd10cbFFD 1 0.99
3 44dEeeb9eEF3ff64 1 19.99
4 44dEeeb9eEF3ff64 1 14.99

Clustering


In [5]:
pf = df.groupby('customer_id').agg({
        'revenue' : 'sum',
        'n_of_purchases' : 'sum'
    })
pf['average_transaction_price'] = pf['revenue'] / pf['n_of_purchases']

del pf['revenue']

pf.head()


Out[5]:
n_of_purchases average_transaction_price
customer_id
000088f2dD0AC5B0 3 1.730000
00031218886cf4c3 3 0.260000
0003774A9BaEceB1 3 4.990000
000a27a08c360eD6 6 1.273333
000a7ac571aaC3e9 15 1.818667

In [6]:
%matplotlib inline
sns.set(style="ticks", color_codes=True)

In [7]:
g = sns.pairplot(pf)



In [12]:
estimator = KMeans(n_clusters=4)

estimator.fit(pf)
labels = estimator.labels_
pf["labels"] = labels

In [13]:
g = sns.pairplot(pf, vars=['n_of_purchases', 'average_transaction_price'], hue="labels")


We see that 4 groups have been created:

  • One group of customer having a low purchase frequency and buying expensive apps (label 0);
  • One group having a high purchase frequency and buying cheap apps (label 1);
  • One group of customers having a low purchase frequency and buying cheap apps (label 2);
  • One group of customer having an intermediate purchase frequency (label 4).

A new column has been added in our dataframe containing that information. So we could observe each group’s buying behavior even further by uploading the new dataframe into Contiamo and exploring the data filtered by cluster in the Explore section.