Purchase frequency

In this notebook we create a table grouping transaction information by customers’ purchase frequency. This is done with functions from the pandas librairie such as df.groupby() and df.cut().


In [30]:
import pandas as pd
import contiamo

Query data into Contiamo


In [31]:
transactions = %contiamo query query:sql:48590597:411:g71GXzJjsx4Uvad11ouKjoYbQUNNPy-qRMKkBNZfyx4
customers = %contiamo query query:sql:48590597:441:MG5W2dMjXzYgsHsgdQYzmhv44dxEQX2Lodu5Uh2Hx_s
applications = %contiamo query query:sql:48590597:442:-gz3nbw1fdmtSXkD4zGNA-cVa7s6sQtRn8upCSn6uys

Select data from march 2017


In [32]:
transactions = transactions.loc[(transactions['Field transaction date date'] <= 20170331) & (transactions['Field transaction date date'] >= 20170301)]

In [27]:
df = pd.DataFrame({
        'customer_id' : transactions['Field customer id'],
        'revenue' : pd.to_numeric(transactions['Field app price']),
        'n_of_purchases' : [1]*len(transactions)
    })
df.head()


Out[27]:
customer_id n_of_purchases revenue
0 2F65fA7Cd10cbFFD 1 4.99
1 2F65fA7Cd10cbFFD 1 0.19
2 2F65fA7Cd10cbFFD 1 0.99
5 44dEeeb9eEF3ff64 1 14.99
8 7FeFbCBe6147a9ff 1 0.19

Create an intermediate table


In [28]:
total_users = df['customer_id'].nunique()

#Grouping by customer id
pf = df.groupby('customer_id').agg({
        'revenue' : 'sum',
        'n_of_purchases' : 'sum'
    }).reset_index()

#Selection of data to create 'No. of Purchases', 'No. of UAUs'and 'Total Revenue'columns 
x = pf.groupby('n_of_purchases').agg({
    'customer_id': 'size',
    'revenue' : 'sum',
})

x = x.reset_index()

#Renaming columns
x.rename(columns = {'customer_id': 'n_of_users'}, inplace=True)

#Creation of '% of Total UAUs' and 'Total transations'columns
x['percentage_of_users'] = 100 * x['n_of_users'] / total_users

x['total_transactions'] = x['n_of_purchases'] * x['n_of_users']

#Creation of 'No. of Purchases' buckets 
x['n_of_purchases'] = pd.cut(x['n_of_purchases'], bins=[0,1,2,3,5,8,13,21,34,55,float('inf')]) 


x.head()


Out[28]:
n_of_purchases n_of_users revenue percentage_of_users total_transactions
0 (0, 1] 13656 108827.53 45.504832 13656
1 (1, 2] 4848 67683.44 16.154615 9696
2 (2, 3] 2697 60800.95 8.987004 8091
3 (3, 5] 1734 40058.73 5.778074 6936
4 (3, 5] 1621 47674.62 5.401533 8105

Create final table


In [29]:
#Grouping data by 'No. of Purchases' buckets
y = x.groupby('n_of_purchases').agg({
    'revenue': 'sum',
    'n_of_users': 'sum',
    'percentage_of_users' : 'sum',
    'total_transactions': 'sum',
})

y = y.reset_index()


#Creation of 'Average revenue per user' and 'Average transaction price' colums
y['average_revenue_per_user'] = y['revenue'] / y['n_of_users']
y['average_transaction_price'] = y['revenue'] / y['total_transactions']


#Putting the colums in the right order
cols = ['n_of_purchases','n_of_users', 'percentage_of_users', 'revenue', 'average_revenue_per_user', 'total_transactions', 'average_transaction_price']
y = y[cols]

#Renaming the 'No. of Purchases' buckets
y['n_of_purchases']=y['n_of_purchases'].astype(str)
y = y.replace({'n_of_purchases': {
            '(0, 1]': '1',
            '(1, 2]' : '2',
            '(2, 3]' : '3', 
            '(3, 5]' : '4-5',
            '(5, 8]' : '6-8',
            '(8, 13]' : '9-13',
            '(13, 21]' : '14-21',
            '(21, 34]' : '22-34',
            '(34, 55]' : '35-55',
            '(55, inf]' : 'Above 55',
        }})

y


Out[29]:
n_of_purchases n_of_users percentage_of_users revenue average_revenue_per_user total_transactions average_transaction_price
0 1 13656 45.504832 108827.53 7.969210 13656 7.969210
1 2 4848 16.154615 67683.44 13.961106 9696 6.980553
2 3 2697 8.987004 60800.95 22.543919 8091 7.514640
3 4-5 3355 11.179607 87733.35 26.150030 15041 5.832947
4 6-8 2160 7.197601 67805.61 31.391486 15094 4.492223
5 9-13 1501 5.001666 52535.05 35.000033 16605 3.163809
6 14-21 927 3.088970 43901.67 47.358867 16169 2.715175
7 22-34 575 1.916028 41079.43 71.442487 15946 2.576159
8 35-55 200 0.666445 23325.54 116.627700 9070 2.571724
9 Above 55 91 0.303232 43065.94 473.252088 20571 2.093527

Interpretation: The maximum revenue is generated by people buying apps only once a month (low frequency buyers). We can now identify buying behaviours according to purchase frequency.