Propensity to Buy

Company XYZ builds cloud-based productivity apps. The apps are popular across the industry spectrum: large enterprises, small and medium companies, and startups all use them.

A big challenge for their sales team is knowing whether a product is ready to be bought by a customer. A product can take anywhere from 3 months to a year to be created or updated. Given the current state of the product, the sales team wants to know whether customers will be ready to buy.

They have anonymized data from various apps, and they know whether customers have bought the product or not.

Can you help the enterprise sales team in this initiative?

1. Frame

The first step is to convert the business problem into an analytics problem.

The sales team wants to know whether a customer will buy the product, given its current development stage. This is a propensity-to-buy model. It is a classification problem, and the preferred output is the propensity of the customer to buy the product.
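To make the framing concrete, here is a minimal sketch of the difference between a hard class label and a propensity score. It assumes scikit-learn and uses synthetic data with a single invented feature, not the workshop dataset; reading the propensity as the predicted probability of the "bought" class is one common interpretation, not something the brief specifies.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT the workshop dataset): one numeric feature
# and a binary label where 1 means "bought".
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# predict() gives the hard class label for each row.
labels = clf.predict(X[:5])

# predict_proba() gives one column per class; column 1 is the estimated
# probability of class 1, which serves here as the propensity to buy.
propensity = clf.predict_proba(X[:5])[:, 1]

print(labels)
print(propensity)
```

A score between 0 and 1 lets the sales team rank customers instead of only splitting them into two groups.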

2. Acquire

The IT team has provided the data in CSV format. The file has the following fields:

still_in_beta - Is the product still in beta
bugs_solved_3_months - Number of bugs solved in the last 3 months
bugs_solved_6_months - Number of bugs solved in the last 6 months
bugs_solved_9_months - Number of bugs solved in the last 9 months
num_test_accounts_internal - Number of test accounts internal teams have
time_needed_to_ship - Time needed to ship the product
num_test_accounts_external - Number of customers who have a test account
min_installations_per_account - Minimum number of installations a customer needs to purchase
num_prod_installations - Current number of installations that are in production
ready_for_enterprise - Is the product ready for large enterprises
perf_dev_index - The development performance index
perf_qa_index - The QA performance index
sev1_issues_outstanding - Number of severity 1 bugs outstanding
potential_prod_issue - Is there a possibility of a production issue
ready_for_startups - Is the product ready for startups
ready_for_smb - Is the product ready for small and medium businesses
sales_Q1 - Sales of product in last quarter
sales_Q2 - Sales of product 2 quarters ago
sales_Q3 - Sales of product 3 quarters ago
sales_Q4 - Sales of product 4 quarters ago
saas_offering_available - Is a SaaS offering available
customer_bought - Did the customer buy the product

Load the required libraries


In [ ]:
#code here

Load the data


In [29]:
#code here
#train = pd.read_csv
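As a hedged illustration of the loading step, the sketch below reads CSV text with `pd.read_csv`. The three rows are invented, only a few of the documented columns are used, and the Yes/No encoding of the flag columns is an assumption; in the workshop, pass the path of the file supplied by the IT team instead of the in-memory buffer.

```python
import io
import pandas as pd

# Hypothetical three-row extract (values invented, Yes/No encoding assumed).
csv_text = """still_in_beta,bugs_solved_3_months,sev1_issues_outstanding,customer_bought
Yes,12,3,No
No,40,0,Yes
No,25,1,Yes
"""

# read_csv accepts a file path or any file-like object.
train = pd.read_csv(io.StringIO(csv_text))
print(train)
```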

3. Refine


In [35]:
# View the first few rows

In [36]:
# What are the columns

In [37]:
# What are the column types?

In [38]:
# How many observations are there?

In [39]:
# View summary of the raw data

In [41]:
# Check for missing values. If they exist, treat them
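The refine checks above could look roughly like this in pandas. The frame is a hypothetical toy with invented values, and filling the gap with the median is just one possible treatment, chosen here as an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: column names come from the field list, values are invented.
train = pd.DataFrame({
    "still_in_beta": ["Yes", "No", "No", "Yes"],
    "bugs_solved_3_months": [12.0, 40.0, np.nan, 7.0],
    "customer_bought": ["No", "Yes", "Yes", "No"],
})

print(train.head())             # first few rows
print(train.columns.tolist())   # column names
print(train.dtypes)             # column types
print(train.shape)              # (observations, columns)
print(train.describe())         # summary statistics of numeric columns

# Count missing values per column.
missing = train.isnull().sum()
print(missing)

# One simple treatment (an assumption): fill numeric gaps with the column median.
median_bugs = train["bugs_solved_3_months"].median()
train["bugs_solved_3_months"] = train["bugs_solved_3_months"].fillna(median_bugs)
```

Dropping incomplete rows with `dropna()` is an alternative when only a few rows are affected.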

4. Explore


In [42]:
# Univariate analysis

# histogram of target variable

In [43]:
# Bivariate analysis
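A minimal, table-based sketch of both kinds of analysis, assuming pandas and a hypothetical toy frame with invented values. A histogram of the target can be drawn from the same counts (for example with `target_counts.plot(kind="bar")` if matplotlib is installed).

```python
import pandas as pd

# Hypothetical toy frame: column names from the field list, values invented.
train = pd.DataFrame({
    "still_in_beta": ["Yes", "No", "No", "Yes", "No", "Yes"],
    "sev1_issues_outstanding": [3, 0, 1, 4, 0, 2],
    "customer_bought": ["No", "Yes", "Yes", "No", "Yes", "No"],
})

# One variable at a time: distribution of the target.
target_counts = train["customer_bought"].value_counts()

# Two variables: numeric feature vs. target, and categorical feature vs. target.
mean_sev1 = train.groupby("customer_bought")["sev1_issues_outstanding"].mean()
beta_vs_bought = pd.crosstab(train["still_in_beta"], train["customer_bought"])

print(target_counts)
print(mean_sev1)
print(beta_vs_bought)
```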

In [ ]:

5. Transform


In [44]:
# Encode the categorical variables
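One way to encode categorical columns is scikit-learn's `LabelEncoder`, sketched below. Which columns are categorical and how they are coded in the real file is not stated, so the Yes/No values and the column list here are assumptions.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy frame: Yes/No coding of the flag columns is assumed.
train = pd.DataFrame({
    "still_in_beta": ["Yes", "No", "No"],
    "ready_for_smb": ["No", "Yes", "Yes"],
    "bugs_solved_3_months": [12, 40, 25],
})

categorical_cols = ["still_in_beta", "ready_for_smb"]  # assumed list
encoded = train.copy()
encoders = {}

for col in categorical_cols:
    le = LabelEncoder()
    # Classes are sorted alphabetically, so "No" -> 0 and "Yes" -> 1.
    encoded[col] = le.fit_transform(encoded[col])
    encoders[col] = le  # keep the encoder to decode or reuse on new data

print(encoded)
```

For categorical columns with more than two unordered levels, `pd.get_dummies` is a common alternative.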

6. Model


In [ ]:
# Create train-test dataset
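A sketch of the split using scikit-learn's `train_test_split`. The data is random and stands in for the encoded features and the `customer_bought` target; the 80/20 ratio and stratification are assumptions, not requirements from the exercise.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Random stand-in for the encoded feature matrix and binary target.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 2, size=100)

# Hold out 20% for testing; stratify keeps the class ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```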

In [45]:
# Build decision tree model - depth 2

In [46]:
# Find accuracy of model

In [47]:
# Visualize decision tree
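The three steps above (build a depth-2 tree, measure accuracy, visualize) can be sketched as follows. The data comes from `make_classification` and the feature names `f0`..`f4` are placeholders; the text rendering via `export_text` is used because it needs no extra plotting dependency.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data (NOT the workshop dataset).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Shallow tree: at most two levels of splits.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X_train, y_train)

# Accuracy on the held-out test set.
accuracy = accuracy_score(y_test, tree.predict(X_test))
print(accuracy)

# Text view of the learned rules; feature names are placeholders.
rules = export_text(tree, feature_names=[f"f{i}" for i in range(5)])
print(rules)
```

`sklearn.tree.plot_tree` draws the same tree graphically when matplotlib is available.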

In [48]:
# Build decision tree model - no depth limit (max_depth=None)

In [49]:
# Find accuracy of model
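With no depth limit the tree keeps splitting until its leaves are pure, so it is worth comparing training and test accuracy. A sketch on synthetic data, assuming scikit-learn:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT the workshop dataset).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# max_depth=None lets the tree grow until every leaf is pure.
full_tree = DecisionTreeClassifier(max_depth=None, random_state=0)
full_tree.fit(X_train, y_train)

# score() returns mean accuracy on the given data.
train_acc = full_tree.score(X_train, y_train)
test_acc = full_tree.score(X_test, y_test)
print(train_acc, test_acc, full_tree.get_depth())
```

A large gap between the two numbers is a sign of overfitting.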

In [50]:
# Build random forest model

In [51]:
# Find accuracy of model
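A minimal random forest sketch on synthetic data, assuming scikit-learn. The choice of 100 trees is an assumption, not a tuned value.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the workshop dataset).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# An ensemble of 100 trees, each fit on a bootstrap sample of the training data.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Mean accuracy on the held-out test set.
accuracy = forest.score(X_test, y_test)
print(accuracy)

# Relative importance of each feature; the values sum to 1.
importances = forest.feature_importances_
print(importances)
```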

In [52]:
# Bonus: Do cross-validation
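For the bonus, `cross_val_score` is one way to run k-fold cross-validation. The sketch assumes scikit-learn, synthetic data, 5 folds, and a random forest as the model under test.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (NOT the workshop dataset).
X, y = make_classification(n_samples=300, n_features=5, random_state=0)

model = RandomForestClassifier(n_estimators=50, random_state=0)

# 5-fold cross-validation: fit on four folds, score on the fifth, and rotate.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")

print(scores)
print(scores.mean(), scores.std())
```

The mean and spread across folds give a more stable estimate than a single train-test split.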

In [ ]: