Company XYZ builds cloud-based productivity apps. Their apps are popular across the industry spectrum: large enterprises, small and medium companies, and startups all use them.
A big challenge for their sales team is knowing whether a product is ready to be bought by a customer. Products can take anywhere from 3 months to a year to be created or updated. Given the current state of a product, the sales team wants to know if customers will be ready to buy.
They have anonymized data from various apps and know whether or not customers bought the product.
Can you help the enterprise sales team in this initiative?
The first step is to convert the business problem into an analytics problem.
The sales team wants to know if a customer will buy the product, given its current development stage. This is a propensity-to-buy model: a classification problem whose preferred output is the propensity of the customer to buy the product, rather than only a yes/no label.
The IT team has provided the data in CSV format. The file has the following fields:
still_in_beta
- Is the product still in beta
bugs_solved_3_months
- Number of bugs solved in the last 3 months
bugs_solved_6_months
- Number of bugs solved in the last 6 months
bugs_solved_9_months
- Number of bugs solved in the last 9 months
num_test_accounts_internal
- Number of test accounts internal teams have
time_needed_to_ship
- Time needed to ship the product
num_test_accounts_external
- Number of customers who have a test account
min_installations_per_account
- Minimum number of installations a customer needs to purchase
num_prod_installations
- Current number of installations that are in production
ready_for_enterprise
- Is the product ready for large enterprises
perf_dev_index
- The development performance index
perf_qa_index
- The QA performance index
sev1_issues_outstanding
- Number of severity 1 bugs outstanding
potential_prod_issue
- Is there a possibility of a production issue
ready_for_startups
- Is the product ready for startups
ready_for_smb
- Is the product ready for small and medium businesses
sales_Q1
- Sales of product in last quarter
sales_Q2
- Sales of product 2 quarters ago
sales_Q3
- Sales of product 3 quarters ago
sales_Q4
- Sales of product 4 quarters ago
saas_offering_available
- Is a SaaS offering available
customer_bought
- Did the customer buy the product
Load the required libraries
In [ ]:
#code here
Load the data
In [29]:
#code here
#train = pd.read_csv
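As a minimal sketch of this step, the snippet below reads a tiny in-memory CSV with `pd.read_csv`. The inline rows are invented for illustration and use only a few of the fields listed above. With the real file you would pass its path instead; the actual file name is not given here.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file: these rows are invented for illustration.
sample_csv = io.StringIO(
    "still_in_beta,bugs_solved_3_months,sev1_issues_outstanding,customer_bought\n"
    "Yes,12,3,0\n"
    "No,30,0,1\n"
    "No,25,1,1\n"
)

# With the real data, replace sample_csv with the path to the CSV file.
train = pd.read_csv(sample_csv)

print(train.head())   # first few rows
print(train.dtypes)   # column types
print(train.shape)    # (number of observations, number of columns)
```

The same three calls (`head`, `dtypes`, `shape`) cover the next few inspection cells, and `train.describe()` gives a numeric summary of the raw data.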
In [35]:
# View the first few rows
In [36]:
# What are the columns
In [37]:
# What are the column types?
In [38]:
# How many observations are there?
In [39]:
# View summary of the raw data
In [41]:
# Check for missing values. If they exist, treat them
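One possible way to check for and treat missing values is sketched below. The small DataFrame is invented, and the choice of median (numeric) and mode (categorical) imputation is an assumption; other treatments, such as dropping rows, may suit the real data better.

```python
import numpy as np
import pandas as pd

# Invented toy data with one missing value per column.
df = pd.DataFrame({
    "perf_dev_index": [0.8, np.nan, 0.6, 0.9],
    "still_in_beta": ["Yes", "No", None, "No"],
})

# Count missing values per column.
missing_counts = df.isnull().sum()
print(missing_counts)

# Numeric column: fill with the median. Categorical column: fill with the most frequent value.
df["perf_dev_index"] = df["perf_dev_index"].fillna(df["perf_dev_index"].median())
df["still_in_beta"] = df["still_in_beta"].fillna(df["still_in_beta"].mode()[0])
```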
In [42]:
# Univariate analysis
# histogram of target variable
In [43]:
# Bivariate analysis
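The two analysis steps above can be sketched with plain pandas summaries. The data is invented, and the specific summaries chosen here (class shares, a cross-tabulation, per-class means) are just one reasonable starting point.

```python
import pandas as pd

# Invented toy data using three of the fields described earlier.
df = pd.DataFrame({
    "still_in_beta": ["Yes", "Yes", "No", "No", "No", "Yes"],
    "sev1_issues_outstanding": [4, 3, 0, 1, 0, 5],
    "customer_bought": [0, 0, 1, 1, 1, 0],
})

# Univariate: share of each class in the target variable.
target_share = df["customer_bought"].value_counts(normalize=True)

# Bivariate (categorical feature vs target): counts for each combination.
beta_vs_bought = pd.crosstab(df["still_in_beta"], df["customer_bought"])

# Bivariate (numeric feature vs target): mean of the feature within each class.
mean_sev1 = df.groupby("customer_bought")["sev1_issues_outstanding"].mean()

print(target_share)
print(beta_vs_bought)
print(mean_sev1)
```

In a notebook with matplotlib installed, `df["customer_bought"].hist()` would draw the histogram of the target directly.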
In [44]:
# encode the categorical variables
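A minimal sketch of encoding, assuming the categorical fields hold "Yes"/"No" strings (the actual encoding in the file is not stated). It uses `pd.get_dummies` with `drop_first=True`, so each two-valued column becomes a single 0/1 indicator.

```python
import pandas as pd

# Invented toy data; the "Yes"/"No" values are an assumption about the file.
df = pd.DataFrame({
    "still_in_beta": ["Yes", "No", "No"],
    "ready_for_smb": ["No", "Yes", "Yes"],
    "bugs_solved_3_months": [12, 30, 25],
})

categorical_cols = ["still_in_beta", "ready_for_smb"]

# One-hot encode; drop_first removes the redundant "No" indicator for each column.
encoded = pd.get_dummies(df, columns=categorical_cols, drop_first=True, dtype=int)
print(encoded)
```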
In [ ]:
# Create train-test dataset
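One way to create the split, assuming scikit-learn is used (the notebook does not name a library). The features are randomly generated placeholders; the 80/20 ratio and stratification on the target are illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Randomly generated placeholder features, not the real sales data.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "sev1_issues_outstanding": rng.integers(0, 5, size=100),
    "perf_qa_index": rng.random(100),
})
y = pd.Series(np.repeat([0, 1], 50), name="customer_bought")

# Hold out 20% for testing; stratify keeps the class balance the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)
```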
In [45]:
# Build decision tree model - depth 2
In [46]:
# Find accuracy of model
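A sketch of the depth-2 tree and its accuracy, assuming scikit-learn. The data comes from `make_classification` as a synthetic stand-in, so the printed accuracy says nothing about the real sales data.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (not the sales data).
X, y = make_classification(n_samples=300, n_features=6, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Limit the tree to two levels of splits.
shallow_tree = DecisionTreeClassifier(max_depth=2, random_state=0)
shallow_tree.fit(X_train, y_train)

# Accuracy = fraction of held-out rows predicted correctly.
accuracy = accuracy_score(y_test, shallow_tree.predict(X_test))
print(f"Test accuracy (depth 2): {accuracy:.3f}")
```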
In [47]:
# Visualize decision tree
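A text rendering is the simplest way to inspect a small tree. The sketch below assumes scikit-learn and uses `export_text`; the data is synthetic and the feature names are hypothetical labels.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data with hypothetical feature names.
X, y = make_classification(
    n_samples=200, n_features=4, n_informative=2, n_redundant=0, random_state=1
)
feature_names = ["feature_a", "feature_b", "feature_c", "feature_d"]

tree = DecisionTreeClassifier(max_depth=2, random_state=1).fit(X, y)

# Print the split rules as indented text, one line per node.
rules = export_text(tree, feature_names=feature_names)
print(rules)
```

For a graphical view, `sklearn.tree.plot_tree(tree)` draws the same structure with matplotlib.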
In [48]:
# Build decision tree model - depth None (no depth limit)
In [49]:
# Find accuracy of model
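With no depth limit, a tree can keep splitting until it memorizes the training rows, so it helps to compare training and test accuracy. A sketch assuming scikit-learn and synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (not the sales data).
X, y = make_classification(n_samples=300, n_features=6, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# max_depth=None lets the tree grow until every leaf is pure.
full_tree = DecisionTreeClassifier(max_depth=None, random_state=0).fit(X_train, y_train)

train_accuracy = accuracy_score(y_train, full_tree.predict(X_train))
test_accuracy = accuracy_score(y_test, full_tree.predict(X_test))
print(f"Depth: {full_tree.get_depth()}")
print(f"Train accuracy: {train_accuracy:.3f}, test accuracy: {test_accuracy:.3f}")
```

A large gap between the two numbers is a sign of overfitting, which is the usual motivation for limiting depth or moving to an ensemble.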
In [50]:
# Build random forest model
In [51]:
# Find accuracy of model
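A sketch of the random forest step, assuming scikit-learn and synthetic stand-in data. Because the preferred output is a propensity rather than a label, the example also calls `predict_proba`; reading the class-1 probability as the propensity to buy is an interpretation, not something the notebook specifies.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the sales data).
X, y = make_classification(n_samples=300, n_features=6, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# 100 trees is an illustrative choice, not a tuned value.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

forest_accuracy = accuracy_score(y_test, forest.predict(X_test))

# Column 1 holds the estimated probability of class 1 ("bought") for each row.
proba = forest.predict_proba(X_test)
propensity = proba[:, 1]

print(f"Test accuracy (random forest): {forest_accuracy:.3f}")
print(propensity[:5])
```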
In [52]:
# Bonus: Do cross-validation
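For the bonus step, a minimal sketch using scikit-learn's `cross_val_score` on synthetic stand-in data. Five folds and a 50-tree forest are arbitrary illustrative settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (not the sales data).
X, y = make_classification(n_samples=300, n_features=6, n_informative=3, random_state=0)

model = RandomForestClassifier(n_estimators=50, random_state=0)

# 5-fold cross-validation: the model is refit five times, each time scored on a different held-out fold.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")

print(scores)
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

Averaging over folds gives a steadier estimate than a single train-test split, at the cost of fitting the model several times.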