Capstone milestone report questions

  1. An introduction to the problem: What is the problem? Who is the Client? (Feel free to reuse points 1-2 from your proposal document)
  2. A deeper dive into the data set:
    1. What important fields and information does the data set have?
    2. What are its limitations i.e. what are some questions that you cannot answer with this data set?
    3. What kind of cleaning and wrangling did you need to do?
    4. Are there other datasets you can find, use and combine with, to answer the questions that matter?
    5. Any preliminary exploration you’ve performed and your initial findings. Test the hypotheses one at a time. Often, the data story emerges from a sequence of hypothesis tests, e.g. you first tested if X was true, and because it wasn't, you tried Y, which turned out to be true.
    6. Based on these findings, what approach are you going to take? How has your approach changed from what you initially proposed, if applicable?

Data Science Intensive: Capstone Milestone Report

Background

I decided to pursue a hybrid of two projects: analyzing life insurance applicant data to predict risk rating, and preparing a data analytics project proposal for an insurance brokerage.

The deliverables would be:

  • Prediction algorithms to determine risk rating for life insurance applicants.
  • Project proposal for insurance brokerage data analytics project.

Life Insurance Deep Dive

There are 59380 rows in the dataset.

  • Product Info
    • Product_Info_1 has 2 categories. (Category, row count) = [(1, 57816), (2, 1565)]
    • Product_Info_2 has
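
As a minimal sketch of how (category, row count) pairs like the ones above could be tallied, the snippet below counts the values of one column with the standard library. The sample rows and the helper name `category_counts` are made up for illustration; the real Kaggle file would be opened in place of the in-memory sample.

```python
import csv
import io
from collections import Counter

# Made-up stand-in for the Kaggle training file (assumption: the real file
# has a header row and one applicant per row).
SAMPLE_CSV = """Id,Product_Info_1,Product_Info_2
1,1,D3
2,1,A1
3,2,D3
4,1,E1
"""


def category_counts(csv_file, column):
    """Return (category, row count) pairs for one column, most frequent first."""
    reader = csv.DictReader(csv_file)
    counts = Counter(row[column] for row in reader)
    return counts.most_common()


# Values are read as strings, so categories appear as "1" and "2".
counts = category_counts(io.StringIO(SAMPLE_CSV), "Product_Info_1")
print(counts)
```

The same call with a different column name would cover the remaining Product_Info fields.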

Insurance brokerage Deep Dive

After reviewing the client's database, I arrived at the following workflow and processes to help my client meet their business goals.

  1. Data cleanup/transformation.
    1. Observed duplicates, missing data, information not properly filled in etc.
    2. Need to investigate how the platform supports mass changes and which records need to be changed
    3. Need to investigate whether the SQL database can be queried directly or whether there is an API to connect to
  2. Data exploration
    1. Perform ETL processes on TAM data using Python
    2. Identification of data types (continuous, discrete, categorical etc.)
    3. Identification of data features related to retention and cross-selling goals
  3. Data analytics
    1. Basic descriptive statistics on
      1. products
      2. representatives
      3. sales activities
      4. claims losses
      5. premium revenues
    2. Basic Tables/charts -> top 20%, histograms, pie charts
    3. Retention rates of different premium brackets
      1. New policies/Total policies, Lost policies/Total Policies
    4. Customer segments (building 1st and 2nd order models)
      1. Premium brackets
      2. Combinations of metadata
      3. Income bracket, postal code, city, province, gender, age, personal? commercial? both?
    5. Discrete, continuous, and categorical time series signatures of “customer features”.
    6. Experimentation with machine learning and predictive models: simple linear regression, SVM, decision trees
  4. Data visualization
    1. Excel, Qlikview, or Tableau dashboards… TBD after exploration and further needs assessments
  5. Management Consulting
    1. Recommending reporting, decision-making and operating procedures/policies on retention and product cross-selling
    2. Identifying an appropriate reporting and analytics toolchain and workflow for the company
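
The retention step in the workflow above (new policies / total policies and lost policies / total policies, per premium bracket) could be sketched as follows. This is only an illustration under stated assumptions: the policy records, the status labels, and the bracket cut-offs are all hypothetical, and the real brackets would come from the client.

```python
from collections import defaultdict

# Hypothetical policy records: (annual premium, status), where status is one
# of "new", "renewed", or "lost". No client data is reproduced here.
POLICIES = [
    (450.0, "renewed"),
    (800.0, "lost"),
    (950.0, "new"),
    (1500.0, "renewed"),
    (2200.0, "renewed"),
    (2600.0, "lost"),
    (7000.0, "new"),
    (8000.0, "renewed"),
]

# Assumed bracket upper bounds (exclusive) and labels.
BRACKETS = [
    (1000.0, "under 1000"),
    (5000.0, "1000 to 4999"),
    (float("inf"), "5000 and over"),
]


def bracket_for(premium):
    """Return the label of the first bracket whose upper bound exceeds the premium."""
    for upper, label in BRACKETS:
        if premium < upper:
            return label
    raise ValueError(f"no bracket for premium {premium}")


def rates_by_bracket(policies):
    """Compute new/total and lost/total policy ratios for each premium bracket."""
    tallies = defaultdict(lambda: {"total": 0, "new": 0, "lost": 0})
    for premium, status in policies:
        tally = tallies[bracket_for(premium)]
        tally["total"] += 1
        if status in ("new", "lost"):
            tally[status] += 1
    return {
        label: {
            "new_rate": tally["new"] / tally["total"],
            "lost_rate": tally["lost"] / tally["total"],
        }
        for label, tally in tallies.items()
    }


rates = rates_by_bracket(POLICIES)
for label, values in rates.items():
    print(label, values)
```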

CAPSTONE PROPOSAL - DISREGARD - USED AS REFERENCE

I would perform machine learning analysis on an existing life insurance applicant data set and carry the patterns learned over to the brokerage project.

My insurance client is interested in retention and in growing their business, and wanted a better understanding of the data they had stored. As a project, I wanted to analyze an insurance data set offered on Kaggle.com by Prudential Life Insurance. The premise of the competition was to predict the risk response of a client based on a normalized dataset of current clients.

Predicting risk rating for life insurance applicants - Capstone project

The normalized dataset contains continuous variables such as height, age, and BMI, and categorical (nominal) variables such as the risk response rating (1-8) and medical histories.

The main dependent variable is the Risk Response (1-8).

Project Goals

The problem I want to solve is to create a machine learning algorithm that predicts risk response based on a trained classifier set.
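
Before any trained classifier is judged, it helps to know what a trivial predictor scores on held-out data. The sketch below is not the project's model: it is a majority-class baseline with a simple hold-out split, written with the standard library only. The labels are made up on the 1-8 risk response scale, and the function names are illustrative.

```python
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split off a held-out test portion."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_majority_class(train_labels):
    """Return the most frequent label in the training data."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Made-up risk responses (assumption: not the real Kaggle distribution).
labels = [8] * 40 + [6] * 25 + [7] * 20 + [2] * 15

train, test = train_test_split(labels)
majority = fit_majority_class(train)
baseline_accuracy = accuracy([majority] * len(test), test)
print(f"majority class: {majority}, baseline accuracy: {baseline_accuracy:.2f}")
```

A trained classifier would only be worth keeping if it beats this baseline on the same held-out rows.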

Predicting retention risk for insurance brokerage - Brokerage project

My client is an insurance broker and has been in business for 21 years. They have collected a significant amount of data on their clients, insurance agencies, products and sales representatives over the past 20 years. Due to confidentiality concerns, I cannot release any data, although the strategies employed will be discussed.

By finding risk rating correlations among life insurance applicants, I can use the methods learned in this course and project to facilitate a discussion on how similar methods could address my client’s problem of improving retention.

I will be exploring their Applied Systems TAMS software (an insurance brokerage management tool) and the various reports it produces, and determining an action plan for a data analytics project. The data can be exported in CSV format. The deliverables would be a memo outlining an approach to solving my client’s retention problem and a machine learning algorithm for predicting risk response in the Kaggle dataset.

Project Goals

A goal for their project would be to understand how retention rates differ across product and customer segments and how best to improve the retention rates for the upcoming months. Determining their high-value clients and the ones most likely not to renew would form a significant part of their retention-focused strategy.

Capstone project outline

Theme

My capstone project will be a hybrid of two different projects. The life insurance project will implement machine learning algorithms to predict risk rating. The brokerage project will outline a proposal for a data analytics project based on my initial exploratory study of their systems. My reasoning is that patterns found in the life insurance data could be useful in the brokerage project. I will attempt to secure a paid work project based on what is learned in this course.

Data collection

Life insurance data set - Kaggle

The Kaggle dataset is already fairly clean. It does require separation and exploration of the data into different categories.

Insurance brokerage data - Applied Systems TAMS

I need to interview the key executives at the company, particularly the controller, to understand what reports they use and how they use them to make decisions. I also need to understand how their data is collected and whether there are any problems in data entry. I need to present a data project proposal that can help meet their retention and growth requirements. I will also collect information on how their data can be exported and which reports are used most, and work out a plan to extract, transform and load the datasets into something useful.
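
A rough sketch of what an extract, transform and load pass over a CSV export could look like is given below, using only the standard library. The column names, the sample rows, and the `policies` table are assumptions made for illustration; the actual export layout is not known yet.

```python
import csv
import io
import sqlite3

# Hypothetical export with typical data-entry problems: stray whitespace,
# an exact duplicate row, and missing values.
EXPORT_CSV = """client_id,policy_type,premium
C001, Auto ,1200
C002,Home,
C001, Auto ,1200
C003,,850
"""


def extract(csv_file):
    """Read every row of the export into a list of dictionaries."""
    return list(csv.DictReader(csv_file))


def transform(rows):
    """Strip whitespace, turn empty fields into None, and drop exact duplicates."""
    cleaned, seen = [], set()
    for row in rows:
        record = tuple((value or "").strip() or None for value in row.values())
        if record in seen:
            continue
        seen.add(record)
        cleaned.append(record)
    return cleaned


def load(records, connection):
    """Create the (assumed) policies table and insert the cleaned records."""
    connection.execute(
        "CREATE TABLE policies (client_id TEXT, policy_type TEXT, premium REAL)"
    )
    connection.executemany("INSERT INTO policies VALUES (?, ?, ?)", records)
    connection.commit()


connection = sqlite3.connect(":memory:")
load(transform(extract(io.StringIO(EXPORT_CSV))), connection)

# Count rows that still have a missing field and need follow-up with the client.
missing = connection.execute(
    "SELECT COUNT(*) FROM policies WHERE policy_type IS NULL OR premium IS NULL"
).fetchone()[0]
print(f"rows with missing fields: {missing}")
```

Loading into SQLite is just one convenient target for later querying; any store the client already uses would do.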

Deliverables

  1. Machine learning algorithm for predicting risk rating in life insurance applicants (iPython Notebook)
  2. Project proposal for the insurance brokerage on a data analytics project to improve retention rates and growth opportunities at the firm (iPython Notebook)
  3. Slide deck
