Data Science Intensive: Capstone Proposal

Background

The two projects I had chosen were based on Upwork and an insurance client of mine. For Upwork, I was interested in analyzing my cover letters, skill sets, acquired jobs and acquired interviews in order to predict which jobs would be the best fit for me to apply to. I wanted to maximize the utility of the 60 connects I had every month. My insurance client is interested in retention and in growing their business, and wanted a better understanding of the data they had stored. As a project, I wanted to analyze an insurance dataset offered on Kaggle.com by Prudential Life Insurance. The premise of the competition was to predict the risk response of a client based on a normalized dataset of current clients.

Predicting risk rating for life insurance applicants - Capstone project

The normalized dataset contains continuous variables such as height, age and BMI, and categorical (nominal) variables such as the risk response rating (1-8), medical histories, etc.

The main dependent variable is the Risk Response (1-8).

Project Goals

The problem I want to solve is to build a machine learning model that predicts an applicant's risk response, using a classifier trained on the Kaggle dataset.
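As a rough illustration of what "train a classifier, then predict risk response" means, here is a minimal sketch using a nearest-centroid classifier written in plain Python. It is a stand-in chosen for brevity, not the model the project will necessarily use. The feature values and labels are made up, and the function names `fit_centroids` and `predict` are hypothetical.

```python
from collections import defaultdict
from math import dist


def fit_centroids(rows, labels):
    """Training step: average the feature vectors of each risk class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: tuple(sum(col) / len(members) for col in zip(*members))
        for label, members in grouped.items()
    }


def predict(centroids, row):
    """Return the risk class whose centroid is closest to the applicant."""
    return min(centroids, key=lambda label: dist(centroids[label], row))


# Made-up, already-normalized applicants: (age, height, BMI), each in [0, 1].
train_rows = [
    (0.20, 0.70, 0.30),
    (0.25, 0.65, 0.35),
    (0.80, 0.60, 0.75),
    (0.85, 0.55, 0.80),
]
# Hypothetical risk responses on the 1-8 scale, one per training row.
train_labels = [8, 8, 1, 1]

centroids = fit_centroids(train_rows, train_labels)
prediction = predict(centroids, (0.22, 0.68, 0.33))
print(prediction)
```

In the actual project the same fit/predict split would apply, with the full Kaggle feature set and a model chosen after exploring the data.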

Dataset and competition info:

https://www.kaggle.com/c/prudential-life-insurance-assessment

Predicting retention risk for insurance brokerage - Brokerage project

My client is an insurance broker and has been in business for 21 years. They have collected a significant amount of data on their clients, insurance agencies, products and sales representatives over the past 20 years. Due to confidentiality concerns, I cannot release any data, although the strategies employed will be discussed.

If I can find correlations between applicant attributes and risk rating in the life insurance data, I can use the methods learned in this course and project to facilitate a discussion on how my client’s problem of improving retention could be addressed using similar methods.

I will be exploring their Applied Systems TAMS software (an insurance brokerage management tool), reviewing the various reports it produces, and determining an action plan for a data analytics project. The report data can be exported into CSV format. The deliverables would be a memo outlining an approach to solving my client’s retention problem and a machine learning algorithm for predicting risk response in the Kaggle dataset.

Project Goals

A goal for their project would be to understand how retention rates differ across product and customer segments and how best to improve those rates in the upcoming months. Identifying their high-value clients and the clients most likely not to renew would form a significant part of their retention-focused strategy.
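To make the idea of comparing retention rates across segments concrete, the sketch below computes a renewal rate per product segment. The records, the segment names and the function name `retention_by_segment` are all hypothetical; the client's real data is confidential and will have a different shape.

```python
from collections import defaultdict

# Hypothetical policy records: (product segment, renewed at expiry?)
policies = [
    ("auto", True), ("auto", True), ("auto", False),
    ("home", True), ("home", False),
    ("commercial", True), ("commercial", True),
    ("commercial", True), ("commercial", False),
]


def retention_by_segment(records):
    """Return {segment: share of policies that were renewed}."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [renewed, total]
    for segment, renewed in records:
        counts[segment][0] += int(renewed)
        counts[segment][1] += 1
    return {seg: renewed / total for seg, (renewed, total) in counts.items()}


rates = retention_by_segment(policies)
# The segment with the lowest rate is a natural first target for follow-up.
lowest = min(rates, key=rates.get)
print(rates, lowest)
```

The same grouping could be repeated by customer segment or by sales representative once the available fields are known.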

Capstone project outline

Theme

My capstone project will be a hybrid of two different projects. The life insurance project will implement machine learning algorithms to predict risk rating. The brokerage project will outline a proposal for a data analytics project based on my initial exploration of their systems. My reasoning is that patterns found while analyzing the life insurance data could also be useful in the brokerage project. I will attempt to secure a paid work project based on what is learned in this course.

Data collection

Life insurance data set - Kaggle

The Kaggle dataset is already fairly clean. It still requires exploration and separation of the variables into different categories (for example, continuous versus categorical).
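One possible way to do that separation is sketched below: a column is treated as continuous if it is numeric and has at least one non-integer value, and as categorical otherwise. This is only a heuristic, and the sample rows and column names (`age`, `bmi`, `medical_history_1`, `product_code`, `response`) are illustrative assumptions rather than the actual Kaggle schema.

```python
import csv
import io

# Made-up sample with illustrative column names (not the real Kaggle header).
SAMPLE = """age,bmi,medical_history_1,product_code,response
0.31,0.42,1,A1,8
0.55,0.61,2,B2,3
0.12,0.38,1,A1,6
"""


def is_continuous(values):
    """Heuristic: a numeric column with at least one non-integer value."""
    try:
        numbers = [float(v) for v in values]
    except ValueError:
        return False  # non-numeric text such as "A1" is treated as categorical
    return any(not n.is_integer() for n in numbers)


def split_columns(csv_text, target="response"):
    """Return (continuous, categorical) column names, excluding the target."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    continuous, categorical = [], []
    for name in rows[0]:
        if name == target:
            continue
        values = [row[name] for row in rows]
        (continuous if is_continuous(values) else categorical).append(name)
    return continuous, categorical


continuous, categorical = split_columns(SAMPLE)
print(continuous, categorical)
```

Integer-coded columns end up in the categorical group under this rule, so any that are really counts or ordinal scores would need to be reviewed by hand.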

Insurance brokerage data - Applied Systems TAMS

I need to interview the key executives at the company, particularly the controller, to understand what reports they use and how they use them to make decisions. I also need to understand how their data is collected and whether there are any problems in data entry. I need to present a data project proposal that can help meet their retention and growth requirements. I will also collect information on how their data can be exported and which reports are used most, and work out a plan to extract, transform and load the datasets into a usable form.
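A minimal sketch of the extract, transform and load step is shown below, assuming a CSV export is available. The export layout, the column names and the `policies` table are hypothetical, since the real TAMS report format has not been examined yet. The example reads the CSV, cleans each field, and loads the rows into an in-memory SQLite database.

```python
import csv
import io
import sqlite3

# Hypothetical export; real TAMS report columns will differ.
EXPORT = """client_id,product,premium,renewed
 101 ,Auto,"1,200.00",Y
102,home,850.50,N
"""


def load_policies(csv_text, conn):
    """Extract rows from CSV text, normalize them, and load them into SQLite."""
    conn.execute(
        "CREATE TABLE policies ("
        "client_id INTEGER, product TEXT, premium REAL, renewed INTEGER)"
    )
    for row in csv.DictReader(io.StringIO(csv_text)):
        conn.execute(
            "INSERT INTO policies VALUES (?, ?, ?, ?)",
            (
                int(row["client_id"].strip()),           # drop stray spaces
                row["product"].strip().lower(),          # consistent casing
                float(row["premium"].replace(",", "")),  # "1,200.00" -> 1200.0
                1 if row["renewed"].strip().upper() == "Y" else 0,
            ),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_policies(EXPORT, conn)
loaded = conn.execute(
    "SELECT client_id, product, premium, renewed FROM policies ORDER BY client_id"
).fetchall()
print(loaded)
```

The cleaning rules here are guesses at typical data-entry problems; the real ones would come out of the interviews described above.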

Deliverables

  1. Machine learning algorithm for predicting risk rating in life insurance applicants (IPython Notebook)
  2. Project proposal for the insurance brokerage on a data analytics project to improve retention rates and growth opportunities at the firm (IPython Notebook)
  3. Slide deck