I decided to pursue a hybrid of two projects: analyzing life insurance applicant data to predict risk rating, and preparing a data analytics project proposal for an insurance brokerage.
There are 59,380 rows in the dataset.
After reviewing the client's database, I arrived at the following workflow and processes to help my client meet their business goals.
Here I would perform machine learning analysis on an existing life insurance applicant data set, and carry the patterns learned over to my client's retention problem.
My insurance client is interested in retention and in growing their business, and wants a better understanding of the data they have stored. As a companion project, I wanted to analyze an insurance data set offered on Kaggle.com by Prudential Life Insurance. The premise of the competition was to predict a client's risk response from a normalized dataset of current clients.
The normalized dataset contains continuous variables such as height, age, and BMI, and categorical (nominal) variables covering medical history and related attributes. The main dependent variable is the risk Response, an ordinal rating from 1 to 8.
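A first exploration step would be partitioning the columns into continuous and categorical groups. As a minimal sketch, the continuous-column names below follow the competition's data dictionary, but should be treated as an assumption to verify against the actual CSV header:

```python
# Sketch: partition the Kaggle header into continuous vs. categorical
# features. The CONTINUOUS set is taken from the competition's data
# dictionary (an assumption to double-check against the real file).
CONTINUOUS = {
    "Product_Info_4", "Ins_Age", "Ht", "Wt", "BMI",
    "Employment_Info_1", "Employment_Info_4", "Employment_Info_6",
    "Insurance_History_5",
    "Family_Hist_2", "Family_Hist_3", "Family_Hist_4", "Family_Hist_5",
}
TARGET = "Response"  # ordinal risk rating, 1-8

def split_columns(header):
    """Split a CSV header row into (continuous, categorical) feature lists."""
    continuous, categorical = [], []
    for name in header:
        if name in (TARGET, "Id"):
            continue  # the target and the row id are not features
        (continuous if name in CONTINUOUS else categorical).append(name)
    return continuous, categorical

cont, cat = split_columns(["Id", "Product_Info_1", "Ins_Age", "Ht", "BMI",
                           "Medical_History_1", "Response"])
print(cont)  # ['Ins_Age', 'Ht', 'BMI']
print(cat)   # ['Product_Info_1', 'Medical_History_1']
```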
The problem I want to solve is to create a machine learning algorithm that predicts risk response using a classifier trained on the existing applicant data.
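To make the prediction task concrete, here is a minimal stand-in for such a classifier: a 1-nearest-neighbour rule over two continuous features (normalized age and BMI) predicting the ordinal Response. The training points are synthetic placeholders, not the real Kaggle data, and the real model would use far more features:

```python
# Sketch: 1-nearest-neighbour prediction of the risk Response (1-8)
# from two normalized continuous features. Training rows are synthetic
# placeholders for illustration only.
import math

def nearest_neighbour(train, query):
    """Return the Response label of the training row closest to `query`.

    `train` is a list of ((ins_age, bmi), response) pairs with features
    normalized to [0, 1], as in the Kaggle data.
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    return min(train, key=lambda row: dist(row[0], query))[1]

# Synthetic, illustrative training set: (features, risk response)
train = [((0.2, 0.3), 8), ((0.7, 0.8), 2), ((0.5, 0.5), 5)]
print(nearest_neighbour(train, (0.25, 0.35)))  # 8 (closest to the first row)
```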
My client is an insurance broker who has been in business for 21 years. They have collected a significant amount of data on their clients, insurance agencies, products, and sales representatives over that time. Due to confidentiality concerns I cannot release any data, although the strategies employed will be discussed.
By finding risk-rating correlations among life insurance applicants, I can use the methods learned in this course and project to facilitate a discussion of how my client's retention problem could be addressed with similar techniques.
I will be exploring their Applied Systems TAMS software (an insurance brokerage management tool), reviewing the various reports it produces, and determining an action plan for a data analytics project. This data can be exported into CSV format. The deliverables would be a memo outlining an approach to solving my client’s retention problem and a machine learning algorithm for predicting risk response in the Kaggle dataset.
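Since the TAMS reports export to CSV, the ingestion step can be done with the standard library alone. In this sketch the column names (policy_id, product, premium, renewed) are hypothetical placeholders; the real schema would come from the actual TAMS export:

```python
# Sketch: reading a (hypothetical) TAMS CSV export with the stdlib.
# Column names are placeholders, not the real TAMS schema.
import csv
import io

# Stand-in for an exported file; in practice this would be
# open("tams_export.csv") on the real report.
export = io.StringIO(
    "policy_id,product,premium,renewed\n"
    "P001,auto,1200,yes\n"
    "P002,home,900,no\n"
)

rows = list(csv.DictReader(export))
print(rows[0]["product"])  # auto
print(len(rows))           # 2
```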
A goal for their project would be to understand how retention rates differ across product and customer segments, and how best to improve retention over the coming months. Determining their high-value clients and those most likely not to renew would form a significant part of their retention-focused strategy.
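The segment-level retention question above reduces to a grouped renewal rate. As a minimal sketch over hypothetical policy records (the field names and values are placeholders for the brokerage's real data):

```python
# Sketch: renewal rate per product segment from policy records.
# Records and field names are hypothetical placeholders.
from collections import defaultdict

policies = [
    {"product": "auto", "renewed": True},
    {"product": "auto", "renewed": False},
    {"product": "home", "renewed": True},
    {"product": "home", "renewed": True},
]

def retention_by_segment(records):
    """Return {segment: fraction of policies renewed}."""
    totals, renewals = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["product"]] += 1
        renewals[r["product"]] += r["renewed"]
    return {seg: renewals[seg] / totals[seg] for seg in totals}

print(retention_by_segment(policies))  # {'auto': 0.5, 'home': 1.0}
```

Segments with the lowest rates would be the first candidates for the retention strategy's attention.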
My capstone project will be a hybrid of two different projects. The life insurance project will implement machine learning algorithms to predict risk rating. The brokerage project will outline a proposal for a data analytics project based on my initial exploration of their systems. My reasoning is that patterns found while analyzing the life insurance data could carry over to the brokerage project. I will attempt to secure a paid work project based on what is learned in this course.
The Kaggle dataset is already fairly clean, though it still requires separating and exploring the data by category.
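Even a fairly clean set needs some preparation; one routine step is imputing the gaps in continuous columns. As a sketch, median imputation is used here as an assumption-level choice, not a method prescribed by the competition:

```python
# Sketch: fill missing continuous values with the column median.
# Median imputation is an illustrative choice, not a prescribed one.
from statistics import median

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    m = median(observed)
    return [m if v is None else v for v in values]

print(impute_median([0.2, None, 0.4, 0.6]))  # [0.2, 0.4, 0.4, 0.6]
```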
I need to interview the key executives at the company, particularly the controller, to understand which reports they use and how they use them to make decisions. I also need an understanding of how their data is collected and whether there are any problems in data entry. I need to present a data project proposal that can help meet their retention and growth requirements. Information will also be collected on how their data can be exported and which reports are most used, along with a plan to extract, transform, and load the datasets into something useful.