Smoker Propensity Model
Digital^Shift prototypes an ML model that effectively predicts if a life insurance applicant is a smoker.
Life insurance applicants are required to disclose their tobacco usage; however, they have a significant incentive to lie about whether they smoke. For Equitable Life, this can lead to mispriced life insurance policies during underwriting, potentially resulting in losses.
Conventional measures for mitigating this risk consist of fluid tests for nicotine metabolites, which require significant time and lab resources, resulting in higher overhead costs for each applicant. Equitable Life needed another solution – a reliable and reasonably accurate model to predict whether applicants were smokers without the need for corroborative data from labs. When the COVID-19 pandemic reduced access to lab resources for non-essential testing, this need became even more urgent.
We immediately recognized that correctly identifying smokers is a classification problem well suited to machine learning. However, designing a model that met Equitable Life’s requirements was far from straightforward. The model had to make cost-sensitive predictions based on a small, imbalanced data set. More significantly, the cost of a false negative (incorrectly identifying a smoker as a non-smoker) outweighed the cost of a false positive (incorrectly identifying a non-smoker as a smoker) by a factor of twenty-five.
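To make the effect of a 25:1 cost asymmetry concrete, here is a minimal sketch. It is not the model built for Equitable Life. It assumes a classifier that outputs a reasonably well-calibrated probability that an applicant smokes; the function names, unit costs, and sample data are all hypothetical. Under those assumptions, flagging an applicant as a likely smoker has the lower expected cost whenever the predicted probability exceeds 1 / (1 + 25), or roughly 0.038, rather than the usual 0.5.

```python
# Illustrative sketch only. Costs are in arbitrary units; only the 25:1 ratio
# comes from the text. All names and the sample data are hypothetical.
COST_FALSE_NEGATIVE = 25.0  # a smoker classified as a non-smoker
COST_FALSE_POSITIVE = 1.0   # a non-smoker classified as a smoker


def cost_minimizing_threshold(cost_fn, cost_fp):
    """Probability above which flagging an applicant has the lower expected cost.

    Flagging costs (1 - p) * cost_fp in expectation; not flagging costs
    p * cost_fn. Flagging is cheaper when p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def total_cost(y_true, y_prob, threshold,
               cost_fn=COST_FALSE_NEGATIVE, cost_fp=COST_FALSE_POSITIVE):
    """Sum misclassification costs when flagging applicants with y_prob >= threshold."""
    cost = 0.0
    for actual, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if actual == 1 and predicted == 0:
            cost += cost_fn  # missed smoker
        elif actual == 0 and predicted == 1:
            cost += cost_fp  # non-smoker flagged
    return cost


# Made-up example: 1 = smoker, 0 = non-smoker, with made-up predicted probabilities.
y_true = [0, 0, 0, 0, 1]
y_prob = [0.02, 0.10, 0.30, 0.01, 0.40]

threshold = cost_minimizing_threshold(COST_FALSE_NEGATIVE, COST_FALSE_POSITIVE)
print(f"cost-minimizing threshold: {threshold:.3f}")
print("cost at 0.5 threshold:", total_cost(y_true, y_prob, 0.5))
print("cost at cost-minimizing threshold:", total_cost(y_true, y_prob, threshold))
```

On this toy data, the default 0.5 threshold misses the one smoker, for a cost of 25, while the lower threshold flags two non-smokers instead, for a cost of 2. A cost-sensitive model accepts more false positives in exchange for fewer expensive false negatives.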
That is where Digital^Shift came in. We performed data exploration and requirements analysis for a smoker propensity model. Next, we identified relevant data available to Equitable Life, evaluated its suitability by weighing the quantity of data against its predictive value, and recommended potential avenues for obtaining additional data that could improve model accuracy.
We then worked with Equitable Life to develop a prototype smoker propensity model based on our findings. We took measures to account for the challenges of making predictions from the available data: we carefully selected algorithms suitable for training on smaller data sets, and we experimented with undersampling and oversampling techniques to determine the most effective way to balance the data set. Ultimately, our model achieved prediction rates that demonstrated its practical feasibility and cost-effectiveness for Equitable Life’s use case.
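For readers unfamiliar with resampling, the sketch below shows the simplest random variants of oversampling and undersampling. The project description does not say which specific techniques were tried, so this illustrates the general idea and is not the method used. The function names are hypothetical, and the code assumes a binary label where 1 marks the minority class (smokers) and at least one minority row exists.

```python
import random


def split_by_class(rows, labels, minority_label):
    """Return (minority, majority) lists of (row, label) pairs."""
    pairs = list(zip(rows, labels))
    minority = [p for p in pairs if p[1] == minority_label]
    majority = [p for p in pairs if p[1] != minority_label]
    return minority, majority


def random_oversample(rows, labels, minority_label=1, seed=0):
    """Duplicate randomly chosen minority rows until the classes are the same size."""
    rng = random.Random(seed)
    minority, majority = split_by_class(rows, labels, minority_label)
    shortfall = max(0, len(majority) - len(minority))
    extra = [rng.choice(minority) for _ in range(shortfall)]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return [r for r, _ in balanced], [y for _, y in balanced]


def random_undersample(rows, labels, minority_label=1, seed=0):
    """Keep a random subset of majority rows equal in size to the minority class."""
    rng = random.Random(seed)
    minority, majority = split_by_class(rows, labels, minority_label)
    kept = rng.sample(majority, min(len(majority), len(minority)))
    balanced = kept + minority
    rng.shuffle(balanced)
    return [r for r, _ in balanced], [y for _, y in balanced]


# Made-up data: row ids 0 and 1 are smokers, the other eight are non-smokers.
rows = list(range(10))
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)
print("oversampled class counts:", over_labels.count(0), over_labels.count(1))
print("undersampled class counts:", under_labels.count(0), under_labels.count(1))
```

Oversampling keeps every original row but repeats minority examples, which can encourage overfitting on a small data set. Undersampling discards majority examples, which costs information. In either case, resampling is normally applied to the training split only, so that evaluation reflects the real class balance.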