Abstract:
Universal health coverage is crucial to ensuring the health and wellbeing of members of any society. However, in developing countries such as Tanzania, health care systems rely heavily on out-of-pocket payments, a financing mechanism that hinders universal health coverage because it contributes to inefficiency, inequity, and high costs. To address this challenge, people are encouraged to enroll in health insurance schemes, which reduce the burden of out-of-pocket payments when they fall ill or have pre-existing conditions. Insurance companies, in turn, are advised to set premium rates that most people can afford, in support of universal health coverage. Insurance companies therefore need models that accurately predict the medical expenses of the insured population. This study used demographic and behavioral data to build machine learning models that predict health insurance charges. It also compared the predictive performance of five models: K-Nearest Neighbors (KNN), Least Absolute Shrinkage and Selection Operator (LASSO), Multiple Linear Regression (MLR), eXtreme Gradient Boosting (XGBoost), and Random Forest Regression (RFR).
Multiple linear regression showed that age (p < 0.001), BMI (p = 0.001), smoking status (p < 0.001), and region (p = 0.046) were statistically significant predictors, so these attributes can be regarded as determinants of health insurance charges. In the performance evaluation, XGBoost and RFR were the most accurate models, with R² = 0.855, MAE = 2688.2, and RMSE = 4748.7 for XGBoost and R² = 0.853, MAE = 2726.4, and RMSE = 4783.8 for RFR. Insurance companies seeking to develop premium prediction models are therefore recommended to use XGBoost and RFR for more accurate predictions.
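The abstract compares models by R², MAE, and RMSE without stating how they were computed. As a minimal sketch, assuming the standard definitions applied to held-out charges (MAE as the mean absolute residual, RMSE as the square root of the mean squared residual, and R² as 1 - SSE/SST, the same definition used by scikit-learn's `r2_score`), the metrics can be computed in plain Python as below. The function name `regression_metrics` and the toy values are hypothetical illustrations, not the study's code or data.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return MAE, RMSE and R^2 for paired observed/predicted charges.

    Hypothetical helper for illustration; assumes the standard metric definitions.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # Mean absolute error: average magnitude of the prediction error.
    mae = sum(abs(r) for r in residuals) / n

    # Root mean squared error: penalizes large errors more heavily than MAE.
    sse = sum(r * r for r in residuals)
    rmse = math.sqrt(sse / n)

    # Coefficient of determination: 1 - SSE / (total sum of squares about the mean).
    mean_true = sum(y_true) / n
    sst = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - sse / sst if sst > 0 else float("nan")

    return {"MAE": mae, "RMSE": rmse, "R2": r2}


# Toy example with made-up charges (not data from the study).
metrics = regression_metrics([1000, 2000, 3000, 4000], [1100, 1900, 3200, 3800])
print({k: round(v, 3) for k, v in metrics.items()})
# → {'MAE': 150.0, 'RMSE': 158.114, 'R2': 0.98}
```

Because RMSE squares the residuals before averaging, it is always at least as large as MAE; a wide gap between them, as in the reported results (for example, 4748.7 versus 2688.2 for XGBoost), suggests that a minority of policyholders carry large prediction errors.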