dc.description.abstract |
Issuing loans is one of the main ways a bank conducts its business, and it has historically exposed banks to credit default risk. Despite the many measures banks take to pre-assess loan applications, defaults persist, which signals the need for a rapid, cost-effective, and optimal approach to reduce, and where possible prevent, credit risk from loan defaults before banks incur large losses.
The main aim of this study is to develop machine learning models that predict personal loan default, and to analyze the performance of various models in order to identify the borrower features that support early prediction of personal loan default.
In this study, three classical machine learning algorithms, K-Nearest Neighbors (KNN), Gradient Boosting (GB), and Random Forest (RF), were trained on a historical credit dataset of 5012 observations obtained from a commercial bank in Tanzania. The dataset was imbalanced, so the SMOTE technique was applied to give more insight into the classified data and to reduce errors in the conclusions. The dataset was divided into training and test sets, and the parameters of each algorithm were optimized. The models were compared using AUC and ROC curves, and each classifier was evaluated on accuracy, recall, precision, and F1-score in classifying the different types of loan defaults.
The findings showed that the features for early prediction of borrowers’ loan default were monthly income, total loan amount, and age. Of the three models developed (RF, KNN, and GB), Random Forest performed well, with an accuracy of 84 percent, recall of 85 percent, precision of 82 percent, F1-score of 84 percent, and ROC AUC of 91 percent in predicting whether or not a loan would default.
Although the study succeeded in implementing a high-performance model, further studies should be conducted, particularly using deep learning or other machine learning models and involving high-dimensional datasets from many banks in Tanzania. |
en_US |