University of Rwanda Digital Repository

Tree-based and Logistic Regression Models for Business Success Prediction in Rwanda

dc.contributor.author Kipkogei, Francis
dc.date.accessioned 2021-11-26T11:51:50Z
dc.date.available 2021-11-26T11:51:50Z
dc.date.issued 2021
dc.identifier.uri http://hdl.handle.net/123456789/1450
dc.description Master's Dissertation en_US
dc.description.abstract Background: Businesses are widely credited with contributing substantially to the economic health of most countries. Many enterprises are started every year; some succeed while others fail. Existing studies give a glimpse of the factors affecting business success in particular geographical locations, but data covering the whole nation and a wider set of determinants may offer more insight into the challenges and allow better prediction of an enterprise's success given those factors. This study used Rwanda Revenue Authority data to identify important variables that contribute to business success in Rwanda. Tree-based models were compared with logistic regression for predicting business success, and the most robust model was used for prediction. Methods: Statistical learning models, consisting of tree-based models and logistic regression, were trained and evaluated on a dataset obtained from the Rwanda Revenue Authority covering a sample of 18,162 businesses in Rwanda. Recall, precision, F1 score, and accuracy were used to evaluate each model's performance in differentiating between successful and failed businesses. ROC AUC was further used to compare the discriminative power of the machine learning models. Results: Tree-based ensemble models such as gradient boosting, XGBoost, and random forest were among the top classifiers, showing high sensitivity and specificity. Gradient boosting in particular correctly identified over 93% of successful businesses. The lowest-performing model was logistic regression, with a recall of 90% and an F1 score of 90.6% on average. Sector was found to be the most important feature contributing to business success.
Conclusion: Evidence from this study suggests that tree-based models can be used to achieve greater accuracy in predicting business success. The study further suggests a need to segment sector further to identify classes within each sector of the economy that could contribute to business success. en_US
dc.language.iso en en_US
dc.publisher University of Rwanda en_US
dc.subject Business success, unsuccessful, tree-based, models, logistic regression en_US
dc.title Tree-based and Logistic Regression Models for Business Success Prediction in Rwanda en_US
dc.type Thesis en_US
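
The evaluation described in the abstract (recall, precision, F1 score, accuracy, and ROC AUC for a binary success/failure classifier) can be sketched as follows. This is a minimal illustration in plain Python, not the dissertation's code: the function names, the 0.5 decision threshold, and the eight toy labels and scores are all assumptions made up for the example, and none of the numbers come from the Rwanda Revenue Authority data.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, F1 and accuracy for binary labels (1 = successful business)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # successes predicted as successes
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # failures predicted as failures
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # failures predicted as successes
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # successes predicted as failures

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true outcomes and model-predicted success probabilities.
Y_TRUE = [1, 1, 1, 0, 0, 1, 0, 1]
SCORES = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]
THRESHOLD = 0.5  # assumed cut-off for calling a business "successful"
Y_PRED = [1 if s >= THRESHOLD else 0 for s in SCORES]

if __name__ == "__main__":
    for name, value in classification_metrics(Y_TRUE, Y_PRED).items():
        print(f"{name}: {value:.3f}")
    print(f"roc_auc: {roc_auc(Y_TRUE, SCORES):.3f}")
```

Recall here corresponds to the share of truly successful businesses that a model identifies, which is the sense in which the abstract reports results for gradient boosting and logistic regression. ROC AUC is computed from the predicted scores rather than the thresholded labels, so it does not depend on the assumed cut-off.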

