Abstract:
Diarrheal disease is a worldwide burden: it is the second leading cause of death in children under five years of age. This is in line with the report of the Ministry of Health in Rwanda, which identified childhood diarrhea as the second cause of morbidity in all health facilities between June 2019 and June 2020. This research aimed to develop the best model for predicting the occurrence of diarrheal disease among under-five children with machine learning techniques, using the socio-demographic and meteorological variables from the 2014-2015 Rwanda Demographic and Health Survey (RDHS). The target variable was dichotomous, with class 0 representing children without diarrhea and class 1 representing children with diarrhea. Of the 7474 children considered in the study, only 905 (12%) had experienced a diarrhea episode in the two weeks before the survey. Bivariate analysis showed that residence, age group, wealth index, type of toilet facility, main floor material, duration of breastfeeding, rotavirus vaccination, and maternal education were associated with childhood diarrhea status; annual precipitation was also statistically significant. Six classification algorithms (random forest, logistic regression, naïve Bayes, support vector machine, neural network, and gradient boosting) were trained to find the most effective model for predicting diarrhea status among under-five children. The gradient boosting classifier was the best model, with an accuracy of 86.3%; it correctly identified 91.7% of children with diarrheal disease and discriminated almost perfectly between children with and without diarrhea. A feature importance analysis was performed to identify the predictors that most influenced the model's predictions. High precipitation, age of 12 to 24 months, earth or sand as the main floor material, unimproved toilet facilities, and belonging to a poor household were identified as the predictors contributing most to the prediction of diarrheal disease among children.
The model accurately identifies children who are vulnerable to diarrhea. It can be used at the health-facility level and by community health workers to detect the likelihood of diarrhea early and to put preventive measures in place before episodes progress to severe diarrhea and dehydration, which could reduce diarrhea-related morbidity and hospital admissions.
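The accuracy and sensitivity figures quoted above are easier to interpret alongside the class imbalance in the sample. The following is a minimal Python sketch, not the study's code: it uses only the class counts stated in the abstract (7474 children, 905 with diarrhea) and a hypothetical baseline that predicts class 0 for every child, to show how the two metrics are computed and why they are reported together. The names `ConfusionCounts`, `accuracy`, `sensitivity`, and `always_negative` are illustrative assumptions, and the proportions are those of the full sample, which may differ from the split on which the models were evaluated.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Confusion-matrix counts for a binary classifier.

    Class 1 = child with diarrhea, class 0 = child without diarrhea.
    """
    tp: int  # class 1 children predicted as class 1
    fn: int  # class 1 children predicted as class 0
    tn: int  # class 0 children predicted as class 0
    fp: int  # class 0 children predicted as class 1


def accuracy(c: ConfusionCounts) -> float:
    """Share of all children whose class is predicted correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def sensitivity(c: ConfusionCounts) -> float:
    """Share of class 1 children that are correctly identified (recall)."""
    positives = c.tp + c.fn
    return c.tp / positives if positives else 0.0


# Class sizes taken from the abstract.
TOTAL, POSITIVES = 7474, 905
NEGATIVES = TOTAL - POSITIVES

# Hypothetical baseline (an assumption for illustration only):
# predict "no diarrhea" for every child.
always_negative = ConfusionCounts(tp=0, fn=POSITIVES, tn=NEGATIVES, fp=0)

print(f"baseline accuracy:    {accuracy(always_negative):.3f}")
print(f"baseline sensitivity: {sensitivity(always_negative):.3f}")
```

With a minority class of about 12%, a baseline of this kind scores high on accuracy while identifying no child with diarrhea, which is why sensitivity is the more informative of the two figures for a screening use case.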