Abstract:
HIV has been traced back to around 1920 in Congo, and since then it has claimed
over 32 million lives. Around 62% of new HIV infections occur among key populations
and their sexual partners, including men who have sex with men (MSM), female sex
workers (FSWs), people who inject drugs (PWID), and people in prison, even though these
groups constitute a very small proportion of the general population (Data, 2019). This
is largely because these populations are made up of people who engage in sexual risk
behaviours, such as inconsistent condom use, having multiple sexual partners, paid sex,
and early sexual initiation. In Rwanda, HIV prevalence is 3% in the general population,
45.8% among female sex workers, and 4.4% among men who have sex with men. This study
aimed to build a model that predicts new
HIV infections among individuals with sexual risk behaviours using machine learning
algorithms. The study used three categories of variables: the dependent (response)
variable, risk factors as independent variables, and demographic factors, also as
independent variables. The data came from the RPHIA 2018-2019 dataset. Among 30,709
respondents, 29,775 (96.96%) were HIV negative and only 934 (3.04%) were HIV
positive. Three machine learning classification algorithms, namely logistic regression,
gradient boosting, and random forest, were trained to identify the model that best
predicts new HIV infections among individuals who practice sexual risk behaviours.
Random forest was found to be the best model, with an accuracy of 71.15%, precision of
61.2%, recall of 84.5%, and F1-score of 70.9% at a classification threshold of 0.35.
Its confusion matrix comprised 261 true negatives, 163 false positives, 47 false
negatives, and 257 true positives. Compared with the other algorithms, random forest
minimized false negatives and increased true positives, recall, and F1-score, and its
area under the curve (AUC) was 0.75. Feature importance analysis was performed to
determine the risk factors that influence
new HIV infections among individuals who practice sexual risk behaviours. Among
sociodemographic variables, being in the 15-24 age group, being widowed or single, and
having a primary level of education were found to influence HIV infection. Among risk
behaviours, not having used a condom during the last sexual intercourse, early sexual
debut (under age 20), and having multiple sexual partners (more than one) most strongly
influenced the model's predictions of HIV infection. This model will be essential for
public health practitioners, especially those most involved in HIV programs, in
designing new programs on HIV prevention and transmission that emphasize improving
safe-sex negotiation skills and that put more effort into educating adolescents and
young people using the nationally approved ASRHR curriculum.
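The class balance of the data follows directly from the reported respondent counts. The minimal Python sketch below computes each class's share and the number of negatives per positive, which shows how imbalanced the raw data are before any modelling. The names `HIV_NEGATIVE`, `HIV_POSITIVE`, and `class_shares` are hypothetical, chosen for illustration; they are not taken from the study's code.

```python
# Respondent counts by HIV status, as reported for the RPHIA 2018-2019 data.
# The constant and function names are hypothetical, for illustration only.
HIV_NEGATIVE = 29_775
HIV_POSITIVE = 934


def class_shares(negative: int, positive: int) -> tuple[float, float]:
    """Return the percentage of respondents in each class (negative, positive)."""
    total = negative + positive
    return 100 * negative / total, 100 * positive / total


neg_pct, pos_pct = class_shares(HIV_NEGATIVE, HIV_POSITIVE)
imbalance_ratio = HIV_NEGATIVE / HIV_POSITIVE  # negatives per positive

print(f"total respondents: {HIV_NEGATIVE + HIV_POSITIVE}")  # 30709
print(f"negative: {neg_pct:.2f}%, positive: {pos_pct:.2f}%")  # 96.96%, 3.04%
print(f"negatives per positive: {imbalance_ratio:.1f}")  # 31.9
```

With roughly 32 negatives for every positive, a classifier that always predicts "negative" would already reach about 97% accuracy, which is why recall and F1-score are more informative than accuracy alone for this problem.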
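The reported accuracy, precision, recall, and F1-score can all be recomputed from the four confusion-matrix counts. The minimal Python sketch below assumes the standard metric definitions and the convention that a case is labelled positive when its predicted probability is at least the threshold (the `>=` rule is an assumption, not confirmed by the study). The names `Confusion`, `confusion_at_threshold`, and `metrics` are hypothetical, for illustration only, and do not reproduce the study's pipeline.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts of a binary confusion matrix (hypothetical helper type)."""
    tn: int
    fp: int
    fn: int
    tp: int


def confusion_at_threshold(y_true, y_prob, threshold=0.35):
    """Assumed rule: predict positive when the probability is >= threshold."""
    tn = fp = fn = tp = 0
    for actual, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return Confusion(tn, fp, fn, tp)


def metrics(c: Confusion) -> dict[str, float]:
    """Standard accuracy, precision, recall, and F1 from confusion counts."""
    total = c.tn + c.fp + c.fn + c.tp
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Toy probabilities (made up) showing how the 0.35 threshold assigns labels.
toy = confusion_at_threshold(y_true=[1, 0, 1, 0], y_prob=[0.40, 0.30, 0.20, 0.50])
print(toy)  # Confusion(tn=1, fp=1, fn=1, tp=1)

# Confusion matrix reported for the random forest at the 0.35 threshold.
reported = Confusion(tn=261, fp=163, fn=47, tp=257)
reported_metrics = metrics(reported)
for name, value in reported_metrics.items():
    print(f"{name}: {value:.4f}")
# accuracy: 0.7115, precision: 0.6119, recall: 0.8454, f1: 0.7099
```

Lowering the threshold below the common default of 0.5 labels more cases as positive, which trades additional false positives for fewer false negatives; in a screening setting where missing an infection is costly, that trade favours recall over precision.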