Abstract:
The last decade has seen rapid and significant growth of digital technology worldwide, and this technology increasingly drives our society. One of the key solutions institutions are adopting to keep pace with this trend is the use of question answering systems and chatbots to automate some of the services their users need, many of which consist of answering questions asked by those users. The biggest challenge lies in classifying questions and answers the way a human being would. This research aims to identify advanced implementations that can be used to optimize the use of question answering systems. Three different models were built in Python and trained on a labeled Kaggle dataset in which questions and answers are classified from various perspectives, in order to better understand the questions and their respective answers. The labels comprise 21 question-related features and 9 answer-related features, each scored in the range 0 to 1. The algorithms used in the models are Ridge Regression, a Recurrent Neural Network using Long Short-Term Memory (LSTM), and a neural network built with the Keras library. Ridge Regression obtained a maximum validation accuracy of 0.37, the Recurrent Neural Network with LSTM reached a validation accuracy of 0.40, and the Keras neural network performed best on our dataset with a validation accuracy of 0.58. Future work should apply models better suited to text classification, such as BERT, and consider using more input features when training the model. Additionally, selecting fewer but more meaningful perspectives when choosing the labeled features could boost the accuracy of the model.
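As an illustration of the Ridge Regression baseline, here is a minimal sketch that fits all 30 label columns (21 question and 9 answer targets, each in the range 0 to 1) jointly in closed form with NumPy. It assumes the question and answer text has already been converted into a numeric feature matrix. The vectorization step is not described in this abstract, so random features stand in for it. The function names `fit_ridge` and `predict`, the value `alpha=1.0`, and the synthetic data are hypothetical and do not reproduce the models used in this study.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form multi-output ridge regression with an unpenalized intercept."""
    # Center inputs and targets so the intercept is not regularized
    x_mean = X.mean(axis=0)
    y_mean = Y.mean(axis=0)
    Xc = X - x_mean
    Yc = Y - y_mean
    n_features = X.shape[1]
    # Solve (Xc^T Xc + alpha * I) W = Xc^T Yc for all target columns at once
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ Yc)
    b = y_mean - x_mean @ W
    return W, b

def predict(W, b, X):
    # Clip to [0, 1] because each label is a score in that range
    return np.clip(X @ W + b, 0.0, 1.0)

# Hypothetical synthetic data standing in for vectorized question/answer text
rng = np.random.default_rng(0)
n_samples, n_features, n_targets = 200, 50, 30  # 30 = 21 question + 9 answer labels
X = rng.normal(size=(n_samples, n_features))
true_W = rng.normal(scale=0.1, size=(n_features, n_targets))
Y = np.clip(X @ true_W + 0.5, 0.0, 1.0)

# Train on the first 150 rows, validate on the remaining 50
W, b = fit_ridge(X[:150], Y[:150], alpha=1.0)
preds = predict(W, b, X[150:])
mse = float(np.mean((preds - Y[150:]) ** 2))
print(preds.shape, round(mse, 4))
```

Fitting every target in one linear solve keeps the baseline cheap; `alpha` controls the strength of the L2 penalty and would normally be tuned on the validation split.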