University of Rwanda Digital Repository

Pre-training neural networks on Xeno-canto and eBird for bioacoustic classification models


dc.contributor.author Mikwa, Boris Tamanjong
dc.date.accessioned 2023-05-11T08:54:44Z
dc.date.available 2023-05-11T08:54:44Z
dc.date.issued 2022
dc.identifier.uri http://hdl.handle.net/123456789/1862
dc.description.abstract Both traditional machine learning algorithms (linear discriminant analysis, support vector machines, and decision trees, to name a few) and deep learning algorithms such as the Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Recurrent Neural Network (RNN) have been used in bioacoustics research in general and bird species identification in particular. However, data are often limited in bioacoustic research, including research on bird vocalizations, and training a deep neural network with such a small amount of data most often leads to overfitting. Many researchers have used techniques such as data augmentation and transfer learning to overcome this problem, but no research has yet been conducted on pre-training neural networks for bioacoustic classification models on public repositories that contain bird vocalizations, such as Xeno-canto and eBird. In this dissertation, we pre-trained CNNs for bioacoustic classification models using two public bird vocalization repositories (Xeno-canto and eBird) and fine-tuned them on bird audio recordings collected locally at Intaka Island Nature Reserve, Cape Town, South Africa. First, we used bird vocalizations from the public repositories to pre-train three CNN models using different sample sizes: 9000, 12000, and 15000 spectrograms (obtained by converting the audio using Fourier transforms). Next, we trained five baseline models using different sample sizes from the collected data (the entire training set, 6150, 9000, 12000, 16000, and 21000 spectrograms). Then, we fine-tuned the pre-trained models using the same sample sizes as those employed in training the baseline models, and used the baseline models as reference models to evaluate the performance of the fine-tuned models. The best baseline model had a test accuracy of 91.70%, and the best fine-tuned model achieved 91.73%. The AUC for the best baseline model was 96.9%, against 96.3% for the best fine-tuned model. Three findings were observed. First, model performance improved as the size of the training data increased; second, performance also improved when the time-shift augmentation technique was used; finally, the results revealed that the baseline models outperformed the fine-tuned models. The baseline models may have outperformed the fine-tuned models because the data used in pre-training was not large enough, and a combination of CNN and RNN could produce better results. Pre-training on much larger datasets might also improve the performance of the fine-tuned models. Despite these results, this research is the first attempt at pre-training models on publicly available bird vocalization data, an approach that has not been investigated in the existing literature. en_US
dc.language.iso en en_US
dc.publisher University of Rwanda en_US
dc.subject Data augmentation; Bioacoustics; Deep learning; Pre-training. en_US
dc.title Pre-training neural networks on Xeno-canto and eBird for bioacoustic classification models en_US
dc.type Thesis en_US
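The abstract above describes converting the audio recordings to spectrograms with Fourier transforms before training. The thesis's own code is not part of this record; the sketch below is a minimal illustration of one common way to do this in Python with librosa, where the library choice, parameter values, and file name are assumptions rather than details from the thesis.

import librosa
import numpy as np

def audio_to_spectrogram(path, sr=22050, n_fft=1024, hop_length=512):
    """Load an audio file and convert it to a log-magnitude spectrogram
    via the short-time Fourier transform (STFT)."""
    audio, _ = librosa.load(path, sr=sr)               # resample to a fixed rate
    stft = librosa.stft(audio, n_fft=n_fft, hop_length=hop_length)
    magnitude = np.abs(stft)                           # keep magnitude, discard phase
    return librosa.amplitude_to_db(magnitude, ref=np.max)

spec = audio_to_spectrogram("recording.wav")           # shape: (freq_bins, time_frames)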
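The abstract also reports that time-shift augmentation improved performance. A common form of this technique shifts the waveform (or the spectrogram columns) along the time axis with wrap-around; the following NumPy sketch is one plausible implementation, not the thesis's own, and the function name and default shift bound are assumptions.

import numpy as np

def time_shift(audio, max_shift_fraction=0.2, rng=None):
    """Randomly shift a 1-D waveform along the time axis, wrapping around.
    max_shift_fraction bounds the shift as a fraction of the clip length."""
    rng = rng or np.random.default_rng()
    max_shift = int(len(audio) * max_shift_fraction)
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(audio, shift)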
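Finally, the pre-train-then-fine-tune workflow described in the abstract (pre-train a CNN on Xeno-canto/eBird spectrograms, then fine-tune on the locally collected recordings) commonly looks like the PyTorch sketch below. The architecture, file name, and class counts are placeholders; the thesis's actual models are not specified in this record.

import torch
import torch.nn as nn

# A small placeholder CNN for single-channel spectrogram input;
# the thesis's actual architecture is not given in this record.
def make_cnn(num_classes):
    return nn.Sequential(
        nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(32, num_classes),
    )

# Step 1: pre-train on spectrograms from the public repositories.
model = make_cnn(num_classes=50)          # hypothetical number of pre-training classes
# ... training loop over the Xeno-canto/eBird spectrograms goes here ...
torch.save(model.state_dict(), "pretrained_cnn.pt")

# Step 2: fine-tune on the locally collected recordings.
model.load_state_dict(torch.load("pretrained_cnn.pt"))
model[-1] = nn.Linear(32, 10)             # new head for the local label set (10 is a placeholder)
# ... continue training (fine-tuning) on the local spectrograms ...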

