Pre-training neural networks on Xeno-canto and Ebird for bioacoustic classification models

Mikwa, Boris Tamanjong

URI: http://hdl.handle.net/123456789/1862

Date: 2022

Abstract:

Both traditional machine learning algorithms (linear discriminant analysis, support vector machine, and decision tree, to name a few) and deep learning algorithms such as Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Recurrent Neural Network (RNN) have been used in bioacoustics research in general and bird species identification in particular. However, data are often limited in bioacoustic research, including data on bird vocalizations, and training a deep neural network on such a small amount of data most often leads to overfitting. Many researchers have used techniques such as data augmentation and transfer learning to overcome this problem, but no research has yet been conducted on pre-training neural networks on public repositories that contain bird vocalizations, such as Xeno-canto and eBird, for bioacoustic classification models. In this dissertation, we pre-trained CNNs for bioacoustic classification models using two public bird vocalization repositories (Xeno-canto and eBird) and fine-tuned them on locally collected bird audio recordings obtained from Intaka Island Nature Reserve, Cape Town, South Africa. First, we used bird vocalizations from the public repositories to pre-train three CNN models using different sample sizes: 9000, 12000, and 15000 spectrograms (obtained by converting the audio using Fourier Transforms). Next, we trained five baseline models using different sample sizes (the entire training set, 6150, 9000, 12000, 16000, and 21000 spectrograms) from the collected data. Then, we used the same sample sizes as those employed in training the baseline models to fine-tune the pre-trained models. We used the baseline models as reference models to evaluate the performance of the fine-tuned models.
The best baseline model had a test accuracy of 91.70%, and the best fine-tuned model achieved 91.73%. The AUC for the best baseline model was 96.9%, against 96.3% for the best fine-tuned model. Three findings were observed. Firstly, the performance of the model improved as the size of the training data increased. Secondly, the performance also improved when the time-shift augmentation technique was used. Finally, the results revealed that the baseline models outperformed the fine-tuned models. A possible reason is that the data used in pre-training were not large enough; using much larger data to pre-train the model might improve the performance of the fine-tuned models, and a combination of CNN and RNN could also produce better results. Despite these results, this research is the first attempt at pre-training models on publicly available bird vocalization data, an approach that has not been investigated in the existing literature.

Keywords: Data augmentation; Bioacoustics; Deep learning; Pre-training.
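The two signal-processing steps named in the abstract, converting audio to spectrograms with Fourier transforms and time-shift augmentation, can be illustrated with a minimal NumPy sketch. It is not the dissertation's implementation: the function names, the frame length, the hop size, the Hann window, the 16 kHz sample rate, the synthetic test tone, and the choice of a circular (wrap-around) shift are all assumptions made for illustration.

```python
import numpy as np

def time_shift(signal, shift):
    """Shift a 1-D waveform by `shift` samples.

    Assumption: samples pushed past the end wrap around to the start.
    """
    return np.roll(signal, shift)

def spectrogram(signal, frame_len=512, hop=256):
    """Magnitude spectrogram from a short-time Fourier transform.

    frame_len and hop are illustrative values, not taken from the source.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the waveform into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One FFT per frame; transpose to (frequency bins, time frames).
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Hypothetical input: one second of a 2 kHz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 2000 * t)

shifted = time_shift(tone, 1600)   # augmented copy, shifted by 0.1 s
spec = spectrogram(shifted)        # image-like array a CNN could take as input
```

In this sketch each augmented waveform yields a new spectrogram, so time-shifting enlarges the training set without new recordings.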


Files in this item

Name: Mikwa Boris Taman ...

Size: 5.854Mb

Format: PDF


This item appears in the following Collection(s)

College of Business and Economics
Book Chapters from the College of Business and Economics
