dc.description.abstract |
Training complex deep neural networks can result in overfitting when the networks are
trained from random weight initialization on small datasets. Data augmentation helps
to reduce the negative effects of overfitting. Data augmentation is the process by which
the amount of data available for a given problem is increased via some augmentation
technique. Findings in computer vision and audio recognition research reveal that the
performance of machine learning classifiers improves significantly when the data is
augmented.
In the context of ecology, researchers conduct field surveys in which microphones are
placed at a location and audio data is recorded over a period of time. There is,
however, no guarantee that the particular species of interest in the field survey will
vocalize frequently near the microphone. Thus, the amount of data captured for the
species of interest might be limited, and training robust classifier models on such
limited data will most likely lead to overfitting.
The purpose of this research is to investigate several audio augmentation techniques
as a means of increasing the number of audio examples for certain species of interest,
with the goal of creating robust audio vocalization classifier models. We investigate
the noise injection and time and frequency masking data augmentation techniques. These
techniques are applied to two bird species of interest, namely the pin-tailed whydah (Vidua
macroura) and the Cape robin-chat (Cossypha caffra). While these two species are not
endangered, they allow us to compare the various augmentation techniques. The audio
recordings were obtained from the Intaka Island Nature Reserve, South Africa.
To evaluate the performance of the augmentation techniques, we conducted a comparison
between experiments run with and without augmentation. We chose to use
convolutional neural networks as our classifiers, given that they are the state of the art
in audio recognition tasks. Furthermore, convolutional neural networks have shown
good performance in the field of bioacoustics.
We manually annotated 768 audio files (20 minutes each), totaling over 256 hours of audio. |
en_US |