Comments

Oceanboi t1_j8l880s wrote

Oooooo buddy. You’re in for a ride. Check out the PyTorch documentation; there’s plenty in there related to audio classification.

1

Oceanboi t1_j8l8tst wrote

I’m guessing your company won’t have the resources or data to train a CNN to convergence from scratch, so read up on some common CNNs that people use for audio transfer learning (EfficientNet has worked well for me, as did ResNet50, albeit less so). Once you can implement one pretrained model, you can implement most of them fairly easily to see which one suits your task best. Also read up on Sharan et al. 2019 and 2021, where he benchmarks numerous image representations, model architectures, and network fusion techniques. Results may vary, and I was not able to reproduce his results with his model architecture, but it is a great starting point. Pay less attention to the actual architecture he talks about, because you’ll most likely be doing transfer learning, where you import a model and its weights. For preprocessing, look into MATLAB and its Auditory Modeling Toolbox, or, if you’re using Python, look into librosa, torchaudio, and brian2hears for more complex filterbank models.

1

Nerveregenerator t1_j99gc4f wrote

You just compute MFCCs, and then it’s just like image detection.
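For anyone wondering what the MFCC step actually does, here is a minimal NumPy-only sketch: frame the signal, take the power spectrum, apply a mel filterbank, take the log, then a DCT. The function names and all parameter values (512-point FFT, hop of 256, 40 mel bands, 13 coefficients) are assumptions for illustration, and the DCT is unnormalized, so the numbers will not match a library. In practice you would more likely call `librosa.feature.mfcc` or `torchaudio.transforms.MFCC`.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters with centers evenly spaced on the mel scale."""
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def mfcc(signal, sr, n_fft=512, hop=256, n_mels=40, n_mfcc=13):
    """Return an (n_mfcc, n_frames) array, i.e. a small 2D 'image' of the audio."""
    # Slice the signal into overlapping frames and apply a Hann window.
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)

    # Power spectrum per frame, then mel band energies.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel_energy = power @ mel_filterbank(sr, n_fft, n_mels).T

    # Log compression; the small constant avoids log(0).
    log_mel = np.log(mel_energy + 1e-10)

    # Unnormalized DCT-II over the mel axis, keeping the first n_mfcc coefficients.
    n = np.arange(n_mels)
    basis = np.cos(np.pi / n_mels * (n[None, :] + 0.5) * np.arange(n_mfcc)[:, None])
    return (log_mel @ basis.T).T


# Demo: one second of a 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
features = mfcc(tone, sr)
```

The result is a 2D array (coefficients by time frames), which is what lets you feed it to a CNN the same way you would feed it an image.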

1