Submitted by Oceanboi t3_z30bf2 in MachineLearning
I see a lot of image models (ResNet and other networks pretrained on ImageNet, etc.) being used for transfer learning in the audio classification domain. I only see one audio-specific model that many people use: YAMNet.
I would think taking a network trained on a specific visual domain and repurposing its classifier head to solve an audio problem using cochleagrams or spectrograms would be inappropriate, given that the edges and shapes found in, say, a flower mean nothing when comparing patterns across spectral visual representations of audio.
I would also think taking ResNet and training the entire model (all parameters in the convolutional base AND the classifier head) would simply be starting from a nonsensical point in terms of saved weights, and that you may be better off starting from scratch.
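To make the two regimes I mean concrete, here's a minimal sketch in Keras (ResNet50, the input shape, and the class count are just placeholders for whatever pretrained CNN and task you'd actually use):

```python
import tensorflow as tf

# Pretrained convolutional base with the ImageNet classifier head dropped.
base = tf.keras.applications.ResNet50(
    include_top=False,
    weights="imagenet",
    input_shape=(224, 224, 3),  # spectrograms resized/tiled to 3 channels
    pooling="avg",
)

# Regime 1: freeze the base, train only a new classifier head.
base.trainable = False
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(10, activation="softmax"),  # placeholder: 10 classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Regime 2: fine-tune everything (base AND head) starting from the ImageNet
# weights -- the case I suspect begins from a nonsensical point:
# base.trainable = True
```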
Am I missing something about transfer learning here? Or am I spot on in thinking it's a bit inappropriate given that the problem domains are different?
My project is to compare different cochlear models (filters such as DRNL, Gammachirp, Gammatone, etc.) in Brian2Hears (a Python library) as inputs to a CNN. I need to identify a good model or set of model architectures to use as my baseline for comparing performance. YAMNet unfortunately takes the raw audio as input and converts it to a log-mel spectrogram internally as part of the model, so it would not be usable in its final form for my experiment.
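For context, here's a minimal sketch of the kind of input I'm generating with Brian2Hears (the filename and parameters are placeholders):

```python
from brian2 import Hz, kHz
from brian2hears import Sound, Gammatone, erbspace

# Load a waveform (filename is a placeholder).
sound = Sound("example.wav")

# 64 ERB-spaced center frequencies across the audible range.
cf = erbspace(20 * Hz, 20 * kHz, 64)

# Gammatone cochleagram: a (samples x channels) array that would replace
# the spectrogram as the 2D input to the CNN. DRNL, Gammachirp, etc. would
# be swapped in the same way.
cochleagram = Gammatone(sound, cf).process()
```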
asdfzzz2 t1_ixjh3ph wrote
> Am I missing something about transfer learning here?
Theoretical answer: both images and spectrograms contain continuous curved lines as signal, so some transfer should happen.
Practical answer: If it works, it works.