Oceanboi OP t1_ixpeeaf wrote on November 25, 2022 at 6:36 AM

Reply to comment by NLP_doofus in [D] Transfer Learning of Image Trained Network in Audio Domain by Oceanboi

The problem is that I am testing different data representations of audio, so the preprocessing is exactly what I want to experiment with.

NLP_doofus t1_ixpms9z wrote on November 25, 2022 at 8:31 AM

Ah, I missed that point, sorry. So you want to start from scratch for all models? Otherwise it seems like you'd be introducing confounding variables (e.g., the size of the pretraining dataset). I've worked with some of the models I mentioned, and I think that if you're just changing input/output shapes, it shouldn't matter when starting from scratch. The exception would be if your data representations lose something fundamental about the time axis in speech, because these models use autoregressive or masked-modeling approaches for representation learning.

Oceanboi OP t1_ixtiolt wrote on November 26, 2022 at 5:44 AM

Not from scratch for all. I simply want to take a base model, or a set of base model architectures, and compare how different audio representations (cochleagrams and other cochlear-model representations) perform in terms of accuracy/model performance. That’s what got me to look into transfer learning, and hence the question! I need some constant set of models to use for my comparisons.
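
To make the "constant set of models" idea concrete, here is a minimal sketch of an experiment grid in which only the audio representation varies, while the architecture, the initialization, and the random seed are held fixed within each comparison. All names here (the model list, the representation list other than the cochleagram, the `Run` class, and both helper functions) are hypothetical placeholders, not anything specified in this thread, and no training is performed.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical placeholder names, chosen only for illustration.
BASE_MODELS = ["resnet18", "vgg16"]
REPRESENTATIONS = ["mel_spectrogram", "cochleagram"]
INITS = ["imagenet_pretrained"]  # keep initialization identical across representations
SEEDS = [0, 1, 2]


@dataclass(frozen=True)
class Run:
    model: str
    representation: str
    init: str
    seed: int


def build_grid(models, representations, inits, seeds):
    """Return one Run per combination, so every representation is
    paired with exactly the same models, initializations, and seeds."""
    return [
        Run(m, r, i, s)
        for m, r, i, s in product(models, representations, inits, seeds)
    ]


def comparison_groups(runs):
    """Group runs that differ only in their audio representation."""
    groups = {}
    for run in runs:
        key = (run.model, run.init, run.seed)
        groups.setdefault(key, []).append(run.representation)
    return groups


grid = build_grid(BASE_MODELS, REPRESENTATIONS, INITS, SEEDS)
groups = comparison_groups(grid)
```

Each key in `groups` identifies one controlled comparison: same model, same initialization, same seed, with the representation as the only difference. Adding a `"from_scratch"` entry to `INITS` would extend the grid to cover the from-scratch case raised above without mixing it into the pretrained comparisons.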