Ok_Construction470 t1_ixlephi wrote
Reply to [D] Transfer Learning of Image Trained Network in Audio Domain by Oceanboi
Note that spectrograms are NOT images, though: their element values can be negative (for example, in a log-scaled dB spectrogram), whereas image pixel values can't.
Having said that, I work in the audio domain and have applied a computer vision transformer, the Shifted-window (Swin) Transformer, to audio, specifically to spectrograms extracted from the raw waveform.
This was the original paper: https://arxiv.org/abs/2202.00874
They also used the pretrained model.
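To make both points concrete, here is a minimal NumPy sketch. It is an illustration only, not the method from the linked paper. It computes a log-power spectrogram in dB, whose values go negative wherever the power is below 1. It then maps that spectrogram into the kind of input an ImageNet-pretrained vision backbone expects: min-max scaling to [0, 1], resizing, replicating across three channels, and applying ImageNet mean/std normalization. The FFT size, hop length, 224×224 input size, test tone, and helper names are all assumptions chosen for illustration.

```python
import numpy as np

# ImageNet per-channel statistics commonly used by torchvision/timm vision backbones.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
IMAGENET_STD = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)


def log_power_spectrogram(signal, n_fft=512, hop=128, eps=1e-10):
    """Return a (freq, time) log-power spectrogram in dB, i.e. 10*log10(|STFT|^2)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the waveform into overlapping, windowed frames: shape (time, n_fft).
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (time, n_fft // 2 + 1)
    # Any bin with power < 1 becomes negative in dB; eps avoids log10(0).
    return 10.0 * np.log10(power.T + eps)


def spectrogram_to_image_input(spec_db, size=224):
    """Hypothetical helper: map a dB spectrogram to a (3, size, size) ImageNet-style input."""
    # 1) Min-max scale into [0, 1], the range image pixels are assumed to live in.
    lo, hi = spec_db.min(), spec_db.max()
    scaled = (spec_db - lo) / (hi - lo + 1e-8)
    # 2) Nearest-neighbour resize to size x size (a real pipeline might interpolate instead).
    rows = np.linspace(0, scaled.shape[0] - 1, size).round().astype(int)
    cols = np.linspace(0, scaled.shape[1] - 1, size).round().astype(int)
    resized = scaled[np.ix_(rows, cols)]
    # 3) Replicate the single channel to the three RGB channels the backbone expects.
    rgb = np.repeat(resized[np.newaxis], 3, axis=0)
    # 4) Normalize with ImageNet statistics.
    return ((rgb - IMAGENET_MEAN) / IMAGENET_STD).astype(np.float32)


sr = 16000
t = np.arange(sr) / sr
tone = 0.1 * np.sin(2 * np.pi * 440 * t)  # 1 s, 440 Hz test tone

spec = log_power_spectrogram(tone)
print(spec.shape, spec.min() < 0)  # (257, 122) True
x = spectrogram_to_image_input(spec)
print(x.shape, x.dtype)  # (3, 224, 224) float32
```

The min-max step discards the absolute dB level. That is one practical cost of treating a spectrogram as an image.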