Ok_Construction470

Ok_Construction470 t1_ixlephi wrote

Note that spectrograms are NOT images, though: their element values can be negative (e.g. log-scaled magnitudes in dB), whereas image pixel values can't.
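To make that concrete, here's a minimal sketch that builds a log-magnitude spectrogram with plain NumPy and checks its value range. The function name `log_spectrogram`, the FFT size, the hop length, and the synthetic 440 Hz tone are all my own illustrative choices, not anything from a specific paper or library.

```python
import numpy as np

def log_spectrogram(waveform, n_fft=256, hop=128, eps=1e-10):
    """Log-magnitude spectrogram in dB using a simple framed FFT (hypothetical helper)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(waveform) - n_fft) // hop
    # Slice the waveform into overlapping, windowed frames
    frames = np.stack([
        waveform[i * hop : i * hop + n_fft] * window
        for i in range(n_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (time, freq)
    # eps avoids log(0); the transpose gives the usual (freq, time) layout
    return 20.0 * np.log10(magnitude + eps).T

# Assumed test signal: 1 second of a quiet 440 Hz sine at 16 kHz
sr = 16000
t = np.arange(sr) / sr
waveform = 0.1 * np.sin(2 * np.pi * 440.0 * t)

spec_db = log_spectrogram(waveform)
print(spec_db.shape, spec_db.min(), spec_db.max())
```

Bins far from 440 Hz carry almost no energy, so their dB values come out well below zero, which is something an 8-bit image can't represent without rescaling.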

Having said that, I work in the audio domain and have applied a computer vision transformer, the shifted-window (Swin) Transformer, to audio, specifically to spectrograms extracted from the raw waveform.
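For anyone unfamiliar with the "shifted window" part, here's a rough NumPy sketch of the idea on a toy (freq, time) grid. It assumes an 8x8 stand-in array and a window size of 4, both picked only for illustration. It isn't the paper's code, and it leaves out the patch embedding, the attention itself, and the masking that a real Swin block applies after the shift.

```python
import numpy as np

def window_partition(x, window_size):
    """Split an (H, W) map into non-overlapping square windows (illustrative helper)."""
    h, w = x.shape
    x = x.reshape(h // window_size, window_size, w // window_size, window_size)
    # Bring the two window-index axes together, then flatten them
    return x.transpose(0, 2, 1, 3).reshape(-1, window_size, window_size)

# Toy stand-in for a spectrogram: 8 frequency bins x 8 time frames
spec = np.arange(64, dtype=np.float32).reshape(8, 8)
window_size = 4

# Regular windows: attention would be computed inside each one
windows = window_partition(spec, window_size)

# Shifted windows: cyclically roll by half a window, then partition again,
# so the new windows straddle the old window boundaries
shift = window_size // 2
shifted = np.roll(spec, shift=(-shift, -shift), axis=(0, 1))
shifted_windows = window_partition(shifted, window_size)

print(windows.shape, shifted_windows.shape)
```

Alternating the two partitions across layers is what lets information flow between neighbouring windows; on a spectrogram the two axes are frequency and time rather than image height and width.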

This was the OG paper https://arxiv.org/abs/2202.00874

They started from the pretrained Swin model too.
