bklawa

Some ideas:

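  • Downsample the audio to a lower sample rate (e.g. if it is 48 kHz, try 8 kHz). How far you can go really depends on the task (music, speech, other general audio recordings...): 8 kHz keeps frequencies up to 4 kHz, which is usually enough for speech but discards a lot of musical content.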

  • You don't need to feed the whole 30-min spectrogram to the model for classification. An alternative is to reduce the time axis, for example by taking the mean or max over it; you end up with a very small vector (one value per frequency bin). If that loses too much information, you can instead pool over 1-minute segments. Either way, this will definitely help reduce the model size.

  • Before applying the steps above, you can also cut out the portions of the audio track that are "silent", i.e. below a certain energy threshold.
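A minimal sketch of the downsampling step, assuming the audio is a mono NumPy float array and the rate drops by an integer factor (48 kHz to 8 kHz is a factor of 6). The function name `downsample` and the 101-tap Hamming-windowed sinc low-pass filter are illustrative choices, not a specific library's method. The filter removes content above the new Nyquist frequency before every 6th sample is kept, so that content doesn't alias into the lower band. In practice a library resampler such as `scipy.signal.resample_poly` does the same job.

```python
import numpy as np

def downsample(signal: np.ndarray, orig_sr: int, target_sr: int, num_taps: int = 101) -> np.ndarray:
    """Integer-factor downsampling: low-pass filter, then keep every k-th sample.
    Hypothetical helper for illustration."""
    if orig_sr % target_sr != 0:
        raise ValueError("this sketch only handles integer decimation factors")
    factor = orig_sr // target_sr
    # Cutoff at the new Nyquist frequency (target_sr / 2), in cycles per original sample.
    cutoff = 0.5 / factor
    n = np.arange(num_taps) - (num_taps - 1) / 2
    # Ideal low-pass impulse response (sinc), tapered by a Hamming window.
    taps = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(num_taps)
    taps /= taps.sum()  # unity gain at DC
    filtered = np.convolve(signal, taps, mode="same")
    return filtered[::factor]
```

For one second of 48 kHz audio this returns 8000 samples; a 100 Hz tone passes through unchanged, while a 10 kHz tone (above the new 4 kHz Nyquist limit) is strongly attenuated instead of folding back as an alias.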
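The time-axis pooling idea can be sketched with NumPy, assuming the spectrogram is already computed as a `(freq_bins, frames)` array. `pool_time` and `pool_segments` are hypothetical helper names. How many frames make up one minute depends on your hop size: with a 10 ms hop there are 100 frames per second, so 6000 frames per minute and about 180000 frames for 30 minutes.

```python
import numpy as np

def pool_time(spec: np.ndarray, mode: str = "mean") -> np.ndarray:
    """Collapse a (freq_bins, frames) spectrogram into a (freq_bins,) vector."""
    if mode == "mean":
        return spec.mean(axis=1)
    if mode == "max":
        return spec.max(axis=1)
    raise ValueError(f"unknown mode: {mode}")

def pool_segments(spec: np.ndarray, frames_per_segment: int, mode: str = "mean") -> np.ndarray:
    """Pool each fixed-length chunk of frames separately -> (freq_bins, n_segments).
    Trailing frames that don't fill a whole segment are dropped in this sketch."""
    n_bins, n_frames = spec.shape
    n_segments = n_frames // frames_per_segment
    trimmed = spec[:, : n_segments * frames_per_segment]
    # Group consecutive frames: (freq_bins, n_segments, frames_per_segment).
    chunks = trimmed.reshape(n_bins, n_segments, frames_per_segment)
    if mode == "mean":
        return chunks.mean(axis=2)
    if mode == "max":
        return chunks.max(axis=2)
    raise ValueError(f"unknown mode: {mode}")
```

With a 128-bin spectrogram of 30 min at that hop, `pool_time` gives 128 values and `pool_segments(spec, 6000)` gives a 128 x 30 matrix, instead of 128 x 180000 as model input.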
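A simple energy-based silence filter, as a sketch: it assumes a float signal scaled to [-1, 1], splits it into non-overlapping frames, and drops frames whose RMS level is below a threshold in dBFS. The function name `remove_silence`, the 1024-sample frame length and the -40 dBFS default are illustrative assumptions you would tune for your recordings.

```python
import numpy as np

def remove_silence(signal: np.ndarray, frame_len: int = 1024, threshold_db: float = -40.0) -> np.ndarray:
    """Drop non-overlapping frames whose RMS level (dB relative to full scale 1.0)
    is at or below threshold_db, then concatenate the remaining frames in order.
    A trailing partial frame is discarded in this sketch."""
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        return signal[:0]
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    rms_db = 20 * np.log10(rms + 1e-12)  # small epsilon avoids log10(0) on digital silence
    keep = rms_db > threshold_db
    return frames[keep].reshape(-1)
```

Running this before downsampling and pooling means the pooled statistics describe only the parts of the track that actually contain sound.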

Hope this helps