shadow_fax1024 t1_j1pxmnn wrote on December 26, 2022 at 1:29 PM

I used a plain CNN, with and without attention. I had to handle long audio files in training as well as in inference.

Helveticus99 OP t1_j1qjdxl wrote on December 26, 2022 at 4:34 PM

Thank you u/shadow_fax1024. How did you handle audio files of different lengths? And how exactly did you handle the long audio files? I think creating a Mel spectrogram over a long audio file won't work.

shadow_fax1024 t1_j1scqqd wrote on December 27, 2022 at 12:43 AM

You could split the file into chunks of n seconds. You need to find the n that fits your dataset; for mine, 4-second chunks were good enough. You could also run a peak detector first and then cut the chunk n/2 seconds either side of the peak, with some overlap between windows so that you don't lose information.
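
A rough sketch of both ideas in NumPy, in case it helps. The function names, the 1-second overlap, and the zero-padding of short chunks are assumptions for illustration, and the "peak detector" here is just the sample with the largest absolute amplitude; a real detector would likely be smarter.

```python
import numpy as np

def split_into_chunks(signal, sr, chunk_sec=4.0, overlap_sec=1.0):
    """Split a 1-D signal into fixed-length overlapping chunks.

    The last chunk is zero-padded so every chunk has the same length.
    """
    chunk_len = int(chunk_sec * sr)
    hop = chunk_len - int(overlap_sec * sr)
    if chunk_len <= 0 or hop <= 0:
        raise ValueError("chunk_sec must be positive and larger than overlap_sec")

    chunks = []
    for start in range(0, len(signal), hop):
        chunk = signal[start:start + chunk_len]
        if len(chunk) < chunk_len:
            # Assumption: pad the tail with zeros rather than dropping it
            chunk = np.pad(chunk, (0, chunk_len - len(chunk)))
        chunks.append(chunk)
        if start + chunk_len >= len(signal):
            break  # the whole signal is covered

    if not chunks:
        return np.empty((0, chunk_len))
    return np.stack(chunks)

def chunk_around_peak(signal, sr, chunk_sec=4.0):
    """Cut one chunk of chunk_sec seconds centred on the loudest sample.

    Assumption: the peak is simply argmax(|signal|).
    """
    chunk_len = int(chunk_sec * sr)
    half = chunk_len // 2
    peak = int(np.argmax(np.abs(signal)))
    # Zero-pad both ends so the window never runs off the signal
    padded = np.pad(signal, (half, chunk_len - half))
    return padded[peak:peak + chunk_len]
```

Each row returned by `split_into_chunks` can then be turned into its own Mel spectrogram, so files of any length end up as a batch of equally sized inputs.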