shadow_fax1024 t1_j1pxmnn wrote

I used a plain CNN, with and without attention. I had to handle long audio files in training as well as inference.

Helveticus99 OP t1_j1qjdxl wrote

Thank you u/shadow_fax1024. How did you handle audio files of different lengths? And how exactly did you handle the long audio files? I think creating Mel-spectrograms over long audio files won't work.

shadow_fax1024 t1_j1scqqd wrote

You could split the file into chunks of n seconds. You need to find the n that fits your dataset; for mine, 4-second chunks were good enough. You could also run a peak detector first and then cut a chunk n/2 seconds either side of the peak, with some overlapping windows there too, so that you won't lose information.
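
A rough sketch of the two ideas above, assuming the audio is already loaded as a 1-D NumPy array. The function names, the 50% overlap default, zero-padding the last chunk, and using the largest absolute amplitude as the "peak detector" are all illustrative assumptions, not necessarily what was used in the original setup.

```python
import numpy as np


def split_into_chunks(signal, sample_rate, chunk_seconds=4.0, overlap=0.5):
    """Split a 1-D signal into fixed-length overlapping chunks.

    The last chunk is zero-padded so every chunk has the same length
    (an assumption; dropping the remainder is another option).
    """
    chunk_len = int(round(chunk_seconds * sample_rate))
    hop = max(1, int(round(chunk_len * (1.0 - overlap))))
    chunks = []
    start = 0
    while True:
        chunk = signal[start:start + chunk_len]
        if len(chunk) < chunk_len:
            chunk = np.pad(chunk, (0, chunk_len - len(chunk)))
        chunks.append(chunk)
        if start + chunk_len >= len(signal):
            break
        start += hop
    return np.stack(chunks)


def chunk_around_peak(signal, sample_rate, chunk_seconds=4.0):
    """Cut one chunk of n seconds centred (n/2 either side) on the peak.

    The "peak" here is simply the sample with the largest absolute
    amplitude, which is a stand-in for a real peak detector.
    """
    chunk_len = int(round(chunk_seconds * sample_rate))
    peak = int(np.argmax(np.abs(signal)))
    # Keep the window inside the signal when the peak is near an edge.
    start = max(0, min(peak - chunk_len // 2, len(signal) - chunk_len))
    chunk = signal[start:start + chunk_len]
    if len(chunk) < chunk_len:
        chunk = np.pad(chunk, (0, chunk_len - len(chunk)))
    return chunk
```

Because every chunk has the same length, files of different durations all end up as fixed-size inputs, and a spectrogram can be computed per chunk rather than over the whole file.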
