Submitted by PlayfulMenu1395 t3_10qfbwg in MachineLearning
Can someone suggest a machine learning model that will segment an audio spectrogram into multiple classes? I have labeled heart-sound data: S1, S2, systole, and diastole. How do I train a segmentation model?
Eresbonitaguey t1_j6qic3n wrote
Possibly not the ideal solution, but I would suggest taking sections of the spectrogram as images (perhaps with overlap) and feeding them into a multi-label classifier. If you're after a bounding box, the upper and lower bounds should be apparent from the location of your classes within the spectrogram, i.e. each sound's intensity occurs in a similar frequency range.

If transfer learning from a general image model, I would advise against using false colour to generate the three channels; instead, generate different types of spectrograms (reassignment method, multi-tapered, etc.) and use one per channel.

Due to the nature of spectrograms you don't really want scale invariance, so segmentation models that use feature pyramids can be problematic. I found decent success using Compact Convolutional Transformers, but that may not be what you need for your task.
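To make the windowing idea concrete, here's a minimal sketch of turning a recording into overlapping spectrogram sections suitable for a classifier. All the specifics are assumptions for illustration: the synthetic signal stands in for a real heart-sound recording, and the window width and hop are placeholder values you'd tune for your S1/S2/systole/diastole labels.

```python
import numpy as np
from scipy.signal import spectrogram

# Synthetic stand-in for a heart-sound recording (assumed 1 kHz sample rate).
fs = 1000
rng = np.random.default_rng(0)
audio = rng.standard_normal(10 * fs).astype(np.float32)

# Log-magnitude spectrogram: this is the "image" the comment refers to.
f, t, S = spectrogram(audio, fs=fs, nperseg=128, noverlap=96)
log_S = np.log1p(S)

# Slice the spectrogram into overlapping windows along the time axis;
# each window becomes one training example for a multi-label classifier.
win, hop = 32, 16  # window width / stride in spectrogram frames (assumed)
windows = [log_S[:, i:i + win]
           for i in range(0, log_S.shape[1] - win + 1, hop)]
X = np.stack(windows)  # shape: (n_windows, n_freq_bins, win)
print(X.shape)
```

Each window in `X` would then be paired with the labels of whatever cardiac phases overlap that time span. For the multi-channel variant the comment suggests, you would stack differently computed spectrograms of the same window along a channel axis instead of tiling one spectrogram into false colour.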