Submitted by PlayfulMenu1395 t3_10qfbwg in MachineLearning
Can someone suggest a machine learning model that will segment an audio spectrogram into multiple classes? I have labeled heart-sound data: S1, S2, systole, and diastole. How do I train a segmentation model?
Eresbonitaguey t1_j6qic3n wrote
Possibly not the ideal solution, but I would suggest taking sections of the spectrogram as images (perhaps with overlap) and feeding them into a multi-label classifier. If you're after a bounding box, the upper and lower bounds should be apparent from the location of your classes within the spectrogram, i.e. each sound's intensity occurs in a similar frequency range.

If transfer learning from a general image model, I would advise against using false colour to generate the three channels; instead, generate different types of spectrograms (reassignment method, multi-tapered, etc.) and use one per channel.

Due to the nature of spectrograms you don't really want scale invariance, so segmentation models that use feature pyramids can be problematic. I found decent success using Compact Convolutional Transformers, but that may not be what you need for your task.
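To make the windowing idea concrete, here's a minimal sketch of turning a recording into overlapping spectrogram sections suitable for a classifier. All the specifics are assumptions for illustration: the synthetic signal stands in for a real heart-sound recording, and the window width and hop are placeholder values you'd tune for your S1/S2/systole/diastole labels.

```python
import numpy as np
from scipy.signal import spectrogram

# Synthetic stand-in for a heart-sound recording (assumed 1 kHz sample rate).
fs = 1000
rng = np.random.default_rng(0)
audio = rng.standard_normal(10 * fs).astype(np.float32)

# Log-magnitude spectrogram: this is the "image" the comment refers to.
f, t, S = spectrogram(audio, fs=fs, nperseg=128, noverlap=96)
log_S = np.log1p(S)

# Slice the spectrogram into overlapping windows along the time axis;
# each window becomes one training example for a multi-label classifier.
win, hop = 32, 16  # window width / stride in spectrogram frames (assumed)
windows = [log_S[:, i:i + win]
           for i in range(0, log_S.shape[1] - win + 1, hop)]
X = np.stack(windows)  # shape: (n_windows, n_freq_bins, win)
print(X.shape)
```

Each window in `X` would then be paired with the labels of whatever cardiac phases overlap that time span. For the multi-channel variant the comment suggests, you would stack differently computed spectrograms of the same window along a channel axis instead of tiling one spectrogram into false colour.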