wjldw12138 t1_j9ni2gq wrote
Reply to [D] Simple Questions Thread by AutoModerator
Hi everyone, I am looking for something like CLIP for speech: a model that can measure the distance between text and speech, where the speech is given as a Mel spectrogram.
I found speech-CLIP, but unfortunately its speech input is a raw waveform rather than a Mel spectrogram (the same is true of HuBERT). I would really appreciate any pointers!