Submitted by CeFurkan t3_10r9biu in MachineLearning

Currently I am using the free edition of DaVinci Resolve to manually cut out the parts with no speech, or the parts where I take a breath

It is extremely time-consuming

I am pretty sure this can be done via AI

For example, Whisper is able to detect where we use filler words such as umh, um, uh, etc.

It would be awesome to automatically remove these parts from a video

Just point me to where to look. Thank you


Comments


Agreeable_Dog6536 t1_j6uuq7r wrote

He's asking the opposite - remove the bits with no speech.

I used to do more or less this same thing manually, years ago, for a corporate vlog in which people drove around all day fixing pipe leaks and occasionally commented on what they'd done - they wanted the clips where they commented, edited together.

I basically just looked at the audio waveform and figured out where I should probably cut, and then listened to it to narrow it down.

If someone hasn't already trained an AI for this, they should.


Ok_Dependent1131 t1_j6uv7o5 wrote

The company that makes Snagit has software that does it... but it's not free...


txhwind t1_j6v41wn wrote

Try a speech recognition model that outputs word-level timestamps (alignment), then cut the parts that are not aligned to any word, or that are aligned to filler words.
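A rough sketch of the cutting logic, in case it helps. It assumes you already have word-level timestamps as `(word, start, end)` tuples in seconds, for example from a recognizer that can emit word timestamps. The `FILLERS` set, the `keep_segments` name, and the `max_gap` default are all made up for illustration and would need tuning on real recordings.

```python
import re

# Hypothetical filler list; extend it for your own speech habits.
FILLERS = {"um", "uh", "umh", "umm", "uhm", "er", "erm"}


def keep_segments(words, max_gap=0.3):
    """Return (start, end) intervals, in seconds, worth keeping.

    words: list of (text, start, end) tuples sorted by start time.
    Filler words are dropped, and a pause longer than max_gap between
    two kept words starts a new segment, so the pause is cut out too.
    """
    segments = []
    can_merge = False  # False right after a filler, so the filler is never bridged
    for text, start, end in words:
        # Normalise "Um," -> "um" before comparing against the filler list.
        token = re.sub(r"[^a-z']", "", text.lower())
        if token in FILLERS:
            can_merge = False
            continue
        if can_merge and start - segments[-1][1] <= max_gap:
            # Close enough to the previous kept word: extend that segment.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
        can_merge = True
    return segments


if __name__ == "__main__":
    demo = [("Hello", 0.0, 0.4), ("um,", 0.5, 0.8),
            ("world", 0.9, 1.3), ("again", 3.0, 3.5)]
    print(keep_segments(demo))
```

The intervals it returns could then be handed to whatever tool does the actual cutting. Word timestamps from recognizers are often a little off, so adding a small padding around each kept segment is probably worth trying.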


doctorjuice t1_j6vdwpb wrote

There are ways to robustly remove all silent gaps and breaths, but removing filler words is less robust. Would you still find that useful?
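For the silence part, one simple approach (not necessarily the one meant here) is an energy threshold over short frames. The sketch below assumes mono audio already loaded as a list of floats in the range -1 to 1. The function name and the `frame_ms`, `threshold`, and `min_silence_s` defaults are illustrative guesses, and a plain energy threshold will not reliably catch breaths.

```python
import math


def non_silent_intervals(samples, sample_rate, frame_ms=20,
                         threshold=0.02, min_silence_s=0.3):
    """Return (start, end) intervals, in seconds, whose RMS energy is above threshold.

    Silences shorter than min_silence_s are bridged so that normal
    pauses between words are not chopped out.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    intervals = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms < threshold:
            continue  # quiet frame: candidate for removal
        start = i / sample_rate
        end = min(i + frame_len, len(samples)) / sample_rate
        if intervals and start - intervals[-1][1] < min_silence_s:
            # Gap since the last loud frame is short: extend that interval.
            intervals[-1] = (intervals[-1][0], end)
        else:
            intervals.append((start, end))
    return intervals


if __name__ == "__main__":
    # One second of "speech", one second of silence, one second of "speech".
    audio = [0.5] * 1000 + [0.0] * 1000 + [0.5] * 1000
    print(non_silent_intervals(audio, sample_rate=1000))
```

Everything outside the returned intervals is what would get cut. On real recordings the threshold depends heavily on microphone noise, so it would need adjusting per setup.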
