Submitted by Batuhan_Y t3_y2cpjc in MachineLearning

All you have to do is input a YouTube video link, and you get back a subtitled video (along with .txt, .vtt, and .srt files).

Whisper can translate 98 different languages to English. If you want to give it a try:

Link of the app: https://huggingface.co/spaces/BatuhanYilmaz/Auto-Subtitled-Video-Generator
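For anyone curious what a pipeline like this looks like under the hood, here is a minimal sketch, assuming pytube and openai-whisper are installed; the function names and the .srt writer are illustrative, not the Space's exact code:

```python
# Minimal sketch of the YouTube -> subtitles pipeline; assumes pytube and
# openai-whisper are installed. Names here are illustrative.
import whisper
from pytube import YouTube

def transcribe_youtube(url: str, model_name: str = "small") -> dict:
    # Download the audio-only stream; Whisper accepts the file path directly.
    stream = YouTube(url).streams.filter(only_audio=True).first()
    audio_path = stream.download(filename="audio.mp4")
    model = whisper.load_model(model_name)
    # Pass task="translate" instead to get English output for other languages.
    return model.transcribe(audio_path)

def to_srt(result: dict) -> str:
    # Convert Whisper's segment timestamps into the .srt time format.
    def fmt(t: float) -> str:
        h, rem = divmod(int(t * 1000), 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1_000)
        return f"{h:02}:{m:02}:{s:02},{ms:03}"

    blocks = []
    for i, seg in enumerate(result["segments"], start=1):
        blocks.append(f"{i}\n{fmt(seg['start'])} --> {fmt(seg['end'])}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```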


https://reddit.com/link/y2cpjc/video/oiac58arcft91/player

80

Comments


masterspeler t1_is3h020 wrote

This would be really useful as a Reddit bot to translate Reddit videos, like the Ukrainian ones. The bot could probably rehost a subtitled video on its own subreddit and leave a link in the comments to the original, along with the transcribed text.

11

Batuhan_Y OP t1_is3hm53 wrote

That's a nice idea. I've seen a Twitter bot doing that. I don't know how to create a Reddit or Twitter bot at the moment, but I have free time, so I can work on it.

If anyone wants to collaborate, I'd be more than happy.

8

Sea_Wonder_6414 t1_is4b9gd wrote

Where does Streamlit come into Hugging Face Spaces? How do you integrate the two?

2

Batuhan_Y OP t1_is4oggi wrote

You need to build the interface and select the Streamlit option when creating a Hugging Face Space. It takes care of the rest.
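In practice, a Space created with the Streamlit SDK just runs your app.py. A minimal sketch of what that file could look like; the widgets here are illustrative, not this app's actual code:

```python
# app.py — a minimal Streamlit interface; Hugging Face runs this file when
# the Space is created with the Streamlit SDK. Widgets are illustrative.
import streamlit as st

st.title("Auto-Subtitled Video Generator")
link = st.text_input("YouTube link")
task = st.selectbox("Task", ["Transcribe", "Translate"])

if st.button("Run"):
    # The real app would download the video and run Whisper here.
    st.write(f"Would run Whisper ({task}) on {link}")
```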

3

Brief-Mongoose-6256 t1_isdr7pw wrote

Hello,
Can I use this on a non-YouTube link (for example, a video uploaded on my own web server)?

2

Batuhan_Y OP t1_ise9id4 wrote

Yes, you can, but you'll need to edit the code to do that.
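For a directly hosted file, the edit could be as small as swapping pytube's download for a plain HTTP fetch. A hedged sketch; the function name is mine, not from the app:

```python
# Hedged sketch: fetch a directly linked video file instead of using pytube.
import requests

def download_direct(url: str, out_path: str = "video.mp4") -> str:
    resp = requests.get(url, stream=True, timeout=60)
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        for chunk in resp.iter_content(chunk_size=1 << 20):  # 1 MiB chunks
            f.write(chunk)
    return out_path  # hand this path to Whisper's transcribe()
```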

1

Brief-Mongoose-6256 t1_isefnm3 wrote

Thanks. It would be great if the app had an option to add other links.

1

Batuhan_Y OP t1_islo00q wrote

Download support varies from site to site, so I'm thinking of adding a file uploader instead.
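Streamlit ships a widget for exactly this. A minimal sketch of the uploader idea; the temp-file handling is an assumption, not the app's code:

```python
# Hedged sketch of a file uploader; Whisper needs a file path, so the
# upload is persisted to a temporary file first.
import tempfile
import streamlit as st

uploaded = st.file_uploader("Upload a video", type=["mp4", "mkv", "webm"])
if uploaded is not None:
    with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as tmp:
        tmp.write(uploaded.read())
        video_path = tmp.name  # pass this to the transcription step
    st.write(f"Saved upload to {video_path}")
```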

1

cheecheepong t1_is32h30 wrote

Running into this issue:

https://imgur.com/a/sbhX9Xg

I tried running this video on it: https://www.youtube.com/watch?v=Mwt35SEeR9w

1

Batuhan_Y OP t1_is389wu wrote

The video is age-restricted; I assume pytube can't reach it, so there is no input to the model.

Can you try a different video?
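A hedged sketch of guarding the download step so a restricted or unreachable video yields a readable message instead of an empty model input; pytube's PytubeError is the base class its download errors derive from, and the function name is illustrative:

```python
# Hedged sketch: catch pytube failures (age-restricted, unavailable, etc.)
from typing import Optional

from pytube import YouTube
from pytube.exceptions import PytubeError

def fetch_audio(url: str) -> Optional[str]:
    try:
        stream = YouTube(url).streams.filter(only_audio=True).first()
        return stream.download(filename="audio.mp4")
    except PytubeError as err:
        print(f"Could not fetch {url}: {err}")
        return None
```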

3

cheecheepong t1_is38h3m wrote

Great news! Different error this time:

https://www.youtube.com/watch?v=PlUvLBRwLbw

RuntimeError: The size of tensor a (316) must match the size of tensor b (3) at non-singleton dimension 3

Traceback:
  File "/home/user/.local/lib/python3.8/site-packages/streamlit/scriptrunner/script_runner.py", line 554, in _run_script
    exec(code, module.__dict__)
  File "/home/user/app/app.py", line 258, in <module>
    main()
  File "/home/user/app/app.py", line 138, in main
    results = inference(link, loaded_model, task)
  File "/home/user/.local/lib/python3.8/site-packages/streamlit/legacy_caching/caching.py", line 573, in wrapped_func
    return get_or_create_cached_value()
  File "/home/user/.local/lib/python3.8/site-packages/streamlit/legacy_caching/caching.py", line 557, in get_or_create_cached_value
    return_value = func(*args, **kwargs)
  File "/home/user/app/app.py", line 81, in inference
    results = loaded_model.transcribe(path, **options)
  File "/home/user/.local/lib/python3.8/site-packages/whisper/transcribe.py", line 181, in transcribe
    result: DecodingResult = decode_with_fallback(segment)
  File "/home/user/.local/lib/python3.8/site-packages/whisper/transcribe.py", line 117, in decode_with_fallback
    decode_result = model.decode(segment, options)
  File "/home/user/.local/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/.local/lib/python3.8/site-packages/whisper/decoding.py", line 701, in decode
    result = DecodingTask(model, options).run(mel)
  File "/home/user/.local/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/.local/lib/python3.8/site-packages/whisper/decoding.py", line 633, in run
    tokens, sum_logprobs, no_speech_probs = self._main_loop(audio_features, tokens)
  File "/home/user/.local/lib/python3.8/site-packages/whisper/decoding.py", line 588, in _main_loop
    logits = self.inference.logits(tokens, audio_features)
  File "/home/user/.local/lib/python3.8/site-packages/whisper/decoding.py", line 145, in logits
    return self.model.decoder(tokens, audio_features, kv_cache=self.kv_cache)
  File "/home/user/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/.local/lib/python3.8/site-packages/whisper/model.py", line 189, in forward
    x = block(x, xa, mask=self.mask, kv_cache=kv_cache)
  File "/home/user/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/.local/lib/python3.8/site-packages/whisper/model.py", line 124, in forward
    x = x + self.attn(self.attn_ln(x), mask=mask, kv_cache=kv_cache)
  File "/home/user/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/user/.local/lib/python3.8/site-packages/whisper/model.py", line 85, in forward
    wv = self.qkv_attention(q, k, v, mask)
  File "/home/user/.local/lib/python3.8/site-packages/whisper/model.py", line 97, in qkv_attention
    qk = qk + mask[:n_ctx, :n_ctx]

2

Batuhan_Y OP t1_is39n5l wrote

Ran into the same error just now. Changing the model size somehow worked; I'm not sure what the problem is.

I tried it on localhost many times, so I think it might be related to Hugging Face Spaces.

4

cheecheepong t1_is3cxb4 wrote

>Ran into the same error just now. Changing the model size somehow worked; I'm not sure what the problem is. I tried it on localhost many times, so I think it might be related to Hugging Face Spaces.

Interesting. What model size did you end up using? Did it eventually work on Hugging Face Spaces?

1

Batuhan_Y OP t1_is3d8cc wrote

Yes, it worked on Spaces. I've tried the tiny, base, and small models, with videos 3 to 5 minutes long.

2

cheecheepong t1_is3dq0j wrote

Definitely needs more development for robustness but otherwise a great start! I just got it working on a 1:30 video.

1

Batuhan_Y OP t1_is3dxsv wrote

You're right. It also needs more processing power. I'm still working on the app, trying to improve it.

Thank you for your interest.

1

inglandation t1_is5ym5t wrote

Did you manage to solve this error? I'm getting it too. My video is unlisted and is 10:41 long.

I tried switching to the large model, but it takes forever; it's still running.

1

Batuhan_Y OP t1_is60gfd wrote

Hugging Face gives Spaces 16 GB of RAM and 8 CPU cores; when the app exceeds 16 GB, it crashes. To avoid that, I'd need to deploy this model to a cloud provider with multiple GPUs, and I can't afford that kind of heavy processing at the moment. But I rebuilt the hosted Space, so it works until it exceeds 16 GB again :D

The small model works fine most of the time; you can try using it.

I put it up on Hugging Face Spaces for demonstration purposes.

1

inglandation t1_is61d3u wrote

Okay, thanks! I'd run it locally, but it looks like it would be a bit much for my computer.

1

Batuhan_Y OP t1_is627j7 wrote

You can try it on HF Spaces; it's up and running now. If you hit an error, keep clicking the Transcribe/Translate button (that worked for me on HF Spaces :D, no errors on localhost).

1

inglandation t1_is68ong wrote

Okay, thanks! Very useful app, btw. It'd be nice if I could somehow replace the auto-generated YouTube subtitles with these. They're much better.

1

Batuhan_Y OP t1_is6guu8 wrote

Do you mean it would be nice to edit the transcript and regenerate the video with it? If so, I actually tried to implement that two days ago but couldn't make it work. I'll keep working on it.

1

Nisekoi_ t1_iu1jyvm wrote

I already have a transcript. Can I just sync it to the video without creating a transcript from scratch?

1

Batuhan_Y OP t1_iu3ysed wrote

No, but I could add a page where you upload a video and a transcript as an .srt file and generate a subtitled video.
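A hedged sketch of how that page could do the burning-in step, shelling out to ffmpeg's subtitles filter; it assumes ffmpeg is installed, and the function name is illustrative:

```python
# Hedged sketch: burn an existing .srt onto a video with ffmpeg.
import subprocess

def burn_subtitles(video_path: str, srt_path: str, out_path: str = "subtitled.mp4") -> None:
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vf", f"subtitles={srt_path}", out_path],
        check=True,  # raise CalledProcessError if ffmpeg fails
    )
```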

1

Nisekoi_ t1_iu8agjx wrote

Can Whisper do that, i.e., take our own transcription and sync it to the audio as .srt or .vtt?

1