Any to English AI Video Subtitle Captioning App with OpenAI Whisper App Full Tutorial

Views: 14,942


This is a multilingual video caption (subtitle) generator that embeds the subtitles in the original video using ASR (speech-to-text). It is a completely free tool and a free, open-source alternative to Descript, built with Python, OpenAI Whisper and Gradio.
This video has 3 main parts:
1. Building a Video Caption Generator with OpenAI Whisper
2. Wrapping the Video Caption Generator as a Web App with Gradio
3. Deploying the Video Subtitle Generator on Hugging Face Spaces for Free
Code - github.com/amr...
Deployed App - huggingface.co...
It can handle multiple languages, including low-resource Indian languages such as Tamil, Hindi, Malayalam and Telugu, and translates them to English.
Automatic Captions with OpenAI Whisper Full Tutorial
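The first part of the pipeline (extract audio, transcribe, embed subtitles) can be sketched as two ffmpeg invocations around the Whisper step. This is a minimal sketch, assuming ffmpeg is the tool used for audio extraction and subtitle burn-in (the description does not say which tool is used); the function names and file names are hypothetical.

```python
from typing import List


def extract_audio_command(video_path: str, audio_path: str) -> List[str]:
    """Build an ffmpeg command that drops the video stream and keeps the audio."""
    # -y overwrites an existing output file, -vn disables video output.
    return ["ffmpeg", "-y", "-i", video_path, "-vn", audio_path]


def burn_subtitles_command(video_path: str, subtitle_path: str, output_path: str) -> List[str]:
    """Build an ffmpeg command that renders a subtitle file onto the video frames."""
    # The subtitles filter needs an ffmpeg build with libass; paths containing
    # special characters (colons, quotes) would need extra escaping.
    return ["ffmpeg", "-y", "-i", video_path, "-vf", f"subtitles={subtitle_path}", output_path]


# Usage (assumes ffmpeg is installed and the input file exists):
#   import subprocess
#   subprocess.run(extract_audio_command("input.mp4", "audio.mp3"), check=True)
#   ... transcribe audio.mp3 with Whisper and save the result as subs.vtt ...
#   subprocess.run(burn_subtitles_command("input.mp4", "subs.vtt", "output.mp4"), check=True)
```

Building the commands as lists rather than shell strings avoids quoting problems when file names contain spaces.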

Comments: 103

@lynic-0091 1 year ago

Oh man, finally I don't have to google for subtitles anymore, thanks a lot!

@1littlecoder 1 year ago

Glad it's helpful!

@shahidhussain4730 1 year ago

For this import --> from whisper.utils import write_vtt, I get: ImportError: cannot import name 'write_vtt' from 'whisper.utils' (/usr/local/lib/python3.8/dist-packages/whisper/utils.py). Please help me, sir.

@swarnaislam5916 6 months ago

Did you solve the problem? I am having the same issue.

@prkssngowthamdora 1 year ago

ImportError: cannot import name 'write_vtt' from 'whisper.utils'. Hi bro, can you help me with this error?

@vaibhavtawhare7029 1 year ago

I get "ValueError: rate must be specified when data is a numpy array or list of audio..." when I run:

from IPython.display import Audio
Audio(audio_file)

How can I eliminate that?

@TheHimanshuRastogi 1 year ago

Your videos are too good, but every single time I tried running your code I got an error. This time it's saying "cannot import name 'write_vtt' from 'whisper.utils'". Now what to do?

@1littlecoder 1 year ago

I think they removed that function. I need to find a way to fix it. Thanks for highlighting.
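One version-independent way around the missing write_vtt import is to build the WebVTT text directly from the segments in the transcription result. This is a minimal sketch, assuming the result of model.transcribe(...) has a "segments" list whose items carry "start", "end" and "text"; the helper names below are hypothetical and are not part of the whisper package.

```python
from typing import Iterable, Mapping


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    milliseconds = round(seconds * 1000)
    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{milliseconds:03d}"


def segments_to_vtt(segments: Iterable[Mapping]) -> str:
    """Build WebVTT text from Whisper-style segments (start, end, text)."""
    lines = ["WEBVTT", ""]
    for segment in segments:
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        # "-->" is the cue timing separator, so it must not appear in cue text.
        text = segment["text"].strip().replace("-->", "->")
        lines.extend([f"{start} --> {end}", text, ""])
    return "\n".join(lines)


# Usage (assumes `result` came from model.transcribe(...)):
#   with open("subtitles.vtt", "w", encoding="utf-8") as vtt_file:
#       vtt_file.write(segments_to_vtt(result["segments"]))
```

Because it only reads the segment dictionaries, this does not depend on which helper functions a given Whisper release ships in whisper.utils.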

@vishnuvallabhyathavakilla2693 1 year ago

ImportError: cannot import name 'write_vtt' from 'whisper.utils'. Facing this error, pls help.

@RetropunkAI 1 year ago

Yes, this is failing on this cell. @1littlecoder

@tomasmolas 1 year ago

Excellent work and explanation. Thanks.

@1littlecoder 1 year ago

Thank you. I'm glad you liked it.

@RetropunkAI 1 year ago

Wow! This is amazing! Thank you so much for sharing.

@1littlecoder 1 year ago

Glad you enjoyed it!

@chaneyvfx5883 1 year ago

@1littlecoder Great stuff as usual. Is there any way to get the segments tighter? Like per-word subtitles?

@suryakiranhalder9500 1 year ago

Hi, I'm a professional subtitler, and I have two primary questions: 1) Can I download the VTT subtitle files directly from this? 2) Can longer videos (feature-length films) be uploaded? I'm alarmed by the rise of AI for translating from local languages, and I wish to incorporate AI into my workflow so that I don't become obsolete. Waiting for a reply. Thanks in advance.

@1littlecoder 1 year ago

Hey, the answer to both questions is yes. 1. You can do it right now. 2. It requires more GPU memory.

@redaouhjjou 1 year ago

Thank you so much for sharing this valuable information.

@1littlecoder 1 year ago

Glad it was helpful!

@danielisaacguerrerovelazqu4418 1 year ago

Hello friend, I am grateful that you took the time to make this code. I do not know how to change the translation from English to Spanish. I hope you can help me with that.

@1littlecoder 1 year ago

Thanks Daniel. OpenAI Whisper can only translate from any language to English, not to some other language. So if you want Spanish output, you'd need to use a different model.

@danielisaacguerrerovelazqu4418 1 year ago

Ohh, I see. Thanks anyway.

@narutocole 2 years ago

Hey, thanks so much for sharing this! Any chance it'll work on longer videos? Like 10 to 20 minutes?

@1littlecoder 2 years ago

I've not tested it. It should, as OpenAI said they're chunking longer audio, but it'll take a lot of time.

@atharvavaidya9245 1 year ago

It works perfectly with longer audio. I've transcribed the entirety of GTA 3's Chatterbox FM with it, which is 1 hour long. Although if you're planning on transcribing long audio, I recommend you run it from the command line (whisper -options file.ext), that way you get a live transcript for every 30 seconds of audio, so you can check if it's going well. It also creates a .vtt and .txt file when it's done.

@manamejeff2087 1 year ago

/usr/local/lib/python3.8/dist-packages/gradio/deprecation.py:40: UserWarning: The 'type' parameter has been deprecated. Use the Number component instead. warnings.warn(value)

I am getting the above error when I run it on Google Colab. I tried to change the type to numbers, but it's still not working. How do I go about it?

@1littlecoder 1 year ago

That's strange, can you share the Colab link?

@rishumehta745 1 year ago

I am working with curl, and it doesn't give me a speaker label even though I am passing the speaker diarization = true parameter in the URL. Any idea where I am making a mistake, or can you point me to the documentation?

@ujjawalsharma8363 1 year ago

Hi, if I want to create a web app for this, where the user uploads an audio file and gets the transcript back, how can I run this Python code, and where do I have to run it? Pls help 🥺

@jacquesalomo5024 1 year ago

Hi @1littlecoder, thanks for sharing. I tried to get the Google Colab version to work but I get an error msg at cell 20. The error is several lines long and ends with "ValueError: rate must be specified when data is a numpy array or list of audio samples." Do you have any idea what could be the solution?

@1littlecoder 1 year ago

Emailed you, please check.

@scott-larsen 1 year ago

For those running into this issue, I was able to get around it (it needed the frame rate) by doing a `! pip install pydub` up top and then changing the IPython.display.Audio cell to the following. Thanks for an amazing tutorial, @1littlecoder!

from IPython.display import Audio
from pydub.utils import mediainfo
audio_file_info = mediainfo(audio_file)
Audio(audio_file, rate=audio_file_info['sample_rate'])

@jacquesalomo5024 1 year ago

@@1littlecoder Thanks! I wrote you back.

@surabhikhandare 1 year ago

I am also facing the same issue. I tried every possible code but it's not working. Can you share the details with me? Thanks in advance.

@alihusham1560 1 year ago

I want to do something like that, but I want to get a list of words with a timestamp for each word. How can I do that?
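For per-word timing, one option is the word_timestamps flag of Whisper's transcribe call. This is a minimal sketch, assuming a Whisper release recent enough to accept word_timestamps=True, in which case each segment carries a "words" list with "word", "start" and "end"; the flatten_words helper is hypothetical and only reshapes that result.

```python
from typing import Any, Dict, List, Tuple


def flatten_words(result: Dict[str, Any]) -> List[Tuple[str, float, float]]:
    """Collect (word, start, end) tuples from a Whisper-style result dictionary."""
    words = []
    for segment in result.get("segments", []):
        # Segments have no "words" key unless word timestamps were requested.
        for word in segment.get("words", []):
            words.append((word["word"].strip(), word["start"], word["end"]))
    return words


# Usage (assumes the whisper package and a local audio file):
#   import whisper
#   model = whisper.load_model("base")
#   result = model.transcribe("audio.mp3", word_timestamps=True)
#   for word, start, end in flatten_words(result):
#       print(f"{start:7.2f} {end:7.2f} {word}")
```

Each tuple can then be written out as its own subtitle cue if per-word captions are wanted.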

@meworlds8216 1 year ago

Thanks man, amazing video! Keep up the good work! The Colab link is not in the description.

@1littlecoder 1 year ago

github.com/amrrs/subtitle-embedded-video-generator Thanks man, I updated the repo with an Open in Colab button. Please try.

@kchicken1563 1 year ago

Hey, inside whisper.utils there is no write_vtt present. It is giving errors.

@fredsakay994 4 months ago

Could you please create a web service with one of these transcribers/translators?

@vishnuvallabhyathavakilla2693 1 year ago

I am trying to run this as a .py file in Visual Studio Code. Will it actually work?

@Asha-td7bm 1 year ago

Amazing, thank you.

@1littlecoder 1 year ago

You're welcome.

@prashantagarwal3339 1 year ago

Hey @1littlecoder, thank you for sharing this. The application deployed on Hugging Face seems to give an error: "Connection errored out". Any idea what could be the problem here?

@1littlecoder 1 year ago

On HF Spaces it runs on CPU, so the reason could be that their server is busy. Could you try the Google Colab version from GitHub?

@jacquesalomo5024 1 year ago

I had the same error. Tried it today in the morning (Central European Time (CET)) and it worked great.

@alignstudio 1 year ago

Hi, is it possible to save the caption file in .srt format?
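Saving captions as .srt mostly comes down to a different cue layout: numbered cues and a comma before the milliseconds. This is a minimal sketch, assuming the transcription result has a "segments" list with "start", "end" and "text" for each item; the function names are hypothetical and not part of the whisper package.

```python
from typing import Iterable, Mapping


def format_srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    milliseconds = round(seconds * 1000)
    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{milliseconds:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Build SRT text from Whisper-style segments (start, end, text)."""
    cues = []
    # SRT cues are numbered starting from 1 and separated by a blank line.
    for index, segment in enumerate(segments, start=1):
        start = format_srt_timestamp(segment["start"])
        end = format_srt_timestamp(segment["end"])
        text = segment["text"].strip()
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


# Usage (assumes `result` came from model.transcribe(...)):
#   with open("subtitles.srt", "w", encoding="utf-8") as srt_file:
#       srt_file.write(segments_to_srt(result["segments"]))
```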

@alexjutt413 1 year ago

Great tutorial! Is there any way to adjust the position of the subtitles (move them from the bottom to the center of the video, for instance)?

@vishnuvallabhyathavakilla2693 1 year ago

Hi, I am trying to run this code as .py files. I'm facing this error: text_encoder\model.fp16.safetensors not found

@vishnuvallabhyathavakilla2693 1 year ago

Could anybody help?

@1littlecoder 1 year ago

Did you download the model, and is it in the right place?

@vishnuvallabhyathavakilla2693 1 year ago

@@1littlecoder Yes, I did.

@miguelmflowers 2 years ago

I'd like the Google Colab version. Hugging Face tends to give plenty of errors during processing; I'm trying with a 1-minute video and it fails after a couple of seconds.

@1littlecoder 1 year ago

Here's the Colab version: github.com/amrrs/subtitle-embedded-video-generator

@miguelmflowers 1 year ago

@@1littlecoder Ohhh, I was blind, I just realized the Google Colab button was inside the GitHub repo! Thank you bro!

@1littlecoder 1 year ago

@@miguelmflowers No, you weren't, I just added it ;)

@miguelmflowers 1 year ago

@@1littlecoder I was trying it now, and I'm not sure about the part where the video is uploaded, in: input_video = 'tamil_shorts.mp4'. That's where I have to paste the path of the video, right? After that, the audio is detected, I can play it, and then everything Whisper detected is printed. Can't we edit one of the words if the detection wasn't so accurate?

Also, between steps 34 and 37 there's no way to edit the timing of the subtitles. Once the video was displayed, I saw the subtitles weren't there, but when playing the video, all the subs appeared at once in the middle of the video (at 30 seconds). They didn't move or appear when they were supposed to; they were all just written there for the whole video.

When receiving a Gradio public link, I received one that used Stable Diffusion, not the one shown in the video, lol. I don't know what happened there; maybe the code was calling SD instead of Whisper, because then I got the one that worked with Whisper inside the Colab (and outside too). But then again, the same error happened: all the subtitle text was just pasted in from second 30 and didn't disappear until the end of the video.

@musicspinner 1 year ago

Can it (Whisper) detect and tag distinct speakers?

@1littlecoder 1 year ago

Out of the box, right now I don't think so, but I think when we build a spectrogram we can do something about it.

@kino2406 1 year ago

Excellent, thanks! Can I translate from English to another language?

@1littlecoder 1 year ago

I think right now it's any language to English. If you want to go from English to another language, you'd need a staged pipeline: ASR from speech to English, then translation from English to your language.
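The staged pipeline described above can be sketched as a second pass over the English segments that keeps the timestamps and swaps only the text. This is a minimal sketch, assuming segments shaped like Whisper's ("start", "end", "text"); translate_text is a placeholder for whichever English-to-target translation model you choose, and toy_translate is a stand-in with a made-up glossary, not a real translator.

```python
from typing import Callable, Dict, List


def translate_segments(
    segments: List[Dict],
    translate_text: Callable[[str], str],
) -> List[Dict]:
    """Return copies of the segments with the text translated and timing unchanged."""
    translated = []
    for segment in segments:
        # Copy the segment so the original English result stays untouched.
        translated.append({**segment, "text": translate_text(segment["text"].strip())})
    return translated


def toy_translate(text: str) -> str:
    """Stand-in for a real translation model (hypothetical glossary lookup)."""
    glossary = {"Hello": "Hola", "Thank you": "Gracias"}
    return glossary.get(text, text)


# Usage (assumes `result` came from model.transcribe(..., task="translate")):
#   spanish_segments = translate_segments(result["segments"], toy_translate)
```

Keeping the translation step behind a plain callable means the subtitle-writing code does not need to change when the translation model does.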

@ParvathyKapoor 1 year ago

Do you have a local install link?

@1littlecoder 1 year ago

You can basically download the code and run it locally. I don't have a 1-click setup for it!

@ParvathyKapoor 1 year ago

@@1littlecoder You mean I can install it with Anaconda? Or via Git?

@1littlecoder 1 year ago

Technically you can download the Google Colab notebook as an .ipynb file and run it on your local computer in Jupyter Notebook after installing the requirements txt.

@kareemmongy9333 1 year ago

Good

@1littlecoder 1 year ago

Thank you Kareem

@ready4data 1 year ago

What would I add to create a text file in Colab of the translated audio?

@1littlecoder 1 year ago

The project in this video should help with that: kzbin.info/www/bejne/pnuWp6GLfZWCoac

@GoutamReddydazz 1 year ago

Thanks man

@1littlecoder 1 year ago

I put a lot of hours into making this project. Glad you found it useful.

@GoutamReddydazz 1 year ago

@@1littlecoder Do you have any plans to build your own platform as a service for such things?

@MohamedAshraf-zs6nv 1 year ago

Thanks man♥

@1littlecoder 1 year ago

You're welcome!

@HSBTechYT 1 year ago

Trying to use your HF deployment, but I always get this error: "Connection errored out."

@HSBTechYT 1 year ago

Video size is 15 MB.

@1littlecoder 1 year ago

Strangely, I just used it for my 13-second YouTube Shorts video. It took 160 seconds for the conversion.

@HSBTechYT 1 year ago

@@1littlecoder Yeah, weird. Saw your tweet before commenting.

@HSBTechYT 1 year ago

Running the code in Colab now. Let's see.

@1littlecoder 1 year ago

Did it work for you?

@metanulski 1 year ago

I am confused. You start with a Google Colab, but there is no link to any Google Colab in the description :-(

@metanulski 1 year ago

I find your video extremely confusing. I did find your Colab and tried to follow your explanation, but I don't understand it at all. At one point we are at the line "input_video = 'tamil_shorts.mp4'". Where does this video come from? How do I replace it with the video I'd like to translate?

@1littlecoder 1 year ago

@@metanulski That's the name of the mp4 file you uploaded to Colab.

@1littlecoder 1 year ago

@@metanulski Also, if you have difficulties using the Colab, you can use the Gradio app given at the end.

@metanulski 1 year ago

@@1littlecoder My point is, I don't have this video, and you also never explain where to upload it. I renamed a video and uploaded it to the main directory, and it worked, but it would be helpful if you explained this in the video, so that people without experience can follow.

@1littlecoder 1 year ago

@@metanulski It is explained exactly at 13:00.