Any to English AI Video Subtitle Captioning App with OpenAI Whisper App Full Tutorial

14,942 views

1littlecoder

This is a multilingual video caption (subtitle) generator that embeds the subtitles in the original video using ASR (speech-to-text). It is completely free and a great open-source alternative to Descript, built with Python, OpenAI Whisper, and Gradio.
This video has 3 main parts:
1. Building a Video Caption Generator with OpenAI Whisper
2. Wrapping the Video Caption Generator as Web App with Gradio
3. Deploying the Video Subtitle Generator on Hugging Face Spaces for Free
Code - github.com/amr...
Deployed App - huggingface.co...
It can translate multiple languages to English, including low-resource Indian languages such as Tamil, Hindi, Malayalam, and Telugu.
Automatic Captions with OpenAI Whisper Full Tutorial

Comments: 103
@lynic-0091 1 year ago
Oh man, finally I don't have to google for subtitles anymore, thanks a lot!
@1littlecoder 1 year ago
Glad it's helpful!
@shahidhussain4730 1 year ago
For this import --> from whisper.utils import write_vtt, I get: ImportError: cannot import name 'write_vtt' from 'whisper.utils' (/usr/local/lib/python3.8/dist-packages/whisper/utils.py). Please help me, sir.
@swarnaislam5916 6 months ago
Did you solve the problem? I am having the same issue.
@prkssngowthamdora 1 year ago
ImportError: cannot import name 'write_vtt' from 'whisper.utils'. Hi bro, can you help me with this error?
@vaibhavtawhare7029 1 year ago
ValueError: rate must be specified when data is a numpy array or list of audio... I get this when I run from IPython.display import Audio and then Audio(audio_file). How do I eliminate that?
@TheHimanshuRastogi 1 year ago
Your videos are too good, but every single time I try running your code I get an error. This time it says "cannot import name 'write_vtt' from 'whisper.utils'". Now what do I do?
@1littlecoder 1 year ago
I think they removed that function. I need to find a way to fix it. Thanks for highlighting.
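One way around the missing write_vtt import is to build the WebVTT text yourself from the transcription result. The snippet is a minimal sketch, not the fix used in the repo: it assumes the result of model.transcribe(...) has a "segments" list whose items carry "start", "end" and "text" keys, and the helper names format_timestamp and segments_to_vtt are made up for illustration.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    milliseconds = round(seconds * 1000)
    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{milliseconds:03d}"


def segments_to_vtt(segments) -> str:
    """Build WebVTT text from Whisper-style segments (hypothetical helper).

    Each segment is assumed to be a dict with "start", "end" and "text".
    """
    lines = ["WEBVTT", ""]
    for segment in segments:
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        # "-->" is reserved in cue text, so soften it if it ever appears.
        text = segment["text"].strip().replace("-->", "->")
        lines.extend([f"{start} --> {end}", text, ""])
    return "\n".join(lines)


# Usage, assuming `result = model.transcribe(audio_file)` ran earlier:
# with open("subtitles.vtt", "w", encoding="utf-8") as f:
#     f.write(segments_to_vtt(result["segments"]))
```

The returned string can be written to a .vtt file and used wherever the notebook previously relied on write_vtt.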
@vishnuvallabhyathavakilla2693 1 year ago
ImportError: cannot import name 'write_vtt' from 'whisper.utils'. I'm facing this error, please help.
@RetropunkAI 1 year ago
Yes, it is failing on this cell. @1littlecoder
@tomasmolas 1 year ago
Excellent work and explanation. Thanks.
@1littlecoder 1 year ago
Thank you. I'm glad you liked it.
@RetropunkAI 1 year ago
Wow! This is amazing! Thank you so much for sharing.
@1littlecoder 1 year ago
Glad you enjoyed it!
@chaneyvfx5883 1 year ago
@1littlecoder Great stuff as usual. Is there any way to get the segments tighter? Like per-word subtitles?
@suryakiranhalder9500 1 year ago
Hi, I'm a professional subtitler and I have two primary questions: 1) Can I download the VTT subtitle files directly from this? 2) Can longer videos (feature-length films) be uploaded? I'm alarmed by the rise of AI translation from local languages, and I wish to incorporate AI into my workflow so that I don't become obsolete. Waiting for a reply. Thanks in advance.
@1littlecoder 1 year ago
Hey, the answer to both questions is yes. 1. You can do it right now. 2. It requires more GPU memory.
@redaouhjjou 1 year ago
Thank you so much for sharing this valuable information.
@1littlecoder 1 year ago
Glad it was helpful!
@danielisaacguerrerovelazqu4418 1 year ago
Hello friend, I am grateful that you took the time to make this code. I do not know how to change the translation from English to Spanish. I hope you can help me with that.
@1littlecoder 1 year ago
Thanks, Daniel. OpenAI Whisper can only translate from other languages into English, not into any other language. So for Spanish you'd need to use a different model.
@danielisaacguerrerovelazqu4418 1 year ago
Ohh, I see. Thanks anyway.
@narutocole 2 years ago
Hey, thanks so much for sharing this! Any chance it'll work on longer videos? Like 10 to 20 minutes?
@1littlecoder 2 years ago
I've not tested it. It should, since OpenAI said they chunk longer audio, but it'll take a lot of time.
@atharvavaidya9245 1 year ago
It works perfectly with longer audio. I've transcribed the entirety of GTA 3's Chatterbox FM with it, which is 1 hour long. If you're planning on transcribing long audio, though, I recommend running it from the command line (whisper -options file.ext). That way you get a live transcript for every 30 seconds of audio, so you can check whether it's going well. It also creates a .vtt and a .txt file when it's done.
@manamejeff2087 1 year ago
/usr/local/lib/python3.8/dist-packages/gradio/deprecation.py:40: UserWarning: The 'type' parameter has been deprecated. Use the Number component instead. warnings.warn(value)
I am getting the above error when I run it on Google Colab. I tried to change type to Number but it's still not working. How do I go about it?
@1littlecoder 1 year ago
That's strange, can you share the Colab link?
@rishumehta745 1 year ago
I am working with curl, and it doesn't give me speaker labels even though I am passing the speaker diarization = true parameter in the URL. Any idea where I am making a mistake, or can you point me to the documentation?
@ujjawalsharma8363 1 year ago
Hi, I want to create a web app for this, where the user uploads an audio file and gets the transcript back. How can I run this Python code, and where do I have to run it? Please help 🥺
@jacquesalomo5024 1 year ago
Hi @1littlecoder, thanks for sharing. I tried to get the Google Colab version to work but I get an error message at cell 20. The error is several lines long and ends with "ValueError: rate must be specified when data is a numpy array or list of audio samples." Do you have any idea what the solution could be?
@1littlecoder 1 year ago
Emailed you, please check.
@scott-larsen 1 year ago
For those running into this issue, I was able to get around it (it needed the frame rate) by doing a `! pip install pydub` up top and then changing the IPython.display.Audio cell to the following. Thanks for an amazing tutorial, @1littlecoder!
from IPython.display import Audio
from pydub.utils import mediainfo
audio_file_info = mediainfo(audio_file)
Audio(audio_file, rate=audio_file_info['sample_rate'])
@jacquesalomo5024 1 year ago
@1littlecoder Thanks! I wrote you back.
@surabhikhandare 1 year ago
I am also facing the same issue. I tried every possible fix but it's not working. Can you share the details with me? Thanks in advance.
@alihusham1560 1 year ago
I want to do something like that, but I want to get a list of words with a timestamp for each word. How can I do that?
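For per-word timing, newer openai-whisper releases accept word_timestamps=True in model.transcribe, which adds a "words" list to each segment. The snippet is a rough sketch under that assumption: it only flattens such a result into (word, start, end) tuples, and flatten_words is a made-up helper name. Older Whisper versions do not produce the "words" key, in which case the list simply comes back empty.

```python
def flatten_words(result):
    """Collect (word, start, end) tuples from a Whisper-style result dict.

    Assumes each segment may carry a "words" list whose items have
    "word", "start" and "end" keys; segments without it are skipped.
    """
    words = []
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            words.append((word["word"].strip(), word["start"], word["end"]))
    return words


# Usage, assuming a Whisper version that supports word timestamps:
# result = model.transcribe(audio_file, word_timestamps=True)
# for word, start, end in flatten_words(result):
#     print(f"{start:.2f} {end:.2f} {word}")
```

The same tuples could also drive per-word subtitles by writing one cue per word instead of one cue per segment.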
@meworlds8216 1 year ago
Thanks man, amazing video! Keep up the good work! The Colab link is not in the description.
@1littlecoder 1 year ago
github.com/amrrs/subtitle-embedded-video-generator Thanks man, I updated the repo with an Open in Colab button. Please try.
@kchicken1563 1 year ago
Hey, there is no write_vtt inside whisper.utils. It is giving errors.
@fredsakay994 4 months ago
Could you please create a web service with one of these transcriber-translators?
@vishnuvallabhyathavakilla2693 1 year ago
I am trying to run this as a .py file in Visual Studio Code. Will it actually work?
@Asha-td7bm 1 year ago
Amazing, thank you.
@1littlecoder 1 year ago
You're welcome
@prashantagarwal3339 1 year ago
Hey @1littlecoder, thank you for sharing this. The application deployed on Hugging Face seems to give the error "Connection errored out". Any idea what the problem could be?
@1littlecoder 1 year ago
On HF Spaces it runs on CPU, so the reason could be that their server is busy. Could you try the Google Colab version from GitHub?
@jacquesalomo5024 1 year ago
I had the same error. I tried it this morning (Central European Time) and it worked great.
@alignstudio 1 year ago
Hi, is it possible to save the caption file in .srt format?
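SRT differs from VTT mainly in having numbered cues, a comma before the milliseconds, and no header line, so the segments can be converted directly. The snippet is a minimal sketch, assuming Whisper-style segments (dicts with "start", "end" and "text"); format_srt_timestamp and segments_to_srt are illustrative names, not part of the Whisper package.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    milliseconds = round(seconds * 1000)
    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{milliseconds:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from Whisper-style segments (hypothetical helper)."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = format_srt_timestamp(segment["start"])
        end = format_srt_timestamp(segment["end"])
        text = segment["text"].strip()
        # Each cue: running number, time range, text, then a blank line.
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


# Usage, assuming `result = model.transcribe(audio_file)` ran earlier:
# with open("subtitles.srt", "w", encoding="utf-8") as f:
#     f.write(segments_to_srt(result["segments"]))
```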
@alexjutt413 1 year ago
Great tutorial! Is there any way to adjust the position of the subtitles (move them from the bottom to the center of the video, for instance)?
@vishnuvallabhyathavakilla2693 1 year ago
Hi, I am trying to run this code as .py files and I'm facing this error: text_encoder\model.fp16.safetensors not found
@vishnuvallabhyathavakilla2693 1 year ago
Could anybody help?
@1littlecoder 1 year ago
Did you download the model, and is it in the right place?
@vishnuvallabhyathavakilla2693 1 year ago
@1littlecoder Yes, I did.
@miguelmflowers 2 years ago
I'd like the Google Colab version. Hugging Face tends to give plenty of errors during processing, and I'm trying with a 1-minute video and it fails after a couple of seconds.
@1littlecoder 1 year ago
Here's the Colab version: github.com/amrrs/subtitle-embedded-video-generator
@miguelmflowers 1 year ago
@1littlecoder Ohhh, I was blind, I just realized the Google Colab button was inside the GitHub repo! Thank you bro!
@1littlecoder 1 year ago
@miguelmflowers No you weren't, I just added it ;)
@miguelmflowers 1 year ago
@1littlecoder I was trying it now, and I'm not sure about the part where you upload the video, in: input_video = 'tamil_shorts.mp4'. That's where I have to paste the path of the video, right? After that, the audio is detected, I can play it, and then the printed output is everything Whisper detected. Can't we edit one of the words if the detection wasn't accurate?
Also, between steps 34 and 37 there's no way to edit the timing of the subtitles. Once the video was displayed, I saw the subtitles weren't there, but when playing the video all the subs appeared at once in the middle of the video (at 30 seconds). They didn't move or appear when they were supposed to; they were all just written there for the whole video.
When receiving a Gradio public link, I received one that used Stable Diffusion, not the one shown in the video, lol. I don't know what happened there, maybe the code was calling SD instead of Whisper, because then I got the one that worked with Whisper inside the Colab (and outside too). But then again, the same error happened: all the subtitle text was just pasted in from second 30 and didn't disappear until the end of the video.
@musicspinner 1 year ago
Can it (Whisper) detect and tag distinct speakers?
@1littlecoder 1 year ago
Out of the box, right now, I don't think so, but I think when we build a spectrogram we can do something about it.
@kino2406 1 year ago
Excellent, thanks! Can I translate from English to another language?
@1littlecoder 1 year ago
I think right now it's any-to-English. If you want to go from English to another language, you'd need a staged pipeline: ASR from speech to English text, then translation from English to your language.
@ParvathyKapoor 1 year ago
Do you have a local install link?
@1littlecoder 1 year ago
You can basically download the code and run it locally. I don't have a one-click setup for it!
@ParvathyKapoor 1 year ago
@1littlecoder You mean I can install it with Anaconda? Or via Git?
@1littlecoder 1 year ago
Technically you can download the Google Colab notebook as an .ipynb file and run it on your local computer in Jupyter Notebook after installing the packages in requirements.txt.
@kareemmongy9333 1 year ago
Good
@1littlecoder 1 year ago
Thank you, Kareem
@ready4data 1 year ago
What would I add to create a text file in Colab of the translated audio?
@1littlecoder 1 year ago
The project in this video should help with that: kzbin.info/www/bejne/pnuWp6GLfZWCoac
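For a plain text file in Colab, the full transcript is available under the "text" key of the dict returned by model.transcribe. The snippet is a small sketch under that assumption; save_transcript and the default file name transcript.txt are illustrative choices, not something taken from the notebook.

```python
from pathlib import Path


def save_transcript(result, path="transcript.txt"):
    """Write the full transcript of a Whisper-style result to a text file.

    Assumes `result` is a dict with a "text" key. Returns the saved text.
    """
    text = result["text"].strip()
    Path(path).write_text(text + "\n", encoding="utf-8")
    return text


# Usage, assuming the translation ran earlier in the notebook:
# result = model.transcribe(audio_file, task="translate")
# save_transcript(result, "translated.txt")
```

In Colab the file then shows up in the Files panel on the left, where it can be downloaded.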
@GoutamReddydazz 1 year ago
Thanks man
@1littlecoder 1 year ago
I put a lot of hours into making this project. Glad you found it useful.
@GoutamReddydazz 1 year ago
@1littlecoder Do you have any plans to build your own platform offering such things as a service?
@MohamedAshraf-zs6nv 1 year ago
Thanks man ♥
@1littlecoder 1 year ago
You're welcome!
@HSBTechYT 1 year ago
Trying to use your HF deployment, but I always get this error: "Connection errored out."
@HSBTechYT 1 year ago
Video size is 15 MB.
@1littlecoder 1 year ago
Strangely, I just used it for my 13-second YouTube Shorts video. It took 160 seconds for the conversion.
@HSBTechYT 1 year ago
@1littlecoder Yeah, weird. Saw your tweet before commenting.
@HSBTechYT 1 year ago
Running the code in Colab now. Let's see.
@1littlecoder 1 year ago
Did it work for you?
@metanulski 1 year ago
I am confused. You start with a Google Colab, but there is no link to any Google Colab in the description :-(
@metanulski 1 year ago
I find your video extremely confusing. I did find your Colab and tried to follow your explanation, but I don't understand it at all. At one point we are at the line "input_video = 'tamil_shorts.mp4'". Where does this video come from? How do I replace it with the video I'd like to translate?
@1littlecoder 1 year ago
@metanulski That's the name of the mp4 file you uploaded to Colab.
@1littlecoder 1 year ago
@metanulski Also, if you have difficulties using the Colab, you can use the Gradio app given at the end.
@metanulski 1 year ago
@1littlecoder My point is, I don't have this video, and you also never explain where to upload it. I renamed a video and uploaded it to the main directory, and it worked, but it would be helpful if you explained this in the video, so that people without experience can follow.
@1littlecoder 1 year ago
@metanulski It is explained exactly at 13:00.
@crangesmcbasketball367 1 year ago
Hey, I'm a fan of Japanese TV. Can I upload a Japanese show I like and get it translated to English, then?
@1littlecoder 1 year ago
Ideally it should work. Did you try? The video length might be an issue.
@RustuYucel 1 year ago
@1littlecoder What should the video/audio format or length be, ideally? Wonderful tool, by the way. Thanks.
@1littlecoder 1 year ago
@RustuYucel Ideally, less than 30 seconds is good. That's why I built this with YouTube Shorts in mind. Longer videos would work, but there'll be some catch.
@nitinrai97 10 months ago
cannot import name 'write_vtt' from 'whisper.utils' (/usr/local/lib/python3.10/dist-packages/whisper/utils.py). This error is showing, please help.
@EM-nr9hj 6 months ago
Same issue.
@EM-nr9hj 6 months ago
Did you get it working?