OpenAI Whisper: Robust Speech Recognition via Large-Scale Weak Supervision | Paper and Code

37,936 views

Aleksa Gordić - The AI Epiphany

Comments: 59
@TheAIEpiphany 2 years ago
Let me know whether the code part helped! :) Is it adding any value for you guys? Or am I just rambling, and is it too hard to follow unless you play with the code yourself? Would really appreciate some feedback!
@xl0xl0xl0 2 years ago
It definitely did! Is the debugger your first choice when it comes to figuring out how a new codebase works, or did you fire it up for the occasion as a demonstration tool?
@TheAIEpiphany 2 years ago
@xl0xl0xl0 Thanks!! I have a whole series where I do just that. As for your question: it depends. If I am playing with something on my own, then yes, always! It's by far the best way to understand every single detail of your code.
@Erosis 2 years ago
@TheAIEpiphany I missed that series! I actually struggle with debugging ML code in VS Code, so I'll check it out!
@leobeeson1 2 years ago
This code walkthrough has made this paper walkthrough one of the best I've seen. Thanks for that, and please keep doing it!
@TheAIEpiphany 2 years ago
@leobeeson1 Wow, nice, thanks for telling me that! If I get more feedback I might keep doing this in every paper walkthrough!
@mariatrofimova5512 1 year ago
Thanks for walking through the Whisper code together, enjoyed the journey!
@pratikkhedikar6759 2 years ago
Man, what a good video! I was searching for something like this, where even a noob like me can understand the entire paper, because you took us through it step by step. I knew this was going to be a great video when you stopped to explain the log-mel spectrogram as well! Thanks Aleksa
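For readers curious about the log-mel step mentioned above: Whisper's front end is an 80-channel log-mel spectrogram, and the mel scale itself is just a logarithmic warp of frequency. A minimal sketch of that warp (this is illustrative only, not Whisper's actual STFT-plus-filter-bank implementation):

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to the mel scale (HTK formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_bin_centers(n_mels: int, f_min: float, f_max: float) -> list:
    """Centre frequencies (Hz) of n_mels filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(1, n_mels + 1)]

# Whisper uses 80 mel bins on 16 kHz audio, so roughly the 0-8 kHz band
centers = mel_bin_centers(80, 0.0, 8000.0)
```

Because the spacing is even in mel (log) space, the resulting centre frequencies are dense at low frequencies and sparse at high ones, mimicking human pitch perception.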
@devhau5 2 years ago
I just found this channel and I'm SO THANKFUL for a great walkthrough and explanation. It's super fun. This is gold!!! Thanks Aleksa!
@alexgil55ka 2 years ago
This is super cool, man! Thanks for diving deep into it.
@TheAIEpiphany 2 years ago
Thanks Alex!
@DevasheeshMishra 1 year ago
Loved it... need more such videos.
@DevasheeshMishra 6 months ago
Rewatching the stream
@huonglarne 1 year ago
Thank you so much for doing these videos. You helped me so so so so much.
@FreeSubtitlesAI 1 year ago
Very informative and authoritative, thank you!
@Spockleblupit 2 years ago
Thanks Aleksa! Really appreciate the effort you put into these videos. Quality content, keep it up.
@convolutionalnn2582 2 years ago
Sir, I have read your roadmap for Reinforcement Learning, and I want to do research in RL. 1) Should I still follow your roadmap? 2) Do I need to know all the math derivations behind supervised, unsupervised, and deep learning algorithms? 3) How can I start doing RL research as an undergraduate at a non-research institute?
@pocco8388 2 years ago
Thanks for making this great video!
@CHENXIN-pn7oh 1 year ago
So well explained! Thanks!
@nnpy 2 years ago
Great video!!
@petercowling6769 2 years ago
Welsh an outlier. Never would have guessed. Anyway, gotta go, heading out to Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch this afternoon.
@ludvigjoborn7937 2 years ago
The paper notes this is because a lot of Welsh was misclassified as English during the data collection process. Imagine them finding out.
@curryeater259 2 years ago
You are amazing, sir!
@kerenstarobinski8564 2 years ago
Is it possible to find the timestamps of each transcribed word? Great work!
@MyHowHowHow 2 years ago
Not in OpenAI's version, but a fork of it has this feature. It is called whisper.cpp.
@JF-vt4ve 2 years ago
Impressive work!
@amilia4174 1 year ago
I have watched your video and it was great! But I'm not sure whether the translation and transcription tasks share the same decoding parameters.
@ChuanChihChou 1 year ago
I wonder if we could instead use the attention map (of how much each audio token contributes to the prediction of each transcript token) to back out the timestamps?
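That idea can be sketched as follows, assuming we already have the decoder's cross-attention weights as a (text tokens × audio frames) matrix: map each transcript token to its highest-weight audio frame and convert the frame index to seconds. This is a toy illustration only; a real implementation would also enforce that the alignment moves forward in time (e.g. with dynamic time warping).

```python
def naive_align(attn, frame_dur_s=0.02):
    """For each text token (row of attn), take the audio frame with the
    highest cross-attention weight and convert its index to seconds.
    No monotonicity constraint: purely illustrative."""
    times = []
    for row in attn:
        best_frame = max(range(len(row)), key=lambda j: row[j])
        times.append(best_frame * frame_dur_s)
    return times

# Toy 3-token x 5-frame attention matrix (rows roughly sum to 1)
attn = [
    [0.70, 0.20, 0.05, 0.03, 0.02],
    [0.10, 0.60, 0.20, 0.05, 0.05],
    [0.05, 0.05, 0.10, 0.20, 0.60],
]
times = naive_align(attn)  # frames 0, 1, 4 -> 0.00 s, 0.02 s, 0.08 s
```

The attention-based approach tends to work because, in an encoder-decoder speech model, the cross-attention of a text token usually peaks near the audio where that word is spoken.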
@MattUebel 2 years ago
This is great, ty!
@TheAIEpiphany 2 years ago
🚀
@vinayakbaddi29 1 year ago
@Aleksa Gordic, thanks for sharing this valuable information. Apart from the AI content, I would also like to see how you use VS Code so effectively to move through and debug the code. Would really appreciate more information on that in a video.
@goryeodynasti3025 1 year ago
@TheAIEpiphany How do you see the effect of the "best_of" parameter on the quality of the transcription? Any insight would be helpful. Thanks.
@asceznyk 1 year ago
Hi Aleksa! Great video! I just wanted to know: what would the loss function be for these models? Would it be something like cross-entropy, since the model predicts tokens?
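Yes: Whisper is trained as a standard sequence-to-sequence model, so the training loss is the usual token-level cross-entropy between the decoder's predicted distribution and the target token at each position. A bare-bones sketch of that loss (illustrative, not the actual training code):

```python
import math

def token_cross_entropy(logits, targets):
    """Mean negative log-likelihood of the target token at each position.
    logits: one list of unnormalised vocab scores per position."""
    total = 0.0
    for scores, tgt in zip(logits, targets):
        m = max(scores)                       # subtract max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[tgt]          # -log softmax(scores)[tgt]
    return total / len(targets)

# Two positions, vocab of 3; the model strongly prefers the correct token,
# so the loss is close to zero
loss = token_cross_entropy([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]], [0, 1])
```

With uniform logits the loss is log(vocab_size), which is also the value you would expect at the very start of training.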
@sibadattasasmal4866 1 year ago
Can you put a little emphasis on how the timestamps are generated for the transcription?
@iradaaristova8698 11 months ago
Can you make a video on how the decoder works?
@chandanabandaru7386 2 months ago
Hey, I am using the Whisper model for speech-to-text conversion, but I am getting a tensor error for large audio files (30 min to 1 hr). Can anybody help me? I am testing with Postman.
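A likely cause: Whisper's encoder consumes fixed 30-second windows, so a long file has to be processed as a sequence of chunks rather than in one shot (the library's own transcribe() does this sliding-window loop for you; shape errors usually mean the whole waveform's features were fed to the model directly). A hypothetical chunking sketch, assuming 16 kHz mono samples:

```python
SAMPLE_RATE = 16_000      # Whisper expects 16 kHz mono audio
CHUNK_SECONDS = 30        # the encoder consumes fixed 30-second windows

def chunk_audio(samples, sr=SAMPLE_RATE, chunk_s=CHUNK_SECONDS):
    """Split a waveform into consecutive 30-second chunks. The last chunk
    may be shorter; Whisper's own pipeline pads it out to 30 s."""
    size = sr * chunk_s
    return [samples[i:i + size] for i in range(0, len(samples), size)]

# 65 s of silence -> three chunks: 30 s, 30 s, and a 5 s remainder
chunks = chunk_audio([0.0] * (65 * SAMPLE_RATE))
```

Each chunk would then be transcribed in turn and the text concatenated; production pipelines usually overlap or condition chunks on previous text to avoid cutting words at boundaries.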
@MoiezIzmail 5 months ago
It's not Chinese, it's Korean (the script is called Hangul). Thanks for the tutorial!
@xXMaDGaMeR 1 year ago
Can the model be run locally? How much compute does it take to run this model for inference?
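Yes, all the checkpoints run locally. A back-of-the-envelope estimate of weight memory, using the approximate parameter counts published in the Whisper repository (tiny ≈ 39M up to large ≈ 1.55B) and assuming FP16 weights, goes like this:

```python
def rough_vram_gb(n_params, bytes_per_param=2):
    """Very rough weight-memory estimate (FP16 = 2 bytes per parameter).
    Activations and the KV cache add overhead on top of this."""
    return n_params * bytes_per_param / 1024**3

# Approximate parameter counts from the Whisper model card
models = {"tiny": 39_000_000, "base": 74_000_000, "small": 244_000_000,
          "medium": 769_000_000, "large": 1_550_000_000}
for name, n in models.items():
    print(f"{name}: ~{rough_vram_gb(n):.2f} GB of FP16 weights")
```

So the large model needs roughly 3 GB just for weights, and in practice somewhat more; the smaller checkpoints run comfortably on CPU, just slowly.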
@kshitizkhandelwal9348 2 years ago
Can someone explain how the embeddings are learnt?
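In short: an embedding layer is just a trainable lookup table, one row of weights per token, and it is updated by backpropagation like any other weight matrix. Because only the looked-up row participates in the forward pass, only that row receives a gradient. A deliberately tiny sketch with plain SGD:

```python
def sgd_embedding_step(table, token_id, grad, lr=0.1):
    """One SGD update: only the row of the looked-up token receives a
    gradient, because no other row participated in the forward pass."""
    table[token_id] = [w - lr * g for w, g in zip(table[token_id], grad)]

# A 5-token vocabulary with 3-dimensional embeddings, initialised to zero
table = [[0.0, 0.0, 0.0] for _ in range(5)]
sgd_embedding_step(table, token_id=2, grad=[1.0, -1.0, 0.5])
# table[2] is now [-0.1, 0.1, -0.05]; every other row is untouched
```

Over many training steps, tokens that appear in similar contexts end up with similar rows, which is where the familiar "semantic" structure of embeddings comes from.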
@lavkushdas5529 1 year ago
Hey, can you provide the source code that you wrote in VS Code?
@tahercoolguy123 2 years ago
Hey, really nice video. Can we fine-tune the Whisper model on our own dataset? If yes, can you show us how?
@phongtranhung5635 2 years ago
Hi, I've found your channel and the videos are all totally mind-blowing. I have a question regarding Whisper: I currently want to return a list of the probabilities of all transcribed words. I think it has something to do with the update function inside decoding.py. Can you give me some pointers on how to do it? I would really appreciate it!
@huonglarne 1 year ago
You can modify the update function to return the logprobs of all words. The max of those logprobs is the selected token's probability.
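The step that reply describes boils down to applying log-softmax to the decoder's logits before picking a token: the full log-prob vector is what you would surface, and the entry for the chosen token is its probability. A self-contained sketch of that logic (not the actual decoding.py code):

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def greedy_step(logits):
    """One greedy decode step: return (token_id, its log-prob, all log-probs)."""
    logprobs = log_softmax(logits)
    token = max(range(len(logprobs)), key=lambda i: logprobs[i])
    return token, logprobs[token], logprobs

token, lp, all_lps = greedy_step([2.0, 0.5, 0.1])
prob = math.exp(lp)   # probability of the selected token, in (0, 1)
```

To get per-word (rather than per-token) probabilities you would additionally aggregate over the tokens that make up each word, e.g. by summing their log-probs.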
@marearts. 1 year ago
1:30 Yes, it's Korean.
@dimorischinyui1875 2 years ago
Hey guys, can anyone please help me with this issue? I am trying to run Whisper on my machine and I am getting this error in cmd: UserWarning: FP16 is not supported on CPU; using FP32 instead. I am on Windows 10 with an RTX 2060 GPU, and it seems to run on my CPU instead of my NVIDIA GPU. For more detail: I created a Python virtual environment and pip-installed Whisper inside it.
@MyHowHowHow 2 years ago
Try the parameter --device cuda.
@HarishPentapalli 2 years ago
Any guesses on the name of company B?
@TheAIEpiphany 2 years ago
Hah, hard to infer without repeating the research.
@BRICKSINSILK 1 month ago
Yes, all well and good, but make it not hold onto a timestamp for ten years and then hallucinate.
@SuperMan-rw6iz 1 year ago
That's not even Mandarin... it's Korean, BTW 😅
@FinnBrownc 2 years ago
It would be helpful if you could put these models in historical context a bit. I'm not as familiar with how things were done in the past vs. today's SOTA.
@TheAIEpiphany 2 years ago
Thanks for the feedback. For that, you mostly care about transformers here: the "Attention Is All You Need" paper.