SUPER Fast AI Real Time Speech to Text Transcribtion - Faster Whisper / Python

95,476 views

All About AI

A day ago

Comments: 126
@bim-techs 8 months ago
Tips: You can transform your device's audio output into a "microphone" on Windows, so you don't need to place your headphones over your microphone. 1. Press Windows key + R -> type "mmsys.cpl" 2. In the Recording tab, enable the Stereo Mix option. Now, "Stereo Mix" is an available microphone option! You can select it as the audio input.
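Once Stereo Mix is enabled, it appears as an ordinary input device, so a recording script can select it by name instead of using the default microphone. A minimal sketch, assuming PyAudio is installed; `find_input_device` and `list_pyaudio_devices` are hypothetical helper names, and the exact device name (for example "Stereo Mix (Realtek Audio)") varies by driver:

```python
def find_input_device(devices, name_fragment):
    """Return the index of the first input-capable device whose name
    contains name_fragment (case-insensitive), or None if nothing matches."""
    wanted = name_fragment.lower()
    for info in devices:
        has_input = info.get("maxInputChannels", 0) > 0
        if has_input and wanted in info.get("name", "").lower():
            return info["index"]
    return None


def list_pyaudio_devices():
    """Collect PyAudio's device descriptions (requires the pyaudio package)."""
    import pyaudio  # imported lazily so find_input_device works without it

    p = pyaudio.PyAudio()
    try:
        return [p.get_device_info_by_index(i) for i in range(p.get_device_count())]
    finally:
        p.terminate()
```

The index returned by `find_input_device(list_pyaudio_devices(), "Stereo Mix")` can be passed to `p.open(..., input=True, input_device_index=index)` so the stream records the system output rather than the microphone.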
@weekendmakeit7760 8 months ago
this really helped me! Thank you!
@aoeu256 7 months ago
this is a great idea, i was using Voicemeeter as a virtual audio thingy and it's complicated to use
@OliNorwell 7 months ago
Epic! - These videos are some of the best stuff on KZbin - love the idea with the image generation at the end
@theraybae 8 months ago
This is amazing and inspiring. I love the ending of the video and can’t wait for Wednesday. As a dyslexic person I think you unlocked a new use case for learning.
@MultiBigkush 5 months ago
Code:

import os
import wave

import pyaudio
from faster_whisper import WhisperModel

# Define constants
NEON_GREEN = '\033[32m'
RESET_COLOR = '\033[0m'

os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"


# Function for recording an audio chunk
def record_chunk(p, stream, file_path, chunk_length=1):
    """
    Records an audio chunk to a file.

    Args:
        p (pyaudio.PyAudio): PyAudio object.
        stream (pyaudio.Stream): PyAudio stream.
        file_path (str): Path of the file the audio chunk is written to.
        chunk_length (int): Length of the audio chunk in seconds.

    Returns:
        None
    """
    frames = []
    for _ in range(0, int(16000 / 1024 * chunk_length)):
        data = stream.read(1024)
        frames.append(data)

    wf = wave.open(file_path, 'wb')
    wf.setnchannels(1)
    wf.setsampwidth(p.get_sample_size(pyaudio.paInt16))
    wf.setframerate(16000)
    wf.writeframes(b''.join(frames))
    wf.close()


def transcribe_chunk(model, file_path):
    segments, info = model.transcribe(file_path, beam_size=7)
    transcription = ''.join(segment.text for segment in segments)
    return transcription


def main2():
    """
    Main function of the program.
    """
    # Choose the Whisper model
    model = WhisperModel("medium", device="cuda", compute_type="float16")

    # Initialize PyAudio
    p = pyaudio.PyAudio()

    # Open the recording stream
    stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024)

    # Initialize an empty string to accumulate transcriptions
    accumulated_transcription = ""

    try:
        while True:
            # Record an audio chunk
            chunk_file = "temp_chunk.wav"
            record_chunk(p, stream, chunk_file)

            # Transcribe the audio chunk
            transcription = transcribe_chunk(model, chunk_file)
            print(NEON_GREEN + transcription + RESET_COLOR)

            # Delete the temporary file
            os.remove(chunk_file)

            # Append the new transcription to the accumulated transcription
            accumulated_transcription += transcription + " "

    except KeyboardInterrupt:
        print("Stopping...")
        # Write the accumulated transcription to a log file
        with open("log.txt", "w") as log_file:
            log_file.write(accumulated_transcription)

    finally:
        print("LOG" + accumulated_transcription)
        # Close the recording stream
        stream.stop_stream()
        stream.close()
        # Stop PyAudio
        p.terminate()


if __name__ == "__main__":
    main2()
@josequinonez2941 3 months ago
I LOVE U
@shravanhegde2237 3 months ago
@@josequinonez2941 can u share the code with me in English?
@AkashDesai-ef9mk 2 months ago
thankss
@saarza9991 a month ago
Ok but when I run this it just runs and ends. What to do? I'm new to this plz help
@keeganreeve 21 days ago
Thanks a lot! )) Do you have a GitHub? I'd like to find your page
@filipphenderson6342 7 months ago
Pulling in people with a flashy thumbnail of a Python code that works and then trying to monetize your code based on a library that is already supposed to be open source is in my opinion bs. it is not fair for beginners that might not know Python or whisper very well. for that I give you a thumbs down!
@christianmccauley7340 a month ago
Wow, an AI channel scamming people? Who would’ve ever heard of such a thing! Tired of the fucking grifternet man, how did this happen?
@jbtesla3581 29 days ago
for real this is a fking scam, the code is on GitHub for free
@ReadyMedia-no 8 months ago
There is a product for live video transcription there. Live text services are expensive and do not work for many languages. Set up a server/service that will ingest an RTMP video source, delay the video and overlay text on the video in perfect sync, then offer RTMP output with burned-in live text. :) There is a need for this service.
@svenborgers6908 7 months ago
I have tried to get this to run on M1 MacBook. No joy. The CPU maxes out even with the tiny model. But then I tried with the Whisper.cpp implementation which is compiled for apple silicon. I found a whisper-cpp-python wrapper for that library. That actually runs and is far less CPU bound. It has a bit of a stutter, it is not as clean, it misses words between the chunk processing but you can see that with just a little bit more power it could work.
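One possible reason for words going missing between chunks (an assumption, since the commenter's code is not shown) is that nothing is being recorded while the previous chunk is transcribed. A common remedy is to record in a background thread and hand chunks to the transcriber through a queue. A minimal sketch; `run_pipeline`, `record_chunk` and `transcribe_chunk` are placeholder names for whatever recording and transcription functions are in use:

```python
import queue
import threading


def run_pipeline(record_chunk, transcribe_chunk, num_chunks):
    """Record in a background thread while this thread transcribes.

    record_chunk() returns one chunk of audio; transcribe_chunk(chunk)
    returns its text. Chunks are processed in the order they were recorded.
    """
    chunks = queue.Queue()

    def recorder():
        for _ in range(num_chunks):
            chunks.put(record_chunk())
        chunks.put(None)  # sentinel: no more audio

    threading.Thread(target=recorder, daemon=True).start()

    results = []
    while True:
        chunk = chunks.get()  # blocks until the recorder delivers a chunk
        if chunk is None:
            break
        results.append(transcribe_chunk(chunk))
    return results
```

If transcription is slower than real time the queue simply grows, so this avoids dropped audio but does not make the model any faster.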
@MrThaitrinh 6 months ago
Hi Seven, could you please share your code with me? Thank you very much!
@ryanjames3907 8 months ago
wow !! great video !!! Thank you for being so generous and teaching this to us, this is epic stuff! I can already start see all kinds of use cases, I cant wait to get it running, I'm really looking forward to Wednesday's video . Thanks again from Canada
@cristobalmunoz84 2 months ago
Nice video!! thanks for your help in this topics!!
@ArmandoMenicacci 8 months ago
Fantastic !!! A bit fast in explaining and showing, but I can always pause!
@benscottbongiben 8 months ago
Good to see transcription and generate responses as audio in real-time for phone call
@reddyparthu5978 6 months ago
how to get the code for this?
@enesgul2970 8 months ago
You are really very good.
@bigswede88 2 months ago
Go Sweden! Good job
@ferluisch 4 months ago
Hey man this is really cool! I'd like to know if you: 1) used the whisper v3 model? or the v2? 2) If you have seen the demos from gpt4, they also showed that gpt ASR is better than whisper v3, wonder if it will be open like whisper.
@HammerOnTheNet 8 months ago
Amazing and inspiring work! Kris what about something less powerful but better accessible in terms of hardware?
@unrealminigolf4015 8 months ago
Awesome bro! ❤
@calvinapollos 5 months ago
Great video! Thanks for going through this in such an easy-to-understand way! Can you share the python scripts?
@maizizhamdo 5 months ago
i love your videos man , please video about fastwhisper on docker api please
@aoeu256 7 months ago
This will be a good tool for language immersion chinese / japanese / indonesian along with the deepl clipboard tool, edge browsers tts engine.
@AdrianC2006Uk 4 months ago
That image gen project was pukka!
@ShariqueAM a day ago
I want to do speech to text Audio from the browser speaker and not from the mic , how can we do that in real time ?
@prakashsahu-xn6qy a month ago
how can i get this code which you used in this videos same code i need.
@kimsteinhaug 8 months ago
Interesting stuff on the image creation at the end while talking, not sure if you are taking into consideration punctuation in your sentences? I'm pretty sure this would have to do with something cool, maybe keeping an overview of all the text that has been moving out of the "buffer" for style? Looks like something I could have a lot of fun with, do not have the GPU though :/ Colab however.
@hjoseph777 2 months ago
I have been looking where to start, fantastic work, where can I have the code for testing
@110gotrek 8 months ago
Now make it translate and do phone calls
@rne1223 8 months ago
Noooo…pls nooo. We got plenty auto callers already.
@ibrahimelshenhapy9179 7 months ago
​@@rne1223 Where?
@luluw9699 a day ago
Hello ur computer has a virus
@huhaifan 2 days ago
cannot find the code in github
@martinvizar6430 7 months ago
Impresario thank you
@magnoliasphinkter8622 18 days ago
thanks this is great! Where can I find the actual code you have on your screen? Struggling to find it on the github
@thedoctor5478 8 months ago
I think there's an even faster whisper module but I forget what it's called
@AustinKang-wk8cl 8 days ago
did you find out?
@mattaylor-qg4yw 5 months ago
just joined. would be good to get my grubby paws on the files for this.
@t-dsai 8 months ago
Thanks for sharing your knowledge/experience. I'm bit perplexed. The description here mentions 45+ prompts in the PDF book, the newsletter website says 40+, and the PDF doc says 35+. Which number is correct?
@aseel6910 5 months ago
If there any way to translate this text to another languages it will be awesome
@mujahidali2369 a month ago
welldone
@عبدالرحيمعبدالرحيم-غ5غ 8 months ago
could you do another demo to see how it can translate in real time?
@gregh7457 8 months ago
yes! there are no really good or fast translation apps available. KZbin auto translate is horrible!
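Whisper models can translate speech into English directly, and faster-whisper exposes this through the `task` argument of `WhisperModel.transcribe`. A minimal sketch, assuming a model that was loaded elsewhere; `translate_chunk` is a hypothetical helper name. Translation into languages other than English would need a separate translation step after transcription.

```python
def translate_chunk(model, file_path, source_language=None):
    """Return an English translation of the speech in file_path.

    model is expected to be a faster_whisper.WhisperModel (or any object
    with a compatible transcribe method). source_language is an optional
    language code such as "de"; when None, the model detects the language.
    """
    segments, _info = model.transcribe(
        file_path,
        task="translate",  # Whisper's built-in speech-to-English task
        language=source_language,
    )
    # segments is an iterable of objects that carry a .text attribute
    return "".join(segment.text for segment in segments).strip()
```

In a real-time loop this would replace the plain transcription call for each recorded chunk.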
@eliasbosc 12 days ago
Can you pls share you code?
@ItsNsour 5 months ago
can it translate?
@himanshujaviya6021 4 months ago
Can we get the code used in this video that would be really helpful
@claudiobalderrama1599 6 months ago
Do you think this could be used to transcribe, for example, phone calls made through the browser? I would greatly appreciate your response :)
@kebman 8 months ago
The sentiment analysis really scares me. I mean, there's absolutely no chance that'll be abused by big tech in terms of political marketing. I mean, like, there's no way in hell right?
@George-kx8fl 7 months ago
Would it be possible to do speaker recognition then pipe it into translation
@jotixh 5 months ago
Is there a way to connect a live streaming url?
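One way to feed a live stream into a chunked transcription loop is to let ffmpeg decode the URL to raw 16 kHz mono PCM and read fixed-size chunks from its standard output. A minimal sketch, assuming the `ffmpeg` command-line tool is installed and can open the stream; `ffmpeg_pcm_command` and `read_pcm_chunks` are hypothetical helper names, and the URL in the usage comment is a placeholder:

```python
import subprocess  # only needed for the usage shown in the comment below


def ffmpeg_pcm_command(url, rate=16000):
    """Build an ffmpeg command that writes mono 16-bit PCM to stdout."""
    return [
        "ffmpeg", "-loglevel", "quiet",
        "-i", url,         # input stream URL
        "-vn",             # drop any video track
        "-ac", "1",        # mono
        "-ar", str(rate),  # resample to the requested rate
        "-f", "s16le",     # raw signed 16-bit little-endian samples
        "-",               # write to stdout
    ]


def read_pcm_chunks(stream, seconds=1.0, rate=16000, sample_width=2):
    """Yield consecutive chunks of up to `seconds` of audio from a binary stream."""
    chunk_bytes = int(rate * seconds) * sample_width
    while True:
        data = stream.read(chunk_bytes)
        if not data:
            break
        yield data


# Usage (not run here):
# proc = subprocess.Popen(ffmpeg_pcm_command("https://example.com/live"),
#                         stdout=subprocess.PIPE)
# for chunk in read_pcm_chunks(proc.stdout):
#     ...  # write the chunk to a WAV file and transcribe it
```

The last chunk may be shorter than the others when the stream ends.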
@leucome 8 months ago
Faster whisper and Insanely Fast Whisper don't seem to have AMD GPU support yet. So I had to go with an alternative for the 7900xt. I used whisper.cpp with cuda/HIP + distilled whisper model. Seriously this combination is kinda real-time too, even when using the distil large v2. Though there is a downside to that, the TTS and Whisper on the GPU gobble up like 8GB of VRAM. This puts some limit on the LLM model I can use at the same time.
@maxstauss9579 4 months ago
i cant find the script of the realtime translation pls help me finding it :((
@gmazuel a month ago
Where can I find the code?
@lutusp 8 months ago
Hey, it's in your video description, therefore easily fixed: the word is "transcription". Why not avoid the irony of a video that extols modern AI voice to text ... transcription ... in which the AI engine will surely avoid this mistake, and at the speed of light.
@agardner-to7vi 2 months ago
that is awesome. Sooo i am trying to do something like this. My sister is deaf and i want something that can also just label the who is speaking. So for a small group it will say user 1 user 2 user 3. and who ever is speaking it will let person know. Do you think that is possible.. How could i do that. I got everything but that last part.
@maverick1901 7 months ago
running fully local is one thing ... doing this via webaudio api towards a backend is a different topic - is there any implementation for that as well foreseen?
@kebman 8 months ago
I might be jaded but... I mean really, how about an AI that calculates the probability of drone attacks or artillery attacks? How about an AI that calculates the probability of soldiers hiding in terrain? I mean, there are already good search algorithms out there, that one may-or-may-not use to carry out artillery strikes. I'm just thinking aloud here. Probably nothing.
@saqqara6361 3 months ago
how to access your sourcecode as a paid channel member?
@crazyforhyunwoo119 6 months ago
Can I do this with JavaScript?
@AlexPopov-hv3kp 4 months ago
what is a transcribe_chunk function in the code? Seems that it's not from faster_whisper?
@danielgh4814 7 months ago
Hi, I'm a subscriber but I do not have access to your GitHub, can you help me please?
@RicardoMaciasYepez6913 4 months ago
Can this run on raspberry pi?
@thnmanucian7993 5 months ago
Hello. I’m beginner in this major. How can I get your code to refer? Thank you
@kate-pt2ny 8 months ago
Kris, you are a genius. Real-time speech transcription can do a lot of things. The last example is great. I can’t wait to watch the video released on Wednesday. My computer is a Mac M chip computer. I found the code in your github and changed it to run on the CPU. Later, some problems occurred, such as incomplete transcribed content and OSError. Can you release a version suitable for Mac computers? grateful
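For machines without an NVIDIA GPU, such as Apple Silicon Macs, faster-whisper can run on the CPU, where a small model with 8-bit quantization is a reasonable starting point. A minimal sketch; `model_settings` and `load_model` are hypothetical helper names, and the default model size is an assumption to tune for the hardware. It does not address the OSError mentioned in the comment, whose cause is not given.

```python
def model_settings(use_gpu):
    """Return WhisperModel keyword arguments for GPU or CPU-only machines."""
    if use_gpu:
        # Half precision on a CUDA device
        return {"device": "cuda", "compute_type": "float16"}
    # int8 quantization keeps CPU inference lighter
    return {"device": "cpu", "compute_type": "int8"}


def load_model(size="tiny", use_gpu=False):
    """Load a faster-whisper model (requires the faster_whisper package)."""
    from faster_whisper import WhisperModel  # imported lazily

    return WhisperModel(size, **model_settings(use_gpu))
```

On a CPU, longer chunks (a few seconds instead of one) may also help, since very short chunks tend to cut words in half.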
@Siri-tz7dz 5 months ago
where do i get the setup/python code
@vallu-Tech 6 months ago
Bro can you put th video about live streaming voice to text
@MiguelCayazaya 2 months ago
pip install patience and kindness
@isaacmasinde1994 a month ago
Which gpu are you using ?
@Edward_ZS 8 months ago
Has anyone updated the code from the previous video to use this recording method instead?
@henrijohnson7779 7 months ago
@Kris : I already joined as an Adept member on Jan 18th 2024 and requested access to the Github Repo via email and also via Discord but have not had any response from you yet ?
@ytemre 6 months ago
I became a member how do I get access to the code and the github for this
@AllAboutAI 6 months ago
hello :D send me a e-mail at kris@allabtai.com
@haloBean 5 months ago
Hi, Can get the github repo of the above code ? Thanks
@digitalsoultech 8 months ago
The accuracy sucks. Many words are incorrect which you can see in the image itself. This isn't usable in the real world.
@ahmedelkamash9323 5 months ago
how can we download this script?
@kylebolt5861 8 months ago
How do we join your community?
@AllAboutAI 8 months ago
Link in desc :) youtube member
@najafzawar8168 8 months ago
@@AllAboutAI just subscribed to your channel but not getting GitHub code..
@joaopaulonadal8484 8 months ago
How can i get acess to this code?
@erenkaraboga8570 7 months ago
Can we take source code ?
@Onlyindianpj 2 months ago
This is Presentation not tutorial
@TonyHoangPodcast 5 months ago
does it support speaker diarization?
@avgplayer 8 months ago
Waiting for the in deep video :) Btw your discord invite link is expired.
@slimshady91bat a month ago
but is it free?
@nusretalikok823 8 months ago
where can we find the code that you used?
@crazyforhyunwoo119 6 months ago
github linked in the description
@maxstauss4821 4 months ago
I am a member but I can't access the GitHub, pls HELP
@maxstauss4821 4 months ago
this is my github maxaxaxaxxaxaxaax
@curtisnewton895 7 months ago
transcriPtion
@fufu9352 6 months ago
Zero latency? I checked your video timeline: the terminal output and the audio do not match. You must be living in a world 1-2 seconds ahead of our timeline. 😅
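A delay of this size is expected with chunked transcription: the text for a word cannot appear until its chunk has finished recording and has then been transcribed. A minimal sketch for measuring this, assuming 1-second chunks as in the code posted earlier in this thread; `timed` and `worst_case_latency` are hypothetical helper names:

```python
import time


def timed(func, *args):
    """Call func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def worst_case_latency(chunk_seconds, processing_seconds):
    """Delay for a word spoken at the very start of a chunk: it waits for
    the rest of the chunk to be recorded, then for the transcription."""
    return chunk_seconds + processing_seconds
```

Wrapping the transcription call, for example `text, seconds = timed(transcribe_chunk, model, chunk_file)`, gives the processing time to plug into `worst_case_latency(1.0, seconds)`.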
@AlphaScraperOne 6 months ago
🧡
@ramadanhasan1574 8 months ago
Where is the link to this source code ? Thanks amazing
@nafila5084 7 months ago
did you get the code
@ramadanhasan1574 6 months ago
no @@nafila5084
@KaMingLeung-kk6ey 5 months ago
@@nafila5084 Can share the code to me as well?
@Velnio_Išpera 6 months ago
Can you use different languages?
@tharosen-g4q 8 months ago
🎈
@vaibhavmishra1100 7 months ago
can you tell me the solution of this error : Could not load library cudnn_ops_infer64_8.dll. Error code 126 Please make sure cudnn_ops_infer64_8.dll is in your library path!
@劉育安 6 months ago
try "pip install nvidia-cudnn-cu12"
@vaibhavmishra1100 6 months ago
@@劉育安 it didn't work
@HungBui-r7z 5 months ago
I have registered as a member, please check your email
@nouriensha2873 13 days ago
Can i convert this code to cpp and implement using Arduino without api
@harshitsingh3061 8 months ago
where can we get the code
@crazyforhyunwoo119 6 months ago
github linked in the description.
@abdurrahmankeskin3716 3 months ago
how to get the code for this?
@ScaryLasers 2 months ago
how do i get access to the github?? TAKE MY MONEY! lol no but seriously how
@thebigbigdaddy 8 months ago
how can we identify different speakers?
@ickorling7328 6 months ago
Microsoft Copilot in a Teams call recording transcription. Can't simply call, needs to be a meeting call... subtle difference. Try 'meet now' in the Teams calendar view, or make a calendar event.
@royzac7829 7 months ago
How does the transcription performance compare to assemblyAI?
@fredericpaillot2570 8 months ago
Hi Kris! I love what you do, I would like to become a member of your channel, but I can't access the page to subscribe, do you have a direct link? the one in description doesn't work for me.. have a good day!
@MarxOrx 8 months ago
BROOOO 🎉 FIRST
@rahar6009 6 months ago
It is bs to make an open source code monetized! So sorry for you and your kinds... unsubs.
@radudamianov 8 months ago
Excellent! Thank you so much for sharing!