You asked for it - and I delivered | Live speech transcription with OpenAI Whisper STT

7,046 views

3 months ago

Run live speech transcription on Raspberry Pi 5 with faster-whisper and WhisperLive: see the transcription results as they are processed, then send the final output to an LLM or TTS. WhisperLive uses PyAudio for audio capture, which is less finicky than SDL2. Tested with two microphones: ReSpeaker 2-Mics Pi HAT and ReSpeaker USB Mic Array.
How to make whisper.cpp transcribe faster? (audio_ctx explanation)
/ how-to-make-cpp-102337630
Microphones:
www.seeedstudio.com/ReSpeaker...
www.seeedstudio.com/ReSpeaker...
Barebone single-threaded implementation of FasterWhisper transcription from the microphone
gist.github.com/AIWintermuteA...
My fork of whisper.cpp Python bindings
github.com/AIWintermuteAI/whi...
My fork of WhisperLive - git clone this to follow along with the video
github.com/AIWintermuteAI/Whi...
faster-whisper repository
github.com/SYSTRAN/faster-whi...
Piper TTS
github.com/rhasspy/piper
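
A minimal sketch of the capture-then-transcribe flow described above, assuming PyAudio for microphone input and faster-whisper for inference. This is not the code from the gist or the fork: the helper names `reads_needed` and `record_and_transcribe`, the 5-second window, and the chunk size are illustrative assumptions.

```python
import math

SAMPLE_RATE = 16000  # Whisper models expect 16 kHz mono audio
CHUNK = 1024         # frames per PyAudio read (arbitrary choice for this sketch)


def reads_needed(seconds: float, rate: int = SAMPLE_RATE, chunk: int = CHUNK) -> int:
    """Number of stream.read() calls needed to cover `seconds` of audio."""
    return math.ceil(seconds * rate / chunk)


def record_and_transcribe(seconds: float = 5.0, model_size: str = "tiny.en") -> str:
    """Record a fixed window from the default microphone and transcribe it."""
    # Imported lazily so the helper above can be used without audio hardware.
    import numpy as np
    import pyaudio
    from faster_whisper import WhisperModel

    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=SAMPLE_RATE,
                     input=True, frames_per_buffer=CHUNK)
    try:
        frames = [stream.read(CHUNK, exception_on_overflow=False)
                  for _ in range(reads_needed(seconds))]
    finally:
        stream.stop_stream()
        stream.close()
        pa.terminate()

    # 16-bit PCM -> float32 in [-1, 1), the waveform format faster-whisper accepts.
    audio = np.frombuffer(b"".join(frames), dtype=np.int16).astype(np.float32) / 32768.0

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio, language="en")
    return " ".join(segment.text.strip() for segment in segments)
```

Calling `record_and_transcribe()` on a machine with a working microphone records one window and returns its text. A live setup would instead keep reading chunks in a loop and re-transcribe a rolling buffer.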

Comments: 43

@dad2979 2 months ago

I have been following you for a long time and you have done such a fantastic job of crafting your style and keeping your content relevant. Great job Dmitry!

@Hardwareai 2 months ago

Thank you for leaving this comment! I'm still refining my style, to tell the truth. One of the things I've been successful at recently (I think) is keeping my videos more to the point, with a good flow of information. Now it looks to me like I was blabbering way too much at times in my older videos. I cut a lot of stuff in post-processing now if I feel the video is overloaded. I plan to make some more storytelling-oriented robotics content in the next half-year, so stay tuned and see how it goes.

@exploring-electronic 3 months ago

Thank you for making this follow up!

@Hardwareai 3 months ago

Appreciate your support

@georgeknerr 3 months ago

Excellent work, keep it up!!! Shared on Twitter too.

@Hardwareai 3 months ago

Thanks for sharing!!

@jackwarner5445 1 month ago

I'm trying to make an AI voice assistant and would be completely lost without your videos. Thanks so much!

@Hardwareai 1 month ago

Glad I could help!

@justquicker5044 3 months ago

Thank you so much! You’ve really helped me speed up my project. I normally don’t like and subscribe but I made an exception 🙃. Keep it up!!

@Hardwareai 3 months ago

Thank you for your support!

@inout3394 3 months ago

Thx

@Hardwareai 3 months ago

My pleasure! Thanks for commenting!

@domesticatedviking 3 months ago

Hey, just wanted to say I really appreciated your last two videos. Will you please be my sensei? Thank you!!

@Hardwareai 3 months ago

I appreciate your appreciation! xD I'd say that I'm already a sensei of sorts... You can always support me on Patreon for some extras, but otherwise simply stay tuned for more videos!

@ameetkarn 23 days ago

This is too good... I think this should fit in directly with one of my projects. Do you have any recommendation for real-time TTS?

@Hardwareai 22 days ago

Hopefully! I used espeak before for other projects... it is pretty horrible by modern standards, but does its job. For this example I used piper TTS - much better quality, but not as fast as espeak.

@MrTubertub 17 days ago

Hi there, could you please advise what is the best and easiest way to transcribe MP3 speech recordings to text with no coding experience at all? Thank you

@Hardwareai 13 days ago

That's probably off topic, but I'd say something like MacWhisper for Mac? And try searching "whisper.cpp gui windows" for Windows.

@shakhizatnurgaliyev9355 3 months ago

Like!!! Dima, awesome content. What do you think about the VOSK API, and how does it compare to Whisper? Great example of Piper TTS. Thank you!

@Hardwareai 3 months ago

Thanks, appreciate it! I'll try it out and compare it - I don't think I'll make a video about it, but maybe a blog article :)

@bystander85 1 month ago

I've been trying to find a way to make the end-of-speech flag more intelligent than just detecting a pause. It's common for me to have a mental blank or misspeak, and the delay in my speech incorrectly flags the end of speech. It would be interesting if STT systems could continue listening after a pause when they detect an incomplete sentence. Any ideas?

@Hardwareai 1 month ago

That's a hard one. I don't think this is solved even in commercial STT engines, e.g. Google Assistant or Siri. It would require an understanding of sentence context. We might be getting somewhere with multi-modal models such as GPT-4o, but I don't think there is anything available that can run on a Raspberry Pi class computer. As a shortcut, perhaps it would be possible to either run a classifier or modify the Whisper model to output the probability of the sentence being finished... It's just an idea though; finding out how well it would work is another thing entirely.
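
A rough sketch of the "classifier" shortcut mentioned above, using a hand-written word list in place of a trained model. This is a hypothetical heuristic, not something from the video or from any STT engine: the names `looks_incomplete` and `end_of_speech`, the word list, and both timeouts are assumptions chosen for illustration.

```python
# Words that rarely end a finished English sentence (illustrative, not exhaustive).
TRAILING_WORDS = {"and", "but", "or", "so", "because", "the", "a", "an",
                  "to", "of", "with", "that", "um", "uh"}


def looks_incomplete(text: str) -> bool:
    """Crude stand-in for a classifier: does the partial transcript look unfinished?"""
    stripped = text.strip()
    if not stripped:
        return False
    if stripped.endswith((",", "...", "-")):
        return True
    words = stripped.rstrip(".?!").split()
    return bool(words) and words[-1].lower() in TRAILING_WORDS


def end_of_speech(text: str, silence_s: float,
                  base_timeout_s: float = 0.8,
                  extended_timeout_s: float = 2.5) -> bool:
    """Flag end of speech after a pause, waiting longer if the text looks unfinished."""
    timeout = extended_timeout_s if looks_incomplete(text) else base_timeout_s
    return silence_s >= timeout
```

With these defaults, a one-second pause after "Turn on the lights" ends the utterance, while the same pause after "Turn on the" keeps listening until 2.5 seconds of silence have passed.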

@MarkD-p2h 16 days ago

Thank you for sharing your knowledge. I'm trying to do "float16" STT transcription with diarization using WhisperX on an 8GB Pi5, but "the ctranslate2 package does not compile with CUDA support." Per the whisperx readme, I tried to install pytorch v11.8 from the PyTorch pip command, and then I tried the current version, before trying to install whisperx with no joy. Apologies if this is a silly question, but is there a CUDA version that works on a Pi5 GPU (Broadcom VideoCore VII), or must I only use CPU CUDA? What do you recommend? Thanks!

@Hardwareai 13 days ago

CUDA is specific to Nvidia hardware, so it will not work without an Nvidia GPU :) float16 will not give you any performance benefit on CPU, so use either float32 or int8.

@MarkD-p2h 13 days ago

@@Hardwareai Thank you so much for your kind reply! I'm learning much and I'm excited to make this project, which will help me greatly in my work in Geneva. ❤
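
A minimal sketch of the advice in this thread, assuming faster-whisper is the library in use. The helper names `choose_compute_type` and `load_model` and the `quantize` flag are hypothetical; only `WhisperModel` and its `device` and `compute_type` arguments come from faster-whisper.

```python
def choose_compute_type(device: str, quantize: bool = True) -> str:
    """Pick a compute type: float16 only on CUDA, int8 or float32 on CPU."""
    if device == "cuda":
        return "float16"
    return "int8" if quantize else "float32"


def load_model(model_size: str = "tiny.en", device: str = "cpu", quantize: bool = True):
    """Load a faster-whisper model with a compute type suited to the device."""
    # Imported lazily so choose_compute_type() works without faster-whisper installed.
    from faster_whisper import WhisperModel

    return WhisperModel(model_size, device=device,
                        compute_type=choose_compute_type(device, quantize))
```

On a Raspberry Pi 5, `load_model()` would therefore request `device="cpu"` with `compute_type="int8"`.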

@glikoz 28 days ago

Please advise on a hardware setup for offline RAG, TTS, and STT.

@Hardwareai 25 days ago

Hard to estimate without knowing the details.

@ameetkarn 17 days ago

Hi, I am getting the following error while running the fork. Any ideas? A module that was compiled using NumPy 1.x cannot be run in NumPy 2.0.0 as it may crash. To support both 1.x and 2.x versions of NumPy, modules must be compiled with NumPy 2.0. Some module may need to rebuild instead e.g. with 'pybind11>=2.12'. If you are a user of the module, the easiest solution will be to downgrade to 'numpy

@Hardwareai 13 days ago

Uh-oh. Can you create a GitHub issue for that?
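
When filing an issue like this, it helps to report which NumPy series is installed. A small sketch using only the standard library; the helper names `major_version` and `numpy_status` are made up for this example.

```python
from importlib import metadata


def major_version(version: str) -> int:
    """Return the leading component of a version string such as '2.0.0'."""
    return int(version.split(".")[0])


def numpy_status() -> str:
    """Describe the installed NumPy version without importing NumPy itself."""
    try:
        installed = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return "numpy is not installed"
    if major_version(installed) >= 2:
        return (f"numpy {installed}: extensions compiled against "
                "NumPy 1.x may fail to import")
    return f"numpy {installed}: 1.x series"
```

Running `print(numpy_status())` in the same virtual environment as the fork gives a one-line summary to paste into the issue.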

@simplelife4441 7 days ago

Hi there, I tried to run example_client.py and it gives me an error:
client = TranscriptionClient(
TypeError: TranscriptionClient.__init__() got an unexpected keyword argument 'callback'
How can I fix it? Thanks

@Hardwareai 2 days ago

It sounds like you're not actually using my code, but the upstream code? Can you create an issue here: github.com/AIWintermuteAI/WhisperLive
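
One way to check which variant of a class is actually being imported is to inspect its constructor signature. The sketch below uses two stand-in classes with made-up parameters; they are not the real `TranscriptionClient` signatures from either repository, and `accepts_keyword` is a hypothetical helper.

```python
import inspect


def accepts_keyword(func, name: str) -> bool:
    """True if `func` can be called with `name=` as a keyword argument."""
    params = inspect.signature(func).parameters
    keyword_kinds = (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                     inspect.Parameter.KEYWORD_ONLY)
    if name in params and params[name].kind in keyword_kinds:
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Stand-ins for illustration only; the real classes take different arguments.
class ClientWithoutCallback:
    def __init__(self, host, port):
        self.host, self.port = host, port


class ClientWithCallback:
    def __init__(self, host, port, callback=None):
        self.host, self.port, self.callback = host, port, callback
```

Running `accepts_keyword(TranscriptionClient, "callback")` against the class you actually import, together with `inspect.getsourcefile(TranscriptionClient)`, shows whether Python picked up the fork or another installed copy.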

@sathishkumarB-h9m 13 days ago

How many languages does Whisper support?

@Hardwareai 13 days ago

You can see them here: github.com/openai/whisper#available-models-and-languages

@Andriu66 3 months ago

Do you use the 8 GB Raspberry Pi?

@Hardwareai 3 months ago

Yes, Raspberry Pi 5 8 GB - but RAM is hardly relevant here, for the tiny.en model.

@isaacfranklin2712 2 months ago

@@Hardwareai Thinking of getting the Pi 4 with 1 GB RAM. Hopefully it shouldn't be an issue to replicate.

@Hazar-bt6nf 19 days ago

Can the Raspberry Pi 5 run Whisper using Python?

@Hardwareai 18 days ago

Yes, absolutely!

@Onlyindianpj 15 days ago

A real implementation uses WebSocket. The idea is that the app transmits raw 16 kHz PCM audio, the WS server captures those audio packets and sends them to Whisper to get the transcription, and the result is returned to the app as JSON.

@Hardwareai 13 days ago

This is pretty much how WhisperLive works, no?

@Onlyindianpj 13 days ago

@@Hardwareai you are not using whisperlive
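
A sketch of the per-packet step in the flow described in this thread: raw 16-bit PCM in, JSON out. It leaves out the WebSocket transport and the model, and it does not reproduce WhisperLive's actual message format. The names `pcm16_to_floats` and `handle_packet`, the JSON fields, and the little-endian assumption are all illustrative.

```python
import array
import json
import sys

SAMPLE_RATE = 16000  # 16 kHz mono, as in the comment above


def pcm16_to_floats(packet: bytes) -> list[float]:
    """Decode little-endian 16-bit PCM bytes into floats in [-1, 1)."""
    samples = array.array("h")
    samples.frombytes(packet)  # raises ValueError on an odd number of bytes
    if sys.byteorder == "big":
        samples.byteswap()     # the wire format is assumed to be little-endian
    return [s / 32768.0 for s in samples]


def handle_packet(packet: bytes, transcribe) -> str:
    """Decode one audio packet, run the supplied transcriber, return a JSON reply."""
    audio = pcm16_to_floats(packet)
    text = transcribe(audio)  # e.g. a wrapper around a Whisper model
    return json.dumps({"text": text, "duration_s": len(audio) / SAMPLE_RATE})
```

A WebSocket server would call `handle_packet` for each received binary message (or for a buffered group of them) and send the returned string back to the app.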