You asked for it - and I delivered | Live speech transcription with OpenAI Whisper STT

Views: 13,413

Comments

@Hardwareai 2 months ago

Support the channel by buying with these links (affiliate): Official Seeed Studio Store s.click.aliexpress.com/e/_DBfAaP9 s.click.aliexpress.com/e/_DkhpJAR Raspberry Pi 4 s.click.aliexpress.com/e/_DFBtkCP Raspberry Pi 5 s.click.aliexpress.com/e/_DmMSKb5

@dad2979 4 months ago

I have been following you for a long time and you have done such a fantastic job of crafting your style and keeping your content relevant. Great job Dmitry!

@Hardwareai 4 months ago

Thank you for leaving this comment! I'm still refining my style, to tell the truth. One thing I've been successful at recently (I think) is keeping my videos more to the point, with a good flow of information. Looking back, I was blabbering way too much in my older videos at times. I now cut a lot in post-processing if I feel the video is overloaded. I plan to make some more storytelling-oriented robotics content in the next half-year, so stay tuned and see how it goes.

@jackwarner5445 4 months ago

I'm trying to make an AI voice assistant and would be completely lost without your videos. Thanks so much!

@Hardwareai 4 months ago

Glad I could help!

@MrTubertub 3 months ago

Hi there, could you please advise on the best and easiest way to transcribe MP3 speech recordings to text with no coding experience at all? Thank you.

@Hardwareai 3 months ago

That's probably beside the topic, but I'd say something like MacWhisper for Mac? And try searching "whisper.cpp gui windows" for Windows.

@MarkD-p2h 3 months ago

Thank you for sharing your knowledge. I'm trying to do "float16" STT transcription with diarization using WhisperX on an 8GB Pi5, but "the ctranslate2 package does not compile with CUDA support." Per the WhisperX readme, I tried to install PyTorch for CUDA 11.8 using the PyTorch pip command, and then I tried the current version, before trying to install whisperx, with no joy. Apologies if this is a silly question, but is there a CUDA version that works on a Pi5 GPU (Broadcom VideoCore VII), or must I only use CPU CUDA? What do you recommend? Thanks!

@Hardwareai 3 months ago

CUDA is specific to Nvidia hardware, so it will not work without an Nvidia GPU :) float16 will not give you any performance benefit on a CPU, so use either float32 or int8.

@MarkD-p2h 3 months ago

@@Hardwareai Thank you so much for your kind reply! I'm learning much and I'm excited to make this project, which will help me greatly in my work in Geneva. ❤
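
A minimal sketch of the advice above, assuming the faster-whisper package (the CTranslate2-based backend that WhisperX builds on) is what loads the model. `pick_compute_type` and `load_model` are hypothetical helper names, not part of any library; only `WhisperModel` and its `device` and `compute_type` arguments come from faster-whisper.

```python
def pick_compute_type(device: str) -> str:
    """Return a CTranslate2 compute type suited to the device.

    Assumption: "float16" only pays off on an Nvidia GPU ("cuda");
    on a CPU such as the Raspberry Pi's, fall back to "int8".
    """
    return "float16" if device == "cuda" else "int8"


def load_model(model_size: str = "tiny.en", device: str = "cpu"):
    """Load a Whisper model with a compute type matching the device."""
    # Imported lazily so pick_compute_type works without faster-whisper installed.
    from faster_whisper import WhisperModel

    return WhisperModel(
        model_size,
        device=device,
        compute_type=pick_compute_type(device),
    )


if __name__ == "__main__":
    print(pick_compute_type("cpu"))   # int8
    print(pick_compute_type("cuda"))  # float16
```

On a Pi, `load_model("tiny.en", device="cpu")` would then request int8 weights; swapping the fallback to "float32" is the other option mentioned in the reply.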

@tomlynn1000 1 month ago

Is there a written process to follow? I've followed this step by step, but have run into Python telling me Illegal Instruction when I run the scripts. Using the latest Raspbian OS.

@Hardwareai 1 month ago

Hello! I enabled issues on my fork (github.com/AIWintermuteAI/WhisperLive/issues), so feel free to create an issue there with a detailed problem description and the steps you followed!

@wartem 20 days ago

How did you manage to get the Raspberry Pi 5 to work with the ReSpeaker 2-Mics Pi HAT? I'm encountering significant issues with deprecated or changed kernel APIs, channel counts, and header problems. I've tested, for example, the seeed-voicecard GitHub repository, which is seemingly not compatible with newer kernels and APIs. A guide on making this work would be incredibly helpful and greatly appreciated.

@Hardwareai 14 days ago

/sigh/ Yes, you're right, almost every major update of Raspberry Pi OS breaks reSpeaker. At the very least you want to start with HinTak's fork, which is more recent. Here is what I found out: github.com/HinTak/seeed-voicecard/issues/28 - try it out and respond on GH if this worked?

@wartem 14 days ago

@@Hardwareai Thank you! I'm still testing, but everything except the LEDs seems to work now after too many hours of troubleshooting. I will make a fork of HinTak's repository later and share my findings. I can't wait to try your tutorials here on YouTube when I'm done with this. I found that the shell script referred to via your link has been moved within the same repository. I've had no luck with this script so far: it runs the installation fine, but after reboot I get no sign of success when testing different things ("no soundcards found..." etc.). I can't comment on GH (HinTak) since I lack the permission needed.

@simplelife4441 3 months ago

Hi there, I tried to run example_client.py and it gives me an error:

client = TranscriptionClient(
TypeError: TranscriptionClient.__init__() got an unexpected keyword argument 'callback'

How can I fix it? Thanks.

@Hardwareai 2 months ago

It sounds like you're not actually using my code, but the upstream code? Can you create an issue here: github.com/AIWintermuteAI/WhisperLive
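
One way to check which variant of a class is actually installed is to inspect its constructor before calling it. The sketch below uses two hypothetical stand-in classes (`UpstreamLikeClient`, `ForkLikeClient`) rather than the real `TranscriptionClient`, whose exact signature is not shown in this thread; `accepts_keyword` is likewise an illustrative helper.

```python
import inspect


def accepts_keyword(cls: type, name: str) -> bool:
    """Return True if cls.__init__ accepts `name` as a keyword argument."""
    params = inspect.signature(cls.__init__).parameters
    if name in params:
        return params[name].kind is not inspect.Parameter.POSITIONAL_ONLY
    # A **kwargs catch-all also accepts arbitrary keywords.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins for the two versions; not the real classes.
class UpstreamLikeClient:
    def __init__(self, host, port, lang=None):
        self.host, self.port, self.lang = host, port, lang


class ForkLikeClient:
    def __init__(self, host, port, lang=None, callback=None):
        self.host, self.port, self.lang = host, port, lang
        self.callback = callback


if __name__ == "__main__":
    print(accepts_keyword(UpstreamLikeClient, "callback"))  # False
    print(accepts_keyword(ForkLikeClient, "callback"))      # True
```

Running the same check against the class imported in example_client.py would show whether the installed package is the fork or something else; printing the module's `__file__` attribute is another quick way to see where it was imported from.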

@АльбертИванов-ц4х 2 months ago

Thanks for the video. Is it possible to change the language? In the fork or in Whisper?

@Hardwareai 2 months ago

You can try the "tiny" model, without the ".en" suffix. Granted, multilingual models are not as precise, so perhaps you will need to use a larger model, e.g. base. That would stretch the Raspberry Pi's capabilities, but should be possible with a Raspberry Pi 5?
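
The naming rule in the reply above can be sketched as a small helper. OpenAI Whisper publishes English-only ".en" variants for the tiny, base, small and medium sizes; `whisper_model_name` itself is a hypothetical function written for illustration, not part of Whisper or WhisperLive.

```python
# Sizes for which OpenAI Whisper publishes an English-only ".en" variant.
SIZES_WITH_EN = {"tiny", "base", "small", "medium"}


def whisper_model_name(size: str, language: str) -> str:
    """Return the model name for a size and a language code such as "en" or "ru"."""
    if language == "en" and size in SIZES_WITH_EN:
        return f"{size}.en"
    # Any other language needs the multilingual model, which has no suffix.
    return size


if __name__ == "__main__":
    print(whisper_model_name("tiny", "en"))  # tiny.en
    print(whisper_model_name("base", "ru"))  # base
```

The returned name is what you would pass wherever the model is selected; how the fork exposes that setting is not covered in this thread.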

@ameetkarn 3 months ago

This is too good... I think this should fit directly into one of my projects. Do you have any recommendation for real-time TTS?

@Hardwareai 3 months ago

Hopefully! I used espeak before for other projects... it is pretty horrible by modern standards, but does its job. For this example I used piper TTS - much better quality, but not as fast as espeak.

@exploring-electronic 6 months ago

Thank you for making this follow up!

@Hardwareai 6 months ago

Appreciate your support

@justquicker5044 5 months ago

Thank you so much! You’ve really helped me speed up my project. I normally don’t like and subscribe but I made an exception 🙃. Keep it up!!

@Hardwareai 5 months ago

Thank you for your support!

@sarankumarb1911 10 days ago

Hi, I am getting the error below in the terminal running the server code:

INFO:websockets.server:connection open
INFO:root:New client connected
ERROR:root:Error during new connection initialization: [WinError 2] The system cannot find the file specified

And I get this in the terminal running the example client code:

[INFO]: * recording
[INFO]: Waiting for server ready ...
[INFO]: Opened connection
[INFO]: Websocket connection closed: 1000:

Can you please help me fix this?

@Hardwareai 10 days ago

Hi there! Can you create an issue in my fork of WhisperLive?

@bystander85 4 months ago

I've been trying to find a way to make the end-of-speech flag more intelligent than just detecting a pause. I find it common that I have a mental blank, or misspeak, and the delay in my speech incorrectly flags end of speech. It would be interesting if STT systems could continue listening after a pause when they detect an incomplete sentence. Any ideas?

@Hardwareai 4 months ago

That's a hard one. I don't think this is solved even in commercial STT engines, e.g. Google Assistant or Siri. It would require understanding of the sentence context. We might be getting somewhere with multi-modal models such as GPT-4o, but I don't think there is anything available that can run on a Raspberry Pi-class computer. Also, as a shortcut, perhaps it would be possible to either run a classifier or modify the Whisper model to output the probability of the sentence being finished... It's just an idea though; finding out how well it works is another thing entirely.
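
To make the "shortcut" idea concrete, here is a rough sketch that swaps the proposed classifier for a much cruder keyword-and-punctuation heuristic: if the partial transcript looks unfinished, require a longer silence before flagging end of speech. Everything here is an assumption for illustration: the function names, the word list, and both pause thresholds are made up, and it only helps if the STT output carries punctuation.

```python
TERMINAL_PUNCTUATION = (".", "?", "!")
# Words that rarely end a finished sentence (illustrative, far from complete).
DANGLING_WORDS = {
    "and", "or", "but", "so", "because", "the", "a", "an",
    "to", "of", "with", "that", "if",
}


def looks_complete(partial_text: str) -> bool:
    """Crude guess at whether a partial transcript is a finished sentence."""
    text = partial_text.strip()
    words = text.rstrip(".?!,").split()
    if not words:
        return False
    if words[-1].lower() in DANGLING_WORDS:
        return False
    return text.endswith(TERMINAL_PUNCTUATION)


def end_of_speech(partial_text: str, silence_s: float,
                  short_pause_s: float = 0.7, long_pause_s: float = 2.5) -> bool:
    """Decide whether to stop listening after `silence_s` seconds of silence."""
    # Finished-looking sentences end after a short pause; others get more time.
    required = short_pause_s if looks_complete(partial_text) else long_pause_s
    return silence_s >= required


if __name__ == "__main__":
    print(end_of_speech("Turn on the light.", silence_s=1.0))  # True
    print(end_of_speech("Turn on the", silence_s=1.0))         # False
```

A trained classifier or a model-provided end-of-sentence probability, as suggested in the reply, would replace `looks_complete` while the thresholding around it could stay the same.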

@shakhizatnurgaliyev9355 6 months ago

Like!!! Dima, awesome content. What do you think about the VOSK API, and how does it compare to Whisper? Great example of PiperTTS. Thank you!

@Hardwareai 5 months ago

Thanks, appreciate it! I'll try it out and compare it - I don't think I'll make a video about it, but maybe a blog article :)

@georgeknerr 6 months ago

Excellent work, keep it up!!! Shared on Twitter too.

@Hardwareai 6 months ago

Thanks for sharing!!

@glikoz 3 months ago

Please advise the hardware setup for offline RAG, TTS, STT

@Hardwareai 3 months ago

Hard to estimate without knowing the details?

@ameetkarn 3 months ago

hi, I am getting following error while running the fork..any ideas ? A module that was compiled using NumPy 1.x cannot be run in NumPy 2.0.0 as it may crash. To support both 1.x and 2.x versions of NumPy, modules must be compiled with NumPy 2.0. Some module may need to rebuild instead e.g. with 'pybind11>=2.12'. If you are a user of the module, the easiest solution will be to downgrade to 'numpy

@Hardwareai 3 months ago

Uh-oh. Can you create a GitHub issue for that?

@emiliosanchez_ 8 days ago

Do you still have this problem? I managed to solve it with "pip install "numpy

@ameetkarn 7 days ago

@@emiliosanchez_ Yes, it was fixed. Thanks for the help though.

@Garfunnckel 5 days ago

How did you fix it, @ameetkarn?

@Garfunnckel 5 days ago

@@ameetkarn How did you fix it?
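
The error quoted earlier in this thread means a compiled module was built against NumPy 1.x but is running under NumPy 2.0.0. A common workaround is to pin NumPy below 2.0 (for example `pip install "numpy<2"`); the exact command in the comments above is cut off, so treat that as an assumption rather than a quote. The sketch below only checks whether the installed version falls in the affected range; `major_version` and `numpy_needs_downgrade` are hypothetical helper names.

```python
from importlib import metadata


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as "2.0.0"."""
    return int(version.split(".")[0])


def numpy_needs_downgrade(version: str) -> bool:
    """True if modules compiled against NumPy 1.x may fail with this version."""
    return major_version(version) >= 2


if __name__ == "__main__":
    try:
        installed = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        print("NumPy is not installed")
    else:
        print(installed, "-> downgrade suggested:", numpy_needs_downgrade(installed))
```

Upgrading the affected compiled module to a release built for NumPy 2 is the longer-term alternative to pinning.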

@Garfunnckel 24 days ago

Would this work on a Raspberry Pi 4B?

@seaniversongeronimo2691 23 days ago

secret dude

@Hardwareai 23 days ago

Absolutely. Just a bit slower, but in my experiments in an earlier video it was still possible to get real-time transcription with the tiny.en model.

@Garfunnckel 4 days ago

Good day! I created a fork of your GitHub repository. Can you please help me?

@hjoseph777 2 months ago

Do you have a Discord account? I need a consultation for a project I am working on.

@Hardwareai 2 months ago

Hi! I don't use Discord a lot. For consultations I do LinkedIn or Patreon - links are in my channel description!

@sathishkumarB-h9m 3 months ago

How many languages does Whisper support?

@Hardwareai 3 months ago

You can see them here github.com/openai/whisper#available-models-and-languages

@Hazar-bt6nf 3 months ago

Can a Raspberry Pi 5 run Whisper using Python?

@Hardwareai 3 months ago

Yes, absolutely!

@66Tomini 6 months ago

Do you use the 8GB Raspberry Pi?

@Hardwareai 6 months ago

Yes, Raspberry Pi 5 8 GB - but RAM is hardly relevant here, for the tiny.en model.

@isaacfranklin2712 4 months ago

@@Hardwareai Thinking of getting the Pi 4 with 1GB RAM. Hopefully it shouldn't be an issue to replicate.

@domesticatedviking 5 months ago

Hey, just wanted to say I really appreciated your last two videos. Will you please be my sensei? Thank you!!

@Hardwareai 5 months ago

I appreciate your appreciation! xD I'd say that I'm already a sensei of sorts... You can always support me on Patreon for some extras, but otherwise simply stay tuned for more videos!

@muhammadanan9190 29 days ago

How can I solve this issue? (I changed onnxruntime==1.16.0 to onnxruntime==1.17.0, and my Python version is 3.12.4, if this matters in any way!) The error is given below:

ERROR: Cannot install piper-tts==1.1.0 and piper-tts==1.2.0 because these package versions have conflicting dependencies.
The conflict is caused by:
piper-tts 1.2.0 depends on piper-phonemize~=1.1.0
piper-tts 1.1.0 depends on piper-phonemize~=1.0.0

@Hardwareai 29 days ago

Hello! Can you create an issue in my GH fork, with the exact command you were trying to run and a problem description?

@muhammadanan9190 29 days ago

@@Hardwareai Tysm for the reply ❤️ Can you do a video on WhisperFusion by Collabora, maybe with real-time speech-to-text?

@Onlyindianpj 3 months ago

A real implementation would use WebSocket. The idea is: the app transmits raw PCM 16k audio, the WS server captures those audio packets, sends them to Whisper to get a transcription, and returns it to the app as JSON.

@Hardwareai 3 months ago

This is pretty much how WhisperLive works, no?

@Onlyindianpj 3 months ago

@@Hardwareai You are not using WhisperLive.
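
The server-side step discussed in this thread (receive a raw PCM packet, transcribe it, reply with JSON) can be sketched without any networking. This is not WhisperLive's code: it assumes "PCM 16k" means 16 kHz, 16-bit, little-endian mono audio, `fake_transcribe` is a placeholder for the Whisper call, and the JSON field names are invented for illustration.

```python
import array
import json
import sys

SAMPLE_RATE = 16_000  # Hz; assumed meaning of "16k" in the comment above


def pcm16_to_floats(chunk: bytes) -> list[float]:
    """Decode little-endian 16-bit mono PCM into floats in [-1.0, 1.0)."""
    samples = array.array("h")
    samples.frombytes(chunk)  # raises ValueError on an odd number of bytes
    if sys.byteorder == "big":
        samples.byteswap()  # the wire format is assumed little-endian
    return [s / 32768.0 for s in samples]


def fake_transcribe(samples: list[float]) -> str:
    """Placeholder for the Whisper call; returns a dummy string."""
    return f"<{len(samples)} samples>"


def handle_chunk(chunk: bytes) -> str:
    """Turn one audio packet into the JSON message sent back to the app."""
    samples = pcm16_to_floats(chunk)
    return json.dumps({
        "text": fake_transcribe(samples),
        "duration_s": len(samples) / SAMPLE_RATE,
    })


if __name__ == "__main__":
    one_second_of_silence = b"\x00\x00" * SAMPLE_RATE
    print(handle_chunk(one_second_of_silence))
```

In a real server, `handle_chunk` would be called from the WebSocket message handler, and audio would normally be buffered across packets before being passed to the model.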

@levbereggelezo 5 months ago

Thx

@Hardwareai 5 months ago

Appreciate it!

@emiliosanchez_ 8 days ago

Thanks for your work! Once I have the server side running, when I launch the client side, I get this error:

INFO:websockets.server:connection open
INFO:root:New client connected
ERROR:root:Error during new connection initialization: [ONNXRuntimeError] : 1 : FAIL : Load model from /home/pi/.cache/whisper-live/silero_vad.onnx failed:/onnxruntime_src/onnxruntime/core/graph/model.cc:134 onnxruntime::Model::Model(onnx::ModelProto&&, const PathString&, const IOnnxRuntimeOpSchemaRegistryList*, const onnxruntime::logging::Logger&, const onnxruntime::ModelOptions&) ModelProto does not have a graph.

Any thoughts?

@Garfunnckel 4 days ago

> I ran into an issue when executing example_client.py; it throws this error:
>
> $ python examples/example_client.py
> Traceback (most recent call last):
>   File "/home/rpi/WhisperLive/examples/example_client.py", line 26, in
>     client = TranscriptionClient(
>              ^^^^^^^^^^^^^^^^^^^^
> TypeError: TranscriptionClient.__init__() got an unexpected keyword argument 'callback'
>
> Using the following versions:
> - Raspberry Pi OS 64-bit Debian 12 (bookworm)
> - Python 3.11.2