Can you share the weights epoch_2nd_0036.pth? Thank you, I hope you can share them with me; I don't have many resources to train again.
@jayr7741 · 2 days ago
Does it support multiple languages?
@Linguflex · 2 days ago
Nope, the dataset I used to train the classification model only has English sentences.
@jijainth · 4 days ago
Hey, can you make this a public repo? I would like to test it out, thanks.
@Linguflex · 4 days ago
Repo is here: github.com/KoljaB/WhoSpeaks
@42ndMoose · 4 days ago
At 0:38, regarding the mention of DistilBERT and the chunk that got omitted because of it: will there potentially be a way to input new words for zero-shot fine-tuning? At 2:25, I noticed that the model decided to omit the word "like". This is very good for voice agents, but there are some use cases where it is necessary to include certain misspoken words (needed for an accurate transcript or additional nuance), so I suggest a threshold slider, perhaps? Will it ever be able to focus on one speaker if another speaker talks over the first one? Or can it take account of the second speaker at the same time? Or will it only assume one speaker at all times?
@Linguflex · 4 days ago
Good observations. At 0:38 it seems like the sentence got detected too quickly. It needs some deeper analysis of those edge cases to figure out why the algorithm cuts into the sentence. You can fine-tune Whisper to handle new words and then use the fine-tuned models directly with RealtimeSTT: huggingface.co/blog/fine-tune-whisper For 2:25, this looks like a Whisper quirk: Whisper tends to remove filler words like "ah" and repetitions. It might improve if I tweak the initial_prompt behavior to use it only for the real-time model while removing it from the final transcription. Whisper tends to lose some accuracy when prompted, but in this case it's necessary for real-time processing. Unfortunately, there's no simple parameter to make Whisper more accurate on filler words. If that's critical, you'd need to switch to a different ASR model; there are some that deliver precise transcriptions even of filler words, though that brings its own set of pros and cons. Regarding speaker handling, Whisper doesn't support speaker diarization natively; it always assumes a single speaker. Real-time speaker diarization is a whole other rabbit hole. Not impossible, but very complex to pull off.
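To illustrate the threshold-slider idea from 2:25: Whisper drops fillers before you ever see them, so configurable filler handling would have to be a post-processing step on an ASR that preserves them. A minimal sketch; the function name and filler list are my own assumptions, not RealtimeSTT or Whisper API:

```python
# Hypothetical post-processing for an ASR that preserves filler words.
# Note the limitation: naive word matching cannot tell filler "like"
# apart from the verb "like", which is part of why this is tricky.
FILLERS = {"uh", "um", "ah", "like"}

def strip_fillers(words, keep_fillers=False):
    """Drop common filler words from a transcript word list.

    With keep_fillers=True (the 'accurate transcript' use case),
    the words pass through untouched.
    """
    if keep_fillers:
        return list(words)
    return [w for w in words if w.lower().strip(".,!?") not in FILLERS]
```

A real "slider" could expand or shrink the filler set, or act on per-word confidence scores if the ASR exposes them.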
@kanavwastaken · 4 days ago
Uh, hasn't Edge-TTS been here for... years?
@Linguflex · 4 days ago
You're right, the title's a bit misleading. Edge-TTS itself isn't new, but what's new is the real-time aspect. You can now use LLM outputs and get instant TTS results, or load massive texts and hear the output in just a second.
@daviruela2055 · 4 days ago
FYI I am a huge fan of your work. Thank you!
@herofahimshahriargaming8288 · 8 days ago
Can you make a tutorial video on how to use this RealtimeTTS library? Please?
@Linguflex · 8 days ago
I know better docs and tutorials are needed. The thing is, I love building stuff, but I hate documenting. Plus, I’m a bit of a perfectionist, and making a proper tutorial that covers everything RealtimeTTS can do would take me weeks. :)
@wzct · 11 days ago
amazing
@ivancain · 12 days ago
Please don't stop your work. I am now following you very closely! Right behind you, chief! Your work is amazing and overlaps significantly with my own AI project. Following you on GitHub now... please don't stop!
@daveacorn782 · 12 days ago
How is the progress on this? Very interested to try this.
@Linguflex · 12 days ago
Latency is way better. The Linguflex core is growing nicely. The home assistant module hasn't seen much progress. Honestly, I could really use help (PRs welcome). Got way too many projects, so some stuff is a bit stuck in "proof of concept" mode for now.
@kritikusi-666 · 12 days ago
so erotic lol
@extensy · 13 days ago
Thank you so much for adding styletts!
@vohiepthanh9692 · 13 days ago
It's great work, dude 🤟
@anarhi17 · 14 days ago
Got this error at the end (while speaking): "WebSocket error: [WinError 10054] An existing connection was forcibly closed by the remote host" (printed twice). Any ideas? Should I open the Python file in another terminal while this one is running?
@Linguflex · 13 days ago
This code is already outdated. Please use the main installation (pip install realtimestt) from the master branch, then start the client with the `stt` CLI command. If you still run into problems, I suggest we discuss it on my Discord in the RealtimeSTT section (discord.com/invite/f556hqRjpv) or per mail ([email protected]); happy to assist.
@ashsingh2175 · 14 days ago
Bro, your voice synthesis is so fast! How did you speed it up? For me it takes more than a few seconds to start.
@Linguflex · 13 days ago
Good question. It likely depends on the Edge service availability and your internet speed. My system is pretty fast, but some imports required for other engines might affect performance on slower CPUs.
@HehehehHeh-g1o · 16 days ago
Could you show the installation process as a video? I am very lost with the GitHub haha. Thanks!
@Linguflex · 15 days ago
Hey! I get why a video would help, but it's tricky for me since my system is already fully set up for AI. On my setup it's just `pip install RealtimeSTT` because everything (Python, CUDA, etc.) is already installed. Also, installation is very different for Linux and Windows. For example, on Linux, Python is pre-installed and you can handle the installation process just via the CLI; on Windows, you'd need to install Python, CUDA, and the toolkit software first. I suggest joining my Discord at discord.com/invite/f556hqRjpv for help (we can voice chat there, and there's a RealtimeSTT section) or mailing me at [email protected]!
@nmstoker · 17 days ago
Great video, well explained, thank you!
@nikhilkaswan5412 · 19 days ago
This video got you your new subscriber 😊
@nikhilkaswan5412 · 19 days ago
Voices are 👍 nice
@VodoXdz · a month ago
Can you use this voice assistant without an Nvidia GPU? (I have a Ryzen system with an AMD GPU.) I think you can't, because of the large amount of data that needs to be processed fast. BTW, this project gets a solid 9/10.
@Linguflex · 14 days ago
I don't think it can run without an Nvidia GPU. The assistant relies on several models, like Whisper, Silero, and WebRTC VAD for ASR, as well as XTTS for TTS and some RVC models for TTS post-processing. These typically require CUDA for optimal performance, and I'm not sure if they’re fully compatible with ROCm for AMD GPUs.
@krosx278 · a month ago
Wow, amazing! How do you set the audio index?
@Linguflex · a month ago
Use the input_device_index property of the AudioToTextRecorder class (hope that's what you mean).
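For anyone hunting for the right index: it is the numeric device index your audio backend reports (e.g. by looping over PyAudio's device enumeration). A small illustrative helper could pick it by device name; `find_input_device_index` and the sample device list below are my own assumptions, not part of RealtimeSTT:

```python
def find_input_device_index(devices, name_fragment):
    """Return the index of the first device whose name contains
    name_fragment (case-insensitive), or None if nothing matches.

    `devices` is a list of (index, name) pairs, e.g. gathered from
    your audio backend's device enumeration.
    """
    fragment = name_fragment.lower()
    for index, name in devices:
        if fragment in name.lower():
            return index
    return None

# Example with a hand-written device list:
devices = [(0, "Sound Mapper"), (1, "Headset Microphone"), (2, "Stereo Mix")]
mic_index = find_input_device_index(devices, "microphone")
```

The resulting index would then be what you hand to the recorder's input device setting.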
@krosx278 · a month ago
@@Linguflex It works!! Thank you... BTW, I want to make a question-answering AI based on your RealtimeSTT transcriptions. I am open to suggestions...
@Linguflex · a month ago
@@krosx278 Please look at advanced_talk.py or openai_voice_interface.py in the RealtimeSTT tests folder, or at my LocalAIVoiceChat project.
@krosx278 · 29 days ago
@@Linguflex I want to use Llama and fine-tune it... Is that possible?
@Linguflex · 29 days ago
@@krosx278 Yes, of course
@SuperWorld007 · a month ago
Can you please provide code to capture system audio and then transcribe it? You provided code for the mic, but I can't find anything for the speaker output in the tests folder.
@Linguflex · a month ago
github.com/KoljaB/RealtimeSTT/blob/master/tests/realtimestt_test_stereomix.py This script records from system audio; you might need to change LOOPBACK_DEVICE_NAME to your stereo mix device name.
@skepziev2565 · a month ago
That is so fire. I'm striving to make something like this.
@irem2719 · a month ago
STT server start command issued. Please wait a moment for it to initialize.
Timeout while connecting to the server. Failed to connect to the server. 👄
ALSA lib pcm_dsnoop.c:567:(snd_pcm_dsnoop_open) unable to open slave
ALSA lib pcm_dmix.c:1000:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2721:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2721:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2721:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm_dmix.c:1000:(snd_pcm_dmix_open) unable to open slave
Cannot connect to server socket err = No such file or directory
Cannot connect to server request channel
jack server is not running or cannot be started
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
Could you help with this issue? I'm trying this on Ubuntu.
@Linguflex · a month ago
Will try tomorrow on Ubuntu.
@Linguflex · a month ago
Had similar issues on Ubuntu. I upgraded RealtimeSTT to v0.3.5 so with "pip install RealtimeSTT==0.3.5" these problems should be solved now.
@kritikusi-666 · a month ago
I just tried this. Holy hell, this is some fast rendering. Thank you for sharing your repo. I tested with stereo mix on a YT video. It did not miss a beat.
@krosx278 · a month ago
How did you implement that? Can you share your code?
@kritikusi-666 · a month ago
@@krosx278 Implement what exactly? To have the STT detect browser audio? For that, you need an audio mixer or a virtual cable.
@krosx278 · a month ago
@@kritikusi-666 I already saw the latest code from the author; it's quite what I need... I'll check for the update.
@theteacher3163 · a month ago
okay, how do I go through no nut november having this ai on my pc ?
@krozark · 2 months ago
I'll definitely give it a try.
@nmstoker · 2 months ago
Really impressive results and as you say, this is how people often speak, so handling this well is really important!
@DeepChauhan93 · 2 months ago
Hi, I am new to coding and want to know how to launch the interface where I can paste text.
@Linguflex · 2 months ago
Code is here: github.com/KoljaB/RealtimeTTS/blob/master/tests/pyqt6_speed_test_chinese.py
@user-le2zv6go3v · 2 months ago
wow.. really cool
@Lucia-sy7le · 2 months ago
He's so creepy.
@CodingPuff · 2 months ago
Interesting! What kind of models are you using?
@Linguflex · 2 months ago
faster_whisper for speech-to-text, Silero VAD + webrtcvad for voice activity detection, and Llama 3.1 8B for sentence-end verification.
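As a rough illustration of how a sentence-end verification stage can avoid querying the LLM on every transcript chunk, here is a hedged sketch of a cheap textual pre-filter; the rules, threshold, and function name are my own assumptions, not the actual pipeline logic:

```python
def needs_llm_check(text, min_words=3):
    """Decide whether a transcript fragment is ambiguous enough to be
    worth an LLM sentence-end query. Obvious cases are settled locally:
    terminal punctuation means 'finished', and very short fragments
    mean 'keep listening'. Thresholds are illustrative.
    """
    text = text.strip()
    if not text:
        return False
    if text[-1] in ".!?":
        return False  # clearly a finished sentence, no LLM needed
    return len(text.split()) >= min_words  # long, unpunctuated: ask the LLM
```

In a real pipeline, the LLM (e.g. served locally via LM Studio, as mentioned below) would only be called when this returns True.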
@parthpatwari3174 · 2 months ago
@@Linguflex llama api?
@Linguflex · 2 months ago
@@parthpatwari3174 LMStudio
@Pauliomat · 2 months ago
Thanks for working on these projects
@tiredbusinessdad · 2 months ago
Awesome 👏💪
@TejaskDeshmukh · 2 months ago
How can we do it? Is your code available and open source? Please guide me.
@Linguflex · 2 months ago
work in progress: github.com/KoljaB/RealtimeSTT/blob/dev/tests/realtimestt_speechendpoint.py
@SaddamBinSyed · 2 months ago
Thanks for the amazing work. Will it support automatic multi-language detection as well? Please advise.
@lonligrinrdo4989 · 2 months ago
@@SaddamBinSyed Wouldn't be hard to implement. faster_whisper is currently buggy when no language is set.
@Lennert_hd · 2 months ago
Wow, looks like a great tool and you've chosen a great text-to-speech voice! Where is it from?
@Linguflex · 2 months ago
The voice is mashed together in ElevenLabs, a somewhat complicated process involving training a voice from different voice sources.
@global.pradachan · 2 months ago
Why does she sound so sneaky 😁😁🤣🤣
@nexuslux · 2 months ago
Nice. I like the self-correction.
@pepediedrich5609 · 2 months ago
nice
@sebastiangonzales46 · 2 months ago
I'll try to use this and customize it for our undergrad thesis, is that okay?
@Linguflex · 2 months ago
Yes sure. It's MIT license so you can use it for whatever you like.
@TorianTammas · 3 months ago
Very nice!
@Miyauti · 3 months ago
This looks really promising, I will try to test it in my programs! Thanks for the work, my dude!
@karimjedda · 3 months ago
Absolutely amazing, great work!!
@sujaldarode1649 · 4 months ago
You got a GitHub?
@Linguflex · 4 months ago
Yes, github.com/KoljaB. No code for this one is up yet though, because it's in too early a work state.
@Plashley5 · a month ago
What about now? Is the GitHub for this available? @@Linguflex
@AustinKang-wk8cl · a month ago
@@Linguflex Hi, is there a GitHub for this yet?
@Linguflex · a month ago
@@AustinKang-wk8cl I checked in the code for memory in the Linguflex GitHub repo. It's not perfect, but it's a start...
@JosuéHenriqueBeckerSchwartzhau · 4 months ago
Can I run it on a Raspberry Pi 3 B?
@lokeshart3340 · 4 months ago
Does it need a GPU? Because I only have an i3.
@Tigas4ever · 5 months ago
Can you help me? :(
TomlDecodeError: Reserved escape sequence used (line 100 column 1 char 3696)
Traceback:
File "C:\Users\tiago\miniconda3\envs\MoneyPrinterTurbo\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 584, in _run_script
    exec(code, module.__dict__)
File "C:\Users\tiago\Desktop\Try\MoneyPrinterTurbo\webui\Main.py", line 34, in <module>
    from app.services import task as tm, llm, voice
File "C:\Users\tiago\Desktop\Try\MoneyPrinterTurbo\app\services\task.py", line 8, in <module>
    from app.config import config
File "C:\Users\tiago\Desktop\Try\MoneyPrinterTurbo\app\config\__init__.py", line 6, in <module>
    from app.config import config
File "C:\Users\tiago\Desktop\Try\MoneyPrinterTurbo\app\config\config.py", line 42, in <module>
    _cfg = load_config()
File "C:\Users\tiago\Desktop\Try\MoneyPrinterTurbo\app\config\config.py", line 30, in load_config
    _config_ = toml.loads(_cfg_content)
File "C:\Users\tiago\miniconda3\envs\MoneyPrinterTurbo\lib\site-packages\toml\decoder.py", line 514, in loads
    raise TomlDecodeError(str(err), original, pos)
@Linguflex · 5 months ago
Sorry, can't help. I have nothing to do with MoneyPrinterTurbo. This is a translation example as showcase for my TurnVoice GitHub project.
@AbishekAjaiSatnur-v4x · 4 months ago
You gotta put the Pexels API keys in quotes, like this: pexels_api_keys = [ "nE113EOVlRVbpWvRE0yZFuy6KmM9WAqvelyadayadayada" ]
@nairdrive4825 · 5 months ago
Action latency is yet to be improved; amazing project ❤
@allfather_ogre · 5 months ago
Great work... do you have any ideas to reduce latency in text-to-speech? I'm working on it...