Linguflex with custom wakewords
0:56
Voice changed with TurnVoice
0:41
Exchange YT video voices.
0:34
Жыл бұрын
Пікірлер
@phamviettien6102
@phamviettien6102 4 сағат бұрын
can you share the weight epoch_2nd_0036.pth ? thank you. I hope you can share with me. i don't have many resources to train again
@jayr7741
@jayr7741 2 күн бұрын
Does it supports multiple languages?
@Linguflex
@Linguflex 2 күн бұрын
Nope, the dataset I used to train the classification model only has English sentences.
@jijainth
@jijainth 4 күн бұрын
hey can you make this a public repo , I would like to test it out , thanks.
@Linguflex
@Linguflex 4 күн бұрын
Repo is here: github.com/KoljaB/WhoSpeaks
@42ndMoose
@42ndMoose 4 күн бұрын
0:38 regarding the mention of DistilBERT and how it omitted a chunk due to it, will there potentially be a way where we could input new words for 0-shot finetuning? 2:25 i noticed that the model decided to omit the word "like". this is very good for voice agents, but there are some use cases where it is necessary to include certain misspoken words (needed for an accurate transcript or additional nuance). so i suggest a threshold slider, perhaps? will it ever be able to focus on 1 speaker, if another speaker talks over the 1st one? or can it take account of the 2nd speaker at the same time? or will it only assume 1 speaker at all times?
@Linguflex
@Linguflex 4 күн бұрын
Good observations. At 0:38 it seems like the sentence got detected too quickly. I needs some deeper analysis for those edge cases to figure out why the algorithm cuts into the sentence. You can fine-tune Whisper to handle new words and then use the fine-tuned models directly with RealtimeSTT: huggingface.co/blog/fine-tune-whisper For 2:25, this looks like a Whisper quirk. Whisper tends to remove filler words like "ah" or repetitions. It might improve if I tweak the initial_prompt behavior to use it only for the real-time model but removing it from the final transcription. Whisper tends to lose some accuracy when prompted, but in this case, it’s necessary for real-time processing. Unfortunately, there’s no simple parameter to make Whisper more accurate regarding filler words. If it’s critical it's needed switch to a different ASR model, there are some that deliver precise transcriptions even on filler words, though that brings its own set of pros and cons. Regarding speaker handling, Whisper doesn’t support speaker diarization natively. It always assumes a single speaker. Real-time speaker diarization is a whole other rabbit hole. Not impossible, but very complex to pull off.
@kanavwastaken
@kanavwastaken 4 күн бұрын
Uh, hasn't Edge-TTS been here for... years?
@Linguflex
@Linguflex 4 күн бұрын
You're right, the title's a bit misleading. Edge-TTS itself isn't new, but what's new is the real-time aspect. You can now use LLM outputs and get instant TTS results, or load massive texts and hear the output in just a second.
@daviruela2055
@daviruela2055 4 күн бұрын
FYI I am a huge fan of your work. Thank you!
@herofahimshahriargaming8288
@herofahimshahriargaming8288 8 күн бұрын
can you make a tutorial video on how to use this RealtimeTTS library?pleas?
@Linguflex
@Linguflex 8 күн бұрын
I know better docs and tutorials are needed. The thing is, I love building stuff, but I hate documenting. Plus, I’m a bit of a perfectionist, and making a proper tutorial that covers everything RealtimeTTS can do would take me weeks. :)
@wzct
@wzct 11 күн бұрын
amazing
@ivancain
@ivancain 12 күн бұрын
please dont stop your work. I am now following you very closely! RIght behind you chief! your work is amazing and overlays significantly with my own ai project. following you on github now.. please dont stop!
@daveacorn782
@daveacorn782 12 күн бұрын
How is the progress on this? Very interested to try this.
@Linguflex
@Linguflex 12 күн бұрын
Latency way better. Linguflex core is growing nicely. Home assistant module hasn’t seen much progress. Honestly, I could really use help (PRs welcome). Got way too many projects, so some stuff is a bit stuck in "proof of concept" mode for now.
@kritikusi-666
@kritikusi-666 12 күн бұрын
so erotic lol
@extensy
@extensy 13 күн бұрын
Thank you so much for adding styletts!
@vohiepthanh9692
@vohiepthanh9692 13 күн бұрын
it's great work, dude 🤟
@anarhi17
@anarhi17 14 күн бұрын
got this error at the end (while speaking): "WebSocket error: [WinError 10054] An existing connection was forcibly closed by the remote hostWebSocket error: [WinError 10054] An existing connection was forcibly closed by the remote host" any ideas? should I open the python file in another terminal while this one is running?
@Linguflex
@Linguflex 13 күн бұрын
This code is already outdated. Please use the main installation (pip install realtimestt) from the master branch. Then start the client with the `stt` cli command. If you still run into problems I suggest we discuss that on my discord in the RealtimeSTT section discord.com/invite/f556hqRjpv or per mail ([email protected]), happy to assist
@ashsingh2175
@ashsingh2175 14 күн бұрын
Bro ur voice sysnthesis is too fast? how did u speed up? For me its takes more than few sec to start
@Linguflex
@Linguflex 13 күн бұрын
Good question. It likely depends on the Edge service availability and your internet speed. My system is pretty fast, but some imports required for other engines might affect performance on slower CPUs.
@HehehehHeh-g1o
@HehehehHeh-g1o 16 күн бұрын
Could you show installation process as a video? I am very lost with the github haha. Thanks!
@Linguflex
@Linguflex 15 күн бұрын
Hey! I get why a video would help, but it's tricky for me since my system is already fully set up for AI. On my setup it’s just `pip install RealtimeSTT` because everything (Python, CUDA, etc.) is already installed. Also installation is very different for Linux and Windows. For example on Linux, Python is pre-installed and you can handle the installation process just via CLI. On Windows, you'd need to install Python, CUDA, and the toolkit software first. I suggest join my Discord at discord.com/invite/f556hqRjpv for help (we can voicechat there, also there's a RealtimeSTT section) or mail me at [email protected]!
@nmstoker
@nmstoker 17 күн бұрын
Great video, well explained, thank you!
@nikhilkaswan5412
@nikhilkaswan5412 19 күн бұрын
This video got you your new subscriber 😊
@nikhilkaswan5412
@nikhilkaswan5412 19 күн бұрын
Voice's are 👍 nice
@VodoXdz
@VodoXdz Ай бұрын
can you use this v assistant without a gpu like nvdia (i have a ryzen gpu ) i think you can't bcs of the high processed data that needed to be done fast btw this project get a solid 9/10
@Linguflex
@Linguflex 14 күн бұрын
I don't think it can run without an Nvidia GPU. The assistant relies on several models, like Whisper, Silero, and WebRTC VAD for ASR, as well as XTTS for TTS and some RVC models for TTS post-processing. These typically require CUDA for optimal performance, and I'm not sure if they’re fully compatible with ROCm for AMD GPUs.
@krosx278
@krosx278 Ай бұрын
wow amazing, how do you set audio index?
@Linguflex
@Linguflex Ай бұрын
Use input_device_index property of AudioToTextRecorder class (hope that's what you mean)
@krosx278
@krosx278 Ай бұрын
@@Linguflex it works!! thank you...btw I want to make a question answering ai base on your RealtimeSTT transcribed. I am open for suggestion...
@Linguflex
@Linguflex Ай бұрын
@@krosx278 Please look at advanced_talk.py or openai_voice_interface.py in RealtimeSTT test folder or at my LocalAIVoiceChat project
@krosx278
@krosx278 29 күн бұрын
@@Linguflex I want to use llama and fine-tune it... Is that possible?
@Linguflex
@Linguflex 29 күн бұрын
@@krosx278 Yes, of course
@SuperWorld007
@SuperWorld007 Ай бұрын
Can you please provide Code related to capture system audio and then transcribe it? You provided code for mic but for speaker, I can not find from the test folder.
@Linguflex
@Linguflex Ай бұрын
github.com/KoljaB/RealtimeSTT/blob/master/tests/realtimestt_test_stereomix.py This script records from system audio, you might need to change LOOPBACK_DEVICE_NAME to your stereomix device name
@skepziev2565
@skepziev2565 Ай бұрын
that is so fire. I'm striving to make something like this
@irem2719
@irem2719 Ай бұрын
STT server start command issued. Please wait a moment for it to initialize. Timeout while connecting to the server. Failed to connect to the server. 👄 ALSA lib pcm_dsnoop.c:567:(snd_pcm_dsnoop_open) unable to open slave ALSA lib pcm_dmix.c:1000:(snd_pcm_dmix_open) unable to open slave ALSA lib pcm.c:2721:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear ALSA lib pcm.c:2721:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe ALSA lib pcm.c:2721:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side ALSA lib pcm_dmix.c:1000:(snd_pcm_dmix_open) unable to open slave Cannot connect to server socket err = No such file or directory Cannot connect to server request channel jack server is not running or cannot be started JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock could you help with this issue?Im trying this on ubuntu
@Linguflex
@Linguflex Ай бұрын
Will try tomorrow on ubuntu
@Linguflex
@Linguflex Ай бұрын
Had similar issues on Ubuntu. I upgraded RealtimeSTT to v0.3.5 so with "pip install RealtimeSTT==0.3.5" these problems should be solved now.
@kritikusi-666
@kritikusi-666 Ай бұрын
I just tried this. Holly hell. This is some fast rendering. Thank you for sharing your repo. I tested with stereo mix with a YT video. It did not miss a beat.
@krosx278
@krosx278 Ай бұрын
how did you implement that? can you share your code?
@kritikusi-666
@kritikusi-666 Ай бұрын
@@krosx278 implement what exactly? to have the STT detect browser audio? For that, you need an audio mixer or virtual cable.
@krosx278
@krosx278 Ай бұрын
@@kritikusi-666 i already saw the latest code from the author its quiet i need... check for the update.
@theteacher3163
@theteacher3163 Ай бұрын
okay, how do I go through no nut november having this ai on my pc ?
@krozark
@krozark 2 ай бұрын
I’ll defenitly give it a try
@nmstoker
@nmstoker 2 ай бұрын
Really impressive results and as you say, this is how people often speak, so handling this well is really important!
@DeepChauhan93
@DeepChauhan93 2 ай бұрын
hi i am new to coding, want to know how to launch the interface where I can paste text
@Linguflex
@Linguflex 2 ай бұрын
Code is here: github.com/KoljaB/RealtimeTTS/blob/master/tests/pyqt6_speed_test_chinese.py
@user-le2zv6go3v
@user-le2zv6go3v 2 ай бұрын
wow.. really cool
@Lucia-sy7le
@Lucia-sy7le 2 ай бұрын
He's so creepy.
@CodingPuff
@CodingPuff 2 ай бұрын
Interesting! What kind of models are you using
@Linguflex
@Linguflex 2 ай бұрын
faster_whisper for speech to text, Silero VAD + webrtcvad for voice activity detection, llama 3.1 8b for sentence end verification
@parthpatwari3174
@parthpatwari3174 2 ай бұрын
@@Linguflex llama api?
@Linguflex
@Linguflex 2 ай бұрын
@@parthpatwari3174 LMStudio
@Pauliomat
@Pauliomat 2 ай бұрын
Thanks for working on these projects
@tiredbusinessdad
@tiredbusinessdad 2 ай бұрын
Awesome 👏💪
@TejaskDeshmukh
@TejaskDeshmukh 2 ай бұрын
how we can do it. is your code availble and open source . please guide
@Linguflex
@Linguflex 2 ай бұрын
work in progress: github.com/KoljaB/RealtimeSTT/blob/dev/tests/realtimestt_speechendpoint.py
@SaddamBinSyed
@SaddamBinSyed 2 ай бұрын
Thanks for the amazing work . Will it support automatic multi language detection as well? Pls advise
@lonligrinrdo4989
@lonligrinrdo4989 2 ай бұрын
​@@SaddamBinSyedWouldn't be hard to implement. faster_whisper is currently buggy when no language is set
@Lennert_hd
@Lennert_hd 2 ай бұрын
Wow, looks like a great tool and you've chosen a great text-to-speech voice! Where is it from?
@Linguflex
@Linguflex 2 ай бұрын
Voice is mashed together in Elevenlabs, a bit complicated process involving training a voice from different voice sources
@global.pradachan
@global.pradachan 2 ай бұрын
why she sounds like sneaky 😁😁🤣🤣
@nexuslux
@nexuslux 2 ай бұрын
Nice. I like the self-correction.
@pepediedrich5609
@pepediedrich5609 2 ай бұрын
nice
@sebastiangonzales46
@sebastiangonzales46 2 ай бұрын
I'll try to use this and customize it for our Undergrad Thesis, is it okay?
@Linguflex
@Linguflex 2 ай бұрын
Yes sure. It's MIT license so you can use it for whatever you like.
@TorianTammas
@TorianTammas 3 ай бұрын
very nice!
@Miyauti
@Miyauti 3 ай бұрын
This looks really promising, i will try to test it on my programs! Thanks for the work my dude!
@karimjedda
@karimjedda 3 ай бұрын
Absolutely amazing, great work!!
@sujaldarode1649
@sujaldarode1649 4 ай бұрын
u got github ?
@Linguflex
@Linguflex 4 ай бұрын
Yes, github.com/KoljaB. No code for this one up yet tho, bcs it's too early work state
@Plashley5
@Plashley5 Ай бұрын
What about now? Is the github for this available? ​@@Linguflex
@AustinKang-wk8cl
@AustinKang-wk8cl Ай бұрын
@@Linguflex Hi, is there still github yet for this?
@Linguflex
@Linguflex Ай бұрын
@@AustinKang-wk8cl I checked in the code for memory in Linguflex github. It's not perfect, but it's a start...
@JosuéHenriqueBeckerSchwartzhau
@JosuéHenriqueBeckerSchwartzhau 4 ай бұрын
Can I run it in a raspberry pi 3 B?
@lokeshart3340
@lokeshart3340 4 ай бұрын
Does it need gpu cause I have a i3 only
@Tigas4ever
@Tigas4ever 5 ай бұрын
Can you help me? :( TomlDecodeError: Reserved escape sequence used (line 100 column 1 char 3696) Traceback: File "C:\Users\tiago\miniconda3\envs\MoneyPrinterTurbo\lib\site-packages\streamlit untime\scriptrunner\script_runner.py", line 584, in _run_script exec(code, module.__dict__) File "C:\Users\tiago\Desktop\Try\MoneyPrinterTurbo\webui\Main.py", line 34, in <module> from app.services import task as tm, llm, voice File "C:\Users\tiago\Desktop\Try\MoneyPrinterTurbo\app\services\task.py", line 8, in <module> from app.config import config File "C:\Users\tiago\Desktop\Try\MoneyPrinterTurbo\app\config\__init__.py", line 6, in <module> from app.config import config File "C:\Users\tiago\Desktop\Try\MoneyPrinterTurbo\app\config\config.py", line 42, in <module> _cfg = load_config() File "C:\Users\tiago\Desktop\Try\MoneyPrinterTurbo\app\config\config.py", line 30, in load_config _config_ = toml.loads(_cfg_content) File "C:\Users\tiago\miniconda3\envs\MoneyPrinterTurbo\lib\site-packages\toml\decoder.py", line 514, in loads raise TomlDecodeError(str(err), original, pos)
@Linguflex
@Linguflex 5 ай бұрын
Sorry, can't help. I have nothing to do with MoneyPrinterTurbo. This is a translation example as showcase for my TurnVoice GitHub project.
@AbishekAjaiSatnur-v4x
@AbishekAjaiSatnur-v4x 4 ай бұрын
You gotta put the pexels API key in quotes like this - """ pexels_api_keys = [ "nE113EOVlRVbpWvRE0yZFuy6KmM9WAqvelyadayadayada",] """.
@nairdrive4825
@nairdrive4825 5 ай бұрын
Action latency is yet to be improved , amazing project❤
@allfather_ogre
@allfather_ogre 5 ай бұрын
Great work..do you have any ideas to reduce latency in text to speech..im working on it..
@MrMoralHazard
@MrMoralHazard 5 ай бұрын
Looking promising!