Realtime Local AI Chatbot Demo with GPT-SoVITS and Llama 3

  5,387 views

Jarods Journey

Days ago

Comments: 91
@mohammadaliabbas3847 17 days ago
Can you share the code?
@OnigoroshiZero 16 days ago
This is great! Currently, running local models for everything is hard for most people because of hardware limitations, and using APIs is not recommended for daily use unless you have a few thousand dollars to spend. But having an AI assistant on your PC will be the best thing, along with real-time game emulation, 2-3 years from now. Especially with agents that can essentially "live" on the desktop using virtual environments backed by Unreal Engine or similar (I watched a proof of concept along those lines a few weeks ago, and it was very fun).
@Admlass 17 days ago
Got my own implementation of realtime GPT-SoVITS with playback too (not the same approach as yours). It takes ~0.3 s for 30 s of audio on Colab's T4 and ~1 s to stream to my computer through a reverse proxy, so it can be very efficient once you optimize everything. This video makes me want to test it with the STT + LLM parts. We really live in the future.
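The figures quoted in this comment can be turned into a real-time factor (compute time divided by audio duration). This is only a worked-arithmetic sketch: the function name is hypothetical, and treating the ~1 s stream time as additive to the ~0.3 s generation time is an assumption, not something the commenter states.

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Seconds spent per second of audio produced (lower is faster)."""
    if audio_s <= 0:
        raise ValueError("audio_s must be positive")
    return processing_s / audio_s


# Numbers taken from the comment: ~0.3 s to generate 30 s of audio.
generation_rtf = real_time_factor(0.3, 30.0)

# Assumption: the ~1 s streaming time is simply added on top of generation.
end_to_end_rtf = real_time_factor(0.3 + 1.0, 30.0)

print(f"generation RTF ~ {generation_rtf:.3f}")
print(f"end-to-end RTF ~ {end_to_end_rtf:.3f}")
```

Any value below 1.0 means audio is produced faster than it plays back, which is the basic precondition for streaming.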
@Jarods_Journey 17 days ago
Very interesting! So you made GPT-SoVITS more efficient and improved its base speed? Any pointers to look at? I see there's still a lot of room for improvement. Since you're able to generate more AR tokens faster, that's ideal because of the context dependency of the VITS decoding process, so more accurate speech might be possible if that's the case 🤔.
@Admlass 16 days ago
@Jarods_Journey Yes, I found the bottleneck was the VITS model, as it doesn't have a batched_decode method, so the processing is sequential by default (there was an unsuccessful attempt commented out in TTS.py). I added my own method: I padded the chunks, since we get sequences of different lengths, and removed the padding in post-processing.
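A minimal sketch of the pad, batch-decode, then trim idea described in the comment above. It is not the commenter's code and does not use the GPT-SoVITS API: `pad_batch`, `batched_decode`, `fake_decoder`, `PAD_ID`, and the fixed `samples_per_token` ratio are all hypothetical stand-ins, written in plain Python so the logic is easy to follow.

```python
from typing import Callable, List, Sequence, Tuple

PAD_ID = 0  # hypothetical padding token id


def pad_batch(chunks: Sequence[Sequence[int]],
              pad_id: int = PAD_ID) -> Tuple[List[List[int]], List[int]]:
    """Right-pad every chunk to the longest length and remember true lengths."""
    lengths = [len(c) for c in chunks]
    max_len = max(lengths, default=0)
    batch = [list(c) + [pad_id] * (max_len - len(c)) for c in chunks]
    return batch, lengths


def batched_decode(chunks: Sequence[Sequence[int]],
                   decode_batch: Callable[[List[List[int]]], List[List[float]]],
                   samples_per_token: int) -> List[List[float]]:
    """Decode all chunks in one call, then cut off audio that came from padding."""
    batch, lengths = pad_batch(chunks)
    audio = decode_batch(batch)  # one batched call instead of a loop
    return [wave[: n * samples_per_token] for wave, n in zip(audio, lengths)]


def fake_decoder(batch: List[List[int]]) -> List[List[float]]:
    # Stand-in for a real decoder: emits 2 samples per input token.
    return [[float(tok) for tok in row for _ in range(2)] for row in batch]


out = batched_decode([[5, 6, 7], [8]], fake_decoder, samples_per_token=2)
print(out)
```

A real decoder would likely also need a mask so that padded positions do not influence the valid ones; that detail is left out here.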
@Jarods_Journey 16 days ago
Hmm, my observation is a little different: in my testing so far, the bottleneck of the process is the AR decoder, not VITS. I'm wondering what the differences between our setups are. I'll have to do some testing and see if the AR part can be made faster. Appreciate the input!
@davidc.2525 16 days ago
Wait, those numbers are insane. Are you willing to share your code? I am looking for something that can do ~0.3 s for 3 s of audio (not 30 s) on a MacBook M1. Do you think this is possible with your changes?
@davidc.2525 16 days ago
If I understand correctly, your change speeds up the batch speed for longer text but does not affect the base speed for a single text.
@UltraK420 17 days ago
Some French guy was doing this exact thing on Twitch over a year ago with GPT-4. The AI personalities were completely uncensored, and very naughty.
@Jarods_Journey 17 days ago
Well, the sky's the limit with uncensored models 😅
@4.0.4 16 days ago
I imagine he got banned? Haha.
@BackTiVi 16 days ago
@Tenkin42 Aitrepreneur, I think
@UltraK420 15 days ago
@4.0.4 Nope.
@MyutiConx 2 days ago
Hmm, the name?
@cxzyKitsuki 11 days ago
So cool, I have been looking for a self-hosted AI assistant for so long. This is awesome!
@VirtusRex48 17 days ago
Super cool, can't wait until you share!
@joshfokis 17 days ago
I have made a few projects like this, one of which I am still working on, but GPT-SoVITS sounds promising. I made a Discord bot for my DnD group to ask questions to, and it speaks back with the answer, in character. I really want to give this one a go. I would love to review this once you publish it, if you do. Great work on your projects. It is always nice to see new updates.
@jdcmsigma47 8 days ago
awesome vid man!!
@tr1pod623 18 days ago
It would be amazing if you could turn this into a full-on project, maybe with a UI of some sort. Imagine you could add RAG over your personal files, or create some sort of long-term memory (as RAG) that the LLM saves (with tools). Or maybe I just need to find out how to use GPT-SoVITS with SillyTavern. GPT-SoVITS is so awesome; I love the expression and the laughs, and it feels very voice-actor-like.
@lokeshart3340 17 days ago
It's amazing, I support you.
@4.0.4 16 days ago
I think all of what you're looking for and more exists in SillyTavern. Though, 99%+ of existing character cards on characterhub won't have the necessary Live2D/VRM/etc. thing for the character. Also, voice isn't part of the card spec. (In other words, you'd have to set it up the way you want yourself, but it's all there for you to do it.)
@lokeshart3340 16 days ago
@4.0.4 Is it free?
@KnutNukem 12 days ago
Impressive, well done
@thenextension9160 17 days ago
very cool tech demo. this is the future of content
@gr8tbigtreehugger 17 days ago
Very awesome! On my voice bot, I am doing transcription in smaller chunks in parallel, so the final transcription step is very short, and I then send the appended text to the LLM.
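One way the chunked, parallel transcription described above could be sketched is with a thread pool. This is an illustration under assumptions, not the commenter's implementation: `split_chunks`, `transcribe_parallel`, and `fake_transcribe` are hypothetical names, and `fake_transcribe` stands in for a real speech-to-text call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def split_chunks(samples: List[float], chunk_size: int) -> List[List[float]]:
    """Cut the audio into consecutive fixed-size chunks (the last may be shorter)."""
    return [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]


def transcribe_parallel(samples: List[float],
                        chunk_size: int,
                        transcribe: Callable[[List[float]], str],
                        workers: int = 4) -> str:
    """Transcribe chunks concurrently and append the text in speaking order."""
    chunks = split_chunks(samples, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, not completion order.
        parts = list(pool.map(transcribe, chunks))
    return " ".join(p for p in parts if p)


def fake_transcribe(chunk: List[float]) -> str:
    # Placeholder for a real STT model call (hypothetical).
    return f"<{len(chunk)} samples>"


prompt = transcribe_parallel([0.0] * 10, chunk_size=4, transcribe=fake_transcribe)
print(prompt)  # this string would then be sent to the LLM
```

Fixed-size chunks can split a word across a boundary, so a real pipeline would probably cut on silence instead; the sketch ignores that.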
@cassusgames 17 days ago
Totally unrelated to the actual AI stuff, but did you choose the name Vivy as in Vivy: Fluorite Eye's Song? I enjoyed that anime quite a bit.
@Jarods_Journey 17 days ago
That is indeed the lore for why I named my original project Vivy and continue to do so :)!
@cassusgames 17 days ago
@Jarods_Journey That is awesome and very fitting
@ShiroAisan 15 days ago
Holy crap, she can laugh, and it's not some creepy robotic "hahaa"
@sinayagubi8805 17 days ago
Hey! Can you speak some Japanese with it? Also, can we somehow reproduce this? Will you publish it on GitHub?
@Jarods_Journey 17 days ago
Japanese - maybe. Publish? Ye, I'll be publishing it sometime in the near future.
@moresignal 17 days ago
Congratulations. I've been excited to see the results of your efforts and this looks brilliant. Is there any reason a similar approach would not work with F5TTS for low latency response?
@Jarods_Journey 17 days ago
Essentially, F5 doesn't iteratively produce tokens; it does it all in one big chunk. Unless that can be changed, it can't be used to stream audio.
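The distinction drawn in this reply can be illustrated with a toy generator. This sketch makes no claim about how F5 or GPT-SoVITS are implemented: `autoregressive_tts` and `one_shot_tts` are hypothetical stand-ins in which integers play the role of tokens and small lists play the role of audio chunks.

```python
from typing import Iterator, List


def autoregressive_tts(tokens: List[int], tokens_per_chunk: int) -> Iterator[List[int]]:
    """Yield a chunk as soon as enough tokens exist, so playback can start early."""
    pending: List[int] = []
    for tok in tokens:  # each iteration stands in for one decoding step
        pending.append(tok)
        if len(pending) == tokens_per_chunk:
            yield pending
            pending = []
    if pending:  # flush whatever is left at the end
        yield pending


def one_shot_tts(tokens: List[int]) -> List[int]:
    """Return the whole result at once; nothing is available until it finishes."""
    return list(tokens)


streamed = list(autoregressive_tts([1, 2, 3, 4, 5], tokens_per_chunk=2))
whole = one_shot_tts([1, 2, 3, 4, 5])
print(streamed)
print(whole)
```

With the generator, the first chunk is available after two steps, while the one-shot function only returns once every token has been produced.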
@moresignal 17 days ago
Thanks. I was unaware of the difference. I noticed the socket server example breaks the request into 10 second blocks and processes them sequentially.
@Random_person_07 17 days ago
It would be awesome if you could get an emotion text dataset, or a large dataset, and fine-tune the base GPT-SoVITS model to do better with emotions.
@cefcephatus 14 days ago
I can't wait to see a sentient AI Vtuber from your project. Maybe a girl who could design her own physical body.
@fxy201 17 days ago
That's so cool! Please make a tutorial 🙏
@yereem 14 days ago
Amazing. I had one working fine with GPT-3.5 and was using VTube Studio for the model and all. I tried moving to a local Llama 3.1 and adding some memory, and everything started going downhill. I had to take a break because of classes, but I will finally be able to get back to it this winter break. Hopefully I can fix that and move it to a Raspberry Pi. I finally got my hands on one of those new AI HATs and I am so excited to test it. I will put it on a screen with a camera somewhere in my place for no reason.
@yereem 6 days ago
Update: the AI HAT is not really useful for LLMs. The LLM can run locally on my Pi, but it's kind of slow, which makes sense considering the specs of a Pi. I am in the process of making it an AI server that I will use with Home Assistant later.
@kurotesuta 11 days ago
Built mine, but decided to try to make it run on an M2 Mac. The output audio doesn't seem to be completely processed by GPT-SoVITS; the voice sounds like a demon.
@Sajeas 18 days ago
That's cool. It would be nice to see integration with SillyTavern using the Mistral Large API (it's free) and either GPT-SoVITS or F5. I tried to train GPT-SoVITS, but got only a .pth file and no .ckpt file, so if you know of or make a GPT-SoVITS trainer, that would be great.
@lichtundliebe999 17 days ago
When I tested GPT-SoVITS in English, I still heard a Chinese accent. I used the creator's version and tried only the web UI. Maybe some fine-tuning is still necessary.
@liberdelta 16 days ago
Do you think Whisper is better for Japanese speech-to-text compared to enterprise solutions like AmiVoice?
@4.0.4 16 days ago
Whisper large-v3 does handle Japanese somewhat ok, but on a long video it hallucinates a lot. (Things like, "thanks for watching!")
@FrostDagger 12 days ago
Long-term memory? How much VRAM? Specs?
@Tumhishka 17 days ago
Could you please provide the code? I recently started working on my thesis project (an AI assistant in glasses similar to Xreal (Android)). This code would help me a lot to study STT and to create an animated character.
@zikwin 18 days ago
Wait, how do you make that animated character speak too?
@Jarods_Journey 17 days ago
VTube Studio, using digital cables.
@Random_person_07 17 days ago
This is really cool. Hopefully you release a web UI version or something like that.
@MyutiConx 2 days ago
Is an RTX 3070 8 GB possible, Jarod?
@FromTheWombTotheGrave 17 days ago
Will you drop a full course on making this?
@jimmyjam77 17 days ago
Wow this is cool!
@Tamrinschannel 17 days ago
Wow, the responses are quick.
@sdaassadd4721 18 days ago
Hello Jarod, I watch some of your videos, and I want to know which TTS is the better one that I can easily train on a 12 GB VRAM GPU for Portuguese voices. There are so many options that I'm getting confused.
@Jarods_Journey 17 days ago
All of them can technically do it, though some are easier than others. If I were you, I'd start with F5TTS, as it's the simplest, and 12 GB can hobble along and train on 1-2 hours of data to get a single speaker in the language.
@sdaassadd4721 17 days ago
@Jarods_Journey Do any of your videos teach how to train? Or do you have a link I can read? Thanks for the recommendation, I will follow it!
@Menober 16 days ago
Bro, any chance of the TTS running on an AMD GPU? ;(
@soraygoularssm8669 17 days ago
Please share the code for the realtime GPT-SoVITS
@NickyTuan 15 days ago
How do I make a voice like that?
@piplupsuper0 17 days ago
Oh wow, is there a release for this, my man?
@Jarods_Journey 17 days ago
Sometime in the future I'll be throwing the code up! It's very messy rn
@islam_era2035 10 days ago
Please show how to integrate AI into a VTuber character.
@kritikusi-666 17 days ago
How did you do this VTuber thingy? Can you make a tutorial?
@Jarods_Journey 17 days ago
I hope to do a breakdown video in the future, but it's a lot of moving parts
@jimmyjam77 17 days ago
How to customize the voice?
@Jarods_Journey 17 days ago
You'd need to train a GPT-SoVITS model, a whole nother step of the process (or use different reference audio)
@rtyzxc 15 days ago
I feel like you used a bit of Botan. Judging by the laugh.
@TheDailyMemesShow 17 days ago
OMG, she's so cute!
@alpuhagame 16 days ago
So is it the real Kizuna AI? xD We're very close. Just need to teach her to play video games and comment on them.
@lionight.custom5693 17 days ago
How do you make an AI VTuber?
@ninbob4633 17 days ago
damn we're gonna get clones of Neuro-sama everywhere
@stilly5016 16 days ago
Please add animation, like hand movement, to make it more realistic 😊
@TABandiTA 17 days ago
feels like a budget Neuro-sama, but pretty cool regardless.
@mactheo2574 17 days ago
Neuro can't laugh and is pretty monotone. Evil goes through an API (for the TTS; I know he runs a small LLM locally, while Neuro's TTS is also local). Lame. Also, it's so stupid how so many people look up to Vedal. He literally never teaches anyone anything about Neuro, while the majority of Neuro relies on OSS...
@Jarods_Journey 17 days ago
I think Vedal has done a fantastic job at building a character and personality, so much so that the work and success are well deserved IMO. I understand his disposition of not wanting to teach the whole world how he does it, because he's got to make a living somehow lol. If I wanted to be an AI vtuber, I'd probably be doing the same thing he's doing tbh
@mactheo2574 17 days ago
@Jarods_Journey You're very gracious and understanding. I still think Vedal should at least give some spotlight to the people who worked hard to implement new tech and optimizations, things that directly benefited Neuro. Right now, every Neuro "upgrade" update has been "I made Neuro smarter", "Neuro now remembers more", "Neuro latency is improved", even the latest "Neuro can now directly play Minecraft". For all of these, I can fairly accurately point to how the "upgrades" were made: longer-context Llama 3.1 and Mistral models, flash attention, quantization techniques, KV cache, faster-whisper and Whisper turbo, the amazing Minecraft project that Emergent Garden made public, etc. Just a simple mention of that amazing work would suffice. It would teach the swarm that Vedal cannot exist in a vacuum and that software dev is fundamentally collaborative. I've been a "fan" of Vedal for more than 2 years, and his behavior continues to be troubling to me.
@lokeshart3340 17 days ago
Make JARVIS with this: clone the JARVIS voice and add tools to the LLM.
@lokeshart3340 17 days ago
Can we run this on a CPU??
@Jarods_Journey 17 days ago
No CPU unfortunately
@futurediffusion 18 days ago
Oh, and the code? Haha
@Neuro_Kitti 18 days ago
Cool, I'll implement this in my goofy AI VTuber lol
@onlyyoucanstopevil9024 17 days ago
VEDAL NEEDS A RIVAL 😊😊😊
@sadshed4585 17 days ago
I have a similar concept for human faces on GitHub
@sadshed4585 17 days ago
yours looks smoother tho and runs faster
@R1L1. 15 days ago
Neuro-sama but worse. still pretty impressive tho
@IPutFishInAWashingMachine 17 days ago
Someone tell Just Rayen