Latest in AI TTS and StyleTTS WebUI Updates

6,093 views

Comments

@x0vg5hs1 5 months ago

"I don't want to train my customized voice." I want a cool and reliable male voice to read my textbooks or novels, preferably running on the local LAN and accessible via a TTS server for Android.

@davidtindell950 3 months ago

New Subscriber. Thank You for TTS Tutorial Vids!

@swannschilling474 5 months ago

So nice to have you back! 😊

@gab9847 5 months ago

Wow, this EmoCTRL is just what I need

@megamayo2500 5 months ago

To be fair, Bark did the whole emotion TTS a long time ago. I still consider Bark the best for emotion TTS. The problem is that no one uses Bark. There's so much potential there. I think the issue is the transformer that it was built around. It frequently gives random results. This should be considered good progress, as an unexpected emotion is better than a predictable emotion.

@NoidoDev 5 months ago

Suno is built on Bark. At least, I remember having read that.

@mactheo2574 5 months ago

Along with the random results that do not follow the text, we can't control Bark's generation either. EmoCtrlTTS is absolutely amazing; it's like having ControlNet (for Stable Diffusion) but for voice generation. Being able to generate laughter and emotion in a controlled manner is crazy.

@NoidoDev 5 months ago

@mactheo2574 Thanks, I'll look into it. Didn't have time to watch the video yet.

@mactheo2574 5 months ago

@NoidoDev Unfortunately, EmoCtrlTTS is made by Microsoft and is closed source, with no plans for release. The vid mentioned that. But I'm sure there will be open-source alternatives in a couple of years at most.

@Jarods_Journey 5 months ago

Bark was great, different use case though for sure. It was quite unstable, and unfortunately the quality wasn't great in many cases. Something like emo-ctrl TTS will, I think, mostly be used for redubbing as opposed to TTS; the ability to do this would be great.

@agenticmark 5 months ago

Vall-E won't be released, so it can't be verified and we will never be able to use it.

@Jarods_Journey 5 months ago

The original system, true. I do believe the reproductions are faithful to the paper, though, so we'll have to see.

@gregorymccollum9107 5 months ago

Thanks for keeping us updated on TTS software. Nicely done. 😀

@rplgrime8006 5 months ago

Thanks for the update!

@salmon_enjoyer 5 months ago

Could you make a video on how to use Luna Hook instead of Textractor for visual novels?

@donmarshal2070 5 months ago

Can you convert an AI voice into SAPI5 so you can make it the default Windows voice? If you have a procedure, let me know 🤗

@Jarods_Journey 5 months ago

Unfortunately, not that I'm aware of.

@sownheard 5 months ago

:D yeah i love to learn more

@andreaaaaaaa574 5 months ago

Does anyone know of a FREE voice trainer that still works? MANGIO does not work anymore.

@manymen314 5 months ago

Totally unrelated to your video :D but I've trained a model with finetuned XTTS, and I am happy with the results; it mimics how the original speaks. Now I've been trying to use RVC over the audio generated by XTTS to change the voice, but it keeps changing the accent and how words are pronounced. Am I doing something wrong? I just want to change the voice. Is RVC the wrong thing to use?

@agenticmark 5 months ago

You need to fine-tune your RVC model on the wav files first. I also use XTTS for realistic results: XTTS -> RVC -> out.

@manymen314 5 months ago

@agenticmark The issue is that my RVC model's voice is different from the voice generated by XTTS. I am currently using models I've downloaded from voicemodel.

@Jarods_Journey 5 months ago

Well, you need to use the same wav files you used to train XTTS to train an RVC model. If you just download an RVC model from online and try converting with it, you're going to lose many aspects of the original XTTS voice.

@Cocina_animal 5 months ago

What do you say about WaveGlow and Tacotron 2?

@Jarods_Journey 5 months ago

Well, those are old papers. A lot of newer stuff uses those as a reference and has higher-quality output than they do.

@schmutz06 5 months ago

I found the whoosh effects extremely distracting. Thank you for the video; I recommend removing those transition sound effects!

@Jarods_Journey 5 months ago

Glad you noticed this, I rewatched and I also found it quite distracting, which is something I usually leave out 😅. Feedback accepted :)!

@schmutz06 5 months ago

@Jarods_Journey Yep, no hard feelings at all. Your video is very helpful. I've been playing with L3.1 8b locally, and I've got my own mic > LLM > TTS setup working, with mid-sentence interruptions etc. I have it 'commentating' on a Breakout-style game, evaluating in-game performance and producing original feedback for the player. It's so much fun, and I've barely started toying with it. I'm not a coder, but the fundamentals make sense to me, and I've been able to make anything I want using Claude 3.5 Sonnet and GPT-4o as 'copilots'. I think they've removed two huge barriers: 1. having to know the right syntax, and 2. knowing the 'art of the possible'. These two barriers stopped me from ever wanting to dig into coding before LLMs. I'm really having fun with it. I've been using gTTS, a local TTS (which uses the basic Microsoft voices), and ElevenLabs. I'm looking all over for the most cutting-edge and performant local TTS options. I'm running a 3080 Ti and will absolutely grab whatever Nvidia GPU comes next (5xxx), because the prospect of locally running accelerated, performant 70b-and-better models and doing all sorts of stuff is the most exciting thing in a long time. With your video, I think I wanted to cleanly hear many samples with 'silence' in between, so I'd have a few seconds to process and reflect, I guess. The whoosh just plugged that gap and made it feel congested! I'm also mindful that complaining about 'free' YouTube content, particularly where you've clearly made this to HELP, is a sensitive game... but in the funniest way, we are so spoilt for choice and resources these days that it comes with some strange sense of 'entitlement' to flag issues. No hard feelings again! I'm absolutely grateful for this contribution and for pulling together the video.