New AI Voice Cloning Project - StyleTTS2 Webui (in progress)

Рет қаралды 6,913

Күн бұрын

Пікірлер: 78

@X2ytCrystal 5 ай бұрын

YES!!! I was looking everywhere for StyleTTS2 Web-ui, that thing seems like the best TTS out there atm, but without webui its pure pain.

@alivephotography3712 5 ай бұрын

Great Jarod, StyleTTS seems more stable than Tortoise and XTTS. Really hope we see more multi-lang models in the near future.

@tylerchambliss8379 5 ай бұрын

Man I'm really excited for this. I've been looking for something to make audio books with that doesn't glitch like Tortoise does.

@Jarods_Journey 5 ай бұрын

It'll 100% be more consistent!

@tylerchambliss8379 5 ай бұрын

@@Jarods_Journey The only thing I'm worried about is emotion, intonation, rhythm. Tortoise and 11 labs do this well. I'm not so worried about accent because I can just fix that with RVC anyway but if the actual speech isn't good RVC won't matter

@afaha2214 5 ай бұрын

@@Jarods_Journey can you do an episode with RVC + StyleTTS2

@rplgrime8006 5 ай бұрын

How long does a book take you to convert? I have a decent workstation GPU and I feel like it would take a few days per book. Is there a good way to find Tortoise glitches other than listen to the whole thing and regenerate lines?

@tylerchambliss8379 5 ай бұрын

@@rplgrime8006 Yeah Jarrod actually has an audio book maker app that uses Tortoise. It can correct sentences but sadly it's not very accessible to the blind when it comes to that part. But assuming it takes you 5 seconds per sentence it could take you between double and triple the book's duration to generate it. I just assume it's going to take triple the duration of the book.

@Artholos 5 ай бұрын

🎉 Jarod is always making top content! This is great! 🎉

@Jarods_Journey 5 ай бұрын

Thank you thank you :D!

@motashaiye 5 ай бұрын

I trained a model using XTTS-Finetune-Webui It's 90% same as original voice, in tone and pacing and everything. There's still 5-10% robotic residue sometimes. I used 200 samples, 11 seconds max.

@Tourbillion9048 5 ай бұрын

Great work Jarod. I'm looking forward to cloning my voice locally.

@thenextension9160 3 ай бұрын

hey man thanks for this, I'm getting it going on my ubuntu box

@burekibeats7296 5 ай бұрын

Are you planning to replace TortoiseTTS in the audiobook maker with StyleTTS2 or XTTSv2?

@mikhailv4686 5 ай бұрын

Is it possible to create a pretrain for training voice models in different languages with a small dataset? Do I need to train Bert additionally to create a high-quality voice model?

@X2ytCrystal 5 ай бұрын

Do you have any plans to look into text generation projects? Regular Text-generation or Text2Text generation, would be interesting to see your take on them.

@spirobel2.0 5 ай бұрын

thanks for the video! The one trained on your own voice sounded great even for one epoch. How long was the snippet that you used for training? do you think half an hour could be enough or do you need more voice samples to train a model?

@Jarods_Journey 5 ай бұрын

Half an hour will work fine. Style doesn't seem to need many epochs for training, sometimes, longer trainings results in worse sounding models due to overfitting, but all of that will be in the training video once I get there ofc

@lanhoyc4435 5 ай бұрын

Great Jarod, I'm so greatful. Also, is there a way to create a full song with singing voice, giving that we have lyrics and a prefer melody to develop the song base on that. I'm looking into Juke box but it's still not really clear how to do it yet. Can you light up the way?

@xenn2996 5 ай бұрын

You may have to use another software

@xenn2996 5 ай бұрын

Try searching singing voice ai you might be able to find something on KZbin

@ZAVIN-ip5mj 5 ай бұрын

Please help me. I have this error: [Errno 2] No such file or directory. I've already seen you short video talk about this, but I have no space in between, I still have this error, can you help me, please

@myte1why 5 ай бұрын

there are 2 questions in my mind right now: 1- can it work on other languages. 2- will it be new back end for audio book maker? it sounds wery clear. jusy wonder if it keeps voice consistant like if you use same model and sample does it change voice or not? by the way good work man.

@Jarods_Journey 5 ай бұрын

1. Yes, but the process is quite convoluted. Will need time to figure it out 2. Definitely available as an alternative, reason being that it's about 10x faster than tortoise give or take. It barely changes, there is a "seed" value that you can change, but the way a model says a certain phrase or word pretty much stays the same so that's the only downside. Many little quirks that can be tweaked around with though...

@myte1why 5 ай бұрын

@@Jarods_Journey wow that is some thing to wait for 😁 Thanks for info

@hikmetemre6837 5 ай бұрын

Thank you, I appreciate it, Jarod! You have great content. I have a question for you: for a singing AI model, could I use version 3, or do I have to go with the first version of the RVC repository?

@Jarods_Journey 5 ай бұрын

There's not a viable text-to-sing model, this repository is text to speech (and v3 of what you asked as well). RVC is something completely different which does speech to speech conversion. You can use RVC for changing the vocals of a song to another voice. I pipeline them together, but you need so it goes TTS then to speech to speech. That's the only way they work together

@NezzConstantine 5 ай бұрын

Are you thinking of making a chatbot to speech thing where we can link our chat models with your speech TTS? If not, do you recommend any good programs to use for windows that I could use to link my chatbot to TTS to be able to have a vocal conversation with? Thank you for all of this. You helped me a lot this year getting into the voice cloning stuff and other things.

@Jarods_Journey 5 ай бұрын

I'd recommend you look up silly tavern. A youtuber like aitrepreneur goes over how you could get this all set-up. It doesn't work with my stuff, but it does support xtts and styletts as far as I know.

@to-mi1949 5 ай бұрын

I have a request! Could you maybe do a tutorial of making a python thingy that let's you have real time voice conversations with AI? Something like this: 1. You have a LLM of your choice, and you can instruct it to behave how you want using prompts. For example "Your name is Squidward and you are always grumpy, you give short responses and complain a lot" or things like that. 2. You have voice to text that listens to when you speak, then transcribe it and feed it to the LLM, that then generates a response. 3. The response is then converted into a voice output using tortoise or some other voice thingy where you can pick the voice you want. Would this be possible?

@Mika_Zuki7 3 ай бұрын

I got a question is it possible to do all these kind of things on android?

@svie6062 5 ай бұрын

styletts pretty good! I'm using it for production.

@Jarods_Journey 5 ай бұрын

Would agree :)!

@mohammadaliabbas3847 5 ай бұрын

But it have some high picth noise problem. And cuda out of memory issue when multiple request.. How are you using?

@Kalikakc-p7w 5 ай бұрын

I try to train StyleTTS but i can't how to make ODD.txt and i also already given colab demo training it also not work. How to train this StyleTTS can you make little video on it.

@hiddenfromyourview 5 ай бұрын

Any chance you could package this in to a docker container?

@andreaaaaaaa574 4 ай бұрын

Does anyone know of a FREE voice trainer that still works? MANGIO do not work anymore

@JT-qi6el 5 ай бұрын

how to train different languages?

@mallu_bot50 5 ай бұрын

Can you please explain and how to do with bark and openvoice 2. Because it's easy to install using pinokio ai browser

@hansmarder7007 5 ай бұрын

Great Video. I have a question about your Presentations (TTS/TTS2). Can i Clone a voice which speaks in French, and use this trained model speak a text in Portuguese?

@Jarods_Journey 5 ай бұрын

Eh, kind of. You could train a portugese model, then use the reference audio of the speaker in French, though, the results won't be convincing if we base it off of the zero-shot capabilities shown in the video.

@i6od 5 ай бұрын

I cant seem to install espeak for phonemizer, any links or direction would be helpful :)

@i6od 5 ай бұрын

nvm i figured it out

@HogwartsStudy 5 ай бұрын

now... is there a local install for Suno Ai?

@Jarods_Journey 5 ай бұрын

Unfortunately, not at the quality of Suno, will definetely make a video on that if one does get released though

@HogwartsStudy 5 ай бұрын

@@Jarods_Journey doubtful now that there is the lawsuit.....

@Tungdang-tv7sd 5 ай бұрын

Can I use Model .pth file independently on linux application? If so, please make a video tutorial about this. Thanks for your sharing

@Jarods_Journey 5 ай бұрын

Yes, styletts2 was designed for linux, so following their repository should be a simpler set-up than trying to get it working on windows.

@ph0enixph0enix65 5 ай бұрын

Will it work for german? I tried with like 50 hours of clear german voice dataset with Tortiose TTS but for the sake of my life I could not get to produce a german voice which does not sound like Shodan.

@Jarods_Journey 5 ай бұрын

StyleTTS is a bit more complicated than tortoise for training other languages, but yes, it looks to be possible. The author provides some guidance on how this might be done, but that is an area of exploration I have not yet gone down

@SyntheticVoices 5 ай бұрын

Yes it does even for fine-tuning.

@maskenhandler1648 4 ай бұрын

i want something so i can read webnovels any ideas guys? should i use a normal text to speech . or text to speach ai? i dont know... i want it to sound as realisticly as possible . with emotions and such

@FenrirRobu 5 ай бұрын

Isn't that why we already have sidharthrajaram for StyleTTS2 to be installable?

@Jarods_Journey 5 ай бұрын

Ah yes, I am aware of this repo. It works afaik, but getting to handle the code without abstracting it away allows me to understand what's happening, therefore, making any adjustments to the code a bit easier. I also am not sure if that repo allows for training, which, my repository will be able to support.

@FenrirRobu 5 ай бұрын

@@Jarods_Journey Well, as long as you are building on existing knowledge, Godspeed. While integrating StyleTTS I did feel like MIT-complatible gruut has some drawbacks and maybe needs benchmarking against the GPL licensed default. And to my surprise these repos change surprisingly little so I will probably stop trying to make my forks 'git-level compatible' with their roots, with one exception - RVC.

@miyrrecs3024 5 ай бұрын

I hope that in the future you can overcome some complexes and errors that still appear in the previous upwork. As for this one, it sounds optimized like the Xbox Series S 📦 to me

@GentlemanCave91Vn 5 ай бұрын

I want to integrate it into my Web side running on linux, how do I do that?

@Jarods_Journey 5 ай бұрын

You'll need to dig into the StyleTTS 2 code in order to implement this. Style is much easier to install on linux as that's where it was designed for and in, so the instructions on their repo should guide you a bit there.

@GentlemanCave91Vn 5 ай бұрын

@@Jarods_Journey Please give more clear instructions

@3k3k3 5 ай бұрын

What are the chances this will work with foreign languages? And as always, thanks!

@Jarods_Journey 5 ай бұрын

It'll work, but the process is different (need to look more into it)

@juanjesusligero391 5 ай бұрын

Amazing job! :D Will it work also for Spanish, Japanese and other languages?

@Jarods_Journey 5 ай бұрын

It does, but the process for doing so is quite a bit more involved than multilingual training in something like tortoise.

@juanjesusligero391 5 ай бұрын

@@Jarods_Journey Thanks for your answer! ^^ So, for now, you recommend that if I want to clone my voice in Spanish, I should use Tortoise? How many minutes (for my dataset) and time/epochs of training would be needed for a reasonably good result?

@dohyunee 5 ай бұрын

Faster Jarod!!! xD

@przemeksluzynski 5 ай бұрын

Great!

@sidarth404 5 ай бұрын

create training tab please

@Jarods_Journey 5 ай бұрын

maybe

@SyntheticVoices 5 ай бұрын

This

@actepukc 5 ай бұрын

Damn i've just found that TTS few days ago and I've made Qt GUI but it doesn't have the ability to use pre-trained voices or train them at all I will post it later today in ther discord if anyone want to take it and use it - feel free to take the code and would be great someone who know how those things work to have to make the training to run on Windows as well. The training is hard to make it run if at all on windows machine without WSL(Windows Subsystem for Linux) The GUI that I've made can use different texts and combine them with different emotions - if it can use at least for now one voice only but as StyleTTS2 voices can vary if you use more than needed embedding it can sound as few different voices :) And would be the a free alternative of eleven labs and those other paid services for TTS :P

@anoopchhina7235 5 ай бұрын

hey jarod unrelated note can the ai voice changer client shown in this vid kzbin.info/www/bejne/f4TChIOHi9ton7ssi=V36ObSR_MgU-vzMO able to work with games like valorant?

@kabylebro 5 ай бұрын

Bro sorry but i cannot understand there is so much ai models. What is the best for: TTS REALTIME VOICE CLONING VOICE CLONING ( Im a beginner, and i have a good pc but my net is mid... And i can't install all the models to try them so i wish u can help me cuz i need it for this 3 topics ) Thanks man i appreciate your work ❤

@supersonicunitedsupersonic8531 5 ай бұрын

and can we teach it to TTS in Russian?

@Jarods_Journey 5 ай бұрын

Yes, but it's not easy

@agenticmark 5 ай бұрын

ive yet to find a model better than tortoise, you have to choose between speed an quality. nothing new there. StyleTTS takes more vram than the other models too.

@Jarods_Journey 5 ай бұрын

Xtts/tortoise is the best the open source community has afaik. Style is actually more lenient on inference than the both of those though, just not training. You can actually inference decently fast on CPU