YES!!! I was looking everywhere for StyleTTS2 Web-ui, that thing seems like the best TTS out there atm, but without webui its pure pain.
@alivephotography37125 ай бұрын
Great Jarod, StyleTTS seems more stable than Tortoise and XTTS. Really hope we see more multi-lang models in the near future.
@tylerchambliss83795 ай бұрын
Man I'm really excited for this. I've been looking for something to make audio books with that doesn't glitch like Tortoise does.
@Jarods_Journey5 ай бұрын
It'll 100% be more consistent!
@tylerchambliss83795 ай бұрын
@@Jarods_Journey The only thing I'm worried about is emotion, intonation, rhythm. Tortoise and 11 labs do this well. I'm not so worried about accent because I can just fix that with RVC anyway but if the actual speech isn't good RVC won't matter
@afaha22145 ай бұрын
@@Jarods_Journey can you do an episode with RVC + StyleTTS2
@rplgrime80065 ай бұрын
How long does a book take you to convert? I have a decent workstation GPU and I feel like it would take a few days per book. Is there a good way to find Tortoise glitches other than listen to the whole thing and regenerate lines?
@tylerchambliss83795 ай бұрын
@@rplgrime8006 Yeah Jarrod actually has an audio book maker app that uses Tortoise. It can correct sentences but sadly it's not very accessible to the blind when it comes to that part. But assuming it takes you 5 seconds per sentence it could take you between double and triple the book's duration to generate it. I just assume it's going to take triple the duration of the book.
@Artholos5 ай бұрын
🎉 Jarod is always making top content! This is great! 🎉
@Jarods_Journey5 ай бұрын
Thank you thank you :D!
@motashaiye5 ай бұрын
I trained a model using XTTS-Finetune-Webui It's 90% same as original voice, in tone and pacing and everything. There's still 5-10% robotic residue sometimes. I used 200 samples, 11 seconds max.
@Tourbillion90485 ай бұрын
Great work Jarod. I'm looking forward to cloning my voice locally.
@thenextension91603 ай бұрын
hey man thanks for this, I'm getting it going on my ubuntu box
@burekibeats72965 ай бұрын
Are you planning to replace TortoiseTTS in the audiobook maker with StyleTTS2 or XTTSv2?
@mikhailv46865 ай бұрын
Is it possible to create a pretrain for training voice models in different languages with a small dataset? Do I need to train Bert additionally to create a high-quality voice model?
@X2ytCrystal5 ай бұрын
Do you have any plans to look into text generation projects? Regular Text-generation or Text2Text generation, would be interesting to see your take on them.
@spirobel2.05 ай бұрын
thanks for the video! The one trained on your own voice sounded great even for one epoch. How long was the snippet that you used for training? do you think half an hour could be enough or do you need more voice samples to train a model?
@Jarods_Journey5 ай бұрын
Half an hour will work fine. Style doesn't seem to need many epochs for training, sometimes, longer trainings results in worse sounding models due to overfitting, but all of that will be in the training video once I get there ofc
@lanhoyc44355 ай бұрын
Great Jarod, I'm so greatful. Also, is there a way to create a full song with singing voice, giving that we have lyrics and a prefer melody to develop the song base on that. I'm looking into Juke box but it's still not really clear how to do it yet. Can you light up the way?
@xenn29965 ай бұрын
You may have to use another software
@xenn29965 ай бұрын
Try searching singing voice ai you might be able to find something on KZbin
@ZAVIN-ip5mj5 ай бұрын
Please help me. I have this error: [Errno 2] No such file or directory. I've already seen you short video talk about this, but I have no space in between, I still have this error, can you help me, please
@myte1why5 ай бұрын
there are 2 questions in my mind right now: 1- can it work on other languages. 2- will it be new back end for audio book maker? it sounds wery clear. jusy wonder if it keeps voice consistant like if you use same model and sample does it change voice or not? by the way good work man.
@Jarods_Journey5 ай бұрын
1. Yes, but the process is quite convoluted. Will need time to figure it out 2. Definitely available as an alternative, reason being that it's about 10x faster than tortoise give or take. It barely changes, there is a "seed" value that you can change, but the way a model says a certain phrase or word pretty much stays the same so that's the only downside. Many little quirks that can be tweaked around with though...
@myte1why5 ай бұрын
@@Jarods_Journey wow that is some thing to wait for 😁 Thanks for info
@hikmetemre68375 ай бұрын
Thank you, I appreciate it, Jarod! You have great content. I have a question for you: for a singing AI model, could I use version 3, or do I have to go with the first version of the RVC repository?
@Jarods_Journey5 ай бұрын
There's not a viable text-to-sing model, this repository is text to speech (and v3 of what you asked as well). RVC is something completely different which does speech to speech conversion. You can use RVC for changing the vocals of a song to another voice. I pipeline them together, but you need so it goes TTS then to speech to speech. That's the only way they work together
@NezzConstantine5 ай бұрын
Are you thinking of making a chatbot to speech thing where we can link our chat models with your speech TTS? If not, do you recommend any good programs to use for windows that I could use to link my chatbot to TTS to be able to have a vocal conversation with? Thank you for all of this. You helped me a lot this year getting into the voice cloning stuff and other things.
@Jarods_Journey5 ай бұрын
I'd recommend you look up silly tavern. A youtuber like aitrepreneur goes over how you could get this all set-up. It doesn't work with my stuff, but it does support xtts and styletts as far as I know.
@to-mi19495 ай бұрын
I have a request! Could you maybe do a tutorial of making a python thingy that let's you have real time voice conversations with AI? Something like this: 1. You have a LLM of your choice, and you can instruct it to behave how you want using prompts. For example "Your name is Squidward and you are always grumpy, you give short responses and complain a lot" or things like that. 2. You have voice to text that listens to when you speak, then transcribe it and feed it to the LLM, that then generates a response. 3. The response is then converted into a voice output using tortoise or some other voice thingy where you can pick the voice you want. Would this be possible?
@Mika_Zuki73 ай бұрын
I got a question is it possible to do all these kind of things on android?
@svie60625 ай бұрын
styletts pretty good! I'm using it for production.
@Jarods_Journey5 ай бұрын
Would agree :)!
@mohammadaliabbas38475 ай бұрын
But it have some high picth noise problem. And cuda out of memory issue when multiple request.. How are you using?
@Kalikakc-p7w5 ай бұрын
I try to train StyleTTS but i can't how to make ODD.txt and i also already given colab demo training it also not work. How to train this StyleTTS can you make little video on it.
@hiddenfromyourview5 ай бұрын
Any chance you could package this in to a docker container?
@andreaaaaaaa5744 ай бұрын
Does anyone know of a FREE voice trainer that still works? MANGIO do not work anymore
@JT-qi6el5 ай бұрын
how to train different languages?
@mallu_bot505 ай бұрын
Can you please explain and how to do with bark and openvoice 2. Because it's easy to install using pinokio ai browser
@hansmarder70075 ай бұрын
Great Video. I have a question about your Presentations (TTS/TTS2). Can i Clone a voice which speaks in French, and use this trained model speak a text in Portuguese?
@Jarods_Journey5 ай бұрын
Eh, kind of. You could train a portugese model, then use the reference audio of the speaker in French, though, the results won't be convincing if we base it off of the zero-shot capabilities shown in the video.
@i6od5 ай бұрын
I cant seem to install espeak for phonemizer, any links or direction would be helpful :)
@i6od5 ай бұрын
nvm i figured it out
@HogwartsStudy5 ай бұрын
now... is there a local install for Suno Ai?
@Jarods_Journey5 ай бұрын
Unfortunately, not at the quality of Suno, will definetely make a video on that if one does get released though
@HogwartsStudy5 ай бұрын
@@Jarods_Journey doubtful now that there is the lawsuit.....
@Tungdang-tv7sd5 ай бұрын
Can I use Model .pth file independently on linux application? If so, please make a video tutorial about this. Thanks for your sharing
@Jarods_Journey5 ай бұрын
Yes, styletts2 was designed for linux, so following their repository should be a simpler set-up than trying to get it working on windows.
@ph0enixph0enix655 ай бұрын
Will it work for german? I tried with like 50 hours of clear german voice dataset with Tortiose TTS but for the sake of my life I could not get to produce a german voice which does not sound like Shodan.
@Jarods_Journey5 ай бұрын
StyleTTS is a bit more complicated than tortoise for training other languages, but yes, it looks to be possible. The author provides some guidance on how this might be done, but that is an area of exploration I have not yet gone down
@SyntheticVoices5 ай бұрын
Yes it does even for fine-tuning.
@maskenhandler16484 ай бұрын
i want something so i can read webnovels any ideas guys? should i use a normal text to speech . or text to speach ai? i dont know... i want it to sound as realisticly as possible . with emotions and such
@FenrirRobu5 ай бұрын
Isn't that why we already have sidharthrajaram for StyleTTS2 to be installable?
@Jarods_Journey5 ай бұрын
Ah yes, I am aware of this repo. It works afaik, but getting to handle the code without abstracting it away allows me to understand what's happening, therefore, making any adjustments to the code a bit easier. I also am not sure if that repo allows for training, which, my repository will be able to support.
@FenrirRobu5 ай бұрын
@@Jarods_Journey Well, as long as you are building on existing knowledge, Godspeed. While integrating StyleTTS I did feel like MIT-complatible gruut has some drawbacks and maybe needs benchmarking against the GPL licensed default. And to my surprise these repos change surprisingly little so I will probably stop trying to make my forks 'git-level compatible' with their roots, with one exception - RVC.
@miyrrecs30245 ай бұрын
I hope that in the future you can overcome some complexes and errors that still appear in the previous upwork. As for this one, it sounds optimized like the Xbox Series S 📦 to me
@GentlemanCave91Vn5 ай бұрын
I want to integrate it into my Web side running on linux, how do I do that?
@Jarods_Journey5 ай бұрын
You'll need to dig into the StyleTTS 2 code in order to implement this. Style is much easier to install on linux as that's where it was designed for and in, so the instructions on their repo should guide you a bit there.
@GentlemanCave91Vn5 ай бұрын
@@Jarods_Journey Please give more clear instructions
@3k3k35 ай бұрын
What are the chances this will work with foreign languages? And as always, thanks!
@Jarods_Journey5 ай бұрын
It'll work, but the process is different (need to look more into it)
@juanjesusligero3915 ай бұрын
Amazing job! :D Will it work also for Spanish, Japanese and other languages?
@Jarods_Journey5 ай бұрын
It does, but the process for doing so is quite a bit more involved than multilingual training in something like tortoise.
@juanjesusligero3915 ай бұрын
@@Jarods_Journey Thanks for your answer! ^^ So, for now, you recommend that if I want to clone my voice in Spanish, I should use Tortoise? How many minutes (for my dataset) and time/epochs of training would be needed for a reasonably good result?
@dohyunee5 ай бұрын
Faster Jarod!!! xD
@przemeksluzynski5 ай бұрын
Great!
@sidarth4045 ай бұрын
create training tab please
@Jarods_Journey5 ай бұрын
maybe
@SyntheticVoices5 ай бұрын
This
@actepukc5 ай бұрын
Damn i've just found that TTS few days ago and I've made Qt GUI but it doesn't have the ability to use pre-trained voices or train them at all I will post it later today in ther discord if anyone want to take it and use it - feel free to take the code and would be great someone who know how those things work to have to make the training to run on Windows as well. The training is hard to make it run if at all on windows machine without WSL(Windows Subsystem for Linux) The GUI that I've made can use different texts and combine them with different emotions - if it can use at least for now one voice only but as StyleTTS2 voices can vary if you use more than needed embedding it can sound as few different voices :) And would be the a free alternative of eleven labs and those other paid services for TTS :P
@anoopchhina72355 ай бұрын
hey jarod unrelated note can the ai voice changer client shown in this vid kzbin.info/www/bejne/f4TChIOHi9ton7ssi=V36ObSR_MgU-vzMO able to work with games like valorant?
@kabylebro5 ай бұрын
Bro sorry but i cannot understand there is so much ai models. What is the best for: TTS REALTIME VOICE CLONING VOICE CLONING ( Im a beginner, and i have a good pc but my net is mid... And i can't install all the models to try them so i wish u can help me cuz i need it for this 3 topics ) Thanks man i appreciate your work ❤
@supersonicunitedsupersonic85315 ай бұрын
and can we teach it to TTS in Russian?
@Jarods_Journey5 ай бұрын
Yes, but it's not easy
@agenticmark5 ай бұрын
ive yet to find a model better than tortoise, you have to choose between speed an quality. nothing new there. StyleTTS takes more vram than the other models too.
@Jarods_Journey5 ай бұрын
Xtts/tortoise is the best the open source community has afaik. Style is actually more lenient on inference than the both of those though, just not training. You can actually inference decently fast on CPU