A Tip on Training Better Voice Models in Tortoise TTS

Views: 16,331

Links referenced in the video:
Tortoise Installation - • Local AI Voice Cloning...
Hardware for my PC:
Graphics Card - amzn.to/3pcREux
CPU - amzn.to/43O66Ir
Cooler - amzn.to/3p98TwX
RAM - amzn.to/3NBAsIq
SSD Storage - amzn.to/42NgMFR
Power Supply (PSU) - amzn.to/430bIhy
PC Case - amzn.to/447499T
Motherboard - amzn.to/3CziMXI
Alternative prebuilds to my PC:
Corsair Vengeance i7400 - amzn.to/3p64r22
MSI MPG Velox - amzn.to/42MnJHl
Cheapest recommended PC:
Cyberpower 3060 - amzn.to/3XjtZoP
Come join The Learning Journey!
Discord - / discord
Github - github.com/Jar...
TikTok - / jarodsjourney
If you found anything helpful, please consider supporting me and the content I am trying to produce!
www.buymeacoff...

Comments: 60

@Mistercapi0 10 months ago

I did exactly the same thing a few days ago and can confirm that re-whispering the samples plus smart padding at the end fixes the cutoff you are experiencing. Play around with it; I found that 0.2s worked great on my data. (It all depends on how quickly the speaker transitions to the next sentence and takes a breath.)
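A minimal sketch of what end padding like this could look like, assuming the transcription gives a list of `(start, end)` times in seconds. The function name `pad_segment_ends` and the clamping rule are illustrative assumptions, not the commenter's actual code; only the 0.2s value comes from the comment above.

```python
def pad_segment_ends(segments, pad=0.2, audio_duration=None):
    """Extend each segment's end by `pad` seconds.

    `segments` is a list of (start, end) tuples in seconds, sorted by start.
    The padded end never runs past the start of the next segment, and the
    last segment is clamped to `audio_duration` when it is given.
    """
    padded = []
    for i, (start, end) in enumerate(segments):
        new_end = end + pad
        if i + 1 < len(segments):
            # Do not bleed into the next speaker segment.
            new_end = min(new_end, segments[i + 1][0])
        elif audio_duration is not None:
            # Do not run past the end of the file.
            new_end = min(new_end, audio_duration)
        # Never shorten a segment, even if the input segments overlap.
        padded.append((start, max(new_end, end)))
    return padded
```

The right `pad` value depends on how quickly the speaker moves on to the next sentence, so it is worth trying a few values on your own data.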

@Dj-vt5gr 6 months ago

Any chance you could release your "smart padding" code to github? THANK YOU!

@jimb0z93 10 months ago

saw this update release today on twitter, i was expecting a video from the A.I sound master - thanks for the video, good work!

@Mowgi 10 months ago

Sorry if I'm premature with this comment, as I've only partially finished the video, but there are several programs designed to automatically remove breaths from vocal recordings. Even a simple noise gate would help.

@Jarods_Journey 10 months ago

If you have any, I would love to take a look. I just don't want it removing breaths from the middle of sentences, only at the ends, which is perhaps the issue here (and is why a noise gate wouldn't work).

@matthewfuller9760 10 months ago

@@Jarods_Journey I am not sure about this particular use case, but both AMD and NVIDIA offer free background-filtering software that filters voices in real time on the GPU.

@randomyoutuber1078 10 months ago

I have been looking into forced alignment and voice activity detection. I think it's the key to fixing this problem. I have been trying to use it to process the training data, with some success. But I'm not good at coding and haven't been able to test very many of the different methods that are out there.

@Mowgi 10 months ago

Oh and shout out for the experimentation on accents🙏 thanks for your input on the discord

@Jarods_Journey 10 months ago

Ofc :), thanks for your input as well!

@M4rt1nX 10 months ago

We love the breathing. (Joke aside) I was watching a movie and got irritated because the actors portraying robots were breathing.

@Jarods_Journey 10 months ago

Maybe they're breathing too 😅

@joshuashepherd7189 10 months ago

Lmfao the guy sounds like he's spitting on us 9:01

@smackdown2479 3 months ago

Thank you for what you are doing, but when you make a video, think about noobs like me: try to keep the on-screen movements slow and explain more for beginners. Thanks again for your hard work, I appreciate it.

@MrAlsBundy 10 months ago

I would advise not using the SRT; take the JSON from WhisperX instead. The alignments are much better, and you can add start and end offsets. And don't forget to check for numbers, because the alignment model cannot handle numbers.
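As a rough illustration of this suggestion, the sketch below reads segment timings from a WhisperX-style JSON string, widens each segment by a start and end offset, and flags segments whose text contains digits so they can be checked by hand. It assumes a top-level `"segments"` list whose entries carry `"start"`, `"end"`, and `"text"`; check that against your own file. The function name `load_segments` and the offset parameters are hypothetical.

```python
import json
import re


def load_segments(json_text, start_offset=0.0, end_offset=0.0):
    """Parse WhisperX-style JSON and return widened segments.

    Each result has the start moved earlier by `start_offset`, the end
    moved later by `end_offset`, and a `has_digits` flag marking text
    that contains numerals the alignment model may have mishandled.
    """
    data = json.loads(json_text)
    segments = []
    for seg in data["segments"]:
        segments.append({
            "start": max(0.0, seg["start"] - start_offset),  # never before 0
            "end": seg["end"] + end_offset,
            "text": seg["text"].strip(),
            "has_digits": bool(re.search(r"\d", seg["text"])),
        })
    return segments
```

Segments flagged with `has_digits` are candidates for manual review, or for spelling the numbers out in the text before aligning.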

@Jarods_Journey 10 months ago

I'll look into it, thanks for the advice!

@corbinangelo3359 5 months ago

Up till now, I still have my old and pathetic 1070 with 8 GB of VRAM. 😭 It took me 2 days of fiddling with the settings to get my machine to successfully train a voice model. 🤣 I'm really tempted to just go and get a 4060 Super with 16 GB, but I think I'll wait till Computex in June.

@SiddharthTripathi365 10 months ago

Hi Jarod, I am a big fan of yours! Can you please create a demo of a TTS + RVC pipeline for Hindi?

@MrDanINSANE 10 months ago

Since you are the KING of audio clone AI (I've been a fan for a while), maybe you can help me find the most up-to-date LOCAL audio cloning tool that also supports the Hebrew language? I tried So-Vits-Svc and RVC a long time ago. RVC can't run on my machine because my GPU only has 4 GB, but So-Vits-Svc works... though training takes a HELL of a lot of time, even when cloud-based on Google Drive. Anyhow, is there a NEW / BETTER way you can point me to that will support Hebrew? I hear good news about RVC v2, but training locally is the real question... unless there are other new voice cloning AIs that are better than RVC v2, of course. Thanks ahead, keep up the good work 💙

@Jarods_Journey 10 months ago

RVC is better in my experience and should actually work on 4 GB of VRAM, though I'm not too sure in this case. So-vits is good, but if you can get RVC running, it will probably be better. As for TTS, I'm not sure of any that support Hebrew at the moment; maybe Facebook's SeamlessM4T.

@MrDanINSANE 10 months ago

@@Jarods_Journey Thank you! I will give RVC v2 a chance; people are very happy with it compared to So-Vits :)

@TheBibliographerSociety 1 month ago

6:08 I've run into the same issue with XTTS-Finetune-WebUI: Whisper cuts the ends too short.

@euphemisticukulele67 7 months ago

jeremy clarkson?

@ero1.097 10 months ago

⁉️How can I continue training on a dataset after it has finished at 300 epochs? Do I need to put the new audio files in the dataset folder together with the old ones, then go into the configs and raise the epoch count? E.g., if it stopped at 300 and I want to keep training with the new audio, do I now need to set 400 epochs to continue with the new dataset? And whenever I want to expand my training dataset, do I just do this again, increasing the number of epochs each time?

@bernardthongvanh5613 10 months ago

To do voice cloning, I add 3 voice clips and use them with the --voice argument when using do_tts, but each time it produces 3 slightly different voices. Is it not possible to freeze the behavior so that I always get the same voice? The problem is that if I want to read a long text, I need to make several generations, and it's impossible to get the exact same voice across them.

@Mehdi0montahw 10 months ago

How do I change the language in Tortoise TTS and make it speak foreign languages? How do I add a special field to switch to the language I trained, Arabic for example?

@Jarods_Journey 10 months ago

You would need a custom tokenizer for training in Arabic in this case

@satyajitroutray282 10 months ago

A few months back, I trained some models using the mrq repo. The problem I faced was with generation: during testing, when I input a small paragraph, these models ignore some sentences in the middle, or sometimes some words at the end or the start of those sentences.

@Jarods_Journey 10 months ago

I've found Tortoise works best if you give it smaller sentences, or split paragraphs into sentences that are each on their own line. It also really depends on how well the model adapted to your training data.
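One simple way to prepare text like this is to split a paragraph at sentence-ending punctuation and put each sentence on its own line. The sketch below is a naive regex-based splitter under that assumption; the function name `split_into_lines` is hypothetical, and it will mis-split abbreviations such as "Dr." or decimal-free initials.

```python
import re


def split_into_lines(paragraph):
    """Return the paragraph with one sentence per line.

    Splits on whitespace that follows '.', '!' or '?'. This is a naive
    rule and does not handle abbreviations or quoted punctuation.
    """
    sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return "\n".join(s for s in sentences if s)
```

For example, `split_into_lines("Hello there. How are you? Fine!")` yields three lines, one per sentence, which can then be pasted into the text box.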

@king-zu3ih 5 months ago

I am new to Tortoise. Can I use a model trained with Tortoise in RVC, or is there any way to convert it to the RVC model format?

@JackpotFriends 4 months ago

I have about eight 2-hour live streams of myself that I want to use for training. Is that overkill? Can I just plug them into WhisperX and train off the whole sample? Suggestions?

@stevewarby12 2 months ago

Hi. My train tab doesn't show any training files. Where do I get them, please?

@farizseptiananda7756 10 months ago

I'm interested in using Tortoise and have already done basic generation with Tortoise TTS. But I have a question: how do I pause and resume training? Where I live, the power sometimes goes out for a few hours.

@mr-s23 4 months ago

Can YOU share the WhisperX you are using?

@zonas7915 8 months ago

I see that you have an audio combiner script but it's not in the repo

@DM-dy6vn 5 months ago

1:16 "Ch" sound is missing in "lunch"

@shoaibvanu5194 7 months ago

I am trying to train it for an Indian English accent. Can you guide me on this, please?

@fjccommish 4 months ago

Why the bad background music?

@johnyoung4409 8 months ago

I'm facing the exact same issue. I carefully split my input in Audacity, but the problem still exists. Very confusing...

@johnyoung4409 8 months ago

OK, after some digging, I finally found out that sufficient audio length is very important, even for voice cloning. In my failed attempt I only had 4 minutes of audio; now I've increased it to one hour and I don't have that issue anymore. 10 minutes was also sufficient in some of my experiments.
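To check how much audio a dataset actually contains before training, something like the following sketch can total the length of the clips. It assumes the clips are uncompressed `.wav` files sitting directly in one folder; the function name `total_wav_seconds` and the folder layout are assumptions, not part of any Tortoise tooling.

```python
import wave
from pathlib import Path


def total_wav_seconds(folder):
    """Sum the duration, in seconds, of every .wav file directly in `folder`."""
    total = 0.0
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            # Duration of one file = number of frames / frames per second.
            total += wav.getnframes() / wav.getframerate()
    return total
```

Dividing the result by 60 gives minutes, which can be compared against the 4-minute, 10-minute, and one-hour figures mentioned above.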

@ASlaveToReason 8 months ago

@@johnyoung4409 When you use a 1-hour audio file, what length of clips do you break it up into?

@Vladimirytt 10 months ago

can u make a tutorial on how to use rvc disconnected for colab?

@kabirchawla2652 9 months ago

Is it better with bark?

@gonzalodijoux5953 9 months ago

Thanks for your video. Is it possible to train a French voice? I tried to train a French voice following your other video, but the voice comes out in English. Thanks

@janrappe 9 months ago

You can also train non-English (French, German, Spanish, etc.) voices. But make sure to set the Text LR Ratio to 1 in the "Generate Configurations" tab, as mentioned in the video. Otherwise the model will try to pronounce French sentences as if they were English.

@ahmetab06 9 months ago

How much vram is required to run it? And which file do we need to run for the first setup?

@Jarods_Journey 9 months ago

VRAM can be as low as 4gb I've heard from people. You might wanna check out my most recent video for an easy zero-code install: kzbin.info/www/bejne/pmSUcquVdpqJgaM

@ahmetab06 9 months ago

@@Jarods_Journey I couldn't run it because I have a 4 GB graphics card. Can you compare it with ElevenLabs? I couldn't find a video that explains it clearly with good examples.

@rickygrenadier6303 1 month ago

lmao jeremy clarkson

@deathxrost 10 months ago

I'm off topic here, but Jarod, can you do something with XXXTentacion's voice in RVC? 😅 I know there are many AIs that can do it easily, but I was amazed by the RVC song of "Kurt Cobain". It mostly feels like the real one ❤😮, so I'd like someone to do something with XXXTentacion too... 😊

@Reaper_Plaz 5 months ago

Bruhh voice changing feel weird in Mac

@jeffreysabino6176 10 months ago

Which techno song is your background music?

@Jarods_Journey 10 months ago

kzbin.info/www/bejne/qZ3XkHWXq52hqbM

@srisir481 10 months ago

Does it work only for English?

@MrWaffleToes 10 months ago

what happened to the japanese learning videos

@jurandfantom 10 months ago

Man I beg you, fix your voice synch ;_;

@benphillips2947 6 months ago

Every time I see the bad sync on these videos I get suspicious that it's just them trying to be cute. "Surprise you were listening to TTS all along!" Yeah, we know.

@kodoqmc 10 months ago

does tortoise support language translation?

@Jarods_Journey 10 months ago

It does not

@DOHANEWSUPDATES 5 months ago

Hi dear friend, your videos are very useful. I have been trying to clone the voice used in lonewwolf motivation videos. It's a deep voice. Can you please guide me on how to turn my script into this type of voice? My reference audio is like this: kzbin.info0U5PIiACwFI?si=TJmhQQV8r5SDaCFG... Please help me train this type of voice.