Local AI Voice Cloning with Tortoise TTS - 2024 Installation (Check LATEST update in description)

114,460 views

Jarods Journey

1 day ago

Comments
@Mowgi 1 year ago
We're all very lucky to have someone dedicated to not only teaching us how to use these awesome technologies, but making it as simple and up to date as possible. Keep up the great work, we don't deserve you 🙌
@Jarods_Journey 1 year ago
Thank you thank you 🙏🙏! Really much appreciate it and you're too kind 🥹
@PlaystationEu 1 year ago
@@Jarods_Journey thanks a lot for your work, it's really awesome 😊
@pc_boy5371 1 year ago
I agree with you 100%, love the channel
@brianlink391 11 months ago
Speak for yourself - I totally deserve him! 😉
@SirRubyRed 9 months ago
Is it not possible to download pretrained voices?
@haydar_kir 11 months ago
The way AI TTS companies are charging people is ridiculous. I am glad there are people like you. Thank you.
@compositeur8455 11 months ago
You need an Nvidia GPU to run this crap, so it's not much better
@1ajayc 5 months ago
@@compositeur8455 most people have this already - it's the most popular GPU
@herculeholmes504 2 months ago
I'd be quite happy to pay for an offline TTS with good quality voices, but the commercial software creators only offer online options that come in two price tiers: Ultra-expensive for commercial use, or free for private use. Which sounds nice, but being old-fashioned I just can't and won't trust anything that is "free" and online; it's my data on someone else's computer.
@vixxcelacea2778 1 month ago
@@compositeur8455 That's standard for running almost anything AI. AMD has yet to catch up in that game, but they are starting to. It's mostly due to CUDA usage. I bit the bullet and got one off a local auction site for a hundred bucks cheaper than otherwise: a 3060 with 12 GB of VRAM, as that's the entry-level card for AI stuff and VRAM is important. Nvidia is overpriced because it was allowed to corner the AI market. Doing AI stuff isn't cheap yet, not for the companies or the user. It takes a lot of computing power and powerful GPUs. In the future that might change, but this stuff is still pretty new.
@shawn4990 1 year ago
After getting into AI and programs like Stable Diffusion over the last year, I had to learn some code with all that's required to get them to run properly. However, since I'm not a programmer, what ended up happening is I created more issues for myself, which took way too much time to google and fix my mistakes. Yes, I've learned a ton, but I've pulled nearly all of my hair out in the process. So, thank you for making this a code-free install. Saves me time and more hair-pulling. Again, thank you Jarod... your efforts are appreciated.
@Jarods_Journey 1 year ago
Appreciate it! I know there are a lot of folks that are interested in AI, but all of the code revolving around it and dependency managing... is a hellscape. So I'm glad that my code-free install can help others out there, and it also makes sure the tutorial stays the same throughout time :)!
@2mShortFormCC 10 months ago
GPT can code if you know what to ask for
@33rdframe 7 months ago
i am the 24th person to REALLY feel this message, lol. i never wanted to learn python 😂
@IOSALive 8 months ago
This made me so happy! I liked and subscribed!
@ShannonWare 8 months ago
This is an amazing video. Not only has it gotten me started with voice cloning, it is an excellent summary of quick and dirty model training.
@BlueprintBro 1 year ago
Thank you so much for always making up to date and accessible guides for everyone!
@tyc00n 1 year ago
super awesome, I tried doing that recently and gave up. Really good idea including all the dependencies so the process becomes 1. Download 2. Extract 3. Run like everything else people download 😊
@Jarods_Journey 1 year ago
Thanks! The key is using the python embeddable packages, though there are a lot of steps to getting a package up and running correctly😅
@black_dragon274 11 months ago
@@Jarods_Journey Why isn't there a GUI interface for this? Does it have to be through a terminal or browser? It's so primitive!
@BeamboomCrash4 11 months ago
I'm so thankful to you for making this video and to the community who makes these tools. I really want to change my videos from a silent type of video to more entertainment-type videos, but my main problem is my voice. I was born with a bad voice, so I really need something like this for the voice of my videos.
@zanshibumi 2 months ago
It works so perfectly well, and you made it so simple! This is amazing, thank you so much. Going to the support page right now.
@PNN_ParodyNewsNetwork 4 months ago
Thanks bro! thumbs up for this video
@DM-dy6vn 8 months ago
5:12 As far as "Samples" are concerned, I noticed that "sample_batch_size" is implicitly set to 16 in the code; you can see it in the console when generating. Setting "Samples" to 16 means there is one batch to process. If you set Samples=100, then 6 full batches will be processed plus 4 samples in a 7th batch. The time needed is nearly proportional to the number of batches. As for iterations, the time is not "exponential" either: it behaves closer to a square root, so quadrupling "Iterations" would roughly double the processing time. A batch of samples is placed in VRAM, and depending on the length of a text chunk, it could push your GPU to its VRAM limit. Setting "Samples" below 16 will free VRAM, but potentially lower the quality, since fewer samples will be used. Do not feed it overly long sentences; use "Line delimiter" to separate your sentences during processing. You should avoid letting the GPU use "Shared GPU memory" (my RTX 3090 can do this), because falling back to PC RAM makes processing even slower (slow data swapping).
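The batch arithmetic in the comment above can be sketched in a few lines. This assumes, as the commenter observed in the console, that candidates are generated in groups of `sample_batch_size` = 16 with a partial final batch; the square-root timing rule is the commenter's empirical observation, not a documented property of Tortoise TTS. The function names are hypothetical.

```python
# Minimal sketch of the "Samples" batch arithmetic described in the comment.
# Assumption: candidates are generated in batches of `sample_batch_size`
# (16, per the commenter's console observation); the last batch may be partial.
import math


def batch_plan(samples: int, sample_batch_size: int = 16) -> tuple[int, int, int]:
    """Return (total_batches, full_batches, samples_in_last_partial_batch)."""
    full_batches, leftover = divmod(samples, sample_batch_size)
    total_batches = full_batches + (1 if leftover else 0)
    return total_batches, full_batches, leftover


def relative_time(iterations: int, baseline_iterations: int) -> float:
    """Rough time multiplier if time grows with the square root of iterations.

    This encodes the commenter's observation only; it is not a guarantee.
    """
    return math.sqrt(iterations / baseline_iterations)


if __name__ == "__main__":
    print(batch_plan(16))          # one full batch
    print(batch_plan(100))         # six full batches + 4 samples in a 7th
    print(relative_time(120, 30))  # 4x the iterations, roughly 2x the time
```

Since time tracks the number of batches rather than the raw sample count, values just above a multiple of 16 (e.g. 17) cost almost a whole extra batch.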
@nodewizard 1 year ago
We have quantized LLMs and Turbo SDXL and LCM models. I think it's time for a turbo/quantized TTS in 2024. Thank you as always for your tutorials and updates.
@legend_of_ray 1 year ago
I managed to find the original repo a little while back. Glad you're keeping it alive...thanks for this!
@Dalin_B 3 months ago
Working with this now as I speak. Great job man. Really appreciate it
@MatthewJettHall 6 months ago
OMG you rock!!! Thank you so much for putting this package together for us. It works amazing!!!! Thank you again!
@Nathanizer 11 months ago
Thanks a lot! I was trying stuff with Conda but none of it worked out as I expected. So I followed your video, with my own custom voices, and it all works perfectly. Thanks :)
@Chriscs7 9 months ago
11:56 - Which model is better in the Generate tab? base, whisperX or something else? You need to explain what gives the most accurate cloning, not only what is faster to train.
@jonnysmith9328 9 months ago
You're awesome! I love your videos. They make sense and are easy to follow.
@MR.RECAPER 1 year ago
👌👌 Thanks, I have been trying to install Tortoise TTS since your first video about it, but I always got errors when installing packages. This one was so easy and it actually worked. 😊😊😊😊😊😊
@Samuel-wl4fw 1 year ago
Thanks a lot, have been struggling with dependencies, and have been following a few of your videos :)
@gregorymccollum9107 10 months ago
😁Saved me hours. Keep working!
@Jarods_Journey 10 months ago
Thank you, appreciate it!
@cristianenriquevillarroelg4394 20 days ago
Thank you very much for your hard work on this!
@lightning_dynamics 10 months ago
thank you so much for putting this all together, I'm making an audiobook and this helps a lot !!!
@HotDrawingWithSugawara 4 months ago
Thank you for making a real video with real data in it. The FOUR videos I tried before this one contained nothing of value.
@Jimbo116 3 months ago
This is so cool, and that it is free is a big bonus. Thanks for the teaching..really good 🙂
@tea6310 9 months ago
Hey, when I train my voice it keeps saying "ai-voice-cloning>pause" What do I do?
@maliketh9354 8 months ago
Did you fix it yet?
@ObscureStuff420 2 days ago
This other comment helped fix it: "I had the same error. How much VRAM does your card have? Mine only has 8GB; lower all the settings, like 100 epochs, batch 4, gradient 2, etc. It worked for me."
@UmakantMishra 10 months ago
Great package. I will install and explore it. Thank you for sharing your valuable knowledge and experience. Big Like.
@bwowzah 1 year ago
Fantastic video! I greatly appreciate the hard work and dedication you put into what you do on this channel. You've helped me out immensely.
@dandman2798 3 months ago
Anyone know how to fix this? "RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous" I got it to work once fine, but now it gives this error when it gets to the end after clicking transcribe and process
@Jarods_Journey 3 months ago
Delete any file that is less than 150 KB and see if that resolves your issue. Best of luck 👍 I've run into this before but am unsure of what the fix is now.
@actualawry 2 months ago
i switched from openai/whisper to m-bain/whisperx and it solved the issue on my end
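Jarod's "delete anything under 150 KB" suggestion can be scripted. The sketch below moves small clips into a separate folder instead of deleting them, so nothing is lost if the threshold turns out to be wrong. Assumptions: "150 KB" is read as 150 × 1024 bytes, the clips sit directly inside one folder, and the folder names are placeholders; none of this is part of the tool itself.

```python
# Sketch: move (rather than delete) clips below a size threshold for review.
# Assumptions: "150 KB" means 150 * 1024 bytes, clips are .wav/.mp3/.flac files
# directly inside one folder, and the folder paths you pass in are your own.
import shutil
from pathlib import Path

AUDIO_SUFFIXES = {".wav", ".mp3", ".flac"}
THRESHOLD_BYTES = 150 * 1024


def find_small_clips(folder: Path, threshold: int = THRESHOLD_BYTES) -> list[Path]:
    """Return audio files in `folder` whose size is below `threshold` bytes."""
    return sorted(
        p
        for p in folder.iterdir()
        if p.is_file()
        and p.suffix.lower() in AUDIO_SUFFIXES
        and p.stat().st_size < threshold
    )


def move_small_clips(folder: Path, quarantine: Path, threshold: int = THRESHOLD_BYTES) -> list[Path]:
    """Move small clips into `quarantine` and return their new paths."""
    quarantine.mkdir(parents=True, exist_ok=True)
    moved = []
    for clip in find_small_clips(folder, threshold):
        target = quarantine / clip.name
        shutil.move(str(clip), str(target))
        moved.append(target)
    return moved
```

For example, `move_small_clips(Path("my_dataset"), Path("my_dataset_small"))`, where both folder names are hypothetical. Run it before transcribing, then re-run "Transcribe and Process".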
@rettbull9100 10 months ago
My cloned voice came out sounding horrible. I used the same audio clips that I've used with RVC, which sound really good. I used all the same settings and did as you said. Though for some reason my long clip was broken up into 0 to 4 sec clips. I made sure all my settings matched what you used. The original audio clip was 54 minutes long. It took over a day to train. Edit: the mel loss graph (green line) was almost at zero at the end of training. I trained it for 500 epochs.
@odesamusic 2 months ago
I click start.bat but it just says "Press any key to continue..." and when I press any key nothing happens.
@kapteinkonyn3450 9 months ago
When clicking on generate, I get: Possible latent mismatch: click the "(Re)Compute Voice Latents" button and then try again. Error: CUDA error: no kernel image is available for execution on the device CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
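"no kernel image is available for execution on the device" usually means the installed PyTorch build ships no compiled kernels for the GPU's compute capability. The sketch below checks that with `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list`. It is a simplified diagnostic under that assumption: it only looks for a native `sm_XY` entry and ignores PTX (`compute_XY`) entries, which can sometimes cover newer GPUs. Run it with the same interpreter the tool uses (for the portable build, presumably `runtime\python.exe`).

```python
# Diagnostic sketch: does this PyTorch build include kernels for this GPU?
# Simplification: only native sm_XY entries are checked; PTX (compute_XY)
# entries that might JIT-compile for newer GPUs are ignored.


def capability_supported(capability: tuple[int, int], arch_list: list[str]) -> bool:
    """True if the build's arch list contains native kernels for this capability."""
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


def check_current_gpu() -> str:
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this Python environment"
    if not torch.cuda.is_available():
        return f"CUDA is not available to this PyTorch build ({torch.__version__})"
    capability = torch.cuda.get_device_capability(0)
    arch_list = torch.cuda.get_arch_list()
    name = torch.cuda.get_device_name(0)
    tag = f"sm_{capability[0]}{capability[1]}"
    if capability_supported(capability, arch_list):
        return f"{name} ({tag}) has native kernels in this build"
    return f"{name} ({tag}) is missing from this build's arch list: {arch_list}"


if __name__ == "__main__":
    print(check_current_gpu())
```

If the GPU's `sm_XY` tag is missing, recomputing voice latents will not help; a PyTorch build that targets that architecture is needed.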
@wakeup2.369 2 months ago
ERROR Error No RVC configuration found, check configs folder. If rvc.json does not exist, please change a setting in the RVC area to create one.
@rudydoody 2 months ago
same problem
@elumYT 1 year ago
4:05 For me, it says error. Can you help me?
@KurtStaInes 1 year ago
LMAO this program has now become the Stable Diffusion of voice generation; I admit that it won't take that long for this to improve. Thanks for the fork, looking forward to the documentation.
@Vlkn7 1 year ago
Thank you for making videos on RVC and Tortoise TTS, I hope that one-click pipeline comes soon.
@syrcon 11 months ago
Your videos are Awesome Jarod! You do such a good job explaining how to install and setup these repositories (even going the extra mile to fork them yourself to make them easier to work with)! Is it possible to fuse two voices together, or is it viable to train a model by combining two datasets from two different speakers?
@Jarods_Journey 11 months ago
Appreciate it! For tortoise, I believe if you train on two voices, you get a mix or average between the two as this does occur when you use two different files as reference audio files. I actually haven't yet tried this for training so this may be a useful experiment to try.
@syrcon 11 months ago
@@Jarods_Journey I'll have to try it out as well. I assumed that it would have negatively impacted the training of the model, but if it instead blends the two, then that would be really interesting.
@vidneypopples 1 month ago
I'm a bit confused! Do I need to have tortoise set up and running when I use the audiobook maker? Do I have to do anything to link them if that is the case? Please advise.
@Jarods_Journey 1 month ago
So the previous versions of the audiobook maker relied on having Tortoise TTS installed and running in the background in order to run. That is not the case with V3. The Audiobook V3 runs completely independently; it does not rely on having any other piece of software running in the background. However, it does use "engines" or "models" that can be trained or finetuned only in other projects at the moment. For example, if you wanted to use a custom trained Tortoise TTS model in the Audiobook maker, you would need to train up the model in Tortoise TTS, then move the .pth file outputted from tortoise into the audiobook maker in order to use it. There are many layers of complexity here so it can be a bit confusing and overwhelming if this is your first time with these tools, however, I am working also on a more integrated training environment so you don't have to independently install all of these other tools by themselves in order to customize or train models.
@MrTompkins 1 year ago
I get a file not found error when running the start.bat file, but the file does seem to be there! - FileNotFoundError: Could not find module 'D:\Games\vc\runtime\Lib\site-packages\torchaudio\lib\libtorchaudio.pyd' (or one of its dependencies). Try using the full path with constructor syntax.
@pb2806 11 months ago
Install CUDA with all its features. That's what I had to do to fix this
@schakuun1995 11 months ago
Genius! Great tutorial, thanks :)
@Jarods_Journey 11 months ago
Appreciate it :)!
@heilalady9561 11 months ago
Hi, thank you for the video. I'm totally new to AI technology but this is really easy to understand. However, when I try to use a new voice, this error shows up: "CUDA out of memory. Tried to allocate 153.52 GiB. GPU 0 has a total capacty of 8.00 GiB of which 1.18 GiB is free. Of the allocated memory 5.35 GiB is allocated by PyTorch, and 196.16 MiB is reserved by PyTorch but unallocated...". Can someone help?
@mahmoudahmed5460 11 months ago
same here
@corbinangelo3359 8 months ago
I also only have 8GB on my 1070. I had to lower all the settings to finally get it to train: epochs 100, batch size 4, gradient 2, save freq 5, validation 1. With these settings it took a little less than an hour, I think. Results weren't very good.
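The low-VRAM settings reported in this thread can be written down as a small preset with a sanity check. Field names here are hypothetical labels mirroring the UI fields the commenters list, not the tool's actual config keys. The check rests on an assumption: gradient accumulation splits each batch into equal chunks processed one after another, so peak memory tracks the chunk size rather than the full batch.

```python
# Sketch of the low-VRAM settings reported for an 8 GB card (GTX 1070).
# Field names are hypothetical labels for the UI fields, not real config keys.
# Assumption: gradient accumulation splits each batch into equal chunks, so
# peak memory scales with the chunk size rather than the full batch size.
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSettings:
    epochs: int
    batch_size: int
    gradient_accumulation: int
    save_frequency: int
    validation_frequency: int

    def chunk_size(self) -> int:
        """Clips per forward/backward pass under the even-split assumption."""
        if self.batch_size % self.gradient_accumulation != 0:
            raise ValueError("batch_size should be divisible by gradient_accumulation")
        return self.batch_size // self.gradient_accumulation


LOW_VRAM_8GB = TrainingSettings(
    epochs=100,
    batch_size=4,
    gradient_accumulation=2,
    save_frequency=5,
    validation_frequency=1,
)
```

Under that assumption, `LOW_VRAM_8GB.chunk_size()` is 2 clips per pass, which is why lowering the batch size (rather than the epoch count) is what relieves VRAM pressure; fewer epochs mainly shortens training time.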
@joshuadelacruz3907 11 months ago
Thanks, mate! This is such an awesome job!
@HaloRian 1 year ago
Thanks for the work! And the tutorial! I have left a subscription on your channel! Hope you are well and have a good start to the New Year!
@Jarods_Journey 1 year ago
Thanks and you as well!
@Random_person_07 1 year ago
Thanks so much for making this! it's awesome keep it up!
@audio.video.disco. 6 months ago
Please do a series only on how to install and use each of these TTS models. I'm not a programmer and I'm having a really hard time; I think you would get a lot of views from these video tutorials.
@thisisashan 1 year ago
So I followed step by step, and for some reason Tortoise TTS just pauses when I try to train... Seems to be related to the /usr/cuda files missing. There's already a bug posted in the Git repo. Don't want to spam you, but currently this tutorial doesn't work: missing /cuda/lib64 or whatnot error.
@samodzielny4504 9 months ago
Same problem
@corbinangelo3359 8 months ago
I had the same error. How much VRAM does your card have? Mine only has 8GB; lower all the settings, like 100 epochs, batch 4, gradient 2, etc. It worked for me.
@thisisashan 8 months ago
@@corbinangelo3359 My issue was fixed on his Git repo. The current releases no longer have this issue, is my understanding. Also, if you don't want to lose quality, there are low memory flags you can use to do higher res pictures.
@ObscureStuff420 2 days ago
@@corbinangelo3359 thanks, seems to be working now
@sin_z1 1 year ago
Massive respect to you my dude. Really needed this.
@kaziahmed 8 months ago
Followed the steps, got this error: Something went wrong 'tuple' object has no attribute 'squeeze'
@cuccurese 10 months ago
I did everything you said in the video, but in the end my generated speech has an American accent, even though my audio is in Italian. :D I spent so much time on training.
@prizegotti 10 months ago
It's not trained for Italian. Just American English and Japanese.
@cuccurese 10 months ago
@@prizegotti Thanks!!!!
@francsharma7276 7 months ago
Guys, if you are getting an error about the "voice" folder name, please put your voice samples in WAV format only; that resolves it.
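Following the "WAV only" tip (and the later comment about folders under `ai-voice-cloning-3.0\voices\` not appearing in the dropdown), a quick scan can show which voice folders contain no `.wav` files or contain other files. This is a sketch under two assumptions: each voice is one subfolder of a `voices` directory, and only `.wav` files matter. It will not explain every dropdown problem.

```python
# Sketch: report what each voice folder contains, flagging non-WAV files.
# Assumptions: one subfolder per voice under a `voices` directory, and only
# .wav files are picked up (per the tip in the comment above).
from pathlib import Path


def scan_voices(voices_dir: Path) -> dict[str, dict[str, list[str]]]:
    """Map each voice folder name to its .wav files and any other files."""
    report: dict[str, dict[str, list[str]]] = {}
    for folder in sorted(p for p in voices_dir.iterdir() if p.is_dir()):
        files = sorted(f for f in folder.iterdir() if f.is_file())
        report[folder.name] = {
            "wav": [f.name for f in files if f.suffix.lower() == ".wav"],
            "other": [f.name for f in files if f.suffix.lower() != ".wav"],
        }
    return report


def voice_warnings(report: dict[str, dict[str, list[str]]]) -> list[str]:
    """Readable warnings for folders with no WAVs or with extra files."""
    messages = []
    for name, contents in report.items():
        if not contents["wav"]:
            messages.append(f"{name}: no .wav files")
        if contents["other"]:
            messages.append(f"{name}: non-WAV files {contents['other']}")
    return messages
```

Usage: `print(voice_warnings(scan_voices(Path("voices"))))`, run from the folder that contains `voices`. MP3s flagged here would need converting to WAV first.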
@datorresramos 11 months ago
Nice video, super easy to understand how to install this Tortoise TTS. I have a question: how can I access the web GUI from another computer on the same network?
@TweetykachuDenzelAbaya 2 months ago
I'm deeply grateful to you for making this video and to the community that makes these tools. I really want to change my videos from a silent type of video to more entertainment-type videos, but my main problem is my voice. I was born with a bad voice, so I need something like this for the voice of my videos.
@midnitejesus 7 months ago
My model came out sounding nothing like it was trained on. I had 2300 super clean chopped samples for a character and realized my 3080 would take forever. I trained on 250 samples over 3 hours. The output was 7 models, from 60_gpt to 402_gpt. I tried them all and the voice is simply pitched too high and sounded nothing like the source files. I followed your instructions to the T. Any suggestions?
@spiffylich3349 11 months ago
Awesome video! I'm a bit stuck, though. I have about a 45 minute clip of a character talking, and I've processed it with UVR-5 and the audio-splitter project you linked, so I have a ton of smaller voice-line wav files. But when I train the model on them for ~200 epochs, the results I get from using the model are awful! It's like around 50% of the words spoken in the generated audio are just noise, or the AI struggling very hard to speak a word. Any tips for getting clearer audio? Like, should I put my 45 minute video into the voice folder instead of the multiple clips?
@KeremYurtsevenOfficial 8 months ago
I already trained a voice model. So I only have a pth and an index file. How can I use those on TTS?
@corbinangelo3359 8 months ago
I'm very curious about that too. If you figured out a way, please let me know.
@huyked 1 year ago
I wish all the github stuff (I'm a newbie/non-programmer) was this simple. Lol. Thank you!
@Jarods_Journey 1 year ago
And that's why I want to try to make it as hands-off as possible :)! The learning curve sucks in the beginning, but GitHub does get easier the more you learn it!
@Minedeployder 11 months ago
15:41 Why does it instantly show graphs and a progress bar for you? I have a console and it is super slow. The first time I started training it ran for about 30 minutes and I didn't see any progress. When I press View Losses it at least shows me a timer, but it seems to be infinite, and in Task Manager (Ctrl+Shift+Esc) I see that the program isn't doing anything. Now I can't even proceed, and here is what I see in the terminal:
[Training] [2024-01-14T20:14:40.022488] [2024-01-14 20:14:40,022] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
And this is the last message in Gradio:
24-01-14 20:14:21.793 - INFO: Total epochs needed: 200 for iters 400
C:\Users\carbu\Desktop\ai-voice-cloning\runtime\lib\site-packages\transformers\configuration_utils.py:363: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`. warnings.warn(
24-01-14 20:14:36.129 - INFO: Loading model for [./models/tortoise/autoregressive.pth]
Any ideas?
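The log line "Total epochs needed: 200 for iters 400" relates epochs to optimizer steps. A sketch of that arithmetic, assuming one step per batch and counting a final partial batch as a step. The dataset size in the example is illustrative only; the log does not state it.

```python
# Sketch: relate an iteration (optimizer step) budget to an epoch count.
# Assumptions: one step per batch, and a final partial batch counts as a step.
import math


def steps_per_epoch(num_clips: int, batch_size: int) -> int:
    """Batches needed to see every clip once."""
    return math.ceil(num_clips / batch_size)


def epochs_for_iterations(total_iterations: int, num_clips: int, batch_size: int) -> int:
    """Epochs needed to reach `total_iterations` steps."""
    return math.ceil(total_iterations / steps_per_epoch(num_clips, batch_size))
```

For instance, 8 clips with a batch size of 4 give 2 steps per epoch, so 400 iterations need 200 epochs, the same 2:1 ratio as in the log above. A ratio that small suggests a very small dataset.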
@Chriscs7 9 months ago
13:01 - Should I click "Slice Segments" before the "Process and Transcribe" button if my dataset is 20 minutes long in a single .wav file?
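For a single long .wav, here is a naive way to cut it into fixed-length chunks using only Python's standard `wave` module. This is a sketch and not what "Slice Segments" does (the thread doesn't say how that works). Fixed cuts can split words mid-syllable, so a silence-based splitter, such as the audio-splitter project mentioned in the comments, is usually the better choice. The 10-second default and the output naming are assumptions.

```python
# Naive fixed-length WAV slicer using only the standard library.
# Caveat: cuts ignore silence, so words may be split; prefer a silence-aware
# splitter for real datasets. Chunk length and naming are assumptions.
import wave
from pathlib import Path


def slice_wav(src: Path, out_dir: Path, chunk_seconds: float = 10.0) -> list[Path]:
    """Split `src` into consecutive chunks of `chunk_seconds` (last may be shorter)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = int(params.framerate * chunk_seconds)
        if frames_per_chunk <= 0:
            raise ValueError("chunk_seconds is too small for this sample rate")
        index = 0
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break
            target = out_dir / f"{src.stem}_{index:04d}.wav"
            with wave.open(str(target), "wb") as writer:
                # Same channels/width/rate; the frame count in the header is
                # corrected automatically when the writer is closed.
                writer.setparams(params)
                writer.writeframes(frames)
            written.append(target)
            index += 1
    return written
```

Usage: `slice_wav(Path("long_take.wav"), Path("chunks"))`, with both paths as placeholders.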
@carnacthemagnificent2498 1 year ago
I was really excited to try this because I have not been able to get deepspeed running on my machine, period. However when I run this I get the error about mismatched latents but it adds "The specified pointer resides on host memory and is not registered with any CUDA device" and recalculating latents doesn't make it go away, every time you generate it's back. I guess I'm stuck with old original slow Tortoise.
@Jarods_Journey 1 year ago
What's your Nvidia GPU? This error occurs on my machine if you don't wait for TTS to finish loading or you didn't re(load) TTS in the settings. It's specific to only when you have deepspeed enabled.
@SirChogyal 11 months ago
I love this. But unlike other applications, why is this AI voice cloning messed up with large files?
@yaracorreia8209 1 year ago
Thank You so much for all your content! Really Awesome
@al3x__0 1 year ago
Reinstall, that's what I did and it worked.
@yaracorreia8209 1 year ago
@@al3x__0 were you able to add new voices using the .PTH voice model? for tts?
@Jarods_Journey 1 year ago
You have to use tortoise models, and they would need to be placed in training. It would look something like this: training/name of folder/finetune/models/put the tortoise tts models here.
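Jarod's folder layout, `training/<name of folder>/finetune/models/`, can be created and filled with a few lines of `pathlib`. This is a sketch under the assumptions that `training_root` is the tool's `training` folder and that the file is a Tortoise TTS checkpoint; per the reply above, an RVC .pth will not work here. The function name is hypothetical.

```python
# Sketch of the layout from the reply: training/<folder>/finetune/models/<model>.pth.
# Assumptions: `training_root` is the tool's `training` folder and the file is a
# Tortoise TTS checkpoint (RVC .pth files are a different kind of model).
import shutil
from pathlib import Path


def install_finetuned_model(model_file: Path, training_root: Path, voice_name: str) -> Path:
    """Copy a Tortoise .pth into training/<voice_name>/finetune/models/."""
    if model_file.suffix.lower() != ".pth":
        raise ValueError(f"expected a .pth file, got {model_file.name}")
    dest_dir = training_root / voice_name / "finetune" / "models"
    dest_dir.mkdir(parents=True, exist_ok=True)
    destination = dest_dir / model_file.name
    shutil.copy2(model_file, destination)
    return destination
```

Copying (rather than moving) keeps the original checkpoint where it was, in case the folder name needs to change later.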
@jurandfantom 1 year ago
Just noticed that you sync your voice with the video.
@Skalekul 6 months ago
Do you have any idea why custom trained models don't work using hifigan, which produces the error 'tuple' object has no attribute 'device'
@csiguszfoxoup 1 year ago
Thank you! Amazingly explained!
@BIGBIG.design 2 months ago
The folders I create in ai-voice-cloning-3.0\voices\ are not visible from the dropdown no matter what I do. I tried adding mp3, WAV, with no success. Am I missing something?
@puntogcb 10 months ago
Hey Jarod! Just wanted to drop a quick note of appreciation for your content on AI. Your journey into the world of artificial intelligence is both fascinating and informative. Thanks for making complex topics so engaging and easy to understand. Keep rocking those AI insights! 🚀 By the way, any chance of training Spanish LATAM voices in the future? That would be fantastic! How would it work? Thank you so, so much! Hugs from Argentina!
@RobertSmith-kj6eb 1 year ago
Bro, I got this working real quick. It is amazing. I copied and pasted voices from a different tortoise-tts and it sounds great! Thanks for sharing!
@Samuel-wl4fw 1 year ago
Where do you find some available voices? I tried to look but couldn't find any
@leighenhenkelman8648 1 year ago
@@Samuel-wl4fw I'm looking for voices too!
@pogiman 9 months ago
it worked!! thanks man!!
@scedolin 11 months ago
Hello, perhaps I am unlucky, but I have an error at the start of the installation: OSError: [WinError 126] The specified module could not be found. Error loading "C:\AI\ai-voice-cloning\venv\Lib\site-packages\torch\lib\torch_python.dll" or one of its dependencies.
@soundgif 9 months ago
Hey, thanks for this awesome video. Question - how is the autoregressive model tuned without the VQ-VAE? Since CLVP and CVVP operate on the VQ codes produced by the autoregressive output, wouldn't this harm selection of the samples generated by the autoregressor? I understand that the downstream diffusion model (and presumably the hifigan) operate on the final latents produced by the autoregressive model (and not the codes), so in theory this could be used to tune the autoregressive model weights, but wouldn't it result in poor sample selection performance -- since the autoregressive mel code head can't be trained without the VQ-VAE? Also, just curious - why choose to train the autoregressive model without training the diffusion model (possibly in tandem)? Has any experimenting been done in this area?
@Jarods_Journey 9 months ago
We do have the VQVAE, it's the dvae.pth model inside of the models folder. I'll give you the 2 blogs posts about this: 152334h.github.io/blog/tortoise-fine-tuned/ and 152334h.github.io/blog/tortoise-fine-tuning/ which are better explanations than I can give at the moment. As for training the diffusion model, I don't have a strong enough understanding yet on what finetuning would do for it, but as far as my understanding is with the AR model, we are training in new representations for the tokens in its vocabulary so that it can output appropriate mel tokens for whatever dataset you use.
@michaelmezher9635 9 months ago
Wow! Wish I knew the VQVAE was available before! I'd think tuning the diffusion model may be useful for dramatically different voices from what's found in LibriTTS, since theoretically the space of what can be represented in the diffused mels is limited to these voice characteristics. This is especially true because the diffusion model is trained (fine-tuned after autoregressive model convergence) on the autoregressive latents, not the mel codes.
@Darkcrimsonfall 10 months ago
13:35 14:39 This part is giving me errors and I followed all the steps. Pressing Refresh Dataset List does not show the folder. By the way, I also named the folder Me. I copied all the steps.
@maxzan1909 6 months ago
I have a question, is it possible to use it as an API for automation of answers given by chatGPT and to process and read automatically the output audio ?
@hamsteralliance 1 year ago
I haven't been able to find an answer to this, so I'm hoping you can help. What's going on when RVC training spits out a "nan"? More specifically, will it cause problems? My training output will look like: loss_disc=4.060, loss_gen=2.968 Then 15 epochs later I'll get a: loss_disc=nan, loss_gen=nan If I stop and restart training, it'll resume from the last checkpoint and start displaying normal numbers again. Anything you know about this would be appreciated, thanks! :D
@Jarods_Journey 1 year ago
Mmph, nan is an undefined number. I'm not sure what causes it, but I've seen people report this occurring in logs. If you can still train successfully without problems, then you should be fine.
@weightlossmotivation4070 1 year ago
If you are trying to finetune the model and using the weights from the previous training instead of the base D and G pth, sometimes the generators die. So maybe stick with the base weights if you have changed them. Also you might have not trained them on enough steps (talking about the finetuned weights).
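Since the workaround in this thread is to resume from the last checkpoint before the losses turned into `nan`, it helps to find the first bad line in a training log. The `loss_name=value` format below is taken from the log excerpt in the question. Real RVC logs may contain other fields, which this sketch simply ignores; the function names are hypothetical.

```python
# Sketch: locate the first log line where a loss becomes nan.
# The `loss_xxx=value` format comes from the excerpt in the question above;
# any other fields on a line are ignored.
import math
import re
from typing import Iterable, Optional

LOSS_PATTERN = re.compile(r"(loss_\w+)=([-+]?(?:nan|inf|\d+(?:\.\d+)?))", re.IGNORECASE)


def parse_losses(line: str) -> dict[str, float]:
    """Extract values such as loss_disc=4.060 from one log line."""
    return {name: float(value) for name, value in LOSS_PATTERN.findall(line)}


def first_nan_line(lines: Iterable[str]) -> Optional[int]:
    """Return the 1-based number of the first line with a nan loss, or None."""
    for number, line in enumerate(lines, start=1):
        if any(math.isnan(value) for value in parse_losses(line).values()):
            return number
    return None
```

Feeding it `open("train.log", encoding="utf-8")` (a placeholder filename) gives a line number to compare against checkpoint timestamps.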
@DM-dy6vn 8 months ago
5:12 For the sake of speed (without decrease in quality), you should definitely use "Half precision" (see Experimental settings).
@gu9838 1 year ago
Will try it out. Had issues with the cloning part a while back, so we will see. Thanks!
@marsenification 3 months ago
@Jarods_Journey Thanks for the tutorial. But I cannot run the start.bat just like you shown us. It says "the system cannot find the path specified". How do I fix this?
@GATECH3D 11 months ago
Has anyone found solution for missing VRAM while trying to train?
@pb2806 11 months ago
Tick 'Do not load TTS on Start' in settings. Works for me
@LucidFirAI 9 months ago
I am in love with this install method! Your tutorials a year ago were usable but kinda hard to follow, this method however is f'ing perfect :) Is there a way to control tortoise through command line so I can run it with a batch file? What is the best way to run it for stable outputs at the expense of perfection?
@01infinity 11 months ago
Thank you for sharing this video. I do, however, get a strange issue: first, unpacking with 7-Zip gives warnings (ARM files are in an unsupported format... I guess this is not a big deal), but when I try to train the model I get the following message: \runtime\lib\site-packages\bitsandbytes\cuda_setup\paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}... Looks like the zip file is not complete? Any suggestions on how to resolve it?
@FrankGlencairn 11 months ago
Yeah, I run into the same problem - any ideas?
@thoughtfulreddit 11 months ago
Same issue for me.
@pb2806 11 months ago
Update 7zip
@caesq_r 3 months ago
How can I use this in the Okada voice changer? I trained a model and I tried loading its .pth into Okada, but it says that it's missing a "config" parameter. I guess it's because Tortoise is TTS and Okada is STS, but how do I fix that? I heard someone talking about 'whisper' as the solution, but I don't know what to do with it. HELP!
@shiviarora4173 3 months ago
this video is so helpful damn, thanks bro
@vixxcelacea2778 1 month ago
I use Replay (because it's faster than the webui for RVC and keeps all generations) to make a model using STS, but when I used this to create a model, it only makes a .pth, no index and Replay says that the .pth doesn't exist, so even though it made a model, it didn't work as one. Did I do something wrong? I searched and found no index file created. I couldn't even test if it was trained right.
@matt0565 6 months ago
I open start and get: FileNotFoundError: Could not find module '...\ai-voice-cloning-3.0\runtime\Lib\site-packages\torchaudio\lib\libtorchaudio.pyd' (or one of its dependencies). Try using the full path with constructor syntax.
@JDworldofgames-yc7bs 4 months ago
Hi. I am facing the problem "The system cannot find the path specified." upon running start.bat. I have zero knowledge of Windows .bat files. This is what's inside the bat file:
@echo off
set PYTHONUTF8=1
set PATH=%PATH%;C:\runtime\Lib\site-packages
runtime\python.exe .\src\main.py %*
pause
Thanks
@daryladhityahenry 1 year ago
New question: I never get good quality. I already use my audio file that I use for recording, free from music etc. Pure vocals. But I'm getting a robot-like sound no matter the quality setting, diffusion or HiFi-GAN. I already tried "High Quality" too. I trained for 500 epochs and tried each result (every 100 epochs); none of them are good. I already followed the tutorial on splitting audio files for the data too. Is there any missing step? Thanks. Also, what is "Voice chunks" when we want to generate voice? Thank you so much... [nevermind this all] After I transcribe & process, all is done. On Generate Configuration, when I click "Validate Training Configuration", it says "Empty dataset". But I already checked the training folder, and my folder's audio and srt all exist. Why is that? Thank you. I checked the code, and it checks whether the "train.txt" file is empty. What should be inside the train.txt file? Hi! There's a problem with what you made here, which is that both models, TTS & Whisper, are running TT__TT.... My GPU can only hold one of them. So I can't transcribe & process while the TTS server is running (not even running the process, just starting up). What code do I need to run manually to transcribe all of the files? I mean, where's the source code located so I can run it manually without running the TTS server? [/nevermind this]
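For the "Empty dataset" / `train.txt` question, a small validator can at least show whether the file has usable entries. This rests on an assumption the thread does not confirm: that each non-empty line is `<audio path relative to the dataset folder>|<transcription>`, the common LJSpeech-style layout. Adjust the separator or base folder if your file differs; the function name is hypothetical.

```python
# Sketch of a train.txt sanity check. Assumption (not confirmed by the thread):
# each non-empty line is "<audio path relative to the file's folder>|<transcription>".
from pathlib import Path


def validate_train_file(train_txt: Path) -> list[str]:
    """Return a list of problems; an empty list means the file looks usable."""
    if not train_txt.is_file():
        return [f"{train_txt} does not exist"]
    problems = []
    entries = 0
    lines = train_txt.read_text(encoding="utf-8").splitlines()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are skipped but still counted for numbering
        entries += 1
        audio, separator, transcription = line.partition("|")
        if not separator:
            problems.append(f"line {number}: no '|' separator")
        elif not transcription.strip():
            problems.append(f"line {number}: empty transcription")
        elif not (train_txt.parent / audio.strip()).is_file():
            problems.append(f"line {number}: audio file not found: {audio.strip()}")
    if entries == 0:
        problems.append(f"{train_txt} is empty")
    return problems
```

An empty result list only means the file matches this assumed format. It does not prove the tool will accept it.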
@lichtundliebe999 24 days ago
With 8 GB of VRAM, I should lower the batch size, maybe to 25? What other parameters should I change, and how, to keep the model's accuracy?
@SagarVerma-jl3cj 9 months ago
Wow, cool, but prompts like angry, sad, or happy don't work here. Why?
@Miko-orginal-1 6 months ago
Wait, so do you need the training part? The voices don't sound bad without training; it's not much of a difference.
@CptTurk81 10 months ago
This is amazing. I can see there's an API option; do you have any guides on how to use it programmatically, say for automation?
@TweetykachuDenzelAbaya 2 months ago
Awesome! I tried doing that recently and gave up. It's a really good idea to bundle all the dependencies so the process becomes: download, extract, run, like everything else people download ☢☢☢☢
@TweetykachuDenzelAbaya 2 months ago
Thanks! The key is using the Python embeddable packages, although it takes a lot of steps to prepare a package and get it running correctly ☢☢☢☢
@TweetykachuDenzelAbaya 2 months ago
@Jarods_Journey Why is there no GUI for this? Why does it have to go through a terminal or a browser? It's so primitive!
@donmarshal2070 5 months ago
Bro, with version 3 I get a file destination error after opening start.bat as administrator (nothing shows up if I open it normally). Can you please tell me what to do? Someone even shared a screenshot of the issue with you in the Hugging Face community.
@Elrevisor2k 10 months ago
How do you create a voice model? And for other languages? Great video.
@LunaNK22 7 months ago
I got a CUDA out-of-memory error... can I fix it? I have an RTX 3050 with 4 GB of VRAM.
@DYLOGaming 8 months ago
Yo! Any reason why my vocals end up sounding super robotic? I'm using custom vocals, but idk why they sound filtered and very bad. Any assistance would be greatly appreciated!
@parmesanzero7678 11 months ago
Is there an ideal script for voice training? That is, is there an ideal set of things to have the speaker say to get the best results for new speech from the voice model?
@setumifilms 10 months ago
I installed everything the same way as in the video, put my voice as a WAV file in the voice folder, launched it, entered some random text, and clicked Generate with the default settings on Ultra Fast. After 20 minutes the program had completed only one step out of four. Unfortunately there was no point in continuing, since it takes far too long. Either the program isn't optimized for video cards with 8 GB of VRAM, or something else is wrong. On my MSI Nvidia GeForce RTX 2080 Super, no miracle happened.
@vrtech473 1 year ago
nice one ❤ Thanks!
@jenishpatel3260 1 year ago
I'm facing a memory error. Where can I change the memory allocation? Clicking (Re)Compute doesn't help! RuntimeError: Possible latent mismatch: click the "(Re)Compute Voice Latents" button and then try again. Error: Workspace can't be allocated, no enough memory.
@Jarods_Journey 1 year ago
Probably running out of memory (VRAM). Try increasing voice chunks in the Generate tab and make sure your audio samples are no longer than 10 seconds.
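The clip-length suggestion can be checked before generating. Below is a minimal Python sketch using only the standard-library wave module, assuming the reference clips are uncompressed PCM .wav files in a single folder (wave cannot read compressed or floating-point WAV, so those would need a different library). The 10-second limit comes from the reply; the function names and the folder path in the usage comment are hypothetical.

```python
import sys
import wave
from pathlib import Path

MAX_SECONDS = 10.0  # limit suggested in the reply


def wav_duration(path):
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def too_long(folder, max_seconds=MAX_SECONDS):
    """List (filename, seconds) for every .wav in folder longer than max_seconds."""
    results = []
    for path in sorted(Path(folder).glob("*.wav")):
        seconds = wav_duration(path)
        if seconds > max_seconds:
            results.append((path.name, round(seconds, 2)))
    return results


if __name__ == "__main__":
    # Hypothetical usage: python check_clips.py voices/myvoice
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, seconds in too_long(folder):
        print(f"{name}: {seconds} s (longer than {MAX_SECONDS} s)")
```

Any file this prints is a candidate for trimming or splitting before recomputing the voice latents.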
@HyperUpscale 11 months ago
I have a silly question: why does voice training need to be done this way, and why is it so complicated? Jarod, could you do me a favor and check what year it is now? (Hint: it's not 2020...)
@overdriveoutershaxson1837 7 months ago
Also, when I click start, nothing pops up except a System32 cmd.exe window with nothing in it.