My TOP 3 Tips for Training Better AI Voices - RVC Voice Cloning

Views: 12,728

Jarods Journey


Comments: 66

@greenockscatman 1 year ago

Solid tips all around! You're right to put the dataset first because "garbage in, garbage out" is probably the first thing you're going to learn through trial and error. Appreciate this vid is mostly geared towards AI voice changing, but if you're doing any AI music where you want to change the vocals, my tip is to not go overboard with UVR in trying to "clean up" the target voice (singer in the song you're wanting to replace the vocals for). Lots of times just a single pass through of Kim Vocal 1 sounds miles better than doing that + de-echo, dereverb etc. It's easy to end up losing some of the little qualities of the song that make it sound good if you clean it up too much.

@Jarods_Journey 1 year ago

Thank you! A little bit of trial and error and GIGO will be your new motto 😂. This is a good tip for the inference side of things as well, though imo you still definitely don't want much reverb on the training side.

@M4rt1nX 1 year ago

Nice setup. Better lighting and a new background. A lot of improvements there.

@Jarods_Journey 1 year ago

Thanks Luz, I've moved to a different portion of my room xD!

@SparkysTechCorner 1 year ago

Good video, good info, stuff I've come across in my own trial and error. Keep up the good work, man.

@Jarods_Journey 1 year ago

Appreciate it!

@moriakiinamine1372 1 year ago

Hello! The new RVC update makes training with CUDA faster. With my RTX 4070 Ti it takes 30 seconds per epoch.

@Jarods_Journey 1 year ago

This is awesome to hear! I'll have to check what they adjusted.

@diegolopez-xz8pg 1 year ago

Hi, glad to read that. Where did you get the "RVC update"? Thanks and regards from Argentina

@moriakiinamine1372 1 year ago

@diegolopez-xz8pg Since I was having configuration problems, I went to check the GitHub, and there is an update from a day ago.

@northwestrepair 4 months ago

I don't understand why I can't get any decent result. No matter what I do, it sounds like robotic noise.

@miinyoo 9 months ago

I've banged my head against this for two solid days. I think the noticeable AI sound is a combination of things. #1 on the list is compressed source audio. #2 is leaving silence pre-processed bits in the dataset. #3 is not enough variety in a dataset. #4 Parameters and turning knobs etc. I have found making convincing RVC is really really fucking hard. You can do it with other noise in the background and no one notices, but once it's "alone in the room" it always seems to fall on its own face.

@Jarods_Journey 9 months ago

It's not 100% there. Some models I've trained sound 80-90% convincing, though under scrutiny it's still possible to tell. Data is 100% key here.

@sotiris6116 3 months ago

I have 15+ mins of studio-quality vocals, but I always get effed-up S and T sounds and foggy vocals. I've tried a lower batch size but nothing changes. What can I do?

@DJDJisMusic 2 months ago

I have the same issue. I think when recording we have to emphasise the S and T sounds, and for the foggy areas record a variety of high notes.

@sotiris6116 2 months ago

@DJDJisMusic Setting the batch size all the way up seems to help a little, but it is still not perfect.

@Molandria 4 months ago

Greetings. I have been struggling with this stuff for weeks. I am at a point now where I can train models with RVC, however... I am having a problem I'm not really finding ANYTHING about, anywhere. :( I will say one thing, and the model, straight up, will say a different word. It was based on voice recordings of an anime character for whom there is not a whole lot of audio to begin with... Is it possible to, say, make a model from scratch using myself, just talking and talking and talking, and then afterwards graft the voice tone of the character onto that? Would that solve the linguistic issues, or add new ones? For now though, I think I'll restart the dataset from scratch using some tips here. =) Your help is amazing.

@azadi9999 1 month ago

Is it possible to start small first and then improve the trained voice model with more data or more epochs, so that we don't have to redo the training from the beginning? If it's possible, please tell us how we can do that.

@macdoctorsg 1 year ago

Great tutorial, mate! I noticed that in a lot of your videos your voice (audio) is out of sync with the visuals, i.e. it seems like the video can't catch up with your voice.

@Jarods_Journey 1 year ago

:O, my voice is not in sync I'll have to check lol

@SaveTheGregoryHorrorShow 7 months ago

Hey I'm still new to AI (especially RVC) training, how many epochs does it take for each varying duration of datasets? Like a dataset that's either 1 and a half to 5 minutes, 5-10 minutes, 10-15 minutes, 15-20 minutes, 30+ minutes, etc. I have varying datasets that are very short to very long. For example, my shortest model is 1 minute 11 secs, my longest one is 43 minutes 57 secs. I hope you understand how I explained it since I'm on the autism spectrum and I love how AI is progressing. Hope you reply soon (cause I know you're a busy guy lol), thanks for reading!
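
The thread does not settle on an epoch count per dataset length, but comparing datasets starts with measuring them. A minimal sketch, assuming the dataset is a folder of uncompressed PCM .wav files; the function name `total_wav_seconds` is hypothetical and not part of RVC:

```python
import wave
from pathlib import Path


def total_wav_seconds(folder: str) -> float:
    """Sum the duration of every .wav file directly inside `folder`.

    Hypothetical helper for illustration; Python's wave module only
    reads uncompressed PCM WAV files.
    """
    total = 0.0
    for wav_path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(wav_path), "rb") as wav_file:
            # Duration in seconds = number of frames / frames per second.
            total += wav_file.getnframes() / wav_file.getframerate()
    return total
```

Dividing the result by 60 gives minutes, which makes it easier to compare a short dataset against a long one before deciding how long to train.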

@jlobstertv 1 year ago

The UVR tool is effective at separating music vocals from instrumentals; however, in certain instances, there may be some static noise present in the background of the UVR Vocal output. Therefore, it is not guaranteed to work flawlessly for removing background noise in general audio recordings. To ensure clean recordings, it's advisable to use a microphone with noise cancellation capabilities in conjunction with Krisp, a noise-canceling AI app, during the recording process. Additionally, I wish I had known to "start with small datasets" earlier, as I've already set 1000 total epochs for my voice model and it is still training as of now🤣. 15 more hours is my estimated time of completion, I just hope it will turn out well🙏

@Jarods_Journey 1 year ago

🤟 appreciate the tip and hope it turns out as well too 🙏!

@Tom_Neverwinter 8 months ago

Eww krisp

@Joe-hp6jz 1 year ago

Is it recommended to train voice samples (talking) and singing voice samples together, or would that compromise the overall quality? Would it be better to train only singing voice samples to make an AI song cover?

@Jarods_Journey 1 year ago

I have yet to make an explicit comparison, but you can get really good models still with datasets mixing the two (I've done several this way). It might make for an interesting comparison to split that data setup and see what results in the best model 🤔

@reedmoon3630 7 months ago

Thanks for the tips. I'm swapping singer voices. I have about 20 minutes of good data and trained for 200 epochs. I used Harvest and rmvpe_gpu for both training and processing. The results are OK, but I still hear too much of the original singer's voice. What can I adjust to make the cloned voice totally replace the original voice?

@EvanTunes 2 months ago

Can we retrain a model, or do we have to train it from the start?

@motokorcle 1 year ago

Can I use this software live, for example in Fortnite?

@Jarods_Journey 1 year ago

If you wanted to and had a powerful enough PC, yes.

@denblindedjaligator5300 7 months ago

Hello Jarod's Journey. I would like to know if you would be willing to train a model for me with pitch set to false. You can get up to a higher batch size; I can only get up to 26. It sounds like there is an autotuner on once I have trained over 200 epochs, but it could well be that training with a batch size of 35 would make it more precise. How can I send you my dataset? Thanks.

@Nishartist 1 year ago

When I try to train a voice, the preprocess section shows this error:

start preprocess
['trainset_preprocess_pipeline_print.py', 'D:\\RVC0813Nvidia\\Dataset\\Myvoice\\Myvoice.wav', '40000', '24', 'D:\\RVC0813Nvidia/logs/Myvoice', 'False']
Fail. Traceback (most recent call last):
  File "D:\RVC0813Nvidia\trainset_preprocess_pipeline_print.py", line 111, in pipeline_mp_inp_dir
    for idx, name in enumerate(sorted(list(os.listdir(inp_root))))
NotADirectoryError: [WinError 267] The directory name is invalid: 'D:\\RVC0813Nvidia\\Dataset\\Myvoice\\Myvoice.wav'
end preprocess
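
The traceback shows `os.listdir` being called on a path that ends in `Myvoice.wav`, so the likely cause is that a file path was entered where a folder was expected. A minimal sketch of a check that could be run before preprocessing; `resolve_dataset_dir` is a hypothetical helper, not part of RVC:

```python
from pathlib import Path


def resolve_dataset_dir(path_str: str) -> Path:
    """Return a folder that a directory-listing preprocess step can read.

    Hypothetical helper for illustration: if the given path is a single
    audio file, fall back to the folder that contains it.
    """
    path = Path(path_str)
    if path.is_dir():
        return path
    if path.is_file():
        # os.listdir() raises NotADirectoryError when handed a file,
        # so use the containing folder instead.
        return path.parent
    raise FileNotFoundError(f"No such file or folder: {path}")
```

Under that assumption, entering the folder `D:\RVC0813Nvidia\Dataset\Myvoice` rather than the .wav file inside it should avoid the error.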

@jaimeleau 1 year ago

Thanks man 💪

@LarsEsDoch 3 months ago

How do you use these models in real time?

@klaurcschwackerberg1880 1 year ago

Would you know if it is already possible to train a text-to-audio model from a cappellas? I want to avoid the nightmare of Tacotron 2 training and use an RVC v2 kind of training, nice and easy. I mean I want to train a model by adding a cappellas to it in an easy way, like you can in RVC v2, without having to transcribe every sentence as Tacotron 2 training requires, and then at inference time type text and get audio. Is that not possible yet? Wouldn't that be great? Or did I miss something?

@Avax84 1 year ago

Does it help if you clean up your device? I'm running the voice changer on CPU (with an AMD Radeon graphics card), and whatever I do, it's extremely slow on Discord. Also, I can't use CUDA, because when I check in the console whether it works it keeps saying "false".

@Jarods_Journey 1 year ago

That's mainly a hardware limitation. You can try the DirectML version, but CPU is slow and AMD is sometimes unstable. CUDA is proprietary to Nvidia, which is why you aren't able to use it.

@Avax84 1 year ago

@Jarods_Journey If I try to use CUDA, it unfortunately says false when I check whether it works (in PowerShell), so that's currently my biggest hurdle.

@Jarods_Journey 1 year ago

@Avax84 You can't use CUDA because it needs an Nvidia GPU, so you'd have to check out the DirectML version to see how that works.

@Avax84 1 year ago

Also, idk how to fix that

@Avax84 1 year ago

I hear myself with voice changer but very badly, like 10% quality of what I hear when testing in client
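
The "false" result discussed in this thread is what PyTorch's `torch.cuda.is_available()` reports on machines without an Nvidia GPU. A minimal sketch of the check, assuming it is run with the same Python environment the voice changer uses; the function name `describe_torch_device` is hypothetical:

```python
def describe_torch_device() -> str:
    """Report whether PyTorch can see a CUDA device on this machine."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this Python environment"
    if torch.cuda.is_available():
        # CUDA needs an Nvidia GPU and a CUDA-enabled PyTorch build.
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "CUDA not available (expected on AMD or CPU-only systems)"


print(describe_torch_device())
```

On an AMD Radeon system the last message is the expected outcome rather than a misconfiguration, which matches the advice above to try the DirectML version instead.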

@MaisnerProductions 1 year ago

great tips

@hdhdhvdjgdjjdbjdb5541 1 year ago

How do I minimize the delay when streaming? Get a better GPU? Does a 4060 laptop have less delay than a 3060 12GB?

@Antonsetiady 1 year ago

Thanks sir

@heyheybackup 1 year ago

Could you do a tutorial on connecting this to OBS?

@klaurcschwackerberg1880 1 year ago

Does anyone know a good model for UVR5 that can extract a cappellas from music, but without the backing vocals? I know X-minus can do this, but I want to use UVR5. I just don't know which model I need to choose. Thanks.

@Poney01234 1 year ago

Have you tried MVSEP (online)?

@amiraskari4055 9 months ago

This is my question too. Did you find anything?

@denblindedjaligator5300 1 year ago

Where can I find the guitar model? How can I get the mpeg working on the Mac side? I cannot train my voices or run model inference. Should my friend and I use xformers or not?

@Jarods_Journey 1 year ago

RVC's guitar model can be found here: huggingface.co/spaces/lj1995/vocal2guitar/tree/main/weights Unfortunately, I don't know whether RVC uses xformers, and I'm not sure about Mac since I don't own one.

@denblindedjaligator5300 1 year ago

The index file is missing. Thanks.

@denblindedjaligator5300 1 year ago

I found the index file. I had to type logs instead of weights.

@synthmaster4959 1 year ago

Hey man, is there an RVC download with a working TensorBoard? Assume it's a clean Windows install.

@Jarods_Journey 1 year ago

I believe with the folder that you download, it includes the package. But if not, check out this video here: kzbin.info/www/bejne/hmGwaIN3qKxknM0

@synthmaster4959 1 year ago

@Jarods_Journey I've tried, bro, and I can't get it working. In the runtime folder of the RVC download there's Python and the tensor stuff, but I just can't get it to work. I tried your guide as well, but installing another Python breaks the latest RVC release. Can you take a peek at the latest release after the beta? It's about 2 weeks old.

@AdvancedGamingYT 1 year ago

Any tips for the real time voice changer? I can't get it to sound right :/

@Jarods_Journey 1 year ago

Depends on graphics card, but you need a good model and then you need to optimize your settings as well. Biggest thing though is the GPU.

@AdvancedGamingYT 1 year ago

@Jarods_Journey Yeah, I don't know if it's my mic, but for me it sounds kinda robotic and not smooth. I have a 3070 Ti laptop, which should be roughly at the level of a desktop 3060 (Ti).

@LindaSummer27 1 year ago

How to download RVC?

@nickysingha39 1 year ago

Is there any voice changer for mobile phones?

@Jarods_Journey 1 year ago

For RVC voices, I haven't run into any because it requires too much compute power. As well, realtime voice changing takes a lot of power so I don't see it being something on phones yet.

@nickysingha39 1 year ago

@Jarods_Journey OK, thanks. Maybe in the future you'll find a way for phones. Btw, I love your videos, keep going, I'll always support you...