Zero-code package is here if you're running into difficulties installing: huggingface.co/Jmica/ai-voice-cloning/blob/main/ai-voice-cloning-3.0.7z Make sure you have the latest 7zip and when you unzip it, run start.bat.
@IDOLSKPOP688 ай бұрын
I got an error when using it: I created a folder in the voice section, but when I refreshed the voice list it didn't appear. There were some other errors as well.
@LagomStorybook7 ай бұрын
Do you get errors? Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x. Model was trained with torch 1.10.0+cu102, yours is 2.2.2+cu121. Bad things might happen unless you revert torch to 1.x.
@LJames-ez9lr6 ай бұрын
Guys remember to update to the latest version of 7zip. Mr. fancy pants used some modern compression. V3 works great after I updated 7zip and extracted. Thank you sir!
@LJames-ez9lr6 ай бұрын
Does this only work with float16 compute for training, or can I edit a config file to change the compute type? I have a GTX 1080, and float16 doesn't work for me.
@Schawum4 ай бұрын
@LJames-ez9lr Same problem here. I also have the 1080 Ti and float16 doesn't run.
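Note on the float16 question above: a minimal sketch of how a compute type could be chosen from the GPU's CUDA compute capability. The threshold of 7.0 is an assumption based on the usual guidance that older cards such as the GTX 1080 (capability 6.1) lack efficient float16; the function name is hypothetical, and where this tool reads its compute type from is not confirmed in the thread.

```python
from typing import Optional, Tuple


def pick_compute_type(capability: Optional[Tuple[int, int]]) -> str:
    """Return a compute type for a CUDA compute capability (major, minor).

    Assumption: capability 7.0 and newer gets float16, anything older
    (or no GPU at all, passed as None) falls back to float32.
    """
    if capability is None:
        return "float32"
    major, _minor = capability
    return "float16" if major >= 7 else "float32"


if __name__ == "__main__":
    # A GTX 1080 / 1080 Ti reports capability (6, 1).
    print(pick_compute_type((6, 1)))
```

With PyTorch installed, `torch.cuda.get_device_capability(0)` returns the `(major, minor)` pair that this helper expects.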
@anikethhebbar64388 ай бұрын
00:01 Install the latest version of AI voice cloning repository and set it up with python and git.
02:21 Install and update CUDA and drivers for smooth operation.
06:29 Recommend disabling Whisper X alignment
08:46 Configuring tokenizer and training settings for AI voice cloning
13:08 Confirming and optimizing model usage
15:25 Organizing audio files for voice cloning
19:32 Use low samples and 30-50 iterations for TTS generation, and recompute voice latent if necessary.
21:42 Tortoise TTS is robust for speech-heavy audio.
25:55 Saving frequency and continuing training in Tortoise TTS
28:01 Explanation of training and modifying the state for longer training
@bomar9208 ай бұрын
Thank you for contributing to the open source community. Your channel deserves more subs. Your content is high quality.
@احمدصبيح-خ7و8 ай бұрын
I trained the program on several hours of Arabic audio. In the end, it started speaking Japanese instead of Arabic.
@tipu-j6e3 ай бұрын
I have heard that it needs at least 10k hours of good quality audio.
@SAnsAN0911908 ай бұрын
Jared, thank you for your hard work! I haven't tried the current changes yet, but I will definitely try! At the moment, I have independently converted the code for training non-English languages. I've been training for Cyrillic for 400+ hours of source audio. In the settings, I selected 35 epochs and Batch Size = 1024 (in order to maximize the use of video memory) and Gradient Accumulation Size = 16. The rest of the settings are similar to yours. When generating short phrases (up to 11 seconds) I get silence and strange sounds at the end of the audio. As you mentioned at 19:00, I increased the 'Length Penalty' and 'Repetition Penalty'. This relieved me of the silence, but there are still artifacts (strange sounds and repetitions). I think this is also largely the fault of poor audio splitting using Whisper for Cyrillic (many phrases are cut off in mid-word). Maybe you can tell me how I can try to fix this? And if you have alternative communication channels where you can communicate with like-minded people, I would be happy to join them =) (I saw a mention of Discord somewhere)
@Artholos8 ай бұрын
Yeah baby! Jarod you’re the hero once again! 🎉 Thank you so much for your hard work!
@AIUnveil8 ай бұрын
Awesome Bro! was checking your channel almost everyday for this video. You are pretty much the only one doing this stuff. Great work.❤❤
@giovannif25678 ай бұрын
You're so talented man! And you make everything look so easy! Happy to be a supporter, and i will continue to be !🚀
@Jarods_Journey8 ай бұрын
Thank you :), appreciate it!
@blakusp8 ай бұрын
Wonderful tutorial! Is there any possibility of sharing the Spanish (or all languages) base model you have trained so far, for the people (including myself) who don't have the resources to train from scratch? :( haha, thanks! PS: I completely understand if you don't want to share it.
@hackpop7 ай бұрын
I ran into various issues, but in the end I understood that curl was missing from my system. After curl was properly installed, everything went smoothly. Thank you Jarod for your contribution, this project is awesome!!!
@LJames-ez9lr6 ай бұрын
@hackpop hi, what kind of issues were you having and what is curl?
@AlexisGomes-n4r8 ай бұрын
Hello, I have CUDA Toolkit 12.4, Windows 11, git and python installed. When running the setup-cuda bat file I get an error while extracting rvc.zip (error opening archive: failed to open 'rvc.zip'), followed by: ERROR: Could not open requirements file: Errno 2 No such file or directory. Then the panel shuts down. What should I do?
@szymonnawrocki8906 ай бұрын
Really great videos and content. Thanks to you I'm getting into voice modeling myself
@trollsome7 ай бұрын
11:19 I can't validate training configuration because it just gives me an error saying "empty data set"
@BorygoTomka8 ай бұрын
Hey, I have a little trouble. At the step at 7:12, when I clicked Transcribe and Process, I got an error: ValueError: Requested float16 compute type but the target device or backend do not support efficient float16 computation. What do I need to do to make it work?
@BorygoTomka8 ай бұрын
No such file or directory: 'training\\Ja\\processed\\run\\dataset\\wav_splits\\file___2\\file___2.srt'
@BorygoTomka8 ай бұрын
I found the problem but still can't fix it. When running the CUDA setup program, an error about conflicts pops up: hydra-core 1.3.2 requires antlr4-python3-runtime==4.9.*, but you have antlr4-python3-runtime 4.8 which is incompatible. hydra-core 1.3.2 requires omegaconf=2.2, but you have omegaconf 2.1.0 which is incompatible. What should I do?
@heyyanito7 ай бұрын
Hi Jarod thanks so much for the release and for walking through this process. It's wonderful. Do you have API examples which include the RVC pipeline? I'm not sure the ones listed in gradio on the most recent release include the flags for adding the RVC inference to the request, although I could just be misunderstanding as programming is not something I am very good at :)
@LJames-ez9lr6 ай бұрын
I got this error when I did the test generate: Possible latent mismatch: click the "(Re)Compute Voice Latents" button and then try again. Error: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 2 for tensor number 2 in the list. Even after I clicked (Re)Compute Voice Latents, I still get the same error.
@Vxk_yt6 ай бұрын
did you find a fix? i have the same issue. Tried checking on github... there are a few ppl with the same issue
@manfredice3915 ай бұрын
Try setting "Candidates" to 1.
@DM-dy6vn8 ай бұрын
14:05 The same tokenizer which was used during training has to be selected as well.
@Rewe4life2 ай бұрын
Hi, I am trying to follow your tutorials, but I am unable to get it to work. I am running it on an Ubuntu Server VM with a Tesla P40 GPU. I clone the repo, then create and start the container. I can reach the web UI, but as soon as I click anything (generate, or even just changing the selected voice), I get errors. Do you have any suggestions for what I could try? Or am I missing something obvious, for example downloading a model manually?
@augustinolarian8 ай бұрын
Hi. Is there any way to import models that are already trained? Is there a way we can download already trained voices? I am unable to clone voices in Romanian. I get unintelligible audio every time.
@dthSinthoras8 ай бұрын
During "Transcribe and Process" I get UserWarning: The value passed into gr.Dropdown() is not in the list of choices. Please update the list of choices to include: or set allow_custom_value=True. warnings.warn( Incorrect BOM value Error reading comment frame, skipped Incorrect BOM value Error reading comment frame, skipped Incorrect BOM value Error reading comment frame, skipped Incorrect BOM value Error reading comment frame, skipped Incorrect BOM value Error reading comment frame, skipped Incorrect BOM value ... What does it mean? It's running anyway, but I have put in hundreds of hours of audio, so it would be great to know whether I should abort the run...
@younglymak2 ай бұрын
After refreshing the voice list, I get the error 'Connection errored out.' What should I do?
@alexlazareibanez10476 ай бұрын
Hello, I wanted to know whether Tortoise v3 supports the Spanish language and accent. When I do the voice training I set the language to ES instead of EN, but the Spanish comes out with an English accent. I am working with RVC Retrieval xtts2, but I heard that Tortoise is better. Thank you.
@Physengineer6 ай бұрын
Question: I am trying to use the audiobook 3.0 program, which requires an RVC voice, but I can't see how to use the voice cloning 3.0 software to make the RVC voice and index file required by the audiobook program.
@Physengineer6 ай бұрын
Well, I guess it can't. But I installed applio. Once I figured out how to use it, it worked great for making voices. I am using the first voice together with the voice cloning and audiobook 3.0 app. They work great, I could not be happier.
@cyberbol2 ай бұрын
I followed your tutorial step by step, and at the end, when I try to generate a voice, I get this error: No RVC configuration found, check configs folder. If rvc.json does not exist, please change a setting in the RVC area to create one.
@jr-2nd5 ай бұрын
When I train a voice, terminal always says "\ai-voice-cloning-3.0>pause" at 98,7% and it doesn't move anymore, it happens in every try. Any solution?
@adamrastrand94098 ай бұрын
But why doesn't the voice sound like me when I trained it? I only trained on two minutes of data. And why are fine-tuned voices good, then, if you can just use the autoregressive model with a voice sample? When you have trained a new language on many hours of data, how do I fine-tune it? When I train the new language, does that count as a voice or as a new language? I don't really get it.
@adamrastrand94098 ай бұрын
Hello. After training my Tortoise model on my voice (a short dataset of five minutes, 200 epochs), I select the autoregressive model for my voice and choose "none" as the voice type, and it doesn't sound like me at all; or rather, it says that "none" is not accepted as an argument. However, when I select "random" or my voice from the voice folder with the autoregressive model, it sounds as if the latents were computed for another voice with my timbre. How do I fix it so that it is completely my own voice? Should I delete everything from the folder? Should I delete the computed latents file and just keep the audio files, or keep only the shortest audio file? And when I prepare the tokenizer for another language, is it necessary to have something like 70 hours of audiobooks? Do I need a large amount of audio data to prepare the tokenizer, or is that only for training another language? One more question: say you trained a new language on many audiobooks, let's say 50 hours. How do I then train a new voice, for example a new Spanish voice? Do I use the previous dataset, or how do I use the new audio with the new tokenizer or new dataset? I don't really know.
@Jarods_Journey8 ай бұрын
You've got a lot of good questions on training afterwards; I won't be able to respond to all of them in this comment. In general, after training a language, you can run "finetunes" of that language to get specific voices. As for your initial question, I'm not entirely sure what is happening either. Sometimes, the voice won't sound like you. This is where I have RVC come into play, as it helps rematch whatever voice you want to get close to.
@adamrastrand94098 ай бұрын
@Jarods_Journey But why doesn't the voice sound like me when I trained it? I only trained on two minutes of data. And why are fine-tuned voices good, then, if you can just use the autoregressive model with a voice sample? When you have trained a new language on many hours of data, how do I fine-tune it? When I train the new language, does that count as a voice or as a new language? I don't really get it. And what does it sound like when I have too little data for a new language, say Norwegian, Swedish or any other language? Will it still sound like that language? And how much training data do I need to prepare the tokenizer?
@dani00014 ай бұрын
I downloaded the Huggingface version and started training a Hungarian language voice model with it. However, for some reason, I can't reach the stage of text generation. (RE)Compute Voice Latents runs indefinitely, then I get a CUDA Out Of Memory error. Additionally, it also tries to generate the text forever. I am using everything according to the settings shown in the video. What could be the problem? Is my NVIDIA GeForce GTX 1660 6GB video card and the 32GB RAM in my computer insufficient for this? Thank you very much in advance for your response!
@aheront35412 ай бұрын
Connection errored out.
@ToukoWhite6 ай бұрын
After I click train I get this error " RuntimeError: CUDA error: device-side assert triggered [Training] [2024-06-23T23:29:29.443537] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. [Training] [2024-06-23T23:29:29.443537] For debugging consider passing CUDA_LAUNCH_BLOCKING=1. [Training] [2024-06-23T23:29:29.443537] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions." Anyway to fix it?
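Note on the error above: the message itself suggests `CUDA_LAUNCH_BLOCKING=1`. A minimal sketch of setting it from Python follows; the variable has to be set before CUDA is initialized, which in practice means before `torch` is imported. How this tool's launcher scripts start Python is not shown in the thread, so where to place these lines is an assumption. The helper name is hypothetical.

```python
import os

# Set this before CUDA is initialized (in practice: before importing torch),
# so kernel launches run synchronously and the stack trace points at the
# call that actually failed.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def cuda_debug_enabled() -> bool:
    """Report whether synchronous CUDA launches were requested."""
    return os.environ.get("CUDA_LAUNCH_BLOCKING") == "1"


if __name__ == "__main__":
    print("CUDA_LAUNCH_BLOCKING enabled:", cuda_debug_enabled())
```

Device-side asserts are often caused by an index that is out of range for a lookup table, so a more precise stack trace can help narrow down which input triggered it.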
@AlexisGomes-n4r8 ай бұрын
When running start.bat i am getting the error : no module named 'psutil'
@oosixcosoo-yt55488 ай бұрын
I think it's because you don't have Python 3.11 installed. If you want it to run normally, just reinstall Python at version 3.11 and rewatch the video.
@StefanHackbarth-xz7ee6 ай бұрын
Help please... when I'm running Training I get this error message: 'utf-8' codec can't decode byte 0x81 in position 2: invalid start byte
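Note on the decode error above: this message means some file being read as text is not valid UTF-8. A minimal sketch for locating such files in a folder follows. Which file the training step actually reads is not stated in the thread, so the folder and the `*.txt` / `*.json` patterns are assumptions, and `find_non_utf8` is a hypothetical helper.

```python
from pathlib import Path


def find_non_utf8(folder, patterns=("*.txt", "*.json")):
    """Return (path, byte_offset) for every matching file that is not valid UTF-8."""
    bad = []
    for pattern in patterns:
        for path in sorted(Path(folder).rglob(pattern)):
            try:
                path.read_bytes().decode("utf-8")
            except UnicodeDecodeError as err:
                # err.start is the position of the first undecodable byte.
                bad.append((path, err.start))
    return bad


if __name__ == "__main__":
    for path, offset in find_non_utf8("./training"):
        print(f"{path}: invalid byte at position {offset}")
```

A file flagged this way can usually be re-saved as UTF-8 in a text editor.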
@rubenrodenascebrian38556 ай бұрын
Great video and great repository, thank you very much for your work. I AM HAVING A PROBLEM... I train the model in Spanish, set "ES" for Whisper to recognize the Spanish language, but when I finish the training, it speaks with an English accent, but totally English. Why is this happening? Thank you very much!!!
@chiyanchandru59144 ай бұрын
How can I run it if I already have transcription data with audio?
@sirjared213 ай бұрын
Have run into issues using this. For one voice, I keep getting a 'CUDA out of memory' error where it tries to set aside something like a terabyte of RAM, lol - it didn't happen before using the same voice and settings, but just randomly happened. For all voices, if you use the en_tokenizer when generating after training, the outputted voice sounds utterly insane/incomprehensible. Switching back to '/modules/tortoise-tts/tortoise/data/tokenizer.json' fixes it. There really needs to be a guide for all the sliders and what they mean. There needs to be some kind of "recommended" settings for training epochs, etc. I've done 80 for a voice and it took like 8 hours. For another voice it took only 2, so I guess it depends on how big of a data set you have. This is my first foray into ai voice cloning - doing it for a mod project, and so far it's been frustrating. After all's said and done, I've yet to create a realistic copy of a voice.
@BZAKether2 ай бұрын
I can't generate anything, when I try I get a "Error with Pydantic and startlette" and "ERROR: Exception in ASGI application" :(
@stellarbuddyАй бұрын
Same
@احمدصبيح-خ7و8 ай бұрын
Thank you for your wonderful explanation, which many people learn from. I want to tell you that I have been using text-to-speech programs for a long time, but they are weak for the Arabic language. Finally, I applied the explanation in this video, and at the end a message appears stating that the CUDA memory is full. Perhaps the reason is due to my lack of sufficient knowledge in applying this explanation. I hope to apply this explanation to an audio sample of the Arabic language so that I can apply it and explain the numbers entered and divide them by two because I did not understand their exact meaning.
@Jarods_Journey8 ай бұрын
CUDA memory being full means your GPU VRAM is too small. I recommend that you start at batch size = 1 and gradient accumulation = 1. Then, if training starts with these settings, you can restart (close the browser window), increase the batch size by 1, save the configuration, and keep doing this until you run out of memory again. With this, you'll know the largest batch size you can use.
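The procedure described in this reply (start at 1, raise the batch size by 1 until memory runs out) can be sketched as a simple loop. This is only an illustration: `fits` is a hypothetical callable standing in for "start training with this batch size and see whether it runs without a CUDA out-of-memory error", which in the web UI is a manual step.

```python
from typing import Callable


def largest_working_batch_size(fits: Callable[[int], bool], limit: int = 1024) -> int:
    """Increase the batch size by 1 until `fits` fails; return the last size that worked.

    Returns 0 if even a batch size of 1 does not fit.
    """
    best = 0
    for batch_size in range(1, limit + 1):
        if not fits(batch_size):
            break
        best = batch_size
    return best


if __name__ == "__main__":
    # Pretend the GPU runs out of memory above a batch size of 6.
    print(largest_working_batch_size(lambda size: size <= 6))
```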
@AlexisGomes-n4r8 ай бұрын
I dont have rvc.zip while downloading
@mosambielal67005 ай бұрын
Can you please guide me on how you added the emotions tab? And how can we add other emotions here?
@farsi_vibes_edit8 ай бұрын
please help i get this error
G:\tortiois\ai-voice-cloning>call .\venv\Scripts\activate.bat
Traceback (most recent call last):
  File "G:\tortiois\ai-voice-cloning\src\main.py", line 23, in <module>
    from utils import *
  File "G:\tortiois\ai-voice-cloning\src\utils.py", line 41, in <module>
    from tortoise.api import TextToSpeech as TorToise_TTS, MODELS, get_model_path, pad_or_truncate
ModuleNotFoundError: No module named 'tortoise'
@farsi_vibes_edit8 ай бұрын
i get this error when i click on the start.bat
@francisgoeltner55696 ай бұрын
Hello Jarod! First of all: Awesome video and a great channel you have there. Really helpful stuff! I experienced a bit of a problem though with the training continuation process as you described it. I had exactly the outlined problem with a crashed console and tried to resume from the .state file of the previous run. Configuration import and setting of the old state as resume state path worked nicely, but when I try to run the training I get this message: PermissionError: [WinError 5] Access is denied: './training\\Voice2Train\\finetune' -> './training\\Voice2Train\\finetune_archived_240624-081331' The file named indeed does not exist. Did I miss something here? Any help would be greatly appreciated!
@nottobemessed46287 ай бұрын
You have set the large-v3 model as the default. How do we change it to a smaller one, like medium or small?
@pupattolino758 ай бұрын
I followed the installation but I received this error: from rvc_pipe.rvc_infer import rvc_convert ModuleNotFoundError: No module named 'rvc_pipe'
@stevecato8 ай бұрын
Can rvc be cloned from the same dataset? If using it, how much effort should go into training tortoise vs rvc? Thanks.
@Jarods_Journey8 ай бұрын
Yep, it should be generally. But rvc only requires 10-60 minutes of audio. Can't really determine, but rvc is generally easier to get it matching. Tortoise is more important though to get the style of how a character speaks, etc
@Nickfulcrium2 ай бұрын
Once you transcribe from a certain language can it speak any language ?
@emmanueltoussaint24668 ай бұрын
Thank you so much for that one. But I keep getting that error: RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory And I got this warning just before. !!!! WARNING !!!! No GPU available in PyTorch. You may need to reinstall PyTorch. Loading TorToiSe... (AR: None, diffusion: None, vocoder: bigvgan_24khz_100band) No hardware acceleration is available, falling back to CPU... What can I do to solve this please?
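Note on the "No GPU available in PyTorch" warning above: a minimal sketch for checking what the installed PyTorch build can see. It only uses standard `torch` attributes (`torch.cuda.is_available()`, `torch.version.cuda`, `torch.cuda.get_device_name`); the function name is hypothetical. Run it with the same Python environment the tool uses, otherwise the result says nothing about the tool's install.

```python
def describe_torch_gpu() -> dict:
    """Summarize whether PyTorch is installed and whether it can use a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False, "device": None}

    available = torch.cuda.is_available()
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # torch.version.cuda is None for CPU-only builds of PyTorch.
        "cuda_build": torch.version.cuda,
        "cuda_available": available,
        "device": torch.cuda.get_device_name(0) if available else None,
    }


if __name__ == "__main__":
    for key, value in describe_torch_gpu().items():
        print(f"{key}: {value}")
```

If `cuda_build` comes back as `None`, a CPU-only build of PyTorch is installed, which would match the fallback-to-CPU message in the comment.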
@francsharma72765 ай бұрын
I tried Hindi with 2 hours of audio and 300 epochs. It estimated 4 days, and after 6 hours it just stopped loading. My graphics card is a 3070 Ti.
@Bichos28-bg4nm6 ай бұрын
Hi, as a French speaker, I would like to hear a Portuguese text spoken. Is this possible with ai-voice-cloning 3.0?
@swedishcat74488 ай бұрын
Awesome tutorial and fantastic job on making this for people to use. I do have a weird bug or something I need some assistance with, if you don't mind. I'm training a model with English speech and have set the language to en. But when I generate a prompt, the voice is in Japanese (or something like that). I don't quite understand where that comes from. Are there any other settings I can change? Thanks.
@Jarods_Journey8 ай бұрын
I also don't know where that's coming from. Make sure the tokenizer is English
@swedishcat74488 ай бұрын
@Jarods_Journey Thanks for answering. Yeah, the tokenizer is the "en_tokenizer" as in the video, but I still get that. Is there any other fix? Can I provide any logs or something for you to look at? I really like Tortoise TTS.
@bigadz878 ай бұрын
@Jarods_Journey I am getting the same issue, followed your video exactly
@JanPeter567 ай бұрын
Same here lmao, had it training on 12 minutes of clean af audio of Christopher Lee talking, the first time all i could get was whale noises, and the 2nd time training it sounded like a japanese whale Edit: It must be the tokenizer i think. when i select the default "./modules/tortoise-tts/tortoise/data/tokenizer.json" instead of "./models/tokenizers/en_tokenizer.json" The model suddenly created clear english audio
@bwheldale6 ай бұрын
@JanPeter56 I tried this and mine went from gibberish to English which was hopeful, but the accent was too 'English' instead of Australian. An audio language detection site said it was 65% English though it sounded more like German.
@WonderWhat10008 ай бұрын
Hello Jarod, I have a question about cloning English voices in AI voice cloning. So there is a voice I want to clone with an hour of data , so how many epoch are needed to clone a good voice. can you please elaborate this part. Thank you
@Jarods_Journey8 ай бұрын
If you're doing english, for an hour, I'd train to about 50 epochs first and see how that sounds. You can always train longer if you determine the need to
@himelhs6 ай бұрын
My laptop has Intel graphics. Can I not use it?
@gmfPimp7 ай бұрын
Thanks for your effort. FYI, you are not using MP3s, your file extension is MP4. MP3s have better audio quality than mp4.
@miyrrecs30246 ай бұрын
I did all the steps according to the video for the Spanish language with fluent input, but what I get is a messy voice like 'dysphasia'.
@francsharma72765 ай бұрын
Can we train a model in parts? If yes, please make a video. For 10 epochs and 2 hours of audio it took 4 hours.
@Vaultcitizen8 ай бұрын
I installed 2 versions in different folders. How should I uninstall the older one? Simply delete the folder, or there is a better way (through cmd) ? Thanks for your work and making it easy to test :)
@CyberPhonkMusic8 ай бұрын
How much GPU do you need to train a new language like Brazilian Portuguese? Do you need 25 hours of audio? Does it always have to be the same person speaking?
@mydreams34378 ай бұрын
How long should the input voice file be? Can I use an MP3 file?
@farsi_vibes_edit8 ай бұрын
Thank you. I really needed this software and your training. I am installing it. I hope I won't have any problems.
@iQOmni8 ай бұрын
You are amazing thanks for all that you do
@edgarl.mardal82566 ай бұрын
Hi, are you pinoy? I was wondering if I could ask for help creating an AI cold-sales agent with a Norwegian LLM and training a TTS to speak fluent Norwegian?
@frh17005 ай бұрын
when i try to train the model i have this error Missing dataset: ./training/test//whisper.json
@zenkidpress22718 ай бұрын
Hello Jerods, it would be nice if you trained the voices in other languages for the community and then shared everything (even charging a sum because you obviously spent time training the other languages), I would gladly pay 🙂
@tempertephra8 ай бұрын
agree may be good to share other lan files in the community. please initiate.
@ph0enixph0enix656 ай бұрын
In case you're willing to do so, I would need a female german voice model. I would also gladly pay for it.
@Djamel__LD8 ай бұрын
can i use it with Intel Iris(R) Xe 16 GB GPU ?
@Vlad-hm7cj8 ай бұрын
Does this work on linux? My windows machine has an AMD instead of an Nvidia... T-T
@craigcarter15726 ай бұрын
setup-cuda says Python 3.11 not installed, however when I run >python --version it sees the version 3.11
@Vxk_yt6 ай бұрын
try uninstall and reinstall, if you have other versions uninstall all of them, also click on add to path when installing
@craigcarter15726 ай бұрын
@Vxk_yt I found the issue: when installing Python, even though I checked the box for path setup, Win 11 did not update the environment settings. I added the paths manually and it fixed everything. Reminds me of the old MS-DOS days. Thanks much for your reply.
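Note on the PATH problem in this thread: a minimal sketch that shows which interpreter is running and which `python` a new command prompt would find on PATH. It uses only the standard library; the function name is hypothetical, and how setup-cuda itself detects Python 3.11 is not shown in the thread.

```python
import shutil
import sys


def python_report() -> dict:
    """Collect the running interpreter's version and what `python` resolves to on PATH."""
    return {
        "running_version": "%d.%d" % sys.version_info[:2],
        "running_executable": sys.executable,
        # None means no `python` executable was found on PATH at all.
        "python_on_path": shutil.which("python"),
    }


if __name__ == "__main__":
    for key, value in python_report().items():
        print(f"{key}: {value}")
```

If `python_on_path` is `None` or points at a different install than expected, the PATH entries are the likely culprit, as the commenter found.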
@IDOLSKPOP688 ай бұрын
Is there any way to install on linux?
@threepe06 ай бұрын
stuck on "No module named 'vc_infer_pipeline'"
@CINECOMBO5 ай бұрын
same problem
@fdgfdgdfgdfgfdgdf8 ай бұрын
when i push train: ModuleNotFoundError: No module named 'axial_positional_embedding'
@trickydicky9618 ай бұрын
I get the same error.
@stickmanland7 ай бұрын
How much time would it take to train?
@ywueeee7 ай бұрын
can this run on mac?
@SyamsQbattar4 ай бұрын
Does it support the Indonesian language?
@NT3wazLcUqwA7 ай бұрын
how about cantonese languages ?
@RexVergstrong4 ай бұрын
I'm getting this error when I validate the training config. [Errno 2] No such file or directory: './training//train.txt'
@RexVergstrong4 ай бұрын
There's a train.json in that folder but in my model folder there was a train.txt. I copied into the training folder directly and it seemed to work for now.
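Note on the missing-file errors in this thread: the doubled slash in `./training//train.txt` suggests that the voice name in the path was empty. A minimal sketch of a pre-flight check follows. The `./training/<voice>/train.txt` layout is inferred from the error messages quoted in these comments, not from the tool's source, and `missing_dataset_files` is a hypothetical helper.

```python
from pathlib import Path


def missing_dataset_files(voice, root="./training", required=("train.txt",)):
    """Return the names of required files that are absent from <root>/<voice>/.

    Assumption: the dataset files live directly under ./training/<voice>/.
    """
    if not voice.strip():
        # An empty name is what produces paths like ./training//train.txt.
        raise ValueError("voice name is empty; select a dataset first")
    base = Path(root) / voice
    return [name for name in required if not (base / name).is_file()]


if __name__ == "__main__":
    print(missing_dataset_files("test"))
```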
@kernsanders39738 ай бұрын
Thank you!
@RA-ss5fe7 ай бұрын
1. will it work for Urdu/Hindi language? 2. will it work with any type of nvidia gpu? i mean with low end gpu ? 3. how much space in harddrive does it requires?
@akemixx._08 ай бұрын
Does this already support Brazilian Portuguese?
@farsi_vibes_edit8 ай бұрын
ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them. unknown package: Expected sha256 f9ef0a648310435511e76905f9b89612e45ef2c8b023bee294f5e6f7e73a3e7c Got 887e84fc28f6772ed033ff6d269a01179021bf974277d1b1859c9654541781ba
@farsi_vibes_edit8 ай бұрын
I get this error, but the download is still continuing. Is this OK?
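Note on the hash mismatch above: pip compares the SHA-256 of the downloaded file with the expected value, so a mismatch often points to an incomplete or altered download. A minimal sketch for computing the SHA-256 of a local file follows, so it can be compared by hand with the "Expected sha256" value in the message. The function name is hypothetical, and which file pip was checking is not stated in the comment.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    import sys

    # Usage: python check_hash.py <downloaded-file>
    if len(sys.argv) > 1:
        print(sha256_of(sys.argv[1]))
```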
@MaorStudio8 ай бұрын
You are awesome.
@StringerBell8 ай бұрын
Followed every single step and ended up with a gibberish mess in Bulgarian. Trained on 3 hours of studio quality voiceovers for 500 epochs (saving every 20 epochs).
@Jarods_Journey8 ай бұрын
Issue with training another language is the model needs to generalize and 3 hours isn't sufficient enough for that. I'd say you wanna start with at least 25 hours of ANY language data in bulgarian, for this, you could probably scrape from audiobooks. Even 25 hours may produce a rough model, so the more the better. After this, we'll call this a bulgarian base model. Now that you have a base model, you can then "finetune" it for a voice style you want. Though, it's a lot to talk about in a comment so I'll think about making a followup video
@StringerBell8 ай бұрын
@Jarods_Journey Is there a way to train a language and then change the style to super EXCITED and over the top performance? Also can I use random voices? Male and female for the training?
@StringerBell8 ай бұрын
@Jarods_Journey How to finetune a base model is a super interesting topic. Adding emotion or style to it. Please do make a video when you can, it will be an immensely helpful followup to this tutorial!
@StringerBell8 ай бұрын
I just trained for 29 hours on my RTX 4090 on 48 hours of studio quality audiobooks in Bulgarian. Let's say the result is underwhelming.
@satyajitroutray2824 ай бұрын
@StringerBell Did you successfully train your model in Bulgarian with that amount of data?
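Note on the data-size guidance in this thread (at least 25 hours for a new language): a minimal sketch for measuring how many hours of audio a dataset folder holds. It uses the standard-library `wave` module, so it only handles uncompressed PCM `.wav` files; the folder layout and the function name are assumptions.

```python
import wave
from pathlib import Path


def total_wav_hours(folder) -> float:
    """Sum the duration of all PCM .wav files under `folder`, in hours."""
    seconds = 0.0
    for path in Path(folder).rglob("*.wav"):
        with wave.open(str(path), "rb") as wav:
            seconds += wav.getnframes() / wav.getframerate()
    return seconds / 3600.0


if __name__ == "__main__":
    print(f"{total_wav_hours('./voices'):.2f} hours of audio")
```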
@WorldYuteChronicles8 ай бұрын
big up!
@MadeEasyTube8 ай бұрын
Thank you
@MuratAtasoy3 ай бұрын
I trained for Turkish with a 5-minute voice recording; the results are nonsense :) like a new language lol. Is this only for English?
@MuratAtasoy3 ай бұрын
a day's work gone trash
@Airbender1310908 ай бұрын
Does it work with Russian?
@Jarods_Journey8 ай бұрын
It should work, after having trained 4 languages so far to a good degree of accuracy, I don't see why not. Just make sure you have enough data and it should be fine to run :)
@Test-ep7gg5 ай бұрын
And will Bulgarian work? There are good results on sites, I prefer to use the resources of such projects, but no matter how many projects I install, there is still no Bulgarian language. This one is currently giving me an error and I don't know if I should bother doing it. ImportError: cannot import name 'RootModel' from 'pydantic'
@yasenkey37794 ай бұрын
@Test-ep7gg did it work for Bulgarian?
@tylerchambliss83798 ай бұрын
I don't understand why your models aren't skipping. I still can't make my books bro. What are you doing? How are you making these models not skip and glitch?
@Jarods_Journey8 ай бұрын
My models do have some skipping, but it's not every generation. Unfortunately, the only thing I can say is that my datasets are generally clean, and even then, my mass-transcribed datasets for other languages are dirty datasets. I'm not doing anything particularly special in my models.
@aachannel28433 ай бұрын
Can Arab voices be reproduced?
@v3ucn8 ай бұрын
support Chinese?
@stickmanland7 ай бұрын
21:12
@lazar44266 ай бұрын
8:20
@deadwarrior98666 ай бұрын
doesnt work
@Haidnt-c1hАй бұрын
Vietnamese doesn't sound very good yet ^_^
@taichinh-taman551625 күн бұрын
Do you know which AI is best for training your own voice in Vietnamese?
@Haidnt-c1h25 күн бұрын
@taichinh-taman5516 11Labs, my friend
@ללמד_טבעי6 ай бұрын
It's so complicated and the result doesn't sound good either; in short, it's a waste of time. We would be happy to have a short, simple way with results that sound human.
@kushalvirulkar8 ай бұрын
Please clone the Hindi language.
@stepantrekhleb32716 ай бұрын
this shit does not work at all
@peterimade0036 ай бұрын
Do you have a discord channel? It'll be nice to have a community research together on this tool.
@peterimade0036 ай бұрын
How does one get well-trained models?
@ЗлодейПо8 ай бұрын
I tried to install the necessary ones, but then the new ones were not compatible with something else, maybe I was doing something wrong?, I just cloned git, and then launched setup-cuda In general, I get these errors from the console DEPRECATION: omegaconf 2.1.0 has a non-standard dependency specifier PyYAML>=5.1.*. in pip 24.1, this behavior change will be enforced. A possible replacement is to upgrade to a newer version of omegaconf or contact the author with a proposal to release a version with the appropriate dependency specifiers. ERROR: The pip dependency recognition program currently does not take into account all installed packages. This behavior is the source of the following dependency conflicts. onnxruntime 1.17.1 requires numpy>=1.24.2, but you have numpy 1.23.5, which is incompatible. onnxruntime-gpu 1.17.1 requires numpy>=1.24.2, but you have numpy 1.23.5, which is incompatible. torchcrepe 0.0.20 requires librosa==0.9.1, but you have librosa 0.8.1, which is incompatible. ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. gradio 4.22.0 requires pydantic>=2.0, but you have pydantic 1.10.15 which is incompatible. ModuleNotFoundError: No module named 'fairseq' UPD: I reinstalled everything here 100 times in general, as I understand it, the main problem is omegaconf, its versions do not match what other programs need, when installing fairseq, omegaconf version 2.0.6 is installed, but then an error appears that (pyannote-audio 3.1.1 requires omegaconf=2.1, but you have omegaconf 2.0.6 which is incompatible.). If you install omegaconf 2.1 then the error (fairseq 0.12.2 requires omegaconf
@Jarods_Journey8 ай бұрын
The dependency conflicts get resolved by reinstalling the requirements.txt file at the end of the installations, though, the biggest concern for me is the Module Not Found one. The fairseq installation is a wheels file that I uploaded to huggingface, that's where the installation resides. It's possible your device is failing to download it from hugging face, hence, why the script isn't installing it.
@ЗлодейПо8 ай бұрын
@Jarods_Journey It seems that this is exactly the problem, you're right, it's a pity that I can't fix it, because I don't even roughly understand what to do, the latest version of the turtle worked perfectly ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: 'E:\\Tortoise TTS\\ai-voice-cloning\\fairseq-0.12.4-cp311-cp311-win_amd64.whl' ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: 'E:\\Tortoise TTS\\ai-voice-cloning\\deepspeed-0.14.0-cp311-cp311-win_amd64.whl' ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: 'E:\\Tortoise TTS\\ai-voice-cloning\\pyfastmp3decoder-0.0.1-cp311-cp311-win_amd64.whl' I reinstalled python because I forgot to add it to the path and opened the console (setup-cuda) with administrator rights, it didn't help, it's a pity, the last option I have left is to look at the problem on hugging face, but I doubt that there will be something worthwhile there, sorry for all these errors, you are doing really amazing things, thank you for that)
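Note on the dependency conflicts in this thread: a minimal sketch that lists which versions of the packages named in the error messages are actually installed, using the standard-library `importlib.metadata`. Run it inside the tool's virtual environment; the package list is taken from the errors quoted above, and the function name is hypothetical.

```python
from importlib import metadata


def installed_versions(names):
    """Map each package name to its installed version, or None if it is not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    packages = ["omegaconf", "fairseq", "pydantic", "gradio", "numpy", "librosa"]
    for name, version in installed_versions(packages).items():
        print(f"{name}: {version or 'not installed'}")
```

A `None` for fairseq would be consistent with the reply above that the wheel file was never downloaded.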
@dthSinthoras8 ай бұрын
How serious should this warning be taken I get while training? UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call. torchaudio.set_audio_backend("soundfile") Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.2.1. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint C:\KI-Stuff\__Sound\TorToiseAnyLanguage\models\torch\whisperx-vad-segmentation.bin` Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x. Model was trained with torch 1.10.0+cu102, yours is 2.2.2+cu121. Bad things might happen unless you revert torch to 1.x.
@Jarods_Journey8 ай бұрын
Not an issue, should be good to go
@alexisgomes17408 ай бұрын
Hello, I have CUDA Toolkit 12.4, Windows 11, git and python installed. When running the setup-cuda bat file I get an error while extracting rvc.zip (error opening archive: failed to open 'rvc.zip'), followed by: ERROR: Could not open requirements file: Errno 2 No such file or directory. Then the panel shuts down. What can I do?