How to Clone Most Languages Using Tortoise TTS - AI Voice Cloning

Views: 26,989

Comments: 155

@Jarods_Journey 8 months ago

Zero-code package is here if you're running into difficulties installing: huggingface.co/Jmica/ai-voice-cloning/blob/main/ai-voice-cloning-3.0.7z Make sure you have the latest 7zip and when you unzip it, run start.bat.

@IDOLSKPOP68 8 months ago

I got an error when using it: I created a folder in the voices section, but when I refreshed the voice list it didn't appear in the list. There were some other errors too.

@LagomStorybook 7 months ago

Do you get errors? Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x. Model was trained with torch 1.10.0+cu102, yours is 2.2.2+cu121. Bad things might happen unless you revert torch to 1.x.

@LJames-ez9lr 6 months ago

Guys remember to update to the latest version of 7zip. Mr. fancy pants used some modern compression. V3 works great after I updated 7zip and extracted. Thank you sir!

@LJames-ez9lr 6 months ago

Does this only work with float16 compute for the training? Or can I edit a config file to change the compute type? I have a GTX 1080 and float16 doesn't work for me.

@Schawum 4 months ago

@LJames-ez9lr Same problem here. I also have the 1080 Ti and float16 doesn't run.

@anikethhebbar6438 8 months ago

00:01 Install the latest version of the AI voice cloning repository and set it up with Python and Git.
02:21 Install and update CUDA and drivers for smooth operation.
06:29 Recommend disabling WhisperX alignment.
08:46 Configuring tokenizer and training settings for AI voice cloning.
13:08 Confirming and optimizing model usage.
15:25 Organizing audio files for voice cloning.
19:32 Use low samples and 30-50 iterations for TTS generation, and recompute voice latents if necessary.
21:42 Tortoise TTS is robust for speech-heavy audio.
25:55 Saving frequency and continuing training in Tortoise TTS.
28:01 Explanation of training and modifying the state for longer training.

@bomar920 8 months ago

Thank you for contributing to the open source community. Your channel deserves more subs. Your content is high quality.

@احمدصبيح-خ7و 8 months ago

I trained the program on several hours of Arabic audio. In the end, it started speaking Japanese instead of Arabic.

@tipu-j6e 3 months ago

I have heard that it needs at least 10k hours of good-quality audio.

@SAnsAN091190 8 months ago

Jarod, thank you for your hard work! I haven't tried the current changes yet, but I definitely will! At the moment, I have independently converted the code for training non-English languages. I've been training for Cyrillic on 400+ hours of source audio. In the settings, I selected 35 epochs, Batch Size = 1024 (in order to maximize the use of video memory) and Gradient Accumulation Size = 16. The rest of the settings are similar to yours. When generating short phrases (up to 11 seconds) I get silence and strange sounds at the end of the audio. As you mentioned at 19:00, I increased the 'Length Penalty' and 'Repetition Penalty'. This got rid of the silence, but there are still artifacts (strange sounds and repetitions). I think this is also largely due to poor audio splitting by Whisper for Cyrillic (many phrases are cut off mid-word). Maybe you can tell me how I could try to fix this? And if you have alternative communication channels for talking with like-minded people, I would be happy to join them =) (I saw a mention of Discord somewhere)

@Artholos 8 months ago

Yeah baby! Jarod you’re the hero once again! 🎉 Thank you so much for your hard work!

@AIUnveil 8 months ago

Awesome, bro! I was checking your channel almost every day for this video. You are pretty much the only one doing this stuff. Great work.❤❤

@giovannif2567 8 months ago

You're so talented, man! And you make everything look so easy! Happy to be a supporter, and I will continue to be! 🚀

@Jarods_Journey 8 months ago

Thank you :), appreciate it!

@blakusp 8 months ago

Wonderful tutorial! Is there any possibility of sharing the Spanish (or all languages) base model you have trained so far, for the people (including myself) who don't have the resources to train from scratch? :( haha, thanks! PS: I completely understand if you don't want to share it.

@hackpop 7 months ago

I ran into various issues, but in the end I finally understood that curl was missing from my system. After curl was properly installed, everything went smoothly. Thank you Jarod for your contribution, this project is awesome!!!

@LJames-ez9lr 6 months ago

@hackpop hi, what kind of issues were you having and what is curl?

@AlexisGomes-n4r 8 months ago

Hello, I have CUDA Toolkit 12.4, Windows 11, Git and Python installed. When running the setup-cuda bat I am getting an error while extracting rvc.zip (error opening archive: failed to open 'rvc.zip') (ERROR: Could not open requirements file: Errno 2 No such file or directory). Then the panel shut down. What should I do?

@szymonnawrocki890 6 months ago

Really great videos and content. Thanks to you I'm getting into voice modeling myself

@trollsome 7 months ago

11:19 I can't validate training configuration because it just gives me an error saying "empty data set"

@BorygoTomka 8 months ago

Hey, I have a little trouble. At the step at 7:12, when I clicked Transcribe and Process I got an error: ValueError: Requested float16 compute type but the target device or backend do not support efficient float16 computation. What do I need to do to make it work?
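
A minimal sketch of how one might decide which compute type a card can use. It assumes the error comes from the Whisper backend (the wording matches CTranslate2, which faster-whisper builds on) and that efficient float16 needs CUDA compute capability 7.0 or newer; a GTX 1080 reports 6.1. `pick_compute_type` is a hypothetical helper, not part of ai-voice-cloning, and where the chosen value has to be set in this tool is not stated anywhere in the thread.

```python
def pick_compute_type(capability: tuple) -> str:
    """Pick a compute type from a CUDA compute capability (major, minor).

    Assumption: efficient float16 needs compute capability 7.0 or newer;
    older cards (a GTX 1080 reports 6.1) fall back to float32.
    """
    return "float16" if tuple(capability) >= (7, 0) else "float32"


if __name__ == "__main__":
    try:
        import torch  # optional; only used to query the real GPU
    except ImportError:
        print("PyTorch not installed; example for a GTX 1080:",
              pick_compute_type((6, 1)))
    else:
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            print(cap, "->", pick_compute_type(cap))
        else:
            print("No CUDA device visible to PyTorch")
```

Run inside the project's virtual environment, this prints the capability of the first GPU and the compute type the sketch would choose for it.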

@BorygoTomka 8 months ago

No such file or directory: 'training\\Ja\\processed\\run\\dataset\\wav_splits\\file___2\\file___2.srt'

@BorygoTomka 8 months ago

I found the problem but still can't fix it. When running the CUDA setup program, an error about conflicts pops up: hydra-core 1.3.2 requires antlr4-python3-runtime==4.9.*, but you have antlr4-python3-runtime 4.8 which is incompatible. hydra-core 1.3.2 requires omegaconf=2.2, but you have omegaconf 2.1.0 which is incompatible. What should I do?
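
To see which versions actually ended up installed after the setup script finishes, a small check with the standard library's `importlib.metadata` can help. This is only a sketch: the package names are the ones quoted in the error above, and it should be run with the Python interpreter from the project's virtual environment.

```python
from importlib import metadata
from typing import Dict, List, Optional


def installed_versions(names: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    packages = ["hydra-core", "omegaconf", "antlr4-python3-runtime"]
    for name, version in installed_versions(packages).items():
        print(f"{name}: {version or 'not installed'}")
```

Comparing this output with the versions named in the pip message shows whether the conflict is still present once installation has completed.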

@heyyanito 7 months ago

Hi Jarod thanks so much for the release and for walking through this process. It's wonderful. Do you have API examples which include the RVC pipeline? I'm not sure the ones listed in gradio on the most recent release include the flags for adding the RVC inference to the request, although I could just be misunderstanding as programming is not something I am very good at :)

@LJames-ez9lr 6 months ago

I got this error when I did the test generate: Possible latent mismatch: click the "(Re)Compute Voice Latents" button and then try again. Error: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 2 for tensor number 2 in the list. Even after I clicked recompute voice latents, I still get the same error.

@Vxk_yt 6 months ago

did you find a fix? i have the same issue. Tried checking on github... there are a few ppl with the same issue

@manfredice391 5 months ago

Try setting "Candidates" to 1

@DM-dy6vn 8 months ago

14:05 The same tokenizer which was used during training has to be selected as well.

@Rewe4life 2 months ago

Hi, I am trying to follow your tutorials, but I am unable to get it to work. I am running it on an Ubuntu Server VM with a Tesla P40 GPU. I clone the repo, then create and start the container. I can reach the web UI, but as soon as I click anything (generate, or even just changing the selected voice), I get errors. Do you have any suggestions for what I could try? Or am I missing something obvious, for example downloading a model manually?

@augustinolarian 8 months ago

Hi. Is there any way to import models that are already trained? Is there a way we can download already trained voices? I am unable to clone voices in Romanian. I get indescribable audio every time.

@dthSinthoras 8 months ago

During "Transcribe and Process" I get UserWarning: The value passed into gr.Dropdown() is not in the list of choices. Please update the list of choices to include: or set allow_custom_value=True. warnings.warn( Incorrect BOM value Error reading comment frame, skipped Incorrect BOM value Error reading comment frame, skipped Incorrect BOM value Error reading comment frame, skipped Incorrect BOM value Error reading comment frame, skipped Incorrect BOM value Error reading comment frame, skipped Incorrect BOM value ... What does it mean? It's running anyway, but I have put in hundreds of hours of audio, so it would be great to know if I should abort the run...

@younglymak 2 months ago

After refreshing the voice list, I get the error 'Connection errored out.' What should I do?

@alexlazareibanez1047 6 months ago

Hello, I wanted to know whether Tortoise v3 supports the Spanish language and accent. When I do the voice training I set the language to ES instead of EN, but the Spanish comes out with an English accent. I am working with RVC Retrieval xtts2 but I heard that Tortoise is better. Thank you.

@Physengineer 6 months ago

Question: I am trying to use the audiobook 3.0 program, which requires an RVC voice, but I can't see how to use the voice cloning 3.0 software to make the RVC voice and index file required by the audiobook program.

@Physengineer 6 months ago

Well, I guess it can't. But I installed Applio. Once I figured out how to use it, it worked great for making voices. I am using the first voice together with the voice cloning and audiobook 3.0 app. They work great, I could not be happier.

@cyberbol 2 months ago

I followed your tutorial step by step, and at the end, when I try to generate a voice, I get this error: No RVC configuration found, check configs folder. If rvc.json does not exist, please change a setting in the RVC area to create one.

@jr-2nd 5 months ago

When I train a voice, the terminal always says "\ai-voice-cloning-3.0>pause" at 98.7% and it doesn't move anymore. It happens on every try. Any solution?

@adamrastrand9409 8 months ago

But why doesn't the voice sound like me when I trained it? I only trained on two minutes of data. And why are fine-tuned voices good, then, if you can just use the autoregressive model with a voice sample? And when you have trained a new language with many hours of data, how do I fine-tune it? When I train the new language, does that count as a voice or as a new language? I don't really get it.

@adamrastrand9409 8 months ago

Hello. After training my Tortoise model on my voice, I made a short dataset of five minutes with 200 epochs. When I select the autoregressive model for my voice and select none as the voice type, I think it doesn't sound like me at all, or rather, it says that none is not accepted as an argument. However, when I select random or my voice from the voice folder with the autoregressive model, it sounds like the latents are computed for another voice with my timbre and such. How do I fix it so it's completely my own voice? Should I delete everything from the folder? Should I delete the computed latents file and just keep the audio files, or should I keep only the shortest audio file? And when I prepare the tokenizer for another language, is it necessary to have something like 70 hours of audiobooks? Do I need a large amount of audio data to prepare the tokenizer, or is that only for training another language? I also have another question: say you trained a new language with many audiobooks, let's say 50 hours. How do I then train a new voice, for example a new Spanish voice? Do I use the previous dataset, or how do I use the new audio with the new tokenizer or a new dataset? I don't really know.

@Jarods_Journey 8 months ago

You've got a lot of good questions on training afterwards, I won't be able to respond to it all in this comment. In general, after training a language, you can run "finetunes" of that language to get specific voices. As for your initial question, I'm not entirely sure what is happening either. Sometimes, the voice won't sound like you. This is where I have RVC come into play, as it helps rematch whatever voice you want to get close to.

@adamrastrand9409 8 months ago

@Jarods_Journey But why doesn't the voice sound like me when I trained it? I only trained on two minutes of data. And why are fine-tuned voices good, then, if you can just use the autoregressive model with a voice sample? And when you have trained a new language with many hours of data, how do I fine-tune it? When I train the new language, does that count as a voice or as a new language? I don't really get it. And what does it sound like when I have too little data for a new language, say Norwegian, Swedish or any other language? Will it sound like that language? And how much training data do I need to prepare the tokenizer?

@dani0001 4 months ago

I downloaded the Huggingface version and started training a Hungarian language voice model with it. However, for some reason, I can't reach the stage of text generation. (RE)Compute Voice Latents runs indefinitely, then I get a CUDA Out Of Memory error. Additionally, it also tries to generate the text forever. I am using everything according to the settings shown in the video. What could be the problem? Is my NVIDIA GeForce GTX 1660 6GB video card and the 32GB RAM in my computer insufficient for this? Thank you very much in advance for your response!

@aheront3541 2 months ago

Connection errored out.

@ToukoWhite 6 months ago

After I click train I get this error: "RuntimeError: CUDA error: device-side assert triggered [Training] [2024-06-23T23:29:29.443537] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. [Training] [2024-06-23T23:29:29.443537] For debugging consider passing CUDA_LAUNCH_BLOCKING=1. [Training] [2024-06-23T23:29:29.443537] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions." Any way to fix it?

@AlexisGomes-n4r 8 months ago

When running start.bat i am getting the error : no module named 'psutil'

@oosixcosoo-yt5548 8 months ago

I think it's because you don't have Python 3.11 installed. If you want it to run normally, just reinstall python at 3.11 version and rewatch the video

@StefanHackbarth-xz7ee 6 months ago

Help please... when I'm running Training I get this error message: 'utf-8' codec can't decode byte 0x81 in position 2: invalid start byte
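
This message means that some file being read as UTF-8 contains a byte that is not valid UTF-8. The comment does not say which file, so the sketch below simply assumes it is a text or JSON file somewhere under the training folder and lists any that fail to decode. `find_non_utf8` and the `./training` location are illustrative assumptions, not part of the tool.

```python
from pathlib import Path
from typing import List, Sequence


def find_non_utf8(folder: str,
                  patterns: Sequence[str] = ("*.txt", "*.json")) -> List[str]:
    """Return files under folder (recursively) that fail to decode as UTF-8."""
    bad = []
    for pattern in patterns:
        for path in Path(folder).rglob(pattern):
            try:
                path.read_bytes().decode("utf-8")
            except UnicodeDecodeError:
                bad.append(str(path))
    return sorted(bad)


if __name__ == "__main__":
    # Assumed location of the datasets; adjust to the actual install folder.
    for name in find_non_utf8("./training"):
        print("not valid UTF-8:", name)
```

Any file it reports can be opened in an editor and re-saved as UTF-8 before retrying.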

@rubenrodenascebrian3855 6 months ago

Great video and great repository, thank you very much for your work. I AM HAVING A PROBLEM... I train the model in Spanish, set "ES" for Whisper to recognize the Spanish language, but when I finish the training, it speaks with an English accent, but totally English. Why is this happening? Thank you very much!!!

@chiyanchandru5914 4 months ago

How can I run it if I already have transcription data with the audio?

@sirjared21 3 months ago

Have run into issues using this. For one voice, I keep getting a 'CUDA out of memory' error where it tries to set aside something like a terabyte of RAM, lol - it didn't happen before using the same voice and settings, but just randomly happened. For all voices, if you use the en_tokenizer when generating after training, the outputted voice sounds utterly insane/incomprehensible. Switching back to '/modules/tortoise-tts/tortoise/data/tokenizer.json' fixes it. There really needs to be a guide for all the sliders and what they mean. There needs to be some kind of "recommended" settings for training epochs, etc. I've done 80 for a voice and it took like 8 hours. For another voice it took only 2, so I guess it depends on how big of a data set you have. This is my first foray into ai voice cloning - doing it for a mod project, and so far it's been frustrating. After all's said and done, I've yet to create a realistic copy of a voice.

@BZAKether 2 months ago

I can't generate anything. When I try, I get an "Error with Pydantic and starlette" and "ERROR: Exception in ASGI application" :(

@stellarbuddy 1 month ago

Same

@احمدصبيح-خ7و 8 months ago

Thank you for your wonderful explanation, which many people learn from. I want to tell you that I have been using text-to-speech programs for a long time, but they are weak for the Arabic language. Finally, I followed the steps in this video, and at the end a message appears stating that the CUDA memory is full. Perhaps the reason is my lack of sufficient knowledge in applying these steps. I hope you can apply this walkthrough to an Arabic audio sample so that I can follow it, and explain the numbers that are entered and divided by two, because I did not understand their exact meaning.

@Jarods_Journey 8 months ago

CUDA memory being full means your GPU VRAM is too small. I recommend that you start at batch size = 1 and gradient accumulation = 1. Then, if training starts with these settings, you can restart (close the browser window), increase the batch size by 1, save the configuration, and keep doing this until you run out of memory again. With this, you'll know what the largest batch size you can use is.
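
The trial-and-error procedure in this reply can be written down as a short loop. This is a sketch only: `fits_in_vram` is a hypothetical stand-in for "start training with this batch size and see whether it runs out of memory", which in practice is done by hand in the web UI.

```python
from typing import Callable


def largest_working_batch_size(fits_in_vram: Callable[[int], bool],
                               limit: int = 1024) -> int:
    """Raise the batch size by 1 until a run fails; return the last one that worked.

    fits_in_vram is a hypothetical callback that returns True when a short
    training run with that batch size starts without a CUDA out-of-memory
    error. Returns 0 if even batch size 1 does not fit.
    """
    best = 0
    for batch_size in range(1, limit + 1):
        if not fits_in_vram(batch_size):
            break
        best = batch_size
    return best


if __name__ == "__main__":
    # Pretend GPU that can hold at most 6 samples per batch.
    print(largest_working_batch_size(lambda bs: bs <= 6))
```

The value returned is the last batch size that trained without an out-of-memory error, which is the number to keep in the saved configuration.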

@AlexisGomes-n4r 8 months ago

I don't have rvc.zip while downloading

@mosambielal6700 5 months ago

Can you please guide me on how you added the emotions tab? And how can we add other emotions here?

@farsi_vibes_edit 8 months ago

Please help, I get this error:
G:\tortiois\ai-voice-cloning>call .\venv\Scripts\activate.bat
Traceback (most recent call last):
File "G:\tortiois\ai-voice-cloning\src\main.py", line 23, in from utils import *
File "G:\tortiois\ai-voice-cloning\src\utils.py", line 41, in from tortoise.api import TextToSpeech as TorToise_TTS, MODELS, get_model_path, pad_or_truncate
ModuleNotFoundError: No module named 'tortoise'

@farsi_vibes_edit 8 months ago

I get this error when I click on start.bat

@francisgoeltner5569 6 months ago

Hello Jarod! First of all: Awesome video and a great channel you have there. Really helpful stuff! I experienced a bit of a problem though with the training continuation process as you described it. I had exactly the outlined problem with a crashed console and tried to resume from the .state file of the previous run. Configuration import and setting of the old state as resume state path worked nicely, but when I try to run the training I get this message: PermissionError: [WinError 5] Access is denied: './training\\Voice2Train\\finetune' -> './training\\Voice2Train\\finetune_archived_240624-081331' The file named indeed does not exist. Did I miss something here? Any help would be greatly appreciated!

@nottobemessed4628 7 months ago

You have set the large-v3 model as default. How do we change that model to a lower one, like medium or small?

@pupattolino75 8 months ago

I followed the installation but I received this error: from rvc_pipe.rvc_infer import rvc_convert ModuleNotFoundError: No module named 'rvc_pipe'

@stevecato 8 months ago

Can rvc be cloned from the same dataset? If using it, how much effort should go into training tortoise vs rvc? Thanks.

@Jarods_Journey 8 months ago

Yep, it should be generally. But rvc only requires 10-60 minutes of audio. Can't really determine, but rvc is generally easier to get it matching. Tortoise is more important though to get the style of how a character speaks, etc

@Nickfulcrium 2 months ago

Once you transcribe from a certain language can it speak any language ?

@emmanueltoussaint2466 8 months ago

Thank you so much for that one. But I keep getting that error: RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory And I got this warning just before. !!!! WARNING !!!! No GPU available in PyTorch. You may need to reinstall PyTorch. Loading TorToiSe... (AR: None, diffusion: None, vocoder: bigvgan_24khz_100band) No hardware acceleration is available, falling back to CPU... What can I do to solve this please?
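
One quick way to check whether the installed PyTorch is a CPU-only build is to look at the tag after the `+` in its version string (elsewhere in this thread a CUDA build shows up as `2.2.2+cu121`). A minimal sketch, with `build_flavor` as a hypothetical helper; it does not address the separate "failed reading zip archive" error.

```python
def build_flavor(torch_version: str) -> str:
    """Classify a PyTorch version string by its local build tag.

    '2.2.2+cu121' -> 'cuda', '2.2.2+cpu' -> 'cpu', no tag -> 'unknown'.
    """
    _, _, tag = torch_version.partition("+")
    if tag.startswith("cu"):
        return "cuda"
    if tag == "cpu":
        return "cpu"
    return "unknown"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print("version:", torch.__version__,
              "->", build_flavor(torch.__version__))
        print("CUDA visible:", torch.cuda.is_available())
```

If this reports a `cpu` build, or `CUDA visible: False` on a machine with an NVIDIA card, the warning about reinstalling PyTorch applies.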

@francsharma7276 5 months ago

I tried 2 hours of Hindi voice with 300 epochs, and it said 4 days. After 6 hours it just stopped loading. My graphics card is a 3070 Ti.

@Bichos28-bg4nm 6 months ago

Hi, as a French speaker, I would like to hear a Portuguese text spoken. Is this possible with ai-voice-cloning 3.0?

@swedishcat7448 8 months ago

Awesome tutorial and fantastic job on making this for people to use. I do have a weird bug or something I need some assistance with, if you may. I'm training a model with English speech and have set the language to en. But when I generate a prompt, the voice is in Japanese (or something like that). I don't quite understand where that comes from. Are there any other settings I can change? Thanks.

@Jarods_Journey 8 months ago

I also don't know where that's coming from. Make sure the tokenizer is English

@swedishcat7448 8 months ago

@Jarods_Journey Thanks for answering. Yeah, the tokenizer is the "en_tokenizer" as in the video, but I still get that. Is there any other fix? Can I provide any logs or something for you to look at? I really like Tortoise TTS.

@bigadz87 8 months ago

@Jarods_Journey I am getting the same issue, followed your video exactly

@JanPeter56 7 months ago

Same here lmao, had it training on 12 minutes of clean af audio of Christopher Lee talking, the first time all i could get was whale noises, and the 2nd time training it sounded like a japanese whale Edit: It must be the tokenizer i think. when i select the default "./modules/tortoise-tts/tortoise/data/tokenizer.json" instead of "./models/tokenizers/en_tokenizer.json" The model suddenly created clear english audio

@bwheldale 6 months ago

@JanPeter56 I tried this and mine went from gibberish to English, which was hopeful, but the accent was too 'English' instead of Australian. An audio language detection site said it was 65% English, though it sounded more like German.

@WonderWhat1000 8 months ago

Hello Jarod, I have a question about cloning English voices in AI voice cloning. There is a voice I want to clone with an hour of data, so how many epochs are needed to clone a good voice? Can you please elaborate on this part? Thank you

@Jarods_Journey 8 months ago

If you're doing english, for an hour, I'd train to about 50 epochs first and see how that sounds. You can always train longer if you determine the need to
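
For a rough feel of how the epoch count turns into training work, the arithmetic below counts how many batches a run processes. It assumes the usual definition that one epoch is one pass over all clips; the clip count and batch size in the example are made-up illustration values, not recommendations from the video.

```python
import math


def total_steps(num_clips: int, batch_size: int, epochs: int) -> int:
    """Number of batches processed over a whole training run.

    Assumes one epoch is one full pass over the dataset, so each epoch
    takes ceil(num_clips / batch_size) batches.
    """
    return math.ceil(num_clips / batch_size) * epochs


if __name__ == "__main__":
    # Hypothetical example: 600 clips, batch size 64, 50 epochs.
    print(total_steps(600, 64, 50))
```

Doubling either the dataset size or the number of epochs doubles the number of batches, which is why training time varies so much between voices.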

@himelhs 6 months ago

My laptop is Intel. Can't I use it?

@gmfPimp 7 months ago

Thanks for your effort. FYI, you are not using MP3s, your file extension is MP4. MP3s have better audio quality than mp4.

@miyrrecs3024 6 months ago

I did all the steps according to the video for the Spanish language with fluent input, but what I get is a messy voice like 'dysphasia'.

@francsharma7276 5 months ago

Can we train a model in parts? If yes, please make a video. "For 10 epochs and 2 hours of audio" it took 4 hours.

@Vaultcitizen 8 months ago

I installed 2 versions in different folders. How should I uninstall the older one? Simply delete the folder, or there is a better way (through cmd) ? Thanks for your work and making it easy to test :)

@CyberPhonkMusic 8 months ago

How much GPU do you need to train a new language like Brazilian Portuguese? Do you need 25 hours of audio? Does it always have to be the same person speaking?

@mydreams3437 8 months ago

How long should the input voice file be? Can I use an MP3 file?

@farsi_vibes_edit 8 months ago

Thank you. I really needed this software and your training. I am installing it. I hope I won't have any problems.

@iQOmni 8 months ago

You are amazing thanks for all that you do

@edgarl.mardal8256 6 months ago

Hi, are you Pinoy? I was wondering if I could ask for help creating an AI cold sales agent with a Norwegian LLM and training a TTS to speak fluent Norwegian?

@frh1700 5 months ago

When I try to train the model I get this error: Missing dataset: ./training/test//whisper.json

@zenkidpress2271 8 months ago

Hello Jarod, it would be nice if you trained the voices in other languages for the community and then shared everything (even charging a sum, because you obviously spent time training the other languages). I would gladly pay 🙂

@tempertephra 8 months ago

Agreed, it would be good to share the other language files with the community. Please start this.

@ph0enixph0enix65 6 months ago

In case you're willing to do so, I would need a female German voice model. I would also gladly pay for it.

@Djamel__LD 8 months ago

can i use it with Intel Iris(R) Xe 16 GB GPU ?

@Vlad-hm7cj 8 months ago

Does this work on linux? My windows machine has an AMD instead of an Nvidia... T-T

@craigcarter1572 6 months ago

setup-cuda says Python 3.11 not installed, however when I run >python --version it sees the version 3.11

@Vxk_yt 6 months ago

try uninstall and reinstall, if you have other versions uninstall all of them, also click on add to path when installing

@craigcarter1572 6 months ago

@Vxk_yt I found the issue: when installing Python, even though I checked the box for path setup, Win 11 did not update the environment settings. I added the paths manually and it fixed everything. Reminds me of the old MS-DOS days. Thanks much for your reply.
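
A short check along these lines shows which interpreter the shell actually finds, which is what went wrong in this case. It uses only the standard library; `is_supported` is a hypothetical helper that encodes the 3.11 requirement mentioned by setup-cuda.

```python
import shutil
import sys


def is_supported(version: tuple) -> bool:
    """True if (major, minor) is the Python 3.11 that the setup script asks for."""
    return tuple(version) == (3, 11)


if __name__ == "__main__":
    # shutil.which searches the same PATH that a .bat file would use.
    print("python on PATH:", shutil.which("python"))
    current = sys.version_info[:2]
    print("running:", sys.executable, current,
          "ok" if is_supported(current) else "not 3.11")
```

If `python on PATH` prints `None` or a different install than expected, the PATH entries need fixing by hand, as described above.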

@IDOLSKPOP68 8 months ago

Is there any way to install on linux?

@threepe0 6 months ago

stuck on "No module named 'vc_infer_pipeline'"

@CINECOMBO 5 months ago

same problem

@fdgfdgdfgdfgfdgdf 8 months ago

when i push train: ModuleNotFoundError: No module named 'axial_positional_embedding'

@trickydicky961 8 months ago

I get the same error.

@stickmanland 7 months ago

How much time would it take to train?

@ywueeee 7 months ago

can this run on mac?

@SyamsQbattar 4 months ago

Does it support the Indonesian language?

@NT3wazLcUqwA 7 months ago

How about Cantonese?

@RexVergstrong 4 months ago

I'm getting this error when I validate the training config. [Errno 2] No such file or directory: './training//train.txt'

@RexVergstrong 4 months ago

There's a train.json in that folder, but in my model folder there was a train.txt. I copied it into the training folder directly and it seems to work for now.
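
The double slash in './training//train.txt' (and in './training/test//whisper.json' from another comment) is what a path looks like when one of its components is an empty string, for example when no dataset was selected. The sketch below only illustrates that effect; `dataset_file` is a hypothetical function and is not taken from the ai-voice-cloning source.

```python
def dataset_file(voice: str, filename: str = "train.txt") -> str:
    """Hypothetical path builder: ./training/<voice>/<filename>."""
    return f"./training/{voice}/{filename}"


def has_empty_component(path: str) -> bool:
    """True if the path contains '//', meaning one component was empty."""
    return "//" in path


if __name__ == "__main__":
    for voice in ("Ja", ""):
        path = dataset_file(voice)
        print(repr(voice), "->", path,
              "(empty component)" if has_empty_component(path) else "")
```

If that is what happened here, selecting the dataset in the UI and refreshing the configuration may be a cleaner fix than copying files by hand.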

@kernsanders3973 8 months ago

Thank you!

@RA-ss5fe 7 months ago

1. Will it work for the Urdu/Hindi language?
2. Will it work with any type of NVIDIA GPU, I mean with a low-end GPU?
3. How much hard drive space does it require?

@akemixx._0 8 months ago

Does this already support Brazilian Portuguese?

@farsi_vibes_edit 8 months ago

ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them. unknown package: Expected sha256 f9ef0a648310435511e76905f9b89612e45ef2c8b023bee294f5e6f7e73a3e7c Got 887e84fc28f6772ed033ff6d269a01179021bf974277d1b1859c9654541781ba

@farsi_vibes_edit 8 months ago

I get this error, but the download is still continuing. Is this OK?
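
pip's hash check compares the SHA-256 of the downloaded file with the value listed in the requirements. A mismatch often just means a corrupted or interrupted download. To compute the digest of a file by hand and compare it with the "Expected sha256" value, a sketch using only the standard library could look like this; the command-line usage is an illustration, and the file to check is whichever wheel or archive pip complained about.

```python
import hashlib
import sys


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Usage: python check_hash.py <downloaded-file>
    if len(sys.argv) == 2:
        print(sha256_of(sys.argv[1]))
    else:
        print("usage: python check_hash.py <file>")
```

If the printed digest differs from the expected one, deleting the file and downloading it again is the usual first step.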

@MaorStudio 8 months ago

You are awesome.

@StringerBell 8 months ago

Followed every single step and ended up with a gibberish mess in Bulgarian. Trained on 3 hours of studio-quality voiceovers for 500 epochs (saving every 20 epochs).

@Jarods_Journey 8 months ago

Issue with training another language is the model needs to generalize and 3 hours isn't sufficient enough for that. I'd say you wanna start with at least 25 hours of ANY language data in bulgarian, for this, you could probably scrape from audiobooks. Even 25 hours may produce a rough model, so the more the better. After this, we'll call this a bulgarian base model. Now that you have a base model, you can then "finetune" it for a voice style you want. Though, it's a lot to talk about in a comment so I'll think about making a followup video
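
To check how close a dataset is to the roughly 25 hours suggested here, the total duration of a folder of WAV files can be added up with the standard library. This sketch assumes uncompressed PCM `.wav` files (the `wave` module cannot read MP3 or other formats), and the `./voices` folder in the usage line is only a placeholder.

```python
import wave
from pathlib import Path


def total_hours(folder: str) -> float:
    """Sum the duration of all PCM .wav files under folder, in hours."""
    seconds = 0.0
    for path in Path(folder).rglob("*.wav"):
        with wave.open(str(path), "rb") as wav:
            seconds += wav.getnframes() / wav.getframerate()
    return seconds / 3600


if __name__ == "__main__":
    # Placeholder folder; point this at the directory holding the dataset.
    print(f"{total_hours('./voices'):.2f} hours of audio")
```

A folder without any WAV files simply reports 0.00 hours.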

@StringerBell 8 months ago

@Jarods_Journey Is there a way to train a language and then change the style to a super EXCITED and over-the-top performance? Also, can I use random voices, male and female, for the training?

@StringerBell 8 months ago

@Jarods_Journey How to finetune a base model is a super interesting topic. Adding emotion or style to it. Please do make a video when you can, it will be an immensely helpful follow-up to this tutorial!

@StringerBell 8 months ago

I just trained for 29 hours on my RTX 4090 on 48 hours of studio quality audiobooks in Bulgarian. Let's say the result is underwhelming.

@satyajitroutray282 4 months ago

@StringerBell Did you successfully train your model in Bulgarian with that amount of data?

@WorldYuteChronicles 8 months ago

big up!

@MadeEasyTube 8 months ago

Thank you

@MuratAtasoy 3 months ago

I trained for Turkish with a 5-minute voice recording, and the results are nonsense :) like a new language lol. Is this only for English?

@MuratAtasoy 3 months ago

a day's work gone trash

@Airbender131090 8 months ago

Does it work with Russian?

@Jarods_Journey 8 months ago

It should work, after having trained 4 languages so far to a good degree of accuracy, I don't see why not. Just make sure you have enough data and it should be fine to run :)

@Test-ep7gg 5 months ago

And will Bulgarian work? There are good results on sites, I prefer to use the resources of such projects, but no matter how many projects I install, there is still no Bulgarian language. This one is currently giving me an error and I don't know if I should bother doing it. ImportError: cannot import name 'RootModel' from 'pydantic'

@yasenkey3779 4 months ago

@Test-ep7gg Did it work for Bulgarian?

@tylerchambliss8379 8 months ago

I don't understand why your models aren't skipping. I still can't make my books bro. What are you doing? How are you making these models not skip and glitch?

@Jarods_Journey 8 months ago

My models do have some skipping, but it's not every generation. Unfortunately, the only thing I can say is that my datasets are generally clean, although my mass-transcribed datasets for other languages are dirty datasets. I'm not doing anything particularly special in my models.

@aachannel2843 3 months ago

Can Arab voices be reproduced?

@v3ucn 8 months ago

support Chinese?

@stickmanland 7 months ago

21:12

@lazar4426 6 months ago

8:20

@deadwarrior9866 6 months ago

doesnt work

@Haidnt-c1h 1 month ago

The Vietnamese doesn't sound very good yet ^_^

@taichinh-taman5516 25 days ago

Do you know which AI is best for training your own voice in Vietnamese?

@Haidnt-c1h 25 days ago

@taichinh-taman5516 11Labs, my friend

@ללמד_טבעי 6 months ago

It's so complicated and the result doesn't sound good either; in short, it's a waste of time. We would be happy to have a short, simple way with results that sound human.

@kushalvirulkar 8 months ago

Please clone the Hindi language.

@stepantrekhleb3271 6 months ago

this shit does not work at all

@peterimade003 6 months ago

Do you have a discord channel? It'll be nice to have a community research together on this tool.

@peterimade003 6 months ago

How does one get good trained models?

@ЗлодейПо 8 months ago

I tried to install the necessary ones, but then the new ones were not compatible with something else, maybe I was doing something wrong?, I just cloned git, and then launched setup-cuda In general, I get these errors from the console DEPRECATION: omegaconf 2.1.0 has a non-standard dependency specifier PyYAML>=5.1.*. in pip 24.1, this behavior change will be enforced. A possible replacement is to upgrade to a newer version of omegaconf or contact the author with a proposal to release a version with the appropriate dependency specifiers. ERROR: The pip dependency recognition program currently does not take into account all installed packages. This behavior is the source of the following dependency conflicts. onnxruntime 1.17.1 requires numpy>=1.24.2, but you have numpy 1.23.5, which is incompatible. onnxruntime-gpu 1.17.1 requires numpy>=1.24.2, but you have numpy 1.23.5, which is incompatible. torchcrepe 0.0.20 requires librosa==0.9.1, but you have librosa 0.8.1, which is incompatible. ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. gradio 4.22.0 requires pydantic>=2.0, but you have pydantic 1.10.15 which is incompatible. ModuleNotFoundError: No module named 'fairseq' UPD: I reinstalled everything here 100 times in general, as I understand it, the main problem is omegaconf, its versions do not match what other programs need, when installing fairseq, omegaconf version 2.0.6 is installed, but then an error appears that (pyannote-audio 3.1.1 requires omegaconf=2.1, but you have omegaconf 2.0.6 which is incompatible.). If you install omegaconf 2.1 then the error (fairseq 0.12.2 requires omegaconf

@Jarods_Journey 8 months ago

The dependency conflicts get resolved by reinstalling the requirements.txt file at the end of the installations, though, the biggest concern for me is the Module Not Found one. The fairseq installation is a wheels file that I uploaded to huggingface, that's where the installation resides. It's possible your device is failing to download it from hugging face, hence, why the script isn't installing it.

@ЗлодейПо 8 months ago

@Jarods_Journey It seems that this is exactly the problem, you're right. It's a pity that I can't fix it, because I don't even roughly understand what to do. The latest version of the turtle worked perfectly.
ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: 'E:\\Tortoise TTS\\ai-voice-cloning\\fairseq-0.12.4-cp311-cp311-win_amd64.whl'
ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: 'E:\\Tortoise TTS\\ai-voice-cloning\\deepspeed-0.14.0-cp311-cp311-win_amd64.whl'
ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: 'E:\\Tortoise TTS\\ai-voice-cloning\\pyfastmp3decoder-0.0.1-cp311-cp311-win_amd64.whl'
I rearranged python because I forgot to add it to the path, and opened the console (setup-cuda) with administrator rights. It didn't help, which is a pity. The last option I have left is to look at the problem on Hugging Face, but I doubt that there will be anything worthwhile there. Sorry for all these errors. You are doing really amazing things, thank you for that)

@dthSinthoras 8 months ago

How serious should this warning be taken I get while training? UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call. torchaudio.set_audio_backend("soundfile") Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.2.1. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint C:\KI-Stuff\__Sound\TorToiseAnyLanguage\models\torch\whisperx-vad-segmentation.bin` Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x. Model was trained with torch 1.10.0+cu102, yours is 2.2.2+cu121. Bad things might happen unless you revert torch to 1.x.

@Jarods_Journey 8 months ago

Not an issue, should be good to go