Local voice cloning with 6 seconds audio | Coqui XTTS on Windows

Views: 45,830

Comments: 222

@toykotokyoto 1 year ago

another great video, Thorsten 👏 We have a happy update... you can now use unlimited audio for the 0-shot clone :D no longer are you limited to just 6 seconds. The HuggingFace space is still hard coded to max out at 30 seconds though... so we don't overload their servers 😆

@ThorstenMueller 1 year ago

You're very welcome and thanks for the update 😊.

@juanjesusligero391 1 year ago

This is great news! :D You probably should make another video comparing the quality differences between the 6 seconds and 30 seconds input audio! (or maybe more, if you can change that max value in the local installation) ^^ @@ThorstenMueller

@ThorstenMueller 1 year ago

@@juanjesusligero391 An audio samples comparison video with different audio input length is already in the making 😉.

@tsunderes_were_a_mistake 11 months ago

Does the output sound better with longer audio? I tried the Japanese version on hugging face and output sounded robotic.

@ThorstenMueller 11 months ago

@@tsunderes_were_a_mistake In my German model I didn't notice a change depending on the text length, but I did not check this specific aspect closely. If you think it would be helpful, I can give it a more specific try (with a German model). I can't say anything about the Japanese model, though.

@juanjesusligero391 1 year ago

I was exactly like you, I also had too high expectations for Coqui XTTS, haha ^_^ While the outcome wasn't quite what I was expecting, the results are still quite impressive, especially considering they are based on just a 6-second sample. I was also really happy to read in the comments that the devs are working on improvements, like allowing for voice samples longer than 6 seconds. I loved the video! Thanks a lot for your work, Thorsten! ^^

@ThorstenMueller 1 year ago

Thanks a lot for your nice feedback 🥰.

@Reincarnated_Recap 7 months ago

omg, the quality is so good compared to all the other voice-cloning TTS

@schakuun1995 1 year ago

Great video! I'm really getting into TTS and it's so exciting to see what's possible now. It's incredible how something that needed hours of data a year ago can now be done in just 6 seconds. It's fascinating to watch this tech evolve

@ThorstenMueller 1 year ago

Thank you for your nice feedback 😊. I'm really curious to see where quality is going in the near future.

@MohanPoornachandra 2 months ago

Thanks a lot. I had been wanting to train a model for many days and kept running into various errors. This solved everything.

@secondaccount5512 1 year ago

Great video, expectations after listening to the interview with Josh were high, but XTTS is still kinda new, so I am excited for the future improvements.

@ThorstenMueller 11 months ago

I'm excited too 😊.

@Cmapukan 9 months ago

Thanks for the good explanation and clear example. I wish you prosperity and new opportunities. I apologize for my broken English.

@ThorstenMueller 9 months ago

Thank you for your nice comment. I wish you all the best, too 😊.

@nuborn.studio 10 months ago

Nice tool and great respect to the developer! I think the idea is great, although I personally couldn't do anything with the quality. But hey, for 6 seconds of input that's a great result, I think!

@ThorstenMueller 10 months ago

I can agree with that 😊.

@nerdynav 1 year ago

Hi Thorsten, I am a computer engineer and AI YouTuber myself (who isn't nowadays? haha :P). Just wanted to say that you make great tutorials on AI voice. I stumbled on this tutorial while exploring Coqui and it is the best tutorial I found. Thanks for taking the time to do these. Also, a subscriber asked me for a resource on Coqui TTS tutorials on Reddit, and I have shared your channel! Keep up the great work.

@ThorstenMueller 1 year ago

Hi 👋. Thanks for your kind feedback on my content 😊. You're right, we are not alone on AI content 😆.

@ThatGuyNamedBender 10 months ago

Pretty much 95% of youtube and the working class are against AI lmfao but keep daydreaming

@MarcoManzo 1 year ago

Great! I was looking forward to this, only got it running on linux. Thank you for the tech support ;-)

@MarcoManzo 1 year ago

😂 maybe cuda is exactly my problem on windows🤷‍♂

@ThorstenMueller 1 year ago

Thanks and you're welcome 😊. I'm happy if people find my videos helpful.

@__________________________6910 1 year ago

Sir, your explanation is very easy to understand.

@ThorstenMueller 1 year ago

Thank you, happy to hear that 😊.

@davidtindell950 3 months ago

Thank You Yet Again! P.S. In addition to "Schei? Encoding" ... I am a fan of: "CAUTION I TEST IN PRODUCTION".

@ThorstenMueller 2 months ago

Nice one 😆

@chrispeters8295 9 months ago

Thank you for the super informative video! You're awesome!

@ThorstenMueller 9 months ago

Wow, thanks a lot for your nice feedback 😊.

@amp3253 1 year ago

Could you help, please? tts : The term 'tts' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again. At line:1 char:1 + tts --list_models + ~~~ + CategoryInfo : ObjectNotFound: (tts:String) [], CommandNotFoundException + FullyQualifiedErrorId : CommandNotFoundException

@ThorstenMueller 11 months ago

Did you use a Python venv? Is it activated when you try to run the "tts" command? Does "pip list" show you an installed TTS package?
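The checks suggested in this reply can also be run from Python itself. The following is only a minimal sketch: the helper names are made up for illustration, and the distribution names `TTS` and `coqui-tts` (the fork mentioned elsewhere in this thread) are assumptions about how the package shows up in "pip list".

```python
import sys
from importlib import metadata


def in_virtualenv():
    """Return True if the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the venv directory,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("venv active:", in_virtualenv())
    for name in ("TTS", "coqui-tts"):  # assumed distribution names
        print(name, "->", installed_version(name) or "not installed")
```

If the script reports no active venv, or no installed package, the "tts" command would not be expected to be found from that same terminal.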

@CatonSilver 11 months ago

amazing video! I am wondering if it's possible to train a given voice and then just use that voice for future use. In the "clone your voice locally" section, the code requires the reference audio as an input. I'm thinking in terms of efficiency and that if you plan to use the same voice over and over, you shouldn't need to train the model each time.

@ThorstenMueller 11 months ago

Good question. I didn't think about that - up to now.

@AmrAli-ig2mk 8 months ago

Thanks a lot for your efforts. you are doing great work, keep it up.

@ThorstenMueller 8 months ago

Thank you a lot for your kind feedback - this keeps me motivated 😊

@terryjones2213 9 months ago

What is your python version?

@anarmustafayev9145 1 year ago

This is exactly what we were looking for. Thank you very much 👍

@ThorstenMueller 1 year ago

I'm very glad to hear that 😊.

@dempa3 2 months ago

This seems very useful, but when I run "pip install tts", I get "Error compiling Cython file", and the operation breaks.

@ThorstenMueller 2 months ago

Strange, which python version are you using?

@КравчукІгор-т2э 2 months ago

Thank you very much. Very good video. German order in everything!

@TomiTom1234 1 year ago

Can you please tell me which program you used to run the code at 15:28?

@ThorstenMueller 1 year ago

Sure, it's a code editor from Microsoft, called "Visual Studio Code".

@madhushantan1887 2 months ago

Hi, now that Coqui is shutting down, can't we use the model via the API anymore? I'm having trouble using the model like that. For the import code “from TTS.api import TTS” I get: module not found

@ThorstenMueller 2 months ago

Might be a problem with your local installation, too. Does "pip list" show a TTS package?

@Name-is2bp 7 months ago

did you make a tutorial on how to install and use cuda?

@ThorstenMueller 7 months ago

No, not yet. But interesting idea. I've added it to my TODO list 😊.

@LeSchurke 7 months ago

Nice video ;) and hi, how's it going? Is it better when the reference voice is longer than 6 seconds? Or does it not matter, or is it even worse? 00:43

@ThorstenMueller 7 months ago

All good, glad you like the video :) According to my talk with the co-founder of Coqui AI, Josh Meyer, the model is optimized for a 6-second audio input. Before trying longer audio input, try using other 6-second clips.

@PlayGameToday 7 months ago

Hello, sir Thorsten! The title of the video doesn't really capture the point. Unfortunately, I didn't find in your video how to start the GUI for Coqui TTS. In the title to the video you have stated - XTTS - and just I was hoping that I could run the gradio-gui that was at the beginning of your video. Too bad you don't have a video tutorial on how to deploy on your local machine the handy GUI for voice generation that was in the demo.

@ThorstenMueller 7 months ago

Do you mean the Huggingface UI from the video?

@PlayGameToday 7 months ago

@@ThorstenMueller Yes

@tobiasd2755 4 months ago

Very well explained. However, I had hoped the video would show not just how to create a single piece of speech, but how to save my own model so that it then shows up under tts --list_models, for example, or so that I can at least specify it with --model_name. Is that possible too?

@ThorstenMueller 3 months ago

Thank you very much 😊. The "--list_models" option displays information from the .models.json file in the repo. You could try adding your model to that file locally. So you have already trained your own model?

@hashtag_ 1 month ago

For anyone coming recently, the tts repo isn't maintained anymore according to an issue post on the github. It results in an error when running 'pip install tts'. This fork worked for me instead: 'pip install coqui-tts'

@ThorstenMueller 1 month ago

Thanks for that fork hint 👍🏻. Maybe an issue with a (too new) python version.

@Aiolia_Games 8 months ago

Can I use this voice to narrate a video on YouTube?

@IvarDaigon 9 months ago

I've been using Coqui for months and it's amazing that it simulates breathing at all, but breathing is typically the most distorted part of the generated audio, which can make it sound unnatural. I'm wondering whether removing the breathing from the source audio would improve the quality of the cloned voice, or whether the distorted breathing is just a symptom of the underlying model.

@ThorstenMueller 9 months ago

I've no idea how this could work. Maybe it helps if you use audio tools to cut out your breathing from the recording you provide to XTTS. Or maybe there are audiofilters like sox or ffmpeg that can remove breathing sounds from the generated audio.

@MrScesher 11 months ago

Hi Thorsten, I can't get it to run. I always receive "No module named 'TTS.api'; 'TTS' is not a package" Even though the tts package is installed. Pip lists it in the installed packages. The few threads I found are no help. Maybe you have an idea?

@ThorstenMueller 11 months ago

This is strange. If "pip list" shows the tts package then it seems that everything is installed correctly. Are you really running your Python script in the right Python venv? Can you run "tts --help" on the command line successfully?
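One possible cause of a "'TTS' is not a package" message, which is an assumption here and not something confirmed in this thread, is that Python finds a single file before the installed package, for example a script of your own named TTS.py (or tts.py on a case-insensitive file system such as Windows). The sketch below uses only the standard library to show what a module name resolves to; `describe_module` is a made-up helper name.

```python
import importlib.util


def describe_module(name):
    """Report whether `name` resolves to a package, a single file, or nothing."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not importable from this interpreter"
    if spec.submodule_search_locations is None:
        # A plain module has no search locations, so `name.api` cannot exist.
        return f"{name}: single file at {spec.origin}, not a package"
    return f"{name}: package at {spec.origin}"


if __name__ == "__main__":
    print(describe_module("TTS"))
```

If the reported path points into your own project folder rather than the venv's site-packages, renaming that file would be the first thing to try.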

@MrScesher 11 months ago

@@ThorstenMueller The tts command in the console works. tts --list_models too. And yes i am running the created venv.

@MrScesher 11 months ago

@@ThorstenMueller I managed to get it running briefly when I use the setup of the git repo. But it is only working in that terminal and after closing it everything is gone with it. Thats not a solution, because the setup is taking too long.

@saadjutt1660 4 months ago

Is there any way we can push this trained model to huggingface? Like once we give the audio sample and next time when pushed to huggingface hub we only need to pass the text to generate the audio with respective voice?

@ThorstenMueller 4 months ago

Do you mean the actual model or a space to use the model out of the box?

@timo1949 10 months ago

Very, very good channel! 👍 I was wondering: what is the reason for the rather low sampling rate of 22,050 Hz in the ThorstenVoice dataset? Simply faster processing of the data?

@ThorstenMueller 10 months ago

Thank you very much for your great feedback 😃. In the tests there was hardly any audible difference in the audio output, but the computational cost at, for example, 44 kHz was noticeably higher.

@timo1949 10 months ago

@@ThorstenMueller Thanks for the info. ElevenLabs also only asks for 128 kbps MP3 for Professional Voice Cloning and says that no disadvantage is noticeable. Very interesting how the AI processes that.

@marcinziajkowski3870 7 months ago

Can we create ready to use object instead of "speaker_wav" list passed every time we generate "output.wav" ? to speed up process ?

@ThorstenMueller 7 months ago

As i'm not sure, i'd recommend asking on Coqui community on github. But as Coqui AI (the company) has shut down, i'm not sure on how fast you might get a reaction.
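A general way to avoid repeating per-voice work is to compute it once and keep the result in memory. The sketch below is a plain memoization helper, not Coqui code: `VoiceCache` and `compute` are hypothetical names, and `compute` stands in for whatever expensive step the library exposes. The XTTS documentation describes a `get_conditioning_latents` method on the model that could plausibly play that role, but that is an untested assumption here.

```python
from pathlib import Path


class VoiceCache:
    """Run an expensive per-voice computation once per reference file."""

    def __init__(self, compute):
        self._compute = compute  # hypothetical callable: path -> voice data
        self._cache = {}

    def get(self, wav_path):
        # Resolve the path so "x.wav" and "./x.wav" share one cache entry.
        key = str(Path(wav_path).resolve())
        if key not in self._cache:
            self._cache[key] = self._compute(key)
        return self._cache[key]
```

The cache only lives as long as the Python process, so this helps a long-running script or server, not repeated command-line calls.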

@PlayGameToday 7 months ago

Which parameters do I need to set to get higher-quality audio output? It looks like only a 96 kbps bitrate..

@ThorstenMueller 7 months ago

Normally generated output is the same samplerate as the voice dataset the model has been trained on. Maybe you can use tools like ffmpeg to adjust samplerate afterwards, but i doubt if this will increase the quality.
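To see what a generated file actually contains before resampling it, the WAV header can be read with Python's standard `wave` module. This is a small sketch with a made-up helper name, `wav_info`; it assumes the output is an uncompressed PCM WAV file.

```python
import wave


def wav_info(path):
    """Return sample rate (Hz), channels, bit depth and PCM bitrate (kbit/s)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        bits = wav.getsampwidth() * 8
    return {
        "sample_rate": rate,
        "channels": channels,
        "bits": bits,
        # Uncompressed PCM bitrate = sample rate * bits per sample * channels.
        "kbps": rate * bits * channels / 1000,
    }
```

Resampling such a file to a higher rate afterwards changes these numbers but cannot add detail that the model never generated.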

@PlayGameToday 7 months ago

@@ThorstenMueller I need to train my own model in 48KHz, so the output will be more quality

@ari4340 10 months ago

Hello! I've been using this on hugging face for a few months, but today when I went to the page this error appears: Runtime error Scheduling failure: not enough hardware capacity Container logs: Fetching error logs... Any idea of what's happening? Thank you!

@ThorstenMueller 10 months ago

According to the error message the XTTS container does not have enough compute power on Huggingface platform. This might be a temporary problem or might relate to the shutdown of Coqui AI as a company.

@ari4340 10 months ago

@@ThorstenMueller Thanks for your reply! I hope it's not the latter. It's the only free online option that I knew of 😓

@MYODM. 4 months ago

Can I hire you for a few hours? I need help with a project that’s deeply personal and I would like to go the local hosting route.

@ThorstenMueller 3 months ago

Feel free to contact me here (with some additional info). www.thorsten-voice.de/en/contact/

@ricardorey259 11 months ago

Hello, good video, do you know how to remove the character limit restriction when writing? Warning: The text length exceeds the character limit of 239 for language 'es', this might cause truncated audio.

@ThorstenMueller 11 months ago

Thanks for your nice feedback 😊. Hmm, not really. Earlier we sometimes ran into a "max_decoder_steps" limit that caused truncated audio, but I'm not sure if this applies here too.

@congtaihu1287 8 months ago

thank you for this video! i am running into problems. when i execute the script, it shows "AssertionError: CUDA is not availabe on this machine.". But i have cuda12.3 and compatible torch and my other ai software ran well. i have no idea what is happening. please help!

@ThorstenMueller 8 months ago

Does it work if you use it with "use_cuda false" in general?

@davidtindell950 3 months ago

Using my local PC GPU: Cloned Voice WORKED WELL ... and ... sounded 'somewhat ' like me BUT actually BETTER than me ( bolder and stronger ) !!!!

@elplayeravefenix2280 2 months ago

this work for you actually??????

@davidtindell950 2 months ago

@@elplayeravefenix2280 Yes. Not very well but it ‘worked’. On other projects I have found that more voice samples worked better but takes time. Ok.

@rogerperez9856 11 months ago

Hello, do you know why when converting a text of about 500 words it takes about 25 minutes?

@ThorstenMueller 11 months ago

I didn't try it with such long texts. Is it faster when you split it into smaller pieces and put the chunks together in post generation?
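Splitting a long text into smaller pieces, as suggested here, can be sketched in a few lines of Python. The function name `split_text` and the default of 200 characters are assumptions chosen for illustration (the thread mentions a warning at 239 characters for Spanish); each chunk would then be synthesized separately and the audio joined afterwards.

```python
import re


def split_text(text, max_chars=200):
    """Split text into chunks of at most max_chars, breaking at sentence ends."""
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single over-long sentence is kept whole rather than cut mid-word.
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Breaking at sentence boundaries is a design choice: cutting in the middle of a sentence would likely produce audible seams when the pieces are merged.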

@spiritual_audiobooks 7 months ago

What do you say to Applio TTS? Maybe the best Open Source TTS?

@ThorstenMueller 7 months ago

I haven't heard of Applio TTS. Would you say it's worth giving it a try?

@Zimba-box 10 months ago

I got this error when I ran the install step with "wheel -U": ERROR: Could not build wheels for tts, which is required to install pyproject.toml-based projects. How do I fix that?

@ThorstenMueller 10 months ago

Did you update pip to latest version first - "pip install pip setuptools wheel -U"?

@64jcl 1 year ago

Btw, how do I get the gpu parameter to work. I have a 3000 series GPU but even if I select gpu=True it says CUDA is not available. Also I have noticed that the cloned voice from my own speech shifts to sometimes output british accent and sometimes american (likely because my accent is neither). But it also means it is impossible to get consistent results with this. Is there some way to save a snapshot of whatever it came to was "the voice" and reuse that as input on subsequent generations. If not it is quite useless and just a fun demo really.

@ThorstenMueller 1 year ago

Did you install CUDA and is it working? There are Python code snippets available to check if CUDA is working.
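A check of the kind mentioned in this reply might look like the following minimal sketch. It assumes PyTorch is the library in question (Coqui TTS is built on it); `cuda_status` is a made-up helper name, and the import is done lazily so the script also runs where PyTorch is missing.

```python
def cuda_status():
    """Describe whether PyTorch can see a CUDA device."""
    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        # torch.version.cuda is None for CPU-only builds of PyTorch.
        return (f"PyTorch {torch.__version__} is installed, but CUDA is not "
                f"available (built for CUDA: {torch.version.cuda})")
    return f"CUDA is available: {torch.cuda.get_device_name(0)}"


if __name__ == "__main__":
    print(cuda_status())
```

If the message says the build was made for CUDA "None", the installed PyTorch is likely a CPU-only build, which a working GPU driver alone would not fix.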

@МихаилЮрков-т1э 9 months ago

Thanks for the informative video and interesting presentation. Please make a guide on how to train a model on a custom dataset.

@ThorstenMueller 9 months ago

Thanks for your nice feedback 😊. This topic is already on my (growing) TODO list.

@alexlavertyau 11 months ago

I have tried some voice cloning tools and provided my voice as reference audio, but none of the results sound anything like me... : ( I have an Australian accent but the generated voices come out with American accents; not sure what I'm doing wrong.

@ThorstenMueller 11 months ago

I guess you're doing nothing wrong. Maybe the english model has been trained on a voice dataset with hours of native english speaking people and one phrase has not enough "power" to change the accent. Normally i'd recommend asking in Coqui TTS community, but as Coqui is shutting down, it might take some time to get an answer, because of other priorities maybe.

@RossDCurrie 6 months ago

"ERROR: Failed building wheel for tts" - What version of python are you running?

@ThorstenMueller 6 months ago

This error often occurs when you use an older version of pip. Did you run "pip install pip setuptools wheel -U" before installing Coqui?

@RossDCurrie 6 months ago

@@ThorstenMueller This may have been the issue. Played around with it a bit and got it working again, but can't recall exactly which thing I did differently. Thanks for the reply though! If you're looking for content ideas, one thing I am struggling with is how this all fits together now, in June 2024. Specifically - when I start the server and hit the local webserver, I get a very different UI than what I see in other videos on XTTS. And I know there are all different UIs for XTTS - there's a fine tuning one, a web UI, RVC, etc. and some of them have bits that don't work, and it sounds like Coqui has abandoned the project now and... it's hard to catch up on it all when coming into it for the first time, and it changes so rapidly. So I guess what I'm trying to figure out is - if I want to build an AI voice clone of me, today, what's the strategy/stack you recommend?

@gonzaloorellanatech 3 months ago

How can we get a faster response? Better hardware? RAM? Processing? Thanks for the video!

@ThorstenMueller 3 months ago

First, you're welcome :). Do you use cpu or gpu? Because gpu (CUDA) provides faster response.

@gonzaloorellanatech 3 months ago

@@ThorstenMueller Thanks for your response. Yeah, GPU, but my notebook is only for development... I need better processing for the audio files from voice-cloning TTS.

@EfficioIgnisVitae 11 months ago

I'm getting this issue where when I try to check for models this happens: LLVM ERROR: Symbol not found: __svml_cosf8_ha Anyone know what's going on here?

@ThorstenMueller 11 months ago

That's strange. Maybe recreate your python venv and reinstall. Maybe there's an error in your installation.

@Gute_Nacht_Kurzgeschichten 8 months ago

Great explanation 👍 How can I clone my voice so that it reads whole texts to me? For example a PDF file or a Word document, or is it limited to just 6 seconds?

@ThorstenMueller 8 months ago

Thank you very much for the praise, that makes me very happy 😊. There is (I believe) no ready-made solution for text/Word/PDF input, but in general you can generate longer output. You may have to split the input text, but considerably more than 6 seconds is certainly possible.

@bobbyboe 1 year ago

Hi Thorsten, it looks so easy when you do it. I installed and started Coqui via Pinokio, expecting to then somehow get to this GUI locally. Pinokio then also says "running", but I can't find anything under the usual localhost addresses in the browser. Then there is also a "server" button, which I pressed, and I got the response: .........Connected! It all gives the impression that everything is running as it should... only for me the experience ends there, because I don't know where Coqui might show itself to me... a pity, really. Pinokio is normally a good entry point for non-coders.

@ThorstenMueller 1 year ago

Do you mean the GUI from Huggingface?

@bobbyboe 1 year ago

@@ThorstenMueller Yes, I meant any GUI in general

@jab4li 3 months ago

If i install xtts on my computer, i can use unlimited characters? Because the demo version on huggingface has 200 characters limitation. Thanks.

@ThorstenMueller 3 months ago

This should be the case. The limitation is part of their Huggingface space and should not apply locally. huggingface.co/spaces/coqui/xtts/blob/d3b67acd01a3f63524371ad7d35a044ac0e75f60/app.py#L200

@jab4li 3 months ago

@@ThorstenMueller Nice, i'm gonna try it. Thanks!

@john_blues 1 year ago

Is this able to pull text from a text file? I have a Tortoise version that can do it, and it is helpful for long form text.

@ThorstenMueller 1 year ago

IMHO this isn't supported by now. But finding a suitable solution for that is on my TODO list.

@john_blues 11 months ago

@@ThorstenMueller For some reason my reply keeps getting deleted. Anyhow, I run a local TTS that can pull from a text file. Maybe it will help you. It is by neonbjb on Github.

@ThatPain1 10 months ago

@john_blues You can totally read in one or multiple files via Python, transform the text as you like, and use XTTS to generate a synthetic speech audio file from it. I'm currently using it to create a sort of audiobook from a fanfiction. Removing the periods at the end of sentences improved the result quite a lot.
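The workflow described in this comment could be sketched roughly as follows. The helper names are made up for illustration, and the period-stripping rule is only one guess at what the commenter did. The synthesis step uses the `TTS.api` interface with the `xtts_v2` model name as an assumption about the installed Coqui TTS version; it is imported lazily because it needs the package and a model download.

```python
import re
from pathlib import Path


def load_text(paths):
    """Read one or more UTF-8 text files and join them with blank lines."""
    return "\n\n".join(Path(p).read_text(encoding="utf-8").strip() for p in paths)


def strip_final_periods(text):
    """Drop a single period before whitespace or at the end; keep '...'."""
    return re.sub(r"(?<!\.)\.(?=\s|$)", "", text)


def synthesize(text, speaker_wav, out_path, language="en"):
    """Generate speech with XTTS (requires the Coqui TTS package)."""
    from TTS.api import TTS  # lazy import: only needed for actual synthesis

    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    tts.tts_to_file(text=text, speaker_wav=speaker_wav,
                    language=language, file_path=out_path)
```

Whether removing periods helps is reported by the commenter only; it may depend on the language and on how the text is split before synthesis.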

@nomadhgnis9425 11 months ago

have a question for you. IF I wanted to pause for a number of seconds between sentences then how can I do that. Piper is really cool. Thanks.

@ThorstenMueller 11 months ago

Normally this is an aspect of SSML (Speech Synthesis Markup Language), which is by now not supported by Coqui and Piper. Maybe you can try a workaround and add multiple dots (....) to create a pause. But i didn't try it out myself.

@nomadhgnis9425 11 months ago

@@ThorstenMueller thanks. will try that.

@nomadhgnis9425 11 months ago

@@ThorstenMueller Just tried it. I put dots where I wanted to pause but it does not work. It only responds to one dot.

@ThorstenMueller 11 months ago

@@nomadhgnis9425 Okay, then maybe it's a workaround to create multiple tts wave files and merge them together including pauses. That's not an optimal way but it could do the job.

@nomadhgnis9425 11 months ago

@@ThorstenMueller I found a way. I am using Debian. I had to create a 3-second silent WAV file, split the paragraphs into different WAV files, and then merge them together with the silent WAV where I need it. I did this with a bash script. So problem solved. Do you know where I can get more voice files other than the ones listed?
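The commenter's bash script is not shown, so the following is a separate sketch of the same idea in Python, using only the standard `wave` module. The function name `join_with_silence` is made up, and the code assumes all input files are uncompressed PCM WAVs with identical format, as files generated by the same model would be expected to be.

```python
import wave


def join_with_silence(wav_paths, out_path, pause_seconds=3.0):
    """Concatenate WAV files, inserting silence between consecutive files."""
    params = None
    frames = []
    for path in wav_paths:
        with wave.open(path, "rb") as wav:
            fmt = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
            if params is None:
                params = fmt
            elif fmt != params:
                raise ValueError(f"{path} has a different audio format")
            frames.append(wav.readframes(wav.getnframes()))
    if params is None:
        raise ValueError("no input files given")
    channels, width, rate = params
    # Zero bytes are silence for signed PCM (16-bit and wider);
    # 8-bit WAV is unsigned, where silence is 0x80.
    silence_byte = b"\x80" if width == 1 else b"\x00"
    silence = silence_byte * (int(rate * pause_seconds) * channels * width)
    with wave.open(out_path, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(width)
        out.setframerate(rate)
        # bytes.join places the silence only between files, not at the ends.
        out.writeframes(silence.join(frames))
```

Generating the silence in memory avoids having to create a separate silent WAV file first.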

@GESTOR-SITES 7 months ago

How do I fix "ERROR: Could not build wheels for tts, which is required to install pyproject.toml-based projects"? ChatGPT cannot help me. Is it necessary to downgrade Python?

@ThorstenMueller 7 months ago

Did you update the python dependencies in your environment? So running "pip install setuptools wheel pip -U"

@64jcl 1 year ago

Quite amazing that they can do this with such a short clip. I had the same results as you with english, it doesn't really sound like me even though I tried to speak my best english. :) - How would you compare it with Piper with regards to TTS performance? Ofc Piper is quite difficult to train for new voices, but its free to use commercially even. I wish there was some simpler way to clone voices with it and that would be golden. I have looked at your video for this but preparing the training set seems like a chore.

@ThorstenMueller 1 year ago

Thanks for your comment 😊. I didn't compare the performance between XTTS and Piper TTS. I guess when you want a free and best voice clone i'd go with Piper TTS right now, but the effort is higher - as you said.

@akemixx._0 10 months ago

Is it possible to use AI even with texts in another language? I would really like to know because I want to dub a game with this tool.

@ThorstenMueller 10 months ago

I'm not sure about that. I'd recommend asking on Coqui community, but as Coqui AI (the company) has shut down i'm not sure on how fast you might get an answer.

@tsunderes_were_a_mistake 11 months ago

I tried it on huggingface with Japanese but it sounded robotic. Can you make a tutorial on how to finetune xtts on local?

@ThorstenMueller 11 months ago

Thanks for your topic suggestion. I've added it on my TODO list but it might take some time.

@IngridUterus 10 months ago

Hey, I installed this via Pinokio, since I couldn't get it running any other way. However, I don't know how to switch coqui-tts to GPU. Which file do I have to open? I would also like to prevent the ghost voices. Do you know where I have to change a setting for that? I know it is possible, because I use a Telegram bot that works with Coqui and runs flawlessly, although with a strict character limit. Oh yes, the character limit :D Where can I change that too? Thanks in advance

@ThorstenMueller 10 months ago

The Coqui TTS models have a command-line parameter "--use_cuda". That should make it use the GPU. As for the length, you can try opening the model's configuration file and increasing the value of "max_decoder_steps" (but I haven't tried this with XTTS myself). Good luck 😊.

@IngridUterus 10 months ago

@@ThorstenMueller Thanks. I'll try that tonight. Where exactly do I find the configuration file? Is it the configs.py in the TTS folder? Is there also a way to avoid the errors at the end of sentences and in the places between sentences? Often a kind of ghost voice appears there too, which sounds really strange xD

@ThorstenMueller 10 months ago

@@IngridUterus Did you find the config file?

@IngridUterus 10 months ago

@@ThorstenMueller Yes, I found a better variant for coqui-tts that is considerably easier for beginners. I can only recommend it to you: Alltalk_tts

@adityapatil6723 1 month ago

This error originates from a subprocess, and is likely not a problem with pip. error: subprocess-exited-with-error im getting this error please help someone

@ThorstenMueller 28 days ago

Which python version are you using? Did you update "pip" first?

@adityapatil6723 28 days ago

@@ThorstenMueller yes sir i did update it. and python is 3.11.9

@ThorstenMueller 27 days ago

@@adityapatil6723 I originally thought Python 3.11 was not supported, but according to their GitHub README 3.11 should work. But as Coqui TTS isn't under active development, maybe you should try Python 3.10, if this is possible for you.
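Since several install errors in this thread come down to the Python version, a quick check before running pip can save time. The sketch below assumes the range stated in the Coqui TTS README as far as I recall it (Python 3.9 up to, but not including, 3.12); treat both constants as assumptions and compare them with the README of the package or fork you actually install.

```python
import sys

# Assumed supported range; verify against the README of your TTS package.
MIN_VERSION = (3, 9)
MAX_EXCLUSIVE = (3, 12)


def is_supported(version=None):
    """Return True if (major, minor, ...) lies inside the assumed range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_VERSION <= major_minor < MAX_EXCLUSIVE


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    verdict = "inside" if is_supported() else "outside"
    print(f"Python {current} is {verdict} the assumed range")
```

Under these assumed bounds, the 3.11.9 reported above would pass and 3.12 would not.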

@ignacioalonsol 7 months ago

Has anyone made a comparison between xtts and piper training? I'm curious on what's better quality @thorsten?

@ThorstenMueller 7 months ago

Personally I prefer Piper. But I trained my models in Piper with way more input data than the 6 seconds of input to XTTS.

@chrsl3 1 year ago

Amazing result.

@saadjutt1660 4 months ago

Can I still use this tutorial, since Coqui has shut down? Also, can I use it for cloning a voice in Urdu?

@ThorstenMueller 4 months ago

Honestly i'm not sure on the future of XTTS (model, code and huggingface space) cause of their shutdown. But right now code and space is still available so it should still work as described but please let me know if you experience bigger problems.

@Chriscs7 9 months ago

What is better this or Tortoise TTS (Ecker Voice Clone) ?

@ThorstenMueller 9 months ago

Hard to say, as i didn't give Tortoise TTS a closer look, but it's still on my todo list.

@humanperson8418 2 months ago

It has a clear English bias, but overall sounds pretty good.

@jez9999 9 days ago

Coqui appears to have folded now. Confusingly there is a community run fork that is sorted but its docs look very similar to the original.

@ThorstenMueller 2 days ago

Coqui already shut down at the beginning of 2024 and IMHO the code in the original repo is not maintained any more. I heard about a fork too but didn't have time to give it a try.

@mrechbreger 17 hours ago

@@ThorstenMueller the license is garbage and prevents any further interest... why should anyone keep developing it if he cannot use it for further commercial projects...

@ratside9485 1 year ago

Can you also show how to fine-tune it? But locally? Thanks

@ThorstenMueller 1 year ago

Thanks for your topic suggestion 😊. I've put it on my TODO list.

@ratside9485 1 year ago

@@ThorstenMueller There is now also a WebUI for fine-tuning on GitHub 🙌 It works quite well. The only thing that is still a problem is that the settings change, temperature and so on. I spent hours trying things out there; sentences always get skipped.

@asanostudio 10 months ago

Have you made a video tutorial to create a voice model for Indonesian, or how to add a voice model, I want to make an Indonesian voice model

@ThorstenMueller 9 months ago

No. But as Coqui (company) shut down i'm not sure on further development of their code. Maybe it's worth taking a look to Piper TTS for training an Indonesian tts model. kzbin.info/www/bejne/mJDalpKgosZlaJI

@tapikoBlends 1 month ago

Love this channel 😊😊😊

@ThorstenMueller 1 month ago

Thanks a lot 😊

@yousfalaadi5322 2 months ago

can i use a big text dataset?

@ThorstenMueller 2 months ago

In which context? Fine-tuning?

@callmefred 7 months ago

It's sad that they've discontinued the project.

@ThorstenMueller 7 months ago

Yes, but they did not just discontinue the project, but Coqui AI (the company) behind XTTS shut down.

@DrFukuro 1 year ago

I like your videos a lot, even though many of them are unfortunately only in English. Could you imagine making a more general overview video on speech synthesis sometime? Even after days of research, a layperson only gets an incomplete picture; it would be great if a pro like you explained the following topics in a bit more depth for those interested: What exactly are/do Coqui, XTTS, Tortoise, Espeak / espeak-ng, and what is the difference to Mbrola and its voices? (Can I use tts instead of Mbrola in scripts? Yes/no, how/why?) Example questions about XTTS: What is a multilingual voice as opposed to the Thorsten voice? What exactly is voice cloning as opposed to voice transfer? What do Coqui speakers do, and what are they? What is the difference between fine-tuning the XTTS model and simply specifying a speaker_wav reference?

@ThorstenMueller 1 year ago

Thank you very much for your great feedback and the suggestion 😊. I like the topic a lot. When you deal with a subject for so long and so intensively, these "basics" somehow become so normal that you no longer think about them at all. I've put the topic on my TODO list. Many thanks for that 😊.

@TNMPlayer 11 months ago

For some reason my terminal doesn't run in the venv.

@ThorstenMueller 11 months ago

Could you successfully create a venv and just can't activate it or can't you create it?

@TNMPlayer 11 months ago

@@ThorstenMueller the venv created just fine but I couldn’t open a terminal within it

@ThorstenMueller 11 months ago

@@TNMPlayer That's strange. Do you use the .bat or powershell (.ps1) file to activate the venv?

@TNMPlayer 11 months ago

@@ThorstenMueller I used the .ps1

@ThorstenMueller 11 months ago

@@TNMPlayer Maybe try out the .bat version, this could have an effect.

@animations.ki.anokhi.duniya 8 months ago

Is Coqui TTS shutting down?

@ThorstenMueller 8 months ago

Sadly, yes. I've made a short about it. kzbin.infoQMruRTlQu7I?si=JyDY8ziFJC8omAPY

@characters1210 8 months ago

Can I write code that clones an Arabic voice and reads Arabic text?

@ThorstenMueller 8 months ago

I have no experience using Arabic with XTTS. Have you already tried it in their Hugging Face Space?
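
As a starting point for the question above, here is a minimal sketch using the Coqui `TTS` Python package. It assumes the package is installed, that the XTTS v2 model accepts the language code `"ar"`, and that `ref.wav` is your own short reference recording. The helper names `build_xtts_request` and `clone_voice` are hypothetical, and the output quality for Arabic is untested here.

```python
def build_xtts_request(text, speaker_wav, language, file_path):
    """Collect the keyword arguments for TTS.tts_to_file in one place."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "text": text,
        "speaker_wav": speaker_wav,  # short reference recording of the voice
        "language": language,        # e.g. "ar" (assumed to be supported)
        "file_path": file_path,      # where the generated audio is written
    }


def clone_voice(text, speaker_wav, language="ar", file_path="output.wav"):
    """Synthesize `text` in the voice of `speaker_wav` using XTTS."""
    # Imported lazily so the helper above works without Coqui TTS installed.
    from TTS.api import TTS

    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    tts.tts_to_file(**build_xtts_request(text, speaker_wav, language, file_path))
    return file_path
```

Calling `clone_voice("مرحبا", "ref.wav")` would download the model on first use, so expect a long first run.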

@starbuck1002 1 year ago

I also experimented a bit with Coqui XTTS, and I came to the conclusion that it isn't worth it. 1. Coqui XTTS can't come close to the leading competitors in terms of clone quality. 2. In my opinion, Coqui XTTS isn't worth it for this quality at this price, again considering the quality and pricing of the competitors! Still, thanks again for your video, Thorsten!

@ratside9485 1 year ago

What price? It's $1 a day for companies; otherwise it's free.

@NoxmilesDe 1 year ago

Is there a TTS for Android?

@ThorstenMueller 1 year ago

As far as I know, there's currently no support for Coqui and Piper TTS on Android. But this would be really cool 😎. Have you already asked in their communities?

@quamagi 2 months ago

I think it cloned his voice very well, given how little data the AI had.

@ThorstenMueller 2 months ago

I'd be happy if it works well in Spanish with little input data.

@jimmyjam77 17 days ago

The voice quality is OK, but not great. Did you ever figure out a way to make it better?

@ThorstenMueller 16 days ago

Not in XTTS, but (just in case you're looking for an English solution) do you know my F5 tutorial? kzbin.info/www/bejne/d4SpoIeEpdCAbtEsi=gyYl6R8W1xuKoZZM

@Schawum 4 months ago

Hello, please do the tutorial again in German, because I'd be really interested in it. But I don't understand a word of English.

@ThorstenMueller 4 months ago

Hello, would the subtitles that are automatically translated into German perhaps help you for now?

@Schawum 4 months ago

@@ThorstenMueller I always have them turned off, because I can't follow the video while reading. So that doesn't really help me.

@developerzava 7 months ago

Is TTS available on Python 3.12?

@ThorstenMueller 7 months ago

According to their README, Python 3.11 is the highest supported version. As Coqui AI has shut down, I'm not sure if or when this will be extended to newer Python versions.
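
To check your own interpreter against that limit before installing, a small sketch like the following may help. It only encodes the upper bound of 3.11 mentioned in the reply; the lower bound is deliberately not checked, and the names `MAX_SUPPORTED` and `is_supported` are hypothetical.

```python
import sys

# Upper bound taken from the reply above; adjust if the README changes.
MAX_SUPPORTED = (3, 11)


def is_supported(version=None):
    """Return True if (major, minor) does not exceed MAX_SUPPORTED.

    `version` can be any sequence such as (3, 12) or sys.version_info;
    it defaults to the running interpreter.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return major_minor <= MAX_SUPPORTED


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:2]))
    verdict = "within" if is_supported() else "above"
    print(f"Python {current} is {verdict} the stated limit")
```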

@JamBassMusic 10 months ago

Thank you!!

@stefanporath8392 10 months ago

Hello Thorsten, great video tutorials, but XTTS is not for me. No support for Windows, and there never will be. No chance on older Macs with NVIDIA cards because of missing drivers. No support on Linux without CUDA. I was really looking forward to this, but I simply don't have the time to fiddle around for days or weeks. Thank you.

@PhantasyAI0 11 months ago

I love your videos bro but you gotta speak a bit faster XD I have to play the video at 1.5x speed haha still love the videos!

@ThorstenMueller 11 months ago

Hehe, thanks for your suggestion. I'll keep it in mind for future videos. As a non-native English speaker, I have to think for a moment to find the right words 😆.

@insanitytoons 7 months ago

Cloning a voice from a sample of just 6 seconds, even if the result is not 100% identical: for me, that's an AI that really deserves to be improved. AIs that need dozens of hours to clone a voice never interested me much. I ran several tests using samples longer than 30, 60, and 80 seconds in various languages, and some were perfect. I also copied dozens of voices available on websites, and the results were also very good. I suggest saving each generated audio in a separate file, because no generated audio will ever be the same as the previous one.

@ThorstenMueller 7 months ago

Josh Meyer (co-founder of Coqui AI) mentioned in my XTTS interview that an audio input of 6 seconds should be ideal for the XTTS model. kzbin.info/www/bejne/jqSyfmSNj5WebpY
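
The suggestion in this thread to keep every generated take in its own file can be sketched as below. This is only one possible naming scheme; the helper `next_output_path`, the `take` prefix, and the three-digit counter are hypothetical choices, not part of XTTS.

```python
from pathlib import Path


def next_output_path(directory, stem="take", suffix=".wav"):
    """Return the first unused path like <directory>/take_001.wav."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    index = 1
    while True:
        candidate = out_dir / f"{stem}_{index:03d}{suffix}"
        if not candidate.exists():
            return candidate  # nothing is overwritten
        index += 1
```

Passing the returned path as the output file for each synthesis run keeps all takes, so you can compare them and pick the best one afterwards.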

@רחלישדה-ה4מ 9 months ago

Is a GPU required?

@ThorstenMueller 9 months ago

In general (I'm not sure about XTTS specifically), a CPU might work, but it will be much slower than a CUDA-enabled GPU.
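
A common way to handle the CPU/GPU question in code is to pick the device at runtime. The sketch below assumes PyTorch as the backend and falls back to the CPU when PyTorch or a CUDA GPU is missing; `pick_device` is a hypothetical helper name.

```python
def pick_device():
    """Return "cuda" if PyTorch sees a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so there is no GPU path
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("Using device:", pick_device())
```

With the Coqui `TTS` package, the result would typically be passed on via `.to(device)` on the loaded model; expect noticeably longer synthesis times on the CPU.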

@רחלישדה-ה4מ 9 months ago

If I want to clone my own voice, do I need to train this? How? @@ThorstenMueller

@ThorstenMueller 9 months ago

@@רחלישדה-ה4מ I'd recommend taking a look at Piper TTS for that. kzbin.info/www/bejne/mJDalpKgosZlaJI

@רחלישדה-ה4מ 8 months ago

Thanks! @@ThorstenMueller

@Bonk1971 1 year ago

Not for commercial use. We need a truly open solution.

@juanjesusligero391 1 year ago

Yeah, it's a shame it's not 100% open. Fortunately, we'll always have Tortoise TTS :)

@chryseus1331 1 year ago

Who cares? It's not like they're going to sue you if you do.

@juanjesusligero391 1 year ago

@@chryseus1331 They could, though. If you have a company and want to use software commercially, I wouldn't recommend ignoring its license.

@Silberschweifer 3 months ago

Oh no, the video is out of sync.

@Silberschweifer 3 months ago

Do you clap your hands when recording?

@ThorstenMueller 3 months ago

No, but thanks for the idea of improving the video/audio sync by clapping 👍.

@alexeyshmelev9115 4 months ago

"all you need is 6 second audio" is just nonsense. It is not enough and the result is miles away from anything close to the original.

@ThorstenMueller 4 months ago

I agree, at least based on my personal tests with my foreign (German) pronunciation. The result was far from a high-quality voice clone. Have you seen my interview with Josh (Coqui AI co-founder)? kzbin.info/www/bejne/jqSyfmSNj5WebpY