F5-TTS: how to train a new language (best open-source text-to-speech)

3,731 views

RaspiAudio


1 day ago

Comments: 39
@petrkolacek8958 20 days ago
Thank you. Your video helped me a lot. I tried to train a language from scratch before and was not successful, so I'll try your guide.
@NineSevenPictures 18 days ago
Hello. Thank you for this very instructive video, not to mention that accent from back home. ;-)
@RaspiAudio 18 days ago
Link updated. In the latest version of F5-TTS, select "custom" in the web interface and enter these paths:
MODEL_CKPT: hf://RASPIAUDIO/F5-French-MixedSpeakers-reduced/model_last_reduced.pt
VOCAB_FILE: hf://RASPIAUDIO/F5-French-MixedSpeakers-reduced/vocab.txt
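For anyone who wants the two files on disk instead of pasting the paths into the web interface, here is a minimal sketch. It assumes that an hf:// path reads as hf://<owner>/<repo>/<file>, which is an interpretation and not something stated in the comment above. The helper names split_hf_path and fetch are hypothetical. The download step uses hf_hub_download from the huggingface_hub package and needs network access.

```python
from typing import Tuple

HF_PREFIX = "hf://"


def split_hf_path(path: str) -> Tuple[str, str]:
    """Split 'hf://<owner>/<repo>/<file>' into (repo_id, filename).

    Assumption: the first two segments after the prefix name the repository
    and everything after them is the file path inside that repository.
    """
    if not path.startswith(HF_PREFIX):
        raise ValueError(f"not an hf:// path: {path!r}")
    parts = path[len(HF_PREFIX):].split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected hf://owner/repo/file, got {path!r}")
    owner, repo, filename = parts
    return f"{owner}/{repo}", filename


def fetch(path: str) -> str:
    """Download the file into the local Hugging Face cache and return its local path."""
    # Imported lazily so split_hf_path works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    repo_id, filename = split_hf_path(path)
    return hf_hub_download(repo_id=repo_id, filename=filename)


# The two paths quoted in the comment above.
MODEL_CKPT = "hf://RASPIAUDIO/F5-French-MixedSpeakers-reduced/model_last_reduced.pt"
VOCAB_FILE = "hf://RASPIAUDIO/F5-French-MixedSpeakers-reduced/vocab.txt"

if __name__ == "__main__":
    print(split_hf_path(MODEL_CKPT))
    print(split_hf_path(VOCAB_FILE))
```

Calling fetch(MODEL_CKPT) and fetch(VOCAB_FILE) would then return local file paths, provided the repository is still published under that name.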
@NineSevenPictures 8 days ago
@RaspiAudio Thank you very much.
@AndrasEliassen 19 days ago
Thank you for this video, very informative! I laughed so hard at the mistake: "stupid female voice" 🤣 but I think it's probably safe from the "Internet police" 🚔 I will use your tutorial to see if I can train a new language with this tech 👍
@Burka_Tech6330 7 days ago
I like your video, thank you.
@naveennoelj 7 days ago
Good video, thanks for the contribution. One quick question: this is for when you want to add a new language, but suppose you want to use it for voice cloning. How would that work?
@lullu3467 1 day ago
Hello, I would like to train the model on a very, very large dataset (LibriSpeech, which is more than 100 GB). How could I do that in the cloud? I think streaming is complicated, and I did not understand anything in the original training code...
@RaspiAudio 22 hours ago
If your audio files are already transcribed to text, you just need to put them in the right format; otherwise, run Whisper on them. I was thinking of making a video about doing this in the cloud, but training on 100 GB will be very expensive!
@lullu3467 6 hours ago
@RaspiAudio I would like to fund that. Would you be willing to train a multilingual, multi-speaker model (with a language token; I noticed the model struggles with cross-lingual use...)? Do you have a contact?
@RaspiAudio 5 hours ago
@lullu3467 Yes, you can use info@raspiaudio.com
@TheMame82 23 days ago
Thank you for this work. It seems your result is closer to zero-shot voice cloning than the one Jarod trained in his video tutorial (he used ~10 hours of a single speaker). Just to get it right: were the 80k samples you used all from the same reader (single speaker)? This would mean:
1) few hours, single speaker --> model speaks the new language, but only for the reference speaker from the training data
2) many hours, single speaker --> model generalizes the new language (zero-shot capability)
3) many hours, multi-speaker, multi-language (as for the base model) --> proper voice cloning, code switching within a single text
@RaspiAudio 23 days ago
@TheMame82 It's hard to draw conclusions at this point, as there is not enough data. After training on 80k samples from one speaker for consistent learning, I'm now fine-tuning with 90k samples from multiple speakers, hoping that it will help with zero-shot flexibility. I will publish the results.
@mauricio9581 20 days ago
Great video and great explanation! I hope you do more tutorials like these in the future :) Would you say F5 is the best open-source TTS on the market?
@RaspiAudio 20 days ago
I think so, as it is a bit more flexible than XTTS for adding different tones. By the way, I'm not associated with the F5-TTS team, just a random guy trying to find a good TTS.
@321123580 20 days ago
What computer specifications are required to train a model?
@RaspiAudio 20 days ago
I'm using an RTX 4090, but I would like to make a Google Colab notebook so anyone could train in the cloud on a pay-per-use basis.
@321123580 20 days ago
@RaspiAudio OK, thanks
@Pacifier1222 18 days ago
Hi! I'm running a French training on the Mozilla corpus of 800k files. I've completed 20 epochs out of 40. I'll let you know how it goes. However, F5-TTS has some bugs. I had to create folders like "french" even though french_char had already been created.
@Pacifier1222 18 days ago
I also have a sample of 8k files in Quebec French, to be more regional!
@RaspiAudio 18 days ago
@Pacifier1222 It would be really cool if you could train on top of my checkpoint; that way we could combine our efforts rather than starting from scratch each time.
@Pacifier1222 18 days ago
@RaspiAudio Actually, I had already finished 20 epochs in the end. I decided to run another 20, because I found the intonation on certain words was incorrect. I've already spent a week on it, so I definitely wouldn't want to start over.
@RaspiAudio 18 days ago
What hardware are you using?
@Pacifier1222 18 days ago
@RaspiAudio Nvidia 3090, AMD 5950X, and 64 GB of RAM
@jonathanoostenbrink6783 10 days ago
In my info I get:
transcribe complete
samples : 0
path : C:\F5-TTS\F5-TTS\src\f5_tts\..\..\data\my_speak_char\wavs
error files : 5
@RaspiAudio 10 days ago
Your path is wrong.
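The reply does not say which part of the path is wrong. One way to investigate is to resolve the ".." segments in the reported path and then count the .wav files in the resolved folder. The sketch below uses only the Python standard library. The helper names resolve_windows_path and count_wavs are hypothetical and are not part of F5-TTS.

```python
import ntpath
from pathlib import Path


def resolve_windows_path(path: str) -> str:
    """Collapse '..' segments in a Windows-style path (works on any OS)."""
    return ntpath.normpath(path)


def count_wavs(folder: str) -> int:
    """Return the number of .wav files directly inside folder (0 if it does not exist)."""
    p = Path(folder)
    if not p.is_dir():
        return 0
    return sum(1 for f in p.iterdir() if f.is_file() and f.suffix.lower() == ".wav")


# The path quoted in the comment above.
reported = r"C:\F5-TTS\F5-TTS\src\f5_tts\..\..\data\my_speak_char\wavs"

if __name__ == "__main__":
    resolved = resolve_windows_path(reported)
    print(resolved)              # the folder the tool is actually looking in
    print(count_wavs(resolved))  # 0 means the folder is missing or holds no .wav files
```

If the count is 0, the recordings are probably in a different folder than the one the tool resolved to.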
@cyberbol 24 days ago
How long do I need to record my voice? What do you think the minimum amount of training data is?
@RaspiAudio 24 days ago
@cyberbol The reference recording (the voice to clone) can be very short, around 10 s. But if you need to train a new language, I think you will need at least 20 hours of audio.
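To compare a dataset against the roughly 20 hours suggested above, the total duration of a folder of recordings can be summed with the standard-library wave module. This is a minimal sketch that assumes uncompressed PCM .wav files, since the wave module cannot read other formats. The function names wav_seconds and total_hours are hypothetical.

```python
import wave
from pathlib import Path


def wav_seconds(path) -> float:
    """Duration of one PCM WAV file in seconds (frame count / sample rate)."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def total_hours(folder) -> float:
    """Sum the durations of all *.wav files under folder, searched recursively."""
    # Note: the "*.wav" pattern is case-sensitive on most non-Windows systems.
    seconds = sum(wav_seconds(p) for p in sorted(Path(folder).rglob("*.wav")))
    return seconds / 3600.0


# Example usage (the folder name is a placeholder):
# print(f"{total_hours('data/my_dataset/wavs'):.2f} hours of audio")
```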
@cyberbol 24 days ago
@RaspiAudio Oh, yes, I want to train. Thank you. The problem with cloning is that it does not work for languages other than English and Chinese. I want to use Polish, so I don't have an option; I think I need to make a model.
@SyamsQbattar 8 days ago
Unfortunately, it does not support the Indonesian language.
@RaspiAudio 8 days ago
Find large audiobooks or audio files, at least 10 hours in your language, and train on them.
@normioffi 23 days ago
French by origin?
@RaspiAudio 23 days ago
@normioffi Yes, yes
@normioffi 23 days ago
That's great
@mulagraphics 15 days ago
Don't waste your time. F5-TTS is horrible, I'm sorry.
@RaspiAudio 15 days ago
@mulagraphics It's not. What else do you recommend?
@bomar920 14 days ago
Actually, I trained a new language with under 2 hours of data. It's very good 👍. I don't know which script could do that.
@christopherandrew1720 14 days ago
@bomar920 Which language did you use? Is it one speaker or multiple?