Voice Cloning Made Simple: Learn to Use Tacotron2 for TTS Voice Models

Views: 42,714

1 year ago

In this video, we'll dive deep into the world of Text-to-Speech (TTS) technology and explore how you can use Tacotron2 to create your own custom TTS voice models! Whether you're a beginner or experienced in the field, this comprehensive tutorial will guide you step-by-step through the process of voice cloning and help you build a realistic TTS model of your voice.
Rasmurtech Website: rasmur.tech/
Links:
1. Rename wav files to 1.wav, 2.wav, etc.
github.com/rasmurtech/tacotro...
2. Transcribe .wav files and generate a .txt file
github.com/rasmurtech/Tacotro...
3. Preprocess audio files for training a Tacotron 2 model
github.com/rasmurtech/Tacotro...
4. Update audio metadata
github.com/rasmurtech/Audio-M...
FakeYou-Tacotron2-Notebooks GitHub
github.com/justinjohn0306/Fak...
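As an illustration of step 1, here is a minimal sketch of sequential renaming. It is not the linked script; the function name `rename_wavs_sequentially` is hypothetical, and it assumes all clips sit in one folder and that sorted filename order is the order you want.

```python
from pathlib import Path


def rename_wavs_sequentially(folder):
    """Rename every .wav in `folder` to 1.wav, 2.wav, ... in sorted-name order."""
    folder = Path(folder)
    originals = sorted(p for p in folder.iterdir() if p.suffix.lower() == ".wav")

    # First pass: move everything to temporary names, so that a clip that is
    # already called "2.wav" is never overwritten part-way through.
    temps = []
    for index, path in enumerate(originals, start=1):
        temp = folder / f"__tmp_{index}.wav"
        path.rename(temp)
        temps.append(temp)

    # Second pass: give each clip its final numeric name.
    renamed = []
    for index, temp in enumerate(temps, start=1):
        target = folder / f"{index}.wav"
        temp.rename(target)
        renamed.append(target)
    return renamed
```

Note that sorting is lexicographic, so "10.wav" sorts before "2.wav"; if the recording order matters, check the order before renaming.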
FOLLOW US
---------------------------------------------------
Twitter: / rasmurtech
Facebook: / rasmurtech
Instagram: / rasmurtech
TikTok: / rasmuegx4k9

Comments: 81

@Kevin-mw9ig 1 year ago

I wish there weren't typos in every single prompt you typed for the models to speak so that we could hear it try and speak a real sentence.

@beforeyourimmigrants8471 1 year ago

You can instruct ChatGPT to write only sentences that end in a period, with no caps, commas, or special characters. You can use pandas to dynamically create that content string, and pydub to export around the moments of silence to WAV files and label them in sequence. When you're reading the sentences, just make sure to pause and leave a few seconds to clip.
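The silence-based splitting idea can be sketched in plain Python. This is only an illustration on a list of sample values, not pydub itself (pydub ships its own `split_on_silence` helper in `pydub.silence`); the function name `find_speech_segments` and its parameters are hypothetical.

```python
def find_speech_segments(samples, threshold, min_silence_len):
    """Return (start, end) index pairs for the non-silent runs in `samples`.

    A sample is "quiet" if its absolute value is below `threshold`. A run of
    speech is closed only after `min_silence_len` consecutive quiet samples,
    so short pauses inside a sentence do not split it.
    """
    segments = []
    start = None  # index where the current speech run began
    quiet = 0     # consecutive quiet samples seen inside the current run

    for index, value in enumerate(samples):
        if abs(value) >= threshold:
            if start is None:
                start = index
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_silence_len:
                # End the run just after its last loud sample.
                segments.append((start, index - quiet + 1))
                start = None
                quiet = 0

    if start is not None:
        # The recording ended while a run was still open.
        segments.append((start, len(samples) - quiet))
    return segments
```

Each returned pair could then be exported as one numbered clip, which is why leaving a clear pause between sentences makes the clipping easier.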

@jwyang91 1 year ago

While I was transcribing I got "RuntimeError: No audio I/O backend is available." Are there any dependencies I could be missing?

@wolpumba4099 2 months ago

*Creating a Text-to-Talk Model of Your Voice*

This video outlines the process of creating a text-to-talk model of your own voice using the Tacotron 2 model. The model will be able to read out any text you type, sounding like your own voice.

*Requirements:*
* *Microphone:* A good quality microphone is recommended for optimal results.
* *ChatGPT:* Used to generate sentences for recording and training the model.
* *Visual Studio Code:* Used to run the provided code for renaming and transcribing audio files.
* *Python:* Required for running the pre-processing and metadata updating scripts.
* *Google Drive:* Used to store and access the training data.
* *Google Colab:* Online platform for training and synthesizing the model.

*Process:*

*1. Recording Sentences (0:30):*
* Use ChatGPT to generate 50 sentences for training the model.
* Record yourself reading each sentence, saving each as a separate audio clip.
* More audio clips will result in a better sounding model.

*2. Organizing Audio Files (1:21):*
* Create a folder named "wavs" to store your audio clips.
* Rename the audio files from "1.wav" to "25.wav" (or however many clips you have).
* A provided script can automate this renaming process.

*3. Transcribing Audio (2:16):*
* Use the provided "transcribe_wav_to_rec.py" script to generate rough transcripts of your recordings.
* Edit the generated transcripts for accuracy.
* Ensure each line ends with a period and avoid using capitals or commas.

*4. Pre-processing Audio (6:24):*
* Use the "tacatron2_preprocessing_wav_files.py" script to convert the audio files to the format required by Tacotron 2.
* This script changes the audio format, including sample rate and channels.

*5. Updating Metadata (9:24):*
* Use the provided script to update the title of each WAV file to match its corresponding number.

*6. Uploading Data to Google Drive (13:00):*
* Create a compressed zip folder of your "wavs" folder.
* Upload the zipped folder to your Google Drive.

*7. Training the Model in Google Colab (14:00):*
* Open the provided "training_notebook.ipynb" file in Google Colab.
* Follow the steps in the notebook to:
  * Check your GPU.
  * Mount your Google Drive.
  * Install Tacotron 2.
  * Load your dataset.
  * Train the model.
* Monitor the loss values and stop the training when it reaches an acceptable level (below 0.30).

*8. Synthesizing Speech (20:00):*
* Open the provided "synthesize_notebook.ipynb" file in Google Colab.
* Share the trained model file from your Google Drive and paste the link in the notebook.
* Enter a phrase you want the model to say and run the script.
* Play the generated audio file to hear your text read out in your own voice.

(25:39) is the best model (still not usable).

I used Gemini 1.5 Pro to summarize the transcript.
Token count: 11,117 / 1,048,576
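For the pre-processing step, a quick way to see whether a clip still needs converting is to read its header with Python's standard `wave` module. This is a sketch, not the script from the video. The target values (22050 Hz, mono, 16-bit) are an assumption based on the defaults commonly used with Tacotron 2 and are not stated in the summary above, and the function name is hypothetical.

```python
import wave


def wav_format_problems(path, rate=22050, channels=1, sample_width=2):
    """Return a list of human-readable mismatches between a WAV file and the target format.

    The default targets (22050 Hz, mono, 2 bytes per sample) are assumptions;
    adjust them to whatever your training notebook expects.
    """
    problems = []
    with wave.open(str(path), "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected {rate}")
        if wav.getnchannels() != channels:
            problems.append(f"{wav.getnchannels()} channel(s), expected {channels}")
        if wav.getsampwidth() != sample_width:
            problems.append(f"{wav.getsampwidth()} byte(s) per sample, expected {sample_width}")
    return problems
```

An empty list means the clip already matches the assumed format; anything else points at what the conversion step still has to change.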

@Nobody-zq8bl 7 months ago

What's the point of setting up the metadata when the "Load Dataset" step of the trainer says it removes all metadata?

@justRICHTOFEN 10 months ago

When I run the metadata script, it outputs wav files in the output folder, but their metadata is not updated.

@OuiSinthala 11 months ago

This video is really great, thank you for making it. I wonder, can I train a model for a non-English language like Thai?

@TypicallyThomas 6 months ago

Yeah, it works the exact same way. You just create a dataset that tells the computer what the text is, and what sounds are produced when reading the text aloud. It will detect the pattern through training

@OuiSinthala 6 months ago

@TypicallyThomas thank you

@user-df8es1gc9b 3 months ago

Thanks for the info. Do you know if it is possible to download the generated cloned voice so that it can be used with the Mozilla TTS browser API? I know that you can define a custom voice, but I don't know the exact file format for that. Also, is this exportable?

@crazyfaint 1 year ago

Great video! Is there a way to train Tacotron2 to use a new language? I assume it only supports English?

@TheRonoxcz 11 months ago

@AstroInterceptor Can I follow this tutorial with 1000 files of my voice in my language (Czech) and train a model on it, or do I need to do it differently than in this tutorial? I want to synthesize my voice in my language and create a TTS based on it. I have hours of my voice thanks to a podcast and I can pretty easily transcribe it. Thanks!

@ananthakrishnank3208 8 months ago

Is this titling thing (10:05) at all relevant for building your Tacotron2 model?

@JaveGeddes 11 months ago

That's great. How do I use the model as a regular TTS in Windows?

@justsomeguy6336 27 days ago

Like a narrator? You’d have to make a language data path and voice path

@liltrailblazerx6026 1 month ago

Great video, used this and it worked excellently but every prompt you used in the video had typos in it hahaha

@kobvel 2 months ago

How come new startups like HeyGen can generate good quality audio from just 30 seconds of audio, and here the output from 10 files is so bad? Do they have better pre-trained models?

@ananthakrishnank3208 8 months ago

16:46 It is quite ridiculous that your total dataset is just a minute and 51 seconds in total. How could Tacotron2 learn your voice and still synthesize an okay-ish output? Meanwhile, I went ahead with 10+ minutes of audio data in 300+ wav files, and the output is below poor. I see that it does capture the voice well but produces a different word than what is intended. I tried various epochs. Also, I could not get the optimal encoder timestep vs. decoder timestep graphs that you get while synthesizing. Lastly, I am not doing this in English. Can you provide a way? Thanks!

@saturnstaruniverse 20 days ago

Did you find something? I am doing this for Sanskrit, please guide me.

@plzkthx9258 10 months ago

Can you just use one large wav file?

@ElInformatikuDani 2 months ago

Hi, I'm missing the last step with the BAT files from your website, but I can't find it. Thanks a lot for this great tutorial.

@IsaacEwenFrost 2 months ago

19:09 this doesn't work for me. the error message is "ValueError: num_samples should be a positive integer value, but got num_samples=0"

@SaicharanPabbathi 2 months ago

I had the same problem, and when I checked, the list.txt file was being uploaded as an empty file. That was what caused the error for me. Try uploading the list.txt file manually and make sure that it is not empty in the filelist folder. That should resolve the error you are getting.
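An empty or unusable list.txt leaves the dataset with zero entries, which is consistent with the sampler reporting num_samples=0. A minimal pre-flight check could look like the sketch below. It assumes each line has the form "path|text" and that paths are relative to a base directory; the function name `check_filelist` and the `base_dir` parameter are hypothetical and not part of the notebook.

```python
from pathlib import Path


def check_filelist(filelist_path, base_dir="."):
    """Return (usable_entries, problems) for a 'path|text' filelist.

    usable_entries is a list of (audio_path, transcript) tuples whose audio
    file exists under base_dir; problems is a list of readable messages.
    """
    usable, problems = [], []
    text = Path(filelist_path).read_text(encoding="utf-8")

    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        audio, separator, transcript = line.partition("|")
        if not separator or not transcript.strip():
            problems.append(f"line {number}: expected 'path|text'")
            continue
        if not (Path(base_dir) / audio.strip()).is_file():
            problems.append(f"line {number}: missing audio file {audio.strip()}")
            continue
        usable.append((audio.strip(), transcript.strip()))

    if not usable:
        problems.append("no usable entries: training would start with 0 samples")
    return usable, problems
```

Running this before the training cell makes it clear whether the problem is an empty file, a malformed line, or a path that does not point at an uploaded clip.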

@SaicharanPabbathi 2 months ago

@pranavrajs528 Yes, I tried uploading the list.txt file manually and made sure that it was not empty. That resolved the error for me.

@SaicharanPabbathi 2 months ago

@pranavrajs528 Yes. I was getting the same error and that worked for me.

@user-yv8hk7ci3p 3 months ago

What can I do to implement other languages?

@freman 1 year ago

Couldn't you have just copy/pasted the list from chatgpt and put the wav file in front of it rather than... go to all that effort only to have to edit it?
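The shortcut suggested here can be sketched as follows: take the sentences you read aloud, normalize them, and prefix each with its clip path. The "wavs/N.wav|text" layout and both function names are assumptions for illustration; check the format your training notebook actually expects.

```python
import re


def normalize_sentence(text):
    """Lowercase, drop commas, collapse whitespace, and end with exactly one period."""
    text = text.lower().replace(",", "")
    text = re.sub(r"\s+", " ", text).strip()
    text = text.rstrip(".!?").rstrip()
    return text + "."


def build_filelist(sentences, wav_dir="wavs"):
    """Pair the Nth sentence with wav_dir/N.wav, one 'path|text' line per clip."""
    return [
        f"{wav_dir}/{index}.wav|{normalize_sentence(sentence)}"
        for index, sentence in enumerate(sentences, start=1)
    ]
```

This only works if every clip was read exactly as written and the clips are numbered in the same order as the sentences; any misread sentence still needs a manual correction.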

@user-yj2mr5we3k 3 months ago

I just wanted to say I have same wallpaper :)

@arnabmukhopadhyay7089 9 months ago

What is your PC configuration? I have a GTX 1650 Ti GPU, and as far as I know Tacotron2 cannot run on my GPU.

@jannatulferdousy9119 3 months ago

Then use the CPU.

@Alexbestgamer 11 months ago

There was an error at 8 :( How do I fix that?

@Cloud9ChroniclesAutomation 3 months ago

Hello, love the tutorial. I wanted to ask if there is a way to run this on a local PC without using Google Colab.

@volt7452 26 days ago

me and you both

@wahabali828 7 months ago

Can you please make a video on WaveGlow?

@safdarhashmi6030 11 months ago

I am getting this error: "ValueError: num_samples should be a positive integer value, but got num_samples=0". The dataset and everything is loaded properly. What should I do next?

@user-gv2ds7to9l 11 months ago

same error

@kaiop2761 11 months ago

Did you get any solutions yet?

@chamsedinazouz9 10 months ago

You must remove "/content/TTS-TT2/" from the transcript and it will work.

10 months ago

@chamsedinazouz9 From where, up or down?

@jahedulalamrifat760 8 months ago

@chamsedinazouz9 I did this but am still getting num_samples=0.

@bluemodize7718 10 months ago

Error: "ValueError: num_samples should be a positive integer value, but got num_samples=0". What do I do?

@jigz3903 10 months ago

same error, have you solved it yet?

@bluemodize7718 10 months ago

@jigz3903 I didn't manage to solve it, it sucks man.

@ananthakrishnank3208 8 months ago

Use the line audio_file_path = line.split('|')[0] instead of audio_file_path = '/content/TTS-TT2/' + line.split('|')[0] in the transcripts code block, and follow the exact steps shown in the video from the start.

@yannainghtet7545 2 months ago

@ananthakrishnank3208 Hello, I used the line audio_file_path = line.split("|")[0] and ran it, but this error still occurs. What do you think?

@pranavrajs528 2 months ago

@yannainghtet7545 Have you solved that error? I am getting the same error too. Kindly help.

@nmrfahmi 1 month ago

Did you intentionally type every prompt's sentence incorrectly?

@dragonsage6909 1 year ago

You could write the entire process into a Python script. I'm surprised you didn't use Linux.

@LakeFamily-vb6ml 7 months ago

This is still a valid process. Someone will take these and do what you and I would have done, and welcome to the world of open source. I agree with you though. This could all have been automated, with logs confirming that each step is done and validation of files and filenames.

@dragonsage6909 2 months ago

@tommy12331 lolz..

@AstroCyrek 11 months ago

How can I do it in other languages?

@user-be8le3qt9e 9 months ago

You can, but you would have to train the models in your language from scratch. The FakeYou models used here are fine-tuned for English and Spanish only.

@professeurredstone2134 1 month ago

ModuleNotFoundError: No module named 'taglib' -> pip install pytaglib

@therealyojames 1 year ago

rip, even the 164 sample weight still sounds really bad :/

@mcgeedarion 1 year ago

I'm willing to pay you to do this for my voice. How much do you charge?

@beeceecee 1 year ago

I can do it for you

@tomyyoung2624 11 months ago

did you actually drink it or yes?

@dtesta 8 months ago

Vertical bracket?? It's a PIPE!

@Nono-hk3is 2 months ago

This is the most chaotic programming how-to I've ever seen. You automated like 80% of the process, but you still need users to do trivial things like rename folders and copy/paste paths at each step. And you read the phrases from a text script, but then use speech-to-text to recreate the script, which inherently adds errors? I can't tell if you actually know how the process works, or you're just mimicking what someone else told you to do.

@user-hc7qi5xy3n 10 months ago

error at 8 🥲 FileNotFoundError: [Errno 2] No such file or directory: 'filelists/clipper_train_filelist.txt' 🥲

@dipeshkoirala2957 9 months ago

The name of the file is different. You will see the new name in the Drive folder. You have to go to the code, find this line, and change it to match the name in the Drive folder.

10 months ago

I did everything successfully, but the code gave this error:

FP16 Run: False
Dynamic Loss Scaling: True
Distributed Run: False
cuDNN Enabled: True
cuDNN Benchmark: False
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in ()
     15 print('cuDNN Enabled:', hparams.cudnn_enabled)
     16 print('cuDNN Benchmark:', hparams.cudnn_benchmark)
---> 17 train(output_directory, log_directory, checkpoint_path,
     18       warm_start, n_gpus, rank, group_name, hparams, log_directory2,
     19       save_interval, backup_interval)

3 frames

/usr/local/lib/python3.10/dist-packages/torch/utils/data/sampler.py in __init__(self, data_source, replacement, num_samples, generator)
    105
    106 if not isinstance(self.num_samples, int) or self.num_samples
    107 raise ValueError("num_samples should be a positive integer "
    108 "value, but got num_samples={}".format(self.num_samples))
    109

ValueError: num_samples should be a positive integer value, but got num_samples=0

How can I solve this?

@sarathkumar-gq8be 9 months ago

Did you correct the error?

9 months ago

@sarathkumar-gq8be Yes, I adjusted the Drive path.

@sarathkumar-gq8be 9 months ago

@ Yes, thank you, I also just now found the issue.

@sarathkumar-gq8be 9 months ago

I have one more problem. After training the model, I went to the other Colab notebook and gave it the model path and all those things, but it says "... but Gdown can't. Please check connections and permissions."

9 months ago

@sarathkumar-gq8be Maybe it was above your Colab project limit.

@seanolivieri4829 3 months ago

FileNotFoundError: '/content/TTS-TT2/wavs/1.npy|escuchame john vos tenes armado el video del cierre primario de coledoco del otro dia o tenes armado algo para un ateneo para hacerlo ya' does not exist. Check your transcript and your audio files.