If you're looking for an online service, there seem to be plenty available, like clonemyvoice, to queue your work on. If you want to do the work locally and are tired of cloning repos that do not run despite their boasts, this is the one for you. I've got it working well on Linux using CUDA with out-of-the-box settings and the README. Thanks @Sam Witteveen
@davidw8668 9 months ago
The last Friedman was spot-on, was just waiting for him to say something like: "... existentially speaking, it's in love where we truly find meaning in our universe."
@CodyAvant 9 months ago
“Anthropomorphize”
@davidw8668 9 months ago
@@CodyAvant robots?
@shaunjohann 9 months ago
you and Philip over at AI Explained are my favs, thanks for doing what you do in the AI space. i always look forward to what you both create for us to think on
@bobobo1673 9 months ago
Philips?
@BryanChance 8 months ago
OMG this is amazing. I just had my mind blown by Sora, the video generator from OpenAI. Everything is moving so fast it hurts my brain. :-_)
@shobhitagnihotri416 9 months ago
Sam's knowledge of LLM/AI is on a different level. Thanks
@kate-pt2ny 8 months ago
Great tutorial, thank you for sharing, thank you Sam
@Joooooooooooosh 9 months ago
Sounds like we still have very far to go before open source models catch up to commercial services like playht and ElevenLabs.
@alx8439 9 months ago
A lot of TTS models do the same drops when you start to play with their parameters. I've tested like half a hundred of them and I have a strong impression they're still quite fragile
@ash3844 4 months ago
After running the example usage cell I'm getting this error: "' returned non-zero exit status 1." Could you pls help?
@samwitteveenai 4 months ago
That video is quite old, my guess is they have updated the code or library
@vrrevolution9183 8 months ago
is there somewhere in the code that allows for more than 11 seconds of audio? i tested a few and it stops at 11
@eaglestudio4284 6 months ago
Can this be monetized on YT if I don't pay for a plan?
@goyashy 9 months ago
This is a great demo. Seems like they're doing real good work! I tried turtle and a few other open source models (open voice), but none of them reach the point where they look like a future competitor to Elevenlabs. This one has great potential!
@albertsitoe7340 8 months ago
Nothing is better than Eleven Labs
@davidencarnacion2555 8 months ago
Eleven labs is killin it.
@mastershake2782 9 months ago
I like OpenVoice more than Bark tbh. Curious to see how MetaVoice stacks up.
@shaileshvekariya9059 8 months ago
Can I run this on multiple GPUs?
@lucamatteobarbieri2493 9 months ago
The original *uckerberg voice is more robotic than any cloning attempt 😂
@VaibhavShewale 8 months ago
sounds nice
@sherpya 9 months ago
looks interesting, does it really require a GPU with >=24GB RAM?
@samwitteveenai 9 months ago
no not that much RAM needed
@P-G-77 9 months ago
Now I've tried the DEMO... works well...
@ForTheEraOfLove 8 months ago
I can't believe you haven't used that drunk Lex Fridman with your slurring system string to have an interview with you as "Drunk Tech"
@Tarbard 9 months ago
Interesting. I would have liked to hear it clone your voice.
@samwitteveenai 9 months ago
lol good point I didn't try that. I might do that if they release the Fine Tuning code.
@Tarbard 9 months ago
@@samwitteveenai The demo link lets you upload a voice clip to clone, is that not enough? I didn't have good results with what I uploaded but if it's because it needs to be fine tuned that would explain why.
@IgnacioMartinez82 9 months ago
Is it possible to use this with home assistant?
@samwitteveenai 9 months ago
yes, but the challenge is getting the TTS closer to real time. I have another vid coming up that is more in line with a home assistant
@kamathsutra 9 months ago
I almost thought that at the beginning of the video you were using the model to speak.
@SyedMujtabaHassanRizvi 9 months ago
Make a video on how to train a language model using Direct Preference Optimization
@abdelkaioumbouaicha 9 months ago
📝 Summary of Key Points:
MetaVoice, a startup, has released an open-source text-to-speech (TTS) model called MetaVoice 1B, which is a 1.2 billion parameter model trained on 100,000 hours of speech data. The model claims to have zero-shot cloning capabilities for American and British voices with just 30 seconds of reference audio. It uses Transformers and diffusion techniques in its architecture.
The speaker demonstrates the model's capabilities, mentioning that it performs well for some voices but not consistently for others. Adjusting the temperature and guidance scale can affect the output to sound more like the desired voice. The model has limitations, such as occasionally dropping words or generating silence. Fine-tuning the model on longer audio samples could be interesting once the code is released.
While the open-source nature of the model is promising, it still has a long way to go to match the performance of proprietary models like Google's SoundStorm and OpenAI's voices.
💡 Additional Insights and Observations:
💬 [Quotable Moments]: "The model performs well for some voices, but not consistently for others."
📊 [Data and Statistics]: The model is trained on 100,000 hours of speech data and has 1.2 billion parameters.
🌐 [References and Sources]: The video encourages viewers to try out the provided notebook and experiment with the model themselves.
📣 Concluding Remarks:
The MetaVoice 1B model, an open-source text-to-speech model, shows promise with its zero-shot cloning capabilities and use of Transformers and diffusion techniques. However, it still has limitations and has yet to match the performance of proprietary models. The video encourages viewers to try out the model and see its capabilities firsthand.
Generated using TalkBud
@codecaine 9 months ago
Nice work
@hqcart1 9 months ago
is the notebook free for a100?
@samwitteveenai 9 months ago
unfortunately it is not free
@joffreylemery6414 9 months ago
Next video on virtual avatars ?
@rubbercable 9 months ago
Is this 'offline' or an 'online service'?
@mastershake2782 9 months ago
Local/Offline
8 months ago
some examples of how it sounds: 5:30 6:21 6:47 7:15 ...
@Darfail 1 month ago
my guy doing the lord's work god bless
@Macatho 8 months ago
I dunno... Compared to Elevenlabs it's miles away... If you can spot that it's an AI voice within 3 seconds, it's not very good.
@jonathanmckinney5826 9 months ago
Coqui xTTS v2 is still much better and includes cloning and more languages. Bark is not a great comparison as it's one of the worst open source TTS models, they haven't updated the repo in 5 months. It was a great thing when it came out of course.
@samwitteveenai 9 months ago
Interesting I will go back and check the Coqui models out. I was saddened when I heard they were shutting down as they had done a lot of good work in the TTS field
@nyny 9 months ago
unfortunately xTTS isn't open source
@samwitteveenai 9 months ago
So do we know what has happened to that IP now that Coqui has gone under? Who bought it etc?
@nyny 9 months ago
@@samwitteveenai They have only said that it retains the same conditions. They are probably shopping it or their investors want to liquidate.
@samwitteveenai 9 months ago
thanks for the update.
@fontenbleau 9 months ago
I would really like audio restoration tools, but development in this area has been stalled for 2 years already. Adobe closed access to its speech restoration model, which was very bad with non-English accents (I've tested). The only new thing is that this January Intel added its AI audio tools to the popular Audacity editor, but I would call the quality of their models awful, from the music remix to the noise removal model, and it's incredibly unoptimised: it runs longer on GPU than on CPU. No one knows when we will get a tool to restore vinyl music; there's no motivation in development to restore old music.
@zyxwvutsrqponmlkh 9 months ago
Make a corpus of degraded vinyl copies and pristine copies of the same content. This is the sort of thing I could crank out in like a week if I had the data already.
@fontenbleau 9 months ago
@@zyxwvutsrqponmlkh I don't know why there's such censorship but I can't write anything to you, the word filter is crazy, or the channel author added every possible word to the filter list. I want to answer you, but can't, only this message comes through the filters, worse than in communism.
@DocDoc-h4w 9 months ago
Hello Sam, would like to connect with you regarding a collaboration opportunity for a new product that we are building. Let me know the best way to reach you. Thanks.
@samwitteveenai 9 months ago
you can ping me on linkedin
@alanalvarado3862 8 months ago
Doubt it!
@zyxwvutsrqponmlkh 9 months ago
Thanks, I had the misunderstanding that this was facebook. Honestly, results are quite good, best in class but it does fail sometimes producing long breaks or silence without saying all the words. But still it's enough better than anything else I think it's worth it.
@teaman7v 9 months ago
Best in class? This is way behind the curve for TTS and voice cloning.
@zyxwvutsrqponmlkh 9 months ago
@@teaman7v Bull scat. I've done bark, openvoice, coqui-ai, MockingBird etc. What do you have better that you can run locally? Because I've tried this and I like it.
@Joooooooooooosh 9 months ago
@@zyxwvutsrqponmlkh they are all pretty terrible tbh compared to commercial models.
@zyxwvutsrqponmlkh 9 months ago
@@Joooooooooooosh Then what fucking good is it? Can't rely on anything commercial, they will change shit on the back end without warning, go out of business, throw up rate limits and cost a fortune. I want to make full on audio books, I'm talking thousands of hours of content. And I want it to be good, like better than the average audio book narrator. There is jack shit out there that is open and can be convincing for more than a couple minutes. This is a good step in the right direction, I can get Whisper to timestamp each syllable and use that data to ensure accuracy and find odd delays and shit to automate a re-generation of that segment. I can't do that with your closed source garbage services, if I can't make it run on a system I control it's not worth considering. And you're all "it's not best in class because this other entirely different class exists too" 💩
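The re-generation idea in the comment above could be sketched roughly like this. It is only an illustration, assuming you already have word-level timestamps from a speech recognizer (the comment suggests Whisper). The `Word` class, the `find_problems` function, and the 0.8 second gap threshold are hypothetical names and values made up for this example; they are not part of MetaVoice or Whisper.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Word:
    """One recognized word with start/end times in seconds (hypothetical shape)."""
    text: str
    start: float
    end: float


def _tokens(text):
    # Lowercase and keep only letters, digits and apostrophes so that
    # punctuation differences do not count as mismatches.
    return re.findall(r"[a-z0-9']+", text.lower())


def find_problems(script, words, max_gap=0.8):
    """Compare the intended script with the recognized words.

    Returns (missing, gaps):
      missing: script words that do not appear in the recognized audio
      gaps:    (previous_word, next_word, seconds) for pauses over max_gap
    """
    expected = _tokens(script)
    heard = [t for w in words for t in _tokens(w.text)]

    # Words present in the script but dropped or replaced in the audio.
    missing = []
    matcher = SequenceMatcher(a=expected, b=heard, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            missing.extend(expected[i1:i2])

    # Unusually long silences between consecutive recognized words.
    gaps = []
    for prev, nxt in zip(words, words[1:]):
        pause = round(nxt.start - prev.end, 3)
        if pause > max_gap:
            gaps.append((prev.text, nxt.text, pause))

    return missing, gaps
```

A segment could then be queued for re-generation whenever either list is non-empty; how strict to be about pauses depends on the narration style, so the threshold would need tuning.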
@hqcart1 9 months ago
I just tried out the demo, it really sucks, a lot of flickering and voice cuts, you can feel it, sometimes I didn't understand what he was saying. I'd say this model was released too soon, or because it sucked.
@jeffwads 9 months ago
Meh. Tortoise-TTS is at least as good. People yapping about how it doesn't match up to Elevenlabs just aren't following the guide. 3 samples of 10 seconds each, as WAV at the right bit-rate, is key.
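To check reference clips before handing them to a cloning model, something like the following could report each clip's length and format. It uses only Python's standard `wave` module; the function name `describe_wav` is made up for this example, and since the comment does not say which rate is the "right" one, the sketch only reports the values rather than enforcing any.

```python
import wave


def describe_wav(source):
    """Return basic properties of an uncompressed WAV file.

    `source` may be a file path or an open binary file object.
    """
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "seconds": frames / rate,
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
        }
```

Calling `describe_wav("sample1.wav")` (a placeholder file name) on each of the three clips would show whether they are roughly 10 seconds long and share the same sample rate.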