NEW Open Source Model for Emotional Text to Speech

  Views: 27,556

Jarods Journey

1 day ago

Comments: 180
@NFawc 1 month ago
Agreed! This sort of TTS power with an Audiobook maker would be epic!
@genshinfinity 1 month ago
We can use the angry AI voice to shout to robbers or trespassers. Just need to hook it to a motion sensor / cctv
@Jarods_Journey 1 month ago
😂 Brilliant idea
@CheckTheWiki 1 month ago
Or clone your own angry voice and have the motion sensor tell your pets to get off of somewhere they aren't allowed when you aren't home.
@jessequartey 1 month ago
These are really brilliant.
@maxlikessnacks123 1 month ago
Or.... just record a few lines of you saying "stop it" or "i'll call the cops" and play that when the motion sensor gets activated. No ai required.
@jessequartey 1 month ago
@maxlikessnacks123 what if you want a context aware response.
@RobertJene 1 month ago
1:45 LOL melina. one time I downloaded all the Melina voice lines to my phone and listened to them + talked like her for an hour while I got into costume and makeup, then did a parody of her for stingers when I live-streamed Elden Ring
@Jarods_Journey 1 month ago
Method acting at its finest 😂
@bmatt2626 1 month ago
Yeah! This sort of expression in AM is the dream. I've got a whole 3d puppetry / rendering / sfx / scoring pipeline with a voice-shaped hole in it. Thanks for digging into all this.
@namuzed 1 month ago
lol, I've been in the same boat. xtts v2 hasn't been cutting it due to lack of emotion. I tried pairing it with RVC which gave improvement, but I'm not much of a voice actor. This seems like the key.
@SttravagaNZza 1 month ago
What are you trying to do? What tools are you using?
@bmatt2626 13 days ago
@SttravagaNZza Film dialogue eventually. I'm currently using Blender, CC4/iClone, Embergen, Liquigen, Resolve/Fusion. Audio side is Reaper, Divisimate, RapidComposer, Vienna Ensemble, various instrument and SFX libraries.
@JonnyCrackers 1 month ago
The inflection is still all over the place and it places emphasis on the wrong words. Until they can figure out how to make TTS get that right, it's always going to sound weird.
@Jarods_Journey 1 month ago
It's getting there alright, progress seems to be steady
@n_n_hapi 1 month ago
Holy smokes, that Melina was amazing
@notnotandrew 28 days ago
Based on only a 4-second audio clip! I think the word "inclined" was the only word where it slightly broke character.
@OnlyJavascript 1 month ago
Wow, I just subscribed! I've recently started exploring TTS and how it works, and I finally found a guru!
@Jarods_Journey 1 month ago
Glad to have you :)
@shawn4990 1 month ago
Wow... I had mentioned 'emotion' in an earlier post and bam! I had a feeling these models were in the works.
@robertotomas 1 month ago
I am also interested in training it with foreign languages. Do you know how much RAM it would take? Do the samples have to be well labelled (emotionally)?
@jjjbeastfd 1 month ago
Please expand on this more and make a website too. I'm very interested in this, for sure! You're the best, Jarod!
@rodrigovieirastudies 1 month ago
Wow!!! This would be a fabulous addition to the audiobook maker!
@BloodyLiFe255 1 month ago
I will use it in my work and most probably make audiobooks with it. When I make it (it will take some time), I'll tag you.
@rodrigovieirastudies 1 month ago
@BloodyLiFe255, thanks! That's very kind.
@Random_person_07 1 month ago
Holy crap, the zero-shot is so good, better than GPT-SoVITS and XTTS, and it doesn't even take long to run inference. This is awesome!
@ponywarrioryt 27 days ago
Wow that's insane how good this is for something you can run locally! Would love a webui of this.
@megamayo2500 1 month ago
Finally, a competitor to Bark AI. Bark was the only AI App that could do this back then. From what you describe from its architecture, sounds like this is a self existing build from Voice craft. It is known to be highly efficient in voice cloning direct from audio. The only catch is that a partial amount of the code is Linux based. I look forward to your WebUI contribution, although I'm curious on how these models can be trained. Overall amazing.
@GraveUypo 1 month ago
being linux based is a plus to me.
@tierdropp7544 1 month ago
thats literally based
@metalmassacrefilms 1 month ago
So it wouldn't work in Windows? As long as I can run it in an Ubuntu sort of VM, it's fine for me.
@nodewizard 1 month ago
Very good quality! Progress being made in TTS.
@joseeduardobolisfortes 1 month ago
Years ago, I tried to start a project called "Vox Render", to render audio using the same principles used for image rendering: creating the audio from fundamental sinusoids and harmonics formed by the resonant chamber, but I didn't get very far due to hardware limitations at the time and never got around to it. One of the things I intended to do was create a markup language to add accents and emotional tones to the text that was to be rendered; I now realize that this kind of language would be perfect for these text-to-speech tools.
@TheWebgecko 1 month ago
Looking forward to when it is eventually able to understand tone in relation to sentence structure
@morpheusnotes 1 month ago
Holy Moly! This is 11labs quality right there.
@4.0.4 1 month ago
Nah it's good but not THAT good... yet.🤞🏻
@wickedjutto 1 month ago
Absolutely wild! Thanks for sharing!
@Rejekts 1 month ago
I'm adding this to my voice cloning webui for sure! real nice
@hamburger--fries 1 month ago
This is very easy to add to any web app or mobile app. I just made a simple app to allow a writer to upload a PDF and the app outputs a .wav file.
@SuperUniqueHandle 27 days ago
This seems like a huge step up in open source TTS!
@sinayagubi8805 1 month ago
Please make a tutorial on how to fine-tune it on a language it doesn't support, maybe on Runpod. That would be amazing, since that's where the money is. Edit: Subscribed.
@TheZaky80 13 days ago
We would appreciate your efforts if you make another video for training this model in different languages, this will be very helpful
@metalmassacrefilms 1 month ago
How many varieties of voices are there? And are we allowed to use them commercially? I used to subscribe to Altered, which also had text to speech with emotion, but they are very expensive and the emotions are not as good as this one. They also treated me badly as a customer. I would be glad to change to another TTS solution, open source or free, as long as I have commercial use.
@dthSinthoras 1 month ago
Appreciate you covering the other languages thing! :)
@mumarr4690 1 month ago
This is exactly what I'm looking for. I'm currently making a tool to create translated dubbing from a video, and right now I use Microsoft TTS, which doesn't have "emotion" but is fast.
@fulldivemedia 1 month ago
Damn, that is awesome if you can make it work with Tortoise and RVC together in the audiobook maker.
@nigeldogg 1 month ago
Future meme: “she said with disgust”
@DanIel-fl1vc 1 month ago
I use Applio to read text from ebooks. Paste in the parts of the ebook, generate natural sounding speech then convert it using a voice model. The tts is the bottleneck, preventing it from sounding indistinguishable from a real speaker. What's the most natural, convincing tts? Preferably with some UI so it's easy to copy paste an entire book in there and generate audio files that can be voice transferred.
@DataJuggler 1 month ago
1:21 It sounds good, but it sounds like there is a deliberate pause between words that doesn't sound natural. I think you would have to speed it up some to sound better.
@kthalas 28 days ago
It sounds like Jordan Peterson lol
@ElaraArale 1 month ago
Man, you rock, thanks for the knowledge, for real.
@romanochelommmiii1526 25 days ago
Wow, amazing, bro. Please, a web UI for the audiobooks would be really great. Thanks for your work.
@dolboeb-tz4bw 1 month ago
Can I use a processor for generating or an AMD graphics card? Could you teach me how to train models for other languages?
@vuquangtruong5950 1 month ago
Awesome bro. Good tutorial!
@VaibhavShewale 1 month ago
ooh man, this is so amazing and clean!
@Katsumi_Maki 1 month ago
The 0 shot performance omg
@635574 1 month ago
Can we use this for game audio, or what is the license? I'm interested in making a game where one of the two MCs is an AI bot.
@TeamDman 1 month ago
wow that cloning is crazy
@Random_person_07 20 days ago
I think E2 sounds better with emotions and voice. F5 might need more training, but it's still pretty good.
@IvarDaigon 1 month ago
You forgot to mention the best part: it's only 1.4 GB unquantized, so it's ripe for optimization in on-device use cases.
@Jarods_Journey 1 month ago
Not only that, it does inference on less than 5 GB of VRAM :)!
@devon9374 21 days ago
@Jarods_Journey *Rubs hands like Birdman* We need an alternative to ElevenLabs
@jonascale 1 month ago
Dude! How are you getting that to run without using the web interface? I have been trying to figure this out for days since I first saw your video. I would love to integrate this into my own project, but I must be missing something. Any chance you could do a deeper dive on this? This is the lowest latency that I have come across so far, and I think it would be perfect to give my own waifu a decent and responsive voice.
@derjungejesusjunge2047 1 month ago
Did you try w-okada's new TTS?
@flykiller 1 month ago
Did you also check GPT-SoVITS TTS? I just found this after some new TTS comparison posts appeared on reddit with F5-TTS. It flew under my radar but V2 version looks very good on comparison examples. Couldn't check it out much but it seems that finetuning is very fast and resulting model output is also small like RVC models.
@Jarods_Journey 1 month ago
I'm in the same boat as you, I probably saw the same comparison. I had tried out this repo when it released but both the demo and results were not impressive. Given that much better demo and display of its capabilities with FT, I'm gonna make my way back around to it
@JieTie 17 days ago
It would be really cool if the software had a feature to change already recorded audio to sound like the sample audio :) Do you know any open source AI software that could do that? :) I know there is RVC, but you have to have a trained model first, and that model requires ~15 min of audio.
@nekomiruku 1 month ago
I just want a voice mod that is a bit higher than my actual voice, because it's hard to keep it at a higher pitch without my throat getting tired.
@ephimp3189 1 month ago
why did it mess up the "you" tone in "I don't want to hear you" while it was correctly saying "you" in 2 previous segments?
@Jarods_Journey 1 month ago
Still not perfect in all cases, we leave that up to probability
@AlgorithmInstituteofBR 1 month ago
Try Pinokio, homie... that command line looks like a nightmare. Salute!
@kait3n10 1 month ago
With this tech, can you make Melina sound angry or sad if you just have her normal speech sample?
@Jarods_Journey 1 month ago
This is something I'm looking into
@4.0.4 1 month ago
If making a webui is a lot of work, you can just make a PR for some webui that's already out there
@Jarods_Journey 1 month ago
I believe fakery has made one, he's pretty active so for inference, I don't have to get anything up and going.
@BloodyLiFe255 1 month ago
Amazing man, thank you for sharing
@stevewarby12 25 days ago
Is this going to be added to the audiobook maker as a voice option?
@hiddendrifts 1 month ago
imagine the day we get an open source version of openai's advanced voice mode. the age of the ai waifu dawns upon us
@keltyll 29 days ago
Next year I bet, let's be patient
@ayanshproplayer5559 1 month ago
Not new but i think a emotional voice that can't create positive and push forward to in bad (Not Sleep) time
@silentswitch1309 26 days ago
Hi, I was wondering if you can make a new, updated tutorial for the audiobook maker... There's a lot of scattered info from old to new versions on the channel, and I've been surfing your channel for the past hour, fully confused... I just want to make a customized audiobook 😭😭😭
@jeffwads 1 month ago
By the way, Tortoise TTS always did emotions as well. You just had to put (sad) (happy) in the text to get it. Been around for years now.
@Jarods_Journey 1 month ago
Only issue is those were suggestions to the model, this is much more controllable. Tortoise you could also train anger tokens and whisper tokens into it by explicit data labeling - the beauty here is you don't need that explicit labeling to control the output
@azaharia10 25 days ago
That sounds mind-blowingly good. Can I use celebrity models, like an Ariana Grande or Cat Valentine model, with F5-TTS?
@HadrianAibe 18 days ago
How big can the ref_audio file be? 10 seconds? 50 seconds?
@jujjuj7676 1 month ago
We need whisper voice training!!! Not sad or angry. In normal usage, whispering or speaking quietly is used more than both angry and sad. It's just more useful for conveying points...
@Jarods_Journey 1 month ago
I'll have to see what this model's whispering capabilities are
@jujjuj7676 1 month ago
@Jarods_Journey Good luck. I have tried; it's almost impossible, but maybe you've got better ninja skills... 👍
@lanaferraii2184 1 month ago
I also need a calm relaxing voice, would be highly appreciated if you find one ❤
@Making_Random_Edits 4 days ago
Can this work on text-generation? Like u know the AI chat characters?
@trashboatex 25 days ago
Damn we just need some ethical voice choices you know, like professional grade voices we can clone and use without running into copyright.
@udayrajpatel9048 1 month ago
Fantastic tutorial ❤❤❤❤❤❤
@VikashKumar-t4g5v 27 days ago
Can you suggest any cloud GPU provider that lets us attach our microphones? Please make a video on it: using a realtime voice changer without an RTX card, through a cloud GPU and with low delay.
@gumvue.studio 18 days ago
How do you use the emotions?
@SavvyStaks 1 month ago
I can't use it. Please make a detailed video on it. I am trying to run it in the Lightning AI code editor, but it's not working.
@Jarods_Journey 1 month ago
I can probably draft up a quick tutorial on how to use it in the terminal, but no in-depth webui or package
@johnovercash1798 1 month ago
Can you make a video that starts from the beginning, not from the middle?
@notnotandrew 28 days ago
If only you could give it a natural language instruction with regard to the tone of voice, or if there was some way to annotate subsections of the provided text with particular emotions, tones, pitches, timbres, timings, etc.
@Happynut72 27 days ago
Is there a way to bring down the VRAM usage? I have 16 GB and it runs at 100%.
@EditorLue 1 month ago
Bro, make a video on training F5-TTS on custom data. And can you make a video on how to create a unique voice that does not exist by blending multiple voices using AI like VITS and F5-TTS?
@dragon3602010 28 days ago
Is it available in other languages, like French?
@Entity303GB 28 days ago
Only English and Chinese :((((
@OrsotarBarr 1 month ago
First voice sounds like Maria from Silent Hill 2
@hdgdhnxbdx1619 1 month ago
Wait, was that Badger from Breaking Bad? LMFAO
@mtaliamino7169 1 month ago
Hey buddy, how can I contact you?
@svenbjorn9700 1 month ago
Are there any GUI solutions for audiobook-length TTS? Something accessible to normies?
@Jarods_Journey 1 month ago
The project I'm working on segments text files into smaller chunks that can be used to create an audiobook. Segmenting is the only way right now because long context audio generation would take too much compute and too much time
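A minimal sketch of the segmenting idea described in the comment above, not the audiobook maker's actual code. It assumes sentences end in `.`, `!`, or `?`, and the function name `split_into_chunks` and the `max_chars` limit are hypothetical, chosen only for illustration.

```python
import re


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            # The sentence still fits, so keep growing the current chunk.
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single sentence longer than max_chars becomes its own chunk.
            current = sentence
    if current:
        chunks.append(current)
    return chunks


chunks = split_into_chunks("One. Two. Three.", max_chars=9)
```

Each chunk would then be sent to the TTS model separately and the resulting audio clips joined in order. Splitting on sentence boundaries rather than at a fixed character offset keeps words and phrases intact within each clip.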
@svenbjorn9700 1 month ago
@Jarods_Journey Looks promising, but requires ultra-not-normie stuff like manually installing and configuring an entire separate alpha-stage GitHub project (a >20min technical tutorial video from you and you're already experienced, literally inconceivable for normies), plus you have a note saying progress was paused a year ago :/ I will pay money ($5-$20) for a one-click installer if you ever complete this. I feel like there's a huge fucking market for this, but all the finished products get greedy and do subscription services so those are the only things that exist.
@EDashMan 1 month ago
What would you say is the best text-to-speech model to try out? Traditionally, when using AI models, I've had to use Google text to speech first and then convert speech to speech in the AI model's voice, and then it's not as clear or accurate sounding.
@Jarods_Journey 1 month ago
Mmph, right now, I'd say to try out this model, F5TTS. It's pretty good for starters. Then would be xtts, then styletts2, then tortoise
@EDashMan 1 month ago
@Jarods_Journey What model do you think Character AI uses? I'd love to try the model; it's pretty good at instant voice cloning with only 10 sec of audio.
@vickmackey24 1 month ago
Inflection points are wrong. Good, but not as good as OpenAI's advanced voice mode.
@bikgrow 1 month ago
We want WebUi for this model soon!!
@siddarth26 1 month ago
Brother we are waiting for your webui
@gaeonx 1 month ago
Do you know if there's any way that i can use an ai voice model (pth file) to do something like TTS like this? or if it's even possible??
@Jarods_Journey 1 month ago
Well, this uses its own model architecture, with weights saved in .pt files. Models from other architectures are not compatible even if they have the same file type, due to fundamental differences in the code.
@l4l01234 1 month ago
Just take the output audio files from this and plug it into RVC (audio-to-audio) to use any of the existing RVC models
@635574 1 month ago
But which part of this does OP want? the short sample to a good voice or the voice itself for other TTS or even realtime voice changer?
@LOC-Ness 1 month ago
can this do voice to voice though
@me-cm8or 1 month ago
Damn, that's so good. Does it only support English?
@me-cm8or 1 month ago
Just checked their GitHub sadly it’s only English/ Chinese. To train another language you basically need to have over 10k hours or something 💀
@Jarods_Journey 1 month ago
Only english and chinese right now
@bossgd100 1 month ago
How fast is it? Can it work in live / streaming mode?
@Jarods_Journey 1 month ago
Yes, it's pretty fast, 1 second inference for like 20 seconds of audio on my 4090
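For reference, the speed quoted in the reply above can be expressed as a real-time factor (RTF): inference time divided by the duration of the generated audio. The sketch below only does that arithmetic with the approximate figures from the comment; the function name is hypothetical and nothing here measures the model itself.

```python
def real_time_factor(inference_seconds: float, audio_seconds: float) -> float:
    """Return inference time divided by generated audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return inference_seconds / audio_seconds


# Approximate figures from the comment: ~1 s of inference for ~20 s of audio.
rtf = real_time_factor(1.0, 20.0)
speedup = 1.0 / rtf  # how many times faster than real time
```

An RTF below 1 means audio is generated faster than it plays back. That is a prerequisite for live use, though actual streaming also depends on how long the first chunk takes to arrive, which these figures do not show.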
@Sitki-w4n 22 days ago
can we have this with webui
@Cloudwalker2k3 1 month ago
I may have to ask for a tutorial on install of this.
@steve-g3j6b 1 month ago
What is the audiobook maker? :D
@Jarods_Journey 1 month ago
A project I'm working on to make audiobooks lol
@steve-g3j6b 1 month ago
@Jarods_Journey 🤤🤤
@analia390 1 month ago
you, monster! (Ariel)
@HushHunt 1 month ago
Can I install it on my PC? It has an 11th-gen i3 and no graphics card.
@finalblast3825 1 month ago
You can use CPU but it will be unbearably slow. I suggest obtaining any nvidia card above 8 GB VRAM for this text to speech to work. I am using a GTX 1080 with 8 gigs and it works great with only up to 20 seconds of waiting time to generate.
@anagnorisis2024 1 month ago
does this work with webui like xtts?
@Jarods_Journey 1 month ago
It would need to be built by someone, but looks like fakery has a gradio app up for the repo
@onlyyoucanstopevil9024 1 month ago
AWESOME 😊😊😊
@PatchworxStudios 1 month ago
OK, everything is nice and dandy, but AI is evolving too fast to get a user-friendly version or community. I have so many pythons in my VRAM, it's like I am a reptile zoo. Please ignore all the llamas. I am tired.
@GettingMiggyWithIt 29 days ago
holy moley
@pragmata7997 1 month ago
amazing
@visual_rev6192 29 days ago
The beginning was pretty bad; I thought both the sad and the angry voices were terrible. But hey, we are at a good start: it's good in the sense that we never had this stuff before, but it's still nowhere near as good as the real thing. It's making good progress, though. Can't wait for 2030 or even 2027 to see how much better the quality gets.
@Jarods_Journey 29 days ago
As they say, it's the worst it'll ever be
@edderuiz 1 month ago
A model for Spanish, please?
@omargoodman2999 1 month ago
Meh, Evil Neuro still sounds more expressive and natural, I think.
@tylerboy19yp 1 month ago
is it finetuneable?
@Jarods_Journey 1 month ago
I believe the authors said it's possible. IDK how to go about that rn though, so that's for a future investigation
@abhinavbisht9851 1 month ago
How do you install it and use it...?
@Jarods_Journey 1 month ago
You can follow their GitHub to get the requirements, but I won't be creating a tutorial for it for a while
@abhinavbisht9851 1 month ago
@Jarods_Journey yes I did...
@researchandbuild1751 1 month ago
That reference audio is already pretty bad lol
@GraveUypo 1 month ago
idk it sounds extremely artificial to me [edit] well the elden ring example sounded really good. i guess you need a good sample to get good results
@Jarods_Journey 1 month ago
I'd say Melina's voice is probably closer to content in the source training (audiobooks and podcasts) which helps a lot.
@hinduhistory7466 22 days ago
Home Alone gangsta scene