i absolutely did not expect chatgpt to be able to do an "ara-ara" impression
@Billy4321able · 1 month ago
I thought OpenAI might have been exaggerating and cherry-picking the responses, but NO, this is truly REVOLUTIONARY in the AI voice space. You could literally practice your language skills with something like this for hours; it's crazy. And not just learning a language: you could literally learn to voice act in another language with fantasy-style voices. This is crazy! I'm afraid people are going to stop talking to each other if this ever becomes cheap enough to use full time. They sound like way more entertaining speaking partners than anyone I know. It's kinda messed up, honestly. I struggle to see how people are going to adapt to having easy access to AI friends in the future. It only gets worse once they get robot bodies... This is going to be like real-life Chobits.
@Jarods_Journey · 1 month ago
It's definitely a game changer, I'm 100% gonna be talking with it more lol. The quality coming out was literally fiction 2 years ago.
@BlastHeart96 · 1 month ago
Wow, AI is terrifying, but amazing at the same time. I wonder if ChatGPT will ever be able to “voice act” audiobooks by reading ahead and analyzing content clues, tone, setting, context, etc., maybe even generating sounds or BGM that fit the situation.
@GraveUypo · 1 month ago
this is 100% able to do that. i just want an open model that does this too.
@Katsumi_Maki · 1 month ago
I-It's not like I want to chat with you or anything, baka! 5:07
@Yaksha_Indra · 1 month ago
I was cringing like this: en.wikipedia.org/wiki/Tetanus#/media/File:Opisthotonus_in_a_patient_suffering_from_tetanus_-_Painting_by_Sir_Charles_Bell_-_1809.jpg Almost broke my back.
@HogwartsStudy · 1 month ago
That's crazy!!! I can't wait for this quality in a local Applio type solution.
@blasandresayalagarcia3472 · 1 month ago
That's amazing! I'm actually working on a textbook-to-audiobook project too 😂 I was looking to self-host something but didn't find anything, so I started making it.
@seifuishiguro · 1 month ago
7:01 Exactly, I really wonder where they got all that voice data and how they went about labelling it. Human speech is complicated, with so many different styles of speaking and ways of responding. If only other companies and open-source teams could get their hands on that kind of dataset.
@phenix5609 · 1 month ago
First off, whoaaaa!! The voice gen makes me really sad that we can't get this kind of voice locally yet. Second, YouTube knows me too well, scary sometimes: I was also thinking about making an audiobook reader. I'm really curious about 2-3 things, though, if you're willing to share. Does yours run all locally? If so, what voices do you use or where did you find them, and what do you use to clone them if you clone any? What do you think is best right now for this kind of thing, fully locally? Last thing: I think I saw 24 GB, so you must be running a 3090 or 4090. How much do you think is needed to run this without too much lag or wait time? Can it be done with a 3080 with 10 GB of VRAM and 32 GB of RAM, or is that too low? I hope you can answer me, thx.
@Murderface666 · 1 month ago
The voice made an error speaking Japanese. "Haisai" is Okinawan for "hello."
@Jarods_Journey · 1 month ago
I had never heard haisai until that day lol
@CosmicTavern · 1 month ago
This is INSANE. Need this on local.
@Jarods_Journey · 1 month ago
Same 😭
@shreyashmore1824 · 1 month ago
Yes please Local 🥲
@DrewWalton · 1 month ago
Minor clarification: Standard voice mode (which has been available for a while) is a bog-standard pipeline: it converts your speech to text, runs the transcribed text prompt against the model, and converts the text response back to speech. Advanced Voice mode is *not* text-to-speech or speech-to-text *at all*. GPT-4o is natively multimodal, so what you're actually experiencing is something called "voice-to-voice." That's right, it is, in a very real sense, hearing what you say and how you say it. Per my understanding, the only speech-to-text that really happens is for providing the text transcript of your conversations.
@Jarods_Journey · 1 month ago
This is correct; I make the correction a little later in the video. But yes, it encodes and decodes audio natively without needing to chain other networks together in a pipeline, which is impressive. I think we'll start seeing this paradigm more and more.
@DrewWalton · 1 month ago
@@Jarods_Journey and of course I wrote this comment before getting to that part 😆 But yeah, absolutely. I've been having way too much fun with AV mode. $20 a month for something that's more fun than a barrel o' monkeys and extremely useful to boot? Shut up and take my money.
@leodark_animations2084 · 1 month ago
@@DrewWalton Regarding how the model works in detail, can you find that info on the OpenAI website or somewhere else? I'm pretty interested in understanding it more.
@budbin · 1 month ago
2:15 that's crazy
@saintkamus14 · 1 month ago
"How do you even label that?" I've been wondering that myself. What I think is going on is that they have AIs that do the labeling from the raw data (they first become "experts," then they can do the labeling).
@darkreader01 · 1 month ago
It would be crazy if we could generate an audiobook or hear a full storybook read in that ChatGPT Advanced Voice, with emotions.
@Jarods_Journey · 1 month ago
That day will surely come. The only question is when 😅
@johansparr2409 · 1 month ago
Awesome to hear, also love the audiobook update, looking forward to using it!!😊
@farrael004 · 1 month ago
They probably didn't have labeled data. It's most likely pretrained on a massive amount of audio from YouTube and other recordings in different languages, then fine-tuned using the voices available.
@Jarods_Journey · 1 month ago
I'd have to think the fine-tuning portion had some type of labeling, since it has to understand what whispering is. The pretraining portion was most definitely unsupervised.
@mirek190 · 1 month ago
@@Jarods_Journey actually no... It's the same situation with multimodal LLMs... you're not even training the model with pictures, you're just adding a projector to the LLM and it works just like that... magic
@basspig · 1 month ago
Wow, the Japanese-language ChatGPT voice is pretty darn convincing, although it does sound like a Caucasian person speaking Japanese rather than a Japanese person speaking Japanese. I'll be really impressed when it can do the voice of my favorite anime character.
@나익명 · 1 month ago
Yeah the Australian accent also still sounds like an American person
@KnutNukem · 1 month ago
Pls add a label function like [SPEAKER NAME] to automatically assign lines to a voice. Best would be that all following lines get assigned to that speaker after marking it. Could be a toggleable option. Also let multiple lines be selected and their voice set via CTRL + CLICK or SHIFT + CLICK to mark a bunch.
@Jarods_Journey · 1 month ago
Ooh, this might make for a good alternative tab to set it up for the speaker formats. I'll think about this one
@macmcleod1188 · 1 month ago
6:00 ... mind blown.
@om4741 · 23 days ago
Can you tell me the prompt you used in this video?
@nanieve4296 · 1 month ago
DON'T MIND IF I DO WITH ALL OF THOSE VOICE SETS!
@TonyMezaXD · 1 month ago
The British voice was the only bad impression. It sounded like an American attempting to do a British accent.
@sownheard · 1 month ago
Wait, isn't the voice an American voice pretending to be British?
@spiker.c6058 · 1 month ago
Yeah, the English voices are American unless they add a proper UK English language setting with proper UK English voices. But why would they do that just to get another English accent?
@TonyMezaXD · 1 month ago
@@sownheard If that’s the case then it’s spot on. I just figured since it could switch to other languages easily it should be able to switch English dialects as well.
@TonyMezaXD · 1 month ago
@@spiker.c6058 Oh in that case it was spot on.
@adolphgracius9996 · 1 month ago
Can it say Yaaamite?
@radianthole · 1 month ago
My man got that Nihongo jozu
@matty.j_1997 · 1 month ago
Great demo! How were you able to screen-record the Advanced Voice?
@Jarods_Journey · 1 month ago
Using just my phone's native recorder, then I just synced it up in editing.
@matty.j_1997 · 1 month ago
@@Jarods_Journey Yeah, but it sounds like the voice was also recorded "officially"
@Jarods_Journey · 1 month ago
I guess Samsung is just that good 😅. Just the Samsung screen recorder with media sound enabled
@dthSinthoras · 1 month ago
What's really missing from Audiobook Maker, and would make it usable, is the ability to use other languages.
@dadadies · 1 month ago
What's your audiobook maker? Is it some sort of audio dialog maker with different characters all in one interface? I wonder if it can be adapted for interactive dialog, such as in a game, especially if someone attaches an AI LLM to it that can generate dialog based on the player's interactions. You'd have an even more special system (with other people contributing in those other areas).
@Jarods_Journey · 1 month ago
It's a tool that you can load up a text file and use tortoise TTS/styletts to generate audiobooks with. Currently adding features to it like different speakers for sentences, etc!
@Barrel_Of_Lube · 1 month ago
it pretty much has all the languages (according to an OpenAI dev), that's fking crazy
@Akurallia · 1 month ago
DUUUUDE, this is ABSURDLY incredible! Simply magnificent 🤩
@NFawc · 1 month ago
Looks like after running the generation, the voice settings for each line are lost? I.e., after you ran the generation, the line colours (speakers) were all lost (they all changed to grey)? P.S. VERY interested in the audiobook maker, especially with a good TTS generator.
@Jarods_Journey · 1 month ago
Just for the generation, to show they were complete. If you load the audiobook again, it'll restore the colors and associated speaker
@NFawc · 1 month ago
@@Jarods_Journey Understood. But after a generation, if you then want to regen just a sentence or two again, wouldn't it be good to have colour/settings as before the (full) generation?
@adamrastrand9409 · 1 month ago
When will it be available in Sweden and the EU? Will it be like next week, October 5, or more like in six months or so?
@Jarods_Journey · 1 month ago
I'm not too certain; you might wanna keep up with OpenAI to see when they announce it for non-US countries.
@Snafuuu · 1 month ago
"I can't imagine the data needed for this" I can, it's the whole damn internet 💀
@nomadv7860 · 1 month ago
Just wanted to point out it’s not text-to-speech, it’s able to hear the actual tone and emotion of your voice
@cookiefrnamikaze1674 · 1 month ago
man you are such a masterpiece for asking the Ara ara
@bananalord9288 · 1 month ago
goddammnit I lost it during the japanese demonstration XD
@Mika43344 · 1 month ago
Have you heard about ElevenLabs Reader?
@iseahosbourne9064 · 1 month ago
Hey Jarod, what's the best AI voice cloning tool as of today? RVC, XTTS, Tortoise TTS, etc.?
@Jarods_Journey · 1 month ago
Local? Pipe xtts/tortoise into RVC, and it's still very solid. Via corpo? Elevenlabs for sure.
@iseahosbourne9064 · 1 month ago
@@Jarods_Journey Thanks for the info jarod!
@iseahosbourne9064 · 15 days ago
@@Jarods_Journey Was just thinking, what's the best TTS to train: an XTTS fine-tune, Tortoise? I'm having trouble with XTTS generating consistent speech. Not sure if that's because I trained a model with 21 min of data, doh.
@mal-avcisi9783 · 1 month ago
This is insane
@tylerboy19yp · 1 month ago
hey Jarod, what's the best voice cloning fine-tune TTS model I can run locally right now?
@Jarods_Journey · 1 month ago
Gonna be either xtts/tortoise or styletts, and then I'm trying out parlertts, so we'll see how this comes along to see if I can recommend it
@tylerboy19yp · 1 month ago
@@Jarods_Journey tried the XTTS base model, but I can't seem to get the dependencies to work with the fine-tune model. Have you heard doppleAI's voice models? They sound really good with only 1-3 minutes of audio.
@spiker.c6058 · 1 month ago
Ara ... ara .... my God !!!!
@WistrelChianti · 1 month ago
What a time to be alive!
@Англичанин_Р · 1 month ago
As long as you're not a native-level speaker, the robot will sound human to you. 😅 How long can you chat with a robot without getting annoyed?
@xaiyeon_xiuzhen · 1 month ago
OMG idc if im cooked that was awesome :D
@BoomBillion · 1 month ago
😂try american tourist voice speaking Spanish.
@나익명 · 1 month ago
Same with Australian accent haha
@Airbender131090 · 1 month ago
This is ridiculous 😮 imagine this in video game NPCs 😮 that's crazy
@ariverosmg · 1 month ago
You don't need to label things; it "understands" what you mean. It learned to "reason" slightly, so no human is needed in that loop.
@Jarods_Journey · 1 month ago
I firmly believe a little bit of labeling is still needed to get it to understand all the pretraining data; I just think in this case they have trained classifiers that can accurately label specific types of data. No human in the loop here, just classification models and lots of compute 😅
@Crazy_Truth · 1 month ago
Plus plan 😢
@Jarods_Journey · 1 month ago
Unfortunately 😅
@justindressler5992 · 1 month ago
Is the Japanese accurate and understandable? I would love to learn Japanese through chat.
@TrC450 · 1 month ago
Yes, at least what was shown here.
@Jarods_Journey · 1 month ago
Understandable, absolutely. It is more than accurate enough, I'd say, but I'm not at a high enough level to determine whether or not it was using natural expressions. Though the anime impersonations were pretty good imo.
@baltakatei · 1 month ago
We're so cooked.
@saint115io · 1 month ago
Wait, are you japanese? 😮😮
@Jarods_Journey · 1 month ago
A small bit :)
@Draggtar · 1 month ago
The answer is: GPT chat creators are otakus
@hypersonicmonkeybrains3418 · 1 month ago
I tried it and the audio quality is atrocious: bitcrushed, compressed, choppy, and analog-sounding. Not at all impressed. I would never consider paying a subscription for an AI voice whose main feature is that it can put on accents. I then tried Gemini voice and it's crystal clear... Inflection's Pi AI, crystal clear... what gives?
@dadadies · 1 month ago
Her dramatic movie-trailer narrator voice skill sucks. It sounds like someone's mom failing at the task. I wonder if she would have sounded better if you'd asked her to make it sound professional.
@Jarods_Journey · 1 month ago
It may have been influenced by earlier portions of chat, but I'd agree it wasn't the best as shown here!
@FrankHouston-v5e · 1 month ago
This is much better than OpenAI's faked voice demo 🧐.