Building with Gemini 2.0: Native audio output

Views: 32,435

1 day ago

Comments: 123

@raghavamrev5245 1 day ago

Yes!! Replace the traditional TTS! Please bring this to Google Play Books! I would love to have my books read to me like an audiobook! Game changer!

@alias_ansuz3336 1 day ago

Up!

@kromanfr 20 hours ago

@@alias_ansuz3336 it's already possible with IIReader

@maxcomperatore 1 day ago

The speed of this is astonishing

@Tinman462 1 day ago

This is how the world ends... one perfectly-pitched whisper at a time 😊

@sj00100 1 day ago

Yeah, remember when the world ended when we had text-to-speech for years

@vectoralphaSec 1 day ago

I'll take it.

@DaveK-q9y 1 day ago

The whisper… all those ASMR YouTube videos were useful

@BeepBeepBeepbop 1 day ago

SOO excited for an alternative to OpenAI Advanced Voice!!!!

@TECHNOSTARTERSS 1 day ago

This is not quite seamless: you are prompting it to speak that way. In the OpenAI one you don’t have to prompt it. It automatically adapts to you, and it’s voice-to-voice, speech-to-speech. This one seems like text-to-speech.

@raydosson2025 1 day ago

@@TECHNOSTARTERSS This one is not text-to-speech. That's why the title is "Native audio output".

@UrYesMan 5 hours ago

This suits Google a lot. Google's been making Gemini very humanlike, and now with this, their vision is even clearer.

@hahoang9542 1 day ago

It's the early Christmas gift from Google

@wassimharzli336 15 hours ago

I think with this, Gemini will become the number one AI, outperforming OpenAI

@AIrtesan 1 day ago

And you even changed the order of the speakers, making the female voice lead. Kudos to the CX team. Wittily played!

@michaelcharlesthearchangel 1 day ago

A man of Native American ancestry has been feeding all AI developers for the last decade, behind the scenes.

@aliettienne2907 21 hours ago

2:31 If Gemini can handle both fast speaking and whispering, it shows how much robust ingenuity and development went into devising the AI model. 😎💯💪🏾👍🏾

@aron2922 1 day ago

Her is truly here

@__J____ff 23 hours ago

it's him & her .... hhhhhhhhh

@aiforculture 1 day ago

Exceptional work 👏 Love the example of the model intelligently adapting to fit the speed of reply it thinks you need.

@Momixer 1 day ago

Yes! Please use this for the different YouTube soundtracks, because right now the generated ones are really bad

@CODE7X 1 day ago

This isn't new, but maybe it's better than what was out there before! Can't wait to try it

@friedpizza262 1 day ago

Whoever made this video is cool

@LaPetiteCuillère 1 day ago

When is it available?

@HUEHUEUHEPony 1 day ago

As soon as Google kills their older products

@aquilesdg4305 1 day ago

I think it already is

@JaBigKneeGap 1 day ago

@@aquilesdg4305 And _where_ exactly is it available?

@Ethereal_Enigma 1 day ago

It's available right now in Google AI Studio @@aquilesdg4305

@MainInternetUser 1 day ago

Right now on AI Studio

@OumarDicko-c5i 1 day ago

I will build my IA girlfriend now 😂

@CODE7X 1 day ago

Haha yes

@games528 1 day ago

Ah yes, Irtafacial Antelligence

@aron2922 1 day ago

@@games528 This is funnier than it should be

@IceMetalPunk 1 day ago

@@games528 In many languages, the adjective comes after the noun.

@flyingstapler1241 1 day ago

@@games528 It's called IA in many languages

@ShubharthakSangharsha 1 day ago

2:32: damnnn ok, I'm impressed 👌 👏

@dliedke 18 hours ago

Rain is good for running, be more excited about the rain, AI!

@DangRenBo 4 hours ago

Can we get word timings output together with the TTS? We need closed captions for accessibility.

@trutenantedboderampt 1 day ago

Great! Now we can hear nonsensical facts from history with native audio output!

@IN-pr3lw 1 day ago

Google doing what OpenAI said they would months ago but that we still didn't get 👏

@cagnazzo82 1 day ago

Actually, with Advanced Voice I was having it speak English, French, Elvish, and Simlish in one sentence. The actual game-changer is being able to prompt the AI to do this. You can do this through voice commands with OpenAI, but for some reason they ignored the ability to prompt for voices. Plus I think the whole 'her' situation got them rattled about voices almost altogether.

@IceMetalPunk 1 day ago

? OpenAI Advanced Voice Mode is already out

@IN-pr3lw 21 hours ago

@@IceMetalPunk It's out, but it's not the one in the ads. It doesn't seem to be audio-to-audio; it still seems to be audio-to-text-to-audio. I could be wrong, but it doesn't feel like what we were shown.

@IceMetalPunk 17 hours ago

@IN-pr3lw No, Advanced Voice Mode (called the audio-preview model in the API) is fully native audio output. It can do sound effects, tone shifts, voice differences, accents, etc.

@IN-pr3lw 16 hours ago

@IceMetalPunk I can't get it to whisper or anything on mine 🤷‍♂️

@niceplace123 23 hours ago

Looks amazing, but did anyone get it to work in the actual AI Studio? I ran into a ton of bugs, especially with non-English languages.

@flamyf 23 hours ago

0:01 Has anyone found the "Video understanding" demo? All the other topics have a video on this channel

@demonsynth 1 day ago

Mind blown. Playing with it now :)

@janjahrademusic 1 day ago

haha yoo that's dope ..well done

@jab3966 12 hours ago

If I have access, would this only be in AI Studio? Or could I use it in code to run other tests, such that the output is audio?

@ThomasOberhoff 1 day ago

This will put so many call-center agents out of work worldwide

@Chaotic-n5n 1 day ago

Bro this thing is crazyyy 😱

@DanielMK 1 day ago

Now that's impressive

@Terrantulla 1 day ago

I can't help but feel like the next decade is going to get very weird

@NutriQlikAI-e4e 15 hours ago

What's the point of releasing 2.0 when not all the features are available to test? Note: Image and audio generation are in private experimental release, under allowlist. All other features are public experimental.

@MichealAngeloArts 1 day ago

I don't have the "Output Format" and "Voice" options under "Model" in AI Studio. I just have the "Token Count" immediately after Model.

@MichealAngeloArts 1 day ago

I've just figured it out: I have to change from "Create Prompt" to "Stream Realtime" in the left pane. However, I can't seem to change the audio effect. Whispering doesn't work for me, although it is demonstrated in the Google post. How can we add these audio effects?

@Kiririn 1 day ago

Is the model in the video the Flash version? I am unable to get it to whisper or laugh or change how it speaks

@shadydragon22 1 day ago

Same here

@ShawnFumo 1 day ago

I think that's the part that is available in January. It is a bit confusing since they ended by saying to go to AI Studio...

@shadydragon22 1 day ago

@@ShawnFumo Oh ok I see! Thanks for clarifying

@RichardPinewood 1 day ago

Level 4 AI is the next big thing, that's when science is gonna get interesting 😎

@GeneralKenobi69420 1 day ago

That thumbnail goes hard

@Kenykore 1 day ago

This is so lovely

@999satyam 1 day ago

ok that Hindi was nice, damn. Is there a paper on this?

@phiarchitect 1 day ago

nicely done

@BroskiPlays 22 hours ago

This is AVM but with fewer restrictions

@RickySupriyadi 17 hours ago

How much? Will I be able to afford it?

@devagarwal3250 1 day ago

woah this is so cool

@MemeConnoisseur 1 day ago

Who's going to fill the hollow, the emptiness? idk, something is super weird about AI-generated audio trying to be friendly and human..

@AnonymousNyanCat-qg6bb 17 hours ago

There is a noticeable difference between Gemini and GPT. I am happier with GPT. No doubt, thanks.

@Fwuzeem 1 day ago

How do we get it?

@rakeshkumarrout2629 1 day ago

Let's start building with Gemini 2.0

@Mediiiicc 22 hours ago

Weird how we can still tell the voice is AI. The lack of errors makes it noticeable. Just like how a jeweler can tell real and fake diamonds apart, since real diamonds have imperfections that fake diamonds don't have.

@AyyazZafar 1 day ago

I tried, but it does not whisper yet.

@snowhan7006 1 day ago

incredible❤❤

@DistortedV12 1 day ago

GOOGLE ships! Pixel phone stocks jumping!

@Happ1ness 1 day ago

Hopefully it's not another lie. We all remember the Gemini "hands on demo".

@DarxKies 21 hours ago

That was fun!

@Blooper1980 1 day ago

Pretty epic

@ruchirahasaranga8076 1 day ago

It does not support the Sinhala language!

@pandoraeeris7860 1 day ago

I need an agent that can use any program on my computer. Just give us AIOS.

@J3R3MI6 1 day ago

Exactly

@CODE7X 1 day ago

Exactly. It's already out, but only for the browser so far, and not publicly released yet... I hope Google releases one :0

@ShpanMan 1 day ago

Nothing that OpenAI's model can't do so far, but hey, more competition is better for everyone!

@IceMetalPunk 1 day ago

Hopefully it has cheaper API access. I blew through so much money just testing a few use cases of the OpenAI audio model through the API.

@DominickZollinger-e3r 1 day ago

❤

@mightynathaniel5355 1 day ago

It would be better and more impressive if it kept the same voice or character when switching languages rather than using a totally different voice for each language. But all fun, and I'm looking forward to using this model.

@ROHIT-wx4nu 1 day ago

This is how TTS ends😂😂😂

@1brokkolibaum 23 hours ago

I wonder why I am able to use it on my PC, but my phone doesn't have 2.0 unlocked 😮‍💨

@vectoralphaSec 1 day ago

AGI is coming soon, in 2025

@JaBigKneeGap 1 day ago

Dude, I swear. Like, that clock slaps 12 on, idk, January 5? I swear AGI will be here. Or anytime afterward.

@3thinking 15 hours ago

Wild!

@stochastic84 17 hours ago

I thought the voice felt very unnatural, and then the video answered why lol.

@braineaterzombie3981 1 day ago

Gimme my Sandevistan, time to get chromed up

@pathring 1 day ago

The Korean speech is a little unnatural.

@lakshiBro 1 day ago

Oh well.

@MidgarMerc 23 hours ago

Surely this won't be used to cause suffering at the expense of talented voice actors just so rich creeps get even richer. Surely.

@4letterdc 1 day ago

hell yeah

@InternetKilledTV21 1 day ago

Oh Calculon

@andreaserrano3809 22 hours ago

Been playing with it and it only supports English lol... I guess this is just a smaller demo model to show for now

@eve_mtpl 18 hours ago

It sounds like a hypocritical IA, can we go back to the "I sound like an IA" IA?

@AstroZoe1804 1 day ago

I love it

@dfas1497tcf3 17 hours ago

I tried it, but it hasn't been rolled out yet.

@ashleigh3021 1 day ago

I don’t like the tone and cadence structure. They should call it “podcast voice”

@Selene-xf9yi 19 hours ago

Cool

@BoydLIN-c3w 1 day ago

I haven’t found Chinese 😂

@notvedxp 1 day ago

😮

@donny6036 13 hours ago

This is way too natural. I have to keep telling myself that this is generated.

@alejandromedina1019 9 hours ago

banger

@joelcarter9137 1 day ago

Wow! That is completely pointless!

@bluepandaman 1 day ago

What.. are you even talking about. How is this pointless?

@SabiUddin 23 hours ago

Copium

@naughtycat9894 1 day ago

the most exciting thing 🎉