Exploring OpenAI's New GPT-4o Audio Preview Model: The Future of AI Audio Processing

Рет қаралды 3,560

Күн бұрын

Пікірлер: 19

@BartSlodyczka 2 ай бұрын

🗂 GET ALL THE CODE FILES: bartslodyczka.gumroad.com/l/jeznwq 📋 Take This Quick Survey: forms.gle/otAr1xUamgyYZE5y7 📺Realtime API Tutorial Series: kzbin.info/aero/PLi7jtY2ZZqRYE8Lvw4MuLHTZPYTA4jZHQ&si=7DAE9z7YtQlMrzrd

@AngeloXification 2 ай бұрын

Instant subscription, then I saw you build and provide resources. Excellent content.

@BartSlodyczka 2 ай бұрын

Thanks legend 🤝

@alexanderkingstam5164 2 ай бұрын

You are very pedagogic and explaining very well. Thanks for sharing!

@BartSlodyczka 2 ай бұрын

thank you very much, appreciate this comment 🙏

@GiovanneAfonso 2 ай бұрын

very well structured video and test, great work! hope you do more videos

@BartSlodyczka 2 ай бұрын

thanks legend! Will do 💪

@derherrdirector Ай бұрын

You are an absolute legend! You should have millions of subscribers

@BartSlodyczka Ай бұрын

haha! thank you my man!

@pixelperfectpravin 2 ай бұрын

Most onpoint video 😍 i appreciate you

@BartSlodyczka 2 ай бұрын

thanks man! I appreciate you too 💪

@Rhiever Ай бұрын

If you’re just performing audio to text, is it necessary to specify both text and audio modalities? Will the model just ignore the audio file if you don’t specify both modalities?

@BartSlodyczka Ай бұрын

I haven't tested if the model will ignore it and yeah also not sure if you need to specify both. Made this code a couple weeks back and can't recall from the top of my head 🙏

@vsigal 2 ай бұрын

is it doing diarizarion? separation voices - voice1 - voice2 etc?

@BartSlodyczka 2 ай бұрын

I just tested using short audio with 2 speakers talking to each other. I asked for a transcript of the convo broken down by speaker and it gave me the below: **Speaker 1:** So, Erin, in your email you said you wanted to talk about the exam. **Speaker 2:** Yeah, um, I've just never taken a class with so many different readings. I've managed to keep up with all the assignments, but I'm not sure how to... how to... **Speaker 1:** How to review everything? **Speaker 2:** Yeah. In other classes I've had, there's usually just one book to review, not three different books. Plus all those other text excerpts and videos...

@vsigal 2 ай бұрын

@@BartSlodyczka wow wow, I will try. thank you

@yurijmikhassiak7342 2 ай бұрын

Thanks. How is that different from whisper voice to text? For voice to text usecase? The price difference is 10x. Is it faster? Is Quality better? The price looks stull very high. Like 20$/ hour of voice conversation. Almost, the cost of hiring humans for talking).

@BartSlodyczka 2 ай бұрын

Haven't done any work with whisper voice to text so i cant say, but in the demo I show this new audio model recognise abstract sounds and not just speech. So if whisper is cheaper for now, then you might stick with that for speech to text. Whereas for more dynamic sound recognition, you can use this audio model