🗂 GET ALL THE CODE FILES: bartslodyczka.gumroad.com/l/jeznwq
📋 Take This Quick Survey: forms.gle/otAr1xUamgyYZE5y7
📺 Realtime API Tutorial Series: kzbin.info/aero/PLi7jtY2ZZqRYE8Lvw4MuLHTZPYTA4jZHQ&si=7DAE9z7YtQlMrzrd
@AngeloXification 2 months ago
Instant subscription, then I saw you build and provide resources. Excellent content.
@BartSlodyczka 2 months ago
Thanks legend 🤝
@alexanderkingstam5164 2 months ago
You are very pedagogical and explain things very well. Thanks for sharing!
@BartSlodyczka 2 months ago
thank you very much, appreciate this comment 🙏
@GiovanneAfonso 2 months ago
very well structured video and test, great work! hope you do more videos
@BartSlodyczka 2 months ago
thanks legend! Will do 💪
@derherrdirector 1 month ago
You are an absolute legend! You should have millions of subscribers
@BartSlodyczka 1 month ago
haha! thank you my man!
@pixelperfectpravin 2 months ago
Most on-point video 😍 I appreciate you
@BartSlodyczka 2 months ago
thanks man! I appreciate you too 💪
@Rhiever 1 month ago
If you’re just performing audio to text, is it necessary to specify both text and audio modalities? Will the model just ignore the audio file if you don’t specify both modalities?
@BartSlodyczka 1 month ago
I haven't tested whether the model will ignore it, and I'm also not sure if you need to specify both. I made this code a couple of weeks back and can't recall off the top of my head 🙏
@vsigal 2 months ago
Is it doing diarization? Separating voices into voice 1, voice 2, etc.?
@BartSlodyczka 2 months ago
I just tested using a short audio clip with 2 speakers talking to each other. I asked for a transcript of the convo broken down by speaker and it gave me the below:
**Speaker 1:** So, Erin, in your email you said you wanted to talk about the exam.
**Speaker 2:** Yeah, um, I've just never taken a class with so many different readings. I've managed to keep up with all the assignments, but I'm not sure how to... how to...
**Speaker 1:** How to review everything?
**Speaker 2:** Yeah. In other classes I've had, there's usually just one book to review, not three different books. Plus all those other text excerpts and videos...
@vsigal 2 months ago
@BartSlodyczka wow wow, I will try. Thank you
@yurijmikhassiak7342 2 months ago
Thanks. How is this different from Whisper voice-to-text, for the voice-to-text use case? The price difference is 10x. Is it faster? Is the quality better? The price still looks very high, like $20/hour of voice conversation. That's almost the cost of hiring humans to talk.
@BartSlodyczka 2 months ago
Haven't done any work with Whisper voice-to-text so I can't say, but in the demo I show this new audio model recognising abstract sounds and not just speech. So if Whisper is cheaper for now, you might stick with that for speech-to-text, whereas for more dynamic sound recognition you can use this audio model.