Yup, o3-mini is WORTH your money. Meta Q4 Earnings Prompt. Deepseek and Llama4 Insights

Рет қаралды 17,123

Күн бұрын

Пікірлер: 60

@augmentos 7 күн бұрын

The best 🎉 ACTUAL problems and testing deeper than make me snake nonsense and trolly problem. Fair analysis. Love it IDD

@danielhenderson7050 6 күн бұрын

Just found your channel yesterday, love your work. Definitely gonna use this.

@MrJeeoSoft 7 күн бұрын

Awesome, I like the way you do the hard work!

@kel78v2 2 күн бұрын

This is good. I've also noticed o3's response is more detailed and complete when you structure it down in json structure layers

@antonio_cl 7 күн бұрын

Now i wait each monday with 🍿for your videos 😊

@austrianhiker5475 5 күн бұрын

Very interesting comparison, thanks!

@mulderbm 6 күн бұрын

Great work on the benchmarking. Thanks for sharing that. Better than my own custom pipeline qa flows.

@georgemontgomery1892 7 күн бұрын

I think you're one of my favorites to watch when it comes to A.I.

@BertHeymans 6 күн бұрын

You can hear the cooling fans kick in during the tests.😆

@aaagaming2023 6 күн бұрын

Llama 4 will be interesting. What I cant wait to see is what Anthropic pulls out of their bag to remain competitive.

@haroldpierre1726 7 күн бұрын

DeepSeek R1 has been absolutely amazing for my AI agents! Seriously, a game changer. I was always looking at o1, but the cost was just too much. Switching over from Claude 3.5 to R1 gave me a huge accuracy jump, like from 90% to around 95%! And get this, I found a trick to make it even better! Since my agent summarizes meeting transcripts, I re-run the initial summary again through R1, but I tell it I spotted some errors in the first pass. Boom! It kicks out an even more accurate summary that's way closer to the prompt. I haven't been this excited since GPT-4 was released. By the way, the above was written by Gemini 2.0 Flash 01-21. I asked it to make my comment more coherent and it wrote a humanized version of my statement. I'm impressed.

@isayldz435 7 күн бұрын

what agents? i see only mail checking, reading mail, calender agents... i didnt see real world effect ai agents yet. which agents u are using

@haroldpierre1726 7 күн бұрын

@@isayldz435 I made my own.

@wrong_ideas 7 күн бұрын

Its pathetic. Come on you know,. I know it. Its' useless. It's not a thing. It will never be a thing. Run open source LLMs and close the book on the most ironically named company in history 'Open' AI

@carlhealy 7 күн бұрын

Great video! Seconding other's requests for effectively prompting OA Deep Research.

@BloodlinesNewTimes 7 күн бұрын

Hey man, this was actually really useful... Too many people who are in the space have more guessed and way too much is about the possibility and not the current state of what we have to actually do to guage the most effective way as the space moves faster than anything else in IT... Congratulations on being the Value creator needed brother! Do you have any tips on who to follow for someone who's really into the whole space for all what it's worth?

@tk0150 7 күн бұрын

This prompt within the prompt work is compelling for sure. Hope you share with us as you develop and explore ideas of new user interfaces.

@nguyenquangngoc3993 5 күн бұрын

Awesome❤❤❤

@smicha15 7 күн бұрын

Dude, you are a leader in this particular content category. Please don’t overuse the word VIBE.

@Adrian_Galilea 7 күн бұрын

Thumbs up to the first sentence, thumbs down to the second one.

@findingwisdomdotme 6 күн бұрын

@@Adrian_Galilea net neutrality on effect, net loss on effort

@michealhall7776 7 күн бұрын

A lot of people offering you suggestions on how to make content. I don't see any of them with a successful channel. You do you and keep this stuff coming

@aibeginnertutorials 7 күн бұрын

Brilliant as always. Thank you.

@ATH42069 6 күн бұрын

👑thank you

@MrErick1160 7 күн бұрын

Can you make a video to explain how to use your benchmark eval tool and make our own tests?

@parnashwind 7 күн бұрын

Have to thank R1 for making o3-mini so cheap. Without R1, I think the price of o3-mini will be much much much higher. Thank you R1

@bgtyhnmju7 6 күн бұрын

Great review. I only use GPT for interests and hobby stuff, and as a conversation buddy. That said, if I worked in any kind of job where it could help my productivity, the Pro plan would be a good deal... like a buck fiddy and hour to have a helper. Or put my rates up that much and it's paid for. Sweet.

@petedoyle 7 күн бұрын

Is there a specific plugin you're using for token count? I'd love to use it. Thanks in advance!

@scotter 7 күн бұрын

Digging your videos! FYI: Your "chopping hands" gestures in the background are distracting.

@testy_cool 7 күн бұрын

I really want to like o3-mini. The better reasoning + the speed would've made it the best for me. But it frequently feels like it doesn't follow my instructions as well as DeepSeek.

@micbab-vg2mu 7 күн бұрын

In practice, the problem that Claude 3.5 was unable to solve for me is still not solved by o3-mini. It is a good model, but not revolutionary.

@tk0150 7 күн бұрын

For me, I found Claude shines in character development for a specific agent I am creating. the nuances and the variety. The dynamic nature is noticeably step above the other models I’ve compared with using the same prompts. One reason I love this channel is the comparison on many levels of each of the models and finding their strengths and weaknesses

@xNghtMRxEdgex 7 күн бұрын

I ran it on Gemini Exp Advanced and got the same results as o3 mini high basically. Analyzing text doesn't really require reasoning capabilities. I don't think a Pro plan is worth at all atm. We'll see once they release a big model. I still love your videos!

@TradingLaboratory 7 күн бұрын

Some advanced proompt engineering right there

@gurindersingh1713 7 күн бұрын

please do sonnet 3.5 vs o3 mini in coding benchmark

@martinlyu4663 5 күн бұрын

wow,o1 mini is amazing!

@ryandetzel848 7 күн бұрын

The content is great, the hand constantly waving in the background is distracting though.

@ShawnThuris 7 күн бұрын

I'm guessing it's intended to prove to viewers that this is not an AI generated voice?

@XRROW_ 7 күн бұрын

I honestly didn't notice till you said something!

@rude_people_die_young 6 күн бұрын

Is it possible for any KZbinr to say a number without having to hold up that many fingers? 😅

@samson3523 6 күн бұрын

It’s good leave it

@elhadjisy19 Күн бұрын

Me too didn't even notice that, I'm too focus on the content and listening.

@doctor_snyus 7 күн бұрын

hi! thanks for another awesome video! i`ve seen an aloe1 tool in your vids. can you please share where can i get it, cause i was struggling so much to obtain it. thank you!

@Slitherpy 6 күн бұрын

Questions look like leading questions, you need to already know what’s being discussed in the Meta Report. What if you want to get get unforeseen insights?

@marma6937 7 күн бұрын

and for Deep Research ?

@AbuBakr1 7 күн бұрын

This is possible thanks to Deepseek, else openai will charge $1000 for 03 mini 😅

@isaacking4555 6 күн бұрын

Yes but o1 mini basically performs at the level of o3 mini while still being cheaper. Doesn’t seem that much has improved in that aspect based on your benchmark

@davidcache6802 6 күн бұрын

o3 mini gets things wrong a lot more often than deepseek. In fact, deepseek in my tests outperforms everything openai has, and by a lot. Its the difference between getting information from a junior dev, vs a phd professor..

@TreeLuvBurdpu 7 күн бұрын

I hate the limerick tests of AI. If anything, they prove that limericks were never a sign of genius.

@sfl1986 7 күн бұрын

how do you control the low/medium/high parameter via api?

@gorandigitalnomad 7 күн бұрын

'reasoning_effort' => "high"

@sfl1986 7 күн бұрын

@@gorandigitalnomad whats the default when calling o3-mini as a model?

@gorandigitalnomad 7 күн бұрын

@@sfl1986 "The default reasoning effort for o3‑mini is set to medium when calling it via the API. In other words, if you don’t explicitly specify the reasoning effort parameter (i.e. low or high), the model will use the medium setting by default. " answer from itself 😎