The best 🎉 ACTUAL problems and testing deeper than make me snake nonsense and trolly problem. Fair analysis. Love it IDD
@danielhenderson70506 күн бұрын
Just found your channel yesterday, love your work. Definitely gonna use this.
@MrJeeoSoft7 күн бұрын
Awesome, I like the way you do the hard work!
@kel78v22 күн бұрын
This is good. I've also noticed o3's response is more detailed and complete when you structure it down in json structure layers
@antonio_cl7 күн бұрын
Now i wait each monday with 🍿for your videos 😊
@austrianhiker54755 күн бұрын
Very interesting comparison, thanks!
@mulderbm6 күн бұрын
Great work on the benchmarking. Thanks for sharing that. Better than my own custom pipeline qa flows.
@georgemontgomery18927 күн бұрын
I think you're one of my favorites to watch when it comes to A.I.
@BertHeymans6 күн бұрын
You can hear the cooling fans kick in during the tests.😆
@aaagaming20236 күн бұрын
Llama 4 will be interesting. What I cant wait to see is what Anthropic pulls out of their bag to remain competitive.
@haroldpierre17267 күн бұрын
DeepSeek R1 has been absolutely amazing for my AI agents! Seriously, a game changer. I was always looking at o1, but the cost was just too much. Switching over from Claude 3.5 to R1 gave me a huge accuracy jump, like from 90% to around 95%! And get this, I found a trick to make it even better! Since my agent summarizes meeting transcripts, I re-run the initial summary again through R1, but I tell it I spotted some errors in the first pass. Boom! It kicks out an even more accurate summary that's way closer to the prompt. I haven't been this excited since GPT-4 was released. By the way, the above was written by Gemini 2.0 Flash 01-21. I asked it to make my comment more coherent and it wrote a humanized version of my statement. I'm impressed.
@isayldz4357 күн бұрын
what agents? i see only mail checking, reading mail, calender agents... i didnt see real world effect ai agents yet. which agents u are using
@haroldpierre17267 күн бұрын
@@isayldz435 I made my own.
@wrong_ideas7 күн бұрын
Its pathetic. Come on you know,. I know it. Its' useless. It's not a thing. It will never be a thing. Run open source LLMs and close the book on the most ironically named company in history 'Open' AI
@carlhealy7 күн бұрын
Great video! Seconding other's requests for effectively prompting OA Deep Research.
@BloodlinesNewTimes7 күн бұрын
Hey man, this was actually really useful... Too many people who are in the space have more guessed and way too much is about the possibility and not the current state of what we have to actually do to guage the most effective way as the space moves faster than anything else in IT... Congratulations on being the Value creator needed brother! Do you have any tips on who to follow for someone who's really into the whole space for all what it's worth?
@tk01507 күн бұрын
This prompt within the prompt work is compelling for sure. Hope you share with us as you develop and explore ideas of new user interfaces.
@nguyenquangngoc39935 күн бұрын
Awesome❤❤❤
@smicha157 күн бұрын
Dude, you are a leader in this particular content category. Please don’t overuse the word VIBE.
@Adrian_Galilea7 күн бұрын
Thumbs up to the first sentence, thumbs down to the second one.
@findingwisdomdotme6 күн бұрын
@@Adrian_Galilea net neutrality on effect, net loss on effort
@michealhall77767 күн бұрын
A lot of people offering you suggestions on how to make content. I don't see any of them with a successful channel. You do you and keep this stuff coming
@aibeginnertutorials7 күн бұрын
Brilliant as always. Thank you.
@ATH420696 күн бұрын
👑thank you
@MrErick11607 күн бұрын
Can you make a video to explain how to use your benchmark eval tool and make our own tests?
@parnashwind7 күн бұрын
Have to thank R1 for making o3-mini so cheap. Without R1, I think the price of o3-mini will be much much much higher. Thank you R1
@bgtyhnmju76 күн бұрын
Great review. I only use GPT for interests and hobby stuff, and as a conversation buddy. That said, if I worked in any kind of job where it could help my productivity, the Pro plan would be a good deal... like a buck fiddy and hour to have a helper. Or put my rates up that much and it's paid for. Sweet.
@petedoyle7 күн бұрын
Is there a specific plugin you're using for token count? I'd love to use it. Thanks in advance!
@scotter7 күн бұрын
Digging your videos! FYI: Your "chopping hands" gestures in the background are distracting.
@testy_cool7 күн бұрын
I really want to like o3-mini. The better reasoning + the speed would've made it the best for me. But it frequently feels like it doesn't follow my instructions as well as DeepSeek.
@micbab-vg2mu7 күн бұрын
In practice, the problem that Claude 3.5 was unable to solve for me is still not solved by o3-mini. It is a good model, but not revolutionary.
@tk01507 күн бұрын
For me, I found Claude shines in character development for a specific agent I am creating. the nuances and the variety. The dynamic nature is noticeably step above the other models I’ve compared with using the same prompts. One reason I love this channel is the comparison on many levels of each of the models and finding their strengths and weaknesses
@xNghtMRxEdgex7 күн бұрын
I ran it on Gemini Exp Advanced and got the same results as o3 mini high basically. Analyzing text doesn't really require reasoning capabilities. I don't think a Pro plan is worth at all atm. We'll see once they release a big model. I still love your videos!
@TradingLaboratory7 күн бұрын
Some advanced proompt engineering right there
@gurindersingh17137 күн бұрын
please do sonnet 3.5 vs o3 mini in coding benchmark
@martinlyu46635 күн бұрын
wow,o1 mini is amazing!
@ryandetzel8487 күн бұрын
The content is great, the hand constantly waving in the background is distracting though.
@ShawnThuris7 күн бұрын
I'm guessing it's intended to prove to viewers that this is not an AI generated voice?
@XRROW_7 күн бұрын
I honestly didn't notice till you said something!
@rude_people_die_young6 күн бұрын
Is it possible for any KZbinr to say a number without having to hold up that many fingers? 😅
@samson35236 күн бұрын
It’s good leave it
@elhadjisy19Күн бұрын
Me too didn't even notice that, I'm too focus on the content and listening.
@doctor_snyus7 күн бұрын
hi! thanks for another awesome video! i`ve seen an aloe1 tool in your vids. can you please share where can i get it, cause i was struggling so much to obtain it. thank you!
@Slitherpy6 күн бұрын
Questions look like leading questions, you need to already know what’s being discussed in the Meta Report. What if you want to get get unforeseen insights?
@marma69377 күн бұрын
and for Deep Research ?
@AbuBakr17 күн бұрын
This is possible thanks to Deepseek, else openai will charge $1000 for 03 mini 😅
@isaacking45556 күн бұрын
Yes but o1 mini basically performs at the level of o3 mini while still being cheaper. Doesn’t seem that much has improved in that aspect based on your benchmark
@davidcache68026 күн бұрын
o3 mini gets things wrong a lot more often than deepseek. In fact, deepseek in my tests outperforms everything openai has, and by a lot. Its the difference between getting information from a junior dev, vs a phd professor..
@TreeLuvBurdpu7 күн бұрын
I hate the limerick tests of AI. If anything, they prove that limericks were never a sign of genius.
@sfl19867 күн бұрын
how do you control the low/medium/high parameter via api?
@gorandigitalnomad7 күн бұрын
'reasoning_effort' => "high"
@sfl19867 күн бұрын
@@gorandigitalnomad whats the default when calling o3-mini as a model?
@gorandigitalnomad7 күн бұрын
@@sfl1986 "The default reasoning effort for o3‑mini is set to medium when calling it via the API. In other words, if you don’t explicitly specify the reasoning effort parameter (i.e. low or high), the model will use the medium setting by default. " answer from itself 😎
@ThaiTouan-i5f7 күн бұрын
Did you have to read all the Meta report by yourself to get your benchmark correct answers?
@alirezashekari76747 күн бұрын
Would you pls create a general promt for oa deep search?
@jasontr20117 күн бұрын
Interestingly, this video was created by O3 mini. Nice job AI 👍
@aimattant5 күн бұрын
it really isn't that good compared with Claude when doing coding