Reasoning LLMs battle: Qwen QwQ vs OpenAI o1 vs o1-mini vs DeepSeek R1

3,104 views

YJxAI

Day ago

Comments: 27
@jeffwads month ago
Good, informative video. A suggestion: add a chart at the end showing pass or fail for each model.
@YJxAI month ago
Good suggestion, thanks for that.
@madeniran month ago
For the Chinese models, try swapping the word "unicorn" with Qilin or Kirin. They somewhat resemble a unicorn (a horned horse).
@YJxAI month ago
Hmm. Should try this.
@TheDiamondHawkOfficial month ago
Thanks for the info, bro.
@YJxAI month ago
Welcome, bro :)
@wwkk4964 month ago
Great content!
@YJxAI month ago
Thank you very much.
@ZachKang-c2p month ago
great video!
@emport2359 month ago
Sick video man
@YJxAI month ago
thanks man : )
@iamboring2535 month ago
Gemini 1121 got all the questions right except for the earnings problem and the unicorn SVG.
@YJxAI month ago
Its video is coming up :). I had actually planned that, but then this reasoning mode dropped.
@tescOne month ago
LOL, that's the funniest thing. The actual "strawberry" model can perfectly guess how many r's are in "strawberry", but if you make it just a tiny bit more complicated, it fails as badly as before. @Chollet would laugh at this so much xD
@YJxAI month ago
😂
@sangeetanarendrasingh5416 month ago
Did you write the prompts yourself or did you get them from someplace?
@YJxAI month ago
I picked them up from various exams. The earnings problem I made myself. When o1 was released and I tested it personally, it shattered my questions, so I came up with that one. Thanks for noticing.
@underTheStorm month ago
So which is best?
@YJxAI month ago
It's o1, but we also see that you might get away with o1-mini.
1. o1 (good overall)
2. o1-mini (good when you have a very specific issue)
3. DeepSeek R1 (could be cheaper than those two, but the API release will tell)
4. Qwen QwQ (the cheapest; DeepSeek R1's API release will tell whether it retains that. Brings reasoning abilities to actually usable prices.)
I hope this was helpful. :)
@JEHOASJY month ago
When OpenAI makes a breakthrough, other companies soon follow.
@Ahmadtayyem month ago
But OpenAI is not actually open! All models depend on Google's research on transformers, even ChatGPT.
@haroldpierre1726 month ago
They have no moat!
@successahead5598 month ago
Windsurf taking over
@YJxAI month ago
I built my first Android application with it yesterday. Tears 🥹
@harriehausenman8623 month ago
Pretty much meaningless. Via the web interface, you never know what model version you get, and OpenAI especially is known for running A/B tests. So you have to use the API. And a temperature above 0 makes no sense for these kinds of tests.
@YJxAI month ago
I get you, bro, but the point is that API pricing is off the charts (o1-preview), and in most cases people will be using the ChatGPT version. Yes, there could be internal system prompt changes and hidden A/B tests (that is a downside, but it happens rarely). Known A/B tests are visible, so we know when they come. All in all, I get your point. I have thought about this and other things, like the factuality of models (you can watch my "Can you trust LLMs" video). I have some plans to take these into account, but to be honest I am a little busy with something family-related. I will try to get it implemented ASAP.
@harriehausenman8623 month ago
@YJxAI Makes sense! 😉 You could just use the example Python implementations for your tests. Just an idea.
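Editor's sketch of what such a scripted test could look like, combining the repeatability point above with the pass/fail chart suggested earlier in the thread. This is a minimal sketch, not the channel's actual setup: `run_suite`, `format_table`, `fake_ask`, the model names, and the single test case are all hypothetical. A real `ask` function would wrap whatever API client is used, and pinning the model version and sampling settings there is left as an assumption.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical signature: (model_name, prompt) -> answer text.
# A real implementation would call an API with a pinned model version.
AskFn = Callable[[str, str], str]
# A case is (prompt, checker); the checker returns True when the answer passes.
Case = Tuple[str, Callable[[str], bool]]


def run_suite(ask: AskFn, models: List[str], cases: Dict[str, Case],
              runs: int = 3) -> Dict[str, Dict[str, int]]:
    """Ask every model every prompt `runs` times and count the passes."""
    results: Dict[str, Dict[str, int]] = {}
    for model in models:
        results[model] = {}
        for name, (prompt, check) in cases.items():
            passes = sum(check(ask(model, prompt)) for _ in range(runs))
            results[model][name] = passes
    return results


def format_table(results: Dict[str, Dict[str, int]], runs: int) -> str:
    """One line per model; PASS only if every run of a case passed."""
    lines = []
    for model, per_case in results.items():
        cells = [
            f"{name}: {'PASS' if n == runs else 'FAIL'} ({n}/{runs})"
            for name, n in per_case.items()
        ]
        lines.append(f"{model:<10} " + " | ".join(cells))
    return "\n".join(lines)


def fake_ask(model: str, prompt: str) -> str:
    """Stand-in for a real API call so the sketch runs offline."""
    return "3" if model == "model-a" else "2"


# Illustrative case only; the video's actual prompts are not reproduced here.
CASES: Dict[str, Case] = {
    "strawberry": ("How many r's are in 'strawberry'?", lambda a: "3" in a),
}

if __name__ == "__main__":
    res = run_suite(fake_ask, ["model-a", "model-b"], CASES, runs=3)
    print(format_table(res, 3))
```

Running each prompt several times and reporting the pass count makes flaky answers visible instead of hiding them behind a single lucky or unlucky sample.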