Reasoning LLMs battle: Qwen QwQ vs OpenAI o1 vs o1-mini vs DeepSeek R1.

Views: 3,104

YJxAI

A day ago

Comments: 27

@jeffwads a month ago

Good informative video. A suggestion: a chart at the end with pass or fail for the models.

@YJxAI a month ago

Good suggestion, thanks for that.
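
A pass/fail chart like the one suggested above could be produced with a few lines of Python. This is only a sketch: the function name `render_pass_fail` and the shape of the `results` mapping are assumptions, and no actual results from the video are filled in.

```python
def render_pass_fail(results):
    """Render a plain-text pass/fail chart.

    `results` maps a model name to {question label: True/False}.
    Questions a model was not asked are shown as "n/a".
    """
    # Collect every question label that appears for any model.
    questions = sorted({q for answers in results.values() for q in answers})
    name_width = max([len("model")] + [len(name) for name in results])
    # Each column is at least 4 wide so "pass"/"fail" fit.
    col_widths = [max(len(q), 4) for q in questions]

    header = ["model".ljust(name_width)]
    header += [q.ljust(w) for q, w in zip(questions, col_widths)]
    rows = [header]

    for name, answers in results.items():
        cells = []
        for q, w in zip(questions, col_widths):
            outcome = answers.get(q)
            if outcome is None:
                mark = "n/a"
            else:
                mark = "pass" if outcome else "fail"
            cells.append(mark.ljust(w))
        rows.append([name.ljust(name_width)] + cells)

    return "\n".join(" | ".join(row).rstrip() for row in rows)
```

Calling it with placeholder labels such as `{"model_a": {"q1": True, "q2": False}}` returns one header line plus one line per model; the real model names and outcomes would have to be taken from the video itself.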

@madeniran a month ago

For the Chinese models, try swapping the word "unicorn" with "Qilin" or "Kirin". They somewhat resemble a unicorn: a horned horse.

@YJxAI a month ago

Hmm. Should try this.

@TheDiamondHawkOfficial a month ago

Thanks for the info, bro.

@YJxAI a month ago

Welcome, bro :)

@wwkk4964 a month ago

Great content!

@YJxAI a month ago

Thank you very much.

@ZachKang-c2p a month ago

great video!

@emport2359 a month ago

Sick video man

@YJxAI a month ago

thanks man : )

@iamboring2535 a month ago

Gemini 1121 got all the questions right except for the earnings problem and the unicorn SVG.

@YJxAI a month ago

A video on it is coming up :). I had actually planned that, but then this reasoning mode dropped.

@tescOne a month ago

LOL that's the funniest thing. The actual "strawberry" model can perfectly guess how many r's are in "strawberry", but if you make it just a tiny bit more complicated, it fails as badly as before. @Chollet would laugh at this so much xD

@YJxAI a month ago

😂
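
For anyone who wants to build slightly harder variants of the "strawberry" question, the ground truth is easy to compute in code. The snippet below is a minimal sketch; the helper name `count_letter` is an assumption, and it simply counts case-insensitive occurrences of a single letter.

```python
def count_letter(text, letter):
    """Count case-insensitive occurrences of one letter in `text`."""
    if len(letter) != 1:
        raise ValueError("letter must be a single character")
    return text.casefold().count(letter.casefold())


print(count_letter("strawberry", "r"))  # prints 3
```

The same helper works on whole phrases, so the expected answer for a trickier prompt can be checked before scoring a model's reply.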

@sangeetanarendrasingh5416 a month ago

Did you write the prompts yourself or did you get them from someplace?

@YJxAI a month ago

I picked them up from various exams. The earnings problem I made myself. When o1 was released and I tested it personally, it shattered my questions, so I came up with that one. Thanks for noticing.

@underTheStorm a month ago

So which is best?

@YJxAI a month ago

It's o1, but we also see that you might get away with o1-mini.
1. o1 (good overall)
2. o1-mini (good when you have a very specific issue)
3. DeepSeek R1 (could be cheaper than the two, but the API release will tell)
4. Qwen QwQ (the cheapest; DeepSeek R1's API will tell if it retains that. Brings reasoning abilities to actually usable prices.)
I hope it was helpful. :)

@JEHOASJY a month ago

When OpenAI makes a breakthrough, other companies soon follow.

@Ahmadtayyem a month ago

But OpenAI is not actually open! All models depend on Google's research on transformers, even ChatGPT.

@haroldpierre1726 a month ago

They have no moat!

@successahead5598 a month ago

Windsurf taking over

@YJxAI a month ago

I built my first Android application with it yesterday. Tears 🥹

@harriehausenman8623 a month ago

Pretty much meaningless. Via the web interface, you never know what model version you get, and OpenAI especially is known for running A/B tests. So you have to use the API. And a temperature above 0 makes no sense for these kinds of tests.

@YJxAI a month ago

I get you, bro, but the point is that API pricing is off the charts (o1-preview), and in most cases people will be using the ChatGPT version. Yes, there could be internal system prompt changes or hidden A/B tests (that is a downside, but it happens rarely). Known A/B tests are visible, so we know when they come. All in all, I get your point. I have thought about this and other things like the factuality of models (you can watch my "Can you trust LLMs" video). I have some plans to take these into account, but if I am being honest, I am a little busy with something family-related. I will try to get it implemented ASAP.

@harriehausenman8623 a month ago

@YJxAI Makes sense! 😉 You could just use the example Python implementations for your tests. Just an idea.
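
The API-based testing discussed in this thread could be organized roughly as follows. This is a hedged sketch, not any vendor's example code: `ask` is a hypothetical callable that wraps whatever API client is used, pinned to one model version with deterministic sampling where the model allows it, and `run_eval` is an illustrative name. Repeating each prompt makes it visible when answers vary between runs.

```python
from collections import Counter


def run_eval(ask, cases, repeats=3):
    """Run each test case several times and summarize the answers.

    ask     -- hypothetical callable: prompt (str) -> answer (str)
    cases   -- {label: (prompt, expected_answer)}
    repeats -- how many times each prompt is sent
    """
    report = {}
    for label, (prompt, expected) in cases.items():
        answers = [ask(prompt).strip() for _ in range(repeats)]
        counts = Counter(answers)
        report[label] = {
            # Passed only if every run matched the expected answer exactly.
            "passed": all(answer == expected for answer in answers),
            # Consistent if all runs produced the same answer.
            "consistent": len(counts) == 1,
            "answers": dict(counts),
        }
    return report
```

Because `ask` is just a function, the same harness can be pointed at different models by swapping the wrapper, and a stub can stand in for the API while the prompts are being prepared. Exact string matching is a deliberate simplification; free-form answers would need a looser comparison.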