Good informative video. A suggestion: a chart at the end with pass or fail for the models.
@YJxAIАй бұрын
good suggestion thanks for that.
@madeniranАй бұрын
For the Chinese models try swapping the word Unicorn with Qilin or Kirin. They somewhat resemble a Unicorn - Horned Horse.
@YJxAIАй бұрын
hmm. SHould try this.
@TheDiamondHawkOfficialАй бұрын
Thanks for the info bro,
@YJxAIАй бұрын
welcome bro :)
@wwkk4964Ай бұрын
Great content!
@YJxAIАй бұрын
Thank you very much.
@ZachKang-c2pАй бұрын
great video!
@emport2359Ай бұрын
Sick video man
@YJxAIАй бұрын
thanks man : )
@iamboring2535Ай бұрын
gemini 1121 got all the questions right expect for the earnings problem and the unicorn svg
@YJxAIАй бұрын
comming up with it's video :). Actually planned that but this reasoning mode dropped.
@tescOneАй бұрын
LOL that's the funniest thing. The actual "strawberry" model can perfectly guess how many r's are in "strawberry", but if you make it just a tiny bit more complicated, it fails as bad as before. @Chollet would laugh at this so much xD
@YJxAIАй бұрын
😂
@sangeetanarendrasingh5416Ай бұрын
Did you write the prompts yourself or did you get them from someplace?
@YJxAIАй бұрын
i have picked them up from various exams. The earning problem i made it. It was when o1 was released and when i tested it personally it shattered my questions so came up with that. Thanks for noticing.
@underTheStormАй бұрын
So which is best?
@YJxAIАй бұрын
It's the o1 but we also see that you might also get away with o1-mini. 1.o1 (good overall) 2.o1-mini (Good when you have very specific issue ) 3.Deepseek r1( could be cheaper than the too but api release will tell) 4.QwenQWQ ( The cheapest , Deepseek r1's api will tell if it retains that. Brings reasoning abilities to actual usable prices.) I hope it was helpful. :)
@JEHOASJYАй бұрын
When openAI makes a breakthrough other companies soon followed.
@AhmadtayyemАй бұрын
But openai is not actually open! All models are depends on the google research for the transformars even chatgpt
@haroldpierre1726Ай бұрын
They have no moat!
@successahead5598Ай бұрын
windsulf taking over
@YJxAIАй бұрын
I build first android application with it yesterday. tears🥹
@harriehausenman8623Ай бұрын
pretty much meaningless. via the webinterface, you never know what model version you get and esp. OpenAI is known for making A-B tests. So you have to use the API. And a temperature above 0 makes no sense for these kind of tests.
@YJxAIАй бұрын
I get you bro but the point is. API pricing is of the charts. (o1-preview) And people will be using in most of the cases the chatgpt version. Yes.There could be internal system prompt change . Hidden AB tests. (yeah that is a downside but happens rarely.) Known AB tests are there and visible so we know when they come. All in all i get your point. I have thought about this and other things like factuality of models ( you can watch my "Can you trust LLMs" video). i have some plans to take these into account but. If i am being honest i am little busy on something related to family but i will try to get it implemented ASAP.
@harriehausenman8623Ай бұрын
@@YJxAI Makes sense! 😉 You could just use the example python implementations for your tests. Just an idea.