First off, I don't have much in-depth knowledge of deep learning... Can you add xAI Grok to your comparisons? I'm also an LLM enthusiast, and I'm impressed with its reasoning abilities. It's more consistent at generating results than Gemini/Claude/GPT, its code generation/reasoning is way more powerful, and it's now free to use. What I'm going to tell you might stir up some controversy😂, but from my POV, my list is: 1) Grok/Claude 2) Copilot/GPT 3) Gemini. I'll make Grok my go-to LLM tool. Note: I'm not an Elon fanboy🙂 You're so underrated, you need more recognition.
@YJxAI 23 days ago
Thanks, that means a lot. There is nothing wrong with having a list of your own. That is why I say don't take my results as the ultimate fact. If the LMSYS guys can keep GPT-4 above o1, then I think our lists are way better than that.
@shreyam1008 24 days ago
Great video, but wouldn't Claude 3.5 Sonnet be a better comparison, since it's their latest model, at the base model fee? I would like to see a similar test, plus more general daily-usage problem tests from different fields, with Sonnet.
@saikatkarmakar6633 20 days ago
Change the temperature from 1 to 2 in Gemini Flash, and then see the accuracy.
@davidcampos8952 19 days ago
Can you *please* share that *LLM Test* so we can see and use it too, and test models ourselves?
@YJxAI 19 days ago
Yes bro, working on it. I'll have to make some changes to make it more dynamic, but I'll surely do it.
@davidcampos8952 18 days ago
@YJxAI thank you so much!
@iamboring2535 21 days ago
How can you get consistency when you are using a temperature of 1 and a top-p value of 0.95? If you want consistency, you must set these to low values.
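
To illustrate the point about temperature and top-p, here is a rough sketch of how the two settings reshape a next-token distribution. The logits are made-up numbers and the helper names are hypothetical; this is a toy model of the general sampling technique, not how Gemini or any specific API implements it.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    kept_mass = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / kept_mass
    return filtered

# Hypothetical logits for four candidate tokens (illustrative values only).
logits = [2.0, 1.0, 0.5, -1.0]

for temperature in (0.2, 1.0, 2.0):
    probs = top_p_filter(softmax_with_temperature(logits, temperature), top_p=0.95)
    print(temperature, [round(p, 3) for p in probs])
```

The lower the temperature, the more probability piles onto the single most likely token, so repeated runs tend to give the same answer. At higher temperatures the probability is spread across more tokens and outputs vary more from run to run.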