X AI fooled me with grok 2 and sus-column-r...

  Views: 2,087

Chris Hay

Day ago

A new model called sus-column-r appeared alongside an anonymous model in the LMSYS Chatbot Arena. In this video, Chris shows how a model's knowledge cutoff, together with its knowledge of Taylor Swift and Beyoncé, can be used to work out who probably created the model. He also gets the answer wrong, because Grok 2 from xAI wasn't in his testing, but his method and reasoning are correct.
Chris also explores the math and reasoning capabilities of this model, and guesses at its size and whether it is a Project Strawberry or Q* model, by pitting it against other models, especially gpt-4o, gpt-4o-mini, Gemini and Claude.

Comments: 15
@chrishayuk 4 months ago
turns out X AI created this model, which explains the issues i had with the math and reasoning parts.
@Redgta6 4 months ago
good job for fixing the title lol
@chrishayuk 4 months ago
wasn't a massive change looool
@danielhenderson7050 4 months ago
Tbh I didn't even consider Grok in the possible models!
@everyhandletaken 4 months ago
Nice one Chris, interesting!
@xxxNERIxxx1994 4 months ago
evenphototaken
@chrishayuk 4 months ago
Glad you enjoyed it
@shuntera 4 months ago
You need a wee edit at 0:53 :-)
@chrishayuk 4 months ago
hahaha, i missed this, was a quick edit last night
@chrishayuk 4 months ago
fixed, and updated, thanks for the heads up
@danielhenderson7050 4 months ago
I love your videos. You should be way more popular than someone I won't mention 😅
@chrishayuk 4 months ago
very kind, but honestly not about popularity, this channel is really just about getting thoughts out my head