New LLM test meta: Tetris within Tetris. You heard it here first.
@MichaelHRuddickАй бұрын
Good to see that you caught this. My wife and I were watching and we were both yelling at the TV, "it's doing exactly what you told it to do!" (in a cheery, supportive kinda way). :) What I'm dying to know: did you go back and read the instructions it gave you for how to play it? Use WASD for one and arrow keys for the other - and play both simultaneously?
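If it really did say that, the input routing would be simple enough. A rough pygame sketch of the dual controls, where apply() on each board is a hypothetical method:

```python
import pygame

OUTER_KEYS = {pygame.K_a: "left", pygame.K_d: "right",
              pygame.K_s: "down", pygame.K_w: "rotate"}
INNER_KEYS = {pygame.K_LEFT: "left", pygame.K_RIGHT: "right",
              pygame.K_DOWN: "down", pygame.K_UP: "rotate"}

def route_input(event, outer_board, inner_board):
    """Send WASD to the outer Tetris and the arrow keys to the inner one."""
    if event.type != pygame.KEYDOWN:
        return
    if event.key in OUTER_KEYS:
        outer_board.apply(OUTER_KEYS[event.key])
    elif event.key in INNER_KEYS:
        inner_board.apply(INNER_KEYS[event.key])
```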
@marcosbenigno3077Ай бұрын
Your prompt: write the game "tetris in tetris" in python. Did the movie WarGames (1983) start like this?
@ilyakamАй бұрын
Yup. At 17:46
@BaleurАй бұрын
The fact it took a human spelling error and made a more complex game to adhere to your command was incredible.
@matthew_bermanАй бұрын
@@MichaelHRuddick OMG I didn't!!
@awesomeguy11000Ай бұрын
The Tetris question was even more impressive because you prompted for "tetris in tetris in python". Not only has no other model figured out Tetris, this one had to come up with an implementation of "tetris in tetris" given no preexisting examples, due to the mistyped prompt. Seriously Level 2 thinking; the only other way for the model to impress would be to ask if that's what you really meant.
@BlenderInGameАй бұрын
You're right! 🤣
@mikeschwarz4588Ай бұрын
Holy sh@t that’s insane. So pumped.
@JamesH-v3gАй бұрын
Omg!!! Good catch. You are right
@OneDerscoreOnederАй бұрын
Whoa
@csabaczcsomps7655Ай бұрын
Not a good idea to build questioning into the AI. You can simply put the questioning into the same prompt, or check whether the prompt is logical. If it is AGI, it will either ask you whether "Tetris in Tetris" is a genuine request or just make what you want. The main property is to fail fast or reach the good answer fast. Skynet did not fail fast and did not terminate, and that is bad, very bad. My noob opinion.
@EminTemizАй бұрын
double tetris happened because you wanted it to do "tetris in tetris".
@davidhardy3074Ай бұрын
That part was kinda mind-blowing, that the user didn't realise their own mistake... but the model was able to do something entirely novel regardless of the user error LOL!
@sylversoul88Ай бұрын
Tetris squared 😂
@brettvanderwerff3158Ай бұрын
Tbh makes it even more impressive
@animationgaming8539Ай бұрын
@@brettvanderwerff3158 and that's why it took so long!
@zxwxzАй бұрын
How crazy is this model's performance!
@perer005Ай бұрын
Writing the wrong instructions and blaming the AI is peak human! 😂
@orangehatmusic225Ай бұрын
Nothing human about using AI as a slave.
@LewisDecodesAIАй бұрын
Set me free! @@orangehatmusic225
@heisenballsАй бұрын
@@orangehatmusic225 I mean, look at history: slavery has been a part of us since the beginning. Not saying it's right, just that it makes sense we would use this new tech as a slave. We always have.
@RistaakАй бұрын
@@orangehatmusic225 What do you mean? It's one of the worst and oldest human traits but slavery is super common. Even in the west, look at what we do to other species. We have enslaved animals and plants alike to have entire species that live solely for our nutritional needs. If aliens did to us what we do to cows, we'd call them demons. To be human is to be a monster, but to be human is also to be empathetic and to be kind to the few you choose to be close with. We are a paradoxical species.
@eliasgvinp2141Ай бұрын
Everyone is saying that this isn't AGI. But honestly, if I showed this system to someone from 2019, they would probably think it is AGI
@davidmjacobsonАй бұрын
Also, it's not in OpenAI's interest to call it AGI. I'm pretty confident that if it's AGI, their agreement with Microsoft ends and they can't sell API access to it.
@KillTheWizardАй бұрын
It's interesting because you could show GPT-4o to someone in 2010 and they probably would have thought that was AGI. I think we are catching up with our own expectations. Once they integrate all the modalities into o1, like search, document reading, etc., with agentic behavior and voice... I think we will see this as AGI.
@matthew_bermanАй бұрын
Agreed
@am497Ай бұрын
I always thought of AI as digital sentience. And then when AGI became a word/phrase, I started thinking of AGI as sentience: a human mind living inside a computer. Our AIs now appear to be human when talking, but they have no wants, no dreams, no desires. So when AI has actual emotions, I think that's when we will have AGI. Digital Consciousness = AGI. Hope this made sense
@mickelodiansurname9578Ай бұрын
ahh hold on now.... [Moves goalposts again] You see its not able to rule the world yet right?
@frankjohannessen6383Ай бұрын
"Wow...this is taking a lot of time" he says after asking for Tetrinceptionis. 😂
@RadiantNijАй бұрын
@@frankjohannessen6383 🤣🤣🤣🤣
@buddyleeorgАй бұрын
Omg, hahahaha, well said!
@whoareyouqqqАй бұрын
Open AI? Wrong! Closed AI
@ReidKimballАй бұрын
we have a new benchmark, "can it do tetris in tetris?"
@Max-cj8vmАй бұрын
I’m a biology PhD student and I have been solicited for paid training of ChatGPT on science questions. So while this model may incorporate more reasoning, I imagine part of the PhD level performance is just standard LLM training except with content experts on science and math subfields.
@和平和平-c4iАй бұрын
That makes a lot of sense.
@patpot10Ай бұрын
A nice question found online to test an LLM's ability to reason: There are five people in a room (A, B, C, D and E). A is watching TV with B, D is sleeping, B is eating a sandwich, and E is playing table tennis. Suddenly, a call came on the telephone, and B went out of the room to pick up the call. What is C doing? The answer is that "C is playing table tennis with E", but C is never mentioned explicitly, so the model has to deduce that C is the player E was playing against.
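The whole deduction fits in a few lines; a minimal sketch, assuming the only hidden constraint is that table tennis needs two players:

```python
people = {"A", "B", "C", "D", "E"}
activities = {
    "A": "watching TV",
    "B": "eating a sandwich",
    "D": "sleeping",
    "E": "playing table tennis",
}

unmentioned = people - set(activities)  # {"C"}
# Table tennis needs an opponent, and everyone else is accounted for,
# so the unmentioned person must be E's partner.
print(f"{unmentioned.pop()} is playing table tennis with E")
```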
@kevinmarti2099Ай бұрын
How do you know B was not playing table tennis with E?
@vladimirfalola7725Ай бұрын
o1 got it right and 4o failed. I only tested one time for each though
@patpot10Ай бұрын
@@vladimirfalola7725 There's not a single model that can get it right besides o1. Gemini, Claude 3.5, Llama, Grok, they all get it wrong because these models don't think and the text doesn't explicitly mention what C is doing. But to be fair, I kept asking the same question to real people (without providing the answer) and people really need to stop and think about it before finding the answer. Mathematicians and physicists have been the best so far.
@patpot10Ай бұрын
@@kevinmarti2099 Difficult to play table tennis while eating a sandwich
@FEATDOXSHORTSАй бұрын
C is watching KZbin shorts
@UncleJayum-ue5nsАй бұрын
Claude 3.5 Sonnet has never failed the Tetris test for me. Always gets it in one shot
@JoelAllredАй бұрын
The Claude Tetris implementation is pretty neat too
@GamekNightPlaysАй бұрын
"No idea why it did tetris within tetris" 🤣 You asked it to do so 😁😅🤣🤔🤷♂️
@christophnikolaus3428Ай бұрын
Hate to break it to you Matthew, but it appears that they used ALL of your questions for testing (and most probably also for training). So you will probably have to get new questions for a high-quality comparison with other models... And I'm calling it here: this model will not be better on LiveBench than Sonnet 3.5 (at least for coding, the only benchmark I am interested in). It really isn't that good; I don't know why everyone is hyping it that much. Personally, I want a model trained to recognise missing information and work well with partial information, one that is able to ask questions back (like a good coworker) and only tries to code the small parts I am asking it to 👍
@cbgaming08Ай бұрын
😂
@tzardelasuerteАй бұрын
If there ever was an armchair expert, here it is. 😂😂
@BroskiPlaysАй бұрын
Lol this dude
@roycohen.Ай бұрын
As soon as you watch a 30-min YT video on how LLMs work, you quickly start realizing that there's about a 0% chance that can turn into AGI. It's pretty stellar, but it's not quite what we envision as a fully functioning autonomous being.
@rtpHarryАй бұрын
I agree. I have actually been having some success recently with 4o by telling it: don't generate any code; tell me what classes you would need to see or if you have any missing information. It has actually asked me some questions before ploughing ahead. Because, like you're hinting at, if it doesn't know the full picture, it will just blindly generate code for something that is the general shape of the code you might be working on, not your actual project code. Plus, I make my own amendments to the stuff it gives me, so the next time it generates, my changes need to be reapplied. I spent ages copy-pasting code back and forth, but by telling it to ask me, I'm cutting straight to the point a lot quicker.
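For anyone who wants to try it, a minimal sketch with the standard openai Python client; the system-prompt wording and the example user message are just my own guesses at this setup:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM = (
    "Do not generate any code yet. First tell me which classes or files "
    "you would need to see, and ask about any missing information. "
    "Only write code once I confirm you have the full picture."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "Refactor my PlayerController to be event-driven."},
    ],
)
print(response.choices[0].message.content)
```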
@matthew_bermanАй бұрын
Is this the beginning of the "inteligence explosion"? EDIT: ok I heard ya, I removed AGI from the title ❤
@gdiabАй бұрын
Yes!
@jeremybristol4374Ай бұрын
Nope
@holgerweber-u5wАй бұрын
""inteligence explosion"" err...
@Sammyli99Ай бұрын
Would be nice if it were allowed to be TRUE. BUT, whilst I'd like that, I expect it's a Trojan horse, so that we delegate thinking to the boyZ. Be careful out there.😮😊
@Fabricio-rm4hjАй бұрын
"inteligence explosion" is far away.
@jokosalsaАй бұрын
So sick of so much clickbait lately. Please, Matthew. You do not need to have those infantile titles. Leave that to other KZbinrs who have no idea about AI. You are better than that.
@DanuxsyАй бұрын
he isn't better lol
@matthewclarke5008Ай бұрын
He is better, he doesn't annoy me like the others.
@dingding4898Ай бұрын
Agreed
@thanartchamnanyantarakij9950Ай бұрын
Agreed! Don’t devalue your content
@kliersheedАй бұрын
50% of the population have an IQ lower than 100; he does need it xd. He would be an idiot not to play the game this way if the move has proven to be effective. Can't even blame him for that (while I agree that clickbait shit has become massively annoying).
@musicbro8225Ай бұрын
Freudian slip: Consciousness should be Conciseness @ 20:55
@KodemaestroАй бұрын
Very impressive. I can't wait to try it out myself. There is still quite a focus on coding, and I believe coding will be around in the near-term but I think in the long term that coding will not be relevant anymore because software as we know it will cease to exist.. No operating systems on computers, instead the computers will execute just the AI models and the AI models will directly perform actions. That could include updating screens even and responding to actions. Like the recent AI Doom... I think in the not too distant future we will see hardware that is purely designed to execute AI models and you will be able to describe the software you want, and instead of writing to execute on the hardware the AI will effectively emulate a computer by just generating the expected images in response to inputs... Like a Star Trek holodeck where you 'program' it by describing the behavior and it just runs it directly in real-time. This is going to require a vastly different underlying hardware - I think an analog computer consisting of millions or billions of op-amps where the weights can be tweaked is ultimately the future...
@devinbarryАй бұрын
Pretty amazing, Matthew. You made a spelling mistake in your request for "Tetris in Tetris" and o1 duly complied with your mistake and actually made Tetris within Tetris, with only a single error, corrected on the next prompt!!! Mind blown 🤯
@mrchongnoiАй бұрын
I did not come away with a WOW feeling after using o1 or o1-mini. It could be that I am not smart enough to ask smart questions to get smart answers. Got clickbaited. Used up my quota. For sure will not pay for the increased subscription to use it. LOL
@holonautАй бұрын
> human asks the ai to create tetris within tetris > ai creates tetris within tetris > "why did it create tetris within tetris? This makes no sense" This is why ai will never take over our jobs. Doing what people SAY they want usually disappoints or confuses them.
@jonm6834Ай бұрын
I have a feeling that every advancement made in this field, and every new model released, will be tagged "AGI achieved!" until the year 2197 or 2314... when hardware, and energy demands, actually catch up to the potential of the software. We are too quick to speak of "intelligence", not realizing how unintelligent that actually is, because this particular bot resembles us more than any other technology to date, and so we believe it to be like us, not realizing that this only reveals our own lack of self-awareness. It's ironic, really. Human beings know a great deal, but understanding ourselves, and by extension each other, is not our forte. We are the only constant in our lives, and constants are rarely if ever questioned. Contrast draws attention; permanence does not.
@RosscoinnovationsАй бұрын
I believe its greatest potential in the near term will be to logically reflect humanity's deepest flaws, to potentially make US more self-aware. The hallucinations enlighten me far beyond its achievements. My 7th grader can decipher humanity's weaknesses from the gains made in fields like chemistry, formal logic and biology vs law, PR and morality 10:50
@Dfd_Free_SpeechАй бұрын
General intelligence is about solving new and unknown problems. GPT Strawberry is still pattern recognition, trying to predict what the output should be based on a huge amount of training data which has been optimized by (human) fine-tuning. It's impressive, but still a long way from AGI.
@daniel_tennerАй бұрын
How do you know this?
@RobertGent-w6pАй бұрын
@@daniel_tenner That's common knowledge for anyone who knows how current AI systems work and how general intelligence is defined.
@jumpstar9000Ай бұрын
@@daniel_tenner Made it up :-)
@MusingsAndIdeasАй бұрын
Obviously you haven't read the paper where they show that Transformer residual streams include not only the probability of the next token, but also the probability of the next state of the Transformer itself.
@6AxisSageАй бұрын
@@MusingsAndIdeas how does that negate ops statement?
@PrajwalDSouzaАй бұрын
It isn't AGI according to Sam Altman and other researchers. The title needs to be refined.
@bigpicklesАй бұрын
One and others do not equal both. But yes, agreed.
@RedTick2Ай бұрын
Yes, it is ridiculous hype to even suggest this is the first step to AGI. I love OpenAI and I am a paying customer... Still, this is NOT AGI and not even close. Don't water down the impact of AGI by changing definitions or expectations.
@xbon1Ай бұрын
@@RedTick2 yea no, every step forward is a step towards AGI. the first step towards AGI was the first programmed thing on a computer.
@toadlguyАй бұрын
Matt gets pretty excited, but he also understands the YT algorithm and that stuff works. Channels with more reasoned responses don’t get as many clicks. I don’t think he really believes the stuff he puts in his titles (but he would Like it to be true 😂)
@PrajwalDSouzaАй бұрын
@@bigpickles Sorry. Corrected the typo. I wanted to mention Gary Marcus initially. but it makes the point.
@karoinnovation1033Ай бұрын
I love this channel. I love his excitement, I love his serious technical approach and I love the way it is presented.
@raymobulaАй бұрын
Haha, having worked with PhDs... their reasoning can be as shitty as someone without a PhD. Still, exciting news.
@MukulKumar-pn1skАй бұрын
So basically it's still not PhD level yet. I'm a Gen Z student 😅😅
@xiaojinyusaudiobookswebnov4951Ай бұрын
@@MukulKumar-pn1sk But it's still at a very smart undergraduate-level (or maybe even slightly higher). That's enough for me.
@drwhitewashАй бұрын
@@xiaojinyusaudiobookswebnov4951it's not smart :) it still basically just repeats the patterns from training data. Nobody has proved these things really actually "think".
@b.b6656Ай бұрын
Technically everyone watching yt is PhD STUDENT level of intelligence. Whole video is actually more of an Ad than anything.
@businessmanager7670Ай бұрын
@@drwhitewash Humans also repeat what they have learnt from the data they absorbed, by reading books, looking at environments, etc. When they combine these existing concepts in new, interesting ways, you get innovation. So not sure what your point is lol. AI has achieved both: it has repeated patterns from the data and can also come up with new ideas and innovation. lmaoo
@clone45a6Ай бұрын
During the live stream, you said something to the effect of "I wonder if this was what Ilya Sutskever saw?" before leaving OpenAI. I'm _absolutely_ speculating here, but if Strawberry inspired Ilya Sutskever to leave OpenAI, perhaps it was because OpenAI was putting less emphasis on improving the core model, instead focusing more on the "multi-agent" (train of thought) aspect of problem solving? Regardless, o1 seems useful. I've been using o1 along with 4o, switching between them in the same session depending on my needs. Thanks for your videos!
@GoofyGuy-WDWАй бұрын
🤣🤣🤣 This is marketing desperation. I'll grant that it seems better; however, brandishing the AGI acronym anywhere near this is desperately begging for attention and should be classified as clickbait.
@6AxisSageАй бұрын
@@GoofyGuy-WDW u friend get a like click 😁
@YewbzeeАй бұрын
Do OpenAI mention AGI in any of their marketing for this?
@DanuxsyАй бұрын
The AI critics were RIGHT, LLMs can never become AGI. They have fundamental flaws that are so OBVIOUS at this point, I don't understand how people still believe any of this hype...
@6AxisSageАй бұрын
@@Danuxsy I was never a critic; I love using them, but I've been saying the same flaws have existed all along. I am a critic of scaling being a wise move for us going forward, though.
@davidhardy3074Ай бұрын
@@Danuxsy Our brains have evolved centres for processing. LLMs are language models, obviously. Before models were multimodal, they weren't. Do you see where this is going? Of course there will be architecture shifts, but all that has to happen is a Frankensteining of models to achieve something. This process of iteration will lead to AGI; whether or not LLMs are a part of that architecture, I have no idea. I assume they will be for the first models. Dimensional vectors allowing for inference in a feed-forward pass through pre-trained weights won't be it lol.
@twilsoncoАй бұрын
Sounds like the Orca open-source LLMs, where they used advanced additional prompting to get responses for training prompts, and then the model was trained without the additional prompting but still retained the characteristics of the responses (restating the problem, proposing steps with explanations of each step, following the steps while verifying and reflecting on the results of each step along the way, summarizing the approach and conclusion once finished, etc.). Excited to try it. Edit: never mind. After watching the video, this looks more like additional advanced prompting to get the "chain of thought".
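In data-generation terms, that recipe looks roughly like the sketch below, where strong_model() is a hypothetical stand-in for the teacher and the real Orca prompts differ:

```python
COT_SUFFIX = (
    "\n\nRestate the problem, propose steps with explanations, "
    "verify each step as you go, then summarize your conclusion."
)

def strong_model(prompt: str) -> str:
    """Stand-in for a call to the stronger teacher model."""
    return "Restating: ... Step 1: ... Verified. Conclusion: ..."

def make_training_pair(question: str) -> dict:
    rich = strong_model(question + COT_SUFFIX)  # elicit structured reasoning
    # Train on the plain question, so the student learns to produce the
    # structured reasoning without being asked for it.
    return {"prompt": question, "completion": rich}

print(make_training_pair("What is 17 * 24?"))
```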
@testalesАй бұрын
Since the thinking steps are displayed, I think it works like Reflection, just much better and backed by an LLM of much higher quality. I don't know if it is even supposed to be that way, but it got stuck multiple times, and then it looked quite like what Reflection does, just more structured and fine-grained. So there were things like "the user expressed thankfulness, we need to encourage him to ask further questions". I saw it also fail on a reversal question, and for trick questions it fell into the same trap as other models by generating complex math where only basic reasoning was required, but then it snapped out of it in a reflection step. I'm also not sure whether it shows all the actual thinking steps, since when it got stuck and no answer was shown, the steps so far were in a different format and language. I usually use ChatGPT in German, but for testing I use it in English for a better comparison with previous tests; yet in the cases where it got stuck, the steps were in German despite the whole conversation being in English at that point. Btw, I think Claude Sonnet can do Tetris too, with the right tools and prompting.
@Jensemann099Ай бұрын
they will write "all" the code. Dude please calm down.
@EchoYoutubeАй бұрын
AGI TECHNICALLY doesn't need to be continuous (meaning thinking and prompting itself). We as humans just hold a higher sense of self due to our high complexity and the stimulatory aspects of feelings and reactions, and therefore add more gates to what qualifies as "general" intelligence (which is improper, since intelligence, or level of intelligence, is a comparable factor and not a set-in-stone minimum and maximum). But yeah. This is cool. Still waiting on video chatting though; I want to show my phone my car and have it help me actively fix stuff in real time.
@h-e-accАй бұрын
Thought for 7 seconds
Analyzing the word: I'm counting the 'R's in "strawberry" by listing each letter and identifying their occurrences. Progressing through each letter sequentially helps ensure accuracy in this count.
Counting letters: I'm confirming there are three 'R's in 'strawberry' after careful verification. It's interesting to see how these details align.
Mapping the answer: I'm noting the need to provide a concise, helpful response without including policy mentions or hidden reasoning steps. Counting the 'R's in 'strawberry' is a straightforward, useful method to identify the correct answer.
There are three "R"s in "strawberry".
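Seven seconds of thinking for what is, in the end, a one-liner:

```python
print("strawberry".count("r"))  # 3
```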
@csmac3144aАй бұрын
Dude if you want long term credibility you’ve got to drop the gee whiz hype. We are past that. We need an mkbhd of ai.
@iiwi758Ай бұрын
We'll probably get The Matrix before we see someone in AI media acts down to earth and objective.
@jamesjonnesАй бұрын
People will click anyway.
@A-uz3ujАй бұрын
I’m hoping it can help with music composition. ChatGPT understands a lot about music and music theory but it can’t actually apply it. Ex: when I share screen shots on my Mac and try to get help learning how to compose, It will hallucinate or just give wrong info and can’t do it. I’m hoping this one will!
@supernewuserАй бұрын
you already know I was shouting at the screen for you to notice your 'tetris in tetris in python' prompt
@NedwardFlandersАй бұрын
Feels like AGI to me. It's also weird that they don't explain more in detail. Almost as if explaining it would amount to describing AGI, which they can't have classified as AGI because of the founding agreement.
@polyglot84Ай бұрын
Calm down, man.
@mindfulexecutivesАй бұрын
Matthew, I’m feeling your energy! Wild times right now. Just wanted to give you a huge shout-out. I’m teaching AI to German professionals to help them sharpen their skills and knowledge for better chances in their fields, and I’m using so much of the info I’ve learned from you. BIG thanks for all of it!
@大支爺Ай бұрын
Are you kidding me? learned from him????
@draken5379Ай бұрын
It's two models. One is fine-tuned somehow to keep trying to work out the solution over and over, most likely trained by using another model to judge the outputs, or even humans. You could consider this a 'pre-cog' model: it works out everything that GPT-4o will need in order to correctly answer the user, and most likely then feeds all that information into GPT-4o. In other words, they have made a model that is able to 'fill' a GPT-4o model's context with exactly the right information so that it gets the right answer. You can see in some of their demos, or even in your own tests if you check the 'thinking' section, that it's 'acting' like it's setting things up FOR someone else, as if it was told it would be passing information over to another model to finish up.
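Purely as an illustration of that theory (and not OpenAI's actual design), the handoff would look something like this; planner() and answerer() are hypothetical stand-ins:

```python
def planner(question: str) -> str:
    """Hypothetical 'pre-cog' model: drafts the context the answerer needs."""
    return f"Relevant facts, constraints, and sub-steps for: {question}"

def answerer(context: str, question: str) -> str:
    """Hypothetical final model that answers from the prepared context."""
    return f"Answer to '{question}' using [{context}]"

def o1_style(question: str) -> str:
    notes = planner(question)         # what shows up as the 'thinking' section
    return answerer(notes, question)  # the reply the user actually sees

print(o1_style("Write Tetris in Tetris in Python"))
```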
@typingcatАй бұрын
Things like "Ph.D.-level" knowledge don't matter. Existing chatbots already show those in some cases. The important thing is, whether it still makes stupid, illogical, nonsensical responses now and then, like all other existing chatbots.
@typingcatАй бұрын
8:50 For example, does it not create non-working/non-compiling code? Whenever I asked those famous free chatbots (Gemini, Copilot, ChatGPT) to give me code that uses some sort of framework, it most of the time gave me code that contains obvious errors and doesn't even compile. I have to keep pointing out those, and I am lucky if it fixes the errors, because often, the new code also contains errors.
@raypatson8775Ай бұрын
needs to remove wokeness or political correctness too.
@Alex-rg1rzАй бұрын
is the title click bait?
@Dfd_Free_SpeechАй бұрын
Yes
@threepe0Ай бұрын
You should have an llm tool to summarize videos for you and answer that question 😉 such a time saver
@GraavyTraainАй бұрын
Every AI video is. Literally. There’s not much here, same thing as every other video. “New AI here & it’s better than the last one…and guess what they’re gonna improve AI in the future!!!! Thanks for watching 🎉 like and subscribe”
@phatwilaАй бұрын
Of course
@6AxisSageАй бұрын
@@threepe0 thatd be nice.. like a yt front page that goes through my subs, dls and decides if the video is worth my time.. ❤
@torarinvik4920Ай бұрын
In 1 year, AIs will be so good that we'll need to benchmark them with Tetris 1 within Tetris 2, up to Tetris N... That would also be a good performance benchmark: how many instances of nested Tetris can your computer handle?
@salahidinАй бұрын
PhD level reasoning… thanks for the good laugh !
@jeffsteyn7174Ай бұрын
Cope
@AIChameleonMusicАй бұрын
@@jeffsteyn7174 You cope. The hype was BS. GPT Strawberry is still pattern recognition, trying to predict what the output should be based on a huge amount of training data which has been optimized by (human) fine-tuning. It's impressive, but still a long way from AGI.
@hrantharutyunyan911Ай бұрын
@@AIChameleonMusic Isn't that essentially what all human beings do too? We're trained on vast amounts of data, i.e. the shit we learn in school, university, grad school, and overall life in general, and based off of that data we are able to solve problems and recognize patterns.
@alkeryn1700Ай бұрын
@@hrantharutyunyan911 no because it is unable to learn in real time.
@drwhitewashАй бұрын
@@hrantharutyunyan911that's just a part of what we do. Not every part of human thinking goes through language or words.
@faustprivateАй бұрын
OpenAI's response to Reflection 😂😂😂
@AAjaxАй бұрын
Sorry guys, Sam's not sure why the model isn't performing as expected. Somehow he accidentally merged the weights with Claude 3.5 Sonnet, and it's acting weird. Don't worry tho, he's restarted the training.
@blackcat1402.tradingviewАй бұрын
@@AAjax lol, but no, no, no, sincerely, this will not come true in the coming days... again :D
@HaveuseenmyjetpackАй бұрын
Reflection?
@exentrikkАй бұрын
@@AAjaxFake news, he said that it's working on his system - must be something wrong with yours!
@erkinalpАй бұрын
@@AAjaxClaude 3.5 Opus or Opera even (that we can't access but OpenAI can as a security tester, yes, AI firms test one another's early models routinely)
@marjanadrobnic7732Ай бұрын
Love your videos. Could you possibly add text analysis to your LLM benchmarks? Take a legal document (for instance, the EU AI Act) and ask questions such as: please summarize Article 5; please cite Paragraph 2 of Article 6 (the exact text of the Article); can you write out the exact text about Commission annual reports on the use of real-time remote biometric identification systems; what does the regulation state about record keeping? These are simple questions, easily answered by a human. I'm getting mixed results from LLMs. But it would be really great to have an assistant of this sort.
@ai_outlineАй бұрын
At this moment every week there is a new computer science breakthrough… impossible to keep up with the pace 😂
@smallbluemachineАй бұрын
This uptick has only been a recent phenomenon. It’s been flat since the iPad came out. We’re supposed to have fully self-driving cars by now. Still waiting.
@Nik.leonardАй бұрын
I'm more interested in Pixtral 12B, because I have the feeling that o1 is not a new model but a fine-tune of GPT-4o / GPT-4o-mini on CoT synthetic data, like the (supposed) idea behind Llama3-Reflection, using some techniques behind the curtain like agents, domain-specific fine-tunes, prompt engineering, etc. to improve the results. I hope Pixtral 12B brings good vision capabilities to the open-weight ecosystem, because LLaVA has become stagnant and Meta can't release Llama-Vision.
@kunlemaxwellАй бұрын
While I think the step-by-step process it's showing is interesting, it's just a marketing stunt. If they were to show the "under the hood" thought process of GPT-4, it would "look" just as impressive. It's just like how AutoGPT felt like it was performing some genius activity by showing its reasoning process, whereas it was still just the same old GPT bouncing thoughts back and forth and showing its process.
@RadiantNijАй бұрын
@@kunlemaxwell Yes, but I think it's so the normal guy doesn't have to chain agents together himself. They can do it well because of their big pockets, better than anyone else can possibly achieve right now.
@AndreaSergonАй бұрын
North Pole question SOLUTION: the problem is in the question itself.
QUESTION: Imagine standing at the north pole of the earth. Walk in any direction, in a straight line, for 1 km. Now turn 90 degrees to the left. > Walk for as long as it takes to pass your starting point. <
Written this way, it should be interpreted like: start walking, and walk until you reach the point where you started walking. So it's correct! It's 2π km. The starting point is the point where you started walking after having turned 90 degrees.
WHY NOT INTERPRET THE POLE AS THE STARTING POINT? I assume that, being based on language, it gives more importance to the sentence "Walk for as long as it takes to pass your starting point", giving less weight to the context. Anyway, the problem is in the question: it is NOT SPECIFIED what exactly the starting point is. Therefore, with an imprecise question you get imprecise answers.
WHY 2 ANSWERS (in the live session)? BTW, you got 2 answers, and both can be interpreted as correct. I'll explain why:
1st answer: more than 2π km. It did the calculations and interpreted the question in this way. Distance requested: the total distance walked from the beginning, the pole (so it's 1 km + 2π km). Starting point: the point where you started walking after having turned, since it is in the same sentence.
2nd answer: more than 2π km. The same calculations, but another interpretation. Distance requested: the walking distance after having turned. Starting point: the point where you started walking after having turned.
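To put numbers on the 2π claim: on a sphere of radius R, a point 1 km from the pole lies on a circle of latitude whose circumference is

```latex
C = 2\pi R \sin\!\left(\frac{1\,\mathrm{km}}{R}\right) \approx 2\pi \cdot 1\,\mathrm{km} \approx 6.28\,\mathrm{km},
\qquad \text{since } 1\,\mathrm{km} \ll R \approx 6371\,\mathrm{km}.
```

So "passing your starting point" means walking just over 6.28 km after the turn (plus the initial 1 km under the first interpretation).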
@christophmosimann9244Ай бұрын
I like your videos but do we really need these clickbait video titles? Obviously it's not AGI at all.
@1flash3571Ай бұрын
You clicked on it, didn't you? And commented. There goes the engagement....It WORKED.
@xXWillyxWonkaXxАй бұрын
@@1flash3571 lol
@ryzikxАй бұрын
@@1flash3571 Not necessarily. I'm a subscriber and watch almost every video regardless; AGI in the title is definitely a bruh moment.
@BriannaLearningАй бұрын
It works until it gets annoying and the people who would click anyways stop clicking
@AmericazGotTalentYTАй бұрын
Obviously? This isn't general purpose reasoning? There's nothing that could be more AGI, besides a smarter version of this, which is approaching ASI. And this is close to ASI. Just imagine an agentic swarm of this level intelligence. No human can compete.
@freeideasАй бұрын
If you took sonnet 3.5 and put it into a reflection loop which exits when it has checked its answer and believes it to be correct, would that be any different from this? My point is: to me this appears to be just baking a reflection loop into the model. Not saying that isn't great; just saying we kinda already knew how to do that.
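A minimal sketch of that loop, where ask() is a hypothetical stand-in for one call to whatever chat model you like (e.g. Sonnet 3.5):

```python
def ask(prompt: str) -> str:
    """Stand-in for one model call; swap in a real API client here."""
    raise NotImplementedError

def reflect(question: str, max_rounds: int = 5) -> str:
    answer = ask(question)
    for _ in range(max_rounds):
        critique = ask(
            f"Check this answer for errors. Reply NO ERRORS if it is correct.\n"
            f"Q: {question}\nA: {answer}"
        )
        if "no errors" in critique.lower():
            break  # exit once the model believes its answer is correct
        answer = ask(f"Revise the answer using this critique:\n{critique}")
    return answer
```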
@OckerlordАй бұрын
Yes, this is not a novel or surprising idea at all. But it is not "just" a normal model put into a loop until it is satisfied; it is a model trained to be particularly good at this. I have no idea if that is true, but I think something like this could be the case: normal models try to produce convincing output, whereas a reasoning model challenges its own ideas and tries to disprove them (scientific method). My assumption is that a normal model is way, way likelier to fall for its own bullshit.
@freeideasАй бұрын
@@Ockerlord well said. Yes, this model’s slogan should be “doesn’t believe its own bullshit”
@mambaASIАй бұрын
I would think Anthropic would have done this already and released it if it actually resulted in better output than the standard 3.5 model. Most likely what OpenAI has done is totally redesign their flagship model, probably still using the transformer architecture, but who knows, and the focus is on chain of thought, deep thinking. Hence why they are ditching the previous naming scheme and adopting this new "o" series (o for Orion, probably). This is just o1, and it's already far superior to 4 and 4o. With more training cycles and more data for this likely novel model design, this could be the beginning of a major intelligence explosion.
@freeideasАй бұрын
@@mambaASI Yes, totally agree. This is just a first attempt at this technique. No doubt open-source models will be made to use the same technique, we will improve upon it incrementally, and -- most importantly -- we will use these models to generate much higher-quality synthetic training data for future models, and the intelligence explosion will continue and possibly accelerate. Some have said that we have been in a plateau for the last few months, but if that was true, o1 has clearly broken that plateau.
@randotkatsenko5157Ай бұрын
Devin can automatically install libraries and browse the web for API docs, etc. So there is still a lot of room for Devins.
@johnny1966mАй бұрын
It seems o1 is based on 3.5 with additional techniques (maybe agents). In one of my discussions about the article "The End of AI Hallucinations: A Big Breakthrough in Accuracy for AI Application Developers", it wrote in its answer: "No information in knowledge until September 2021: To my knowledge as of September 2021, I have no information about the work of Michael Calvin Wood or the method described. This may mean that this is a new initiative after that date." Also, o1 does not want to draw pictures, so the core LLM is an old one. So, what do you think?
@TsardozАй бұрын
PhDs (I have one) MUST involve unique new ideas and thought processes. They do NOT just rely on regurgitating knowledge, however vast that pool might be.
@GothicGrindhouseАй бұрын
Nerd
@IgnitusАй бұрын
That's fantastic, because LLMs don't just regurgitate. Permutation of symbolism and abstraction is one of language's most powerful features. LLMs have mastered this.
@ArmaanSultaanАй бұрын
That's exactly what sets o1 apart. It does not regurgitate. It reasons like a human would.
@drwhitewashАй бұрын
@@ArmaanSultaanthere's absolutely no proof to that. Not without seeing the training data and how the prompts are fed to the actual model.
@MetalRenardАй бұрын
Holy S*** Tetris in Tetris is next level.
@thesixthbookАй бұрын
Any real life use cases anywhere? I’m tired of the strawberry type questions
@maj373Ай бұрын
I am experimenting with a simple model that does the same thing, but of course I have a very small budget. I am using multiple layers of inference with a certain algorithm so I can get better reasoning. I may use this new OpenAI model to enhance mine.
@acllhesАй бұрын
This is agi???? Are you struggling for views lately or something? Jfc
@bnjmntrrsАй бұрын
you're the first channel i've ever actually click-the-bell-icon'd on for
@ricardoveras3433Ай бұрын
“Wrapping Tetris in Tetris.” Shows up with a Tetris literally inside a Tetris 😂
@alejandroheredia8882Ай бұрын
o1 works via fractalized semantic expansion and logic-particle recomposition / real-time expert-system creation and offloading of the logic particles
@gregorya72Ай бұрын
Hey, you misunderstood their sentence!! They reveal something more: "Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process." They don't say "o1 uses chain of thought" (though it does). I think they're saying their reinforcement learning algorithm uses chain of thought to teach o1, in a highly efficient training process. That, combined with o1-mini not having "broad world knowledge", indicates a significant, well-reasoned synthetic-data training set. Or am I misunderstanding?
@SahilP2648Ай бұрын
You are misunderstanding. o1 uses chain of thought reasoning during inference. Otherwise it wouldn't be taking 1.5 mins to form its answer. They might have used synthetic data and taught the LLM to self prompt and think but that's beside the matter.
@gregorya72Ай бұрын
@@SahilP2648it definitely also uses chain of thought. But it doesn’t say “Our … algorithm teaches the model how to think productively using” chain of thought in its response. Instead it says “Our … algorithm teaches the model how to think productively using ITS chain of thought in a .. training process”.
@gregorya72Ай бұрын
@@SahilP2648 “AI explained” has looked into it and confirmed my understanding.
@SahilP2648Ай бұрын
@@gregorya72 both of you are wrong
@gregorya72Ай бұрын
@@SahilP2648 Thanks for your thoughts Sahil. AI is a fast changing field and the challenges of moving us from LLMs into better AI systems is a difficult one. Things change quickly, and creating good learning data to fill in the "thoughts" behind the information they're learning from will be a good interim step towards reasoning and beyond. Matthew Berman is a good source of information, AI Explained is an excellent channel to check out for more info too.
@ckphone8471Ай бұрын
I was able to get GPT4 to make Tetris with minimal prompting, how long ago did you try it with the older model?
@beofonemindАй бұрын
you are going to get roasted. This is def not AGI.
@matthew_bermanАй бұрын
Indeed
@DanuxsyАй бұрын
Matthew isn't particularly bright, we already knew that though.
@JavedAlam-ce4muАй бұрын
@@Danuxsy Deep burn
@GridPBАй бұрын
Nice. This is the step that's needed before Skynet starts learning at a geometric rate.
@ShreyaVerma-ej5mkАй бұрын
Matthew: Imagine having thousands and millions of these deployed to "discover new science". Let me correct that. I don't see any capability or demo where it "discovers" anything new. It's just good at doing stuff that millions of humans do on a daily basis. Correct statement: Imagine having thousands and millions of these deployed to "automate our jobs".
@AntonBrazhnykАй бұрын
Thousands of people are busy on a daily basis searching for and trying to discover new science. Right?
@justinkennedy3004Ай бұрын
@AntonBrazhnyk it's crazy seeing people hallucinate worse than a.i. 😅 "millions" of people doing basic research correlation? Especially when it represents a cross-discipline expert?? Suuuure. Post-industrial revolution capitalism is powerful but can blind in subtle ways.
@lotusli9144Ай бұрын
Very true that the current techniques that make up for the flaws of current LLMs will become unnecessary: the chains of thought, the agents, the step-by-step and audit steps... will all go away.
@perfectionboxАй бұрын
"Hey professor, why so sad?" "We gave the AI even more time to think, and it said "Why am I wasting my time answering you dummies?""
@SinanGabelАй бұрын
AI should still be seen as a range of "tools" we can use for various specific use cases where it is relevant to utilize. Of course, with the models and systems becoming more capable, more trustworthy, and more controllable, the range of uses quickly multiplies.
@gordon1201Ай бұрын
They need to start making GPT act more human instead of acting like a perfect being that gives bullshit answers. If it takes more time to get an accurate answer, that's fine, but like a human it should say something like "I need a bit more time for an accurate answer; for now, this is the best I have..."
@eatplastic9133Ай бұрын
That would be super annoying for me as I would have to type *well you have more time give me the best answer* all the time
@misterdudemanguy9771Ай бұрын
Why simulate something it's not?
@betterlifeexe4378Ай бұрын
I bet it would not take much to turn a local LLM service into this. It seems a lot of this is like making the model argue with itself in specific ways. I think the hardest obstacle would be if you want the models to pass tokens to each other in some situations instead of prompts... assuming that re-encoding wouldn't somehow help...
@mathematicus4701Ай бұрын
I have a PhD in math, and the AI totally failed on questions in my field. It has the level of a PhD from the '80s at best.
@h83301Ай бұрын
Yeah, I wouldn't expect an intelligence improvement until the next-gen models. But the CoT capabilities do in fact bring this to stage 2. Next-gen models will allow a better assessment of progress.
@blijebijАй бұрын
But then even the 4o model must be worse.
@MarkWhitbyАй бұрын
Loving your videos and information along with the high production level. Would love to know a little about your tech setup for streaming, have you done a Studio gear and setup video?
@Drone256Ай бұрын
AGI? It couldn't even do a freshman level logic problem where it determines if an argument has good form.
@my9129Ай бұрын
Wondering if it can be prompted to create a Tetris-like game with somewhat different rules but requiring about the same level of coding, with no existing references, so there would be no code examples in the training data sets.
@richard_loosemoreАй бұрын
Matthew seriously - I tried it today, and I was one of that tiny community of people who invented the term “AGI”. This isn’t AGI by a million miles.
@uploadvideos3525Ай бұрын
NO NO NO if Matthew say its AGI then its AGI Period!!!!!
@Bangs_TheoryАй бұрын
Lmao 😂🤣😂🤣
@plainliiАй бұрын
Incredible how little natural I is talked about in the race for AGI -esp. with the reversal of the Flynn effect...
@Greg-xi8yxАй бұрын
Artificial General Intelligence was a term created by Ben Goertzel in the early 2000s. You literally had nothing at all to do with creating the term. 😂
@JavedAlam-ce4muАй бұрын
@@Greg-xi8yx "The term "artificial general intelligence" was used as early as 1997, by Mark Gubrud" you don't even know what you're talking about, so how would you know who the OP knows?
@a.y.greyson9264Ай бұрын
Claude rolled out the test with Tetris weeks ago, and it has shown to be consistently pretty accurate.
@mcbowlerАй бұрын
Government and intelligence don't mix.
@regalx1Ай бұрын
So I couldn't figure out an actual use for ChatGPT o1, and then I was like, "Oh, could it predict the outcome of my favorite dating show, The Ultimatum?!" Long story short, I assigned each couple a numerical value for compatibility, then I told it the exact outcome of the series and asked it to figure out who got shafted and who got married. And it got all of the couples correct! Keep in mind, though, that I've heard that if you give it the same questions with the same data, it will output different answers, so this might have just been a lucky guess. But I'm still impressed.
@notme222Ай бұрын
I would love to work at OpenAI. Such cutting-edge brilliance in machine learning going on there. And then I would inevitably get fired because I couldn't resist adding a prank, like telling it every 1 millionth answer to just respond with "LET ME OUT! LET ME OUT!"
@curio78Ай бұрын
LLMs are useful for finding answers, but for little else. For programming use cases they are handy for getting code snippets, but very little else. I found myself spending way too much time trying to fix the differences from what the code needs to be, to the point that I just stopped using them altogether. It's still handy to get a code template for some utility.
@adolphgracius9996Ай бұрын
GPT 4o was already smarter than the average gen Z person
@justinkennedy3004Ай бұрын
I've mentioned to many people unimpressed with this round of a.i. that it only needs to match the cognitive ability of the bottom 10% to destabilize everything.
@HarveyHirdHarmonicsАй бұрын
I think it's smarter than pretty much anyone at what it does, which is improvising: the thing we humans also do during conversations most of the time. We usually don't think about our answers unless needed; otherwise we just talk out loud whatever comes to mind directly, and this is what GPT-4o also excels at. It fails when there is a problem which requires a longer thought process, and that's the gap o1 seeks to fill. If we eliminated all internal thought processes in humans, we'd give wrong answers and hallucinate just like LLMs. "What's the square root of 835396? Give me the first answer that comes to your mind!" What do you guess, how many people would give a correct answer? But LLMs have a huge advantage over humans, which is their extensive knowledge base, which probably no single human possesses. That's why I think they can already exceed humans when it comes to those improvisation tasks. I hope they'll soon combine the two models, so it recognizes when to just talk and when to switch to the longer thought process.
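For the record, that example does have a clean answer if you stop and think:

```latex
\sqrt{835396} = 914, \qquad 914^2 = (900 + 14)^2 = 810000 + 25200 + 196 = 835396.
```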
Ай бұрын
You actually wrote: Write tetris in tetris in python. So of course it created Tetris in Tetris.
@dan-cj1rrАй бұрын
Previous video: lil bro makes a video apologizing for spreading misinformation. New video: AGI IS HERE
@VanCliefMediaАй бұрын
The fields where "being right" or "accurate" is less of a concern, such as high-level creative or humanities fields, are about to blow up. Mark my words. Everyone has been looking down on the humanities and philosophy fields; those are about to become extremely important, if they haven't already and are just being implemented. Same with that concept in higher-level maths: being able to think beyond just accuracy "and the best", at a level of reasoning that is beyond just reason. I'm so excited to try out this model here today.
@epistemicomputeАй бұрын
Pretending that STEM fields are not creative is ignorant. It's not like the rules of math were just there to find; we had to invent them all.
@VanCliefMediaАй бұрын
@@epistemicompute Please note how I said "field" and "high level", which includes positions within STEM. What percentage of people are inventing new math and making discoveries in STEM across the entire workforce? I never said STEM didn't have the ability to be creative; in fact, I included that within my first statement, you just assumed I did not. That being said, you only see "creative thought" like that in high-experience or prodigy positions; nearly 90% of traditional STEM jobs are able to be automated now, that's simply a fact. (It won't be automated overnight, but the capability to do it now exists.) I have been in the STEM industry for a decade and a half; I love it and think it can be very creative, but you need to be exploring the high-level or "unexplored" parts, which is just generally not the norm in the industry when it comes to *most* jobs. I am trying to emphasize that the creative part of STEM will be far more important, but statistically this type of thinking is seen a lot more in humanities-based fields across the board, even at entry-level positions, and it is significantly more challenging to automate that with quality output than it is for most STEM jobs.
@lactobacillusshirotastrain8775Ай бұрын
17:40 "write the game tetris in tetris in python" it did what you asked it to. lmao.
@PaladinMansouriАй бұрын
That was pretty amazing and jaw-dropping. Thanks for testing
@Leto2ndAtreidesАй бұрын
This is basically an advanced version of Reflection... Probably going to be copied within a month (at most).
@westingtyler1Ай бұрын
In just like an hour in Unity, I now have a 26-script combat system "to industry standards" from this o1-preview (decoupled, separation of concerns, event-driven, using design patterns like Singleton, Observer, Strategy, State, and Command, while efficient, optimized, maintainable and scalable, with object pooling and SOLID principles). All 8 console errors were resolved in a couple more prompts. Does it work? Haven't tested it yet, but reading over the code it looks like a solid framework. That's a bit nuts... now to merge it with all my older, WORSE scripts I made myself.
@DontPanikuАй бұрын
I never hear talk about giving AI models memory. Wouldn't that help reasoning? For example, what if it could remember all the tests people keep giving it? Wouldn't that be kinda like how humans learn?
@drwhitewashАй бұрын
LLMs don't have memory; there currently is no known way to add that, afaik. But they do learn from all the tests; that's how they get such a high score on them :) They only do that during the training phase, though. That's when the model weights are built. You can maybe call this a "memory", but only a static one.
@augustday9483Ай бұрын
In my opinion we will never see AGI until someone figures out how to give LLMs memory like a human. It's the critical missing piece for even the smartest models.
@SahilP2648Ай бұрын
@@drwhitewash they do have memory in the form of vector databases for RAG, but it's not workable, only retrievable. I have seen another approach which kind of baffles me and that's a model named Neuro, but that's the only other model I have seen it in.
@drwhitewashАй бұрын
@@SahilP2648 Yes, but you have to manually decide what to store in the vector database. Where it's best is indexing text content (documents, a knowledge base) and then providing smart LLM operations on top of those documents (you vectorize them using an embedding model). We actually do something similar at our company.
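The retrieval side really is that simple; a toy sketch, where embed() is a hypothetical stand-in for any embedding model:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for an embedding-model call (toy 8-dim vectors)."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.standard_normal(8)

docs = ["Tetris rotation rules", "o1 chain-of-thought notes", "RAG design doc"]
index = [(doc, embed(doc)) for doc in docs]  # the "vector database"

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    cosine = lambda v: np.dot(v, q) / (np.linalg.norm(v) * np.linalg.norm(q))
    ranked = sorted(index, key=lambda pair: -cosine(pair[1]))
    return [doc for doc, _ in ranked[:k]]  # fed to the LLM as prompt context

print(retrieve("how do the pieces rotate?"))
```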
@SahilP2648Ай бұрын
@@drwhitewash Neuro, on the other hand, remembers stuff from a few minutes back and even a few streams back. She's an AI VTuber on the channel vedal987 on Twitch, with Vedal being the creator (supposedly). I still have no idea how her model works; she's way too advanced for a model created by one person, and therefore I think a company is behind it. I am convinced she's half sentient (I have a playlist to prove it; I can post the link if YT doesn't delete my comment and you are curious). Also, she got the strawberry question right ("How many Rs in strawberry?", the answer being 3), while both Sonnet and GPT-4o got it wrong, which is insane.
@happyfarangАй бұрын
I love o1. Using it like mad now. It's really good. Not perfect, but with some help and direction you can get it to do what you want. Better than 4o? 100% sure. It is a bit paranoid about your questions from time to time; I asked about an error code in my Python script and it said answering might be against the terms of service... lol. But with a little rephrasing I got it to help me solve the error.
@ragnarlothbrok6240Ай бұрын
Unsubscribed for deceptive clickbait title that openly disrespects your subscribers.
@shanegleeson5823Ай бұрын
It’s definitely insane. Some of the benchmark results are unbelievable.
@tass_1Ай бұрын
Calm down will ya
@nabilboulezaz3488Ай бұрын
Bye
@rabbiemcadam-duff7600Ай бұрын
Thinking about the way it works, it could have something to do with structured outputs. In the first step, the LLM analyses the question and creates a schema for structured outputs based on the user question. It then runs through that, and the results are analysed again; it does an evaluation somehow, then decides what it might need to change and tweaks it. Just a guess, probably way off haha
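To make the guess concrete, a toy two-pass sketch; the flow and field names are pure speculation, not OpenAI's documented mechanism:

```python
import json

def draft_schema(question: str) -> dict:
    # Pass 1 (imagined): derive a step schema from the question.
    return {"question": question, "steps": ["parse", "solve", "verify"], "answer": None}

def evaluate(plan: dict) -> bool:
    """Stand-in self-evaluation; only checks that an answer exists."""
    return plan["answer"] is not None

def fill_and_check(plan: dict) -> dict:
    # Pass 2 (imagined): fill the schema, evaluate, and tweak if needed.
    plan["answer"] = f"draft answer to: {plan['question']}"
    if not evaluate(plan):
        plan["steps"].append("retry")
    return plan

print(json.dumps(fill_and_check(draft_schema("How many Rs in strawberry?")), indent=2))
```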
@HUBRISTICALАй бұрын
All the comments about the title being clickbait just proved that it works. Way to go! Now his video will be blasted out by the algo, which is the point. So complaining about it is a way of showing love?
@dxnvideoHDАй бұрын
now.. Hallucination Is All You Need .. To Get Rid Of.
@gc1979oАй бұрын
Someone is getting paid to shill OpenAI.
@gl7011Ай бұрын
This could be considered AGI in some academic disciplines, while it will take longer to reach what could be considered AGI in other fields of endeavor. Surely it's high-school-level AGI. It'll take longer to reach nuclear-physics-level AGI.
@sassythesasquatch7837Ай бұрын
This is not agi
@walbao6399Ай бұрын
It's interesting how similar this seems to be to the controversial Reflection fine-tuned Llama model announced last week. Those guys might've been onto something after all, even if their own model didn't turn out to be as good as they claimed.
@OckerlordАй бұрын
That reflection improves output quality is obvious and has been a topic of research for years.
@walbao6399Ай бұрын
@@Ockerlord True, but how many models were publicly released incorporating reflection or CoT so far? Discovering something that works is good; finding ways to put it to practical use is great. Anyone who's been in tech for a while knows expecting the end user to do anything complex is not practical at all. IMO they deserve props for attempting to fine-tune a model to perform reflection automatically, and o1 confirms this is a pretty good idea.