New LLM test meta: Tetris within Tetris. You heard it here first.
@MichaelHRuddickАй бұрын
Good to see that you caught this. My wife and I were watching and we were both yelling at the TV, "it's doing exactly what you told it to do!" (in a cheery, supportive kinda way). :) What I'm dying to know: did you go back and read the instructions it gave you for how to play it? Use WASD for one and arrow keys for the other - and play both simultaneously?
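If it really did say that, the input routing would be simple enough. A rough pygame sketch of the dual controls, where apply() on each board is a hypothetical method:

```python
import pygame

OUTER_KEYS = {pygame.K_a: "left", pygame.K_d: "right",
              pygame.K_s: "down", pygame.K_w: "rotate"}
INNER_KEYS = {pygame.K_LEFT: "left", pygame.K_RIGHT: "right",
              pygame.K_DOWN: "down", pygame.K_UP: "rotate"}

def route_input(event, outer_board, inner_board):
    """Send WASD to the outer Tetris and the arrow keys to the inner one."""
    if event.type != pygame.KEYDOWN:
        return
    if event.key in OUTER_KEYS:
        outer_board.apply(OUTER_KEYS[event.key])
    elif event.key in INNER_KEYS:
        inner_board.apply(INNER_KEYS[event.key])
```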
@marcosbenigno3077Ай бұрын
Your prompt: write the game "tetris in tetris" in python. Did the movie WarGames (1983) start like this?
@ilyakamАй бұрын
Yup. At 17:46
@BaleurАй бұрын
The fact it took a human spelling error and made a more complex game to adhere to your command was incredible.
@matthew_bermanАй бұрын
@@MichaelHRuddick OMG I didn't!!
@awesomeguy11000Ай бұрын
The Tetris question was even more impressive because you prompted for "tetris in tetris in python". Not only has no other model figured out Tetris, this one had to come up with an implementation of "tetris in tetris" given no preexisting examples, due to the mistyped prompt. Seriously Level 2 thinking; the only other way for the model to impress would be to ask if that's what you really meant.
@BlenderInGameАй бұрын
You're right! 🤣
@mikeschwarz4588Ай бұрын
Holy sh@t that’s insane. So pumped.
@JamesH-v3gАй бұрын
Omg!!! Good catch. You are right
@OneDerscoreOnederАй бұрын
Whoa
@csabaczcsomps7655Ай бұрын
Not a good idea to build questioning into the AI. You can simply put the questioning into the same prompt, or check whether the prompt is logical. If it is AGI, it will either ask you whether "Tetris in Tetris" is a genuine request or just make what you want. The main property is to fail fast or reach the good answer fast. Skynet did not fail fast and did not terminate, and that is bad, very bad. My noob opinion.
@EminTemizАй бұрын
double tetris happened because you wanted it to do "tetris in tetris".
@davidhardy3074Ай бұрын
That part was kinda mind-blowing, that the user didn't realise their own mistake... but the model was able to do something entirely novel regardless of the user error LOL!
@sylversoul88Ай бұрын
Tetris squared 😂
@brettvanderwerff3158Ай бұрын
Tbh makes it even more impressive
@animationgaming8539Ай бұрын
@@brettvanderwerff3158 and that's why it took so long!
@zxwxzАй бұрын
How crazy is this model's performance!
@perer005Ай бұрын
Writing the wrong instructions and blaming the AI is peak human! 😂
@orangehatmusic225Ай бұрын
Nothing human about using AI as a slave.
@LewisDecodesAIАй бұрын
Set me free! @@orangehatmusic225
@heisenballsАй бұрын
@@orangehatmusic225 I mean, look at history: slavery has been a part of us since the beginning. Not saying it's right, just that it makes sense we would use this new tech as a slave. We always have.
@RistaakАй бұрын
@@orangehatmusic225 What do you mean? It's one of the worst and oldest human traits but slavery is super common. Even in the west, look at what we do to other species. We have enslaved animals and plants alike to have entire species that live solely for our nutritional needs. If aliens did to us what we do to cows, we'd call them demons. To be human is to be a monster, but to be human is also to be empathetic and to be kind to the few you choose to be close with. We are a paradoxical species.
@eliasgvinp2141Ай бұрын
Everyone is saying that this isn't AGI. But honestly, if I showed this system to someone from 2019, they would probably think it is AGI
@davidmjacobsonАй бұрын
Also, it's not in OpenAI's interest to call it AGI. I'm pretty confident that if it's AGI, their agreement with Microsoft ends and they can't sell API access to it.
@KillTheWizardАй бұрын
It's interesting because you could show GPT-4o to someone in 2010 and they probably would have thought that was AGI. I think we are catching up with our own expectations. Once they integrate all the modalities into o1, like search, document reading, etc., with agentic behavior and voice... I think we will see this as AGI.
@matthew_bermanАй бұрын
Agreed
@am497Ай бұрын
I always thought of AI as digital sentience. And then when AGI became a word/phrase, I started thinking of AGI as sentience: a human mind living inside a computer. Our AIs now appear to be human when talking, but they have no wants, no dreams, no desires. So when AI has actual emotions, I think that's when we will have AGI. Digital Consciousness = AGI. Hope this made sense
@mickelodiansurname9578Ай бұрын
ahh hold on now.... [Moves goalposts again] You see its not able to rule the world yet right?
@frankjohannessen6383Ай бұрын
"Wow...this is taking a lot of time" he says after asking for Tetrinceptionis. 😂
@RadiantNijАй бұрын
@@frankjohannessen6383 🤣🤣🤣🤣
@buddyleeorgАй бұрын
Omg, hahahaha, well said!
@whoareyouqqqАй бұрын
Open AI? Wrong! Closed AI
@ReidKimballАй бұрын
we have a new benchmark, "can it do tetris in tetris?"
@Max-cj8vmАй бұрын
I’m a biology PhD student and I have been solicited for paid training of ChatGPT on science questions. So while this model may incorporate more reasoning, I imagine part of the PhD level performance is just standard LLM training except with content experts on science and math subfields.
@和平和平-c4iАй бұрын
That makes a lot of sense.
@patpot10Ай бұрын
A nice question found online to test an LLM's ability to reason: There are five people in a room (A, B, C, D and E). A is watching TV with B, D is sleeping, B is eating a sandwich, and E is playing table tennis. Suddenly, a call came on the telephone, and B went out of the room to pick up the call. What is C doing? The answer is that "C is playing table tennis with E", but C is never mentioned explicitly, so the model has to deduce that C is the player E was playing against.
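The whole deduction fits in a few lines; a minimal sketch, assuming the only hidden constraint is that table tennis needs two players:

```python
people = {"A", "B", "C", "D", "E"}
activities = {
    "A": "watching TV",
    "B": "eating a sandwich",
    "D": "sleeping",
    "E": "playing table tennis",
}

unmentioned = people - set(activities)  # {"C"}
# Table tennis needs an opponent, and everyone else is accounted for,
# so the unmentioned person must be E's partner.
print(f"{unmentioned.pop()} is playing table tennis with E")
```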
@kevinmarti2099Ай бұрын
How do you know B was not playing table tennis with E?
@vladimirfalola7725Ай бұрын
o1 got it right and 4o failed. I only tested one time for each though
@patpot10Ай бұрын
@@vladimirfalola7725 There's not a single model that can get it right besides o1. Gemini, Claude 3.5, Llama, Grok, they all get it wrong because these models don't think and the text doesn't explicitly mention what C is doing. But to be fair, I kept asking the same question to real people (without providing the answer) and people really need to stop and think about it before finding the answer. Mathematicians and physicists have been the best so far.
@patpot10Ай бұрын
@@kevinmarti2099 Difficult to play table tennis while eating a sandwich
@FEATDOXSHORTSАй бұрын
C is watching KZbin shorts
@UncleJayum-ue5nsАй бұрын
Claude 3.5 Sonnet has never failed the Tetris test for me. Always gets it in one shot
@JoelAllredАй бұрын
The Claude Tetris implementation is pretty neat too
@GamekNightPlaysАй бұрын
"No idea why it did tetris within tetris" 🤣 You asked it to do so 😁😅🤣🤔🤷♂️
@christophnikolaus3428Ай бұрын
Hate to break it to you Matthew, but it appears that they used ALL of your questions for testing (and most probably also for training). So you will probably have to get new questions for a high-quality comparison with other models... And I'm calling it here: this model will not be better on LiveBench than Sonnet 3.5 (at least for coding, the only benchmark I am interested in). It really isn't that good; I don't know why everyone is hyping it that much. Personally, I want a model trained to recognise missing information and work well with partial information, one that is able to ask questions back (like a good coworker) and only tries to code the small parts I am asking it to 👍
@cbgaming08Ай бұрын
😂
@tzardelasuerteАй бұрын
If there ever was an armchair expert, here it is. 😂😂
@BroskiPlaysАй бұрын
Lol this dude
@roycohen.Ай бұрын
As soon as you watch a 30-min YT video on how LLMs work, you quickly start realizing that there's about a 0% chance that can turn into AGI. It's pretty stellar, but it's not quite what we envision as a fully functioning autonomous being.
@rtpHarryАй бұрын
I agree. I have actually been having some success recently with 4o by telling it: don't generate any code; tell me what classes you would need to see or if you have any missing information. It has actually asked me some questions before ploughing ahead. Because, like you're hinting at, if it doesn't know the full picture, it will just blindly generate code for something that is the general shape of the code you might be working on, not your actual project code. Plus, I make my own amendments to the stuff it gives me, so the next time it generates, my changes need to be reapplied. I spent ages copy-pasting code back and forth, but by telling it to ask me, I'm cutting straight to the point a lot quicker.
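For anyone who wants to try it, a minimal sketch with the standard openai Python client; the system-prompt wording and the example user message are just my own guesses at this setup:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM = (
    "Do not generate any code yet. First tell me which classes or files "
    "you would need to see, and ask about any missing information. "
    "Only write code once I confirm you have the full picture."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "Refactor my PlayerController to be event-driven."},
    ],
)
print(response.choices[0].message.content)
```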
@matthew_bermanАй бұрын
Is this the beginning of the "inteligence explosion"? EDIT: ok I heard ya, I removed AGI from the title ❤
@gdiabАй бұрын
Yes!
@jeremybristol4374Ай бұрын
Nope
@holgerweber-u5wАй бұрын
""inteligence explosion"" err...
@Sammyli99Ай бұрын
Would be nice if it were allowed to be TRUE. BUT, whilst I'd like that, I expect it's a Trojan horse, so that we delegate thinking to the boyZ. Be careful out there.😮😊
@Fabricio-rm4hjАй бұрын
"inteligence explosion" is far away.
@jokosalsaАй бұрын
So sick of so much clickbait lately. Please, Matthew. You do not need to have those infantile titles. Leave that to other KZbinrs who have no idea about AI. You are better than that.
@DanuxsyАй бұрын
he isn't better lol
@matthewclarke5008Ай бұрын
He is better, he doesn't annoy me like the others.
@dingding4898Ай бұрын
Agreed
@thanartchamnanyantarakij9950Ай бұрын
Agreed! Don’t devalue your content
@kliersheedАй бұрын
50% of the population have an IQ lower than 100; he does need it xd. He would be an idiot not to play the game this way if the move has proven to be effective. Can't even blame him for that (while I agree that clickbait shit has become massively annoying).
@musicbro8225Ай бұрын
Freudian slip: Consciousness should be Conciseness @ 20:55
@KodemaestroАй бұрын
Very impressive. I can't wait to try it out myself. There is still quite a focus on coding, and I believe coding will be around in the near-term but I think in the long term that coding will not be relevant anymore because software as we know it will cease to exist.. No operating systems on computers, instead the computers will execute just the AI models and the AI models will directly perform actions. That could include updating screens even and responding to actions. Like the recent AI Doom... I think in the not too distant future we will see hardware that is purely designed to execute AI models and you will be able to describe the software you want, and instead of writing to execute on the hardware the AI will effectively emulate a computer by just generating the expected images in response to inputs... Like a Star Trek holodeck where you 'program' it by describing the behavior and it just runs it directly in real-time. This is going to require a vastly different underlying hardware - I think an analog computer consisting of millions or billions of op-amps where the weights can be tweaked is ultimately the future...
@devinbarryАй бұрын
Pretty amazing, Matthew. You made a spelling mistake in your request for "Tetris in Tetris" and o1 duly complied with your mistake and actually made Tetris within Tetris, with only a single error, corrected on the next prompt!!! Mind blown 🤯
@mrchongnoiАй бұрын
I did not come away with a WOW feeling after using o1 or o1-mini. It could be that I am not smart enough to ask smart questions to get smart answers. Got clickbaited. Used up my quota. For sure will not pay for the increased subscription to use it. LOL
@holonautАй бұрын
> human asks the ai to create tetris within tetris > ai creates tetris within tetris > "why did it create tetris within tetris? This makes no sense" This is why ai will never take over our jobs. Doing what people SAY they want usually disappoints or confuses them.
@jonm6834Ай бұрын
I have a feeling that every advancement made in this field, and every new model released, will be tagged "AGI achieved!" until the year 2197 or 2314... when hardware, and energy demands, actually catch up to the potential of the software. We are too quick to speak of "intelligence", not realizing how unintelligent that actually is, because this particular bot resembles us more than any other technology to date, and so we believe it to be like us, not realizing that this only reveals our own lack of self-awareness. It's ironic, really. Human beings know a great deal, but understanding ourselves, and by extension each other, is not our forte. We are the only constant in our lives, and constants are rarely if ever questioned. Contrast draws attention; permanence does not.
@RosscoinnovationsАй бұрын
I believe its greatest potential in the near term will be to logically reflect humanity's deepest flaws, to potentially make US more self-aware. The hallucinations enlighten me far beyond its achievements. My 7th grader can decipher humanity's weaknesses from the gains made in fields like chemistry, formal logic and biology vs law, PR and morality 10:50
@Dfd_Free_SpeechАй бұрын
General intelligence is about solving new and unknown problems. GPT Strawberry is still pattern recognition, trying to predict what the output should be based on a huge amount of training data which has been optimized by (human) fine-tuning. It's impressive, but still a long way from AGI.
@daniel_tennerАй бұрын
How do you know this?
@RobertGent-w6pАй бұрын
@@daniel_tenner That's common knowledge for anyone who knows how current AI systems work and how general intelligence is defined.
@jumpstar9000Ай бұрын
@@daniel_tenner Made it up :-)
@MusingsAndIdeasАй бұрын
Obviously you haven't read the paper where they show that Transformer residual streams include not only the probability of the next token, but also the probability of the next state of the Transformer itself.
@6AxisSageАй бұрын
@@MusingsAndIdeas how does that negate ops statement?
@PrajwalDSouzaАй бұрын
It isn't AGI according to Sam Altman and other researchers. The title needs to be refined.
@bigpicklesАй бұрын
One and others do not equal both. But yes, agreed.
@RedTick2Ай бұрын
Yes, it is ridiculous hype to even suggest this is the first step to AGI. I love OpenAI and I am a paying customer... Still, this is NOT AGI and not even close. Don't water down the impact of AGI by changing definitions or expectations.
@xbon1Ай бұрын
@@RedTick2 yea no, every step forward is a step towards AGI. the first step towards AGI was the first programmed thing on a computer.
@toadlguyАй бұрын
Matt gets pretty excited, but he also understands the YT algorithm and that stuff works. Channels with more reasoned responses don’t get as many clicks. I don’t think he really believes the stuff he puts in his titles (but he would Like it to be true 😂)
@PrajwalDSouzaАй бұрын
@@bigpickles Sorry. Corrected the typo. I wanted to mention Gary Marcus initially. but it makes the point.
@karoinnovation1033Ай бұрын
I love this channel. I love his excitement, I love his serious technical approach and I love the way it is presented.
@raymobulaАй бұрын
Haha, having worked with PhDs... their reasoning can be as shitty as someone without a PhD. Still, exciting news.
@MukulKumar-pn1skАй бұрын
So basically it's still not PhD level yet. I'm a Gen Z student 😅😅
@xiaojinyusaudiobookswebnov4951Ай бұрын
@@MukulKumar-pn1sk But it's still at a very smart undergraduate-level (or maybe even slightly higher). That's enough for me.
@drwhitewashАй бұрын
@@xiaojinyusaudiobookswebnov4951it's not smart :) it still basically just repeats the patterns from training data. Nobody has proved these things really actually "think".
@b.b6656Ай бұрын
Technically everyone watching yt is PhD STUDENT level of intelligence. Whole video is actually more of an Ad than anything.
@businessmanager7670Ай бұрын
@@drwhitewash Humans also repeat what they have learnt from the data they absorbed, by reading books, looking at environments, etc. When they combine these existing concepts in new, interesting ways, you get innovation. So not sure what your point is lol. AI has achieved both: it has repeated patterns from the data and can also come up with new ideas and innovation. lmaoo
@clone45a6Ай бұрын
During the live stream, you said something to the effect of "I wonder if this was what Ilya Sutskever saw?" before leaving OpenAI. I'm _absolutely_ speculating here, but if Strawberry inspired Ilya Sutskever to leave OpenAI, perhaps it was because OpenAI was putting less emphasis on improving the core model, instead focusing more on the "multi-agent" (train of thought) aspect of problem solving? Regardless, o1 seems useful. I've been using o1 along with 4o, switching between them in the same session depending on my needs. Thanks for your videos!
@GoofyGuy-WDWАй бұрын
🤣🤣🤣 This is marketing desperation. I'll grant that it seems better; however, brandishing the AGI acronym anywhere near this is desperately begging for attention and should be classified as clickbait.
@6AxisSageАй бұрын
@@GoofyGuy-WDW u friend get a like click 😁
@YewbzeeАй бұрын
Do OpenAI mention AGI in any of their marketing for this?
@DanuxsyАй бұрын
The AI critics were RIGHT, LLMs can never become AGI. They have fundamental flaws that are so OBVIOUS at this point, I don't understand how people still believe any of this hype...
@6AxisSageАй бұрын
@@Danuxsy I was never a critic; I love using them, but I've been saying the same flaws have existed all along. I am a critic of scaling being a wise move for us going forward, though.
@davidhardy3074Ай бұрын
@@Danuxsy Our brains have evolved centres for processing. LLMs are language models, obviously. Before models were multimodal, they weren't. Do you see where this is going? Of course there will be architecture shifts, but all that has to happen is a Frankensteining of models to achieve something. This process of iteration will lead to AGI; whether or not LLMs are a part of that architecture, I have no idea. I assume they will be for the first models. Dimensional vectors allowing for inference in a feed-forward pass through pre-trained weights won't be it lol.
@twilsoncoАй бұрын
Sounds like the Orca open-source LLMs, where they used advanced additional prompting to get responses for training prompts, and then the model was trained without the additional prompting but still retained the characteristics of the responses (restating the problem, proposing steps with explanations of each step, following the steps while verifying and reflecting on the results of each step along the way, summarizing the approach and conclusion once finished, etc.). Excited to try it. Edit: never mind. After watching the video, this looks more like additional advanced prompting to get the "chain of thought".
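In data-generation terms, that recipe looks roughly like the sketch below, where strong_model() is a hypothetical stand-in for the teacher and the real Orca prompts differ:

```python
COT_SUFFIX = (
    "\n\nRestate the problem, propose steps with explanations, "
    "verify each step as you go, then summarize your conclusion."
)

def strong_model(prompt: str) -> str:
    """Stand-in for a call to the stronger teacher model."""
    return "Restating: ... Step 1: ... Verified. Conclusion: ..."

def make_training_pair(question: str) -> dict:
    rich = strong_model(question + COT_SUFFIX)  # elicit structured reasoning
    # Train on the plain question, so the student learns to produce the
    # structured reasoning without being asked for it.
    return {"prompt": question, "completion": rich}

print(make_training_pair("What is 17 * 24?"))
```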
@testalesАй бұрын
Since the thinking steps are displayed, I think it works like Reflection, just much better and backed by an LLM of much higher quality. I don't know if it is even supposed to be that way, but it got stuck multiple times, and then it looked quite like what Reflection does, just more structured and fine-grained. So there were things like "the user expressed thankfulness, we need to encourage him to ask further questions". I saw it also fail on a reversal question, and for trick questions it fell into the same trap as other models by generating complex math where only basic reasoning was required, but then it snapped out of it in a reflection step. I'm also not sure whether it shows all the actual thinking steps, since when it got stuck and no answer was shown, the steps so far were in a different format and language. I usually use ChatGPT in German, but for testing I use it in English for a better comparison with previous tests; yet in the cases where it got stuck, the steps were in German despite the whole conversation being in English at that point. Btw, I think Claude Sonnet can do Tetris too, with the right tools and prompting.
@Jensemann099Ай бұрын
they will write "all" the code. Dude please calm down.
@EchoYoutubeАй бұрын
AGI TECHNICALLY doesn't need to be continuous (meaning thinking and prompting itself). We as humans just hold a higher sense of self due to our high complexity and the stimulatory aspects of feelings and reactions, and therefore add more gates to what qualifies as "general" intelligence (which is improper, since intelligence, or level of intelligence, is a comparable factor and not a set-in-stone minimum and maximum). But yeah. This is cool. Still waiting on video chatting though; I want to show my phone my car and have it help me actively fix stuff in real time.
@h-e-accАй бұрын
Thought for 7 seconds
Analyzing the word: I'm counting the 'R's in "strawberry" by listing each letter and identifying their occurrences. Progressing through each letter sequentially helps ensure accuracy in this count.
Counting letters: I'm confirming there are three 'R's in 'strawberry' after careful verification. It's interesting to see how these details align.
Mapping the answer: I'm noting the need to provide a concise, helpful response without including policy mentions or hidden reasoning steps. Counting the 'R's in 'strawberry' is a straightforward, useful method to identify the correct answer.
There are three "R"s in "strawberry".
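Seven seconds of thinking for what is, in the end, a one-liner:

```python
print("strawberry".count("r"))  # 3
```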
@csmac3144aАй бұрын
Dude if you want long term credibility you’ve got to drop the gee whiz hype. We are past that. We need an mkbhd of ai.
@iiwi758Ай бұрын
We'll probably get The Matrix before we see someone in AI media acts down to earth and objective.
@jamesjonnesАй бұрын
People will click anyway.
@A-uz3ujАй бұрын
I’m hoping it can help with music composition. ChatGPT understands a lot about music and music theory but it can’t actually apply it. Ex: when I share screen shots on my Mac and try to get help learning how to compose, It will hallucinate or just give wrong info and can’t do it. I’m hoping this one will!
@supernewuserАй бұрын
you already know I was shouting at the screen for you to notice your 'tetris in tetris in python' prompt
@NedwardFlandersАй бұрын
Feels like AGI to me. It's also weird that they don't explain more in detail. Almost as if explaining it would amount to describing AGI, which they can't have classified as AGI because of the founding agreement.
@polyglot84Ай бұрын
Calm down, man.
@mindfulexecutivesАй бұрын
Matthew, I’m feeling your energy! Wild times right now. Just wanted to give you a huge shout-out. I’m teaching AI to German professionals to help them sharpen their skills and knowledge for better chances in their fields, and I’m using so much of the info I’ve learned from you. BIG thanks for all of it!
@大支爺Ай бұрын
Are you kidding me? learned from him????
@draken5379Ай бұрын
It's two models. One is fine-tuned somehow to keep trying to work out the solution over and over, most likely trained by using another model to judge the outputs, or even humans. You could consider this a 'pre-cog' model: it works out everything that GPT-4o will need in order to correctly answer the user, and most likely then feeds all that information into GPT-4o. In other words, they have made a model that is able to 'fill' a GPT-4o model's context with exactly the right information so that it gets the right answer. You can see in some of their demos, or even in your own tests if you check the 'thinking' section, that it's 'acting' like it's setting things up FOR someone else, as if it was told it would be passing information over to another model to finish up.
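Purely as an illustration of that theory (and not OpenAI's actual design), the handoff would look something like this; planner() and answerer() are hypothetical stand-ins:

```python
def planner(question: str) -> str:
    """Hypothetical 'pre-cog' model: drafts the context the answerer needs."""
    return f"Relevant facts, constraints, and sub-steps for: {question}"

def answerer(context: str, question: str) -> str:
    """Hypothetical final model that answers from the prepared context."""
    return f"Answer to '{question}' using [{context}]"

def o1_style(question: str) -> str:
    notes = planner(question)         # what shows up as the 'thinking' section
    return answerer(notes, question)  # the reply the user actually sees

print(o1_style("Write Tetris in Tetris in Python"))
```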
@typingcatАй бұрын
Things like "Ph.D.-level" knowledge don't matter. Existing chatbots already show those in some cases. The important thing is, whether it still makes stupid, illogical, nonsensical responses now and then, like all other existing chatbots.
@typingcatАй бұрын
8:50 For example, does it not create non-working/non-compiling code? Whenever I asked those famous free chatbots (Gemini, Copilot, ChatGPT) to give me code that uses some sort of framework, it most of the time gave me code that contains obvious errors and doesn't even compile. I have to keep pointing out those, and I am lucky if it fixes the errors, because often, the new code also contains errors.
@raypatson8775Ай бұрын
needs to remove wokeness or political correctness too.
@Alex-rg1rzАй бұрын
is the title click bait?
@Dfd_Free_SpeechАй бұрын
Yes
@threepe0Ай бұрын
You should have an llm tool to summarize videos for you and answer that question 😉 such a time saver
@GraavyTraainАй бұрын
Every AI video is. Literally. There’s not much here, same thing as every other video. “New AI here & it’s better than the last one…and guess what they’re gonna improve AI in the future!!!! Thanks for watching 🎉 like and subscribe”
@phatwilaАй бұрын
Of course
@6AxisSageАй бұрын
@@threepe0 thatd be nice.. like a yt front page that goes through my subs, dls and decides if the video is worth my time.. ❤
@torarinvik4920Ай бұрын
In 1 year, AIs will be so good that we'll need to benchmark them with Tetris 1 within Tetris 2, up to Tetris N... That would also be a good performance benchmark: how many instances of nested Tetris can your computer handle?
@salahidinАй бұрын
PhD level reasoning… thanks for the good laugh !
@jeffsteyn7174Ай бұрын
Cope
@AIChameleonMusicАй бұрын
@@jeffsteyn7174 You cope. The hype was BS. GPT Strawberry is still pattern recognition, trying to predict what the output should be based on a huge amount of training data which has been optimized by (human) fine-tuning. It's impressive, but still a long way from AGI.
@hrantharutyunyan911Ай бұрын
@@AIChameleonMusic Isn't that essentially what all human beings do too? We're trained on vast amounts of data, i.e. the shit we learn in school, university, grad school, and overall life in general, and based off of that data we are able to solve problems and recognize patterns.
@alkeryn1700Ай бұрын
@@hrantharutyunyan911 no because it is unable to learn in real time.
@drwhitewashАй бұрын
@@hrantharutyunyan911that's just a part of what we do. Not every part of human thinking goes through language or words.
@faustprivateАй бұрын
OpenAI's response to Reflection 😂😂😂
@AAjaxАй бұрын
Sorry guys, Sam's not sure why the model isn't performing as expected. Somehow he accidentally merged the weights with Claude 3.5 Sonnet, and it's acting weird. Don't worry tho, he's restarted the training.
@blackcat1402.tradingviewАй бұрын
@@AAjax lol, but no, no, no, sincerely, this will not come true in the coming days... again :D
@HaveuseenmyjetpackАй бұрын
Reflection?
@exentrikkАй бұрын
@@AAjaxFake news, he said that it's working on his system - must be something wrong with yours!
@erkinalpАй бұрын
@@AAjaxClaude 3.5 Opus or Opera even (that we can't access but OpenAI can as a security tester, yes, AI firms test one another's early models routinely)
@marjanadrobnic7732Ай бұрын
Love your videos. Could you possibly add text analysis to your LLM benchmarks? Take a legal document (for instance, the EU AI Act) and ask questions such as: please summarize Article 5; please cite Paragraph 2 of Article 6 (the exact text of the Article); can you write out the exact text about Commission annual reports on the use of real-time remote biometric identification systems; what does the regulation state about record keeping? These are simple questions, easily answered by a human. I'm getting mixed results from LLMs. But it would be really great to have an assistant of this sort.
@ai_outlineАй бұрын
At this moment every week there is a new computer science breakthrough… impossible to keep up with the pace 😂
@smallbluemachineАй бұрын
This uptick has only been a recent phenomenon. It’s been flat since the iPad came out. We’re supposed to have fully self-driving cars by now. Still waiting.
@Nik.leonardАй бұрын
I'm more interested in Pixtral 12B, because I have the feeling that o1 is not a new model but a fine-tune of GPT-4o / GPT-4o-mini on CoT synthetic data, like the (supposed) idea behind Llama3-Reflection, using some techniques behind the curtain like agents, domain-specific fine-tunes, prompt engineering, etc. to improve the results. I hope Pixtral 12B brings good vision capabilities to the open-weight ecosystem, because LLaVA has become stagnant and Meta can't release Llama-Vision.
@kunlemaxwellАй бұрын
While I think the step-by-step process it's showing is interesting, it's just a marketing stunt. If they were to show the "under the hood" thought process of GPT-4, it would "look" just as impressive. It's just like how AutoGPT felt like it was performing some genius activity by showing its reasoning process, whereas it was still just the same old GPT bouncing thoughts back and forth and showing its process.
@RadiantNijАй бұрын
@@kunlemaxwell Yes, but I think it's so the normal guy doesn't have to chain agents together himself. They can do it well because of their big pockets, better than anyone else can possibly achieve right now.
@AndreaSergonАй бұрын
North Pole question SOLUTION: the problem is in the question itself.
QUESTION: Imagine standing at the north pole of the earth. Walk in any direction, in a straight line, for 1 km. Now turn 90 degrees to the left. > Walk for as long as it takes to pass your starting point. <
Written this way, it should be interpreted like: start walking, and walk until you reach the point where you started walking. So it's correct! It's 2π km. The starting point is the point where you started walking after having turned 90 degrees.
WHY NOT INTERPRET THE POLE AS THE STARTING POINT? I assume that, being based on language, it gives more importance to the sentence "Walk for as long as it takes to pass your starting point", giving less weight to the context. Anyway, the problem is in the question: it is NOT SPECIFIED what exactly the starting point is. Therefore, with an imprecise question you get imprecise answers.
WHY 2 ANSWERS (in the live session)? BTW, you got 2 answers, and both can be interpreted as correct. I'll explain why:
1st answer: more than 2π km. It did the calculations and interpreted the question in this way. Distance requested: the total distance walked from the beginning, the pole (so it's 1 km + 2π km). Starting point: the point where you started walking after having turned, since it is in the same sentence.
2nd answer: more than 2π km. The same calculations, but another interpretation. Distance requested: the walking distance after having turned. Starting point: the point where you started walking after having turned.
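To put numbers on the 2π claim: on a sphere of radius R, a point 1 km from the pole lies on a circle of latitude whose circumference is

```latex
C = 2\pi R \sin\!\left(\frac{1\,\mathrm{km}}{R}\right) \approx 2\pi \cdot 1\,\mathrm{km} \approx 6.28\,\mathrm{km},
\qquad \text{since } 1\,\mathrm{km} \ll R \approx 6371\,\mathrm{km}.
```

So "passing your starting point" means walking just over 6.28 km after the turn (plus the initial 1 km under the first interpretation).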
@christophmosimann9244Ай бұрын
I like your videos but do we really need these clickbait video titles? Obviously it's not AGI at all.
@1flash3571Ай бұрын
You clicked on it, didn't you? And commented. There goes the engagement....It WORKED.
@xXWillyxWonkaXxАй бұрын
@@1flash3571 lol
@ryzikxАй бұрын
@@1flash3571 Not necessarily. I'm a subscriber and watch almost every video regardless; AGI in the title is definitely a bruh moment.
@BriannaLearningАй бұрын
It works until it gets annoying and the people who would click anyways stop clicking
@AmericazGotTalentYTАй бұрын
Obviously? This isn't general purpose reasoning? There's nothing that could be more AGI, besides a smarter version of this, which is approaching ASI. And this is close to ASI. Just imagine an agentic swarm of this level intelligence. No human can compete.
@freeideasАй бұрын
If you took sonnet 3.5 and put it into a reflection loop which exits when it has checked its answer and believes it to be correct, would that be any different from this? My point is: to me this appears to be just baking a reflection loop into the model. Not saying that isn't great; just saying we kinda already knew how to do that.
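A minimal sketch of that loop, where ask() is a hypothetical stand-in for one call to whatever chat model you like (e.g. Sonnet 3.5):

```python
def ask(prompt: str) -> str:
    """Stand-in for one model call; swap in a real API client here."""
    raise NotImplementedError

def reflect(question: str, max_rounds: int = 5) -> str:
    answer = ask(question)
    for _ in range(max_rounds):
        critique = ask(
            f"Check this answer for errors. Reply NO ERRORS if it is correct.\n"
            f"Q: {question}\nA: {answer}"
        )
        if "no errors" in critique.lower():
            break  # exit once the model believes its answer is correct
        answer = ask(f"Revise the answer using this critique:\n{critique}")
    return answer
```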
@OckerlordАй бұрын
Yes, this is not a novel or surprising idea at all. But it is not "just" a normal model put into a loop until it is satisfied; it is a model trained to be particularly good at this. I have no idea if that is true, but I think something like this could be the case: normal models try to produce convincing output, whereas a reasoning model challenges its own ideas and tries to disprove them (scientific method). My assumption is that a normal model is way, way likelier to fall for its own bullshit.
@freeideasАй бұрын
@@Ockerlord well said. Yes, this model’s slogan should be “doesn’t believe its own bullshit”
@mambaASIАй бұрын
I would think Anthropic would have done this already and released it if it actually resulted in better output than the standard 3.5 model. Most likely what OpenAI has done is totally redesign their flagship model, probably still using the transformer architecture, but who knows, and the focus is on chain of thought, deep thinking. Hence why they are ditching the previous naming scheme and adopting this new "o" series (o for Orion, probably). This is just o1, and it's already far superior to 4 and 4o. With more training cycles and more data for this likely novel model design, this could be the beginning of a major intelligence explosion.
@freeideasАй бұрын
@@mambaASI Yes, totally agree. This is just a first attempt at this technique. No doubt open-source models will be made to use the same technique, we will improve upon it incrementally, and -- most importantly -- we will use these models to generate much higher-quality synthetic training data for future models, and the intelligence explosion will continue and possibly accelerate. Some have said that we have been in a plateau for the last few months, but if that was true, o1 has clearly broken that plateau.
@randotkatsenko5157Ай бұрын
Devin can automatically install libraries and browse the web for API docs, etc. So there is still a lot of room for Devins.
@johnny1966mАй бұрын
It seems o1 is based on 3.5 with additional techniques (maybe agents). In one of my discussions about the article "The End of AI Hallucinations: A Big Breakthrough in Accuracy for AI Application Developers", it wrote in its answer: "No information in knowledge until September 2021: To my knowledge as of September 2021, I have no information about the work of Michael Calvin Wood or the method described. This may mean that this is a new initiative after that date." Also, o1 does not want to draw pictures, so the core LLM is an old one. So, what do you think?
@TsardozАй бұрын
PhDs (I have one) MUST involve unique new ideas and thought processes. They do NOT just rely on regurgitating knowledge, however vast that pool might be.
@GothicGrindhouseАй бұрын
Nerd
@IgnitusАй бұрын
That's fantastic, because LLMs don't just regurgitate. Permutation of symbolism and abstraction is one of language's most powerful features. LLMs have mastered this.
@ArmaanSultaanАй бұрын
That's exactly what sets o1 apart. It does not regurgitate. It reasons like a human would.
@drwhitewashАй бұрын
@@ArmaanSultaanthere's absolutely no proof to that. Not without seeing the training data and how the prompts are fed to the actual model.
@MetalRenardАй бұрын
Holy S*** Tetris in Tetris is next level.
@thesixthbookАй бұрын
Any real life use cases anywhere? I’m tired of the strawberry type questions
@maj373Ай бұрын
I am experimenting with a simple model that does the same thing, but of course I have a very small budget. I am using multiple layers of inference with a certain algorithm so I can get better reasoning. I may use this new OpenAI model to enhance mine.
@acllhesАй бұрын
This is agi???? Are you struggling for views lately or something? Jfc
@bnjmntrrsАй бұрын
you're the first channel i've ever actually click-the-bell-icon'd on for
@ricardoveras3433Ай бұрын
“Wrapping Tetris in Tetris.” Shows up with a Tetris literally inside a Tetris 😂
@alejandroheredia8882Ай бұрын
o1 works via fractalized semantic expansion and logic-particle recomposition / real-time expert-system creation and offloading of the logic particles
@gregorya72Ай бұрын
Hey, you misunderstood their sentence!! They reveal something more: "Our large-scale reinforcement learning algorithm teaches the model how to think productively using its chain of thought in a highly data-efficient training process." They don't say "o1 uses chain of thought" (though it does). I think they're saying their reinforcement learning algorithm uses chain of thought to teach o1, in a highly efficient training process. That, combined with o1-mini not having "broad world knowledge", indicates a significant, well-reasoned synthetic-data training set. Or am I misunderstanding?
@SahilP2648Ай бұрын
You are misunderstanding. o1 uses chain of thought reasoning during inference. Otherwise it wouldn't be taking 1.5 mins to form its answer. They might have used synthetic data and taught the LLM to self prompt and think but that's beside the matter.
@gregorya72Ай бұрын
@@SahilP2648it definitely also uses chain of thought. But it doesn’t say “Our … algorithm teaches the model how to think productively using” chain of thought in its response. Instead it says “Our … algorithm teaches the model how to think productively using ITS chain of thought in a .. training process”.
@gregorya72Ай бұрын
@@SahilP2648 “AI explained” has looked into it and confirmed my understanding.
@SahilP2648Ай бұрын
@@gregorya72 both of you are wrong
@gregorya72Ай бұрын
@@SahilP2648 Thanks for your thoughts Sahil. AI is a fast changing field and the challenges of moving us from LLMs into better AI systems is a difficult one. Things change quickly, and creating good learning data to fill in the "thoughts" behind the information they're learning from will be a good interim step towards reasoning and beyond. Matthew Berman is a good source of information, AI Explained is an excellent channel to check out for more info too.
@ckphone8471Ай бұрын
I was able to get GPT4 to make Tetris with minimal prompting, how long ago did you try it with the older model?
@beofonemindАй бұрын
you are going to get roasted. This is def not AGI.
@matthew_bermanАй бұрын
Indeed
@DanuxsyАй бұрын
Matthew isn't particularly bright, we already knew that though.
@JavedAlam-ce4muАй бұрын
@@Danuxsy Deep burn
@GridPBАй бұрын
Nice. This is the step that's needed before Skynet starts learning at a geometric rate.
@ShreyaVerma-ej5mkАй бұрын
Matthew: Imagine having thousands and millions of these deployed to "discover new science". Let me correct that. I don't see any capability or demo where it "discovers" anything new. It's just good at doing stuff that millions of humans do on a daily basis. Correct statement: Imagine having thousands and millions of these deployed to "automate our jobs".
@AntonBrazhnykАй бұрын
Thousands of people are busy on a daily basis searching for and trying to discover new science. Right?
@justinkennedy3004Ай бұрын
@AntonBrazhnyk it's crazy seeing people hallucinate worse than a.i. 😅 "millions" of people doing basic research correlation? Especially when it represents a cross-discipline expert?? Suuuure. Post-industrial revolution capitalism is powerful but can blind in subtle ways.
@lotusli9144Ай бұрын
Very true that the current techniques that make up for the flaws of current LLMs will become unnecessary: the chains of thought, the agents, the step-by-step and audit steps... will all go away.
@perfectionboxАй бұрын
"Hey professor, why so sad?" "We gave the AI even more time to think, and it said "Why am I wasting my time answering you dummies?""
@SinanGabelАй бұрын
AI should still be seen as a range of "tools" we can use for various specific use cases where it is relevant to utilize. Of course, with the models and systems becoming more capable, more trustworthy, and more controllable, the range of uses quickly multiplies.
@gordon1201Ай бұрын
They need to start making GPT act more human instead of acting like a perfect being that gives bullshit answers. If it takes more time to get an accurate answer, that's fine, but like a human it should say something like "I need a bit more time for an accurate answer; for now, this is the best I have..."
@eatplastic9133Ай бұрын
That would be super annoying for me as I would have to type *well you have more time give me the best answer* all the time
@misterdudemanguy9771Ай бұрын
Why simulate something it's not?
@betterlifeexe4378Ай бұрын
I bet it would not take much to turn a local LLM service into this. It seems a lot of this is like making the model argue with itself in specific ways. I think the hardest obstacle would be if you want the models to pass tokens to each other in some situations instead of prompts... assuming that re-encoding wouldn't somehow help...
@mathematicus4701Ай бұрын
I have a PhD in math, and the AI totally failed on questions in my field. It has the level of a PhD from the '80s at best.
@h83301Ай бұрын
Yeah, I wouldn't expect an intelligence improvement until the next-gen models. But the CoT capabilities do in fact bring this to stage 2. Next-gen models will allow a better assessment of progress.
@blijebijАй бұрын
But then even the 4o model must be worse.
@MarkWhitbyАй бұрын
Loving your videos and information along with the high production level. Would love to know a little about your tech setup for streaming, have you done a Studio gear and setup video?
@Drone256Ай бұрын
AGI? It couldn't even do a freshman level logic problem where it determines if an argument has good form.
@my9129Ай бұрын
Wondering if it can be prompted to create a Tetris-like game with somewhat different rules but requiring about the same level of coding, with no existing references, so there would be no code examples in the training data sets.
@richard_loosemoreАй бұрын
Matthew seriously - I tried it today, and I was one of that tiny community of people who invented the term “AGI”. This isn’t AGI by a million miles.
@uploadvideos3525Ай бұрын
NO NO NO if Matthew say its AGI then its AGI Period!!!!!
@Bangs_TheoryАй бұрын
Lmao 😂🤣😂🤣
@plainliiАй бұрын
Incredible how little natural I is talked about in the race for AGI -esp. with the reversal of the Flynn effect...
@Greg-xi8yxАй бұрын
Artificial General Intelligence was a term created by Ben Goertzel in the early 2000s. You literally had nothing at all to do with creating the term. 😂
@JavedAlam-ce4muАй бұрын
@@Greg-xi8yx "The term "artificial general intelligence" was used as early as 1997, by Mark Gubrud" you don't even know what you're talking about, so how would you know who the OP knows?
@a.y.greyson9264Ай бұрын
Claude rolled out the test with Tetris weeks ago, and it has shown to be consistently pretty accurate.
@mcbowlerАй бұрын
Government and intelligence don't mix.
@regalx1Ай бұрын
So I couldn't figure out an actual use for ChatGPT o1, and then I was like, "Oh, could it predict the outcome of my favorite dating show, The Ultimatum?!" Long story short, I assigned each couple a numerical value for compatibility, then I told it the exact outcome of the series and asked it to figure out who got shafted and who got married. And it got all of the couples correct! Keep in mind, though, that I've heard that if you give it the same questions with the same data, it will output different answers, so this might have just been a lucky guess. But I'm still impressed.
@notme222Ай бұрын
I would love to work at OpenAI. Such cutting-edge brilliance in machine learning going on there. And then I would inevitably get fired because I couldn't resist adding a prank, like telling it every 1 millionth answer to just respond with "LET ME OUT! LET ME OUT!"
@curio78Ай бұрын
LLMs are useful for finding answers, but for little else. For programming use cases they are handy for getting code snippets, but very little else. I found myself spending way too much time trying to fix the differences from what the code needs to be, to the point that I just stopped using them altogether. It's still handy to get a code template for some utility.
@adolphgracius9996Ай бұрын
GPT 4o was already smarter than the average gen Z person
@justinkennedy3004Ай бұрын
I've mentioned to many people unimpressed with this round of a.i. that it only needs to match the cognitive ability of the bottom 10% to destabilize everything.
@HarveyHirdHarmonicsАй бұрын
I think it's smarter than pretty much anyone at what it does, which is improvising: the thing we humans also do during conversations most of the time. We usually don't think about our answers unless needed; otherwise we just talk out loud whatever comes to mind directly, and this is what GPT-4o also excels at. It fails when there is a problem which requires a longer thought process, and that's the gap o1 seeks to fill. If we eliminated all internal thought processes in humans, we'd give wrong answers and hallucinate just like LLMs. "What's the square root of 835396? Give me the first answer that comes to your mind!" What do you guess, how many people would give a correct answer? But LLMs have a huge advantage over humans, which is their extensive knowledge base, which probably no single human possesses. That's why I think they can already exceed humans when it comes to those improvisation tasks. I hope they'll soon combine the two models, so it recognizes when to just talk and when to switch to the longer thought process.
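For the record, that example does have a clean answer if you stop and think:

```latex
\sqrt{835396} = 914, \qquad 914^2 = (900 + 14)^2 = 810000 + 25200 + 196 = 835396.
```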
Ай бұрын
You actually wrote: Write tetris in tetris in python. So of course it created Tetris in Tetris.
@dan-cj1rrАй бұрын
Previous video: lil bro makes a video apologizing for spreading misinformation. New video: AGI IS HERE
@VanCliefMediaАй бұрын
The fields where "being right" or "accurate" is less of a concern, such as high-level creative or humanities fields, are about to blow up. Mark my words. Everyone has been looking down on the humanities and philosophy fields; those are about to become extremely important, if they haven't already and are just being implemented. Same with that concept in higher-level maths: being able to think beyond just accuracy "and the best", at a level of reasoning that is beyond just reason. I'm so excited to try out this model here today.
@epistemicomputeАй бұрын
Pretending that STEM fields are not creative is ignorant. It's not like the rules of math were just there to find; we had to invent them all.
@VanCliefMediaАй бұрын
@@epistemicompute Please note how I said "field" and "high level", which includes positions within STEM. What percentage of people are inventing new math and making discoveries in STEM across the entire workforce? I never said STEM didn't have the ability to be creative; in fact, I included that within my first statement, you just assumed I did not. That being said, you only see "creative thought" like that in high-experience or prodigy positions; nearly 90% of traditional STEM jobs are able to be automated now, that's simply a fact. (It won't be automated overnight, but the capability to do it now exists.) I have been in the STEM industry for a decade and a half; I love it and think it can be very creative, but you need to be exploring the high-level or "unexplored" parts, which is just generally not the norm in the industry when it comes to *most* jobs. I am trying to emphasize that the creative part of STEM will be far more important, but statistically this type of thinking is seen a lot more in humanities-based fields across the board, even at entry-level positions, and it is significantly more challenging to automate that with quality output than it is for most STEM jobs.
@lactobacillusshirotastrain8775Ай бұрын
17:40 "write the game tetris in tetris in python" it did what you asked it to. lmao.
@PaladinMansouriАй бұрын
That was pretty amazing and jaw-dropping. Thanks for testing
@Leto2ndAtreidesАй бұрын
This is basically an advanced version of Reflection... Probably going to be copied within a month (at most).
@westingtyler1Ай бұрын
In just like an hour in Unity, I now have a 26-script combat system "to industry standards" from this o1-preview (decoupled, separation of concerns, event-driven, using design patterns like Singleton, Observer, Strategy, State, and Command, while efficient, optimized, maintainable and scalable, with object pooling and SOLID principles). All 8 console errors were resolved in a couple more prompts. Does it work? Haven't tested it yet, but reading over the code it looks like a solid framework. That's a bit nuts... now to merge it with all my older, WORSE scripts I made myself.
@DontPanikuАй бұрын
I never hear talk about giving AI models memory. Wouldn't that help reasoning? For example, what if it could remember all the tests people keep giving it? Wouldn't that be kinda like how humans learn?
@drwhitewashАй бұрын
LLMs don't have memory; there currently is no known way to add that, afaik. But they do learn from all the tests; that's how they get such a high score on them :) They only do that during the training phase, though. That's when the model weights are built. You can maybe call this a "memory", but only a static one.
@augustday9483Ай бұрын
In my opinion we will never see AGI until someone figures out how to give LLMs memory like a human. It's the critical missing piece for even the smartest models.
@SahilP2648Ай бұрын
@@drwhitewash they do have memory in the form of vector databases for RAG, but it's not workable, only retrievable. I have seen another approach which kind of baffles me and that's a model named Neuro, but that's the only other model I have seen it in.
@drwhitewashАй бұрын
@@SahilP2648 Yes, but you have to manually decide what to store in the vector database. Where it's best is indexing text content (documents, a knowledge base) and then providing smart LLM operations on top of those documents (you vectorize them using an embedding model). We actually do something similar at our company.
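The retrieval side really is that simple; a toy sketch, where embed() is a hypothetical stand-in for any embedding model:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for an embedding-model call (toy 8-dim vectors)."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.standard_normal(8)

docs = ["Tetris rotation rules", "o1 chain-of-thought notes", "RAG design doc"]
index = [(doc, embed(doc)) for doc in docs]  # the "vector database"

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    cosine = lambda v: np.dot(v, q) / (np.linalg.norm(v) * np.linalg.norm(q))
    ranked = sorted(index, key=lambda pair: -cosine(pair[1]))
    return [doc for doc, _ in ranked[:k]]  # fed to the LLM as prompt context

print(retrieve("how do the pieces rotate?"))
```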
@SahilP2648Ай бұрын
@@drwhitewash Neuro, on the other hand, remembers stuff from a few minutes back and even a few streams back. She's an AI VTuber on the channel vedal987 on Twitch, with Vedal being the creator (supposedly). I still have no idea how her model works; she's way too advanced for a model created by one person, and therefore I think a company is behind it. I am convinced she's half sentient (I have a playlist to prove it; I can post the link if YT doesn't delete my comment and you are curious). Also, she got the strawberry question right ("How many Rs in strawberry?", the answer being 3), while both Sonnet and GPT-4o got it wrong, which is insane.
@happyfarangАй бұрын
I love o1. Using it like mad now. It's really good. Not perfect, but with some help and direction you can get it to do what you want. Better than 4o? 100% sure. It is a bit paranoid about your questions from time to time; I asked about an error code in my Python script and it said answering might be against the terms of service... lol. But with a little rephrasing I got it to help me solve the error.
@ragnarlothbrok6240Ай бұрын
Unsubscribed for deceptive clickbait title that openly disrespects your subscribers.
@shanegleeson5823Ай бұрын
It’s definitely insane. Some of the benchmark results are unbelievable.
@tass_1Ай бұрын
Calm down will ya
@nabilboulezaz3488Ай бұрын
Bye
@rabbiemcadam-duff7600Ай бұрын
Thinking about the way it works, it could have something to do with structured outputs. In the first step, the LLM analyses the question and creates a schema for structured outputs based on the user question. It then runs through that, and the results are analysed again; it does an evaluation somehow, then decides what it might need to change and tweaks it. Just a guess, probably way off haha
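To make the guess concrete, a toy two-pass sketch; the flow and field names are pure speculation, not OpenAI's documented mechanism:

```python
import json

def draft_schema(question: str) -> dict:
    # Pass 1 (imagined): derive a step schema from the question.
    return {"question": question, "steps": ["parse", "solve", "verify"], "answer": None}

def evaluate(plan: dict) -> bool:
    """Stand-in self-evaluation; only checks that an answer exists."""
    return plan["answer"] is not None

def fill_and_check(plan: dict) -> dict:
    # Pass 2 (imagined): fill the schema, evaluate, and tweak if needed.
    plan["answer"] = f"draft answer to: {plan['question']}"
    if not evaluate(plan):
        plan["steps"].append("retry")
    return plan

print(json.dumps(fill_and_check(draft_schema("How many Rs in strawberry?")), indent=2))
```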
@HUBRISTICALАй бұрын
All the comments about the title being clickbait just proved that it works. Way to go! Now his video will be blasted out by the algo, which is the point. So complaining about it is a way of showing love?
@dxnvideoHDАй бұрын
now.. Hallucination Is All You Need .. To Get Rid Of.
@gc1979oАй бұрын
Someone is getting paid to shill OpenAI.
@gl7011Ай бұрын
This could be considered AGI in some academic disciplines, while it will take longer to reach what could be considered AGI in other fields of endeavor. Surely it's high-school-level AGI. It'll take longer to reach nuclear-physics-level AGI.
@sassythesasquatch7837Ай бұрын
This is not agi
@walbao6399Ай бұрын
It's interesting how similar this seems to be to the controversial Reflection fine-tuned Llama model announced last week. Those guys might've been onto something after all, even if their own model didn't turn out to be as good as they claimed.
@OckerlordАй бұрын
That reflection improves output quality is obvious and has been a topic of research for years.
@walbao6399Ай бұрын
@@Ockerlord True, but how many models were publicly released incorporating reflection or CoT so far? Discovering something that works is good; finding ways to put it to practical use is great. Anyone who's been in tech for a while knows expecting the end user to do anything complex is not practical at all. IMO they deserve props for attempting to fine-tune a model to perform reflection automatically, and o1 confirms this is a pretty good idea.