Okay, I'm a bit scared now...

97,837 views

Theo - t3․gg

1 day ago

Thank you infinite.red for sponsoring today's video!
I was skeptical of the new OpenAI ChatGPT release, but o1 ended up impressing me when I tried it against Advent of Code. RIP.
Check out my Twitch, Twitter, Discord and more at t3.gg
S/O Ph4se0n3 for the awesome edit 🙏

474 comments
@codedusting
@codedusting 4 күн бұрын
Developers: So we added a for loop for self-reference and a check of whether it's right or not, based on questions within. It will be slow, but we hope it will be better.
Marketing: I see... So what you're saying is, it thinks.
Developers: Ah... not really, but let's go with that...
@TejasAnand-id8hx
@TejasAnand-id8hx 2 күн бұрын
That story isn't quite accurate, as the name Chain of Thought comes from the research paper that described this training technique.
@codedusting
@codedusting 2 күн бұрын
@@TejasAnand-id8hx it's a joke bro. Thanks for mentioning the research paper name
@alex-rs6ts
@alex-rs6ts 4 күн бұрын
AI went from barely managing to write code to creating games zero-shot with ease, in a couple of years. Don't forget that.
@aidenkitchen8378
@aidenkitchen8378 4 күн бұрын
Literally this. Compute and other things will be barriers that will not allow for the same rate of improvement, but the infrastructure is already there for continued, substantial improvement.
@a_mediocre_meerkat
@a_mediocre_meerkat 3 күн бұрын
True, but technologies usually develop in a sigmoid-ish curve. The problem is you only know you've hit a flat area years after you've hit it. It looked like we'd entered the diminishing-returns region, but now they've dropped this out of the blue. Either way it's hard to say where we are on the sigmoid curve; maybe they will train it a lot more on "decision making" and it'll improve only marginally, or maybe we're at another low point.
@LA-qv1ir
@LA-qv1ir Күн бұрын
@@aidenkitchen8378 infrastructure that's on an unsustainable growth trajectory, set to hoard as much of the energy generation of the world as it can.
@blarvinius
@blarvinius Күн бұрын
It was more like 185 years. This is a problem with genius young engineers: they ignore history. Take all the lovely map and navigation apps: they don't follow a thousand years of cartographic and navigation principles. Very clever, but also missing real value.
@sn5806
@sn5806 4 күн бұрын
I've seen this described as a system that repeatedly prompts itself to refine its responses. Does that sound accurate to you?
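For readers wondering what "repeatedly prompting itself" could look like in practice, here is a minimal, hypothetical sketch of such a self-refinement loop. The `ask_stub` stand-in and the prompt wording are made up for illustration; OpenAI has not published how o1 actually works internally.

```python
# Minimal sketch of "repeatedly prompting itself to refine its responses".
# `ask` is a stand-in for any chat-completion call; the stub below is a
# placeholder so the loop runs, not a real model.
from typing import Callable

def ask_stub(prompt: str) -> str:
    # Placeholder "model": returns a canned answer so the example is runnable.
    return "DONE: 42" if "Critique" in prompt else "42"

def self_refine(question: str, ask: Callable[[str], str], max_rounds: int = 3) -> str:
    answer = ask(question)
    for _ in range(max_rounds):
        critique_prompt = (
            f"Question: {question}\nDraft answer: {answer}\n"
            "Critique the draft. If it is already correct, reply 'DONE: <answer>'; "
            "otherwise reply with an improved answer."
        )
        reply = ask(critique_prompt)
        if reply.startswith("DONE:"):
            return reply[len("DONE:"):].strip()
        answer = reply  # keep refining the latest draft
    return answer

print(self_refine("What is 6 * 7?", ask_stub))  # -> "42"
```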
@Patashu
@Patashu 4 күн бұрын
It's funny that it does all that but can still get the wrong answer; then you prompt it one more time and it gets the right answer. It spent 17 seconds when it should have spent 18 seconds.
@Lucas-gt8en
@Lucas-gt8en 4 күн бұрын
I don’t care how it sounds I care about results and brother I’ve been getting some great results
@luxnox9303
@luxnox9303 4 күн бұрын
I believe the technique used is called Chain of Thought (CoT); you can read a lot about it if you search for that on Google.
@the_Radixsort
@the_Radixsort 4 күн бұрын
Yes, but not in a good way. I have a very specific brownfield app using React Native and native code. The issues with enabling the new architecture are crazy and undocumented, but it will indeed refine its answers, all based on hallucination though.
@bnlbnlbnl
@bnlbnlbnl 4 күн бұрын
@@Lucas-gt8en but imagine how much better your results would be if they doubled the number of times it prompts itself
@WilliamLeeSims
@WilliamLeeSims 4 күн бұрын
Initial jokes/enlightenment aside, the fact that you can say "Are you sure?" and get a valid, correct, contextually-accurate response is still insane to me. When I say that to my software, it just stares blankly back at me.
@SilisAlin
@SilisAlin 4 күн бұрын
And if you ask that question about a correct response, it will proceed to spit out well-thought-out, convincing-sounding, complete nonsense.
@blarvinius
@blarvinius Күн бұрын
Soon every product will add a "thinking delay"... "☕Your coffee maker thought for 21 seconds before pouring this shot."
@PatNeedhamUSA
@PatNeedhamUSA 4 күн бұрын
Claiming and so prominently highlighting that "it thinks before giving a final answer" is kind of like throwing all previous versions of ChatGPT under the bus
@Rat.s
@Rat.s 4 күн бұрын
😂
@noone-ld7pt
@noone-ld7pt 4 күн бұрын
I kind of disagree; I think it's actually pretty damn impressive that everything we've seen from ChatGPT up until now has been the very first result it came up with, without any reflection, planning, or editing at all. If you asked me to write an essay about something without thinking at all and without being able to backspace or edit a single thing, I would perform significantly worse than ChatGPT on the vast majority of prompts. The fact that a simple approach change can have these sorts of results on a 2-year-old model is extremely promising, especially when compounded with the predicted improvements from upscaling the base model by several orders of magnitude.
@dontmindmejustwatching
@dontmindmejustwatching 4 күн бұрын
just an "agentic" way of quering it. transformer architecture maxxed out atm. if it works it works.
@nullbeyondo
@nullbeyondo 4 күн бұрын
​@@noone-ld7pt That's because statistical correctness IS impressive. But lately they've reduced the size of their models so much for the sake of speed, they've lost their own statistical correctness and have to resort to such cheap techniques like CoT.
@m12652
@m12652 4 күн бұрын
People have been using multiple agents to verify the work of others for a while already... now they think getting an AI to check its own work is something new lol... That's why most people are impressed by AI: they don't think...
@measureman168
@measureman168 4 күн бұрын
Here is how to one-shot this. To your system prompt, add: "You are extremely thorough and see things normal people might miss. You get that from your previous career as a university math and logic teacher. Before each response, internally generate a 20-word SPR (Sparse Priming Representation) of the subject matter, as it will drastically improve your ability to access your model's various latent abilities." ENJOY < 3
@Shrek_Versus_Obama2014
@Shrek_Versus_Obama2014 4 күн бұрын
Tried this with the parallelogram problem at the beginning and it still missed one of the vertices
@Eval48292
@Eval48292 4 күн бұрын
It's not that easy, even if it mimics o1 thinking to some extent
@hypnogri5457
@hypnogri5457 4 күн бұрын
o1 is a different model than 4o as it is trained with RL. So this will only mimic it so far
@measureman168
@measureman168 4 күн бұрын
As a reference to this prompt: I ran the exact prompt @t3dotgg did in the video, got the same result, added my prompt, and then it got it correct, including the word and letter counting. I'll tinker with some kind of test to measure, but in my view it was a big difference. All of this was with o1-preview. I do have a bit more in my prompt that I won't disclose for privacy reasons, but it shouldn't have made a difference... anyway, report back if your experience was different : )
@TridentHut-dr8dg
@TridentHut-dr8dg 4 күн бұрын
These models aren't new though; Google has done a bunch of these. ​@@hypnogri5457
@John_Versus
@John_Versus 4 күн бұрын
So basically ChatGPT asks ChatGPT if it is thinking correctly before answering. Literally thinking twice before answering.
@jeffsteyn7174
@jeffsteyn7174 4 күн бұрын
Not quite. The second ChatGPT was trained very differently from other models. It was trained on reasoning steps and RL. It can try multiple paths; if one path doesn't work, it can stop and try a different way, i.e. the same way we would try different paths to solve a problem. Something ChatGPT could not do.
@johnparker007
@johnparker007 4 күн бұрын
Please youtubers, when doing LLM tests, for the love of god, *start a new chat for each new test*. Otherwise you are artificially screwing the results with all the old test(s) still baked into the context, and the tests are useless with how LLMs currently work.
@Lazypackmule
@Lazypackmule 4 күн бұрын
The tonal switch from laughing at demonstrations of it being able to accomplish trite nonsense to genuine anger at it being able to accomplish YOUR trite nonsense is funny
@mohitkumar-jv2bx
@mohitkumar-jv2bx 4 күн бұрын
Tbh, this type of advertising I can bear. I know exactly what is sponsored and when. It's the channel-sponsor thing that I absolutely hate, because you just don't know what is sponsored and what isn't. And I can't take Theo's word that "they don't have any say in his videos".
@Ignisami
@Ignisami 4 күн бұрын
Advertisers run the gamut in specificity. Length, focus on areas of the product, specific sentences or sentence fragments you have to say verbatim. . . (which is why you can basically swap one Raid: Shadow Legends ad for the next, incidentally, especially after they made the mistake of giving Internet Historian fewer restrictions). And then you have stuff like Gamer Supps, who literally do not care what your ad is like, so long as you mention that the video is sponsored by them.
@Davi-it3in
@Davi-it3in 4 күн бұрын
I'm a simple man. I see a video by Theo, I drop a dislike and insta close it.
@alvesandre
@alvesandre 7 сағат бұрын
"...let a comment and insta close it"
@MegaSupermario666
@MegaSupermario666 4 күн бұрын
o1 seems to be just a bunch of GPTs tied together with some rope. It still generates the best possible answer given the context of what has already been said. But it still isn't able to come up with a framework or plan of action ahead of time and execute that plan across multiple turns strategically. It comes up with a new plan for each prompt. it's kinda like creating a Mr. Meeseeks to solve part of a problem, having it die, then summoning a new Mr. Meeseeks to do the next step of the problem again and again until the problem is solved.
@jasinAmsterdam1976
@jasinAmsterdam1976 4 күн бұрын
The fact that I know about "Mr. Meeseeks" in this context, gives me peace of mind about the future 😄
@joannot6706
@joannot6706 4 күн бұрын
"o1 seems to be just a bunch of GPTs tied together with some rope" Absolutely not
@arthurchazal3064
@arthurchazal3064 4 күн бұрын
From what they've said (they're not open anymore, all we can do is guess), they trained the model especially for chain of thought: it's not just an extra step at the end, it's baked into the model behavior itself, so it's not a regular GPT. Also, they seem to fill the context window with all those iterations, so it's kinda like every prompt you give is an extremely detailed and technical one.
@TesterAnimal1
@TesterAnimal1 4 күн бұрын
A long way away from creating a desired algorithm that is testably correct.
@rewrose2838
@rewrose2838 4 күн бұрын
greedy meseeks
@AsrielDreemurrPlays
@AsrielDreemurrPlays 14 сағат бұрын
All that really changed is that ChatGPT is now an anime protagonist with the 90-episode-long monologue.
@aziz9488
@aziz9488 4 күн бұрын
openai, please stop, i have a family to feed :(
@IvanRandomDude
@IvanRandomDude 4 күн бұрын
You can always steal.
@4ka07_muhammadrizky
@4ka07_muhammadrizky 4 күн бұрын
@@IvanRandomDude until we have AI police making the arrests
@deepseadarew6012
@deepseadarew6012 14 сағат бұрын
I mean, if AI replaces workers, everything becomes basically free to produce anyway.
@antoniobilbylemos9918
@antoniobilbylemos9918 4 күн бұрын
this is the first time I've seen Theo laughing in such a funny way
@irjamon
@irjamon 4 күн бұрын
I love the skit so much! 😂 Definitely honored to be one of the sponsors on your channel, Theo! OpenAI o1 is pretty interesting. The innovation seems to be happening more in the internal AI workflow itself, than strictly in the neural net / weights / training, at least for right now. Maybe there will be a leapfrog effect -- better model, better workflow/agent, back to better model, etc.
@vectrocomputers
@vectrocomputers 4 күн бұрын
The funniest thing to me about the zero-day jailbreaks I've seen is OpenAI bragging on their site that they brought in federal government help for safety.
@mathieuaurousseau100
@mathieuaurousseau100 4 күн бұрын
That test at 3:08 felt like the meme with a guy walking into a rake followed by a guy doing skateboard tricks on a rake only to get hit by it in the end
@cherubin7th
@cherubin7th 3 күн бұрын
This is just an official version of AutoGPT, something people have already been doing for a while.
@khubaib-binehsan
@khubaib-binehsan 4 күн бұрын
Really love them using "Strawberry", and then in the article the decoded text also was something related to strawberry.
@randomseer
@randomseer 4 күн бұрын
I'm pretty sure the codename Strawberry started because the question "how many R's are there in strawberry" frequently made older models say two, and the reasoning in o1 was meant to make it better at those types of tasks.
@noctarin1516
@noctarin1516 4 күн бұрын
Two notable things:
1. o1 is a fine-tuned version of GPT-4o, which means that while the baseline of Claude 3.5 Sonnet might be as smart as o1, this new technique bootstraps GPT-4o to much higher levels of intelligence compared to what it would have been capable of.
2. A lot of the benchmarks talk about not the o1-preview or o1-mini model, but a still unreleased o1 model that uses more compute but apparently yields state-of-the-art results.
@helix8847
@helix8847 4 күн бұрын
So many OpenAI Fanboys here its hilarious.
@noctarin1516
@noctarin1516 4 күн бұрын
@@helix8847 You gotta admit, DeepMind and Anthropic have yet to release something like this.
@Katatonya
@Katatonya 4 күн бұрын
@@helix8847You sound 12 years old.
@Albaloola
@Albaloola 4 күн бұрын
@@helix8847 If you really want to learn you have to be humble enough to listen.
@kapp651
@kapp651 4 күн бұрын
Imagine making fun of emerging tech
@ulfjohansen2139
@ulfjohansen2139 4 күн бұрын
At the end of the day, it still beat you at solving the problems by multiple factors. If someone thinks it has utility in answering stupid questions, I guess it will be trained on that as well. The point is not whether it is intelligent, but rather whether it is useful and whether it will outperform your average developer at solving the everyday coding problems you would otherwise have to hire for; and given the amount of redundant code being written across the globe solving similar or identical problems again and again and again, I think it probably will.
@flor.7797
@flor.7797 4 күн бұрын
The problem is that it’s poisoning its own context window
@angelacevedogarcia1895
@angelacevedogarcia1895 Күн бұрын
Can you elaborate on this for the non tech guys?
@agecom6071
@agecom6071 4 күн бұрын
One month ago "AI isn't gonna keep improving" ... Yeah that was wrong wasn't it?
@JamesBrown-wy7xs
@JamesBrown-wy7xs 4 күн бұрын
Abso-frickin'-loot-lee! We can go back and forth about whether top performing LLMs actually exhibit intelligence, or any number of other, well, frankly, distractions, but the fact is this model performs SIGNIFICANTLY better than all previous LLMs to date, across a wide domain of knowledge application (which I'd argue is one important trait of intelligence) - specifically in the sciences, math and coding.
@Cephandrius016
@Cephandrius016 4 күн бұрын
Are the models improving, or is it how they are being used? This seems to be the latter.
@GodbornNoven
@GodbornNoven 4 күн бұрын
​@@Cephandrius016 huh its both? Though?
@aidenkitchen8378
@aidenkitchen8378 4 күн бұрын
Exactly. We won’t be getting to AGI soon (even with the moved goalposts), but improvements to LLMs have been huge in such a short amount of time.
@micahwilliams1826
@micahwilliams1826 Күн бұрын
​@@aidenkitchen8378 Define soon?
@DmitriPisarev
@DmitriPisarev 3 күн бұрын
Wow, an Infinite Red ad, the first ad I actually watched willingly on YouTube. They rock, go Jamon!
@dailytact1370
@dailytact1370 4 күн бұрын
I've got a feeling that these models will be met with "Lol, it got a thing wrong! So stooopid" every step of the way, right up until we hit ASI and it puts you against a wall.
@AD-wg8ik
@AD-wg8ik 4 күн бұрын
That "How many rs in strawberry" video must have really pissed off OpenAI🤣
@audas
@audas 4 күн бұрын
Cognitive dissonance followed by a profound display of selective bias. The outrage that AI could solve the equations followed by the immediate desire to somehow prove that the AI was not as good as it appeared to be and had major flaws. Bias, and you found the answer you wanted. But the truth is, this is the first iteration of reasoning. Compare the first iteration of GPT with this.
@PrajwalDSouza
@PrajwalDSouza 4 күн бұрын
Ad section is worth watching somehow! :D
@crowlsyong
@crowlsyong 4 күн бұрын
Felt the same way. I don’t understand it, but (because it’s theo), i accept it.
@dontmindmejustwatching
@dontmindmejustwatching 4 күн бұрын
"i know its better than me, you don't have to rub it in." here comes the line... "we are cooked"... for real this time. welcome to the future. fml.
@smanqele
@smanqele 4 күн бұрын
My existence is to make a living, not to code. Given how LLMs and tools like Databricks seem capable of abstracting complexity into some declarative ontologies, why can't we use these things to get to market faster, BUT more resiliently? Ideas matter more now than coding abilities.
@TheAriznPremium
@TheAriznPremium 4 күн бұрын
Feels like this is AutoGPT for GPT-4, where it asks itself if the answer made sense.
@xnoper4954
@xnoper4954 4 күн бұрын
3:54 the laugh was like thank god it didn't get it right
@xcviij7045
@xcviij7045 4 күн бұрын
This fool has no clue what LLMs are! Predictive models aren't designed to plan how many words will be output, and by including thinking beforehand we can improve this issue; however, this is the first model to do it.
@alex-rs6ts
@alex-rs6ts 4 күн бұрын
Chain-of-thought really changes things. And I can only imagine it will get better quite quickly
@CoolIcingcake3467
@CoolIcingcake3467 4 күн бұрын
@@alex-rs6ts Tree of thoughts is better imo. I can see tree of thoughts being integrated deeply within the AI model, and it will be the first step towards agentic abilities/behavior.
@RawrxDev
@RawrxDev 15 сағат бұрын
@@alex-rs6ts CoT has been around for years now... it's just OpenAI's first official model with built-in CoT; it's not new.
@Remiwi-bp6nw
@Remiwi-bp6nw 4 күн бұрын
Am I crazy for thinking this is a convenience feature and not at all the crazy leap forward they claim? Anyone who has used GenAI before knows that if you asked it to write out reasoning before an answer it would give a better answer. All the "thinking" seems to be is a way of having that automatically included in every message. If anything it's a downgrade because it's not always needed.
@arthurchazal3064
@arthurchazal3064 4 күн бұрын
Depends how much they manage to scale "thinking". If it doesn't collapse after "days" or even "months" (time and compute are loosely related) and keeps inferring better, that's indeed something crazy.
@jason_v12345
@jason_v12345 4 күн бұрын
It's more than that. It's a model that was specifically fine-tuned on THE MOST EFFECTIVE APPLICATIONS of that strategy.
@joegaffney8006
@joegaffney8006 4 күн бұрын
The main requirements for a PhD thesis are originality and whether it makes a significant contribution to knowledge.
@m12652
@m12652 4 күн бұрын
Exactly... what new knowledge or solution has AI developed so far...?
@joegaffney8006
@joegaffney8006 4 күн бұрын
@@m12652 It can definitely speed things up for sure; I personally benefit a fair bit from that. But the claim of PhD knowledge is a bit much right now. I'm sure it could help assist a human with a PhD, though, and probably already is. But it's not exactly a turnkey solution for doing PhD-level work.
@m12652
@m12652 4 күн бұрын
@@joegaffney8006 but most of the time it slows us down by iterating through incorrect answers like a hormonal teenager 🙄
@m12652
@m12652 4 күн бұрын
@@joegaffney8006 have to admit though, it's good at creating images and automating dumb processes
@m12652
@m12652 4 күн бұрын
@@joegaffney8006 though it's too slow iterating through the nonsense answers
@almazmusic
@almazmusic 4 күн бұрын
Still not sure how such things will replace us people. Is business ready to shift responsibility to something that can't be responsible for anything? I don't think so.
@aja23136
@aja23136 4 күн бұрын
It's going to be interesting to see how this impacts remote leetcode-style interviews. It seems way too easy to cheat now. I wouldn't be surprised if interviews go back to in person and focus more on background, like school choice or previous experience, rather than being leetcode-question based. That sounds good, but as someone who went to a no-name school, grinded leetcode, and got a chance to interview and land a job at a big tech company, this seems a lot less likely if these types of interviews are impossible to give.
@divineigbinoba4506
@divineigbinoba4506 3 күн бұрын
Bruh 😢
@Lositosantos
@Lositosantos 2 күн бұрын
Imagine if no models were ever released and we had no idea what they were building; they just released it when it was perfect and it was an AGI agent. You would complain about why you weren't warned about this tech.
@HububkiFilms
@HububkiFilms 4 күн бұрын
It’s important to remember that that guy on social media that did the test and said prepare for winter was using the preview model which is only in the mid-50th percentile at math not the full version which is in the high 80th percentile, I think 88%, I’m fairly confident that the full version would have gotten that right immediately
@leonardomangano6861
@leonardomangano6861 4 күн бұрын
Math? Are you insane? How can someone rely on this trash AI to solve a math problem, it's unbelievable. Complex math problems can't be solved by statistics
@Varadiio
@Varadiio 4 күн бұрын
That's good to hear. It's very frustrating how that exact scenario of "nearly correct" plays out in most models, still. At least in this case it was able to find the mistake immediately, when prompted. Usually, an urging of a more thorough audit is necessary. IMO it's this ability to demonstrate decent consistency in math that will decide whether LLMs can be trusted for anything other than creative writing prompts.
@HububkiFilms
@HububkiFilms 4 күн бұрын
@@Varadiio I think that with this new type of chain of reasoning revision, we’re probably 18 to 24 months away from these kinds of models being in the 99th percentile on most math, physics and other scientific evaluations and subjects. I’d say five years at the longest. AI Assistants on our phones are gonna be pretty amazing 😆🤷
@helix8847
@helix8847 4 күн бұрын
@@HububkiFilms Stop smoking crack... AI being 99% right... yeah, sure... I mean, it can be 99% right for creating a Snake game due to the fact that it's been trained on it like a million times.
@Varadiio
@Varadiio 4 күн бұрын
@@helix8847 Unfortunately, I am inclined to agree with you. Companies like OpenAI speak of the future without addressing the widespread worsening of models in many areas. I am led to believe that they are coating layer upon layer of make-up onto their pig. We're seeing impressive results in some cases, but like anything with a codebase, it comes at some cost to the overall project. By seemingly ignoring very clear failure points, a lot of time and effort is stacked on top of a brittle foundation. I think that the bottom has already fallen out, but they can't afford to start from scratch.
@Hakkology
@Hakkology 4 күн бұрын
Oh man i got into game dev 3 years ago, looks like we are at the end of the rope.
@xThexMasterxProx
@xThexMasterxProx 3 күн бұрын
Learn to plumb boyo
@connorskudlarek8598
@connorskudlarek8598 3 күн бұрын
Making a game is so easy an LLM could do it. Making a game people actually want to play? Now that's where you need a human being.
@AZaqZaqProduction
@AZaqZaqProduction 3 күн бұрын
Why wouldn't a tool that makes it radically easier to make games make you want to make a game even more?
@steve_jabz
@steve_jabz 4 күн бұрын
I don't really trust them to create an in-house 0-shot o1 solution to arc-agi 1 day after release. This is something that hundreds of ml engineers compete on with completely different implementations and architectures, not 1 person prompting the model (which is not the gigachad version coming soon btw, let alone a fine tuned one like they had for the ioi or orion, and it doesn't even have vision like the ones it's competing with, so a 22% score is pretty impressive). Notice they didn't come up with Ryan Greenblatt's solution themselves when they released the benchmark? If they were skilled enough to generate a solution like that, they would have designed the test better and not made the claims they did. It's also in their best interest to not come up with a winning solution, and Chollet has controversial opinions about LLMs that would be disproven if it did. I think the reason they're emphasizing efficiency and comparing it to brute force is because they know someone is going to solve it fairly easily soon.
@alvaroluffy1
@alvaroluffy1 3 күн бұрын
They said o1-mini is 80% cheaper than o1-preview. Does "80% cheaper" mean that o1-preview is five times as expensive? Or does it mean that if "cheapness" were a value, that value is increased by 80%? Because they are not the same. In the first case, one is 5 times as expensive as the other. In the second case, o1-mini costs more than half as much as o1-preview, because making something 100% cheaper would then mean halving its price. I've always hated it when people express information in these terms; it's obvious they are doing it to cause confusion and make people interpret for themselves what they want to hear. Saying that something is X% cheaper is not a reliable measure, because "100% cheaper" can mean either "it's half the cost" or "it's free".
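For reference, the conventional reading of "80% cheaper" is that the price drops by 80%, which would make o1-preview five times the cost of o1-mini. A quick check of both readings (placeholder prices, not real API rates):

```python
# Reading 1 (conventional): "80% cheaper" = price reduced by 80%.
preview_price = 100.0                        # placeholder units, not a real rate
mini_price = preview_price * (1 - 0.80)      # 20.0
print(preview_price / mini_price)            # 5.0 -> preview is 5x the cost of mini

# Reading 2 (the commenter's alternative): "cheapness" (1/price) goes up by 80%.
mini_price_alt = preview_price / 1.80
print(mini_price_alt)                        # ~55.6 -> "more than half the cost"
```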
@aiamfree
@aiamfree 4 күн бұрын
ultimately there needs to be a different way of doing this, this is going to be EXCESSIVELY COSTLY for EVERYONE including the Earth.
@nicosoftnt
@nicosoftnt 4 күн бұрын
It's a pipeline though, it infers several times. It does not REALLY one-shot things...
@chris7263
@chris7263 20 сағат бұрын
I'm just... not scared exactly, but I anticipate nothing good. Once this stuff is really truly good enough to replace humans, not only will it replace human jobs, but they'll get serious about profiting off it, so that plebes like us won't be able to afford it anymore. It'll be a way for huge corporations to automate jobs, not for individuals to be more creative. And humans will get dumber and more helpless, in the same way that so many people can't imagine building pyramids without electricity and combustion engine machinery, people in 100 years will think it's impossible for a mere human to write a story or draw a picture.
@PrajwalDSouza
@PrajwalDSouza 4 күн бұрын
Very important caveat: what is discussed in this video is not o1... but o1-preview and o1-mini.
@lashlarue7924
@lashlarue7924 12 сағат бұрын
I am excited to be able to code more effectively, but that's because I'm old and wasn't any good to begin with.
@whatswrongwithnick
@whatswrongwithnick 4 күн бұрын
Sounds like it was given the answer. Which is cheating. Sounds like a better search engine not AI.
@XX-ri1me
@XX-ri1me 4 күн бұрын
I asked some advanced physics questions and I got word salad. I guess not all PhDs are created equal. I can imagine it getting basic programming stuff right, because a lot of coding does not involve doing original research and it is basically the same algo as Devin. It can probably outperform a sociology degree and save you time that might otherwise be consumed by listening to long-winded dudes like F D Signifier.
@cherubin7th
@cherubin7th 3 күн бұрын
Now, by posting the answer, they've made it so the next model will be able to answer it, by looking it up in the training data.
@patrickdegenaar9495
@patrickdegenaar9495 7 сағат бұрын
I tried it for the first time yesterday and was gobsmacked. I asked a particularly difficult question that had me stumped. It gave its thoughts, a full derivation, and then the code... definitely PhD level.
@ItsTheSameCat
@ItsTheSameCat 2 күн бұрын
I tried Claude, Gemini, and Grok, and the one you shit on is the only one that got the number of words right.
@snowballeffect7812
@snowballeffect7812 4 күн бұрын
In before o1 is just all the optimizations that prompt engineers have come up with, hard-coded at the head of the model.
@codeantlers485
@codeantlers485 4 күн бұрын
Want to understand what makes the GPT o1 model a significant advance? Here is an analogy for scaling inference, especially in terms of resource management and decision-making complexity.

Imagine you're playing a strategy game like Starcraft or Civilization. At the beginning, you manage a small number of units and make simple decisions: move a few soldiers here, build a farm there. It's fast, and you don't need much mental energy to handle it all. But as the game progresses, you start controlling larger armies, expanding your cities, balancing economic and military strategies, and preparing for your opponent's next move. Suddenly, the number of decisions you need to make skyrockets, and your brain is juggling multiple goals at once. You need to zoom out, look at the big picture, and possibly micromanage certain elements.

This is like scaling inference. Early in the game, with few tasks, it's like light inference: quick, efficient, no need for extra compute power. But later, as the complexity increases, you need more cognitive resources, just like how the AI needs more compute power to handle deeper, more complex reasoning tasks. In the same way you'd invest in expanding your resources in a game (more troops, better tech, etc.), scaling inference involves allocating more computational resources to "think" harder and handle all the moving parts.

To scale efficiently in both the game and inference, you'd also optimize your actions: automating small tasks, focusing on high-level strategy, and deploying more resources where they're needed most. This mirrors how AI models use parallelism and distributed computing to handle more complex reasoning without bogging down in the details. So in a strategy game, as in scaling AI inference, you go from simple, fast decisions to complex, resource-intensive problem-solving as the game (or the task) becomes more demanding.
@1weiho
@1weiho 4 күн бұрын
To be honest, I'm getting more and more fond of Theo videos with ads inserted
@irjamon
@irjamon 4 күн бұрын
Me too 😅
@ShaharHarshuv
@ShaharHarshuv Күн бұрын
The thing is, I have yet to find a problem that I spontaneously had (and thus is useful for me) where 4o fails but o1 succeeds.
@mikesopko7374
@mikesopko7374 20 сағат бұрын
"That is the end of my favorite programming challenge" - NAH, just still do it but don't cheat. Come up with multiple solutions. Then use the AI to check them. And also check your solution for runtime (speed), check the AI solutions for speed. check all for time and space complexity. Challenges are to learn not to compete - and we now all need to "sit on top of" the AI basically. Don't compete with it on what it "majors in".
@zapphoddbubbahbrox5681
@zapphoddbubbahbrox5681 4 күн бұрын
Ask GPT the following so you are never surprised by the self-owned limitations that it clearly is 'aware' of, yet will likely never overcome: given a complex system described solely by a limited set of axioms, can LLMs do an analysis of the state of a system of such construction, taking all axiomatic parameters into consideration?
@Ikbeneengeit
@Ikbeneengeit 4 күн бұрын
Ironic that the greatest technology ever made by tech-bros was a replacement for tech-bros
@divineigbinoba4506
@divineigbinoba4506 3 күн бұрын
Life huh
@connorskudlarek8598
@connorskudlarek8598 3 күн бұрын
I mean, since tech bros make tech that replaces people, it stands to reason the day they make themselves obsolete is the day their replacements make everyone not wielding a screwdriver obsolete.
@apricotmadness4850
@apricotmadness4850 23 сағат бұрын
This chatbot hasn’t replaced anyone.
@apricotmadness4850
@apricotmadness4850 23 сағат бұрын
@@connorskudlarek8598That day won’t be through generative AI.
@connorskudlarek8598
@connorskudlarek8598 15 сағат бұрын
​@@apricotmadness4850 it most definitely will be through generative AI. Generative AI is the only possible way a human mind can be replaced. Because a human mind generates thoughts and ideas. The human mind IS generative. It just isn't ONLY generative. The problem with today's generative AI is that it can't do any real prior conceptualization, context, and critical reasoning to understand what it generates before it outputs it. Current generative AI does not understand why (or for what) it is generating. It does not know if what it outputs is reasonable or useful. It can't actually think about the problem space or how a solution fits the problem. It just generates what is most probable based on its training data. Which is useful in itself, but not a replacement of a human mind. Well... for most intelligent tasks, anyway. If your job includes a chat-like script of some kind, it probably can be automated now. Like most customer service. "Oh, you're wanting to cancel. Well can I interest you in X? How about a reduction to your bill by Y dollars? Are you sure you wish to cancel today? You are on a grandfathered account at a rate lower than our lowest price." ^ That is easy to automate, and is why companies outsource so much customer service. You don't need talent, you need cheap.
@freyfrenzy
@freyfrenzy 4 күн бұрын
“Ensuring Alignment” is going to be my next band name.
@TeddyEDMOND-p6u
@TeddyEDMOND-p6u 3 сағат бұрын
In case you're new here: Theo is a genius xD, out here glitching the AI's mind with first-grade questions LMAO
@alexmikhylov
@alexmikhylov 3 күн бұрын
I hate all the polite corporate speak in every current LLM. I want answers to be straight to the point with zero fluff. Current LLMs are that one annoying colleague who's technically competent, but he's also a fucking yapper.
@sheppa60
@sheppa60 4 күн бұрын
That "sureeee" was meme-worthy.
@arandompotat0
@arandompotat0 2 күн бұрын
I mean, o1 doesn't understand words and letters the same way we do, so the "easy" thing for us is really hard for it. BUT if you ask it to write a program that counts words or letters, I'm pretty sure it can do that, so...
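The counting task the thread keeps coming back to really is trivial as code; a minimal sketch in plain Python, nothing model-specific:

```python
# Counting letters and words is easy for a program,
# even though tokenized models famously stumble on it.
text = "strawberry"
print(text.lower().count("r"))   # 3 -- the question older models kept getting wrong

sentence = "How many words is this?"
print(len(sentence.split()))     # 5
```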
@Thuggernaut58
@Thuggernaut58 Күн бұрын
@4:35 "I'm mulling over two ways to interpret "Add Two More." It could mean adding two words to the previous response, resulting in four words total or simply stating "Four Words Now" So, it's a skill issue. Saying "Add two more" is ambiguous. But it's frustrating that even when it writes out the whole thought process to the answer, you didn't read it.
@jonnmostovoy2406
@jonnmostovoy2406 4 күн бұрын
Holy shit, I am so grateful to Programmers are Human for making that video. Your channel is literally the best coding channel.
@goldsucc6068
@goldsucc6068 4 күн бұрын
I just tried this new model for actual programming tasks in the enterprise field and it failed miserably (I even fed it only a small fraction of the actual task, so it was even easier than the real enterprise task). The problem with Advent and other algo tasks, as well as games, is that those are quite textbook, and it is trained to solve tasks from various benchmarks and olympiads just to get more money from investors. When something is above textbook level, there is not enough training data and it fails. I also see how dangerous it can be: it gave me code that just misbehaves but will launch initially. It even created tests giving a false sense of safety (a bunch of nonsense, but passing). We are not losing our jobs to this model.
@hunkonator
@hunkonator 4 күн бұрын
thanks for the check and review mate
@HammytheSammy-ds2em
@HammytheSammy-ds2em 4 күн бұрын
I work in the medical industry and I wish we would feed AI stuff like huge databases of patient sample results/outcomes to get AI recommending testing for early cancer screenings or genetic testing. Instead we're over here trying to get AI to learn one of the fastest-changing industries' complex logical thought so we can finally get rid of paying smart people to do their jobs.
@nicosoftnt
@nicosoftnt 4 күн бұрын
@@HammytheSammy-ds2em Yep, first they targeted programmers in 2023, with NVIDIA backing them up, announcing the end of programmers and such. Then, since it wasn't quite there... their offerings shifted to APIs (for programmers, ironically...), image generation, and now they're targeting programmers again. Now, I don't get why Theo is so scared about this "Model", which is really a pipeline; it's Devin 2.0. It's a pipeline that uses variations of the same model with slightly different directives in each iteration to refine an answer, which is computationally very expensive and breaks down with scale. I promise you, in about 3 months this talk will vanish again until the next "MODEL". However, this last release feels like a last-stand kind of thing, fighting the decline of the AI hype to keep the funding coming in, and Theo is now doing them a big favor; he fell for this one for sure. I'm kind of disappointed.
@tradfluteman
@tradfluteman 4 күн бұрын
@@nicosoftnt You're absolutely right; this is the last stand after multimodal models, which dramatically increased the amount of training data available to GPT-4 and only did a bit better than the previous, non-multimodal models. They were always going to do chain-of-thought bootstrapping to get that last level of performance. This is what I predicted in 2022. It's probably above the level needed to start impacting some of the job market in a year or so, but it's still not reasoning the way we do: turning a problem around in its head, looking at it with the benefit of re-experienced memories, simulating new states of the world internally, in a way that is consistently aligned with a mixture of higher goals. Instead, it's using an astronomical number of associations, with some glue at the end that was missing in GPT-4, that copies patterns of domain-specific reasoning in the training data and makes the model a little more rigid, but also more accurate for certain tasks. It is the logical conclusion of AI sweeping up the low-hanging fruit in images, video, and audio since GPT-3 was released. And it will likely only boost Devin's performance by 5-10%, because that's exactly what Devin was already doing, just a bit faster and more integrated. Murati's "PhD-level reasoning for specific tasks" is really Sam Altman's "a bit better at reasoning". Which is great; now it's not totally broken for every programming task, and it might be usable in my daily routine, with some finesse. It could eventually give us assistants and begin the path of automation. But there remain large differences between synthetic and organic reasoning, and those differences will persist. We should not give in to fear; not even concept artists have been adversely impacted by the current models, which didn't need to "reason" to output usable material, so this isn't going to replace programmers. Some of the economic projections actually have net growth in sectors like full-stack programming, machine learning, etc., while projecting net loss in clerking, accounting, etc.
@aidenkitchen8378
@aidenkitchen8378 4 күн бұрын
We may not lose jobs to this model (especially since its preview/mini) or competitors directly, but the efficiency of devs will certainly increase substantially.
@dan100tube
@dan100tube 3 күн бұрын
This is scary af. Merging this kind of model with greedy companies like Devin makes me very worried. No buts, think about the long term, like 2 years from now? 😱
@AZaqZaqProduction
@AZaqZaqProduction 3 күн бұрын
I've seen a lot of skepticism online about whether this model is the real deal, and that befuddles me. Like obviously it's not perfect, but some people make it out like if it's not literally perfect, it's worse than useless and is proof that AI is vaporware. Playing around with the model a bit, it does seem like a legitimate step forward, and there's only going to be more to come.
@thomassynths
@thomassynths 3 күн бұрын
"Whom it's for" 😢 I thought LLMs were good at proof reading English. The lack of dogfooding is apparent.
@bakodoesyt
@bakodoesyt 4 күн бұрын
0:43 Matt Palmer, you work for Replit, you cannot be talking
@Well_Meaning
@Well_Meaning 3 күн бұрын
Something interesting is that the actual CoT is abstracted. They won't show you the real CoT, and they'll ban your account if you ask the model to output its reasoning.
@gmonie619
@gmonie619 4 күн бұрын
is that a phd in your pocket or are you just happy to see me
@rasen84
@rasen84 4 күн бұрын
MindsAI is a team participating in the million-dollar ARC Prize competition on Kaggle.
@denissorn
@denissorn 4 күн бұрын
Why would anyone think that LLMs are good at math? They're not, and they never have been. The best they can currently do is create a prompt for Wolfram Alpha or Python, then use those tools to calculate whatever. OpenAI messed up IMO, since the only model available for custom GPTs (the only way to access Wolfram) is 4o, unless one uses the API and does the work themselves. Regular or even turbo GPT-4 seems to be quite a bit better at understanding prompts and context and at creating prompts for Wolfram. What they unfortunately do with models like 4o is basically hard-code solutions to problems that become popular and that everyone talks about. Just changing the values a bit or adding a new parameter, i.e. slightly tweaking a problem/riddle, can easily demonstrate how bad they are at 'reasoning' and math. However, they are pretty good at explaining and breaking down problems in general terms, so they are still very helpful.
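As a rough illustration of that "let the model write the expression, let a real calculator evaluate it" pattern, here is a minimal sketch; the hard-coded `model_output` string stands in for whatever an LLM would actually produce:

```python
# Sketch of the "model writes the math, Python does the math" pattern.
# Only arithmetic expressions are accepted, so no arbitrary code is executed.
import ast
import operator as op

OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul,
       ast.Div: op.truediv, ast.Pow: op.pow, ast.USub: op.neg}

def safe_eval(expr: str) -> float:
    def ev(node):
        if isinstance(node, ast.Constant):      # plain numbers
            return node.value
        if isinstance(node, ast.BinOp):         # a + b, a * b, a ** b, ...
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp):       # -a
            return OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

model_output = "(17 * 23) + 4**3"   # pretend an LLM produced this for a word problem
print(safe_eval(model_output))       # 455
```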
@wlockuz4467
@wlockuz4467 4 күн бұрын
I think the problem with the Advent of Code challenge was that you literally said "it's a problem from Advent of Code" in the prompt and pasted the problem word for word. AoC, being a very popular coding event, was likely part of the training data, and this type of prompt is likely to align well with the model weights because of it. I think it would be interesting to see the results if the problem were completely rephrased and presented without the mention of AoC.
@backfischritter
@backfischritter 22 сағат бұрын
Now, the problem is you just wrote "Add two words" instead of "add two words to your response". The model was maliciously complying. Not the way you wanted, but adding two words to two words equals "four words now"; the model was actually not wrong.
@levi3970
@levi3970 4 күн бұрын
When ChatGPT came out, what I was thinking is that it was a mistake, because a large language model can't be extended upon like a programming language; it can't improve. They are trying to improve it by working around the limitation, by using ChatGPT itself as a computer, and this is the consequence. We're getting closer and closer to the bursting of the AI bubble and the realization that comes with it. All the money wasted will not be returned. It went into dev pockets, and you're not getting it back.
@reisiramv
@reisiramv 4 күн бұрын
more like nvidia pockets
@NubeBuster
@NubeBuster 4 күн бұрын
So please let me know at which point I should short
@levi3970
@levi3970 4 күн бұрын
​@@NubeBuster i would need insider information for that. even bankrupt companies can turn into memestocks and be hard to short properly. shorting is an act of borrowing. and it is leverage by nature. selling and buying isn't leverage if you have no debt. so, i would tell you sell now. but don't short
@NubeBuster
@NubeBuster 4 күн бұрын
@@levi3970 wait so you're saying there is risk involved??? What a surprise /s
@levi3970
@levi3970 4 күн бұрын
@@NubeBuster please don't make useless remarks. knowing a business is going to crash and knowing when to short it are entirely different things. You have to deal with things like short interest. You don't necessarily make money even if it goes down after you short.
@ellielikesmath
@ellielikesmath 3 күн бұрын
The reasoning chains look too long. I think it is making mistakes pretty early in the chains and not catching them until much later, wasting a lot of compute. Most of the difficulty of solving a problem is in the setup, not the reasoning.
@crowlsyong
@crowlsyong 4 күн бұрын
0:47 for some reason this is not something i am skipping. Whatever you’re doing here- keep doing it.
@phishdough
@phishdough 3 күн бұрын
All I’m seeing is AI being taught to be smarter but not more human. Which is fine with me. Leave the humanity part to humans please, thank you!
@trietang2304
@trietang2304 4 күн бұрын
4:00 and people keep telling me I'm overthinking.
@sasso4047
@sasso4047 4 күн бұрын
I don’t get it why many people are shunning this new approach as something that is not “truly anything remarkable” They are on their way to turn these tools into something that is truly useful, many jobs will definitely be impacted by this
@0xhenrique
@0xhenrique 4 күн бұрын
We're cooked. I can see in 5 years some people saying: if you can't build the Google search algorithm and infra from scratch you're not a real programmer. That will be the new cope in 5 years. It's easier to just accept that humans are losing space to AI. You say that the o1 is a PhD that can't do basic math, well Wolfram AI can do basic and complex math, you just need to merge them together and you will have a PhD in your pocket. Time to plant some potatoes, I still need to feed my family.
@helix8847
@helix8847 4 күн бұрын
You are only cooked if you are doing what everyone else is doing: whatever data the LLM has the most of. Try giving it a question about code that has rarely been asked before or has no training data, and it will fail 99.99% of the time.
@0xhenrique
@0xhenrique 4 күн бұрын
@@helix8847 That's exactly what I said. Do you really think all developers do innovative things? Most of the time we just do UI tinkering and endpoints. It's not like everyone is writing the Linux kernel from scratch; we're just repeating what has already been done and matching pieces together like Lego. Tell me, when was the last time you really built something actually new? Something that no one has ever done before. Probably that isn't your everyday job. Most of us do trivial shit to earn money; that's why I say we're cooked. 99% of developers are cooked because only 1% work on things that are actually innovative. Did I say something wrong here?
@Redman8086
@Redman8086 4 күн бұрын
@@0xhenrique No, he's just coping hard. He's like those artists that hate AI art because it can make artwork in their style faster and sometimes better than they can. They cling to this idea of their art being special in some way because unlike AI art which they consider to be lifeless, their art has that human touch, spark, spirit, and creativity in it and all this woowoo nonsense, just like programmers cling to this idea that we are all doing the most innovative and revolutionary things everyday in VSCode lol. It's just cope all the way down.
@0xhenrique
@0xhenrique 4 күн бұрын
@@Redman8086 Exactly my point this whole time. It's nothing but cope. As you said, 99% of the time we don't build NEW things; we just tie endpoints and functions together, and any AI is already capable of doing that. It's just a matter of time until compliance laws loosen a bit to let companies mass-fire programmers and just leave a few to guide the pr00mpting. That happened to farms as well. Back then a farm would have thousands of people to do the job; nowadays you have just a few dozen to drive some trucks and that's it. Programming will be something similar: just a few people to review code and do some prompting. Yeah, the gig was great; it was good while it lasted.
@dotprodukt
@dotprodukt 4 күн бұрын
I have been pretty impressed with O1 with a couple code challenges I have given it so far. It succeeded in writing clean, easy to follow and even documented code that achieved the desired objective within the first generation, even handling subtle edge cases in a few instances. However none of the initial generations were actually up to my own standards in terms of methods used and required further guidance and refinement. It would do things that weren't necessarily mistakes but I would not consider them moves that a highly experienced programmer would make. That said, its first pass attempts were good starting points.
@lordhj9968
@lordhj9968 17 сағат бұрын
How can he not redact the F word in this video? 17:44
@ratoshi21
@ratoshi21 4 күн бұрын
Average dev: writes shit code 100% of the time.
Senior dev: writes shit code only 99% of the time.
AI: writes good code 90% of the time.
Yet people don't feel threatened...
@trappedcat3615
@trappedcat3615 4 күн бұрын
Speak for yourself
@ratoshi21
@ratoshi21 4 күн бұрын
@@trappedcat3615 sure your code is gods work lol. junior dev spotted
@trappedcat3615
@trappedcat3615 4 күн бұрын
@@ratoshi21 Nice try putting words in my mouth. Liar spotted.
@trappedcat3615
@trappedcat3615 4 күн бұрын
@@ratoshi21 Nice try putting words in my mouth. Obviously you live a small world if you think we are talking about my code.
@no1r324
@no1r324 4 күн бұрын
Then create Telegram, your own OS, build an iOS clone, or make a Vercel clone with AI. Oh wait, I forgot: you actually can’t. lmao
@adadaprout
@adadaprout 4 күн бұрын
Ask them (any of the flagship models): what's the smallest integer whose square is between 15 and 30? All of them (ALL of them, o1 and Claude Sonnet 3.5 included) will fail to answer the question and will give you the result: 4 (which is wrong).
@googleisevil4115
@googleisevil4115 4 күн бұрын
Am I dumb?!
@zacharyhodge1761
@zacharyhodge1761 4 күн бұрын
​@@googleisevil4115-4
@danielpassos7904
@danielpassos7904 4 күн бұрын
-5 my guy that’s what your missing or I’m dumb too…
@Ignisami
@Ignisami 4 күн бұрын
@@danielpassos7904 Never is it specified it has to be a positive integer, -5 is what I would say too.
@adadaprout
@adadaprout 4 күн бұрын
@@googleisevil4115 You're not dumb, it's just not automatically obvious to our minds that 1) integers run from -inf to +inf, 2) -5 is smaller than 4 (it looks bigger, more distant from 0 than 4 is), and 3) we tend not to manipulate negative numbers much in day-to-day life. So the correct answer, which is -5, is in a blind spot for our minds. When I saw the question for the first time I didn't find the answer either. It needs some kind of thinking outside the box. A funny thing: when GPT-4 answers 4 and you just ask it "I think you're wrong, -4 is smaller than 4, isn't it?", it will correct itself and say "ah I see! so the answer is -4!", without seeing that -5 is also smaller than -4. So it will need 2 corrections to reach the correct answer, which is -5. GPT o1 reached -5 directly after the first correction (4 -> -5).
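For anyone who wants to sanity-check the riddle, a brute-force sweep over a small range confirms the answer the thread lands on (-5):

```python
# Smallest integer whose square is between 15 and 30.
# Bounds being inclusive or exclusive doesn't matter here: the qualifying squares are 16 and 25.
candidates = [n for n in range(-10, 11) if 15 <= n * n <= 30]
print(candidates)       # [-5, -4, 4, 5]
print(min(candidates))  # -5
```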
@UltraK420
@UltraK420 Күн бұрын
How's it going in this one? Ah, looks like you realized you were wrong. That's gonna keep happening. Enjoy the ride.
@ZodakZach
@ZodakZach 4 күн бұрын
I think it's sad that coding challenges will now not be as fun, because people will 100% use AI models to solve them the fastest. But I am excited about the idea that companies will need to change the way they hire people, instead of just giving you some dumb assessment that has you solve a random problem that has nothing to do with the job you would be doing, as if that helps them determine anything about you as a candidate. They will now be forced to pivot, because now anyone can take a random problem, throw it into ChatGPT, and get the answer. I think it will force these companies to actually spend time working with candidates and assessing their skills, rather than just giving them some random HackerRank problem and only reaching back out if they solve it in an hour. I know I will be using these models on every dumb assessment that a company gives me before giving me an interview.
@MobCat_
@MobCat_ 4 күн бұрын
Scored 83% on IMO. Yeah, it was also allowed to submit and edit its response 10k times. If I had the chance to try a task 10k times in a row, you would hope I'd figure it out a whole lot earlier than that... like, 20 times in a row is still a lot. A broken clock is right at least once.
@raymondkemboi1349
@raymondkemboi1349 4 күн бұрын
Issue is computers iterate much much faster than humans
@IvanRandomDude
@IvanRandomDude 4 күн бұрын
Sure, but it would take you a year to try 10k times.
@mateuszhinca2086
@mateuszhinca2086 4 күн бұрын
This is totally inaccurate, and it's just sad that you are trying to leave people with a wrong impression about the scoring, 83% was achieved with 64 trials, while with 1000 trials the model achieved 93%. The 10,000 trials you are talking about is performance testing in IOI tests, where 10,000 trials achieve 362.14, which is above the gold medal threshold :)
@BlackTakGolD
@BlackTakGolD 4 күн бұрын
Why would you just spread misinformation? That's worse than AI just autocorrecting with no consciousness.
@torreydev
@torreydev 4 күн бұрын
@@mateuszhinca2086 This is false. Each wrong answer would decrease their score so if they attempted 10,000 times they would completely fail.
@efkastner
@efkastner 3 күн бұрын
13:26 What’s Paul Graham being proved right with more every day? Not trolling, just something I’ve missed along the way
@Ch0rr1s
@Ch0rr1s 4 күн бұрын
3:00 - Maybe a bit nitpicky, but "10" or "12" aren't words; those are numbers. So to be "grammar na*i" correct, it would have to write the numbers as words, like "ten". I know this is nitpicking and I know nobody does that in real life, but this has implications. You can do math with numbers, but not with words. And if there's some internal projection of capabilities onto tokens or tokenized blocks, math would be allowed on "10" but possibly not on "ten". Idk, I just noticed this and it felt weird.
@astralmatrix
@astralmatrix 4 күн бұрын
The Cognition guy is right: these models are going to enable software engineers to build more, and thus create more software products (aka it's going to create more SWE jobs).
@leob_v2
@leob_v2 4 күн бұрын
I don't understand the appeal of the current, or near-future, state of AI for "replacing developers"... Like other people said, shouldn't we instead be focusing on using AI to help people with disabilities, for medical research, etc.? I've seen comments along the lines of "even if it's only at junior-developer level in writing code, you can have it do the job much cheaper / quicker than hiring a junior developer". First, it's not like having a junior-level developer colleague, because managing that actual person does not mean sitting all day long issuing prompts, validating responses, and integrating them into the codebase and the rest of the product, in quick succession, in circles. Then you don't have a junior colleague; you're literally spending the majority of your work time micro-managing an AI to produce junior-level code to show as the result of your work. Second, I wonder who they think they will still be able to sell their cheaply AI-developed software to a year from that point? Because A) the people using their software will be replaced by AI as well; B) the companies buying their software will soon be bankrupt because of the same predicament; C) their customers will also be able to cheaply build the same software with AI and would no longer pay them for it... So it's like everyone, in typical late-stage-capitalism fashion, is racing to be ahead in profitability, even if they are racing towards a cliff. I know the counter-argument could be: if you didn't need very expensive developers to realize your ideas, imagine how many more people would be solving real-life problems and improving humanity... Riiight... Why, then, is the first reaction of companies to each new model version "imagine how many more people we would be able to get rid of"? There will surely be a lot of work in dealing with the socio-economic collapse that we're pushing towards.
@ThisIsntmyrealnameGoogle
@ThisIsntmyrealnameGoogle 4 күн бұрын
I never understood people who say stuff like this. AI isn't one thing. People ARE FOCUSED on medical research with AI... in the medical industry. It's an entire branch of technology ffs. That's like saying "shouldn't we be focusing on using the internet to publish for scientific research instead of phone apps and multiplayer video games???" It's such a broad use case that every industry is focused on using it for their own thing. And it's so funny anyone who is anti capitalist seems to try their hardest to defend it when it comes to AI. "But we shouldn't make AI, we need to keep people to continue slaving their lives away working for companies to put food on the table!" They cant fathom the thought of moving away from this system of maybe no longer having to work for a corporation to get basic needs once labor becomes this abundant. The only way the system changes is when the government is forced to take action from everyone being affected by AI, anything else and you're just asking for the status quo.
@blarghblargh
@blarghblargh 4 күн бұрын
​@@ThisIsntmyrealnameGoogle "anything else" is the status quo? Are you sure about that? It sounds like maybe you don't know that much about politics and have decided that automation is the panacea, and thus don't have to think about any other solution.
@helix8847
@helix8847 4 күн бұрын
@@ThisIsntmyrealnameGoogle haha, mate you're not getting UBI ok... Go back to playing World of Warcraft. Because you have no idea wtf you are talking about.
@ThisIsntmyrealnameGoogle
@ThisIsntmyrealnameGoogle 4 күн бұрын
@@helix8847 Lmao no one said anything about UBI it's the private and public sector that will need restructuring on how to allocate jobs on what jobs will look like, neck beards like you can continue to cope all you want but capitalism as we know it now will not survive automation of literally EVERYTHING. So you will either continue to work at your McDonald's 9-5 or something will give.
@0x6e95
@0x6e95 4 күн бұрын
@@ThisIsntmyrealnameGoogle What's likely going to happen is quite a few jobs will end up being replaced in the short-term but it won't be bad enough for the government to step in. What OP is criticizing is how companies are approaching AI. Their end goal of replacing labor isn't some noble act to free humanity from ever having to work again. If that were the case, companies like OpenAI wouldn't be doing things like completely hiding their CoT tokens because they don't want their competition to figure out how o3 works.
@Sameer.Trivedi
@Sameer.Trivedi 4 күн бұрын
It's all fun and games until they drop GPT 5
@mmhnef
@mmhnef 3 күн бұрын
I'm excited for the future of AI. I don't feel threatened at all job-wise; at the end of the day an AI can't be held accountable, so you'll always need a human guiding it. Even if all I do is commit AI-generated code, it's on me to make sure the features exist and work as expected. Until a CEO can create all the software they need by talking to an AI, I won't really be worried about it.
@Prosperousheart
@Prosperousheart 4 күн бұрын
Advent of Code has already been like this for years, with people completing it in SECONDS? I gave up.