"this new model is 99.9% accurate on the benchmark, a mere 0.9% increase from the last one at 99% !!!"
@imeakdo7 · 1 month ago
This is the end of the AI boom. It will burst anytime now.
@bluesrockfan36 · 1 month ago
Lol
@Brain4Brain · 1 month ago
O3 just released 😂😂😂
@imeakdo7 · 1 month ago
@Brain4Brain 10x more expensive than a human.
@Brain4Brain · 1 month ago
@@imeakdo7 Costs fall by 90% every 6 months in AI; you can see this happening with GPT-4 to GPT-4o, Gemini Pro to Gemini Flash, and many other models.
@imeakdo7 · 1 month ago
@Brain4Brain I don't see it being widespread until there's at least a free trial like GPT-4o in ChatGPT. Also, cost reduction is driven by hardware performance increases, which are slowing down.
@adolphgracius9996 · 2 months ago
People assume that artificial general intelligence is difficult, but the problem is that the average human is stupid; it shouldn't be that difficult to match the intelligence of an average person.
@imeakdo7 · 1 month ago
By that logic you could argue that GPT-4 is already AGI.
@Easternromanfan · 1 month ago
The average human is not stupid
@youtubehatesfreespeech2555 · 22 days ago
That's the dumbest argument 😂
@pyrohugs · 16 days ago
They're only useful if they are significantly better than humans. A 1% error rate is fine at 10 tasks a week. At 10 tasks a second, it's awful. You're just building broken things fast.
@steve_jabz · 2 months ago
Outline of the arguments so you don't have to waste your time:
- Despite PhD researchers everywhere saying the preview of o1 is doing something very different, he asserts that it just looks like GPT-4o with CoT.
- Outdated graph of LLM progress with the o1 series deliberately omitted.
- Graph shows MMLU scores of OpenAI models drastically shooting up to 90% (this is essentially 100% on the MMLU, since about 10% of the answers are false).
- Graph shows the multimodal release of v4 didn't have v5 performance (it was never intended to; 4o started development after v4 with the explicit goal of being a more efficient and multimodal v4 instead of just a mindless scale-up).
- He claims 4o performance saturating near the max of the MMLU is evidence of plateau.
- He tries to further back up this claim of plateau by pointing out that Anthropic (a company started by ex-OpenAI devs) trained a model that more or less caught up to v4 performance after OpenAI's 2-year lead.

Of course, this is all right before the release of o1-preview knocking them all out of the park on benchmarks that actually have room to improve, which is just the baby version of o1, which is training the much larger Orion model.

tl;dr: someone watched Gary Marcus's embarrassing speech right before the o1 release and decided to regurgitate his irrelevant claims/benchmarks.
@Steve8708 · 2 months ago
You seem to assume this video is all about o1, which it is not. This video is about the pace of progress and the impact on real-world use cases. There are a lot of flaws in your "outline" above; I'd suggest people watch the video themselves and then decide.
@youcef_ · 2 months ago
Bro's outline got more words than the video itself. It's 6 minutes; just watch it at 2x speed if you're so tight on time.
@steve_jabz · 2 months ago
@@Steve8708 Why would someone make a video about progress plateauing and leave out the latest generation of models with test time compute trained to find correct answers using RL that are being used by PhD physicists to do work for them that previous generations couldn't even attempt to solve? I don't assume the video is about o1, I'm highlighting that it isn't. OpenAI were working on o1 while others were only just catching up to v4, so if it's deliberately cut off before the next generation took us into a new paradigm, that's not evidence of plateau.
@steve_jabz · 2 months ago
@@Steve8708 Also, o1 summaries don't actually have anything to do with how the process works, so whatever similarities you see to CoT are probably down to pareidolia / confirmation bias. They're just there to give you a high level overview of the directions the q-star algorithm underneath is headed. This is specifically because what's underneath is alien and incomprehensible.
@coolTag-lb3fn · 2 months ago
Why do people feel the need to drag AI progress down? Do you not want more productivity, more education, more healthcare around the world? I was going to make a comment like this one, but this is perfect! Do your research into the actual cutting edge instead of just watching attention-seeking Gary Marcus. If you want, I can make a list of things to watch/read, and you will definitely change your mind. We all have so much to gain here from AI; don't make videos with no research behind them.
@13kgOfPersimmon · 2 months ago
The room created by breakthrough technologies like transformers, AlexNet, and LLMs is already filled. If no other breakthrough tech appears right around the corner, it's a matter of small improvements now, not achieving AGI. This is very comparable to the invention of the transistor.
@jimj2683 · 2 months ago
I agree that "AGI/ASI" is still many decades away. There is simply no way to get there without brute-force compute running evolutionary algorithms.
@blengi · 2 months ago
Not sure evolutionary algorithms are that essential, though of course they could creatively and selectively fine-tune things. The average human 2,000 years back could barely count. They had essentially the same hardware we have. Now, via mere universal education instilling a bit of scientific wisdom and logic, a vastly greater proportion of people can formally abstract conceptually and symbolically, to the point that they objectively perceive a world beyond once-entrenched superstitions and religion, and create and sustain vast technocratic civilizations with 100 times the GDP. Surely we can similarly tune AI systems up like ever-improving students without so much trial and error, given all the formal understanding we already adequately relay generation to generation?
@eliotcamel7799 · 1 month ago
@@blengi Modern technological advancements should be seen in the context of social interaction. Job specialization and language are what have really allowed for progress. I know some CS academics who predict that multi-agent systems (game theory) will be the next trend for this reason. Others say that the way forward is to facilitate AI agents that "physically" interact with their environments in specialized settings. This approach tends to be application-specific and may work best outside the context of language models.
@loquek · 2 months ago
Really appreciate the video and write-up. I must admit, as someone who uses LLMs daily (local and remote), I couldn't agree more with your assessment; really great to hear I am not alone.
@tomcraver9659 · 1 month ago
That chart of AI development "slowing" seems both biased to suggest slowing (a straight line would be a much better fit to the data points) and misleading in that, even if the data followed that curve, it doesn't take into account that test problems are probably distributed geometrically in terms of difficulty for LLMs, and that the 100% mark is pretty much arbitrary: there could be other problems that are yet another order of magnitude harder for LLMs that aren't included.
@youtubehatesfreespeech2555 · 22 days ago
AI fanboys are feeling the disappointment and don't like it
@icedtokey5511 · 12 hours ago
Well, looks like AI is improving exponentially now lol
@youtubehatesfreespeech2555 · 12 hours ago
@icedtokey5511 No, it's not; hype as always
@Ali.Abdulla · 2 months ago
The scaling laws are still holding up; it's just the liabilities that increase as public AI LLMs approach the market. Potential for misuse, public disapproval, and job insecurity all rise as LLMs scale towards AGI. AI IS doing ALL the work end to end in extremely high-profit work such as quant trading.
@eL3M3nT4LisT · 2 months ago
Most problems are optimized by divide and conquer. This technique should be applied to LLM technology: have multiple LLM chunks do their jobs, and create channels for them to communicate with one another to get your final result.
@jeremytenjo · 2 months ago
AGI will never happen. CEOs promise it only to boost their valuation.
@PeterStrmberg007 · 1 month ago
It would make complete sense if AI plateaued at the level of human intelligence; after all, that's the data/knowledge we train them on. That's not to say it's useless. Having a bunch of PhDs working on your engineering team for peanuts is still going to cause a major shakeup in the job market. They may never be smarter, but they will always be cheaper, faster, and make fewer errors. Where AI shines is in data analysis, and it is leading to significant discoveries in many fields. AI will have way more impact than the internet (without which it could never have been built), and much faster, even if we only take what we have today and make it more efficient.
@Brain4Brain · 1 month ago
You were saying? O3 debunked your entire video in one fell swoop 😂
@27sosite73 · 1 month ago
We will see how much better it performs one day) and find out whether this video was debunked or not :D
@Easternromanfan · 1 month ago
No it didn't
@Brain4Brain · 1 month ago
@@27sosite73 Its benchmark scores show that it's better than humans at almost everything
@Brain4Brain · 1 month ago
@@Easternromanfan It did
@theAmazingJunkman · 1 month ago
@@Brain4Brain Is it more space-efficient than a human? Is it more resource-efficient than a human? Can it function without an internet connection?
@josh-dev · 2 months ago
Are agents basically a simulation of a real user, akin to programmatically controlling a browser with something like Puppeteer, for example?
@steve_jabz · 2 months ago
o1 benchmarks:
- 93% on competition code from CodeForces (up from 11% for GPT-4o)
- 94.8% on the MATH benchmark (up from 60.3%)
- 83.3% on the International Math Olympiad (up from 13.4%)
- 94.2% in PhD physics (up from 59.5%, completely surpassing human PhD physicists at 69.7%)
- 95.6% on the LSAT (up from 69.5%)
- 97% on formal logic (up from 79.8%)
- 98.1% on college math (up from 75.2%)
- 89% on AP chemistry (up from 76%)

This guy: Bro I swear it's plateauing, just look at this graph cut off right before they released their next generation model. Where do we even go from here??
@SynX-. · 2 months ago
fr
@imeakdo7 · 1 month ago
Once you reach 90%, how do you improve? You get diminishing returns once you get to 90%.
@steve_jabz · 1 month ago
@@imeakdo7 The same way AlphaZero did, but the first step for doing that with LLMs is grounding novel solutions in truth (i.e., does the solution actually show up in material reality?), which o1 is just starting to learn how to do. Then, once you can do that reasonably well, agents can do the "self-play" part that can go beyond the expert humans who set the "100%". You do some experiment humans haven't done yet, and you do it 90-99% as well as an expert would, but then you use the results from that experiment to learn from your mistakes and learn what you can use that output for, to conduct a new experiment that's even more novel than the last. Humans already do this. We don't build better and better tools by producing smarter and smarter humans who have original ideas; we use what the universe gave back to us after our experiments to forge new tools to perform more precise experiments to learn more (written-down literature, not internal knowledge) to develop better tools, and we build on the shoulders of giants in terms of research. The only difference is we do this very slowly and can't hold much data in our heads at a time to draw disparate connections.
@bluesrockfan36 · 1 month ago
And now o3 came out 😂
@SynX-. · 1 month ago
@ fr😂
@richardantao3249 · 2 months ago
Thank you Steve, yet another great video
@SonAyoD · 2 months ago
Great perspective
@paca3107 · 2 months ago
You are right; we are currently in the plateau of LLM development.
@Rami_Zaki-k2b · 2 months ago
You are wrong, man... AI development is not slowing down; it is approaching its upper limit... There is a difference... We have already achieved AGI. Lots of people don't recognize it, but we did. What OpenAI is defining as AGI is actually agentic AI... And that is what will continue to develop: agentic AI and AI systems that can produce new knowledge that is not in their training data... But AGI... We are past that point already...
@blengi · 2 months ago
Given the foundational models already have all the knowledge humans have, and that inference reasoning strategies improve base outputs quite predictably, surely further tuning the post-training strategies to bias outputs more and more toward formal and post-formal auto-feedback should ultimately scale things to AGI? I don't see any roadblock to capturing the chains of thought humans instill in students to achieve and surpass.
@pixelperfectpravin · 2 months ago
Umm, not sure. I think there are many, many uncovered grounds. Just like CoT is one mental model for generating smarter output, there are many mental models we as humans use that are fairly easy to recreate with algorithms.
@06jtm · 2 months ago
Great vid.
@mikeeomega · 2 months ago
Great video Steve, I read the article yesterday.
@AdityaSinghCodes · 2 months ago
Totally agree with you
@laherikeval2524 · 2 months ago
Great content. Love from India.
@nickwoodward819 · 2 months ago
Huh? Wasn't Perfect Dark an N64 game? Interesting video though, thanks :)
@nickwoodward819 · 2 months ago
Ah, a remaster. Not sure that's a fair comparison!
@Steve8708 · 2 months ago
Perfect Dark Zero is an original game for the 360; it's a prequel to the N64 game.
@nickwoodward819 · 2 months ago
@@Steve8708 Ah, that makes more sense XD
@Vibin_with_Luis · 2 months ago
Mate, you're dreaming. The exponential growth in advancements towards AGI is undeniable; just listen to NVIDIA. AGI is here when Elon drops Grok 3. Just wait a few weeks, then make a comment.
@pixiedev · 2 months ago
Bro, he knows better; the Nvidia CEO said that to boost his GPU sales. And yes, models are getting better, but still not as good as a good dev in a medium-to-large project.