o3 Model by OpenAI TESTED ($1800+ per task)

19,227 views

Discover AI

1 day ago

Comments: 121
@code4AI 19 hours ago
As noted by a viewer, the x-axis representing the cost per task for o3 is logarithmic, so it will be ... slightly ... more expensive. Smile.
@Quaintcy 19 hours ago
The document says 172 times the compute of the high-efficiency mode. I think it's reasonable to assume costs scale similarly. Your original estimate is closer than reading off the log graph.
@r3ijmsszf3bsrew7tw7o 19 hours ago
Your calculation of the cost is wrong, as the x-axis is a logarithmic scale and the next tick after $1k is $10k, not $2k. Hence the cost of a task for o3 high looks more like $7k-9k.
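For readers checking the competing estimates in this thread: a minimal sketch of how to read a value off a log-scaled axis. The $1k and $10k tick values match the chart under discussion; the fractional marker positions are eyeballed assumptions, not official numbers.

```python
import math

# On a log axis, a point a fraction f of the way between the $1,000 and
# $10,000 ticks corresponds to 10^(3 + f) dollars, not to 1,000 + 9,000*f.
def log_axis_value(f, lo=1_000, hi=10_000):
    return 10 ** (math.log10(lo) + f * (math.log10(hi) - math.log10(lo)))

for f in (0.5, 0.85, 0.9):  # eyeballed marker positions
    print(f"{f:.0%} between ticks -> ${log_axis_value(f):,.0f}")
# 50% -> $3,162   85% -> $7,079   90% -> $7,943
```

This is why the same marker reads as roughly $2-3k to a linear-scale eye and $7k+ once the log spacing is taken into account.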
@code4AI 19 hours ago
Thank you for this comment. Somehow I refused to believe o3 could get any more expensive than linear ...
@Aldraz 19 hours ago
Lol, yeah you are correct... whoa, that's a hefty price for such a simple task.
@Adhil_parammel 19 hours ago
It's around 3k.
@alexyooutube 18 hours ago
Yes, the chart is on a logarithmic scale. On the other hand, look at the comparison table with the token usage (33M to 5.6B) and cost-per-task ($20 to $??) info. Token usage goes up roughly 170 times, so if the relationship is linear, the cost comes out around $3,400 per task.
@Originalimoc 16 hours ago
That's somewhat insane.......
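Closing out the arithmetic in this thread: a quick check of the linear extrapolation suggested above. The token counts are the ones quoted from the comparison table; linear scaling of cost with tokens is the commenters' assumption, not an OpenAI figure.

```python
# Linear cost extrapolation from the quoted comparison-table numbers:
# ~33M tokens at ~$20/task for the efficient setting, ~5.6B tokens for o3 high.
low_tokens, low_cost = 33e6, 20.0
high_tokens = 5.6e9

scale = high_tokens / low_tokens    # ~170x more tokens
estimate = low_cost * scale         # ~$3,400/task if cost scales linearly
print(f"{scale:.0f}x tokens -> ${estimate:,.0f} per task")
```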
@SimonNgai-d3u 18 hours ago
That MIT paper on TTT (test-time training) is probably the key to the next algorithmic unlock, like TTC (test-time compute) was.
@idck5531 17 hours ago
It is probably what they used to achieve this ARC-AGI score; it is just more expensive with such a larger model (MIT used an 8B-parameter model, and their code is fully available on GitHub).
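For readers unfamiliar with the technique being referenced: a minimal sketch of the test-time-training idea, in generic PyTorch with toy tensors standing in for ARC grids. This is an illustration of the concept, not the MIT implementation.

```python
import copy
import torch
import torch.nn as nn

# Test-time training (TTT): clone the pretrained model, take a few gradient
# steps on the current task's own demonstration pairs, then predict the
# held-out test input with the task-adapted copy.
def solve_with_ttt(model, demo_x, demo_y, test_x, steps=20, lr=1e-3):
    task_model = copy.deepcopy(model)           # keep base weights untouched
    opt = torch.optim.Adam(task_model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):                      # fine-tune on the demos only
        opt.zero_grad()
        loss = loss_fn(task_model(demo_x), demo_y)
        loss.backward()
        opt.step()
    with torch.no_grad():
        return task_model(test_x)

# Toy usage: a tiny MLP and random "grids" flattened to 16-dim vectors.
base = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 16))
demo_x, demo_y = torch.randn(3, 16), torch.randn(3, 16)
print(solve_with_ttt(base, demo_x, demo_y, torch.randn(1, 16)).shape)
```

The expensive part is that this adaptation loop runs per task at inference time, which is one plausible reason the token and compute counts balloon.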
@user-pt1kj5uw3b 18 hours ago
Remember that ARC-AGI also poses a second challenge: representing spatial visual data as sequential tokens, which by itself is quite a monumental task to overcome.
@mrpocock 18 hours ago
This is very impressive. As for the cost, this will fall. Someone will figure out what it is about the model that does reasoning, and then they will build a
@DianelosGeorgoudis 14 hours ago
I too have the impression this looks like a brute force success. Unfortunately it's not true that what is done by brute force initially is then improved with better algorithms - for the simple reason that better algorithms are often not found.
@mrpocock 14 hours ago
@DianelosGeorgoudis Given how much better the models I can run on a single graphics card now are than the LLMs from last year, I'm optimistic.
@synthbrain3173 19 hours ago
I suppose the most important thing is that we already have these models, and the cost per task doesn't matter much, because cost is mostly based on GPU watt-hours; replace those with photonic chips and it becomes almost zero cost.
@code4AI 19 hours ago
Sure, that is the new commercialization strategy of the FOR-PROFIT OpenAI entity. Zero costs .....
@ExtantFrodo2 18 hours ago
@@code4AI Tut tut. The expenses incurred by those for-profit types are far beyond the income. The new architectures will not only have many times the capabilities, but will run both training AND inference at very affordable power consumption rates. Do you not think they know this? You won't need cloud compute or subscription services to train or run very capable AI assistants.
@densonsmith2 13 hours ago
It is much, much too early to worry about how much it costs to run these models. Algorithms, hardware, and energy generation are all improving very rapidly in efficiency.
@ExtantFrodo2 44 minutes ago
@@densonsmith2 Indeed! Thank you for getting to the point I was trying to make.
@winstongludovatz111 16 hours ago
Energy input will decrease substantially with the use of photonic computing (which is analog). Linear algebra (the crucial part) can already be done, but it is in its early stages. I think this (rather than quantum computing) will be the next revolution in computing. A suitable photonic element can do, e.g., a two-dimensional Fourier transform on as many pixels as fit on the sensor in a single pass.
@P.SeudoNym 17 hours ago
My hot take is that this methodology is correct, but it is early. The underlying LLM foundation model needs to be more intelligent, using more/better "intuition" to limit the test-time-compute scenarios; however, test-time-compute chain of thought will be optimized, foundation-model intelligence will be optimized, and pairing the two together will make quality go up and price go down rapidly. It's important to note that this announcement was about PR and staying ahead of Google...
@SebastianNowak 16 hours ago
In my view, there is no "intuition" because there is no kind of "consciousness". What we have is ... "brute-force reasoning", "brute-force inference", hence "brute-force intelligence" ... which is why it takes that long to find the correct answer.
@antonystringfellow5152 10 hours ago
@@SebastianNowak I agree, and I agree with Yann LeCun that anyone who wants to contribute towards producing AGI should not be working with LLMs; they'll never get there. His team is seeing some promising early results on their JEPA (Joint Embedding Predictive Architecture) models. These models are designed to learn through vision rather than text. They form world models, and these world models are modified as they learn. This seems to be how real intelligence works.
@hipotures 13 hours ago
o3 high: the scale is logarithmic, ending at $10k, but the marker does not reach $10k. From the token counts it comes out to $3,440 per task; from the chart, $3,500-3,600 per task.
@Pasko70 19 hours ago
Best regards from Germany 👋. Your current videos strike the right balance between technical depth and accessibility. Whenever a topic is all Greek to me, I watch one of your videos on it and come away at least a little bit smarter. Keep at it and keep up the good work 👍
@code4AI 19 hours ago
Servus.
@CielMC 18 hours ago
STEM grad: $10 @ ~99% efficiency. And what were the companies saying about replacing employees again?
@szef_fabryki_azbestu 18 hours ago
The first ARC example is ambiguous, and this is the reason o3 failed: it produced only one possible outcome, but a different one was expected.
@SimonNgai-d3u 18 hours ago
Yes. The examples never showed that touching would change the color, just piercing.
@szef_fabryki_azbestu 17 hours ago
@@SimonNgai-d3u I've checked the next one (b457fec5), and apart from some minor glitches (random wrong pixel colors in some places), the main issue is that o3 apparently doesn't understand the spatial relations in the example. The bottom squares are not covered by anything (other squares), so they should have only one color (red), but o3 repeats the L-shaped pattern even for them.
@wwkk4964 15 hours ago
To be honest, our fixation on these ARC-AGI tasks is a bit like a dog discussing how humans are so bad at figuring out how to smell correctly to know what we dogs are thinking. It's a very childish notion of intelligence, and I seriously doubt humans who are not exposed to modern technology would one-shot know what the test taker imagines they are looking for. I'm fairly certain that the Piraha from the Amazon or the Onge in the Andaman Islands would fail too.
@szef_fabryki_azbestu 13 hours ago
@@wwkk4964 I agree. Additionally, IMHO it's stupid that we expect language models to analyse these inherently visual tasks in textual form. We have a huge advantage over LLMs (if we feed them these tasks described in textual form) because we first pass them through the visual layers of our brains, and such pre-processed data is much easier to analyse. Try to solve these puzzles by listening to them and I'm sure you will not score 85%. Most people would probably score 0% ;) To me it is obvious they should use a multi-modal model and feed it an image representation of these tasks alongside the text description. No idea why they are not doing exactly that.
@wwkk4964 11 hours ago
@@szef_fabryki_azbestu The previous semi-private SOTA that Ryan Greenblatt achieved with GPT-4o was precisely what you described: a multimodal attempt using images and a text description. While it's true that the input tokenization severely hampers contextual awareness and the recognition of the compositional scales of the objects to manipulate, this is mostly still a critique at the level of syntax (valid, of course). My critique is deeper, in the sense that even if we had a cognition that could see what the test taker sees, with no perceptual issues, it would still be very weird to insist that "a 'general' intelligence" would necessarily find the test taker's solution to be the only, fastest, and best solution. Why would we assume people who are not familiar with pixel blocks or computer screens (such as the plenty of human populations that live isolated in their own niche, where they are experts in the domains relevant to their lives, be it hunting, fishing, foraging, tool making, general survival) should be considered not generally intelligent because they couldn't figure out, or care about, what a specific task on an LED screen is supposed to be? It seems the complete opposite of general intelligence: what is needed to solve these tasks is a fine-tuned special intelligence, because it is mostly about modelling what the test taker thinks works and trying to overfit to that.
@pensiveintrovert4318 19 hours ago
Just months ago Chollet had a different threshold for AGI. Now he moves his goalposts to keep his benchmark, and his views, relevant.
@ItsRyanStudios 18 hours ago
Yup. This seems to be a trend for many of us. We refuse to acknowledge AGI even when it's right in front of us.
@HUEHUEUHEPony 18 hours ago
It doesn't matter if it costs a billion dollars for a machine to do the same as a toddler.
@drhxa 18 hours ago
Gemini 1.5 Flash is about the same capability as the original GPT-4 was when it was announced, but it costs ~800-1000x less. That is only 18 months of cost optimization, and it's very real. So think 18 months ahead: today's $3,000 per task will be $3 or $4 in 2026. Another 18 months and it will cost $0.003, aka less than a penny. Cost is a non-issue. The fact that this scaled at all is all that matters.
@szebike 18 hours ago
I think he does the right thing; these systems shouldn't be called AGI, and I don't think we are near yet. Benchmarks do not represent reality, and 20 minutes and ~$7k in cost for a very simple task is a hurdle for decades (if all goes well). I think OpenAI took some "shortcuts" to get higher scores, but if that "thinking" system still underperforms GPT-4 on certain tasks, it hints at underlying overpromise and some "tricks" (in the sense of very inefficient brute forcing), in my opinion.
@orwellstacticalbacon 18 hours ago
The phone you have in your pocket has more computing power than the most powerful supercomputer of the '90s.
@CMDRScotty 17 hours ago
OK, here is something I don't understand: the cost now is $1,800, but how much will the same performance cost this time next year, in Dec 2025? I would think this is the highest the price point is going to get. When people started making computers in the 40s, 50s, and 60s, they never could have imagined the 70s, 80s, and 90s: the idea that an average person could own a computer, or that the price would be so low that medium and small businesses could invest in desktops and laptops. And that doesn't even include the 2000s, 10s, and 20s. Someone will find a way to bring the price down, because the profit motive will push OpenAI or someone else to push o3 down to o1 prices.
@code4AI 16 hours ago
This was primarily a lighthouse signal to investors, that AI is not saturating, AI is not hitting a wall, but showing global investors new profitable paths to the future. Therefore investors will not immediately demand their profits, but will wait a little bit longer for even higher profits. It is a mind game - with trillions of dollars, but primarily on the US market.
@Nworthholf 13 hours ago
Well, if it means that alignment goes out of the window, it's amazing news. (Also, effectively doing an NP-complete search on every task is not AGI by any stretch.)
@winstongludovatz111 17 hours ago
Three examples is not enough. And that is actually logical: we impute special significance to the geometric shapes from our experience, which the thing does not have. It has to make up for that via a much larger number of examples. And in real-life applications it is actually able to do that, with many more examples than a human could handle.
@code4AI 17 hours ago
You misunderstand. I just showed you three examples where o3 fails, to give you a snapshot of the evaluation test. If you read the paper I have inserted, you will notice it is a complete suite of tests, with multiple different test complexities.
@winstongludovatz111 16 hours ago
@@code4AI OK, thanks, I will take a look at the paper.
@winstongludovatz111 15 hours ago
@@code4AI Quoted from the paper: >>We will refer to this dataset as ARC-AGI-1. It is a set of independent “tasks” (see figure 1), each consisting of a number of “demonstration pairs” (two or more, with a median count of three) and one or more “test inputs”<<
@luke.perkin.online 13 hours ago
Three examples are enough. The whole point Chollet makes is that the machine needs, as a bare minimum, the cognitive primitives that we have: translation, reflection, rotation, ordering, etc. There are only a few hundred you need, not a big number. It's the combinatorial complexity that the human brain makes look easy!
@mdkk 11 hours ago
Seems like the models are just using brute force; it doesn't seem like they are really thinking or reasoning like a human.
@psxtuneservice 6 hours ago
If humans get a problem they can't solve quickly, they also try different approaches, trying random things to see if there is progress, etc. It's just that we can't do 5,000 new approaches per hour, so we must filter them very strongly. As they told us at university, it is possible to spend a lifetime on experiments without one working; if we can't deal with that frustration, better get a non-scientific job. So it is the same.
@AbadonBIack 2 hours ago
@@psxtuneservice Filtering the options strongly IS the reasoning. A human might need a few tries, but something that takes a human 3 attempts might take this AI 5,000. That's what they mean by brute force. It's not just trying options that make sense; it's trying every option it can find, regardless of whether it is reasonable or not, because there is no mechanism for reasoning.
@JohnMcclaned 6 hours ago
These are not models; the term "model" has been co-opted and abstracted. They are basically running their own agentic workflows in the backend to attempt to manually correct bad assumptions and poor outputs. Hence the rising costs. They are basically error-grinding while loops.
@ricosrealm 15 hours ago
Do you think o3 is tree of thought? That could explain its large jump in accuracy but also its much larger computational cost at the high end (many more branches). Seems like the natural progression from o1 within a few months.
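For readers new to the term: tree of thought explores many candidate reasoning branches and prunes to the most promising ones at each step. Below is a minimal beam-search-style sketch of the idea on a toy problem; this is a generic illustration, not OpenAI's (undisclosed) method.

```python
import heapq

# Tree-of-thought-style search: expand each partial "thought" into several
# continuations, score them with a heuristic, keep only the best few per depth.
def tree_of_thought(root, expand, score, beam_width=3, depth=4):
    frontier = [root]
    for _ in range(depth):
        candidates = [c for node in frontier for c in expand(node)]
        frontier = heapq.nlargest(beam_width, candidates, key=score)
    return max(frontier, key=score)

# Toy usage: build a digit string whose digit sum is as close to 15 as possible.
expand = lambda s: [s + d for d in "0123456789"]
score = lambda s: -abs(15 - sum(int(c) for c in s))
print(tree_of_thought("", expand, score))  # e.g. "9600"
```

The cost story falls out naturally: widening the beam or deepening the tree multiplies the number of branches evaluated, which is consistent with the huge token counts reported for the high-compute setting.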
@user-td5gy2fh3p 18 hours ago
The existence we live in seems to be "in development" and approaching infinity; this is related to Gödel's incompleteness theorems. OpenAI is experiencing this firsthand, and they're going to go broke if they don't stop. As they get closer, the problem just becomes more monumental and impossible to solve. I believe that human consciousness is somewhat part of this whole concept, which is why we have a "general" intelligence that these systems, relying strictly on logic, will never be able to achieve.
@szebike 18 hours ago
My theory is that if we have hardware at the level of the human molecular brain, we can achieve AGI (with a ceiling slightly above human, but not too far off; there is a limit to real-time superintelligence). But at the moment, to be honest, I take every bit of info with heaps of salt, because content creators as well as these AI startups have a conflict of interest: they make money by hyping up future (unproven) potential over the true current capabilities. Also, no intelligence should be a product.
@idck5531 17 hours ago
Do we need it to be conscious? It can be the perfect logic machine, and that will be enough to solve so many of our world's problems. Our brains have limits; biology operates at low frequencies to save energy, but these machines have a much higher limit, many orders of magnitude higher.
@kirankumari-nz8sv 16 hours ago
Copium lmao you are wrong
@winstongludovatz111 15 hours ago
Gödel's incompleteness theorem is often misunderstood. A proposition that can be proven in an axiom system is true in ALL models of that axiom system. Conversely, if the proposition is true in ALL models of the axiom system, then it can be proven. This is a totally satisfactory state of affairs. But if we look at propositions that are true only in one particular model of the axiom system (the one we are particularly interested in), why should they be provable (and then true in all models)? Gödel gets around that by constructing a proposition that is somehow intrinsically true ("I have no proof"), regardless of whether it has a proof or not, with no reference to a model required. This is a very artificial sentence. But in general, truth cannot be defined without reference to a model of the theory. Long story short: Gödel's theorem is mostly irrelevant to scientific inquiry.
@luke.perkin.online 12 hours ago
Great video. I started to realise earlier this year that LLMs can't generalise outside their training data. Interpolating in a pretrained space, no matter how high-dimensional or how complex the manifold, does not mean it can adapt to novelty. Or rather, it can, if you turn the temperature up and spend thousands on compute verifying the garbage it produces using Turing-complete code.
@bogdanroscaneanu7112 12 hours ago
Then why didn't they jump directly to making an AI that trains itself on any bit of data around it to see patterns, just like our neural networks do, without necessarily relying on language models only? Didn't they know from the beginning that theoretically this would be its limit and that it wouldn't bring about a real AGI? Why can't it be made general without training data? Does it require many times the current computational power of chips, or what?
@propeacemindfortress 19 hours ago
AGI solved, ASI tomorrow! Trust me bro, in 3 years we fly spaceships, we will have infinite resources and energy at our fingertips... the whole world will have been transformed politically, economically, societally. It will be just like the summer of '69, just harder bro, just harder, and it will be good, soooo goooooood 😍🤩🎆 **comedy mode off** 😆
@propeacemindfortress 19 hours ago
Dear reader, if you feel triggered by my comment, then you have been given an opportunity to reflect on dreams and hopes and feelings and emotions, to reflect on marketing and the need of influencers to obey the YouTube algorithm, an opportunity to reflect on unbounded optimism vs. scientific approaches to problems and data. Best wishes.
@shirowolff9147 18 hours ago
It will eventually be like that
@propeacemindfortress 17 hours ago
@@shirowolff9147 I would welcome it, and it is a real and appealing possibility. But nothing is certain: not destruction, nor utopia, nor dystopia. But... as with everything else, there are many players on the global chess board, each with their own plans for it; whether yours (or mine) align with their benefit is a different question. Whether for-profit corporations in the process of monopolizing the service called labor are the right vehicle is another question. I could bore you with a list of questions... or with words about intelligence vs. wisdom and how, historically, every significant technological advance served the ruthless in competition and war against the wise, and blah blah, hahaha; yeah, not gonna do that. Best wishes, have a great day and season.
@ibgib 18 hours ago
Fascinating content. I'm unsure about your surprise at the conclusion though! (c. 16:10) Were people thinking that AGI/ASI would happen via one magical pattern to rule them all? For me, this has always been about the iterating process of aggregating unique patterns. These are the components that relate to Wolfram's "computational irreducibility", or Schmidhuber's evolution of compression, or, more abstractly, the belief in infinite primes. For me, this research just shows that my vision of a new version control system is the "solution" they're looking for, even if people don't have an interest in my ibgib implementation. Instead of one model to rule them all, it's a living ecosystem of experts who, like humans/biological life, iterate this process of pattern compression through time! Think of the isomorphism between tree searches and version control! Git is holding us back! Invest in my research! LOL (ok, enough of that. Great video though!)
@shubhamverma9148 20 hours ago
Hey bro, how do I fine-tune a reasoning model like QwQ etc.?
@code4AI 19 hours ago
I explained this in my video with the title "Test-Time Training Adapt: Novel Policy-Reward w/ MCTS" about two weeks ago.
@honkytonk4465 19 hours ago
'Hey bro'?😂
@rehmanhaciyev4919 19 hours ago
Thanks for the video. The question I have is this: don't humans only label the final answers for the CoT? Like, the system generates different paths with final answers, the final answers are evaluated by humans, and the rest by RL. Can someone please explain? Thank you.
@dlbattle100 4 hours ago
The x-axis is a log scale, so that box extends almost to the $10k mark.
@TheHorse_yes 18 hours ago
Is it just me or does the whole thing seem like brute-forcing to get to the outcome?
@smorty3573 18 hours ago
Yes, that does seem to be the way o3 gets its answers. It *is* also the first one to do it at all, though. All these thousands of tokens indicate that this really is just brute-forcing, trying out all the possibilities. I like your take on this.
@shirowolff9147 18 hours ago
Humans also use brute force to find answers for things that don't have answers yet. It doesn't matter how it learns; if eventually it can brute-force instantly because of how fast its processing is, that's good enough. It has to start somehow.
@jksoftware1 9 hours ago
That's why you don't use one model to do everything. For many things you will not need o3. You can pass the output to another model for further processing.
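To make the "brute force" framing in this thread concrete: a toy program search in the ARC spirit, which enumerates compositions of primitive grid transforms and keeps any program consistent with all demonstration pairs. The primitive set here is invented for illustration; real solvers search far richer and far larger spaces, which is where the compute goes.

```python
from itertools import product

# Invented primitive grid transforms (real ARC solvers use far more).
PRIMITIVES = {
    "identity": lambda g: g,
    "flip_h":   lambda g: [row[::-1] for row in g],
    "flip_v":   lambda g: g[::-1],
    "rotate90": lambda g: [list(r) for r in zip(*g[::-1])],
}

def search(demos, max_depth=3):
    names = list(PRIMITIVES)
    for depth in range(1, max_depth + 1):
        for prog in product(names, repeat=depth):   # every composition
            def run(g, prog=prog):
                for name in prog:
                    g = PRIMITIVES[name](g)
                return g
            if all(run(x) == y for x, y in demos):  # consistent with demos?
                return prog
    return None

demos = [([[1, 2], [3, 4]], [[3, 1], [4, 2]])]      # one demo: a 90° rotation
print(search(demos))                                 # ('rotate90',)
```

The search space grows exponentially with program depth, which is the toy analogue of why "thinking longer" gets so expensive so quickly.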
@TheSingularityProject01 14 hours ago
Your channel is by far the best for an informed and incisive understanding of AI. You keep it real and grounded. Are you on the X platform?
@ManishKrSingh-ov3oe 3 hours ago
The cost axis grows logarithmically. It seems like o3 high would be at USD 8,000, which is insane.
@hypersonicmonkeybrains3418 18 hours ago
If there's no scaling issue, and they get smarter with more compute either at inference time or in pre-training, then I think the Colossus computer training the Grok 3 model will offer the same sort of performance but with less inference-time compute, so much cheaper. Its intelligence will come from model size, training data, and pre-compute, so I think of that as its native intelligence.
@dijikstra8 15 hours ago
I don't think the "high-compute" mode will be released, but the "low-compute" mode seems reasonable at $2,000/month, giving you 100 requests per month, or fewer if they want to make sure to have some profit margin beyond the fact that every subscriber probably won't use the entire quota every month.
@wolfgangouille 9 hours ago
Lots of "experts" in the comments. AGI will be here soon, and we'd better worry about who will have access to it, instead of debating whether or not it's possible to achieve.
@HansMcMurdy 18 hours ago
Honestly, I didn't finish because the tests you showed it failing at have nothing to do with AGI and more to do with a diffusion-based model.
@arjunnayak9088 15 hours ago
Diffusion is one of the tasks of AGI 😂. And o3 is incapable of that. AGI is not a joke.
@weareonesoup 14 hours ago
Amazing work🎉🎉
@PrinceCyborg 19 hours ago
With that cost, can they actually afford to offer a $5k monthly subscription? Won't they lose money?
@shirowolff9147 18 hours ago
It won't be that expensive for them; this is just a test. They will somehow make it cost almost nothing, but to us it will still be $5k.
@gileneusz 16 hours ago
8:47 This is a logarithmic scale, so the next cost tick is $10,000; it's above the middle, so it looks like $7,500 to me...
@code4AI 16 hours ago
See my comment on this ....
@gileneusz 16 hours ago
@@code4AI btw, great video, I really enjoy watching
@maximilianrck254 17 hours ago
I think o3 could solve these lightweight problems more easily if you let it run on a server where it is allowed to do function calling and to write and execute code on its own, since its training data would contain lots of solutions that could do it for o3. We all know why this wouldn't be a good idea, as a self-reflecting and self-enhancing AI isn't controllable any more.
@freedom_aint_free 12 hours ago
Those folks who call anything and everything AGI seem to like making fools of themselves! If a system is really and truly AGI, it will be capable of recursive self-improvement; this is why the question "When will we see ASI (Artificial Super Intelligence)?" often receives the answer "A few minutes after AGI."
@CharlotteLopez-n3i 7 hours ago
o3 is impressive but not flawless. Task-specific limitations still exist. Test-time computation seems to be the key. Any thoughts on its future?
@MichaelScharf 14 hours ago
Why not combine LLMs with Prolog? Let the LLM describe the examples and create rules, and let Prolog solve the combinatorics.
@willrodgers7974 14 hours ago
It's a good idea, and somewhat effective. There are already several examples published online, including here on YouTube. IIRC it usually enables better performance on some logic tasks for most non-reasoning models.
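A minimal sketch of that split between rule generation and rule execution: the "LLM output" below is a hard-coded, invented rule set, and a tiny forward-chaining loop stands in for Prolog. In a real pipeline the generated rules would be handed to an actual Prolog system.

```python
# Facts and rules as tuples; single uppercase letters are variables.
llm_generated_rules = [  # hypothetical LLM output, hard-coded here
    ((("above", "A", "B"), ("above", "B", "C")), ("above", "A", "C")),
    ((("above", "A", "B"),), ("below", "B", "A")),
]
facts = {("above", "sq1", "sq2"), ("above", "sq2", "sq3")}

def unify(pattern, fact, binding):
    """Match one premise against one fact, extending the variable binding."""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    for p, f in zip(pattern[1:], fact[1:]):
        if len(p) == 1 and p.isupper():          # variable
            if binding.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return binding

def match_all(premises, facts, binding):
    """Yield every binding that satisfies all premises simultaneously."""
    if not premises:
        yield binding
        return
    for fact in facts:
        b = unify(premises[0], fact, dict(binding))
        if b is not None:
            yield from match_all(premises[1:], facts, b)

def forward_chain(facts, rules):
    """Apply rules until no new facts appear (the combinatorial part)."""
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            for b in list(match_all(premises, frozenset(facts), {})):
                fact = tuple(b.get(t, t) for t in conclusion)
                if fact not in facts:
                    facts.add(fact)
                    changed = True
    return facts

print(forward_chain(facts, llm_generated_rules))
# derives ('above','sq1','sq3'), ('below','sq2','sq1'), ...
```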
@densonsmith2 13 hours ago
Since the model outperforms most humans I suppose most humans don't have perfect training data either.
@matetheking 9 hours ago
I'm curious: what's the point? AI is surpassing every benchmark and challenge thrown at it. On top of that, the world's smartest people are driving the technology forward. The goalposts keep shifting, and I can't understand why seemingly intelligent people refuse to accept the reality of what's coming. My only conclusion is that they can't stand the idea of their cherished "mammalian advantage" being diminished by a machine. Their arrogance drives them, and their way of coping is to focus on finding one flaw; essentially, they're picking pepper out of fly shit.
@user-mj2lm5fh1j 18 hours ago
Hi, I am not sure who is reading this comment, but I believe OpenAI will sacrifice itself in the process of building something based on transformers. To build AGI, new innovations are needed, not inference-time compute. I tried playing a simple game of tic-tac-toe, and these LLMs failed miserably; I didn't notice any intelligent move that would let me call them remotely close to intelligent systems. Human beings don't solve problems based on tokens, and it's foolish to try to solve any problem based on tokens. It is simply impossible to achieve AGI this way. In a few years, these companies will achieve some improvement somehow and start labelling it as AGI, the way they call deep learning models AI. The media will soon make it common. Ultimately, we will end up creating something that feels like intelligence but is not exactly it. That's what humans have done over the centuries. I call these systems a pareidolia of intelligence (PI).
@idck5531 17 hours ago
Inference-time compute is old news; read the new MIT paper, which achieved over 60% on ARC-AGI with test-time training on a small 8B model. The key to solving those benchmarks and "out of distribution" problems is making the models learn during inference. Changing the weights during inference is the next big thing; we humans do it as well.
@hypersonicmonkeybrains3418 18 hours ago
That simple test you showed at 1:00, yeah, that's piss easy; I understood it right away. So it costs $1,800 to fail at that? I'm sorry, but that is not AGI... I'm not impressed.
@shirowolff9147 18 hours ago
And who said it was AGI? Only YouTubers are saying that; the companies didn't. This is just opening the path to true AGI.
@christopheriman4921 8 hours ago
I got all of those examples nearly immediately, but I will admit that those would be very difficult problems to solve if I didn't have whatever mechanisms my brain employs to easily pull out the expected answer. I don't even know how I would break a problem like that down into an algorithm, because you have to make an algorithm that can generally recognize shapes as being distinct or part of another, combine that with an algorithm that interprets the data/patterns in the intended way, and finally have the algorithm output the correct actions for the correct answer. All of those end up being complicated at best to write by hand, even for some specialized cases, so I am impressed with how far deep learning has come, but there are still some obvious limitations to these systems. I think a general intelligence will likely end up being a fully autonomous Turing-complete program of some sort, so not what we currently have, but I think we are getting close to having all the right concepts necessary for computing intelligence.
@hypersonicmonkeybrains3418 7 hours ago
@@christopheriman4921 The odd thing about these LLMs is that they can show any sort of true intelligence at all. I mean, science doesn't even understand the functioning of a fly's brain on a fundamental level.
@jeffreyspinner9720 17 hours ago
Aren't you burying the lede? OpenAI said o3 is performing at a level equivalent to AGI. (In OpenAI bizarro land, there are 5 levels to AGI... but they are saying they've achieved level 5.) So who's telling the truth, and was that kerfuffle months ago, with the CEO derp getting fired and rehired, about this, just now being made public knowledge?
@miguelleandro6138 11 hours ago
I'm also capable of handling these tasks well. Maybe we don't really need it to do everything. What I truly need is for it to iron my clothes; that would make me super happy... the rest I can manage. I assume it would also make my wife very happy... a great Christmas gift. Can we test the model on this part?
@oatlegOnYt 18 hours ago
Good results, bad economics. Now they will need to optimize to reduce the computing cost while maintaining the result level.
@shirowolff9147 18 hours ago
That's why it's good; optimizing is the easy part.
@aminzarei1557 17 hours ago
So basically there was no "groundbreaking" breakthrough, and it was again just brute-forcing the hell out of the model 😅 Bro, I'm thinking that OpenAI is the best, but not at AI: at marketing 😂👌
@code4AI 16 hours ago
See it from a non-technical perspective: This was primarily a lighthouse signal to investors, that AI is not saturating, AI is not hitting a wall, but showing global investors new profitable paths to the future. Therefore investors will not immediately cash in on their profits, but will wait a little bit longer for even higher profits. It is a mind game - with trillions of dollars, but primarily on the US market ... which is a global market. I suppose, Google and Microsoft are quite happy with this o3 announcement, since it will buy them time from their shareholders. Since they now have time to play catch up.
@gtrguy17 14 hours ago
OpenAI is pure hype; I stopped using them altogether. Anthropic is better, cheaper, and way more useful.
@meandego 13 hours ago
So... we call the o3 model AGI, but it can't solve basic human tasks. This model is really like a mathematician: it can do crazy math, but it can't make itself a sandwich or clean its room.
@fastneasy 3 hours ago
Sam Conman's 12 days of Shitmas
@noway8233 18 hours ago
It's genius... until it's not 😅