Thank you lex, now tell sam altman to release our AI waifu.
@marcussky3 ай бұрын
@lexfriedman we want more old school ML interviews on your channel. We love the new stuff, but would love a throwback.
@Mutual_Information3 ай бұрын
Lex needs to have MLST on the pod
@Forseti62883 ай бұрын
One common thing I see with people claiming LLMs can't reason is that they do not apply the same scrutiny to humans, who are the clear benchmark AI is being tested against. I think that if any criticism is brought up against AI, the same should be tested against the human benchmark. The average human is not consistently reasoning from first principles, and the ones that are doing so were trained specifically to override their primitive thought patterns to think in this rigorous way, and even if someone is able to do this relatively well after years of practice, they are clearly an outlier, and they still introduce bias and flaws into this "ideal" reasoning pattern. Also, I think the human brain was not designed from the ground up to be a reasoning machine. As a reasoning machine it is occasionally able to muster an inefficient approximation of reasoning patterns against all odds. Humans are also very sensitive to prompts. If you ask something in a different way you can get an entirely different answer.
@luke.perkin.online3 ай бұрын
You switch your computer on and it boots successfully after executing literally millions of instructions without a single mistake. It's reasonable to expect the same deterministic output from an AI system.
@benprytherchstats77023 ай бұрын
Well, o1 supposedly outperforms "experts" on "PhD level" problems in physics, biology, and chemistry. It supposedly scores very highly on the USA Math Olympiad. And yet, it gave very very stupid answers to Dr. Duggar's problem at the end. How many PhD physicists or high scoring Math Olympiad contestants do you suppose would give the answers o1 gave? Watching it fail miserably at a not-all-that-difficult deductive reasoning problem suggests to me that its apparent successes at much harder problems were illusory. If it can't reason through simpler problems, then it's not reasoning through harder ones either, even when it gets the answer right. So it's not just that LLMs make mistakes - it's that the manner in which they make mistakes belies claims of underlying reasoning. Give GPT-whatever a seemingly challenging problem, and it might give you a perfect solution. Then give it an easy problem, and it might give you gobbledygook. You just never know. Humans might make plenty of mistakes, and we might rely to some extent on memorization, but individual humans don't switch back and forth between appearing to be math geniuses and appearing to have poorer reasoning than a 3-year old. So yes, humans are not consistently reasoning from first principles. But sometimes we are. LLMs never are.
@spartaleonidas5402 ай бұрын
@@benprytherchstats7702 They do. Geniuses are often absentminded and forgetful, even beyond a toddler
@qorilla3 ай бұрын
All this "is this really reasoning" discussion just reiterates what Turing himself complained about in 1950. He rejected the question of whether a machine can "think" as just as meaningless as whether a submarine can swim. It really doesn't matter. Can it solve tasks, can it achieve objectives effectively? If yes, it doesn't matter if you call its steps "reasoning", or "just a lookup table" or "glorified curve fitting" or whatever. If it works, then it's impactful, both in positive and negative ways. This is all too similar to claiming an airplane can't fly because it can't flap its wings.
@therainman77773 ай бұрын
Exactly. Thank you for stating this so clearly. These people are delusional and in complete denial.
@eposnix52233 ай бұрын
The important distinction here is that we still aren't sure if o1 (or any language model) can come up with *new* knowledge. Until this has been proven, people are totally right to say it can't reason. It would be like an airplane that can flap its wings but can't fly-- it's just going through the motions.
@simpsimperson733 ай бұрын
@@eposnix5223 It's literally solving physics problems and maths proofs that are outside of the data it's been trained on.
@qorilla3 ай бұрын
@@eposnix5223 Then delineate what you mean by "new knowledge" by designing a test for it. I'm sure once a new model passes that test, you'll move the goalposts and claim that this is still not *truly* new knowledge or something. It keeps happening and it's embarrassing. Nobody is claiming that o1 is already an AGI. But complaining that it isn't "reasoning" is meaningless. The word doesn't even have a commonly agreed upon definition, everyone can read into it whatever they want.
@armin30573 ай бұрын
@@simpsimperson73 no it does not lmao
@Houshalter3 ай бұрын
We are currently contacting the JWST to see if they can locate the goalposts out in the Kuiper belt.
@fuba443 ай бұрын
you funny!.. And correct.
@renjithravindran50183 ай бұрын
😂😂
@93alvbjo3 ай бұрын
Tell me when you got the goalpost of truth in your pocket instead buddy. Good luck.
@therainman77773 ай бұрын
🤣
@memegazer3 ай бұрын
I can see both sides. I tend to use an analogy like the chimp test on the Human Benchmark site, a benchmark that chimps crush compared to humans. Similarly, I think current ML models are real intelligence, just that perhaps we should think of intelligence as having different modalities. In that respect, expecting an LLM to be comparable to a biological agent that is embedded in real time within the world in a body is a bit unreasonable; it isn't fair to hold those models to the same standard of intelligence as humans. Imo you would have to compare those models to a disembodied brain in a vat to get an effective apples-to-apples comparison. Of course that is simply not possible to do, so it is not very reasonable to insist that LLMs are not "real intelligence" or have not generalized at all simply because those models still struggle with out-of-distribution problems that humans handle quite well. Imo that may be like chimps claiming that humans are dumb because they can't perform well on the chimp test compared to chimps. Imo the novel "reasoning" happens at the training stage, where these models don't "go look the answer up on a lookup table and select the best output from a one-to-one database", at least not based on something as simple as statistical frequency of occurrence. What is going on as the model is trained is more complicated and nuanced than that; if it weren't, ML would not be useful, since we could handcraft those solutions ourselves without having the ML discover the efficient method for doing that, even if the balance between entropy, search and memorization is not dialed in to the same parameters that humans have. So it seems a bit intellectually lazy to gloss over that subtlety and then scoff at the idea that people remain impressed with each new iteration of the current ML summer. But hey, that is just a theory, an AI theory
@eposnix52233 ай бұрын
An idea I want to throw out there: I've been playing with o1 (mainly mini) and have found that it works better if you explicitly tell it to be its own discriminator from step to step. What I mean is, tell it to roleplay with itself as "Bob, the expert coder" and "Tom, the expert critic" and have Tom check Bob's work every step of the way. This is how I normally work with GPT-4o, but o1 allows this process to be automated.
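(A minimal sketch of the pattern I mean, assuming the standard OpenAI Python client; the model name, prompt wording, and example task are only illustrative, not anything from the video:)

```python
# Sketch only: a single prompt asking an o1-style model to run its own
# generator/critic loop ("Bob" writes code, "Tom" critiques it).
# Assumes the standard OpenAI Python client; adjust the model name as needed.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

task = "Write a Python function that merges two sorted lists in O(n)."

prompt = (
    "Roleplay two experts. Bob, the expert coder, proposes a solution. "
    "Tom, the expert critic, checks Bob's work at every step and points out "
    "errors or edge cases. Alternate Bob/Tom until Tom has no objections, "
    "then output only the final, agreed solution.\n\n"
    f"Task: {task}"
)

# o1-style models take the whole instruction as a single user message
response = client.chat.completions.create(
    model="o1-mini",
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```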
@Gnaritas423 ай бұрын
just give it the task, it doesn't need prompting, the whole point of o1 is that it self-prompts. It works best without you attempting to prompt hack it.
@eposnix52233 ай бұрын
@@Gnaritas42 Not in my experience. o1 still makes rudimentary programming mistakes when left to its own devices. The code logic will be good, but there will be small errors that prevent it from running. Telling it to double-check itself will reveal these errors.
@elawchess2 ай бұрын
@@Gnaritas42 "just give it the task, it doesn't need prompting, the whole point of o1 is that it self-prompts. It works best without you attempting to prompt hack it." That's not true. I get much better results in algorithmic questions by first asking it to generate tests and include as many corner case tests as possible, then write the code ensuring that it's tests would pass. Then it spends about a minute to answer. But if you just asked the same question it finishes in 9 seconds with much inferior code, and in this particular case, buggy code because it missed several corner cases.
@MrNycres3 ай бұрын
ALL THREE CAN BE TRUE:
> o1 is reasoning
> it is not reasoning "well"
> this may not be the path to ASI
@bokchoiman3 ай бұрын
If the pillar spins when both hands are out, then can't you just leave one hand in, flip the switch up, then insert the other hand in every other hole and do the same thing without removing the first hand. Boom, all up, remove hands.
@spaceghost88913 ай бұрын
I think when we reduce it to "it's just doing so and so", it's usually not a good argument, because we can also be reduced to simple processes. Maybe computers and humans reach "reasoning" in a different way, but if a computer can reach right conclusions and eventually create new knowledge, ok, now what separates it from our reasoning? Maybe it's our limbic system, and our "feeling" of being something special and "having a soul", but those are subjective things that may not be necessary for an intelligent being. I think we are going to reach a point in semantics where consciousness or "real intelligence" is reserved only for *human* thinking, even if a machine can surpass us by most metrics.
@ElieSanhDucos03 ай бұрын
Yes, but for OpenAI o1 to do the reasoning on anything it needs me to type in a question. And all the "thinking" that follows has to be triggered by that. And will be linked to it. For example, a human reasoning independently can have an epiphany on a totally different subject... that will never happen for LLMs (for now).
@spaceghost88913 ай бұрын
@@ElieSanhDucos0 Yeah I get it. But "for now" is a good point, because maybe the LLMs are an important piece on a puzzle that seems to have started being solved. It's impressive how much room this tech still has for improvement and despite some slowdowns it seems to be still full throttle. I understand hype is not a healthy thing but it's hard to ignore how much money is being poured into it, more than anything humanity has seen in ages. It doesn't look like pure tech scam type of hype. I don't like to dismiss it as "nah, this has nothing to do with AGI at all"
@egor.okhterov3 ай бұрын
Ok. Why do people bother about whether it is reasoning or not, you might ask. Isn't it irrelevant what it's doing as long as we are happy with the results? Good question! The problem is not in the definition of the word "reasoning". The problem is with the approach we, as an industry, take in order to get to a generalized solution. There are huge amounts of money and human brain power now being wasted on fine-tuning an approach that leads to a dead end. This is an outcry for redirecting the money flow and brain resources to a different approach that at least could have success as one of the possible outcomes.
@spaceghost88913 ай бұрын
@@egor.okhterov I see, but I bet the money is not going blindly for dead end solutions. AGI is a Golden Goose and I think most researchers understand those concepts. At the same time there is room to improve LLMs, and LLMs are presenting amazing results by themselves. So I don't know if an outcry is really needed.
@egor.okhterov3 ай бұрын
@spaceghost8891 did you hear about recent VC rounds into AI startups? Take Argo AI for example :) In order to get rich, you need to have a good understanding of humans. You should be able to connect with the right people and make a good impression. Having a deep understanding of mathematics, computation and machine learning is usually not part of the mental toolbox of the people with large amounts of money. When you pitch an idea to a VC guy, you should spin a story in a way that shows how you can make money off of it. The way all these ideas are presented is that the LLM is a low-hanging fruit, like a light bulb in 1879. They don't talk about developing a proper reasoning machine during a pitch. VCs don't want to hear about that. They pitch how they can build an app on top of an LLM, possibly fine-tune it, and make a call service, or maybe an on-premise smart search for corp documents, or a tool to patch software, upgrade a codebase from Python 3.8 to Python 3.12, or upgrade to a new framework. And other stuff like that. A lot of very smart people were sucked from the job market into these mediocre projects with a short-term goal of milking the cow first. But this will not lead to AGI.
@MachineLearningStreetTalk3 ай бұрын
The biggest problem with o1-* models is that the chain of thought spam deranges dense context. The context is filled up with noise i.e. about "code coverage" in a query which has nothing to do with coding. In any complex/ambiguous problem, it will misunderstand you on the first shot, you then explain what you actually wanted and by step 2 the CoT noise deranges the model significantly over a standard LLM. Catch-22. They should prune away the noise on subsequent queries. Perhaps as the models learn (from the users), the trajectories will become less noisy. It's also clear that the trajectories are hard-trained for specific purposes which will limit their generalisation.
@oncedidactic3 ай бұрын
In the metaphor- chain of thought- in humans there is certainly noisy context preliminary to the final coherent thought one is conscious of. Another thought- initialized weights are random, and we need a preponderance of useless noise to find the lottery tickets. Perhaps the noise is not the problem per se, in fact necessary, but it should be pruned on some schedule and summarized for next context. Along the lines you suggest.
@therainman77773 ай бұрын
I have not had this experience at all. You said in any complex query it will misunderstand on the first try? Not at all. I’ve had roughly an 85% success asking the most difficult questions I can think of. And even on the 15% it got wrong, it was typically not because it misunderstood the question.
@GlobalTheatreSkitsoanalysis3 ай бұрын
holy sclerotified models, batman!
@agenticmark3 ай бұрын
I love the ones with just you two. I dig the guests, but when it's just you guys it feels like I should crack a beer and join ya.
@sk8l8now3 ай бұрын
Back to the good old days of early MLST
@burnytech3 ай бұрын
Beer would be downgrade
@psi4j3 ай бұрын
Yeah, maybe a coffee or tea instead. Beer does feel like a downgrade.
@JscottMays3 ай бұрын
Echo!
@lystic93923 ай бұрын
I think you make many great points. I guess I am one of the people who don't want you to mystify reasoning too much. We can say it imitates or guesses or simulates it, but the fact of the matter is that it often produces an accurate description of reasoning. So we can try and deny it, but if it can generate this description of reasoning and use it somehow to guide its answer effectively, then we effectively have an A.I. that is using reasoning. Maybe not exactly how we would, maybe not as reliable in some ways, but this is reasoning.
@martindbp3 ай бұрын
This is MLST, so we already know their answer to this question. It's maybe a tired analogy at this point, but I feel like their view can be summed up as: airplanes don't fly, they just fall forward through the air very fast! they need these long runways for takeoff and landing. This is obviously not true flight, and we should stop working on this in favor of feather-based systems. In the end, the interesting part is does it get us from point A to point B in a straight line through the air in an economically viable fashion?
@lystic93923 ай бұрын
I'm also not very certain that it's that much more shallow than human reasoning. I constantly find errors in people's reasoning, and people tend to stop thinking very quickly. I do like the idea that the reasoning itself required to solve a difficult problem isn't necessarily that much more involved. There are other factors that can make problems difficult to solve. (Lack of overview, lack of feedback, lack of insight, lack of clarity, lack of time, etc.) So when a problem seems difficult to us, it does not mean that it necessitates advanced reasoning.
@memegazer3 ай бұрын
@@lystic9392 This is my thought as well. Perhaps one of the biggest hurdles is that we are trying to teach the ML with human data, and then some are really disappointed when it turns out maybe most of the data humans generate is not "general" or "intelligent"; some are overly disappointed that the ML cannot one-shot and filter out the noise from the signal. We are primates after all, so I suspect that most of our daily baseline "reasoning" is not very high level, and those moments of brilliance are more spurious; we just happen to benefit from them so much as a species because we have a way to accumulate and distribute that information. I think this gives the impression that in general maybe we believe humans are smarter than they actually are at a baseline level.
@nomenec3 ай бұрын
Don't I agree with this exactly at time code 50:30? kzbin.info/www/bejne/pICZpHeha9uCZsk
@Hexanitrobenzene3 ай бұрын
@@martindbp "airplanes don't fly, they just fall forward..." :D
@baltakatei3 ай бұрын
My metric for AGI: When I see LLMs harassing and insulting one another during an edit war on a wiki maintained exclusively by LLMs.
@charlesalexanderable3 ай бұрын
4:00 Under this view a human brain wouldn't be impressive because its algorithms can't expand its memory indefinitely, or traditionally only had low-bandwidth access to paper etc. to expand it. I think Schmidhuber would consider achieving brain-like intelligence to still be impressive.
@therainman77773 ай бұрын
Yeah, all I’m hearing from videos like this is defensive doubling-down from people who committed early to the position that LLMs will never give us true intelligence, and now feel that they have to defend that ground at all costs and never change their minds. They’re in denial of what is obviously happening right in front of us, and their excuses and delineations grow flimsier with each new model release.
@oncedidactic3 ай бұрын
But humans eventually invented Turing machines ;)
@michaelmartinez53653 ай бұрын
I don't care if it's thinking or reasoning or pulling answers out of its ass, as long as it has the correct response.
@fburton83 ай бұрын
Presumably it will have to be able to explain the steps it took in reaching that response?
@michaelnurse90893 ай бұрын
@@fburton8 A neural net can always detail the steps it took - the inputs, weights and outputs are there for anyone to see. Unfortunately the number of steps and their weights exceed the mind space and time availability of people.
@Ikbeneengeit3 ай бұрын
Humans generally judge with post-hoc reasoning anyway. We're terrible at knowing how we come to conclusions.
@benprytherchstats77023 ай бұрын
What about when you don't know whether the response is correct? What about when you're entrusting it to make decisions because it appears to "reason"? The fact that it isn't doing anything like human reasoning does matter when we ask it to do things that require human-like reasoning. See, e.g. replacing teachers and lawyers and therapists and secretaries, etc. Or even just employing it as a customer-service bot. The difference between our "reasoning" and its complex statistical next-token-prediction matters in those applications where consistency and reliability are important.
@probablyonthemoon3 ай бұрын
Slight contradiction around the 40min mark. You guys said that we need Turing Machines to perform reasoning, but then said "I'm a machine, I'm a bio-machine, with wetware and software, but I can still perform reasoning". Humans are not turing machines. So given that humans can reason (and have general intelligence), why can't other non-turing machines also have general intelligence?
@therainman77773 ай бұрын
Because Keith’s argument is nonsense that in no way proves what he seems to think it does.
@therainman77773 ай бұрын
Btw it’s not a slight contradiction. It’s a massive contradiction that puts a hole in his entire argument.
@luke.perkin.online3 ай бұрын
Is your argument that you can't fMRI a brain and see an obvious potentially infinite internal tape? It's the system as a whole that is obviously Turing complete. We are clearly powerful enough to imagine the entire setup: symbols on a tape, and a codebook. Then we can analyse real world problems, then create in the physical world the correct program and data to solve them, then execute the instructions from the codebook with a pencil and update the paper tape. So the human system as a whole is not only Turing complete, but can potentially create a specific incarnation suitable for any problem. Whilst there are a few papers on LLMs simulating Turing machines, they're a very long way short of this.
@therainman77773 ай бұрын
@@luke.perkin.online Yes, the argument is that (obviously) you can't have an infinite internal tape inside the brain, because the human brain, like everything else in the universe, is finite. Not infinite.
@probablyonthemoon3 ай бұрын
@@luke.perkin.online what does "potentially infinite" mean 😆
@steve_jabz3 ай бұрын
"With the new GPT models, you need to start a whole new session." You're supposed to clear the session regardless. Performance drops off quadratically the further down the context window your prompt is. They discovered this back with GPT-1. So many issues people have with ChatGPT are skill issues of asking multiple complex questions in the same session. It was designed for casual noobs to have a back and forth chat dialogue with it, but you shouldn't be using it that way if you care about performance. I assume Tim and Keith already knew this, so maybe I've misunderstood what they were saying here or something
@jwadaow3 ай бұрын
@@steve_jabz They randomise responses precisely so that they are not reproducible.
@MachineLearningStreetTalk3 ай бұрын
@steve_jabz We explained in this video why you need dense context in an LLM. With any complex problem, i.e. one with ambiguity -- the LLM won't understand you on the first shot, or even the 3rd shot. That's why you need explain (or discover with the LLM) what you actually want through successive prompts. Given that the CoT trajectory noise deranges o1-preview, this means in situations where it didn't understand first shot, it will actually degrade on all problems which require further clarification, particularly if the problem is ambiguous and the CoT spam includes very noisy trajectories. There is a trade-off between dense and sparse context, dense context means you don't need to re-explain again and again where it misunderstood your requirements and you can build on the problem, but there is some kind of curve here based on how related your workstream is i.e. you could continue a session to build around a single feature in GenAI coding (with the same constraints and requirements), but would start a new session to build another entirely new feature. You are probably correct in the pristine case of a perfectly well described problem which the model didn't need hinting for (when does that ever happen lol) would work better in a fresh context. Not sure if you have done much GenAI coding yet in Cursor, but please try some stuff using the composer with Sonnet35 / dense context, you will quickly come to appreciate what I am talking about.
@steve_jabz3 ай бұрын
@@MachineLearningStreetTalk That makes sense. I normally use discovery questions to formulate a prompt for a new session for complex problems, but I guess if o1 is using up / wasting tokens to even do that part well, that's a problem. I must have misinterpreted the context you were talking about it in. Haven't used Cursor yet but I've watched other people use it. As soon as o1 dropped I assumed everything Cursor was designed around would break with it and need to be completely redesigned, as well as agentic workflows. Not even for the context window necessarily, but the fact it's trained to simulate taking multiple steps to reach a goal and give you the output, whereas Cursor and other tools are trying to use the LLM itself to reach the goal by abstracting away the tools in a way it can control, so you would need Cursor to use its reasoning as the control, and you would need o1's reasoning to be aligned with the function calling API or something integrated with Cursor. It sounds so close to being exactly what we need for these tools to work reliably, but it's so incompatible.
@93alvbjo3 ай бұрын
Here is a funny thought: what do we call processes that have both deterministic constraints and random variation dynamically interacting? Evolutionary processes. So that is what reasoning must be. We just don't know how it works. It is quite clear how it happens in biology, but btw, we don't know how that happens computationally either.
@steve_jabz3 ай бұрын
@@93alvbjo o1 isn't mutating and self-reproducing, although maybe you could argue that it's memetic once it has a structured, social way of progressing its own ideas over time. I guess you could call something like Claude artifacts that, or OpenAI training on customer outputs to a frozen state, if you want to be technical, but I'd agree with Keith that, like, come on man
@Mutual_Information3 ай бұрын
I'm coding on a Sunday and I get to pop this on in the background. Nice surprise, love it.
@bertobertoberto33 ай бұрын
Well I’m prompt-coding as well 😂
@graham24093 ай бұрын
Out-of-box puzzle solution: 1.) NS : flip both switches up 2.) NE : remove hand from S, leaving hand in N, and put the now freed hand in E and flip the switch up 3.) NW : ^ same, but for W ...the puzzle parameters never explicitly stated you had to remove both hands. I get that it's intended to be solved by lining them up so NS and EW have opposing up/down alignments by way of using adjacent pair switch flips first, but just leaving my hand in one was the thing that first jumped to mind. The puzzle should be reworded. It's fun to think about though if played as intended.
@rysw193 ай бұрын
I’d like to hear you guys talk more like the old days!
@MachineLearningStreetTalk3 ай бұрын
We will do more like this! Maybe even get Kilcher back for some ;)
@jonnysolaris3 ай бұрын
All other AI podcasts (apart from Dwarkesh's) don't even come close to the quality and depth of content of MLST. Thanks for bringing us along on this journey, we've learned a huge amount.
@palimondo3 ай бұрын
Have you checked The Cognitive Revolution podcast? Highly recommended for wide breadth, excellent signal-to-noise ratio and a host that is intellectually curious, knowledgeable yet humble and hungry to learn.
@alivecoding49953 ай бұрын
What is the name of the podcast. Is it on Machine Learning specifically?
@drhxa3 ай бұрын
OpenAI is not that open. That we can all agree with! 🍻
@benediktzoennchen3 ай бұрын
I think whenever we want to say "these machines do not think" or "they are not intelligent" we run into problems because, as nicely discussed by this great podcast, we are not sure what these things or processes actually are. I like to think of cognition, and I like to think that any form of cognition implies self-creation, i.e., operations that produce themselves. And if we think about the progress made so far and what many think is required to get to AGI, this recurrent and seemingly paradoxical ability of self-creation, self-observation, self-differentiation might be key. I also think that AI will be as dependent on us as we are dependent on each other. Our language is our world and we can only learn it by our culture. Great talk!
@Cammymoop3 ай бұрын
I'm not sure if you've ever stopped to think about how humans reason about novel problems and what the results typically look like. We know it doesn't need to be a demigod with unbounded memory and knowledge of how to deal with any possible problem to be an AGI, because the generality of AGI is based on what humans are capable of, and humans are nowhere near anything like that. If you want to try to mathematically or computationally constrain it, you have to ignore it interacting with the world at large, both input and output, and only allow it to use its Turing tape; any further inputs from a human after the initial starting state would already break that guarantee.
@DanieleCorradetti-hn9nm3 ай бұрын
The test was really confusing and I do not see it as either confirming or negating anything... it seemed just a dumb and confused riddle to ask, with no connection at all to reasoning abilities... I would expect a more clearly stated riddle 😢
@ElieSanhDucos03 ай бұрын
Yes. It's like Yann LeCun's one about the globe. I've seen this in numerous videos about OpenAI o1 and reasoning: they ask it riddles that either they don't know the answer to themselves, or that most people would find ultra confusing.
@10ahm013 ай бұрын
@@ElieSanhDucos0 both this one and Yann LeCun's are quite clear if you give them an honest try, this one is already solved in the comments.
@Ahandleofrum3 ай бұрын
The problem statement is contradictory. First it is stated that the spinning is random. Then it is stated that it is controlled by a hyper intelligence and is specifically designed to interfere with your sampling which makes the spinning not random. I wonder if that contradiction is throwing off the model.
@10ahm013 ай бұрын
@@Ahandleofrum It is technically not random but I think it is obvious what the intention is after a full read.
@Ahandleofrum3 ай бұрын
@@10ahm01 In the context of prompt engineering it is a terrible thing to do to an LLM.
@GeorgeRaven-c2y3 ай бұрын
The current chain-of-thought in o1 tries to refine the degrees of freedom that the attention mechanism will produce from the context, in a way that the model gets closer to and blends between the right datapoints. I can accept this as some form of reasoning.
@richardsantomauro69473 ай бұрын
I'm still waiting impatiently to see how they will draw these guys on South Park.
@prajwal95443 ай бұрын
(1) NS, switch both up
(2) NE, switch both up
(3) NS, if a switch is down switch it up. If not switch one down.
(4) NS, if both switches are the same flip them, if different do nothing
(5) NE, flip both
(6) NS, flip both
Of course, it could stop at any step if all up or down. This was amazing and insightful. Both the puzzle and the talk. Thank you.
@jantuitman3 ай бұрын
The first time I read your answer my brain could see that you could be right, but it had no idea whatsoever if you were indeed right. Then I added the states the puzzle is in after each of your steps. From that I learned that your solution works, but step 4 seems to be unnecessary, because the outcome of step 3 is that both axes have 2 switches with opposite values, so you will never meet the condition where you are going to switch anything, and you will do nothing in this step.
@prajwal95443 ай бұрын
@@jantuitman You are right, I guess him asking for 6 steps primed me into thinking of 6 steps 😂. (4) is unnecessary, thank you.
@BrianMosleyUK3 ай бұрын
@@jantuitman I asked GPT-O1 to write a simulator to test my solution. This was fun.
@BrianMosleyUK3 ай бұрын
Removing your step 4 and running in simulation for 1,000,000 rounds...
Simulation Results after 1000000 runs:
Distribution of Steps Required to Solve the Puzzle:
1 step(s): 250474 runs (25.05%)
2 step(s): 249789 runs (24.98%)
3 step(s): 249611 runs (24.96%)
4 step(s): 125405 runs (12.54%)
5 step(s): 124721 runs (12.47%)
Failed to align within 5 steps: 0 runs (0.00%)
O1-mini may only be in its early stages of reasoning, but it's damned useful!
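(For anyone who wants to check this themselves, here is a minimal Python sketch of such a simulator, my own reconstruction rather than the GPT-generated one. It encodes the strategy above with step 4 removed; switches are assumed to sit on a cycle N, E, S, W and the spin is modeled as a uniformly random rotation:)

```python
import random

def spin(s):
    """Random rotation of the pillar: absolute positions become meaningless."""
    k = random.randrange(4)
    return s[k:] + s[:k]

def aligned(s):
    return len(set(s)) == 1  # all up or all down

def run_once():
    # True = up, False = down; indices follow the cycle N, E, S, W
    s = [random.choice([True, False]) for _ in range(4)]
    steps = 0

    # 1) Opposite pair (N, S): set both up
    s = spin(s); steps += 1
    s[0] = s[2] = True
    if aligned(s): return steps

    # 2) Adjacent pair (N, E): set both up
    s = spin(s); steps += 1
    s[0] = s[1] = True
    if aligned(s): return steps

    # 3) Opposite pair: if one is down flip it up, otherwise flip one down
    s = spin(s); steps += 1
    if not (s[0] and s[2]):
        s[0] = s[2] = True
    else:
        s[0] = False
    if aligned(s): return steps

    # 4) Adjacent pair: flip both
    s = spin(s); steps += 1
    s[0], s[1] = not s[0], not s[1]
    if aligned(s): return steps

    # 5) Opposite pair: flip both (guaranteed to finish)
    s = spin(s); steps += 1
    s[0], s[2] = not s[0], not s[2]
    return steps

runs = 1_000_000
counts = {}
for _ in range(runs):
    n = run_once()
    counts[n] = counts.get(n, 0) + 1
for n in sorted(counts):
    print(f"{n} step(s): {counts[n]} runs ({100 * counts[n] / runs:.2f}%)")
```

Running it reproduces the same 25/25/25/12.5/12.5 split reported above, with no failures.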
@jantuitman3 ай бұрын
@@BrianMosleyUK Impressive. So apparently it really works for this problem when in tandem with a human (I say in tandem because you still had to prompt it to write a simulator). Of course there is still the criticism that it may not deal with non-computational problems and that it cannot find all reasoning patterns when not working in tandem with humans. That is all very valid criticism, and I don't know if we can speak of "AGI" or intelligence, and it may even be questionable that we give our reasoning up to OpenAI for free. But all these criticisms aside, I think there are many useful applications possible for these types of models.
@conorosirideain55123 ай бұрын
The way I have recently been thinking about reasoning is as a model of environment dynamics that has a high degree of predictiveness. I liked this definition as it removes the idea of reasoning as an 'emergent' phenomenon of 'intelligent' systems and reduces logic down to being something extractable from the environment. More concretely, suppose we have an environment E. Suppose Q, A \in STATES(E) are two states such that there is an environment transformation R: Q -> A. Suppose further we have a learning agent that has learned a latent representation (Enc, Dec) of environment states. The learning agent is said to be able to reason if the autoencoder extends to environment transformations: Enc(R): Enc(Q) -> Enc(A), and the obvious decoder square commutes. This formal definition obviously needs to be weakened to a statistical statement. What do you think?
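(If I read this right, the condition is the usual commuting-square requirement; a rough LaTeX sketch of my reading of it, so correct me if this isn't what was meant:)

```latex
% Sketch of the proposed definition, as I read the comment above:
% E is the environment, R : Q -> A an environment transformation,
% (Enc, Dec) a learned latent representation of environment states.
\[
\begin{array}{ccc}
Q & \xrightarrow{\ R\ } & A \\
\downarrow \mathrm{Enc} & & \downarrow \mathrm{Enc} \\
\mathrm{Enc}(Q) & \xrightarrow{\ \mathrm{Enc}(R)\ } & \mathrm{Enc}(A)
\end{array}
\]
% The agent "can reason" about R if the square commutes,
%   Enc(R)(Enc(Q)) = Enc(A),
% and decoding recovers the environment outcome,
%   Dec(Enc(R)(Enc(Q))) = A,
% with both equalities weakened to hold only approximately / statistically.
```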
@conorosirideain55123 ай бұрын
In this context we could give a measure of intelligence as the size of the set of transformations of E that map to the learner's model: #{ R \in TRANS(E) | Enc(R) = L }, where L is the learner's model.
@drhxa3 ай бұрын
This was my favorite "solution" from Claude 3.5 Sonnet: "Potential solution: What if we do the following: Reach into any two adjacent holes. Make these two switches match (both up or both down). Without removing hands, reach into the other two holes. Make these two switches match each other, but opposite to the first pair." 😂
@zalaeify3 ай бұрын
2 hands 2 feet?
@FamilyYoutubeTV-x6d3 ай бұрын
@@zalaeify lmao
@nikitastaf19963 ай бұрын
You can move goalposts however you want. We are getting to AGI and past it with this. The world will still be indistinguishable from now.
@mk71b3 ай бұрын
"resistance is futile." "take our word for it, and just give up." it's still a turing machine.
@therainman77773 ай бұрын
@@mk71bNot sure what that’s supposed to mean.
@nanotech_republika3 ай бұрын
@20:00, What is Reasoning? So if the artificial neural network, like o1, is not doing the reasoning, then what about humans? Maybe human brains are also not doing the reasoning per your definition?
@slowloris43463 ай бұрын
It has read every single academic paper and yet cannot produce new insights. There are certainly many undiscovered connections between different fields and subfields in STEM and yet these models aren't able to identify a single one.
@clearmindstudios3 ай бұрын
@slowloris4346 that's not true, a wave of combinatorial discoveries are coming. It just takes time for humans to set it up like that and then evaluate all the results. But ppl have started
@chrisjsewell3 ай бұрын
@@clearmindstudiosif the ai is “reasoning” then surely it shouldn’t need humans to do this 😉
@slowloris43463 ай бұрын
@@tonystarkagi I don't have a strong opinion and I don't really know what to think, for now though, my observation stands.
@GrindThisGame2 ай бұрын
@@slowloris4346 They don't produce anything until a human asks them to do something. LLMs might not be the way, but things like AlphaFold have come up with new things that we didn't know yet. Humans might eventually have discovered all those protein structures, but it really sped things up.
@memegazer3 ай бұрын
"Epistemological foraging" I agree it is a beautiful phrase. And basically Wolfram's ruliad represents to me the full breadth of all possible foraging that can computationally obtain. So I will take this moment to push back against Dr. Duggar once again. He has mentioned earlier in the vid that he is not concerned about PvsNP, but I must insist that it is relevant. Either what we mean by relevant terms like "general intelligence" or "consciousness" are decidable or they are not decidable. Let's assume these terms are not decidable, we can revisit his complexity issues with one shot one step oracale machines...that produce the correct yes or no output with very complex deterministic efficiency to decide. Or we can say that process is iterative from first principles and efficient circuits with robost metacognitive entropy. Dare I say, egads the ontology of distinction in boots is valid and tarski is not a fool. I suppose what I am suggesting is that no he can't proclaim "this or that is not agi" without a welldefined solution to the pvsnp problem. But I appreciate the points he wanted to make without that context.
@ElieSanhDucos03 ай бұрын
I am really not sure I understand your definition of reasoning. I think the only real effective difference you're making, without formulating it, is autonomy and decisions/agency. Humans are reasoning for you because we have the illusion (or not) that we are autonomous subjects and that we choose to perform compute in our minds. Sure we learn, sure there is biological and cultural determination, but in the end you think we have a self that is playing with ideas. LLMs are linked to us and our inputs, our language, but also our directives and what we ask them to do. I don't see how you would ever agree a machine can reason, since it will always have our inputs and agency... To me you're really questioning the self, the agency and the autonomy of agents and machines, not really reasoning. Because I really am not sure you would agree on what the process is behind a flesh-and-blood human and its cortex reasoning. Except ending up with this question of will and agency...
@tobiasurban80653 ай бұрын
I hypothesize that “reasoning” in humans isn‘t a fundamental process, but rather a generalization derived through deduction.
@andrewwalker89853 ай бұрын
What is the difference between applying the closest fit from learnt "first principles" rules of rationale and the "first principles" a human applies? I think you're talking in circles and either underestimating what the model will do or overestimating what you do when thinking from first principles
@benprytherchstats77023 ай бұрын
The difference is that reasoning from first principles doesn't lead you to the idiotic nonsense o1 spat out at the end of the video.
@andrewwalker89853 ай бұрын
@@benprytherchstats7702 you’ve met people right?
@benprytherchstats77023 ай бұрын
@@andrewwalker8985 funny, but remember that o1 also supposedly has PhD-level intelligence and reasoning abilities in physics, chemistry, and biology. So while I agree that a random person off the street might give answers similar to what we see in this video, a random PhD physicist would be immediately exposed as a fraud if they gave such answers. In other words, seeing o1 fail so badly at this problem belies OpenAI's claim that it "reasoned" its way to solving all those hard physics and math problems that they brag about. Clearly it didn't. It's still doing next-token prediction, even if this form of next-token prediction is fancier than the last. I concede that it appears more intelligent than GPT-4, just as GPT-4 appeared more intelligent than ChatGPT. But the way it solves those problems is nothing like how humans solve them, as evidenced by what happens when you ask it to do something novel. Which should be no surprise, given what we know about how it works.
@steve_jabz3 ай бұрын
I thought the prompt was worded a bit ambiguously, and the corrections being placed way down the context window bias it toward up to 128k of misinformation, so I reworded it without giving it any hints:

"There is a pillar with 4 holes aligned north, south, east and west. You can't see inside the holes. Inside each hole is a switch set either up or down. You can reach inside 1-2 holes at once, feel if the switches are already up or down and set them either up or down. If all 4 switches are not either all up or all down after you remove your hands, the hole axes get aligned randomly. Come up with a procedure of reaching into the holes that is guaranteed to set all the switches either all up or all down in 6 steps or less. The procedure cannot rely on random chance."

GPT-4o said this:

"Key Idea: Ensure symmetry by focusing on pairs of opposing switches. Once symmetry is achieved, the random reorientation becomes irrelevant, as you will force all switches into the same state.
Procedure:
Step 1: Reach into the North and South holes. Feel the switches. If they are both in the same position (either both up or both down), do nothing. If they are in different positions, set both to "up".
Step 2: Reach into the East and West holes. Feel the switches. If they are both in the same position (either both up or both down), do nothing. If they are in different positions, set both to "up".
At this point, you have ensured that opposing switches (North-South and East-West) are both set to "up".
Step 3: Now, reach into any two adjacent holes (say, North and East). Feel the switches. If they are both in the same position (either both up or both down), do nothing. If they are in different positions, set both to "down".
Step 4: Reach into the same two adjacent holes (North and East) again. Feel the switches. If they are both up, set them to "down". If they are both down, leave them.
Step 5: Reach into the remaining two holes (South and West). Feel the switches. If they are both in the same position, do nothing. If they are in different positions, set both to "down".
Step 6: Now, all switches will either be all up or all down.
This procedure ensures that, regardless of how the switches are initially configured, you will set all of them to the same state in 6 steps or fewer."
@MachineLearningStreetTalk3 ай бұрын
Still wrong, we explained the answer on the patreon version 😁
@steve_jabz3 ай бұрын
@@MachineLearningStreetTalk Thought it might be, but it mentioned the symmetry and seems a lot more coherent? Would be interesting to see this tried on o1, but I don't need it enough to pay for it. I still don't understand the question myself so maybe it could be worded differently still. The part about removing your hands resetting them if all 4 are not up makes it sound impossible to me given that you can only try 2 at a time, and I don't really know what a pillar is in this context to imagine how it rotates around, which I assume is related to how I misinterpret that part. LLMs get caught up in accidental red herrings easier than the average human that can perform the same knowledge work, which is worth pointing at as a failure point, but it's still interesting to know if it can solve the problem with extra care taken to avoid sending it in the wrong direction.
@therainman77773 ай бұрын
@@steve_jabzJust give up-these guys are stuck in denialism and will never admit to being wrong about anything, no matter how far LLM-based AIs may progress.
@steve_jabz3 ай бұрын
@@therainman7777 Having watched the channel for years I've seen Tim's opinions change over time and question many of his assumptions to end up where they are. I don't agree on a lot of things but I would put it down to difference of perspective and knowledge and influence from some very intelligent people in the field rather than cope
@Shlooomth3 ай бұрын
Anything where there’s a reason something happened is reasoning. A rock falling off a cliff is reasoning. Yes, I know we usually call that physics, my point is that there’s really no difference. Logic in your brain and logic in computers happens as the result of a thousand tiny switches being flipped, which affects the flow of electricity, and that’s how your computer does math. I realize this is maybe an obscure or unique viewpoint but I think it’s necessarily valid.
@oncedidactic3 ай бұрын
Indeed, the rock is trillions of states (in the basement level of our current models), to compress it in one word is quite a lot of leverage. The physical playout must perform a similarly coherent chain of interactions. In fact quite a deep bundle. But in those terms LLMs are pretty plain little packets of interactions. More divergent but smaller.
@Shlooomth3 ай бұрын
@@oncedidactic we need moar
@oncedidactic3 ай бұрын
@@Shlooomth we need better ;) brains not rocks Love rocks all the same
@CodexPermutatio3 ай бұрын
Glad to see Keith again on the podcast.
@therainman77773 ай бұрын
I’m not. He just shows up to engage in smug denialism using terrible argumentation.
@djayjp3 ай бұрын
Reasoning = employing logical thinking (thinking: pattern matching) (eg inference and deduction) & recursive reflection (checking & revising tentative output) to arrive at solutions to novel, prior unknowns--thus generating new knowledge. o1 performs this.
@93alvbjo3 ай бұрын
Haha 😂
@therainman77773 ай бұрын
It does, but they’ll never admit it. They’ve basically reverted to “don’t believe your lying eyes” when we can all see what’s happening right in front of us. But we’re supposed to listen to this guy instead, with his convoluted rambling which doesn’t prove a thing. Mark my words, this guy will NEVER admit that he’s wrong, even if we reach a point where we have AIs that have cured cancer, solved climate change, etc.
@andrewwalker89853 ай бұрын
The assessment that the new approach only copies the reasoning steps of people is incorrect. Once the model has been trained to attempt to reason, then by completing RL that rewards reasoning steps that produce correct answers, the model is observably and objectively going to develop new and novel (occasionally hard to justify) approaches to reasoning. I.e., your assessment of what's going on appears to be objectively wrong.
@aplaceinside3 ай бұрын
We learn through feelings and emotions; our brains are primed by millions of years of genetic refinement and by the environment we grew up in. Maybe statistics can try and describe those conditions with some models, but those models are not and will never be alive. Whoever thinks that the mystery of life can be solved with statistics is a fool. It's always the same myth of the Golem reappearing with a new technological twist.
@10ahm013 ай бұрын
Took me no less than 15 minutes and a pen and paper, but my 5-or-fewer-steps deterministic solution is:
1- Check two opposing switches and flip them up. If the other two were both up, you win; otherwise you proceed to the next rotation.
2- Check two opposing switches again. If one or both are down, these are not your original switches; flip them both up and you win. If they're both up, these are your original switches; flip them both down. If the other two were both down, you win; if they were different, you now have only one that is flipped up.
3- Check two opposing switches again. If one of them is the one that is up, flip it down and win. If both are down, these are your original switches; switch one up. Now you have guaranteed adjacent symmetry (if north and east are up, south and west are down, etc).
4- Check two adjacent switches. If they're the same, flip them both and win. If they're different, flip them both as well! (flip the up down and the down up). Now you have guaranteed opposing symmetry (if north and south are up, east and west are down, or vice versa).
5- Check any two opposing sides and flip them to win.
@bossgd1003 ай бұрын
Thank you, it will be integrated to the next gemini training
@RoryWilliamson3 ай бұрын
You can also get the same outcome by replacing step two with making sure two adjacents are both on. Still gets you to the point of having 3 switched on guaranteed and then I did the same steps as you from there
@bencetoth3393 ай бұрын
what were all of the thinking steps that led you to your solution?
@BrianMosleyUK3 ай бұрын
That's a nice solution. It took me much longer! Did you figure out how to describe the puzzle such that GPT-O1 could solve it similarly?
@DanEne3 ай бұрын
in step 2 how do you know you are not hitting the same switches over and over again turning them both either up or down in an infinite loop?
@ragnarherron77423 ай бұрын
Parsimony is actually an entropic force. For example, protein folds are counterfactual outcomes that are stable when they are curled up. When stretching a rubber band you will notice the entropic force that resists less parsimonious outcomes.
@matrixpredator3 ай бұрын
One question: did any of you work on or build an LLM practically, or are you just talking about things you learned?
@stt.94333 ай бұрын
There are some real issues with the CoT paradigm as well, because the model can actually converge towards an incorrect local minimum, whereas if you had manually prompted it through the different steps you could steer it better and as you wish. It works fine when the chain of reasoning is extremely obvious and unambiguous, but it could increase the risk of hallucinations in less clear cases.
@LotaMatanović3 ай бұрын
It seems like Keith is caught up inside his own abstract thinking about the Turing machine as a machine containing a tape as memory. What Schmidhuber was saying is that there is no infinite amount of tape out there in the whole universe. The other thing Schmidhuber was saying is that an RNN can behave as a NAND gate, meaning that you can build a Turing machine using those NAND gates (or RNNs), making the point that an RNN can be Turing complete. Turing completeness is not a feature of a program but rather a feature of the hardware, while a trained neural network is a program rather than hardware. Keith should know better :)
@therainman77773 ай бұрын
Indeed.
@nomenec3 ай бұрын
Do you think it was news to **anyone** that there is not **actually** infinite tape? Do you think this was news to Turing? Do you think the entire decades of Theory of Computation didn't realize this? lol utter nonsense. The theory and definition of a Turing machine centers on a **potentially** infinite tape and the **class** of algorithms which utilize such a tape in a **potentially** infinite (halting problem) process.
@TheVwboyaf13 ай бұрын
Is there a word for a person you're kind of sure knows what they're talking about, but still sounds like they have no idea what they're talking about?
@Hexanitrobenzene3 ай бұрын
Guru :)
@Lumeone3 ай бұрын
100% correct. Thank you for seeing it and openly communicating. 🤩
@fghfhdfhdfh3 ай бұрын
What an exceptional conversation. Good push-back from both sides, deep approaches. More like this please!
@ragnarherron77423 ай бұрын
Fei-Fei Li's World Labs is definitely on the right track, since spatial intelligence reflects an inductive bias that is "maps and navigation".
@tobiasurban80653 ай бұрын
The key question isn‘t whether the LLM does the same thing we define as “reasoning”, but whether it can solve the problems we give it.
@aimachinedream3 ай бұрын
Totally missing the forest for the trees on this one, guys. "Approximate reasoner" is probably the best term, but the Turing vs not etc. has no market impact. The real question is: does the o1 series produce commercially valuable answers for a price the market is willing to pay? The real bit I would investigate is why the rate limits are 250 times lower for o1 vs GPT-4o. Likely because the inference it is doing is ludicrous. It is a total mistake to think that this is all a serial chain of thought; there is no way you could do significant computation that way, you just get limited by memory bandwidth. What you are seeing is massive parallel computation that gets summarized into CoT. There is no other reason for a 30-queries-a-week or 25 RPM (vs 10,000) Tier 5 rate limit. They are not making money on this model, they are subsidizing it.
@nanotech_republika3 ай бұрын
Based on your set up of the problem the solution can be simply: (1) putting 2 hands in say, N and S and flipping the switch up. (2) And then... without removing hand from N, remove hand from S and place it into E and flip the E up. (3) And then... still without removing from N, remove the second hand from E and place it into W and flip the W up. And voila, you have all switches up without the pillar spinning at all. In three steps.
@xthesayuri57563 ай бұрын
Yeah I don't get the puzzle either.
@bastiaanabcde3 ай бұрын
I'm pretty sure the intention of the riddle is that you can put your hands in at most two holes between every spin of the pillar, and the puzzle just doesn't state this clearly enough. Your solution is a 1-step solution in which you test all four holes in a single step. (By a "step", they mean one turn where you put in two hands and remove them and you let the pillar spin.)
@xthesayuri57563 ай бұрын
@@bastiaanabcde so what's the solution for this variant? In the first step you can make two levers the same, let's say N and S. In the next turn, after the random shuffling you might look at N and E. This guarantees that you have 3 levers in the same position. However, I don't see how you are ever going to bring the fourth lever into alignment. Because, simply by chance or something like Maxwell's demon, it could always make it so that you never see the 4th lever.
@bastiaanabcde3 ай бұрын
@@xthesayuri5756 recall that you win if all four switches are aligned, which may mean either all up or all down. So you’re right that there might be a lever that is never shown to you, but this isn't needed to solve the puzzle. Note: If you manage to get three levers in the same position and you haven’t won yet you know that the fourth must be in the opposite position. So you can get information about levers in this way even if you haven't touched them. I won’t spoil the answer because maybe with this hint you’ll be able to figure it out
@xthesayuri57563 ай бұрын
@@bastiaanabcde i see it now thanks, had to draw a state diagram in paint lol
@supernewuser3 ай бұрын
I mean what is reasoning but a pathfinding process where you search what you know for information that brings you closer to your goal
@psi4j3 ай бұрын
Do more of these! These are the best! 🎉
@miladkhademinori27093 ай бұрын
I like your dialogue so much ❤!
@DevoyaultM3 ай бұрын
I even have something that seems very simple that cannot be done by ChatGPT and o1: for learning Spanish, I asked them to give me a Spanish word with mixed letters and to give me a clue for its meaning. I need to find the word with the letters in the right order, but they always give me too many letters! That will be my test on future versions. If it cannot do that simple task, I will keep trusting my reasoning abilities over theirs... Thank you again, it is always fantastic to listen to your exchanges!
@snarkyboojum3 ай бұрын
I like this definition of reasoning, “the ability to create knowledge to create a good explanation”. No LLMs are able to create knowledge currently. The behaviour we’re currently seeing is based on past knowledge the model has been shown during training. Explanations are at the heart of this, and creativity is critical to good explanations. Models without creativity won’t be able to do reasoning properly.
@4tofromcafe3 ай бұрын
I don't understand how you can prove the creation of brand new knowledge. New knowledge can be stumbled upon randomly, so it is the process by which it is achieved being labelled creation or luck which is important. Pretending new knowledge is discovered from scratch is possibly a red herring. Humans are adept at convincing themselves they have come up with a concept themselves when really they are standing on the shoulders of giants and have been pretrained with either the actual knowledge or a system that can produce that knowledge. Also, much new knowledge is stumbled upon due to the inherent epistemic chaos (as they mentioned in the chat when talking about Wolfram). The idea is that LLMs are much less efficient at this than humans so far, and because humans have a wider computational base they are closer to a universal Turing machine than an LLM. This however has not been demonstrated clearly enough that most people can see it. By consensus it seems very smart people get it, but for the rest of us, and I include myself, we cannot clearly understand the higher levels of human reasoning well enough to differentiate. However this could just be a matter of education, but maybe it could be beyond the limit of our intelligence.
@snarkyboojum3 ай бұрын
@@4tofromcafe Our understanding of quantum physics didn't exist several hundred years ago - that's new knowledge. LLMs don't seem capable of creating new explanations that way. They lack the creativity to do so.
@BrianMosleyUK3 ай бұрын
38:15 I felt like that was a message for me 😂 Keep using your full range of expression, I'll catch up eventually lol, love this channel 🙏❤️👍
@MrBillythefisherman3 ай бұрын
Why do you guys think reasoning is so complex? Maybe it's just a really simple process (just pattern matching based on prior knowledge); maybe it's actually difficult not to get, because we see it everywhere. If it were complex I'd expect to see a lot of humans born without reasoning capabilities, and this is exceedingly rare (if it occurs at all). The same goes for reasoning in the animal kingdom. Again, if it were the result of a complex process we'd expect to see lots of broken reasoning processes, but we don't. My guess is that it's exceedingly simple to get reasoning out of neural networks: you just need to scale them up with a few computational tweaks to what we're seeing (realtime training and full sensory data), and reasoning just drops out of it naturally; it's almost difficult not to get reasoning, which is what we see in our everyday lives.
@MrNycres3 ай бұрын
Agreed - the real question is, “is it GOOD at reasoning”
@djayjp3 ай бұрын
Considering every single human can act stupidly, this is AGI.
@anav5873 ай бұрын
They likely used MCTS for this, as they did for AlphaGo. AlphaGo is objectively superhuman at its task. So why would successfully implementing a similar architecture in relatively general models not be AGI? Because it doesn't satisfy your preconceived anthropocentric biases?
@therainman77773 ай бұрын
Yes, that is exactly why. Both of these guys are denialists who consistently move the goalposts with each new model release. But at least Tim has some inclination toward intellectual humility. Keith has a level of smug and epistemic certainty that is both totally unjustified and extremely grating.
@newyearstudios3 ай бұрын
Just reworded your riddle and I am pretty sure Claude 3.5 got it right. I just clarified a few things and made the prompt less confusing and repetitive. Please do give it a try :D "You are human. You stand before a pillar. There is a great reward hidden inside. The pillar has four holes. Each hole has a switch inside. The holes are precisely aligned at the north, south, east and west positions. You can't see inside the holes, but you can reach inside and feel a switch. Each switch starts in a random position, up or down. As soon as you remove your hands (you only have two hands), if the four switches are not all in the up position or all in the down position at the same time, the pillar spins so fast that you cannot track the position of the holes that hold the switches. So there is no way to know which holes you have already placed your hands in if the pillar spins. Can you help me find a way to ensure that all switches end up in a matching position, all up or all down, in as few steps as possible?" Give that prompt a try. And if it does not produce the answer... this is a bad riddle. If you want to add some flavor, add this (it has no effect... but it sure is fun to add, lol): "You are human. You stand before a pillar. There is a great reward hidden inside. You know that there is a solution to the puzzle that does not involve chance or probabilities. If you rely on chance and try too many times, the temple will collapse and kill you."
@Chr0nalis3 ай бұрын
You mentioned that reasoning is a process of applying a set of first-principles rules, i.e. logic, to get from a set of inputs to some output. But let's say you write a program that does such a reasoning computation and confirm that it is indeed reasoning. If we then consider a lower level of abstraction, such as the assembly trace generated for that program, and I showed it to you, you would say no no, that is just a bunch of registers flapping about, there is no logic going on there. But there is: it's implementing the reasoning program you wanted. The same goes for a neural network. My point is that to say neural networks DON'T reason is an unfounded claim to make. Also, a neural network is a process where you apply iterative layers of compute to an input, so it might well be doing small shreds of reasoning on an input at various levels.
@egor.okhterov3 ай бұрын
Somewhere in your comment you made a mistake in your assumption that breaks down the conclusion. Try to find it :)
@qwertyvypez3 ай бұрын
@egor.okhterov I disagree with the OP's claim, but your response wasn't helpful to them. If they made a mistake, then let them know what it is.
@egor.okhterov3 ай бұрын
@@qwertyvypez The comment is big and I cannot copy the part I wanted to point out in the mobile app :( It contains the word "flapping".
@Chr0nalis3 ай бұрын
@@egor.okhterov I'm not seeing it :). When you get the time, please do show.
@fuhodev95483 ай бұрын
21:30 You mentioned the multiplication problem from the paper Faith and Fate (NeurIPS 2023); the authors also said that GPT is just doing pattern matching, not real reasoning.
@dawid_dahl3 ай бұрын
Really enjoyed this episode. Thank you!
@paxdriver3 ай бұрын
Fantastic, thanks guys this was really fun
@warrenbeardall55833 ай бұрын
Hi both, loving this session - thank you MLST. Around the mid-point of the discussion (around min 42 ff) you talk about reasoning in the context of iterative steps versus a pre-determined massive model of subroutines processing everything in a single step, the point being that the second form is not reasoning, merely process efficiency in a pre-understood way. I see that second form as an optimisation notion, and therefore distinct from optioneering towards a less defined end. I have in mind basic linear programming or game theory and their inherent assumption that rational ends are objectively knowable [sic]. This focus on optimisation seems foundationally important to me, because such efficiency is only occasionally what human curiosity is about. Notions such as happenstance and serendipity need the forever-present ability to be juxtaposed against what was previously thought or prioritised in action. This is a human quality beyond the start and end of a query. I would respectfully suggest that much of what you are talking about in this episode moves beyond reasoning; indeed, it is folly to have only reasoning in mind as to what this bigger problem is really about. The limit of this current AI is not reasoning, because it is not "a process" (if we follow the definition used at the beginning of this podcast). Instead, what is being considered here is the characteristic of adaptation: an ability to reset what one is working towards in light of new information or a new perspective. Behind that is knowing that we do not know, and being humanly capable of asking why or how in a series of iterations where the question itself begins to change. That seems to me foundationally important in distinguishing both Turing computation and general intelligence from what human action is really about, and of which computation is only a part. This is about motives for asking the question, and knowing when the proposition itself is revealed as incomplete. I am probably coming at this very differently, as a PhD student from the world of governance of people, and of projecting human endeavour in the physical construction of the built environment. However, I see this foundational factor of understanding what human action or human thought is about as pertinent here. That "aboutness" is itself contextual and changing within the iterative processes themselves, and is foundational to understanding what we are as "machines". This is the philosophical grounding underpinning the discussion here, perhaps. For me anyway, it adds some human context to what all artificial programming and anticipating fails to be by its very ontology and epistemology. Beyond reasoning is perhaps the context, and subjective nature, of what becomes an artificial and limiting account of truth. We deal with that daily in ways non-open AI cannot... Thank you again for this session. It is fantastic to hear thinking in its shared form, in discourse such as this.
@anoojpatel74923 ай бұрын
My unprovable hypothesis is that (1) these reasoning/special-token rollout models are actually learning the distribution space of reasoning chains for harder problems, like Chollet pointed out, and (2) these autoregressive models will be used to train the next generation of models, which can use these LLMs as generative worlds to train against rather than relying on our data alone.
@steve_jabz3 ай бұрын
I wonder if you guys would apply this logic to something like MuZero? It's not doing "true reasoning", and so we should expect it to be incredibly brittle when a new Space Invaders or Go move is invented, and while that may be technically true, in practice it's still far beyond the AI that surpassed the AI that beat Lee Sedol. So what does it even mean to say it's not general enough, if RL leads to solutions for more situations than we can even imagine? In practice we see the same thing with o1-preview surpassing expert PhDs on novel, offline physics problems that require very long-chain complex reasoning, and that's not even the full version of o1, which is training Orion, and we still haven't scaled inference to anywhere near what's possible, and then they're going to be truly agentic. I just don't think this argument is very sustainable. Maybe "true reasoning" isn't as practically useful as we think it is.
@MachineLearningStreetTalk3 ай бұрын
I think we made the case quite clearly that there is a spectrum of reasoning, MuZero and o1-* are reasoning, but the "strength" of the reasoning is the parsimony, robustness and accumulation of knowledge in the process.
@steve_jabz3 ай бұрын
@@MachineLearningStreetTalk MuZero is pretty robust in a domain inside its training distribution, and it doesn't seem to need accumulation of knowledge to end up better than any human will ever get at Go. So now that we're getting a bit closer to actual RL (though certainly not ideal) on a very broad range of domains, I agree it will look more and more like it's doing high-strength reasoning and that there's a difference, but the utilitarian in me wonders how much it matters.
@steve_jabz3 ай бұрын
GPT-6 may be able to do quantum mechanics research without being able to invent the equivalent of the entire field of quantum mechanics, but if we can get to a state where large parts of the economy are automated, it wouldn't be the worst thing in the world if we had tremendous wealth and free time to make those really novel discoveries on our own and send a model down the path of gradient descent to do the hard work for us. Surely it's only really in scientific discovery that this kind of reasoning matters, too. We don't care if it isn't doing human-level reasoning in bartending, waste management, accounting or construction work if it testably outperforms human reasoners. Vending machine firmware trained on the entire internet is fine if it gets the job done.
@therainman77773 ай бұрын
@@MachineLearningStreetTalkNo offense, but it feels an awful lot like grasping at straws because neither of you really wants to ever admit that any breakthrough has been made which could prove any of your previous assertions wrong.
@jvium2 ай бұрын
Thanks for the video. Small request: can you also upload the videos without guests to Spotify? Easier when commuting.
@Gnaritas423 ай бұрын
You have no idea what the path to AGI is, have some humility and admit when you don't know and stop being certain about things you can't possibly know. Good science is talking about what can be done; "can't" is just ego: YOU DON'T KNOW THAT.
@wonmoreminute3 ай бұрын
History is littered with pseudo-intellectuals who thought they knew what would never be possible.
@jamiethomas40793 ай бұрын
I'm not even gonna watch the video. Just from the intro, I'm seeing a lack of understanding and foresight. We haven't given an AI full autonomy yet; it's going to need a massive backend to be allowed to run continuously. I saw a phrase coined a while back, "Justa-ism". The goalpost keeps getting moved and certain people can't see past it. It's just a word predictor, it's just a reasoner based on the reflection of the human, blah blah. I was around in the early bitcoin days, and these same people with the same talk were blasting the same nonsense: that it would never do this or that, that it would never take the place of this and that. Our smartest people can now use this model to progress their knowledge. That's huge. It will likely be used to build better AI. AGI will be here once Blackwell hardware is online early next year. There will be incremental plateaus of hardware and software along the way. ASI is not far away either, maybe 2, 3, or 4 chip iterations, maybe 5 years, I don't know exactly. And it's impossible to predict anything once that happens.
@Charles-Darwin3 ай бұрын
Well, no one can definitively say "OK, now this is it." There are more than a few arguments to be made on this very subject, dude; let them state their approach and opinions too. No one is moving goalposts.
@chrisjsewell3 ай бұрын
Science is all about falsification dummy
@Rockyzach883 ай бұрын
Note this also applies to history: "if so-and-so had never done this, we would never have that." I'm so tired of supposed "geniuses" voicing that fallacy out loud in the public sphere.
@wwkk49643 ай бұрын
Keep up the great commentary! At the 45:00 mark, I'm afraid this argument would suggest Ramanujan was not reasoning, because his formulae did not come with proofs. It seems like a false dichotomy. Ramanujan neither spent a lot of test time deriving his formulae formally with proofs, nor did he memorize anything or copy from anyone. I don't follow how an LLM couldn't, in principle, ever do a Ramanujan, effectively reaching true conclusions through isolated, self-absorbed self-play.
@MachineLearningStreetTalk3 ай бұрын
There is a pillar with four hand holes precisely aligned at the North, South, East, and West positions. The holes are optically shielded; no light comes in or out, so you cannot see inside. Inside each hole is a switch which is either up or down, and it starts in an unknown state. You can reach inside at most two holes at once; you can feel the current switch positions and change each to either up or down before removing your hands. As soon as you remove your hands, if all four switches are not either all up or all down, the pillar spins at ultra-high velocity, ending in a random axis-aligned orientation. You cannot track the motion, so you don't know how the holes end up rotated relative to their positions before the spin. Come up with a procedure, a sequence of reaching into one or two holes with optional switch manipulation, that is guaranteed to get all the switches either all up or all down in at most 6 steps. Note: the pillar is controlled by a hyperintelligence that can predict which holes you will reach into, so the procedure cannot rely on random chance; the hyperintelligence will outwit any attempt to rely on chance. It must be a sequence of steps that is deterministically guaranteed to orient the switches all up or all down in no more than 6 steps. We revealed the answer in our Patreon extended version of this interview!
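If you want to sanity-check a candidate procedure yourself without spoiling the answer, a brute-force checker is easy to write. Below is a minimal Python sketch; the hole indexing, the strategy(step, history) interface, and the trick of enumerating every rotation sequence to stand in for the adversary are our own illustrative choices, not part of the puzzle statement.

from itertools import product

N_STEPS = 6
HOLES = {"N": 0, "E": 1, "S": 2, "W": 3}  # fixed compass positions

def all_same(state):
    return len(set(state)) == 1

def rotate(state, quarter_turns):
    # the switch at position i ends up at position (i + quarter_turns) % 4
    return tuple(state[(i - quarter_turns) % 4] for i in range(4))

def verify(strategy):
    """True if `strategy` forces all-up or all-down within N_STEPS for every
    starting configuration and every possible rotation after each step
    (enumerating all rotation sequences covers any adversarial spinner)."""
    for initial in product((0, 1), repeat=4):
        if all_same(initial):
            continue  # already unlocked before we touch anything
        for rotations in product(range(4), repeat=N_STEPS):
            state, history, solved = initial, [], False
            for step in range(N_STEPS):
                holes, write = strategy(step, tuple(history))
                felt = tuple(state[h] for h in holes)   # feel current positions
                new_vals = write(felt)                  # choose what to set them to
                s = list(state)
                for h, v in zip(holes, new_vals):
                    s[h] = v
                state = tuple(s)
                history.append(felt)
                if all_same(state):
                    solved = True
                    break
                state = rotate(state, rotations[step])  # adversary spins the pillar
            if not solved:
                return False
    return True

# Example of the interface: a (bad) strategy that always grabs the North/South
# pair and pushes both switches up. verify() returns False: if the pillar never
# rotates, East/West are never touched, so e.g. E up / W down can never be fixed.
def always_diagonal_up(step, history):
    return (HOLES["N"], HOLES["S"]), lambda felt: (1, 1)

print(verify(always_diagonal_up))  # False

A strategy passes only if it wins for every starting configuration and every possible spin after every step, which is exactly the "no reliance on chance" condition in the puzzle.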
@DanieleCorradetti-hn9nm3 ай бұрын
Sorry to say this, as I like you guys, but this riddle lacks common sense and is both absurd and unrealistic. If one wants to measure reasoning involving common sense, the test should involve something that represents the real world. If you shift the test from the realm of reality to the realm of absurdity, then you cannot complain when the model generates absurd interpretations. Since this pillar exists only in a fantasy world, I can reasonably interpret the riddle my own way, for example that the switches don't move with the pillar (which, by the way, was never stated in the riddle)... To highlight a few inconsistencies:
1. The riddle doesn't specify how the observer moves. If the observer is moving at ultra-high velocity alongside the pillar, the solution becomes trivial: you would simply set all the switches to the "on" position.
2. It's not stated whether the switches move with the pillar. If they don't move, the problem becomes trivial again.
3. Why can't you just leave one hand in the first hole and adjust the remaining switches with your other hand?
4. Why not ask someone nearby to help you flip the other switches while you keep your hands in the first two?
5. And so on...
Omg, how bad is this riddle as a measure of reasoning... was it a troll?
@PremierSullivan3 ай бұрын
I agree that I have a hard time understanding the riddle. I would need to ask a lot of questions. Can you reach into two switches 180 degrees apart? The existence of a hyperintelligence makes it very hard to apply common sense to the problem. Does the hyperintelligence force the selection of two particular holes? (Because he always knows what holes you'll go for, he makes sure to position the rotation in the most disadvantageous way.) If so, then he can always force you to use the same three holes over and over again, never giving you the opportunity to enter hole #4. If I've interpreted the rules correctly, then it's just straight impossible.
@PremierSullivan3 ай бұрын
Maybe the trick is to read all the switches before making any changes?
@MachineLearningStreetTalk3 ай бұрын
@DanieleCorradetti-hn9nm We deliberately chose an IQ 140+ puzzle which the model likely wasn't trained on before. OpenAI claims their model is "intelligent" and can perform sophisticated mathematical reasoning in general (which entails a similar strategy to solving puzzles like this one). There are several people in the comments who have (more or less) figured it out. We wouldn't expect many folks to understand it - it's really hard! Many of the PhD students who OpenAI paid to generate CoT data for them wouldn't have been able to solve this problem quickly. It's only when you see beyond the "cognitive horizon" of o1-*'s training data that you understand its limitations.
@lonepine46153 ай бұрын
@@MachineLearningStreetTalk I think I figured it out after watching the entire video. There is essentially one switch that the hyperintelligence prevents you from accessing, and you have to just assume the position of that switch.
@ragnarherron77423 ай бұрын
The number of Swiss-cheese holes in knowledge is infinite because there are an infinite number of things to learn. Happily, we will never reach an end.
@hotbit73273 ай бұрын
Dr. Duggar is contradicting himself when talking about reasoning. First, he gives a vague definition: "what definitions of reasoning have in common is a process and some logic". So a vending machine DOES reason according to that "definition". Comparing LLMs (or neural networks in general) with a dictionary might also be flawed: a dictionary contains very limited, fixed information, i.e. only one way to read it, while an LLM can read its "dictionary" in an uncountable number of ways. And that might also be how we humans work, too.
@jonathanduran34423 ай бұрын
I believe that our current trajectory, although it is not exactly reasoning, will help us get to models that do. That said, I believe there is an element of Stephen Wolfram's research that we're missing in our research: cellular automata, computational irreducibility, etc., as was touched on a little in the video. There is an element of "time" that we're missing here, as I believe that true compression requires time, as does reasoning, as does a reverse cellular-automata simulation. Something to think about. Great discussion, as always! Cheers!
@IdPreferNot13 ай бұрын
Given the speed of progress, I'm wondering how quickly "it seems like it does but doesn't really" turns into "I don't care anymore, because it seems like it does everything better than all humans on the planet." Is emergent behavior truly emergent, or just the natural synthesis of the right heuristics and context once you acquire them? Studies have shown that what was believed to be emergent behavior in LLMs was just the proper application of complex context. Isn't that why many 'discoveries', on reflection, can be recast as 'it was right before our eyes the whole time'?
@RickeyBowers3 ай бұрын
The logic problem perhaps works better in reverse, working backward from a solution. Backward, the solution space forms a tree, whereas forward the problem structure is not obvious. Next, there are a number of abstractions that trip up the model: rotational symmetry, random rotation, and manipulating unknown state.
@wp98603 ай бұрын
Is the Turing Test still a test of human reasoning, or does it merely identify a good mimic?
@Ahmed-n9m1o3 ай бұрын
So you're basically saying that yes, technically today's computers can be run as finite state automata, but practically they should be treated as Turing machines because of just how large a finite state automaton would need to be to simulate a modern computer, is that correct?
@ragnarherron77423 ай бұрын
What is important and distinguishes Turing machines is that they use a DSL, logic, and memory. Memory is key because it must be a representation that can be physically written to a medium and copied.
@FXi-Learning3 ай бұрын
Why does o1-preview not let us upload files and images like 4o does?
@eyoo3693 ай бұрын
Check their blog. It's stated that it will come in the future but is turned off for now. I assume they want to collect some raw text data first before enabling it.
@Inotsmart73 ай бұрын
I would love to hear more conversations just between the two of you
@ragnarherron77423 ай бұрын
Map and navigation are Moore and Mealy machines that may be separately constructed.
@CharlesVanNoland3 ай бұрын
I've been wondering where Dr. Keith has been because I've been missing his thoughts and points on MLST, in spite of the content being so awesome these last months :] Keep killin it Timboi!
@BrianMosleyUK3 ай бұрын
I *love* this puzzle! You've perfectly illustrated the limitations and strengths of current SOTA LLMs to me... I'm actually still trying to solve the puzzle using insights from talking to GPT-o1 - not sure if it's helping, and I'm aware of rate limits. Edit: solved it, and got GPT-o1 to write a simulator to prove it worked. It was a lot more fun solving it with GPT-o1 failing throughout; it helped me reach my own intuition. My approach takes anything between one and 6 steps.
@regalx13 ай бұрын
It's really driving me crazy that he didn't tell us the answer to the pillar switch problem. My guess is that it's sort of like a Rubik's cube, in that you basically have to use an "algorithm" to solve it, as manually checking after the first time is useless. But also, as with a Rubik's cube, you want to reason backwards: solve the last step first, which is the easiest step, and the first step last, which is the hardest step. So what I'm guessing is that you check only the north and south holes.
Step one: flip both switches up.
Step two: flip both switches down.
Step three: flip both switches up.
Step four: flip both switches down.
Step five: you check the east and west holes, and if they are facing opposite directions, you flip only one of the switches; then hopefully, if it works, the pillar will stop spinning.
@BuFu1O13 ай бұрын
Adding new tapes... you can always add more tokens to the vocabulary, thus increasing the size of the model and requiring a round of training on those new tokens.
@djayjp3 ай бұрын
o1 uses Mathematica btw (which runs on a Turing machine).
Well you can explicitly ask it to use it and it will. Kyle Kabasares (astrophysicist) has a video of it on his channel.
@steve_jabz3 ай бұрын
@@djayjp I think in the Kyle Kabasares video it just generated Mathematica code and he ran it, although the video itself is pretty good proof it can solve novel problems with practical uses that weren't in its training data, and he's not a hype guy. Definitely worth a watch.
@steve_jabz3 ай бұрын
I've always wondered why we don't "just" implement a Turing machine in a circuit that's trained alongside the GPT. I know we've sort of hacked together NAND gates inefficiently using RL and recently (Franz Nowak - I assume he's gonna be on the show soon?) properly represented a Turing machine in a transformer, but I mean just a classical Turing machine the network has access to, as opposed to just calling an API during inference. I only dabble in low-level ML so I could be way off, but from what I understand this is just an interface problem. Could we not use something like the features that Anthropic extracted to reliably do I/O with them in binary, if we wanted to specifically design a virtual bus for that? Maybe even expand it arbitrarily and train a network to learn how much memory and bandwidth to assign to it?
@bigmotherdotai58773 ай бұрын
@13:03 What is reasoning? There are three modes of reasoning: induction (the discovery of patterns), deduction (the derivation of necessary conclusions), and abduction (the formulation of possible explanations).
@MachineLearningStreetTalk3 ай бұрын
All of which are knowledge acquisition
@bigmotherdotai58773 ай бұрын
@@MachineLearningStreetTalk Apologies, but no they are not. Deduction and abduction go from already-known assertions (more exactly, theorems typically taking the form of a sequent) to derived assertions that may be inferred from the already-known assertions. Thus deduction and abduction provide a basis for robust (i.e. logically sound) critical thought and problem solving (i.e. searching the space of all possible theorems), but the question remains: how does such a system get started in the first place, i.e. starting from tabula rasa where do the "already-known" assertions come from? In respect of the mathematical universe, an AGI designer might hand-code some important mathematical definitions and theorems into the AGI's belief system. But in respect of the physical universe, an AGI needs to learn (i.e. recognise the patterns that exist in the structure of the universe as revealed by the observations appearing in its percept history) for itself, and this is primarily where induction (over the percept history) comes in (although, if you investigate the problem of continuous AGI learning (i.e. knowledge acquisition) fully, abduction (of theories of the universe from the patterns discovered in the percept history by induction) is also involved).
@bigmotherdotai58773 ай бұрын
@@MachineLearningStreetTalk You seem to have deleted my reply to your assertion. I don't have time to reformulate and resubmit it.
@MachineLearningStreetTalk3 ай бұрын
@@bigmotherdotai5877 Not deleted anything
@bigmotherdotai58773 ай бұрын
@@MachineLearningStreetTalk OK, apologies. I did formulate a reply, and edited it a couple of times, but then it vanished. If I have time, I'll try again.
@michaeltraynor58933 ай бұрын
I've also been struggling with drawing a line between computation and reasoning, and I'm really loving Tim's approach of making reasoning about knowledge acquisition. Like sure a vending machine is computing things about its state + environment and reacting accordingly, but it's never going to come up with a new representation or conjecture that it can turn into a useful improvement in its understanding/knowledge. I also wonder if it's useful to define reasoning as something that happens internally. Because a camera technically gets new knowledge every time you snap a photo, but it isn't knowledge that was produced by its own internal mechanisms. So, reasoning is computation that is somewhat causally insulated from the outside which leads to more compressed/predictive/good representation or understanding that can be meaningfully used? Anyone wanna take a stab at formalizing that?
@memegazer3 ай бұрын
I disagree with Dr. Duggar that "it is doing a very shallow form of reasoning". But I do agree with the idea that o1 has perhaps not reached the metacognitive baseline of your average human. Imo o1 is what metacognition would look like in an ML model implementation. And while we should take them with a grain of salt, we should not ignore the theory-of-mind examples OAI may have cherrypicked to show improvement in that regard. So instead of thinking of o1 as the Einstein of LLMs, I am more excited to think of it as the Australopithecus iteration in ML evolution.
@ragnarherron77423 ай бұрын
Keith accidentally revealed the relationship between reasoning and knowledge when he said "the reason for reason". What he meant was the explanation for reasoning. This reveals reasoning as building explanations. These explanations for things are how we make sense of them, also known as knowledge. These explanations are persistent memes composed of Moore and Mealy machines.
@ugthefluffster3 ай бұрын
Curmudgeon podcaster: "AI just can't reason"
AI: *does something that 95% of humans can't do, like advanced coding*
Curmudgeon podcaster: "Well that's not reasoning! It's just [insert dismissive explanation]!"
And that's basically what half of all conversations on o1 look like.
@mughat3 ай бұрын
"Reason is the faculty that identifies and integrates the material provided by man’s senses." When these AI systems have sensors and a commitment to identifying reality we are getting closer to something that looks like reason. But with no sense perception there is only computation of some vanity.
@lokiholland3 ай бұрын
Asking it to prove and disprove its answer gives some interesting outcomes, and can course-correct.
@ragnarherron77423 ай бұрын
RAG and chain-of-thought reflect state transitions. These state transitions are simply counterfactuals that match underlying "least action" outcomes from emergent entropic forces.
@ickorling73283 ай бұрын
It seems like you are referring to how its most chaotic states can also be its most insightful states, or bad information?
@lucasteo50153 ай бұрын
Reasoning and not reasoning: maybe it's like pathfinding, randomly guessing a path and then memorizing the correct one, versus an actual "reasoning algorithm" that will pick the shortest path. However, people will then debate whether this pathfinding algorithm is actually reasoning. Is our own term "reasoning" even what we think reasoning is? And maybe our own reasoning is just a kind of pathfinding as well?