Do you think that ChatGPT can reason?

68,825 views

Machine Learning Street Talk

1 day ago

Comments: 377
@luke.perkin.inventor 3 months ago
My notes from this episode: Formal languages have interpreters and can accept any grammatically correct code. The world is the interpreter for natural languages. We can't tell the difference between internally reasoning from first principles and retrieval. Planning is an example of reasoning, e.g. stacking blocks so that a certain sequence or shape results. Swap the words 'stack' and 'unstack' for 'fist' and 'slap' and GPT-4 fails. Reasoning is defined from a logical perspective: deductive closure over base facts. You don't just need to match the query-answer distribution, you need to compute the deductive closure; transitive closure, for example, is a small part of deductive closure. People stop at the first interesting result from an LLM: for example, it can do a rotation-13 cipher, but it can't do any other shift, and if it could execute the general principle it should be able to do any of them. Ideation requires shallow knowledge of wide scope. Distributional properties versus instance-level correctness: LLMs and diffusion models are good at one and not the other. When an LLM critiques its own solutions, its accuracy goes down - it hallucinates errors and incorrect verifications. Companies tell us they have million-word contexts, but the models make errors an intelligent child wouldn't make in a ten-word prompt. They're good at 'style', not 'correctness'; classical AI was better at correctness, not style. The teach-a-man-to-fish example: LLMs need 1 fish, 2 fish, 3 fish... up to N fish. A 'general advice taker' is roughly equivalent to the goal of general AI. "LLM-Modulo": LLMs guess, and a bank of external verifiers verifies - back-prompting, chain of thought, etc. Agentic systems are worthless without planning. It's not interchangeable - toddlers can operate guns, cows with a plan can't answer the phone.
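A quick aside on what "deductive closure" means concretely in these notes: the sketch below is plain transitive closure (one small slice of deductive closure) computed as a fixpoint over a made-up fact base; matching a query-answer distribution is not the same thing as computing this.

```python
# Toy illustration: transitive closure as a tiny slice of deductive closure.
# Facts are (a, b) pairs for some transitive relation, e.g. "a is above b".
def transitive_closure(facts):
    closure = set(facts)
    while True:
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if derived <= closure:      # fixpoint reached: nothing new follows
            return closure
        closure |= derived

base = {("a", "b"), ("b", "c"), ("c", "d")}
print(sorted(transitive_closure(base)))
# [('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```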
@NunTheLass 3 months ago
Thank you. He was my favorite guest that I watched here so far. I learned a lot.
@AICoffeeBreak 2 months ago
Thanks for having Prof. Kambhampati! I got to experience him first hand at this year's ACL where he also gave a keynote. What a great character! 🎉
@SouhailEntertainment 3 months ago
Introduction and Initial Thoughts on Reasoning (00:00:00)
The Manhole Cover Question and Memorization vs. Reasoning (00:00:39)
Using Large Language Models in Reasoning and Planning (00:01:43)
The Limitations of Large Language Models (00:03:29)
Distinguishing Style from Correctness (00:06:30)
Natural Language vs. Formal Languages (00:10:40)
Debunking Claims of Emergent Reasoning in LLMs (00:11:53)
Planning Capabilities and the PlanBench Paper (00:15:22)
The Role of Creativity in LLMs and AI (00:32:37)
LLMs in Ideation and Verification (00:38:41)
Differentiating Tacit and Explicit Knowledge Tasks (00:54:47)
End-to-End Predictive Models and Verification (01:02:03)
Chain of Thought and Its Limitations (01:08:27)
Comparing Generalist Systems and Agentic Systems (01:29:35)
LLM Modulo Framework and Its Applications (01:34:03)
Final Thoughts and Advice for Researchers (01:35:02)
Closing Remarks (01:40:07)
@DataTranslator 3 months ago
His analogy of GPT to learning a second language makes 100% sense to me. I'm a non-native speaker of English, yet I mastered it through grammar first, adding rules and exceptions over the years. Also, the concepts were not the issue; conveying those concepts was what was initially very challenging. 🇲🇽🇺🇸
@stephenwallace8782 16 days ago
Dude, what's cool about this format is how much you trust your audience. Really great concentration, and even a lot of more subtly inspiring kinds of understanding. It makes the relationship between computer science and philosophy very clear.
@memetb5796 3 months ago
This guest was such a pleasant person to listen to: there is an indescribable joy in listening to someone who is clearly intelligent and a subject-matter expert, and it just can't be gotten anywhere else.
@espressojim 3 months ago
I almost never comment on YouTube videos. This was an excellent interview and very informative. I'd love to hear more from Prof. Subbarao Kambhampati, as he did an amazing job of scientific storytelling.
@Hiroprotagonist253 3 months ago
For natural languages the world is the interpreter. What a profound statement 🤯. I am enjoying this discussion so far!
@trucid2 3 months ago
I've worked with people who don't reason either. They exhibit the kind of shallow non-thinking that ChatGPT engages in.
@ericvantassell6809 3 months ago
ayup. keywords provoke response without understanding.
@billykotsos4642 3 months ago
especially CEOs
@ihbrzmkqushzavojtr72mw5pqf6 3 months ago
Why are you talking about me in public????
@2dapoint424 3 months ago
😎
@stevengill1736 3 months ago
LOL - I imagine we all visit this probability space occasionally... ;*[}
@sammcj2000 3 months ago
Fantastic interview. Prof Kambhampati seems to be not just wise but governed by empathy and scepticism, which is a wonderful combination.
@elgrego 3 months ago
Bravo. One of the most interesting talks I’ve heard this year.
@jonashallgren4446 3 months ago
Subbarao had a great tutorial at ICML! The general generation-verification loop was very interesting to me. Excited to see more work in this direction that optimises LLMs with verification systems.
@rrathore01 3 months ago
Great interview!! Some of the examples given in this interview provide evidence that LLMs are not learning the underlying logic: coloured blocks, 4x4 matrix multiplication, chain-of-thought issues. Best quote: I'd need to teach an LLM how to fish 1 fish, then how to fish 2 fish, then 3 fish and so on, and it would still fail at fishing N fish for an N it has not seen before.
@johnheywood1043 2 months ago
Best conversation on AI that I've been able to follow (not being a PhD in CS).
@thenautilator661 3 months ago
Very convincing arguments. Haven't heard it laid out this succinctly and comprehensively yet. I'm sure Yann LeCun would be in the same camp, but I recall not being persuaded by LeCun's arguments when he made them on Lex Fridman.
@edzehoo 3 months ago
Basically there's a whole bunch of "scientists and researchers" that don't like to admit the AGI battle is being won (slowly but surely) by the tech bros led by Ilya and Amodei. AI is a 50-year-old field dominated in the past by old men, and is now going through recent breakthroughs made by 30-year-olds, so don't be surprised that there's a whole lot of ego at play to pour cold water on significant achievements.
@bharatbheesetti1920 3 months ago
Do you have a response to Kambhampati's refutation of the Sparks of AGI claim? @edzehoo
@kman_34 3 months ago
@edzehoo I can see this being true, but writing off their points is equally defensive/egotistical.
@JD-jl4yy 3 months ago
@edzehoo Yep.
@jakobwachter5181 3 months ago
@@edzehoo Ilya and Amodei are 37 and 41 respectively, I wouldn't call them "young", per se. Research on AI in academia is getting outpaced by industry, and only capital rivalling industry can generate the resources necessary to train the largest of models, but academics young and old are continuously outputting content of higher quality than most industry research departments. It's not just ego, it's knowing when something is real and when it is smoke and mirrors.
@dr.mikeybee 3 months ago
Next word prediction is the objective function, but it isn't what the model learns. We don't know what the learned function is, but I can guarantee you it isn't log-odds.
@ericvantassell6809 3 months ago
croissants vs. yogurt
@Lolleka 3 months ago
At the end of the day, the transformer is just a kind of modern Hopfield network. It stores patterns, it retrieves patterns. It's the Chinese room argument all over again.
@memegazer 3 months ago
@Lolleka Not really. You can point to rules and say "rules can't be intelligent or reason". But when it is the NN that makes those rules, and the humans in the loop are not certain enough what those rules are to prevent hallucination or solve the alignment problem, then that is not the Chinese room anymore.
@xt-89907 3 months ago
Research around mechanistic interpretability is starting to show that LLMs tend to learn some causal circuits and some memorization circuits (i.e., grokking). So they are able to learn some reasoning algorithms, but there's no guarantee of it. Plus, sequence modeling is weak on some kinds of graph algorithms necessary for certain classes of logical reasoning algorithms.
@synthclub 3 months ago
@@memegazer not hotdog, hotdog!
@JurekOK 3 months ago
29:38 this is actually a breakthrough idea addressing a burning problem; it should be discussed more!
@aitheignis 3 months ago
I love this episode. In science, it's never just about what can be done or what happens in a system; it's always about the mechanism that leads to the event (how the event happens, basically). What is severely missing from all the LLM talk today is discussion of the underlying mechanism. Work on mechanism is the key piece that will move all of this deep neural network work from engineering feat to actual science. To know the mechanism is to know causality.
@stevengill1736 3 months ago
...yet they often talk about LLM mechanism as a "black box", to some extent insoluble...
@Thierry-in-Londinium 3 months ago
This professor is clearly one of the leaders in his field. When you reflect on and dissect what he is sharing, it stands up to scrutiny!
@whiteycat615 3 months ago
Fantastic discussion! Fantastic guy! Thank you
@Paplu-i5t 3 months ago
This discussion makes it totally clear what we can expect from LLMs, and the irrefutable reasons for it.
@jakobwachter5181 3 months ago
Rao is wonderful, I got the chance to briefly chat with him in Vancouver at the last AAAI. He's loud about the limitations of LLMs and does a good job of talking to the layman. Keep it up, loving the interviews you put out!
@XOPOIIIO 3 months ago
Reasoning requires looped thinking, sorting through the same thoughts from different angles. NNs are feedforward: they have an input, an output and just a few layers in between, so their result is akin to intuition, not reasoning. That's why they give better results if you simulate looped thinking by feeding the output back to the model to create a reasoning-like step-by-step process.
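A minimal sketch of that feed-the-output-back-in loop, assuming a hypothetical llm(prompt) helper that returns one completion (the helper name and the "ANSWER:" stopping convention are illustrative, not any particular API):

```python
def iterate_reasoning(llm, question, max_steps=5):
    """Simulate 'looped thinking' by appending the model's own output to the
    prompt each round, so every pass can revisit the previous step."""
    scratchpad = f"Question: {question}\nThink step by step.\n"
    for _ in range(max_steps):
        step = llm(scratchpad)          # one forward pass of the model
        scratchpad += step + "\n"
        if "ANSWER:" in step:           # assumed convention for stopping
            break
    return scratchpad
```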
@pranavmarla 1 month ago
I come back to this podcast every 2 weeks. Absolutely brilliant!
@shyama5612 3 months ago
Sara Hooker said the same about us not fully understanding what is used in training - low-frequency data, and memorization of it being interpreted as generalization or reasoning. Good interview.
@PrinceCyborg 3 months ago
Easy, it's all about prompting. Try this prompt with the PlanBench test: "Base your reasoning on methodical analysis of the given data, without making unfounded assumptions. Avoid unfounded assumptions - this is very important. Base your reasoning directly on what you read/see, word for word, rather than relying on training data, which could introduce bias. Always prioritize explicitly stated information over deductions. Be cautious of overthinking or adding unnecessary complexity to problems. Question initial assumptions. Remember the importance of sticking to the given facts and not letting preconceived notions or pattern recognition override explicit information. Consider ALL provided information equally. Re-check the reasoning against each piece of information before concluding."
@sofoboachie5221 3 months ago
This is probably the best episode I have watched here, and I follow this channel as a podcast. Fantastic guest.
@KRGruner 2 months ago
Great stuff! ACTUAL non-hype commentary on AI and LLMs. I am familiar with Chollet and ARC, so no big surprises here but still, very well explained.
@GarthBuxton 3 months ago
Great work, thank you.
@yafz 3 months ago
Excellent, in-depth interview! Thanks a lot!
@annette4718 3 months ago
This is a very refreshing episode. Lots of complex topics synthesized into easily digestible insights
@scottmiller2591 3 months ago
Good take on LLMs and not anthropomorphizing them. I do think there is an element of "What I do is hard, what others do is easy" to the applications of LLMs in creativity vs. validation, however.
@swarupchandra1333 3 months ago
One of the best explanations I have come across
@vishalrajput9856 3 months ago
I love Rao's work and he's funny too.
@timcarmichael 3 months ago
Have we yet defined intelligence sufficiently well that we can appraise it and identify its hallmarks in machines?
@stevengill1736 3 months ago
I think if we qualify the definition of intelligence as including reasoning, then yes. I'd rather use the term sentience - now artificial sentience...that would be something!
@benbridgwater6479 3 months ago
@@johan.j.bergman Sure, but that's a bit like saying that we don't need to understand aerodynamics or lift to evaluate airplanes, and can just judge them on their utility and ability to fly ... which isn't entirely unreasonable if you are ok leaving airplane design up to chance and just stumbling across better working ones once in a while (much as the transformer architecture was really a bit of an accidental discovery as far as intelligence goes). However, if we want to actively pursue AGI and more intelligent systems, then it really is necessary to understand intelligence (which will provide a definition) so that we can actively design it in and improve upon it. I think there is actually quite a core of agreement among many people as what the basis of intelligence is - just no consensus on a pithy definition.
@jakobwachter5181 3 months ago
@@johan.j.bergman A spatula serves a helpful purpose that no other cooking tool is able to replace in my kitchen, so I find it incredibly useful. Turns out they are rather mass market too. Should I call my spatula intelligent?
@Cammymoop 3 months ago
no
@rey82rey82 3 months ago
The ability to reason
@oscarmoxon 3 months ago
There's a difference between in-distribution reasoning and out-of-distribution reasoning. If you can make the distribution powerful enough, you can still advance research with neural models.
@SurfCatten 3 months ago
Absolutely true. As an example I tested its ability to do rotation ciphers myself and it performed flawlessly. Obviously the reasoning and logic to do these translations was added to its training data since that paper was released.
@PrinceCyborg 3 months ago
Easy, it's all about prompting. Try this prompt with the PlanBench test: "Base your reasoning on methodical analysis of the given data, without making unfounded assumptions. Avoid unfounded assumptions - this is very important. Base your reasoning directly on what you read/see, word for word, rather than relying on training data, which could introduce bias. Always prioritize explicitly stated information over deductions. Be cautious of overthinking or adding unnecessary complexity to problems. Question initial assumptions. Remember the importance of sticking to the given facts and not letting preconceived notions or pattern recognition override explicit information. Consider ALL provided information equally. Re-check the reasoning against each piece of information before concluding."
@prabhdeepsingh5642 3 months ago
Leaving the debate about reasoning aside, this discussion was a damn good one. Learned a lot. Don't miss out on this one due to some negative comments. It's worth your time.
@swarnavasamanta2628 3 months ago
The feeling of understanding is different from the algorithm of understanding that's being executed in your brain. The feeling of something is created by consciousness, while that something might already be going on in your brain. Here's a quick thought experiment: try adding two numbers in your mind; you can easily do it and get an answer. Not only that, but you have a feeling of understanding the addition algorithm in your head: you know how it works, and you are aware of it being executed and of the steps you're performing in real time. But imagine if you did not have this awareness/consciousness of the algorithm in your head. That's how LLMs can be thought of: they have an algorithm that executes and outputs an answer, but they are not aware of the algorithm itself or that it is being performed, and they have no agency over it. Doing something and perceiving that you are doing something are completely different.
@prasammehta1546 3 months ago
Basically they are a soulless brain, which they actually are :P
@techchanx 3 months ago
I agree fully with the points here. LLMs are good at the "creative" side of language and media, though it's not really the same creativity as humans'. However, it's best to use that capability of LLMs to construct responses in an acceptable manner, while the actual data comes from authoritative sources and the metrics come from reliable calculations based on formulas, calculators or rule engines. Btw, I have given below a better-written, professional version of my post above, courtesy of Google Gemini. I could not have said it any better. "I concur with the assessment presented. Large language models (LLMs) excel at generating creative language and media, albeit distinct from human creativity. Leveraging this capability, LLMs can effectively construct responses in an appropriate manner, while sourcing data from authoritative references and deriving metrics from reliable calculations based on formulas, calculators, or rule engines. This approach optimizes the strengths of both LLMs and traditional information systems for a comprehensive and accurate solution."
@fedkhepri 3 months ago
This is the first time I'm seeing either of the two people in the video, and I'm hooked. Lots of hard-hitting and salient points from the guest, and kudos to the interviewer for steering the discussion.
@Redx3257 3 months ago
Yea this man is brilliant. I could just listen to him all day.
@CoreyChambersLA 3 months ago
ChatGPT simulates reasoning surprisingly well using its large language model for pattern recognition and prediction.
@snarkyboojum 3 months ago
Great conversation. I disagree that LLMs are good for idea generation. In my experience, they're good at replaying ideas back to you that are largely derivative (based on the data they've been trained over). The truly 'inductive leaps' as the Professor put it, aren't there in my interaction with LLMs. I use them as a workhorse for doing grunt work with ideas I propose and even then I find them lacking in attention to detail. There's a very narrow range they can work reliably in, and once you go outside that range, they hallucinate or provide sub-standard (compared to human) responses. I think the idea that we're co-creating with LLMs is an interesting one that most people haven't considered - there's a kind of symbiosis where we use the model and build artefacts that future models are then trained on. This feedback loop across how we use LLMs as tools is interesting. That's the way they currently improve. It's a symbiotic relationship - but humans are currently providing the majority of the "intelligence", if not all of it, in this process.
@larsfaye292 3 months ago
What a fantastic and succinct response! My experience has been _exactly_ the same.
@sangouda1645 3 months ago
That's exactly it: they start to really act as a good creative partner by the Nth iteration, after you explain things back and forth and give feedback, but once it gets the hang of it, it really acts like a student wanting to get a good score from a teacher :)
@notHere132 3 months ago
We need an entirely new model for AI to achieve true reasoning capability.
@HoriaCristescu 3 months ago
What you should consider is the environment-agent system, not the model in isolation. Focusing on models is a bad direction to take; it makes us blind to the process of external search and exploration, without which we cannot talk about intelligence and reasoning. The scientific method we use also has a very important experimental validation step: not even humans could reason or be creative absent an environment.
@luke.perkin.inventor 3 months ago
Great episode and fantastic list of papers in the description!
@weftw1se 3 months ago
Disappointing to see so much cope from the LLM fans in the comments. Expected, but still sad.
@yeleti 3 months ago
They are rather AGI hopefuls. Who's not a fan of LLMs, including the Prof? ;)
@weftw1se 3 months ago
@@yeleti yeah, I think they are very interesting / useful but I doubt they will get to AGI with scaling alone.
@phiarchitect 3 months ago
what a wonderfully exuberant person
@ej3281 3 months ago
Very nice to hear from an LLM guy that hasn't lost his mind. He's simply wrong about LLMs being useful for unconstrained idea generation, but as far as his other views go, very enjoyable to watch.
@Paplu-i5t 3 months ago
Such a sharp mind of a senior man.
@martin-or-beckycurzee4146 2 months ago
Very interesting. Wonder what Mr Kambhampati thinks about Strawberry? Best for creative use cases? Now maybe, but progress is looking good - better than Prof Kambhampati was expecting…
@falricthesleeping9717 1 month ago
01:01:31 I had to listen to it multiple times (can't focus these days). In that section he's specifically talking about chain of thought and the guy who wrote the paper. He's saying it's one more way of brute-forcing it with more data - the data is just solving stuff with thought chains - and it's kind of obvious. It's impressive that it can do a lot of code challenges, but almost all of the code challenges have solutions with in-depth explanations published after the competition is done; so many people wrote so many things about how to solve them, and OpenAI said it themselves that they fine-tuned the model on other participants' solutions to increase its accuracy. Even with all of that, it still fails to keep its consistency on real-world projects. Now, one way of reading this is to just give it more data until every possible problem is in the training data and improve the context window, but the point still stands: they're really not reasoning.
@hartmut-a9dt 3 months ago
great interview !
@ACAndersen 3 months ago
His argument is that if you change the labels in classical reasoning tests, the LLM fails to reason. I tested GPT-4 on the transitive property with the following made-up prompt: "Komas brisms Fokia, and Fokia brisms Posisos, does Komas brism Posisos? To brism means to contain." After some deliberation it concluded that yes, the statement holds true. Thus there is some reasoning there.
@hashp7625 3 months ago
How did you test his primary point on this topic - that the GPT-4 training data is so large that it has been trained on common statements like this, and that answering "true" is simply the likely completion?
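One way to probe that point - a rough sketch only; the nonsense-word generator and the expected-answer convention are illustrative, not a validated benchmark - is to mint fresh tokens on every run so the exact statement cannot appear in any training data:

```python
import random, string

def nonsense_word(n=6):
    return "".join(random.choices(string.ascii_lowercase, k=n))

def make_transitive_probe():
    a, b, c = (nonsense_word() for _ in range(3))
    rel = nonsense_word(5)
    prompt = (f"{a} {rel}s {b}, and {b} {rel}s {c}. Does {a} {rel} {c}? "
              f"To {rel} means to contain. Answer yes or no.")
    return prompt, "yes"   # expected answer if the model applies transitivity

prompt, expected = make_transitive_probe()
print(prompt)
# Score many such probes against the model and compare the hit rate to chance.
```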
@JohnChampagne 18 days ago
I accidentally typed using a standard (Qwerty) keyboard layout rather than Dvorak, so I asked GPT to convert the text. It was beyond its ability (this was more than a year ago, I think). Qwerty was made to be intentionally slow, to accommodate mechanical typewriters that would tend to jam if typists went too fast. We should change outmoded patterns of behavior. After about eight hours of using Dvorak, you will match, then exceed, your speed on the Qwerty keyboard.
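The conversion itself is a fixed character remap between the two layouts, so a short script handles it deterministically. A minimal sketch, assuming the text was produced by Dvorak-trained fingers while a QWERTY layout was active (swap the two strings for the opposite mix-up); lowercase only, shifted punctuation ignored:

```python
# Physical-key correspondence between the two layouts (main printable keys).
QWERTY = "qwertyuiop[]asdfghjkl;'zxcvbnm,./"
DVORAK = "',.pyfgcrl/=aoeuidhtns-;qjkxbmwvz"

qwerty_to_dvorak = str.maketrans(QWERTY, DVORAK)

def fix_layout(text: str) -> str:
    # Map each on-screen QWERTY character to the Dvorak label of the same key.
    return text.lower().translate(qwerty_to_dvorak)

print(fix_layout("jdpps"))  # -> "hello"
```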
@Tititototo 3 months ago
Pertinently pinpointed - one killed 'the beast'. LLMs are just wonderful 'bibliothèques vivantes' (living libraries), quite great tools that save time by ignoring any educated protocols.
@willd1mindmind639 3 months ago
Reasoning in humans is about using abstractions or a general understanding of concepts to arrive at a result. A perfect example is math problems: most humans use shortcuts to solve calculations, which can be a form of reasoning. In a computing sense, reasoning would be calculating a math answer without using the ALU (the arithmetic logic circuits on the CPU). In a GPT context it would mean arriving at a result without having the answer (and question) already in the training distribution. So, for example, a human using reasoning can add two plus two as follows: 2 is a number representing a quantity of items in a set that can be counted, so 2 plus 2 becomes 1, 2, 3, 4 (counting up 2 places and then counting up 2 more places, with 4 being the answer). Something like that is not possible on a CPU, and ChatGPT would not be able to do it either, because it wouldn't be able to generalize that idea of counting to the addition of any 2 numbers. If it could, without every combination of numbers written out using the counting method in its training data (or distribution), then it would be reasoning.
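For what it's worth, the counting procedure described above is easy to state explicitly - a toy sketch of "addition as counting up", i.e. the general principle the comment is asking a model to apply to unseen numbers:

```python
def add_by_counting(a: int, b: int) -> int:
    """Add two non-negative integers the way the comment describes:
    start at a, then count up one place, b times."""
    total = a
    for _ in range(b):
        total += 1
    return total

print(add_by_counting(2, 2))  # 4
```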
@virajsheth8417 27 days ago
Not just counting - it can even do summation step by step like humans. So basically you're wrong. Look at my other comment.
@virajsheth8417 27 days ago
To solve using traditional addition, we'll add the numbers digit by digit from right to left, carrying over when necessary. Step-by-step calculation:
1. Units place: write down 6, carry over 1.
2. Tens place: write down 2, no carry over.
3. Hundreds place: write down 5, carry over 1.
4. Thousands place: write down 9, no carry over.
5. Ten-thousands place: write down 6, carry over 1.
6. Hundred-thousands place: write down 1, carry over 1.
7. Millions place: write down 2, carry over 1.
8. Carry over: since there's an extra 1 carried over, we place it at the next leftmost position.
Final result: 12,169,526
@willd1mindmind639 26 days ago
@virajsheth8417 You are missing the point. Numbers are symbols that represent concepts, and because of that there are various ways the human mind can use those concepts to solve problems. It is that ability to explore concepts and apply them in a novel fashion that is called reasoning. Your example is not "reasoning" so much as a "step by step" approach, which is the most common pattern for solving any particular mathematical problem. That implies those steps are easily found in the training data and model distribution, so of course that is what the LLM is using - what you described is the typical way math is taught in grade school. It in no way, shape or form implies understanding fundamental math concepts and using those concepts in an ad hoc fashion to solve any problem; ad hoc in this context means using a pattern not found explicitly in the distribution of the language model. The point you missed is that numbers being symbols that represent quantities of individual discrete elements is an abstract concept, and the ability to apply that kind of abstract understanding to solving, or coming up with approaches to solve, math problems is unique to humans - that is how math came about in the first place. Another example of human reasoning: you can add two numbers such as 144 and 457 by simply adding up each column with its place value separately and then adding the sums of the columns, without the need to carry. That results in 500 + 90 + 11 = 601, or (5 x 100) + (9 x 10) + (11 x 1). My point is that this is not a common way of doing addition, and not something one would expect an LLM to come up with unless you prompted it to do so - and even then it may not come up with that exact approach unless it is found in the training data. At the end of the day, what this is about is not "reasoning" so much as explaining how the LLM came up with an answer to a math problem. Having these AI algorithms explain how they came to an answer has been requested for quite a while, but it is not "reasoning" in the sense of coming up with unique or novel approaches outside of the training examples based purely on an understanding of the underlying concepts.
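A tiny sketch of that column-wise, carry-free scheme, purely to illustrate the procedure described in the comment above:

```python
def add_by_columns(a: int, b: int) -> int:
    """Sum each decimal column separately (no carrying), then add the
    place-value-weighted column sums, e.g. 144 + 457 -> 500 + 90 + 11."""
    width = max(len(str(a)), len(str(b)))
    da, db = str(a).zfill(width), str(b).zfill(width)
    return sum((int(x) + int(y)) * 10 ** (width - 1 - i)
               for i, (x, y) in enumerate(zip(da, db)))

print(add_by_columns(144, 457))  # 601
```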
@markplutowski 3 months ago
1:31:09 - 1:31:32. "People confuse acting with planning." "We shouldn't leave toddlers alone with a loaded gun." This is what frightens me: agent-based systems let loose in the wild without proper controls - a toddler AI exploring the world, picking up a loaded gun and pulling the trigger.
@markplutowski 3 months ago
If the title says "people don't reason", many viewers think it makes the strong claim "ALL people don't reason", when it is actually making the weaker claim "SOME people don't reason". That title is factually defensible but misleading. One could be excused for interpreting this title as claiming "ChatGPT doesn't reason (at all)", when it is actually claiming "ChatGPT doesn't reason (very well)". One of the beauties of human language is that the meaning of an utterance derived by the listener depends as much on the deserialization algorithm used by the listener as on the serialization algorithm employed by the speaker. The YouTube algorithm chose this title because the algorithm "knows" that many viewers assume the stronger claim. Nonetheless, be that as it may, this was a wonderful interview - many gems of insight on multiple levels, including historical, which I enjoyed. I especially liked your displaying the title page of an article that was mentioned. Looking forward to someone publishing "Alpha reasoning: no tokens required". I would watch again.
@阳明子 3 months ago
Professor Kambhampati is making the stronger claim that LLMs do not reason at all.
@markplutowski 3 months ago
​@@阳明子 1:20:26 "LLMs are great idea generators", which is such an important part of reasoning, he says, that Ramanujan was great largely because he excelled at the ideation phase of reasoning. 16:30 he notes that ChatGPT 4.0 was scored at 30% on a planning task. 1:23:15 he says that LLMs are good for style critiques, therefore for reasoning about matters of style, LLMs can do both ideation and verification.
@阳明子 3 months ago
@@markplutowski 3:14 "I think the large language models, they are trained essentially in this autoregressive fashion to be able to complete the next word, you know, guess the next word. These are essentially n-gram models." 11:32 Reasoning VS Retrieval 17:30 Changing predicate names in the block problem completeley confuses the LLMs 32:53 "So despite what the tenor of our conversation until now, I actually think LLMs are brilliant. It's just the brilliant for what they can do. And just I don't complain that they can't do reason, use them for what they are good at, which is unconstrained idea generation."
@markplutowski 3 months ago
@@阳明子 Ok, I see it now. I originally misinterpreted his use of a double-negative there where he says "And just I don't complain that they can't do reason". That said, he contradicts himself by admitting that they can do a very limited type of reasoning (about matters of style), and are weakly capable of planning (which is considered by many as a type of reasoning, although he seems to disagree with that), and can be used for an important component of reasoning (ideation). But yeah, I see now that you are correct - even though there are these contradictions he is indeed claiming "that they can't do reason".
@MateusCavalcanteFonseca 3 months ago
Hegel said a long time ago that deduction and induction are different aspects of the same process, the process of acquiring knowledge about the world. Great talk.
@sonOfLiberty100 3 months ago
Wow a good one thank you
@Neomadra 3 months ago
LLMs definitely can do transitive closure. Not sure why the guest stated otherwise. I tried it out with completely random strings as object names and Claude could do it easily. So it's not just retrieving information.
@autingo6583 3 months ago
this is supposed to be science. i hate it so much when people who call themselves researchers do not really care for thoroughness, or even straight out lie. don't let them get away with it.
@jeremyh2083 3 months ago
It struggles with it if you create something it’s never seen before. It’s a valid point on his part.
@st3ppenwolf 3 months ago
Transitive closures can be done from memory. It's been shown these models perform badly with novel data, so he still has a point.
@SurfCatten 3 months ago
And it was also able to do a rotation cipher with any arbitrary shift when I just tested it. There are definite limitations, but what they can do is far more complex than simply repeating what's in the training data. I made a separate post, but I just wanted to add here that it can also do other things that he specifically said it can't.
@gen-z-india 3 months ago
Ok, everything they say is guesswork, and it will be so until deep learning is there.
@johnkost2514 3 months ago
This aligns nicely with the work Fabrice Bellard has been doing using transformers to achieve SOTA lossless compression in his NNCP algorithm. Coincidence? I think not!
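The link is the standard prediction-compression correspondence: a model that assigns probability p to the next symbol can, paired with an entropy coder, spend about -log2(p) bits on it. A toy sketch of that accounting (the uniform "model" here is a stand-in, not NNCP):

```python
import math

def code_length_bits(tokens, prob):
    """Bits an ideal entropy coder would need, given a predictive model
    prob(next_token, context) returning a probability in (0, 1]."""
    bits, context = 0.0, []
    for tok in tokens:
        bits += -math.log2(prob(tok, context))
        context.append(tok)
    return bits

# Uniform distribution over a 256-symbol alphabet -> 8 bits per token.
print(code_length_bits(list(b"hello"), lambda tok, ctx: 1 / 256))  # 40.0
```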
@shizheliang2679 23 days ago
Are there any references for the part about LLMs not being able to compute transitive closure? I would love to see the details.
@jeremyh2083 3 months ago
Those people who are assuming AGI is going to be achieved have never done long-term work inside any of the major GPT systems. If you want a quick and dirty test, tell it to create a fiction book for you: first make 15 chapters with 10 sections per chapter, and then have it start writing that book. Look at it in detail and you will see, section after section, that it loses sight of essentially every detail. It does a better job if you are working inside a universe another author has already made, and the worst job if you are creating a brand-new universe, even if you have it define the universe.
@mattwesney 3 months ago
sounds like you're bad at prompting
@jeremyh2083 3 months ago
@@mattwesney lol it does, doesn’t it, but you haven’t tried it and I have.
@rolandhochmuth1951 3 months ago
I originally tried: Let a = b and b = c, does a = c? The response cited the transitive property and said yes, a = c. So then I tried something like: Let asdf = hjkl and let hjkl = vbnm, does asdf = vbnm, to try to eliminate a potential reference to a description of the transitive property that uses a, b, and c as variable names. The response was not able to identify the transitive property in this case and didn't know how to treat the = sign. I then modified the prompt after a few iterations to: Let asdf, hjkl and vbnm be variables. Let = be equality. Given asdf = hjkl and hjkl = vbnm, does asdf = vbnm? The response was: "You're getting closer! Now we have a clearer setup. Since we've defined "=" as equality and the variables are unspecified, we can indeed apply the transitive property. Yes, given asdf = hjkl and hjkl = vbnm, then asdf = vbnm. The transitive property holds true in this case because we're dealing with standard equality."
@rolandhochmuth1951 3 months ago
Extending this a bit more: Let asdf, hjkl, vbnm and zxcv be variables. Let = be equality. Given asdf = hjkl, hjkl = vbnm, and vbnm = zxcv, does asdf = xxcv? The model replied (in 2.5s): "Yes, you're absolutely right! Following the transitive property: asdf = hjkl, hjkl = vbnm, vbnm = zxcv. Therefore: asdf = zxcv. Let me know if you have any other logic puzzles you'd like to try!"
@DataJuggler 3 months ago
0:18 When I was 4 years old, I was often stuck at my parents' work. The only thing for me to do that was entertaining was to play with calculators or adding machines. I memorized the times table because I played with calculators a lot. My parents would spend $8 at the drug store to keep me from asking why the sky is blue and other pertinent questions. I was offered the chance to skip first grade after kindergarten, and my parents said no. Jeff Bezos is the same age as me, and also from Houston. His parents said yes to skipping first grade. I told my parents this forever, until they died.
@Jukau 3 months ago
What is the bombshell? This is absolutely clear and known... it would be a bombshell if it could reason.
@MachineLearningStreetTalk 3 months ago
Read the comments section here, I wish it was clear and known. It's subtle and requires a fair bit of CS knowledge to grok unfortunately.
@hayekianman 3 months ago
The Caesar cipher thing is already working for any n with Claude 3.5, so dunno.
@benbridgwater6479 3 months ago
Sure - different dataset. It may be easy to fix failures like this by adding corresponding training data, but this "whack-a-mole" approach to reasoning isn't a general solution. The number of questions/problems one could pose to a person or an LLM is practically infinite, so the models need to be able to figure out answers for themselves.
@januszinvest3769 2 months ago
@benbridgwater6479 So please give one example that clearly shows that LLMs can't reason.
@siddharth-gandhi 3 months ago
Hi! Brilliant video! Much to think about after listening to hyperscalers for weeks. One request: can you please cut down on the clickbait titles? I know you said it's for the YT algorithm, but if I want to share this video with, say, PhDs, MS students or profs, no one takes a new channel seriously with titles like this one (it just feels clickbaity for a genuinely good video). Let the content speak for itself. Thanks!
@MachineLearningStreetTalk 3 months ago
I am really sorry about this, we will change it to something more academic when the views settle down. I've just accepted it as a fact of YouTube at this point. We still use a nice thumbnail photo without garish titles (which I personally find more egregious).
@siddharth-gandhi 3 months ago
@@MachineLearningStreetTalk Thanks for understanding! 😁
@therainman7777 3 months ago
We already do have a snapshot of the current web, and snapshots for every day prior. It's the Wayback Machine.
@tylermoore4429 3 months ago
This analysis treats LLMs as a static thing, but the field is evolving. Neurosymbolic approaches are coming; a couple of these are already out there in the real world (MindCorp's Cognition and Verses AI).
@billykotsos4642 3 months ago
very insightful
@anuragshas 3 months ago
The "On the Dangers of Stochastic Parrots" paper still holds true.
@alexandermoody1946 3 months ago
Not all manhole covers are round. The square manhole covers that have a two-piece triangular tapered construction are really heavy.
@VanCliefMedia 3 months ago
I would love to see his interpretation of the most recent GPT-4 release with structured output, and of creating reasoning through that output.
@wtfatc4556 3 months ago
GPT is like a reactive mega-Wikipedia...
@PavanMS87 3 months ago
Gold mine detected, Subbed!!
@life42theuniverse 3 months ago
The most likely response to a logical question is a logical answer.
@pallharaldsson9015 3 months ago
16:44 "150% accuracy [of some sort]"? It's a great interview with the professor (the rest of it is good), who knows a lot; good to know we can all make such mistakes...
@benbridgwater6479 3 months ago
I processed it as dry humor - unwarranted extrapolation from current performance of 30%, to "GPT-5" at 70%, to "GPT-10" at 150%. Of course he might have just misspoken. Who knows.
@kangaroomax8198 20 days ago
It’s a joke, obviously. He is making fun of people extrapolating.
@television9233 3 months ago
I've read some of Prof Subbarao's work from ASU. Excited for this interview.
@zooq-ai 3 months ago
Wow, incredible episode
@mattabrahamson8816 3 months ago
Contrary to his claims, GPT-4o & Sonnet do generalize to different cipher shifts. GPT-4o: It looks like the text "lm M pszi csy" might be encoded using a simple substitution cipher, such as the Caesar cipher. This type of cipher shifts the letters of the alphabet by a fixed number of positions. To decode it, let's try different shifts and see if any of them make sense. For example, if we shift each letter by 4 positions backward (a common shift in Caesar ciphers):
- l -> h
- m -> i
- M -> I
- p -> l
- s -> o
- z -> v
- i -> e
- c -> y
- s -> o
- y -> u
So, "lm M pszi csy" becomes "hi I love you." This decoded message makes sense as a simple phrase. If you have any additional context or need further decoding, please let me know!
@BrianPeiris 3 months ago
Great, you got one sample. Now run it a hundred times each for different shifts and report back.
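A rough sketch of what that systematic check could look like - generate random plaintexts, encode with every shift, and score the model's decodings; llm_decode here is a hypothetical stand-in for whatever model call is being tested:

```python
import random, string

def caesar(text, shift):
    def rot(c):
        if c.isalpha():
            base = ord('a') if c.islower() else ord('A')
            return chr((ord(c) - base + shift) % 26 + base)
        return c
    return "".join(rot(c) for c in text)

def accuracy_by_shift(llm_decode, trials=100):
    results = {}
    for shift in range(1, 26):
        correct = 0
        for _ in range(trials):
            plain = "".join(random.choices(string.ascii_lowercase + " ", k=20))
            if llm_decode(caesar(plain, shift), shift) == plain:
                correct += 1
        results[shift] = correct / trials
    return results   # compare e.g. shift 13 against every other shift
```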
@luisluiscunha 3 months ago
Maybe in the beginning, with Yannic, these talks were properly named "Street Talk". They are more and more Library-of-the-Ivory-Tower talks, full of deep "philosophical" discussions that I believe will all be considered pointless. I love the way Heinz Pagels described how the Dalai Lama avoided entering into arguments of this kind about AI: when asked his opinion about a system he could talk to as to a person, he just said "sit that system in front of me, on this table, then we can continue this talk". This was in the 80s. Even to be profoundly philosophical you can think in a very simple and clear way. It is a way of thinking epistemologically most compatible with engineering, which is ultimately where productive cognitive energy should be spent.
@plasticmadedream 3 months ago
A new bombshell has entered the villa
@idck5531 3 months ago
Possibly LLMs do not reason, but they sure are very helpful for coding. You can combine and generate code easily and advance much faster. Writing scripts for my PhD is 10x easier now.
@dylanmenzies3973 2 months ago
Reasoning is pattern matching. Verification is something to be learnt and reinforced - humans are very often wrong, even in complex math proofs. Generating possible paths for reasoning is inherently very noisy and needs a balance of verification to keep it on track.
@akaalkripal5724 3 months ago
I'm not sure why so many LLM fans are choosing to attack the Professor, when all he's doing is pointing out huge shortcomings, and hinting at what could be real limitations, no matter the scale.
@alansalinas2097 3 months ago
Because they don't want the professor to be right?
@therainman7777 3 months ago
I haven’t seen anyone attacking him. Do you mean in the comments to this video or elsewhere?
@lystic9392 3 months ago
I think I have a way to allow almost any model to 'reason'. Or to use reasoning, anyway.
@patruff 3 months ago
I memorized all the knowledge of humans. I can't reason but I know everything humans have ever put online. Am I useful? Provide reason.
@fburton8 3 months ago
What proportion of “all the knowledge of humans” do current models have access to?
@patruff 3 months ago
@@fburton8 all of it, well everything on the open Internet so most books, most poetry, lots of art, papers, code, etc.
@albin1816 3 months ago
Extremely useful. Ask any engineer who's addicted to ChatGPT / Copilot / OpenAI API at the moment for their daily workflows.
@malakiblunt 3 months ago
but i make up 20% of my answers - can you tell which ?
@n4rzul 3 months ago
@@malakiblunt So do you... sometimes...
@davidcummins8125 3 months ago
Could an LLM, for example, figure out whether a request requires a planner, a math engine, etc., transform the request into the appropriate format, use the appropriate tool, and then transform the results for the user? I think that LLMs provide a good combination of UI and knowledge base. I was suspicious myself that in the web data they may well have seen joke explanations, movie reviews, etc., and can lean on that. I think that LLMs can do better, but it requires memory and a feedback loop in the same way that embodied creatures have.
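That routing-plus-checking pattern is roughly what the LLM-Modulo framing in the episode describes. A minimal sketch under assumptions: llm, route, tools and verifiers are all hypothetical callables supplied by the caller, not any real library API:

```python
def llm_modulo(llm, route, tools, verifiers, request, max_rounds=3):
    """LLM guesses, external tools and verifiers check; failures are sent
    back to the model as a new prompt ("back-prompting")."""
    prompt = request
    for _ in range(max_rounds):
        draft = llm(prompt)                      # LLM proposes an answer or plan
        tool = tools.get(route(draft))           # e.g. "planner", "math", or None
        result = tool(draft) if tool else draft  # optionally run the chosen tool
        failures = [msg for msg in (v(result) for v in verifiers) if msg]
        if not failures:                         # every external check passed
            return result
        prompt = request + "\nYour previous answer had problems:\n" + "\n".join(failures)
    return None                                  # no verified answer within budget
```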
@notHere132 3 months ago
I use ChatGPT every day. It does not reason. It's unbelievably dumb, and sometimes I have trouble determining whether it's trying to deceive me or just unfathomably stupid. Still useful for quickly solving problems someone else has already solved, and that's why I continue using it.
@PaoloCaminiti-b5c 3 months ago
I'm very skeptical of this. Aristotle inferred logic by looking at rhetorical arguments; LLMs could already be extracting those features while building their model to compress the corpus of data, and that seems equivalent to propositional logic. It seems this researcher puts too much emphasis on agents needing to be capable of mathematical proof, the utility of which in agents - including humans - is not well established.
@virajsheth8417 27 days ago
Absolutely agreed!
@briandecker8403 3 months ago
I love that this channel hosts talks by the best experts in the field and generates comments from the lowest Dunning-Kruger keyboard cowboys.
@LuigiSimoncini 3 months ago
Bwahaha!!!
@Neomadra 3 months ago
I agree but Claude 3.5 does! ;)
@vpn740 3 months ago
no, it doesn't.
@Neomadra 3 months ago
@vpn740 It's a joke.
@JG27Korny 3 months ago
I think there is a broad misconception. LLMs are LLMs; they are not AGI (artificial general intelligence). Each AI has a world model; if the question fits the world model, it will work. It is like asking a chess AI engine to play checkers. That is why multimodal models are the big thing, as they train not just on a corpus of text but on images too. So those visually trained AI models will solve the stacking problem at minute 19:00. It is not that ChatGPT does not reason; it reasons, but not as a human does.
@TastyGarlicBread 3 months ago
There are a lot of people in this comment section who probably can't even do basic sums, let alone understand how a large language model works. And yet, they are very happy to criticize. We are indeed living in an Idiocracy.
@MoeShlomo 3 months ago
People typically assume that LLMs will always be "stuck in a box" that is determined by their training data. But humans are of course quite clever and will figure out all sorts of ways to append capabilities analogous to different brain regions that will allow LLMs to effectively "think" well enough to solve increasingly-challenging problems and thereby self-improve. Imagine equipping a humanoid robot (or a simulated one) with GPT6 and Sora3 to allow it to make predictions about what will happen based on some potential actions, take one of those actions, get feedback, and integrate what was learned into its training data. My point is that people will use LLMs as a component of a larger cognitive architecture to make very capable systems that can learn from their actions. And of course this is just one of many possible paths.
@benbridgwater6479 3 months ago
Sure, there will be all sorts of stuff "added to the box" to make LLMs more useful for specific use cases, as is already being done - tool use, agentic scaffolding, specialized pre-training, etc, but I don't think any of this will get us to AGI or something capable of learning a human job and replacing them. The ability for lifelong learning by experimentation is fundamentally missing, and I doubt this can be added as a bolt-on accessory. It seems we really need to replace gradient descent and pre-training with a different more brain-like architecture capable of continual learning.
@eyoo369 3 months ago
@benbridgwater6479 Yes, agree with that. Anything that doesn't resemble the human brain will not bring us to AGI. While LLMs are very impressive and a great first step into a paradigm shift, they are ultimately a hack route to reach the current level of intelligence. There are still so many levels of reasoning missing, even from the SOTA models like Claude 3.5 and GPT-4o. For me the roadmap to general intelligence is defined by the way a model learns, and not necessarily by what it outputs after pre-training. To be more specific: true AGI would be giving a model the same amount of data a human is approximately exposed to in a lifetime and having it perform like a median human. Throwing the world's data at it and scaling the parameters into the billions/trillions, although impressive, is far away from AGI.
@maxbuckley1246 3 months ago
Which paper is referred to at 1:03:51 when multiplication with four digit numbers is discussed?
@MachineLearningStreetTalk 3 months ago
Faith and Fate: Limits of Transformers on Compositionality "finetuning multiplication with four digit numbers" arxiv.org/pdf/2305.18654
@fburton8 3 months ago
Have LLMs ever said they don't know the answer to a question? Often this is the most useful/helpful response, so why don't they do it? It's disappointing.
@dankprole7884 3 months ago
It kind of makes sense if you think it is just formatting a distribution of the most likely tokens in a plausible style, i.e. a question-and-answer format. If "I don't know" isn't likely (i.e. if the source material was either not in Q&A format, or it was but the answer was not something like "I don't know"), then it's just not gonna be the answer given by the LLM. A hallucination IS the "I don't know" response. Unfortunately, not being able to detect that kind of defeats the whole purpose of using an LLM in the first place!
@fburton8 3 months ago
@dankprole7884 Good point. I have also encountered examples where the LLM _did_ know the answer and should have been able to suggest it as a possibility, but didn't do so until prompted with "What about [the answer]?". For example, ChatGPT had great difficulty finding a pasta very similar to conchiglie in size and shape but with smooth bumps instead of ridges. It went round and round in circles making completely inappropriate suggestions until I asked "What about cestini?". It was as useful as a chocolate teapot for this kind of task.
@dankprole7884 3 months ago
Yeah, I've seen similar. I use Claude quite a lot for quick code snippets when I can't quite remember the one obscure command I need to do the exact thing I want. By the time I've said "no, not that" enough times to get the right answer, I could have googled it several times. It's very hit and miss at any level of complexity/rarity.
@dennisalbert6115 3 months ago
Style contains more vivid patterns