My notes from this episode: Formal languages have interpreters and can accept any grammatically correct code. The world is the interpreter for natural languages. We can't tell the difference between internally reasoning from first principles and retrieval. Planning is an example of reasoning, e.g. stacking blocks to produce a certain sequence or shape; swap the words 'stack' and 'unstack' for 'fist' and 'slap' and GPT-4 fails. Reasoning is defined from a logical perspective: deductive closure over base facts. You don't just need to match the distribution of query-answer pairs, you need to do deductive closure. Transitive closure, for example, is a small part of deductive closure. People stop at the first interesting result from an LLM. For example, it can do a rotational cipher with shift 13 (ROT13), but it can't do any other shift; if you can execute the general principle, you should be able to do it for any shift (see the sketch below). Ideation requires shallow knowledge of wide scope. Distributional properties versus instance-level correctness: LLMs and diffusion models are good at one and not at the other. When an LLM critiques its own solutions, its accuracy goes down - it hallucinates errors and incorrect verifications. Companies tell us they have million-word context windows, yet the models make errors an intelligent child wouldn't make in a ten-word prompt. They're good at 'style', not 'correctness'; classical AI was better at correctness, not style. Teach-a-man-to-fish example: LLMs need 1 fish, 2 fish, 3 fish... up to N fish. A 'general advice taker' is roughly equivalent to the goal of general AI. "LLM-Modulo" - LLMs guess, and a bank of external verifiers verifies; back-prompting, chain of thought, etc. Agentic systems are worthless without planning. Acting and planning are not interchangeable - toddlers can operate guns, but cows with a plan can't answer the phone.
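(An aside on the ROT13 point above: the general principle is a few lines of code, and executing it for shift 13 is no easier than for any other shift, which is exactly the memorization-versus-procedure distinction the professor draws. A minimal Python sketch, assuming the standard 26-letter alphabet:)

```python
def caesar(text: str, shift: int) -> str:
    """Shift each letter by `shift` positions; non-letters pass through unchanged."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord('a') if ch.islower() else ord('A')
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

# ROT13 is just the shift-13 special case; any other shift is the same procedure.
print(caesar("hello world", 13))  # uryyb jbeyq
print(caesar("hello world", 4))   # lipps asvph
```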
@NunTheLass3 ай бұрын
Thank you. He was my favorite guest that I watched here so far. I learned a lot.
@AICoffeeBreak2 ай бұрын
Thanks for having Prof. Kambhampati! I got to experience him first hand at this year's ACL where he also gave a keynote. What a great character! 🎉
@SouhailEntertainment3 ай бұрын
Introduction and Initial Thoughts on Reasoning (00:00:00)
The Manhole Cover Question and Memorization vs. Reasoning (00:00:39)
Using Large Language Models in Reasoning and Planning (00:01:43)
The Limitations of Large Language Models (00:03:29)
Distinguishing Style from Correctness (00:06:30)
Natural Language vs. Formal Languages (00:10:40)
Debunking Claims of Emergent Reasoning in LLMs (00:11:53)
Planning Capabilities and the PlanBench Paper (00:15:22)
The Role of Creativity in LLMs and AI (00:32:37)
LLMs in Ideation and Verification (00:38:41)
Differentiating Tacit and Explicit Knowledge Tasks (00:54:47)
End-to-End Predictive Models and Verification (01:02:03)
Chain of Thought and Its Limitations (01:08:27)
Comparing Generalist Systems and Agentic Systems (01:29:35)
LLM-Modulo Framework and Its Applications (01:34:03)
Final Thoughts and Advice for Researchers (01:35:02)
Closing Remarks (01:40:07)
@DataTranslator3 ай бұрын
His analogy of GPT to learning a second language makes 100% sense to me. I'm a non-native speaker of English, yet I mastered it by learning grammar first and adding rules and exceptions over the years. Also, concepts were not the issue, but conveying those concepts was initially very challenging.🇲🇽🇺🇸
@stephenwallace878216 күн бұрын
Dude, what's cool about this format is how much you trust your audience. It rewards real concentration and offers a lot of more subtly inspiring kinds of understanding. It makes the relationship between computer science and philosophy very clear.
@memetb57963 ай бұрын
This guest was such a pleasant person to listen to: there is an indescribable joy in listening to someone who is clearly intelligent and a subject matter expert, and that just can't be gotten anywhere else.
@espressojim3 ай бұрын
I almost never comment on YouTube videos. This was an excellent interview and very informative. I'd love to hear more from Prof. Subbarao Kambhampati, as he did an amazing job of scientific storytelling.
@Hiroprotagonist2533 ай бұрын
For natural languages the world is the interpreter. What a profound statement 🤯. I am enjoying this discussion so far!
@trucid23 ай бұрын
I've worked with people who don't reason either. They exhibit the kind of shallow non-thinking that ChatGPT engages in.
@ericvantassell68093 ай бұрын
ayup. keywords provoke response without understanding.
@billykotsos46423 ай бұрын
especially CEOs
@ihbrzmkqushzavojtr72mw5pqf63 ай бұрын
Why are you talking about me in public????
@2dapoint4243 ай бұрын
😎
@stevengill17363 ай бұрын
LOL - I imagine we all visit this probability space occasionally... ;*[}
@sammcj20003 ай бұрын
Fantastic interview. Prof Kambhampati seems to be not just wise but governed by empathy and scepticism, which is a wonderful combination.
@elgrego3 ай бұрын
Bravo. One of the most interesting talks I’ve heard this year.
@jonashallgren44463 ай бұрын
Subbarao had a great tutorial at ICML! The general generation-verification loop was very interesting to me. Excited to see more work in this direction that optimises LLMs with verification systems.
@rrathore013 ай бұрын
Great interview!! Several of the examples given here provide evidence that LLMs are not learning the underlying logic: the colored-blocks problems, 4*4 matrix multiplication, the chain-of-thought issues. Best quote: you need to teach LLMs how to fish 1 fish, then how to fish 2 fish, then 3 fish, and so on, and they would still fail at fishing "N" fish for an N they have not seen before.
@johnheywood10432 ай бұрын
Best conversation on AI that I've been able to follow (not being a PhD in CS).
@thenautilator6613 ай бұрын
Very convincing arguments. I haven't heard it laid out this succinctly and comprehensively before. I'm sure Yann LeCun would be in the same camp, but I recall not being persuaded by LeCun's arguments when he made them on Lex Fridman.
@edzehoo3 ай бұрын
Basically there's a whole bunch of "scientists and researchers" that don't like to admit the AGI battle is being won (slowly but surely) by the tech bros led by Ilya and Amodei. AI is a 50-year-old field dominated in the past by old men, and is now going through recent breakthroughs made by 30-year-olds, so don't be surprised that there's a whole lot of ego at play to throw cold water on significant achievements.
@bharatbheesetti19203 ай бұрын
Do you have a response to Kambhampati's refutation of the Sparks of AGI claim? @edzehoo
@kman_343 ай бұрын
@@edzehoo I can see this being true, but writing off their points is equally defensive/egotistical
@JD-jl4yy3 ай бұрын
@@edzehoo Yep.
@jakobwachter51813 ай бұрын
@@edzehoo Ilya and Amodei are 37 and 41 respectively, I wouldn't call them "young", per se. Research on AI in academia is getting outpaced by industry, and only capital rivalling industry can generate the resources necessary to train the largest of models, but academics young and old are continuously outputting content of higher quality than most industry research departments. It's not just ego, it's knowing when something is real and when it is smoke and mirrors.
@dr.mikeybee3 ай бұрын
Next word prediction is the objective function, but it isn't what the model learns. We don't know what the learned function is, but I can guarantee you it isn't log-odds.
@ericvantassell68093 ай бұрын
croissants .vs. yogurt
@Lolleka3 ай бұрын
At the end of the day, the transformer is just a kind of modern Hopfield network. It stores patterns, it retrieves patterns. It's the Chinese room argument all over again.
@memegazer3 ай бұрын
@@Lolleka Not really. You can point to rules and say "rules can't be intelligent or reason". But when it is the NN that makes those rules, and the humans in the loop are not certain enough what those rules are to prevent hallucination or solve the alignment problem, then that is not the Chinese room anymore.
@xt-899073 ай бұрын
Research around mechanistic interpretability is starting to show that LLMs tend to learn some causal circuits and some memorization circuits (i.e., grokking). So they are able to learn some reasoning algorithms, but there's no guarantee of it. Plus, sequence modeling is weak on some kinds of graph algorithms necessary for certain classes of logical reasoning algorithms.
@synthclub3 ай бұрын
@@memegazer not hotdog, hotdog!
@JurekOK3 ай бұрын
29:38 this is an actual breakthrough idea addressing a burning problem, and it should be discussed more!
@aitheignis3 ай бұрын
I love this episode. In science, it's never just about what can be done or what happens in the system; it's always about the mechanism that leads to the event (how the event happens, basically). What is severely missing from all the LLM talk today is discussion of the underlying mechanism. Work on mechanism is the key piece that will move all of these deep neural network efforts from engineering feat to actual science. To know the mechanism is to know the causality.
@stevengill17363 ай бұрын
...yet they often talk about LLM mechanism as a "black box", to some extent insoluble...
@Thierry-in-Londinium3 ай бұрын
This professor is clearly one of the leaders in his field. When you reflect on and dissect what he is sharing, it stands up to scrutiny!
@whiteycat6153 ай бұрын
Fantastic discussion! Fantastic guy! Thank you
@Paplu-i5t3 ай бұрын
This discussion makes it totally clear what we can expect from LLMs, and the irrefutable reasons for it.
@jakobwachter51813 ай бұрын
Rao is wonderful, I got the chance to briefly chat with him in Vancouver at the last AAAI. He's loud about the limitations of LLMs and does a good job of talking to the layman. Keep it up, loving the interviews you put out!
@XOPOIIIO3 ай бұрын
Reasoning requires loop thinking, sorting through the same thoughts from different angles. NNs are feedforward: they have an input, an output and just a few layers between them, so their result is akin to intuition, not reasoning. That's why they give better results if you simulate loop thinking by feeding the model's result back to itself to create a reasoning-like step-by-step process.
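(A rough sketch of what that outer feedback loop looks like in practice - `call_model` here is a hypothetical stand-in for whatever chat API is being used, not a real client. The loop lives entirely outside the network, in the harness that feeds the model its own previous output; note that the episode argues self-critique without an external verifier can actually degrade accuracy, so this is the pattern being critiqued as much as a recipe.)

```python
def call_model(prompt: str) -> str:
    """Hypothetical single forward pass through an LLM API (stand-in, not a real client)."""
    raise NotImplementedError

def iterate_reasoning(question: str, steps: int = 3) -> str:
    # First attempt: one ordinary forward pass.
    draft = call_model(f"Question: {question}\nThink step by step and give a first attempt.")
    for _ in range(steps):
        # Feed the model's own output back in, simulating the recurrence it lacks.
        draft = call_model(
            f"Question: {question}\nPrevious attempt:\n{draft}\n"
            "Revise the attempt, checking each step."
        )
    return draft
```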
@pranavmarlaАй бұрын
I come back to this podcast every 2 weeks. Absolutely brilliant!
@shyama56123 ай бұрын
Sara Hooker said the same about us not fully understanding what is used in training - the low frequency data and memorization of those being interpreted as generalization or reasoning. Good interview.
@PrinceCyborg3 ай бұрын
Easy, it's all about prompting. Try this prompt with the PlanBench test: Base your answer on a methodical analysis of the given data, without making unfounded assumptions. Avoid unfounded assumptions - it is very important that you avoid unfounded assumptions - and base your reasoning directly on what you read/see, word for word, rather than relying on training data, which could introduce bias. Always prioritize explicitly stated information over deductions. Be cautious of overthinking or adding unnecessary complexity to problems. Question initial assumptions. Remember the importance of sticking to the given facts and not letting preconceived notions or pattern recognition override explicit information. Consider ALL provided information equally. Re-check the reasoning against each piece of information before concluding.
@sofoboachie52213 ай бұрын
This is probably the best episode I have watched here, and I watch this channel like a podcast. Fantastic guest.
@KRGruner2 ай бұрын
Great stuff! ACTUAL non-hype commentary on AI and LLMs. I am familiar with Chollet and ARC, so no big surprises here but still, very well explained.
@GarthBuxton3 ай бұрын
Great work, thank you.
@yafz3 ай бұрын
Excellent, in-depth interview! Thanks a lot!
@annette47183 ай бұрын
This is a very refreshing episode. Lots of complex topics synthesized into easily digestible insights
@scottmiller25913 ай бұрын
Good take on LLMs and not anthropomorphizing them. I do think there is an element of "What I do is hard, what others do is easy" to the applications of LLMs in creativity vs. validation, however.
@swarupchandra13333 ай бұрын
One of the best explanations I have come across
@vishalrajput98563 ай бұрын
I love Rao's work and he's funny too.
@timcarmichael3 ай бұрын
Have we yet defined intelligence sufficiently well that we can appraise it and identify its hallmarks in machines?
@stevengill17363 ай бұрын
I think if we qualify the definition of intelligence as including reasoning, then yes. I'd rather use the term sentience - now artificial sentience...that would be something!
@benbridgwater64793 ай бұрын
@@johan.j.bergman Sure, but that's a bit like saying that we don't need to understand aerodynamics or lift to evaluate airplanes, and can just judge them on their utility and ability to fly ... which isn't entirely unreasonable if you are ok leaving airplane design up to chance and just stumbling across better working ones once in a while (much as the transformer architecture was really a bit of an accidental discovery as far as intelligence goes). However, if we want to actively pursue AGI and more intelligent systems, then it really is necessary to understand intelligence (which will provide a definition) so that we can actively design it in and improve upon it. I think there is actually quite a core of agreement among many people as what the basis of intelligence is - just no consensus on a pithy definition.
@jakobwachter51813 ай бұрын
@@johan.j.bergman A spatula serves a helpful purpose that no other cooking tool is able to replace in my kitchen, so I find it incredibly useful. Turns out they are rather mass market too. Should I call my spatula intelligent?
@Cammymoop3 ай бұрын
no
@rey82rey823 ай бұрын
The ability to reason
@oscarmoxon3 ай бұрын
There's a difference between in-distribution reasoning and out-of-distribution reasoning. If you can make the distribution powerful enough, you can still advance research with neural models.
@SurfCatten3 ай бұрын
Absolutely true. As an example I tested its ability to do rotation ciphers myself and it performed flawlessly. Obviously the reasoning and logic to do these translations was added to its training data since that paper was released.
@prabhdeepsingh56423 ай бұрын
Leaving the debate about reasoning aside, this discussion was a damn good one. Learned a lot. Don't miss out on this one due to some negative comments. It's worth your time.
@swarnavasamanta26283 ай бұрын
The feeling of understanding is different from the algorithm of understanding that's being executed in your brain. The feeling of something is created by consciousness, while that something might already be going on in your brain. Here's a quick thought experiment: try adding two numbers in your mind. You can easily do it and get an answer. Not only that, but you have a feeling of understanding the addition algorithm in your head: you know how it works, and you are aware of it being executed and of the steps you're performing in real time. But imagine if you did not have this awareness/consciousness of the algorithm in your head. That's how LLMs can be thought of: they have an algorithm, and it executes and outputs an answer, but they are not aware of the algorithm itself or of it being performed, and they have no agency over it. Doing something and perceiving that you are doing something are completely different things.
@prasammehta15463 ай бұрын
Basically they are soulless brain which they actually are :P
@techchanx3 ай бұрын
I agree fully with the points here. LLMs are good at the "creative" side of language and media, though it's not really the same creativity as humans'. However, it's best to use that capability of LLMs to construct responses in an acceptable manner, while the actual data comes from authoritative sources and the metrics come from reliable calculations based on formulas, calculators or rule engines. Btw, I have given below a better-written, professional version of my above post, courtesy of Google Gemini. I could not have said it any better. I concur with the assessment presented. Large language models (LLMs) excel at generating creative language and media, albeit distinct from human creativity. Leveraging this capability, LLMs can effectively construct responses in an appropriate manner, while sourcing data from authoritative references and deriving metrics from reliable calculations based on formulas, calculators, or rule engines. This approach optimizes the strengths of both LLMs and traditional information systems for a comprehensive and accurate solution.
@fedkhepri3 ай бұрын
This is the first time I'm seeing either of the two people in the video, and I'm hooked. Lots of hard-hitting and salient points from the guest, and kudos to the interviewer for steering the discussion.
@Redx32573 ай бұрын
Yea this man is brilliant. I could just listen to him all day.
@CoreyChambersLA3 ай бұрын
ChatGPT simulates reasoning surprisingly well using its large language model for pattern recognition and prediction.
@snarkyboojum3 ай бұрын
Great conversation. I disagree that LLMs are good for idea generation. In my experience, they're good at replaying ideas back to you that are largely derivative (based on the data they've been trained over). The truly 'inductive leaps' as the Professor put it, aren't there in my interaction with LLMs. I use them as a workhorse for doing grunt work with ideas I propose and even then I find them lacking in attention to detail. There's a very narrow range they can work reliably in, and once you go outside that range, they hallucinate or provide sub-standard (compared to human) responses. I think the idea that we're co-creating with LLMs is an interesting one that most people haven't considered - there's a kind of symbiosis where we use the model and build artefacts that future models are then trained on. This feedback loop across how we use LLMs as tools is interesting. That's the way they currently improve. It's a symbiotic relationship - but humans are currently providing the majority of the "intelligence", if not all of it, in this process.
@larsfaye2923 ай бұрын
What a fantastic and succinct response! My experience has been _exactly_ the same.
@sangouda16453 ай бұрын
That's exactly it. They start to really act as a good creative partner by the Nth iteration, after back-and-forth explanation and feedback; once it gets the hang of it, it really acts like a student wanting to get a good score from a teacher :)
@notHere1323 ай бұрын
We need an entirely new model for AI to achieve true reasoning capability.
@HoriaCristescu3 ай бұрын
What you should consider is the environment-agent system, not the model in isolation. Focusing on models is a bad direction to take; it makes us blind to the process of external search and exploration, without which we cannot talk about intelligence and reasoning. The scientific method we use also has a very important experimental validation step; not even humans could reason or be creative absent an environment.
@luke.perkin.inventor3 ай бұрын
Great episode and fantastic list of papers in the description!
@weftw1se3 ай бұрын
Disappointing to see so much cope from the LLM fans in the comments. Expected, but still sad.
@yeleti3 ай бұрын
They are rather AGI hopefuls. Who's not a fan of LLMs including the Prof ;)
@weftw1se3 ай бұрын
@@yeleti yeah, I think they are very interesting / useful but I doubt they will get to AGI with scaling alone.
@phiarchitect3 ай бұрын
what a wonderfully exuberant person
@ej32813 ай бұрын
Very nice to hear from an LLM guy that hasn't lost his mind. He's simply wrong about LLMs being useful for unconstrained idea generation, but as far as his other views go, very enjoyable to watch.
@Paplu-i5t3 ай бұрын
Such a sharp mind of a senior man.
@martin-or-beckycurzee41462 ай бұрын
Very interesting. Wonder what Prof. Kambhampati thinks about Strawberry? Best for creative use cases? Maybe for now, but progress is looking good - better than Prof. Kambhampati was expecting…
@falricthesleeping9717Ай бұрын
01:01:31 I had to listen to it multiple times (can't focus these days). In that section he's specifically talking about chain of thought and the guy who wrote the paper. He's saying it's one more way of brute-forcing it with more data - the data is just solutions written out as thought chains - and it's kind of obvious. It's impressive that it can do a lot of code challenges, but almost all of the code challenges have solutions with in-depth explanations posted after the competition is done; so many people wrote so many things about how to solve them, and OpenAI said it themselves that they fine-tuned the model on the solutions of other participants to increase their accuracy. Even with all of that, given the most complex programming challenges it still fails to keep its consistency on real-world projects. Now, one way of reading this is to just give it more data until every possible problem is in the training data, and just improve the context window, but the point still stands: they're really not reasoning.
@hartmut-a9dt3 ай бұрын
great interview !
@ACAndersen3 ай бұрын
His argument is that if you change the labels in classical reasoning tests the LLM fails to reason. I tested GPT 4 on the transitive property, with the following made up prompt: "Komas brisms Fokia, and Fokia brisms Posisos, does Komas brism Posisos? To brism means to contain." After some deliberation it concluded that yes, the statement holds true. Thus there is some reasoning there.
@hashp76253 ай бұрын
How did you test his primary point on this topic - that the GPT 4 training data is so large that it has been trained on common statements like this and that answering true is a likely distribution?
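(Aside: one way to control for that training-data concern is to generate fresh nonsense tokens on every run, so the exact statements almost certainly never appear verbatim in any corpus. A small sketch that just builds such a probe; feeding it to a model and scoring the yes/no answer is left to whatever API wrapper one uses:)

```python
import random
import string

def nonsense_word(length: int = 6) -> str:
    return "".join(random.choices(string.ascii_lowercase, k=length))

def transitivity_probe() -> str:
    a, b, c = (nonsense_word().capitalize() for _ in range(3))
    verb = nonsense_word(5)
    # Containment is transitive, so the correct answer to every probe is "yes".
    return (f"{a} {verb}s {b}, and {b} {verb}s {c}. "
            f"To {verb} means to contain. Does {a} {verb} {c}?")

print(transitivity_probe())
```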
@JohnChampagne18 күн бұрын
I accidentally typed using a standard (Qwerty) keyboard, rather than Dvorak, so I asked GPT to convert the text. It was beyond its ability, (more than a year ago, I think). Qwerty was made to be intentionally slow, to accommodate the mechanical devices that would tend to jam if typists went too fast. We should change outmoded patterns of behavior. After about eight hours of using Dvorak, you will match, then exceed your speed on the Qwerty keyboard.
@Tititototo3 ай бұрын
Pertinently pinpointed - 'the beast' has been killed. LLMs are just wonderful 'bibliothèques vivantes' (living libraries), quite great tools that save time by ignoring any educated protocols.
@willd1mindmind6393 ай бұрын
Reasoning in humans is about using abstractions or a general understanding of concepts to arrive at a result. A perfect example is math problems: most humans use shortcuts to solve math calculations, which can be a form of reasoning. In a computing sense, reasoning would be calculating a math answer without using the ALU (the arithmetic logic circuits on the CPU). In a GPT context it would mean arriving at a result without having the answer (and question) already in the training distribution. So for example, a human using reasoning can add two plus two as follows: 2 is a number representing the quantity of items in a set that can be counted, so 2 plus 2 becomes 1, 2, 3, 4 (counting up 2 places and then counting up 2 more places, with 4 being the answer). Something like that is not possible on a CPU, and ChatGPT would also not be able to do it, because it wouldn't be able to generalize that idea of counting to the addition of any two numbers. If it could - without every combination of numbers written out using the counting method in its training data (or distribution) - then it would be reasoning.
@virajsheth841727 күн бұрын
Not just counting - it can even do summation step by step like humans. So basically you're wrong. Look at my other comment.
@virajsheth841727 күн бұрын
To solve using traditional addition, we'll add the numbers digit by digit from right to left, carrying over when necessary.
Step-by-Step Calculation:
1. Units Place: Write down 6, carry over 1.
2. Tens Place: Write down 2, no carry over.
3. Hundreds Place: Write down 5, carry over 1.
4. Thousands Place: Write down 9, no carry over.
5. Ten-Thousands Place: Write down 6, carry over 1.
6. Hundred-Thousands Place: Write down 1, carry over 1.
7. Millions Place: Write down 2, carry over 1.
8. Carry Over: Since there's an extra 1 carried over, we place it at the next leftmost position.
Final Result / Answer: 12,169,526
@willd1mindmind63926 күн бұрын
@@virajsheth8417 You are missing the point. Numbers are symbols that represent concepts, and because of that there are various ways the human mind can use those concepts to solve problems. That ability to explore concepts and apply them in a novel fashion is what is called reasoning. Your example is not "reasoning" so much as a "step by step" approach, which is the most common pattern that exists for solving any particular mathematical problem. That implies those steps are easily found in the training data and model distribution, so of course that is what the LLM is using, because what you described is the typical way math is taught in grade school. It in no way, shape or form implies understanding fundamental math concepts and using those concepts in an ad hoc fashion to solve any problem. Ad hoc in this context means using a pattern not found explicitly in the distribution of the language model. The point you missed is that numbers being symbols that in themselves represent quantities of individual discrete elements is an abstract concept, and the ability to apply that kind of abstract understanding to solving, or coming up with approaches to solve, math problems is unique to humans, because that is how math came about in the first place. Another example of human reasoning: you can add two numbers such as 144 and 457 by simply taking each column, adding it up with its place value separately, and then adding the sums of the columns, without the need to calculate a remainder. This results in: 500 + 90 + 11 = 601, or (5 x 100) + (9 x 10) + (11 x 1). It is not a common way of doing addition, is my point, and not something one would expect an LLM to come up with unless you prompted it to do so - and even then it may not come up with that exact approach unless it is found in the training data. At the end of the day, what this is about is not "reasoning" so much as explaining how the LLM came up with an answer to a math problem. Having these AI algorithms explain how they came to an answer has been requested for quite a while. But it is not "reasoning" in the sense of coming up with unique or novel approaches outside of the training examples based purely on understanding of underlying concepts.
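(Aside: the column-wise method described above is easy to write down mechanically - a small sketch, just to make the '500 + 90 + 11' decomposition concrete:)

```python
def columnwise_add(x: int, y: int) -> int:
    """Add by summing each place value separately, then summing the partial sums (no carrying)."""
    partials = []
    place = 1
    while x or y:
        partials.append((x % 10 + y % 10) * place)  # e.g. 4 + 7 -> 11 at the units place
        x, y, place = x // 10, y // 10, place * 10
    return sum(partials)

print(columnwise_add(144, 457))  # partial sums 11 + 90 + 500 -> 601
```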
@markplutowski3 ай бұрын
1:31:09 - 1:31:32. "People confuse acting with planning." "We shouldn't leave toddlers alone with a loaded gun." This is what frightens me: agent-based systems let loose in the wild without proper controls - a toddler AI exploring the world, picking up a loaded gun and pulling the trigger.
@markplutowski3 ай бұрын
If the title says "people don't reason", many viewers think it makes the strong claim "ALL people don't reason", when it is actually making the weaker claim "SOME people don't reason". That title is factually defensible but misleading. One could be excused for interpreting this title as claiming "ChatGPT doesn't reason (at all)", when it is actually claiming "ChatGPT doesn't reason (very well)". One of the beauties of human language is that the meaning of an utterance derived by the listener depends as much on the deserialization algorithm used by the listener as on the serialization algorithm employed by the speaker. The YouTube algorithm chose this title because the algorithm "knows" that many viewers assume the stronger claim. Nonetheless, be that as it may, this was a wonderful interview - many gems of insight on multiple levels, including historical, which I enjoyed. I especially liked your displaying the title page of an article that was mentioned. Looking forward to someone publishing "Alpha reasoning: no tokens required". I would watch again.
@阳明子3 ай бұрын
Professor Kambhampati is making the stronger claim that LLMs do not reason at all.
@markplutowski3 ай бұрын
@@阳明子 1:20:26 "LLMs are great idea generators", which is such an important part of reasoning, he says, that Ramanujan was great largely because he excelled at the ideation phase of reasoning. 16:30 he notes that ChatGPT 4.0 was scored at 30% on a planning task. 1:23:15 he says that LLMs are good for style critiques, therefore for reasoning about matters of style, LLMs can do both ideation and verification.
@阳明子3 ай бұрын
@@markplutowski 3:14 "I think the large language models, they are trained essentially in this autoregressive fashion to be able to complete the next word, you know, guess the next word. These are essentially n-gram models." 11:32 Reasoning VS Retrieval 17:30 Changing predicate names in the block problem completeley confuses the LLMs 32:53 "So despite what the tenor of our conversation until now, I actually think LLMs are brilliant. It's just the brilliant for what they can do. And just I don't complain that they can't do reason, use them for what they are good at, which is unconstrained idea generation."
@markplutowski3 ай бұрын
@@阳明子 Ok, I see it now. I originally misinterpreted his use of a double-negative there where he says "And just I don't complain that they can't do reason". That said, he contradicts himself by admitting that they can do a very limited type of reasoning (about matters of style), and are weakly capable of planning (which is considered by many as a type of reasoning, although he seems to disagree with that), and can be used for an important component of reasoning (ideation). But yeah, I see now that you are correct - even though there are these contradictions he is indeed claiming "that they can't do reason".
@MateusCavalcanteFonseca3 ай бұрын
Hegel said a long time ago that deduction and induction are different aspects of the same process, the process of acquiring knowledge about the world. Great talk.
@sonOfLiberty1003 ай бұрын
Wow a good one thank you
@Neomadra3 ай бұрын
LLMs definitely can do transitive closure. Not sure why the guest stated otherwise. I tried it out with completely random strings as object names and Claude could do it easily. So it's not just retrieving information.
@autingo65833 ай бұрын
this is supposed to be science. i hate it so much when people who call themselves researchers do not really care for thoroughness, or even straight out lie. don't let them get away with it.
@jeremyh20833 ай бұрын
It struggles with it if you create something it’s never seen before. It’s a valid point on his part.
@st3ppenwolf3 ай бұрын
Transitive closures can be done from memory. It's been shown these models perform badly on novel data, so he still has a point.
@SurfCatten3 ай бұрын
And it was also able to do a rotation cipher with any arbitrary shift when I just tested it. There are definite limitations, but what they can do is far more complex than simply repeating what's in the training data. I made a separate post, but I just wanted to add here that it can also do other things that he specifically said it can't.
@gen-z-india3 ай бұрын
Ok, everything they speak is guess work, and it will be so until deep learning is there.
@johnkost25143 ай бұрын
This aligns nicely with the work Fabrice Bellard has been doing using Transformers to achieve SOTA lossless compression in his NNCP algorithm. Coincidence .. I think not!
@shizheliang267923 күн бұрын
Are there any references for the part about LLMs not being able to do transitive closure? I would love to see the details.
@jeremyh20833 ай бұрын
Those people who are assuming that AGI is going to be achieved have never done long-term work inside any of the major GPT systems. If you want a quick and dirty test, tell it to create a fiction book: first have it outline 15 chapters with 10 sections per chapter, and then have it start writing that book. Look at it in detail and you will see that section after section it loses sight of essentially every detail. It does a better job if you are working inside a universe another author has already made, and the worst job if you are creating a brand-new universe, even if you have it define the universe.
@mattwesney3 ай бұрын
sounds like youre bad at prompting
@jeremyh20833 ай бұрын
@@mattwesney lol it does, doesn’t it, but you haven’t tried it and I have.
@rolandhochmuth19513 ай бұрын
I originally tried, Let a = b and b = c, does a = c? The response cited the transitive property and said yes, a = c. So, then I tried something like, Let asdf = hjkl and let hjkl = vbnm, does asdf = vbnm to try and eliminate a potential reference to a description of the transitive property that uses a, b, and c as variable names? The response was not able to identity the transitive property in this case and didn't know how to treat the = sign. I then modified the prompt after a few iterations too, Let asdf, hjkl and vbnm be variables. Let = be equality. Given asdf = hjkl and hjkl = vbnm, does asdf = vbnm? The response was: You're getting closer! Now we have a clearer setup. Since we've defined "=" as equality and the variables are unspecified, we can indeed apply the transitive property. Yes, given asdf = hjkl and hjkl = vbnm, then asdf = vbnm. The transitive property holds true in this case because we're dealing with standard equality.
@rolandhochmuth19513 ай бұрын
Extending this a bit more: Let asdf, hjkl, vbnm and zxcv be variables. Let = be equality. Given asdf = hjkl, hjkl = vbnm, and vbnm = zxcv, does asdf = xxcv? The model replied: "Yes, you're absolutely right! Following the transitive property: asdf = hjkl, hjkl = vbnm, vbnm = zxcv. Therefore: asdf = zxcv. Let me know if you have any other logic puzzles you'd like to try!"
@DataJuggler3 ай бұрын
0:18 When I was 4 years old, I was often stuck at my parents' work. The only entertaining thing for me to do was play with calculators or adding machines. I memorized the times table because I played with calculators a lot. My parents would spend $8 at the drug store to keep me from asking why the sky is blue and other pertinent questions. I was offered the chance to skip first grade after kindergarten, and my parents said no. Jeff Bezos is the same age as me, and also from Houston. His parents said yes to skipping first grade. I brought this up to my parents forever, until they died.
@Jukau3 ай бұрын
What is the bombshell? This is absolutely clear and known... it would be a bombshell if it could.
@MachineLearningStreetTalk3 ай бұрын
Read the comments section here, I wish it was clear and known. It's subtle and requires a fair bit of CS knowledge to grok unfortunately.
@hayekianman3 ай бұрын
The Caesar cipher thing is already working for any n for Claude 3.5, so dunno.
@benbridgwater64793 ай бұрын
Sure - different data set. It may be easy to fix failures like this by adding corresponding training data, but this "whack-a-mole" approach to reasoning isn't a general solution. The number of questions/problems one could pose of a person or LLM is practically infinite, so the models need to be able to figure answers for themselves.
@januszinvest37692 ай бұрын
@@benbridgwater6479so please give one example that shows clearly that LLMs can't reason
@siddharth-gandhi3 ай бұрын
Hi! Brilliant video! Much to think about after listening to hyperscalers for weeks. One request: can you please cut down on the clickbait titles? I know you said it's for the YT algorithm, but if I want to share this video with, say, a PhD student, an MS student or a prof, no one takes a new channel seriously with titles like this one (it just feels clickbaity for a genuinely good video). Let the content speak for itself. Thanks!
@MachineLearningStreetTalk3 ай бұрын
I am really sorry about this, we will change it to something more academic when the views settle down. I’ve just accepted it as a fact of youtube at this point. We still use a nice thumbnail photo without garish titles (which I personally find more egregious)
@siddharth-gandhi3 ай бұрын
@@MachineLearningStreetTalk Thanks for understanding! 😁
@therainman77773 ай бұрын
We already do have a snapshot of the current web. And snapshots for every day prior. It’s the wayback machine.
@tylermoore44293 ай бұрын
This analysis treats LLMs as a static thing, but the field is evolving. Neurosymbolic approaches are coming; a couple of these are already out there in the real world (MindCorp's Cognition and Verses AI).
@billykotsos46423 ай бұрын
very insightful
@anuragshas3 ай бұрын
On the Dangers of Stochastic Parrots paper still holds true
@alexandermoody19463 ай бұрын
Not all manhole covers are round. The square manhole covers that have a two piece triangular tapered construction are really heavy.
@VanCliefMedia3 ай бұрын
I would love to see his interpretation of the most recent gpt4 release with the structured output and creating reasoning through that output
@wtfatc45563 ай бұрын
Gpt is like a reactive mega wikipedia....
@PavanMS873 ай бұрын
Gold mine detected, Subbed!!
@life42theuniverse3 ай бұрын
The most likely response to logical questions is logical answers.
@pallharaldsson90153 ай бұрын
16:44 "150% accuracy [of some sort]"? It's a great interview with the professor (the rest of it good), who knows a lot, good to know we can all do such mistakes...
@benbridgwater64793 ай бұрын
I processed it as dry humor - unwarranted extrapolation from current performance of 30%. to "GPT 5" at 70%. to "GPT 10" at 150%. Of course he might have just mis-spoke. Who knows.
@kangaroomax819820 күн бұрын
It’s a joke, obviously. He is making fun of people extrapolating.
@television92333 ай бұрын
I've read some of Prof Subbarao's work from ASU. Excited for this interview.
@zooq-ai3 ай бұрын
Wow, incredible episode
@mattabrahamson88163 ай бұрын
contrary to his claims gpt4o & sonnet do generalize to different cipher shifts. gpt4o: It looks like the text "lm M pszi csy" might be encoded using a simple substitution cipher, such as the Caesar cipher. This type of cipher shifts the letters of the alphabet by a fixed number of positions. To decode it, let's try different shifts and see if any of them make sense. For example, if we shift each letter by 4 positions backward (a common shift in Caesar ciphers): - l -> h - m -> i - M -> I - p -> l - s -> o - z -> v - i -> e - c -> y - s -> o - y -> u So, "lm M pszi csy" becomes "hi I love you." This decoded message makes sense as a simple phrase. If you have any additional context or need further decoding, please let me know!
@BrianPeiris3 ай бұрын
Great, you got one sample. Now run it a hundred times each for different shifts and report back.
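(Aside: that's a fair challenge, and it's cheap to automate. A sketch of such a harness - `decode_with_llm` is a hypothetical wrapper around whichever model is being tested, and `caesar` encodes the plaintext with a given shift; the output is a per-shift accuracy table rather than a single cherry-picked sample:)

```python
import random

def caesar(text: str, shift: int) -> str:
    """Encode lowercase letters with a Caesar shift; everything else passes through."""
    return "".join(
        chr((ord(c) - ord('a') + shift) % 26 + ord('a')) if c.islower() else c
        for c in text
    )

def decode_with_llm(ciphertext: str) -> str:
    """Hypothetical call asking the model to recover the plaintext (stand-in, not a real API)."""
    raise NotImplementedError

def accuracy_by_shift(plaintexts, shifts=range(1, 26), trials=100):
    scores = {}
    for shift in shifts:
        hits = 0
        for _ in range(trials):
            plain = random.choice(plaintexts)
            hits += decode_with_llm(caesar(plain, shift)).strip().lower() == plain
        scores[shift] = hits / trials
    return scores
```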
@luisluiscunha3 ай бұрын
Maybe in the beginning, with Yannic, these talks were properly named "Street Talk". They are more and more Library-of-the-Ivory-Tower talks, full of deep "philosophical" discussions that I believe will all be considered pointless. I love the way Heinz Pagels described how the Dalai Lama avoided entering into arguments of this kind about AI. When asked his opinion about a system he could talk to as to a person, he just said "sit that system in front of me, on this table, then we can continue this talk". This was in the 80s. Even to be profoundly philosophical you can think in a very simple and clear way. It is a way of thinking epistemologically most compatible with engineering, which ultimately is where productive cognitive energy should be spent.
@plasticmadedream3 ай бұрын
A new bombshell has entered the villa
@idck55313 ай бұрын
Possibly LLMs do not reason, but they sure are very helpful for coding. You can combine and generate code easily and advance much faster. Writing scripts for my PhD is 10x easier now.
@dylanmenzies39732 ай бұрын
Reasoning is pattern matching. Verification is something to be learnt and reinforced - humans are very often wrong, even in complex math proofs. Generating possible paths for reasoning is inherently very noisy, and needs a balance of verification to keep it on track.
@akaalkripal57243 ай бұрын
I'm not sure why so many LLM fans are choosing to attack the Professor, when all he's doing is pointing out huge shortcomings, and hinting at what could be real limitations, no matter the scale.
@alansalinas20973 ай бұрын
Because they don't want the professor to be right?
@therainman77773 ай бұрын
I haven’t seen anyone attacking him. Do you mean in the comments to this video or elsewhere?
@lystic93923 ай бұрын
I think I have a way to allow almost any model to 'reason'. Or to use reasoning, anyway.
@patruff3 ай бұрын
I memorized all the knowledge of humans. I can't reason but I know everything humans have ever put online. Am I useful? Provide reason.
@fburton83 ай бұрын
What proportion of “all the knowledge of humans” do current models have access to?
@patruff3 ай бұрын
@@fburton8 all of it, well everything on the open Internet so most books, most poetry, lots of art, papers, code, etc.
@albin18163 ай бұрын
Extremely useful. Ask any engineer who's addicted to ChatGPT / Copilot / OpenAI API at the moment for their daily workflows.
@malakiblunt3 ай бұрын
but i make up 20% of my answers - can you tell which ?
@n4rzul3 ай бұрын
@@malakiblunt So do you... sometimes...
@davidcummins81253 ай бұрын
Could an LLM for example figure out whether a request requires a planner, a math engine etc, transform the request into the appropriate format, use the appropriate tool, and then transform the results for the user? I think that LLMs provide a good combination of UI and knowledge base. I was suspicious myself that in the web data they may well have seen joke explanations, movie reviews, etc etc and can lean on that. I think that LLMs can do better, but it requires memory and a feedback loop in the same way that embodied creatures have.
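(Aside: that routing pattern is roughly what tool-use and LLM-Modulo style setups do. A minimal sketch, where `classify_with_llm`, `call_planner`, `call_math_engine` and `rephrase_with_llm` are all hypothetical stand-ins rather than real APIs - the LLM routes and presents, while sound external tools do the actual reasoning:)

```python
def classify_with_llm(request: str) -> str:
    """Hypothetical LLM call returning one of: 'planner', 'math', 'chat'."""
    raise NotImplementedError

def call_planner(request: str) -> str:
    """Hypothetical wrapper around a classical planner (e.g. a PDDL solver)."""
    raise NotImplementedError

def call_math_engine(request: str) -> str:
    """Hypothetical wrapper around an exact math/symbolic engine."""
    raise NotImplementedError

def rephrase_with_llm(request: str, raw_result: str) -> str:
    """Hypothetical LLM call turning a tool's raw output into fluent prose for the user."""
    raise NotImplementedError

def route(request: str) -> str:
    tool = classify_with_llm(request)       # LLM as router / UI
    if tool == "planner":
        raw = call_planner(request)         # external tool does the verifiable work
    elif tool == "math":
        raw = call_math_engine(request)
    else:
        raw = ""                            # plain chat: nothing to verify
    return rephrase_with_llm(request, raw)  # LLM as presenter of the (verified) result
```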
@notHere1323 ай бұрын
I use ChatGPT every day. It does not reason. It's unbelievably dumb, and sometimes I have trouble determining whether it's trying to deceive me or just unfathomably stupid. Still useful for quickly solving problems someone else has already solved, and that's why I continue using it.
@PaoloCaminiti-b5c3 ай бұрын
I'm very skeptical of this. Aristotle inferred logic by looking at rhetorical arguments; LLMs could be extracting those features already while building their model to compress the corpus of data, and this seems equivalent to propositional logic. It seems this researcher puts too much emphasis on agents needing to be capable of mathematical proof, whose utility in agents - including humans - is not well established.
@virajsheth841727 күн бұрын
Absolutely agreed!
@briandecker84033 ай бұрын
I love that this channel hosts talks by the best experts in the field and generates comments from the lowest Dunning-Kruger keyboard cowboys.
@LuigiSimoncini3 ай бұрын
Bwahaha!!!
@Neomadra3 ай бұрын
I agree but Claude 3.5 does! ;)
@vpn7403 ай бұрын
no, it doesn't.
@Neomadra3 ай бұрын
@@vpn740 It's a joke.
@JG27Korny3 ай бұрын
I think there is a broad misconception. LLMs are LLMs; they are not AGI (artificial general intelligence). Each AI has a world model, and if the question fits the world model it will work. It is like asking a chess AI engine to play checkers. That is why multimodal models are the big thing, as they train not just on a corpus of text but on images too. Those visually trained AI models will solve the stacking problem at minute 19:00. It is not that ChatGPT does not reason; it reasons, but not as a human does.
@TastyGarlicBread3 ай бұрын
There are a lot of people in this comment section who probably can't even do basic sums, let alone understand how a large language model works. And yet, they are very happy to criticize. We are indeed living in an Idiocracy.
@MoeShlomo3 ай бұрын
People typically assume that LLMs will always be "stuck in a box" that is determined by their training data. But humans are of course quite clever and will figure out all sorts of ways to append capabilities analogous to different brain regions that will allow LLMs to effectively "think" well enough to solve increasingly-challenging problems and thereby self-improve. Imagine equipping a humanoid robot (or a simulated one) with GPT6 and Sora3 to allow it to make predictions about what will happen based on some potential actions, take one of those actions, get feedback, and integrate what was learned into its training data. My point is that people will use LLMs as a component of a larger cognitive architecture to make very capable systems that can learn from their actions. And of course this is just one of many possible paths.
@benbridgwater64793 ай бұрын
Sure, there will be all sorts of stuff "added to the box" to make LLMs more useful for specific use cases, as is already being done - tool use, agentic scaffolding, specialized pre-training, etc, but I don't think any of this will get us to AGI or something capable of learning a human job and replacing them. The ability for lifelong learning by experimentation is fundamentally missing, and I doubt this can be added as a bolt-on accessory. It seems we really need to replace gradient descent and pre-training with a different more brain-like architecture capable of continual learning.
@eyoo3693 ай бұрын
@@benbridgwater6479 Yes, agree with that. Anything that doesn't resemble the human brain will not bring us to AGI. While LLMs are very impressive and a great first step into a paradigm shift, they are ultimately a hack route to reach their current intelligence; there are still so many levels of reasoning missing even from SOTA models like Claude 3.5 and GPT-4o. For me the roadmap to general intelligence is defined by the way a model learns and not necessarily by what it outputs after pre-training. To be more specific: true AGI would be giving a model roughly the same amount of data a human gets exposed to in its lifetime and having it perform like a median human. Throwing the world's data at it and scaling the parameters into the billions/trillions, although impressive, is far away from AGI.
@maxbuckley12463 ай бұрын
Which paper is referred to at 1:03:51 when multiplication with four digit numbers is discussed?
@MachineLearningStreetTalk3 ай бұрын
Faith and Fate: Limits of Transformers on Compositionality "finetuning multiplication with four digit numbers" arxiv.org/pdf/2305.18654
@fburton83 ай бұрын
Have LLMs ever said they don't know the answer to a question? Often this is the most useful/helpful response, so why don't they do it? It's disappointing.
@dankprole78843 ай бұрын
It kind of makes sense if you think it is just formatting a distribution of the most likely tokens in a plausible style, i.e. a question-and-answer format. If "I don't know" isn't likely (i.e. if the source material was either not in Q-and-A format, or it was but the answer was not something like "I don't know"), then it's just not gonna be the answer given by the LLM. A hallucination IS the "I don't know" response. Unfortunately, not being able to detect that kind of defeats the whole purpose of using an LLM in the first place!
@fburton83 ай бұрын
@@dankprole7884 Good point. I have also encountered examples where the LLM _did_ know the answer and should have been able to suggest it as a possibility, but didn't do so until prompted with "What about [the answer]?". For example, ChatGPT had great difficulty finding a pasta very similar to conchiglie in size and shape but with smooth bumps instead of ridges. It went round and round in circles making completely inappropriate suggestions until I asked "What about cestini?". It was about as useful as a chocolate teapot for this kind of task.
@dankprole78843 ай бұрын
@@fburton8 yeah I've seen similar. I use Claude quite a lot for quick code snippets when I can't quite remember the one obscure command I need to do the exact thing I want. By the time I keep saying no not that and get the right answer, I could have googled it several times. It's very hit and miss at any level of complexity / rarity.