The Debate Over “Understanding” in AI’s Large Language Models

5,694 views

MITCBMM

19 days ago

Melanie Mitchell, Santa Fe Institute
Abstract: I will survey a current, heated debate in the AI research community on whether large pre-trained language models can be said to "understand" language-and the physical and social situations language encodes-in any important sense. I will describe arguments that have been made for and against such understanding, and, more generally, will discuss what methods can be used to fairly evaluate understanding and intelligence in AI systems. I will conclude with key questions for the broader sciences of intelligence that have arisen in light of these discussions.
Short Bio: Melanie Mitchell is Professor at the Santa Fe Institute. Her current research focuses on conceptual abstraction and analogy-making in artificial intelligence systems. Melanie is the author or editor of six books and numerous scholarly papers in the fields of artificial intelligence, cognitive science, and complex systems. Her 2009 book Complexity: A Guided Tour (Oxford University Press) won the 2010 Phi Beta Kappa Science Book Award, and her 2019 book Artificial Intelligence: A Guide for Thinking Humans (Farrar, Straus and Giroux) was shortlisted for the 2023 Cosmos Prize for Scientific Writing.
cbmm.mit.edu/news-events/even...

Comments: 27
@andytroo 14 days ago
22:35 - How many of those tasks require knowledge of the exact spelling of the word? LLMs are only passed the encoded tokens, and may not be aware of word spelling in a way that allows, e.g., acronyms.
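The commenter's point can be sketched in a few lines. Below is a toy greedy subword tokenizer with a hypothetical mini-vocabulary (real BPE vocabularies are learned and far larger); it shows that a word reaches the model as opaque IDs from which letter-level facts, like spelling an acronym, are not directly recoverable:

```python
# Hypothetical mini-vocabulary, for illustration only.
vocab = {"straw": 1001, "berry": 1002, "st": 2001, "raw": 2002}

def greedy_tokenize(word, vocab):
    """Greedily match the longest known subword at each position."""
    tokens = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                tokens.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {word[i:]!r}")
    return tokens

ids = greedy_tokenize("strawberry", vocab)
print(ids)  # two integer IDs; individual letters are not visible in them
```

A model consuming only `ids` never "sees" the characters, which is one plausible reason spelling-sensitive tasks are unusually hard for LLMs.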
@ZelosDomingo 16 days ago
It seems like tokenization would really interfere with the ability to do some of these tests; I don't know how much the format of something like that would even be preserved. It also makes me wonder how much the lack of physical 3D movement data in training would impact some of these reasoning tasks. You can even notice, in her language about concepts, how much "spatial" reasoning is involved. To do one of these tests fairly, it seems like you would have to homogenize the way the test taker takes it. It brings disabilities to mind: you wouldn't necessarily expect someone born not just blind, but completely unable to process visual data in any form we would recognize, to solve visual tasks, unless they generalize well from the mediums they do know.
@NullHand 15 days ago
I once saw a research paper on decoding the actual nerve signals sent from the mammalian retina to the brain. The "visual data" turned out to have been pre-processed into something like six channels of what could be described as very stripped-down image data. There was a channel of mostly just high-contrast edges, a channel of cells with recent luminosity changes, and some channels whose function was still unidentified. So I would not be surprised if information flow in the human brain turns out to be far more "tokenized" than we assume.
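The two channels the commenter describes can be caricatured in a few lines. The frames, threshold, and channel definitions below are invented for illustration and are not from the paper being described:

```python
# Two tiny grayscale "frames" (rows of luminosity values), invented data.
frame_t0 = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
frame_t1 = [
    [0, 0, 9, 9],
    [0, 9, 9, 9],
]

def edge_channel(frame, threshold=5):
    """Mark pixels where horizontal contrast with the right neighbor is high."""
    return [[1 if x + 1 < len(row) and abs(row[x] - row[x + 1]) > threshold else 0
             for x in range(len(row))] for row in frame]

def change_channel(prev, cur):
    """Mark pixels whose luminosity changed between frames."""
    return [[1 if a != b else 0 for a, b in zip(r0, r1)]
            for r0, r1 in zip(prev, cur)]

print(edge_channel(frame_t0))           # high-contrast edge map
print(change_channel(frame_t0, frame_t1))  # recent-change map
```

Each channel discards most of the raw image, keeping only one sparse feature — loosely analogous to how a tokenizer discards character-level detail.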
@kellymoses8566 10 days ago
It would be interesting to see the difference between LLMs trained on non-fiction, realistic fiction, and fantasy.
@breaktherules6035 18 days ago
EXCELLENT insights! Thank you so much for sharing!
@novantha1 14 days ago
In my opinion, understanding is actually pretty clear. In humans, a very useful skill in language acquisition is circumlocution: referring to a concept without using its name. Now, for a text-only LLM it's possible that could be done by regurgitating training data directly (what are the odds that some common turns of phrase show up on Wikipedia, or that a dictionary found its way into a dataset?), but in a multimodal LLM, I think the ability to verify similar patterns of neuron activity for analogous inputs across modalities is pretty indicative of strong understanding and generalization. In other words, I think the strength of understanding can be measured, roughly, as the number of unique inputs that produce a given pattern of behavior in the FFN, or lead to the same or similar output in the language head.
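The commenter's proposal could be sketched as comparing activation vectors for analogous inputs across modalities. Everything below — the vectors, their values, and the labels — is invented for illustration; a real test would read activations out of an actual model:

```python
import math

def cosine(u, v):
    """Cosine similarity between two activation vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hypothetical hidden-layer activations for one concept ("dog") arriving
# via two modalities, plus an unrelated concept for contrast.
act_text  = [0.9, 0.1, 0.8, 0.2]    # "a dog barked" (text input)
act_image = [0.85, 0.15, 0.75, 0.3] # photo of a dog (vision input)
act_other = [0.1, 0.9, 0.2, 0.8]    # "stock prices fell" (text input)

# Under the commenter's criterion, cross-modal analogues should cluster:
print(cosine(act_text, act_image))  # high
print(cosine(act_text, act_other))  # low
```

The "strength of understanding" measure would then count how many distinct inputs land near the same activation pattern.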
@alexanderbrown-dg3sy 14 days ago
Agreed. This is literally the basis for the formation of internal world models. To me, the world model is itself confirmation of a deep contextual understanding, despite bottlenecks in that understanding (knowledge conflicts, hallucinations, etc.). This is an architectural and data issue, though. FYI, temporal self-attention makes a world of difference; the model needs native temporal embeddings.
@ArtOfTheProblem 16 days ago
Can you post the discussion?
@user-ec3rm9wr1n 13 days ago
They can't; it's logically unavailable.
@electric7309 17 days ago
Melanie Mitchell, ILY
@XalphYT 17 days ago
25:06 I consider myself to be reasonably intelligent, but I am absolutely stumped by Problem No. 1. How are you supposed to evaluate the three blocks of letters below the alphabet? Are the two blocks on the first line supposed to serve as an example? Are you supposed to consider all three blocks together? Does the order of the blocks matter? I suspect that there is something implied here that I am missing.
@voncolborn9437 17 days ago
Read the blocks left to right. Notice that the second block matches the alphabet, but with the jumbled letters. The second row is the test: match the similar sequence, replacing the 'l'.
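The kind of letter-string analogy being discussed (in the style of Mitchell's Copycat domain) can be sketched for one simple rule. The rule and function below are illustrative assumptions, not the actual problem from the talk:

```python
import string

def solve_successor_analogy(src, tgt, probe):
    """If tgt differs from src only by replacing the last letter with its
    alphabetic successor, apply the same rule to probe; otherwise give up."""
    alpha = string.ascii_lowercase
    succ = lambda c: alpha[(alpha.index(c) + 1) % 26]
    if tgt == src[:-1] + succ(src[-1]):
        return probe[:-1] + succ(probe[-1])
    return None  # rule not recognized

# "abc is to abd as ijk is to ?"
print(solve_successor_analogy("abc", "abd", "ijk"))  # ijl
```

The point of such problems is that a solver must infer the abstract rule ("replace the last letter with its successor") rather than match surface strings — which is why they are used to probe understanding rather than memorization.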
@alexmolyneux816 16 days ago
fghij?
@NextGenart99 11 days ago
Nice try, ChatGPT.
@legathus 5 days ago
26:18 -- Those results are highly suspicious. I suspect there's a confirmation bias in the human scores. The workers don't want to lose their qualifications, and so won't perform tasks that are too difficult, so there may be a drop-off in participation or submission if a human worker feels they may be in error on the more complex tasks. Furthermore, the human workers were given "room to think", where the prompting of the LLMs suggests they were not. I suspect allowing GPT-4 to use step-by-step reasoning would improve its score across the board, and dramatically more so if it's allowed to write a Python script to solve the problem.
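The "room to think" idea the commenter raises is usually operationalized as a zero-shot chain-of-thought prompt. A minimal sketch, assuming a hypothetical wrapper function (the actual model API call is omitted):

```python
def with_reasoning_room(task):
    """Append a zero-shot chain-of-thought instruction to a task prompt,
    so the model is invited to reason before answering."""
    return f"{task}\n\nLet's think step by step, then state the final answer."

prompt = with_reasoning_room("Complete the analogy: abc -> abd, ijk -> ?")
print(prompt)
```

Whether this closes the gap on these particular tasks is an empirical question; the sketch only shows what "giving the model room to think" would mean in practice.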
@mordokai597 16 days ago
Lol! The difference in performance between the humans doing the test for free and the people being paid is about the same jump in performance you get from ChatGPT when you just give it a prompt vs. when you tell it "I'll give you $20 if you do a good job" xD
@lycas09 12 days ago
The tasks where LLMs fail are either niche (not much training data exists for them) or based on vision capabilities (where these systems are still a lot worse).
@SynaLinks 12 days ago
Really good talk :)
@seventyfive7597 12 days ago
So why did her methodology fail here? We have to go back to basics, because you simply can't skip them: 1) Humans are repetition machines; they repeat and recombine their experiences. You can see it especially in the arts, where we call it inspiration: humans take "inspiration" from their life experiences and recombine it. 2) AI systems are the same; they too are repetition machines that recombine experiences, but their experiences differ from humans'. 3) Hence, for a fair comparison, you may not test humans on subjects they have not experienced, and the same goes for AI. Yet her entire testing methodology was based on experiences that only humans have had. She almost got it when she said a child learns not to wear socks over shoes through experience, but she did not narrow her tests to experiences common to both AI and humans, rendering them a curiosity of translation, not of understanding.
@optmanii 10 days ago
AI's understanding of the world is different from a human being's.
@AlgoNudger 14 days ago
LeCun is overrated in the AI community. 🤭
@J_Machine 13 days ago
Nope
@AlgoNudger 11 days ago
@J_Machine C'mon. 😂
@J_Machine 11 days ago
@AlgoNudger You don't understand anything about AI 🤦‍♂️
@AlgoNudger 5 days ago
@J_Machine Now you sound like a stochastic parrot. 🤭
@J_Machine 5 days ago
@AlgoNudger If there is a stochastic parrot, that must be you 😁
@netscrooge 15 days ago
Mitchell is great! Love her work. But LeCun's reckless, self-serving comments should not be elevated so high. It's like a TV news program hosting a flat-Earther to give both sides of the story.