At 1:08, Francois misses the point IMO. He explains how transformers/LLMs don't need vision. But what I find interesting is just how much humans *do* need visual and mouse-type input to do ARC. It's not clear that humans could solve ARC well at all without vision, and that must tell us something about the intelligence humans are using, despite Francois focusing on the opposite issue. IMO the ARC Prize team should focus more on which aspects of human intelligence are needed, not just on the possible ways AI could solve ARC.
@KevinKreger 13 days ago
Intuition for program search is mostly about observing and using similarity then? I'm using tiny SLMs trained on the DSL/solvers. The first program it generates is run and the failed output grid given, then the SLM updates the code and tries again. You can see it evolve its approach. It's not all memory; it's a cut-and-try approach like a human programmer's debugging/reasoning trace. I think we are ignoring a clear fact -- there is no good neural visual circuit for the data array as it can't read x,y coords once it gets up to, say, 10x10. AND -- 7B is too big for the available GPU types 🙂
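A minimal sketch of the cut-and-try loop described above, with the SLM call replaced by a stub (`propose_program`, `run_solver`, and `refine` are illustrative names, not the commenter's actual code):

```python
# Hedged sketch of a cut-and-try refinement loop (illustrative only).
# `propose_program` stands in for the small language model trained on
# the DSL/solvers.

def propose_program(examples, previous_code=None, failed_outputs=None):
    # A real implementation would prompt the SLM with the task examples,
    # the previous program, and the grids it got wrong. Here we just
    # return a trivial identity solver as a placeholder.
    return "def solve(grid):\n    return grid"

def run_solver(code, grid):
    namespace = {}
    exec(code, namespace)  # define solve() from the generated source
    return namespace["solve"](grid)

def refine(examples, max_rounds=5):
    code, failed = None, None
    for _ in range(max_rounds):
        code = propose_program(examples, code, failed)
        outputs = [run_solver(code, inp) for inp, _ in examples]
        if all(out == want for out, (_, want) in zip(outputs, examples)):
            return code  # solver reproduces every training pair
        failed = outputs  # feed the wrong grids back for the next attempt
    return None
```

In a real setup each round would show the model the failed output grids so it can revise the previous program, which is what makes the trace look like a human debugging session.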
@shizheliang2679 13 days ago
Is Ryan Greenblatt's approach one example of this category?
@AlgoNudger 15 days ago
Thanks.
@jb_kc__ 16 days ago
Probably the best presentation I've seen on AI.
@TheTheeliman 16 days ago
Any chance you can post the generated code dataset before the competition ends?
@goodtothinkwith 19 days ago
It’s not just vision. It’s familiarity with the physical world. We think with our hands and LLMs simply haven’t had that data. Most of the problems require thinking in terms of physical object manipulation… that’s not necessarily reasoning.
@En1Gm4A 21 days ago
I think parsing perception into solid building blocks and then using those for program synthesis is really powerful.
@InfiniteQuest86 24 days ago
How is this channel still so under-subscribed! This is the coolest thing in AI right now. I tried using morphological image analysis, and it did pretty well on the easy stuff in the training set, but then I looked into it more and there are so many logical components that it just wasn't going to work.
@binig.4591 24 days ago
What bothers me about program search is that I don't think it is related to reasoning at all. Reasoning, in humans, happens over concepts that are already formed through continuous, similarity-based methods, so reasoning happens at a "higher level". Program search seems too "low level".
@egor.okhterov 23 days ago
Agree
@goodtothinkwith 19 days ago
Yeah, it’s more of a traditional optimization technique in a new medium… not reasoning
@jhanschoo 7 days ago
There is a link between program search and inductive reasoning: you can formalize finding the most probable next item in a sequence as taking the next output of the shortest program whose output matches the examples seen so far.
@binig.4591 6 days ago
@@jhanschoo Yeah, that's probably true in theory. But that doesn't tell you much about reasoning, or at least human reasoning. Neural nets are also a kind of program search; it's just that the way they represent programs makes them prone to falling into local optima.
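The formalization jhanschoo describes can be illustrated with a toy version of shortest-program search: try candidate hypotheses from simplest to most complex and predict with the first one that reproduces the observed prefix (the hypothesis set and names here are invented for illustration; a real version would enumerate programs in a DSL).

```python
# Toy version of induction as shortest-program search.
HYPOTHESES = [  # ordered roughly from simplest to more complex
    ("zero",     lambda n: 0),
    ("identity", lambda n: n),
    ("double",   lambda n: 2 * n),
    ("square",   lambda n: n * n),
    ("pow2",     lambda n: 2 ** n),
]

def predict_next(observed):
    # Return the first (i.e. simplest) hypothesis that reproduces the
    # observed prefix, along with its prediction for the next item.
    for name, prog in HYPOTHESES:
        if all(prog(i) == v for i, v in enumerate(observed)):
            return name, prog(len(observed))
    return None
```

For example, `predict_next([0, 1, 4, 9])` settles on the `square` hypothesis and predicts 16.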
@KevinKreger 24 days ago
Thanking you from Wisconsin. I'm trying NOT to use any brute force techniques... seems antithetical to the competition.
@InfiniteQuest86 24 days ago
Yeah I have to keep reminding people in the chat room of this. Everyone wants to either use LLMs or brute force, but the point is to come up with really brand new ideas.
@b.k.officiel873 23 days ago
@@InfiniteQuest86 What do you mean by brute force in practice?
@InfiniteQuest86 23 days ago
@@b.k.officiel873 I would say anything that is guess and check rather than trying to understand the examples. So the guy that generates 20,000 python programs per question, I would label as brute force. But that doesn't apply to training. You can do anything you want in training. If it took 20,000 python programs per example in training to learn how to reason about the questions, then that is ok because the final solution isn't brute force. You can also custom program solutions to every example. If you see a square with a missing corner, fill it in. Etc. That is brute force. It will automatically solve all training data, but fail on the test set.
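A rough sketch of what "guess and check" looks like in practice: sample many random compositions from a small, invented primitive set and keep any composition that happens to fit the training pairs. Note that fitting the training pairs this way says nothing about understanding the rule, which is exactly the objection above.

```python
import random

# Sketch of a guess-and-check search over random grid transforms.
# The primitive set is invented for illustration; a real entry would
# generate full Python programs per task.
PRIMITIVES = {
    "identity":  lambda g: g,
    "flip_rows": lambda g: g[::-1],
    "flip_cols": lambda g: [row[::-1] for row in g],
    "transpose": lambda g: [list(r) for r in zip(*g)],
}

def apply_pipeline(names, grid):
    for name in names:
        grid = PRIMITIVES[name](grid)
    return grid

def brute_force(train_pairs, tries=20_000, seed=0):
    rng = random.Random(seed)
    for _ in range(tries):
        # sample a random pipeline of 1-3 primitives
        names = [rng.choice(list(PRIMITIVES)) for _ in range(rng.randint(1, 3))]
        if all(apply_pipeline(names, inp) == out for inp, out in train_pairs):
            return names  # fits training; no guarantee it generalizes to test
    return None
```

The found pipeline may be an accidental match, which is why this style tends to solve training data and fail on the hidden test set.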
@TinevimboMusingadi-b9l 24 days ago
Yes, abstraction is the key to the solution I am building.
@ctejada-0 25 days ago
Thank you for sharing
@AlexKen-zv8mm 25 days ago
What's the name of the background music?
@nicogsplayground 25 days ago
Darude - Sandstorm
@AliMoeeny 25 days ago
You were in Boston and I missed it, damn. I wish you announced this stuff here before you do it :D
@nlhm 26 days ago
Cool
@jackswitzer5569 1 month ago
Incredible talk Doctor Chollet!
@eyujis2 1 month ago
Interesting, resonates with the approach in Deepmind's AlphaGeometry!
@aaronbeach 1 month ago
My AI professor in college (back in 2003), Christopher Riesbeck, said that the definition of AI was a moving target (he had already been doing AI research for 30 years at that point). He said: "Artificial Intelligence is the search for the answer to the fundamental question: Why are computers so stupid?" Even the ARC-AGI description on the page for this prize defends it by saying "It's easy for humans, but hard for AI." I have been convinced that the question of AI has been (and will always be, until I see differently) the question of why a computer that can do many/most computational tasks orders of magnitude faster and better than a human still fails at certain basic (seemingly simple) human activities. A simple hypothesis yet to be rejected by the science is that a human brain is not a Turing machine: although we use the analogy of computation to correlate what humans and computers do, they seem to be doing two different things. One simple example of this (that is probably wrong) is that the human mind is a quantum process of some sort, and that this explains why computers using pattern extrapolation require exponentially more parameters to achieve linear improvements in their ability to emulate it. But that's just one example of how the human mind could be something fundamentally different, with AI just using another process ("computation") to simulate it.
@picklenickil 2 months ago
Oh, it gives me an 💡 idea. It sounds to me like watching my nephew learning to walk: my wife comes and provides him with a walker (the hyperparameter space). What doesn't make sense to me is, are we trying to build the equivalent of the kid, the walker, or the wife... or a God-forbidden amalgamation of the three?
@hamidelhachimi6343 2 months ago
Your approach is good; I think GPT-4o will perform well on the training phase. Did you notice that for the testing part, the shapes of the output can vary significantly? How do you think you can guess the right answer if you have only 2 attempts? For example, based on an input shape of (3,3), several output shapes are possible: (3,3), (1,2), (5,5), (9,9), (3,1), (1,1), (6,6), (1,4), (12,12), (18,18), (3,6), (5,3), (6,3), (1,3), (15,15). Notice also that the same key can lead to different outputs. Thanks for your reply.
@clerothsun3933 2 months ago
I get that this is a stepping stone, but calling it a test for AGI is just ludicrous. This isn't even close to AGI, it's just a toy.
@peterdayton6796 2 months ago
Excellent talk. Chollet deserves a lot of credit for developing, before the GPT craze, a benchmark that is so simple and yet on which GPT scaling has led to only modest improvements. Even if it does fall soon to ad-hoc approaches, lasting as long as it has is no small feat.
@imad1996 2 months ago
Somebody from space is listening: "Oh, poor humans, they still refer to abstraction as a blurred image, hahahahahahah." Abstractions have their own uses, but they could be a large source of limitations if we try to imitate them from humans. Abstraction could be a human limitation in trying to understand matters, unless we refer to the AI model itself as the abstraction layer. I don't know how relevant that could be.
@imad1996 2 months ago
Throwing a million dollars into an AI puzzle fuels substantial momentum toward AI research. And that is awesome. Thank you for differentiating between the output of the process and the process itself. Unfortunately, trending concepts become marketing slogans, such as Samsung Galaxy AI :).
@immortalityIMT 2 months ago
Don't we have enough AI to brute-force the model we need?
@googleyoutubechannel8554 2 months ago
Going through the ARC training tests, it's as I feared: many, many of the problems are based on cultural conventions of human information layout, human pattern design, well-known cultural patterns, or comp-sci cultural knowledge, not to mention the bias in our visual system. There is no way for even a hyper-'intelligent' tabula rasa system to answer these problems correctly, as the answers are not even 'correct'. The winner will likely be the team that hardcodes in all the cultural visual baggage, where the 'answer' may not be the strongest 'pattern' by any measure. A hyper-intelligent system wouldn't ever score well on ARC, but a shitty human-overtuned system would: one implemented probably with CLIP or something, actually a relatively dumb search over an exhaustive corpus of hand-coded patterns built to match the creators' ideas for input/output pairs. The creators of this test think they're smart because the search space is so large, but it isn't. They haven't realized, in their hubris, that the number of different types of 'patterns' they're likely to think of, coming out of a few brains, that are actually good and 'fair' questions requiring reasoning is relatively small, and the Kaggle competitors will think of most of them, submitting boring entries over and over again, slowly matching the answers in the problem set as they add ideas. (E.g., I bet there's a grip of problems like 'game of life' cellular automata, or Battleship, or Fibonacci, tee hee, all that type of crap, in the private test set, and I bet the private answers are 'not even wrong', in that they're not the simplest or strongest pattern by any measure other than human comp-sci nerd culture.)
@googleyoutubechannel8554 2 months ago
I took a look at the ARC Prize... it... it doesn't make sense? A very smart human scores 22% on ARC... yet the top AI score on the leaderboard is ALREADY 46%? So... what are you trying to measure here? Who is best at 'guess the algorithm I'm thinking of'?
@humnhumnhumn 2 months ago
''You are confusing the output of the process with the process itself'' Very nice!
@aimorethanyouaskedfor 1 month ago
The argument against functionalism. It amazes me how many people think that these are the same thing.
@way2on 2 months ago
We are not providing the same tools back to the core architecture, such as real-time inference.
@danecjensen 2 months ago
Chapters (Powered by @danecjensen)
00:00 - Intelligence, benchmarks, and AI hype in 2023
03:26 - LLMs' autoregressive responses
07:36 - LLMs' inability to solve Caesar ciphers
10:14 - ML models rely heavily on human labor
11:41 - Minsky-style AI vs McCarthy-style AI
15:25 - Intelligence spectrum: skilled, operational, efficient
16:42 - AI models should not be evaluated using human exams
17:29 - AI's next level of capabilities and efficiency
18:53 - ARC-AGI benchmark for human intelligence
20:49 - Datasets: Kaggle, ARC-AGI, ARC
21:45 - Zach offers $1M ARC-AGI solution competition
24:58 - Physicists describe intelligence and abstractions
29:09 - AI's ability to master tasks efficiently
30:50 - Two types of abstraction: value-centric and program-centric
33:48 - Discrete program search for AGI
34:27 - Program synthesis (PS) vs machine learning
35:30 - Program synthesis overcomes combinatorial explosion, LLM limitations
36:46 - AI combines chess and discrete search techniques
41:17 - Deep learning components in discrete programs
41:56 - Deep learning for ARC-AGI program synthesis
43:19 - Program embedding for efficient search
43:45 - Python ARC-AGI pipeline improvement
44:55 - LLMs fall short of AGI, need breakthroughs
47:50 - Breakthrough in ARC-AGI likely to come from an outsider, not big labs
51:09 - Experiential learning and causality in children's learning
52:16 - Humans are capable of few-shot program synthesis
53:32 - Human cognition works on a fundamental level
@techsuvara 2 months ago
I love that the ARC-AGI prize is $1M when, if you can in fact build something that scales to AGI, the prize for a business would be in the billions...
@techsuvara 2 months ago
And they don't even drive cars well.
@JanSmetana 2 months ago
16:00 "If you know how to drive only in very specific geo-fenced areas - that's intelligence". Isn't this a SKILL, as you defined it? INTELLIGENCE is the ability to drive anywhere (left, right, town, country...), not just in my own 3x3 box of streets (which would be a learned skill, where you have already tried every possibility)?
@micheldominic5963 2 months ago
He said "you know, that's less intelligent", not "that's intelligence".
@Maximooch 2 months ago
“See you on the leaderboard for ARC-AGI” is quite a way to end it
@techsuvara 3 months ago
Excellent reality check for AI. I bet this video won't have 1 million views like the hype videos.
@techsuvara 3 months ago
There are IQ tests for that :D. This also goes to show how flawed IQ tests are, especially when the children of those who wrote the tests seem to score the highest. (True story)
@NitsanAvni 3 months ago
20:20 Should we also measure agents with fewer / more shots? How many shots are needed to get an agent to a score of 90?
@PaulTopping1 3 months ago
Hey, that's the back of my head I see at the bottom right! Good presentation at a fun conference.
@binig.4591 2 months ago
nice head
@johnkintree763 3 months ago
One definition of intelligence is the ability to solve problems in complex situations. There are many problems from the local to a global level that are not being solved well with the current level of intelligence. Our highest priority is creating collective human and digital intelligence. Language models in agentic workflows can extract entities and relationships from text, and merge that knowledge into graph representations. Keeping the human in the loop is important to catch and correct mistakes made by the speech recognition and language models. People can select parts of conversations they have with digital agents to be merged into a global shared graph representation. A global platform can be built pretty much with today's technology that can merge selected parts of millions of simultaneous conversations into a shared world model by the end of this year.
@theotherwhitemeat140 3 months ago
Finally! An AGI-like task
@FamilyYoutubeTV-x6d 3 months ago
This is really good!
@TooManyPartsToCount 3 months ago
It is not entirely clear that we humans perform some computational magic beyond merely relying on pre-learnt patterns. Perhaps it just 'feels like' we reach beyond the training set when we have those eureka moments, or imagine that we just came up with a solution to a novel problem. Perhaps in fact we have such a tightly woven mesh of prior training examples that those 'eureka' moments are inevitable, and only appear novel to us because we are unconscious of the combined results of all our prior training?
@ggir9979 3 months ago
That's not really the point the ARC challenge is tackling. The issue is that LLMs cannot do recursion; they cannot do step-by-step reasoning. When they write down code for a computer program, they cannot interpret it and run it. Everyone who works with a coding assistant will tell you that they routinely produce code that will not run: very simple stuff, like simple loops that are easy to unfold. That's why you see a lot of neurosymbolic approaches being tried out nowadays (as explained in the presentation). Humans, on the contrary, can interpret and execute computer programs. Tedious, slow, but doable. We built airplanes and rockets before we had computers; someone had to run those algorithms.
@TooManyPartsToCount 3 months ago
@@ggir9979 Point about the ARC challenge taken. I was raising what, at least to me, is an interesting meta-question: could our current understanding of what constitutes intelligence be lacking? Especially as concerns our (human) particular brand of intelligence. There is no denying that LLMs are far from what I would call intelligent, though; more like data-compression artifacts. Most claims that LLMs are more than that are just part of some marketing strategy (think Microsoft's 'Sparks of AGI' paper/lectures).
@ggir9979 3 months ago
@@TooManyPartsToCount Good point! I don't know if you know Ben Goertzel; he has a lot of interesting things to say about theory of mind and what intelligence is that goes further than what you see in this video.
@TooManyPartsToCount 3 months ago
@@ggir9979 This - kzbin.info/www/bejne/emnaqYeeedGJeMU - and some other vids are where it started for me. Basically went back to school thanks to BG and JB :) Every time I see a YT video on AI with Sam Altman in the thumbnail I think 'why not Ben Goertzel?!', or at least Ilya or Andrej or Chollet or Bengio or......
@ggir9979 3 months ago
@@TooManyPartsToCount The rarest of things, a nice and civilized exchange of ideas on the internet :-) I'll have to agree 100% with you: any of these speakers are much more interesting to listen to than Sam Altman. I am not familiar with JB's work, so I'll check it out next, thanks for the tip!
@Dr.Z.Moravcik-inventor-of-AGI 3 months ago
"Next breakthrough will come from an outsider" What breakthrough, dude? Have you noticed that AGI = brain algorithms, and that my brain algorithms have been here since 2016? Nothing has changed since; it's all working. We are dealing with human idiots who are screwing up humanity's future. How can idiots who don't see my 2016 invention save humanity? You are not going to save us; you are wasting the time humans have LEFT on this planet. You have already wasted 8 long years!! No one on this planet can give us back 8 wasted years. This is not aimed only at the speaker; it is meant generally at the whole world.
@G1364-g5u 3 months ago
# AI Progress and Generalization: A Critical Review

## Chapter 1: The AI Hype of Early 2023
**Timestamp: 0:00 - 1:57**
- Overview of the peak AGI hype in early 2023.
- ChatGPT, GPT-4, and Bing Chat were perceived as revolutionary, with claims that AI would drastically increase productivity and replace many jobs.
- Despite the hype, the actual impact on employment has been negligible, and fears of mass unemployment were unfounded.

## Chapter 2: Limitations of Large Language Models (LLMs)
**Timestamp: 1:58 - 3:38**
- LLMs, including ChatGPT, have inherent limitations such as failing to understand context and falling into pattern matching rather than true comprehension.
- These limitations are tied to the fundamental architecture and approach of current AI models, showing little progress over time.

## Chapter 3: Problems with Task Familiarity and Generalization
**Timestamp: 3:39 - 6:31**
- LLMs struggle with unfamiliar tasks, performing well only on tasks they have memorized.
- Performance issues arise from extreme sensitivity to phrasing and the inability to generalize from known tasks to new, similar ones.

## Chapter 4: The Misconception of AI Intelligence
**Timestamp: 6:32 - 13:51**
- Intelligence should not be equated with task-specific skill; true intelligence involves the ability to handle novel situations.
- The speaker argues for a shift from task-based AI evaluations to measuring generalization and adaptability.

## Chapter 5: Redefining Intelligence and Measuring Progress
**Timestamp: 13:52 - 19:15**
- Intelligence should be viewed as the ability to synthesize new solutions and adapt to new situations.
- The current benchmarks, based on human exams, are inadequate for assessing AI's true generalization capabilities.

## Chapter 6: The Abstraction Reasoning Corpus (ARC) and Generalization Benchmarking
**Timestamp: 19:16 - 24:22**
- Introduction of ARC, a dataset designed to measure an AI's ability to generalize and perform novel tasks.
- ARC aims to control for prior knowledge and experience, emphasizing the need for AI to infer solutions rather than relying on memorization.

## Chapter 7: The Role of Abstraction in AI and Human Intelligence
**Timestamp: 24:23 - 30:36**
- Abstraction is the key to generalization; intelligence depends on the ability to recognize and apply abstract patterns.
- LLMs are currently limited to low-level abstraction and lack the capability to synthesize new models on the fly.

## Chapter 8: Integrating Type 1 and Type 2 Thinking for AGI
**Timestamp: 30:37 - 37:41**
- The next step in AI development involves combining Type 1 (intuition, pattern recognition) and Type 2 (logical reasoning) thinking.
- Human intelligence excels because it merges these two forms of thinking, and AI needs to follow a similar path.

## Chapter 9: Combining Deep Learning with Program Synthesis
**Timestamp: 37:42 - 42:48**
- Future AI advancements will likely involve merging deep learning (Type 1) with program synthesis (Type 2) to handle complex, novel tasks.
- This approach could significantly improve AI's problem-solving capabilities and generalization.

## Chapter 10: Practical Applications and the Future of AI Development
**Timestamp: 42:49 - 45:12**
- Practical strategies for improving AI, such as using LLMs to generate and refine programs, show promise in advancing AI generalization.
- The importance of innovative thinking and diverse approaches in overcoming current AI limitations.

## Chapter 11: The Need for New Breakthroughs and Intellectual Diversity
**Timestamp: 45:13 - 48:36**
- The speaker emphasizes that progress towards AGI has stalled due to a lack of new ideas and intellectual diversity.
- The speaker suggests that the next breakthroughs are likely to come from outsiders rather than big tech labs.

## Chapter 12: Future Directions and Closing Thoughts
**Timestamp: 48:37 - 53:34**
- The development of AI tests and challenges like ARC 2 is discussed, aiming for more sophisticated and dynamic assessments.
- Insights from observing human cognitive development, particularly in children, could inform AI research and the creation of more generalizable AI systems.
@dizietz 3 months ago
I've been a big fan of Bongard problems, and the ARC test covers a similar problem space. I do think that approaches like Greenblatt's dynamic generation of solutions will push the state of the art forward. I also think that the kind of encoding the current models use and are trained on isn't fully conducive to this task, and more fine-tuned models might help. Also, I think ARC puzzles should be extended to three dimensions and more complicated transformations, to encode patterns and transformations that even 150+ IQ test takers have difficulty recognizing in <30 minutes. Of course, constructing that dataset is difficult work. Also, it would be interesting if someone took the ARC dataset and constructed an IQ test based on the questions. Given the ease of the questions even in the "hard dataset", possibly timing it like the Wonderlic would make sense. I would be curious what the distribution of that dataset would look like.
@samkee3859 3 months ago
Chollet's perception and articulation are unmatched.
@ceilingfun2182 3 months ago
Is this guy for real?
@randylefebvre3151 3 months ago
Yes, why? The presentation is on point.
@wege8409 3 months ago
Yes, this is the creator of Keras, the deep learning library that ships as part of TensorFlow.
@spagetti6670 2 months ago
Yes he is, and he's probably way smarter than you are.