Chollet's ARC Challenge + Current Winners

37,653 views

Machine Learning Street Talk

1 day ago

The ARC Challenge, created by Francois Chollet, tests how well AI systems can generalize from a few examples in a grid-based intelligence test. We interview the current winners of the ARC Challenge: Jack Cole, Mohamed Osman and their collaborator Michael Hodel. They discuss how they tackled ARC (Abstraction and Reasoning Corpus) using language models. We also discuss the new "50%" public-set approach announced today by Redwood Research (Ryan Greenblatt).
Jack and Mohamed explain their winning approach, which involves fine-tuning a language model on a large, specifically generated dataset and then doing additional fine-tuning at test time, a technique known in this context as "active inference". They use various strategies to represent the data for the language model and believe that, with further improvements, the accuracy could reach above 50%. Michael talks about his work on generating new ARC-like tasks to help train the models.
They also debate whether their methods stay true to the "spirit" of Chollet's measure of intelligence. Despite some concerns, they agree that their solutions are promising and adaptable to other, similar problems.
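To make the test-time fine-tuning idea concrete, here is a minimal sketch in Python using Hugging Face Transformers. The base model, grid serialization format, and hyperparameters are illustrative assumptions, not the MindsAI team's actual setup:

```python
# Sketch of test-time fine-tuning on a single ARC task: briefly fine-tune on
# the task's own demonstration pairs, then generate the test output.
import torch
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

def grid_to_text(grid):
    # Serialize a 2D grid of color indices (0-9) as newline-separated rows.
    return "\n".join("".join(str(c) for c in row) for row in grid)

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in base model
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = AdamW(model.parameters(), lr=1e-5)

def test_time_finetune(pairs, steps=20):
    # Each demonstration pair becomes a tiny training document for this task.
    model.train()
    for _ in range(steps):
        for inp, out in pairs:
            text = f"IN:\n{grid_to_text(inp)}\nOUT:\n{grid_to_text(out)}"
            batch = tokenizer(text, return_tensors="pt")
            loss = model(**batch, labels=batch["input_ids"]).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

def predict(pairs, test_input, max_new_tokens=64):
    # Prompt with all demonstrations, then let the model complete the answer.
    model.eval()
    parts = [f"IN:\n{grid_to_text(i)}\nOUT:\n{grid_to_text(o)}" for i, o in pairs]
    parts.append(f"IN:\n{grid_to_text(test_input)}\nOUT:\n")
    enc = tokenizer("\n\n".join(parts), return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**enc, max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0][enc["input_ids"].shape[1]:])
```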
Note:
Jack's team is still the current official winner, at 33% on the private set. Ryan's entry is not on the private leaderboard, nor is it eligible.
Chollet invented ARC in 2019 (not 2017 as stated).
"Ryan's entry is not a new state of the art. We don't know exactly how well it does, since it was only evaluated on 100 tasks from the evaluation set and reportedly does 50% on those. Meanwhile, Jack's team's (i.e. MindsAI's) solution does 54% on the entire eval set, and it is seemingly possible to do 60-70% with an ensemble."
Jack Cole:
x.com/Jcole75Cole
lab42.global/community-interv...
Mohamed Osman:
Mohamed is looking to do a PhD in AI/ML, can you help him?
Email: mothman198@outlook.com
/ mohamedosman1905
Michael Hodel:
arxiv.org/pdf/2404.07353v1
/ michael-hodel
x.com/bayesilicon
github.com/michaelhodel
Getting 50% (SoTA) on ARC-AGI with GPT-4o - Ryan Greenblatt
redwoodresearch.substack.com/...
Neural networks for abstraction and reasoning: Towards broad generalization in machines [Mikel Bober-Irizar, Soumya Banerjee]
arxiv.org/pdf/2402.03507
On the Measure of Intelligence:
arxiv.org/abs/1911.01547
I think the audio levelling might be a bit off on this, for the intro especially; I fixed it on the audio podcast version. Sorry if it's annoying.
Pod version: podcasters.spotify.com/pod/sh...
TOC (autogenerated):
00:00:00 Introduction
00:03:00 Francois Chollet's Intelligence Concept
00:08:00 Human Collaboration
00:15:00 ARC Tasks and Symbolic AI
00:27:00 Evaluation Techniques
00:35:23 (Main Interview) Competitors and Approaches
00:40:00 Meta Learning Challenges
00:48:00 System 1 vs System 2
01:00:00 Inductive Priors and Symbols
01:18:00 Methodologies Comparison
01:25:00 Training Data Size Impact
01:35:00 Generalization Issues
01:47:00 Techniques for AI Applications
01:56:00 Model Efficiency and Scalability
02:10:00 Task Specificity and Generalization
02:13:00 Summary

Comments: 280
@MachineLearningStreetTalk 7 days ago
Post show reflections: I know I was pushing the "LLMs are databases" line quite hard, and the guests (and Ryan's article) were suggesting that they do some (small) kind of "patterned meta reasoning". This is quite a nuanced issue. While I still think LLMs are basically databases, *something* interesting happens with in-context learning. The "reasoning" prompt (or database query, if you like) is parasitic on the human operator, but the LLM itself does seem to do some of the patterned completion/extension of the human reasoning prompt in context, i.e. "above the database layer" there is some kind of primitive meta-patterning going on which creates novel combinations of retrieved skill programs in the LLM. It's a subtle point, but I don't think I was able to express it in the show. - Tim
@TheTEDfan 7 days ago
Hi Tim, LLM output is, in the mathematical sense, a sequential chaotic process. The ARC challenge uses 2D grids, which require visual intelligence that is not purely sequential. It is entirely possible, and even highly likely, that multimodal models with the ability to focus sequentially on parts of images, generate possible similarities between the examples in context, and test those during inference will lead to solving the ARC challenge without the need for coding. The main problem with the ARC challenge is that the most advanced models (which are closed source) are not allowed to participate. It is difficult to claim those models don't work if they cannot participate. The solution is to ensure the advanced models are used in a mode that prevents them from training future models on the interactions. A very easy prompt that can be used on multiple hard problems is "keep trying and testing ideas until you find one that works." That is very much how humans solve such puzzles. LLMs are indeed not just databases if you see what they can do with sequential problem solving, e.g. coding and debugging entirely novel programs. Just as zooming in on the Mandelbrot set is not a database problem: both can take you in directions that you can only discover by going there.
@InfiniteQuest86 7 days ago
No need to apologize. As far as I remember, every guest you've had agrees with the "LLMs are databases" theory. So it's pretty reasonable to say if most of the experts in the field agree. I think the confusion comes from all the marketing hype that the companies put out to try to oversell their capabilities.
@InfiniteQuest86 7 days ago
@@TheTEDfan You can use whatever you want on your own. I'm sure you would get some reward if you could solve all the challenges with an LLM (since many have tried and apparently the best is still only 50%). But the point of the ARC challenge is to discover new ideas, so using an LLM goes against the spirit of it no matter what.
@Charles-Darwin 7 days ago
Excellent content! I'm awaiting the Geoffrey Hinton interview... Also, this avenue that you've been reporting on reminds me of a paper on the brain; as an aside to ML/AI, it's an interesting theory as to the organic functions we're trying to create digital counterparts to: [note: YT won't let me post links] Google search "National Library of Medicine Top-down predictions in the cognitive brain"
@shuminghu 6 days ago
ICL or RAG doesn't conflict with your view. They are basically a relatively trivial kind of program synthesis on top of the LLM, since they involve some heuristic or embedding-based discrete choices to guide the LLM. It's a rather weak form of system 2 on top of system 1 to help performance, IMO.
@paxdriver 7 days ago
I can't begin to tell you how much quality of life this channel has brought me over the years since my health issues have impeded my mobility. These videos are so stimulating and profound, I wish I could offer more. I so, so, so much appreciate your work Tim, and Yannic and Keith too. Thank you all so much.
@MachineLearningStreetTalk 7 days ago
Thank you!!
@AkhilBehl 6 days ago
I think the resurgence of the ARC challenge is one of the most interesting things to have happened this year in AI. Just the level of nuance and debate it has forced into the conversation can only be good for the community. Whether it is beaten or not, we’ll all be wiser for having gone through this exercise. Chollet really has devised an incredibly ingenious challenge.
@diga4696 7 days ago
I often share your insightful and well-explained videos with my children. I want to express my sincere gratitude to everyone involved in creating MLST's content. It's truly exceptional. I wish more content creators would prioritize clear, informative delivery over sensationalism, as you do so well. Thank you!
@MachineLearningStreetTalk 7 days ago
Thank you!!
@jmarz2600 7 days ago
I'm not sure I see ARC tests as examples of "abstraction" and/or "reasoning." I see them as testing our capacity, at the perceptual level, to automatically categorize concrete things into like "kinds" of things due to their perceived similarity (or dissimilarity, in the case of a missing similar piece). This is why young children (not yet operating at a very high level of verbal abstract reasoning) can solve these types of problems. The problems are resolved at the perceptual level, not at the higher (verbal) levels of abstract reasoning. If the images are flashed for a brief fraction of a second, you won't "perceive" the solution. Instead, you just stare at them over time, and your brain instantiates them by constructing neural pathways that are similar. And you see the "solution". This is why humans don't need large, labeled data sets to "get" what a cat is. A young child doesn't even need to be at the verbal stage to differentiate dogs and cats into different "kinds" of things.
@divineigbinoba4506 7 days ago
That's kind of the best measure of core human intelligence we have; this test is not contaminated by knowledge. There's a blur between knowledge and intelligence, and that's the issue with current IQ tests.
@MrMichiel1983 7 days ago
The idea of the ARC test is that you have various simple perceptual tests, but the AI needs to instruct itself to solve them. That way it needs to reason, which can be defined as needing to instantiate fewer brute-force perceptual solvers.
@johnvanderpol2 4 days ago
I think many people underestimate the complexity of our visual cortex. There has been interesting research based on persons with brain defects, and every time one finds new insights. What looks simple in the ARC challenge is millions of years of evolution. Language is only a few thousand years old; reasoning likely even less. Amazing conversation. Thanks
@Aarron-io3pm 7 days ago
You talk through and present everything so clearly, I can follow along and understand easily despite knowing nothing about ML, thanks!
@dylan_curious 8 days ago
Long-term planning and zero-shot learning seem like the last hurdles to AGI.
@davidafunk85 7 days ago
Zero-shot learning sounds easy enough, right? 😜
@benbridgwater6479 7 days ago
It's not so much zero-shot learning that's needed as runtime incremental permanent learning. It doesn't seem gradient descent would work, since you'd just be fine-tuning on new experience and would end up losing the pre-trained model's capabilities. Runtime learning might sometimes be one-shot, but other times it's generalization over repeated patterns, learning exceptions, etc. We really need to ditch gradient descent altogether and find a new incremental method.
@MrMichiel1983 7 days ago
@@benbridgwater6479 No need to ditch it altogether; gradient descent is how you learn to throw a ball, a little better each time. Gradient descent is not how you learn to reason a little better each time.
@MrMichiel1983 7 days ago
Also multi-context, so the AI can work on a text making use of a working context. That way there is no contamination between the working text and the meta-goals. This way you can ask the AI to be "didactic" in parts of the text and "critical" in other parts. I would think embedding slicing could solve that.
@benbridgwater6479 7 days ago
@@MrMichiel1983 Yes, but in our brain fine motor skills like ball throwing are learnt by the cerebellum, while cognitive pattern matching etc. is learnt by the cortex. What AGI needs is for "cortical learning" to be available all the time (preferably with no training vs inference time distinction).
@KitcloudkickerJr 7 days ago
I love these guys. What an amazing conversation.
@burnytech 3 days ago
Wow this is better than expected
@ArtOfTheProblem 7 days ago
Congrats! We both hit 131k subs at the same time :) What's everyone's take on few-shot prompting vs. test-time fine-tuning? My sense is that, in the limit, few-shot prompting would be all you need, and ultimately zero-shot (based on their point that as the foundation model gets bigger, you need less test-time tuning).
@BrianMosleyUK 7 days ago
Absolutely fascinating, rich and informative episode. Lots to consider here, thank you so much. 🙏👍
@dr.mikeybee 7 days ago
Error functions have a natural tendency towards parsimony, reflecting the principle of Occam's razor. This principle is also observed in nature, where simple and efficient solutions often prevail, suggesting that parsimony is a fundamental aspect of both human and other natural systems.
@azi_and_razi 7 days ago
Not always. There is a recent paper, "Neural Redshift", claiming that this is not universal for all neural networks and is in fact related to specific architecture choices: "But unlike common wisdom, NNs do not have an inherent "simplicity bias". This property depends on components such as ReLUs, residual connections, and layer normalizations." They show that randomly initialized networks (not trained yet) have different complexity properties based on metrics such as frequencies in Fourier decompositions, order in polynomial decompositions, and compressibility of the input-output mapping, and that this influences how hard it is to train networks and get good generalization. Their conclusions: "We examined inductive biases that NNs possess independently of their optimization. We found that the parameter space of popular architectures corresponds overwhelmingly to functions with three quantifiable properties: low frequency, low order, and compressibility. They correspond to the simplicity bias previously observed in trained models which we now explain without involving (S)GD. We also showed that the simplicity bias is not universal to all architectures."
@0xmassive526 12 hours ago
Never touched machine learning, don't know what a tensor even is (just seen it as a class in some machine learning code on Twitter), but 35 minutes into the video and I don't feel lost. You bet I'm subbing.
@GabrielVeda 7 days ago
I am nonplussed by the ARC challenge. LLMs fail on it simply because they are out of domain. These are visuospatial tests. Better pre-training and scale on a multimodal model is the path forward. All these algorithmic and symbolic add-ons are just hacks. Forget about search space. Just make a bigger model and let it do what it does best: build abstractions and spot patterns. I also think Chollet is massively underestimating the amount of visuospatial training humans receive in their lifetime, which is why I think it is fair to simply suggest multimodal scale as the ultimate and best solution to this challenge. It is perfectly fine to challenge Chollet's take on this. Beware reflexive deference to authority.
@NextGenart99 7 days ago
Yes, I kept saying this. The LLM's visual recognition capability isn't precise enough for this. It's not that the LLM doesn't know; it's just that it can't see it as clearly as people think.
@drhxa 7 days ago
Agreed! If you've ever really tested the best multimodal frontier models on images to their limits, you'll know they have terrible vision ability. I'd guess it's less than 1% as capable as a human. As Ryan pointed out in his blog post, if a human were read ARC problems aloud while blindfolded and asked to speak the output row by row, that's what you're asking today's models to do in solving this. Of course they won't do well, and neither would 99.999% of humans.
@benbridgwater6479 7 days ago
Sure, the easy way to solve these would be to train on a massive test set of similar problems, but that defeats the purpose of making progress towards AGI. You'd like an AGI to be able to solve problems that it is not familiar with and has not pre-trained on.
@TheTEDfan 7 days ago
Entirely agree. LLMs evaluate and generate sequential input and output. Visual patterns require a more parallel interpretation, which is entirely possible with neural nets. Very much overhyped. As soon as more visual multimodal models are trained, it will be fairly straightforward to let the model guess patterns and similarities, test these in context during inference, and conclude whether they work or not. And continue to search until there is a solution. Very much like humans. Nothing special about that. Visual data requires more data and compute, but nothing that is unimaginable in the near term.
@drhxa 7 days ago
@@TheTEDfan Yes, although I do question whether even high-quality visual perception will translate to visual reasoning. I suspect there are some things the human brain does because it's trained on very high-res continuous vision: we learn how things move in an image/video and continuously predict what they will look like over the following few seconds. It may require training on video directly, as opposed to just images. Or maybe the multimodality will transfer easily, idk.
@rubncarmona 4 days ago
In my mind we're simply looking for algorithms from which Gestalt naturally emerges.
@NextGenart99 7 days ago
I don't think the LLM's image recognition capabilities are precise enough for the ARC challenge. It's not that the LLM doesn't know; it's more that it cannot see as clearly as you think.
@drhxa 7 days ago
Yes, it cannot see, and (as part of that) it cannot reason in the spatial domain. Sooner or later this will be solved.
@sirkiz1181 7 days ago
Yeah, I think it's unclear whether simply solving the image recognition will be enough; it might not be. But the ARC test does feel a little pointless currently, when LLMs plainly can't see clearly.
@InfiniteQuest86 7 days ago
Yeah, LLMs are not precision instruments. They are good at getting the gist of things in any domain. This is just an extra layer that makes that even more pronounced. I would argue that if you made an LLM able to solve these, it would cease to be useful in any other domain. Plus, what the heck is wrong with people? It's not a language task, it's a vision+reasoning task. I don't understand why people try to use a language algorithm to solve literally everything now.
@NextGenart99 7 days ago
@@InfiniteQuest86 Yeah, I did a quick experiment where I described the scene of the first question on the test in great detail, and the model was able to get it correct because it was now a text-based problem. I then refreshed the chat and asked the model to describe what it sees, and it was evident that the vision capabilities were the issue, though I was already suspicious of this being the issue based on my long experience with the model.
@marcfruchtman9473 7 days ago
I tried this with GPT-4o, and it does understand the color scheme when I paste it in. It can't grasp the conversion, though. I think without some hints this particular problem is very difficult to solve.
@mouduge 8 days ago
At 5:38, it's not monotonicity, it's increasing vs decreasing amplitude. Just nit-picking. 😄
@MachineLearningStreetTalk 8 days ago
You passed the test, well done my friend :)
@marcfruchtman9473 7 days ago
Ah, looks like we both noticed that, but actually it is not increasing and decreasing amplitude; it's increasing and decreasing variation in amplitude.
@samifawcett4246 7 days ago
@@marcfruchtman9473 Actually it's the emotional state of a 5th-grade doodler, get it right please.
@puneeification 7 days ago
@@marcfruchtman9473 I mean, you really only have to read the legend under the picture, guys; it's not that hard...
@marcfruchtman9473 7 days ago
@@samifawcett4246 hehe
@aiwonderer 7 days ago
Very informative interview!
@shuminghu 7 days ago
The superposition example at 9:34 is not in clockwise order: it's yellow -> red -> pink -> white, with later ones on top. The left edge of the first one shows pink is on top of red.
@redacted5035 5 days ago
Timestamps:
00:00:00 Introduction
00:03:00 Francois Chollet's Intelligence Concept
00:08:00 Human Collaboration
00:15:00 ARC Tasks and Symbolic AI
00:27:00 Evaluation Techniques
00:35:23 (Main Interview) Competitors and Approaches
00:40:00 Meta Learning Challenges
00:48:00 System 1 vs System 2
01:00:00 Inductive Priors and Symbols
01:18:00 Methodologies Comparison
01:25:00 Training Data Size Impact
01:35:00 Generalization Issues
01:47:00 Techniques for AI Applications
01:56:00 Model Efficiency and Scalability
02:10:00 Task Specificity and Generalization
02:13:00 Summary
@dr.mikeybee 7 days ago
I think an autoregressive, self-supervised, next-token-prediction multimodal training system that uses BERT embeddings concatenated with abstract representations of images of Chinese characters would teach a model enough abstract spatial reasoning to pass the ARC challenge. The trick here is that these images have semantic meaning, and the Chinese language has incorporated spatial ideas with semantic purpose into characters.
@KaplaBen 7 days ago
9:54 I don't think it's clockwise. The first example shows that white should be on top, but also that pink should be on top of brown (there would be more brown in the solution otherwise). So I'd guess: white > pink > brown > yellow (in terms of z-index)
@benbridgwater6479 7 days ago
It's overlaying 4x4 quadrants of the original on top of each other, treating black as transparent. 4 over 1, then 3 over 1, then 2 over 1. Solution is resulting quadrant 1.
@marcfruchtman9473 7 days ago
@@benbridgwater6479 You can also think of it like layering colored papers on top of each other, with the black parts cut out. If the upper right is quadrant 1, and numbering clockwise in quadrants: lay down quadrant 4 first (all yellows + black), then quadrant 3 on top of it (all pink and black), then quadrant 2 (red + black), and finally the last layer, quadrant 1 (white and black). The black represents nothingness, or glass, or a pure alpha channel, or transparency, depending on your viewpoint; just as long as you let anything under black come through to the top. Once a color gets covered by anything other than black, it is superseded.
@simonahrendt9069 7 days ago
Your original comment is correct (white > pink > red/brown > yellow). The comment directly above confuses the order the quadrants are superimposed, ignoring that pink visibility trumps red/brown visibility.
@benbridgwater6479 7 days ago
@@simonahrendt9069 No - the only special color is black which acts as transparent when overlayed on top of something else. Otherwise what "trumps" what just comes from the overlay order - whatever ends up on top wins. Easy to verify in GIMP.
@marcfruchtman9473 7 days ago
@@simonahrendt9069 The "order" is Layer 1 (Yellow = Q4), Layer 2 (Pink = Q3), Layer 3 (Red = Q2), Layer 4 (White = Q1). Red covers pink; pink does NOT trump red. Q4 means quadrant 4, and you layer down going counterclockwise starting at Q4. (But you can name the quadrants whatever you want... the order does not change.)
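For anyone who wants to check this thread's reading of the puzzle in code, here is a minimal sketch, assuming the grid is a NumPy array of color indices with 0 = black; the bottom-to-top stacking order is one of the orderings proposed above, so treat it as an assumption:

```python
import numpy as np

def solve_overlay(grid):
    # Split the 8x8 grid into four 4x4 quadrants and stack them, treating
    # black (0) as transparent: later layers cover earlier ones.
    h, w = grid.shape
    tl, tr = grid[:h//2, :w//2], grid[:h//2, w//2:]
    bl, br = grid[h//2:, :w//2], grid[h//2:, w//2:]
    out = np.zeros_like(tr)
    for layer in (tl, bl, br, tr):          # assumed bottom-to-top order
        out = np.where(layer != 0, layer, out)  # non-black pixels overwrite
    return out
```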
@MachineLearningStreetTalk 7 days ago
Clarification on reasoning. "Knowledge acquisition = reasoning" is a great heuristic but clearly isn't exactly correct. I think it helps folks understand what is meant in this context, though. It might be more correct to say that reasoning is the thing we do when we "rearrange the variables" to construct models (in many cases composed of existing models we already have) to make sense of the world. My cohost Keith Duggar defines reasoning as **performing an effective computation to derive knowledge or achieve a goal**.
"We may have knowledge of the past but cannot control it; we may control the future but have no knowledge of it." - Claude Elwood Shannon
Science leverages control to gain knowledge. Engineering leverages knowledge to gain control. Reasoning is the **effective** computation in both.
Effective method: a method is considered effective for a class of problems if it meets these criteria:
1. It consists of a finite number of precise instructions.
2. It terminates after a finite number of steps.
3. It always produces a correct answer for problems within its class.
4. It can be executed by a human using only writing materials.
5. It requires no ingenuity, only strict adherence to the instructions.
@benbridgwater6479 7 days ago
Reasoning doesn't require knowledge acquisition - it can just be reasoning over known facts, using known methods. What reasoning does need is multiple steps and combinatorial application of knowledge. Essentially, intelligence = prediction, and reasoning = multi-step what-if prediction.
@MachineLearningStreetTalk 7 days ago
@@benbridgwater6479 This is just semantics - I agree with you. Reasoning can be described as "creatively recombining knowledge you already have". There is still an infinite space of recombinations which needs to be performed efficiently.
@khonsu0273 6 days ago
Tim, Yannic and Keith are the smartest tech journos I've ever heard of 😀
@earleyelisha 7 days ago
Hey Tim, could we ask Chollet about a broader measure of intelligence (physical, visual understanding of the world instead of abstract anthropocentric tasks)? E.g. cats, dogs, gorillas, etc. have immense intelligence about the world, but they wouldn't even rank on ARC.
@eva__4380 7 days ago
They can, though. Take any animal: they do understand concepts such as connectedness or inside vs outside.
@earleyelisha 7 days ago
@@eva__4380 Indeed. The ARC challenge isn't designed, though, in a way that would allow a non-anthropomorphic intelligence to illustrate this fact.
@6lack5ushi 7 days ago
I can back up everything he said with lower-domain examples, especially the 32k context degradation! I've written to OpenAI devs about this since the 120k context models came out. 4o was an improvement over Turbo, but it still lacks the nuance of the OG GPT-4 model. We still use that model where we can, but the context window hurts us.
@pliniocastro1546 7 days ago
Any examples of your lower-domain proposal?
@6lack5ushi 7 days ago
@@pliniocastro1546 RAG, recall, search. Grab any piece of text: after 32k you just get a significant drop in intelligence. Give it 100k tokens, ask it to identify or count the number of times a word is said, then use Command-F.
@JimStanfield-zo2pz 7 days ago
I enjoyed this video a lot. You're an excellent speaker, and you clearly illustrated some of the current issues with the LLM approach. Unlike Gary Marcus, you have fully convinced me that the deep learning method via ANNs is not up to the task of achieving full-blown human intellect. You have convinced me that there are absolutely additional priors needed that are not going to map very well onto this particular form of artificial learning. Thank you for the insight. A lot more work needs to be done.
@stevelk1329 7 days ago
I guess I'm missing something? The first ARC example: the three pairs of six drawings, numbered 15, 43 and 84, with each of the two sides in any pair called either A (on the left) or B (on the right). Given those definitions and the three pairs of six drawings, it seems like he gets part of the first one (15) wrong when he says the six drawings on the left are 'not connected together'; he then says (correctly) that the six on the right 'have a gap'. For 43, he gives a different explanation than the verbiage below the drawings and calls it monotonicity. He says on the right 'they're all going up or they're all going down' - I guess this could be considered correct depending on whether you approach each drawing from the left or the right side, even though they're described as all increasing. But then he says the drawings on the left are 'going up and down at the same time', which is not monotonicity (I just looked it up :-), is it? I don't know this stuff. What am I getting wrong? Thanks for an interesting video.
@luke2642 6 days ago
Around 1:56:15 we get to the nub of it... Traversing a manifold in the forward pass of an LLM, in activation space, is more than database retrieval.
@bwowekeith4472 4 days ago
ARC is really addictive
@TheLegendaryHacker 7 days ago
22:05 How does a simple version of the exact solution Chollet imagined go against the spirit of the metric?
@marcfruchtman9473 8 days ago
Thanks for this interesting video. Just getting started (first few minutes into the video), but it appears to me that Bongard's test at 4:44, number 43, regarding sets of amplitude, has an error. By definition, amplitude is the displacement vs the baseline (peak height or peak trough); therefore this question seems to have some errors, where the amplitude in some of the samples in set A is not increasing, and in some in set B is not decreasing. It appears to me that what they were actually looking for was the variation in amplitude, i.e. the answer should be that set A has "increasing variation of amplitude" and set B has "decreasing variation of amplitude" (where variation would be the peak-to-trough of the wave vs the previous wave's peak-to-trough). To be more specific: not all examples in set A are increasing in amplitude, but all are increasing in "variation"; and not all examples in set B are decreasing in amplitude, but they are all decreasing in variation. In that respect, it is important to note that the "quality" of the test question could reflect poorly on an AI, which can't really "complain" about the answers not being present if it is forced to choose from a set of answers.
@marcfruchtman9473 7 days ago
Additionally, at 10:00, the first 8x8 diagram converts to a 4x4 diagram, and it is converting incorrectly. It looks to me like the top-left quadrant is layer 1, going counter-clockwise: bottom left, then bottom right, and finally the top layer is top right. Stack the layers in order, bottom to top, and consider black as the absence of color, meaning black would appear like "glass". If you stack the upper-left quadrant first, you get pretty much all yellow; then stack the bottom left, and you cover with mostly pink, leaving some yellow showing through the black; then you stack the red (brown?), and that is where the "error" shows up. The 3rd quadrant of red should create 2 red dots at row 3: [R][R][P][P]. Then finally you add the last layer, which is why it "appears" to take precedence, because it is last. You get:
[Y][Y][W][B]
[P][P][P][R]
[R][R][W][P] (not [P][R][W][P])
[W][W][P][B]
In other words, the "answer" (at least as far as I can tell) accidentally has pink in the 3rd row, 1st column, and it should not.
@puneeification 7 days ago
Depends how formal you are about the definition of amplitude, I guess? Decreasing/increasing (peak-to-peak) amplitude, maybe? I don't think that has much bearing on the test itself, since the answer remains the same regardless of how nitpicky you want to get about the description you choose.
@marcfruchtman9473 7 days ago
@@puneeification If a signal is increasing in amplitude, it gets greater from the baseline. Therefore, question 43 incorrectly shows amplitudes both increasing and decreasing on both the left and right sides. Even if you consider peak-to-peak, that is still an amplitude off a baseline; question 43, right side, 2nd column, row 2, shows a rising amplitude from the baseline. A question needs to be formulated properly in order to be considered a valid test of intelligence.
@minecraftermad 8 days ago
29:00 Would Mamba 2 be effective with the symmetry transformation combination? It seemed to be a bridging concept between SSMs and transformers, at least if I understood the paper correctly. Another paper, on LLM grokking, seems like an important step towards reasoning.
@benbridgwater6479 7 days ago
Grokking is training-time generalization (learnt via gradient descent). What these example-based ARC tests require isn't learnt generalization over training samples, but rather the runtime ability to form new generalizations over each ARC problem's example before/after pairs. Not only is there no gradient descent available at runtime, but the generalization task is a bit different, since we know a generalization exists for each problem, so it's really a matter of search to find it (i.e. find a composition-of-transformations description for example #1, expressed in a generalized form that also works when applied to the other examples for the problem).
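As a toy illustration of that search framing (not any competitor's actual method), here is a minimal Python sketch that enumerates compositions of primitive grid transformations until one explains all of a task's example pairs; the primitive set is an assumption for illustration, and real ARC DSLs are far larger:

```python
import itertools
import numpy as np

# A tiny, illustrative DSL of grid transformations.
PRIMITIVES = {
    "identity":  lambda g: g,
    "flip_h":    lambda g: np.fliplr(g),
    "flip_v":    lambda g: np.flipud(g),
    "rot90":     lambda g: np.rot90(g),
    "transpose": lambda g: g.T,
}

def apply_program(names, g):
    # Apply a sequence of named primitives left to right.
    for name in names:
        g = PRIMITIVES[name](g)
    return g

def search_program(pairs, max_depth=3):
    # Enumerate compositions shortest-first; return the first one that maps
    # every example input to its example output (a generalized description).
    for depth in range(1, max_depth + 1):
        for names in itertools.product(PRIMITIVES, repeat=depth):
            if all(np.array_equal(apply_program(names, i), o) for i, o in pairs):
                return names
    return None

# Usage: prog = search_program(demo_pairs)
#        prediction = apply_program(prog, test_input)
```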
@mrpocock 7 days ago
The difficulty of this task doesn't come from long range interactions. The problems are very small.
@Cygx 7 days ago
Are we sure there is only one correct answer for all the questions, though? What if the test set is based on our understanding of the world and its rules, when in fact ML can find hidden rules that are just as valid for the test set, but aren't rules we would have thought of as valid?
@marcfruchtman9473 7 days ago
Excellent point.
@deadeaded 7 days ago
You can say that about literally every single ML result. Whether they perform poorly or well, they're always picking up on patterns in data. For example, when an NN learns to identify a snowplow based on the presence of snow in an image, it is picking up on a valid pattern (snowplows are more likely to show up when there's snow around). The challenge has always been to make them learn the right pattern.
@wwkk4964 7 days ago
Yes. Remember when you were being tested and you had to guess what the questioner thought was the right answer, even if it wasn't? Some of our tests of ML algorithms follow that pattern: the algorithm has to guess not only the right answer, but also make sure it's what we think is right.
@marcfruchtman9473 7 days ago
@@wwkk4964 LoL, well, if a student shows me that a question is formulated incorrectly, I give them credit, and then I fix the question. This question set has been out in the wild for HOW LONG now, and still no one has fixed it?
@wwkk4964 7 days ago
@@marcfruchtman9473 Negative numbers did not exist for European mathematicians until well over a millennium after Brahmagupta gave rules for computation with negatives and zero.
@holonen 7 days ago
I'm curious what prompt might trigger an LLM to "deploy" the kind of visual pattern seeking that an ARC challenge demands. I came to think about the so-called "gestalt laws" that seem to be hard-coded into our perception and thinking. From a certain vantage point, these could be understood as the kind of "psychological priors" that might help when solving ARC problems. So: might a prompt like "Always use the gestalt laws when solving problems" put an LLM in a mode that, together with chain-of-thought and visual thinking, does the trick?
@holonen 7 days ago
"Regarding the prompt "Always use the gestalt laws when solving problems", this instruction could potentially enhance an LLM's ability to address the ARC problem in the following ways:
Principle of Similarity: Grouping similar elements together can help in identifying patterns and relationships within the data, facilitating better comprehension and reasoning.
Principle of Proximity: Clustering elements that are close to each other can aid in understanding the structure and context of information, which is crucial for solving complex problems.
Principle of Closure: Encouraging the model to perceive incomplete shapes as complete can improve its ability to fill in gaps and make more accurate predictions based on partial information.
Principle of Continuity: Recognizing continuous patterns can help the model follow logical sequences and enhance its reasoning capabilities.
By incorporating these gestalt principles, LLMs could potentially improve their performance on the ARC benchmark by enhancing their pattern recognition, problem-solving, and reasoning skills, leading to more human-like comprehension and decision-making abilities."
@djcardwell 5 days ago
once a machine realizes "I think therefore I am" we've achieved AGI
@MrBillythefisherman 7 days ago
To rule out that the core problem for an LLM is that this problem is 2D: has anybody tried to create a 1D version of ARC? The same type of problems, just in a 1D grid?
@marcfruchtman9473 7 days ago
Most LLMs convert this into 1D internally... they just take every pixel and convert it into a very long one-dimensional array, then consume it in one shot, don't they?
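That flattening is easy to make concrete; a minimal sketch (the row-major layout is the usual choice, but separators and ordering vary between entries):

```python
def flatten(grid):
    # Row-major flattening: cell (r, c) of an HxW grid lands at index r*W + c,
    # so "directly above" becomes "W positions earlier" in the 1D sequence.
    return [c for row in grid for c in row]

grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
seq = flatten(grid)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
W = len(grid[0])
assert seq[4 - W] == 2  # the cell above the 5 (at index 4) is the 2
```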
@DavidConnerCodeaholic 8 days ago
Cool!
@AhtiAhde 6 days ago
Considering the Aristotelian blank slate idea and LLMs in general, we would perhaps benefit from philosophers doing some work for us in trying to well-define what is going on here. Fortunately, I have been doing just that. Aristotle went out of fashion during Enlightenment-era rationalism, from which the digital computing paradigm emerged. At the same time we get the idea that bottom-up causality explains everything and that language has a pure logical form. Then comes the 20th century, which states that digital arithmetic systems cannot compute quantum information efficiently (Schrödinger equation), that bottom-up causality doesn't cut it (Gödel), and that language is evolutionary and logic has no content (Wittgenstein). In practice this means that natural language has Zipf's law (fractal dimensionality of word frequency), where the most frequent words are syntactic; in other words, they give logical support to the content of communication. Guess what? When we measure sequential windows of Zipf's law, the grammatical top frequency stabilizes around a thousand tokens, which probably explains the quality difference between GPT-2 and GPT-3, why other technologies see a similar quality increase when they reach the same limit, and why Transformers have not had qualitative leaps from further scaling. Logical structures are blank slates in a sense. These are the "genetic brain organs" in humans (Chomsky), or the architecture of the neural network training apparatus / feature engineering. What happens to the content after learning is the interesting thing that ARC-like tests should go for. According to Eero Tarasti and his Existential Semiotics, humans encode phenomenal patterns together with our noumenological existential volatility. In other words, emotional frustration allows us to easily switch context when our initial problem solving goes wrong. Neural networks do not have that. They are purely phenomenologically limited to the target function. In quantum machine learning we also get the phase shift parameter, which could be "emotionalized", but doing it in a human-like manner would be nearly impossible. Still, it would give something more controllable than Transformers, where you have to guess the correct magic word for the invocation of viable secondary contexts. In other words, human brains have evolved for our own environment for a very long time, which gives us superpowers, because our "no-content" apparatus is always inside the distribution. I call this Cartesian digital bottom-up computational system Hobbes's Golem (Descartes said computers are impossible; Hobbes said "challenge accepted"); Hobbes's Golem is not a product of natural evolution, but is always built with engineering principles. We both might start as blank slates, but the way our logic organs work and the way our content gets encoded is fundamentally different. Trying to prove there is no difference is silly. The interesting question is: how does that matter? At which point should we take the information pollution of digital environments seriously and try to build more "brain friendly" user experiences? Are LLMs part of that or against it? I think LLMs might be good for "more democratic access", but problematic as "unverifiable content consumption"; should we just start paying journalists and educators a proper salary so they could do their jobs, rather than trying to synthesize away the human component in information refinement? It would be interesting to see a show around these domains.
QNLP, Aristotle, Wittgenstein, Complexity Sciences, 4E cognition (Post-Cognitivism and their reaction to Connectionism).
@drdca8263 5 days ago
I don’t see how you get “bottom-up causality not working” from Gödel’s completeness theorem?
@jasonbartlett1357 7 days ago
The "winner" would be the person scoring 100%. The "leader" is the person with the current highest score.
@sofia.eris.bauhaus 4 days ago
9:10 it's not clockwise, magenta is on top of red, the order is white-magenta-red-yellow.
@Will-kt5jk 7 days ago
9:56 *anti-clockwise
@theoreticalorigamiresearch186 6 days ago
Can you get Maurice Weiler on the podcast?
@krashnchoudhary2680 6 days ago
I'm interested in the AI future.
@geldverdienenmitgeld2663 7 days ago
Can we learn to create symphonies from seeing frequency images of audio files? No. It is not a matter of intelligence but a matter of state representations. LLMs have no sense for the ARC states; therefore these tasks are difficult for them. But it is no problem to train NNs on such tasks, and they would be able to solve them, because they would learn to understand the states like they understood linguistic states.
@memegazer 7 days ago
What does it say about society and our traditional metrics of performance, assuming the premise that LLMs are not "true intelligence" is valid? Suppose the analogy "LLMs are just lookup databases" is true... even though, if so, they are databases that machines organized, not humans. But what does it mean if that sort of compression/memorization can perform tasks that society values as useful, even if we want to debate whether it is actually a sign of intelligence? For some reason I am reminded of how chimps outperform humans in the aptly named "chimp test", while nobody is particularly impressed with the kind of profound insight that can offer to our beliefs about what the relevant philosophical terms might actually imply. Either way, it seems obvious to me that we are on the verge of major disruption, regardless of whether people want to debate the metrics or the semantics. We have still entered the age of the thinking machine, in my view, because no longer are people discussing whether it is or is not possible for machines to reach human levels of performance on some arbitrary task... now the question is what is the best metric to use for comparison... IMO this is what Turing meant to broach with his Turing test thought experiment. Not as a true test to determine some measurable difference... but rather he anticipated a time when attitudes and perceptions would shift focus beyond the mystical beliefs about what it means to think, and more towards the demarcation of how that plays out as an application and how humanity will respond in terms of what we value.
@InfiniteQuest86 7 days ago
Watch some more of these videos. Pretty much everyone he has ever talked to agrees that LLMs are just database lookup and can't do any reasoning. All the experts in the field agree on this. And it just makes logical sense: by what mechanism could they possibly do reasoning? There's nothing in the algorithm to do that. LLMs are intelligent in the way Wikipedia is: it contains a lot of information, but it isn't going to be able to reason over it. ARC is a perfect example. The prompt "Tell me the 5th letter in this sentence" is another. It can't even handle these simple things, in large part because it doesn't understand anything. It just probabilistically selects the next likely word given the previous words. People really need to come to terms with this. That's literally all it's doing.
@ATKS-mz2oo 4 days ago
recommend more please help
@takamotoyagami4222 7 days ago
Can someone explain the puzzle at 9:50?
@benbridgwater6479 7 days ago
It helps to have a programmer's mind... Call the top two quadrants of the 8x8 pattern "1 & 2", and the bottom two quadrants "3 & 4". Now, treating the black squares as transparent, copy quadrant 4 over 1, then 3 over 1, then 2 over 1. The output 4x4 pattern is the resulting quadrant 1.
@marcfruchtman9473 7 days ago
@@benbridgwater6479 Yes, except the "answer" they provide in the video is incorrect. I don't understand why they have it wrong, but technically the 3rd row should be RED RED WHITE PINK, not PINK RED WHITE PINK.
@peterkonrad4364 7 days ago
Also: I don't understand how tokenization is still a thing. I mean, it may have been useful 15 years ago to speed things up, but nowadays the first two layers could do the tokenization. Just take the ASCII characters. You lose two layers of efficiency, but you would gain so much: suddenly the model can count letters in words, count words, etc. without having to resort to external tool calls. And you could do all of this 2D-grid-of-ASCII-characters stuff natively. ASCII as direct input is an obvious choice.
@benbridgwater6479 7 days ago
The problem with that - if you start by embedding characters rather than sub-word tokens - is that the initial embeddings (before they start getting augmented by transformations) won't have any meaning. It makes the learning task much harder. Starting with tokens, the model can learn a word2vec-type embedding where some semantics is already captured.
@peterkonrad4364 7 days ago
@@benbridgwater6479 Yeah, I get that. My question is only: how many layers of, say, a multilayer perceptron does it take to emulate the tokenizer? Take the word "tokenizer". Let's say these are two tokens: "token" and "izer". It is clear that using these as input has many advantages. I just wonder if it really takes so much computing power of a neural net to learn this tokenization from the single characters? Isn't it after layer 2 or so already basically the same? It just has to combine the five characters into one thing, and then you have the embedding of that thing, which is equivalent to the "token" embedding.
@peterkonrad4364 7 days ago
@@benbridgwater6479 And isn't it true that your level-0 input actually shouldn't have any meaning or semantics? It gets its semantics on the way through the layers, but not at the input level. A pixel in a cat picture doesn't have a meaning either until it gets further down. I agree that it slows things down and makes it harder, but it should be possible, and it should have other benefits.
@drdca8263 5 days ago
I imagine rather than ASCII you mean one token for each possible byte, rather than just the 127 or so that encode for an ASCII character? (So that it can still handle Unicode characters)
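A quick sketch of what byte-level input looks like on the input side (the subword split shown in the comment is an assumption for illustration); whether early layers can re-learn subword groupings from raw bytes is exactly the open question in this thread:

```python
def byte_ids(text: str) -> list[int]:
    # Byte-level "tokenization": one id per byte (0-255), no learned merges,
    # so any Unicode string is representable with no out-of-vocabulary case.
    return list(text.encode("utf-8"))

print(byte_ids("tokenizer"))  # 9 ids, one per character here (all ASCII)
# A subword (BPE) tokenizer might instead emit 2 ids, e.g. "token" + "izer".
```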
@mikelord93 7 days ago
Why don't we uncrystallize the models? Let them run in training mode and use a function to determine the backpropagation signal strength. This should make them able to slightly alter the weights when they are wrong. Give them neuroplasticity and see what comes of it.
@bobbond6724 7 days ago
Reasoning crystallised into an LLM is different from reasoning crystallised into a human brain how, exactly?
@cakep4271 7 days ago
Flexibility? Human crystallized intelligence is not really very crystallized at all. Even our old memories are constantly being modified by more recent experiences.
@benbridgwater6479 7 days ago
The main point is that the human also has fluid intelligence (combinatorial reasoning/problem solving) while the LLM doesn't.
@Julian.u7 7 days ago
You throw the word "non-computable" around nonchalantly. Please elaborate.
@MachineLearningStreetTalk 7 days ago
I think we address this in some of the other shows we have on this, i.e. kzbin.info/www/bejne/gGHTkKeef6-hpdE and kzbin.info/www/bejne/o3a5n6hjgL-dp5Y - but we will be sure to address it in the upcoming show with Chollet. The basic idea is that, just as some Bayesian quantities are computationally intractable because they require you to consider every possible value of a thing, i.e. an infinite number of things, Chollet's measure of intelligence also requires you to consider the space of all possible programs, which clearly isn't possible to do on a computer.
@mikezooper 7 days ago
We need a kind of AI baby that learns by interacting with the world and living alongside humans. An evolutionary algorithm needs to reward AI that uses less data for higher inference. How can that be done? Could something like that be run using a basic simulation of some problems?
@drdca8263 5 days ago
The reason there hasn’t yet been a successful attempt at this, is not for lack of people thinking of it.
@Soul-rr3us 7 days ago
Discord is the best
@fai8t 7 days ago
Automatic Reference Counting
@luiscunha6657 7 days ago
Counting on ChatGPT to summarize this. People don't like the "bitter lesson" about AI. Actual chartered psychologist here.
@MachineLearningStreetTalk 7 days ago
The "bitter lesson" was wrong though, because all our best models work precisely because they have a tonne of hand-designed engineering and priors. Believing in scale was a cute idea a couple of years ago
@luiscunha6657 7 days ago
@@MachineLearningStreetTalk Thinking it's wrong is a cute idea nowadays. Maybe 70 years, and a stage of the field where one reference manual with 1000+ pages called "Modern AI" (Russell/Norvig) dedicated a (very) few pages to ANNs, in a sub-section of a chapter, and Hinton having papers rejected because one about NNs was already enough for a top AI conference, wasn't enough. People need to turn to philosophy and make knots in their minds. Do you really think that was the way connectionist ideas were introduced in psychology? Or what originated the Transformer architecture? Or even what showed the Transformer could go from GPT-2-level performance to GPT-3 and beyond? This is the intuition: your brain has neurons. Neurons do a certain type of information processing. You have a lot of them, and evolution made human neurons organize in a particular way that gives humans cognitive capabilities far beyond what other animals have. In neuroscience, many folks are convinced the changes in the human brain were also mainly a matter of scaling (number of neurons and connections in certain areas: no need for "hand-designed engineering" there too; cute, huh?). But no doubt you are much smarter than I am (no irony or cynicism here: just something I believe). I just think you have too much of a "philosophical" tendency and a profound need to complicate stuff. There was a time I regularly followed your show: it was fresh, and ideal to watch over my kid as he played in the street. But nowadays, if this is your idea of street talk... Kind regards, and thank you for answering my comment. I feel honored.
@luisluiscunha 7 days ago
@@MachineLearningStreetTalk Not believing in scale is a "cute" idea nowadays. Believing in it was a bold stance that few took; kudos to Frank Rosenblatt and Geoffrey Hinton. Undoubtedly, their inspiration came from the brain and their backgrounds in psychology. The human brain isn't that different from other animals near our evolutionary line. Recently, Demis Hassabis, who also has a background in neuroscience, joined Hinton in criticizing Chomskian ideas about language. There's no time for discussions now, but it's unfortunate that my more curated answer, written on a tablet, was deleted. Kind regards. As someone who isn't a native English speaker, I thought, "Why not ask ChatGPT to improve my grammar?" After all, if AI can mimic the brain, it can surely handle my sentences!
@MachineLearningStreetTalk 7 days ago
We have interviewed connectionists; try the Nick Chater interview. NNs are nothing like the brain whatsoever; you might enjoy the Max Bennett interview when we release it. Unfortunately, due to the complexity of the topic, I can't address your other points here, but suffice to say we have addressed them many times on previous episodes.
@luisluiscunha 7 days ago
@@MachineLearningStreetTalk I appreciate the time you took to answer me. Thank you very much. Keep up the good work.
@andybaldman 7 days ago
Complaining about what ISN'T AGI is like publishing your password on your Facebook page.
@Bigre2909 7 days ago
I'm feeling dumb: I must be an LLM.
@thorvaldspear 7 days ago
5:34 I still can't understand this one even after 3 replays. I guess I'm not generally intelligent!
@thorvaldspear 7 days ago
After reading @mouduge's comment I now recognize it as increasing vs decreasing amplitude. Took me a while.
@marcfruchtman9473 7 days ago
@@thorvaldspear No, that is incorrect... amplitude is supposed to be referenced to the baseline. Therefore, technically, the answer provided by the page displayed in the video is incorrect. Set A has increasing variation of amplitude (on the left), and Set B decreasing variation of amplitude (on the right). The answer regarding amplitude alone is not correct, specifically because the right side, 2nd column, row 2, shows a rising amplitude from the baseline. However, if you modify the question to be about "variation" in amplitude, then it makes a lot more sense.
@thorvaldspear 7 days ago
@@marcfruchtman9473 You're overthinking it.
@marcfruchtman9473 7 days ago
@@thorvaldspear If I were to look at the amplitude of a wave on an oscilloscope, I would measure it by looking at the peak versus the baseline. The question clearly mixes up decreasing and increasing amplitudes on both sides of the image. So whoever wrote the question was either thinking of variation of amplitude but never wrote down the word "variation", or simply misunderstood it. Either way, the question/answer is not valid.
@DouglasMiles 7 days ago
I don't really see why people assume a NN helps the LLM beyond being a probabilistic distributed hash. Why do you/they think the backprop ever mattered beyond probabilities being normalized? A reasonable common-sense assumption is that the LLM is merely recursively best-fitting crystallized tokens. I predict that in a couple of years we will be laughing at how much extra work could have been avoided with a more efficient probabilistic hashing system.
@MachineLearningStreetTalk 7 days ago
This is true, but the generalisation power of LLMs is (in my opinion) that they implicitly permute many different symmetry-based variations of the training data, so there is a surprising amount to find if you guide the search process. Of course, these permutations will just be "in the neighbourhood" of the data it was trained on, so the space of creativity is still grounded by the source data, the inductive prior used and the search query (the prompt).
@mikes1037 6 days ago
I do not see this as reasoning; this approach is "generate X solutions and check what sticks to the wall".
7 days ago
This kind of approach, trying to understand "how human intelligence solves problems" and doing the same, seems too simplistic. It recalls "symbolic AI": think about how we solve problems and try to figure out how to make a machine do the same!! I think "connectionism" still seems much more promising: give it the data, let the system "interpolate" (call it what you will) no matter how, and expect the magic to emerge! The secret of transformer models is the ability to make mathematical connections (inductive) and to let language make another layer of human-experience connections (deductive) on top. And this seems to break down the idea that AI is only about hardware computations, precisely because the second layer (linguistic) is not math, even though it is supported by math! Every time we try to explain what intelligence must do, we are on the wrong path! We still don't know what intelligence is! Because we don't know, we cannot think in terms of "intelligence must do this" or "must be like that"! That's my guess!
@mrpocock 7 days ago
Can a solution win ARC even if it can't do anything else?
@benbridgwater6479 7 days ago
Yes - the rules are on Kaggle, and there's nothing in them about it needing to do anything else. Obviously there's quite a perceptual component to the challenge, which perhaps favors hybrid approaches (neural perception + symbolic program synthesis/search), but no rule says the solution has to be one.
@mobiusinversion 7 days ago
ARC creates a lot of work because of domain mismatch
@MrBillythefisherman 7 days ago
Surely the issue for an LLM is that it's just a 1D token predictor (strings) rather than the 2D token predictor (images) required here? I presume someone has tried to turn an LLM into a 2D next-token (pixel) predictor?
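For what it's worth, the usual workaround is exactly that: serialize the 2D grid into a 1D token string with an explicit row delimiter so the model sees consistent geometry. A minimal round-trip sketch follows; the encoding is an assumed toy format (it works because ARC colors are the single digits 0-9), not any particular team's representation.

Grid = list[list[int]]

def encode(grid: Grid) -> str:
    # One digit per cell, newline as the row delimiter.
    return "\n".join("".join(str(c) for c in row) for row in grid)

def decode(text: str) -> Grid:
    # Invert the encoding: split on newlines, one cell per character.
    return [[int(ch) for ch in line] for line in text.splitlines()]

grid = [[0, 1, 0],
        [2, 2, 2],
        [0, 1, 0]]

text = encode(grid)
assert decode(text) == grid  # lossless round trip
print(text)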
@user-lb5cp5mw4u 7 days ago
I think models are bad at this kind of test mostly because they are not trained for it. AI researchers have mostly focused on text, and even vision models are mostly optimized for object recognition, not pattern recognition and spatial intelligence. So most of the ARC test's difficulty comes from picking a field that has been largely ignored by AI scientists, where model progress is lagging behind. And of course this kind of problem is really easy for humans, because recognizing predators is a high-priority task for survival.
@peterkonrad4364 7 days ago
How is an AI model like an LLM supposed to know about spatial relationships? Here:
ABC
DEF
GHI
For an LLM this is just the text "ABC DEF GHI". How is it supposed to know that B is "above" E, or that if you go diagonally from A towards E you end up at I, or that A is two to the left of C? It has never learned about that. It has no collection of features, representations, and vector embeddings that say "to the left of", "above", "two steps further", "diagonal", "straight", "inside", or "outside" (E is "inside" of B, D, F, H), etc. At least not a text model; an image model, a video model, or a multimodal model maybe. But the LLM would have to have access to these features, and it would need a "grid mode" to run in whenever a problem with a 2D grid comes up. Or take Connect 4: draw a 6-by-7 grid; the discs fall "down". How is the LLM supposed to know what down is? It doesn't know that my monitor is standing upright on my table, that this direction looks like down to me, that gravity is switched on, etc. These supposedly super-intelligent models can't even count the correct number of discs on a board! Training models for this would go like this: you come up with thousands of example grids and let humans label what they see there. For example: a 5-by-5 grid with three blobs; the red blob is to the left of the blue blob; the red blob consists of 3 pixels and the blue one of 5; and so on. But you can already see that this is not as easy as labelling cat and dog pictures! There is much more detail to be described; it is much more abstract. The grid is already an abstraction of the real world we humans live in, which is much smoother! That's the problem.
@drdca8263 5 days ago
From the newline character, and lots of examples in the training set?
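Building on the newline observation: once the row width is fixed by the delimiter, relations like "above" and "diagonal" reduce to arithmetic on character offsets, which is at least in principle learnable from examples. A small illustrative sketch (the helper function is hypothetical):

def cell_index(text: str, row: int, col: int) -> int:
    # Index of cell (row, col) in a newline-delimited grid string.
    width = text.index("\n")          # characters per row
    return row * (width + 1) + col   # +1 accounts for the newline

text = "ABC\nDEF\nGHI"

# "B is above E": same column, one row apart.
assert text[cell_index(text, 0, 1)] == "B"
assert text[cell_index(text, 1, 1)] == "E"

# "Going diagonally from A through E lands on I."
diagonal = [text[cell_index(text, i, i)] for i in range(3)]
assert diagonal == ["A", "E", "I"]
print("spatial relations recovered from a flat string")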
@xthesayuri5756 8 days ago
Someone already got above 50% 😂
@takamotoyagami4222 7 days ago
Did someone actually get above 50%?
@Matt-yp7io 7 days ago
Yeah, though not adhering to the rules of the competition... and not validated on the actual test set.
@television9233 7 days ago
18:00
@AChonkyBird 7 days ago
my jippity got 110%. Amazeballs.
@wwkk4964 7 days ago
Yes, it will be 100% soon. Then we will know that the ARC test was nothing like what it would take to get to what these people think AGI is supposed to do.
@bjy789456 7 days ago
You're ruining the dataset by explaining your reasoning, which the LLMs are now going to be training on.
@Mkoivuka 7 days ago
That's fine, this is the public dataset. The method described has not yet been tested on the private dataset (where it wouldn't be allowed).
@Perry.Okeefe 7 days ago
Man, I went from watching a Yudkowsky lecture to this... spooky.
@mikezooper 7 days ago
Just need an academic's brain in a jar, linked up to a Raspberry Pi. Full AGI solved. Only joking! Amazing talk.
@twirlyspitzer 7 days ago
A general intelligence algorithm is not that intractable after all...
@tigran.aghababyan 8 days ago
I doubt some Bushman could pass the ARC-AGI test. Does that mean he is not intelligent? I think Francois Chollet really underestimates the role of our experience when he talks about our ability to solve a "new" task.
@TenFrenchMathematiciansInACoat 8 days ago
Do you have a better test? Or any evidence a random tribal person couldn't perform well on this test?
@divineigbinoba4506 8 days ago
I'm sure cavemen would be able to pass it; if not, they wouldn't have survived.
@tigran.aghababyan 8 days ago
@@divineigbinoba4506 The average IQ of Bushmen (someone like cavemen) is estimated by Richard Lynn at 55. And they have survived.
@divineigbinoba4506 8 days ago
Of course they won't pass all the tests, but they'll surpass current LLMs... Cavemen's core knowledge would be less than ours. If you look at some of the ARC tests, it's mostly pattern recognition and nothing special.
@naninano8813 8 days ago
Glad this is being reported on. LLMs are overrated.
@ckq 8 days ago
LLMs are far more generally intelligent than any other kind of model (say, image or audio), since language training data carries more knowledge. Image/audio generation models are essentially specialized, with less intelligence: those media hold a large amount of data but minimal knowledge. I think what would be better than an LLM is a logic machine that prioritizes truth over plausibility.
@therainman7777 8 days ago
@@ckq We learned a long time ago that symbolic AI just isn't going to work, so the sort of logic machine you're describing is unlikely to ever work. Unless you mean something else by it.
@nikbl4k 7 days ago
I would call it a logic machine that has *some* priority, and we should keep exploring what the best ways to think about those are.
@TheManinBlack9054 7 days ago
Doesn't really seem overrated now that LLMs have managed to beat this and set a new SOTA, does it?
@mikezooper 7 days ago
This gets tiresome. LLMs are intelligent at certain things and awful at others, just like ALL humans.
@ps3301 7 days ago
If there are tons of humans trying to solve this ARC challenge and we still can't find the solution, at least we can prove humans aren't so intelligent after all.
@nellyx8051 3 days ago
When AI solves the ARC challenge we will conclude it wasn't a measure of intelligence. Same story for the last 50 years.
@MachineLearningStreetTalk 2 days ago
If it turns out to be possible to cheat/shortcut it, absolutely. Dileep George speaks about the "perceptual leakage" of ARC here: substack.com/home/post/p-145553885 - it's only a good benchmark if it can't be gamed, and I agree it probably will be eventually.
@djcardwell 5 days ago
Are all AI researchers this depressed???
@InfiniteQuest86 7 days ago
This whole thing makes me pretty angry. I guess this type of gaming the system is always going to happen when money is involved. They tried to game the system to win in pretty bad faith, in my opinion. The ARC challenge clearly states that the purpose is to find new ideas, which means that if you use an existing thing like an LLM you should be disqualified; even if you contorted yourself horribly with crazy prompt engineering (which it sounds like they did), you are still going 100% against the spirit of the competition. Secondly, training on extra examples to try to memorize is strictly against the intent of learning from a few examples. Yes, you can do that, but that is not the point of ARC, so that should also be disqualified. I can't tell whether they know what they are doing and are just trying to protect themselves, or whether they really believe their own BS. Basically, if you can't do it the way it was intended to be done, you are doing it wrong and need to come up with a new idea, which is the literal point of the competition. They didn't put down so much money so you could just use an LLM. Duh, of course people can try to do that. No one cares. They are trying to find new ideas.
@damirelsik4996 8 days ago
F I R S T
@holyclock 4 days ago
Another CAPTCHA, ugh.
@myspace5671 7 days ago
One of the stupidest things ever: give a computer a test designed by humans, then say humans do better at it. WTF.
@mrpocock 7 days ago
That's what people used to say about playing chess.
@drdca8263 5 days ago
Who else do you suggest design the test?
@user-bt2dd4hj6e 7 days ago
Gosh, I wish the guests didn't speak like they'd been sleep-deprived for days. I'm sure they are all brilliant people, but listening to them is another story, ugh...
@AkhilBehl 6 days ago
I think the resurgence of the ARC challenge is one of the most interesting things to have happened this year in AI. Just the level of nuance and debate it has forced into the conversation can only be good for the community. Whether it is beaten or not, we’ll all be wiser for having gone through this exercise. Chollet really has devised an incredibly ingenious challenge.
@0xmassive526 12 hours ago
Never touched machine learning, don't know what a tensor even is (I've just seen it as a class in some machine learning code on Twitter), but 35 minutes into the video and I don't feel lost. You bet I'm subbing.