New ARC-AGI high score! 55.5% (MindsAI)

Views: 10,657

Machine Learning Street Talk

A day ago

Comments: 39

@ginogarcia8730 a month ago

why it gotta sound like a montage before something happens in the Terminator haha

@RickySupriyadi a month ago

Well, when it reaches 100% the music will change

@CodexPermutatio a month ago

Wow! 55.5% so soon.

@szebike a month ago

@NickWilliams-v3q The censorship, both by YouTube's brainless bots and by some channel owners, has become insane.

@7TheWhiteWolf 5 days ago

88% now.

@deadeaded a month ago

I'm sorry, but the slow zoom at 4:04 onto Francois wearing a backpack is hilariously dramatic.

@davidrichards1302 a month ago

Now now... be nice. It was poignant. Hilariously poignant.

@Low_commotion a month ago

@davidrichards1302 On the contrary, I agree with OP but I loved that it was

@davidrichards1302 a month ago

@Low_commotion 🙄 There goes the souffle.

@opusdei1151 a month ago

This feels like a movie trailer. 🙂

@opusdei1151 a month ago

@NickWilliams-v3q Yeah, the censoring is coming from YouTube. My comments also get censored sometimes. I don't know how much is public, but you can see their submission on Kaggle.

@opusdei1151 a month ago

@NickWilliams-v3q Kaggle

@TimeLordRaps a month ago

I feel an oversupply of talent, and if you can maximize on that through your channel, you're doing the right work as has seemed to always have been the case. Happy to have been subscribed for so long.

@KevinKreger a month ago

Mind AI is awesome! Thanks for sharing. Great video. I'm working on test time fine tuning🎉

@binig.4591 a month ago

i wonder how task-specific these solutions are.

@ievutec a month ago

Is there any ETA on the categorical deep learning episode (Symbolica and DeepMind folks)? Been waiting on it for a while :)

@MachineLearningStreetTalk a month ago

Sorry for the delay on that, focus over the last months has been scaling up my editing team and processes, there are a few older videos like that one which are far too complex for my new editors to work on. I've been maxed out just scaling up the new process and travelling doing interviews. I am hoping to get it out the door before I go on holiday next weekend, wish me luck!

@quantumspark343 a month ago

Can't wait for AGI

@szebike a month ago

As far as I know, MindsAI's approach tried to brute force the ARC benchmark by incorporating all the prior attempts and correct answers of other bots into a training dataset. So it's essentially not what ARC was created for, nor does it hint at any improvement of that benchmark performance.

@Bidam682 a month ago

The fact that it works by brute forcing means either that AGI can be achieved by brute forcing or that François Chollet's hypotheses are wrong. The test is supposed to measure the ability of models to tackle tasks they have never seen before. So the problems in the private dataset are supposed to be different enough to measure this, according to Chollet's understanding of what constitutes intelligence. If you can solve the private dataset by learning on all prior attempts, then in theory it means that current transformer models CAN achieve Chollet's definition of true intelligence just by scaling. Of course, it could also mean that the test is not good enough, but that shouldn't be a concern for the competitors.

@szebike a month ago

@Bidam682 No, I disagree. Brute forcing only gets to slightly above 50%, which is not a success by any means. Also, Chollet's point is that any intelligent system can abstract and synthesize new approaches with only a general dataset. If you stuff a model with solutions you can only get to a certain point, and you will never be able to plug all the holes, especially since in the real world all data is ambiguous and requires new knowledge generation "on the fly", as Chollet describes it, to act properly. We can't produce datasets for all possible outcomes.

@factorousfactorous3522 a month ago

Yes, it doesn't feel like a real improvement in reasoning.

@paultparker a month ago

Can you substantiate this? You have seen Chollet's talk from AGI, right? "It's Not About Scale, It's About Abstraction" is the title of the video, posted by Machine Learning Street Talk. MLST says MindsAI sponsored that video (which is a must-see IMHO, but contains no reference to MindsAI), and MindsAI has a post where they are recruiting. Perhaps they are trying to deceive, but their post says "MLST is sponsored by Tufa Labs: Are you interested in working on ARC and cutting-edge AI research with the MindsAI team (current ARC winners)? Focus: ARC, LLMs, test-time-compute, active inference, system2 reasoning, and more. Future plans: Expanding to complex environments like Warcraft 2 and Starcraft 2." System 2 reasoning does not sound like brute-forcing the solution space. However, they may be "brute forcing" the problem space by exploring the domain of possible problems and solutions in order to do deep learning on that domain. If I understood Chollet's talk correctly, he does not object to this as part of the creation of a hybrid system. Essentially it is a way to do deep learning that is focused on the particular space that is the domain of ARC, just as LLMs are focused on the space of language. For ARC he deliberately chose a domain that LLMs would not do well at, but that does not mean you cannot create a system that is domain-specific as part of an acceptable solution, IIUC. If such domain-specific "brute forcing" is _all_ they are doing and they are not using the kind of hybrid approach Chollet believes is necessary, then it will be interesting to see what he says when they reveal their solution and code, which I understand they have to do in order to win the yearly prize. They may decide to withdraw their submission rather than do that, but currently it is still the top submission on the official leaderboard, which says the leaderboard is closed and they are evaluating the solutions.

@szebike a month ago

@paultparker I may edit this post, but I saw it in a podcast earlier on another channel. I may rewatch it, but if I remember correctly it was something like "it could use the answers given so far to fine-tune a model" (which means they would brute force it by incorporating solutions into a new training dataset, which contradicts the synthesis requirement), and also they could submit multiple times.

@que_93 a month ago

His voice is so deep, much deeper than the methodology he has used to achieve this goal.

@danielschwegler5220 a month ago

@henrytopia721 a month ago

Yes, I love the ARC test

@XShollaj a month ago

Wondering what the full approach is. Did they simulate billions of similar synthetic problem datasets and train the model on that? Probably some RL too irt

@benhermans8214 a month ago

It's great they got such a high score, but it matters how they did it. Michael seems to be in denial about this.

@10NightLord a month ago

I'm not convinced by this company's claims, since they haven't published any credible/open research papers to substantiate their results, especially when compared to leading AI companies. It raises questions about their true focus: are they genuinely advancing toward useful AGI, or is the goal just to win a prize?

@atrocitasinterfector a month ago

cool!

@psi4j a month ago

Yeeeeeeerrrrr 🎉

@mennol3885 a month ago

One guy, Nick Williams, spammed all the comments with a comment that his comments keep getting deleted, all while using multiple accounts. So I did the logical thing and reported his comments as spam. If you were wondering what happened to his comments, apparently more people did this.

@MachineLearningStreetTalk a month ago

Yes, I just noticed this. I didn't delete his original comment. Bot spam is a massive problem on YouTube right now, and I assume they have had to increase the aggressiveness of automated deletions, especially if you post a link or post from a new account. Bot accounts typically have the format "Name-123xzy", so it wouldn't surprise me if Nick's account was a bot account. But now that Nick has spammed the whole comments section on this video, I have banned his account. This is what he posted (not trying to censor him): "My comments get censored, no need to continue pooping on my human rights, and invading my privacy through data theft. Original post: They have the right idea, I like this direction of development. I would like to see deeply into their process to see where I can find inefficiency and offer up improvements." That said, we do get a lot of grossly offensive comments on MLST YT, and we always ban them immediately. We welcome constructive criticism of any kind, but if folks are blatantly trolling, being extremely offensive and basically "taking a shit on our front lawn", we won't hesitate to ban. These folks are welcome to criticise us publicly from their own no-name social media accounts.

@ronilevarez901 a month ago

"Training at test time" Sounds like cheating 🤔

@Daniel-Kramer a month ago

training at test time is what we want, that's what human beings do. current LLMs are static. they literally can't learn. that's not very intelligent of them

@Bidam682 a month ago

AGI is the ultimate cheat I agree. It cheats by using its reasoning capabilities.

@dennisestenson7820 a month ago

Then people are masters at cheating.