Q-Star 2.0 - AI Breakthrough Unlocks New Scaling Law (New Strawberry)

133,406 views

Matthew Berman

1 day ago

Comments
@avi7278 1 month ago
Mon: AI is doomed, Tues: AGI is here! .... My god.
@SimonNgai-d3u 1 month ago
AI news is basically crypto trading.
@drwhitewash 1 month ago
AGI isn't here, just hype for investors.
@itsARSX 1 month ago
Yeah, all these YouTubers are just spitting out every piece of news they see 🤢
@ImranHossain-by6nk 1 month ago
The only thing scarier than AI companies are AI YouTubers.
@とふこ 1 month ago
2010s: "bitcoin to moon". 2020s: "AI to moon".
@alcott122 1 month ago
The concept in the "Mastering the AI Money Game" book completely explains the market. The trend is changing with AI.
@lukag3155 1 month ago
Thanks for sharing that, man.
@flop-oe5mr 15 days ago
Inward
@ethans4783 1 month ago
If the entirety of the internet is not enough data to improve your model, then maybe it's an architecture problem. And this looks to be a novel approach to try and overcome that, awesome!
@antonystringfellow5152 1 month ago
The aim here is to create an AGI. Using more data during training does not move a model towards AGI at all. It increases the knowledge base, and knowledge is not intelligence; it's just information. Put another way, you can no more create intelligence using data stored in binary code than you can by carving 1s and 0s onto rocks. Intelligence is the ability to understand, not recall stored information, and this requires the ability to form useful models and make predictions based on those models.
@mrd6869 1 month ago
It is an architecture problem. Does your brain need the entire internet to train? Nope. It's more capable and super energy-efficient. They just need to research a new layout.
@TravisChalmers 1 month ago
"MAYBE" 😂
@guilhermehx7159 1 month ago
Exactly. It's funny how no one talks about architecture. Only scaling, energy, and data.
@codelapiz 1 month ago
It's a bit more complicated. Your brain is sort of pretrained on billions of years of labeled data (reproduced or not). It's only fine-tuned on your experiences in your life, and even those are very abundant and diverse. And you got billions of extremely well-labeled datapoints by watching people's emotions as you grew up. Let's not pretend humans magically get this intelligent. A lot of us don't even get intelligent at all. And if we lived in isolation our whole life, we would not be nearly as smart as even GPT-3, no matter how much of the internet we read.
@gnosis33-real 1 month ago
I don't think we really have an equivalent until we create a model with online learning vs. pretraining. It needs to be able to continue to update the network with every interaction.
@mikeschwarz4588 1 month ago
It feels like this is a step in that direction… right?
@adolphgracius9996 1 month ago
Chill out, that's how you get Skynet, hell naah
@gnosis33-real 1 month ago
@adolphgracius9996 😅
@gnosis33-real 1 month ago
@@mikeschwarz4588 I'd agree with that
@Mihalk777 1 month ago
Connecting AGI to the internet, what could go wrong?
@NoHandleToSpeakOf 1 month ago
This relies on a set of transformation rules that preserve the underlying concept. Coming up with such a set is far from trivial. They tricked the benchmark with a hand-crafted set, but this does not scale.
@GrindThisGame 1 month ago
I was wondering how they "generated" similar test set examples without knowing exactly how each one is solved. Seems like overfitting.
@MaciejTrebacz 1 month ago
This is true, and it feels like cheating. At the same time, the idea of fine-tuning the model on the fly based on the inputs sounds very promising, and I bet someone will come up with a more generalized way of doing that.
@d_b_ 1 month ago
@@MaciejTrebacz If the state was held, maybe? If it's not catastrophic to the rest of the model's abilities. Still not understanding what the use of it is if the weights are reset after inference.
@wurstelei1356 1 month ago
I think the reason you have to open-source is so they can see if you cheat. BTW, OpenAI o1 and Claude shouldn't be on the list for that reason.
@heyman620 1 month ago
@@GrindThisGame Oh, they know. For example, they know how you can rotate it. Embarrassing hacking, zero value for this work. This benchmark does not work, although the idea is nice. The last winner was some stupid search and heuristics, if I recall correctly, zero "intelligence". This idea clearly works only for toy problems.
@tristanreid5770 1 month ago
While I'm excited about the idea of TTT, the whole idea of the benchmark is that it's trying to test the ability to generalize - specifically, the ability to solve a problem you haven't seen before and thereby don't know the domain of. What this method is doing is generating new test data by applying geometric transformations - but that's because the writers of this paper knew that those puzzles remain solvable under those geometric transformations! They're thereby leveraging domain knowledge to give the LLM an edge. I'm not saying I don't think it's a valuable approach - just that this paper more exposes a weakness in the assumptions of the benchmark than definitively demonstrates that this approach leads to generalization.
@Ikbeneengeit 1 month ago
Thanks for explaining this. I was indeed wondering how the model can produce correct synthetic data if it doesn't know the solution to the question already.
@HansKonrad-ln1cg 1 month ago
Here's a conversation I just had with ChatGPT:

Me: I have a simple problem here: Peter has 8 apples. Sue has twice as many apples as Peter. How many apples does Sue have? I want you to list all the things that could be changed in this problem such that it remains the same problem.

ChatGPT: Sure, let's explore the variables that can be modified while keeping the core structure of the problem intact:
- Names of the people: Change "Peter" and "Sue" to any other names. Example: John has 8 apples. Mary has twice as many apples as John.
- Number of apples: Change the number of apples Peter has, and adjust Sue's apples accordingly. Example: Peter has 10 apples. Sue has twice as many apples as Peter.
- Item: Change the apples to any other item. Example: Peter has 8 oranges. Sue has twice as many oranges as Peter.
- Ratio: Change the ratio from twice to another multiplicative relationship. Example: Peter has 8 apples. Sue has three times as many apples as Peter.
- Context: Change the context or setting in which the problem takes place. Example: Peter found 8 seashells. Sue found twice as many seashells as Peter.
- Time element: Add or change the time element without altering the fundamental relationship. Example: Yesterday, Peter collected 8 apples. Today, Sue collected twice as many apples as Peter.

By adjusting these elements, the basic structure of the problem (one person having a certain number of items and another person having a multiple of that number) remains unchanged. Anything else you'd like to explore?
@corvox2010 1 month ago
@@HansKonrad-ln1cg Just because I don't understand the patterns I see doesn't mean I can't solve something I never solved, given similar examples. Sooner or later I will spot the pattern.
@HansKonrad-ln1cg 1 month ago
The benchmark doesn't assume anything; it has no weakness. If you come up with anything that solves the private, closed set, you are done. There are no other rules. You can put in as much domain knowledge as you think you have, no problem. Your solution doesn't even have to be intelligent or AI or smart or anything. It just turns out that it is not so easy.
@Sven_Dongle 1 month ago
Yeah, they used specific sets of geometric transforms to game the system, as described in the appendix; hardly a "general intelligence" approach, it seems:

Table 3: We provide the augmentations used in our TTT procedure with their function signature and description.
- Rotate(90): Rotates a grid 90 degrees.
- Rotate(270): Rotates a grid -90 degrees.
- Rotate(180): Rotates a grid 180 degrees.
- Flip(0): Flips a grid horizontally.
- Flip(1): Flips a grid vertically.
- Reflect(0, reverse=True): Flips a grid horizontally and prepends it to the left of the original grid.
- Reflect(1, reverse=True): Flips a grid vertically and prepends it above the original grid.
- Reflect(0, reverse=False): Flips a grid horizontally and appends it to the right of the original grid.
- Reflect(1, reverse=False): Flips a grid vertically and appends it to the left of the original grid.
- RandomTranslateXY(): Shifts a grid randomly in both horizontal and vertical directions; the maximum shift size is 4.
- Transpose(): Reflects a grid on the diagonal.
- IncreaseResolution(2): Upscales the grid by interleaving elements in both horizontal and vertical directions.
- IncreaseHeight(2): Upscales the grid by interleaving elements in the vertical direction.
- IncreaseWidth(2): Upscales the grid by interleaving elements in the horizontal direction.
- Chain([Rotate(90), IncreaseResolution(2)]): Sequential application of Rotate(90) and IncreaseResolution(2).
- Chain([Rotate(270), IncreaseResolution(2)]): Sequential application of Rotate(270) and IncreaseResolution(2).
- Chain([Rotate(180), IncreaseResolution(2)]): Sequential application of Rotate(180) and IncreaseResolution(2).
- Chain([Flip(0), IncreaseResolution(2)]): Sequential application of Flip(0) and IncreaseResolution(2).
- Chain([Flip(1), IncreaseResolution(2)]): Sequential application of Flip(1) and IncreaseResolution(2).
- Chain([Transpose(), IncreaseResolution(2)]): Sequential application of Transpose() and IncreaseResolution(2).
@nilsmach6399 1 month ago
What they're doing is simply adjusting a generalized model on the fly to fit the problem domain. After this, the model has seen thousands more geometric figures, and that's enough to solve these toy problems. But this doesn't take us any further toward solving a more complex problem, especially if there's more to it than just understanding the domain language. The result will be essentially the same as training the model specifically for the test problems beforehand.
@leonwinkel6084 1 month ago
I agree, but the main point here is the technique, in my opinion. Adjusting an LLM on the go is exactly what our brain does. So the technique used here is surely one step further towards AGI.
@consciouscode8150 1 month ago
I think it's more of a first step. "Training" using synthetic data from geometric transformations is a discrete approximation of visual reasoning in the mind's eye, so the next step would be to take this technique and train a model to create the sorts of transformations which yield better predictions.
@cyberprotec 1 month ago
This technique sounds like fine-tuning to me, though.
@nilsmach6399 1 month ago
@@consciouscode8150 This is my point: If they could somehow determine which transformation leads to better results given the problem, then both of you would be right, and this would be a significant step toward AGI. However, what they do instead is simply multiply the relevant area of the model without actually changing this area to direct it toward a solution.
@pon1 1 month ago
You are right, but the toy problems have been a real pain in the bottom. It can do the most advanced stuff but stumbles on the most basic. It's a step in the right direction; now we just need to make it learn permanently, and that is much more difficult, because the weights already there can't simply be replaced with the "LoRA" that it has learned, because that would break other things. It's a difficult problem, but I believe it can be solved.
@vio_tio12 1 month ago
Love these paper reviews with a simplification of concepts. Keep doing them, please!
@Justin_Arut 1 month ago
I can see this utilized as a scaled-up MoE. With multiple agents, each of which has self-trained in this manner in its own domain, a central manager could call up the appropriate expert for a given task, for increased overall efficiency.
@alfredomoreira6761 1 month ago
This is just doing a LoRA for each test and calling it TTT. It's too much compute, and they don't publish the TTT time to train and adjust the LoRA parameters. It's a setup that only works to pass ARC. If ARC scored based on time to infer (like IQ tests), this technique would get bad results. In the end it's not practical.
@evidenceX 1 month ago
Soon we're going to have insane open-source models.
@Tetsujinfr 1 month ago
Cool, but how does the model know which transform to apply during test-time training? Looks to me like the model developers knew which transforms are relevant to achieve good results on ARC, so in a way they supervised the training. Now maybe those transforms generalize beyond problems like those in the ARC benchmark, but I do not think they have proved this in the paper. Am I missing something?
@rolandwfleming 1 month ago
I was wondering exactly the same ...
@ppbroAI 1 month ago
I think they use the vector representation of the problem and apply transformations trying to hit a sweet spot.
@Tetsujinfr 1 month ago
I would probably need to read the paper, though, tbh, since the on-the-fly LoRA training at inference time seems very powerful. Like, maybe the model could dynamically infer the best type of transforms to apply at inference training time depending on the input problem. Probably the core of the training would actually be to best predict those types of transforms to apply depending on the problem. I would need to look at the inference run time as well; it looks extremely costly to train and to run inference, at first glance.
@jtjames79 1 month ago
@@Tetsujinfr I can see the value in a super slow, super expensive AGI. It's not like human experts are cheap or fast. I used to wait several hours just for boobies at 14.4 kbps. In fact, it taught me patience and the value of using your time wisely, which kids just don't have these days. Since I prefer to be involved in every step of any process, even if I just have to trust that the AI knows what it's doing, going slower works for me. First it was compiling, then it was rendering, now it's inferring. Any excuse for a Nerf gun fight.
@xthesayuri5756 1 month ago
The really smart models like o1 or Sonnet might be able to come up with augmentations themselves and then train themselves.
@Jayk-56 1 month ago
Notes on this 'breakthrough': they only improved results slightly over a paper published Nov 4th, 2024 by Cornell University. Both of those papers are part of paper submissions to win cash prizes (you don't need to beat ARC to get the prize, just have the best paper/method). Like others mentioned in the comments, this is not easily generalized, since they had to use some domain knowledge in crafting the LoRA adapters. Progress for sure, but contextually not unique.
@alfredomoreira6761 1 month ago
There should be a penalty on the ARC challenge for long inference time. These methods would never see the light of day.
@entreprenerd1963 1 month ago
@@alfredomoreira6761 - there's a time cut-off on specified hardware. The folks who did this paper didn't fit within those limits; that's why they don't appear on the actual ARC leaderboard at the ARC website.
@Curlack 1 month ago
One thing they need to incorporate: when the input is not understood with high confidence or could be ambiguous, the AI shouldn't feel the need to decide what is meant and just answer. It should ask questions to clarify and, when the confidence is high enough, perform the task. That's how a teacher would determine what a student doesn't understand and adapt his input accordingly.
@WaltWhite71100 1 month ago
Test-time training sounds like an important step towards in-context training. What we need is the ability for people to explain, during runtime, how a model can improve its performance. When I'm using my car's autopilot, for example, I'd like to be able to vocally tell the autopilot what I like and don't like about its performance, similar to what I would do with a student human driver. This new information should get incorporated into the autopilot's future actions immediately. This type of test-time training, as well as the use of reasoning and 'guard rails', is what humans do every day as we learn new skills.
@LukeKendall-author 1 month ago
I think one aspect of animal intelligence is spatial reasoning applied to perceived images (e.g. used in navigation for exploring to find food and returning to the nest). This mode of pattern recognition and prediction is a foundation layer for us, but only laboriously and indirectly approximated by LLMs etc.
@AV_YOUTUBE_202X 1 month ago
It's a great idea. For a while we've had the situation where we've got these transformer networks, but they're not being used to the max of their potential capabilities. _"Let's just put the problem in K/V context and pull the next token probabilistically and be done"_ has been the status quo. Well, it turns out that's a pretty sub-optimal way to use these networks. The ARC benchmark is a "follow these examples" challenge. So, these guys are saying _"per problem, let's generate a LoRA from the problem examples, then use that LoRA."_ It's a great idea. In general it's following the principle that _"minds should focus specifically on the problem at hand."_ When a human being works on a hard problem, they forget about other things and lock in! That's what this LoRA is doing. It is de-prioritizing broad abilities and "focusing in" on solving the problem at hand. It's just a good thing to generally notice: that AI ought to "focus" while working.
@jason.arthur.taylor 1 month ago
This is brilliant research. Hats off to Ekin Akyürek, Mehul Damani, Linlu Qiu, Han Guo, Yoon Kim, and Jacob Andreas.
@HansKonrad-ln1cg 1 month ago
The ability to continuously learn is crucial for real intelligence. The model can adapt and have a short-term memory outside of its context window. Very well presented paper. The fact that they produced such a huge jump in the ARC challenge needs to be brought to public attention so everyone can draw their conclusions from it.
@En1Gm4A 1 month ago
Thx for the paper breakdown. Much appreciated.
@tomdfluter 1 month ago
Great breakdown of the paper, and I love your enthusiasm. Thanks again 👍🙏
@matthew_berman 1 month ago
Glad you enjoyed it!
@WanderingJoy 1 month ago
Yes - very excited to see this paper. Glad you're getting the word out.
@robmaiale652 1 month ago
Super helpful, thanks a bunch for explaining the concepts.
@d_b_ 1 month ago
Feels counterintuitive that this would work better than other methods, if I am understanding it. So it's like in-context learning, but the model parameters change with its own self-generated samples? How does this fare against a self-loop, or what I guess is called reflection?
@jimbo2112 1 month ago
The state of AI reminds me of a spoiled rich kid at Christmas. They open presents and barely play with each one before being presented with a new one. They don't know how the previous toy worked, nor did they explore all the options... Half an hour goes by without getting anything new, and they question the validity, fun, and usefulness of all their presents. We have never seen a cadence of tech innovation like this, and we are barely exploiting any of it before something new comes along to catch our eye. We need to be more realistic about the situation, stop doom-mongering about the slowed pace of AI innovation, and accept that we already have enough toys to change our lives forever - knowing that there will be more significant advances in the near future.
@VanSocero 24 days ago
Great vid. I'm on DeepSeek now, testing it against some of my use cases. Keep up the great info. Happy 🦃 day..
@SuperYapayZeka 1 month ago
It's amazing that more than a hundred thousand people watch such technical topics.
@justfoundit 1 month ago
I have a feeling that we'll need a more definite score of response-goodness to achieve AGI. But this direction is promising. Math problems are easier, probably coding too, where you can define the goodness of your response.
@antoniobortoni 1 month ago
Instead of relying on constant calculations, these systems could work by memory, much like how a person recalls math answers rather than solving them every time. Imagine a software system that repeatedly performs the same calculations - why not replace those with precomputed memory? Using a chart or lookup table with all the answers stored, the computer could retrieve results instantly instead of performing the operations each time. The chart doesn't even need to be perfectly precise - approximations are often good enough to solve real-world problems efficiently. This approach could revolutionize computing by reducing complexity while maintaining functionality, especially for applications where 'good enough' physics or outcomes suffice.
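[Editor's note: as a toy illustration of this trade-off (not something from the video), precomputing a lookup table and accepting approximate answers can look like this in Python:]

```python
import math

# Tabulate one period of sin() once, at a fixed resolution.
STEPS = 1024
SINE_TABLE = [math.sin(2 * math.pi * i / STEPS) for i in range(STEPS)]

def fast_sin(x: float) -> float:
    # Retrieve a "good enough" approximation instead of recomputing sin(x).
    i = int((x / (2 * math.pi)) * STEPS) % STEPS
    return SINE_TABLE[i]

print(fast_sin(1.0), math.sin(1.0))  # close, and the lookup is O(1)
```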
@technocorpus1 1 month ago
ah, just sipp'n my coffee and watch'n some ai news.
@phoebelovesYOU 1 month ago
there's a bomb in your house!!!!!
@saisrikaranpulluri1472 1 month ago
Yes, it is fascinating. Good work 👍
@francknouyrigat9809 25 days ago
Fascinating. If I got it correctly, it still qualifies as synthetic data, but things like inversion mean there should be some kind of conservation of information / symmetry present in the data for it to work. Really cool!
@JBLU7 1 month ago
Great Video!
@libiroli 1 month ago
This is not a breakthrough. It's just more of the same. We are stuck in the current paradigm until LLMs are paired with a UNIVERSE model (math, physics, spatial), which will enable actual reasoning.
@aviralapurva 1 month ago
Love the content!
@patrickzupanc1795 1 month ago
Great video, thank you!
@TheodoreGelber 1 month ago
When you modify parameters to solve certain kinds of problems after standard training, won't that make the model worse at other kinds of problems? The fine-tuner may not care about that, but it seems you can only use that tuned model on problems related to the fine-tuned training set.
@williamal91 1 month ago
you are a treasure Matthew
@andrewlewin6525 1 month ago
I'm excited that you're excited 😆
@David-tp7sr 1 month ago
Yes, this is important. Thank you for covering it in depth.
@luisalfonsohernandez9239 1 month ago
Amazing video Matt, very promising approach indeed. Ultimately, what most people think is AGI will probably be a spatial-first approach, like what Dr. Fei-Fei Li proposes, but TTT/active inference will definitely be one component for adaptability.
@En1Gm4A 1 month ago
Finally understood LoRA - like I saw its effects and a visual representation right in front of me.
@ScottLahteine 1 month ago
It's not exactly recognizing and applying analogies, but it's getting closer!
@JamesRogersProgrammer 1 month ago
You can build a workflow to take a prompt, brainstorm on it, review the ideas, and then generate a response based on all that.
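[Editor's note: a bare-bones sketch of that workflow, where `llm` stands in for any text-generation call (a hypothetical helper, not a specific API):]

```python
def brainstorm_workflow(llm, prompt: str) -> str:
    # `llm` is any callable mapping a text prompt to a text completion.
    ideas = llm(f"Brainstorm several distinct approaches to: {prompt}")
    review = llm(f"Critically review these ideas and note their flaws:\n{ideas}")
    return llm(
        f"Task: {prompt}\n\nCandidate ideas:\n{ideas}\n\nReview:\n{review}\n\n"
        "Using the strongest surviving ideas, write the final answer."
    )
```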
@pon1 1 month ago
This is what we needed. I also think that we need memory, maybe some way to store the real-time learning that TTT achieves. It seems wasteful to just drop the insight, but how to update the weights to make good insights permanent (fixing the LoRA into the model itself) is maybe more difficult than it seems? Or it could mess with the already good weights for other things. But yeah, we need to find the solution to that to progress even further. We do have the answer, our brains, but we don't fully understand them yet or how to make them compatible with our digital version. We need to draw insight from neuroscience. It won't be the same, they are two different architectures (our brains and AI), but I believe that there is a way to make them compatible with each other. That will not only spark superintelligence, but make US superintelligent as well.
@ronilevarez901 1 month ago
Some paper declared that LoRA overrides pretraining knowledge. But I'm sure they'll figure it out.
@ronilevarez901 1 month ago
Btw, we won't become superintelligent unless we merge with AI.
@pon1 1 month ago
@@ronilevarez901 Yeah, but that's what I think will happen: first we will make it compatible with our brains, then merge with it. That would also solve all the problems that could arise with AI, like alignment.
@Suro_One 1 month ago
@MatthewBerman The ARC Prize can be seen as a challenge of predicting the next pixels. You can scale it up to have it generate the next logical move in a game of chess, a task on your PC, or a realistic game or AI OS.
@ScottzPlaylists 1 month ago
"Predicting the next pixels" is nothing like the ARC puzzles. For each one, you have to come up with the correct set of rules to transform the inputs into the outputs - and find the right blocks of "pixels" that a given rule should transform into the output. Every puzzle is a different set of rules you have to figure out. Sometimes a rule you make for Example 1 won't work for Example 2 or 3 or 4, and you have to modify your rules to fit all the examples as you go through them. It's easy to make a bad rule that won't work for the other examples. I've manually written Python code that solves about 20 problems so far, and I've built up a library of primitives to re-use code. There are always custom new code functions you have to create for each problem, because each puzzle set is unique.
@akhildraksharapu 1 month ago
Bro is back with Namaste 🙏
@tom-et-jerry 1 month ago
Given that in the ARC Prize tests there are few training tasks per category, during the inference test the model has every interest in carrying out micro-training on the 3 examples presented in order to make a good-quality inference. That's why TTT is much better than the other competitors.
@andrewnewman398 1 month ago
Test data is held out to provide an unbiased estimate of model generalization. TTT effectively creates a custom model for each test query: "…we first generate training data D_TTT(d_input) from the test inputs…" It takes 12 hours per 100 tasks. Isn't this just overfitting with more steps? How is this generalized? It seems only useful as an optimization for a known test.
@alfredomoreira6761 1 month ago
💯
@Let010l01go 1 month ago
Yes, you are absolutely right, but what is the underlying logic/algorithm (or whatever) of the test-time training on synthetic data? Great ep ❤
@A_Me_Amy 1 month ago
Scale up with prompt time too: "think step by step, and offer a counter to the best solution you come up with. Weigh these two options." This kind of thing.
@En1Gm4A 1 month ago
This should be trending.
@SeregaZinin 1 month ago
It's funny that they called it LoRA, since LoRa is most commonly known from wireless networking: LoRa and LoRaWAN define a low-power, wide-area (LPWA) networking protocol designed to wirelessly connect battery-operated devices to the Internet. :)
@deepskyfrontier 1 month ago
The way this method might enable better smaller models is big, but the opportunity to take this further - to audit and act upon the insights won thereby, to intelligently drive feedback ever closer to the native structure of the model - is the really big kind of gigantic. A year from now, AI using TTT to iterate on its own instantaneous requirements may have a claim on a kind of consciousness.
@matt.stevick 1 month ago
i've been having some unexpectedly profound conversations with my chatgpt lately that make time seem to disappear as i get so into the conversations. we delve into complex topics, and it's incredible how it analyzes and remembers details from our discussions, sometimes understanding me better than i understand myself. it feels like i'm living in the movie 'her.' lol thx for being on top of everything going on matthew b.!
@frun 1 month ago
I doubt they achieved AGI. ARC is clearly not the definitive test for AGI, although I hope it has some potential.
@u2b83 27 days ago
Fine-tuning during test time simply acts as a prior on the test distribution lol ...but the surprise is that it goes/resets back to the base model after prediction.
@mickidydee 1 month ago
I'm curious if this sort of opens a tremendously huge back door to new techniques for jailbreaking the models, and how that will affect the time frames for model releases trying to implement this new technique.
@justindressler5992 1 month ago
It's about pre-activation based on the problem domain. This is why prompting works somewhat: if you feed several examples related to the problem domain into the network, this activates the neurons through attention. A LoRA is a patch with these activations predefined. I think the problem with this approach is synthesizing accurate domain-specific examples, but I believe OpenAI is working on this approach internally, because this was something hinted at by Sam Altman. The community uses LoRAs for diffusion models; the problem is the models only function with very specific prompts. But if these patches are removed each time and reloaded to fit the context of the prompt, it could work. It seems you still need a model capable of creating the examples. A foundation model such as a 405B might be able to create examples for training a small model. I think long term a two-model solution could be used with an adversarial design. One model could generate an answer while another could propose improvements to the answer, then prompt the first model again. These models would have to be trained differently.
@frankjohannessen6383 1 month ago
But how can it generate training data with the correct solutions if it can't solve the problem to begin with?
@drdca8263 1 month ago
Did you look at what was on-screen, or just listen to what he said? The input problem consists of 3 input/output pairs, followed by an additional input, and the expected answer is the corresponding output. The inference-time generated training set is obtained by: 1) pick a random transformation, such as reflecting everything across some axis, and apply it to each of the 3 input/output pairs; 2) then randomly select one of the 3 pairs, and produce as a training sample the problem which has the other two input/output pairs plus the input from the selected pair as the input, and the corresponding output as the output. Then it trains on all of these.
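[Editor's note: put as code, the recipe described above might look like this sketch (numpy assumed; the transform list is illustrative, not the paper's exact set):]

```python
import random
import numpy as np

TRANSFORMS = [
    lambda g: np.rot90(g, 1),   # rotate 90 degrees
    lambda g: np.rot90(g, 2),   # rotate 180 degrees
    lambda g: np.flip(g, 0),    # flip along one axis
    lambda g: g.T,              # transpose
]

def make_ttt_samples(pairs, n_samples=32):
    """pairs: list of (input_grid, output_grid) demo pairs for one task."""
    samples = []
    for _ in range(n_samples):
        t = random.choice(TRANSFORMS)
        # 1) Apply one random transform consistently to every demo pair,
        #    so the transformed task stays self-consistent.
        tp = [(t(np.asarray(x)), t(np.asarray(y))) for x, y in pairs]
        # 2) Leave one pair out: its input becomes the query and its known
        #    output becomes the label; the rest serve as in-context examples.
        i = random.randrange(len(tp))
        held_in, held_out = tp[i]
        context = tp[:i] + tp[i + 1:]
        samples.append({"examples": context, "query": held_in, "label": held_out})
    return samples
```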
@interestedinstuff 1 month ago
Makes sense. Seems to me that's what we humans do. I see a set of puzzle/result pairs and I look at them for a while till I figure out a pattern of transformation. Giving an LLM the ability to do the same - to look at and build a working model of the transformation before being confronted with the novel questions - makes sense to me, and I can see why it works.
@ShaunShibu-oz8yn 1 month ago
When it generates samples similar to the new problem at hand, it needs to solve those similar samples correctly as well before fine-tuning.
@Thedeepseanomad 1 month ago
Instead of AGI we might end up with a so-so general model that we load up with what is basically similar to a LoRA for our current task needs. Llama 8.5 in Optimus 6: "...I know Kung Fu!"
@theaugur1373 1 month ago
Do we know for sure that the test data is not in the training data?
@dr.mikeybee 1 month ago
It seems to me that the couple of examples given for the problem can be used for supervised learning.
@ramakrishna5480 1 month ago
All these superintelligent models, and no AI can give me a new mnemonic to remember some simple facts for my exam.
@baroianandrei7883 1 month ago
It would have been useful to explain the difference between ARC-AGI and ARC-AGI-Pub. We won't know the leaderboard for the latter until December 6th.
@leegaul8250 1 month ago
They trained on the public dataset. This isn't the real ARC-AGI challenge.
@leegaul8250 1 month ago
Also, is the leaderboard at 0:05 photoshopped? The leaderboard doesn't have Test-Time Training on it.
@ronilevarez901 1 month ago
Feels like cheating, right? But we have trained on the public datasets since the beginning. As long as they don't train on the test set, it's all good.
@leegaul8250 1 month ago
@@ronilevarez901 Yes, but there is a note in the paper that specifically states that they cannot guarantee that the model has not seen the test set (which is public).
@jawadmansoor2456 1 month ago
Test-time compute post-training means more training. It's just a training method; it says train twice.
@BrettWrightsPage 1 month ago
How do you train a model on an answer that you don't have?
@drdca8263 1 month ago
Did you mention an actual scaling law for this inference-time training?
@gpsx 1 month ago
I haven't read the paper and don't fully understand what they are doing, but I don't see how this process generalizes. It just seems like they found a technique that works well on this test by essentially putting the question into the training set.
@ppbroAI 1 month ago
This seems simple enough to be implemented very soon. I think it's kinda like baking a DSPy vector technique into each query.
@MeinDeutschkurs 1 month ago
Good, now we train the reflection process into the model. Well, let's hope that every prompt is covered. (TTT: test-time training.) Btw, live-LoRA is dangerous; the model gets silly over time, because the user is silly.
@gd7163 1 month ago
Ok, so two very basic questions: 1) How do they generate extra problems that preserve the "spirit" of the initial problem? 2) What objective function do they minimize on each of those "similar" problems, for which presumably they don't know the solution, given they are generated from an unsolved problem of the same kind in the first place? Without this it sounds like a cheap prestidigitation trick to me. Could be wrong though.
@GregoryMcCarthy123 1 month ago
It's hard to call it AGI when the entire premise of this technique requires fine-tuning on similar tasks beforehand. They're just priming it to solve similar tasks.
@alexmipego 1 month ago
This is at least one step short of the real big deal, which is to use this not simply to update your AI's parameters like this, but to enable/disable other networks. For example, imagine if ChatGPT could just ignore all the Japanese translation parts for this reply… this would mean the core could be much smaller and focused on high-level knowledge that triggers/enables submodels as needed - a much more efficient approach to the whole problem, with the extra bonus that now you can also start reusing and parallelizing the development of each "module."
@rul1175 1 month ago
Tomorrow: AI is doomed. Sunday: AI is not dead. Monday: Rinse and repeat.
@fynnjackson2298 1 month ago
The only ceiling is us using our infinite imagination to imagine the lack of infinity. - This will continue forever.
@telotawa 1 month ago
This is still too specific and unlike how actual reasoning works - it only works because you get to see examples first. True reasoning can explore uncharted territory, and idk if we can make benchmarks for that.
@riffsoffov9291 1 month ago
Instead of "just erases all of that" before going to the next problem, should it send the data somewhere, because maybe it's more useful than other synthetic data? Or maybe send just the problem and the solution? Only when it doesn't break privacy, etc. Just my amateur thoughts, likely way off.
@ronilevarez901 1 month ago
Nah, it's right on point. We can store the LoRA as a document connected to the problem it solves, and next time the LLM recognizes the same problem it will pull the LoRA and use it. That's gonna be the next thing added to knowledge graphs. You'll see.
@riffsoffov9291 1 month ago
@@ronilevarez901 Thanks for the info.
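[Editor's note: the cache idea in this thread could be as simple as keying trained adapters by a problem signature; a speculative sketch, where train_lora_for is a hypothetical helper returning adapter weights:]

```python
# Speculative sketch of reusing per-problem LoRA adapters instead of
# discarding them: key each trained adapter by a signature of the problem.
lora_cache: dict[str, dict] = {}

def adapter_for(problem_signature: str, train_lora_for):
    if problem_signature not in lora_cache:
        # First encounter: pay the test-time-training cost once...
        lora_cache[problem_signature] = train_lora_for(problem_signature)
    # ...then reuse the stored adapter on every later match.
    return lora_cache[problem_signature]
```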
@WaveOfDestiny 1 month ago
Now we are cooking!! Humans can reflect and learn. This is a step forward.
@SethCohn23 1 month ago
The 'average' human score on ARC is widely overblown. They even claim 'kids' do well. Dig into the actual tests, and the very limited pool they use: it's scary how small the pool is, and it's not the test suite used for AI... I do these tests well, and I was shocked at the claims of even 60%... 60% of Mechanical Turk users is not 60% of the general public, nor 60% of kids. I'm convinced the ARC score is skewed too high (and thus harder to achieve and pay out).
@ferencgaborsimon245 1 month ago
Achieving AGI, and it will run in a datacenter, so who will decide if it needs to be powered down? Would that be like mansla....er? What will it do to block its powerdown? Would it blackmail you, or switch off traffic lights and end you?
@wurstelei1356 1 month ago
Why TF is o1 on that list? Where is the source code?
@jmlbeaujour 1 month ago
1. How does the LLM decide that it needs to perform test-time training? I.e., how does it know it doesn't have the capacity to respond accurately to the new task?
2. The model generates its own training data: there is no guarantee that the generated training data has no errors (bad labels).
@drdca8263 1 month ago
In this case, the leave-one-out version is sound; it will not produce an inaccurate label, because it is using the original label. The other data augmentation strategies may or may not be valid for a given task. They appear to be based on assuming certain symmetries about the problem, and the validity kind of depends on what class of tasks you want it to be able to recognize… But, because the entire collection of inputs and outputs is transformed in the same way, I think it should generally be fine. If e.g. you flip everything in some task, this corresponds to a different task which has everything flipped, and as long as there is enough in the examples to be able to distinguish between the task and the flipped task, I think this should be fine.
@garyshi 1 month ago
This is interesting. So we could also take a small model that's solid in logic and common sense but doesn't have a lot of specific knowledge about a domain, give it some articles, books, or whatever to "fine-tune" on in real time, and let the model generate more data from those materials (it's like "reflect on the knowledge you just learned"), and it could give better answers than, say, the current Perplexity approach, which is purely context-based? Essentially this is System 2 happening partially in model weights vs. purely in context.
@garyshi 1 month ago
This may not be *the AGI* someone was looking for, but it could still be quite potent.
@vaidphysics 1 month ago
I think what should be worked on first is getting an LLM to understand the "environment" - to build a physical model of the whole rather than just looking at the parts or correlations between those parts.
@jsebastianmunch 1 month ago
Wow, so they just train on the benchmark - amazing…
@diaitigai9856 1 month ago
Great video on 'Test Time Training'! 🤖✨ I appreciate how you simplified the complex concepts, making them accessible to a wider audience. However, I think it would have been helpful to dive a bit deeper into the potential limitations or challenges of this approach. For instance, how might 'Test Time Training' affect the overall efficiency of AI models in real-world applications? It's exciting to see AI evolving, but understanding the drawbacks is just as important. Looking forward to your next video and more discussions on this topic! 🚀
@taichikitty 1 month ago
If you want a rubric for AGI: when an AI can solve the bicycle stability conundrum, it will have discovered something that humans have perplexingly been unable to solve for themselves. Unlike the Collatz conjecture or other mathematical problems, this is physical and should have a solution, or at least a cogent explanation.
@Alistair 1 month ago
What is the bicycle stability conundrum? Isn't it fairly well understood by now how rake etc. affect bicycle self-centering, or am I missing something?
@fotisj321 1 month ago
Great video and great paper. Isn't it strange that they then reset the model to its original state?
@kait3n10 1 month ago
12:55 They already beat the average human score; doesn't this count as AGI according to this test? Or do they need to beat the best human score? It would be ASI at that point.
@henrytuttle 1 month ago
One of the biggest problems with AI is that it doesn't learn from its mistakes. Generally, the most effective way for humans to learn is trial and error, and the error doesn't come from internal chain of thought but rather from the outside world demonstrating our error. AI doesn't do this. I've noticed that in "discussions" with AI, I can point out a flaw and it will sometimes integrate that into a new solution. Test-time training seems to somewhat solve this problem, but it's not integrated. The problem is that it doesn't actually "learn" from its mistake. It does not integrate this lesson into its "intelligence". It merely uses it momentarily and then forgets it. If AI has already combed all known knowledge from the internet, the next source is obviously both real-time information and information it gains from interacting with the world. I don't believe that we'll have true AGI until it can integrate lessons it learns from interaction with an external source into its structure.
@nikhil_jadhav 1 month ago
AGI is here... Breakthrough... this is it... blah blah blah... 10% improvement!! Let me know in the comments if you want to test it out...
@SimeonRadivoev 1 month ago
I should just train my AI mainly on benchmarks, so I can ace them and get on the leaderboards.
@3thinking 1 month ago
I can see how this approach might be applied to creating a super-coder LLM. Get an AI to generate billions of programs; some will compile and produce an output, some will generate compile or runtime errors. Use these as synthetic data to fine-tune an LLM aimed at LeetCode-style coding and solving new, unseen software problems.
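[Editor's note: one plausible shape for that pipeline (a sketch; generate_program stands in for an LLM sampling call, and "runs cleanly" is used here as a stand-in for compiling): keep only candidates that actually execute, and fine-tune on those.]

```python
import os
import subprocess
import tempfile

def runs_cleanly(source: str, timeout: float = 5.0) -> bool:
    # Write the candidate program to a temp file and execute it.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        # "python" may need to be "python3" depending on the system.
        r = subprocess.run(["python", path], capture_output=True, timeout=timeout)
        return r.returncode == 0  # exit code 0 = no runtime errors
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.unlink(path)

def build_synthetic_set(generate_program, n: int):
    # Keep only programs that run; these become fine-tuning examples.
    kept = []
    while len(kept) < n:
        src = generate_program()
        if runs_cleanly(src):
            kept.append(src)
    return kept
```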
@guillaumevermeillesanchezm2427 1 month ago
By tuning a model on ARC-like stuff you get better results on ARC. What a non-surprise.
@tomaszzielinski4521 1 month ago
Okay, so now AI is able to learn to solve a new problem. This was one missing feature for AGI that no one quite knew how to tackle, but here is the breakthrough. Now, how can I do the same stuff at home?
@Barrel_Of_Lube 1 month ago
Pairing this up with Mixture-of-LoRAs on top of an MoE architecture: huge potential for tiny, fast models.