Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (Paper)

21,651 views

Yannic Kilcher

A day ago

Comments: 79
@mshonle a month ago
I tried reading this paper three times but then decided it would have been more optimal if they doubled the number of scientists writing it…
@guccigav7912 a month ago
lol same
@ultrasound1459 a month ago
They didn't share any code 🔴❌️
@csabaczcsomps7655 a month ago
A neural network is a procedure for processing stimuli, not messages as in OOP. A message goes to one object; a stimulus goes to all objects and is processed in every node. Imagine you have one variable that goes into one expression: a stimulus is one value that goes into all expressions of all nodes. It is a new way of computing, closer to real neurons. How to implement it is the work in progress now.
@tingtingin a month ago
He's alive!
@kevon217 a month ago
Love your paper breakdowns. Always learn a lot. Appreciate it!
@JumpDiffusion a month ago
There is a paper by Christopher Ré and co. about scaling inference via repeated random sampling; they demonstrate scaling all the way up to saturating MATH and other benchmarks. They also derive scaling laws for inference.
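The repeated-sampling scaling referred to here follows from the standard coverage formula: if a single sample solves a problem with probability p, then at least one of N independent samples solves it with probability 1 - (1 - p)^N, so pass@N keeps climbing as inference compute grows. A minimal sketch (the p value below is invented for illustration):

```python
# Coverage ("pass@N") under independent repeated sampling: with per-sample
# success probability p, at least one of n samples succeeds with probability
# 1 - (1 - p)^n. The p value here is made up, not from the paper.
def pass_at_n(p, n):
    return 1.0 - (1.0 - p) ** n

# A "hard" problem (p = 1%) approaches being solved as samples scale up.
coverage = [round(pass_at_n(0.01, n), 3) for n in (1, 100, 10000)]
```

Even a 1%-solvable problem is almost surely solved with enough samples, which is why random sampling alone can saturate a benchmark.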
@MADjaHEAD a month ago
I've missed you! Hope to see more from you.
@pumozavr a month ago
In Figure 2, beam search refers to the "standard" beam search, without refinement. You simply sample intermediate steps from a "standard" LLM (one that might not have self-refinement capabilities) and see what the best intermediate solutions are using the verifier. A PRM-based verifier will give you a score for the current step (the steps are delimited in a way that the PRM understands, e.g. through new lines), and the scores for the single steps are then combined (using average, min, ...) into a score for the whole intermediate solution. You can then pick the solution(s) with the highest score, expand on it, and iterate until you reach one or ideally multiple final solutions from which you can again pick using the verifier. That's my understanding.
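The step-wise, verifier-guided search described in this comment can be sketched roughly as follows; `generate_step`, `prm_score`, and `aggregate` are toy stand-ins for the LLM sampler, the process reward model, and the score combination, not the paper's actual components:

```python
# Rough sketch of PRM-guided beam search over solution steps (assumptions,
# not the paper's code): expand partial solutions, score each step with a
# toy "PRM", combine step scores, and keep the best partial solutions.
def generate_step(prefix):
    # Propose candidate next solution steps for a partial solution.
    return [prefix + [step] for step in ("a", "bb", "ccc")]

def prm_score(step):
    # Per-step score in [0, 1]; in this toy, longer steps score higher.
    return len(step) / 3.0

def aggregate(step_scores):
    # Combine per-step scores into one score for the partial solution
    # (the comment mentions average or min; we use the average).
    return sum(step_scores) / len(step_scores)

def beam_search(beam_width=2, depth=3):
    beams = [[]]  # start from the empty partial solution
    for _ in range(depth):
        scored = []
        for prefix in beams:
            for candidate in generate_step(prefix):
                scored.append((aggregate([prm_score(s) for s in candidate]), candidate))
        # keep only the highest-scoring partial solutions; expand them next round
        scored.sort(key=lambda pair: pair[0], reverse=True)
        beams = [candidate for _, candidate in scored[:beam_width]]
    return beams

final_solutions = beam_search()
```

From the surviving final solutions, the verifier would again pick the best one, exactly as the comment describes.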
@kikijuju4809 a month ago
Long time no see
@juanjesusligero391 a month ago
Glad to see another video of yours, thank you Yannic! :D I really miss your ML News, I hope you make some more of them one of these days ^^
@erv993 a month ago
The king is back!
@ChocolateMilkCultLeader a month ago
My goat is back
@Mordenor a month ago
Thank You Mr Yannic For Explaining This Wonderful Paper About LLM Scaling
@03Krikri a month ago
Thanks for your critical review, was very insightful
@googleyoutubechannel8554 a month ago
Welcome back! I'm not convinced their definition of 'difficulty' is interesting or helpful either, but isn't it entirely unsurprising that LLMs 'think' in a different way than humans?
@LatteDeCoder a month ago
this work seems to build upon another recent work, "Recursive Introspection: Teaching Language Model Agents How to Self-Improve," which has code available...
@wurstelei1356 a month ago
Thanks for the hint.
@EkShunya a month ago
welcome back
@akanjiemmanuel4807 a month ago
Interesting paper
@daniele81 a month ago
There are no error bars in Figure 4. How would you know if any of these methods performs significantly better than the others? Looks like bad stats to me.
@benedictsmith2415 a month ago
Equation 1 just serves as a theoretical foundation for the "compute-optimal" concept, but it cannot be directly used for optimization. Intractability: finding the truly optimal hyperparameters θ across all possible prompts and compute budgets a*(q) would require an exhaustive search. Unknown ground truth: in a real-world setting, we don't know the ground-truth correct answer y*(q) for an unseen prompt, so directly optimizing the indicator function is impossible.
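For readers without the paper open, the equation being discussed, reconstructed here from the comment's description (notation may differ slightly from the paper's), is roughly:

```latex
\theta^{*}_{q, y^{*}(q)}(N)
  = \arg\max_{\theta}\;
    \mathbb{E}_{y \sim \mathrm{Target}(\theta, N, q)}
      \left[ \mathbb{1}\{\, y = y^{*}(q) \,\} \right]
```

where θ parameterizes the test-time strategy, N is the compute budget, q the prompt, and y*(q) the ground-truth answer. Both obstacles the comment raises are visible directly in the formula: the argmax over θ, and the indicator that requires knowing y*(q).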
@Veptis a month ago
Will have to check the whole video later, but I think IBM had a somewhat similar paper recently, about the training rate changing based on epoch/mini-batch performance on the benchmark or something. It's called "scheduler"-something.
@LysergicKids a month ago
It can't be: a new paper that's not 98% marketing wank? Is the world healing, brothers?
@existenceisillusion6528 a month ago
Are we sure a* is not a typo that should have been y*? Also: best-of-N weighted, beam, or majority?
@MasamuneX a month ago
What if we use Monte Carlo tree search on tree-of-thought LLMs, keep only the highest-quality output, train a new foundation model on that synthetic data, and repeat until ASI?
@montymemoladi8067 a month ago
Sounds like a promising approach, and I think it's reasonably close to what the big labs are planning to do.
@AtAtaylor a month ago
People have already done this
@scoffpickle9655 a month ago
Or just use something similar to "Thinker: Learning to Plan and Act" to kinda predict a few tokens ahead, which might increase quality.
@Adhil_parammel a month ago
An oracle to guide it is required to reach ASI.
@keypey8256 a month ago
I'm guessing they trained o1 in a similar manner. Maybe slightly different algorithm, different tree search technique or maybe slightly different way of generating output, but the general idea is probably the same.
@gileneusz a month ago
he's the best
@wurstelei1356 a month ago
According to the graphs, it seems to me: the harder the question, the more luck you need to get the right answer.
@DaRealCodeBlack a month ago
Chinese and Indian software engineers and computer scientists are "killin da game" when it comes to all things high tech in coding Ai and other complicated domains in our field. Hats off to them!
@mike___-fi5kp a month ago
long time no see
@andytroo a month ago
How does resampling the output of an LLM and taking the most frequent answer differ from running with temp=0?
@ArtOfTheProblem a month ago
I think performance breaks down at temp 0, so you get much less exploration. Especially with ambiguous questions, you get more stability with a majority vote, plus a confidence metric.
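The contrast can be made concrete with a toy simulation: temp=0 always returns the single greedy completion, while sampling many completions and majority-voting the final answers (self-consistency) aggregates over diverse reasoning paths and yields a vote-share confidence. The answer distribution below is invented for the demo, not from any real model:

```python
import random
from collections import Counter

# Invented final-answer distribution for a single question (an assumption
# for illustration): sampling at temp > 0 yields diverse full solutions.
ANSWER_DISTRIBUTION = {"42": 0.5, "41": 0.3, "43": 0.2}

def sample_answer(rng):
    # One full sampled solution reduced to its final answer.
    answers, weights = zip(*ANSWER_DISTRIBUTION.items())
    return rng.choices(answers, weights=weights)[0]

def majority_vote(n_samples, seed=0):
    rng = random.Random(seed)
    counts = Counter(sample_answer(rng) for _ in range(n_samples))
    answer, freq = counts.most_common(1)[0]
    # The winning answer's vote share doubles as a rough confidence metric.
    return answer, freq / n_samples

answer, confidence = majority_vote(1000)
```

Greedy decoding would commit to one path with no confidence estimate; the vote share here is exactly the "confidence metric" the reply mentions.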
@makhalid1999 a month ago
Can't you review Computer Vision papers too? 😞
@KadeemSometimes a month ago
Nice
@keypey8256 a month ago
41:15 Isn't this, at this point, manually overfitting the architecture to the dataset?
@aa-xn5hc a month ago
Please bring the news back!
@islandfireballkill a month ago
Wake up, babe. New Yannic video just dropped.
@bjarke7886 a month ago
Please cover ESM3
@youssefdirani 15 days ago
Why not just open-source Gemini and ChatGPT?
@MinecraftJuiceHD a month ago
Isn't beam search done per token? Why does Yannic say that they grade the answers?
@benedictsmith2415 a month ago
He's misunderstood it: the whole point of the beam search here is that it guides the generation process by making step-wise decisions based on the PRM's evaluation. It's more about strategically navigating the search space than explicitly modifying the output distribution or altering already-generated outputs.
@MinecraftJuiceHD a month ago
@benedictsmith2415 So I understood it right? The beam search is done token by token and evaluated at intermediate steps?
@benedictsmith2415 a month ago
@MinecraftJuiceHD Correct.
@TheAIEpiphany a month ago
21:48 What can be unburdened by what has been
@sushantpenshanwar 17 days ago
The rant was good, lol.
@张默涵-x3z 18 days ago
Awesome.
@TheTheeliman a month ago
Too many concepts, zero lines of code. DeepMind should let me fine-tune my Llama/Gemma with this approach.
@RickeyBowers a month ago
Completely worthless if the model has no concept of the test-time trajectory.
@nineteenfortyeight a month ago
Why in the name of all that's holy are we asking an LLM to do arithmetic?? 😭
@hunterkudo9832 a month ago
Because being able to do arithmetic is a good indicator of being able to reason. We want LLMs to be good reasoners, because a lot of real-world tasks will require LLMs, and soon AI agents, to reason like a human can.
@HUEHUEUHEPony a month ago
Because not all of us are interested in roleplay slop
@csabaczcsomps7655 a month ago
Think what you want. When a kid sees you put down one apple and then one more, he will answer that we have 2. So we write 1+1=2. After that, he will always take the notation as true without recalling the apples. This means some training needs two modules: video, then video-to-notation association. And using the notation alone is probably a third step. My noob opinion.
@ozordiprince9405 a month ago
200 views in 15 minutes. Bro fell off
@fontenbleau a month ago
Python is just a dead-end pathway. One guy on YouTube writes a neural network in low-level assembly language, and it's 500 times faster than PyTorch on one CPU core on the same task. We need a full rewrite of networks and models.
@scoffpickle9655 a month ago
Please tell me who made that. It seems so interesting
@scoffpickle9655 a month ago
Also, yeah, C or C++ is better for actually useful and fast models; Python is good for modularity and prototyping, but god, it is so fucking slow.
@biomerl a month ago
Wat? 99 percent of training is done on the GPU, which is already C++.
@scoffpickle9655 a month ago
@biomerl Yeah, sorry, I don't have much knowledge of low-level ML.
@kennycommentsofficial a month ago
@scoffpickle9655 The easiest starting place is to search YouTube for matrix multiplication with CUDA (it's basically just C code).
@imaspacecreature a month ago
The Travis Pickle of AI!