Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (Paper)

21,651 views

Yannic Kilcher

A day ago

Comments: 79
@mshonle a month ago
I tried reading this paper three times but then decided it would have been more optimal if they doubled the number of scientists writing it…
@guccigav7912 a month ago
lol same
@ultrasound1459 a month ago
They didn't share any code 🔴❌️
@csabaczcsomps7655 a month ago
A neural network is a procedure for processing stimuli, not messages as in OOP. A message goes to one object; a stimulus goes to all objects and is processed in every node. Imagine you have one variable that goes into one expression: a stimulus is one value that goes into all expressions of all nodes. It is a new way of computing, closer to real neurons. How to implement it is the work in progress now.
@tingtingin a month ago
He's alive!
@kevon217 a month ago
Love your paper breakdowns. Always learn a lot. Appreciate it!
@JumpDiffusion a month ago
There is a paper by Christopher Ré and co. about scaling inference via repeated random sampling; they demonstrate scaling all the way up to saturating MATH and other benchmarks. They also derive scaling laws for inference.
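The repeated-sampling scaling referred to here follows from the standard coverage formula: if a single sample solves a problem with probability p, then at least one of N independent samples solves it with probability 1 - (1 - p)^N, so pass@N keeps climbing as inference compute grows. A minimal sketch (the p value below is invented for illustration):

```python
# Coverage ("pass@N") under independent repeated sampling: with per-sample
# success probability p, at least one of n samples succeeds with probability
# 1 - (1 - p)^n. The p value here is made up, not from the paper.
def pass_at_n(p, n):
    return 1.0 - (1.0 - p) ** n

# A "hard" problem (p = 1%) approaches being solved as samples scale up.
coverage = [round(pass_at_n(0.01, n), 3) for n in (1, 100, 10000)]
```

Even a 1%-solvable problem is almost surely solved with enough samples, which is why random sampling alone can saturate a benchmark.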
@MADjaHEAD a month ago
I've missed you! Hope to see more from you.
@pumozavr a month ago
In Figure 2, beam search refers to the "standard" beam search, without refinement. You simply sample intermediate steps from a "standard" LLM (one that might not have self-refinement capabilities) and see what the best intermediate solutions are using the verifier. A PRM-based verifier will give you a score for the current step (the steps are delimited in a way that the PRM understands, e.g. through new lines), and the scores for the single steps are then combined (using average, min, ...) into a score for the whole intermediate solution. You can then pick the solution(s) with the highest score, expand on it, and iterate until you reach one or ideally multiple final solutions from which you can again pick using the verifier. That's my understanding.
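The step-wise, verifier-guided search described in this comment can be sketched roughly as follows; `generate_step`, `prm_score`, and `aggregate` are toy stand-ins for the LLM sampler, the process reward model, and the score combination, not the paper's actual components:

```python
# Rough sketch of PRM-guided beam search over solution steps (assumptions,
# not the paper's code): expand partial solutions, score each step with a
# toy "PRM", combine step scores, and keep the best partial solutions.
def generate_step(prefix):
    # Propose candidate next solution steps for a partial solution.
    return [prefix + [step] for step in ("a", "bb", "ccc")]

def prm_score(step):
    # Per-step score in [0, 1]; in this toy, longer steps score higher.
    return len(step) / 3.0

def aggregate(step_scores):
    # Combine per-step scores into one score for the partial solution
    # (the comment mentions average or min; we use the average).
    return sum(step_scores) / len(step_scores)

def beam_search(beam_width=2, depth=3):
    beams = [[]]  # start from the empty partial solution
    for _ in range(depth):
        scored = []
        for prefix in beams:
            for candidate in generate_step(prefix):
                scored.append((aggregate([prm_score(s) for s in candidate]), candidate))
        # keep only the highest-scoring partial solutions; expand them next round
        scored.sort(key=lambda pair: pair[0], reverse=True)
        beams = [candidate for _, candidate in scored[:beam_width]]
    return beams

final_solutions = beam_search()
```

From the surviving final solutions, the verifier would again pick the best one, exactly as the comment describes.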
@kikijuju4809 a month ago
Long time no see
@juanjesusligero391 a month ago
Glad to see another video of yours, thank you Yannic! :D I really miss your ML News, I hope you make some more of them one of these days ^^
@erv993 a month ago
The king is back!
@ChocolateMilkCultLeader a month ago
My goat is back
@Mordenor a month ago
Thank You Mr Yannic For Explaining This Wonderful Paper About LLM Scaling
@03Krikri a month ago
Thanks for your critical review, was very insightful
@googleyoutubechannel8554 a month ago
Welcome back! I'm not convinced their definition of 'difficulty' is interesting or helpful either, but isn't it entirely unsurprising that LLMs 'think' in a different way than humans?
@LatteDeCoder a month ago
this work seems to build upon another recent work, "Recursive Introspection: Teaching Language Model Agents How to Self-Improve," which has code available...
@wurstelei1356 a month ago
Thanks for the hint.
@EkShunya a month ago
welcome back
@akanjiemmanuel4807 a month ago
Interesting paper
@daniele81 a month ago
There are no error bars in Figure 4. How would you know if any of these methods performs significantly better than the others? Looks like bad stats to me.
@benedictsmith2415 a month ago
Equation 1 just serves as a theoretical foundation for the "compute-optimal" concept, but it cannot be directly used for optimization. Intractability: finding the truly optimal hyperparameters θ across all possible prompts and compute budgets a*(q) would require an exhaustive search. Unknown ground truth: in a real-world setting, we don't know the ground-truth correct answer y*(q) for an unseen prompt, so directly optimizing the indicator function is impossible.
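For readers without the paper open, the equation being discussed, reconstructed here from the comment's description (notation may differ slightly from the paper's), is roughly:

```latex
\theta^{*}_{q, y^{*}(q)}(N)
  = \arg\max_{\theta}\;
    \mathbb{E}_{y \sim \mathrm{Target}(\theta, N, q)}
      \left[ \mathbb{1}\{\, y = y^{*}(q) \,\} \right]
```

where θ parameterizes the test-time strategy, N is the compute budget, q the prompt, and y*(q) the ground-truth answer. Both obstacles the comment raises are visible directly in the formula: the argmax over θ, and the indicator that requires knowing y*(q).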
@Veptis a month ago
Will have to check the whole video later, but I think IBM had a somewhat similar paper recently, about the training rate changing based on epoch/mini-batch performance on the benchmark or something. It's called "scheduler"-something.
@LysergicKids a month ago
It can't be: a new paper that's not 98% marketing wank? Is the world healing, brothers?
@existenceisillusion6528 a month ago
Are we sure a* is not a typo that should have been y*? Also: best-of-N weighted, beam, or majority?
@MasamuneX a month ago
What if we use Monte Carlo tree search on tree-of-thought LLMs, keep only the highest-quality output, train a new foundation model on that synthetic data, and repeat until ASI?
@montymemoladi8067 a month ago
Sounds like a promising approach, and I think it's reasonably close to what the big labs are planning to do.
@AtAtaylor a month ago
People have already done this
@scoffpickle9655 a month ago
Or just use something similar to "Thinker: Learning to Plan and Act" to kinda predict a few tokens ahead, which might increase quality.
@Adhil_parammel a month ago
An oracle to guide it is required to reach ASI.
@keypey8256 a month ago
I'm guessing they trained o1 in a similar manner. Maybe slightly different algorithm, different tree search technique or maybe slightly different way of generating output, but the general idea is probably the same.
@gileneusz a month ago
he's the best
@wurstelei1356 a month ago
According to the graphs, it seems to me: the harder the question, the more luck you need to get the right answer.
@DaRealCodeBlack a month ago
Chinese and Indian software engineers and computer scientists are "killin da game" when it comes to all things high tech in coding Ai and other complicated domains in our field. Hats off to them!
@mike___-fi5kp a month ago
long time no see
@andytroo a month ago
How does resampling the output of an LLM and taking the most frequent answer differ from running with temp=0?
@ArtOfTheProblem a month ago
I think performance breaks down at temp 0, so you get much less exploration. Especially with ambiguous questions, you get more stability with a majority vote, plus a confidence metric.
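The contrast can be made concrete with a toy simulation: temp=0 always returns the single greedy completion, while sampling many completions and majority-voting the final answers (self-consistency) aggregates over diverse reasoning paths and yields a vote-share confidence. The answer distribution below is invented for the demo, not from any real model:

```python
import random
from collections import Counter

# Invented final-answer distribution for a single question (an assumption
# for illustration): sampling at temp > 0 yields diverse full solutions.
ANSWER_DISTRIBUTION = {"42": 0.5, "41": 0.3, "43": 0.2}

def sample_answer(rng):
    # One full sampled solution reduced to its final answer.
    answers, weights = zip(*ANSWER_DISTRIBUTION.items())
    return rng.choices(answers, weights=weights)[0]

def majority_vote(n_samples, seed=0):
    rng = random.Random(seed)
    counts = Counter(sample_answer(rng) for _ in range(n_samples))
    answer, freq = counts.most_common(1)[0]
    # The winning answer's vote share doubles as a rough confidence metric.
    return answer, freq / n_samples

answer, confidence = majority_vote(1000)
```

Greedy decoding would commit to one path with no confidence estimate; the vote share here is exactly the "confidence metric" the reply mentions.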
@makhalid1999 a month ago
Can't you review Computer Vision papers too? 😞
@KadeemSometimes a month ago
Nice
@keypey8256 a month ago
41:15 Isn't this, at this point, manually overfitting the architecture to the dataset?
@aa-xn5hc a month ago
Please bring the news back!
@islandfireballkill a month ago
Wake up, babe. New Yannic video just dropped.
@bjarke7886 a month ago
Please cover ESM3
@youssefdirani 15 days ago
Why not just open-source Gemini and ChatGPT?
@MinecraftJuiceHD a month ago
Isn't beam search done per token? Why does Yannic say that they grade the answers?
@benedictsmith2415 a month ago
He's misunderstood it: the whole point of the beam search here is that it guides the generation process by making step-wise decisions based on the PRM's evaluation. It's more about strategically navigating the search space than explicitly modifying the output distribution or altering already-generated outputs.
@MinecraftJuiceHD a month ago
@benedictsmith2415 So I understood it right? The beam search is done token by token and evaluated at intermediate steps?
@benedictsmith2415 a month ago
@MinecraftJuiceHD Correct.
@TheAIEpiphany a month ago
21:48 What can be unburdened by what has been
@sushantpenshanwar 17 days ago
The rant was good, lol.
@张默涵-x3z 18 days ago
Awesome.
@TheTheeliman a month ago
Too many concepts, zero lines of code. DeepMind should let me fine-tune my Llama/Gemma with this approach.
@RickeyBowers a month ago
Completely worthless if the model has no concept of the test-time trajectory.
@nineteenfortyeight a month ago
Why in the name of all that's holy are we asking an LLM to do arithmetic?? 😭
@hunterkudo9832 a month ago
Because being able to do arithmetic is a good indicator of being able to reason. We want LLMs to be good reasoners, because a lot of real-world tasks will require LLMs, and soon AI agents, to reason like a human can.
@HUEHUEUHEPony a month ago
Because not all of us are interested in roleplay slop
@csabaczcsomps7655 a month ago
Think what you want. When a kid sees you put down one apple and then one more, he will answer that we have 2. So we write 1+1=2. After that, he will always take the notation as true without recalling the apples. This means some training needs two modules: video, then video-to-notation association. And using the notation alone is probably a third step. My noob opinion.
@ozordiprince9405 a month ago
200 views in 15 minutes. Bro fell off
@fontenbleau a month ago
Python is just a dead-end pathway. One guy on YouTube writes a neural network in low-level assembly language, and it's 500 times faster than PyTorch on one CPU core on the same task. We need a full rewrite of networks and models.
@scoffpickle9655 a month ago
Please tell me who made that. It seems so interesting
@scoffpickle9655 a month ago
Also, yeah, C or C++ is better for actually useful and fast models; Python is good for modularity and prototyping, but god, it is so fucking slow.
@biomerl a month ago
Wat? 99 percent of training is done on the GPU, which is already C++.
@scoffpickle9655 a month ago
@biomerl Yeah, sorry, I don't have much knowledge of low-level ML.
@kennycommentsofficial a month ago
@scoffpickle9655 The easiest starting place is to search YouTube for matrix multiplication with CUDA (it's basically just C code).
@imaspacecreature a month ago
The Travis Pickle of AI!