I tried reading this paper three times but then decided it would have been more optimal if they doubled the number of scientists writing it…
@guccigav7912Ай бұрын
lol same
@ultrasound1459Ай бұрын
They didn't share any code 🔴❌️
@csabaczcsomps7655Ай бұрын
A neural network is a procedure for processing stimuli, not messages as in OOP. A message goes to one object; a stimulus goes to all objects and is processed in every node. Imagine you have one variable that goes into one expression: a stimulus is one value that goes into the expressions of all nodes. It's a new way of computing, closer to real neurons. How to implement it is the work in progress now.
@tingtinginАй бұрын
He's alive!
@kevon217Ай бұрын
Love your paper breakdowns. Always learn a lot. Appreciate it!
@JumpDiffusionАй бұрын
There is a paper by Christopher Re and co. about scaling inference via random sampling; they demonstrate scaling all the way up to saturating MATH and other benchmarks. They also come up with scaling laws for inference.
@MADjaHEADАй бұрын
I was missing you! Hope to see more from you
@pumozavrАй бұрын
In Figure 2, beam search refers to the "standard" beam search, without refinement. You simply sample intermediate steps from a "standard" LLM (one that might not have self-refinement capabilities) and see what the best intermediate solutions are using the verifier. A PRM-based verifier will give you a score for the current step (the steps are delimited in a way that the PRM understands, e.g. through new lines), and the scores for the single steps are then combined (using average, min, ...) into a score for the whole intermediate solution. You can then pick the solution(s) with the highest score, expand on it, and iterate until you reach one or ideally multiple final solutions from which you can again pick using the verifier. That's my understanding.
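In case it helps, here is a rough sketch of that loop as I picture it (generate_next_steps, prm_score and is_final are made-up placeholders for the base sampler, the PRM verifier and an answer detector, not anything from the paper's code):

# Sketch of verifier-guided beam search over solution *steps*, not tokens.
def prm_beam_search(question, generate_next_steps, prm_score, is_final,
                    beam_width=4, expansions_per_beam=4, max_depth=8):
    beams = [([], 0.0)]                      # (steps so far, aggregated PRM score)
    for _ in range(max_depth):
        candidates = []
        for steps, score in beams:
            if is_final(steps):
                candidates.append((steps, score))   # keep finished solutions in the pool
                continue
            for step in generate_next_steps(question, steps, expansions_per_beam):
                new_steps = steps + [step]
                # Score every step with the PRM and combine the per-step scores
                # into one score for the partial solution (min or mean are typical).
                per_step = [prm_score(question, new_steps[:i + 1])
                            for i in range(len(new_steps))]
                candidates.append((new_steps, min(per_step)))
        # Keep only the highest-scoring partial solutions and expand them next round.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(is_final(s) for s, _ in beams):
            break
    return max(beams, key=lambda b: b[1])[0]  # best full solution, as a list of steps

Using min as the aggregator is the conservative choice (one bad step sinks the whole partial solution); mean is more forgiving.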
@kikijuju4809Ай бұрын
Long time no see
@juanjesusligero391Ай бұрын
Glad to see another video of yours, thank you Yannic! :D I really miss your ML News, I hope you make some more of them one of these days ^^
@erv993Ай бұрын
The king is back!
@ChocolateMilkCultLeaderАй бұрын
My goat is back
@MordenorАй бұрын
Thank You Mr Yannic For Explaining This Wonderful Paper About LLM Scaling
@03KrikriАй бұрын
Thanks for your critical review, was very insightful
@googleyoutubechannel8554Ай бұрын
Welcome back! I'm not convinced their definition of 'difficulty' is interesting or helpful either, but isn't it entirely unsurprising that LLMs 'think' in a different way than humans?
@LatteDeCoderАй бұрын
this work seems to build upon another recent work, "Recursive Introspection: Teaching Language Model Agents How to Self-Improve," which has code available...
@wurstelei1356Ай бұрын
Thanks for the hint.
@EkShunyaАй бұрын
welcome back
@akanjiemmanuel4807Ай бұрын
Interesting paper
@daniele81Ай бұрын
There are no error bars in Figure 4. How would you know if any of these methods performs significantly better than the others? Looks like bad stats to me.
@benedictsmith2415Ай бұрын
Equation 1 just serves as a theoretical foundation for the "compute-optimal" concept, but it cannot be directly used for optimization. Intractability: finding the truly optimal hyperparameters θ across all possible prompts and compute budgets a*(q) would require an exhaustive search. Unknown ground truth: in a real-world setting we don't know the ground-truth correct answer y*(q) for an unseen prompt, so directly optimizing the indicator function is impossible.
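For reference, my rough reconstruction of what Equation 1 expresses, using the symbols above (the exact notation in the paper may differ; Target(θ, N, q) stands for the distribution over outputs induced by strategy θ under budget N for prompt q):

\theta^{*}_{q, a^{*}(q)}(N) \;=\; \arg\max_{\theta} \; \mathbb{E}_{y \sim \mathrm{Target}(\theta, N, q)}\big[\, \mathbb{1}_{\,y = y^{*}(q)} \,\big]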
@VeptisАй бұрын
Will have to check the whole video later. But I think IBM had a somewhat similar paper recently, about the learning rate changing based on epoch/mini-batch performance on the benchmark or something. It's called "scheduler" something.
@LysergicKidsАй бұрын
It can't be, a new paper that's not 98% marketing wank? Is the world healing, brothers
@existenceisillusion6528Ай бұрын
Are we sure a* is not a typo that should have been y*? Also, best-of-N weighted vs. beam vs. majority?
@MasamuneXАй бұрын
What if we use Monte Carlo tree search on tree-of-thought LLMs, keep only the highest-quality outputs, train a new foundation model on that synthetic data, and repeat until ASI?
@montymemoladi8067Ай бұрын
Sounds like a promising approach, and I think it's reasonably close to what the big labs are planning to do.
@AtAtaylorАй бұрын
People have already done this
@scoffpickle9655Ай бұрын
Or just use something similar to "Thinker: Learning to Plan and Act" to kind of predict a few tokens ahead, which might increase quality.
@Adhil_parammelАй бұрын
An oracle to guide it would be required to reach ASI.
@keypey8256Ай бұрын
I'm guessing they trained o1 in a similar manner. Maybe a slightly different algorithm, a different tree-search technique, or a slightly different way of generating output, but the general idea is probably the same.
@gileneuszАй бұрын
he's the best
@wurstelei1356Ай бұрын
It seems to me, according to the graphs, that the harder the question, the more luck is involved in getting the right answer.
@DaRealCodeBlackАй бұрын
Chinese and Indian software engineers and computer scientists are "killin da game" when it comes to all things high tech in coding Ai and other complicated domains in our field. Hats off to them!
@mike___-fi5kpАй бұрын
long time no see
@andytrooАй бұрын
How does resampling the output of an LLM and taking the most frequent answer differ from running with temp=0?
@ArtOfTheProblemАй бұрын
I think performance breaks down at temp 0, so you get much less exploration. Especially with ambiguous questions you get more stability with a majority vote, plus a confidence metric.
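A tiny sketch of the difference (sample_answer is a hypothetical wrapper around the model, not a real API):

from collections import Counter

# Self-consistency / majority vote: sample many diverse paths at temperature > 0,
# return the most frequent final answer plus how often it won as a rough confidence.
def majority_vote(question, sample_answer, n_samples=16, temperature=0.8):
    answers = [sample_answer(question, temperature) for _ in range(n_samples)]
    best_answer, votes = Counter(answers).most_common(1)[0]
    return best_answer, votes / n_samples

# Greedy decoding: a single (near-)deterministic trajectory, no exploration.
def greedy(question, sample_answer):
    return sample_answer(question, temperature=0.0)

At temp=0 every rerun collapses to the same single trajectory, so there is nothing to vote over.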
@makhalid1999Ай бұрын
Can't you review Computer Vision papers too? 😞
@KadeemSometimesАй бұрын
Nice
@keypey8256Ай бұрын
41:15 Isn't this at this point manual overfitting of the architecture to the dataset?
@aa-xn5hcАй бұрын
Please bring the news back!
@islandfireballkillАй бұрын
Wake up, babe. New Yannic video just dropped.
@bjarke7886Ай бұрын
Please cover ESM3
@youssefdirani15 күн бұрын
why not just open source Gemini and chatgpt ?
@MinecraftJuiceHDАй бұрын
Isn't beam search done per token? Why does Yannic say that they grade the answers?
@benedictsmith2415Ай бұрын
He's misunderstood it - the whole point of the beam search here is that it guides the generation process by making step-wise decisions based on the PRM's evaluation. It's more about strategically navigating the search space than explicitly modifying the output distribution or altering already-generated outputs.
@MinecraftJuiceHDАй бұрын
@benedictsmith2415 So I understood it right? The beam search is done token by token and evaluated at intermediate steps?
@benedictsmith2415Ай бұрын
@MinecraftJuiceHD Correct.
@TheAIEpiphanyАй бұрын
21:48 What can be unburdened by what has been
@sushantpenshanwar17 күн бұрын
Rant was good Lol
@张默涵-x3z18 күн бұрын
Awesome.
@TheTheelimanАй бұрын
Too many concepts, zero lines of code. DeepMind should let me fine-tune my Llama/Gemma with this approach.
@RickeyBowersАй бұрын
Completely worthless if the model has no concept of the test-time trajectory.
@nineteenfortyeightАй бұрын
Why in the name of all that's holy are we asking an LLM to do arithmetic?? 😭
@hunterkudo9832Ай бұрын
Because being able to do arithmetic is a good indicator of being able to reason. We want LLMs to be good reasoners, because a lot of real-world tasks will require LLMs, and soon AI agents, to reason like a human can.
@HUEHUEUHEPonyАй бұрын
Because not all of us are interested in roleplay slop
@csabaczcsomps7655Ай бұрын
Think what you want. When a kid sees you put down one apple and then one more, he will answer that we have 2. So we write 1+1=2. Then he will take the notation as always true without recalling the apple video. This means some training needs 2 modules: video, then video-notation association. And probably using the notation is a 3rd step. My noob opinion.
@ozordiprince9405Ай бұрын
200 views in 15 minutes. Bro fell off
@fontenbleauАй бұрын
Python is just a dead-end pathway. One guy on YouTube writes neural networks in low-level assembly and it's 500 times faster than PyTorch on one CPU core on the same task. We need a full rewrite of networks and models.
@scoffpickle9655Ай бұрын
Please tell me who made that. It seems so interesting
@scoffpickle9655Ай бұрын
Also yeah, C or C++ is better for actually useful and fast models; Python is good for modularity and prototyping, but god it is so fucking slow.
@biomerlАй бұрын
What? 99 percent of training is done on the GPU, which is already C++.
@scoffpickle9655Ай бұрын
@biomerl Yeah, sorry, I don't have much knowledge of low-level ML.
@kennycommentsofficialАй бұрын
@scoffpickle9655 The easiest starting place is to search YouTube for matrix multiplication with CUDA (basically just C code).