Nathan, the quality of your guests and conversations is so high that this is literally the only channel on all of YouTube where I tolerate in-video ads. Though I have to say that the in-video ads are also the only explanation I can find for why this channel doesn't already have 100k+ subs :)
@alexm1815 5 days ago
This is my favorite episode yet, incredibly information dense. Thank you Nathan^2!
@zyzhang1130 5 days ago
Very juicy content that is really lacking elsewhere (as far as I know)
@AR-iu7tf 19 hours ago
As others have already said below, this is the most substantive and informative conversation on post-training I have seen to date. Thank you so much for shedding light on an area that is almost like a black box right now - all we can find online is tidbits of speculation. You mention a paper on verifier RL; I couldn't find a link to it online, so perhaps it is not published yet? Could you please share it if it is available. Also, I know we can only speculate about what o1 or DeepSeek is doing for the reasoning sequences, but would it be fair to assume that, during training, they apply some form of reward model/verifier feedback at intermediate stages of a sequence that leads to a correct result, as opposed to just one reward signal for the entire sequence like what ChatGPT (perhaps!) does? In other words, is the Bellman update likely applied to all the tokens only at the end of the sequence, or at intermediate stages as well? Also, thank you so much for clarifying how the single reward value at the end is converted into individual rewards for each of the tokens that make up that sequence.
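For readers puzzling over that last question, here is a minimal sketch (not something stated in the episode; the function name, KL coefficient, and values are illustrative) of one common way a single outcome reward is spread over every token in PPO-style RLHF: the reward model or verifier scores only the finished response, that scalar is attached to the final token, an optional per-token KL penalty against the reference model is added as shaping, and per-token returns are computed by discounting backwards through the sequence.

```python
# Hedged sketch: turning one sequence-level reward into per-token returns,
# as is commonly done in PPO-style RLHF. Everything here is illustrative.

from typing import List

def per_token_returns(
    terminal_reward: float,      # single scalar from the reward model / verifier
    kl_penalties: List[float],   # per-token KL(policy || reference), hypothetical values
    gamma: float = 1.0,          # discount factor; 1.0 lets every token share the reward
    kl_coef: float = 0.05,       # weight on the KL shaping term
) -> List[float]:
    """Convert one sequence-level reward into a return for each token position."""
    T = len(kl_penalties)
    # Per-token reward: KL shaping everywhere, terminal reward only at the last token.
    rewards = [-kl_coef * kl for kl in kl_penalties]
    rewards[-1] += terminal_reward

    # Discounted return G_t = r_t + gamma * G_{t+1}, computed right to left.
    returns = [0.0] * T
    running = 0.0
    for t in reversed(range(T)):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Example: a 5-token response scored 1.0 by the reward model.
print(per_token_returns(1.0, [0.02, 0.01, 0.03, 0.02, 0.01]))
```

With gamma = 1.0 every token effectively shares the one terminal reward, which is the "single reward for the whole sequence" case; intermediate or process rewards would instead add nonzero entries before the final token, which is the other scenario the comment asks about.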