I'm learning a lot from these talks, thank you for having them.
@NitinPasumarthy 9 months ago
The best content I have seen in a while. Enjoyed both the theory and the practical notes from both speakers! Huge thanks to dlai for organizing this event.
@AurobindoTripathy 9 months ago
Cut to the chase - 3:30
Questions on DPO - 27:37
Practical deep-dive - 30:19
Question - 53:32
@vijaybhaskar5333 9 months ago
Excellent topic, well explained. One of the best videos on this subject I've seen recently. Continue your good work 😊
@amortalbeing 9 months ago
This was amazing, thank you everyone. One thing though, if that's possible: it would be greatly appreciated if you could record in 1080p so that the details/text on the slides are visible and easier to consume. Thanks a lot again.
@MatijaGrcic 9 months ago
Check out the notebooks and slides in the description.
@amortalbeing 9 months ago
@MatijaGrcic Thanks a lot, downloaded the slides.
@katie-48 9 months ago
Great presentation, thank you very much!
@jeankunz5986 9 months ago
Great presentation. Congratulations.
@PaulaLeonova 9 months ago
At 29:40 Lewis mentions an algorithm that requires fewer training samples; what is its name? I heard "data", but I don't think that is correct. If anyone knows, would you mind replying?
@AurobindoTripathy 9 months ago
Possibly this paper: "Rethinking Data Selection for Supervised Fine-Tuning", arxiv.org/pdf/2402.06094.pdf
@ralphabrooks 9 months ago
I am also interested in hearing more about this "data" algorithm. Is there a link to a paper or blog on it?
@austinmw89 9 months ago
Curious if you compared SFT on all data vs. training on completions only?
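Not from the talk, but to make the question concrete: "training on completions only" usually means masking the prompt tokens out of the loss so the model is only penalised on the assistant's reply, whereas SFT on all data computes the loss over every token. Below is a minimal PyTorch sketch of that masking; the function name build_labels and the toy token IDs are made up for illustration, and libraries like TRL ship a ready-made collator for this if I remember right.

import torch
import torch.nn.functional as F

IGNORE_INDEX = -100  # positions with this label are skipped by cross_entropy

def build_labels(prompt_ids, completion_ids, completions_only=True):
    # Concatenate prompt + completion; with completions_only=True the prompt
    # positions get IGNORE_INDEX so only completion tokens contribute to the loss.
    input_ids = prompt_ids + completion_ids
    if completions_only:
        labels = [IGNORE_INDEX] * len(prompt_ids) + completion_ids
    else:
        labels = list(input_ids)  # "SFT on all data": loss over every token
    return torch.tensor([input_ids]), torch.tensor([labels])

# Toy example with a made-up vocabulary of size 10 and random "logits".
input_ids, labels = build_labels(prompt_ids=[1, 2, 3], completion_ids=[4, 5])
logits = torch.randn(1, input_ids.shape[1], 10)

# Usual causal-LM next-token shift, then cross-entropy that ignores masked positions.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, 10),
    labels[:, 1:].reshape(-1),
    ignore_index=IGNORE_INDEX,
)
print(loss)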
@TheRilwen 9 months ago
I'm wondering why simple techniques, such as sample boosting, increasing the error for high-ranked examples, or an attention layer, wouldn't work in place of RLHF. It seems like a very convoluted and inefficient way of doing a simple thing - which convinces me that I'm missing something :-)
@dc33333 4 months ago
Very good, thank you.
@iseminamanim 9 months ago
Interested
@trisetra 4 months ago
kzbin.info/www/bejne/h4m5dKSqdr90pJY The details in the Llama3 paper seem to validate the claim that DPO works better than RL at scale.
@MacProUser99876 9 months ago
How DPO works under the hood: kzbin.info/www/bejne/gKaQoXmAg8uCnLs
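For anyone who just wants the gist before watching: DPO (Rafailov et al.) trains the policy directly on preference pairs with a logistic loss on the difference of log-ratios between the policy and a frozen reference model, with no reward model or RL loop. Here is a minimal PyTorch sketch of that loss; the function name dpo_loss and the numbers in the toy batch are illustrative only, not taken from the video.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)), averaged over the batch.
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_logratios - rejected_logratios)).mean()

# Toy batch of 4 preference pairs: summed log-probs of each response under the
# policy and under the frozen reference model (the numbers are made up).
loss = dpo_loss(
    policy_chosen_logps=torch.tensor([-12.0, -9.5, -14.2, -11.0]),
    policy_rejected_logps=torch.tensor([-13.1, -10.0, -13.9, -12.5]),
    ref_chosen_logps=torch.tensor([-12.5, -9.8, -14.0, -11.2]),
    ref_rejected_logps=torch.tensor([-12.9, -9.9, -14.1, -12.0]),
)
print(loss)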