Aligning LLMs with Direct Preference Optimization

27,933 views

DeepLearningAI

1 day ago

Comments: 20
@PritishYuvraj 8 months ago
Excellent comparison of PPO and DPO! Kudos!
@eliporter3980 9 months ago
I'm learning a lot from these talks, thank you for having them.
@NitinPasumarthy 9 months ago
The best content I have seen in a while. Enjoyed both the theory and the practical notes from both speakers! Huge thanks to dlai for organizing this event.
@AurobindoTripathy 9 months ago
Cut to the chase - 3:30
Questions on DPO - 27:37
Practical deep-dive - 30:19
Question - 53:32
@vijaybhaskar5333 9 months ago
Excellent topic, well explained. One of the best videos on this subject I have seen recently. Keep up the good work 😊
@amortalbeing 9 months ago
This was amazing, thank you everyone. One thing though: if possible, it would be greatly appreciated if you could record in 1080p so the details/text on the slides are visible and easier to read. Thanks a lot again.
@MatijaGrcic 9 months ago
Check out the notebooks and slides in the description.
@amortalbeing 9 months ago
@MatijaGrcic Thanks a lot, I downloaded the slides.
@katie-48 9 months ago
Great presentation, thank you very much!
@jeankunz5986 9 months ago
Great presentation. Congratulations.
@PaulaLeonova 9 months ago
At 29:40 Lewis mentions an algorithm that requires fewer training samples; what is it called? I heard "data", but I don't think that is correct. If anyone knows, would you mind replying?
@AurobindoTripathy 9 months ago
Possibly this paper: "Rethinking Data Selection for Supervised Fine-Tuning", arxiv.org/pdf/2402.06094.pdf
@ralphabrooks 9 months ago
I am also interested in hearing more about this "data" algorithm. Is there a link to a paper or blog on it?
@austinmw89 9 months ago
Curious if you compared SFT on all data vs. training on completions only?
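For context, "training on completions only" usually refers to masking the prompt tokens out of the SFT loss so that gradients come only from the response tokens. A minimal sketch of that masking, assuming a PyTorch setup where the label value -100 is ignored by cross-entropy (function and variable names below are illustrative, not from the talk):

```python
import torch

def completion_only_labels(input_ids: torch.Tensor, prompt_len: int) -> torch.Tensor:
    """Copy input_ids and replace the prompt positions with -100,
    the label value that PyTorch's cross-entropy loss ignores."""
    labels = input_ids.clone()
    labels[:, :prompt_len] = -100  # no loss contribution from prompt tokens
    return labels

# Usage: pass these labels together with input_ids to a causal-LM forward pass;
# the language-modeling loss is then computed on the completion tokens only.
```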
@TheRilwen 9 months ago
I'm wondering why simple techniques, such as sample boosting, increasing the loss on highly ranked examples, or an attention layer, wouldn't work in place of RLHF. It seems like a very convoluted and inefficient way of doing a simple thing, which convinces me that I'm missing something :-)
@dc33333 4 months ago
Very good, thanks.
@iseminamanim 9 months ago
Interested
@trisetra 4 months ago
kzbin.info/www/bejne/h4m5dKSqdr90pJY The details in the Llama3 paper seem to validate the claim that DPO works better than RL at scale.
@MacProUser99876 9 months ago
How DPO works under the hood: kzbin.info/www/bejne/gKaQoXmAg8uCnLs
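For context on what "under the hood" means here: DPO trains the policy to widen the log-probability margin of the preferred completion over the rejected one, relative to a frozen reference model, without a separate reward model or RL loop. A minimal sketch of the loss, assuming per-sequence log-probabilities have already been computed (names and the beta value are illustrative):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss over a batch of (chosen, rejected) preference pairs."""
    # Implicit rewards: scaled log-ratios of the policy against the frozen reference.
    chosen_reward = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_reward = beta * (policy_rejected_logps - ref_rejected_logps)
    # Bradley-Terry style objective: prefer the chosen completion over the rejected one.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```

Because the reward is defined implicitly through these log-ratios, no reward model or PPO-style sampling loop is needed, which is the main practical difference from RLHF.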