Aligning LLMs with Direct Preference Optimization

27,933 views

DeepLearningAI

1 day ago

Comments: 20
@PritishYuvraj 8 months ago
Excellent comparison of PPO and DPO! Kudos!
@eliporter3980 9 months ago
I'm learning a lot from these talks, thank you for having them.
@NitinPasumarthy 9 months ago
The best content I have seen in a while. Enjoyed both the theory and the practical notes from both speakers! Huge thanks to dlai for organizing this event.
@AurobindoTripathy 9 months ago
Cut to the chase - 3:30
Questions on DPO - 27:37
Practical deep-dive - 30:19
Question - 53:32
@vijaybhaskar5333 9 months ago
Excellent topic, well explained. One of the best videos on this subject I have seen recently. Keep up the good work 😊
@amortalbeing 9 months ago
This was amazing, thank you everyone. One thing though: if possible, it would be greatly appreciated if you could record in 1080p so the details/text on the slides are visible and easier to read. Thanks a lot again.
@MatijaGrcic 9 months ago
Check out the notebooks and slides in the description.
@amortalbeing 9 months ago
@MatijaGrcic Thanks a lot, I downloaded the slides.
@katie-48 9 months ago
Great presentation, thank you very much!
@jeankunz5986 9 months ago
Great presentation. Congratulations.
@PaulaLeonova 9 months ago
At 29:40 Lewis mentions an algorithm that requires fewer training samples; what is it called? I heard "data", but I don't think that is correct. If anyone knows, would you mind replying?
@AurobindoTripathy 9 months ago
Possibly this paper: "Rethinking Data Selection for Supervised Fine-Tuning", arxiv.org/pdf/2402.06094.pdf
@ralphabrooks 9 months ago
I am also interested in hearing more about this "data" algorithm. Is there a link to a paper or blog on it?
@austinmw89 9 months ago
Curious if you compared SFT on all data vs. training on completions only?
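For context, "training on completions only" usually refers to masking the prompt tokens out of the SFT loss so that gradients come only from the response tokens. A minimal sketch of that masking, assuming a PyTorch setup where the label value -100 is ignored by cross-entropy (function and variable names below are illustrative, not from the talk):

```python
import torch

def completion_only_labels(input_ids: torch.Tensor, prompt_len: int) -> torch.Tensor:
    """Copy input_ids and replace the prompt positions with -100,
    the label value that PyTorch's cross-entropy loss ignores."""
    labels = input_ids.clone()
    labels[:, :prompt_len] = -100  # no loss contribution from prompt tokens
    return labels

# Usage: pass these labels together with input_ids to a causal-LM forward pass;
# the language-modeling loss is then computed on the completion tokens only.
```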
@TheRilwen 9 months ago
I'm wondering why simple techniques, such as sample boosting, increasing the loss on highly ranked examples, or an attention layer, wouldn't work in place of RLHF. It seems like a very convoluted and inefficient way of doing a simple thing, which convinces me that I'm missing something :-)
@dc33333 4 months ago
Very good, thanks.
@iseminamanim 9 months ago
Interested
@trisetra 4 months ago
kzbin.info/www/bejne/h4m5dKSqdr90pJY The details in the Llama3 paper seem to validate the claim that DPO works better than RL at scale.
@MacProUser99876 9 months ago
How DPO works under the hood: kzbin.info/www/bejne/gKaQoXmAg8uCnLs
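For context on what "under the hood" means here: DPO trains the policy to widen the log-probability margin of the preferred completion over the rejected one, relative to a frozen reference model, without a separate reward model or RL loop. A minimal sketch of the loss, assuming per-sequence log-probabilities have already been computed (names and the beta value are illustrative):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss over a batch of (chosen, rejected) preference pairs."""
    # Implicit rewards: scaled log-ratios of the policy against the frozen reference.
    chosen_reward = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_reward = beta * (policy_rejected_logps - ref_rejected_logps)
    # Bradley-Terry style objective: prefer the chosen completion over the rejected one.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```

Because the reward is defined implicitly through these log-ratios, no reward model or PPO-style sampling loop is needed, which is the main practical difference from RLHF.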