
An update on DPO vs PPO for LLM alignment

1,069 views

Nathan Lambert

A day ago

A casual chat about our experiments to figure out which method works best.
Paper referenced: arxiv.org/abs/...
Abstract: Learning from preference feedback has emerged as an essential step for improving the generation quality and performance of modern language models (LMs). Despite its widespread use, the way preference-based learning is applied varies wildly, with differing data, learning algorithms, and evaluations used, making disentangling the impact of each aspect difficult. In this work, we identify four core aspects of preference-based learning: preference data, learning algorithm, reward model, and policy training prompts, systematically investigate the impact of these components on downstream model performance, and suggest a recipe for strong learning from preference feedback. Our findings indicate that all aspects are important for performance, with better preference data leading to the largest improvements, followed by the choice of learning algorithm, the use of improved reward models, and finally the use of additional unlabeled prompts for policy training. Notably, PPO outperforms DPO by up to 2.5% in math and 1.2% in general domains.
Slides: docs.google.co...
Synthetic data piece: www.interconne...
Slides taken from recent Stanford Lecture: docs.google.co...
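For context on the comparison in the title, the DPO objective can be sketched in a few lines. This is a minimal illustration of the standard per-pair DPO loss, not code from the paper or the video; the function and argument names are mine, and inputs are assumed to be summed token log-probabilities of each response under the policy and the frozen reference model:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair:
    -log sigmoid(beta * ((logp_c - ref_c) - (logp_r - ref_r)))."""
    # Implicit reward of each response, measured relative to the reference model
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) computed stably as log(1 + exp(-x))
    return math.log1p(math.exp(-logits))
```

When the policy matches the reference model, both margins are zero and the loss is log 2; as the policy assigns relatively more probability to the chosen response, the loss decreases. Unlike PPO, no separate reward model or on-policy sampling is needed, which is what makes the two approaches worth comparing empirically.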

Comments: 7