RLHF & DPO Explained (In Simple Terms!)

2,735 views

Entry Point AI

1 day ago

Comments: 10
@pamelamadingdong 5 days ago
The fact that you gave concrete examples really helped me get through this! Thank you for the great video.
3 months ago
Awesome. Thanks
@liberate7604 5 months ago
Great video. Is it better to use KTO as the optimizer for binary classification?
@EntryPointAI 4 months ago
I couldn't say for sure. Binary classification is a fairly simple task, so I would start with supervised fine-tuning.
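To make that concrete, here is a minimal sketch of what a supervised fine-tuning dataset for binary classification could look like in Python. The ticket texts, labels, and file name are made up for illustration, not anything from the video:

import json

# Hypothetical task: classify support tickets as "urgent" or "not urgent"
# using prompt/completion pairs, a common format for supervised fine-tuning.
examples = [
    {"prompt": "Ticket: The site is down and customers cannot check out.\nLabel:",
     "completion": " urgent"},
    {"prompt": "Ticket: Please update my billing address sometime next month.\nLabel:",
     "completion": " not urgent"},
]

# Write the pairs as JSONL so they can be fed to a fine-tuning job.
with open("sft_binary_classification.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

Each JSONL line pairs an input with the desired label as its completion.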
@VerdonTrigance 5 months ago
Hey! Thanks for the video! I've never used these techniques, but what I really want to do is train a base or chat LLM like Llama or Phi-3 on some big text (The Lord of the Rings, for example). All the techniques I've seen so far require a properly prepared dataset, but who would prepare that, and how? Ask every possible question and answer them all as well? That's impossible! Do you know how I can prepare a dataset to later train a model on?
@EntryPointAI 5 months ago
Besides including the big text in a model's pretraining, you can fine-tune on it using empty prompts, which will make the model more likely to respond in a style similar to the writing. That doesn't necessarily make it an expert on the contents. In order to answer questions about a corpus, the typical approach is to chunk it up and use RAG. I have another video on the difference between RAG and fine-tuning.
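As a rough sketch of both ideas in that reply, here is a small Python example that splits a long text into overlapping chunks for a RAG index and, optionally, turns the same chunks into empty-prompt examples for style fine-tuning. The chunk size, overlap, and field names are illustrative assumptions, not values from the video:

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    # Split a long document into overlapping character chunks for a RAG index.
    chunks, start = [], 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

def to_empty_prompt_examples(chunks: list[str]) -> list[dict]:
    # Pair each chunk with an empty prompt so fine-tuning teaches the model
    # the source's style rather than question answering.
    return [{"prompt": "", "completion": chunk} for chunk in chunks]

At query time you would embed the chunks, retrieve the most relevant ones, and include them in the prompt; the empty-prompt examples would instead go into a fine-tuning job.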
@iasplay224 5 months ago
Thank you for the info, it was very well explained for an introduction.
Prompt Engineering, RAG, and Fine-tuning: Benefits and When to Use
15:21
Proximal Policy Optimization Explained
17:50
Edan Meyer
50K views
15min History of Reinforcement Learning and Human Feedback
17:24
Nathan Lambert
2.7K views
DPO Debate: Is RL needed for RLHF?
26:55
Nathan Lambert
8K views
Reinforcement Learning from Human Feedback (RLHF) Explained
11:29
IBM Technology
13K views
LoRA explained (and a bit about precision and quantization)
17:07
LoRA & QLoRA Fine-tuning Explained In-Depth
14:39
Entry Point AI
49K views
Direct Preference Optimization
14:15
Data Science Gems
460 views
RLHF+CHATGPT: What you must know
10:48
Machine Learning Street Talk
69K views