RLHF & DPO Explained (In Simple Terms!)

2,735 views

Entry Point AI

1 day ago

Comments: 10
@pamelamadingdong 5 days ago
The fact that you gave concrete examples really helped me get through this! Thank you for the great video.
3 months ago
Awesome. Thanks
@liberate7604 5 months ago
Great video. Is it better to use KTO as the optimizer for binary classification?
@EntryPointAI 4 months ago
I couldn't say for sure. Binary classification is a fairly simple task, so I would start with supervised fine-tuning.
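To make that concrete, here is a minimal sketch of what a supervised fine-tuning dataset for binary classification could look like in Python. The ticket texts, labels, and file name are made up for illustration, not anything from the video:

import json

# Hypothetical task: classify support tickets as "urgent" or "not urgent"
# using prompt/completion pairs, a common format for supervised fine-tuning.
examples = [
    {"prompt": "Ticket: The site is down and customers cannot check out.\nLabel:",
     "completion": " urgent"},
    {"prompt": "Ticket: Please update my billing address sometime next month.\nLabel:",
     "completion": " not urgent"},
]

# Write the pairs as JSONL so they can be fed to a fine-tuning job.
with open("sft_binary_classification.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

Each JSONL line pairs an input with the desired label as its completion.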
@VerdonTrigance 5 months ago
Hey! Thanks for the video! I've never used these techniques, but what I really want to do is train a base or chat LLM like Llama or Phi-3 on some big text (The Lord of the Rings, for example). All the techniques I've seen so far require a properly prepared dataset, but who would prepare that, and how? Ask every possible question and answer them all as well? That's impossible! Do you know how I can prepare a dataset to later train a model on?
@EntryPointAI 5 months ago
Besides including the big text in a model's pretraining, you can fine-tune on it using empty prompts, which will make the model more likely to respond in a style similar to the writing. That doesn't necessarily make it an expert on the contents. In order to answer questions about a corpus, the typical approach is to chunk it up and use RAG. I have another video on the difference between RAG and fine-tuning.
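As a rough sketch of both ideas in that reply, here is a small Python example that splits a long text into overlapping chunks for a RAG index and, optionally, turns the same chunks into empty-prompt examples for style fine-tuning. The chunk size, overlap, and field names are illustrative assumptions, not values from the video:

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    # Split a long document into overlapping character chunks for a RAG index.
    chunks, start = [], 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

def to_empty_prompt_examples(chunks: list[str]) -> list[dict]:
    # Pair each chunk with an empty prompt so fine-tuning teaches the model
    # the source's style rather than question answering.
    return [{"prompt": "", "completion": chunk} for chunk in chunks]

At query time you would embed the chunks, retrieve the most relevant ones, and include them in the prompt; the empty-prompt examples would instead go into a fine-tuning job.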
@iasplay224 5 months ago
Thank you for the info, it was very well explained for an introduction.
Prompt Engineering, RAG, and Fine-tuning: Benefits and When to Use
15:21
Proximal Policy Optimization Explained
17:50
Edan Meyer
50K views
15min History of Reinforcement Learning and Human Feedback
17:24
Nathan Lambert
2.7K views
DPO Debate: Is RL needed for RLHF?
26:55
Nathan Lambert
8K views
Reinforcement Learning from Human Feedback (RLHF) Explained
11:29
IBM Technology
13K views
LoRA explained (and a bit about precision and quantization)
17:07
LoRA & QLoRA Fine-tuning Explained In-Depth
14:39
Entry Point AI
49K views
Direct Preference Optimization
14:15
Data Science Gems
460 views
RLHF+CHATGPT: What you must know
10:48
Machine Learning Street Talk
69K views