Reinforcement Learning: ChatGPT and RLHF

8,054 views

Graphics in 5 Minutes

A day ago

Reinforcement Learning from human feedback, and how it's used to help train large language models like ChatGPT.
Part 3 of the RL from scratch series.
• Reinforcement Learning...
0:00 - intro
0:06 - large language models
0:35 - learning to tell jokes
1:13 - fine tuning with better data
1:26 - positive and negative examples
2:03 - reinforcement learning for LLMs
3:00 - labeling fewer examples
3:56 - reward networks
5:08 - summing it up
5:23 - variants
5:57 - ChatGPT, Bard, Claude, Llama
6:09 - finally, a good joke!

Comments: 12
@HoverAround 18 days ago
Joel, excellent explanation and talk! Thank you!
@tuulymusic3856 2 months ago
Please come back, your videos are great!
@user-cm5es5kk7j a month ago
help me a lot, can't wait to see more
@ireoluwaTH 10 months ago
Welcome back! Hope to see more of these videos..
@jasonpmorrison 8 months ago
Super helpful - thank you for this series!
@pegasusbupt 8 months ago
Amazing content! Please keep them coming!
@RaulMartinezRME 10 months ago
Great content!!
@0xeb- 10 months ago
Good teaching.
@vamsinadh100 7 months ago
You are the Best
@onhazrat 9 months ago
🎯 Key Takeaways for quick navigation:
00:00 🤖 Reinforcement learning improves large language models like ChatGPT.
00:25 🃏 Large language models face issues like bias, errors, and quality.
01:11 📊 Training data quality impacts results; removing bad jokes might help.
01:55 🧩 Training on both good and bad jokes improves language models.
02:38 🔄 Language models are policies, reinforcement learning uses policy gradient.
03:08 🎯 Reinforcement Learning from Human Feedback (RLHF) challenges data acquisition.
03:35 🤔 RLHF theory: Language model might already know jokes' boundary.
04:18 🏆 Training a reward network predicts human ratings for model's output.
04:47 🔄 Reward network is a modified language model for predicting ratings.
05:14 📝 Approach: Humans write text, train reward network, refine model with RL.
05:57 ⚖️ Systems convert comparisons to ratings for reward network training.
06:11 😄 RLHF successfully improves language models, including humor.
Made with HARPA AI
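The reward-network steps summarized in the takeaways above (turn human comparisons into ratings, then refine the model with a policy gradient) can be sketched as a toy example. This is a minimal illustration under stated assumptions, not the video's code: the three "jokes" are just indices, `fit_rewards` and `improve_policy` are hypothetical names, a Bradley-Terry fit stands in for the reward network, and a three-way softmax stands in for the language model.

```python
import math


def fit_rewards(num_items, comparisons, steps=2000, lr=0.1):
    """Fit one scalar score per item from (winner, loser) pairs.

    Assumes a Bradley-Terry model: P(winner beats loser) = sigmoid(s_w - s_l).
    Runs plain gradient ascent on the log-likelihood of the comparisons.
    """
    scores = [0.0] * num_items
    for _ in range(steps):
        grads = [0.0] * num_items
        for winner, loser in comparisons:
            p_win = 1.0 / (1.0 + math.exp(-(scores[winner] - scores[loser])))
            grads[winner] += 1.0 - p_win
            grads[loser] -= 1.0 - p_win
        scores = [s + lr * g for s, g in zip(scores, grads)]
    return scores


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def improve_policy(logits, rewards, steps=200, lr=0.5):
    """Exact policy-gradient ascent on expected reward for a softmax policy.

    For J = sum_i p_i * r_i, the gradient is dJ/dlogit_j = p_j * (r_j - J).
    A real system would estimate this from sampled text instead.
    """
    logits = list(logits)
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * r for p, r in zip(probs, rewards))
        logits = [
            l + lr * p * (r - expected)
            for l, p, r in zip(logits, probs, rewards)
        ]
    return softmax(logits)


if __name__ == "__main__":
    # Hypothetical human labels: (preferred joke, rejected joke).
    labels = (
        [(0, 1)] * 3 + [(1, 0)]
        + [(1, 2)] * 3 + [(2, 1)]
        + [(0, 2)] * 3 + [(2, 0)]
    )
    rewards = fit_rewards(3, labels)
    policy = improve_policy([0.0, 0.0, 0.0], rewards)
    print("learned rewards:", rewards)
    print("policy after RL:", policy)
```

The policy starts uniform over the three jokes and shifts probability toward the one the labels preferred most often. A real reward network scores arbitrary text rather than a fixed set of indices, which is what lets it rate outputs no human has labeled.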
@0xeb- 10 months ago
How long it takes to train a reward network? And how reliable would it be?
@stayhappy-forever a month ago
come back :(