Reinforcement Learning: ChatGPT and RLHF

Views: 11,113

Graphics in 5 Minutes

Days ago

Comments: 17

@EternityUnknown 4 months ago

I just binged this playlist at 1 am. Absolutely worth it. You deserve more views.

@jesus_love_for_all 2 months ago

PLEASE COME BACK!! You are an amazing teacher!

@n45a_ 5 hours ago

ok everything makes sense now, thx

@tuulymusic3856 7 months ago

Please come back, your videos are great!

@Coder.tahsin 4 months ago

All of your videos are amazing, please upload more

@ireoluwaTH 1 year ago

Welcome back! Hope to see more of these videos..

@HoverAround 5 months ago

Joel, excellent explanation and talk! Thank you!

@胡里安-n6m 6 months ago

Helped me a lot, can't wait to see more

@pegasusbupt 1 year ago

Amazing content! Please keep them coming!

@jasonpmorrison 1 year ago

Super helpful - thank you for this series!

@0xeb- 1 year ago

Good teaching.

@RaulMartinezRME 1 year ago

Great content!!

@vamsinadh100 1 year ago

You are the Best

@neo4242002 4 months ago

Who is this guy? He made all the complexity so simple with his words. Does anyone know this gentleman's name?

@0xeb- 1 year ago

How long does it take to train a reward network? And how reliable would it be?

@onhazrat 1 year ago

🎯 Key Takeaways for quick navigation:
00:00 🤖 Reinforcement learning improves large language models like ChatGPT.
00:25 🃏 Large language models face issues like bias, errors, and quality.
01:11 📊 Training data quality impacts results; removing bad jokes might help.
01:55 🧩 Training on both good and bad jokes improves language models.
02:38 🔄 Language models are policies, reinforcement learning uses policy gradient.
03:08 🎯 Reinforcement Learning from Human Feedback (RLHF) challenges data acquisition.
03:35 🤔 RLHF theory: Language model might already know jokes' boundary.
04:18 🏆 Training a reward network predicts human ratings for model's output.
04:47 🔄 Reward network is a modified language model for predicting ratings.
05:14 📝 Approach: Humans write text, train reward network, refine model with RL.
05:57 ⚖️ Systems convert comparisons to ratings for reward network training.
06:11 😄 RLHF successfully improves language models, including humor.
Made with HARPA AI
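
The 05:57 takeaway (comparisons converted to ratings) can be made concrete with a small sketch. The summary does not say which method the video uses; the code below assumes a Bradley-Terry model, a common choice for this step, in which the probability that item a is preferred over item b is sigmoid(rating_a - rating_b). The function name `fit_ratings`, the joke indices, and the comparison data are all hypothetical, and each joke gets one free rating parameter here, whereas an actual reward network would predict the rating from the text itself.

```python
import math

def fit_ratings(num_items, comparisons, steps=500, lr=0.1):
    """Fit one scalar rating per item from (winner, loser) index pairs.

    Assumes a Bradley-Terry model: P(winner beats loser) is the sigmoid
    of the rating difference. Uses plain gradient ascent on the
    log-likelihood of the observed comparisons.
    """
    ratings = [0.0] * num_items
    for _ in range(steps):
        grads = [0.0] * num_items
        for winner, loser in comparisons:
            # Model's current probability that the winner is preferred.
            diff = ratings[winner] - ratings[loser]
            p_win = 1.0 / (1.0 + math.exp(-diff))
            # Gradient of log(p_win) with respect to each rating.
            grads[winner] += 1.0 - p_win
            grads[loser] -= 1.0 - p_win
        ratings = [r + lr * g for r, g in zip(ratings, grads)]
    return ratings

# Hypothetical data: three jokes, each pair compared three times.
# A pair (a, b) means a rater preferred joke a over joke b.
comparisons = [
    (0, 1), (0, 1), (1, 0),
    (0, 2), (0, 2), (2, 0),
    (1, 2), (1, 2), (2, 1),
]
ratings = fit_ratings(3, comparisons)
print(ratings)
```

With this data joke 0 wins most often and joke 2 least often, so the fitted ratings come out in that order. Only differences between ratings matter under this model, so the absolute values carry no meaning on their own.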