I just binged this playlist at 1 am. Absolutely worth it. You deserve more views.
@jesus_love_for_all 2 months ago
PLEASE COME BACK!! You are an amazing teacher!
@n45a_ 5 hours ago
ok everything makes sense now, thx
@tuulymusic3856 7 months ago
Please come back, your videos are great!
@Coder.tahsin 4 months ago
All of your videos are amazing, please upload more
@ireoluwaTH a year ago
Welcome back! Hope to see more of these videos.
@HoverAround 5 months ago
Joel, excellent explanation and talk! Thank you!
@胡里安-n6m 6 months ago
Helped me a lot, can't wait to see more
@pegasusbupt a year ago
Amazing content! Please keep them coming!
@jasonpmorrison a year ago
Super helpful - thank you for this series!
@0xeb- a year ago
Good teaching.
@RaulMartinezRME a year ago
Great content!!
@vamsinadh100 a year ago
You are the Best
@neo4242002 4 months ago
Who is this guy? He made all the complexity so simple with his words. Does anyone know this gentleman's name?
@0xeb- a year ago
How long does it take to train a reward network? And how reliable would it be?
@onhazrat a year ago
🎯 Key Takeaways for quick navigation:
00:00 🤖 Reinforcement learning improves large language models like ChatGPT.
00:25 🃏 Large language models face issues like bias, errors, and quality.
01:11 📊 Training data quality impacts results; removing bad jokes might help.
01:55 🧩 Training on both good and bad jokes improves language models.
02:38 🔄 Language models are policies; reinforcement learning uses policy gradient.
03:08 🎯 Reinforcement Learning from Human Feedback (RLHF): data acquisition is a challenge.
03:35 🤔 RLHF theory: the language model might already know the boundary between good and bad jokes.
04:18 🏆 Training a reward network predicts human ratings for the model's output.
04:47 🔄 The reward network is a modified language model for predicting ratings.
05:14 📝 Approach: humans write text, train a reward network, refine the model with RL.
05:57 ⚖️ Systems convert comparisons to ratings for reward network training.
06:11 😄 RLHF successfully improves language models, including humor.
Made with HARPA AI
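The 05:57 takeaway above mentions converting human comparisons into a rating signal. A minimal sketch of that idea, assuming a Bradley-Terry style pairwise loss (the function name and the scalar-score interface are illustrative, not from the video):

```python
import math

def pairwise_reward_loss(score_preferred: float, score_rejected: float) -> float:
    """Pairwise loss for a reward model trained on human comparisons.

    score_preferred / score_rejected are the reward model's scalar scores
    for the completion the human preferred vs. rejected. The sigmoid of
    the score gap is the modeled probability the preferred completion
    outranks the rejected one; minimizing -log(p) pushes the model to
    score human-preferred outputs higher.
    """
    p = 1.0 / (1.0 + math.exp(-(score_preferred - score_rejected)))
    return -math.log(p)

# The loss shrinks as the score gap in favor of the preferred sample grows.
assert pairwise_reward_loss(3.0, 0.0) < pairwise_reward_loss(1.0, 0.0)
```

In a full RLHF pipeline, a reward model trained with a loss like this then supplies the reward signal for the policy-gradient step described at 02:38.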