Reinforcement Learning: ChatGPT and RLHF

Views: 11,113

Graphics in 5 Minutes

A day ago

Comments: 17
@EternityUnknown 4 months ago
I just binged this playlist at 1 am. Absolutely worth it. You deserve more views.
@jesus_love_for_all 2 months ago
PLEASE COME BACK!! You are an amazing teacher!
@n45a_ 5 hours ago
ok everything makes sense now, thx
@tuulymusic3856 7 months ago
Please come back, your videos are great!
@Coder.tahsin 4 months ago
All of your videos are amazing, please upload more
@ireoluwaTH A year ago
Welcome back! Hope to see more of these videos..
@HoverAround 5 months ago
Joel, excellent explanation and talk! Thank you!
@胡里安-n6m 6 months ago
Helped me a lot, can't wait to see more
@pegasusbupt A year ago
Amazing content! Please keep them coming!
@jasonpmorrison A year ago
Super helpful - thank you for this series!
@0xeb- A year ago
Good teaching.
@RaulMartinezRME A year ago
Great content!!
@vamsinadh100 A year ago
You are the Best
@neo4242002 4 months ago
Who is this guy? He made all the complexity so simple with his words. Does anyone know this gentleman's name?
@0xeb- A year ago
How long does it take to train a reward network? And how reliable would it be?
@onhazrat A year ago
🎯 Key Takeaways for quick navigation:
00:00 🤖 Reinforcement learning improves large language models like ChatGPT.
00:25 🃏 Large language models face issues like bias, errors, and quality.
01:11 📊 Training data quality impacts results; removing bad jokes might help.
01:55 🧩 Training on both good and bad jokes improves language models.
02:38 🔄 Language models are policies, reinforcement learning uses policy gradient.
03:08 🎯 Reinforcement Learning from Human Feedback (RLHF) challenges data acquisition.
03:35 🤔 RLHF theory: Language model might already know jokes' boundary.
04:18 🏆 Training a reward network predicts human ratings for model's output.
04:47 🔄 Reward network is a modified language model for predicting ratings.
05:14 📝 Approach: Humans write text, train reward network, refine model with RL.
05:57 ⚖️ Systems convert comparisons to ratings for reward network training.
06:11 😄 RLHF successfully improves language models, including humor.
Made with HARPA AI
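The 05:57 takeaway (turning pairwise comparisons into ratings) can be illustrated with a small sketch. The summary does not say which method is used, so this assumes a Bradley-Terry model, where the chance that output i is preferred over output j is sigmoid(score_i - score_j). The function name `fit_scores`, the step count, the learning rate, and the toy comparison data are all hypothetical. A real reward network would be a modified language model, as the 04:47 takeaway notes; here it is replaced by one scalar score per item.

```python
import math


def fit_scores(comparisons, n_items, steps=500, lr=0.1):
    """Fit one scalar rating per item from (winner, loser) pairs.

    Assumes a Bradley-Terry model: P(winner beats loser) is
    sigmoid(score[winner] - score[loser]). Uses plain gradient
    ascent on the log-likelihood of the observed comparisons.
    """
    scores = [0.0] * n_items
    for _ in range(steps):
        grads = [0.0] * n_items
        for winner, loser in comparisons:
            # Probability the current scores give to the observed outcome.
            p = 1.0 / (1.0 + math.exp(-(scores[winner] - scores[loser])))
            # d/ds_winner of log(p) is (1 - p); the loser gets the negative.
            grads[winner] += 1.0 - p
            grads[loser] -= 1.0 - p
        scores = [s + lr * g for s, g in zip(scores, grads)]
        # Only score differences matter, so keep the ratings centered at zero.
        mean = sum(scores) / n_items
        scores = [s - mean for s in scores]
    return scores


# Toy data (made up): three jokes, each tuple is (preferred, rejected).
toy_comparisons = [
    (0, 1), (0, 1), (1, 0),
    (1, 2), (1, 2), (2, 1),
    (0, 2), (0, 2), (0, 2),
]
ratings = fit_scores(toy_comparisons, n_items=3)
for joke_id, rating in enumerate(ratings):
    print(f"joke {joke_id}: rating {rating:.2f}")
```

The fitted ratings could then serve as regression targets for a reward network, or the same pairwise log-likelihood could be applied directly to the network's outputs. Which of these a given system does is not stated in the summary.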
@stayhappy-forever 6 months ago
come back :(