George Hotz | Programming | RL is dumb and doesn't work | Reinforcement Learning LunarLander Part 2

Views: 32,669

11 months ago

Date of the stream 7 Jan 2024.
Buy the comma 3X from $1250: comma.ai/shop/comma-3x & the best ADAS system in the world: openpilot.comma.ai
Live-stream chat added as Subtitles/CC - English (Twitch Chat) - at the bottom - Show Transcript
Sources:
- github.com/geohot/dumbrl
- stable-baselines3.readthedocs.io/en/master/
- kzbin.info/www/bejne/fIeqgn2DZ7KJhLs (Deadliest Journeys - Congo: The Last Train in Katanga)
- andyljones.com/posts/rl-debugging.html
- spinningup.openai.com/en/latest/
- arxiv.org/pdf/1912.02875.pdf (Reinforcement Learning Upside Down)
tinygrad bounties:
- docs.google.com/spreadsheets/d/1WKHbT-7KOgjEawq5h5Ic1qUWzpfAzuD_J06N1JwOCGs/
Follow for notifications:
- twitch.tv/georgehotz
Support George:
- twitch.tv/subs/georgehotz
Pre-order tinybox:
- buy.stripe.com/5kAaGL6lk9uX9nW144 (tinygrad.org/)
Chapters:
00:00:00 intro
00:01:40 stream disclaimer, twitch ban
00:02:45 only 50% of subscription money
00:03:05 kick.com streaming
00:04:45 kick reach out to George, twitch issues
00:06:20 drugs banner, legal in california
00:08:05 50% money to twitch too much, twitch remove the banner
00:09:20 hyubsama food stream, twitch banned users
00:12:00 streaming on X, negotiating power
00:14:40 stream statistics, streaming schedule
00:16:10 applying for twitch partner
00:17:30 twitch revenue
00:18:30 perplexity best way to get banned on twitch
00:22:50 andrew tate impression
00:23:50 stable baselines 3
00:29:40 np.random.randint
00:32:44 NoneType object does not support item assignment
00:33:00 perplexity
00:35:40 render mode defined human
00:37:20 good play, size=10
00:40:30 stable baselines 3 just works
00:50:00 passed a tuple, array element with a sequence
00:51:15 learning
00:52:50 decision transformer stable baselines 3
00:55:20 github.com/geohot/dumbrl
00:56:30 cartpole, stable baselines decision transformer
00:59:30 Jax, wrapper for vectorized environments
01:00:30 deadliest journeys congo, ancestor pothole
01:01:11 building infrastructure, fixing the road
01:01:20 bugs, carefully building infrastructure, CI testing
01:04:00 README
01:05:40 deleting a lot of tinygrad, focusing on what needs to work well
01:09:55 decision transformer repo
01:13:10 beautiful_cartpole.py
01:20:07 andy jones debugging rl
01:24:00 if you are following along
01:29:00 the problem are bugs
01:32:00 asking perplexity, openai spinning up and deep rl
01:36:25 log_softmax
01:39:00 broadcasting bug, 2, 3, 5
01:47:20 no detach(), ppo, exp
02:02:00 why is my ppo not working
02:07:40 fast cartpole
02:11:50 banned user
02:15:50 asking it to learn
02:17:15 hyper parameter land
02:25:45 lucky
02:27:30 !!!LOUD WARNING!!! why it's not solving
02:33:20 3 layer network
02:41:30 value function
02:42:50 writing pytorch
02:48:10 if it works in pytorch shutting down tiny corp
02:52:50 pytorch numeric stability
02:55:10 frustrating, having faith in tiny grad
02:56:00 very easy to make progress in tiny grad
02:57:18 tiny grad more numerically stable
03:04:00 the most dead simple thing
03:13:30 size 2, 3 solving
03:14:35 going even simpler
03:19:00 batch size = 4
03:22:40 reward broken
03:27:10 it becomes like an identity matrix over time
03:28:40 this is fire, the gradient, single weight matrix
03:33:00 so beautiful, love watching deep learning happen
03:44:10 learning rate too high
03:51:00 that one does not learn
03:52:40 dying relu, 0xnan getting VIP
04:00:40 advantage
04:06:40 Alex on the phone
04:08:45 no clips, taking out of context
04:11:00 value function all noise
04:18:10 graph go up
04:24:00 messing with hyperparameters randomly
04:26:10 slow graph drawing
04:28:20 sampling bias
04:29:10 lower discount factor, larger replay buffer
04:32:45 no major bugs, ppo major bug
04:36:20 entropy loss
04:38:40 counter intuitive in deep learning, bigger learn better
04:40:40 overheads
04:41:40 one good landing
04:42:55 50, 51
04:44:10 Alex home
04:46:30 send this video to a doomer
04:48:00 good enough landing
04:50:40 expectations too high
04:51:35 twitch won't contact George
04:53:30 hope, upside down rl, juergen schmidhuber
04:54:10 good reliable solution to everything
04:54:40 Alex, no checkpoints
04:55:10 last landing, end of the episode
04:55:30 thank you for watching
Official George Hotz communication channels:
- geohot.com
- realGeorgeHotz
- georgehotz
- tinygrad.org
- geohot.github.io/blog
- github.com/geohot
We archive George Hotz and comma.ai videos for fun.
Follow for notifications:
- geohotarchive
Thank you for reading and using the SHOW MORE button.
We hope you enjoy watching George's videos as much as we do.
See you at the next video.

Comments: 36

@geohotarchive 11 months ago

github.com/geohot/dumbrl | stable-baselines3.readthedocs.io/en/master/ | kzbin.info/www/bejne/fIeqgn2DZ7KJhLs (Deadliest Journeys - Congo: The Last Train in Katanga) 01:01:11 carefully building infrastructure, CI testing | andyljones.com/posts/rl-debugging.html | spinningup.openai.com/en/latest/ | arxiv.org/pdf/1912.02875.pdf (Reinforcement Learning Upside Down) Bounties for tiny corp / tinygrad -> docs.google.com/spreadsheets/d/1WKHbT-7KOgjEawq5h5Ic1qUWzpfAzuD_J06N1JwOCGs/ kzbin.info/www/bejne/op-5gqaAf6uWmsk Hiring entire stack for tiny corp join if you are interested | kzbin.info/www/bejne/op-5gqaAf6uWmsk work major source of value in your life Pre-order tinybox buy.stripe.com/5kAaGL6lk9uX9nW144 more info on -> tinygrad.org | github.com/tinygrad/tinygrad comma 3X comma.ai/shop/comma-3x | best ADAS system in the world openpilot.comma.ai | from $999 comma.ai/shop/body the future of people Support George by subscribing twitch.tv/subs/georgehotz | Follow George on twitter.com/realGeorgeHotz to be up to date | Read George's geohot.github.io/blog/

@martindbp 11 months ago

If imitation learning works, then the problem is really to find (using search) sequences of actions in simulation which lead to the outcomes you want. Whether you call that RL or not doesn't matter.
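The idea in this comment can be sketched on a toy problem. This is only an illustration, not anything shown in the stream: the 1-D world, the `ACTIONS` set, and every function name below are hypothetical. The search is exhaustive here because the toy space is tiny; a real simulator would need a smarter search. The successful rollouts are then flattened into (state, action) pairs, which is the kind of dataset an imitation learner would be fit on.

```python
from itertools import product

ACTIONS = (-1, 0, 1)  # hypothetical toy action set: move left, stay, move right


def rollout(start, actions):
    """Simulate a toy 1-D world and return the list of visited states."""
    states = [start]
    for a in actions:
        states.append(states[-1] + a)
    return states


def search_successes(start, goal, horizon):
    """Exhaustively try every action sequence; keep those that end at the goal."""
    successes = []
    for seq in product(ACTIONS, repeat=horizon):
        states = rollout(start, seq)
        if states[-1] == goal:
            successes.append((states, seq))
    return successes


def to_imitation_dataset(successes):
    """Flatten successful rollouts into (state, action) pairs for supervised learning."""
    return [(s, a) for states, seq in successes for s, a in zip(states, seq)]


successes = search_successes(start=0, goal=3, horizon=4)
dataset = to_imitation_dataset(successes)
```

With these toy settings the only way to reach position 3 in four steps is three moves right and one "stay", so the search keeps just those orderings and the dataset contains only actions taken on a successful path.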

@holthuizenoemoet591 11 months ago

I don't know if you tried this, but if not, try a negative reward term that penalizes the agent for taking too much time (so subtract some points each time step).
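A minimal sketch of that suggestion, assuming the environment follows the Gymnasium-style `reset()` / `step()` API with a five-element step tuple. `TimePenalty`, `CountdownEnv`, and `episode_return` are hypothetical names made up for this example, and the penalty value is arbitrary; nothing here is taken from the stream's code.

```python
class TimePenalty:
    """Wrap an environment and subtract a fixed penalty from every step's reward.

    Assumes the wrapped env exposes reset() -> (obs, info) and
    step(action) -> (obs, reward, terminated, truncated, info).
    """

    def __init__(self, env, penalty=0.1):
        self.env = env
        self.penalty = penalty

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        # The only change: every step costs a little, so slow episodes score lower.
        return obs, reward - self.penalty, terminated, truncated, info


class CountdownEnv:
    """Hypothetical stand-in env: pays 1.0 on the final step and 0.0 otherwise."""

    def __init__(self, length=5):
        self.length = length
        self.t = 0

    def reset(self, **kwargs):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        return self.t, (1.0 if done else 0.0), done, False, {}


def episode_return(env):
    """Run one episode with a fixed action and sum the rewards."""
    env.reset()
    total, done = 0.0, False
    while not done:
        _, reward, terminated, truncated, _ = env.step(0)
        total += reward
        done = terminated or truncated
    return total


plain_return = episode_return(CountdownEnv(length=5))
penalized_return = episode_return(TimePenalty(CountdownEnv(length=5), penalty=0.1))
```

Because the wrapper only touches the reward, the same agent code can be trained with or without the penalty, which makes it easy to check whether the penalty helps or just adds another hyperparameter.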

@dan-cj1rr 11 months ago

r we agi yet

@EverydayTwitch 11 months ago

thanks for the upload, just thought you should know it looks like the Twitch chat messages in CC are duplicated

@geohotarchive 11 months ago

@EverydayTwitch looks like it's only an issue at the start or when the chat is really busy. Tried to upload again and it's the same. If you use Show Transcript you see that it's OK. Looks like a bug with displaying the CC. Not that important really.

@iphgfqweio 11 months ago

EverydayTroubleshoot

@cls880 11 months ago

saw a new chat box show up on X today during a livestream

@theguywholosthiswaytothegl3125 11 months ago

Sir, I need your help understanding the meaning of query, key, and value in the transformer architecture. I read about them everywhere, but there are only superficial explanations. I just don't get all the linear transformations required for Q, K, and V. I somehow got the importance of Q and K, but why do we need values? I am so frustrated.
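One way to look at this question is a tiny scaled dot-product attention written in plain Python. It is a sketch with hand-picked numbers: in a real transformer the queries, keys, and values come from learned linear projections of the input, which are left out here, and the helper names are made up for this example. The queries and keys only produce weights that say where to look. The values are what actually gets read out, and they can have a different size and meaning than the keys.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    Returns (outputs, weights): weights come from Q and K only,
    outputs are weighted sums of the value vectors.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)  # where to look
        out = [
            sum(w * v[j] for w, v in zip(weights, values))  # what to read
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Two stored items: keys are 2-D "addresses", values are 3-D "contents".
keys = [[10.0, 0.0], [0.0, 10.0]]
values = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
queries = [[10.0, 0.0]]  # matches the first key

outputs, weights = attention(queries, keys, values)
```

Without the values, the layer could only report how strongly each position matched. With them, the matching (Q against K) and the returned content (V) are separated, so a position can be found by one feature and contribute a different one.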

@dariushuang1115 11 months ago

playing this while coding gives me the biggest productivity boost.

@Jackson_Zheng 11 months ago

He remembered to turn on the mic!

@cem_kaya 11 months ago

It is dumb and it works and it is a bit slow. And there is no good development environment to train them yet. Currently training an agent using ML-Agents in Unity and a custom PPO implementation.

@DavidCosta85 11 months ago

hello George! can the human brain evolve forever? do we have enough meta memory?

@More_Row 11 months ago

No we do not. When you hit your absolute limitation you will know instinctively that you can't go further. If you are good at analyzing yourself that is.

@billyf3346 11 months ago

Maybe if you gave it a few examples of landing properly, or included some convolutions that made velocity and acceleration more explicit, or let it train on easier games first so it could learn what the pixels mean before diving into such a hard game. If the reward signal is sparse among all the possible random inputs, of course it's not going to work without a huge number of trials first.
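The "make velocity and acceleration explicit" part can be sketched with finite differences, which amount to a fixed difference kernel slid over consecutive observations. This assumes the agent only sees positions at a fixed time step; the function names and the sample heights are hypothetical and not from the stream.

```python
def finite_differences(positions, dt=1.0):
    """Estimate velocities and accelerations from a list of positions."""
    velocities = [(b - a) / dt for a, b in zip(positions, positions[1:])]
    accelerations = [(b - a) / dt for a, b in zip(velocities, velocities[1:])]
    return velocities, accelerations


def augment(positions, dt=1.0):
    """Build (position, velocity, acceleration) features.

    The first two steps are dropped because an acceleration estimate
    needs three consecutive positions.
    """
    v, a = finite_differences(positions, dt)
    return [(positions[i + 2], v[i + 1], a[i]) for i in range(len(a))]


# Hypothetical heights of a falling lander sampled at a fixed time step.
heights = [100.0, 99.0, 96.0, 91.0, 84.0]
features = augment(heights)
```

Feeding features like these to the policy means it no longer has to infer motion from several raw frames on its own, which is the same effect the comment hopes a convolution over time would provide.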