George Hotz | Programming | Decision Transformer Reinforcement Learning (RL) | LunarLander | Part 1

Views: 120,942

11 months ago

Date of the stream 6 Jan 2024.
Buy comma 3X from $1250: comma.ai/shop/comma-3x | best ADAS system in the world: openpilot.comma.ai
Original stream title:
- tinygrad: rewriting the scheduler
Sources:
- arxiv.org/pdf/2106.01345.pdf
- huggingface.co/blog/decision-transformers
- medium.com/@jscriptcoder/demystifying-upside-down-reinforcement-learning-a-k-a-ꓤ-b7bd4214b33f
- kzbin.info/www/bejne/rpSTm3qQjquEgrM
tinygrad bounties:
- docs.google.com/spreadsheets/d/1WKHbT-7KOgjEawq5h5Ic1qUWzpfAzuD_J06N1JwOCGs/
Follow for notifications:
- twitch.tv/georgehotz
Support George:
- twitch.tv/subs/georgehotz
Pre-order tinybox:
- buy.stripe.com/5kAaGL6lk9uX9nW144 (tinygrad.org/)
Chapters:
00:00:00 lunarlander_transformer.py
00:04:25 twitch substance warning
00:06:00 perplexity decision transformer
00:12:00 assert not x.requires_grad
00:15:00 192 % start_pos
00:21:45 food
00:24:25 fixes needed in tinygrad
00:41:00 gpt2 works
00:46:40 contraction not explained
00:55:00 rant
01:00:25 Ron Paul
01:04:40 usa population pyramid
01:05:30 jit
01:08:55 africa documentaries
01:13:15 cross
01:19:00 not supported 768 %
01:23:20 do things team
01:24:50 tinygrad intern phone call
01:28:50 postmodernism
01:36:40 assert t.grad is not None
01:38:30 advice
01:40:10 schedule
01:43:20 decision transformer paper
01:53:00 not balancing
02:05:00 K=20
02:10:00 plt.show()
02:15:30 clip 50
02:19:00 lunarlander fails
02:20:00 uber eats scam
02:27:00 decision transformers on Hugging Face
02:31:30 logits
02:45:00 temperature
02:54:00 should never output 2
03:11:40 so many bugs
03:12:40 good idea from chat
03:15:00 lunarlander is not landing
03:16:30 128 clip
03:17:00 highest_reward bug
03:18:50 lunar lander rewards
03:24:30 let's make it work
03:29:00 unknown change
03:31:40 piano
03:34:20 reinforcement learning is impossible
03:37:25 write gym environment
03:50:00 stupid decision transformer
03:57:20 98%
03:58:50 that is what we get for smoking weed
04:02:10 press the light up button
04:05:00 learned to play the game
04:12:55 the optimal strategy
04:13:45 press_the_light_up_button.py
04:17:40 desired reward
04:19:40 so broken
04:27:20 some bug with
04:29:00 action and reward embedding
04:32:30 broadcast issue
04:37:50 another layer
04:44:40 50/50 probability
04:51:40 feeling so scammed
05:00:20 close to AGI
05:07:10 test model code
05:18:00 learning excruciatingly slowly
05:24:20 scientific notation suppress
05:25:40 making some progress
05:28:55 it's learning press the light up button
05:33:00 JIT disabled
05:34:15 equity and inclusion
05:36:10 loss going down
05:39:00 we did reinforcement learning
05:39:50 Alex, voting
05:43:40 it's learning
05:48:30 render_mode default
05:53:20 demystifying Upside-Down reinforcement learning
05:55:55 CartPole
05:58:30 lunarlander
06:02:15 pressthelightupbutton
06:04:00 lunarlander
06:09:00 spacex simulations
06:09:30 3e-4
06:17:00 size, game_length
06:28:50 life advice
06:32:05 predicting action
06:36:40 life advice answers
06:38:00 ambition greater than your intelligence
06:39:10 learn how to learn, no gradient
06:41:30 most people should just give up
06:42:00 putting time into programming
06:46:40 bug in pressthelightupbutton
06:53:25 it's dumb
07:00:50 game_length=32
07:05:40 scale
07:14:25 Alex bringing food
07:26:00 same data over and over
07:28:09 reading the paper
07:38:00 entropy_loss
07:40:50 reading twitch chat
07:43:50 RL stream makes us angry
07:46:30 stream overview
07:47:00 no push to github
07:47:50 ground changes shape
Official George Hotz communication channels:
- geohot.com
- realGeorgeHotz
- georgehotz
- tinygrad.org
- geohot.github.io/blog
- github.com/geohot
We archive George Hotz and comma.ai videos for fun.
Follow for notifications:
- geohotarchive
Thank you for reading and using the SHOW MORE button.
We hope you enjoy watching George's videos as much as we do.
See you at the next video.

Comments: 94

@geohotarchive 11 months ago

- MuZero stream: kzbin.info/www/bejne/rpSTm3qQjquEgrM
- Paper, "Decision Transformer: Reinforcement Learning via Sequence Modeling": arxiv.org/pdf/2106.01345.pdf
- Bounties for tiny corp / tinygrad: docs.google.com/spreadsheets/d/1WKHbT-7KOgjEawq5h5Ic1qUWzpfAzuD_J06N1JwOCGs/
- Hiring the entire stack for tiny corp, join if you are interested: kzbin.info/www/bejne/op-5gqaAf6uWmsk
- Work, a major source of value in your life: kzbin.info/www/bejne/op-5gqaAf6uWmsk
- Pre-order tinybox: buy.stripe.com/5kAaGL6lk9uX9nW144 (more info: tinygrad.org | github.com/tinygrad/tinygrad)
- comma 3X: comma.ai/shop/comma-3x | best ADAS system in the world: openpilot.comma.ai | from $999: comma.ai/shop/body, the future of people
- Support George by subscribing: twitch.tv/subs/georgehotz | Follow George on twitter.com/realGeorgeHotz to stay up to date | Read George's blog: geohot.github.io/blog/

@MrTomacos 11 months ago

so did he drink too much or smoke too much weed?

@davidalexreis97 11 months ago

Listen, from the moment someone uploads something to the Internet, they can't reasonably expect it to ever be deleted completely. However, that doesn't mean that one should actively try to bring back to life something that someone else decided to remove. You were so worried thinking about whether you could, you didn't think about if you should. It's an easy mistake, but it is a mistake.

@GeorgeHotz940 11 months ago

@geohotarchive I did not remove it, Twitch must have! I saw the tweet and talked about it in the theory stream. I sent an e-mail to Twitch complaining about the content warning.

@geohotarchive 11 months ago

@GeorgeHotz940 understood and won't make the same mistake again.

@pranavpipariya8556 11 months ago

The fact that you added the paper in the description is so good. Also, to the people who helped recover the stream, many thanks.

@flintn5899 10 months ago

Absolutely agree!

@olinafan4459 11 months ago

8 hours of pure coding, love it

@kinvert 11 months ago

So glad you recovered this. Thank you for all your hard work. Your timestamps are a Godsend.

@shoubhikdasguptadg9911 11 months ago

Maester, here you are, again, and here I am, again, ready to be mesmerised by thy neural prowess!

@satvik1619 11 months ago

Finally he uploaded it.. thank you soo much

@MrEmbrance 11 months ago

54:53 is when the mania kicks in

@meehai_ 11 months ago

Regarding the 'struggle' with shapes (01:55), I recommend using breakpoint() (Python >= 3.7), which drops you into pdb so you can inspect the shapes or run arbitrary code with the stack/context exactly at that point. I use breakpoint() calls in my code all the time, and it saves me a lot of time on these things compared to just printing the shapes, rerunning n times, and making one-line changes each time until it works.
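
A minimal sketch of the breakpoint() workflow suggested in the comment above, using only the standard library. The names `shape_of`, `stack_tokens`, and the `DEBUG_SHAPES` environment variable are hypothetical and invented for this illustration; they are not taken from the stream's code, and the 20-step, 8-dimensional input is only an illustrative size.

```python
import os


def shape_of(x):
    """Return the shape of a rectangular nested list, e.g. [[1, 2, 3]] -> (1, 3)."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0] if x else None
    return tuple(dims)


def stack_tokens(states, actions, debug=False):
    """Hypothetical step: append each timestep's action id to its state vector."""
    if debug:
        # Drops into pdb right here. Try `p shape_of(states)` or any other
        # expression, then `c` to continue. Running with PYTHONBREAKPOINT=0
        # in the environment turns breakpoint() into a no-op.
        breakpoint()
    if len(states) != len(actions):
        raise ValueError(f"length mismatch: {shape_of(states)} vs {shape_of(actions)}")
    return [s + [a] for s, a in zip(states, actions)]


if __name__ == "__main__":
    states = [[0.0] * 8 for _ in range(20)]  # 20 timesteps of an 8-dim state
    actions = [0] * 20                       # one discrete action per timestep
    # Only stop in the debugger when explicitly requested (assumed flag name).
    want_debug = os.environ.get("DEBUG_SHAPES") == "1"
    tokens = stack_tokens(states, actions, debug=want_debug)
    print(shape_of(tokens))  # prints (20, 9)
```

Run it with `DEBUG_SHAPES=1` to stop inside `stack_tokens` and inspect the shapes interactively; without the flag the script just prints the final shape.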

@reen6904 11 months ago

Doing the math homework to this hits hard as fuck

@agenticmark 11 months ago

Rooting for you and Javier Milei from Mexico! We need strong commerce and security in the area.

@justinfleagle 11 months ago

Does anyone have a magnifying glass I could borrow? Thanks.

@sujantkumarkv5498 11 months ago

LOL looks good on a big monitor... try that.

@AntonioMac3301 8 months ago

6:04:37 "this is one step away from AGI don't you feel it?" 💀💀💀 shit is too funny

@onecrowdehour 11 months ago

i know nothing about coding but i'm always drawn to your channel..

@kevinwebber1746 11 months ago

the man that never googles anything

@toshio-tamura 10 months ago

the actual writer of the code everybody copy-pastes

@bucharestlostboi 9 months ago

this man is a machine. love it.

@Nate77HK 11 months ago

This guy is unfathomably based

@-mwolf 11 months ago

The person behind this acc is goated.

@sepptrutsch 11 months ago

One of the funniest guys out there

@domenicocolandrea 11 months ago

yo george! really been loving the content man. just wanted to let you know, the other day i was listening to one of your streams while driving, and it seems your mic picks up a significant amount of background noise. I've never had an issue at home at my desk. You got up and walked away, and when you were walking back to your room it sounded like the t-rex scene in jurassic park haha.. If you're looking for any feedback about the pod, i would just say maybe use some sort of noise-cancelling software or plugin. You're a legend in jersey! cheers!

@ArtOfTheProblem 11 months ago

Can someone summarize his confusion about the implementation?

@agenticmark 11 months ago

It's because if you concat them you would lose some of the temporal embedding? I.e., the model would lose information about order.

@agenticmark 11 months ago

Yeah, printing shapes with a debug flag is my go-to for tensor shaping. It's not fun :D

@怡安賴-f9l 1 month ago

Hi George, is it possible for you to stream on KZbin? It's interesting and motivating to watch these live coding videos, and it's like a work-with-me session. I need these to keep working on my thesis. Thanks for these videos!!

@jr8209 11 months ago

Text pixelated even if I zoom. How am I supposed to feel properly dumb if I can't even see the code I don't understand?

@geohotarchive 11 months ago

@jr8209 make sure you click on the wheel next to CC button and select 1080p60 HD quality. It's fine on our end.

@jr8209 11 months ago

@geohotarchive thanks!

@NeinIhFlyer 10 months ago

Whole thing was a good watch thanks. Makes me wonder what kind of education and experience there was to do this

@LucidDreamn 10 months ago

He knew very little coding when he first became known. George learned coding by exposing himself very rapidly to different areas and projects. We all have the potential to learn something new.

@vangmountain 9 months ago

Not sure where you got your info from, but George had programming experience. He was a gifted child, and though he has no formal degrees, I remember him saying he took all the hardest comp sci courses at CMU (Carnegie Mellon University). The combination of being quite smart and having a very clear interest allows him to learn things quickly. It's definitely not easy, and just having the education oftentimes will not cut it. You have to have talent and a deep interest, in addition to surrounding yourself with knowledgeable people. George is a pretty exceptional guy. There are folks with Master's degrees and PhDs in CompSci who cannot do what he does, so it's not all about the level of education.

@Jay-kb7if 11 months ago

charlie was the GOAT arc, how dare you.

@santi_alvarez 11 months ago

Was waiting for this, thanksss

@3312fdwf 11 months ago

Does anyone know how he removed google from the chromium home page, just with shortcuts?

@faceofdead 11 months ago

wtf the video was not muted at the beginning. this took me by surprise :/

@miguelfernandocruzsantiago5380 10 months ago

Hello, what did you study?

@magnuswootton6181 11 months ago

machine learning is a computer that dreams.

@iamr0b0tx 8 months ago

I wonder what the specs of the system he's using are. It seems so fast

@PankajDoharey 11 months ago

Geo check the latest Mixtral 8x7B paper, perhaps implement that in next stream.

@hardiksinha7313 10 months ago

code is not too big.. it seems big but still under 500 is okay to process with my brain

@maciej12345678 10 months ago

Welcome Mitnick 2 :D use Marvin Minsky models :D

@Ella_1994 9 months ago

At least even George Hotz faces difficulties and doesn't get some things!

@agenticmark 11 months ago

I had my twitch account banned for smoking weed on stream. Fuck twitch.

@QuantumLayer 11 months ago

im pretty sure that the loss function says nothing in RL lol

@sixthsense844 11 months ago

Hard to watch anything cause the fonts are all too small

@camyllo_7084 4 months ago

How can I contact you efficiently? Some of the content is in boot form, and I believe there would be interest in it. It's fine by me, it was worth it; I'm just looking for someone to trust.

@bd_acl 11 months ago

46:45 - 46:59

@shinzomushrambo9069 11 months ago

content arrived. poggies ❤

@joshuasonnen5982 9 months ago

I can't see your code

@gil-evens 10 months ago

6:30:00

@Avengerie 10 months ago

George showed his true colors here

@HaKazzaz 11 months ago

Thank you for posting another video I can only watch on my 97 inch TV. Everyone watching here is not fooling me. You guys don't read the code or the article. Lame.

@shinkansen1907 9 months ago

bros on a mission

@camyllo_7084 4 months ago

I have some concepts in technical form, they are analyses; much of my content is Java root for P2P, worth some $. I'm just afraid of losing the reports.

@radhakrishnanmanickavasaga124 11 months ago

🎉🎉🎉

@SachinDolta 11 months ago

That's quite a bit

@dan-cj1rr 11 months ago

so r we agi yet

@ASDASD-j8m9d 11 months ago

Same

@blueskyautumn 11 months ago

I will now go on to not explain what I mean

@rakolman 11 months ago

How much Adderall does this guy take?

@assmonkey9202 10 months ago

Idk I need his plug tho

@Amarnath62627 11 months ago

release new album dude

@byron2099 11 months ago

Lmao what music is this man possibly making. I’m genuinely curious haha

@geohotarchive 11 months ago

@byron2099 soundcloud.com/tomcr00se

@eddo4life 11 months ago

@geohotarchive this me vs your friend cover 😅🔥🔥

@ShankingDisaster 11 months ago

@geohotarchive "die now" LOOOOOOOOOOOOOOOOOOOOL 10/10, thanks for working on the moon satellite brutha. Glad the moon keeps keepin us safe

@bagzhansadvakassov1093 11 months ago

Did his gf leave him?

@geohotarchive 11 months ago

@bagzhansadvakassov1093 07:14:25 Alex bringing food

@applepie7282 10 months ago

nice work brother. transformer is a scam. just fancy mumbo jumbo. Feed forward network + positional encoding is enough for many tasks.

@edh615 10 months ago

for classifying mnist maybe

@RandoTransform3r 8 days ago

06:42:00 Such an elitist take. Frankly geo you will be proven wrong!

@domenicocolandrea 11 months ago

lets go!

@notTh3Mag1c1an 6 months ago

He needs to cancel the background noise. It is so irritating to my ears

@markvr7340 11 months ago

FAIXXXXXXXXXXXXXXXXXXX

@justinava1675 10 months ago

Life's not fair. Dude's just born like this. Sigh

@SimonLucky-f1b 10 months ago

What is the point of doing a video if no one can see the fonts except you?

@toshio-tamura 10 months ago

put the video in 1080p

@SimonLucky-f1b 10 months ago

@toshio-tamura Bro, the fonts are still too small even at 1080p.

@edh615 10 months ago

@SimonLucky-f1b small but readable on a big screen

@SpaceExplorer 11 months ago

muzero sort of sucks

@ArtOfTheProblem 11 months ago

say more please

@Bunkerniy_Gadenish 10 months ago

whos dat freak?

@blueskyautumn 11 months ago

I will now go on to not explain what I mean