Stanford CS25: V1 | Decision Transformer: Reinforcement Learning via Sequence Modeling

38,852 views

Stanford Online

Comments: 22
@ryanhewitt9902 · 11 months ago
From a presenter's perspective the interrupting questions may have been mildly frustrating. From a viewer's perspective they were indispensable. I had exactly the same questions and the answers really helped me to understand the choices at a deeper level. Thank you!
@adibkarimi1133 · 9 months ago
Great talk! It's super exciting how well RL with transformers performs.
@prof_shixo · 10 months ago
For the question at 20:20 regarding the Markov property and shifting the state using positional embeddings, I think the presenter's reply was not quite accurate. What actually appears in the sequence is an observation (e.g., an RGB frame), not a state; the true state is the encoder's embedding of the observation sequence along with the other token types. Under that view the state representation is still Markovian: once the encoder is trained and frozen, its embedding encapsulates the entire history in the sequence and is consistent, i.e., the same sequence always yields the same state. In brief, I think this was a confusion between observations and states in the RL paradigm.
@sh4ny1 · 4 months ago
Great observation !
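The observation-vs-state distinction above can be sketched in a few lines. This is a hedged illustration, not the Decision Transformer implementation: `encode_history` is a hypothetical stand-in (a hash, chosen only for determinism) for a trained, frozen sequence encoder.

```python
# Minimal sketch (hypothetical, not the speaker's code) of the point above:
# if a frozen encoder maps the full observation history to one embedding,
# that embedding is a deterministic function of the history, so it can serve
# as a Markovian state even though each raw observation alone is not.
import hashlib

def encode_history(observations):
    """Stand-in for a trained, frozen sequence encoder: deterministically
    maps an observation history to a fixed 'state' representation."""
    h = hashlib.sha256()
    for obs in observations:
        h.update(repr(obs).encode("utf-8"))
    return h.hexdigest()

history = ["frame_0", "frame_1", "frame_2"]

# Consistency: the same history always yields the same state.
assert encode_history(history) == encode_history(list(history))

# The state changes once the history changes.
assert encode_history(history) != encode_history(history + ["frame_3"])
```

Any encoder that is a pure function of the sequence has this property; the hash is just the simplest deterministic example.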
@rtBen · 2 years ago
On the dates: the talk is dated in the future (Oct 11, 2022), it was streamed in July 2022, and the paper is "to appear in" NeurIPS 2021.
@AdityaGrover · 2 years ago
Ha, thanks for pointing out the typo! The talk was in Oct 2021.
@empi_ai · 2 years ago
Brilliant talk! Very clear and interesting.
@albertwang5974 · 2 years ago
This talk illustrates what time travelling is. What a talk from the future!
@djethereal99 · 2 years ago
Great talk, really interesting work!
@LucasOSouza · 2 years ago
Nice talk! Got a bit spooked watching this today and seeing the date on the first slide 😅
@m.h.w285 · 2 years ago
How time flies!
@jamesnatchwey1961 · 2 years ago
🤣🤗
@productlog5895 · 2 years ago
Beautiful lecture.
@imolafodor4667 · 8 months ago
Hi, how is this method considered offline RL if at some point you are calling env.step(), i.e., an online step?
@miquelnogueralonso2576 · 2 years ago
Are the slides available?
@amrahmed2009 · 2 years ago
Are the slides available, please?
@markcarter6333 · 1 year ago
Good stuff. Would it be better to leave questions in the chat, or until the end of the talk?
@zyzhang1130 · 1 year ago
Are LLMs stable to train, though?
@sdfafds6823 · 1 year ago
On the third slide: "Larger models require fewer samples to reach the same performance." Isn't that a bit counterintuitive? My original intuition was that a more complex model needs to ingest more data.
@paulcreaser9130 · 2 years ago
October 11, 2022? Time travel?
@paulcreaser9130 · 2 years ago
Ahh, a typo.
@sitrakaforler8696 · 2 years ago
Future talks 😝👓