Stanford CS25: V1 | Decision Transformer: Reinforcement Learning via Sequence Modeling

38,852 views

Stanford Online

Comments: 22
@ryanhewitt9902 · 11 months ago
From a presenter's perspective the interrupting questions may have been mildly frustrating. From a viewer's perspective they were indispensable. I had exactly the same questions and the answers really helped me to understand the choices at a deeper level. Thank you!
@adibkarimi1133 · 9 months ago
Great talk! It's super exciting how well RL with transformers performs.
@prof_shixo · 10 months ago
For the question at 20:20 regarding the Markov property and shifting the state using positional embeddings, I think the presenter's reply was not quite accurate. What actually appears in the sequence is an observation (e.g., an RGB frame), not a state; the true state is the encoder's embedding of the observation sequence along with the other token types. Under that view the state representation is still Markovian: once the encoder is trained and frozen, its embedding encapsulates the entire history in the sequence and is consistent, i.e., the same sequence always yields the same state. In brief, I think this was a confusion between observations and states in the RL paradigm.
@sh4ny1 · 4 months ago
Great observation !
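The observation-vs-state distinction above can be sketched in a few lines. This is a hedged illustration, not the Decision Transformer implementation: `encode_history` is a hypothetical stand-in (a hash, chosen only for determinism) for a trained, frozen sequence encoder.

```python
# Minimal sketch (hypothetical, not the speaker's code) of the point above:
# if a frozen encoder maps the full observation history to one embedding,
# that embedding is a deterministic function of the history, so it can serve
# as a Markovian state even though each raw observation alone is not.
import hashlib

def encode_history(observations):
    """Stand-in for a trained, frozen sequence encoder: deterministically
    maps an observation history to a fixed 'state' representation."""
    h = hashlib.sha256()
    for obs in observations:
        h.update(repr(obs).encode("utf-8"))
    return h.hexdigest()

history = ["frame_0", "frame_1", "frame_2"]

# Consistency: the same history always yields the same state.
assert encode_history(history) == encode_history(list(history))

# The state changes once the history changes.
assert encode_history(history) != encode_history(history + ["frame_3"])
```

Any encoder that is a pure function of the sequence has this property; the hash is just the simplest deterministic example.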
@rtBen · 2 years ago
On the dates: the talk is dated in the future (Oct 11, 2022), it was streamed in July 2022, and the paper is "to appear in" NeurIPS 2021.
@AdityaGrover · 2 years ago
Ha, thanks for pointing out the typo! The talk was in Oct 2021.
@empi_ai · 2 years ago
Brilliant talk! Very clear and interesting.
@albertwang5974 · 2 years ago
This talk illustrates what time travelling is. What a talk from the future!
@djethereal99 · 2 years ago
Great talk, really interesting work!
@LucasOSouza · 2 years ago
Nice talk! Got a bit spooked watching this today and seeing the date on the first slide 😅
@m.h.w285 · 2 years ago
How time flies!
@jamesnatchwey1961 · 2 years ago
🤣🤗
@productlog5895 · 2 years ago
Beautiful lecture.
@imolafodor4667 · 8 months ago
Hi, how is this method considered offline RL if at some point you are calling env.step(), i.e., an online step?
@miquelnogueralonso2576 · 2 years ago
Are the slides available?
@amrahmed2009 · 2 years ago
Are the slides available, please?
@markcarter6333 · 1 year ago
Good stuff. Would it be better to leave questions in the chat, or until the end of the talk?
@zyzhang1130 · 1 year ago
Are LLMs stable to train, though?
@sdfafds6823 · 1 year ago
On the third slide: "Larger models require fewer samples to reach the same performance." Isn't that a bit counterintuitive? My original intuition was that a more complex model needs to ingest more data.
@paulcreaser9130 · 2 years ago
October 11, 2022? Time travel?
@paulcreaser9130 · 2 years ago
Ahh, a typo.
@sitrakaforler8696 · 2 years ago
Future talks 😝👓