L5 DDPG and SAC (Foundations of Deep RL Series)

22,205 views

Pieter Abbeel


1 day ago

Comments: 16
@flopyarcade4407 2 years ago
For those unaware: Pieter Abbeel is a legend. Be grateful. ✌️
@yihongliu7326 1 year ago
Thank you, Pieter. More than a year ago, when I first learned RL, I searched for something like this online but couldn't find it. This is very helpful and amazing :)
@hassaannaeem4374 2 years ago
Awesome series. Thanks Pieter
@dermitdembrot3091 3 years ago
How widely is this "backprop through Q" used in continuous control? Does e.g. SAC use it?
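The "backprop through Q" idea from the lecture can be sketched in a few lines: DDPG trains a deterministic actor by differentiating the critic with respect to the action and chaining into the policy parameters. A minimal numpy sketch, with a hypothetical quadratic critic and linear policy (neither is from the lecture):

```python
import numpy as np

# Hypothetical setup: linear deterministic policy mu_theta(s) = theta * s
# and a known quadratic critic Q(s, a) = -(a - 2s)^2, maximized at a = 2s.
# "Backprop through Q": grad_theta Q = (dQ/da) * (d mu/d theta).

def q_value(s, a):
    return -(a - 2.0 * s) ** 2        # critic (assumed known here)

def dq_da(s, a):
    return -2.0 * (a - 2.0 * s)       # gradient of critic w.r.t. action

theta = 0.0                           # policy parameter
lr = 0.1
states = np.array([0.5, 1.0, 1.5])

for _ in range(200):
    actions = theta * states          # a = mu_theta(s)
    # chain rule, averaged over a small batch of states
    grad = np.mean(dq_da(states, actions) * states)
    theta += lr * grad                # gradient ASCENT on Q

print(round(theta, 3))                # prints 2.0 (the optimal policy)
```

In a real DDPG implementation an autodiff framework does this chain rule automatically: the actor loss is simply `-Q(s, mu(s))` and the critic's weights are frozen during the actor step.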
@sourjyasarkar 3 years ago
In the learning step of the Soft Actor-Critic algorithm, shouldn't it be Q_{\phi}(s_t, a_t) and \pi_{\theta}(a_t | s_t), presuming the parameters of the Q-value network are \phi and those of the policy network are \theta?
@dermitdembrot3091 3 years ago
Yes, it should, to keep the slides consistent, but it's no big issue. Maybe the papers use the Greek letters differently.
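For the SAC learning step discussed above, the bootstrap target adds an entropy bonus to the usual Bellman backup. A hedged sketch using the commenter's notation (critic parameters \phi, policy parameters \theta); the Gaussian policy and toy critic are illustrative assumptions, not the lecture's code:

```python
import numpy as np

# SAC soft target for a sampled next action a' ~ pi_theta(.|s'):
#   y = r + gamma * ( Q_phi_target(s', a') - alpha * log pi_theta(a'|s') )
gamma, alpha = 0.99, 0.2
rng = np.random.default_rng(0)

def q_phi_target(s, a):               # hypothetical target critic
    return -(a - s) ** 2

def sample_and_logprob(s):            # hypothetical Gaussian policy pi_theta
    mean, std = s, 1.0
    a = rng.normal(mean, std)
    logp = -0.5 * ((a - mean) / std) ** 2 - 0.5 * np.log(2 * np.pi * std ** 2)
    return a, logp

r, s_next = 1.0, 0.3
a_next, logp = sample_and_logprob(s_next)
y = r + gamma * (q_phi_target(s_next, a_next) - alpha * logp)
```

In practice SAC also takes the minimum of two target critics before subtracting the entropy term, which this sketch omits for brevity.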
@bashkirovsergey 2 years ago
Hello! At 3:38 I can't understand Q(t) = r(t) + gamma * Q(t+1). Shouldn't it be Q(t) = r(t) + gamma * V(t+1)? I'd appreciate any hint! Thank you!
@ioannis.g.tzolas 2 years ago
My understanding is that the action value equals the state value, since we have a deterministic policy.
@rezamaz8975 1 year ago
In general, the value function tells us how good a state is, and here it seems to me it equals the Q function.
@ky8920 1 year ago
I think it uses a single sample to estimate the weighted sum over Q_{t+1}.
@aakashrana3036 1 year ago
I think the action is taken from the policy; have a look at the DDPG algorithm pseudocode for more clarity.
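The replies above resolve the 3:38 question: with a deterministic policy mu, V(s) = Q(s, mu(s)), so bootstrapping with Q(t+1) at the policy's action is the same as bootstrapping with V(t+1). A small numpy sketch with a hypothetical target critic and target policy (illustrative only):

```python
# With a deterministic policy, DDPG's bootstrap target
#   y = r + gamma * Q_target(s', mu_target(s'))
# coincides with r + gamma * V(s'), since V(s') = Q_target(s', mu_target(s')).
gamma = 0.99

def mu_target(s):                 # hypothetical deterministic target policy
    return 2.0 * s

def q_target(s, a):               # hypothetical target critic
    return s + 0.5 * a

def v(s):                         # state value under a deterministic policy
    return q_target(s, mu_target(s))

r, s_next = 1.0, 0.5
y_via_q = r + gamma * q_target(s_next, mu_target(s_next))
y_via_v = r + gamma * v(s_next)
assert y_via_q == y_via_v         # identical targets, as the replies note
```

For a stochastic policy (e.g. SAC's), V(s') is instead an expectation over actions, which is where the single-sample estimate mentioned above comes in.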
@arizmohammadi5354 1 year ago
I appreciate your efforts
@nguyenmanh466 4 months ago
10:00 that flexing :"D
@levizhou6726 2 years ago
Thank you sir.
@arisioz 2 years ago
Compared to the other banger lectures, this one was analyzed very briefly :(
@hadsaadat8283 1 year ago
The voice quality is awful.