For those unaware: Pieter Abbeel is a legend. Show some gratitude. ✌️
@yihongliu7326 · a year ago
Thank you, Pieter. More than a year ago, when I first learned RL, I searched for something like this online but couldn't find it. This is very helpful and amazing :)
@hassaannaeem4374 · 2 years ago
Awesome series. Thanks, Pieter.
@dermitdembrot3091 · 3 years ago
How widely is this "backprop through Q" used in continuous control? Does e.g. SAC use it?
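For anyone curious what "backprop through Q" looks like in code, here is a minimal PyTorch-style sketch of the deterministic-actor version (DDPG/TD3 flavor); the actor, critic, and optimizer objects are assumed to exist. SAC applies the same idea with a reparameterized stochastic policy plus an entropy bonus:

import torch

def actor_update(actor, critic, actor_optimizer, states):
    # Actions come from the actor, so gradients flow from Q back into
    # the actor's parameters through the action argument.
    actions = actor(states)
    actor_loss = -critic(states, actions).mean()  # maximize Q => minimize -Q

    actor_optimizer.zero_grad()
    actor_loss.backward()  # chain rule through a = pi(s), i.e. "backprop through Q"
    actor_optimizer.step()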
@sourjyasarkar · 3 years ago
In the learning step of the Soft Actor-Critic algorithm, shouldn't it be Q_{\phi}(s_t, a_t) and \pi_{\theta}(a_t | s_t), assuming the parameters of the Q-value network are \phi and those of the policy network are \theta?
@dermitdembrot3091 · 3 years ago
Yes, it should, to keep the slides consistent, but it's no big issue. Maybe the papers use the Greek letters differently.
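For reference, with that convention the SAC policy-improvement objective would read as follows (a sketch in the standard notation, assuming \alpha is the entropy temperature and D the replay buffer):

J_\pi(\theta) = \mathbb{E}_{s_t \sim D,\; a_t \sim \pi_\theta} \left[ \alpha \log \pi_\theta(a_t \mid s_t) - Q_\phi(s_t, a_t) \right]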
@bashkirovsergey · 2 years ago
Hello! At 3:38 I can't understand Q(t) = r(t) + gamma * Q(t+1). Shouldn't it be Q(t) = r(t) + gamma * V(t+1)? I'd appreciate any hint! Thank you!
@ioannis.g.tzolas · 2 years ago
My understanding is that the action value equals the state value because the policy is deterministic.
@rezamaz8975 · a year ago
In general, the value function tells us how good a state is, and here it seems to be equal to the Q function.
@ky8920 · a year ago
I think it uses a single sample to estimate the expectation (weighted sum) over Q_{t+1}.
@aakashrana3036 · a year ago
I think the action is taken from the policy; have a look at the DDPG algorithm pseudocode for more clarity (see the sketch below).
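To make the thread above concrete, here is a minimal PyTorch-style sketch of the DDPG target computation, assuming target networks target_actor and target_critic exist. Because the next action comes from the deterministic policy, Q(s', mu(s')) plays the role of V(s'), which is why the slide can write r(t) + gamma * Q(t+1):

import torch

def ddpg_targets(target_actor, target_critic, rewards, next_states, dones, gamma=0.99):
    # With a deterministic policy mu, V(s') = Q(s', mu(s')), so the bootstrap
    # target r + gamma * Q(s', mu(s')) equals r + gamma * V(s').
    with torch.no_grad():
        next_actions = target_actor(next_states)
        next_q = target_critic(next_states, next_actions)
        # dones is a 0/1 float tensor marking terminal transitions
        targets = rewards + gamma * (1.0 - dones) * next_q
    return targets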
@arizmohammadi5354 · a year ago
I appreciate your efforts
@nguyenmanh466 · 4 months ago
10:00 that flexing :"D
@levizhou6726 · 2 years ago
Thank you, sir.
@arisioz · 2 years ago
Compared to the other banger lectures, this one was analyzed very briefly :(