CS885 Module 1: Trust region & proximal policy optimization

7,769 views

Pascal Poupart

Comments: 9

@geeklonewu7812 3 years ago

Thanks! Best-ever lecture on TRPO and PPO!

@yuniorzhang5643 2 years ago

Excellent explanation, smooth and clear!

@robosergTV 4 years ago

thanks! Will there be more lectures on RL?

@Sebastian-fc5dc 3 years ago

Very nice to follow, thanks

@НиколайНовичков-е1э a year ago

Thank you!

@outtaspacetime 2 years ago

good job, thank you, sir!

@marcotroster8247 a year ago

You missed the essence of PPO entirely. It exists because simulators are slow: to make training feasible, you want to reuse the data sampled with the old policy for multiple updates. But then the probability ratio between the new and old policy can blow up and make training unstable, so a bound on the ratio is introduced. Compute the policy gradients and look at how the clipping and the min work. The genius part is that the gradients for "bad" training examples become zero, because the clipped term is a constant and the derivative of a constant is zero.
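To make the gradient argument in this comment concrete, here is a minimal sketch, assuming the standard per-sample clipped surrogate L = min(r·A, clip(r, 1−ε, 1+ε)·A), where r is the new/old policy probability ratio, A is the advantage, and ε = 0.2 is an assumed default. The names `ppo_clip_objective` and `ppo_clip_grad` are hypothetical, chosen for illustration. The derivative is taken with respect to r; since dL/dθ = (dL/dr)(dr/dθ), a zero here means that sample contributes no parameter gradient.

```python
"""Hypothetical sketch: per-sample PPO clipped surrogate and its gradient w.r.t. the ratio."""


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x into the interval [lo, hi]."""
    return max(lo, min(x, hi))


def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Clipped surrogate min(r*A, clip(r, 1-eps, 1+eps)*A) for one sample.

    ratio = pi_new(a|s) / pi_old(a|s); eps = 0.2 is an assumed default.
    """
    unclipped = ratio * advantage
    clipped = clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return min(unclipped, clipped)


def ppo_clip_grad(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Derivative of ppo_clip_objective with respect to the ratio.

    If min() selects the unclipped term r*A, the derivative is A.
    If it strictly selects the clipped term, the ratio lies outside
    [1-eps, 1+eps], so that term is a constant and its derivative is 0.
    Ties are broken toward the unclipped branch (a subgradient choice).
    """
    unclipped = ratio * advantage
    clipped = clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return advantage if unclipped <= clipped else 0.0


if __name__ == "__main__":
    # Positive advantage: once r exceeds 1+eps, the gradient vanishes.
    # Negative advantage: once r drops below 1-eps, the gradient vanishes.
    for advantage in (2.0, -1.0):
        for ratio in (0.5, 1.0, 1.5):
            value = ppo_clip_objective(ratio, advantage)
            grad = ppo_clip_grad(ratio, advantage)
            print(f"A={advantage:+.1f} r={ratio:.1f}  L={value:+.2f}  dL/dr={grad:+.1f}")
```

Note that clipping only removes the incentive to push the ratio further in the direction the advantage already favors; a sample whose ratio has moved the "wrong" way (for example r < 1−ε with A > 0) still receives a gradient that pulls it back.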

@yunpenghuang2246 a year ago

thanks a lot

@ashishj2358 3 years ago

Nice!