Actor-Critic Reinforcement for continuous actions!

  6,875 views

Thinkstr

2 years ago

Here's a link to the GitHub repository of the actor-critic method I learned from:
github.com/pytorch/examples/blob/master/reinforcement_learning/actor_critic.py
patreon.com/thinkstr

Comments: 11
@AM-dj4vp 1 year ago
Very underrated video, literally the best explanation of actor-critic that I've seen. Good job, and thanks!
@Thinkstr 1 year ago
Hey, thanks for watching! These are fun to make and I learn a lot. I think my understanding has come a long way since I made this video, so I'll have to make another eventually.
@underlecht 1 year ago
Great video! I was a bit mixed up about actor-critic, confused about which variables should be backpropagated through in the loss function and which shouldn't :D It seems you did it right.
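(For anyone with the same confusion, here is a minimal PyTorch sketch of the usual convention, using a hypothetical one-step return as the critic's target: the advantage is detached in the actor loss, so each loss only updates its own network.)

```python
# Hedged sketch: which terms receive gradients in a simple actor-critic loss.
import torch

def actor_critic_losses(log_prob, value, reward_to_go):
    advantage = reward_to_go - value
    actor_loss = -log_prob * advantage.detach()  # gradient flows only through log_prob (the actor)
    critic_loss = advantage.pow(2)               # gradient flows only through value (the critic)
    return actor_loss, critic_loss
```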
@Thinkstr 1 year ago
Thanks for watching, I'm glad you liked it! I should be making more of these videos soon...
@PeterIntrovert 2 years ago
This is rocket science to me lol, but I get value from your videos anyway. I learn critical thinking from you. I think I understand and like the general idea. I hope you won't invent Skynet or something in the future. :D
@Thinkstr 2 years ago
Haha, thanks! If I ever invent skynet, I hope it's a NICE skynet.
@aprameyandesikan3648 11 months ago
Hey, awesome video!! I had a question regarding how the model chooses the means and standard deviations. The action is supposed to be continuous, so how does the model choose a continuous output for the two?
@Thinkstr 11 months ago
Thanks for watching! I'm not sure I understand the question, but I think it's actually easier to make a neural network that outputs in a continuous range than in a discrete range (like categorization). After the actor produces the mean "mu" and standard deviation "sigma", it samples "epsilon" from a normal distribution and computes the action as mu + sigma * epsilon; this is called the "reparameterization trick." sassafras13.github.io/images/2020-05-25-ReparamTrick-eqn2.png
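For readers following along, here is a minimal PyTorch sketch of the reparameterization trick described in this reply; the actor network that produces mu and sigma is assumed rather than shown:

```python
import torch

def sample_action(mu: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
    """Draw a continuous action while keeping gradients flowing through mu and sigma."""
    epsilon = torch.randn_like(mu)   # epsilon ~ N(0, 1), sampled outside the computation graph
    return mu + sigma * epsilon      # differentiable with respect to mu and sigma

# Example with a hypothetical 2-dimensional action:
mu = torch.tensor([0.3, -0.7], requires_grad=True)
sigma = torch.tensor([0.5, 0.2], requires_grad=True)
print(sample_action(mu, sigma))
```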
@aprameyandesikan3648 11 months ago
Thanks! I think that answers my question. So you essentially take the continuous output of your network as the action itself, I presume, instead of, as in categorization, choosing the one with the highest probability?
@Thinkstr 11 months ago
@@aprameyandesikan3648 Yes, exactly!
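To make the contrast in this exchange concrete, here is a small sketch (with made-up numbers) of discrete versus continuous action selection using PyTorch distributions:

```python
import torch
from torch.distributions import Categorical, Normal

# Discrete case: one score per action, then sample (or argmax) an action index.
logits = torch.tensor([1.2, 0.1, -0.5])
discrete_action = Categorical(logits=logits).sample()   # index 0, 1, or 2

# Continuous case: the actor outputs distribution parameters, and the sampled
# real number is the action itself; no argmax over categories is involved.
mu, sigma = torch.tensor([0.3]), torch.tensor([0.5])
continuous_action = Normal(mu, sigma).sample()

print(discrete_action.item(), continuous_action.item())
```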
@aprameyandesikan3648 11 months ago
Awesome, thanks for taking the time to answer my questions! Keep the videos coming!