Reinforcement Learning Series: Overview of Methods

103,169 views

Steve Brunton

Comments: 98
@tljstewart 2 years ago
This Russian doll of dichotomies has always been a mind bender; the literature often seems to have nebulous definitions, and the boundaries aren't so clear. Thank you for the great insights in this lecture, the graphic is superb.
@Eigensteve 2 years ago
Thanks!
@EkShunya 2 years ago
I deeply appreciate the quality of knowledge you are providing to the community. Please continue to democratise knowledge.
@alliwant8383 1 year ago
Superb. One of the things I always struggle with when learning something is having a well-structured map in my head of the topic and subtopics, and this does an extremely good job of providing that. Many thanks.
@saitaro 2 years ago
OK, this year is gonna be better than I thought. Thanks, professor!
@cornevanzyl5880 2 years ago
You helped me during my undergrad, now you're an inspiration to me during my masters.
@akino.3192 1 year ago
Wow! Steve, you've managed to break this all down into bite-sized chunks. Thank you 🙏
@complexobjects 2 years ago
I just started getting back into RL so this comes at a perfect time! Looking forward 👌
@MrAndreaCaso 2 years ago
Finally! Thank you for posting. Can't wait to see the whole playlist.
@MrRafaelSencio 2 years ago
It's been a great series of videos on RL! I'm updating my research interests and now I want to combine MPC with RL in such a way that the resulting control structure can be safely implemented and has some stability guarantees. Thank you very much!
@jett_royce 2 years ago
Fantastic opening video. You're a talented teacher and I appreciate this content. Looking forward to watching the entire series.
@alireza202 3 months ago
The main difference between on-policy and off-policy methods is the way the generated data is used during learning. If the method uses the data from the current policy, it is on-policy learning. But if the method uses data from another policy (e-greedy or older policies like replays), it is an off-policy method.
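A minimal sketch of the distinction described in this comment, assuming a tabular action-value array `Q` indexed as `Q[state, action]`. The function names, the discount factor, and the toy numbers are illustrative assumptions, not taken from the video. The only difference between the two update targets is which next action is used for bootstrapping.

```python
import numpy as np

def sarsa_target(Q, reward, next_state, next_action, gamma=0.9):
    # On-policy (SARSA): bootstrap from the action that the current
    # behaviour policy actually selected in the next state.
    return reward + gamma * Q[next_state, next_action]

def q_learning_target(Q, reward, next_state, gamma=0.9):
    # Off-policy (Q-learning): bootstrap from the greedy action in the
    # next state, regardless of which action the behaviour policy took.
    return reward + gamma * np.max(Q[next_state])

# Hypothetical 2-state, 2-action table, for illustration only.
Q = np.array([[0.0, 1.0],
              [2.0, 5.0]])

# Suppose the behaviour policy explored and picked action 0 in state 1.
print(sarsa_target(Q, reward=1.0, next_state=1, next_action=0))
print(q_learning_target(Q, reward=1.0, next_state=1))
```

When the behaviour policy happens to pick the greedy action, the two targets coincide; they differ only on exploratory steps.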
@eduardolussi104 1 year ago
Such a high-quality course and a free book in the description? You're awesome!!
@quantum01010101 2 years ago
An excellent arrangement of a very tough topic, logical and in the proper flow. Keep up the very good work. Thank you.
@Brian-ft4dh 1 year ago
Really really great overview for those new to learning about reinforcement learning! Thanks so much!
@thrinayreddy3379 2 years ago
Please make a separate playlist for reinforcement learning :-)
@Eigensteve 2 years ago
Good call -- will do
@rudzanimulaudzi7947 2 years ago
@Eigensteve Please put the videos in order; the current order is not correct. But great content.
@saeedparsamehr9884 2 years ago
I am really grateful for your eye-opening videos, especially this one
@Eigensteve 2 years ago
Thanks!
@PippyPappyPatterson 1 year ago
0:00 Intro
3:00 Background
7:54 Model & Model-Free Reinforcement Learning (RL)
8:29 Markov Decision Process (MDP)
10:25 Nonlinear Dynamics
13:02 Gradient & Gradient-Free RL
14:05 Off-Policy (Q Learning) & On-Policy (SARSA) RL
17:23 Policy Gradient Optimization
18:05 Deep RL
@cabbagecat9612 1 year ago
Great video! Though imho the on/off-policy distinction explained at 14:24 might be a bit misleading. I believe both on/off policy can explore sub-optimal actions with something like epsilon-greedy.
@GabrielBYPF 10 months ago
You really invest the time in your lessons perfectly. Very, very useful! Great series!
@Eigensteve 10 months ago
Thanks! :)
@anonymous-tt2lm 2 years ago
The heart of AI is reinforcement learning; it is the most interesting part of all of AI/ML. Basically the original AI. Thanks, professor 🤝👍
@XenoZeduX 2 years ago
What an amazing start to the new year! 😍
@Ceznex 1 year ago
Coming to this video after a while. Really great video, thank you!!
@ianlorondealmeida9680 2 years ago
Great didactics, congratulations! I used to get confused frequently when dealing with these concepts.
@GaetanLepage 2 years ago
Great concise and dense video! Thank you very much for sharing!
@apurvdhir7062 2 years ago
Needed this..... Thank you Professor
@rajmeetsingh1625 2 years ago
Thanks, sir. Please also add some robotics-related examples in the upcoming series.
@Eigensteve 2 years ago
Thanks for the suggestion!
@samirelzein1095 2 years ago
Great teacher and master of the art!
@AlexandreGirard87 2 years ago
Very nice overview video! There is a small typo in the nonlinear dynamics equation: the superfluous dt on the right. Regarding how MPC fits in the whole DP framework, I remember Prof. Bertsekas presenting it as a way to approximate the cost-to-go online.
@Eigensteve 2 years ago
Good catch on the typo! And interesting perspective on MPC -- thanks!
@mohammadabdollahzadeh268 2 years ago
Dear Steve, this is an amazing way to categorize reinforcement learning. Thanks a lot.
@pedrowangler97 9 months ago
The distinction between On-Policy and Off-Policy explained in this video seems to be different from other sources on the internet. I'm trying to get my head around reinforcement learning and I have noticed that different people have different understandings of certain concepts. Model-free and Model-based are also given a different distinction by others, and this really throws me off. I'm not saying the explanation in this video is incorrect, but that there are different explanations elsewhere and I'm not sure which one is correct.
@Spiegeldondi 2 years ago
I am a happy owner of your "Data Driven Science and Engineering" book. The fact that there will be much more content on RL in the 2nd edition is really good news! Will there also be a print version of the 2nd edition of your and Kutz's book?
@Eigensteve 2 years ago
Thanks! Yes, the print edition should be out sometime later this year.
@Mewgu_studio 1 year ago
The dichotomy breakdown is so awesome...
@alimustafa2682 2 years ago
I would like to have RL as a career, and you would be the best lecturer for a kickstart.
@zhehaoli1999 2 years ago
Great overview, which is just what I need, thank you sir!
@resalatbinafsar7907 2 years ago
Hi Mr. Brunton, your videos are impressive, and thank you for making the content. A small suggestion, though: it would be easier for us to navigate if you made separate, ordered playlists for each topic.
@kuanxD 2 years ago
This explanation is just beautiful! Thx so much
@vermashwetank 2 years ago
Great series! Can you make a Control Bootcamp-like series for nonlinear control theory? Would love to see some simplified explanations of topics like PDE backstepping, reference governors, Lyapunov stability criteria, etc.
@Eigensteve 2 years ago
Would love to do a bootcamp on this -- maybe a goal for the new year! :)
@jacekbudzisz380 3 months ago
Great channel. You have knowledge in the field I was looking for: a mix of AI/NN and control theory.
@marofe 2 years ago
There is a typo at 10:08, the dynamic model in continuous time should be dx/dt=f(x,u,t) only
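For reference, the corrected continuous-time model suggested in this comment can be typeset as below. Reading x as the state and u as the control input is an assumption based on the usual control-theory convention; the thread itself only gives the symbols.

```latex
\frac{d}{dt}\,x(t) = f\bigl(x(t),\, u(t),\, t\bigr)
```

The dt belongs only on the left-hand side. The same model can equivalently be written in differential form as dx = f(x, u, t) dt, but the two forms should not be mixed.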
@Eigensteve 2 years ago
Good catch, thanks!
@MCMelonslice 1 year ago
This is amazing. Thank you, Steve!
@WhenThoughtsConnect 2 years ago
oil lamp except the clear liquid is a gradient and the bubbles are parameters that perform an action. then its p>p' goes to q, then flipflops p q>q'. p'=p or q'=q if the AI improves at a particular task. two randoms cancel each other out on an error function and acts like an implicit rolles theorem without explicitly stating d/dx=0.
@bonaldli 2 years ago
Dear Steve, great explanation. However, just wanna confirm: I thought Actor-Critic is a model-free model?
@ShahFahad-hj1ps 2 years ago
Great job, Prof. Steve. How about multi-agent DRL, especially graph-learning-based RL? That could be a remarkable addition to your playlist.
@mortezaaliyari8818 2 years ago
Thanks for the wonderful videos. It would be great if you added real code at the end of the main videos. It would be much easier to understand in detail.
@danish32100 2 years ago
Great as always.
@C7ZR1 7 months ago
Good stuff! You need to remove the "dt" on the right side of your nonlinear dynamics equation.
@RasitEvduzen 2 years ago
Thanks for the video, professor.
@haotianhang3997 2 years ago
Thank you! Happy New Year!
@Moonz97 2 years ago
At 14:20 to 14:41, you state On-Policy is always playing the best game possible. Is this approach the same as greedily picking the best action at each state? If so, would On-Policy algorithms not include exploration such as epsilon-greedy? The way I understood On-Policy vs Off-Policy here is that On-Policy is purely exploitation whereas Off-Policy is both exploration & exploitation. Am I misunderstanding it? Thanks!
@kunqian6243 2 years ago
I was also a bit confused at that point. But, I think you have probably misunderstood it since, for RL, we have to always ensure exploration & exploitation. I feel that what I understand is completely the opposite to Prof. Steve's description: on-policy uses the to be improved policy to select an action (meaning not always the best action), while the off-policy uses a different policy to decide which action to take (you may always choose the best action). I hope Steve will elaborate more on it. :)
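The exploration mechanism this exchange refers to, epsilon-greedy action selection, can be sketched as follows. The function name and arguments are illustrative assumptions. Nothing in it is specific to on-policy or off-policy learning, so either family of methods can use it as the behaviour policy.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one.

    q_values: 1-D array of action-value estimates for the current state.
    epsilon:  exploration probability in [0, 1].
    rng:      a numpy.random.Generator instance.
    """
    if rng.random() < epsilon:
        # Explore: sample uniformly over all actions.
        return int(rng.integers(len(q_values)))
    # Exploit: take the action with the highest current estimate.
    return int(np.argmax(q_values))

rng = np.random.default_rng(0)
q_values = np.array([0.1, 0.9, 0.3])  # hypothetical estimates
actions = [epsilon_greedy(q_values, epsilon=0.2, rng=rng) for _ in range(10)]
print(actions)
```

With `epsilon=0.0` the function is purely greedy (exploitation only); with `epsilon=1.0` it is purely random (exploration only).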
@virgenalosveinte5915 1 year ago
Steve, you are amazing.
@AksGu2 2 years ago
Thanks so much for such a great video. Can you please tell me where Proximal Policy Optimization (PPO) fits in these categories? For my case, a small game, I know that I will be using model-free RL, but I'm not able to decide what else to use apart from Q-learning.
@andreas-lebedev 2 years ago
Why is "Actor Critic" assigned (only) to the left side? Isn't it (also) a combination of gradient free and gradient based algorithms, e.g. the Critic is a DQN and the Actor is a Deep Policy Network?
@ayushroy6208 1 year ago
I just have one doubt: is A3C a model-free method?
@mairios521 5 months ago
I am a beginner in the RL field... I think A2C is an actor-critic algorithm as well (and it's model-free RL).
@georgefarnon2432 2 years ago
Excellent. What other topics will be included in the 2nd edition?
@Eigensteve 2 years ago
Updates throughout, all code in Python and Matlab (with R and Julia online), and new chapters on RL and physics informed machine learning
@drsandeepvm5622 1 year ago
Great presentation 👏
@jadavdas5405 2 years ago
Nice lectures and lots of stuff to learn. Thanks for sharing. Are on-policy and off-policy somehow related to the exploitation and exploration concept?
@paria4393 1 year ago
I have energy data and need to apply RL to these data (inverter) to achieve the best result (when to charge/discharge the battery, when is the best time to feed into the grid, etc.). Which algorithm should I use for that?
@csalahuni 2 years ago
The link for the new chapter of the 2nd edition of the book is not working for me. Can someone post the correct link in the comments?
@AliRashidi97 2 years ago
Yeah, it doesn't work :(
@Eigensteve 2 years ago
Maybe use databookuw.com/databook.pdf
@csalahuni 2 years ago
Thank you, this link is working, but it still shows the 1st edition of the book I think.
@Eigensteve 2 years ago
@csalahuni Shoot, sorry, here is the chapter: faculty.washington.edu/sbrunton/databookRL.pdf (added to the description too)
@AliRashidi97 2 years ago
@Eigensteve Thanks, professor 🙏
@Janamejaya.Channegowda 2 years ago
Thank you for sharing.
@sinarezaei218 7 months ago
Thanks for your great videos
@thanh315960000 2 years ago
Thank you!
@Oliver-cn5xx 1 year ago
Hi Steve, I think actor-critic methods are usually considered model-free.
@tienphatbui9827 7 months ago
At 20:07 Steve talks about the "model of the system": if we have a model of the system we use model-based methods, and if we don't we just use model-free ones. Can you explain more about the "model of the system"? What is it? Is there an example? And why? Thank you so much.
@jakal0282 1 month ago
Model of the system means you know every possible state of the system, every possible next state and the actions that take you there (and with what probability), as well as the rewards earned after every state-action pair. You can have a model of a maze; you can't have a model of a chess game (unless you make assumptions about the opposing player's strategy).
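To make the idea of a known model concrete, here is a minimal sketch of planning with one: value iteration on a hypothetical four-state corridor "maze". The environment, the names `step_model` and `q_value`, the reward of 1 at the goal, and the discount factor are all illustrative assumptions. The transitions are deterministic for brevity, so each has probability 1; a stochastic model would average over next states.

```python
import numpy as np

# Hypothetical corridor: states 0..3, state 3 is the terminal goal.
# Actions: 0 = left, 1 = right. The transition rule is fully known.
n_states, n_actions, gamma = 4, 2, 0.9

def step_model(s, a):
    """Known model: return (next_state, reward) for state s and action a."""
    if s == 3:                      # terminal state: nothing more happens
        return s, 0.0
    s_next = max(s - 1, 0) if a == 0 else s + 1
    return s_next, (1.0 if s_next == 3 else 0.0)

def q_value(s, a, V):
    """One-step lookahead using the model instead of sampled experience."""
    s_next, r = step_model(s, a)
    return r + gamma * V[s_next]

# Value iteration: repeatedly apply the Bellman optimality backup.
V = np.zeros(n_states)
for _ in range(50):
    for s in range(n_states):
        V[s] = max(q_value(s, a, V) for a in range(n_actions))

# Greedy policy with respect to the converged values.
policy = [int(np.argmax([q_value(s, a, V) for a in range(n_actions)]))
          for s in range(n_states)]
print(V, policy)
```

No interaction with the environment is needed here: the values come entirely from querying the model. A model-free method would instead have to estimate them from sampled transitions.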
@Matlockization 2 years ago
I enjoy your broad-strokes topics. I was wondering, can an AI write to its memory once it learns or discovers something new? Or does it not work like that?
@justlaugh8804 1 year ago
Actor-critic should be under policy gradient optimization, no?
@ArmanAli-ww7ml 2 years ago
I was reading a journal article in which the authors claim to follow a model-free RL approach, yet they use a Markov decision process to model the problem. They do not mention probabilities for states. What does that mean? There is also another paper that uses probabilities for state transitions and solves the problem using Q-learning, so it's all confusing again.
@mohammadabdollahzadeh268 2 years ago
Dear professor, please explain to us how to use reinforcement learning to tune PID gains ❤️ I'm looking forward to hearing from you. Sincerely, Mohammad
@hamidmirza333 11 months ago
What is the difference between a deterministic policy and a stochastic policy?
@kundankumar-dt5uu 1 year ago
Sir, do model-free algorithms use a Markov decision process (MDP)?
@mohammadabdollahzadeh268 2 years ago
Dear Steve, we can use an LS algorithm instead of a gradient algorithm, can't we?
@karthiknn97 2 years ago
Hello Professor, where would the DDPG algorithm sit in this chart?
@1812aks 2 years ago
Off-policy vs. on-policy is slightly confusing here... Isn't off-policy a setup where you have prior data and can't continuously interact with the environment?
@Shaunmcdonogh-shaunsurfing 2 years ago
Excellent summary
@ArmanAli-ww7ml 2 years ago
Can anyone explain RL by comparing it with ML mathematically? I know a lot about ML but am having trouble understanding RL.
@mohammadabdollahzadeh268 2 years ago
Dear professor, please explain to us how to use reinforcement learning to tune PID controller gains. I'm looking forward to hearing from you. Sincerely, Mohammad
@alexanderskusnov5119 2 years ago
Will we see the programs teaching each other? (like chess)
@nightsailor1 2 years ago
Sound level is low.
@HarshPatel-g2q 4 months ago
Actor-critic is model-based? I don't think so.
@johnalley8397 2 years ago
Bated breath. No, really. Hurry uuuuh-uuuup!