Overview of Deep Reinforcement Learning Methods

53,567 views

Steve Brunton

Days ago

This video gives an overview of methods for deep reinforcement learning, including deep Q-learning, actor-critic methods, deep policy networks, and policy gradient optimization algorithms.
Citable link for this video: doi.org/10.52843/cassyni.kfnzpy
This is a lecture in a series on reinforcement learning, following the new Chapter 11 from the 2nd edition of our book "Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control" by Brunton and Kutz
Book Website: databookuw.com
Book PDF: databookuw.com/databook.pdf
Amazon: www.amazon.com/Data-Driven-Sc...
Brunton Website: eigensteve.com
This video was produced at the University of Washington

Comments: 45
@MaximYudayev 2 years ago
10:20 I think in this example the state probability density function is assumed stationary for an ergodic environment even in the case of a dynamic policy. So perhaps this assumption implies a static reward function from the given environment, which would not be the case in a dynamic environment like a medical patient whose bodily response to a drug would vary throughout their lifetime/treatment. I checked, Sutton and Barto indeed mention ergodicity of the environment as the reason for policy-independent mu in their book on p.326 and p.333.
@Virsconte 2 years ago
Thanks! I was wondering after 7:15 if there was some assumption along the lines of ergodicity or stationarity or something being suggested by that wording, but I don't have a deep enough understanding of statistics to unpack it properly.
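For reference on this thread (a standard result from the literature, not a formula lifted from the video, so notation may differ slightly from the board): the policy gradient theorem in Chapter 13 of Sutton and Barto writes the performance gradient in terms of the on-policy state distribution mu, and the key point is that no derivative of mu with respect to theta appears on the right-hand side:

\nabla_\theta J(\theta) \;\propto\; \sum_{s} \mu(s) \sum_{a} q_\pi(s, a)\, \nabla_\theta \pi(a \mid s, \theta)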
@BoltzmannVoid 2 years ago
This is literally the best series for understanding RL. Thank you so much, professor, for sharing this.
@metluplast 2 years ago
Thanks professor Steve. Once I hear "welcome back" I just know it's our original professor Steve 😀👍
@matiascova 2 years ago
At 10:05, my understanding is that the reason we do not differentiate that probability comes from a local-approximation assumption, so the formula is only approximately true for changes that are not too big. This simplification is one of the most important parts of the policy gradient theorem, and it informs the design of "soft" policy-gradient algorithms, in which we do not allow the policy to change too much, since the update logic only works for small steps.
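As a concrete illustration of that "small steps only" idea (offered as outside context, not something derived in the video): proximal and trust-region methods make the constraint explicit. For example, PPO (Schulman et al., 2017) clips the probability ratio between the new and old policies so that a single update cannot move the policy too far:

r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}, \qquad
L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\!\left[ \min\!\big( r_t(\theta)\, A_t,\ \mathrm{clip}(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon)\, A_t \big) \right]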
@gbbhkk 2 years ago
Excellent video, it basically saved my day in trying to wrap my head around all the terms and algorithms :D The concepts have been presented with unmatched clarity and conciseness. I have been waiting for this since your last video on "Q-Learning". Thank you so much!
@mawkuri5496 2 years ago
I hope you'll create a series where all of the equations in this series are applied in PyTorch to build simple projects; that would be awesome.
@dmochow 1 year ago
This is a fantastic tutorial. Thanks for putting in the time and effort to make it so digestible
@BlueOwnzU96 1 year ago
Thank you professor. This has been great to dust off some RL concepts I had forgotten
@Rodrigoviverosa 2 years ago
Thanks for the video! Can't wait for that deep MPC video.
@kefre10 2 years ago
great series! thank you so much!
@chymoney1 2 years ago
really great stuff, Steve
@OmerBoehm 2 years ago
Thank you so much for another outstanding video
@wkafa87 25 days ago
@Eigensteve Amazing video lectures. I have watched several of your series. Please, if possible, make a series about deep MPC; it would be of great value.
@ramanujanbose6785 2 years ago
Steve, I follow all of your lectures. As a mechanical engineer, I was really amazed by your turbulence lectures. I have worked on CFD, visualization, and computation using scientific Python and have published a couple of research articles. I'm very eager to work under your guidance in CFD and fluid dynamics using machine learning, specifically simulating and modeling turbulent flow fields and exploring the mysterious world of turbulence. How should I reach you for further communication?
@tarik23boss 2 years ago
Thanks for this video, it was very helpful! Do you have any material on adaptive critic designs? That is a very well-cited paper, and I'm wondering how it all plays into actor-critic models.
@budmanso 1 year ago
Thanks for the video!
@joel.r.h 2 years ago
Excuse me, professor, I am not sure about this specific case: if we have a DRL architecture that interacts with an ad hoc model we have built (which has a given structure as a Markov decision process), but the DRL agent does not have any prior information on the mechanics of that model (it can only measure outputs and generate inputs), would this be considered model-free? Thank you for your amazing work!
@FRANKONATOR123 2 years ago
6:22 But Professor, you know we love math derivations!
@sarvagyagupta1744 2 years ago
10:20, I think it's because we usually use PG in models with infinite state-action pairs, so in other words mu(s) is intractable. It's something like the latent space of an autoencoder, which we can't really track in order to generate data.
@add-mt5xc 6 months ago
In DDQN, did you need the Q function (Theta_2) inside the gradient involving d/dTheta?
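For context on this question (a standard form from the double-DQN paper of van Hasselt et al., 2016, not necessarily the exact notation used in the video): the online weights theta select the greedy action, the second set of weights theta^- (the Theta_2 of the question) only evaluates it, and the entire target y is held constant when differentiating the loss with respect to theta, so theta^- never appears inside the gradient:

y = r + \gamma \, Q\big(s',\ \operatorname{arg\,max}_{a'} Q(s', a'; \theta);\ \theta^{-}\big), \qquad
L(\theta) = \big(y - Q(s, a; \theta)\big)^2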
@add-mt5xc 6 months ago
Prof. Brunton, are you using a lightboard for the lectures? Do you have advice on which one to purchase?
@ryanmckenna2047 3 months ago
@10.33 Steve, maybe mu sub theta is just a vector of constants, the means associated with the asymptotic distribution of each state s, used to scale the sum of weighted probabilities across all actions for that state in relation to each state's asymptotic distribution?
@0xygenj 13 days ago
Thank you kind sir
@gauravupasani9088 2 years ago
I have a question regarding the introduction video clip: what is happening to that image in the intro, before the title comes up?
@frankdelahue9761 2 years ago
Will you make a video on AutoML and AutoAI?
@asareclement8177 1 year ago
You are great!!! A really helpful video. But sir, you did not talk about the MDP.
@fawadkhan8905 6 months ago
Sir, could you please give some insight into the difference between MPC and DDPG?
@randywelt8210 2 years ago
My strategy for a better explanation would be to do it like Andrej: start off with a toy example of the real algorithm and also show the Python toy code. Explain how it is connected to other models. After that you can start with the math derivation, which is mostly interesting only for ML theorists.
@a_samad 1 month ago
Fantastic 🎉. Where can I find similar code tutorials?
@randywelt8210 2 years ago
The part on deep Q networks is also not explained well. During training you use two Q networks: the online (policy) network, which is trained on replay-buffer samples and acts in the environment with e.g. an epsilon-greedy strategy, producing new samples for the replay buffer; and the target network, a periodically updated copy that supplies the TD targets. (Again, I had to read a Medium post to remember.)
@marc2752 2 years ago
But vanilla DQNs don't do this, right?
@marc2752 2 years ago
They also don't include two Q networks. These are all extensions of the base algorithm.
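To make the two-network setup discussed in this thread concrete, here is a minimal sketch (an illustration under stated assumptions, not code from the video or from either commenter): it assumes gymnasium and PyTorch are installed and uses CartPole-v1 as a stand-in task, with arbitrary placeholder network sizes and hyperparameters. The online network both acts (epsilon-greedy) and is trained; the target network is a periodically copied snapshot that supplies the TD targets, which is the addition the 2015 Nature DQN made on top of the 2013 version's replay buffer.

# Minimal DQN-with-target-network sketch (illustration only).
import random
from collections import deque

import gymnasium as gym
import numpy as np
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
n_obs = env.observation_space.shape[0]
n_act = env.action_space.n

def make_qnet():
    return nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(), nn.Linear(64, n_act))

q_net = make_qnet()                    # online network: acts and is trained
target_net = make_qnet()               # target network: frozen copy that supplies TD targets
target_net.load_state_dict(q_net.state_dict())

optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
buffer = deque(maxlen=10_000)          # replay buffer of (s, a, r, s', done)
gamma, eps, batch_size = 0.99, 0.1, 64

state, _ = env.reset()
for step in range(5_000):
    # Act with an epsilon-greedy policy derived from the online network.
    if random.random() < eps:
        action = env.action_space.sample()
    else:
        with torch.no_grad():
            action = q_net(torch.as_tensor(state, dtype=torch.float32)).argmax().item()

    next_state, reward, terminated, truncated, _ = env.step(action)
    buffer.append((state, action, reward, next_state, float(terminated)))
    state = next_state if not (terminated or truncated) else env.reset()[0]

    # Train the online network on a random minibatch from the replay buffer.
    if len(buffer) >= batch_size:
        batch = random.sample(buffer, batch_size)
        s, a, r, s2, done = (torch.as_tensor(np.array(x), dtype=torch.float32)
                             for x in zip(*batch))
        q_sa = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
        with torch.no_grad():          # TD targets come from the frozen target network
            y = r + gamma * (1.0 - done) * target_net(s2).max(dim=1).values
        loss = nn.functional.mse_loss(q_sa, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Periodically copy the online weights into the target network.
    if step % 500 == 0:
        target_net.load_state_dict(q_net.state_dict())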
@underlecht 1 year ago
I find some of the formulas misleading: they are missing the derivative term with respect to theta. The term does appear at the end of the video.
@nadimsheikh2876 1 year ago
Excellent.. Try to work
@randywelt8210 2 years ago
For deep policy networks, the most important part is the explanation of rollout strategies, with e.g. stochastic policies, value networks, etc., and the similarity to supervised learning. You did not explain that well even though you went over the math derivation. I had to re-read Karpathy's blog to remember! (Just trying to give you a reward signal that Karpathy as supervisor was better for me.)
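In the same spirit, here is a minimal rollout sketch of the idea that comment refers to (a hedged illustration along the lines of Karpathy's "Pong from Pixels" post, not code from the video): sample a full episode from a stochastic policy, compute discounted returns, then take a gradient step on a log-likelihood loss weighted by those returns, which is where the resemblance to supervised learning shows up. It assumes gymnasium and PyTorch; CartPole-v1 and the network size are placeholders.

# Minimal REINFORCE-style policy gradient sketch (illustration only).
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(nn.Linear(env.observation_space.shape[0], 64),
                       nn.ReLU(),
                       nn.Linear(64, env.action_space.n))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
gamma = 0.99

for episode in range(500):
    # 1) Rollout: sample actions from the stochastic policy, store log-probs and rewards.
    state, _ = env.reset()
    log_probs, rewards, done = [], [], False
    while not done:
        logits = policy(torch.as_tensor(state, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        state, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # 2) Compute the discounted return G_t for each step of the episode.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    returns = torch.as_tensor(returns, dtype=torch.float32)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # variance reduction

    # 3) Policy gradient step: a log-likelihood loss, as in supervised learning,
    #    but weighted by the (centered) return of each sampled action.
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()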
@thecountoftuscany9493 1 year ago
10:20 I always feel dumb when I don't understand mathematical derivations. It is reassuring to see that even the person teaching this course does not understand the math.
@MrPepto93 2 years ago
Imagine what we could achieve as a species if only a larger part of humanity were capable of watching Steve Brunton's videos instead of dumb Boomerangs and TikTok videos of idiotic web celebs who rack up over 5 million views in a few hours. It really says a lot about our level of evolution and where it is going.
@denizdursun3353 4 months ago
I do both, and look at us, we're still useless lol
@MrPepto93 4 months ago
@@denizdursun3353😂😂😂
@JTMoustache 2 years ago
Actor-critic is not model-based. Model-based implies a model of the state transitions, P(s' | s, a).
@kunqian6243 2 years ago
Model-based versus model-free refers to a model of the environment. He meant a model of the Q value, the parametrized deep Q network, with which you can apply gradient methods.
@loki-oq1lj 2 years ago
Only hackers can hack this consciousness.
@loki-oq1lj 2 years ago
Elon Musk also said we are living in a simulator, like an exoskeleton. Sir, can you please tell me who I am, if not this body? What would an AI say about itself if it were left conscious of itself?