Temporal Difference Learning (including Q-Learning) | Reinforcement Learning Part 4

  32,115 views

Mutual Information

A day ago

Comments: 99
@lordjared2572 · a year ago
ok, just pls upload more vids. There's a huge vacuum of ML education out here for people who are not scared of math.
@Mutual_Information · a year ago
That’s the audience I want!
@rewixx69420 · a year ago
I learn ML by myself and it's hard to find information. This guy saves me on RL, thanks.
@veereshkumar-b3m · 17 days ago
Keep up the great work in the DRL universe- we're learning so much from you!
@marcin.sobocinski · a year ago
Your animations are fantastic, it's like a new dimension of learning. It helps so much to be able to visualize RL processes. Thank you!
@Mutual_Information · a year ago
Thank you Marcin - it means a lot when I hit someone in the audience exactly as I hoped :)
@samlaki4051 · a year ago
Starting to get into RL, Yannic recommended you! How have I missed such a gem of a channel!
@Mutual_Information · a year ago
Love Yannic's stuff. Super pumped to get the shout out.
@saranahluwalia5353 · a year ago
I wish I had this to review 5 years ago. This would have eliminated wasteful experiments. Thank you for making this more accessible.
@Mutual_Information · a year ago
The way I'm designing the channel.. is the channel I would have wanted when I was learning ML for the first time. Seems like that theory landed!
@marcegger7411 · a year ago
Fantastic!! Keep up the amazing work! It's always so great to see quality content presented so eloquently :)
@bornamorasai5285 · a year ago
Can't wait for part 5 and 6!!!! Let's go!!!!
@pandie4555 · a year ago
dude, the amount of work you put in these videos is fantastic.
@Mutual_Information · a year ago
lol yea these videos were crazy hard. This one took me 100+ hours
@davemansfield7719 · 2 months ago
I'm surprised how fast those algorithms could train windy gridworld agents; without proper tuning you could run millions of steps and still never get that performance. Great video!
@buh357 · a year ago
I am starting to learn RL, and your video is helping me a lot. You have a clear and precise explanation; thank you. Looking forward to new coming videos :)
@Mutual_Information · a year ago
excellent! Exactly what I'm going for :)
@rostislavmarkov7488 · 8 months ago
Awesome series covering essential fundamentals with great didactics. Raised the bar at creating high-quality content!
@ryderbrooks1783 · a year ago
This channel is extremely under subscribed. I very much appreciate the work you're putting in here. Thank you
@Mutual_Information · a year ago
Ha well I can't expect a large number of subs when my stuff is so technical. So.. should I make it less technical? Nope!
@victormanuel8767 · a year ago
"You've covered a lot, give yourself a 3 second break" *0.5 seconds later* "Great, let's keep going"
@hjop010 · a year ago
Great video as always! It is helping me so much complementing Barto & Sutton.
@Mutual_Information · a year ago
Sweet - that's exactly how I want this series used!
@nathanzorndorf8214 · 8 months ago
Thanks for this. Amazing.
@antonkot6250 · 11 months ago
Oh man, these visualisations are top-notch!
@sathyakumarn7619 · a year ago
So precise and fun! But highly underrated! Please advertise so that more people can benefit!
@Mutual_Information · a year ago
Thank you! and I agree, distributing this needs some more effort. Sometimes my tweets help
@datsplit2571 · a year ago
High quality videos, my compliments! This helps so much in understanding RL for a Master's course. Thank you!
@Mutual_Information · a year ago
You're welcome! And you know what would be totally sweet? If you told your classmates about these vids :)
@datsplit2571 · a year ago
@@Mutual_Information Posted it in the teams chat of the Advanced Machine Learning course!
@Mutual_Information · a year ago
@@datsplit2571 thank you! Over time moves like that will make all the difference :)
@mryazbeck98 · a year ago
I love your videos to recap what I read in the book. Helps me understand and visualize everything better. I was however hoping to know more about batch training because I didn't understand how it works at all!
@timothytyree5211 · a year ago
Excellent video! I am so stoked to use this in my work!
@timothytyree5211 · a year ago
I used the knowledge of ^this vid today to help a buddy out at work! You rock, Duane!
@timothytyree5211 · a year ago
I'm really looking forward to your next video on function approximation!
@qiguosun129 · a year ago
Excellent lecture! It cleared up my doubts about the method when reviewers asked me to do a parameter uncertainty analysis in a scientific research paper.
@Mutual_Information · a year ago
Excellent - that's thrilling to hear this has some real impact!
@NazerkeSafina · a year ago
Superb job with visualization, keep it up! Only you could explain certain things to me; I've watched several other tutorials and wasn't feeling confident. One thing: I wish the explanation of how V(s) is obtained for each state were more detailed, perhaps with multiple samples and step-by-step calculations.
@skirazai7591 · a year ago
Man, you're doing some very high quality stuff, keep it up.
@Mutual_Information · a year ago
Thanks - I'm tryin!
@박종연-p9h · 6 months ago
This is a life saver
@IRONMAIDEN146 · a year ago
Your videos are helping me a lot in my AI engineering degree, thanks a lot!
@Mutual_Information · a year ago
Love it!
@fedelozano2895 · a year ago
Hi, your videos are really specific and super helpful! This information is helping me with my paper. Can't wait for the next one, thank you :)
@Mutual_Information · a year ago
Lol yea specific as hell! Glad it helps and I'm working on the next one as we speak!
@bean217 · 7 months ago
"If you recall... which you better!" I swear, I recall!
@bmenashetheman · a year ago
What a fantastic series, thank you so much!!!
@Mutual_Information · a year ago
Thanks Ben, glad you see the same value I do. Btw, if you know other people studying the same subject, it would help a lot to share this with them :)
@b0nce · a year ago
Double this: great effort, excellent videos, thank you so much. Also, Duane, you forgot to add this video to the RL playlist.
@Mutual_Information · a year ago
@@b0nce oh thank you! Fixed
@bmenashetheman · a year ago
@@Mutual_Information already shared it with everyone in my class! I'm certain this channel will get really popular really soon, your content is fantastic.
@Mutual_Information · a year ago
@@bmenashetheman oh you rule! Thank you!
@kimchi_taco · a year ago
14:30 TD is better than MC in general. In my opinion:
* TD: it's more aligned with the Bellman optimality equation, as it focuses on n-step optimization.
* MC: it's more aligned with the Bellman equation (with sampling), as it averages the rewards over the trajectory.
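The TD-vs-MC contrast in this comment can be sketched in a few lines of Python; the three-state episode, rewards, and step size below are made-up illustration values, not taken from the video:

```python
alpha = 0.5  # step size (illustrative)

# One finished, undiscounted episode: (state, reward-on-leaving) pairs
episode = [("A", 0.0), ("B", 0.0), ("C", 1.0)]  # terminal after C

# Monte Carlo: wait for the episode to end; the target is the full return
V_mc = {"A": 0.0, "B": 0.0, "C": 0.0}
rewards = [r for _, r in episode]
for t, (s, _) in enumerate(episode):
    g_t = sum(rewards[t:])              # return from time t to the end
    V_mc[s] += alpha * (g_t - V_mc[s])  # move estimate toward the sampled return

# TD(0): update every step; the target bootstraps off the next state's value
V_td = {"A": 0.0, "B": 0.0, "C": 0.0}
for t, (s, r) in enumerate(episode):
    v_next = V_td[episode[t + 1][0]] if t + 1 < len(episode) else 0.0
    V_td[s] += alpha * (r + v_next - V_td[s])  # one-step bootstrapped target
```

After this single episode the MC estimates all move toward the observed return, while TD(0) only propagates value one step back from the terminal reward; over repeated episodes that information flows further back each pass.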
@arrozenescau1539 · 8 months ago
I wish I could like your videos twice
@Mutual_Information · 8 months ago
Well unfortunately, there is no way to double-like. I see only one solution: I need to upload 2x more videos!
@ezragarcia6910 · a year ago
Thanks!! I just found your channel and IT'S AWESOME!
@Mutual_Information · a year ago
Thanks Ezra - I think it’s a work in progress lol 😁
@selcukkalafat2857 · a year ago
Thank you. Looking forward to the next part.
@Mutual_Information · a year ago
In the works :) but I'll need some patience
@broccoli322 · a year ago
Great videos! Can't wait for more.
@Mutual_Information · a year ago
This is one of my favorites in fact - glad it hits!
@rewixx69420 · a year ago
Episode 6! Finally I will understand PPO
@marcin.sobocinski · a year ago
Thank you.
@samuelepignone8255 · a year ago
Thanks a lot for your videos. There's just one thing that doesn't make sense to me: in the last example, when you add Q-learning to the graph, it has a lower maximum reward than SARSA, and I don't understand how that's possible since the path it follows has many fewer steps. I hope I have explained my doubt well.
@Mutual_Information · a year ago
I don't know either actually. My intuition, by this point, is that an inability to explain performance is the rule, not the exception. It's rare that you can tell a story about why one algo is superior on a particular problem. These very simple toy examples are designed precisely to call out the difference in their character. The last one, however, is weird enough that I can't explain all the performance gaps. If anyone else has an intuition, please chime in!
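The performance gap under discussion traces back to the two algorithms' one-step targets, which differ only in how they bootstrap; a minimal sketch, with made-up Q-values and a made-up transition:

```python
gamma = 1.0  # no discounting (illustrative)

# Made-up action values for the landing state s2
Q = {("s2", "a"): 2.0, ("s2", "b"): 5.0}

# Transition: reward -1, next state s2, and the epsilon-greedy behavior
# policy happened to pick the exploratory action "a" there
r, s2, a_next = -1.0, "s2", "a"

# SARSA (on-policy): bootstrap off the action actually taken next
sarsa_target = r + gamma * Q[(s2, a_next)]

# Q-learning (off-policy): bootstrap off the best action in s2,
# regardless of what the behavior policy does
qlearning_target = r + gamma * max(Q[(s2, "a")], Q[(s2, "b")])
```

Here SARSA's target (1.0) bakes the cost of exploration into the estimate while Q-learning's (4.0) ignores it, which is one standard reason the two can rank differently on reward-during-learning curves like the one in the video.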
@AlisonStuff · a year ago
love it!! so good!!!
@snowflake5204 · a year ago
At 20:30, shouldn't it be SARSA rather than TD1? Since we use the state-value function in TD rather than the state-action one.
@Mutual_Information · a year ago
Sorry it's not clear. I'm using 1-step TD control and Sarsa interchangeably here.
@omerlevy6939 · 2 months ago
Why at 18:16 are the last n action values the only ones getting updated?
@Electrikalforenzis · a year ago
Where are the rest? You are doing a fine job with these episodes!!
@Mutual_Information · a year ago
haha thank you very much. I need a bit of time for parts 5 and 6. I just moved to a new house, got a full time job, many little things.. but it's coming :)
@catcoder12 · a year ago
I really liked the videos, but the pace felt a bit too quick... The effort put into the examples is commendable.
@Mutual_Information · a year ago
I'll take it! I'm learning the slowness thing.. a bit
@АнатолийБугаков-е9г · 11 months ago
04:00 "Return g_3 is diff of levels at t=3 and the end of Episode". Could someone explain this? a) Why is g_3 that, and b) how do we know the return at t=3? In our BJ example we only know the reward at the end of the episode (play), and we use that reward to update Q.
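On question b): the return g_t is the sum of the (undiscounted) rewards from step t through the end of the episode, so when blackjack pays a single reward at the end, every g_t in that episode equals that final reward. A sketch with a hypothetical five-step episode:

```python
rewards = [0.0, 0.0, 0.0, 0.0, 1.0]  # hypothetical: reward only at the end

g = 0.0
returns = []
for r in reversed(rewards):  # sweep backward through the episode
    g = r + g                # gamma = 1, so g_t = r_t + g_{t+1}
    returns.append(g)
returns.reverse()            # returns[t] is now g_t
```

With only a terminal reward, `returns` comes out as `[1.0, 1.0, 1.0, 1.0, 1.0]`, which is why that single final reward is what updates the estimate for every visited state.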
@surakshachoudhary2880 · a year ago
Eagerly awaiting the remaining episodes - remarkable work there! So far I've just watched the videos, and I think it can only become clearer with some practice - but I was curious why I keep hearing about 'deep' RL? Where do the 'deep' a.k.a. neural nets fit into these videos?
@Mutual_Information · a year ago
Parts 5 and 6 are in the works - I only just started them, so it'll take some time. Nothing coming this month, but probably in Dec. Good question! "Deep" in "Deep RL" refers to deep learning, where we utilize neural networks with many layers to learn complex functions from observations. At this point, those NNs have had no place to be inserted - but that changes in part 5. In part 5, we'll discuss handling state spaces that are so huge we can't list them out in a table. In that case, you can use a function to model giant swaths of those states.. and deep NNs can be especially good at that. My video won't be a deep dive into NNs - that's too big a subject. But it should be clear how they would get used.
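The idea in this reply can be previewed without a neural net: below, a tiny linear model stands in for the network, mapping (state, action) to a value through weights shared across states. The features, transition, and constants are all fabricated for illustration:

```python
def features(state, action):
    # Hypothetical 3-dim feature vector for a (state, action) pair
    return [1.0, state, state * action]

def q(w, state, action):
    # Approximate action value: a dot product instead of a table lookup
    return sum(wi * xi for wi, xi in zip(w, features(state, action)))

w = [0.0, 0.0, 0.0]      # weights shared across ALL states
alpha, gamma = 0.1, 0.9  # illustrative constants

# One semi-gradient Q-learning step on a fabricated transition
s, a, r, s_next = 0.5, 1, 1.0, 0.8
td_error = r + gamma * max(q(w, s_next, a2) for a2 in (0, 1)) - q(w, s, a)
w = [wi + alpha * td_error * xi for wi, xi in zip(w, features(s, a))]
```

Because the weight update touches every state with similar features, the estimate generalizes; swapping the linear model for a deep network is what puts the "deep" in deep RL.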
@the_random_noob9860 · 5 months ago
In an epsilon-greedy policy, the two probabilities are epsilon and 1 - epsilon. So is my understanding correct that if epsilon = 0, the policy always takes the max action value from the Q-table while generating the episode, and Q-learning, SARSA, and Expected SARSA become identical?
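The epsilon = 0 case in this question can be checked directly. A sketch with made-up Q-values: once the policy is deterministic and greedy, the three one-step targets coincide:

```python
import random

def epsilon_greedy(Q, state, actions, epsilon, rng):
    # With probability epsilon explore uniformly; otherwise act greedily
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

Q = {("s", "a"): 1.0, ("s", "b"): 3.0}  # made-up action values
actions = ["a", "b"]
rng = random.Random(0)

a_next = epsilon_greedy(Q, "s", actions, epsilon=0.0, rng=rng)

sarsa_target = Q[("s", a_next)]                       # action actually taken
qlearning_target = max(Q[("s", a)] for a in actions)  # greedy max
expected_sarsa_target = sum(                          # expectation under the
    Q[("s", a)] * (1.0 if a == a_next else 0.0)       # now-deterministic policy
    for a in actions)
```

All three targets come out equal here, so with epsilon = 0 (and identical tie-breaking) the algorithms perform the same updates; with epsilon > 0 they diverge, because SARSA samples the exploratory action while Expected SARSA averages over it.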
@123ming1231 · a year ago
Can you make a video later showing how you make those animations? It is fantastic!!!! It shows the concepts very clearly!!! The data visualization art behind it is so elegant.
@Mutual_Information · a year ago
Maybe one day.. The code I use is a big personal library that's not ready for the public. But I could see doing that.. maybe in a year or two after things have gone well. We'll see
@coconut_camping · a year ago
I bet you are at Stanford as a professor teaching RL by now? This became an RL bible to me.
@Mutual_Information · a year ago
haha not quite a professor! But if you're using this as a resource, I consider my job fulfilled
@hihellohowrumfine · a year ago
Can you please do a series on statistics?
@Mutual_Information · a year ago
That's a bit broad. Is there a particular topic you're interested in?
@hihellohowrumfine · a year ago
@@Mutual_Information Specifically statistical learning theory, something like what the 3blue1brown channel has done for linear algebra. A lot of times when I read ML papers, it's hard to deeply appreciate why certain techniques work.
@abramgeorge3290 · a year ago
Why didn't we use importance sampling in Q-learning? I have been searching for an answer for days with no clue.
@raghavendrakaushik1691 · 4 months ago
At 4:23, shouldn't it be traversing backwards in time for MC?
@swastiksharma2683 · 10 months ago
Your content is so good, but you tried to make the videos as short as you can, due to which there are no natural pauses, making it difficult to focus and understand.
@Mutual_Information · 10 months ago
I think you're right. I'll have fewer cuts in future videos, and I have fewer cuts in my more recent ones.
@imanmossavat9383 · a year ago
Why is the mean TD performance getting worse as you increase m? (11:24)
@Mutual_Information · a year ago
I am not sure.. but I know the behavior is expected. That's actually a question posed in Sutton/Barto's book and I'm sure the answer is online somewhere.
@imanmossavat9383 · a year ago
@@Mutual_Information Thank you for your response. I really benefit from your videos. If I figure out the answer, I will share it here.
@danielm3772 · a year ago
From what I have read online and my personal interpretation: this is due to 2 factors, mainly a big value for alpha and the initial state values. If we take the 5-state (calling them A, B, C, D, E) example, we know that the true values are 1/6, 2/6, 3/6, 4/6, 5/6. If we then use an initialization of 1/2 for all of them, then first we will see a decrease in the error due to the updates of A, B, D, E (as they have the biggest difference compared to the true values); however, at some point they are going to stabilize and V(C) is going to change as well, and because the value of alpha is big, we will move away from 1/2 (which corresponds to the initial AND true value) by a non-negligible amount. Hope that helps.
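The experiment this thread refers to (Sutton & Barto's 5-state random walk, Example 6.2) is small enough to sketch; the step size, episode count, and seed below are arbitrary choices:

```python
import random

def run_td0(alpha, episodes, seed=0):
    rng = random.Random(seed)
    V = {s: 0.5 for s in "ABCDE"}  # the 1/2 initialization discussed above
    for _ in range(episodes):
        i = 2                      # every episode starts in the middle state C
        while 0 <= i <= 4:
            j = i + (1 if rng.random() < 0.5 else -1)  # step left or right
            r = 1.0 if j == 5 else 0.0  # reward only for exiting on the right
            v_next = V["ABCDE"[j]] if 0 <= j <= 4 else 0.0  # terminal value is 0
            V["ABCDE"[i]] += alpha * (r + v_next - V["ABCDE"[i]])  # TD(0) update
            i = j
    return V

V = run_td0(alpha=0.1, episodes=2000)
# True values are 1/6, 2/6, ..., 5/6; with a constant alpha the estimates keep
# fluctuating around them, consistent with the plateau described above.
```

Rerunning with a larger alpha makes the late-stage wobble around the true values more pronounced, which is the effect the reply attributes the error curve to.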
@sidnath7336 · a year ago
Could we get videos on Markov Chain Monte Carlo methods?
@Mutual_Information · a year ago
MCMC! Absolutely, just may take me a bit to get to it
@hansthompson · a year ago
where is part five? in production?
@Mutual_Information · a year ago
Yea, I took a little break before starting part 5. I'm currently writing it. It'll take some time. Should be ready in January.
@hansthompson · a year ago
@Mutual Information very easy to follow. I'll be patiently waiting. Thanks.
@raminessalat9803 · a year ago
Your videos are amazing, and I know the time spent creating these is probably astronomical. But I do have feedback that would help your videos; it's my own observation. I think your body language is too much, and I feel it is very unnatural / isn't meaningful for the content. I don't know if you are actually forcing the body language or not, but I think body language is something that happens naturally and you don't need to try too hard for it. At first, when I started to watch your videos, that was something that was repelling for me personally, but when I saw your content, I became a fan of your channel. So I hope you take it as constructive feedback from a fan.
@Mutual_Information · a year ago
Thank you, I appreciate the genuine feedback, and I know what you mean. There's this awkward robotic-ness that's difficult to shake, but I think some of it is due to this setup. In my more recent videos, my new setup has hopefully brought the unnaturalness down. A work in progress. I may also de-burden myself from trying to match my language with what I anticipate will be on screen.
Function Approximation | Reinforcement Learning Part 5
21:16
Mutual Information
19K views
Monte Carlo And Off-Policy Methods | Reinforcement Learning Part 3
27:06
Mutual Information
43K views
What is a Monad? - Computerphile
21:50
Computerphile
602K views
Is the Future of Linear Algebra.. Random?
35:11
Mutual Information
293K views
Policy Gradient Methods | Reinforcement Learning Part 6
29:05
Mutual Information
30K views
What happens at the Boundary of Computation?
14:59
Mutual Information
65K views
The Most Important Algorithm in Machine Learning
40:08
Artem Kirsanov
427K views
TD Learning - Richard S. Sutton
1:26:25
Wei Wei
18K views
Importance Sampling
12:46
Mutual Information
61K views
Why π is in the normal distribution (beyond integral tricks)
24:46
3Blue1Brown
1.6M views