Temporal Difference Learning (including Q-Learning) | Reinforcement Learning Part 4

  32,115 views

Mutual Information

A day ago

Comments: 99
@lordjared2572 · a year ago
ok, just pls upload more vids. There's a huge vacuum of ML education out here for people who are not scared of math.
@Mutual_Information · a year ago
That’s the audience I want!
@rewixx69420 · a year ago
I learn ML by myself and it's hard to find information. This guy saves me on RL, thanks.
@veereshkumar-b3m · 17 days ago
Keep up the great work in the DRL universe- we're learning so much from you!
@marcin.sobocinski · a year ago
Your animations are fantastic, it's like a new dimension of learning. It helps so much to be able to visualize RL processes. Thank you!
@Mutual_Information · a year ago
Thank you Marcin - it means a lot when I hit someone in the audience exactly as I hoped :)
@samlaki4051 · a year ago
Starting to get into RL, Yannic recommended you! How have I missed such a gem of a channel!
@Mutual_Information · a year ago
Love Yannic's stuff. Super pumped to get the shout out.
@saranahluwalia5353 · a year ago
I wish I had this to review 5 years ago. This would have eliminated wasteful experiments. Thank you for making this more accessible.
@Mutual_Information · a year ago
The way I'm designing the channel.. is the channel I would have wanted when I was learning ML for the first time. Seems like that theory landed!
@marcegger7411 · a year ago
Fantastic!! Keep up the amazing work! It's always so great to see quality content presented so eloquently :)
@bornamorasai5285 · a year ago
Can't wait for part 5 and 6!!!! Let's go!!!!
@pandie4555 · a year ago
dude, the amount of work you put in these videos is fantastic.
@Mutual_Information · a year ago
lol yea these videos were crazy hard. This one took me 100+ hours
@davemansfield7719 · 2 months ago
I'm surprised how fast those algorithms could train windy gridworld agents; without proper tuning you could run millions of steps and still never get that performance. Great video!
@buh357 · a year ago
I am starting to learn RL, and your video is helping me a lot. You have a clear and precise explanation; thank you. Looking forward to new coming videos :)
@Mutual_Information · a year ago
excellent! Exactly what I'm going for :)
@rostislavmarkov7488 · 8 months ago
Awesome series covering essential fundamentals with great didactics. Raised the bar at creating high-quality content!
@ryderbrooks1783 · a year ago
This channel is extremely under subscribed. I very much appreciate the work you're putting in here. Thank you
@Mutual_Information · a year ago
Ha well I can't expect a large number of subs when my stuff is so technical. So.. should I make it less technical? Nope!
@victormanuel8767 · a year ago
"You've covered a lot, give yourself a 3 second break" *0.5 seconds later* "Great, let's keep going"
@hjop010 · a year ago
Great video as always! It is helping me so much complementing Barto & Sutton.
@Mutual_Information · a year ago
Sweet - that's exactly how I want this series used!
@nathanzorndorf8214 · 8 months ago
Thanks for this. Amazing.
@antonkot6250 · 11 months ago
Oh man, these visualisations are top-notch!
@sathyakumarn7619 · a year ago
So precise and fun! But highly underrated! Please advertise so that more people can benefit!
@Mutual_Information · a year ago
Thank you! and I agree, distributing this needs some more effort. Sometimes my tweets help
@datsplit2571 · a year ago
High quality videos, my compliments! This helps so much in understanding RL for a Master's course. Thank you!
@Mutual_Information · a year ago
You're welcome! And you know what would be totally sweet? If you told your classmates about these vids :)
@datsplit2571 · a year ago
@@Mutual_Information Posted it in the teams chat of the Advanced Machine Learning course!
@Mutual_Information · a year ago
@@datsplit2571 thank you! Over time moves like that will make all the difference :)
@mryazbeck98 · a year ago
I love your videos to recap what I read in the book. Helps me understand and visualize everything better. I was however hoping to know more about batch training because I didn't understand how it works at all!
@timothytyree5211 · a year ago
Excellent video! I am so stoked to use this in my work!
@timothytyree5211 · a year ago
I used the knowledge of ^this vid today to help a buddy out at work! You rock, Duane!
@timothytyree5211 · a year ago
I'm really looking forward to your next video on function approximation!
@qiguosun129 · a year ago
Excellent lecture! It cleared up my doubts about the method when reviewers asked me to do a parameter uncertainty analysis in a scientific research paper.
@Mutual_Information · a year ago
Excellent - that's thrilling to hear this has some real impact!
@NazerkeSafina · a year ago
Superb job with visualization, keep it up! Only you could explain certain things to me; I've watched several other tutorials and wasn't feeling confident. One thing: I wish the explanation of how V(s) is obtained for each state were more detailed, perhaps with multiple samples and step-by-step calculations.
@skirazai7591 · a year ago
Man, you're doing some very high quality stuff, keep it up.
@Mutual_Information · a year ago
Thanks - I'm tryin!
@박종연-p9h · 6 months ago
This is a life saver
@IRONMAIDEN146 · a year ago
Your videos are helping me a lot in my AI engineering degree, thanks a lot!
@Mutual_Information · a year ago
Love it!
@fedelozano2895 · a year ago
Hi, your videos are really specific and super helpful! This information is helping me with my paper. Can't wait for the next one, thank you :)
@Mutual_Information · a year ago
Lol yea specific as hell! Glad it helps and I'm working on the next one as we speak!
@bean217 · 7 months ago
"If you recall... which you better!" I swear, I recall!
@bmenashetheman · a year ago
What a fantastic series, thank you so much!!!
@Mutual_Information · a year ago
Thanks Ben, glad you see the same value I do. Btw, if you know other people studying the same subject, it would help a lot to share this with them :)
@b0nce · a year ago
Double this: great effort, excellent videos, thank you so much. Also, Duane, you forgot to add this video to the RL playlist.
@Mutual_Information · a year ago
@@b0nce oh thank you! Fixed
@bmenashetheman · a year ago
@@Mutual_Information already shared it with everyone in my class! I'm certain this channel will get really popular really soon, your content is fantastic.
@Mutual_Information · a year ago
@@bmenashetheman oh you rule! Thank you!
@kimchi_taco · a year ago
14:30 TD is better than MC in general. In my opinion:
* TD: it's more aligned with the Bellman optimality equation, as it focuses on n-step optimization.
* MC: it's more aligned with the Bellman equation (with sampling), as it averages the rewards over the trajectory.
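The TD-vs-MC contrast in this comment can be sketched in a few lines of Python; the three-state episode, rewards, and step size below are made-up illustration values, not taken from the video:

```python
alpha = 0.5  # step size (illustrative)

# One finished, undiscounted episode: (state, reward-on-leaving) pairs
episode = [("A", 0.0), ("B", 0.0), ("C", 1.0)]  # terminal after C

# Monte Carlo: wait for the episode to end; the target is the full return
V_mc = {"A": 0.0, "B": 0.0, "C": 0.0}
rewards = [r for _, r in episode]
for t, (s, _) in enumerate(episode):
    g_t = sum(rewards[t:])              # return from time t to the end
    V_mc[s] += alpha * (g_t - V_mc[s])  # move estimate toward the sampled return

# TD(0): update every step; the target bootstraps off the next state's value
V_td = {"A": 0.0, "B": 0.0, "C": 0.0}
for t, (s, r) in enumerate(episode):
    v_next = V_td[episode[t + 1][0]] if t + 1 < len(episode) else 0.0
    V_td[s] += alpha * (r + v_next - V_td[s])  # one-step bootstrapped target
```

After this single episode the MC estimates all move toward the observed return, while TD(0) only propagates value one step back from the terminal reward; over repeated episodes that information flows further back each pass.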
@arrozenescau1539 · 8 months ago
I wish I could like your videos twice
@Mutual_Information · 8 months ago
Well unfortunately, there is no way to double-like. I see only one solution: I need to upload 2x more videos!
@ezragarcia6910 · a year ago
Thanks!! I just found your channel and IT'S AWESOME!
@Mutual_Information · a year ago
Thanks Ezra - I think it’s a work in progress lol 😁
@selcukkalafat2857 · a year ago
Thank you. Looking forward to the next part.
@Mutual_Information · a year ago
In the works :) but I'll need some patience
@broccoli322 · a year ago
Great videos! Can't wait for more.
@Mutual_Information · a year ago
This is one of my favorites in fact - glad it hits!
@rewixx69420 · a year ago
Episode 6! Finally I will understand PPO
@marcin.sobocinski · a year ago
Thank you.
@samuelepignone8255 · a year ago
Thanks a lot for your videos. There's just one thing that doesn't make sense to me: in the last example, when you add Q-learning to the graph, it has a lower maximum reward than SARSA, and I don't understand how that's possible since the path it follows has many fewer steps. I hope I have explained my doubt well.
@Mutual_Information · a year ago
I don't know either actually. My intuition, by this point, is that an inability to explain performance is the rule, not the exception. It's rare that you can tell a story about why one algo is superior on a particular problem. These very simple toy examples are designed precisely to call out the difference in their character. The last one, however, is weird enough that I can't explain all the performance gaps. If anyone else has an intuition, please chime in!
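The performance gap under discussion traces back to the two algorithms' one-step targets, which differ only in how they bootstrap; a minimal sketch, with made-up Q-values and a made-up transition:

```python
gamma = 1.0  # no discounting (illustrative)

# Made-up action values for the landing state s2
Q = {("s2", "a"): 2.0, ("s2", "b"): 5.0}

# Transition: reward -1, next state s2, and the epsilon-greedy behavior
# policy happened to pick the exploratory action "a" there
r, s2, a_next = -1.0, "s2", "a"

# SARSA (on-policy): bootstrap off the action actually taken next
sarsa_target = r + gamma * Q[(s2, a_next)]

# Q-learning (off-policy): bootstrap off the best action in s2,
# regardless of what the behavior policy does
qlearning_target = r + gamma * max(Q[(s2, "a")], Q[(s2, "b")])
```

Here SARSA's target (1.0) bakes the cost of exploration into the estimate while Q-learning's (4.0) ignores it, which is one standard reason the two can rank differently on reward-during-learning curves like the one in the video.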
@AlisonStuff · a year ago
love it!! so good!!!
@snowflake5204 · a year ago
At 20:30, shouldn't it be SARSA rather than TD1? Since we use the state-value function in TD rather than the state-action one.
@Mutual_Information · a year ago
Sorry it's not clear. I'm using 1-step TD control and Sarsa interchangeably here.
@omerlevy6939 · 2 months ago
Why at 18:16 are the last n action values the only ones getting updated?
@Electrikalforenzis · a year ago
Where are the rest? You are doing a fine job with these episodes!!
@Mutual_Information · a year ago
haha thank you very much. I need a bit of time for parts 5 and 6. I just moved to a new house, got a full time job, many little things.. but it's coming :)
@catcoder12 · a year ago
I really liked the videos, but the pace felt a bit too quick... The effort put into the examples is commendable.
@Mutual_Information · a year ago
I'll take it! I'm learning the slowness thing.. a bit
@АнатолийБугаков-е9г · 11 months ago
04:00 "Return g_3 is diff of levels at t=3 and the end of Episode". Could someone explain this? a) Why is g_3 that, and b) how do we know the return at t=3? In our BJ example we only know the reward at the end of the episode (play), and we use that reward to update Q.
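On question b): the return g_t is the sum of the (undiscounted) rewards from step t through the end of the episode, so when blackjack pays a single reward at the end, every g_t in that episode equals that final reward. A sketch with a hypothetical five-step episode:

```python
rewards = [0.0, 0.0, 0.0, 0.0, 1.0]  # hypothetical: reward only at the end

g = 0.0
returns = []
for r in reversed(rewards):  # sweep backward through the episode
    g = r + g                # gamma = 1, so g_t = r_t + g_{t+1}
    returns.append(g)
returns.reverse()            # returns[t] is now g_t
```

With only a terminal reward, `returns` comes out as `[1.0, 1.0, 1.0, 1.0, 1.0]`, which is why that single final reward is what updates the estimate for every visited state.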
@surakshachoudhary2880 · a year ago
Eagerly awaiting the remaining episodes - remarkable work there! So far I've just watched the videos, and I think it can only become clearer with some practice - but I was curious why I keep hearing about 'deep' RL? Where do the 'deep' a.k.a. neural nets fit into these videos?
@Mutual_Information · a year ago
Parts 5 and 6 are in the works - I only just started them, so it'll take some time. Nothing coming this month, but probably in Dec. Good question! "Deep" in "Deep RL" refers to deep learning, where we utilize neural networks with many layers to learn complex functions from observations. At this point, those NNs have had no place to be inserted - but that changes in part 5. In part 5, we'll discuss handling state spaces that are so huge we can't list them out in a table. In that case, you can use a function to model giant swaths of those states.. and deep NNs can be especially good at that. My video won't be a deep dive into NNs - that's too big a subject. But it should be clear how they would get used.
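The idea in this reply can be previewed without a neural net: below, a tiny linear model stands in for the network, mapping (state, action) to a value through weights shared across states. The features, transition, and constants are all fabricated for illustration:

```python
def features(state, action):
    # Hypothetical 3-dim feature vector for a (state, action) pair
    return [1.0, state, state * action]

def q(w, state, action):
    # Approximate action value: a dot product instead of a table lookup
    return sum(wi * xi for wi, xi in zip(w, features(state, action)))

w = [0.0, 0.0, 0.0]      # weights shared across ALL states
alpha, gamma = 0.1, 0.9  # illustrative constants

# One semi-gradient Q-learning step on a fabricated transition
s, a, r, s_next = 0.5, 1, 1.0, 0.8
td_error = r + gamma * max(q(w, s_next, a2) for a2 in (0, 1)) - q(w, s, a)
w = [wi + alpha * td_error * xi for wi, xi in zip(w, features(s, a))]
```

Because the weight update touches every state with similar features, the estimate generalizes; swapping the linear model for a deep network is what puts the "deep" in deep RL.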
@the_random_noob9860 · 5 months ago
In an epsilon-greedy policy, the two probabilities are epsilon and 1 - epsilon. So is my understanding correct that if epsilon = 0, the policy always takes the max action value from the Q-table while generating the episode, and Q-learning, SARSA, and Expected SARSA become identical?
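The epsilon = 0 case in this question can be checked directly. A sketch with made-up Q-values: once the policy is deterministic and greedy, the three one-step targets coincide:

```python
import random

def epsilon_greedy(Q, state, actions, epsilon, rng):
    # With probability epsilon explore uniformly; otherwise act greedily
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

Q = {("s", "a"): 1.0, ("s", "b"): 3.0}  # made-up action values
actions = ["a", "b"]
rng = random.Random(0)

a_next = epsilon_greedy(Q, "s", actions, epsilon=0.0, rng=rng)

sarsa_target = Q[("s", a_next)]                       # action actually taken
qlearning_target = max(Q[("s", a)] for a in actions)  # greedy max
expected_sarsa_target = sum(                          # expectation under the
    Q[("s", a)] * (1.0 if a == a_next else 0.0)       # now-deterministic policy
    for a in actions)
```

All three targets come out equal here, so with epsilon = 0 (and identical tie-breaking) the algorithms perform the same updates; with epsilon > 0 they diverge, because SARSA samples the exploratory action while Expected SARSA averages over it.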
@123ming1231 · a year ago
Can you make a video later showing how you make those animations? It is fantastic!!!! It shows the concepts very clearly!!! The data visualization art behind it is so elegant.
@Mutual_Information · a year ago
Maybe one day.. The code I use is a big personal library that's not ready for the public. But I could see doing that.. maybe in a year or two after things have gone well. We'll see
@coconut_camping · a year ago
I bet you are at Stanford as a professor teaching RL by now? This became an RL bible to me.
@Mutual_Information · a year ago
haha not quite a professor! But if you're using this as a resource, I consider my job fulfilled
@hihellohowrumfine · a year ago
Can you please do a series on statistics?
@Mutual_Information · a year ago
That's a bit broad. Is there a particular topic you're interested in?
@hihellohowrumfine · a year ago
@@Mutual_Information Specifically statistical learning theory, something like what the 3blue1brown channel has done for linear algebra. A lot of times when I read ML papers, it's hard to deeply appreciate why certain techniques work.
@abramgeorge3290 · a year ago
Why didn't we use importance sampling in Q-learning? I have been searching for an answer for days with no clue.
@raghavendrakaushik1691 · 4 months ago
At 4:23, shouldn't it be traversing backwards in time for MC?
@swastiksharma2683 · 10 months ago
Your content is so good, but you tried to make the videos as short as you can, due to which there are no natural pauses, making it difficult to focus and understand.
@Mutual_Information · 10 months ago
I think you're right. I'll have fewer cuts in future videos, and I have fewer cuts in my more recent ones.
@imanmossavat9383 · a year ago
Why is the mean TD performance getting worse as you increase m? (11:24)
@Mutual_Information · a year ago
I am not sure.. but I know the behavior is expected. That's actually a question posed in Sutton/Barto's book and I'm sure the answer is online somewhere.
@imanmossavat9383 · a year ago
@@Mutual_Information Thank you for your response. I really benefit from your videos. If I figure out the answer, I will share it here.
@danielm3772 · a year ago
From what I have read online and my personal interpretation: this is due to 2 factors, mainly a big value for alpha and the initial state values. If we take the 5-state (calling them A, B, C, D, E) example, we know that the true values are 1/6, 2/6, 3/6, 4/6, 5/6. If we then use an initialization of 1/2 for all of them, then first we will see a decrease in the error due to the updates of A, B, D, E (as they have the biggest difference compared to the true values); however, at some point they are going to stabilize and V(C) is going to change as well, and because the value of alpha is big, we will move away from 1/2 (which corresponds to the initial AND true value) by a non-negligible amount. Hope that helps.
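The experiment this thread refers to (Sutton & Barto's 5-state random walk, Example 6.2) is small enough to sketch; the step size, episode count, and seed below are arbitrary choices:

```python
import random

def run_td0(alpha, episodes, seed=0):
    rng = random.Random(seed)
    V = {s: 0.5 for s in "ABCDE"}  # the 1/2 initialization discussed above
    for _ in range(episodes):
        i = 2                      # every episode starts in the middle state C
        while 0 <= i <= 4:
            j = i + (1 if rng.random() < 0.5 else -1)  # step left or right
            r = 1.0 if j == 5 else 0.0  # reward only for exiting on the right
            v_next = V["ABCDE"[j]] if 0 <= j <= 4 else 0.0  # terminal value is 0
            V["ABCDE"[i]] += alpha * (r + v_next - V["ABCDE"[i]])  # TD(0) update
            i = j
    return V

V = run_td0(alpha=0.1, episodes=2000)
# True values are 1/6, 2/6, ..., 5/6; with a constant alpha the estimates keep
# fluctuating around them, consistent with the plateau described above.
```

Rerunning with a larger alpha makes the late-stage wobble around the true values more pronounced, which is the effect the reply attributes the error curve to.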
@sidnath7336 · a year ago
Could we get videos on Markov Chain Monte Carlo methods?
@Mutual_Information · a year ago
MCMC! Absolutely, just may take me a bit to get to it
@hansthompson · a year ago
where is part five? in production?
@Mutual_Information · a year ago
Yea, I took a little break before starting part 5. I'm currently writing it. It'll take some time. Should be ready in January.
@hansthompson · a year ago
@Mutual Information very easy to follow. I'll be patiently waiting. Thanks.
@raminessalat9803 · a year ago
Your videos are amazing, and I know the time spent creating these is probably astronomical. But I do have feedback that would help your videos; it's my own observation. I think your body language is too much, and I feel it is very unnatural / isn't meaningful for the content. I don't know if you are actually forcing the body language or not, but I think body language is something that happens naturally and you don't need to try too hard for it. At first, when I started to watch your videos, that was something that was repelling for me personally, but when I saw your content, I became a fan of your channel. So I hope you take it as constructive feedback from a fan.
@Mutual_Information · a year ago
Thank you, I appreciate the genuine feedback, and I know what you mean. There's this awkward robotic-ness that's difficult to shake, but I think some of it is due to this setup. In my more recent videos, my new setup has hopefully brought the unnaturalness down. A work in progress. I may also de-burden myself from trying to match my language with what I anticipate will be on screen.
Function Approximation | Reinforcement Learning Part 5
21:16
Mutual Information
19K views
Monte Carlo And Off-Policy Methods | Reinforcement Learning Part 3
27:06
Mutual Information
43K views
What is a Monad? - Computerphile
21:50
Computerphile
602K views
Is the Future of Linear Algebra.. Random?
35:11
Mutual Information
293K views
Policy Gradient Methods | Reinforcement Learning Part 6
29:05
Mutual Information
30K views
What happens at the Boundary of Computation?
14:59
Mutual Information
65K views
The Most Important Algorithm in Machine Learning
40:08
Artem Kirsanov
427K views
TD Learning - Richard S. Sutton
1:26:25
Wei Wei
18K views
Importance Sampling
12:46
Mutual Information
61K views
Why π is in the normal distribution (beyond integral tricks)
24:46
3Blue1Brown
1.6M views