Bellman Equation Basics for Reinforcement Learning

145,276 views

Skowster the Geek

Comments: 88
@DCG_42 4 years ago
I've watched many people try to explain this, even Berkeley lectures. This guy learned it on his own and explains it so much better. Thank you so much!!
@iworeushankaonce 4 years ago
Wow, amazingly done! I searched the entire internet to finally find this video and understand the idea behind this equation.
@vaishnavi-jk9ve 4 years ago
🎈
@vaishnavi-jk9ve 4 years ago
🥰
@wishIKnewHowToLove a year ago
This guy took this complicated formula and made it easy
@zishiwu7757 3 years ago
Great advice to thoroughly understand the vocabulary of a subject of interest first. It's similar to gathering information from users and figuring out the requirements of what a computer program should do before coding it.
@saikiranvarma6450 2 years ago
A very positive start to the video, thank you. Keep going and we'll keep supporting you.
@AlessandroOrlandi83 4 years ago
I'm here from Coursera and was having a bit of trouble understanding these equations. Thanks for the video!
@vamshidawat5654 4 years ago
Me too.
@prateek6502-y4p 4 years ago
Can you tell me which course?
@vamshidawat5654 4 years ago
@@prateek6502-y4p The Practical Reinforcement Learning course from National Research University Higher School of Economics.
@afmjoaa a year ago
Awesome explanation.
@mykyta.petrenko 4 years ago
The way you explain is great. It is very clear for me despite the fact that my English is not so good.
@vizart2045 2 years ago
I am working on machine learning and this was new to me. Thanks for bringing it to my attention.
@akashpb4044 2 years ago
Brilliantly explained 👍🏼👍🏼
@Nana-wu6fb 3 years ago
Thank you so much for all the videos you've done on this topic!!! I really appreciate the details, the code walk-throughs, and the study tips. Really helpful.
@anilkurkcu3389 5 years ago
Thanks for the "Quick Study Tips"!
@0i0l0o 5 years ago
Your approach is amazing. Thank you, good sir.
@pggg5001 2 years ago
10:45: I think you might have an error here. The equation is V(s) = max(R(s,a) + γ·V(s')), i.e. V(s) is the max of the reward of the CURRENT cell (which is s) plus the discounted V-value of its best NEIGHBOUR (which is s'). So the V-value of the cell left of the princess should be V = 0 + 0.9 * 1 = 0.9, not 1, right?
@virgenalosveinte5915 a year ago
I thought so too at first, but then realized that R(s,a) is a function of s and a, the current state-action pair, not of s', the next state. So the reward is granted for taking the action, not for being in the next state. It's a detail anyway, but I'll leave it here for future readers.
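To make the point in this thread concrete, here is a minimal sketch (not the video's code; the layout and rewards are assumed) that evaluates the cell immediately left of the princess under both conventions:

```python
# Minimal sketch comparing the two conventions discussed above, with gamma = 0.9.
gamma = 0.9

# Convention used in the video: R(s, a) pays out on the action that reaches the princess.
reward_for_action = 1.0        # stepping right onto the princess square
value_of_goal = 0.0            # terminal state carries no future value
v_video = reward_for_action + gamma * value_of_goal               # = 1.0

# Convention assumed in the comment: the reward belongs to the square itself, so the
# payoff only arrives through the discounted value of the neighbouring goal square.
reward_of_current_cell = 0.0
value_of_neighbour = 1.0
v_comment = reward_of_current_cell + gamma * value_of_neighbour   # = 0.9

print(v_video, v_comment)      # 1.0 0.9
```

In this gridworld both conventions point the agent the same way; they just shift where the +1 is booked.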
@yadugna 10 months ago
Great presentation- thank you
@rikiriki43 a year ago
Thank you for this
@rodolfojoseleopoldofarinar7317 a year ago
Thumbs up for the Quake reference :)
@user-wd3gm5nv7u 2 years ago
You are a great teacher man! Mega Thanks...
@mohmmedshafeer2820 5 months ago
Thank you, Skowster!
@chiedozieonyearugbulem9363 a year ago
Thank you for this concise video. The texts provided by my lecturer weren't this easy to understand.
@thenoteBooktalkshow 3 years ago
Awesome explanation, sir.
@建鴻-d3v 3 years ago
Very vivid. So the optimal step (action) to take is in the direction that increases value the most (the gradient direction).
@sambo7734 4 years ago
Isn't V == 1 in the princess square (reward == 1 and no next state), and the one to the left == 0.9 (reward == 0 + gamma * 1)?
@danielketterer9683 3 years ago
Yup.
@ManjeetKaur-hm4jt 3 years ago
The value of the terminal state is always 0, not 1.
@ravynala 2 years ago
The value is calculated for rewards resulting from an action being taken. We are looking at the value of good decisions, not good initial placement.
@pggg5001 2 years ago
It should be 0.9; R(s,a) and V(s) should be evaluated at the same s.
@ScottTaylorMCPD 9 months ago
The link to the "free Move 37 Reinforcement Learning course" mentioned in the description appears to be dead.
@newan0000 a year ago
So clear, thank you!
@kabbasoldji3816 a year ago
Thank you very much, sir 😍
@gogigaga1677 2 years ago
BEST EXPLANATION
@erdoganyildiz617 4 years ago
I'm having some difficulty understanding something and would be glad if anyone could help. The Bellman equation basically states that to calculate the value of a state, we need to check [R(s,a) + gamma*V(s')] for all possible actions and select the maximum. In the Mario example we directly placed value = 1 in the box next to the princess, because as humans we know the best possible action in that state is to step towards the princess. But the only way for a computer to decide the best possible action is to try all possible actions and compare the results. Here is where the problem starts: there can be infinite loops (say Mario walks back and forth forever), and the computer will never get the chance to compare results because it won't have all of them. What am I missing here? Thanks in advance.
@roboticcharizard 3 years ago
In this example, we're starting from the base case, i.e. the final state where we are about to reach the princess. Since we start from the base case, we always have some value assigned to a cell, and we can just work our way back to the initial state. You're right that if we started from the initial state and explored from there, we could run into an infinite loop, but in the given example we start from the base case.
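One way to see why no path-following (and hence no infinite loop) is needed: value iteration sweeps every state and backs values up from the neighbours until nothing changes. A rough sketch, assuming a 3x4 grid with the princess in the top-right corner, the fire just below her, one blocked cell, deterministic moves, and gamma = 0.9 (these details are my assumptions, not taken from the video):

```python
gamma = 0.9
rows, cols = 3, 4
goal, fire = (0, 3), (1, 3)          # assumed: princess top-right, fire directly below
walls = {(1, 1)}                      # assumed single blocked cell

def neighbours(s):
    r, c = s
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in walls:
            yield (nr, nc)

def step_reward(s_next):
    return 1.0 if s_next == goal else -1.0 if s_next == fire else 0.0

V = {(r, c): 0.0 for r in range(rows) for c in range(cols) if (r, c) not in walls}

for sweep in range(100):              # bounded sweeps over all states; no path is ever "followed"
    delta = 0.0
    for s in V:
        if s in (goal, fire):         # terminal states: reward is collected on entry, value stays 0
            continue
        best = max(step_reward(s2) + gamma * V[s2] for s2 in neighbours(s))
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < 1e-9:                  # stop once a full sweep changes nothing
        break

print(V[(0, 2)], V[(0, 1)], V[(0, 0)])   # 1.0 0.9 0.81, the same chain reported in the comments above
```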
@irfanwar6986 4 years ago
I appreciate your work
@AmanSharma-cv3rn 2 years ago
Simple and clear❤️
@jeffreyanderson5333 3 years ago
Thanks, that saved my day.
@maggielin8664 4 years ago
Thank you so much, Sir.
@Freakynoblegas 4 years ago
Awesome video!
@KennTollens 4 years ago
I just started machine learning a couple of days ago and it looks like the Bellman equation is part of Q-learning. I would love it if someone did the manual math of how Q-learning is computed as the agent moves through the grid (a worked sketch follows after this thread).
s - the state, i.e. the square the agent is in
s' - the next square the agent will move to
V - the value
R(s,a) - the reward for the state given an action
γ - a discount factor between 0 and 1 applied to the value of the next state
@fazaljarral2792 4 years ago
Kenn, I just started today. Is there a Discord server for beginners?
@patite3103 3 years ago
Great video! Could you explain why you move to the left when calculating the values and not, for example, go down to where V = 0.9? How would you calculate the value V = 0.9 to the left of the cell with reward -1?
@omkarkulkarni2595 2 years ago
As far as I understood, he used the value in the 1st row, 3rd column; he did not use the -1 value.
@japneetsingh5015 5 years ago
Really enjoyed it.
@craigowsen4501 3 years ago
I just finished the 5th lesson in this series. It is awesome! I am also reading Artificial Intelligence Engines: A Tutorial Introduction to the Mathematics of Deep Learning. The math just comes to life after seeing Colin's Python programs. Incredible series! Thanks
@salehabod6740 3 years ago
Where can we find the code?
@faisalamir1656 2 years ago
Thanks a lot, man!
@ORagnar 2 years ago
3:44 It's fascinating to note that in 1954 there were no digital computers. I wonder if they were using analog computers.
@patf9770 5 years ago
This series is amazing.
@volt897 3 years ago
I have a question: how can I decide which actions to take before going backwards to estimate the values?
@bea59kaiwalyakhairnar37 2 years ago
So each action the agent takes gives it a reward, and if it finds the path to the goal it gets the maximum reward. But does an agent try different ways to reach the goal? Also, I subscribed.
@robotronix-co-il 19 days ago
In my calculation s3 should be 0.9, since Q(s3, a) = 0 + 0.9 × max(0, 1, 0); from there s2 = 0.81, s1 = 0.729, ...
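For reference, the chain of values in this thread is just repeated multiplication by gamma; whether it starts at 1 or at 0.9 depends on whether the +1 is credited to the action that reaches the princess (as in the video) or to the square itself. A quick check, assuming gamma = 0.9:

```python
gamma = 0.9
v = 1.0                              # value of the cell whose best action reaches the princess
for steps_back in range(1, 4):
    v = 0.0 + gamma * v              # each cell further away: no immediate reward, discounted neighbour
    print(steps_back, round(v, 3))   # 1 0.9 | 2 0.81 | 3 0.729
```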
@superdahoho 2 years ago
11:42: you lost me there. You said the state of the next square is 0, that's why it's 0 +, but then you go ahead and say the value of the square is 1. Are the state and the value of the square different? I thought the state was the square?
@pramodpal3320 4 years ago
Is it that, because we are leaving that state with the particular action that leads to the final reward of +1, we take the reward value R(s,a) as 1?
@mohammed333suliman 5 years ago
Helpful, thank you.
@pramodpal3320 4 years ago
I think for the reward you are talking about the current state. So how could it be +1 when you are working backward?
@ejbock5b179 6 months ago
What happens if, instead of Mario being drunk, Mario can move as instructed but the identities of the two rooms (trap and success) are unknown, and only the probability that each is the trap or the success room is known? I.e. Mario knows the success room may be the trap with 50% confidence.
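That variant turns the problem into one with hidden state. One simple (and admittedly crude) way to reason about it, as a sketch under my own assumptions: replace the unknown reward with its expected value under the 50/50 belief.

```python
# Expected immediate reward of entering the uncertain room under a 50/50 belief.
p_success = 0.5
expected_reward = p_success * (+1.0) + (1 - p_success) * (-1.0)
print(expected_reward)    # 0.0: with an even belief, entering looks neutral until more is learned
```

A full treatment would update the belief from observations (a POMDP), which goes beyond the drunk-Mario example.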
@shakyasarkar7143 4 years ago
How does the 3rd row, 4th column value come out as 0.73? Can anyone please explain?
@Shriukan1 3 years ago
Because the next possible moves lead to either -1 or 0.81, so the best move is to go towards 0.81, multiplied by the gamma factor of 0.9; 0.81 × 0.9 ≈ 0.73 :)
@chenguangzhou3991 3 years ago
super
@phattaraphatchaiamornvate8827 2 years ago
you make me god ty.
@alexusnag 5 years ago
Great tutorial!
@pramodpal3320 4 years ago
I mean for the second-to-last cell.
@yolomein415 5 years ago
Where do you get that reward function? The R(s, a)?
@pengli4769 4 years ago
Take a chess game as an example: in a given state, when your agent places a piece somewhere and your opponent loses a piece, your agent gets a positive reward. The state is like the current chessboard, the action is where you put the piece, and the reward can be defined by you.
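As a toy illustration of "the reward can be defined by you" (the names and structure here are hypothetical, not from any chess library):

```python
from dataclasses import dataclass

@dataclass
class BoardState:
    opponent_piece_count: int            # toy stand-in for a full board description

def reward(state_before: BoardState, action: str, state_after: BoardState) -> float:
    # +1 whenever the chosen move captured an opponent piece, 0 otherwise.
    captured = state_after.opponent_piece_count < state_before.opponent_piece_count
    return 1.0 if captured else 0.0

print(reward(BoardState(16), "Nxe5", BoardState(15)))   # 1.0
```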
@niklasdamm6900 a year ago
21.11 20:55
@asdfasdfuhf 4 years ago
Great content, terrible microphone
@mpgrewal00 3 years ago
Las Vegas and AI? What an oxymoron.
@femkemilene4547 5 years ago
Using gaming analogies and a sexist trope is distracting. Also, the info about how to learn is unnecessary.
@gaulvhan2814 5 years ago
You are great at explaining things, but why use games and rescuing a princess as examples? SMH, be more open-minded about your audience.
@katakeresztesi 3 years ago
Is the Move 37 course currently available? I am having a hard time finding it.
@tonymornoty6346 4 years ago
The explanation helped me, but I think the calculations are wrong: the princess square is V = 1 and the square next to the princess should be V = 0.9.
@brodderick 5 years ago
brilliant
@mishrag104 3 years ago
The title should be "Bellman Reinforcement Learning for Dummies". Great job explaining it step by step.
@somecsmajor a year ago
Thanks Skowster, you're a real one!
@malanb5 4 years ago
The example looks very similar to the one in Georgia Tech's Reinforcement Learning course.
@tianyuzhang7404 2 years ago
This is the first time I have a feel for what Bellman's equation is for. Awesome video.
@ahmet9446 4 years ago
I have a question about the step before the fire. R(s,a) for the fire is -1. For the step before it, shouldn't the V value be -1 + 0.9 = -0.1?
@MaiMaxTeamIELTSTA a month ago
Exactly. I was also wondering why the V(s) of the bottom-right corner cell is 0.73. I also think your way of calculating is more correct: V(s) = -1 + 0.9 × (1 + 0.9 × 0) = -1 + 0.9 = -0.1, because from that cell the only way to move further is into the lava pit; even though that is clearly not optimal, you have no other choice, right? Can the video's author help clarify?
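For what it's worth, the 0.73 falls out if that corner cell has at least one non-lava neighbour whose value is 0.81, because the max in the Bellman equation picks the better action. A small check under that assumption (gamma = 0.9):

```python
gamma = 0.9
candidates = {
    "step into the 0.81 neighbour": 0.0 + gamma * 0.81,   # no immediate reward, discounted neighbour
    "step into the lava pit":       -1.0 + gamma * 0.0,   # -1 reward, terminal value 0
}
best = max(candidates, key=candidates.get)
print(best, round(candidates[best], 3))   # step into the 0.81 neighbour 0.729
```

If the lava pit really were the only available move from that cell, the -0.1 above would be the value; the 0.73 in the video implies another move is available.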
@Anon46d246 5 years ago
It really helps me.
@mrknarf4438 4 years ago
This is great! And fun!!! Thank you!!