I've looked at many people trying to explain this, even Berkeley lectures; this guy learned it on his own and explained it so much better. Thank you so much!!
@iworeushankaonce 4 years ago
Wow, amazingly done! I searched the entire internet to finally find this video and understand the idea behind this equation.
@vaishnavi-jk9ve 4 years ago
🎈
@vaishnavi-jk9ve 4 years ago
🥰
@wishIKnewHowToLove a year ago
This guy took this complicated formula and made it easy
@zishiwu7757 3 years ago
Great advice to thoroughly understand the vocabulary of a subject of interest first. It's similar to gathering information from users and figuring out the requirements of what a computer program should do before coding it.
@saikiranvarma6450 2 years ago
A very positive start to the video, thank you. Keep going and we'll keep supporting you.
@AlessandroOrlandi83 4 years ago
I'm here from Coursera; I'm having a bit of trouble understanding them. Thanks for the video!
@vamshidawat5654 4 years ago
Me too.
@prateek6502-y4p 4 years ago
Can you tell me which course?
@vamshidawat5654 4 years ago
@@prateek6502-y4p The Practical Reinforcement Learning course from the National Research University Higher School of Economics.
@afmjoaa a year ago
Awesome explanation.
@mykyta.petrenko 4 years ago
The way you explain is great. It is very clear to me even though my English is not so good.
@vizart2045 2 years ago
I am working on machine learning and this was new to me. Thanks for bringing it to my attention.
@akashpb4044 2 years ago
Brilliantly explained 👍🏼👍🏼
@Nana-wu6fb 3 years ago
Thank you so much for all the videos you've done on this topic!!! I appreciate the details, the code walk-throughs, and the tips on how to study. Really helpful!
@anilkurkcu3389 5 years ago
Thanks for the "Quick Study Tips"!
@0i0l0o 5 years ago
Your approach is amazing. Thank you, good sir.
@pggg5001 2 years ago
10:45. I think you might have an error here. The equation is V(s) = max(R(s,a) + γ·V(s')), i.e., V(s) is the max over actions of the reward of the CURRENT cell (which is s) plus the discounted V-value of its best NEIGHBOUR (which is s'). So the V-value of the cell left of the princess should be V = 0 + 0.9 × 1 = 0.9, not 1, right?
@virgenalosveinte5915 a year ago
I thought so too at first, but then realized that R(s,a) is a function of s and a, the current state-action pair, not of s', the next state. So the reward is given for taking the action, not for being in the next state. It's a detail anyway, but I'll leave it here for future people.
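For concreteness, the arithmetic under each reading (assuming γ = 0.9 and that the terminal square contributes no future value):

Reward paid by the action that enters the princess square: V = max_a [R(s,a) + γ·V(s')] = 1 + 0.9 × 0 = 1
Reward attached instead to occupying the next square: V = 0 + 0.9 × 1 = 0.9

The numbers shown in the video (1, 0.9, 0.81, 0.73) are consistent with the first reading.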
@yadugna 10 months ago
Great presentation, thank you!
@rikiriki43 a year ago
Thank you for this
@rodolfojoseleopoldofarinar7317 a year ago
Thumbs up for the Quake reference :)
@user-wd3gm5nv7u 2 years ago
You are a great teacher, man! Mega thanks...
@mohmmedshafeer2820 5 months ago
Thank you, Skowster.
@chiedozieonyearugbulem9363 a year ago
Thank you for this concise video. The texts provided by my lecturer weren't this easy to understand.
@thenoteBooktalkshow 3 years ago
Awesome explanation, sir.
@建鴻-d3v 3 years ago
Very vivid. So the optimal step (action) to take is the direction that increases value the most (the gradient direction).
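That "move uphill in V" intuition can be written down directly. A minimal sketch, assuming the values have already converged; `reward`, `next_state`, and `value` are hypothetical helpers standing in for the grid lookup, not functions from the video:

```python
def greedy_action(state, actions, reward, next_state, value, gamma=0.9):
    # pi(s) = argmax_a [ R(s, a) + gamma * V(s') ]: follow the steepest
    # increase in value, i.e. the "gradient direction" described above.
    return max(actions,
               key=lambda a: reward(state, a) + gamma * value(next_state(state, a)))
```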
@sambo7734 4 years ago
Isn't V = 1 in the princess square (reward = 1 and no next state), and the one to the left = 0.9 (reward 0 + gamma × 1)?
@danielketterer9683 3 years ago
Yup.
@ManjeetKaur-hm4jt 3 years ago
The value of the terminal state is always 0, not 1.
@ravynala 2 years ago
The value is calculated for rewards resulting from an action being taken. We are looking at the value of good decisions, not good initial placement.
@pggg5001 2 years ago
It should be 0.9. R(s,a) and V(s) should be evaluated at the same s.
@ScottTaylorMCPD 9 months ago
The link to the "free Move 37 Reinforcement Learning course" mentioned in the description appears to be dead.
@newan0000 a year ago
So clear, thank you!
@kabbasoldji3816 a year ago
Thank you very much, sir 😍
@gogigaga1677 2 years ago
BEST EXPLANATION
@erdoganyildiz617 4 years ago
I have some difficulty understanding something; I would be glad if anyone could help. The Bellman equation basically states that to calculate the value of a state, we need to check [R(s,a) + γ·V(s')] for all possible actions and select the maximum one. In the Mario example we directly placed value = 1 in the box near the princess, because as humans we know the best possible action in this state is to take a step towards the princess. But the only way for a computer to decide the best possible action is to try all possible actions and compare the results. Here is where the problem starts: there will be infinite loops (say Mario goes back and forth forever), and the computer will never get the chance to compare results, because it won't have all the results to make a comparison. What am I missing here? Thanks in advance.
@roboticcharizard 3 years ago
In this example, we're starting from the base case, that is, the final state where we are about to reach the princess. Since we start from the base case, we always have some value assigned to the cell, and we can work our way back to the initial state. You're right that if we were to start from the initial state and explore from there, we could run into an infinite loop, but in the given example we start from the base case.
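To see that backward-induction idea run end to end, here is a minimal value-iteration sketch in Python. The grid layout, wall position, rewards (+1 princess, -1 fire), and γ = 0.9 are assumptions modeled on the video's example rather than taken from its code; the takeaway is that starting from all-zero values and repeatedly applying the Bellman backup converges to the video's numbers without ever following an infinite path, since no trajectory is enumerated and each sweep just improves the table.

```python
# Minimal value iteration on a grid modeled on the video's example (assumed
# layout: 3 rows x 4 cols, princess +1 top-right, fire -1 below her, one wall).
GAMMA = 0.9
ROWS, COLS = 3, 4
TERMINALS = {(0, 3): 1.0, (1, 3): -1.0}     # square -> reward for entering it
WALL = (1, 1)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def backup(V, r, c):
    """One Bellman backup: V(s) = max_a [R(s, a) + GAMMA * V(s')]."""
    best = float("-inf")
    for dr, dc in MOVES:
        nr, nc = r + dr, c + dc
        if not (0 <= nr < ROWS and 0 <= nc < COLS) or (nr, nc) == WALL:
            nr, nc = r, c                   # blocked moves leave Mario in place
        if (nr, nc) in TERMINALS:
            best = max(best, TERMINALS[(nr, nc)])      # episode ends: no future term
        else:
            best = max(best, 0.0 + GAMMA * V[nr][nc])  # empty squares pay 0
    return best

V = [[0.0] * COLS for _ in range(ROWS)]
for _ in range(50):                         # enough sweeps to converge here
    V = [[0.0 if (r, c) in TERMINALS or (r, c) == WALL else backup(V, r, c)
          for c in range(COLS)] for r in range(ROWS)]

for row in V:
    print(["%.2f" % v for v in row])        # top row: 0.81, 0.90, 1.00, 0.00
```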
@irfanwar6986 4 years ago
I appreciate your work
@AmanSharma-cv3rn 2 years ago
Simple and clear❤️
@jeffreyanderson5333 3 years ago
Thanks, that saved my day.
@maggielin8664 4 years ago
Thank you so much, Sir.
@Freakynoblegas 4 years ago
Awesome video!
@KennTollens 4 years ago
I just started machine learning a couple of days ago, and it looks like the Bellman equation is part of Q-learning. I would love it if someone did the manual math of how Q-learning is calculated as the agent moves through the network (a sketch follows below).
s - the state, i.e. the square the agent is in
s' - the next square the agent will go to
V - value
R(s,a) - reward for the state given an action
γ - discount factor between 0 and 1 applied when going into the new state
@fazaljarral2792 4 years ago
Kenn, I just started today; is there any Discord server for beginners?
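Picking up Kenn's request, here is a hedged sketch of one tabular Q-learning update. The state names, the learning rate α = 0.5, and all numbers are made up purely to show the arithmetic; they are not figures from the video:

```python
# One tabular Q-learning step:
#   Q(s, a) <- Q(s, a) + alpha * (R(s, a) + gamma * max_a' Q(s', a') - Q(s, a))
from collections import defaultdict

ALPHA, GAMMA = 0.5, 0.9       # assumed learning rate and discount
ACTIONS = ["up", "down", "left", "right"]
Q = defaultdict(float)        # Q-table; unseen (state, action) pairs start at 0

def q_update(s, a, reward, s_next):
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (reward + GAMMA * best_next - Q[(s, a)])

# Suppose Q("s3", "right") was already learned to be 1.0 (it reaches the
# princess), and the agent now steps right from s2 to s3, earning no reward:
Q[("s3", "right")] = 1.0
q_update("s2", "right", reward=0.0, s_next="s3")
print(Q[("s2", "right")])     # 0 + 0.5 * (0 + 0.9 * 1.0 - 0) = 0.45
```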
@patite3103 3 years ago
Great video! Could you explain why you move to the left when calculating the values and not, for example, go down where V = 0.9? How would you calculate the value V = 0.9 to the left of the cell with reward -1?
@omkarkulkarni2595 2 years ago
As far as I have understood, he used the value in the 1st row, 3rd column; he did not use the value of the -1.
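The max in the Bellman equation is what settles this: every available action is evaluated, but only the best one sets the value, so the -1 square is considered and rejected. As a worked step (assuming the video's grid, where the square beside the fire has the V = 1 square above it): V = max(-1, 0 + 0.9 × 1, ...) = 0.9. The -1 only ever propagates into a value when stepping into the fire is the best option available.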
@japneetsingh5015 5 years ago
Really enjoyed it.
@craigowsen4501 3 years ago
I just finished the 5th lesson in this series. It is awesome! I am also reading Artificial Intelligence Engines: A Tutorial Introduction to the Mathematics of Deep Learning. The math just comes to life after seeing Colin's Python programs. Incredible series! Thanks
@salehabod6740 3 years ago
Where can we find the code?
@faisalamir1656 2 years ago
Thanks a lot, man!
@ORagnar 2 years ago
3:44 It's fascinating to think about the computing hardware of 1954. Digital computers did exist by then (ENIAC ran in 1945), but they were rare, room-sized machines, so I wonder whether analog computers were also used.
@patf9770 5 years ago
This series is amazing.
@volt897 3 years ago
I have a question: how can I decide which actions to take before going backwards to estimate the values?
@bea59kaiwalyakhairnar37 2 years ago
That means each action of the agent gives it a reward, and if it finds the path to the goal it gets the maximum reward. But does an agent try different ways to reach the goal? Also, I subscribed.
@robotronix-co-il 19 days ago
In my calculation s3 should be 0.9, since Q(s3, a) = 0 + 0.9 × max(0, 1, 0); from there s2 = 0.81, s1 = 0.729, ...
@superdahoho 2 years ago
11:42 You lost me there. You said the state of the next square is 0, that's why it's 0 +, but then you go ahead and say the value of the square is 1. Are the state and the value of the square different? I thought the state was the square?
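One plausible untangling of the terms at that timestamp: the state s is the square itself, while V(s) is the number written inside it, so they are different kinds of thing. In the expression 0 + 0.9 × 1, the 0 is the reward R(s,a) for making the move (empty squares pay nothing) and the 1 is the value V(s') of the next square; neither of them is "the state".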
@pramodpal3320 4 years ago
Is it that, because we are leaving that state with the particular action that leads us to the final reward of +1, we take the reward value R(s,a) as 1?
@mohammed333suliman 5 years ago
Helpful, thank you.
@pramodpal3320 4 years ago
I think for the reward you are talking about the current state. So how could it be +1 when you are working backward?
@ejbock5b179 6 months ago
What happens if, instead of Mario being drunk, Mario moves as instructed but the identities of the two rooms (trap and success) are unknown, and only the probability of each being the trap or the success room is known? I.e., Mario knows the apparent success room may be the trap with 50% confidence.
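One standard way to handle that case (a sketch; the video does not cover it): fold the uncertainty into an expected reward. If a room pays +1 with probability 0.5 and -1 with probability 0.5, its expected reward is 0.5 × (+1) + 0.5 × (-1) = 0, and the Bellman equation runs unchanged with R replaced by that expectation. At exactly 50% confidence the two rooms are indistinguishable and worth 0; their values only separate as the confidence moves away from 50%.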
@shakyasarkar7143 4 years ago
How does the 3rd-row, 4th-column value come out as 0.73? Can anyone please explain?
@Shriukan1 3 years ago
Because the next possible moves lead to either -1 or 0.81. So the best move is to go towards 0.81, multiplied by the gamma factor of 0.9: 0.81 × 0.9 ≈ 0.73 :)
@chenguangzhou3991 3 years ago
super
@phattaraphatchaiamornvate8827 2 years ago
You make me a god, thank you.
@alexusnag 5 years ago
Great tutorial!
@pramodpal3320 4 years ago
I mean for the second-to-last cell.
@yolomein415 5 years ago
Where do you get that reward function? The R(s, a)?
@pengli4769 4 years ago
I would say, take a chess game: in a given state, when your agent places a piece somewhere and your opponent loses a piece, your agent gets a positive reward. The state is like the current chessboard, the action is where you put the piece, and the reward can be defined by you.
@niklasdamm6900 a year ago
21.11 20:55
@asdfasdfuhf 4 years ago
Great content, terrible microphone
@mpgrewal00 3 years ago
Las Vegas and AI? What an oxymoron.
@femkemilene4547 5 years ago
Using gaming analogies and a sexist trope distracts. The info about how to learn was also unnecessary.
@gaulvhan2814 5 years ago
You are great at explaining things, but why use games and rescuing a princess as examples? SMH, be more open-minded about your audience.
@katakeresztesi 3 years ago
Is the Move 37 course currently available? I am having a hard time finding it.
@tonymornoty6346 4 years ago
The explanation helped me, but the calculations are wrong: the princess should be V = 1 and the square next to the princess should be V = 0.9.
@brodderick 5 years ago
brilliant
@mishrag104 3 years ago
The topic should be "Bellman Reinforcement Learning for Dummies". Great job explaining it step by step.
@somecsmajor a year ago
Thanks Skowster, you're a real one!
@malanb5 4 years ago
The example looks very similar to the one in Georgia Tech's Reinforcement Learning course.
@tianyuzhang7404 2 years ago
This is the first time I have a feeling for what Bellman's equation is for. Awesome video.
@ahmet9446 4 years ago
I have a question about the step before going into the fire. R(s, a) of the fire is -1. For the step before, shouldn't the V value be -1 + 0.9 = -0.1?
@MaiMaxTeamIELTSTA a month ago
Exactly, I was wondering too why V(s) of the bottom-right corner cell is 0.73. I also think the way you calculated it is more correct, so: V(s) = -1 + 0.9 × (1 + 0.9 × 0) = -1 + 0.9 = -0.1. Because at that state (at that cell) the only way you can move further is to get into the lava pit; even though it definitely is not the optimal option, you've no other choice, right? Can the video's author help clarify here?
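For what it's worth, the max in the equation is why the video's 0.73 and the -0.1 above disagree: the reward and the discounted next value must come from the same action, and the value assumes the agent takes the best action available. Stepping into the fire gives -1 + 0.9 × 0 = -1; moving to a safe neighbour worth 0.81 (which that corner cell does have in the video's grid, so Mario is not forced into the lava) gives 0 + 0.9 × 0.81 ≈ 0.73, and the max keeps the 0.73. The expression -1 + 0.9 mixes the fire's reward with a different action's future value, which is why it never appears in the grid.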