How did we compute the probability values used in value iteration?
@mrunalwaghmare 14 days ago
Professional yapper
@sayakbanerjee7214 20 days ago
Indians make understanding so easy for everyone. This is a much easier explanation to understand than what they taught me here at CMU 😅
@sanchitagarwal8764 a month ago
Excellent explanation sir
@amn1981 a month ago
One of the best RL videos!! Thank you for sharing!
@imreezan a month ago
Why is V2 0.72 and not 0.8? The reward for moving right from (3,3) is supposed to be +1, right? And V(s') is supposed to be 0, since there is no further value once we are in that state, because it is terminal. So shouldn't V2 be 0.8?
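A quick worked check of where 0.72 can come from (my reading of the numbers in this thread, assuming γ = 0.9, the 0.8/0.1/0.1 slip model, a living reward of 0, and the terminal +1 carried as V(terminal) = 1 rather than as an immediate reward):

```latex
V_2(3,3) = \max_a \sum_{s'} P(s' \mid s,a)\,\bigl[R(s,a,s') + \gamma V_1(s')\bigr]
         = 0.8\,(0 + 0.9 \cdot 1) + 0.1\,(0 + 0.9 \cdot 0) + 0.1\,(0 + 0.9 \cdot 0) = 0.72
```

You would get 0.8 only if the +1 entered the update as the undiscounted immediate reward R(s,a,s') instead of as the discounted terminal value γ·V(terminal).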
@Hareo8891 a month ago
Great video! But I think P(s' | s, a) is 0.21?
@Joseph-kd9tx a month ago
For the value update equation, wouldn't it be simpler to take the R(s,a,s') out of both the sum and the argmax? Given that R(s,a,s') would equal just R(s) in this case? So it would be R(s) + argmax(sum(P*gamma*V(S'))). The sum over all possible next states for P always equals 1. Thus this R(s)/R(s,a,s') term would be the same in or out of this part
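For reference, this is the algebra being described, assuming the reward really depends only on the current state (and writing max rather than argmax, since we want the value, not the action):

```latex
\begin{aligned}
V_{k+1}(s) &= \max_a \sum_{s'} P(s' \mid s,a)\,\bigl[R(s) + \gamma V_k(s')\bigr] \\
           &= \max_a \Bigl[\, R(s) \sum_{s'} P(s' \mid s,a) \;+\; \gamma \sum_{s'} P(s' \mid s,a)\, V_k(s') \Bigr] \\
           &= R(s) + \gamma \max_a \sum_{s'} P(s' \mid s,a)\, V_k(s')
           \qquad \text{since } \sum_{s'} P(s' \mid s,a) = 1 .
\end{aligned}
```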
@Joseph-kd9tx a month ago
3:53 agent is in so much constant pain that it just decides to end itself, how interesting
@whilewecan 2 months ago
The place (-0.51 North, -0.43 East, 0.15 South, and 0.42 West) ... I do not understand why it is -0.51 to go North. The prospect is bright to go North and then East.
@tahmidkhan8132 2 months ago
Like this if you're Todd Neller.
@jackdoughty6457 3 months ago
For V2, why would (2,3) = 0 if there is still a small chance we go right towards -1? Wouldn't (2,3) = -0.09 in this case?
@tirth8309 3 months ago
Suyog sir, learn something from this.
@quentinquarantino8261 3 months ago
Doesn't R(s,a,s') actually mean the reward for ending up in s' by choosing action a while being in s? So why is this not the same as being in state s'?
@relaxo883 4 months ago
Nobody explains why we get a square for L1 and a circle for L2.
@Saumillakra 2 months ago
We get a square for L1 because with two weights the penalty has the form |w1| + |w2|, which gives a square if you plot it. L2 gives the equation of a circle. Hope that clears your doubt!
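A compact way to see it, with two weights and a constant budget c:

```latex
\text{L1: } |w_1| + |w_2| = c \quad \text{(a square, i.e. a diamond with corners on the axes)} \\
\text{L2: } w_1^2 + w_2^2 = c^2 \quad \text{(a circle of radius } c\text{)}
```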
@keyoorabhyankar5863 5 months ago
Is it just me or are the subscripts in the wrong directions?
@akinolalanre7144 6 months ago
Great work, but the red ink on the screen doesn't align with the flow of the explanation, and it creates a kind of imbalance.
@SO5RQ 6 months ago
Good day! I was wondering if it's possible to update the github course content with the materials related to this section (the "week 9" in the repository is called "Text and Embeddings" and contains very little about RNNs)? The course is great! I've been following it for the past few weeks and learned a lot from it. I would love to also learn more about RNNs.
@Daaninator 7 months ago
thank u
@jaiberjohn 7 months ago
Excellent points to think about! This lecture questions today's overhyped AI and how it falls short compared to biological intelligence.
@myfolder4561 7 months ago
very well explained. much better than most others on the same topic
@myfolder4561 7 months ago
Great explanation!
@royvivat113 8 months ago
Great video thank you.
@jlopezll 8 months ago
9:06 Why, when iterating V2, are the values of all the other squares 0? Shouldn't the squares near the terminal states have non-zero values?
@alexwasdreaming9440 3 months ago
I believe it's because not moving is a valid move, otherwise I feel you are right
@Joseph-kd9tx a month ago
I know why. When evaluating any state adjacent to the -1 terminal state, the argmax will always prefer the action that yields 0 rather than -1. Thus it stays at 0. The argmax is choosing the action that goes directly away from the -1 state so that there's no chance in hell that it could land there, even if it slips. However, there is an interesting case where a state adjacent to a -1 would update: if the state is sandwiched between two -1 terminal states. In this case, no matter what action you take, there is a chance of slipping into one of the negative states, and it would therefore update negatively.
@NehaKariya-d1f a month ago
@@Joseph-kd9tx Thanks for clarifying!
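To make the explanation above concrete, here is a minimal value-iteration sketch (my own small grid and indexing, not necessarily the one in the video; I'm assuming γ = 0.9, living reward 0, the 0.8/0.1/0.1 slip model, and terminal values simply held at +1/−1). After one sweep, the state next to the +1 terminal gets ≈ 0.72, while the state next to the −1 terminal stays at 0 because its best action has no chance of slipping into −1:

```python
# Minimal value-iteration sketch (illustrative 4x3 grid, not the video's exact layout).
GAMMA = 0.9
NOISE_SIDE = 0.1                        # probability of slipping to each perpendicular side
WALL = {(1, 1)}                         # (x, y), 0-indexed
TERMINALS = {(3, 2): 1.0, (3, 1): -1.0} # +1 and -1 terminal states
WIDTH, HEIGHT = 4, 3
ACTIONS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
PERP = {"N": ("E", "W"), "S": ("E", "W"), "E": ("N", "S"), "W": ("N", "S")}

def step(state, direction):
    """Deterministic move; bumping into a wall or the edge leaves you in place."""
    x, y = state
    dx, dy = ACTIONS[direction]
    nxt = (x + dx, y + dy)
    if nxt in WALL or not (0 <= nxt[0] < WIDTH and 0 <= nxt[1] < HEIGHT):
        return state
    return nxt

def transitions(state, action):
    """(probability, next_state) pairs under the 0.8/0.1/0.1 slip model."""
    side_a, side_b = PERP[action]
    return [(0.8, step(state, action)),
            (NOISE_SIDE, step(state, side_a)),
            (NOISE_SIDE, step(state, side_b))]

def value_iteration(sweeps):
    states = [(x, y) for x in range(WIDTH) for y in range(HEIGHT) if (x, y) not in WALL]
    V = {s: TERMINALS.get(s, 0.0) for s in states}   # terminals hold their +1 / -1
    for _ in range(sweeps):
        new_V = dict(V)
        for s in states:
            if s in TERMINALS:
                continue
            # Bellman optimality backup with living reward 0:
            new_V[s] = max(sum(p * GAMMA * V[s2] for p, s2 in transitions(s, a))
                           for a in ACTIONS)
        V = new_V
    return V

V2 = value_iteration(sweeps=1)
print(V2[(2, 2)])  # state left of +1: 0.8 * 0.9 * 1 ≈ 0.72
print(V2[(2, 1)])  # state left of -1: best action has no path into -1, so it stays 0.0
```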
@VIJAYALAKSHMIJ-h2b 8 months ago
Can't understand how it is 0.52.
@QIANWAN-wi3ty 8 months ago
Starts at 4:45.
@manishkumarkn 8 months ago
can you please suggest some good books for reinforcement learning?
@gunjanshinde396 8 months ago
Well explained, thank you !!
@harbaapkabaap2040 8 months ago
Best video on the topic I have seen so far, to the point and well explained! Kudos to you brother!
@abdullah.montasheri 9 months ago
The state-value Bellman equation includes the policy's action probability at the beginning of the equation, which you did not consider in your equation. Any reason why?
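For reference, this is most likely the distinction in play (my reading, not a transcript of the video): the Bellman expectation equation for a fixed policy π weights each action by π(a|s), while value iteration uses the Bellman optimality equation, where a max over actions replaces that policy average:

```latex
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s,a)\,\bigl[R(s,a,s') + \gamma V^{\pi}(s')\bigr]
\qquad
V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s,a)\,\bigl[R(s,a,s') + \gamma V^{*}(s')\bigr]
```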
@negrito360 9 months ago
Amazing
@user-canon031 9 months ago
Good!
@SphereofTime 9 months ago
0:10
@meobliganaponerunnom 10 months ago
I don't understand why this works.
@bean217 10 months ago
This is probably one of the best videos I've found describing transitioning from tabular Q-learning to using deep Q networks. Other videos seem to sugar coat the topic with terminology that is far too simple to the point where it almost obscures what is actually being discussed.
@anishreddyanam8617 10 months ago
Thank you so much! My professor explained this part a bit too fast so I got confused, but this makes a lot of sense!
@Advikchaudhry28 11 months ago
VIMAL Sir, your explanation on YouTube is better than what you teach in the class lectures.
@sgn_sabir 11 months ago
GANs in Bengali Knowledge 😊
@don-ju8ck 11 months ago
🙏🙏🏿
@cuongnguyenuc1776 11 months ago
Great video!! I understand your lecture, and I know that the expectation term can be replaced by sampling trajectories. However, in many documents the gradient term is reduced to the term inside the expectation bracket, which means they don't take the average over many trajectories; they update the parameters using one trajectory only. Do you know why this is the case? (I saw the algorithm in the book by Sutton & Barto.)
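For what it's worth, the usual answer is that this is a stochastic-gradient estimate: a single sampled trajectory gives a noisy but unbiased estimate of the expectation, and the averaging happens implicitly across many updates, just like SGD with a batch size of 1. A minimal single-trajectory sketch in that spirit (my own toy illustration, not the algorithm from the video or the exact REINFORCE pseudocode in Sutton & Barto):

```python
# Single-trajectory REINFORCE-style update on a toy 2-action problem.
# Each update uses ONE sampled rollout; no averaging over many trajectories.
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)                   # softmax preferences for the two actions
alpha = 0.1                           # learning rate
true_reward = np.array([0.2, 1.0])    # hypothetical expected reward of each action

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for episode in range(2000):
    # --- sample ONE trajectory (here a 1-step episode, to keep it tiny) ---
    probs = softmax(theta)
    a = rng.choice(2, p=probs)
    G = true_reward[a] + rng.normal(0, 0.1)   # noisy return from that single rollout

    # --- update from that single sample: grad log pi(a) = one_hot(a) - probs ---
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    theta += alpha * G * grad_log_pi          # no average over many trajectories

print(softmax(theta))   # most probability should end up on the better action
```

Averaging several sampled trajectories per update is also valid; it just trades more computation per step for lower variance.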
@tarkemregunduz6872 11 months ago
perfect
@jimmylaihk 1 year ago
Excellent explanation.
@kyrohrs 1 year ago
Great video, but how can we use policy iteration for an MDP when the state space grows considerably with each action? I know there are various approximation methods for policy iteration, but I just haven't been able to find anything. Do you have any resources on this?
@SaloniHitendraPatadia 1 year ago
According to the Bellman equation, I got the value 0.8 * (0.72 + 0.9 * 1) + 0.1 * (0.72 + 0.9 * 0) + 0.1 * (0.72 + 0.9 * 0) = 1.62. Please point out where I went wrong.
@mghaynes24 1 year ago
The living reward is 0, not 0.72. 0.72 is the V at time 2 for grid square (3,3). Use the 0.72 value to update grid squares (2,3) and (3,2) at time step 3.
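Putting the question and the reply together, here is a worked step-3 update under the same assumptions as above (living reward 0, γ = 0.9, the 0.8/0.1/0.1 slip model, and slip targets whose V₂ is still 0; the exact neighbours depend on the video's grid). This also appears to be where the 0.52 mentioned in another comment comes from:

```latex
V_3(2,3) = 0.8\,\bigl[\,0 + 0.9 \cdot \underbrace{0.72}_{V_2(3,3)}\,\bigr]
         + 0.1\,\bigl[\,0 + 0.9 \cdot 0\,\bigr]
         + 0.1\,\bigl[\,0 + 0.9 \cdot 0\,\bigr]
         = 0.5184 \approx 0.52
```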
@newtondurden 1 year ago
Fantastic video, man. I was so confused for some reason when my lecturer was talking about it; it's not supposed to be hard, I guess, it was just how exactly it worked. This video helped fill in the details.
@TheClockmister 1 year ago
My bald teacher will talk about this for 2 hours and I won’t understand anything. This helps a lot
@Moch117 10 months ago
lmfaooo
@fa7234 a month ago
Does your bald teacher's name start with Charles Isbell?
@abhisheksa9552 1 year ago
Does "phi" here refer to the "vector of features" that represents the q-state (s,a)?
@subrahmanyaswamyperuru2675 7 months ago
No. It represents the weight vector of the neural network.
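To keep the two objects apart, here is a tiny sketch (my own illustration with hypothetical features, not the video's code): the feature vector describes the q-state (s, a), while phi, per the reply above, is the weight/parameter vector that learning updates; with a neural network, phi would collect all of the network's weights. A linear approximator keeps the distinction visible:

```python
# Linear Q-function approximation: Q(s, a; phi) = phi . x(s, a).
import numpy as np

def features(state, action):
    # Hypothetical hand-crafted features x(s, a) of the q-state.
    return np.array([1.0, state[0], state[1], float(action == "E")])

phi = np.zeros(4)   # the weight vector that learning updates

def q_value(state, action, phi):
    return phi @ features(state, action)

def update(phi, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One semi-gradient Q-learning style step; the gradient of a linear Q is x(s, a)."""
    target = r + gamma * max(q_value(s_next, a2, phi) for a2 in actions)
    td_error = target - q_value(s, a, phi)
    return phi + alpha * td_error * features(s, a)

phi = update(phi, s=(1, 2), a="E", r=0.0, s_next=(2, 2), actions=["N", "S", "E", "W"])
print(phi)
```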
@tower1990 1 year ago
There shouldn’t be any value for the terminal state… my god…
@rouzbehh1705 1 year ago
Thank you so much for your helpful teaching!
@knowledgelover2736 1 year ago
Great explanation. For time series data, should the value selected for the positional encoding match the number of features or the number of features * time steps? Thanks!