Watch on Udacity: www.udacity.co... Check out the full Advanced Operating Systems course for free at: www.udacity.co... Georgia Tech online Master's program: www.udacity.co...
Comments: 10
@chaitanyasharma6270 · 2 years ago
I liked the analogies you were using previously. I think of this as telling someone to go higher or lower when you ask them to guess a number: they eventually converge.
@bharadwajadi18 · 7 years ago
Great video. Thanks for making this video :)
@ZohreYahyaee · 9 months ago
I have a question: what is U(s)? In the previous videos, U(s0, s1, s2, ...) was defined as the sum of R(si) (with or without the discount factor gamma). Since R(si) is defined in the model, U(si) is also defined, so we shouldn't need to choose it arbitrarily.
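A sketch of how the two definitions relate, assuming the course's notation (R for reward, T for the transition model, gamma for the discount factor). The sum of rewards defines the utility of a whole sequence of states; the utility of a single state is the expected discounted sum when starting there and acting optimally, which is the unknown that the Bellman equation describes. Because that solution is what we are trying to find, value iteration starts from an arbitrary estimate (written here as a hatted U, an illustrative symbol) and repeatedly refines it:

```latex
U(s_0, s_1, s_2, \dots) = \sum_{t=0}^{\infty} \gamma^{t} R(s_t)

U(s) = \max_{\pi} \; \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} R(s_t) \,\middle|\, \pi,\; s_0 = s \right]

\hat{U}_0(s) \text{ arbitrary}, \qquad
\hat{U}_{t+1}(s) = R(s) + \gamma \max_{a} \sum_{s'} T(s, a, s')\, \hat{U}_t(s')
```

So R is given by the model, but U(s) is only defined implicitly; the arbitrary starting values are a guess for it, not part of its definition.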
@MinhVu-fo6hd · 7 years ago
I think it does help, because the incorrect initial guess gets discounted when it is added to the true reward. This pushes each new guess in the right direction, so it converges faster to the true utility.
@oldcowbb · 2 years ago
They actually mentioned that.
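The convergence argument in the comment above can be checked with a small sketch. It runs value iteration on a hypothetical three-state MDP (the states, actions, rewards, GAMMA = 0.9, and the tolerance are all made-up assumptions, not the course's grid world) from two very different initial guesses. Each sweep multiplies the error in the old estimate by gamma, so both runs end at the same utilities.

```python
# Minimal value-iteration sketch on a hypothetical 3-state MDP (not the course's grid world).
GAMMA = 0.9  # assumed discount factor

# T[s][a] is a list of (probability, next_state) pairs; all numbers are hypothetical.
T = {
    0: {"stay": [(1.0, 0)], "go": [(0.8, 1), (0.2, 0)]},
    1: {"stay": [(1.0, 1)], "go": [(0.8, 2), (0.2, 1)]},
    2: {"stay": [(1.0, 2)]},  # absorbing state
}
R = {0: -0.04, 1: -0.04, 2: 1.0}  # hypothetical rewards


def bellman_update(U):
    """One synchronous backup: U'(s) = R(s) + GAMMA * max_a sum_s' T(s,a,s') * U(s')."""
    return {
        s: R[s] + GAMMA * max(
            sum(p * U[s_next] for p, s_next in outcomes) for outcomes in T[s].values()
        )
        for s in T
    }


def value_iteration(U0, tol=1e-9, max_iters=10_000):
    """Repeat Bellman backups until successive estimates differ by less than tol."""
    U = dict(U0)
    for _ in range(max_iters):
        U_next = bellman_update(U)
        if max(abs(U_next[s] - U[s]) for s in U) < tol:
            return U_next
        U = U_next
    return U


U_zeros = value_iteration({s: 0.0 for s in T})
U_wild = value_iteration({s: 1000.0 for s in T})  # deliberately bad initial guess
print(U_zeros)
print(U_wild)
```

The bad guess of 1000 is not "corrected" by any special logic; it simply shrinks by a factor of gamma on every backup, which is why the initial values can be chosen arbitrarily.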
@bikcrum · 2 years ago
Is solving the non-linear equations iteratively somehow related to the Newton-Raphson method? (Take an initial guess and converge recursively.)
@IndrajitRajtilak · 6 years ago
Why isn't U(S) equal to R(S) + max(U(S')), where max(U(S')) is the maximum utility among its next neighbours? Each immediate neighbour already captures its own immediate neighbours, so why have a summation there? Why not define it as the reward in the current state plus the maximum utility of the next state? What am I missing?
@Arik1989 · 6 years ago
If this is still relevant: as I understand it, in the general case the model (T) is stochastic, so when you decide to take action (a), you have to consider that there are several states (s') you can end up in. In the grid example, if you pick action (a) to go up, you might actually go up, left, or right, so you have to calculate the EXPECTED VALUE: the probability of reaching each possible state, T(s, a, s') (up 0.8, left 0.1, right 0.1), multiplied by the respective utility U(s'), summed over all s'.
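The expected-value calculation described above can be sketched in a few lines. The 0.8/0.1/0.1 probabilities come from the comment; the neighbour utilities and the name expected_utility are hypothetical, chosen only to show why taking the max over next states overestimates the value of an action.

```python
# Expected utility of one action in a stochastic grid world, as described in the comment.
# Probabilities follow the comment; the utility values below are hypothetical.


def expected_utility(outcomes, U):
    """Sum over s' of T(s, a, s') * U(s')."""
    return sum(prob * U[s_next] for s_next, prob in outcomes.items())


# Hypothetical utilities of the cells the agent could land in.
U = {"up": 0.8, "left": 0.5, "right": 0.2}

# T(s, "up", s'): moves up with probability 0.8, slips left or right with 0.1 each.
T_up = {"up": 0.8, "left": 0.1, "right": 0.1}

print("max over next states:", max(U.values()))
print("expected utility of 'up':", expected_utility(T_up, U))
```

Using max(U(s')) alone would value "up" at 0.8, as if the move always succeeded; the expected value (about 0.71 here) accounts for the chance of slipping into worse cells.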
@mcab2222 · 3 years ago
You're also missing the discount (decay) factor: just summing them would neglect it.
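Why the discount factor matters can be shown with a short bound, assuming every reward is at most some R_max (an illustrative symbol) and 0 ≤ gamma < 1. With discounting, even an infinite sequence has a finite utility; without it (gamma = 1), an infinite run of positive rewards sums to infinity, and different infinite sequences can no longer be compared.

```latex
\sum_{t=0}^{\infty} \gamma^{t} R(s_t) \;\le\; \sum_{t=0}^{\infty} \gamma^{t} R_{\max} \;=\; \frac{R_{\max}}{1-\gamma},
\qquad 0 \le \gamma < 1
```

In the recursive form, this is why gamma multiplies the expected utility of the next state rather than the next-state utility being added as-is.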
@googoonight858 · 8 years ago
It feels like Charles doesn't quite know what he's talking about.