Watch on Udacity: www.udacity.co... Check out the full Advanced Operating Systems course for free at: www.udacity.co... Georgia Tech online Master's program: www.udacity.co...
Пікірлер: 6
@michelaka68367 жыл бұрын
Not only will explained, but immensely important
@jhk9219 жыл бұрын
So, a state that was falsely assigned with a negative utility value will slowly move toward its true utility value because it will be edited by its true reward R(s) and its utility value in regard to its neighbors, and the same would hold for the others as well?
@jhk9219 жыл бұрын
So, I can start with flat out 0s for the utility values, and this thing would still work? I think it would, and I'm pretty sure, but I've never actually put this algorithm to the test so I'm not really certain.
@sunilrathee24796 жыл бұрын
How to make max differentiable?
@oldcowbb3 жыл бұрын
LogSumExp
@fgfanta9 жыл бұрын
You have already used "t" for time so far, it is misleading to use it for the number of iteration in the algorithm; you could use, say, "i" instead