Finding Policies Two - Georgia Tech - Machine Learning

  12,057 views

Udacity

Days ago

Watch on Udacity: www.udacity.co...
Check out the full Machine Learning course for free at: www.udacity.co...
Georgia Tech online Master's program: www.udacity.co...

Comments: 10
@chaitanyasharma6270 2 years ago
I liked the analogies you were using previously. I think of this as telling someone to go higher or lower when you ask them to guess a number; they eventually converge.
@bharadwajadi18 7 years ago
Great video. Thanks for making it :)
@ZohreYahyaee 9 months ago
I have a question: what is U(s)? In the previous videos it was defined as U(s0, s1, s2, ...) = the sum of R(si) (with or without the discount factor gamma). Since R(si) is defined by the model, U(si) is also defined; we don't need to select it arbitrarily.
@MinhVu-fo6hd 7 years ago
I think it does help, because the incorrect initial guess gets discounted when it is added to the true reward. This moves the next guess in the correct direction, so it converges to the true utility faster.
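This reply is describing why value iteration tolerates an arbitrary starting point. A minimal sketch of that behaviour on a hypothetical three-state chain (placeholder rewards and gamma, not the lecture's grid world): any initial guess converges to the same utilities, because each sweep shrinks the guess's influence by another factor of gamma.

```python
# Value iteration on a tiny hypothetical MDP: three states in a chain,
# deterministic "left"/"right" moves, reward 1 for being in state 2.
GAMMA = 0.9
STATES = [0, 1, 2]
R = {0: 0.0, 1: 0.0, 2: 1.0}
ACTIONS = {"left": -1, "right": +1}

def step(s, a):
    """Deterministic next state: move along the chain, clamped at the ends."""
    return min(max(s + ACTIONS[a], 0), 2)

def value_iteration(U0, sweeps=200):
    """Bellman updates starting from an arbitrary initial guess U0."""
    U = dict(U0)
    for _ in range(sweeps):
        U = {s: R[s] + GAMMA * max(U[step(s, a)] for a in ACTIONS)
             for s in STATES}
    return U

# Two very different initial guesses converge to the same fixed point:
# the bad guess's contribution shrinks by a factor of GAMMA every sweep.
for U0 in ({0: 0.0, 1: 0.0, 2: 0.0}, {0: -50.0, 1: 99.0, 2: 7.0}):
    print({s: round(u, 4) for s, u in value_iteration(U0).items()})
# Both print {0: 8.1, 1: 9.0, 2: 10.0}
```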
@oldcowbb 2 years ago
They mentioned that, actually.
@bikcrum 2 years ago
Is solving a non-linear equation iteratively somehow related to the Newton-Raphson method? (Take an initial guess and converge recursively.)
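For what it's worth, value iteration is usually analyzed as plain fixed-point (successive-approximation) iteration on a contraction mapping rather than as Newton-Raphson, although both schemes start from a guess and refine it; policy iteration is the variant more often compared to Newton's method. A toy illustration of the two schemes on the scalar equation x = cos(x) (illustrative only, not from the lecture):

```python
import math

def fixed_point(g, x, iters=60):
    """Successive approximation x <- g(x); converges when g is a contraction.
    Value iteration applies this scheme to the Bellman operator."""
    for _ in range(iters):
        x = g(x)
    return x

def newton(f, df, x, iters=10):
    """Newton-Raphson on f(x) = 0: uses the derivative and converges
    much faster near the root."""
    for _ in range(iters):
        x = x - f(x) / df(x)
    return x

# Solve x = cos(x) both ways from the same starting guess.
print(fixed_point(math.cos, 1.0))                        # ~0.7390851
print(newton(lambda x: x - math.cos(x),
             lambda x: 1.0 + math.sin(x), 1.0))          # ~0.7390851
```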
@IndrajitRajtilak 6 years ago
Why isn't U(s) equal to R(s) + max(U(s')), where max(U(s')) is the maximum utility of its next neighbours? Each of the immediate neighbours already captures its own immediate neighbours. Why have a summation there? Why not define it as the reward in the current state plus the max utility of the next state? What am I missing?
@Arik1989 6 years ago
If still relevant: as I understand it, in the general case the model T is stochastic, so when deciding to take action a, you have to consider that there are several states s' you can end up in. In the grid example, if you pick the action "up", you have a chance of going up, left, or right, so you have to calculate the EXPECTED VALUE: the probability of reaching each possible state, T(s, a, s') (up: 0.8, left: 0.1, right: 0.1), multiplied by the respective utility U(s').
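A tiny sketch of exactly that expectation, using the 0.8/0.1/0.1 transition probabilities from this reply; the U(s') values below are hypothetical placeholders:

```python
# Expected utility of taking one action from state s in a stochastic model:
# sum over the possible next states s' of T(s, a, s') * U(s').
T = {"up": 0.8, "left": 0.1, "right": 0.1}       # T(s, a="up", s'), grid example
U_next = {"up": 0.9, "left": 0.2, "right": 0.4}  # hypothetical U(s') values

expected_utility = sum(T[s2] * U_next[s2] for s2 in T)
print(expected_utility)  # 0.8*0.9 + 0.1*0.2 + 0.1*0.4 = 0.78
```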
@mcab2222 3 years ago
Also, you're missing the discount factor. Just summing the utilities would neglect it.
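Putting the two replies together: the expectation over s' and the discount factor both appear in the Bellman equation the thread is describing (standard form, reconstructed here rather than quoted from the video):

```latex
U(s) = R(s) + \gamma \max_{a} \sum_{s'} T(s, a, s')\, U(s')
```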
@googoonight858 8 years ago
It feels like Charles does not quite know what he's talking about.