Lecture 17 - MDPs & Value/Policy Iteration | Stanford CS229: Machine Learning Andrew Ng (Autumn2018)

Views: 87,611

Stanford Online

A day ago

Comments: 16

@supersnowva6717 8 months ago

Such a great lecture on RL! Super clear on these algorithms, thanks so much Professor Ng!

@myao8930 2 years ago

This is the best instruction among all videos on reinforcement learning. Thank you!

@ali57555 A year ago

Thank you very much for explaining this in such simple terms! Been looking for some time for something good to understand MDPs

@KipIngram 2 years ago

1:13:19 - Probably the right model here is the one we used to spread across the planet. Most folks are trying to get by best they can, and are likely to pursue "exploitative" strategies - what they know will bring them what they want. But sometimes we explicitly launch exploration missions, and the "success criterion" of such a mission is very different from that of a "profit oriented initiative." The "payoff" for an exploration mission is *knowledge*. I think keeping the two things cleanly separate is probably the way to go.

@Ayanshandseals 2 years ago

indeed (1-epsilon) Greedy is the correct term and should have been used!
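
For readers following the naming point: under the usual convention, an epsilon-greedy rule explores with probability epsilon and acts greedily with probability 1 - epsilon, which is why "(1-epsilon) greedy" is arguably the more literal name. A minimal sketch of that rule, assuming a plain list of estimated action values (the function name `epsilon_greedy` and the `q_values` argument are hypothetical, not taken from the lecture):

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of estimated action values.

    Hypothetical helper for illustration only:
    with probability `epsilon` a uniformly random action is chosen (explore),
    otherwise the action with the highest estimate is chosen (exploit).
    """
    if rng.random() < epsilon:
        # Explore: any action, including the greedy one, may be drawn.
        return rng.randrange(len(q_values))
    # Exploit: index of the largest estimated value (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    estimates = [0.1, 0.9, 0.3]  # made-up values for three actions
    print(epsilon_greedy(estimates, epsilon=0.1))
```

With `epsilon=0.1` the greedy action is taken at least 90% of the time, since the random draw can also land on it.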

@genotabby 8 months ago

48:42 this should be for stochastic methods, right? If it is deterministic, then the value function V(S) should be calculated based on the 100% chance of moving in the direction given by the optimal policy. For stochastic it would be, in this case, 0.8 in the direction of the optimal policy, 0.1 chance for the left side of the optimal policy, and 0.1 chance for the right side. Since the left side is already at the border, the agent would return to its original state, hence 0.1*0.71
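
The arithmetic in the comment above can be sketched as a small expected-value calculation. This is only an illustration under stated assumptions: the 0.8 / 0.1 / 0.1 probabilities and the 0.71 value for bouncing back into the current state come from the comment, while the function name and the neighbor values 0.75 and 0.69 are placeholders chosen here, not figures confirmed by the source.

```python
def expected_next_value(v_intended, v_slip_left, v_slip_right,
                        p_intended=0.8, p_slip=0.1):
    """Expected value of the next state for one action under noisy moves.

    Hypothetical helper for illustration only:
    the agent moves in the intended direction with probability `p_intended`
    and slips to either side with probability `p_slip` each.
    """
    return (p_intended * v_intended
            + p_slip * v_slip_left
            + p_slip * v_slip_right)


if __name__ == "__main__":
    v_here = 0.71      # from the comment: hitting the border returns to this state
    v_intended = 0.75  # placeholder value for the intended neighbor
    v_other = 0.69     # placeholder value for the other side neighbor

    # Stochastic case: one slip direction hits the border, contributing 0.1 * 0.71.
    print(expected_next_value(v_intended, v_here, v_other))

    # Deterministic case: all probability mass on the intended direction.
    print(expected_next_value(v_intended, v_here, v_other,
                              p_intended=1.0, p_slip=0.0))
```

In the deterministic call the result is just the intended neighbor's value, which matches the commenter's point that the side terms only appear when transitions are stochastic.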

@henkjekel4081 A year ago

Thank you andrew, u the best

@gokdeniztingur7515 8 months ago

great video man!

@griffinbholt 2 years ago

Is there one student in the class with just a crazy deep voice? Or are they masking students' voices?

@griffinbholt 2 years ago

Nvm. I can confirm they are masking students' voices. One time they accidentally masked Dr. Ng's voice.

@PhucHoang-ng4vh A year ago

@griffinbholt u can see in another video, they blurred it whenever a student appeared on screen

@KipIngram 2 years ago

1:12:00 - I feel exactly the same mixed feelings that Dr. Ng seems to feel here. On the one hand, this technology is amazing, and there are so many wonderful things we can do with it, such as helping people get better medical care more quickly, and so on. These things could save lives. But there are also so many nasty things we can do with them; this general category of stuff is part of how we're.. sterilizing the world, so to speak. Removing the "humanity" from things and making our culture colder, more clinical, and less empathic and compassionate. I honestly don't know how to walk that tightrope - in cases like this "if we don't do it, someone else will." I suppose the best we can do is just try every day to keep some sort of "human-ness" in our endeavors. Some of us will do a pretty good job of that - some of us won't. 😞