This is super clear. Thanks so much for making this video.
@audic2350 2 years ago
The greatest video I could watch to understand MDP.
@vishalkumarpandey5546 1 year ago
Such an insightful, discussion-based explanation. Great 👍
@QQ-xx7mo 6 years ago
Awesome videos, Thank you
@cigxhang486 11 months ago
So the policy tells you the next action to take so that you eventually reach the reward?
@renskirchner6309 4 years ago
You're a genius
@renskirchner6309 4 years ago
When it comes to explanation, I mean
@enditend2 9 years ago
no part 5?
@braineedly7543 2 years ago
Is the choice of policy based on the model?
@lahaale5840 7 years ago
Is the reward given? If not, where does it come from? Is it equivalent to labeled data in supervised learning?
@oldcowbb 3 years ago
I think it's more like the cost function that measures whether the prediction matches the label. It's a numerical function that tells the algorithm what you want it to optimize, like matching labels in classification or getting closer to the goal in navigation.
@braineedly7543 2 years ago
@oldcowbb So should we store the reward for every state?
@oldcowbb 2 years ago
@braineedly7543 Well, you can't solve an MDP without the reward, so yes.
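For anyone following this thread, here is a rough sketch of what "store a reward for every state" can look like in practice, and how a policy falls out of it. Everything in it is made up for illustration and is not from the video: a 4-state chain, deterministic moves, a reward of 1 for entering the goal, a discount of 0.9, and plain value iteration as the solver.

```python
# Hypothetical 4-state chain MDP: states 0..3, state 3 is the terminal goal.
# Actions: -1 (left) or +1 (right); moves are deterministic and clipped to the chain.
STATES = [0, 1, 2, 3]
ACTIONS = [-1, +1]
GOAL = 3
GAMMA = 0.9  # discount factor (assumed for this example)

# One stored reward per state, received on entering that state (assumed values).
REWARD = {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0}


def step(state, action):
    """Deterministic transition: move left or right, staying inside the chain."""
    return min(max(state + action, 0), GOAL)


def q_value(state, action, values):
    """Reward for the state we land in, plus the discounted value of that state."""
    nxt = step(state, action)
    return REWARD[nxt] + GAMMA * values[nxt]


def value_iteration(tol=1e-9):
    """Return the state values and the greedy policy derived from them."""
    values = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s == GOAL:
                continue  # terminal state keeps value 0
            best = max(q_value(s, a, values) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # The policy maps each non-terminal state to its best action.
    policy = {
        s: max(ACTIONS, key=lambda a: q_value(s, a, values))
        for s in STATES
        if s != GOAL
    }
    return values, policy


values, policy = value_iteration()
print(policy)
```

Under these assumptions the policy picks +1 (move right) in every non-terminal state, which is the sense in which the policy tells you the next action to take to eventually reach the reward. Without the `REWARD` table there would be nothing to maximize, so the solver could not prefer one action over another.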
@joselabaki8290 2 years ago
The instructor is excellent. Unfortunately, the explanation is slowed down and sometimes "blurred" by the non-stop interjections. I believe a single voice is more than enough.