COMPSCI 188 - 2018-09-18 - Markov Decision Processes (MDPs) Part 1/2

48,361 views

Webcast Departmental

Comments: 16
@sanjanachopra2100 · 4 years ago
Topic: Markov Decision Process, Value iteration
@Hecklit · 6 years ago
Lecture starts at 3:33.
@AkshayAradhya · 5 years ago
Just awkward silence till then
@youssefabdallah3940 · 4 years ago
Does the overheated state at 1:20:00 correspond to the pit in the grid world, for example? This is somewhat confusing to me, because in the grid world there was an exit action we had to take to get the negative reward, but here Prof. Klein mentions that you receive the reward when you transition into this state.
@nate7368 · 3 years ago
Yes, they're both terminal states. There are different ways to construct the MDP. In Grid World it was arbitrarily decided that you have to take an exit action from the terminal square to get the +1/-1 reward; they could instead have decided that you get the reward for transitioning into the state.
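To make the two constructions concrete, here is a toy illustration in Python (the Grid World names "pit", "gem", "done" are made up; the racing names are from the lecture). Both fit the general reward signature R(s, a, s') and differ only in which step books the terminal reward:

# Convention A (Grid World): terminal squares offer a single "exit"
# action that pays the reward and moves to an absorbing "done" state.
R_exit = {
    ("pit", "exit", "done"): -1.0,
    ("gem", "exit", "done"): +1.0,
}

# Convention B (racing MDP): the reward is paid on the transition
# *into* the terminal state; once there, no further actions exist.
R_transition = {
    ("warm", "fast", "overheated"): -10.0,
}

# Either way the agent collects the number exactly once per episode;
# the choice only shifts the reward by one time step (and hence by
# one factor of the discount).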
@quiteSimple24 · 6 years ago
Question (1:12:08): Why is the optimal policy (arrow) next to the fire pit north? I think it should be west. If you choose north, there is a chance of falling into the fire pit and no chance of getting the diamond, so the sum of discounted rewards is less than zero; if you choose west, it is always zero. I also wonder why the arrow in the second state from the top heads east. If it is a matter of tie-breaking, then why do the other states head north?
@shivendraiitkgp · 5 years ago
Did you ever get an answer to these doubts? I have the same ones, plus one more: why does the other green state have V_2(s) = 0.72? I thought it should be 0.8*1 + 0.1*0 + 0.1*(0.8*0 + 0.1*0 + 0.1*(-1)) = 0.79.
@AkshayAradhya · 5 years ago
I had the same question. I think the optimal policy next to the fire pit should be WEST and not NORTH
@akshara08 · 5 years ago
I think it should be north because we are calculating rewards for just two time steps; the discounted reward for going north is > 0, since it also accounts for the probability of going in the intended direction.
@rogertrullo8272 · 4 years ago
The optimal policy at that specific time step suggests that the best action is to go north (the value is 0.72, which is greater than zero), and that value already takes into account the chance of slipping. The same goes for the second state from the top. The other states are pointing north, but they could point anywhere, because their values are all the same (zero); note, however, the bottom-right state, which could point anywhere except north, because the value there is -1, which is less than zero. This is clearer in the next video, in the part called policy extraction (extracting actions from values).
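For reference, a minimal Python sketch of that policy-extraction step; T[s][a] as a list of (prob, next_state, reward) triples, the dict V, and the function name are hypothetical, not the course's actual code:

def extract_policy(T, V, gamma):
    # One-step lookahead: for each state, pick the action whose
    # expected discounted value (Q-value) is largest.
    policy = {}
    for s, actions in T.items():
        if not actions:  # terminal state: nothing to choose
            continue
        policy[s] = max(
            actions,
            key=lambda a: sum(p * (r + gamma * V[s2])
                              for p, s2, r in actions[a]),
        )
    return policy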
@LoveIsTheCure001 · 4 years ago
@shivendraiitkgp You forgot to multiply by the discount factor of 0.9, so 0.8 * 0.9 = 0.72.
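Spelling out the whole backup this reply refers to, assuming the lecture's Grid World settings (noise 0.2, discount 0.9, living reward 0):

# Q_2(s, East) for the green state one step from the +1 exit:
# 0.8 chance of reaching the +1 square (V_1 = 1), and 0.1 each of
# slipping sideways into squares whose V_1 is still 0.
gamma = 0.9
q_east = (0.8 * (0 + gamma * 1.0)
          + 0.1 * (0 + gamma * 0.0)
          + 0.1 * (0 + gamma * 0.0))
print(q_east)  # 0.72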
@jonsnow9246 · 3 years ago
Why doesn't expectimax work? (1:07:00)
@alexandrebrownAI · 1 year ago
See 1:06:35. Essentially, because the search tree is very deep when in fact there are only 3 states, repeated over and over again, and the tree goes on forever. Using expectimax here would be doing "hard work" instead of "smart work"; it is not the appropriate algorithm for such cases. What is mentioned at 1:07:24 is that if you try to use expectimax with tricks such as caching and limiting the depth, you actually end up close to the value iteration algorithm, which is the algorithm more appropriate for these situations. Therefore expectimax alone is not the best choice, and one should consider more appropriate techniques like value iteration. Hope this helps.
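For completeness, a compact value iteration sketch over the lecture's three-state racing MDP (cool / warm / overheated); the transition probabilities and rewards below are my reading of the slide, so treat the exact numbers as assumptions:

# T[s][a] = list of (prob, next_state, reward) triples.
T = {
    "cool": {
        "slow": [(1.0, "cool", 1)],
        "fast": [(0.5, "cool", 2), (0.5, "warm", 2)],
    },
    "warm": {
        "slow": [(0.5, "cool", 1), (0.5, "warm", 1)],
        "fast": [(1.0, "overheated", -10)],
    },
    "overheated": {},  # terminal: no actions, value stays 0
}

def value_iteration(T, gamma, k):
    # k rounds of Bellman backups:
    # V_{i+1}(s) = max_a sum_s' T(s,a,s') * (R(s,a,s') + gamma * V_i(s')).
    V = {s: 0.0 for s in T}
    for _ in range(k):
        V = {
            s: max(
                (sum(p * (r + gamma * V[s2]) for p, s2, r in outs)
                 for outs in actions.values()),
                default=0.0,  # terminal states keep value 0
            )
            for s, actions in T.items()
        }
    return V

print(value_iteration(T, gamma=1.0, k=2))
# With these numbers: {'cool': 3.5, 'warm': 2.5, 'overheated': 0.0},
# matching the k-step values shown in the lecture.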
@jayanthr1112 · 5 years ago
54:32 Break Ends