Lecture 2, 2024, Stochastic finite and infinite horizon DP, approximation in value and policy space

1,679 views

Dimitri Bertsekas

Comments: 2
@amitbhaya3571 4 months ago
On the slide explaining rollout for the TSP (starting around 22:30), the cost from node AC to node ACD should be 1, not 3 (to be compatible with the matrix of intercity travel costs). Similarly, the cost from ADB to ADBC should be 1, not 20. With these corrections, rollout applied from A leads to the path A-C-D-B-A (cost 26) and does not recover the optimal path A-B-D-C-A (shown in red on the slide), which has cost 13. This shows that rollout can lead to suboptimal outcomes. However, if the A to C cost is changed to 3 [in the matrix of intercity travel costs], then everything works as advertised. Side remark: I also tried modifying the matrix to reflect the (black) numbers on the slide (C to D cost 3, B to C cost 20), but then there is a lot more to change!
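For readers who want to experiment with the kind of check described in this comment, here is a minimal sketch of rollout for a four-city TSP with a nearest-neighbor base heuristic. The cost table below is made up for illustration and is NOT the matrix from the lecture slide; the function names (`greedy_completion`, `rollout`, `brute_force`) are likewise hypothetical. Swapping in the slide's actual intercity costs would let you reproduce or test the numbers quoted above.

```python
from itertools import permutations

# Hypothetical symmetric travel costs; NOT the matrix from the lecture slide.
EDGE_COSTS = {
    ("A", "B"): 100, ("A", "C"): 1, ("A", "D"): 4,
    ("B", "C"): 3, ("B", "D"): 1, ("C", "D"): 2,
}
CITIES = ["A", "B", "C", "D"]


def cost(u, v):
    """Travel cost between two cities (symmetric lookup)."""
    return EDGE_COSTS[(u, v)] if (u, v) in EDGE_COSTS else EDGE_COSTS[(v, u)]


def tour_cost(tour):
    """Total cost of a closed tour given as [start, ..., start]."""
    return sum(cost(u, v) for u, v in zip(tour, tour[1:]))


def greedy_completion(partial):
    """Base heuristic: extend a partial tour by nearest neighbor, then return to start."""
    tour = list(partial)
    while len(tour) < len(CITIES):
        remaining = [c for c in CITIES if c not in tour]
        tour.append(min(remaining, key=lambda c: cost(tour[-1], c)))
    return tour + [tour[0]]


def rollout(start):
    """Rollout: at each step, score every next city by the cost of the
    tour obtained when the base heuristic completes it, and pick the best."""
    partial = [start]
    while len(partial) < len(CITIES):
        remaining = [c for c in CITIES if c not in partial]
        best = min(
            remaining,
            key=lambda c: tour_cost(greedy_completion(partial + [c])),
        )
        partial.append(best)
    return partial + [start]


def brute_force(start):
    """Exhaustive search, feasible only for tiny instances."""
    others = [c for c in CITIES if c != start]
    tours = ([start, *p, start] for p in permutations(others))
    return min(tours, key=tour_cost)


greedy_tour = greedy_completion(["A"])
rollout_tour = rollout("A")
optimal_tour = brute_force("A")
print("base heuristic:", greedy_tour, tour_cost(greedy_tour))
print("rollout:       ", rollout_tour, tour_cost(rollout_tour))
print("optimal:       ", optimal_tour, tour_cost(optimal_tour))
```

On this made-up instance the nearest-neighbor tour costs 104 while rollout reaches cost 9, which matches the brute-force optimum. That outcome is specific to these numbers: as the comment points out, with other cost matrices rollout can end on a suboptimal tour.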
@CoyNDPL 6 months ago
I would like to ask for some clarification: On slide 15 / 34 at 48:10, L is introduced with a subscript "N-1", but all of the variables that L depends on, {a, b, q, r}, are time invariant in this example. Is it useful, in general, to show that L has some time dependence? Thanks in advance!
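One way to explore this question numerically, assuming the slide uses the standard scalar linear quadratic setup (system x_{k+1} = a x_k + b u_k, stage cost q x_k^2 + r u_k^2, terminal cost q x_N^2; this is an assumption, since the slide is not reproduced here): in a finite-horizon problem the gain L_k is computed from the cost-to-go coefficient K_{k+1} through the backward Riccati recursion, so it can vary with k even when a, b, q, r are constant. The function name `lq_gains` below is hypothetical.

```python
def lq_gains(a, b, q, r, N):
    """Backward Riccati recursion for the assumed scalar LQ problem.

    Returns [L_0, ..., L_{N-1}], where the optimal control is u_k = L_k * x_k.
    """
    K = q  # K_N: coefficient of the terminal cost q * x_N^2
    gains = [0.0] * N
    for k in range(N - 1, -1, -1):
        # L_k depends on K_{k+1}, which is where the time index enters.
        gains[k] = -a * b * K / (r + b * b * K)
        # K_k from K_{k+1} (scalar Riccati equation).
        K = a * a * r * K / (r + b * b * K) + q
    return gains


gains = lq_gains(a=1.0, b=1.0, q=1.0, r=1.0, N=5)
for k, L in enumerate(gains):
    print(f"L_{k} = {L:.6f}")
```

Under these assumptions, with a = b = q = r = 1 the last gain L_{N-1} is -0.5 and the one before it is -0.6, and the gains settle toward a constant value as k moves away from the end of the horizon. The subscript therefore seems to record the distance to the horizon rather than any time dependence in the problem data.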