CS885 Lecture 8a: Multi-armed bandits

21,971 views

Pascal Poupart

Comments: 17
@rafaelmosca6243 · 5 years ago
Excellent Lectures! Everything crystal clear.
@cbandle5050 · 1 year ago
Suboptimal action != worst action (unless there are only two actions). This leads to several issues with the proof at 44:20:
1) The 2nd bullet does not follow from the 1st. R(a) >= R(a') for all a' means that action a is actually optimal, which contradicts bullet 1, where action a is defined as suboptimal.
2) Assuming that action a is suboptimal only implies that there *exists* an a' such that R(a') > R(a). We cannot deduce universal quantification, as it is possible that R(b) < R(a) < R(a') (here action a is suboptimal but not the worst).
3) We also cannot fix this by assuming, for the proof by contradiction, that we converged to the worst arm a, since the negation of "converging to the best arm" is "not converging, or converging to an arm which is not the best".
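Restating the quantifier point above in the lecture's R(a) notation (the action-set symbol \mathcal{A} is an assumption here, not taken from the slides):

```latex
\begin{align*}
a \text{ suboptimal} &\iff \exists\, a' \in \mathcal{A}:\; R(a') > R(a)\\
a \text{ worst}      &\iff \forall\, a' \in \mathcal{A}:\; R(a') \ge R(a)
\end{align*}
% The two coincide only when |\mathcal{A}| = 2; with three arms,
% R(b) < R(a) < R(a') makes a suboptimal but not the worst.
```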
@user-co6pu8zv3v · 2 years ago
Excellent lecture!
@shivangitomar5557 · 1 year ago
Amazing lecture! Thank you!!
@Bryan-bh7cy · 1 year ago
well damn it's really good
@umarkhan-hu7yt · 1 year ago
Please suggest some books to go more in depth on your lectures. I am following Reinforcement Learning by Richard Sutton.
@ryanjadidi8622 · 4 years ago
At timestamp 44:20, shouldn't it be "assumption that a' is suboptimal" rather than a?
@andresacostaescobar · 5 years ago
Hi. Where can I find a proof of the statement that P(a_t != a*) ~ epsilon, and of the fact that epsilon = 1/t leads to logarithmic regret?
@NilavraPathak · 4 years ago
@andresacostaescobar The sum of the inverses of t is a log series as it tends to infinity. The easiest explanation is that if you integrate 1/t, you get log t.
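A worked version of this integral argument, with Δ standing for the worst per-step regret (a symbol assumed here, not used in the thread):

```latex
\sum_{t=1}^{T}\frac{1}{t} \;\approx\; \int_{1}^{T}\frac{dt}{t} \;=\; \ln T,
\qquad\text{hence}\qquad
\mathbb{E}\bigl[\mathrm{Regret}(T)\bigr] \;\lesssim\;
\Delta \sum_{t=1}^{T}\varepsilon_t \;=\; \Delta\sum_{t=1}^{T}\frac{1}{t} \;=\; O(\log T).
```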
@sudhas_world · 4 years ago
To prove that P(a_t != a*) ~ epsilon, you just need to note that the epsilon-greedy algorithm explores with probability epsilon at each iteration. Hence, on average, the probability that the arm chosen at any time step is not the optimal arm is roughly epsilon.
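A minimal sketch of the selection rule this comment describes (the function name and the uniform-explore choice are illustrative, not the lecture's code):

```python
import random

def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: explore uniformly with prob. epsilon, else pick the greedy arm."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                     # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

# On an exploration step the arm is uniform over K arms, so the chance of
# missing a* is epsilon * (K - 1) / K ~ epsilon, as the comment argues
# (ignoring whatever error the greedy step itself makes early on).
```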
@Cindy-md1dm · 4 years ago
The purpose of epsilon is to increase exploration. If we try to reduce the probability of choosing a suboptimal arm by reducing epsilon, it will go back to the greedy strategy. Then what's the point?
@m-aun · 4 years ago
You explore at the start to find the best action. Once that is estimated, you choose the best action (exploit) and not suboptimal actions. You want your regret to be low.
@jocarrero · 4 years ago
True, as epsilon decreases you become greedy, but before reaching that point you have explored a lot! So the theoretical result is telling you: (a) decrease the rate of exploration over time, (b) but not too fast, i.e. slower than 1/t. This type of result is typical in the theory of stochastic processes. It is related to the way stochastic processes are constructed. The classical book by Loève, vol. 2, has a beautiful chapter using this type of condition to prove that a "good process" can be constructed on a Kolmogorov space.
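To illustrate the trade-off being discussed, here is a minimal simulation sketch, not from the lecture: Bernoulli arms with illustrative means, comparing a fixed epsilon against a c/t decay schedule (the constant c is a free parameter; decaying too aggressively can lock onto a bad arm before the estimates are reliable, which is the commenter's "not too fast" point).

```python
import random

def eps_greedy_regret(means, T, eps_schedule, seed=0):
    """Run epsilon-greedy on Bernoulli arms and return total expected regret."""
    rng = random.Random(seed)
    counts = [0] * len(means)      # pulls per arm
    q = [0.0] * len(means)         # empirical mean reward per arm
    best = max(means)
    regret = 0.0
    for t in range(1, T + 1):
        if rng.random() < eps_schedule(t):
            a = rng.randrange(len(means))                  # explore
        else:
            a = max(range(len(means)), key=q.__getitem__)  # exploit
        reward = 1.0 if rng.random() < means[a] else 0.0
        counts[a] += 1
        q[a] += (reward - q[a]) / counts[a]                # incremental mean
        regret += best - means[a]                          # expected per-step regret
    return regret

means = [0.3, 0.5, 0.7]  # illustrative arm means
for T in (1_000, 10_000, 100_000):
    fixed = eps_greedy_regret(means, T, lambda t: 0.1)                 # grows linearly in T
    decay = eps_greedy_regret(means, T, lambda t: min(1.0, 5.0 / t))   # log-like if estimates converge
    print(f"T={T}: fixed-eps regret {fixed:.1f}, decayed-eps regret {decay:.1f}")
```

With the fixed schedule, the exploration steps alone contribute regret proportional to 0.1 * T, so it grows linearly, matching the thread's point about why epsilon must decay.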
@jocarrero · 4 years ago
Just one more comment on the rate at which epsilon decreases. Look at Kolmogorov's 0-1 Law. This law tells you what happens if you keep sampling independently from a distribution.
@Lcipolina · 3 years ago
@jocarrero Which book?
@hsujerry7231 · 1 year ago
The greedy strategy also gets regret O(n) if rewards are bounded.
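The bound referred to here is the trivial one that holds for any strategy once per-step regret is bounded (say rewards in [0, 1]):

```latex
\mathrm{Regret}(n) \;=\; \sum_{t=1}^{n}\bigl(R(a^\ast) - R(a_t)\bigr)
\;\le\; n \cdot \max_{a}\bigl(R(a^\ast) - R(a)\bigr) \;=\; O(n).
```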
@PeekPost · 5 years ago
amazing
Related videos

CS885 Lecture 8b: Bayesian and Contextual Bandits
1:17:00
Pascal Poupart
13K views
The Contextual Bandits Problem
54:29
Simons Institute
23K views
Bandit Algorithms - 1
1:34:05
ICTP Quantitative Life Sciences
9K views
The Contextual Bandits Problem: A New, Fast, and Simple Algorithm
1:00:56
Microsoft Research
13K views
Multi-Armed Bandit : Data Science Concepts
11:44
ritvikmath
89K views
Multi-Armed Bandits: A Cartoon Introduction - DCBA #1
13:59
Academic Gamer
42K views
Machine learning - Bayesian optimization and multi-armed bandits
1:20:30
Nando de Freitas
130K views
Multi-Armed Bandits 1 - Algorithms
13:35
Cynthia Rudin
9K views
Multi-Armed Bandits and A/B Testing
19:01
Jay Feng
6K views
Contextual Bandits
12:32
Reinforcement Learning
12K views