Subrahmanya Swamy Peruru
This lecture introduces multi-armed bandits, a special case of the reinforcement learning problem, and then discusses an algorithm called explore-then-commit.
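For readers who want a concrete picture before watching, here is a minimal sketch of explore-then-commit in its common textbook form, which may differ in detail from the version presented in the lecture. It pulls each arm a fixed number of times in round-robin order, then commits to the arm with the highest empirical mean for the remaining rounds. The names `explore_then_commit`, `bernoulli`, `arms`, `horizon`, and `m` are illustrative assumptions, not taken from the lecture.

```python
import random


def bernoulli(p):
    """Hypothetical helper: an arm paying 1.0 with probability p, else 0.0."""
    return lambda rng: 1.0 if rng.random() < p else 0.0


def explore_then_commit(arms, horizon, m, rng=None):
    """Sketch of explore-then-commit (assumed textbook form).

    arms    : list of callables, each taking an RNG and returning a reward
    horizon : total number of rounds
    m       : number of exploration pulls per arm (must be >= 1)

    Returns (committed_arm, pull_counts, total_reward). committed_arm is
    None if the horizon ends before exploration finishes.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    rng = rng or random.Random(0)
    k = len(arms)
    counts = [0] * k
    sums = [0.0] * k
    total = 0.0
    committed = None

    for t in range(horizon):
        if t < m * k:
            # Exploration phase: pull the arms in round-robin order.
            arm = t % k
        else:
            # Commit phase: pick the best empirical mean once, then keep it.
            if committed is None:
                committed = max(range(k), key=lambda i: sums[i] / counts[i])
            arm = committed
        reward = arms[arm](rng)
        counts[arm] += 1
        sums[arm] += reward
        total += reward

    return committed, counts, total


if __name__ == "__main__":
    # Two Bernoulli arms with assumed success probabilities 0.3 and 0.7.
    best, pulls, reward = explore_then_commit(
        [bernoulli(0.3), bernoulli(0.7)], horizon=1000, m=50
    )
    print("committed arm:", best, "pulls:", pulls, "total reward:", reward)
```

The choice of `m` carries the trade-off: a small `m` risks committing to the wrong arm, while a large `m` spends many rounds on arms that are clearly worse.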