ACCEL: Evolving Curricula with Regret-Based Environment Design (Paper Review)

10,333 views

Yannic Kilcher


#ai #accel #evolution
Automatic curriculum generation is one of the most promising avenues for reinforcement learning today. Multiple approaches have been proposed, each with its own set of advantages and drawbacks. This paper presents ACCEL, which takes the next step toward constructing curricula for generally capable agents. ACCEL combines the adversarial adaptiveness of regret-based sampling methods with the level-editing capabilities usually found in evolutionary methods.
OUTLINE:
0:00 - Intro & Demonstration
3:50 - Paper overview
5:20 - The ACCEL algorithm
15:25 - Looking at the pseudocode
23:10 - Approximating regret
33:45 - Experimental results
40:00 - Discussion & Comments
Website: accelagent.github.io
Paper: arxiv.org/abs/2203.01302
Abstract:
It remains a significant challenge to train generally capable agents with reinforcement learning (RL). A promising avenue for improving the robustness of RL agents is through the use of curricula. One such class of methods frames environment design as a game between a student and a teacher, using regret-based objectives to produce environment instantiations (or levels) at the frontier of the student agent's capabilities. These methods benefit from their generality, with theoretical guarantees at equilibrium, yet they often struggle to find effective levels in challenging design spaces. By contrast, evolutionary approaches seek to incrementally alter environment complexity, resulting in potentially open-ended learning, but often rely on domain-specific heuristics and vast amounts of computational resources. In this paper we propose to harness the power of evolution in a principled, regret-based curriculum. Our approach, which we call Adversarially Compounding Complexity by Editing Levels (ACCEL), seeks to constantly produce levels at the frontier of an agent's capabilities, resulting in curricula that start simple but become increasingly complex. ACCEL maintains the theoretical benefits of prior regret-based methods, while providing significant empirical gains in a diverse set of environments. An interactive version of the paper is available at accelagent.github.io.
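For intuition, here is a minimal sketch of an ACCEL-style loop in Python. It is not the authors' code: the Level class, the edit operator, the regret estimator, and StubAgent are simplified stand-ins for illustration.

    import random

    class Level:
        def __init__(self, params):
            self.params = params  # e.g. wall positions in a maze

    def random_simple_level():
        # Generator: start from simple, randomly parameterized levels.
        return Level([random.random() for _ in range(4)])

    def edit(level):
        # Editor: small random mutation of an existing level.
        params = list(level.params)
        params[random.randrange(len(params))] += random.uniform(-0.1, 0.1)
        return Level(params)

    def estimate_regret(agent, level):
        # Placeholder: ACCEL approximates regret from the student's own
        # rollouts (e.g. positive value loss), not from an optimal policy.
        return random.random()

    class StubAgent:
        def train_on(self, level):
            pass  # stand-in for PPO updates on rollouts from this level

    def accel_loop(agent, steps=1000, replay_prob=0.9):
        buffer = []  # (level, regret score) pairs
        for _ in range(steps):
            if buffer and random.random() < replay_prob:
                # Replay a high-regret level, train on it, then edit it.
                buffer.sort(key=lambda x: x[1], reverse=True)
                level, _ = buffer[0]
                agent.train_on(level)
                child = edit(level)
                regret = estimate_regret(agent, child)
                if regret > 0:  # only keep levels with learning signal
                    buffer.append((child, regret))
            else:
                # Generate a new simple level and score it without training.
                level = random_simple_level()
                buffer.append((level, estimate_regret(agent, level)))
        return buffer

    accel_loop(StubAgent(), steps=100)

The key property the sketch tries to capture is that complexity compounds: the agent is only trained on replayed levels, and new complexity enters the buffer through small edits of levels that were already challenging.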
Authors: Jack Parker-Holder, Minqi Jiang, Michael Dennis, Mikayel Samvelyan, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel
Links:
TabNine Code Completion (Referral): bit.ly/tabnine-yannick
YouTube: / yannickilcher
Twitter: / ykilcher
Discord: ykilcher.com/discord
BitChute: www.bitchute.com/channel/yann...
LinkedIn: / ykilcher
BiliBili: space.bilibili.com/2017636191
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: www.subscribestar.com/yannick...
Patreon: / yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Comments: 19
@arirahikkala · 2 years ago
Regret-based environment design? That's how I decorate my apartment!
@ChocolateMilkCultLeader · 2 years ago
Brilliant comment
@fahds2583 · 2 years ago
The neat way you explain these concepts makes me more intrigued by machine learning.
@alan2here · 2 years ago
It's like novelty seeking :)
@That-Google-Guy · 2 years ago
Omg the green boi @ 1:15 watching him fail to proceed is so frustrating lol
@ChocolateMilkCultLeader · 2 years ago
I think the environment algorithm will probably be the MVP of this. I can see a situation where creating general environment generators helps us train RL agents. Environment setup can be a giant pain in RL
@robbiero368 · 2 years ago
Feels like they could easily improve the level design so that even the challenging levels maintain easy parts instead of just making every single step forward a challenge
@alan2here · 2 years ago
Maybe you want no forgetting, so immutable edges between neurons and a constantly expanding neural network. But then there's an occasional wave of pruning/forgetting that tries each edge in turn, and an edge is only removed if no previous game level gets significantly worse when you do so.
@ssssssstssssssss · 2 years ago
It seems to me a standard random number generator is not ideal for this. When generating a procedural level, the initial seed will determine the level generated given a set of parameters. Pseudo-random number generators aim to have no correlation between consecutive numbers, though. I think it would be more beneficial to have seeds that are close to one another generate similar levels.
@angry4rtichoke646 · 2 years ago
How can you calculate the optimal policy to get the regret?
@raphaelambrosiuscosteau829 · 2 years ago
"Regret-based design" sounds like my approach to real life
@hurktang · 2 years ago
I see quite a problem with this approach. Maybe I skipped that part, but it seems to me like the level generator has to generate solvable levels. Which means you can only train AI on problems that are already solved by other robust algorithms. Which ... let's be honest here, are not the kind of problems we need AIs for. Am I missing something?
@_rockt · 2 years ago
That's actually not correct. The generator (and editor) can produce unsolvable levels. The great thing about the approach is that unsolvable problems are filtered out automatically as they don't incur any regret for the agent. For details, check the paper arxiv.org/abs/2203.01302 (also compare to arxiv.org/abs/2010.03934 which this paper builds upon).
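For intuition on the filtering described above: PLR-style methods, which ACCEL builds on, score each level with a regret proxy computed from the student's own rollouts, such as the positive value loss over GAE advantage estimates. A minimal sketch, with made-up numbers, assuming advantage estimates are already available:

    def positive_value_loss(advantages):
        # Mean of the clipped-positive GAE advantages from one rollout on a level.
        return sum(max(a, 0.0) for a in advantages) / max(len(advantages), 1)

    # A level the agent sometimes makes progress on yields a positive score...
    print(positive_value_loss([0.3, -0.1, 0.5]))  # 0.266...
    # ...while a level that never yields reward (e.g. an unsolvable one) scores ~0,
    # so it is never prioritized for replay.
    print(positive_value_loss([0.0, 0.0, 0.0]))   # 0.0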
@Stopinvadingmyhardware · 2 years ago
You’re still at this level?
@erobusblack4856 · 2 years ago
Hasn't reinforcement learning always been about starting agents on easy levels and gradually increasing the difficulty? I've always done it like this.
@JTMoustache · 2 years ago
Is this really a new idea? ... I used to do that in 2016 - but I was not featured on KilcherTV #LoosingAtLife
@SimonJackson13 · 2 years ago
If they spot you sliders they'll sell ya.
@SimonJackson13 · 2 years ago
They might regret it in the end but you know they're growing. :D