Author Interview - Improving Intrinsic Exploration with Language Abstractions

4,247 views

Yannic Kilcher

A day ago

#reinforcementlearning #ai #explained
This is an interview with Jesse Mu, first author of the paper.
Original Paper Review: • Improving Intrinsic Ex...
Exploration is one of the oldest challenges in Reinforcement Learning, with no clear solution to date. Especially in environments with sparse rewards, agents struggle to decide which parts of the environment to explore further. Providing intrinsic motivation in the form of a pseudo-reward is one way to overcome this challenge, but it often relies on hand-crafted heuristics and can lead to deceptive dead-ends. This paper proposes using natural language descriptions of encountered states as a measure of novelty. In two procedurally generated environment suites, the authors demonstrate the usefulness of language, which is inherently concise and abstract and therefore lends itself well to this task.
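The core idea above — rewarding novelty of a state's language description rather than of the raw observation — can be sketched with a toy count-based bonus. This is an illustrative simplification, not the paper's actual implementation (which builds on learned exploration methods); the class and its decay schedule are assumptions for demonstration.

```python
from collections import Counter


class LanguageNoveltyBonus:
    """Toy sketch: pay an intrinsic reward based on how novel a state's
    language description is. The bonus decays with the square root of
    how many times that exact description has been seen."""

    def __init__(self, scale: float = 1.0):
        self.counts = Counter()  # description -> times seen
        self.scale = scale

    def reward(self, description: str) -> float:
        self.counts[description] += 1
        return self.scale / self.counts[description] ** 0.5


bonus = LanguageNoveltyBonus()
r1 = bonus.reward("agent picks up the red key")  # first visit: full bonus 1.0
r2 = bonus.reward("agent picks up the red key")  # repeat: 1/sqrt(2), diminished
```

Because many distinct low-level states ("key at (3,4)", "key at (3,5)") map to the same description, the bonus rewards abstract progress rather than pixel-level novelty — which is the intuition the interview discusses.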
OUTLINE:
0:00 - Intro
0:55 - Paper Overview
4:30 - Aren't you just adding extra data?
9:35 - Why are you splitting up the AMIGo teacher?
13:10 - How do you train the grounding network?
16:05 - What about causally structured environments?
17:30 - Highlights of the experimental results
20:40 - Why is there so much variance?
22:55 - How much does it matter that we are testing in a video game?
27:00 - How does novelty interface with the goal specification?
30:20 - The fundamental problems of exploration
32:15 - Are these algorithms subject to catastrophic forgetting?
34:45 - What current models could bring language to other environments?
40:30 - What does it take in terms of hardware?
43:00 - What problems did you encounter during the project?
46:40 - Where do we go from here?
Paper: arxiv.org/abs/2202.08938
Abstract:
Reinforcement learning (RL) agents are particularly hard to train when rewards are sparse. One common solution is to use intrinsic rewards to encourage agents to explore their environment. However, recent intrinsic exploration methods often use state-based novelty measures which reward low-level exploration and may not scale to domains requiring more abstract skills. Instead, we explore natural language as a general medium for highlighting relevant abstractions in an environment. Unlike previous work, we evaluate whether language can improve over existing exploration methods by directly extending (and comparing to) competitive intrinsic exploration baselines: AMIGo (Campero et al., 2021) and NovelD (Zhang et al., 2021). These language-based variants outperform their non-linguistic forms by 45-85% across 13 challenging tasks from the MiniGrid and MiniHack environment suites.
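The abstract mentions extending NovelD, which rewards the positive *gap* in novelty across a transition. A minimal sketch of that reward shape, computed over language descriptions instead of raw states, might look like the following. All names are hypothetical, and the count-based novelty stands in for the learned novelty measures the paper actually uses.

```python
class DescriptionNovelD:
    """Hedged sketch of a NovelD-style intrinsic reward over language
    descriptions: reward = max(novelty(after) - alpha * novelty(before), 0)."""

    def __init__(self, alpha: float = 0.5):
        self.alpha = alpha
        self.counts = {}  # description -> times rewarded

    def _novelty(self, desc: str) -> float:
        # Simple count-based novelty; the real method uses a learned measure.
        return 1.0 / (1.0 + self.counts.get(desc, 0)) ** 0.5

    def reward(self, desc_before: str, desc_after: str) -> float:
        r = max(self._novelty(desc_after) - self.alpha * self._novelty(desc_before), 0.0)
        self.counts[desc_after] = self.counts.get(desc_after, 0) + 1
        return r


noveld = DescriptionNovelD(alpha=0.5)
r = noveld.reward("standing in an empty room", "opened the red door")
```

The `max(..., 0)` clipping means the agent is only paid when it moves from a familiar description to a more novel one, which discourages oscillating back and forth between two states to farm reward.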
Authors: Jesse Mu, Victor Zhong, Roberta Raileanu, Minqi Jiang, Noah Goodman, Tim Rocktäschel, Edward Grefenstette
Links:
TabNine Code Completion (Referral): bit.ly/tabnine-yannick
YouTube: / yannickilcher
Twitter: / ykilcher
Discord: ykilcher.com/discord
BitChute: www.bitchute.com/channel/yann...
LinkedIn: / ykilcher
BiliBili: space.bilibili.com/2017636191
If you want to support me, the best thing to do is to share the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: www.subscribestar.com/yannick...
Patreon: / yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Comments: 5
@drtristanbehrens
@drtristanbehrens 2 years ago
This is a fantastic interview! Very inspiring and insightful. Thanks for sharing!
@urfinjus378
@urfinjus378 2 years ago
Great! Very thoughtful author, a pleasure to listen to. Yannic, I would auto-like your videos if YouTube had that option. I appreciate your effort and skill in sharing knowledge and ideas. On the paper: if it were obvious that we could get better policies by adding text, then we could also say it is obvious that we can get this extra data by using big image-captioning models)
@oncedidactic
@oncedidactic 2 years ago
Wow, great interview again. Nice questions as always, Yannic, and I'm super impressed with the author. He acquitted himself very well on the questions raised in the paper review and far beyond, into deeper and future questions. Really interesting to see how they re-aimed the paper to address a more abstract research question while still benefiting from their earlier work on the specific algorithm implementation. This is a fantastic exemplar: how great would it be if just half of the "neat ideas / benchmark chasing" research could be transmogrified into an increment in the "basic research" space. Get low-hanging fruit that tastes good *and* is good for you, lol.
@diagorasofmel0s
@diagorasofmel0s 2 years ago
The author is brilliant