Can Wikipedia Help Offline Reinforcement Learning? (Author Interview)

4,254 views

Yannic Kilcher


#wikipedia #reinforcementlearning #languagemodels
Original paper review here: • Can Wikipedia Help Off...
Machel Reid and Yutaro Yamada join me to discuss their recent paper on language model pre-training for decision transformers in offline reinforcement learning.
OUTLINE:
0:00 - Intro
1:00 - Brief paper, setup & idea recap
7:30 - Main experimental results & high standard deviations
10:00 - Why is there no clear winner?
13:00 - Why are bigger models not a lot better?
14:30 - What’s behind the name ChibiT?
15:30 - Why is iGPT underperforming?
19:15 - How are tokens distributed in Reinforcement Learning?
22:00 - What other domains could have good properties to transfer?
24:20 - A deeper dive into the models' attention patterns
33:30 - Codebase, model sizes, and compute requirements
37:30 - Scaling behavior of pre-trained models
40:05 - What did not work out in this project?
42:00 - How can people get started and where to go next?
Paper: arxiv.org/abs/2201.12122
Code: github.com/machelreid/can-wikipedia-help-offline-rl
My Video on Decision Transformer: • Decision Transformer: ...
Abstract:
Fine-tuning reinforcement learning (RL) models has been challenging because of a lack of large-scale off-the-shelf datasets as well as high variance in transferability among different environments. Recent work has looked at tackling offline RL from the perspective of sequence modeling, with improved results following the introduction of the Transformer architecture. However, when the model is trained from scratch, it suffers from slow convergence speeds. In this paper, we look to take advantage of this formulation of reinforcement learning as sequence modeling and investigate the transferability of pre-trained sequence models from other domains (vision, language) when fine-tuned on offline RL tasks (control, games). To this end, we also propose techniques to improve transfer between these domains. Results show consistent performance gains in terms of both convergence speed and reward on a variety of environments, accelerating training by 3-6x and achieving state-of-the-art performance in a variety of tasks using Wikipedia-pre-trained and GPT-2 language models. We hope that this work not only sheds light on the potential of leveraging generic sequence modeling techniques and pre-trained models for RL, but also inspires future work on sharing knowledge between generative modeling tasks of completely different domains.
Authors: Machel Reid, Yutaro Yamada, Shixiang Shane Gu
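
To make the recipe from the abstract concrete, here is a minimal sketch: an offline-RL trajectory is flattened into a token sequence and fed through a language-pre-trained GPT-2, decision-transformer style. The class name, projection layers, and trajectory format below are illustrative assumptions, not the authors' actual API; their real implementation is in the repository linked above.

```python
import torch
import torch.nn as nn
from transformers import GPT2Model

class PretrainedDecisionTransformer(nn.Module):
    """Sketch: decision transformer initialized from language-pre-trained GPT-2."""

    def __init__(self, state_dim, act_dim, hidden=768):
        super().__init__()
        # Start from language-pre-trained weights instead of a random init.
        self.gpt2 = GPT2Model.from_pretrained("gpt2")
        # Project RL quantities into the language model's embedding space.
        self.embed_return = nn.Linear(1, hidden)
        self.embed_state = nn.Linear(state_dim, hidden)
        self.embed_action = nn.Linear(act_dim, hidden)
        self.predict_action = nn.Linear(hidden, act_dim)

    def forward(self, returns_to_go, states, actions):
        # returns_to_go: (B, T, 1), states: (B, T, state_dim), actions: (B, T, act_dim)
        B, T = states.shape[:2]
        # Interleave (return, state, action) embeddings along the time axis,
        # yielding a 3*T-token sequence that GPT-2 treats like a sentence.
        tokens = torch.stack(
            (self.embed_return(returns_to_go),
             self.embed_state(states),
             self.embed_action(actions)),
            dim=2,
        ).reshape(B, 3 * T, -1)
        hidden_states = self.gpt2(inputs_embeds=tokens).last_hidden_state
        # Predict each action from the hidden state at the preceding state token.
        return self.predict_action(hidden_states[:, 1::3])
```

Training then simply minimizes, e.g., the mean-squared error between predicted and logged actions, as in the original Decision Transformer; the only change is the language-pre-trained initialization.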
Links:
TabNine Code Completion (Referral): bit.ly/tabnine-yannick
YouTube: / yannickilcher
Twitter: / ykilcher
Discord: / discord
BitChute: www.bitchute.com/channel/yann...
LinkedIn: / ykilcher
BiliBili: space.bilibili.com/2017636191
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: www.subscribestar.com/yannick...
Patreon: / yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Comments: 13
@lucasbeyer2985 · 2 years ago
Yeah, this format with split videos uploaded on different days works well for the way I watch your videos! One thing I noticed across several videos: when you have multiple authors, you could try to play "moderator" a little more. When asking an open-ended/opinion question (as opposed to a specific question about a specific experiment/plot/...), after one author gives their opinion, if the other author doesn't say anything, actively ask them too, like "and MrX, what's your opinion on this?" or similar.
@codac7608 · 2 years ago
This new paper-review format is great, Yannic. Thanks!
@serta5727 · 2 years ago
Nice, I am looking forward to the interview.
@logo2462 · 2 years ago
I love the new release format! Perfect for me.
@mgostIH · 2 years ago
I would be curious to see the effects of training on both objectives at the same time, maybe smoothly transitioning the loss from focusing first on the language task and then on the RL task.
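
(A purely hypothetical sketch of that suggestion; the cosine schedule and the function name are my own assumptions, not anything from the paper:)

```python
import math

def mixed_loss(lm_loss, rl_loss, step, total_steps):
    """Anneal from the language-modeling loss toward the offline-RL loss."""
    # Cosine weight: alpha goes smoothly from 1 to 0 over training,
    # so early steps are dominated by lm_loss and late steps by rl_loss.
    alpha = 0.5 * (1 + math.cos(math.pi * step / total_steps))
    return alpha * lm_loss + (1 - alpha) * rl_loss
```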
@serta5727 · 2 years ago
Haha, that is awesome. Just take a language transformer, and it makes a great basis for a neural network that plays a game.
@toyuyn · 2 years ago
This format is, in my opinion, a bit awkward. I watch the paper review when it comes out, but my interest in the paper drops in the time before the interview comes out. On the other hand, putting the two videos together and releasing them as a single extra-long video would hurt retention because of the length.
@carlotonydaristotile7420 · 2 years ago
The problem is that Wikipedia has lots of misinformation.
@Kram1032 · 2 years ago
Doubt that's a huge problem if all you're trying to do is get a robot to walk lol
@jawadmansoor6064 · 2 years ago
1. We should not dismiss any source of information. 2. This is not an NLP problem; RL does not require a bot to speak, so that is not relevant. 3. (1a) We can always curate a set of true information once we have a bot that is capable of speech. 4. This is a completely new thing, and new ideas should be supported. Who could have thought you could have an agent make better decisions by training it on data that is totally unrelated to its task?
@bayesianlee6447 · 2 years ago
The bigger problem would be that Wikipedia is highly informative compared to all the other information you can find on the internet.
@carlotonydaristotile7420 · 2 years ago
When it comes to math and physics, Wikipedia seems accurate. But what worries me is that when Wikipedia talks about individuals it gets very political. For example, Candace Owens, an African American who suffered racism here in the USA: Wikipedia cites a journal which claimed she is a member of the "alt-right" (which Wikipedia describes as white nationalist). Wikipedia knows what the newspaper said is a lie, but they intentionally mention it on the wiki page to defame her, since they don't like her. So what happens to AI training? Hopefully it is smart enough to see through left-right political attacks.