Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents (+Author)

16,077 views

Yannic Kilcher

1 day ago

#gpt3 #embodied #planning
In this video: Paper explanation, followed by first author interview with Wenlong Huang.
Large language models contain extraordinary amounts of world knowledge that can be queried in various ways. But their output format is largely uncontrollable. This paper investigates the VirtualHome environment, which expects a particular set of actions, objects, and verbs to be used. Turns out, with proper techniques and only using pre-trained models (no fine-tuning), one can translate unstructured language model outputs into the structured grammar of the environment. This is potentially very useful anywhere where the models' world knowledge needs to be provided in a particular structured format.
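One way to picture the translation trick: embed every admissible action once, embed each free-form step the model generates, and snap it to the nearest admissible action by cosine similarity. Below is a minimal sketch of that idea, assuming the sentence-transformers library; the model name and the toy action list are placeholders, and the paper's actual procedure also weighs in the planning LM's own confidence rather than relying on similarity alone.

# Minimal sketch: map a free-form LM step to the nearest admissible action
# via sentence-embedding similarity (illustrative model and action list).
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

admissible_actions = ["walk to kitchen", "open fridge", "grab milk", "close fridge"]
action_embs = embedder.encode(admissible_actions, convert_to_tensor=True)

def translate(free_form_step: str) -> str:
    # Return the admissible action closest in embedding space to the LM output.
    step_emb = embedder.encode(free_form_step, convert_to_tensor=True)
    scores = util.cos_sim(step_emb, action_embs)[0]
    return admissible_actions[int(scores.argmax())]

print(translate("Take the milk out of the refrigerator"))  # -> "grab milk"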
OUTLINE:
0:00 - Intro & Overview
2:45 - The VirtualHome environment
6:25 - The problem of plan evaluation
8:40 - Contributions of this paper
16:40 - Start of interview
24:00 - How to use language models with environments?
34:00 - What does model size matter?
40:00 - How to fix the large models' outputs?
55:00 - Possible improvements to the translation procedure
59:00 - Why does Codex perform so well?
1:02:15 - Diving into experimental results
1:14:15 - Future outlook
Paper: arxiv.org/abs/2201.07207
Website: wenlong.page/language-planner/
Code: github.com/huangwl18/language-planner
Wenlong's Twitter: / wenlong_huang
Abstract:
Can world knowledge learned by large language models (LLMs) be used to act in interactive environments? In this paper, we investigate the possibility of grounding high-level tasks, expressed in natural language (e.g. "make breakfast"), to a chosen set of actionable steps (e.g. "open fridge"). While prior work focused on learning from explicit step-by-step examples of how to act, we surprisingly find that if pre-trained LMs are large enough and prompted appropriately, they can effectively decompose high-level tasks into low-level plans without any further training. However, the plans produced naively by LLMs often cannot map precisely to admissible actions. We propose a procedure that conditions on existing demonstrations and semantically translates the plans to admissible actions. Our evaluation in the recent VirtualHome environment shows that the resulting method substantially improves executability over the LLM baseline. The conducted human evaluation reveals a trade-off between executability and correctness but shows a promising sign towards extracting actionable knowledge from language models. Website at this https URL
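To illustrate the prompting scheme the abstract describes: one human-written demonstration of a decomposed task is prepended, and the frozen LM continues a new task in the same numbered-step format, after which each generated step is translated to an admissible action. A rough sketch follows; the demonstration text, the query task, and the commented-out API call are illustrative assumptions, not the paper's exact prompts.

# Build a one-shot prompt: an existing demonstration plus the new task header.
demo = (
    "Task: Throw away paper\n"
    "Step 1: Walk to home office\n"
    "Step 2: Find desk\n"
    "Step 3: Grab paper\n"
    "Step 4: Walk to trash can\n"
    "Step 5: Put paper in trash can\n"
)
query = "Task: Make breakfast\nStep 1:"
prompt = demo + "\n" + query
print(prompt)

# Any autoregressive LM can complete this, e.g. via the legacy OpenAI completions API:
# import openai
# out = openai.Completion.create(model="text-davinci-002", prompt=prompt,
#                                max_tokens=128, stop="\n\n")
# plan = "Step 1:" + out.choices[0].text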
Authors: Wenlong Huang, Pieter Abbeel, Deepak Pathak, Igor Mordatch
Links:
Merch: store.ykilcher.com
TabNine Code Completion (Referral): bit.ly/tabnine-yannick
YouTube: / yannickilcher
Twitter: / ykilcher
Discord: / discord
BitChute: www.bitchute.com/channel/yann...
LinkedIn: / ykilcher
BiliBili: space.bilibili.com/2017636191
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: www.subscribestar.com/yannick...
Patreon: / yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Comments: 41
@YannicKilcher 2 years ago
OUTLINE:
0:00 - Intro & Overview
2:45 - The VirtualHome environment
6:25 - The problem of plan evaluation
8:40 - Contributions of this paper
16:40 - Start of interview
24:00 - How to use language models with environments?
34:00 - What does model size matter?
40:00 - How to fix the large models' outputs?
55:00 - Possible improvements to the translation procedure
59:00 - Why does Codex perform so well?
1:02:15 - Diving into experimental results
1:14:15 - Future outlook
Paper: arxiv.org/abs/2201.07207
Website: wenlong.page/language-planner/
Code: github.com/huangwl18/language-planner
@DeadtomGCthe2nd 2 years ago
These conversations are great. BUT please do the regular paper analysis too. I really value your explanations, drawings, and simplification of the dense technical jargon.
@florianhonicke5448 2 years ago
I really like the interview format. From my perspective, the interview alone is not a substitute for your traditional videos. But since you put the summary at the start, it is a great combination of a simple on-point description, a deep dive, and the authors' reasoning.
@brll5733 2 years ago
Finally we see someone bridging the gap between language models and embodied agents. The most important next step, imo, will be to make this multimodal with visual input.
@daniellawson9894 2 years ago
This new format of explanation plus interview is really good. Keep it up!
@marilysedevoyault465 2 years ago
I am impressed! Can you imagine what it will be like in five years? Thank you Yannic for sharing, and thank you to that team!
@forcanadaru 2 years ago
That is great, thank you, I hope they will continue.
@changtimwu 1 year ago
Revisiting this research in the GPT-4/LLaMA moment: it would be interesting to apply the same techniques to today's small LMs (LLaMA, Alpaca, Vicuna).
@timothy-ul9wp 2 years ago
Combining this result with the earlier Decision Transformer findings, I'm starting to speculate that language models are somehow well suited for RL / decision-making, given that basically every decoded token is in fact a discrete decision itself. Combined with the current multimodal trend, I can see the hype coming.
@robottinkeracademy 2 years ago
I did something similar but much simpler, with an understanding of context and priority for interjecting new commands. Excellent work Yannic, keep these coming. Understanding why and what considerations were made is part of the journey.
@SimonJackson13 2 years ago
Love long time.
@ixion2001kx76 2 years ago
This raises a lot of possibilities. Basically, language models produce something that interacts with and evaluates text like a human, encoding a large part of human judgement. So this paper really says that anything needing human-like input can be done with a language model. Can court juries be automated by GPT-3, and in so doing be made more fair and consistent? Can copy editing and constructive criticism of writing be used to teach and check the quality of writing? Can it act like a speech writer, turning badly written text, or even a sketch of ideas, into better quality, more eloquent text? In modern warfare, the slowest part of an air strike is getting approval. Could this give legally accurate automated decisions on fire approval?
@andrewluo6088 2 years ago
More of these kinds of videos
@shengyaozhuang3748 2 years ago
I think this thing can serve as a meta-planner for other classic RL agents. Maybe it's easier to let other agents be conditioned on the meta-states generated by this.
@senadkurtisi2930 2 years ago
Is there a possibility of some subtle error in their translation system that they overlooked? By that I mean the way they translate the language model's embedding into verbs/objects.
@clementdato6328 2 years ago
It feels like much of the executability constraint that is repeatedly highlighted is just an artifact of the VirtualHome env. I feel this is more about how well the LLM does at planning, with the unsatisfactory fact that the evaluation of the plans is dwarfed by a rather limited env. To put it another way, the bottleneck on this “performance” comes largely from the “stupidity” of VirtualHome rather than from the limitations of LLMs themselves.
@Niels1234321 2 years ago
Instead of the translation model, could we not simply restrict the sampling step to tokens that correspond to an admissible action in the first language model?
@YannicKilcher 2 years ago
We discuss exactly this in the interview. First, it's not really possible with an API like GPT-3's, and second, it really hurts the output quality.
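For context on what that alternative would look like: with a locally hosted model one can mask the next-token logits so that only tokens from admissible action strings can be sampled, which is exactly the kind of access a text-only API does not expose. A rough sketch, assuming Hugging Face transformers with a stand-in gpt2 model and a toy whitelist (and, as noted above, this kind of hard constraint tends to hurt output quality):

# Sketch: greedy next-token choice restricted to a crude whitelist of tokens
# drawn from admissible action strings (illustrative, not the paper's method).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

allowed = ["open fridge", "grab milk", "walk to kitchen"]
allowed_ids = {i for a in allowed for i in tok(a)["input_ids"]}

def constrained_next_token(prompt: str) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    logits = model(ids).logits[0, -1]              # next-token logits
    mask = torch.full_like(logits, float("-inf"))  # forbid every token...
    mask[list(allowed_ids)] = 0.0                  # ...except whitelisted ones
    return tok.decode(int((logits + mask).argmax()))

print(constrained_next_token("Task: Make breakfast\nStep 1:"))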
@robottinkeracademy 2 years ago
Also, I would say that humans will learn to give the required commands, just as we do today with Alexa, Siri and Cortana.
@DamianReloaded 2 years ago
They should make the model play Maniac Mansion ^_^
@laurenpinschannels 2 years ago
Curious if any MTurkers who participated in this project will ever see this video.
@fiNitEarth 2 years ago
Maybe try "find sota" instead of sofa next time 🤓
@McSwey 2 years ago
more +author pls
@EricAboussouan 1 year ago
Why not use logit bias rather than doing this clunky weighted neighbors search?
@norik1616 2 years ago
I miss the full-depth reviews that poke holes in the paper. It feels like you are milder when you talk to the authors.
@G12GilbertProduction 2 years ago
Wenlong feels a bit stressed in this conceptual talk about the paper; maybe he evidently knows only half of the Codex language libraries he explains in the interview? :)
@carlotonydaristotile7420 2 years ago
Wenlong Huang is going to work for Tesla.
@oxiigen 2 years ago
Humans have smaller brains than some other animals, but our brains have a different geometry.
@teckyify 2 years ago
I hope we find something new soon; deep learning is starting to get old.
@doppelrutsch9540 2 years ago
As long as the test loss keeps going down, it isn't getting old yet.
@idrisabdi1397 2 years ago
Tesla Bot : 👀👀
@oxide9717 2 years ago
Was thinking the exact same thing. What even happened with Elon and OpenAI? I never hear him talk about it.
@SimonJackson13 2 years ago
Net spend dollar spunky?
@alexandrsoldiernetizen162 2 years ago
Not to belabor the obvious, but why have a robot smearing lotion, shaving and getting little cups of milk from the fridge? How about something you want a robot to really do, like turning bolts, welding, changing tires and digging babies out of slagged down reactor cores?
@drdca8263 2 years ago
The corpus GPT-3 was trained on presumably has more material that is informative about tasks like getting milk than it has about how to... well, I'm not sure a baby in a nuclear reactor would live long enough to be rescued regardless, but still, more than about those other things you mention. Note that here they aren't really training new models?
@alexandrsoldiernetizen162 2 years ago
@@drdca8263 I know its data cutoff was in 2019 so it doesnt know about the Kung Flu or St. Floyd of Fentanyl, but presumably knows about bolts and tires. The data reflects the bias of mechanical turk workers and the state of our effeminized and infantilized society more than anything.
@drdca8263 2 years ago
@@alexandrsoldiernetizen162 First off: politics is the mindkiller. Now that we’ve got that out of the way... Obviously it has some info about bolts and such, and, I would imagine that it probably has a great deal of technical information about bolts and such. But something important is not just whether something was in the training corpus at all, but the relative proportions. And, not just “is there enough information about bolts”, but “are there as many step by step instructions about doing things with bolts, in the style of a person trying to break down common tasks, as there are for more common human tasks?”. Also, this set of actions in the simulator is, I think, not designed by the authors of this paper. Also, the Mechanical Turk responses were just used for evaluation; they didn’t influence how the model worked. Duh? Oh, maybe you are thinking of the part about how the other people who designed the simulator came up with the word list? [edit: ah, you meant the choice of what tasks it is evaluated on, not what tasks it can give plans for] Though, really, the first line of my reply is the only one that is necessary.
@alexandrsoldiernetizen162 2 years ago
@@drdca8263 They said mechanical turk people came up with the tasks, hence were responsible for the enumerated scenarios. Presumably the model would have worked with other data, had it been given said data. dur. I presume you are more oriented to lubing with lotion than turning a wrench, so lets leave it there.
@drdca8263 2 years ago
@@alexandrsoldiernetizen162 I care neither for lotions nor wrenches. I prefer abstractions. Wait, you said “lubing” and “orientation”; was that you calling me gay? Heh. There are like, 2 people I’ve called gay, one was someone being super racist on twitter, and the other was someone on youtube who was insisting that the protagonist of “bee movie” was “trans” (from which I correctly inferred that the person making that claim was gay. They confirmed this in their response.) Anyway, you are presumably a bit trigger happy on that particular accusation. It seems you interpreted my objection to you inserting politics into things as disagreement with your politics? No, that isn’t the reason for my response to it. I do the same regardless of the partisanship in question. Partisanship latches on to people’s brains and makes them say... ...well, let’s just say it degrades the quality of what they say. I don’t mean that people shouldn’t have political opinions, or even a preferred political party. But thinking *too* much about opposing parties being bad will melt anyone’s brain, resulting in doing things like bringing it up in a youtube comment section above machine learning.
@444haluk 2 years ago
This is the most stupid method I have ever heard of.