[CMU VASC Seminar] Foundation Models for Robotic Manipulation: Opportunities and Challenges

Abstract:
Foundation models, such as GPT-4 Vision, have marked significant achievements in the fields of natural language and vision, demonstrating exceptional abilities to adapt to new tasks and scenarios. However, physical interaction (such as cooking, cleaning, or caregiving) remains a frontier where foundation models and robotic systems have yet to achieve the desired level of adaptability and generalization. In this talk, I will discuss the opportunities for incorporating foundation models into classic robotic pipelines to endow robots with capabilities beyond those achievable with traditional robotic tools. The talk will focus on three key improvements: (1) task specification, (2) low-level scene modeling, and (3) high-level scene modeling. The core idea behind this series of research is to introduce novel representations and integrate structural priors into robot learning systems, incorporating the commonsense knowledge learned from foundation models to achieve the best of both worlds. I will demonstrate how such integration allows robots to interpret instructions given in free-form natural language and perform few- or zero-shot generalization for challenging manipulation tasks. Additionally, we will explore how foundation models can enable category-level generalization for free and how this can be augmented with an action-conditioned scene graph for a wide range of real-world manipulation tasks involving rigid, articulated, nested (e.g., Matryoshka dolls), and deformable objects. Towards the end of the talk, I will discuss challenges that still lie ahead and potential avenues to address them.
Bio:
Yunzhu Li is an Assistant Professor of Computer Science at the University of Illinois Urbana-Champaign (UIUC). Before joining UIUC, he collaborated with Fei-Fei Li and Jiajun Wu during his postdoc at Stanford. Yunzhu earned his PhD from MIT under the guidance of Antonio Torralba and Russ Tedrake. His work stands at the intersection of robotics, computer vision, and machine learning, with the goal of helping robots perceive and interact with the physical world as dexterously and effectively as humans do. Yunzhu’s work has been recognized with the Best Systems Paper Award and as a Finalist for the Best Paper Award at the Conference on Robot Learning (CoRL). Yunzhu is also a recipient of the Adobe Research Fellowship and was selected as the First Place Recipient of the Ernst A. Guillemin Master’s Thesis Award in Artificial Intelligence and Decision Making at MIT. His research has been published in top journals and conferences, including Nature, NeurIPS, CVPR, and RSS, and featured by major media outlets, including CNN, BBC, The Wall Street Journal, Forbes, The Economist, and MIT Technology Review.
Homepage: yunzhuli.githu...