ECCV 2024: Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models

Voxel51

In this talk, I will introduce our recent work on open-vocabulary 3D semantic understanding. We propose a novel method, Diff2Scene, which leverages frozen representations from text-to-image generative models for open-vocabulary 3D semantic segmentation and visual grounding. Diff2Scene requires no labeled 3D data and effectively identifies objects, appearances, locations, and their compositions in 3D scenes.
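The description does not detail how labels are assigned, but open-vocabulary segmentation is commonly framed as matching each 3D point's feature vector against text embeddings of arbitrary class names. The following is a minimal, generic sketch of that matching step under stated assumptions, not Diff2Scene's actual pipeline: `cosine`, `open_vocab_labels`, and the toy vectors are hypothetical, and in practice the point features would come from a frozen pretrained model rather than being written by hand.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def open_vocab_labels(
    point_feats: List[Sequence[float]],
    text_embeds: Dict[str, Sequence[float]],
) -> List[str]:
    """Give each point the free-form label whose text embedding is most similar.

    Hypothetical helper: the label set is supplied at query time, so any
    class name with an embedding can be used without retraining.
    """
    labels = []
    for feat in point_feats:
        best = max(text_embeds, key=lambda name: cosine(feat, text_embeds[name]))
        labels.append(best)
    return labels


# Toy 3-D vectors standing in for real per-point features and text embeddings
# (hypothetical values chosen for illustration only).
points = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.2], [0.0, 0.2, 0.95]]
queries = {"chair": [1.0, 0.0, 0.0], "table": [0.0, 1.0, 0.0], "lamp": [0.0, 0.0, 1.0]}
print(open_vocab_labels(points, queries))  # → ['chair', 'table', 'lamp']
```

Because the vocabulary is just a dictionary passed in at inference time, swapping in new class names changes the output without any 3D labels, which is the sense in which the vocabulary is "open".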
ECCV 2024 Paper: Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models
arxiv.org/abs/...
About the Speaker
Xiaoyu Zhu is a Ph.D. student at the Language Technologies Institute, School of Computer Science, Carnegie Mellon University. Her research interests are computer vision, multimodal learning, and generative models.