Miika Aittala: Elucidating the Design Space of Diffusion-Based Generative Models

  7,432 views

Finnish Center for Artificial Intelligence FCAI

7 months ago

Abstract: We argue that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seek to remedy the situation by presenting a design space that clearly separates the concrete design choices. This lets us identify several changes to both the sampling and training processes, as well as preconditioning of the score networks. Together, our improvements yield new state-of-the-art FID of 1.79 for CIFAR-10 in a class-conditional setting and 1.97 in an unconditional setting, with much faster sampling (35 network evaluations per image) than prior designs. To further demonstrate their modular nature, we show that our design changes dramatically improve both the efficiency and quality obtainable with pre-trained score networks from previous work, including improving the FID of a previously trained ImageNet-64 model from 2.07 to near-SOTA 1.55, and after re-training with our proposed improvements to a new SOTA of 1.36.
The paper was presented at NeurIPS 2022, where it received an Outstanding Paper Award. Joint work with Tero Karras, Timo Aila and Samuli Laine.
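The abstract's figure of 35 network evaluations per image can be illustrated with a small sketch. It assumes a second-order Heun sampler run for 18 steps, where every step but the last adds one correction evaluation (2 x 18 - 1 = 35). The step count, the schedule constants (`sigma_min=0.002`, `sigma_max=80`, `rho=7`), and the helper names `edm_sigmas` and `heun_sample` are assumptions for illustration and are not stated in the abstract. The denoiser is a toy stand-in for a trained network.

```python
import numpy as np

def edm_sigmas(n_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Noise levels from sigma_max down to sigma_min, followed by a final 0.
    The constants are assumed defaults, not taken from the abstract."""
    i = np.arange(n_steps)
    inv = 1.0 / rho
    sig = (sigma_max**inv + i / (n_steps - 1) * (sigma_min**inv - sigma_max**inv)) ** rho
    return np.append(sig, 0.0)

def heun_sample(denoise, x, sigmas):
    """Integrate dx/dsigma = (x - D(x, sigma)) / sigma with Heun's method,
    counting how many times the denoiser is called."""
    evals = 0
    for t_cur, t_next in zip(sigmas[:-1], sigmas[1:]):
        d_cur = (x - denoise(x, t_cur)) / t_cur
        evals += 1
        x_next = x + (t_next - t_cur) * d_cur          # Euler predictor
        if t_next > 0:                                 # no correction on the last step
            d_next = (x_next - denoise(x_next, t_next)) / t_next
            evals += 1
            x_next = x + (t_next - t_cur) * 0.5 * (d_cur + d_next)
        x = x_next
    return x, evals

# Toy denoiser: if the data distribution is a single point `target`,
# the ideal denoiser returns that point at every noise level.
target = np.array([1.0, -2.0, 0.5])
def toy_denoise(x, sigma):
    return target

sigmas = edm_sigmas(18)
rng = np.random.default_rng(0)
x0 = rng.standard_normal(target.shape) * sigmas[0]   # start from pure noise
sample, evals = heun_sample(toy_denoise, x0, sigmas)
print(evals)
```

With a real trained denoiser in place of `toy_denoise`, the same loop structure would apply; only the call count, not the sample quality, is demonstrated here.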
Bio: Miika Aittala is a Senior Research Scientist at NVIDIA Research, which he joined in 2019. He received his PhD in 2016 from Aalto University, working on capture and rendering of surface material appearance. Prior to his current position, he worked as a postdoctoral researcher at MIT CSAIL and visited Inria Sophia Antipolis. His research interests include neural generative modeling and image processing, and realistic image synthesis in computer graphics.

Comments: 6
@susdoge3767 1 month ago
this video is criminally underrated, thanks for your insights!
@user-uu5ml5dc6n 1 month ago
Really clear explanation of how the diffusion network works! Thanks
@luke2642 4 months ago
Fantastic talk, and a really good paper. Over 400 citations in two years... a lot to sift through! Would love to see a follow-up! I need to dive into the guidance next. This just leaves me wondering why we don't use CLIP to guide the diffusion of a low-resolution semantic map first, one that easily captures the structure, meaning, and long-range patterns, maybe even a coarse depth map too, and then use that to guide the diffusion of a latent, which then gets intelligently upscaled, guided by the semantics.
@fyremaelstrom2896 4 months ago
Excellent elucidation.
@zbaker0071 4 months ago
Intelligent insights.
@nickjordan6360 5 months ago
Nice talk