Lesson 9A 2022 - Stable Diffusion deep dive

30,490 views

Jeremy Howard


Johno shows us what is happening behind the scenes when we create an image with Stable Diffusion, looking at the different components and processes and how each can be modified for further control over the generation process. The notebook is available in this repository: github.com/fastai/diffusion-nbs
This was made as a companion to lesson 9 of the FastAI 2022 course by Jonathan Whitaker (his channel: kzbin.info/door/P6gT9X2...).
00:00 - Introduction
00:40 - Replicating the sampling loop
01:17 - The Auto-Encoder
03:55 - Adding Noise and image-to-image
08:43 - The Text Encoding Process
15:15 - Textual Inversion
18:36 - The UNET and classifier free guidance
24:41 - Sampling explanation
36:30 - Additional guidance
Errata: there should be some scaling done to the model inputs for the unet demo in cell 49 (19 minutes in) - see scheduler.scale_model_input in all the loops for the code that is missing. And in the autoencoder part the 'compression' isn't exactly 64 times since there are 4 channels in the latent representation and only 3 in the input.
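The two errata points can be sketched numerically. This is a minimal illustration rather than the notebook's code: the 512x512 image size and 4x64x64 latent shape are assumed for the example, and `scale_model_input_sketch` is a hypothetical stand-in that assumes a sigma-based scheduler which divides its input by sqrt(sigma^2 + 1); check `scheduler.scale_model_input` in the notebook's loops for the real call.

```python
import math

# Assumed shapes: a 512x512 RGB image and the latent the autoencoder produces for it.
IMAGE_SHAPE = (3, 512, 512)   # channels, height, width
LATENT_SHAPE = (4, 64, 64)    # 4 latent channels, each side 8x smaller


def num_values(shape):
    """Total number of scalar values in a tensor of this shape."""
    return math.prod(shape)


# Counting only spatial positions gives the "64 times" figure.
spatial_ratio = (IMAGE_SHAPE[1] * IMAGE_SHAPE[2]) / (LATENT_SHAPE[1] * LATENT_SHAPE[2])

# Counting every value (channels included) gives the actual reduction.
value_ratio = num_values(IMAGE_SHAPE) / num_values(LATENT_SHAPE)


def scale_model_input_sketch(latents, sigma):
    """Hypothetical stand-in for the missing scaling step.

    Assumes the scheduler scales the UNet input by 1 / sqrt(sigma^2 + 1).
    """
    factor = math.sqrt(sigma ** 2 + 1)
    return [x / factor for x in latents]


print(spatial_ratio)  # 64.0
print(value_ratio)    # 48.0
```

Under these assumptions the latent holds 48 times fewer values than the image, not 64, and the scaling step leaves the latents unchanged only when sigma is 0.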

Comments: 15
@markhopkins8731 1 year ago
Love your simple explanation of a manifold Jonathan. It's the first time it's made sense to me. Looking forward to the coming lectures.
@al3030 1 year ago
Thank you for this deep dive. The sampling explanation especially was helpful to try to get an intuition for what the model does.
@timandersen8030 1 year ago
Appreciate this supplemental deep dive into code of stable diffusion!
@spider853 1 year ago
I finally understand the schedulers! Thank you!
@saidmoglu 1 year ago
pretty good video to further understand SD!
@climez 1 year ago
This is useful but I wish you went into more detail here and there. Is some CLIP or similar model included in the stable diffusion implementation? If so, are precomputed weights of the CLIP model used to calculate noise_prediction in each step? I.e. we pass the current noisy image (in a latent space) and the text embedding to CLIP and then calculate the gradient for each voxel of the image so that something (semantic similarity?) is maximized? I wish you would say what happens during training of the model and what then happens during inference :).
@alexrichmonkey7845 1 year ago
Please explain the ancestral samplers.
@adityagupta-hm2vs 9 days ago
Also, are we using the latent space as gradients here, since we are subtracting gradients from the latents, which we typically do from weights in a conventional NN?
@jaivalani4609 1 year ago
How can it perform a custom action? Basically, how can we fine-tune it for our input and the target image we want, given our text action?
@JohnSmith-he5xg 1 year ago
Why do you "sample()" from the latents? Does this mean the latents are not the same between runs?
@adityagupta-hm2vs 9 days ago
How do we decide the scaling factor in the VAE part, i.e. 0.18215? Any hint on how to decide it? I did try changing it and could see different output, but what's a good way to choose?
@AM-yk5yd 1 year ago
I'm surprised how... complexity(?) ramped up. It's the second day and I'm only on the 4th minute; I spent 30 minutes debugging my coding-along session (I wrote rand_like instead of randn_like and my parrot photo went green instead of garbled).
@offchan 1 year ago
rand is uniform whereas randn is normal (Gaussian)
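The difference behind the rand_like / randn_like mix-up in this thread can be sketched with the standard library alone. This is an illustrative stand-in, not the notebook's code: `random.random` plays the role of `torch.rand_like` (uniform on [0, 1)) and `random.gauss(0, 1)` the role of `torch.randn_like` (standard normal); the seed and sample count are arbitrary choices for the example.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
N = 10_000

# Uniform noise: every value lies in [0, 1), so the mean is about 0.5.
uniform_noise = [rng.random() for _ in range(N)]

# Gaussian noise: centred on 0 with unit variance, so about half the values are negative.
normal_noise = [rng.gauss(0.0, 1.0) for _ in range(N)]

uniform_mean = statistics.fmean(uniform_noise)
normal_mean = statistics.fmean(normal_noise)
normal_std = statistics.pstdev(normal_noise)

print(f"uniform: min={min(uniform_noise):.3f} mean={uniform_mean:.3f}")
print(f"normal:  min={min(normal_noise):.3f} mean={normal_mean:.3f} std={normal_std:.3f}")
```

Uniform noise is never negative, so adding it shifts every value upward, whereas zero-mean Gaussian noise perturbs values in both directions, which is what the noising step in a diffusion process expects.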
@howardjeremyp 1 year ago
Feel free to skip over lessons 9A and 9B if you don't feel ready for them just yet - they're optional extras for those looking to dig deeper.
@DinoFancellu 1 month ago
Don't like all this jumping around. Would be much easier to simply go through it, in a linear fashion, explaining as you go. Disappointing