Diffusion Models | PyTorch Implementation

77,848 views

Outlier

Diffusion models are generative models, just like GANs. In recent times many state-of-the-art works have been released that build on top of diffusion models, such as #dalle , #imagen or #stablediffusion . In this video I'm coding a PyTorch implementation of diffusion models in a very easy and straightforward way. First I show how to implement an unconditional version and train it. After that I explain two popular improvements for diffusion models: classifier-free guidance and exponential moving average. I implement both updates, train a conditional model on CIFAR-10, and then compare the different results.
Code: github.com/dome272/Diffusion-Models-pytorch
#diffusion #dalle2 #dalle #imagen #stablediffusion
00:00 Introduction
02:05 Recap
03:16 Diffusion Tools
07:22 UNet
13:07 Training Loop
15:44 Unconditional Results
16:05 Classifier Free Guidance
19:16 Exponential Moving Average
21:05 Conditional Results
21:51 Github Code & Outro
Further Reading:
1. Paper (Sohl-Dickstein et al., "Deep Unsupervised Learning using Nonequilibrium Thermodynamics"): arxiv.org/pdf/1503.03585.pdf
2. Paper (Ho et al., "Denoising Diffusion Probabilistic Models"): arxiv.org/pdf/2006.11239.pdf
3. Paper (Nichol & Dhariwal, "Improved Denoising Diffusion Probabilistic Models"): arxiv.org/pdf/2102.09672.pdf
4. Paper (Dhariwal & Nichol, "Diffusion Models Beat GANs on Image Synthesis"): arxiv.org/pdf/2105.05233.pdf
5. CFG (Ho & Salimans, "Classifier-Free Diffusion Guidance"): arxiv.org/pdf/2207.12598.pdf
6. Timestep Embedding: machinelearningmastery.com/a-...
Follow me on instagram lol: / dome271
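For readers who want the Diffusion Tools chapter (03:16) at a glance, here is a minimal sketch of the linear noise schedule and the closed-form noising step. It is an illustration only, not the repo verbatim: the hyperparameters (T = 1000, beta from 1e-4 to 0.02) follow the DDPM paper, and the variable names are assumptions.

```python
import torch

T = 1000
beta = torch.linspace(1e-4, 0.02, T)     # linear noise schedule (DDPM paper)
alpha = 1.0 - beta
alpha_hat = torch.cumprod(alpha, dim=0)  # cumulative product, the "alpha-bar" of the paper

def noise_images(x0, t):
    # Closed-form forward process: x_t = sqrt(a_hat_t)*x_0 + sqrt(1 - a_hat_t)*eps.
    # The same eps that creates x_t is returned as the training target.
    sqrt_ah = torch.sqrt(alpha_hat[t])[:, None, None, None]
    sqrt_omah = torch.sqrt(1.0 - alpha_hat[t])[:, None, None, None]
    eps = torch.randn_like(x0)
    return sqrt_ah * x0 + sqrt_omah * eps, eps
```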

Comments: 168
@outliier · 1 year ago
Link to the code: github.com/dome272/Diffusion-Models-pytorch
@bao-dai · 1 year ago
21:56 The way you starred your own repo makes my day bro 🤣🤣 really appreciate your work, just keep going!!
@outliier · 1 year ago
@@bao-dai xd
@leif1075 · 1 year ago
@@outliier Thanks for sharing, but how do you not get bored or tired of doing the same thing for so long and dealing with all the math?
@outliier · 1 year ago
@@leif1075 I love to do it. I don’t get bored
@ananpinya835 · 1 year ago
After seeing your next video, "Cross Attention | method and math explained", I would like to see a PyTorch implementation of ControlNet's OpenPose which controls the pose in images of dogs. Or, if that is too complicated, you could simplify it to controlling the shape of 2-3 branches of a tree.
@aladinwunderlampe7478 · 1 year ago
Hello, this has become a great video once again. We didn't understand much, but it's still nice to watch. Greetings from home, say Mum & Dad. ;-))))
@AICoffeeBreak · 1 year ago
Great, this video is finally out! Awesome coding explanation! 👏
@FLLCI · 1 year ago
This video is really timely and needed. Thanks for the implementation and keep up the good work!
@potisseslikitap7605 · 1 year ago
This channel seems to be growing very fast. Thanks for this amazing tutorial.🤩
@yingwei3436 · 1 year ago
Thank you so much for your detailed explanation of the code. It helped me a lot on my way to learning diffusion models. Wish there were more YouTubers like you!
@stevemurch3245 · 1 year ago
Incredible. Very thorough and clear. Very, very well done.
@terencelee6492 · 1 year ago
We chose diffusion models for part of our course project, and your videos saved me much time in understanding the concepts, letting me focus more on implementing the main part. I am really grateful for your contribution.
@javiersolisgarcia · 3 months ago
This video is crazy! I don't get tired of recommending it to anyone interested in diffusion models. I have recently started researching these types of models and I think of your video as a huge source of information and guidance on this topic. I find myself recurrently re-watching it to revise some information. Incredible work, we need more people like you!
@outliier · 3 months ago
Thank you so much for the kind words!
@Mandollr · 11 months ago
After my midterm week I want to study diffusion models with your videos, I'm so excited. Thanks a lot for the good explanation.
@manuelsebastianriosbeltran972 · 1 year ago
Congrats, this is a great channel!! Hope to see more of these videos in the future.
@mmouz2 · 7 months ago
Sincere gratitude for this tutorial, this has really helped me with my project. Please continue with such videos.
@subtainmalik5182 · 1 year ago
Most informative and easy-to-understand video on diffusion models on YouTube. Thanks, man!
@user-ch6nf8gs1h · 1 year ago
Great tutorial! Looking forward to seeing more of this! Keep it up!
@vinc6966 · 8 months ago
Amazing tutorial, very informative and clear, nice work!
@prabhavkaula9697 · 1 year ago
Thank you for sharing the implementation since authentic resources are rare
@947973 · 9 months ago
Very helpful walk-through. Thank you!
@user-fg4pr4ct6g · 9 months ago
Thanks, this implementation really helped clear things up.
@rewixx69420 · 1 year ago
I was waiting for so long; I finally learned about conditional diffusion models.
@NickSergievskiy · 1 year ago
Thank you. Best explanation with good DNN models
@talktovipin1 · 1 year ago
Looking forward to a video on classifier guidance as well. Thanks.
@gaggablagblag9997 · 6 months ago
Dude, you're amazing! Thanks for uploading this!
@Miurazzo · 1 year ago
Hi @Outlier, thank you for the awesome explanation! Just one observation: I believe line 50 of your code (at 19:10) should be: uncond_predicted_noise = model(x, t, None) 😁
@outliier · 1 year ago
Good catch, thank you. (It's correct in the GitHub code tho :))
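For readers following this thread, a minimal sketch of how the corrected line sits inside the classifier-free-guidance sampling step. The variable names (x, t, labels, cfg_scale) are assumptions based on the video, not the repo verbatim:

```python
# Inside the sampling loop:
predicted_noise = model(x, t, labels)            # conditional prediction
if cfg_scale > 0:
    uncond_predicted_noise = model(x, t, None)   # unconditional prediction (label dropped)
    # Interpolate from the unconditional prediction toward the conditional one:
    # eps = eps_uncond + s * (eps_cond - eps_uncond)
    predicted_noise = torch.lerp(uncond_predicted_noise, predicted_noise, cfg_scale)
```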
@yuhaowang9846 · 1 year ago
Thank you so much for sharing this, it was perfect!
@haoxu3204 · 4 months ago
The best video for diffusion! Very clear.
@qq-mf9pw · 6 months ago
Incredible explanation, thanks a lot!
@ethansmith7608 · 1 year ago
This is the most underrated channel I've ever seen, amazing explanation!
@outliier · 1 year ago
thank you so much!
@dylanwattles7303 · 4 months ago
nice demonstration, thanks for sharing
@chickenp7038 · 1 year ago
Great walkthrough. But where would I implement dynamic or static thresholding as described in the Imagen paper? Static thresholding clips all values larger than 1, but my model regularly outputs values as high as 5. Still, it creates images, and the loss decreases to 0.016 with SmoothL1Loss.
@talktovipin1 · 1 year ago
Very nicely explained. Thanks.
@LMonty-do9ud · 1 year ago
Thank you very much, it has solved my urgent need
@yazou3896 · 1 year ago
It's definitely cool and helpful! Thanks!!!
@DiogoSanti · 3 months ago
Very well done! Keep up the great content!!
@xuefengdu6926 · 1 year ago
thanks for your amazing efforts!
@nez2884 · 1 year ago
awesome implementation!
@junghunkim8467 · 1 year ago
it is very helpful!! You are a genius.. :) thank you!!
@user-so4vj6xh6j · 8 months ago
Thank you so much for this amazing video! You mention that the first DDPM paper shows the lower-bound formulation isn't strictly necessary - could you tell me the specific place in the paper? Thanks!
@anonymousperson9757 · 1 year ago
Thank you so much for this amazing video! You mention that changing the original DDPM to a conditional model should be as simple as adding in the condition at some point during training. I was just wondering if you have any experience with using DDPM to denoise images? I was planning on conditioning the model on the noisy input data by concatenating it to y_t during training. I am going to play around with your GitHub code and see if I can get something to work for denoising. Wish me luck!
@henrywong741 · 11 months ago
Could you please explain the paper "High-Resolution Image Synthesis with Latent Diffusion Models" and its implementation? Your explanations are exceptionally clear.
@rachelgardner1799 · 6 months ago
Fantastic video!
@spyrosmarkesinis443 · 1 year ago
Amazing stuff!
@user-mh8pl5wd1s · 1 year ago
Thank you for sharing!
@homataha5626 · 1 year ago
Thank you for the video. How can we use diffusion models for inpainting?
@kerenye955 · 11 months ago
Great video!
@TheAero · 8 months ago
These are my first few days of trying to understand diffusion models. Coding was kinda fun on this one. I will take a break for 1-2 months and study something related like GANs, VAEs, or even energy-based models, then come back with a more general understanding :) Thanks!
@zenchiassassin283 · 6 months ago
And transformers for the attention mechanisms + positional encoding
@TheAero · 6 months ago
@@zenchiassassin283 I got that snatched in the past 2 months. Gotta learn the math - what a distribution actually is, etc.
@orestispapanikolaou9798 · 1 year ago
Great video!! You make coding seem like playing super mario 😂😂
@sandravu1541 · 1 year ago
great video, you got one new subscriber
@Neptutron · 1 year ago
Thank you!!
@doctorshadow2482 · 8 months ago
Thank you for the review. So, what is the key step to get from a text description to an image? Can you please pinpoint where it is explained?
@gordondou2286 · 9 months ago
Can you please explain how to use the WoodFisher technique to approximate second-order gradients? Thanks.
@jingweili1867 · 1 year ago
Nice tutorial
@houbenbub · 1 year ago
This is GOLD
@mcpow6614 · 1 year ago
Can you do one for TensorFlow too? Btw, very good explanation.
@marcotommasini5600 · 1 year ago
Great video, thanks for making it. I started working with diffusion models very recently and used your implementation as the base for my model. I am currently facing a problem where the MSE loss starts very close to 1 and stays there, varying between 1.0002 and 1.0004, so the model is not training properly. Did you face any issue like this? I am using the MNIST dataset to train the network, since I wanted to first test it on a less complex dataset.
@justinsong3506 · 1 year ago
I am facing similar problems. I ran the experiment on the CIFAR-10 dataset. The MSE loss starts decreasing normally, but at some point the loss increases to 1 and never decreases again.
@luchaoqi · 1 year ago
Awesome! How did you type Ɛ in code?
@remmaria · 1 year ago
Your videos are a blessing. Thank you very much!!! Have you tried using DDIM to accelerate predictions? Or any other idea to decrease the number of steps needed?
@outliier · 1 year ago
I have not tried any speedups in any way. But feel free to try it out and tell me / us what works best. In the repo I linked a fork which implements a couple of additions that make training etc. faster. You can check that out here: github.com/tcapelle/Diffusion-Models-pytorch
@remmaria · 1 year ago
@@outliier Thank you! I will try it for sure.
@ovrava · 1 year ago
Great video! On what data did you train your model again?
@chemaguerra1635 · 1 year ago
This video is priceless.
@scotth.hawley1560 · 3 months ago
Wonderful video! I notice that at 18:50, the equation for the new noise seems to differ from Eq. 6 in the CFG paper, as if the unconditioned and conditioned epsilons are reversed. Can you comment on that?
@signitureDGK · 4 months ago
Very cool. How would DDIM models be different? Do they use a deterministic denoising sampler?
@outliier · 4 months ago
yes indeed
@satpalsinghrathore2665 · 1 year ago
Super cool
@SeonhoonKim · 1 year ago
Hello, thanks a lot for your contribution! But I'm a bit confused: at 06:04, a sample drawn from N(0, 1) totally at random would not have any "trace" of an image. How can the model infer an image from totally random noise?
@outliier · 1 year ago
Hey there, that is sort of the "magic" of diffusion models, which is hard to wrap your mind around. But since the model is trained to see noise levels between 0% and 100%, it sees full noise during training and is trained to denoise it. And usually, when you provide conditioning to the model, such as class labels or text information, the model has more information than just random noise. But still, unconditional training works too.
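To make the "0% to 100% noise" point concrete, here is a hedged sketch of one training step as described in the Training Loop chapter (13:07). `images`, `model` and `optimizer` are assumed to exist, and `noise_images` and `T` are the helpers sketched under the video description above:

```python
import torch
import torch.nn.functional as F

t = torch.randint(low=1, high=T, size=(images.shape[0],))  # a random noise level per image
x_t, noise = noise_images(images, t)                       # anywhere from nearly clean to pure noise
predicted_noise = model(x_t, t)                            # the UNet predicts the injected eps
loss = F.mse_loss(noise, predicted_noise)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```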
@susdoge3767 · 3 months ago
I'm having a hard time understanding the mathematical and code aspects of diffusion models, although I have a good high-level understanding... any good resources I can go through? I'd appreciate it.
@orangethemeow · 1 year ago
Like your channel, please make more videos
@jinhengfeng6440 · 1 year ago
terrific!
@maybritt-sch · 1 year ago
Great videos on diffusion models, very understandable explanations! For how many hours did you train it? I tried adjusting your conditional model and training it on a different dataset, but it seems to take forever :D
@outliier · 1 year ago
Yeah, it took quite long. On the 3090 it trained for a couple of days (2-4 days, I believe).
@maybritt-sch · 1 year ago
@@outliier Thanks for the feedback. Ok, seems like I didn't make a mistake, I just need more patience!
@outliier · 1 year ago
@@maybritt-sch Yea. Let me know how it goes or if you need help
@janevirahman9904 · 3 months ago
Hi, I want to use a single underwater image dataset. What changes do I have to make to the code?
@ravishankar2180 · 1 year ago
Can you do text-to-image on a small dataset, similar to SD, from scratch?
@agiengineer · 1 year ago
Can you please tell me how much time was needed to train these 3000 images for 500 epochs?
@colintsang-ww6mz · 1 year ago
Thank you very much for this very easy-to-understand implementation. I have one question: I don't understand the function noise_images. Assume we have img_{0}, img_{1}, ..., img_{T}, obtained by adding noise iteratively. I understand that img_{t} is given by the formula "sqrt_alpha_hat * img_{0} + sqrt_one_minus_alpha_hat * Ɛ". However, I don't understand the function "def noise_images(self, x, t)" in [ddpm.py]. It returns Ɛ, where Ɛ = torch.randn_like(x). So this is just a noise signal drawn directly from the normal distribution. I suppose this random noise is not related to the input image? After all, randn_like() returns a tensor with the same size as input x, filled with random numbers from a normal distribution with mean 0 and variance 1. In training, the predicted noise is compared to this Ɛ (line 80 in [ddpm.py]). Why are we predicting this random noise? Shouldn't we predict the noise added at time t, i.e. "img_{t} - img_{t-1}"?
@Laszer271 · 1 year ago
I had the same misconception before. It was actually explained on the "AI Coffee Break with Letitia" channel in a video titled "How does Stable Diffusion work? - Latent Diffusion Models EXPLAINED". Basically, the model tries to predict the WHOLE noise added to the image, i.e. to go from the noised image to a fully denoised image in ONE STEP. Because that's a hard task, the model does not excel at it, so at inference we denoise iteratively, each time subtracting only a small fraction of the noise predicted by the model. That way the model produces much better quality samples. At least that's how I understood it :P
@rikki146 · 11 months ago
@@Laszer271 While I understand it predicts the "whole noise", this "whole noise" is newly generated, and I suppose the ground truth is (img_{t} - img_{0})... still can't wrap my head around it.
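A small sanity check that may help with this thread (a sketch under the same assumed names as the earlier snippets, not repo code): the target Ɛ is exactly the total noise separating x_t from a scaled x_0 and is recoverable in closed form, so predicting Ɛ amounts to predicting all the noise in one step, not the per-step difference img_{t} - img_{t-1}:

```python
eps = torch.randn_like(x0)                       # freshly drawn, and also the ground truth
sa = alpha_hat[t].sqrt()[:, None, None, None]
so = (1 - alpha_hat[t]).sqrt()[:, None, None, None]
x_t = sa * x0 + so * eps                         # closed-form forward process
eps_back = (x_t - sa * x0) / so                  # Ɛ recovered from (x_t, x_0)
assert torch.allclose(eps, eps_back, atol=1e-5)
```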
@pedrambazrafshan9598 · 5 months ago
@outliier Do you think there is a way to run the code with a 3060 GPU on a personal desktop? I get the error message: CUDA out of memory.
@wizzy1996pl · 7 months ago
The last self-attention layer (64, 64) changes my training time from 5 minutes to hours per epoch, do you know why? Training on a single RTX 3060 Ti GPU.
@ankanderia4999 · 1 month ago
`x = torch.randn((n, 3, self.img_size, self.img_size)).to(self.device)` then `predicted_noise = model(x, t)` - in the Diffusion class, why do you create noise and pass that noise into the model to predict noise? Please explain.
@Kooshiar · 9 months ago
best diffusion youtube
@sweetautumnfox · 2 months ago
With this training method, wouldn't there be a possibility of some timesteps not being trained on in an epoch? Wouldn't it be better to shuffle the whole list of timesteps and then sample sequentially with every batch?
@zedtarwu3074 · 1 year ago
Great video! How long did it take to train the models?
@outliier · 1 year ago
About 3-4 days on an RTX 3090.
@kashishmathukiya8091 · 8 months ago
8:38 In the UNet section, how do you decide on the number of channels to set for the input and output of the Down and Up classes? Why 64, 128, etc.?
@outliier · 8 months ago
People just go with powers of 2 usually. And usually you go to more channels in the deeper layers of the network.
@kashishmathukiya8091 · 8 months ago
@@outliier Oh okay, got it. Thank you so much for clearing that up, and for the video! I had seen so many videos / read so many articles on diffusion, but yours were the best and explained everything that others treated as prerequisites!! Separating the paper explanation from the implementation was really helpful.
@Naira-ny9zc · 1 year ago
Thank you... You just made diffusion so easy to understand... I would like to ask: what changes do I need to make in order to use an image as the condition rather than a label? I mean, how do I load the ground truth from the GT repository as the label (y)?
@outliier · 1 year ago
Depends on your task. Could you specify what you want to achieve? Super resolution? Img2Img?
@Naira-ny9zc · 1 year ago
@@outliier I want to generate thermal IR images conditioned on their respective RGB images. I know that to achieve this image-to-image (RGB to thermal IR) translation, I have to concatenate the U-Net's input (which of course is the noised thermal image) with its corresponding RGB condition image and feed the concatenated result to the U-Net. The problem is that I am not able to put this all together in the code - especially pairing each RGB condition image from the RGB folder with its corresponding noised thermal image so that I can pass the concatenated result to the U-Net - since my aim is to generate RGB-conditioned thermal images using diffusion.
@nomaannafi7561 · 6 months ago
How can I increase the size of the generated image here?
@andonso · 7 months ago
How can I increase the image size to 128 pixels square?
@Soso65929 · 2 months ago
So the process of adding noise and removing it happens in a loop?
@user-hb5le6qt8t · 7 months ago
Has anyone found the DDPM UNet architecture figure? I can't find it.
@mihailchirobocea2751 · 1 year ago
Your content is the best in the field of AI! I love it. I tried to use this code on my own dataset and found that even after 100 epochs the model generates blocky images, and I don't understand why. Is this a bug, or is it normal and does it get fixed with longer training? I'm thinking the output of the model, i.e. the predicted noise, is not always in the same range as the noise (which seems to be in (0, 1), as we are returning something with torch.rand_like), and this might produce some values outside the final 0-255 image range. I'm thinking of adding a sigmoid activation to the last layer to clip the data to (0, 1). Do you have any thoughts on that? I also tried the cosine schedule, though the results at epoch 80 were still not that complex, and they looked worse than those generated at epoch 40 by the linear schedule. I noticed the loss was also higher, so my thought is that this task is more difficult and needs a lot more training. Is this also the reason why you chose the linear schedule? (About the code: I just replaced the prepare_noise_schedule function with the one provided by OpenAI, not 100% sure if this approach is correct.)
@outliier · 1 year ago
Hey there. First of all, thank you so much for the nice words. Can I ask you to open an issue on GitHub and upload some images of what you are experiencing? I'll take a look then, and maybe others will too :)
@andrewluo6088 · 1 year ago
The best
@muhammadawais2173 · 6 months ago
Thanks for the easiest implementation. Could you please tell us how to compute the FID and IS scores for these images?
@outliier · 6 months ago
I think you would just sample 10-50k images from the trained model and then take 10-50k images from the original dataset and then calculate the FID and IS
@muhammadawais2173 · 6 months ago
@@outliier thanks
@FLLCI · 1 year ago
lol training this on my ONLY "RTX 3090" :D
@mic9657 · 1 year ago
Great video. Can you please list the creators of the other helpful videos at 00:52? Thanks.
@outliier · 1 year ago
They are from Yannic Kilcher (on the right side); the one in the lower left is from AICoffeeBreak; the one in the top right corner is the first video that comes up when you google "diffusion models explained"; and I forgot the middle one, sorry. But it shouldn't be hard to find.
@versusFliQq · 1 year ago
Really nice video! I also enjoyed your explanation video - great work in general :) However, I noticed at around 5:38 that you define sample_timesteps with low=1. I am pretty sure this is wrong: since Python indexes from 0, you skip the first noising step every time you access alpha, alpha_cumprod, etc. Correct me if I am wrong, but all the other implementations also use zero-indexing.
@arpanpoudel · 11 months ago
This function samples the timesteps for the denoising step. Timestep t = 0 corresponds to the original image itself, so there is no point in sampling it.
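For reference, a sketch of the function under discussion as it appears to work; the names and default T are assumptions, not the repo verbatim:

```python
import torch

def sample_timesteps(n, T=1000):
    # low=1 because t = 0 would mean "no noise added": x_0 is the clean
    # image, so there is nothing for the network to predict there.
    return torch.randint(low=1, high=T, size=(n,))
```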
@konradkaranowski6553 · 1 year ago
6:57 Why is the formula ... + torch.sqrt(beta) instead of the calculated posterior variance as in the paper?
@outliier · 1 year ago
Which paper are you referring to? In the first paper you just set the variance to beta, and since you add std * noise, you take sqrt(beta).
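To spell that out, a hedged sketch of one reverse step (DDPM, Algorithm 2) with the simpler variance choice discussed here. Ho et al. report that both sigma_t^2 = beta_t (hence sqrt(beta)) and the posterior variance beta_tilde_t = (1 - alpha_hat_{t-1}) / (1 - alpha_hat_t) * beta_t work in practice. Names follow the earlier sketches:

```python
# x: current sample, t: current integer timestep, predicted_noise: UNet output
noise = torch.randn_like(x) if t > 1 else torch.zeros_like(x)
x = (1.0 / alpha[t].sqrt()) * (
        x - ((1 - alpha[t]) / (1 - alpha_hat[t]).sqrt()) * predicted_noise
    ) + beta[t].sqrt() * noise
```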
@gabrielchan3255 · 1 year ago
Thx Mr gigachad
@UnbelievableRam · 1 month ago
Hi! Can you please explain why the output is getting two stitched images?
@outliier · 1 month ago
What do you mean by two stitched images?
@rajroy2426 · 11 months ago
Can we use diffusion in a neural architecture for classification?
@outliier · 11 months ago
I think people have done that too, but I don't remember the papers. Maybe just search for "diffusion models for classification".
@rajroy2426 · 11 months ago
@@outliier thanks, I figured it out myself.
@khangvutien2538 · 3 months ago
People in Earth observation know that images from synthetic aperture radar have random speckle. People have tried removing the speckle using wavelets. I wonder how denoising diffusion would fare. The difficulty I see is the need for x0, the un-noised image. What do you think?
@noushineftekhari4211 · 7 months ago
Hi, thank you for the video! Can you please explain the test part:
n = 4
device = "cpu"
model = UNet_conditional(num_classes=4).to(device)
ckpt = torch.load(r"C:\Users oueft\Downloads\Diffusion-Models-pytorch-V7\models\DDPM_conditional\ckpt.pt", map_location=torch.device('cpu'))
model.load_state_dict(ckpt)
diffusion = Diffusion(img_size=64, device=device)
y = torch.Tensor([6] * n).long().to(device)
x = diffusion.sample(model, n, y)
plot_images(x)
What is n, and why did the following error come up when I ran it?
File "ddpm_conditional.py", line 81, in sample
    n = len(labels)
TypeError: object of type 'int' has no len()
@bendev6807 · 1 year ago
How do you use these models to generate images from text?
@outliier · 1 year ago
You would need to train it on text-image pairs instead of label-image pairs as in the video. And you would need to scale up the model and dataset size to get some nice results
@ethaneaston6443 · 1 year ago
Thanks for your amazing sharing. I have a problem in the "forward" function of the "DoubleConv" class: return F.gelu(x + self.double_conv(x)) causes an error. Here the shape of x is (batchsize × 3 × 64 × 64) and the shape of self.double_conv(x) is (batchsize × 64 × 64 × 64); adding them together results in a PyTorch error.
@outliier · 1 year ago
Hmm, this is weird. What exactly did you run? Could you open an issue on GitHub? What if you run the code from here onwards (without the launch())? github.com/dome272/Diffusion-Models-pytorch/blob/main/ddpm_conditional.py#L129
@ethaneaston6443 · 1 year ago
@@outliier I have successfully run your code (very, very thankful for your sharing!!! 😆), but in debug mode I found a small problem, as described above: it should report an error here, yet at runtime it's fine. I think the code here should be return F.gelu(self.double_conv(x)). Could you please explain why you add x here? Furthermore, do you think we could add another residual block to the UNet structure? What is your opinion?
@mathkernel5136 · 1 year ago
Hi, do you know why the code gives this error when I increase the image size from 64 to 512?
Traceback (most recent call last):
  File "/home/domainHomes/aokolie/Desktop/Augustine Workspace/SWAG PROJECT/Diffusion models/Diffusion-Models-pytorch/ddpm.py", line 118, in <module>
    launch()
  File "/home/domainHomes/aokolie/Desktop/Augustine Workspace/SWAG PROJECT/Diffusion models/Diffusion-Models-pytorch/ddpm.py", line 114, in launch
    train(args)
  File "/home/domainHomes/aokolie/Desktop/Augustine Workspace/SWAG PROJECT/Diffusion models/Diffusion-Models-pytorch/ddpm.py", line 85, in train
    predicted_noise = model(x_t, t)
  File "/home/domainHomes/aokolie/enter/envs/torch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/domainHomes/aokolie/Desktop/Augustine Workspace/SWAG PROJECT/Diffusion models/Diffusion-Models-pytorch/modules.py", line 181, in forward
    x3 = self.down2(x2, t)
  File "/home/domainHomes/aokolie/enter/envs/torch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/domainHomes/aokolie/Desktop/Augustine Workspace/SWAG PROJECT/Diffusion models/Diffusion-Models-pytorch/modules.py", line 107, in forward
    return x + emb
RuntimeError: The size of tensor a (192) must match the size of tensor b (3) at non-singleton dimension 0
@outliier · 1 year ago
Yeah. Can you please look at the GitHub issues? Other people had that problem too, and we discussed solutions there.
@mathkernel5136 · 1 year ago
Hey, I am getting an error when I try to use one channel: "RuntimeError: Given groups=1, weight of size [64, 1, 3, 3], expected input[4, 3, 64, 64] to have 1 channels, but got 3 channels instead". What can I do?
@outliier · 1 year ago
You need to change the input and output channels in the UNet code.
@Gruell · 1 month ago
Sorry if I am misunderstanding, but at 19:10, shouldn't the code be: "uncond_predicted_noise = model(x, t, None)" instead of "uncond_predicted_noise = model(x, labels, None)" Also, according to the CFG paper's formula, shouldn't the next line be: "predicted_noise = torch.lerp(predicted_noise, uncond_predicted_noise, -cfg_scale)" under the definition of lerp? One last question: have you tried using L1Loss instead of MSELoss? On my implementation, L1 Loss performs much better (although my implementation is different than yours). I know the ELBO term expands to essentially an MSE term wrt predicted noise, so I am confused as to why L1 Loss performs better for my model. Thank you for your time.
@Gruell · 1 month ago
Great videos by the way
@Gruell · 1 month ago
Ah, I see you already fixed the first question in the codebase
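For anyone else puzzling over the lerp direction in this thread: torch.lerp(a, b, w) computes a + w * (b - a), so lerp(eps_uncond, eps_cond, s) equals (1 - s) * eps_uncond + s * eps_cond, which matches the CFG paper's guided-noise equation (Eq. 6, as cited above), (1 + w) * eps_cond - w * eps_uncond, when s = 1 + w. A tiny self-contained check, with all names hypothetical:

```python
import torch

e_u, e_c = torch.randn(4), torch.randn(4)  # unconditional / conditional predictions
s = 3.0                                    # guidance scale, i.e. w = s - 1 = 2
lhs = torch.lerp(e_u, e_c, s)              # e_u + s * (e_c - e_u)
rhs = (1 + 2.0) * e_c - 2.0 * e_u          # CFG paper Eq. 6 with w = 2
assert torch.allclose(lhs, rhs)
```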
@egoistChelly · 6 months ago
I think your code breaks when adjusting image_size?
@NoahElRhandour · 1 year ago
pog!!!