Ultimate Guide to Diffusion Models | ML Coding Series | Denoising Diffusion Probabilistic Models

40,923 views

Aleksa Gordić - The AI Epiphany

A day ago

Comments: 73
@TheAIEpiphany 2 years ago
Time to cover diffusion models in greater depth! Do let me know how you like this combination of papers + coding!
@prabhavkaula9697 2 years ago
Thank you so much for uploading the tutorial. Good resources on diffusion models are such a rarity.
@prabhavkaula9697 2 years ago
13:49 I too am okay with the mathematics and the proofs, but I wanted to know why it works.
@prabhavkaula9697 2 years ago
It would be great if you could share the code!
@nickfratto2439 2 years ago
Might be better to separate the code & papers into their own videos.
@prabhavkaula9697 2 years ago
Thank you for the video. I have some doubts: if one runs the training script, where does the model save the checkpoints? And while sampling, where does the model save the samples?
@blakerains8465 2 years ago
The side-by-side really does help give me an understanding of the formulas.
@Vikram-wx4hg 2 years ago
Love your tutorials, Aleksa! Also wanted to know: have you covered DDIMs in any tutorial?
@wenbogao2630 6 months ago
Amazing video, really helpful!
@omarabubakr6408 1 year ago
Hey, I have a question about the research paper: why are they using the integral at the beginning of the background section? Thanks in advance. 3:39
@bibiworm 1 year ago
Hi, could you kindly share the repo please? I can't find it on your GitHub. Thanks.
@Skinishh 2 years ago
Food for thought: I think it'd be cooler and more informative to build the simplest diffusion model from scratch, using PyTorch/TensorFlow/JAX and other packages, of course.
@TheAIEpiphany 2 years ago
100%!
@pjborowiecki2577 1 year ago
Or even a series, where we start from the simplest possible diffusion-based model and improve it over time in consecutive videos, implementing the latest discoveries from the most recent papers. This would be incredible.
@rajkiran1982 1 year ago
+1
@JorgeGarcia-eg5ps 2 years ago
I have been learning about diffusion models for a week, so the timing of this video was perfect. Thank you!
@TheAIEpiphany 2 years ago
Nice!
@tinysquareradius8186 1 year ago
Hi Aleksa, the zero_module here is meant to initialize the last layers' weights to zero, so that at the start those layers contribute nothing rather than having to learn everything at once. As training goes on, the last layer will still learn something. You can check the paper. ovo
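For reference, the zero_module being discussed is a tiny helper in the OpenAI improved-diffusion codebase that zeroes a module's parameters in place, so a residual block's output branch starts as an exact no-op while gradients still flow. A minimal sketch (the Conv2d sizes below are illustrative, not the repo's actual ones):

```python
import torch
import torch.nn as nn

def zero_module(module):
    # Zero all parameters in place (as the improved-diffusion repo does),
    # so the wrapped layer initially outputs zeros but remains trainable.
    for p in module.parameters():
        p.detach().zero_()
    return module

out_conv = zero_module(nn.Conv2d(64, 64, 3, padding=1))
x = torch.randn(1, 64, 8, 8)
y = out_conv(x)  # all zeros at init; after a few gradient steps it won't be
print(float(y.abs().max()))
```

Because the weights (not the gradients) are zeroed, backprop through the layer is unaffected, which is why the last layer can still "learn something" as the comment says.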
@sanawarhussain 5 months ago
Hi, I am always confused about the forward process equation defined in (2). We say our images x come from an unknown distribution q(·), but in equation (2) we are saying that this distribution is normal? We are sampling from a normal distribution to get the next forward step. Sorry, I am not that good when it comes to probability theory.
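A numeric sketch may help untangle this: only the per-step transition q(x_t | x_{t-1}) is assumed Gaussian, while q(x_0) itself stays unknown. Chaining many such transitions pushes any starting sample toward N(0, 1). The names (`beta_t`, `forward_step`) are mine, not from the repo:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_step(x_prev, beta_t, rng):
    """One step of the DDPM forward process, eq. (2):
    q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)."""
    eps = rng.standard_normal(x_prev.shape)  # fresh standard Gaussian noise
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * eps

x = rng.standard_normal(1000)  # stand-in for a flattened "image" x_0
for t in range(1000):
    x = forward_step(x, beta_t=0.02, rng=rng)

# After many steps the sample is close to N(0, 1) regardless of x_0.
print(round(float(x.mean()), 2), round(float(x.var()), 2))
```

So no contradiction: the normality assumption applies to the noising kernel, and the unknown data distribution only enters through the starting point x_0.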
@akrielz 2 years ago
Hi Aleksa, the side-by-side formula comparisons are really useful. Thank you a lot for your dedication!

P.S.: I might be wrong, but I believe the bug you mentioned at the end of the video, with 4k-step images being over-saturated, is caused by the following. The whole reason diffusion models work is that we assume the last step of the noising process is noise with mean = 0 and variance = 1. While it is true that gradually applying Gaussian noise for n steps, with n tending to infinity, reaches that state, notice that there is an n_epsilon beyond which the image has already reached the desired mean and variance. Here n_epsilon is about 2k: every image generated with x steps, where x > n_epsilon, is roughly the same as the one generated at n_epsilon. So when a diffusion model starts to sample, the initial noise is equivalent to the noise at step n_epsilon, which means the first n_epsilon of the x steps can already generate a good image, while all the steps past n_epsilon just destroy it. The n_epsilon ≈ 2k limit might also have to do with the precision of the operations, though.
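The saturation point described above can be eyeballed from the closed form x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε: once ᾱ_t is numerically ~0, no trace of x_0 survives and further steps only re-noise pure noise. A quick check, assuming a linear beta schedule like the DDPM paper's (the 1e-6 cutoff is my arbitrary choice):

```python
import numpy as np

# Linear schedule in the style of the DDPM paper: beta from 1e-4 to 0.02 over T steps.
T = 4000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)  # signal kept: x_t = sqrt(alpha_bar_t)*x_0 + ...

# First step where essentially no signal from x_0 survives.
saturated = int(np.argmax(alpha_bar < 1e-6))
print(saturated)  # lands roughly in the ~2k range, well below T = 4000
```

This is consistent with the comment's n_epsilon ≈ 2k estimate for a 4k-step chain.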
@bibhabasumohapatra 1 year ago
I can't understand anything 😭
@GuanlinLi-l8j 2 years ago
Great video. Hope to see a video explaining the code of the "Diffusion Models Beat GANs" paper.
@rajroy2426 11 months ago
The variational lower bound part is not very clear, to be honest.
@sg.stefan 2 years ago
Thanks for this very useful video, full of clear explanations about diffusion models and the bridge between paper formulas and code!
@angelacast135 2 years ago
Thanks for this video, it's really helpful. Could you please cover the DDIM paper too? It's super helpful to have the code and equations side by side.
@sg.stefan 2 years ago
Thank you very much for this video! Really, really great explanation (although not easy going) of the improved diffusion models, and a perfect preparation for your Stable Diffusion video!
@xiangyuguo9856 2 years ago
I'm fairly familiar with the DDPM code, but I still learned a lot. Thanks for the nice video!
@ArjunKumar123111 2 years ago
Hey Aleksa, I have a question. When you come across a topic such as text-to-image generation or diffusion models, how do you find the fundamental papers/articles/reading materials to gain in-depth knowledge of them? And how do you plan and follow through on your learning process? I'm big on self-learning but often lack the planning to follow through. I'm inspired by your journey and seek some guidance. Thanks in advance!
@TheAIEpiphany 2 years ago
Hey Arjun! Check out my Medium blogs. I literally have my process captured there. :)) Maybe start with the "how I landed a job at DeepMind" blog.
@ЕгорКолодин-й2з 2 years ago
Amazing! Keep up the good work. It is very interesting!
@anarnurizada9586 9 months ago
Your videos are amazing. I especially like this simultaneous coverage of both the paper and the code. Keep it up! However, maybe you could still make some short (lighter) videos for beginners.
@improvement_developer8995 2 years ago
Thanks for showing the code and paper side by side. Really helpful!
@imranq9241 2 years ago
Thanks for the video! Is there a good toy project that uses diffusion models that you would recommend?
@TheAIEpiphany 2 years ago
Hm, a toy project... not that I am aware of. I mean, if you treat the model as a black box, everything is a toy project: GLIDE, DALL-E mini, etc. Although I think you can't run DALL-E mini on a single machine; I might be wrong. Stay tuned! ;)
@jianxiongfeng 10 months ago
Your video is wonderful!
@leonardoberti917 11 months ago
The explanation was great. It would be super if you went back to making these types of videos.
@arshakrezvani3562 1 year ago
Your walkthroughs are perfect, please keep up the good work ❤
@hesselbosma1998 2 years ago
Hey, nice vid! Do you have any idea why they zero the weights of some of the convolutional layers?
@TheAIEpiphany 2 years ago
Wondering the same thing.
@susmithasai204 1 year ago
Hi, great explanation. Also, can you do a video explaining score-based generative models, i.e. the score-based SDE paper and code?
@VarunTulsian 1 year ago
Great video, Aleksa. I am new to PyTorch; I read that torch.rand_like samples from a uniform distribution instead of a Gaussian. How does that work, since we need samples from a standard Gaussian?
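To resolve the confusion in numbers (this is my own check, not a line from the repo): `torch.rand_like` is uniform on [0, 1), while `torch.randn_like` is standard normal, and the DDPM code draws its noise with the `randn` family, so there is no conflict:

```python
import torch

torch.manual_seed(0)
x = torch.empty(100_000)

u = torch.rand_like(x)   # uniform on [0, 1): mean ~0.5, never negative
g = torch.randn_like(x)  # standard normal N(0, 1): mean ~0, symmetric

print(round(u.min().item(), 2), round(u.mean().item(), 2))
print(round(g.mean().item(), 2), round(g.std().item(), 2))
```

The extra "n" in `randn` is the whole difference: n for normal.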
@emanalafandi9474 2 years ago
Thank you for this video. I just downloaded the software and I was so, so lost; I couldn't even figure out how to make anything. Your video helped.
@AZTECMAN 2 years ago
Finally got around to watching this. I quite enjoyed the video.
@TheAIEpiphany 2 years ago
Glad to hear that!!
@alexijohansen 1 year ago
Super valuable video! Many thanks. Can you post a link to your GitHub repo for Windows?
@baharazari976 2 years ago
Perfect explanation. I would really appreciate it if you could share the code that runs on a single GPU. I am having trouble running the code in distributed mode.
@arunram6687 1 year ago
Loved the code-and-paper side-by-side explanation! Kudos to you! Keep the code and paper explanations in all videos if you can!
@kargarisaac 2 years ago
Amazing, Aleksa :) We cannot wait for GLIDE and DALL-E 2 :)
@TheAIEpiphany 2 years ago
GLIDE is already uploaded! 😀 Check it out!
@nirmalbaishnab4910 2 years ago
Fantastic tutorial! It would be very helpful if you shared the code. Thanks.
@alessandrozuech61 2 years ago
Very nice video! Just a question: how can I apply denoising to a noisy image? It seems to me that this paper can only generate a new image from the learned data distribution, right? Maybe I missed some steps...
@anonymousperson9757 1 year ago
Hey! I am working on the same problem. It would be great if @Aleksa could make a video on that. I think the paper "Image Super-Resolution via Iterative Refinement", a follow-up to the original DDPM, has the solution, although it focuses on super-resolution. To my understanding, in the original DDPM you minimize the MSE loss between the noise added in the forward process at time t and the noise predicted by the network, so the predicted noise is only a function of the noisy input at step t and of t itself. In denoising/super-resolution, I would assume there should also be some way of feeding the image to be denoised to the network as input during training. In that case the network would take in the noisy (to be denoised) input, the noisy input from the forward diffusion process, and the time step. But I am not entirely sure. Would you like to connect on Discord to discuss this, in case you are still working on it?
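On the conditioning question raised above: in the SR3 paper ("Image Super-Resolution via Iterative Refinement") the conditioning image is simply concatenated channel-wise with the noisy sample x_t before the network's first convolution, so the denoiser sees (condition, x_t) plus the time step. A shape-only sketch; the tiny Conv2d below is a stand-in for the real U-Net, and all sizes are illustrative:

```python
import torch
import torch.nn as nn

# SR3-style conditioning: concatenate the (upsampled) conditioning image
# with the noisy sample x_t along the channel axis, then denoise.
x_t = torch.randn(1, 3, 32, 32)    # noisy sample at step t
cond = torch.randn(1, 3, 32, 32)   # conditioning image, e.g. upsampled low-res input

denoiser_in = torch.cat([cond, x_t], dim=1)  # 3 + 3 = 6 input channels

first_conv = nn.Conv2d(6, 64, 3, padding=1)  # stand-in for the U-Net's input conv
out = first_conv(denoiser_in)
print(tuple(out.shape))
```

So the only architectural change from unconditional DDPM is doubling the input channels of the first layer; the loss and sampling loop stay the same.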
@almogdavid 1 year ago
Excellent video, thank you very much!
@alexijohansen 1 year ago
Do you know how outpainting/inpainting works?
@lanjiang9870 1 year ago
Excellent video, it is very helpful ❤
@anatolicvs 2 years ago
It was quite a nice video! Well done, sir!
@davita6379 2 years ago
I love this series!
@TheAIEpiphany 2 years ago
🥳🥳🥳
@snsa_kscc 2 years ago
Gigachad!
@TheAIEpiphany 2 years ago
Lol! Such a wordchad thing to say!
@rezabagherian3331 2 years ago
Thank you!
@daniel-mika 1 year ago
I am curious: is the problem seen at 1:15:05 addressed? It's quite a big error, tbh. I am curious whether they actually used this code, with the error, to train, because that would mean the theory behind how it works is shaky.
@orip333 1 year ago
There is no error in the code. The parentheses come just before the 1 over \bar{\alpha}_t; it's all good.
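For anyone retracing the parenthesization debate, the expression in question is the posterior mean of eq. (11) in the DDPM paper, where the 1/sqrt(α_t) factor multiplies the entire bracket. A scalar sketch with made-up values (the function name and numbers are mine):

```python
import math

def ddpm_mean(x_t, eps_pred, beta_t, alpha_bar_t):
    """Eq. (11) of the DDPM paper:
    mu = (1 / sqrt(alpha_t)) * (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps_pred).
    Note the 1/sqrt(alpha_t) scales the whole bracketed term, not just x_t."""
    alpha_t = 1.0 - beta_t
    return (x_t - beta_t / math.sqrt(1.0 - alpha_bar_t) * eps_pred) / math.sqrt(alpha_t)

print(round(ddpm_mean(0.5, 0.1, beta_t=0.02, alpha_bar_t=0.5), 4))
```

Mis-placing the parentheses so the factor only scales x_t gives a subtly different mean, which is exactly the kind of bug the original comment worried about.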
@lurensss 2 years ago
I think they initialize some of the layers with zero weights in order to speed up the training process.
@TheAIEpiphany 2 years ago
Any pointers/papers?
@lurensss 2 years ago
@@TheAIEpiphany Unfortunately I can't give any paper reference. During my AI course my professor explained some rules of thumb for weight initialization, and one of them is the technique implemented in this code.
@convolutionalnn2582 2 years ago
What maths are required to be a research scientist in computer vision? What are the best resources? And what's the best book for computer vision?
@sergeychirkunov7165 2 years ago
Multiple View Geometry in Computer Vision. It's fundamental and quite helpful for research in CV.
@convolutionalnn2582 2 years ago
@@sergeychirkunov7165 Can you look something up for me on YouTube? If I search for "geometry for computer vision", which playlist should I watch: the "Multiple View Geometry in Computer Vision" playlist by Sean Mullery, or Cvprtum, or "3D Computer Vision" by CVRP Lab, or do you have any other recommendations?
@convolutionalnn2582 2 years ago
@@sergeychirkunov7165 People mostly mention linear algebra, calculus, probability and statistics, and optimization, and even talk about tensor algebra... Are these maths required too?
@saurabhshrivastava224 2 years ago
@@convolutionalnn2582 Yes, that's true. Basics of linear algebra, probability, and optimization are sort of mandatory.
@convolutionalnn2582 2 years ago
@@saurabhshrivastava224 What's the best resource on geometry for computer vision?