Ultimate Guide to Diffusion Models | ML Coding Series | Denoising Diffusion Probabilistic Models

40,923 views

Aleksa Gordić - The AI Epiphany

A day ago

Comments: 73
@TheAIEpiphany 2 years ago
Time to cover diffusion models in greater depth! Do let me know how you like this combination of papers + coding!
@prabhavkaula9697 2 years ago
Thank you so much for uploading the tutorial. Good resources on diffusion models are such a rarity.
@prabhavkaula9697 2 years ago
13:49 I too am okay with the mathematics and the proofs, but I wanted to know why it works.
@prabhavkaula9697 2 years ago
It would be great if you could share the code!
@nickfratto2439 2 years ago
Might be better to separate the code & papers into their own videos.
@prabhavkaula9697 2 years ago
Thank you for the video. I have some doubts: if one runs the training script, where does the model save the checkpoints? And while sampling, where does the model save the samples?
@blakerains8465 2 years ago
The side-by-side really does help give me an understanding of the formulas.
@Vikram-wx4hg 2 years ago
Love your tutorials, Aleksa! Also wanted to know: have you covered DDIMs in any tutorial?
@wenbogao2630 6 months ago
Amazing video, really helpful!
@omarabubakr6408 1 year ago
Hey, I have a question about the research paper: why are they using the integral at the beginning of the background section? Thanks in advance. 3:39
@bibiworm 1 year ago
Hi, could you kindly share the repo please? I can't find it on your GitHub. Thanks.
@Skinishh 2 years ago
Food for thought: I think it'd be cooler and more informative to build the simplest diffusion model from scratch, using PyTorch/TensorFlow/JAX and other packages, of course.
@TheAIEpiphany 2 years ago
100%!
@pjborowiecki2577 1 year ago
Or even a series, where we start from the simplest possible diffusion-based model and improve it over time in consecutive videos, implementing the latest discoveries from the most recent papers. This would be incredible.
@rajkiran1982 1 year ago
+1
@JorgeGarcia-eg5ps 2 years ago
I have been learning about diffusion models for a week, so the timing of this video was perfect. Thank you!
@TheAIEpiphany 2 years ago
Nice!
@tinysquareradius8186 1 year ago
Hi Aleksa, the zero_module here is meant to initialize the last layers' weights to zero, so that at the start those layers contribute nothing rather than having to learn everything at once. As training goes on, the last layer will still learn something. You can check the paper. ovo
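For reference, the zero_module being discussed is a tiny helper in the OpenAI improved-diffusion codebase that zeroes a module's parameters in place, so a residual block's output branch starts as an exact no-op while gradients still flow. A minimal sketch (the Conv2d sizes below are illustrative, not the repo's actual ones):

```python
import torch
import torch.nn as nn

def zero_module(module):
    # Zero all parameters in place (as the improved-diffusion repo does),
    # so the wrapped layer initially outputs zeros but remains trainable.
    for p in module.parameters():
        p.detach().zero_()
    return module

out_conv = zero_module(nn.Conv2d(64, 64, 3, padding=1))
x = torch.randn(1, 64, 8, 8)
y = out_conv(x)  # all zeros at init; after a few gradient steps it won't be
print(float(y.abs().max()))
```

Because the weights (not the gradients) are zeroed, backprop through the layer is unaffected, which is why the last layer can still "learn something" as the comment says.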
@sanawarhussain 5 months ago
Hi, I am always confused about the forward process equation defined in (2). We say our images x come from an unknown distribution q(·), but in equation (2) we are saying that this distribution is normal? We are sampling from a normal distribution to get the next forward step. Sorry, I am not that good when it comes to probability theory.
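A numeric sketch may help untangle this: only the per-step transition q(x_t | x_{t-1}) is assumed Gaussian, while q(x_0) itself stays unknown. Chaining many such transitions pushes any starting sample toward N(0, 1). The names (`beta_t`, `forward_step`) are mine, not from the repo:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_step(x_prev, beta_t, rng):
    """One step of the DDPM forward process, eq. (2):
    q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)."""
    eps = rng.standard_normal(x_prev.shape)  # fresh standard Gaussian noise
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * eps

x = rng.standard_normal(1000)  # stand-in for a flattened "image" x_0
for t in range(1000):
    x = forward_step(x, beta_t=0.02, rng=rng)

# After many steps the sample is close to N(0, 1) regardless of x_0.
print(round(float(x.mean()), 2), round(float(x.var()), 2))
```

So no contradiction: the normality assumption applies to the noising kernel, and the unknown data distribution only enters through the starting point x_0.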
@akrielz 2 years ago
Hi Aleksa, the side-by-side formula comparisons are really useful. Thank you a lot for your dedication!

P.S.: I might be wrong, but I believe the bug you mentioned at the end of the video, with 4k-step images being over-saturated, is caused by the following. The whole reason diffusion models work is that we assume the last step of the noising process is noise with mean = 0 and variance = 1. While it is true that gradually applying Gaussian noise for n steps, with n tending to infinity, reaches that state, notice that there is an n_epsilon beyond which the image has already reached the desired mean and variance. Here n_epsilon is about 2k: every image generated with x steps, where x > n_epsilon, is roughly the same as the one generated at n_epsilon. So when a diffusion model starts to sample, the initial noise is equivalent to the noise at step n_epsilon, which means the first n_epsilon of the x steps can already generate a good image, while all the steps past n_epsilon just destroy it. The n_epsilon ≈ 2k limit might also have to do with the precision of the operations, though.
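The saturation point described above can be eyeballed from the closed form x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε: once ᾱ_t is numerically ~0, no trace of x_0 survives and further steps only re-noise pure noise. A quick check, assuming a linear beta schedule like the DDPM paper's (the 1e-6 cutoff is my arbitrary choice):

```python
import numpy as np

# Linear schedule in the style of the DDPM paper: beta from 1e-4 to 0.02 over T steps.
T = 4000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)  # signal kept: x_t = sqrt(alpha_bar_t)*x_0 + ...

# First step where essentially no signal from x_0 survives.
saturated = int(np.argmax(alpha_bar < 1e-6))
print(saturated)  # lands roughly in the ~2k range, well below T = 4000
```

This is consistent with the comment's n_epsilon ≈ 2k estimate for a 4k-step chain.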
@bibhabasumohapatra 1 year ago
I can't understand anything 😭
@GuanlinLi-l8j 2 years ago
Great video. Hope to see a video explaining the code of the "Diffusion Models Beat GANs" paper.
@rajroy2426 11 months ago
The variational lower bound part is not very clear, to be honest.
@sg.stefan 2 years ago
Thanks for this very useful video, full of clear explanations about diffusion models and the bridge between paper formulas and code!
@angelacast135 2 years ago
Thanks for this video, it's really helpful. Could you please cover the DDIM paper too? It's super helpful to have the code and equations side by side.
@sg.stefan 2 years ago
Thank you very much for this video! Really, really great explanation (although not easy going) of the improved diffusion models, and a perfect preparation for your Stable Diffusion video!
@xiangyuguo9856 2 years ago
I'm fairly familiar with the DDPM code, but I still learned a lot. Thanks for the nice video!
@ArjunKumar123111 2 years ago
Hey Aleksa, I have a question. When you come across a topic such as text-to-image generation or diffusion models, how do you find the fundamental papers/articles/reading materials to gain in-depth knowledge of them? And how do you plan and follow through on your learning process? I'm big on self-learning but often lack the planning to follow through. I'm inspired by your journey and seek some guidance. Thanks in advance!
@TheAIEpiphany 2 years ago
Hey Arjun! Check out my Medium blogs. I literally have my process captured there. :)) Maybe start with the "how I landed a job at DeepMind" blog.
@ЕгорКолодин-й2з 2 years ago
Amazing! Keep up the good work. It is very interesting!
@anarnurizada9586 9 months ago
Your videos are amazing. I especially like this simultaneous coverage of both the paper and the code. Keep it up! However, maybe you could still make some short (lighter) videos for beginners.
@improvement_developer8995 2 years ago
Thanks for showing the code and paper side by side. Really helpful!
@imranq9241 2 years ago
Thanks for the video! Is there a good toy project that uses diffusion models that you would recommend?
@TheAIEpiphany 2 years ago
Hm, a toy project... not that I am aware of. I mean, if you treat the model as a black box, everything is a toy project: GLIDE, DALL-E mini, etc. Although I think you can't run DALL-E mini on a single machine; I might be wrong. Stay tuned! ;)
@jianxiongfeng 10 months ago
Your video is wonderful!
@leonardoberti917 11 months ago
The explanation was great. It would be super if you went back to making these types of videos.
@arshakrezvani3562 1 year ago
Your walkthroughs are perfect, please keep up the good work ❤
@hesselbosma1998 2 years ago
Hey, nice vid! Do you have any idea why they zero the weights of some of the convolutional layers?
@TheAIEpiphany 2 years ago
Wondering the same thing.
@susmithasai204 1 year ago
Hi, great explanation. Also, can you do a video explaining score-based generative models, i.e. the score-based SDE paper and code?
@VarunTulsian 1 year ago
Great video, Aleksa. I am new to PyTorch; I read that torch.rand_like samples from a uniform distribution instead of a Gaussian. How does that work, since we need samples from a standard Gaussian?
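To resolve the confusion in numbers (this is my own check, not a line from the repo): `torch.rand_like` is uniform on [0, 1), while `torch.randn_like` is standard normal, and the DDPM code draws its noise with the `randn` family, so there is no conflict:

```python
import torch

torch.manual_seed(0)
x = torch.empty(100_000)

u = torch.rand_like(x)   # uniform on [0, 1): mean ~0.5, never negative
g = torch.randn_like(x)  # standard normal N(0, 1): mean ~0, symmetric

print(round(u.min().item(), 2), round(u.mean().item(), 2))
print(round(g.mean().item(), 2), round(g.std().item(), 2))
```

The extra "n" in `randn` is the whole difference: n for normal.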
@emanalafandi9474 2 years ago
Thank you for this video. I just downloaded the software and I was so, so lost; I couldn't even figure out how to make anything. Your video helped.
@AZTECMAN 2 years ago
Finally got around to watching this. I quite enjoyed the video.
@TheAIEpiphany 2 years ago
Glad to hear that!!
@alexijohansen 1 year ago
Super valuable video! Many thanks. Can you post a link to your GitHub repo for Windows?
@baharazari976 2 years ago
Perfect explanation. I would really appreciate it if you could share the code that runs on a single GPU. I am having trouble running the code in distributed mode.
@arunram6687 1 year ago
Loved the code-and-paper side-by-side explanation! Kudos to you! Keep the code and paper explanations in all videos if you can!
@kargarisaac 2 years ago
Amazing, Aleksa :) We cannot wait for GLIDE and DALL-E 2 :)
@TheAIEpiphany 2 years ago
GLIDE is already uploaded! 😀 Check it out!
@nirmalbaishnab4910 2 years ago
Fantastic tutorial! It would be very helpful if you shared the code. Thanks.
@alessandrozuech61 2 years ago
Very nice video! Just a question: how can I apply denoising to a noisy image? It seems to me that this paper can only generate a new image from the learned data distribution, right? Maybe I missed some steps...
@anonymousperson9757 1 year ago
Hey! I am working on the same problem. It would be great if @Aleksa could make a video on that. I think the paper "Image Super-Resolution via Iterative Refinement", a follow-up to the original DDPM, has the solution, although it focuses on super-resolution. To my understanding, in the original DDPM you minimize the MSE loss between the noise added in the forward process at time t and the noise predicted by the network, so the predicted noise is only a function of the noisy input at step t and of t itself. In denoising/super-resolution, I would assume there should also be some way of feeding the image to be denoised to the network as input during training. In that case the network would take in the noisy (to be denoised) input, the noisy input from the forward diffusion process, and the time step. But I am not entirely sure. Would you like to connect on Discord to discuss this, in case you are still working on it?
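On the conditioning question raised above: in the SR3 paper ("Image Super-Resolution via Iterative Refinement") the conditioning image is simply concatenated channel-wise with the noisy sample x_t before the network's first convolution, so the denoiser sees (condition, x_t) plus the time step. A shape-only sketch; the tiny Conv2d below is a stand-in for the real U-Net, and all sizes are illustrative:

```python
import torch
import torch.nn as nn

# SR3-style conditioning: concatenate the (upsampled) conditioning image
# with the noisy sample x_t along the channel axis, then denoise.
x_t = torch.randn(1, 3, 32, 32)    # noisy sample at step t
cond = torch.randn(1, 3, 32, 32)   # conditioning image, e.g. upsampled low-res input

denoiser_in = torch.cat([cond, x_t], dim=1)  # 3 + 3 = 6 input channels

first_conv = nn.Conv2d(6, 64, 3, padding=1)  # stand-in for the U-Net's input conv
out = first_conv(denoiser_in)
print(tuple(out.shape))
```

So the only architectural change from unconditional DDPM is doubling the input channels of the first layer; the loss and sampling loop stay the same.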
@almogdavid 1 year ago
Excellent video, thank you very much!
@alexijohansen 1 year ago
Do you know how outpainting/inpainting works?
@lanjiang9870 1 year ago
Excellent video, it is very helpful ❤
@anatolicvs 2 years ago
It was quite a nice video! Well done, sir!
@davita6379 2 years ago
I love this series!
@TheAIEpiphany 2 years ago
🥳🥳🥳
@snsa_kscc 2 years ago
Gigachad!
@TheAIEpiphany 2 years ago
Lol! Such a wordchad thing to say!
@rezabagherian3331 2 years ago
Thank you!
@daniel-mika 1 year ago
I am curious: is the problem seen at 1:15:05 addressed? It's quite a big error, tbh. I am curious whether they actually used this code, with the error, to train, because that would mean the theory behind how it works is shaky.
@orip333 1 year ago
There is no error in the code. The parentheses come just before the 1 over \bar{\alpha}_t; it's all good.
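For anyone retracing the parenthesization debate, the expression in question is the posterior mean of eq. (11) in the DDPM paper, where the 1/sqrt(α_t) factor multiplies the entire bracket. A scalar sketch with made-up values (the function name and numbers are mine):

```python
import math

def ddpm_mean(x_t, eps_pred, beta_t, alpha_bar_t):
    """Eq. (11) of the DDPM paper:
    mu = (1 / sqrt(alpha_t)) * (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps_pred).
    Note the 1/sqrt(alpha_t) scales the whole bracketed term, not just x_t."""
    alpha_t = 1.0 - beta_t
    return (x_t - beta_t / math.sqrt(1.0 - alpha_bar_t) * eps_pred) / math.sqrt(alpha_t)

print(round(ddpm_mean(0.5, 0.1, beta_t=0.02, alpha_bar_t=0.5), 4))
```

Mis-placing the parentheses so the factor only scales x_t gives a subtly different mean, which is exactly the kind of bug the original comment worried about.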
@lurensss 2 years ago
I think they initialize some of the layers with zero weights in order to speed up the training process.
@TheAIEpiphany 2 years ago
Any pointers/papers?
@lurensss 2 years ago
@@TheAIEpiphany Unfortunately I can't give any paper reference. During my AI course my professor explained some rules of thumb for weight initialization, and one of them is the technique implemented in this code.
@convolutionalnn2582 2 years ago
What maths are required to be a research scientist in computer vision? What are the best resources? And what's the best book for computer vision?
@sergeychirkunov7165 2 years ago
Multiple View Geometry in Computer Vision. It's fundamental and quite helpful for research in CV.
@convolutionalnn2582 2 years ago
@@sergeychirkunov7165 Can you look something up for me on YouTube? If I search for "geometry for computer vision", which playlist should I watch: the "Multiple View Geometry in Computer Vision" playlist by Sean Mullery, or Cvprtum, or "3D Computer Vision" by CVRP Lab, or do you have any other recommendations?
@convolutionalnn2582 2 years ago
@@sergeychirkunov7165 People mostly mention linear algebra, calculus, probability and statistics, and optimization, and even talk about tensor algebra... Are these maths required too?
@saurabhshrivastava224 2 years ago
@@convolutionalnn2582 Yes, that's true. Basics of linear algebra, probability, and optimization are sort of mandatory.
@convolutionalnn2582 2 years ago
@@saurabhshrivastava224 What's the best resource on geometry for computer vision?