Time to cover diffusion models in greater depth! Do let me know how you like this combination of papers + coding!
@prabhavkaula9697 (2 years ago)
Thank you so much for uploading the tutorial. Good resources on diffusion models are such a rarity.
@prabhavkaula9697 (2 years ago)
13:49 I too am okay with the mathematics and the proofs, but I wanted to know why it works.
@prabhavkaula9697 (2 years ago)
It would be great if you could share the code!
@nickfratto2439 (2 years ago)
Might be better to separate the code & papers into their own videos
@prabhavkaula9697 (2 years ago)
Thank you for the video. I have some doubts: if one runs the training script, how does the model save the checkpoints? And while sampling, where does the model save the samples?
@JorgeGarcia-eg5ps (2 years ago)
I have been learning about diffusion models for a week, so the timing on this video was perfect. Thank you!
@TheAIEpiphany (2 years ago)
Nice!
@akrielz (2 years ago)
Hi Aleksa, the side-by-side formula comparisons are really useful. Thank you a lot for your dedication!
P.S.: I might be wrong, but I believe the bug you mentioned at the end of the video, with 4k-step images being over-saturated, is caused by the following factor. The whole reason diffusion models work is that we assume the last step of the noising process yields noise with mean = 0 and variance = 1. It is true that if we take an image and gradually apply Gaussian noise over n steps, we reach that state as n tends to infinity, but it is important to notice that there is an n_epsilon beyond which the image has already reached the desired mean and variance. Here n_epsilon is about 2k. Every image generated in x steps where x > n_epsilon will be roughly the same as the one generated at n_epsilon. Thus, when a diffusion model starts to sample, the initially generated noise is equivalent to the noise at n_epsilon. This means only the first n_epsilon of the x steps actually generate a good image, while all the steps past n_epsilon just destroy it. The fact that n_epsilon is around 2k might also have to do with the precision of the operations, though.
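The saturation described above can be checked numerically. A minimal sketch, assuming the DDPM paper's 1000-step linear schedule endpoints are reused unchanged at T = 4000 (the actual codebase may rescale the endpoints with T, so this is purely illustrative):

```python
import numpy as np

# Linear beta schedule with the 1000-step endpoints from the DDPM paper,
# applied over T = 4000 steps (illustrative, see note above).
T = 4000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

# sqrt(alphas_bar[t]) is the coefficient of the clean image x_0 in
# q(x_t | x_0); once it is ~0, x_t is indistinguishable from N(0, I)
# noise and the remaining steps cannot add any information.
signal = np.sqrt(alphas_bar)
print(signal[0])     # close to 1: x_0 still dominates
print(signal[2000])  # already below 1%: effectively pure noise
print(signal[-1])    # numerically negligible
```

Under this schedule the signal coefficient is essentially dead by roughly step 2k, matching the n_epsilon observation above.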
@arshakrezvani3562 (1 year ago)
Your walkthroughs are perfect, please keep up the good work ❤
@improvement_developer8995 (2 years ago)
Thanks for showing the code and paper side by side. Really helpful!
@tinysquareradius8186 (1 year ago)
Hi Aleksa, the zero_module here is meant to initialize the last layers with zero weights, avoiding the situation where the last layers learn everything. But as learning goes on, the last layer will still learn something. You can check the paper.
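For reference, such a zero_module helper can be as small as this. A sketch modeled on the improved-diffusion codebase, not a verbatim quote:

```python
import torch
import torch.nn as nn

def zero_module(module: nn.Module) -> nn.Module:
    """Zero out all parameters of a module and return it.

    Applied to the final conv of a residual branch, this makes the block
    compute the identity at initialization (the branch outputs zero), while
    the weights remain trainable and move away from zero during training.
    """
    for p in module.parameters():
        p.detach().zero_()
    return module

# Example: a zero-initialized output projection is a no-op at init.
out_proj = zero_module(nn.Conv2d(64, 64, kernel_size=3, padding=1))
x = torch.randn(1, 64, 8, 8)
y = out_proj(x)  # all zeros at initialization, but gradients still flow
```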
@blakerains8465 (2 years ago)
The side by side really does help give me an understanding of the formulas.
@sg.stefan (2 years ago)
Thanks for this very useful video full of clear explanations about diffusion models and the bridge between paper formulas and code!
@arunram6687 (1 year ago)
Loved the code and paper side-by-side explanation! Kudos to you! Keep the code and paper explanations in all videos if you can!
@sg.stefan (2 years ago)
Thank you very much for this video! Really, really great explanation (though not easy going) of the improved diffusion models and a perfect preparation for your stable diffusion video!
@AZTECMAN (2 years ago)
Finally got around to watching this. I quite enjoyed the video.
@TheAIEpiphany (2 years ago)
Glad to hear that!!
@Skinishh (2 years ago)
Food for thought: I think it'd be cooler and more informative to build the simplest diffusion model from scratch, using PyTorch/TensorFlow/JAX and other packages of course.
@TheAIEpiphany (2 years ago)
100%!
@pjborowiecki2577 (1 year ago)
Or even a series, where we start from the simplest possible diffusion-based model and improve it over time in consecutive videos, implementing the latest discoveries from the most recent papers. This would be incredible.
@rajkiran1982 (1 year ago)
+1
@ЕгорКолодин-й2з (2 years ago)
Amazing! Keep up the good work. It is very interesting!
@xiangyuguo9856 (2 years ago)
I'm fairly familiar with the ddpm code but I still learned a lot, thanks for the nice video!
@kargarisaac (2 years ago)
Amazing, Aleksa :) We cannot wait for GLIDE and DALL-E 2 :)
@TheAIEpiphany (2 years ago)
Glide is already uploaded! 😀 Check it out!
@GuanlinLi-l8j (2 years ago)
Great video. Hope to see a video explaining the code of the "Diffusion Models Beat GANs" paper.
@anarnurizada9586 (1 year ago)
Your videos are amazing. I especially like this simultaneous covering of both the paper and the code. Keep it up! However, maybe you can still make some short (lighter) videos for beginners.
@DED_Search (1 year ago)
Hi, could you kindly share the repo please? I can't find it on your GitHub. Thanks.
@wenbogao2630 (10 months ago)
Amazing video, really helpful!
@davita6379 (2 years ago)
i love this series
@TheAIEpiphany (2 years ago)
🥳🥳🥳
@Vikram-wx4hg (2 years ago)
Love your tutorials, Aleksa! Also wanted to know: have you covered DDIMs in any tutorial?
@lanjiang9870 (2 years ago)
Excellent video, it is very helpful ❤
@omarabubakr6408 (1 year ago)
Hey, I have a question about the research paper: why are they using the integral at the beginning of the background section? Thanks in advance. 3:39
@anatolicvs (2 years ago)
It was quite a nice video! Well done, sir!
@almogdavid (2 years ago)
Excellent video, thank you very much!
@angelacast135 (2 years ago)
Thanks for this video, it's really helpful. Could you please cover the DDIM paper too? It's super helpful to have the code and equations side-by-side.
@sh4ny1 (9 months ago)
Hi, I am always confused about the forward process defined in equation (2). We say our images x come from an unknown distribution q(·), but in equation (2) we are saying that this distribution is normal? We are sampling from a normal distribution to get the next forward step. Sorry, I am not that good when it comes to probability theory.
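On the question above: only the forward *transition* q(x_t | x_{t-1}) is Gaussian by construction; the marginal data distribution q(x_0) stays unknown and is never assumed normal. A minimal NumPy sketch (hypothetical shapes, standard linear schedule) of sampling the closed-form q(x_t | x_0):

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

def q_sample(x0, t):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) x_0, (1 - a_bar_t) I).

    Gaussian-ness lives entirely in the conditional: given a concrete x_0,
    the noised x_t is normal around a scaled copy of it. Nothing is assumed
    about the distribution x_0 itself was drawn from.
    """
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

x0 = rng.standard_normal((3, 32, 32))  # stand-in for an image
x_mid = q_sample(x0, t=500)    # partly noised, still correlated with x_0
x_end = q_sample(x0, t=T - 1)  # nearly pure N(0, I) noise
```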
@daniel-mika (1 year ago)
I am curious: is the problem seen at 1:15:05 addressed? It's quite a big error, tbh. I am curious whether they actually trained with this error in the code, because that would mean the theory behind how it works is shaky.
@orip333 (1 year ago)
There is no error in the code. The parentheses are just before the 1 over \alpha-bar_t; it's all good.
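For context, the expression under discussion, reconstructing x_0 from the predicted noise, reads as follows in standard DDPM notation (a sketch of the algebra, not a quote from the repo):

```latex
\hat{x}_0 \;=\; \sqrt{\tfrac{1}{\bar{\alpha}_t}}\, x_t \;-\; \sqrt{\tfrac{1}{\bar{\alpha}_t} - 1}\;\epsilon_\theta(x_t, t)
```

This is just the reparameterization $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$ solved for $x_0$, so the $1/\bar{\alpha}_t$ indeed sits inside the square roots, as the reply says.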
@susmithasai204 (1 year ago)
Hi, great explanation! Also, can you do a video explaining score-based generative models, i.e. the score-based SDE paper and code?
@baharazari976 (2 years ago)
Perfect explanation. I would really appreciate it if you could share the code that runs on a single GPU; I am having trouble running the code in distributed mode.
@hesselbosma1998 (2 years ago)
Hey nice vid! Do you have any idea why they zero the weights of some of the convolutional layers?
@TheAIEpiphany (2 years ago)
Wondering the same thing
@ArjunKumar123111 (2 years ago)
Hey Aleksa, I have a question. When you come across a topic such as text-to-image generation or diffusion models, how do you find the fundamental papers/articles/reading materials to gain in-depth knowledge of them? And how do you plan and follow through on your learning process? I'm big on self-learning but often lack the planning to follow through. I'm inspired by your journey and seek some guidance. Thanks in advance!
@TheAIEpiphany (2 years ago)
Hey Arjun! Check out my Medium blogs; I literally have my process captured there. :)) Maybe start with my "how I landed a job at DeepMind" blog.
@imranq9241 (2 years ago)
Thanks for the video! Is there a good toy project that uses diffusion models that you would recommend?
@TheAIEpiphany (2 years ago)
Hm, toy project - not that I am aware of. I mean if you treat the model as a black box everything is a toy project. GLIDE, DALL-E mini, etc. Although I think you can't run DALL-E mini on a single machine, I might be wrong. Stay tuned! ;)
@alexijohansen (2 years ago)
Super valuable video, many thanks! Can you post a link to your GitHub repo for Windows?
@VarunTulsian (2 years ago)
Great video, Aleksa. I am new to torch; I read that PyTorch's rand_like samples from a uniform distribution instead of a Gaussian. How does that work, since we need samples from a standard Gaussian?
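For anyone with the same doubt: `torch.rand_like` samples the uniform distribution on [0, 1), while `torch.randn_like` (note the extra `n`, for "normal") samples the standard Gaussian, and it is the latter that diffusion code uses. A quick sanity check:

```python
import torch

torch.manual_seed(0)
x = torch.empty(100_000)

uniform = torch.rand_like(x)   # U[0, 1): mean ~0.5, never negative
normal = torch.randn_like(x)   # N(0, 1): mean ~0, std ~1, both signs

print(uniform.min().item())  # non-negative
print(normal.mean().item())  # close to 0
print(normal.std().item())   # close to 1
```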
@alexijohansen (2 years ago)
Do you know how outpainting/inpainting works?
@leonardoberti917 (1 year ago)
The explanation was great. It would be super if you went back to making these types of videos.
@jianxiongfeng (1 year ago)
Your video is wonderful!
@alessandrozuech61 (2 years ago)
Very nice video! Just a question: how can I apply denoising to a noisy image? It seems to me that this paper can only generate a new image from the learned data distribution, right? Maybe I missed some steps...
@anonymousperson9757 (2 years ago)
Hey! I am working on the same problem. It would be great if @Aleksa could make a video on that. I think the paper "Image Super-Resolution via Iterative Refinement", a follow-up to the original DDPM, has the solution, although it focuses on super-resolution. To my understanding, in the original DDPM you are trying to minimize the MSE loss between the noise added in the forward process at time t and the noise predicted by the network, so the predicted noise is only a function of the noisy input at step t and t itself. In denoising/super-resolution, I would assume there should also be some way of feeding the noisy image to the network as input during training. So in this case the network would take in the noisy (to-be-denoised) input, the noisy input from the forward diffusion process, and the time step. But I am not entirely sure. Would you like to connect through Discord to discuss this in case you are still working on this?
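One common way to feed in the conditioning image is channel concatenation, as in the SR3 paper: the reference image is concatenated with x_t along the channel axis, so the denoiser's first conv sees both. A toy sketch (the module, sizes, and names are illustrative, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class ConditionedDenoiser(nn.Module):
    """Toy epsilon-predictor conditioned on a reference image.

    The conditioning image y (e.g. the degraded input, resized to the
    target resolution) is concatenated with the diffused sample x_t, so
    the input has 2 * C channels while the output predicts C channels
    of noise.
    """
    def __init__(self, channels: int = 3, hidden: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels, hidden, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(hidden, channels, 3, padding=1),
        )

    def forward(self, x_t, y, t):
        # A real model would also embed the timestep t; omitted here
        # to keep the sketch short.
        return self.net(torch.cat([x_t, y], dim=1))

model = ConditionedDenoiser()
x_t = torch.randn(2, 3, 32, 32)  # noisy sample at step t
y = torch.randn(2, 3, 32, 32)    # conditioning image
eps_pred = model(x_t, y, t=None)
```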
@snsa_kscc (2 years ago)
Gigachad!
@TheAIEpiphany (2 years ago)
Lol! Such a wordchad thing to say!
@nirmalbaishnab4910 (2 years ago)
Fantastic tutorial! It will be very helpful if you share the code. Thanks.
@rezabagherian3331 (2 years ago)
Thank you!
@rajroy2426 (1 year ago)
The variational lower bound part is not very clear, to be honest.
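For reference, the decomposition being discussed is the standard DDPM variational lower bound, sketched here in the usual notation:

```latex
L_{\mathrm{vlb}} = \mathbb{E}_q\Big[\,
\underbrace{D_{\mathrm{KL}}\!\big(q(x_T \mid x_0)\,\|\,p(x_T)\big)}_{L_T}
+ \sum_{t=2}^{T} \underbrace{D_{\mathrm{KL}}\!\big(q(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\big)}_{L_{t-1}}
\underbrace{-\,\log p_\theta(x_0 \mid x_1)}_{L_0}
\,\Big]
```

Every term except $L_0$ is a KL divergence between two Gaussians, so each has a closed form, which is why the loss splits into per-timestep pieces that can be estimated one $t$ at a time.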
@lorenzo.padoan (2 years ago)
I think they initialize some of the layers with zero weights in order to speed up the training process.
@TheAIEpiphany (2 years ago)
Any pointers/papers?
@lorenzo.padoan (2 years ago)
@TheAIEpiphany Unfortunately I can't give a paper reference; during the AI course my prof explained some rules of thumb for weight initialization, and one of them is this technique, which is implemented in this code.
@convolutionalnn2582 (2 years ago)
What maths are required to be a research scientist in computer vision? What are the best resources? And what is the best book for computer vision?
@sergeychirkunov7165 (2 years ago)
Multiple View Geometry in Computer Vision. It's fundamental and quite helpful for research in CV.
@convolutionalnn2582 (2 years ago)
@sergeychirkunov7165 Can you look something up on YouTube for me? I searched for "geometry for computer vision" - which playlist should I watch: "Multiple View Geometry in Computer Vision" by Sean Mullery, Cvprtum, "3D Computer Vision" by CVRP Lab, or any other recommendations?
@convolutionalnn2582 (2 years ago)
@sergeychirkunov7165 People mostly mention linear algebra, calculus, probability and statistics, and optimization, and even talk about tensor algebra. Are these maths required too?
@saurabhshrivastava224 (2 years ago)
@convolutionalnn2582 Yes, that's true. Basics of LA, probability and optimization are sort of mandatory.
@convolutionalnn2582 (2 years ago)
@saurabhshrivastava224 What is the best resource on geometry for computer vision?