Thank you for making such a high quality video! It's very helpful for me to understand the diffusion model!
@jbhuang0604 · 1 day ago
You're very welcome! Happy that it was helpful!
@Neo-kx3fe · 1 day ago
@10:55 With that local outgoingness on the left, why is there an additional term p_t(x_t) inside the d/dx bracket? This term seems to disappear at @11:18. Thanks.
@sokak01 · 3 days ago
I think there should be a ∇ log q(x_t) instead of p(x_t) in the score matching part.
@ruoshiliu6024 · 6 days ago
Amazing work Jia-Bin!! P.S. you should create a bibtex for this video so it can be cited in literature :P
@jbhuang0604 · 3 days ago
Haha! Thanks! Too bad Google Scholar doesn't include views of YouTube videos.
@nathan_ca · 6 days ago
Thanks! This is an amazing video for getting students like me to re-engage with these topics that I haven't had a chance to explore more ❤
@jbhuang0604 · 4 days ago
Thanks! Glad that this is helpful.
@AnujZore-pe9mg · 8 days ago
Thanks once again for the easy-to-understand explanation! Gonna miss CMSC 733 lectures :(
@jbhuang0604 · 7 days ago
Glad you like them!
@amirhosseinraffiee8270 · 10 days ago
Thanks for the video. Great way to explain a complex concept.
@jbhuang0604 · 10 days ago
Appreciate your comment! Thanks for watching the video. Hope you enjoyed it.
@manuellecha · 11 days ago
Thank you very much! You did a really nice job! The video is clear, visual, and informative. It is consistent with the timeline and evolution of the field, and it effectively conveys the information along with the motivation for the development of these models.
@jbhuang0604 · 11 days ago
Glad you liked it! It was a lot of fun making this video!
@kvu207 · 11 days ago
Beautiful! May I ask how you made the animations for this video?
@jbhuang0604 · 11 days ago
Most of the animations are from the "morph transition" in PowerPoint slides. The rest are from Adobe Premiere Pro.
@adrienforbu5165 · 11 days ago
Nice visuals, good job!
@jbhuang0604 · 11 days ago
Thanks a lot!
@julienblanchon6082 · 11 days ago
This is brilliant!
@jbhuang0604 · 11 days ago
Glad that you enjoyed the video!
@jackshi7613 · 12 days ago
Excellent video!
@jbhuang0604 · 11 days ago
Thanks for watching!
@DimitrivonRutte · 12 days ago
Awesome to see easy-to-understand explanations of current research topics, keep up the great work!
@jbhuang0604 · 11 days ago
Glad you liked it!
@catherineyang5199 · 12 days ago
Thank you for the video! This is the clearest explanation of flow matching on the internet ❤
@jbhuang0604 · 11 days ago
Thank you so much for your kind words!
@r00t257 · 12 days ago
Legend comeback 🙇! Your educational video is worth more than gold. 💓🙏
@jbhuang0604 · 12 days ago
Thanks a lot! Glad you like it!
@SurajBorate-bx6hv · 13 days ago
Thank you for the great step-by-step explanation. Can you share any good resources and insights for implementing diffusion on my own custom images?
@jbhuang0604 · 9 days ago
Hi! No problem. I think Hugging Face's Diffusers probably has the best resources. Check it out: huggingface.co/docs/diffusers/en/index
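(For readers who want to try this before diving into the Diffusers docs: below is a minimal NumPy sketch of the closed-form forward noising step q(x_t | x_0) that DDPM-style training is built on. All names here, such as `betas`, `alpha_bar`, and `add_noise`, are illustrative choices, not the API of any particular library.)

```python
import numpy as np

# Linear noise schedule over T steps, as in the original DDPM setup.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)  # cumulative product, i.e. alpha-bar_t

def add_noise(x0, t, rng=None):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(ab_t) * x0, (1 - ab_t) * I)."""
    rng = np.random.default_rng(0) if rng is None else rng
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps  # the denoiser is trained to predict eps from (xt, t)

x0 = np.ones((8, 8))  # a toy "image"
xt, eps = add_noise(x0, t=T - 1)
# At t = T-1, alpha_bar is tiny, so x_t is almost pure Gaussian noise.
```

In a real training loop the pair (xt, t) goes into a network and the loss is the mean squared error against eps.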
@pi5549 · 22 days ago
Any arXiv paper with a video like this goes above any without, on my TO_READ list at least. +1
@jbhuang0604 · 21 days ago
Thanks for your interest! Let us know if you have any questions!
@crispinotechgaming · 25 days ago
Honestly, it'd be really nice to see the open source community catch up to this scale of operation one day! They are the backbone of AI progress, but they rarely manage to innovate with actual public models to use.
@jbhuang0604 · 22 days ago
I completely agree. Most of these models are closed-source, so it's mainly showcasing their R&D capability, but as you said, the public doesn't benefit much from them. I also hope the open source community can catch up soon.
@arashnozari4042 · 25 days ago
Imagine it in the next 5 years.
@jbhuang0604 · 22 days ago
Yup, the rate of progress is incredible. Can't imagine what this will look like…
@4thlord51 · 28 days ago
I'm building my own diffusion model myself. This is the best breakdown and visualization of the mathematics and implementation. Well done.
@jbhuang0604 · 28 days ago
Thank you! This comment just made my day!
@Yenrabbit · 29 days ago
Very cool work, and a top-notch video with the little sound effects etc. :D
@jbhuang0604 · 28 days ago
Thanks! I also enjoyed these little sound effects. Pika pika~!
@youtube_showcase · 29 days ago
Exciting work. Thank you for creating and sharing this explanation video.
@jbhuang0604 · 29 days ago
Yes, we are excited as well. Glad you enjoyed the video!
@mcarletti · 1 month ago
My like comes with the 5th Symphony (9:39) 😸🎶
@jbhuang0604 · 1 month ago
Oh my! Finally one person noticed that! (Spent a lot of time making that lol)
@khalilsabri7978 · 1 month ago
Just one minute into the video, you know it's extremely well done. Thanks for the video!
@jbhuang0604 · 1 month ago
Glad you liked it! Thanks so much for the comment!
@importon · 1 month ago
Very cool! Will you be putting the code on GitHub?
@jbhuang0604 · 1 month ago
We are working on getting approval. The process is complex, but we are hopeful.
@TayuYoung · 1 month ago
Hi Professor, thank you for your explanation. However, I think that at 1:03 in the video, the up-sampling mechanism for images is performed by the 'decoder', not by the diffusion model. The animation here seems to suggest that the diffusion model produces the high-resolution images. Thanks for your time.
@jbhuang0604 · 1 month ago
Sorry for the confusion. I introduced two mechanisms for high-resolution generation: 1) cascaded diffusion models and 2) latent diffusion models. In cascade-based approaches, the upsampling is done via a super-resolution diffusion model. The model in Sora likely uses only a video decoder that upsamples the denoised clean latents to high-resolution images/videos.
@TayuYoung · 1 month ago
@@jbhuang0604 Thanks for your explanation. I checked the Imagen paper: they use a text-to-image diffusion model and a super-resolution diffusion model to produce the high-resolution image, which plays the role of the decoder output in a latent diffusion model. I used to think that the main difference between cascaded and latent diffusion models was just that one uses low-resolution images and the other uses latent representations, with both employing an encoder-diffusion-decoder pipeline. In Imagen, it seems that a diffusion model can also serve in the 'decoder' role. Am I right?
@youtube_showcase · 1 month ago
Amazing work! Thank you for sharing 😀
@jbhuang0604 · 1 month ago
Thank you! Cheers!
@TechCindy · 1 month ago
Amazing work!
@jbhuang0604 · 1 month ago
Thank you!
@orisenbazuru · 1 month ago
Great video! At 1:21 it should be maximizing the similarity between the two distributions, or minimizing the distance between them.
@jbhuang0604 · 1 month ago
Thanks for pointing this out! Yes, you are right! It should be *maximizing* the similarity between the two distributions.
@Raymond-zv5gr · 1 month ago
BRO YOU ARE EPIC
@jbhuang0604 · 1 month ago
Thank you, thank you!
@Beauty.and.FashionPhotographer · 1 month ago
I wish this were simpler, you know: a Google Colab notebook and a video showing what to click, etc.
@jbhuang0604 · 29 days ago
That would definitely be great! We will be working on making this easier to use!
@ohjein · 2 months ago
Very good! But what surprises me is that PIFu is still at the core. 4+ years and no better model has arrived? With all the developments? Anyway, great work.
@jbhuang0604 · 2 months ago
Yes, we were surprised as well. We tried the recent ICON/ECON methods. These methods produce better shape reconstruction for challenging/uncommon poses (e.g., dancing) but, ironically, produce unnatural shapes for natural poses (like the one shown in the video). For those natural poses, PIFuHD still performs the best!
@truonggiangnguyen8844 · 2 months ago
I have a question: are all the distributions mentioned distributions of continuous variables, since we're using integrals here?
@jbhuang0604 · 2 months ago
Good question! I think there has been some development of discrete variational autoencoders and diffusion models. Those methods can deal with discrete variables.
@user-kq9cu8wy9z · 2 months ago
The world and my brain after this: 💀
@jbhuang0604 · 1 month ago
Indeed!
@curiousobserver2006 · 2 months ago
Seriously one of the best educational videos I've ever watched.
@jbhuang0604 · 2 months ago
Thank you so much!
@nutshell1811 · 2 months ago
Best video on diffusion!!
@jbhuang0604 · 2 months ago
Great! Glad that it's helpful!
@rtluo1546 · 2 months ago
This is truly a great tutorial video, so well made. Can't believe it covers so many things within only 17 minutes.
@jbhuang0604 · 2 months ago
Thanks a lot! Happy that you enjoyed the video!
@wangy01 · 2 months ago
Thank you for your great work removing the need for the audience to have much prior knowledge before they can enjoy your video. For example, you mentioned maximum likelihood and explained what it is immediately. It is such a challenge to straighten all of this out in a 17-minute video, but you did a great job. Thank you!
@jbhuang0604 · 2 months ago
Glad that you liked it! Appreciate your kind words! This made my day!
@vfhfnvecnfaby5362 · 2 months ago
I LOVE U!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
@jbhuang0604 · 29 days ago
Thank you, thank you!
@NobleSpartan · 2 months ago
Your production on these videos is incredible.
@jbhuang0604 · 2 months ago
Thanks so much for your kind words! Glad that you like it!
@MrNoipe · 2 months ago
Awesome! Would also love a video on SLAM!
@jbhuang0604 · 2 months ago
Yup! It's a complex topic, but I will get there!
@MrNoipe · 2 months ago
The linear algebra portion went by very quickly for someone who hasn't worked with it in several years. I guess this video is more tailored for current researchers?
@jbhuang0604 · 2 months ago
Ah, sorry about that! I probably should slow down a bit on those math derivations. Will do so in future videos!
@nikitadrobyshev7953 · 2 months ago
OK, this is the best video explanation of diffusion models I've seen. Ideal ratio between simplification and depth ☺👏
@jbhuang0604 · 2 months ago
Glad it was helpful! Thank you so much for your kind words!
@wangy01 · 2 months ago
I agree. The author must have carefully chosen the most efficient way to cut into the complex concept hierarchy, and every single word, to achieve that efficiency.
@madhavkumar9942 · 2 months ago
Great video. Can you please explain how to get the 'sparse' folder used in your project for a video (or a folder containing video frames)?
@jbhuang0604 · 2 months ago
Thanks! It comes from the COLMAP preprocessing.
@yuelinxin3684 · 2 months ago
This looks like a covariance matrix 🤔
@jbhuang0604 · 2 months ago
Yes, the second-moment matrix is a local covariance matrix of the gradient vector field. It captures the local image structure.
@mityashabat · 2 months ago
Great vid! Quick question: it wasn't clear how many of these second-moment matrices we're making. One for each patch, right? A single one for the whole image wouldn't provide us with local information. The explanation confused me a bit. Also, if I understand correctly, the intuition is that the second-moment matrix helps us compute the curvature of the edge in the patch, and checking the eigenvalues provides us with info on that curvature. Is my mind in the right place? 😅
@jbhuang0604 · 2 months ago
Thanks! For every pixel location, we form a second-moment matrix, so yes, it's one for each patch. The eigenvalues of the second-moment matrix tell us how fast the summed squared error (between the reference patch and the translated patch) goes up. The eigenvectors tell us the directions of the fastest and the slowest error changes. So, to find a corner, we look for patches with a *large smallest eigenvalue*. This means the error goes up quickly even in the direction of slowest change. That's the criterion for good features to track.
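(To make the "large smallest eigenvalue" criterion concrete, here is a small NumPy sketch, not the video's code: it builds the 2x2 second-moment matrix from patch gradients and scores the patch by its smallest eigenvalue. The test patches `flat`, `edge`, and `corner` are made-up toy examples.)

```python
import numpy as np

def corner_score(patch):
    """Smallest eigenvalue of the second-moment matrix of a patch."""
    Iy, Ix = np.gradient(patch.astype(float))  # image gradients
    # M = sum over the patch of [Ix^2, IxIy; IxIy, Iy^2]
    M = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    return np.linalg.eigvalsh(M)[0]  # eigenvalues in ascending order

flat = np.zeros((9, 9))                       # no gradients at all
edge = np.tile(np.arange(9.0), (9, 1))        # gradient in one direction only
yy, xx = np.mgrid[0:9, 0:9]
corner = ((xx > 4) & (yy > 4)).astype(float)  # gradients in two directions

# A corner has a large *smallest* eigenvalue; flat regions and straight
# edges do not (their smallest eigenvalue is ~0).
assert corner_score(flat) <= corner_score(edge) < corner_score(corner)
```

For the flat patch both eigenvalues are zero; for the straight edge one eigenvalue is large but the smallest is zero; only the corner makes the error grow in every direction.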
@pedroenriquelopezdeteruela6545 · 2 months ago
Awesome post, Jiang, thank you so much for the great job! Anyway, a small comment/question on your video (without too much importance, I assume). At 5:56 you comment (a direct derivation of formula (7) in the paper "Denoising Diffusion Probabilistic Models") that mu^hat_t(x_t, x_0) is on the line joining x_0 and x_t. While this is approximately true for "normal" beta_t scheduling, I think the estimated mean, as a function of x_0 and x_t, need not be exactly on such a line, since in general the respective multipliers of x_0 and x_t in that equation need not add up to one. In fact, with "normal" scheduling, as t increases, this sum seems to move progressively away from 1, so that although mu_t is still a simple linear combination of x_t and x_0, it will drift (if only by a small amount) away from this line. Would you agree with this observation? Greetings, and again, congratulations on the video and thank you very much for clarifying the inner workings of diffusion models!
@jbhuang0604 · 2 months ago
Thank you so much for your comment! You are right! It won't be on the line when the multipliers don't add up to one.
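(For reference, the equation being discussed is Eq. (7) of Ho et al., "Denoising Diffusion Probabilistic Models"; writing out the two multipliers makes the point explicit.)

```latex
\tilde{\mu}_t(x_t, x_0)
  = \underbrace{\frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1-\bar{\alpha}_t}}_{a_t}\, x_0
  + \underbrace{\frac{\sqrt{\alpha_t}\,(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}}_{b_t}\, x_t
```

The mean lies on the segment between x_0 and x_t exactly when a_t + b_t = 1; a general beta_t schedule does not guarantee this, which is the commenter's observation.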
@r00t257 · 2 months ago
You are a god of presenting things in such a concise and intuitive way! Thank you very much, Professor! It really, really helps.
@jbhuang0604 · 2 months ago
Thanks so much!
@user-pk4yz7wn3s · 3 months ago
BRAVO! No one has ever explained the diffusion model in such an easy way with all the details.
@jbhuang0604 · 3 months ago
Thank you so much for your kind words! This makes my day!
@rasmuseriksson6805 · 3 months ago
Thanks for a great video! What are we talking about in terms of space usage for the material? All the different samples I find are always extremely short. Or what are the limitations?