Thank you for making such a high quality video! It's very helpful for me to understand the diffusion model!
@jbhuang0604 · 1 day ago
You're very welcome! Happy that it was helpful!
@Neo-kx3fe · 1 day ago
@10:55 With that local outgoingness on the left, why is there an additional term p_t(x_t) inside the d/dx bracket? This term seems to disappear at @11:18. Thanks.
@sokak01 · 3 days ago
I think there should be a ∇ log q(x_t) instead of p(x_t) in the score matching part.
@ruoshiliu6024 · 6 days ago
Amazing work Jia-Bin!! P.S. you should create a bibtex for this video so it can be cited in literature :P
@jbhuang0604 · 3 days ago
Haha! Thanks! Too bad Google Scholar doesn't include views of YouTube videos.
@nathan_ca · 6 days ago
Thanks! This is an amazing video for getting students like me to re-engage with these topics that I haven't had a chance to explore more ❤
@jbhuang0604 · 4 days ago
Thanks! Glad that this is helpful.
@AnujZore-pe9mg · 8 days ago
Thanks once again for the easy-to-understand explanation! Gonna miss CMSC 733 lectures :(
@jbhuang0604 · 7 days ago
Glad you like them!
@amirhosseinraffiee8270 · 10 days ago
Thanks for the video. Great way to explain a complex concept.
@jbhuang0604 · 10 days ago
Appreciate your comment! Thanks for watching the video. Hope you enjoyed it.
@manuellecha · 11 days ago
Thank you very much! You did a really nice job! The video is clear, visual, and informative. It is consistent with the timeline and evolution of the field, and it effectively conveys the information along with the motivation for the development of these models.
@jbhuang0604 · 11 days ago
Glad you liked it! It was a lot of fun making this video!
@kvu207 · 11 days ago
Beautiful! May I ask how you made the animations for this video?
@jbhuang0604 · 11 days ago
Most of the animations are from the "morph transition" in PowerPoint slides. The rest are from Adobe Premiere Pro.
@adrienforbu5165 · 11 days ago
Nice visuals, good job!
@jbhuang0604 · 11 days ago
Thanks a lot!
@julienblanchon6082 · 11 days ago
This is brilliant!
@jbhuang0604 · 11 days ago
Glad that you enjoyed the video!
@jackshi7613 · 12 days ago
Excellent video!
@jbhuang0604 · 11 days ago
Thanks for watching!
@DimitrivonRutte · 12 days ago
Awesome to see easy-to-understand explanations of current research topics, keep up the great work!
@jbhuang0604 · 11 days ago
Glad you liked it!
@catherineyang5199 · 12 days ago
Thank you for the video! This is the clearest explanation of flow matching on the internet ❤
@jbhuang0604 · 11 days ago
Thank you so much for your kind words!
@r00t257 · 12 days ago
Legend comeback 🙇! Your educational video is worth more than gold. 💓🙏
@jbhuang0604 · 12 days ago
Thanks a lot! Glad you like it!
@SurajBorate-bx6hv · 13 days ago
Thank you for the great step-by-step explanation. Can you share any good resources and insights for implementing diffusion on my own custom images?
@jbhuang0604 · 9 days ago
Hi! No problem. I think Hugging Face's Diffusers probably has the best resources. Check it out: huggingface.co/docs/diffusers/en/index
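(For readers who want to try this before diving into the Diffusers docs: below is a minimal NumPy sketch of the closed-form forward noising step q(x_t | x_0) that DDPM-style training is built on. All names here, such as `betas`, `alpha_bar`, and `add_noise`, are illustrative choices, not the API of any particular library.)

```python
import numpy as np

# Linear noise schedule over T steps, as in the original DDPM setup.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)  # cumulative product, i.e. alpha-bar_t

def add_noise(x0, t, rng=None):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(ab_t) * x0, (1 - ab_t) * I)."""
    rng = np.random.default_rng(0) if rng is None else rng
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps  # the denoiser is trained to predict eps from (xt, t)

x0 = np.ones((8, 8))  # a toy "image"
xt, eps = add_noise(x0, t=T - 1)
# At t = T-1, alpha_bar is tiny, so x_t is almost pure Gaussian noise.
```

In a real training loop the pair (xt, t) goes into a network and the loss is the mean squared error against eps.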
@pi5549 · 22 days ago
Any arXiv paper with a video like this goes above any without, on my TO_READ list at least. +1
@jbhuang0604 · 21 days ago
Thanks for your interest! Let us know if you have any questions!
@crispinotechgaming · 25 days ago
Honestly, it'd be really nice to see the open source community catch up to this scale of operation one day! They are the backbone of AI progress, but they rarely manage to innovate with actual public models to use.
@jbhuang0604 · 22 days ago
I completely agree. Most of these models are closed-source, so it's mainly showcasing their R&D capability, but as you said, the public doesn't benefit much from them. I also hope the open source community can catch up soon.
@arashnozari4042 · 25 days ago
Imagine it in the next 5 years.
@jbhuang0604 · 22 days ago
Yup, the rate of progress is incredible. Can't imagine what this will look like…
@4thlord51 · 28 days ago
I'm building my own diffusion model myself. This is the best breakdown and visualization of the mathematics and implementation. Well done.
@jbhuang0604 · 28 days ago
Thank you! This comment just made my day!
@Yenrabbit · 29 days ago
Very cool work, and a top-notch video with the little sound effects etc. :D
@jbhuang0604 · 28 days ago
Thanks! I also enjoyed these little sound effects. Pika pika~!
@youtube_showcase · 29 days ago
Exciting work. Thank you for creating and sharing this explanation video.
@jbhuang0604 · 29 days ago
Yes, we are excited as well. Glad you enjoyed the video!
@mcarletti · 1 month ago
My like comes with the 5th Symphony (9:39) 😸🎶
@jbhuang0604 · 1 month ago
Oh my! Finally one person noticed that! (Spent a lot of time making that lol)
@khalilsabri7978 · 1 month ago
Just one minute into the video, you know it's extremely well done. Thanks for the video!
@jbhuang0604 · 1 month ago
Glad you liked it! Thanks so much for the comment!
@importon · 1 month ago
Very cool! Will you be putting the code on GitHub?
@jbhuang0604 · 1 month ago
We are working on getting approval. The process is complex, but we are hopeful.
@TayuYoung · 1 month ago
Hi Professor, thank you for your explanation. However, I think that at 1:03 in the video, the up-sampling mechanism for images is performed by the 'decoder', not by the diffusion model. The animation here seems to suggest that the diffusion model produces the high-resolution images. Thanks for your time.
@jbhuang0604 · 1 month ago
Sorry for the confusion. I introduced two mechanisms for high-resolution generation: 1) cascaded diffusion models and 2) latent diffusion models. In cascade-based approaches, the upsampling is done via a super-resolution diffusion model. The model in Sora likely uses only a video decoder that upsamples the denoised clean latents to high-resolution images/videos.
@TayuYoung · 1 month ago
@@jbhuang0604 Thanks for your explanation. I checked the Imagen paper: they use a text-to-image diffusion model and a super-resolution diffusion model to produce the high-resolution image, which plays the role of the decoder output in a latent diffusion model. I used to think that the main difference between cascaded and latent diffusion models was just that one uses low-resolution images and the other uses latent representations, with both employing an encoder-diffusion-decoder pipeline. In Imagen, it seems that a diffusion model can also serve in the 'decoder' role. Am I right?
@youtube_showcase · 1 month ago
Amazing work! Thank you for sharing 😀
@jbhuang0604 · 1 month ago
Thank you! Cheers!
@TechCindy · 1 month ago
Amazing work!
@jbhuang0604 · 1 month ago
Thank you!
@orisenbazuru · 1 month ago
Great video! At 1:21 it should be maximizing the similarity between the two distributions, or minimizing the distance between them.
@jbhuang0604 · 1 month ago
Thanks for pointing this out! Yes, you are right! It should be *maximizing* the similarity between the two distributions.
@Raymond-zv5gr · 1 month ago
BRO YOU ARE EPIC
@jbhuang0604 · 1 month ago
Thank you, thank you!
@Beauty.and.FashionPhotographer · 1 month ago
I wish this were simpler, you know: a Google Colab notebook and a video showing what to click, etc.
@jbhuang0604 · 29 days ago
That would definitely be great! We will be working on making this easier to use!
@ohjein · 2 months ago
Very good! But what surprises me is that PIFu is still at the core. 4+ years and no better model has arrived? With all the developments? Anyway, great work.
@jbhuang0604 · 2 months ago
Yes, we were surprised as well. We tried the recent ICON/ECON methods. These methods produce better shape reconstruction for challenging/uncommon poses (e.g., dancing) but, ironically, produce unnatural shapes for natural poses (like the one shown in the video). For those natural poses, PIFuHD still performs the best!
@truonggiangnguyen8844 · 2 months ago
I have a question: are all the distributions mentioned distributions of continuous variables, since we're using integrals here?
@jbhuang0604 · 2 months ago
Good question! I think there has been some development of discrete variational autoencoders and diffusion models. Those methods can deal with discrete variables.
@user-kq9cu8wy9z · 2 months ago
The world and my brain after this: 💀
@jbhuang0604 · 1 month ago
Indeed!
@curiousobserver2006 · 2 months ago
Seriously one of the best educational videos I've ever watched.
@jbhuang0604 · 2 months ago
Thank you so much!
@nutshell1811 · 2 months ago
Best video on diffusion!!
@jbhuang0604 · 2 months ago
Great! Glad that it's helpful!
@rtluo1546 · 2 months ago
This is truly a great tutorial video, so well made. Can't believe it covers so many things within only 17 minutes.
@jbhuang0604 · 2 months ago
Thanks a lot! Happy that you enjoyed the video!
@wangy01 · 2 months ago
Thank you for your great work removing the need for the audience to have much prior knowledge before they can enjoy your video. For example, you mentioned maximum likelihood and explained what it is immediately. It is such a challenge to straighten all of this out in a 17-minute video, but you did a great job. Thank you!
@jbhuang0604 · 2 months ago
Glad that you liked it! Appreciate your kind words! This made my day!
@vfhfnvecnfaby5362 · 2 months ago
I LOVE U!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
@jbhuang0604 · 29 days ago
Thank you, thank you!
@NobleSpartan · 2 months ago
Your production on these videos is incredible.
@jbhuang0604 · 2 months ago
Thanks so much for your kind words! Glad that you like it!
@MrNoipe · 2 months ago
Awesome! Would also love a video on SLAM!
@jbhuang0604 · 2 months ago
Yup! It's a complex topic, but I will get there!
@MrNoipe · 2 months ago
The linear algebra portion went by very quickly for someone who hasn't worked with it in several years. I guess this video is more tailored for current researchers?
@jbhuang0604 · 2 months ago
Ah, sorry about that! I probably should slow down a bit on those math derivations. Will do so in future videos!
@nikitadrobyshev7953 · 2 months ago
OK, this is the best video explanation of diffusion models I've seen. Ideal ratio between simplification and depth ☺👏
@jbhuang0604 · 2 months ago
Glad it was helpful! Thank you so much for your kind words!
@wangy01 · 2 months ago
I agree. The author must have carefully chosen the most efficient way to cut into the complex concept hierarchy, and every single word, to achieve that efficiency.
@madhavkumar9942 · 2 months ago
Great video. Can you please explain how to get the 'sparse' folder used in your project for a video (or a folder containing video frames)?
@jbhuang0604 · 2 months ago
Thanks! It comes from the COLMAP preprocessing.
@yuelinxin3684 · 2 months ago
This looks like a covariance matrix 🤔
@jbhuang0604 · 2 months ago
Yes, the second-moment matrix is a local covariance matrix of the gradient vector field. It captures the local image structure.
@mityashabat · 2 months ago
Great vid! Quick question: it wasn't clear how many of these second-moment matrices we're making. One for each patch, right? A single one for the whole image wouldn't provide us with local information. The explanation confused me a bit. Also, if I understand correctly, the intuition is that the second-moment matrix helps us compute the curvature of the edge in the patch, and checking the eigenvalues provides us with info on that curvature. Is my mind in the right place? 😅
@jbhuang0604 · 2 months ago
Thanks! For every pixel location, we form a second-moment matrix, so yes, it's one for each patch. The eigenvalues of the second-moment matrix tell us how fast the summed squared error (between the reference patch and the translated patch) goes up. The eigenvectors tell us the directions of the fastest and the slowest error changes. So, to find a corner, we look for patches with a *large smallest eigenvalue*. This means the error goes up quickly even in the direction of slowest change. That's the criterion for good features to track.
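(To make the "large smallest eigenvalue" criterion concrete, here is a small NumPy sketch, not the video's code: it builds the 2x2 second-moment matrix from patch gradients and scores the patch by its smallest eigenvalue. The test patches `flat`, `edge`, and `corner` are made-up toy examples.)

```python
import numpy as np

def corner_score(patch):
    """Smallest eigenvalue of the second-moment matrix of a patch."""
    Iy, Ix = np.gradient(patch.astype(float))  # image gradients
    # M = sum over the patch of [Ix^2, IxIy; IxIy, Iy^2]
    M = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    return np.linalg.eigvalsh(M)[0]  # eigenvalues in ascending order

flat = np.zeros((9, 9))                       # no gradients at all
edge = np.tile(np.arange(9.0), (9, 1))        # gradient in one direction only
yy, xx = np.mgrid[0:9, 0:9]
corner = ((xx > 4) & (yy > 4)).astype(float)  # gradients in two directions

# A corner has a large *smallest* eigenvalue; flat regions and straight
# edges do not (their smallest eigenvalue is ~0).
assert corner_score(flat) <= corner_score(edge) < corner_score(corner)
```

For the flat patch both eigenvalues are zero; for the straight edge one eigenvalue is large but the smallest is zero; only the corner makes the error grow in every direction.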
@pedroenriquelopezdeteruela6545 · 2 months ago
Awesome post, Jiang, thank you so much for the great job! Anyway, a small comment/question on your video (without too much importance, I assume). At 5:56 you comment (a direct derivation of formula (7) in the paper "Denoising Diffusion Probabilistic Models") that mu^hat_t(x_t, x_0) is on the line joining x_0 and x_t. While this is approximately true for "normal" beta_t scheduling, I think the estimated mean, as a function of x_0 and x_t, need not be exactly on such a line, since in general the respective multipliers of x_0 and x_t in that equation need not add up to one. In fact, with "normal" scheduling, as t increases, this sum seems to move progressively away from 1, so that although mu_t is still a simple linear combination of x_t and x_0, it will drift (if only by a small amount) away from this line. Would you agree with this observation? Greetings, and again, congratulations on the video and thank you very much for clarifying the inner workings of diffusion models!
@jbhuang0604 · 2 months ago
Thank you so much for your comment! You are right! It won't be on the line when the multipliers don't add up to one.
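(For reference, the equation being discussed is Eq. (7) of Ho et al., "Denoising Diffusion Probabilistic Models"; writing out the two multipliers makes the point explicit.)

```latex
\tilde{\mu}_t(x_t, x_0)
  = \underbrace{\frac{\sqrt{\bar{\alpha}_{t-1}}\,\beta_t}{1-\bar{\alpha}_t}}_{a_t}\, x_0
  + \underbrace{\frac{\sqrt{\alpha_t}\,(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}}_{b_t}\, x_t
```

The mean lies on the segment between x_0 and x_t exactly when a_t + b_t = 1; a general beta_t schedule does not guarantee this, which is the commenter's observation.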
@r00t257 · 2 months ago
You are a god of presenting things in such a concise and intuitive way! Thank you very much, Professor! It really, really helps.
@jbhuang0604 · 2 months ago
Thanks so much!
@user-pk4yz7wn3s · 3 months ago
BRAVO! No one has ever explained the diffusion model in such an easy way with all the details.
@jbhuang0604 · 3 months ago
Thank you so much for your kind words! This makes my day!
@rasmuseriksson6805 · 3 months ago
Thanks for a great video! What are we talking about in terms of space usage for the material? All the different samples I find are always extremely short. Or what are the limitations?