It's just crazy to see Hu-po understand all the concepts in this paper; what an insane guy!
@arashakbari6986 · 1 month ago
These long videos are super useful!!!
@wolpumba4099 · 10 months ago
Summary starts at 1:52:24
@wolpumba4099 · 10 months ago
*Abstract* Stability AI, the open-source AI pioneers, have released a fantastic paper on scaling diffusion models for high-resolution image generation. This paper is a must-read - a deep dive into the math and techniques used, packed with valuable experimental results. Let's break it down:

* *Rectified Flow: Making Diffusion Efficient* - Think of image generation like getting from point A to point B. Traditional diffusion models take a roundabout route, but rectified flow aims for a straight line - more efficient and better results!
* *The Power of Simplicity* - Rectified flow is surprisingly simple, yet when combined with a clever timestep-sampling technique (logit-normal sampling), it outperforms more complex methods. This saves researchers a ton of compute and energy!
* *New Architecture, Better Results* - Stability AI introduced a new transformer-based architecture (MMDiT) that uses separate weights for the visual and text streams, improving how the model understands both.
* *Scaling Up = Better Images* - Unsurprisingly, bigger models (within reason) give better images. This is great news for the future of AI image generation.

Stability AI's focus on sharing their findings is admirable. This paper helps the whole field, potentially saving tons of wasted compute and making AI a bit more environmentally friendly.

Disclaimer: I used Gemini Advanced 1.0 (2024.03.04) to summarize the video transcript. This method may make mistakes in recognizing words.
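To make the "straight line" idea concrete, here is a minimal sketch of a rectified-flow training step with logit-normal timestep sampling, assuming a generic velocity-prediction `model` and a data batch `x0`; it is an illustration of the technique, not code from the paper.

```python
# Minimal rectified-flow training step with logit-normal timestep sampling.
# `model`, `x0`, `m`, and `s` are hypothetical placeholders.
import torch

def rectified_flow_loss(model, x0, m=0.0, s=1.0):
    noise = torch.randn_like(x0)
    # Logit-normal sampling: logit(t) ~ N(m, s), i.e. t = sigmoid(u), u ~ N(m, s).
    # This concentrates training samples in the middle of the trajectory.
    u = m + s * torch.randn(x0.shape[0], device=x0.device)
    t = torch.sigmoid(u)
    t_ = t.view(-1, *([1] * (x0.dim() - 1)))  # broadcast t over image dims
    # Straight-line interpolation between data (t=0) and noise (t=1).
    x_t = (1 - t_) * x0 + t_ * noise
    target = noise - x0           # constant velocity of the straight path
    pred = model(x_t, t)          # model predicts the velocity
    return torch.mean((pred - target) ** 2)
```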
@aiarchitecture6427 · 2 months ago
Thanks for the lecture, man 🔥🔥 It helped me a lot and brought light and understanding
@HolidayAtHome · 10 months ago
Super interesting to listen to someone with actual understanding of how all that magic works ;)!
@fast_harmonic_psychedelic · 10 months ago
Parti prompts are all about things like: "two dogs on the left; the dog on the left is black and the one on the right is white; and a cat on the right holding up its right paw, with 12 squares on the carpet and a triangle on the wall"
@OneMillionDollars-tu9ur · 15 days ago
"human preference evaluation" is the "vibe check"
@aiarchitecture6427 · 2 months ago
I don't know if it's in your line of interest, but a review of ControlNet / T2I-Adapter / Uni-ControlNet / LoRA adapters / B-LoRA would be great 😊 It's getting confusing for me, and probably for other people interested in diffusion too
@qimingwang9557 · 2 months ago
Sounds gooood!! I'm really interested and want to know more about it. If you could do a video explaining ControlNet training, I'd be very grateful
@ashishkannad3021 · 3 months ago
Superb
@bause6182 · 10 months ago
Great explanations, I can't wait to test the multimodal inputs
@hjups · 10 months ago
The VAE scaling isn't new; it was shown in the Emu paper. One thing neither paper discusses is that there's an issue with scaling the channels: they have diminishing returns in terms of information density. For example, with d=8, if you sort the channels by PCA variance, the first 4 channels have the most information, the next 2 have high-frequency detail, and the last two are mostly noise. There's still valid information in that "noise", but it may not be worth the increased computational complexity. Alternatively, this could be a property of the KL regularization, which focuses information density into a few channels.

The idea of shifting the timestep distribution was proposed in the UViT paper (Simple Diffusion); I'm surprised they did not reference it directly. Although, the UViT paper provided a theoretical perspective, which does not necessarily align with human preferences.

I wish they had done a more comprehensive search over the diffusion methods... It's missing some of the other training objectives (likely due to author bias rather than compute), which means it's not quite as useful as claimed.
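Here is a minimal NumPy sketch of the kind of channel analysis described above: run PCA across the channel dimension of a batch of d=8 latents and see how the explained variance concentrates in the first few components. `latents` is a hypothetical input; neither paper ships this code.

```python
import numpy as np

def latent_pca_spectrum(latents):
    """PCA over the channel dimension of VAE latents.

    latents: array of shape (N, C, H, W); each spatial position is one
    C-dimensional sample. Returns the fraction of total variance explained
    by each principal component, sorted descending - a rough proxy for how
    information density is distributed across the latent channels.
    """
    n, c, h, w = latents.shape
    samples = latents.transpose(0, 2, 3, 1).reshape(-1, c)  # (N*H*W, C)
    samples = samples - samples.mean(axis=0, keepdims=True)
    cov = np.cov(samples, rowvar=False)                     # (C, C) covariance
    eigvals = np.linalg.eigvalsh(cov)[::-1]                 # descending eigenvalues
    return eigvals / eigvals.sum()
```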
@KunalSwami · 8 months ago
I have a doubt about your "increasing the complexity" part. Does increasing the channels increase complexity significantly? Increasing the spatial dimensions of the latent is what's costly.
@hjups · 8 months ago
@KunalSwami Increasing channels does not significantly increase the diffusion model's computational complexity, but it increases the semantic complexity of the latent space (potentially making it harder to learn; there are ways around this), and it increases both the semantic and computational complexity of the VAE. The SD3 paper showed the former: the VAE with more channels performed worse until a certain model size was reached (indicating that the latent space was harder to learn). The latter claim comes from anecdotal experience training VAEs: you typically need to increase the VAE's base channel dimension to support a deeper latent space, and VAEs can be quite slow to train.
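A back-of-the-envelope sketch of that cost asymmetry, with illustrative shapes that are not SD3's actual configuration: widening the latent only multiplies the cost of the diffusion model's first (and last) conv layer, while doubling the spatial size multiplies the cost of every layer by 4.

```python
# Rough FLOPs of a single k x k conv layer; all shapes are illustrative.
def conv_flops(h, w, c_in, c_out, k=3):
    return h * w * c_in * c_out * k * k

base   = conv_flops(64, 64, 4, 320)    # d=4 latent into a 320-channel backbone
wider  = conv_flops(64, 64, 16, 320)   # d=16 latent: only this input layer grows 4x
larger = conv_flops(128, 128, 4, 320)  # 2x spatial size: every layer grows 4x
print(f"base={base:.2e}  wider={wider:.2e}  larger={larger:.2e}")
```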
@MultiMam12345 · 10 months ago
Talking about signal/noise ratio on a mic input that is clipping. Nice 😂
@siddharthmenon1932 · 10 months ago
Broo, this is such an informative video, man. Kudos to you for making such complicated equations so easy and intuitive to understand for beginners
@timeTegus · 5 months ago
Really good video!
@erikhommen1450 · 10 months ago
Thank you, this helps a lot!! Next the SD3-Turbo paper, please :)
@fast_harmonic_psychedelic · 10 months ago
That's not what Parti prompts are for. It's not about visually pleasing images; it's about accurately captioned images
@hu-po · 10 months ago
Thanks for the clarification, sorry I got this wrong :(
@KunalSwami · 8 months ago
What PDF reader do you use for annotation?
@jkpesonen · 10 months ago
How many convergence points does the vector field have?
@chickenp7038 · 10 months ago
Great video. Do you know if rectified flow is in diffusers?
@jeffg4686 · 9 months ago
Intel does have a big investment. Personally, I think they should sell the model: keep it open source, but sell rights to use the high-end models. That way, they have a solid business plan.
@danielvarga_p · 10 months ago
i was here
@Elikatie25 · 10 months ago
witnessed
@isiisorisiaint · 9 months ago
Dude, so much nonsense in one single... you're the champ, man. Do you actually know anything about what you're talking about? (Rhetorical question, you obviously don't.) By the time I got to your "what's a vector field" (@min 23) I just gave up (what you're showing in that image is a representation of a function f: R^2 -> R^2, which is anything but a vector field; it's a function, bro, a function, get it?)