Stable Diffusion 3

  9,542 views

hu-po

1 day ago

Comments: 28
@wzyjoseph7317 10 months ago
It's just crazy to see Hu-po understand all the concepts in this paper; what an insane guy!
@arashakbari6986 1 month ago
These long videos are super useful!!!
@wolpumba4099 10 months ago
Summary starts at 1:52:24
@wolpumba4099 10 months ago
*Abstract* Stability AI, the open-source AI pioneers, have released a fantastic paper on scaling diffusion models for high-resolution image generation. This paper is a must-read: a deep dive into the math and techniques used, packed with valuable experimental results. Let's break it down:
* *Rectified Flow: Making Diffusion Efficient* Think of image generation like getting from point A to point B. Traditional diffusion models take a roundabout route, but rectified flow aims for a straight line: more efficient and better results!
* *The Power of Simplicity:* Rectified flow is surprisingly simple, yet when combined with a clever time-step sampling technique (logit-normal sampling), it outperforms more complex methods. This saves researchers a ton of compute and energy!
* *New Architecture, Better Results:* Stability AI introduced a new transformer-based architecture (MM-DiT) that uses separate weights for visual and text features, improving how the model understands both.
* *Scaling Up = Better Images:* Unsurprisingly, bigger models (within reason) give better images. This is great news for the future of AI image generation.
Stability AI's focus on sharing their findings is admirable. This paper helps the whole field, potentially saving tons of wasted compute and making AI a bit more environmentally friendly.
Disclaimer: I used gemini advanced 1.0 (2024.03.04) to summarize the video transcript. This method may make mistakes in recognizing words.
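The straight-line idea summarized above can be sketched in a few lines of numpy. This is a toy illustration of the rectified-flow objective with logit-normal timestep sampling, not the paper's actual training code; the shapes, the `oracle` model, and all names are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def logit_normal_t(n, mean=0.0, std=1.0):
    # Sample timesteps t in (0, 1) from a logit-normal distribution:
    # u ~ N(mean, std), t = sigmoid(u). This concentrates training on
    # intermediate noise levels rather than the easy endpoints.
    u = rng.normal(mean, std, size=n)
    return 1.0 / (1.0 + np.exp(-u))

def rectified_flow_loss(model, x0, eps, t):
    # Rectified flow: interpolate on a straight line between data x0
    # and noise eps, and regress the constant velocity (eps - x0).
    t = t.reshape(-1, 1)                 # broadcast over features
    x_t = (1.0 - t) * x0 + t * eps       # straight-line interpolant
    target = eps - x0                    # velocity of the straight path
    pred = model(x_t, t)
    return np.mean((pred - target) ** 2)

# Toy "model" that predicts the exact velocity, so the loss is zero.
x0 = rng.normal(size=(4, 8))             # batch of flattened "images"
eps = rng.normal(size=(4, 8))            # Gaussian noise
t = logit_normal_t(4)
oracle = lambda x_t, t: eps - x0
print(rectified_flow_loss(oracle, x0, eps, t))  # 0.0
```

The point of the straight-line target is that at sampling time the ODE can be integrated in very few steps, since the ideal trajectory has constant velocity.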
@aiarchitecture6427 2 months ago
Thanks for the lecture man, 🔥🔥 Helped me a lot to bring light and understanding
@HolidayAtHome 10 months ago
super interesting to listen to someone with actual understanding of how all that magic works ;)!
@fast_harmonic_psychedelic 10 months ago
parti prompts are all about "two dogs on the left the dog on the left is black and the one on the right is white, and a cat on the right holding up its right paw, with 12 squares on the carpet and a triangle on the wall"
@OneMillionDollars-tu9ur 15 days ago
"human preference evaluation" is the "vibe check"
@aiarchitecture6427 2 months ago
I don't know if it's in your line of interest, but a review of ControlNet / T2I-Adapter / Uni-ControlNet / LoRA adapters / B-LoRA would be great 😊 It's getting confusing for me and probably for other people interested in diffusion too
@qimingwang9557 2 months ago
Sounds good!! I'm really interested and want to know more about it. If you could do some videos explaining ControlNet training, I'd be very grateful
@ashishkannad3021 3 months ago
Superb
@bause6182 10 months ago
Great explanations, I can't wait to test the multimodal inputs
@hjups 10 months ago
The VAE scaling isn't new, it was shown in the EMU paper. One thing neither paper discusses is that there's an issue with scaling the channels - they have diminishing returns in terms of information density. For example, with d=8, if you sort the channels by PCA variance, the first 4 channels have the most information, then the next 2 have high frequency detail, and the last two are mostly noisy. There's still valid information in that "noise", but it may not be worth the increased computational complexity. Alternatively, this could be a property of the KL regularization, where it focuses information density into few channels.
The idea of shifting the timestep distribution was proposed in the UViT paper (Simple Diffusion); I'm surprised they did not reference it directly. Although, the UViT paper provided a theoretical perspective, which does not necessarily align with human preferences.
I wish they had done a more comprehensive search over the diffusion methods... It's missing some of the other training objectives (likely due to author bias and not compute), which means it's not quite as useful as claimed.
@KunalSwami 8 months ago
I have a doubt about your "increasing the complexity" part. Does increasing the channels increase complexity significantly? Increasing the spatial dimensions of the latent is costly.
@hjups 8 months ago
@@KunalSwami Increasing channels does not significantly increase the diffusion model's computational complexity, but it increases the semantic complexity of the latent space (potentially making it harder to learn - there are ways around this), and it increases both the semantic and computational complexity of the VAE. The SD3 paper showed the former, where the VAE with more channels performed worse until a certain model size was reached (indicating that the latent space was harder to learn). The latter claim comes from anecdotal evidence from training VAEs - you typically need to increase the VAE base channel dim to support a deeper latent space, and VAEs can be quite slow to train.
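The channel-variance observation in this thread can be reproduced on synthetic latents. This is a hedged sketch: the latents and per-channel scales below are made up to mimic the described falloff, not real VAE outputs, and `channel_pca_variance` is a hypothetical helper name:

```python
import numpy as np

rng = np.random.default_rng(0)

def channel_pca_variance(latents):
    # latents: (N, C, H, W) VAE latents. Treat every spatial position of
    # every image as one C-dimensional sample and run PCA over channels,
    # returning the fraction of total variance per principal component.
    n, c, h, w = latents.shape
    x = latents.transpose(0, 2, 3, 1).reshape(-1, c)
    x = x - x.mean(axis=0)
    cov = (x.T @ x) / (x.shape[0] - 1)
    eigvals = np.linalg.eigvalsh(cov)[::-1]   # sort descending
    return eigvals / eigvals.sum()

# Synthetic d=8 latent whose information density falls off per channel,
# mimicking the "first 4 channels carry most of the information" effect.
scales = np.array([4, 3, 2, 2, 1, 1, 0.2, 0.2])
latents = rng.normal(size=(16, 8, 32, 32)) * scales.reshape(1, 8, 1, 1)
ratios = channel_pca_variance(latents)
print(ratios.round(3))  # leading components dominate the variance
```

On real d=8 latents the same analysis would show whether the KL regularization really concentrates information in a few components, as speculated above.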
@MultiMam12345 10 months ago
Talking about signal/noise ratio on a mic input that is clipping. Nice 😂
@siddharthmenon1932 10 months ago
broo this is such an informative video man. kudos to you on making such complicated equations so easy and intuitive to understand for beginners
@timeTegus 5 months ago
Really good video!
@erikhommen1450 10 months ago
Thank you, helps a lot!! Next the SD3-Turbo Paper please :)
@fast_harmonic_psychedelic 10 months ago
That's not what parti prompts are for. It's not about visually pleasing images, it's about accurately captioned images
@hu-po 10 months ago
Thanks for the clarification, sorry I got this wrong :(
@KunalSwami 8 months ago
What pdf reader do you use for annotation?
@jkpesonen 10 months ago
How many convergence points does the vector field have?
@chickenp7038 10 months ago
great video. do you know if rectified flow is in diffusers?
@jeffg4686 9 months ago
Intel does have a big investment. Personally, I think they should sell the model. Keep it open source, but sell rights to use the high end models. That way, they have a solid business plan.
@danielvarga_p 10 months ago
i was here
@Elikatie25 10 months ago
witnessed
@isiisorisiaint 9 months ago
dude, so much nonsense in one single... you're the champ man. do you actually know anything that you're talking about? (rhetorical question, you obviously don't). by the time i got to your "what's a vector field" (@min 23) i just gave up (what you're showing in that image is a representation of a function f:R^2->R^2, which is anything but a vector field, it's a function bro, a function, get it?)