Diffusion models explained. How does OpenAI's GLIDE work?

91,816 views

AI Coffee Break with Letitia

1 day ago

Comments: 114
@Mrbits01 2 years ago
As I was about to go and generate the avocado armchair, I heard you say no avocado armchair. My disappointment is immeasurable and my day is ruined.
@AICoffeeBreak 2 years ago
Imagine, our day was ruined too! 😭
@johnvonhorn2942 2 years ago
Why can't it generate that iconic chair? Paradise lost. We miss those simpler times of that junior AI
@AICoffeeBreak 2 years ago
🤣🤣
@LecrazyMaffe 1 year ago
This video offers one of the best explanations for classifier-free guidance.
@ElieAtik 2 years ago
This is the only video that goes into how OpenAI used text/tokens in combination with the diffusion model in order to achieve such results. That was very helpful.
@r00t257 2 years ago
Love your video so much! Lots of helpful intuition 🌻🌻💮 Thanks a lot, Ms. Coffee Bean
@AICoffeeBreak 2 years ago
Sorry, the upload seems buggy. Re-uploading did not help. I'll wait to see if this gets better over time. Did you try turning it off and on again? 🤖
@phizc 1 year ago
Wow what a difference a few months make. Dall-E 2 in April, Midjourney in July, and Stable Diffusion in August. Hi from the future 😊.
@CristianGarcia 2 years ago
Something not stated in the video is that Diffusion Models are WAY easier to train than GANs. Although it requires you to code the forward and backward diffusion procedures, training is rather stable which is more gratifying. Might release a tutorial on training diffusion models on a toy-ish dataset in the near future :)
@AICoffeeBreak 2 years ago
Great point, thanks! 🎯 Paste the tutorial in the comments, when ready! 👀
@MultiCraftTube 2 years ago
That would be a great tutorial! Mine doesn't want to learn MNIST 😅
@taseronify 2 years ago
WHY is noise added to a perfect image? And why do we reverse it? To get a clear image? We already had a clear image at the beginning. This video fails to explain it.
@AICoffeeBreak 2 years ago
@@taseronify Because we train the model on existing images, where we know what they should look like. Then, with new noise, the model generates new images during testing.
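To make the reply above concrete, here is a minimal sketch of how one training example could be built. It assumes the standard DDPM closed form `x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps` and a linear beta schedule. The function names, the schedule values and the four-pixel "image" are illustrative assumptions, not taken from the video or from GLIDE's code.

```python
import math
import random

def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed values)."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def noisy_training_pair(clean_image, t, alpha_bars, rng):
    """Return (noisy image at step t, the noise that was added).

    The noise is the regression target: the network sees the noisy image
    and the step index t, and is trained to predict this noise.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in clean_image]
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noisy = [signal_scale * x + noise_scale * e for x, e in zip(clean_image, noise)]
    return noisy, noise

rng = random.Random(0)
alpha_bars = make_alpha_bars(1000)
clean_image = [0.5, -0.25, 1.0, 0.0]  # a tiny stand-in for real pixel values
noisy, target_noise = noisy_training_pair(clean_image, 999, alpha_bars, rng)
```

The clean image is needed only to create a known answer for training. At the last step almost no signal is left in `noisy`, which is why generation at test time can start from freshly sampled noise.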
@RishiRaj-hu9it 1 year ago
Hi.. just curious to know.. if any tutorial has come up?
@MachineLearningStreetTalk 2 years ago
Amazing production quality! Here we go!!
@samanthaqiu3416 2 years ago
I love Yannic, but boy do I like your articulate presentation? I think I do
@AICoffeeBreak 2 years ago
Wow, thanks! I love Yannic too! :)
@OP-yw3ws 11 months ago
You explained the CFG so well. I was trying to wrap my head around it for a while!
@balcaenpunch 2 years ago
At @3:55, in "227" the two "2s" are written differently - I have never seen anyone other than myself do this! Cheers, Letitia. Great video.
@Nex_Addo 2 years ago
Thank you for the first effective high-level explanation of Diffusion I've found. Truly, I do not know how I went so long in this space not knowing about your channel.
@HangtheGreat 1 year ago
very well explained. love the intuition / comparison piece. send my regards to ms coffee bean :D
@AICoffeeBreak 1 year ago
Thanks! Ms. Coffee Bean was so happy to read this. :)
@emiliomorales2843 2 years ago
I was waiting for this, Letitia. Love your channel, thank you
@alexandrupapiu3310 2 years ago
This was soo informative. And the humour was spot on!
@alfcnz 2 years ago
Nice high-level summary. Thanks!
@jonahturner2969 2 years ago
Love your channel! Cat videos get millions of views. Your videos might get in the thousands of views, but they have a huge impact by explaining high level concepts to people who can actually use them. Please keep up your exceptional work
@AICoffeeBreak 2 years ago
Wow, thank you! Funny, I was thinking about my videos vs. cat videos very recently in a chat with Tim and Keith from MLST. I remember that part was not recorded. It's nice to read that you had the same thought. :)
@RalphDratman 2 years ago
This is an excellent teaching session. I learned a great deal. Thank you. I do not personally need another avocado armchair as that is all we ever sit on now in my house. It turns out that avocados are not ideal for chair construction. When the avocado becomes fully ripe the chair loses its furniture-like qualities. I would like to know whether the smaller, released version of GLIDE is at least useful for understanding the GLIDE architecture and getting a feel for what GLIDE can do.
@AICoffeeBreak 2 years ago
Haha, your middle line cracked me up. Regarding your last question, the answer is rather no. Scale enables some capabilities that small data and models simply do not show.
@_tgwilson_ 2 years ago
Just started playing around with disco diffusion. This is the best explanation I've found and I love the coffee bean character. Subbed.
@AICoffeeBreak 2 years ago
Welcome to the coffee drinkers' club! ☕
@_tgwilson_ 2 years ago
@@AICoffeeBreak ☕ Thanks, the content on your channel is really well thought out and wonderfully conceived. I really hope the channel grows, and am quite sure Mr YouTube will favour a channel dedicated to the architecture that underpins his existence 😀 I spent some time during lockdown going through many chapters of Penrose's The Road to Reality (one of the best and most difficult books I've ever read) with nothing but calc 1 to 3 and some linear algebra under my belt. I'm very interested in studying ML in my free time as many of the ideas are informed by physics. Thanks again for your educational content, the quality is top notch.
@klarietakiba1445 2 years ago
You always have the best, clear and concise explanations on these topics
@AICoffeeBreak 2 years ago
Thanks! ☺️
@taseronify 2 years ago
I don't think so. I did not understand why noise is added to a perfect image? What is achieved by adding noise? Can anyone explain it please?
@AICoffeeBreak 2 years ago
@@taseronify We train the model on existing images, where we know what they should look like. Then, with new noise, the model generates new images during testing.
@amirarsalanrajabi5171 2 years ago
Just found your channel yesterday and I'm loving it! Way to go !
@AICoffeeBreak 2 years ago
Glad we found you. 😜
@ArjunKumar123111 2 years ago
I'm here to speculate Ms Coffee Bean knew the existence of DALLE 2... Convenient timing...
@AICoffeeBreak 2 years ago
🤫
@DeepFindr 2 years ago
Very nice video! I'm working with flow-based models atm and also came across Lilian Weng's blog post, which is superb. I feel like diffusion models and flow-based models share some similarities. In fact all generative models share similarities :D
@undergrad4980 2 years ago
Great explanation. Thank you.
@Vikram-wx4hg 2 years ago
Wonderful review - not only does it capture the essential information, but it is also interspersed with some very good humor. Look forward to more from you!
@daesoolee1083 2 years ago
Nice explanation! You got my subscription!
@AICoffeeBreak 2 years ago
Nice to see you! :)
@Micetticat 2 years ago
Amazing video. All concepts are explained so clearly. "Teeeeeext!" That notation made me laugh. It seems that the classifier-free guidance technique they are using could be used in a lot of other cases where multimodality is required.
@tripzero0 2 years ago
I finally understand diffusion! (Not really, but more so than before)
@JosephRocca 2 years ago
Astoundingly well-explained!
@AICoffeeBreak 2 years ago
Hehe, thanks! Astoundingly positively impactful comment. ☺️
@marcocipriano5922 2 years ago
you can feel this is serious stuff by the workout background music. Super interesting topic and a very clear video considering how many complex aspects were involved. 14:20 I wonder what GLIDE predicts here on the branch which inputs just noise without the text (at least at the first iteration?).
@AICoffeeBreak 2 years ago
RE: music. Cannot leave the impression we are talking about unimportant stuff, lol. 😅 RE: Prediction without text from just noise. I think the answer is: something. Like, anything, but always depending on the noise that was just sampled. Different noise => different generations. Being the first step out of 150, this would mean that it basically adds here and there pieces of information that can crystallize in the remaining 149 iterations.
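The two branches discussed in this exchange (one noise prediction with the text, one without) are combined by classifier-free guidance. A minimal sketch, assuming the usual formulation in which the final prediction extrapolates from the text-free output towards the text-conditioned one; the function name and the three-element lists are made-up stand-ins for real network outputs:

```python
def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale):
    """Extrapolate from the text-free prediction towards the text-conditioned one.

    guided = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

eps_uncond = [0.10, -0.20, 0.30]  # made-up noise prediction without text
eps_cond = [0.20, -0.10, 0.10]    # made-up noise prediction with text
guided = classifier_free_guidance(eps_uncond, eps_cond, guidance_scale=3.0)
```

With a scale of 1 this returns the text-conditioned prediction unchanged, and with a scale of 0 it returns the text-free one. Scales above 1 exaggerate whatever the text changed about the prediction, which is the "weird hack" part.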
@spacemanchris 2 years ago
Thanks so much for this video and your channel. I really appreciate your explanations, I'm coming at this topic from the art side rather than the technical side so having these concepts explained is very helpful. For the last month I've been producing artwork with Disco Diffusion and it's really a revolution in my opinion. Let me know if you'd like to use any future videos and I can send you a selection.
@AICoffeeBreak 2 years ago
Hey, write me an email or tell me your Twitter handle.
@Neptutron 2 years ago
I love your videos! I also love how many comments you respond to... it makes it feel more like a community than other ML channels. The idea of generating globally coherent images via a U-Net is pretty cool - the global image attention part is weird, I'll have to look into it more lol. From DALL-E 2 it seems another advantage of diffusion models is that they can be used to edit images, because they can modify existing images somehow
@AICoffeeBreak 2 years ago
Hey, thanks! Yes, we totally forgot to mention how editing can be done: basically, you limit the diffusion process to only the area you want to have edited. The rest of the image is left unchanged.
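The editing idea in this reply can be sketched with a mask: after every denoising step, only the masked pixels take the new values and everything else is copied back from the original. In the sketch below `toy_denoise_step` is a hypothetical placeholder for a trained model, and real systems differ in detail (for example in how the untouched region is noised to match the current step).

```python
def apply_edit_mask(original, proposal, edit_mask):
    """Keep `original` where edit_mask is 0 and take `proposal` where it is 1."""
    return [p if m else o for o, p, m in zip(original, proposal, edit_mask)]

def toy_denoise_step(image):
    """Hypothetical stand-in for one reverse-diffusion step of a trained model."""
    return [0.5 * (x + 1.0) for x in image]

def edit_region(original, edit_mask, num_steps):
    """Run the toy denoiser, restricting its effect to the masked area."""
    image = list(original)
    for _ in range(num_steps):
        proposal = toy_denoise_step(image)
        image = apply_edit_mask(original, proposal, edit_mask)
    return image

original = [0.0, 0.2, 0.4, 0.6]
edit_mask = [0, 1, 1, 0]  # edit only the two middle pixels
edited = edit_region(original, edit_mask, num_steps=3)
```

The first and last pixels come out exactly as they went in, while the two masked pixels have been changed by the denoiser.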
@RfMac 2 years ago
@@AICoffeeBreak yeah, I agree, your videos are awesome! I just found your channel and it covers so many recent papers! I'm watching a bunch of your videos hahah. And is global image attention covered in some other video? Thanks for the content!
@muhammadwaseem_ 9 months ago
classifier-free guidance is explained well. Thank you
@AICoffeeBreak 9 months ago
Glad it was helpful!
@Yenrabbit 2 years ago
What a great explainer video! Thanks for sharing 🙂
@AICoffeeBreak 2 years ago
Thanks for the feedback! ☺️
@Mutual_Information 2 years ago
Very nice video! It's nice to see Diffusion models getting more attention. It seems the coolest AI generated art is all coming from diffusion models these days.
@sophiazell9517 2 years ago
"Is this a weird hack? - Yes, it is!"
@theaicodes 2 years ago
Nice video! very instructive!
@AICoffeeBreak 2 years ago
Glide you liked it! 😅
@gergerger53 2 years ago
Great, as always
@alexvass 1 year ago
nice and clear
@AICoffeeBreak 1 year ago
Thank you so much! ☺️
@theeFaris 2 years ago
very helpful thank you
@alexijohansen 2 years ago
Very nice video!
@AICoffeeBreak 2 years ago
Thank you! Cheers!
@shahaffinder5355 2 years ago
Great video :) One small mistake I would like to point out is at 6:30, where the example with the extra arrow is in fact a Markovian structure (Markov random field), but not a chain :)
@MakerBen 2 years ago
Thanks!
@AICoffeeBreak 1 year ago
Thanks a lot! 😀
@RfMac 2 years ago
I would like to give 1000 likes to this video!
@marcinelantkowski662 2 years ago
I absolutely love your channel and the explanations you provide, thanks for all the great work you put into these videos! But here I don't fully get the intuition behind the step-wise denoising: At step T we ask the network to predict the noise from step T-1, correct? But the noise at step T-1 is indistinguishable from the noise at step T-2, T-3, ... T-n, no? Let's say we add some random noise only twice: img = (img + noise_1) + noise_2 It seems like a non-identifiable problem! I can imagine we could train the network to predict (noise_1 + noise_2), but it should be physically impossible to predict which pixels were corrupted by noise_1, which were corrupted by noise_2?
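One way to look at the question above, assuming the standard DDPM setup where each step is `x_t = sqrt(alpha_t) * x_{t-1} + sqrt(1 - alpha_t) * noise_t` with independent Gaussian noise: two such steps collapse into a single Gaussian step, so nothing has to separate `noise_1` from `noise_2`. In the usual formulation the network is told the step index and regresses the combined noise. The sketch below only checks that bookkeeping; the alpha values are arbitrary.

```python
import math

def two_step_scales(alpha_1, alpha_2):
    """Signal scale and total noise variance after two Gaussian noising steps."""
    signal_scale = math.sqrt(alpha_2) * math.sqrt(alpha_1)
    # noise_1 is scaled by sqrt(alpha_2) in the second step; the variances of
    # independent Gaussians add up.
    noise_variance = alpha_2 * (1.0 - alpha_1) + (1.0 - alpha_2)
    return signal_scale, noise_variance

def one_shot_scales(alpha_1, alpha_2):
    """The same quantities when jumping from the clean image in one step."""
    alpha_bar = alpha_1 * alpha_2
    return math.sqrt(alpha_bar), 1.0 - alpha_bar

alpha_1, alpha_2 = 0.98, 0.95  # arbitrary example values
two_step = two_step_scales(alpha_1, alpha_2)
one_shot = one_shot_scales(alpha_1, alpha_2)
```

Both routes give the same signal scale and the same noise variance, so the noisy image at any step has the same distribution as the clean image plus one suitably scaled Gaussian noise sample.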
@declan6052 1 month ago
At 13:14 - Is this 'clip guided diffusion' done by adding a term to the loss function or via a different method?
@AICoffeeBreak 1 month ago
It's done by adding an image to the generated image during inference. This extra added image is computed via the gradient with respect to CLIP's output. It's a bit like DeepDream, if you are old enough to know about it.
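A toy version of the mechanism described in this reply: compute the gradient of a text-image score with respect to the pixels and add a scaled copy of it to the image. Everything here is an assumption for illustration. `toy_clip_score` is a dot product, not CLIP, and the gradient is taken by finite differences, whereas a real implementation would backpropagate through the CLIP encoders.

```python
def toy_clip_score(image, text_embedding):
    """Hypothetical stand-in for CLIP: dot product of raw pixels with a text vector."""
    return sum(p * w for p, w in zip(image, text_embedding))

def numerical_gradient(score_fn, image, step=1e-5):
    """Central finite differences of score_fn with respect to each pixel."""
    grad = []
    for i in range(len(image)):
        up, down = list(image), list(image)
        up[i] += step
        down[i] -= step
        grad.append((score_fn(up) - score_fn(down)) / (2 * step))
    return grad

def guide_towards_text(image, text_embedding, guidance_scale):
    """Add the scaled score gradient (the 'extra image') to the current image."""
    grad = numerical_gradient(lambda img: toy_clip_score(img, text_embedding), image)
    return [p + guidance_scale * g for p, g in zip(image, grad)]

image = [0.1, 0.4, -0.2]
text_embedding = [1.0, -1.0, 0.5]
nudged = guide_towards_text(image, text_embedding, guidance_scale=0.1)
```

After the nudge the toy score is higher than before, which is the point: no loss term is changed and no weights are updated, the sample itself is pushed during inference.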
@Jupiter-Optimus-Maximus 1 day ago
Another great video, as usual! This little bean mutant of yours always puts a smile on my face ☺ Is it possible that it is actually an AI? For example, a transformer that converts language information into the facial expressions of the animated bean. That would be so cool 😎 I have a question: I am looking for training methods that are not based on backpropagation. Specifically, I want to avoid running backwards through the NNW again after the forward pass. Do you know of any algorithms like this? Already 2^10 * Thanks in advance 😄
@chainonsmanquants1630 2 years ago
thx
@lewingtonn 2 years ago
bless your soul!
@Youkouleleh 2 years ago
Is it possible to create an embedding of an input image using a diffusion model? If the way to do it is to add noise, does the embedding still have interesting properties? I would not think so.
@AICoffeeBreak 2 years ago
Maybe I lack imagination, but I also do not think so. The neural net representations just capture the noise diff, which is not really an image representation.
@Youkouleleh 2 years ago
@@AICoffeeBreak I have another question: is the network used during the denoising part (which predicts the noise in order to remove it) the same at every noise level, or are there N different models, one for each level of noise?
@AICoffeeBreak 2 years ago
The same model for each step. :)
@Youkouleleh 2 years ago
@@AICoffeeBreak Just for information, there is indeed no single latent space, because the sampling procedure is stochastic. But that is why some people proposed a deterministic approach to produce samples from the target distribution, DDIM (denoising diffusion implicit models), which does not require retraining the DDPM but only changes the sampling algorithm, and allows for the concept of a latent space and an encoder for diffusion models.
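A minimal sketch of the deterministic DDIM update mentioned above (the variant with no extra noise injected per step). The function name is illustrative, and `eps_pred` stands for the output of an already trained noise-prediction network, which is not modelled here.

```python
import math

def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update from step t to the previous step.

    First estimate the clean image from the predicted noise, then re-noise
    that estimate to the previous noise level using the same predicted noise.
    """
    x0_pred = [(x - math.sqrt(1.0 - alpha_bar_t) * e) / math.sqrt(alpha_bar_t)
               for x, e in zip(x_t, eps_pred)]
    return [math.sqrt(alpha_bar_prev) * x0 + math.sqrt(1.0 - alpha_bar_prev) * e
            for x0, e in zip(x0_pred, eps_pred)]

# Toy check with a "perfect" noise prediction (arbitrary example values).
x0 = [0.5, -1.0]
eps = [0.3, 0.7]
alpha_bar_t, alpha_bar_prev = 0.5, 0.8
x_t = [math.sqrt(alpha_bar_t) * a + math.sqrt(1.0 - alpha_bar_t) * e
       for a, e in zip(x0, eps)]
x_prev = ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev)
```

No random number is drawn inside `ddim_step`, so the same starting noise always leads to the same image. That is what lets the starting noise act as a latent code.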
@bhuvaneshs.k638 2 years ago
How does the U-Net become a Markov chain if there are skip connections? Can you explain this? I didn't get it exactly
@AICoffeeBreak 2 years ago
It is not the U-Net that is Markov, but the succession of steps, where at each step you apply a U-Net or something else.
@renanmonteirobarbosa8129 2 years ago
Letitia, do you have a Discord for the channel?
@Sutirtha 2 years ago
Amazing video.. Any recommendations about the python code, to implement this model with any custom dataset?
@core6358 2 years ago
you should do an update video now that dalle 2 and imagen are out and people are hyping them up
@AICoffeeBreak 2 years ago
We already have a video on Imagen. 😅
@AICoffeeBreak 2 years ago
Imagen video. kzbin.info/www/bejne/rqKnlnSwZbpgiJY
@AICoffeeBreak 2 years ago
And a DALL-E 2 secret language video. kzbin.info/www/bejne/g3_ahoWHbptlZ80
@lendrick 2 years ago
"open" AI
@BlissfulBasilisk 2 years ago
Teeeeext!
@AICoffeeBreak 2 years ago
😂
@adr3000 2 years ago
Question: Can the NOISE ( input ) be used as a SEED to be highly-deterministic with the diffusion models outputs? (Assuming the trained model (PT or w/e) is the same?)
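On the seed question: a sampler is a function of the model weights, the prompt and the random numbers it draws, so drawing all of them from a seeded generator makes the starting noise reproducible. A small sketch using only Python's standard library; the function name and pixel count are illustrative. Whether a full pipeline is bit-for-bit reproducible also depends on the software and hardware being deterministic, which is an assumption here.

```python
import random

def sample_initial_noise(seed, num_pixels):
    """Draw the starting noise from a generator that is fully determined by `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_pixels)]

first = sample_initial_noise(seed=42, num_pixels=4)
second = sample_initial_noise(seed=42, num_pixels=4)
other = sample_initial_noise(seed=7, num_pixels=4)
```

`first` and `second` are identical, while `other` differs. With a stochastic sampler the noise added at every step has to come from the same seeded generator as well, otherwise only the starting point is fixed.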
@ithork 2 years ago
Can anybody recommend a video that describes how this works in less technical terms? Like explain it to an art major?
@aungkhant502 1 year ago
What is the intuition behind the classifier-free approach?
@Imhotep397 2 years ago
Does the diffusion model essentially work like Chuck Close’s art method, while CLIP actually finds the requisite parts that are to be put together to create the crazy images? Also, how do you even get an invite to Imagen or Dall-E to test this beyond all the possibly rigged samples they have up.
@aifirst9478 2 years ago
Thanks for this amazing video. Do you know any online course where we can practice with training diffusion models?
@hoami8320 4 months ago
i'm sorry, 😁 you can decode the architecture of Model meta llama 3
@DuskJockeysApps 8 months ago
Well I went to have a look at the Glide Text2im. To say I am not impressed would be an understatement. My prompt was "girl with short blonde hair, cherry blossom tattoos, pencil sketch". What did I get back, after 20 minutes? A crude drawing of 2 giraffes. And the one on the left is barely recognisable.
@peterplantec7911 2 years ago
You lost me from time to time, but I think I have an overview now. I wish you had better explained how diffusion models decide what they are going to use in their construction of the image. Sure, it goes from noise to image, but if I use Ken Perlin's noise, it doesn't have any image component in it. So how does the diffusion model suck image information out of it?
@julius4858 2 years ago
„Open“ai
@jadtawil6143 2 years ago
i like you
@DerPylz 2 years ago
I like you, too
@bgspss 2 years ago
Can someone pls explain how exactly this model was inspired by nonequilibrium thermodynamics?
@DazzlingAction 2 years ago
Why is everything a chain lately... kind of laughable...
@stumby1073 2 years ago
I'm so stupid
@ujjwaljain6416 2 years ago
We really don't need that coffee bean jumping around in the video.
@diarykeeper 2 years ago
Give me vocal isolation. Spleeter and uvr are nice, but if image stuff can work this well, apply it to music. Gogogo