Why Does Diffusion Work Better than Auto-Regression?

99,039 views

Algorithmic Simplicity


1 day ago

Have you ever wondered how generative AI actually works? Well, the short answer is: in exactly the same way as regular AI!
In this video I break down the state of the art in generative AI - Auto-regressors and Denoising Diffusion models - and explain how this seemingly magical technology is all the result of curve fitting, like the rest of machine learning.
Come learn the differences (and similarities!) between auto-regression and diffusion, why these methods are needed to perform generation of complex natural data, and why diffusion models work better for image generation but are not used for text generation.
The following generative models were featured as demos in this video:
Images: Adobe Firefly (www.adobe.com/products/firefl...)
Text: ChatGPT (chat.openai.com)
Audio: Suno.ai (suno.ai)
Code: Gemini (gemini.google.com/app)
Video: Lumiere (Lumiere-video.github.io)
Chapters:
00:00 Intro to Generative AI
02:40 Why Naïve Generation Doesn't Work
03:52 Auto-regression
08:32 Generalized Auto-regression
11:43 Denoising Diffusion
14:19 Optimizations
14:30 Re-using Models and Causal Architectures
16:35 Diffusion Models Predict the Noise Instead of the Image
18:19 Conditional Generation
19:08 Classifier-free Guidance

Comments: 135
@jupiterbjy 4 days ago
Kinda sorry to my professors and seniors, but this is the single best explanation of the logic behind each of these models. A dozen-minute video > 2 years of confusion in uni
@algorithmicsimplicity 3 months ago
Next video will be on Mamba/SSM/Linear RNNs!
@benjamindilorenzo 2 months ago
Great! Also maybe think about the tradeoff between scaling and incremental improvements, in case your perspective is that LLMs also always approximate the dataset and therefore memorize, rather than having any "emergent capabilities" - so that ChatGPT also does "only" curve fitting.
@harshvardhanv3873 8 days ago
I am a student pursuing a degree in AI, and we want more of your videos, even for the simplest concepts in AI. Trust me, this channel will be a huge deal in the near future. Good luck!!
@user-my3dd4lu2k 1 month ago
Man, I love the fact that you present the fundamental idea with an intuitive approach, and then discuss the optimizations.
@doku7335 2 days ago
At first I thought "oh, another random video explaining the same basics and not adding anything new", but I was so wrong. It's an incredibly clear explanation of diffusion, and starting with the basics makes the full picture much clearer. Thank you for the video!
@pw7225 4 days ago
The way you tell the story is fantastic! I am surprised that all AI/ML books are so terrible at didactics. We should always start at the intuition, the big picture, the motivation. The math comes later when the intuition is clear.
@yqisq6966 10 days ago
The clearest and most concise explanation of diffusion models I've seen so far. Well done.
@jasdeepsinghgrover2470 14 days ago
This is a much better explanation than the diffusion paper itself. They just went all around variational inference to get the same result!
@Veptis 2 days ago
This is a great explanation of how image decoders work. I haven't seen this approach and narrative direction yet. This is now my reference for explaining it to people who have no idea!
@pseudolimao 2 days ago
this is insane. I feel bad for getting this level of content for free
@user-fh7tg3gf5p 3 months ago
This genius only makes videos occasionally, and they are not to be missed.
@justanotherbee7777 3 months ago
absolutely true
@rafa_br34 11 days ago
Such an underrated video, I love how you went from the basic concepts to complex ones and didn't just explain how it works but also the reason why other methods are not as good/efficient. I will definitely be looking forward to more of your content!
@Jack-gl2xw 8 days ago
I have trained my own diffusion models and it required me to do a deep dive of the literature. This is hands down the best video on the subject and covers so much helpful context that makes understanding diffusion models so much easier. I applaud your hard work, you have earned a subscriber!
@RicardoRamirez-dr6gc 9 days ago
This is seriously one of the best explainer videos i've ever seen. I've spent a long time trying to understand diffusion models and not a single video has come close to this one
@benjamindilorenzo 2 months ago
Very good job. My suggestion is that you explain more about how it actually works, that the model learns to understand complete sceneries just from text prompts. This could fill its own video. Also it would be very nice to have a video about Diffusion Transformers like OpenAIs Sora probably is. Also it could be great to have a Video about the paper "Learning in High Dimension Always Amounts to Extrapolation". best wishes
@algorithmicsimplicity 2 months ago
Thanks for the suggestions, I was planning to make a video about why neural networks generalize outside their training set from the perspective of algorithmic complexity. That paper "Learning in High Dimension Always Amounts to Extrapolation" essentially argues that the interpolation vs extrapolation distinction is meaningless for high dimensional data, and I agree, I don't think it is worth talking about interpolation/extrapolation at all when explaining neural network generalization.
@benjamindilorenzo 2 months ago
@@algorithmicsimplicity yes, true. It would be great also because this links back to the LLM discussions: whether scaling up transformers actually brings about "emergent capabilities", or whether this is more simply and less magically explainable by extrapolation. Or in other words: either people tend to believe that deep learning architectures like transformers only approximate their training dataset, or people tend to believe that seemingly unexplainable or unexpected capabilities emerge while scaling. I believe that extrapolation alone explains really well why LLMs work so well, especially when scaled up, AND that LLMs "just" approximate their training data (curve fitting). This is why I brought this up ;)
@HD-Grand-Scheme-Unfolds 14 days ago
You truly understand how to simplify... to engage our imagination... to employ naive thoughts or ideas to make comparisons that bring across deeper, more core principles and concepts, making the subject far easier to grasp and get an intuition for. Algorithmic Simplicity indeed... thank you for your style of presentation and teaching. Love it, love it... you make me know what question I want to ask but didn't know I wanted to ask. YouTube needs your contribution in ML education. Please don't forget that.
@Frdyan 23 hours ago
I have a graduate degree in this shit and this is by far the clearest explanation of diffusion I've seen. Have you thought about doing a video running over the NN Zoo? I've used that as a starting point for lectures on NN and people seem to really connect with that paradigm
@CodeMonkeyNo42 6 days ago
Great video. Love the pacing and how you distilled the material into such an easy-to-watch video. Great job!
@karlnikolasalcala8208 7 days ago
This channel is gold, I'm glad I've randomly stumbled across one of your vids
@MeriaDuck 5 hours ago
This must be one of the best and concise explanations I've seen!
@Matyanson 5 days ago
Thank you for the explanation. I already knew a little bit about diffusion, but this is exactly the way I'd hope to learn. Start from the simplest examples (usually historical) and progressively advance, explaining each optimisation!
@banana_lemon_melon 6 days ago
Bruh, I loved your content. Other channels/videos usually explain general knowledge that can easily be found on the internet. But you go deeper into the intrinsic aspects of how the stuff works. This video, and your video about transformers, are really good.
@mrdr9534 4 days ago
Thanks for taking the time and effort to make and share these videos and your knowledge. Kudos and best regards
@JordanMetroidManiac 5 days ago
I finally understand how models like Stable Diffusion work now! I tried understanding them before but got lost at the equation (17:50), but this video describes that equation very simply. Thank you!
@jcorey333 3 months ago
This is an amazing quality video! The best conceptual video on diffusion in AI I've ever seen. Thanks for making it! I'd love to see you cover RNNs.
@iestynne 4 days ago
Wow, fantastic video. Such clear explanations. I learned a great deal from this. Thank you so much!
@anthonybernstein1626 19 days ago
I had a good idea how diffusion models work but I still learned a lot from this video. Thanks!
@justanotherbee7777 3 months ago
A person with very little background can understand what he describes here. Commenting to help YouTube recommend it to others. Wonderful video! Really good one
@ecla141 1 day ago
Awesome video! I would love to see a video about graph neural networks
@xaidopoulianou6577 9 days ago
Very nicely and simply explained! Keep it up
@user-yj3mf1dk7b 7 days ago
Nice explanations, although I already knew about diffusion. The examples from simplest to final diffusion were a really nice touch.
@abdelhakkhalil7684 6 days ago
This was a good watch, thank you :)
@akashmody9954 3 months ago
Great video....already waiting for your next video
@anatolyr3589 1 month ago
Great explanation! 👍👍 I personally would like to see a video surveying all major types of neural nets with their distinctions, specifics, advantages, disadvantages, etc. The author explains very well 👏👏
@iancallegariaragao 3 months ago
Great video and amazing content quality!
@tkimaginestudio 5 hours ago
Great explanations, thank you!
@ShubhamSinghYoutube 11 hours ago
Love the conclusion
@sanjeev.rao3791 22 hours ago
Wow, that was a fantastic explanation.
@johnbolt2686 5 days ago
I would recommend reading about active inference to possibly understand the role of generative models in intelligence.
@RobotProctor 9 days ago
I like to think of ML as a funky calculator. Instead of a calculator where you give it inputs and an operation and it gives you an output, you give it inputs and outputs and it gives you an operation. You said it's like curve fitting, which is the same thing, but I like thinking the words funky calculator because why not
@1.4142 3 months ago
Some2 really brought out some good channels
@user-er9pw4qh6j 16 days ago
Soooo Good!!! Thanks for making it!!!!
@khangvutien2538 7 days ago
Thank you very much. I enjoyed the first part, the first 10 seconds. After that, there are too many shortcuts in the explanations, and I struggled to understand and be able to explain it again to myself. Still, I subscribed. As for suggestions for other videos, I'll check whether you have explained the U-Net already. If not, I'd appreciate the same kind of explanation about it.
@Mhrn.Bzrafkn 10 days ago
It was so easy to understand 👌🏻👌🏻
6 days ago
I think it would help to mention that the auto-regressors may be viewing the image as a sequence of pixels (RGB vectors). Overall excellent video, extremely intuitive.
@algorithmicsimplicity 6 days ago
In general, auto-regressors do not view images as a sequence. For example, PixelCNN uses convolutional layers and treats inputs as 2d images. Only sequential models such as recurrent neural networks would view the image as a sequence.
5 days ago
@@algorithmicsimplicity of course, but I feel mentioning it may help with intuition as you’re walking through pixel by pixel image generation
@RobotProctor 9 days ago
Thank you. This video is wonderful
@vijayaveluss9098 7 days ago
Great explanation
@ollie-d 1 day ago
Solid video!
@marcinstrzesak346 14 days ago
Very good video. Thank you
@paaabl0. 5 days ago
Great video! Focus on the right elements.
@mojtabavalipour 6 days ago
Well done!
@joaosousapinto3614 12 days ago
Great video, congrats.
@zephilde 11 days ago
Great visualisation! Good job! Maybe next video on LoRA or ControlNet ?
@algorithmicsimplicity 11 days ago
Great suggestions, I will put them on my TODO list.
@banana_lemon_melon 6 days ago
+1 for LoRA
@hmmmza 3 months ago
what a great rare content!
@AurL_69 9 days ago
thanks for explaining
@meanderthalensis 8 days ago
Great video!
@demohub 7 days ago
Just subscribed. Great video
@ArtOfTheProblem 12 days ago
great work
@infographie 7 days ago
Excellent.
@pon1 8 days ago
Still feels like magic to me 🙌🙌
@ChristProg 14 days ago
Thank you so much, sir. Really interesting video. But I would like you to create a video on how the generative model uses the text prompt during training. Thank you, sir. I subscribed! 😊
@winstongraves8321 8 days ago
Great video
@kubaissen 3 months ago
Nice vid thx
@oculuscat 9 days ago
Diffusion doesn't necessarily work better than auto-regression. The "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction" paper introduces an architecture they call VAR that upscales noise using an AR model and this currently out-performs all diffusion models in terms of speed and accuracy.
@mallow610 8 days ago
Video is a banger
@aydr5412 5 days ago
Thank you for the video. Imo curve fitting is an oversimplification; it distracts us from the real problem - what is being optimized and how. Also, there is a different perspective on cases where we prefer computational efficiency over training quality: with efficiency you can train a model on more data and for longer using the same amount of computational resources, which actually results in a better model
@johnmorrell3187 4 days ago
Curve fitting is optimization so I'd say the two explanations are equivalent. While it's true that a more efficient method -> longer training -> better behavior, it's also true that if compute and time really were not a limiting factor then these less efficient methods would give better final performance.
@IceMetalPunk 8 days ago
And the newest/upcoming models seem to be tending more towards diffusion Transformers, which from my understanding is effectively a Transformer autoencoder with a diffusion model plugged in, applying diffusion directly to the latent space embeddings. Is that correct?
@hjups 3 months ago
Do you have a citation that supports your claim for eps vs x0 prediction? It's true that the first sampling step with x0 tends to produce a blurry / averaged result, but that's a result of the loss function used when training DDPMs. If you were to use something more complex or another NN, then you'd have a GAN, which don't produce blurry or averaged results on a single forward pass. Also, if you examine the output of x0 = noise - eps for the first step, it's both mathematically and visually equivalent to the first x0 prediction sample - a blurry / averaged result. The same thing is also true when predicting velocity, but velocity is arguably harder for a network to predict due to the phase transition.
@Blooper1980 10 days ago
Finally I understand!
@recklessroges 1 day ago
Could you explain why the YOLO image classifier is/was so effective? Thank you.
@muhammadaneeqasif572 3 days ago
Can you please share the code that you used for generating the images in the demo? It would be very helpful
@zacklee5787 3 days ago
Not sure I agree with some of your analysis here. The strength of diffusion models doesn't come from the lower dependence of objects/pixels the model generates at once. In fact, as you mention, the model actually predicts a whole image, in practice, at every step. Even when you use the trick of predicting the noise, the noise is unintuitively not random, that is, not randomly generated, but actually depends completely on the noise or lack thereof in the input. It is after all equivalent to predicting the whole image. The real strength comes from the incremental nature, that is, a step of the model further down the line can "fix" a mistake it made previously by interpreting the previous generation as noise. In the space of all, say, 1024x1024 pixel value combinations, there is a manifold (essentially a subset of close-together images) of all target images we want to generate. The diffusion model learns to take incremental steps toward that subset of "reasonable" images from any random starting point.
@algorithmicsimplicity 3 days ago
The noise is absolutely randomly generated. The reason the model can predict the noise (or equivalently the image) is because it receives both the noise and image as input. If it were the case that the incremental nature helped, then I would expect diffusion models to generate higher quality outputs than auto-regressors, but this isn't the case. Auto-regressors generate higher quality outputs (e.g. arxiv.org/abs/2205.13554 ), they just take longer to run. If it were the case that NNs are unable to give correct predictions on the first go, we would see the opposite: that diffusion models can correct previous generations and thereby achieve higher quality. Also see LLMs, which have no difficulty generating perfect outputs in one pass. Diffusion models only learn to take steps toward the data distribution starting at the standard normal distribution (origin).
@yk4r2 4 hours ago
Hey, could you kindly recommend more on causal architectures?
@algorithmicsimplicity 4 hours ago
I haven't seen any material that covers them really well. There are basically 2 types of causal architectures, causal CNNs and causal transformers, with causal transformers being much more widely used in practice now. Causal transformers are also known as "decoder-only transformers" ("encoders" use regular self-attention layers, "decoders" use causal self-attention). If you search for encoder vs decoder-only transformers you should find some resources that explain the difference. Basically, to make a self-attention layer causal you mask the attention scores (i.e. set some to 0), so that words can only attend to words that came before them in the input. This makes it so that every word's vector only contains information from before it. This means you can use every word's vector to predict the word that comes after it, and it will be a valid prediction because that word's vector never got to attend to (i.e. see) anything after it. So, it is as if you had applied the transformer to every subsequence of input words, except you only had to apply it once.
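The masking idea described in this reply can be sketched in a few lines (a minimal NumPy illustration, not code from the video; a real transformer's learned query/key/value projections are omitted for clarity):

```python
import numpy as np

def causal_self_attention(x):
    """Single-head self-attention with a causal mask.

    x: (seq_len, dim) array of word vectors. Real transformers also apply
    learned query/key/value projections, omitted here for clarity.
    """
    seq_len, dim = x.shape
    scores = (x @ x.T) / np.sqrt(dim)                  # pairwise attention scores
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores[future] = -np.inf                           # block attention to future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ x                                 # position i mixes only inputs 0..i

x = np.random.default_rng(0).normal(size=(5, 8))
out = causal_self_attention(x)
# The first position can only attend to itself, so its output equals its input.
assert np.allclose(out[0], x[0])
```

Stacking layers like this gives a model where every position's output vector can be used to predict the next word, which is the "apply it once instead of per-subsequence" trick described above.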
@alex65432 3 months ago
Can you make a video about the loss landscape? Like, what effects do different weight inits, optimizers, or architectures like ResNet have?
@algorithmicsimplicity 3 months ago
Thanks for the interesting suggestion! I was already planning to do a video about why neural networks generalize outside of their training set, I should be able to talk about the loss landscape in that video.
@MilesBellas 9 days ago
via Pi:

"Diffusion models and auto-regressive (AR) models are two popular approaches for generating images and other types of data. They differ in their fundamental techniques, generation time, and output quality. Here's a brief comparison:

**Diffusion Models:**
* Approach: Diffusion models are based on the idea of denoising images iteratively, starting from a noisy input and gradually refining it into a high-quality output.
* Generation Time: Diffusion models are generally faster than AR models for image generation, especially when using optimizations like "asymmetric step" or Cascade models.
* Output Quality: Diffusion models are known for generating high-quality and diverse images, especially when trained on large datasets like Stable Diffusion or DALL-E 2. They can capture various styles and generate coherent images with intricate details.

**Auto-Regressive (AR) Models:**
* Approach: AR models generate images pixel by pixel, conditioning each new pixel on previously generated pixels. This sequential approach makes AR models computationally expensive, especially for large images.
* Generation Time: AR models tend to be slower than diffusion models due to their sequential nature. The generation time can be significantly longer for high-resolution images.
* Output Quality: While AR models can produce high-quality images, they may struggle with capturing diverse styles or maintaining coherence across different image regions. They might require additional techniques, like classifier-free guidance or super-resolution, to achieve better results.

In summary, diffusion models generally offer faster generation times and better output quality compared to AR models. However, both approaches have their strengths and limitations, and the choice between them depends on the specific use case, available computational resources, and desired generation speed and output quality."
@duytdl 8 days ago
So why isn't diffusion better for text? Also are you saying that auto-regression is only bad because it's expensive to do (serially)? Or is diffusion fundamentally better for images?
@algorithmicsimplicity 8 days ago
Auto-regression is only bad because it is slow, it produces better generations for both text and images. For text, there aren't that many tokens that you need to generate, so you can just use auto-regression: it gives better results. For images, you are forced to use something faster, and diffusion is much faster while producing nearly as good generations.
@HyperFocusMarshmallow 7 days ago
A funny thing about watching a video like this is that you see an artificial neural network produce an image, and then you have another layer of neural network in the brain that tries to figure out if it was a good match or not. The so-called "blurry noise" could in principle look like a good match to someone and a bad match to someone else, depending on how their own categorization works. It could also be good for everyone or bad for everyone, of course, or some arbitrary mix along that scale. The point is that "looks like blurry noise" risks being quite an unobjective statement. I mean, people see images in the clouds and so on.
@quickdudley 3 days ago
My brain misinterpreted the title as "Why diffusers work better than autoencoders" (I believe because the noising process works rather like data augmentation)
@frommarkham424 9 days ago
That was exactly how I guessed they did it
@turhancan97 10 days ago
Is the idea at the beginning of the video (auto-regressive image generation) self-supervised learning?
@algorithmicsimplicity 10 days ago
Technically yes, self supervised learning just means that the labels used to train the model were created automatically from the data itself, instead of by a human. So yes both auto-regression and diffusion are self-supervised learning, since they automatically create masked/noised inputs and use the clean image as labels. Though usually when people refer to self-supervised learning specifically they mean self-supervised but not generative, so things like simCLR or contrastive learning.
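The label-creation described in this reply can be illustrated directly (a toy sketch; the small arrays stand in for real training images):

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(8, 8))   # stand-in for a training image

# Auto-regression: input = the pixels seen so far, label = the next pixel.
flat = image.flatten()
i = 20                                        # position of the pixel to predict
ar_input, ar_label = flat[:i], flat[i]

# Diffusion: input = the image plus random noise, label = the clean image.
diffusion_input = image + 0.5 * rng.normal(size=image.shape)
diffusion_label = image

# In both cases the (input, label) pair was built automatically from the
# data itself, with no human annotation - which is all "self-supervised" means.
```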
@turhancan97 10 days ago
@@algorithmicsimplicity I understand. Thanks a lot :)
@akashmody9954 3 months ago
Can you make a video on how SORA by OpenAI works, what kind of architecture does it follow
@algorithmicsimplicity 3 months ago
Unfortunately OpenAI does not publicly release details on their architectures, they only said it was a transformer based diffusion model. This thread had some speculation on the exact architecture though: threadreaderapp.com/thread/1758433676105310543.html
@akashmody9954 3 months ago
Can you recommend some sources that i can follow if i want to do deeper into diffusion models and transformers?
@akashmody9954 3 months ago
I tried to go through the research papers but the math is overwhelming
@algorithmicsimplicity 3 months ago
​@@akashmody9954 If you just want to learn how to train/use them, I'd highly recommend the fast.ai course by Jeremy Howard, it will give you practical experience using them. If you want to do research/develop new methods then I'm afraid there isn't any better option than just reading the papers. Although if code is available I sometimes find it easier to just read the code than the paper lol.
@akashmody9954 3 months ago
@@algorithmicsimplicity alright.....thanks a lot man, and loving your videos as always
@EricPham-gr8pg 6 days ago
Use lense projector and -zoom will save all the msthematical brain picking In video we use ccd cell in camera instantly illuminate LED pixel then zoom it down to tiny dot then send to ram and display on monitor by zoom factor corespond to resolutiom allow and zoom it back down when store it in time line of each coordinate and add all up with address and time then when unfold all we need is tiny dot first frame and last frame then start by last frame unfold into buffer subtract time but must adjust to phase angle of time at closest to last frame and just less tine drive with appropriate speed of each time axis so memory is so small
@joshjohnson259 1 day ago
If this explanation is too advanced for me how would you recommend I learn enough to be able to grasp these concepts? Can you direct me to some content that is one level down in complexity so I can see if that would be my starting point in understanding how these models work? I don’t really have any CS background.
@algorithmicsimplicity 1 day ago
If you just want to learn how to train/use these models, I would highly recommend the fast.ai course by Jeremy Howard (course.fast.ai/ ). You can also look at 3blue1brown's videos on neural networks and transformers which are aimed at a general audience, and Andrej Karpathy's videos on implementing a transformer from scratch for a more detailed walkthrough of the models.
@hamzaumair7909 1 month ago
I love your explanations, especially transformers. Although this one IMO could have been better; I think you are missing some ideas that should have been explained.
@algorithmicsimplicity 1 month ago
Thanks for the feedback, any ideas in particular that you think should have been explained?
@sichengmao4038 5 days ago
Can you explain why, for diffusion models, there's no causal architecture? 16:26
@algorithmicsimplicity 5 days ago
Basically it's because NN layers accumulate information from multiple input features into one feature's vector. By making the layer only take in information from features before it in the AR order, you get a causal architecture with the same size as the original model. For diffusion, you could in principle make a causal architecture, but you would need to make a feature vector for every feature in every step of the noising process, i.e. the size of the model would need to be increased by a factor equal to the number of denoising steps, which isn't practical.
@sichengmao4038 4 days ago
@@algorithmicsimplicity don't quite understand why "the model size is increased by the number of denoising steps". What I imagine is, if we make an analogy to language model like Transformer, we now have a series of tokens (where each token is indeed a noisy image in the noising process), then we can still parallelize along the sequence dimension, isn't it?
@algorithmicsimplicity 4 days ago
@@sichengmao4038 You could do that, the problem is how you convert the entire image into a token. Usually in order to convert an image into a feature vector, you need to apply a full-sized neural network. So to get your noisy image tokens you need to apply a NN for each noising step.
@agustinbs 7 days ago
This video is better than going to MIT for a machine learning degree. Man, this is gold. Thank you so much
@klaushermann6760 6 days ago
Now we know they're not only predictors.
@assgoblin3981 2 months ago
Assgoblin approves of this content
@chadarmstrong7458 11 days ago
I didn't understand why you would predict the noise rather than the clean image. Your explanation didn't seem to be related to the problem...
@chadarmstrong7458 11 days ago
"You get a blurry mess again" Why is that a problem in the early iterations?
@chadarmstrong7458 11 days ago
"The advantage of doing it this way is that now the model output is uncertain at the later stages of the generation process" Why is that valuable? Why is that relevant to this other problem with the early stages that you are supposedly trying to solve?
@chadarmstrong7458 11 days ago
"The average of a bunch of different noise samples which is still valid noise" Why does that matter?
@cakep4271 10 days ago
I think 🤔 the main points are: 1. Predicting a clean image directly is slow, not creative, expensive. 2. So instead of predicting an image outright, just learn to "un-blur", and run it a bunch of times, because that's a fast process. So now, you tell it a pic of random noise is a cat, and to unblur the cat, thereby making the noise slightly more like a cat. Repeat over and over again. Eventually you have a clean image of a cat.
@banana_lemon_melon 6 days ago
noisy image = clean image + noise. Now the NN is given a noisy image as input, and outputs/predicts the pure noise. Then we can do: clean image = noisy image (input) - noise (prediction output). Predicting noise is easier than predicting the image directly, maybe because the noise has a Gaussian/normal distribution (not explained in this video, but we know regression can perform better if the target label has a Gaussian/normal distribution). I'm not sure about the distribution of pixel values in images, though.
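The arithmetic in that comment is easy to check (an idealized sketch: the network's prediction is assumed perfect, and the schedule coefficients that real diffusion models use to scale image and noise are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
clean = rng.uniform(0.0, 1.0, size=(8, 8))   # stand-in for a clean image
noise = rng.normal(size=(8, 8))              # Gaussian noise

noisy = clean + noise                        # what the network sees as input

# If the network predicted the noise exactly, subtracting its prediction
# from the input would recover the clean image exactly.
predicted_noise = noise                      # idealized, perfect prediction
recovered = noisy - predicted_noise
assert np.allclose(recovered, clean)
```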
@dubfather521 5 days ago
So denoising models work by predicting the clean image, and then to get the next step you noise its already-clean output??? That doesn't make any sense. If it predicts the final image already, why do you have to keep predicting?
@algorithmicsimplicity 5 days ago
The first time it predicts the clean image, it will not produce a good image; it will produce a blurry mess (because it will average over all of the training images). You then add noise to this blurry mess and you get an image that is almost pure noise, with a little bit of structure from the original blurry mess. Then you use that as input and predict a clean image again. This time the produced image will be slightly sharper, because now the model is only averaging over all inputs which are consistent with the blurry structure from the first step. You repeat this many times; at each step the produced image gets sharper because more detail is left from the previous step.
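The loop described in this reply can be sketched as follows (a toy illustration: `predict_clean` stands in for the trained network, and the real noise schedule is simplified to a linear ramp):

```python
import numpy as np

def generate(predict_clean, shape, steps=20, rng=None):
    """Iteratively refine pure noise into an image, as described above."""
    rng = rng or np.random.default_rng()
    x = rng.normal(size=shape)                     # start from pure noise
    for t in range(steps, 0, -1):
        x0_hat = predict_clean(x, t)               # predict a clean image (blurry at first)
        noise_level = (t - 1) / steps              # less noise is mixed back in each step
        x = x0_hat + noise_level * rng.normal(size=shape)
    return x                                       # final step adds no noise back

# Toy "model": nudge any input halfway toward a fixed target image.
target = np.full((4, 4), 0.7)
sample = generate(lambda x, t: 0.5 * x + 0.5 * target,
                  shape=(4, 4), rng=np.random.default_rng(0))
```

Each pass predicts a full clean image, but only the structure that survives the re-noising is kept, which is why the later predictions get progressively sharper.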
@dubfather521
@dubfather521 5 days ago
@@algorithmicsimplicity ohhhhhh
@cognitive-carpenter
@cognitive-carpenter 8 days ago
Enjoyed I think is the wrong output
@alexanderbrown-dg3sy
@alexanderbrown-dg3sy 4 days ago
I just came here to hit on diffusion models 😂. AR all day… who wants smoke? Paper for paper? Given a few mods, AR is superior.
@user-hp7dc4bv4j
@user-hp7dc4bv4j 22 days ago
This is sexy
@umaktinah9167
@umaktinah9167 9 days ago
These things are not cheap...
@GRAYgauss
@GRAYgauss 15 hours ago
"Out of nothing." Right, because billions of dollars of training time and lawsuits over datasets so huge they infringe on the whole world are nothing.
@algorithmicsimplicity
@algorithmicsimplicity 15 hours ago
I trained my image generator with about $0.50 of training time, on data licensed for training ML models.
@GRAYgauss
@GRAYgauss 14 hours ago
@@algorithmicsimplicity How much energy do you think it took to curate that data? Granted, 50 cents is great, but we still had to develop our techniques expensively. Go back to 2008 with modern neural architectures but no datasets or applied research and do it again. You're standing on the shoulders of giants; just because the model code is simple in hindsight and easy to train, with people having gone before you, doesn't mean it came out of nothing. Not even close. Granted, I take your point on the licensing and on this instance having been cheap. How much did it take to put that cloud infrastructure into place?
@algorithmicsimplicity
@algorithmicsimplicity 14 hours ago
@@GRAYgauss I never said the technique came from nothing, I said the model can generate data with nothing as input. And yes lots of effort went into developing modern machine learning, but that has nothing to do with generative AI. Modern AI still would have developed to solve image classification, build recommendation systems, design novel drugs, control fusion reactors, etc, etc. And it was trained on my PC.
@GRAYgauss
@GRAYgauss 12 hours ago
@@algorithmicsimplicity Ah, if that's what you said, then I just misheard. I still don't think it's out of nothing, even the image, unless the model also has no weights generated using others' work (or, inanely, no weights at all). But it's weird to me that you don't think generative AI and ML are one and the same. They are. Remember when we only had Deep Dream, and the research progress we made thanks to studying that application? Even to this day, diffusion models play the same bleeding-edge investigatory role for more functional ML. Diffusion models and generative AI only exist because of the thousands of man-years put into ML; it's been a reciprocal development through the common tools they required. Certainly, generative AI didn't come out of nothing like an image from noise, and pretrained weights also aren't nothing. Again, even if you trained them yourself, the something was the datasets and the framework you didn't curate or write (and if you did write the framework, it would have to be ignorant of modern giants, or, ad infinitum, you'd need to be a caveman; otherwise that something was encoded in the model, just like something you didn't form from the datasets is encoded in the weights). I'm just saying a trained model isn't nothing; it's a whole lot of something, just pre-baked. And whether you can train it independently or cheaply disregards a whole lot of effort in the world that made it possible. But yeah, if that's not what you were saying, I'm just being obsessive.