Why Does Diffusion Work Better than Auto-Regression?

116,026 views

Algorithmic Simplicity

1 day ago

Have you ever wondered how generative AI actually works? Well, the short answer is: in exactly the same way as regular AI!
In this video I break down the state of the art in generative AI - Auto-regressors and Denoising Diffusion models - and explain how this seemingly magical technology is all the result of curve fitting, like the rest of machine learning.
Come learn the differences (and similarities!) between auto-regression and diffusion, why these methods are needed to perform generation of complex natural data, and why diffusion models work better for image generation but are not used for text generation.
The following generative models were featured as demos in this video:
Images: Adobe Firefly (www.adobe.com/products/firefl...)
Text: ChatGPT (chat.openai.com)
Audio: Suno.ai (suno.ai)
Code: Gemini (gemini.google.com/app)
Video: Lumiere (Lumiere-video.github.io)
Chapters:
00:00 Intro to Generative AI
02:40 Why Naïve Generation Doesn't Work
03:52 Auto-regression
08:32 Generalized Auto-regression
11:43 Denoising Diffusion
14:19 Optimizations
14:30 Re-using Models and Causal Architectures
16:35 Diffusion Models Predict the Noise Instead of the Image
18:19 Conditional Generation
19:08 Classifier-free Guidance

Comments: 155
@doku7335 · 5 days ago
At first I thought "oh, another random video explaining the same basics and not adding anything new", but I was so wrong. It's an incredibly clear explanation of diffusion, and starting with the basics makes the full picture much clearer. Thank you for the video!
@jupiterbjy · 7 days ago
Kinda sorry to my professors and seniors, but this is the single best explanation of the logic behind each of these models. A dozen-minute video > 2 years of confusion in uni.
@algorithmicsimplicity · 3 months ago
Next video will be on Mamba/SSM/Linear RNNs!
@benjamindilorenzo · 2 months ago
Great! Also, maybe think about the tradeoff between scaling and incremental improvements, in case your perspective is that LLMs also just approximate the data set and therefore memorize, rather than having any "emergent capabilities". So that ChatGPT also does "only" curve fitting.
@harshvardhanv3873 · 11 days ago
I am a student pursuing a degree in AI, and we want more of your videos on even the simplest concepts in AI. Trust me, this channel will be a huge deal in the near future. Good luck!!
@QuantenMagier · 1 day ago
Well take my subscription then!!1111
@user-my3dd4lu2k · 1 month ago
Man, I love the fact that you present the fundamental idea with an intuitive approach, and then discuss the optimizations.
@pseudolimao · 5 days ago
this is insane. I feel bad for getting this level of content for free
@yqisq6966 · 12 days ago
The clearest and most concise explanation of diffusion models I've seen so far. Well done.
@pw7225 · 6 days ago
The way you tell the story is fantastic! I am surprised that all AI/ML books are so terrible at didactics. We should always start at the intuition, the big picture, the motivation. The math comes later when the intuition is clear.
@user-fh7tg3gf5p · 3 months ago
This genius only makes videos occasionally, but they are not to be missed.
@justanotherbee7777 · 3 months ago
absolutely true
@Veptis · 5 days ago
This is a great explanation of how image decoders work. I haven't seen this approach and narrative direction before. This is now my reference for explaining it to people who have no idea!
@rafa_br34 · 14 days ago
Such an underrated video. I love how you went from basic concepts to complex ones and didn't just explain how it works, but also why other methods are not as good/efficient. I will definitely be looking forward to more of your content!
@jasdeepsinghgrover2470 · 17 days ago
This is a much better explanation than the diffusion paper itself. They just went all around variational inference to get the same result!
@Jack-gl2xw · 11 days ago
I have trained my own diffusion models, and it required me to do a deep dive into the literature. This is hands down the best video on the subject, and it covers so much helpful context that makes understanding diffusion models much easier. I applaud your hard work; you have earned a subscriber!
@Frdyan · 3 days ago
I have a graduate degree in this shit and this is by far the clearest explanation of diffusion I've seen. Have you thought about doing a video running through the NN Zoo? I've used that as a starting point for lectures on NNs, and people seem to really connect with that paradigm.
@RicardoRamirez-dr6gc · 12 days ago
This is seriously one of the best explainer videos I've ever seen. I've spent a long time trying to understand diffusion models, and not a single video has come close to this one.
@benjamindilorenzo · 2 months ago
Very good job. My suggestion is that you explain more about how it actually works that the model learns to understand complete scenes just from text prompts. This could fill its own video. It would also be very nice to have a video about Diffusion Transformers, like OpenAI's Sora probably is. It could also be great to have a video about the paper "Learning in High Dimension Always Amounts to Extrapolation". Best wishes.
@algorithmicsimplicity · 2 months ago
Thanks for the suggestions, I was planning to make a video about why neural networks generalize outside their training set from the perspective of algorithmic complexity. That paper "Learning in High Dimension Always Amounts to Extrapolation" essentially argues that the interpolation vs extrapolation distinction is meaningless for high dimensional data, and I agree, I don't think it is worth talking about interpolation/extrapolation at all when explaining neural network generalization.
@benjamindilorenzo · 2 months ago
@@algorithmicsimplicity Yes, true. It would also be great because it links back to the LLM discussions: whether scaling up Transformers actually brings out "emergent capabilities", or whether this is more simply and less magically explainable by extrapolation. In other words: people tend to believe either that deep learning architectures like Transformers only approximate their training data set, or that seemingly unexplainable or unexpected capabilities emerge with scaling. I believe that extrapolation alone explains really well why LLMs work so well, especially when scaled up, AND that LLMs "just" approximate their training data (curve fitting). This is why I brought this up ;)
@shivamkaushik6637 · 1 day ago
Never knew YouTube could randomly suggest videos like these. This was mind-blowing. The way you teach is a work of art.
@HD-Grand-Scheme-Unfolds · 16 days ago
You truly understand how to simplify... to engage our imagination... to employ naive thoughts and ideas to make comparisons that bring across deeper, more core principles and concepts, making the subject far easier to grasp and get an intuition for. Algorithmic Simplicity indeed... thank you for your style of presentation and teaching. Love it, love it... you make me know which questions I want to ask but didn't know I wanted to ask. YouTube needs your contribution to ML education. Please don't forget that.
@ecla141 · 3 days ago
Awesome video! I would love to see a video about graph neural networks
@karlnikolasalcala8208 · 10 days ago
This channel is gold. I'm glad I randomly stumbled across one of your vids.
@MeriaDuck · 2 days ago
This must be one of the best and most concise explanations I've seen!
@banana_lemon_melon · 8 days ago
Bruh, I love your content. Other channels/videos usually explain general knowledge that can easily be found on the internet, but you go deeper into the intrinsic aspects of how the stuff works. This video, and your video about transformers, are really good.
@jcorey333 · 3 months ago
This is an amazing quality video! The best conceptual video on diffusion in AI I've ever seen. Thanks for making it! I'd love to see you cover RNNs.
@CodeMonkeyNo42 · 8 days ago
Great video. Love the pacing and how you distilled the material into such an easy-to-watch video. Great job!
@Matyanson · 7 days ago
Thank you for the explanation. I already knew a little bit about diffusion, but this is exactly the way I'd hope to learn: start from the simplest examples (usually historical) and progressively advance, explaining each optimisation!
@mrdr9534 · 7 days ago
Thanks for taking the time and effort to make and share these videos and your knowledge. Kudos and best regards.
@iestynne · 7 days ago
Wow, fantastic video. Such clear explanations. I learned a great deal from this. Thank you so much!
@justanotherbee7777 · 3 months ago
A person with very little background can understand what he describes here. Commenting so YouTube recommends it to others. Wonderful video! Really good one.
@xaidopoulianou6577 · 11 days ago
Very nicely and simply explained! Keep it up
@JordanMetroidManiac · 7 days ago
I finally understand how models like Stable Diffusion work! I tried understanding them before and got lost at the equation (17:50), but this video explains that equation very simply. Thank you!
@1.4142 · 3 months ago
SoME2 really brought out some good channels.
@abdelhakkhalil7684 · 9 days ago
This was a good watch, thank you :)
@tkimaginestudio · 2 days ago
Great explanations, thank you!
@user-yj3mf1dk7b · 10 days ago
Nice explanations, although I already knew about diffusion. The examples from simplest to final diffusion were a really nice touch.
@sanjeev.rao3791 · 3 days ago
Wow, that was a fantastic explanation.
@anatolyr3589 · 1 month ago
Great explanation! 👍👍 I personally would like to see a video surveying all the major types of neural nets with their distinctions, specifics, advantages, disadvantages, etc. The author explains very well 👏👏
@RobotProctor · 12 days ago
I like to think of ML as a funky calculator. Instead of a calculator where you give it inputs and an operation and it gives you an output, you give it inputs and outputs and it gives you an operation. You said it's like curve fitting, which is the same thing, but I like thinking the words funky calculator because why not
@iancallegariaragao · 3 months ago
Great video and amazing content quality!
@akashmody9954 · 3 months ago
Great video... already waiting for your next one.
@paaabl0. · 8 days ago
Great video! Focus on the right elements.
@ShubhamSinghYoutube · 2 days ago
Love the conclusion
8 days ago
I think it would help to mention that the auto-regressors may be viewing the image as a sequence of pixels (RGB vectors). Overall excellent video, extremely intuitive.
@algorithmicsimplicity · 8 days ago
In general, auto-regressors do not view images as a sequence. For example, PixelCNN uses convolutional layers and treats inputs as 2D images. Only sequential models such as recurrent neural networks would view the image as a sequence.
8 days ago
@@algorithmicsimplicity of course, but I feel mentioning it may help with intuition as you’re walking through pixel by pixel image generation
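A sketch of the masked-convolution idea behind the PixelCNN mentioned in this thread: a standard 2D convolution whose kernel is zeroed out so each output pixel depends only on pixels above it and to its left. This is a simplified toy version (a single mask type, no channel ordering), not the exact published architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    """PixelCNN-style masked convolution: each output pixel sees only pixels above/left of it."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        kh, kw = self.kernel_size
        mask = torch.ones(kh, kw)
        mask[kh // 2, kw // 2:] = 0   # zero the centre pixel and everything to its right
        mask[kh // 2 + 1:, :] = 0     # zero all rows below the centre
        self.register_buffer("mask", mask)

    def forward(self, x):
        # Multiplying the kernel by the mask keeps future pixels out of the prediction.
        return F.conv2d(x, self.weight * self.mask, self.bias, self.stride, self.padding)

layer = MaskedConv2d(1, 16, kernel_size=5, padding=2)  # greyscale in, 16 feature maps out
out = layer(torch.randn(1, 1, 28, 28))                 # e.g. one 28x28 image
```

Stacking such layers gives a network that stays 2D (no flattening into a sequence) while still respecting the raster-scan generation order.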
@user-er9pw4qh6j · 19 days ago
Soooo Good!!! Thanks for making it!!!!
@Mhrn.Bzrafkn · 12 days ago
It was so easy to understand 👌🏻👌🏻
@khangvutien2538 · 10 days ago
Thank you very much. I enjoyed the first part, the first 10 seconds. After that, there are too many shortcuts in the explanations, and I struggled to understand and be able to explain it again to myself. Still, I subscribed. As for suggestions for other videos, I'll check whether you have explained the U-Net already. If not, I'd appreciate the same kind of explanation of it.
@RobotProctor · 12 days ago
Thank you. This video is wonderful
@psl_schaefer · 2 days ago
Amazing video!
@ollie-d · 4 days ago
Solid video!
@vijayaveluss9098 · 10 days ago
Great explanation
@oculuscat · 11 days ago
Diffusion doesn't necessarily work better than auto-regression. The "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction" paper introduces an architecture they call VAR that upscales noise using an AR model, and this currently outperforms all diffusion models in terms of speed and accuracy.
@marcinstrzesak346 · 17 days ago
Very good video. Thank you
@joaosousapinto3614 · 15 days ago
Great video, congrats.
@zephilde · 14 days ago
Great visualisation! Good job! Maybe a next video on LoRA or ControlNet?
@algorithmicsimplicity · 14 days ago
Great suggestions, I will put them on my TODO list.
@banana_lemon_melon · 8 days ago
+1 for LoRA
@demohub · 9 days ago
Just subscribed. Great video
@hmmmza · 3 months ago
What great, rare content!
@meanderthalensis · 11 days ago
Great video!
@mojtabavalipour · 8 days ago
Well done!
@AurL_69 · 12 days ago
thanks for explaining
@johnbolt2686 · 8 days ago
I would recommend reading about active inference to possibly understand the role of generative models in intelligence.
@aydr5412 · 8 days ago
Thank you for the video. Imo curve fitting is an oversimplification; it distracts us from the real problem: what is being optimized, and how. Also, there is a different perspective on cases where we prefer computational efficiency over training quality: with efficiency you can train a model on more data and for a longer time using the same amount of computational resources, which actually results in a better model.
@johnmorrell3187 · 6 days ago
Curve fitting is optimization, so I'd say the two explanations are equivalent. While it's true that a more efficient method -> longer training -> better behavior, it's also true that if compute and time really were not a limiting factor, then these less efficient methods would give better final performance.
@ArtOfTheProblem · 14 days ago
great work
@pon1 · 11 days ago
Still feels like magic to me 🙌🙌
@mallow610 · 10 days ago
Video is a banger
@ChristProg · 16 days ago
Thank you so much, Sir. Really interesting video. But I would like you to create a video on how the generative model uses the text prompt during training. Thank you, Sir. I subscribed! 😊
@IceMetalPunk · 10 days ago
And the newest/upcoming models seem to be tending more towards diffusion Transformers, which from my understanding are effectively a Transformer autoencoder with a diffusion model plugged in, applying diffusion directly to the latent-space embeddings. Is that correct?
@craftydoeseverything9718 · 1 day ago
This was genuinely such a great video. I honestly feel like I could come away from this video and implement an image generator myself :) /gen
@recklessroges · 4 days ago
Could you explain why the YOLO image classifier is/was so effective? Thank you.
@winstongraves8321 · 11 days ago
Great video
@IsaOzer-lx7sn · 1 day ago
I want to learn more about the causal architecture idea for auto-regressors, but I can't seem to find anything about them anywhere. Do you know where I can read more about this topic?
@algorithmicsimplicity · 1 day ago
I haven't seen any material that covers them really well. There are basically 2 types of causal architectures, causal CNNs and causal transformers, with causal transformers being much more widely used in practice now. Causal transformers are also known as "decoder-only transformers" ("encoders" use regular self-attention layers, "decoders" use causal self-attention). If you search for encoder vs decoder-only transformers, you should find some resources that explain the difference. Basically, to make a self-attention layer causal you mask the attention scores (i.e. set some to 0), so that words can only attend to words that came before them in the input. This makes it so that every word's vector only contains information from before it. This means you can use every word's vector to predict the word that comes after it, and it will be a valid prediction because that word's vector never got to attend to (i.e. see) anything after it. So, it is as if you had applied the transformer to every subsequence of input words, except you only had to apply it once.
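A minimal sketch of the masking trick described in this reply, using a toy single-head attention with identity projections (the shapes and the lack of learned weights are simplifications):

```python
import torch

def causal_self_attention(x):
    """Toy causal self-attention: position i attends only to positions j <= i."""
    n, d = x.shape
    scores = x @ x.T / d ** 0.5                         # (n, n) attention scores
    mask = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))    # block attention to future positions
    weights = scores.softmax(dim=-1)
    return weights @ x                                  # row i mixes only inputs 0..i

# Every output vector i can now be used to predict token i+1,
# since it never attended to anything after position i.
out = causal_self_attention(torch.randn(5, 8))          # 5 tokens, 8-dim embeddings
```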
@anthonybernstein1626 · 22 days ago
I had a good idea of how diffusion models work, but I still learned a lot from this video. Thanks!
@infographie · 9 days ago
Excellent.
@MilesBellas · 11 days ago
Via Pi:
"Diffusion models and auto-regressive (AR) models are two popular approaches for generating images and other types of data. They differ in their fundamental techniques, generation time, and output quality. Here's a brief comparison:
**Diffusion Models:**
* Approach: Diffusion models are based on the idea of denoising images iteratively, starting from a noisy input and gradually refining it into a high-quality output.
* Generation Time: Diffusion models are generally faster than AR models for image generation, especially when using optimizations like "asymmetric step" or Cascade models.
* Output Quality: Diffusion models are known for generating high-quality and diverse images, especially when trained on large datasets like Stable Diffusion or DALL-E 2. They can capture various styles and generate coherent images with intricate details.
**Auto-Regressive (AR) Models:**
* Approach: AR models generate images pixel by pixel, conditioning each new pixel on previously generated pixels. This sequential approach makes AR models computationally expensive, especially for large images.
* Generation Time: AR models tend to be slower than diffusion models due to their sequential nature. The generation time can be significantly longer for high-resolution images.
* Output Quality: While AR models can produce high-quality images, they may struggle with capturing diverse styles or maintaining coherence across different image regions. They might require additional techniques, like classifier-free guidance or super-resolution, to achieve better results.
In summary, diffusion models generally offer faster generation times and better output quality compared to AR models. However, both approaches have their strengths and limitations, and the choice between them depends on the specific use case, available computational resources, and desired generation speed and output quality."
@muhammadaneeqasif572 · 5 days ago
Can you please share the code you used to generate the images in the demo? It would be very helpful.
@frommarkham424 · 11 days ago
That was exactly how I guessed they did it.
@zacklee5787 · 6 days ago
Not sure I agree with some of your analysis here. The strength of diffusion models doesn't come from the lower dependence of the objects/pixels the model generates at once. In fact, as you mention, the model actually predicts a whole image, in practice, at every step. Even when you use the trick of predicting the noise, the noise is unintuitively not random, that is, not randomly generated, but actually depends completely on the noise (or lack thereof) in the input. It is, after all, equivalent to predicting the whole image. The real strength comes from the incremental nature: a step of the model further down the line can "fix" a mistake it made previously by interpreting the previous generation as noise. In the space of all, say, 1024x1024 pixel value combinations, there is a manifold (essentially a subset of close-together images) of all the target images we want to generate. The diffusion model learns to take incremental steps toward that subset of "reasonable" images from any random starting point.
@algorithmicsimplicity · 6 days ago
The noise is absolutely randomly generated. The reason the model can predict the noise (or equivalently the image) is because it receives both the noise and the image, combined, as input. If it were the case that the incremental nature helped, then I would expect diffusion models to generate higher quality outputs than auto-regressors, but this isn't the case. Auto-regressors generate higher quality outputs (e.g. arxiv.org/abs/2205.13554), they just take longer to run. If it were the case that NNs are unable to give correct predictions on the first go, we would see the opposite: that diffusion models can correct previous generations and thereby achieve higher quality. Also see LLMs, which have no difficulty generating perfect outputs in one pass. Diffusion models only learn to take steps toward the data distribution starting at the standard normal distribution (origin).
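A minimal sketch of the noise-prediction training step under discussion, assuming a DDPM-style schedule; `model` is a placeholder network and the shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def ddpm_training_step(model, x0, alphas_cumprod):
    """Noise a clean image x0, then train the model to recover the noise that was added."""
    b = x0.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,))    # random timestep per sample
    abar = alphas_cumprod[t].view(b, 1, 1, 1)          # cumulative signal fraction at t
    eps = torch.randn_like(x0)                         # the randomly generated noise
    x_t = abar.sqrt() * x0 + (1 - abar).sqrt() * eps   # input = image and noise combined
    eps_pred = model(x_t, t)
    return F.mse_loss(eps_pred, eps)                   # predicting eps <=> predicting x0
```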
@hjups · 3 months ago
Do you have a citation that supports your claim for eps vs x0 prediction? It's true that the first sampling step with x0 tends to produce a blurry / averaged result, but that's a result of the loss function used when training DDPMs. If you were to use something more complex or another NN, then you'd have a GAN, which doesn't produce blurry or averaged results on a single forward pass. Also, if you examine the output of x0 = noise - eps for the first step, it's both mathematically and visually equivalent to the first x0 prediction sample: a blurry / averaged result. The same thing is also true when predicting velocity, but velocity is arguably harder for a network to predict due to the phase transition.
@alirezaghazanfary · 2 days ago
Thanks for a very good video. I have a question: can't we make a model that decreases the resolution of a picture (for example, a 4x4 picture to a 2x2 and then to a 1x1 picture) and run it in reverse (generate a 2x2 from the 1x1 and a 4x4 from the 2x2)? Would this model work?
@algorithmicsimplicity · 2 days ago
Yes you absolutely could, and according to this paper: arxiv.org/abs/2404.02905v1 it works pretty well.
@Blooper1980 · 12 days ago
Finally I understand!
@HyperFocusMarshmallow · 9 days ago
A funny thing about watching a video like this is that you see an artificial neural network produce an image, and then you have another layer of neural network in the brain that tries to figure out whether it was a good match or not. The so-called "blurry noise" could in principle look like a good match to one person and a bad match to another, depending on how their own categorization works. It could also be good for everyone or bad for everyone, of course, or some arbitrary mix along that scale. The point is that "looks like blurry noise" risks being a quite subjective statement. I mean, people see images in the clouds and so on.
@kubaissen · 3 months ago
Nice vid thx
@EricPham-gr8pg · 9 days ago
Use a lens projector and zoom: it will save all the mathematical brain-picking. In video we use the CCD cell in the camera to instantly illuminate an LED pixel, then zoom it down to a tiny dot, then send it to RAM and display it on the monitor by a zoom factor corresponding to the resolution allowed, and zoom it back down when storing it in the timeline of each coordinate, and add it all up with address and time. Then when unfolding, all we need is the tiny-dot first frame and last frame; then start from the last frame, unfold into a buffer, subtract time, but adjust to the phase angle of time closest to the last frame, and drive with the appropriate speed of each time axis, so the memory is very small.
@yk4r2 · 2 days ago
Hey, could you kindly recommend more on causal architectures?
@algorithmicsimplicity · 2 days ago
I haven't seen any material that covers them really well. There are basically 2 types of causal architectures, causal CNNs and causal transformers, with causal transformers being much more widely used in practice now. Causal transformers are also known as "decoder-only transformers" ("encoders" use regular self-attention layers, "decoders" use causal self-attention). If you search for encoder vs decoder-only transformers, you should find some resources that explain the difference. Basically, to make a self-attention layer causal you mask the attention scores (i.e. set some to 0), so that words can only attend to words that came before them in the input. This makes it so that every word's vector only contains information from before it. This means you can use every word's vector to predict the word that comes after it, and it will be a valid prediction because that word's vector never got to attend to (i.e. see) anything after it. So, it is as if you had applied the transformer to every subsequence of input words, except you only had to apply it once.
@iwaniw55 · 1 day ago
Hi @algorithmicsimplicity, I am curious which papers/material you referenced for the generalized auto-regressor? I can't seem to find any info on using randomly spaced-out pixels to predict the next batch of pixels. Any help would be appreciated. Also, great videos!!!
@algorithmicsimplicity · 1 day ago
It is more widely known as "any-order autoregression", see e.g. this paper arxiv.org/abs/2205.13554
@iwaniw55 · 1 day ago
@@algorithmicsimplicity Thank you so much! This is exactly what I was missing.
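A toy sketch of the any-order autoregression idea referenced above (not the exact method from the paper): reveal a random subset of pixels and train the model to predict one of the held-out pixels. The `model` signature here is hypothetical:

```python
import torch
import torch.nn.functional as F

def any_order_ar_step(model, images):
    """One training step: predict a held-out pixel from a random subset of revealed pixels."""
    b, n = images.shape                              # images flattened to (batch, n_pixels)
    perm = torch.randperm(n)                         # a random generation order
    k = torch.randint(0, n, (1,)).item()             # how many pixels to reveal
    context_idx, target_idx = perm[:k], perm[k]
    masked = torch.zeros_like(images)
    masked[:, context_idx] = images[:, context_idx]  # reveal a random pixel subset
    known = torch.zeros_like(images)
    known[:, context_idx] = 1                        # tell the model which pixels are real
    pred = model(masked, known)                      # hypothetical signature: (values, mask)
    return F.mse_loss(pred[:, target_idx], images[:, target_idx])
```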
@quickdudley · 6 days ago
My brain misinterpreted the title as "Why diffusers work better than autoencoders" (I believe because the noising process works rather like data augmentation)
@alex65432 · 3 months ago
Can you make a video about the loss landscape? Like, what effects do different weight inits, optimizers, or architectures like ResNet have?
@algorithmicsimplicity · 3 months ago
Thanks for the interesting suggestion! I was already planning to do a video about why neural networks generalize outside of their training set, I should be able to talk about the loss landscape in that video.
@joshjohnson259 · 4 days ago
If this explanation is too advanced for me, how would you recommend I learn enough to be able to grasp these concepts? Can you direct me to some content that is one level down in complexity, so I can see if that would be my starting point in understanding how these models work? I don’t really have any CS background.
@algorithmicsimplicity · 4 days ago
If you just want to learn how to train/use these models, I would highly recommend the fast.ai course by Jeremy Howard (course.fast.ai). You can also look at 3blue1brown's videos on neural networks and transformers, which are aimed at a general audience, and Andrej Karpathy's videos on implementing a transformer from scratch for a more detailed walkthrough of the models.
@duytdl · 10 days ago
So why isn't diffusion better for text? Also, are you saying that auto-regression is only bad because it's expensive to do (serially)? Or is diffusion fundamentally better for images?
@algorithmicsimplicity · 10 days ago
Auto-regression is only bad because it is slow; it produces better generations for both text and images. For text, there aren't that many tokens that you need to generate, so you can just use auto-regression: it gives better results. For images, you are forced to use something faster, and diffusion is much faster while producing nearly as good generations.
@turhancan97 · 13 days ago
Is the idea at the beginning of the video (auto-regressive image generation) self-supervised learning?
@algorithmicsimplicity · 13 days ago
Technically yes; self-supervised learning just means that the labels used to train the model were created automatically from the data itself, instead of by a human. So yes, both auto-regression and diffusion are self-supervised learning, since they automatically create masked/noised inputs and use the clean image as labels. Though usually when people refer to self-supervised learning specifically, they mean self-supervised but not generative, so things like SimCLR or contrastive learning.
@turhancan97 · 12 days ago
@@algorithmicsimplicity I understand. Thanks a lot :)
@akashmody9954 · 3 months ago
Can you recommend some sources I can follow if I want to go deeper into diffusion models and transformers?
@akashmody9954 · 3 months ago
I tried to go through the research papers but the math is overwhelming
@algorithmicsimplicity · 3 months ago
@@akashmody9954 If you just want to learn how to train/use them, I'd highly recommend the fast.ai course by Jeremy Howard; it will give you practical experience using them. If you want to do research/develop new methods, then I'm afraid there isn't any better option than just reading the papers. Although if code is available, I sometimes find it easier to just read the code than the paper lol.
@akashmody9954 · 3 months ago
@@algorithmicsimplicity alright.....thanks a lot man, and loving your videos as always
@bj_ · 1 day ago
Wait... so that meme of the missile guidance system that only knows where it is by first calculating all the places it isn't, actually applies to diffusion image generation too?
@klaushermann6760 · 9 days ago
Now we know they're not only predictors.
@hamzaumair7909 · 1 month ago
I love your explanations, especially the transformers one. Although this one imo could have been better; I think you are missing some ideas that should have been explained.
@algorithmicsimplicity · 1 month ago
Thanks for the feedback, any ideas in particular that you think should have been explained?
@sichengmao4038 · 7 days ago
Can you explain why there's no causal architecture for diffusion models? 16:26
@algorithmicsimplicity · 7 days ago
Basically it's because NN layers accumulate information from multiple input features into one feature's vector. By making each layer only take in information from features before it in the AR order, you get a causal architecture the same size as the original model. For diffusion, you could in principle make a causal architecture, but you would need to make a feature vector for every feature in every step of the noising process, i.e. the size of the model would need to be increased by a factor equal to the number of denoising steps, which isn't practical.
@sichengmao4038 · 7 days ago
@@algorithmicsimplicity I don't quite understand why "the model size is increased by the number of denoising steps". What I imagine is: if we make an analogy to a language model like a Transformer, we now have a series of tokens (where each token is indeed a noisy image in the noising process), so we can still parallelize along the sequence dimension, can't we?
@algorithmicsimplicity · 7 days ago
@@sichengmao4038 You could do that; the problem is how you convert the entire image into a token. Usually, in order to convert an image into a feature vector, you need to apply a full-sized neural network. So to get your noisy image tokens, you would need to apply a NN for each noising step.
@agustinbs · 10 days ago
This video is better than going to MIT for a machine learning degree. Man, this is gold. Thank you so much.
@JoeJoeTater · 2 days ago
18:10 This is wrong. The average of a bunch of noisy images is a less-noisy image (see "regression toward the mean"). You'd have to normalize that averaged image.
@algorithmicsimplicity · 2 days ago
Right, I should have been more careful with my usage of the word "noisy". If you average a bunch of samples from a normal distribution, the result is a sample with less variance (i.e. less noisy). What I meant to say was that the probability of the average under the normal distribution is higher (i.e. the result is closer to the origin). So the average still lies within the data manifold (as opposed to images, where the average moves outside the data manifold).
@fayezsalka · 1 day ago
Yes, that was very confusing to me too. The average of a bunch of random noise samples is 0.5, which is the mean. You would literally get a smooth grey image, not a "noise" image as shown in the video.
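For reference, a short derivation of the effect being debated in this thread, assuming k independent noise samples with mean μ (0 for DDPM-style noise, 0.5 for pixel values in [0,1]):

```latex
\[
\bar{x} = \frac{1}{k}\sum_{i=1}^{k} x_i,
\qquad x_i \sim \mathcal{N}(\mu,\,\sigma^2 I)
\;\Longrightarrow\;
\mathbb{E}[\bar{x}] = \mu,
\qquad \operatorname{Var}[\bar{x}] = \frac{\sigma^2}{k}.
\]
% As k grows, the average collapses toward the flat mean image mu
% (a grey image for pixels in [0,1]; the origin for zero-mean noise),
% which is why averaging noise samples does not yield another noise sample.
```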
@akashmody9954 · 3 months ago
Can you make a video on how Sora by OpenAI works and what kind of architecture it follows?
@algorithmicsimplicity · 3 months ago
Unfortunately OpenAI does not publicly release details of their architectures; they only said it was a transformer-based diffusion model. This thread has some speculation on the exact architecture though: threadreaderapp.com/thread/1758433676105310543.html
@assgoblin3981 · 3 months ago
Assgoblin approves of this content
@craftydoeseverything9718 · 1 day ago
17:58 By the way, you wrote "nose" instead of "noise".
@algorithmicsimplicity · 1 day ago
So I did. Surprised no one else has mentioned it yet lol.
@glaubherrocha2935 · 1 day ago
Wouldn't a fixed pixel with a random color make it work?
@algorithmicsimplicity · 1 day ago
I'm not sure what you are asking, can you elaborate?
@dubfather521 · 8 days ago
So denoising models work by predicting the clean image, and then to get the next step you noise its already-clean output??? That doesn't make any sense. If it predicts the final image already, why do you have to keep predicting?
@algorithmicsimplicity · 8 days ago
The first time it predicts the clean image, it will not produce a good image; it will produce a blurry mess (because it will average over all of the training images). You then add noise to this blurry mess and you get an image that is almost pure noise, with a little bit of structure from the original blurry mess. Then you use that as input and predict a clean image again; this time the produced image will be slightly sharper, because now the model is only averaging over all inputs which are consistent with the blurry structure from the first step. You repeat this many times, and at each step the produced image gets sharper because more detail is left over from the previous step.
@dubfather521 · 8 days ago
@@algorithmicsimplicity ohhhhhh
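A minimal sketch of the loop described in that reply, assuming an x0-predicting model and a DDPM-style schedule; the re-noising rule is simplified (not the exact DDPM posterior), and `model`/`alphas_cumprod` are placeholders:

```python
import torch

@torch.no_grad()
def sample(model, alphas_cumprod, shape):
    """Repeatedly predict the clean image, then re-noise it with a little less noise."""
    x = torch.randn(shape)                             # start from pure noise
    for t in reversed(range(len(alphas_cumprod))):
        x0_pred = model(x, t)                          # blurry at first, sharper each round
        if t == 0:
            return x0_pred                             # final prediction is the sample
        abar = alphas_cumprod[t - 1]                   # keep slightly more signal next step
        x = abar.sqrt() * x0_pred + (1 - abar).sqrt() * torch.randn(shape)
```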
@cognitive-carpenter · 11 days ago
Enjoyed I think is the wrong output