Comments
@tryptamedia7375 14 hours ago
So do the recent large world model breakthroughs of Sora, Luma, and Runway alpha imply that we've returned to auto-regression? Or are they a combo of the two? Amazing video, would love to hear your thoughts!
@algorithmicsimplicity 6 hours ago
From what little they have released publicly, it seems that they are simply diffusion models applied to videos, i.e. they treat videos as a collection of frames, add noise to all frames, take all noisy frames as input and try to predict all clean frames. I don't think there is any auto-regression done, but maybe that will change when they start generating longer videos.
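A minimal numpy sketch of the scheme described in this reply; the shapes, noise-schedule value, and stand-in "model" are illustrative assumptions, not details those labs have released:

```python
import numpy as np

# Illustrative sketch of one diffusion training step on a video clip,
# treating the clip as a stack of frames that are all noised together.
rng = np.random.default_rng(0)

frames = rng.random((8, 16, 16, 3))   # (time, height, width, channels)
alpha = 0.7                           # assumed noise-schedule value for this step

# Add noise to every frame at once, as described above.
noise = rng.standard_normal(frames.shape)
noisy_frames = np.sqrt(alpha) * frames + np.sqrt(1 - alpha) * noise

# A model would take all noisy frames as input and predict all clean
# frames; training minimizes a mean-squared error like this one
# (a stand-in "prediction" is used here in place of a real model):
predicted_clean = noisy_frames
loss = float(np.mean((predicted_clean - frames) ** 2))
```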
@dschonhaut 1 day ago
Coming from a background in computational neuroscience, but with no prior NN experience beyond coding a perceptron on the MNIST dataset for a class years ago, I’ve been watching all your videos lately as a first pass before delving into these topics more rigorously. The visualizations and your concise summaries that accompany them are hugely helpful! Thanks for all your work on these educational videos, and for releasing them for free! Out of curiosity, how are you scripting your animations? Is it Manim (the @3blue1brown library)? And would you recommend it?
@algorithmicsimplicity 1 day ago
Thanks for the feedback! I used Manim for the first 3 videos on my channel, and then decided to write my own animation library because I found Manim annoying to use. My animation library is still a work in progress, so my most recent 2 videos are made using both my own library and Manim. I would say that Manim is fine to use for simple animations/graphs, but once you start trying to animate lots of objects at once it gets annoying.
@looooool3145 1 day ago
I now understand things, thanks!
@downloadableram2666 2 days ago
State-space models are not necessarily from ML, they're used a lot in control systems actually. Not surprised by their relationship considering both are strongly based on linear algebra.
@downloadableram2666 2 days ago
I wrote my own CUDA-based neural network implementation a while ago, but I used a sigmoid instead of RELU. Although the "training" covered in this video works, it was kinda odd (usually neural net backpropagation is done with gradient descent of the cost function). You probably cover this in another video, though, haven't gotten there yet. Good video nonetheless, never really thought about it this way.
@algorithmicsimplicity 2 days ago
The training method described in this video IS gradient descent. Backprop is just one way of computing gradients (and it is usually used because it is the fastest). It will yield the same results as this training procedure.
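A toy illustration of this reply: the chain rule (what backprop computes) and a finite-difference perturbation estimate give the same gradient, and either one can drive the same gradient-descent update. The loss and numbers here are illustrative, not from the video:

```python
# One training example and a one-weight "model": loss(w) = (w*x - y)^2.
x, y = 2.0, 3.0
w = 0.5

def loss(w):
    return (w * x - y) ** 2

# Gradient via the chain rule, as backprop computes it:
analytic_grad = 2 * (w * x - y) * x

# Gradient via small perturbations (slower, but the same quantity):
eps = 1e-6
numeric_grad = (loss(w + eps) - loss(w - eps)) / (2 * eps)

# Both agree, so gradient descent takes the same step either way:
w_next = w - 0.1 * analytic_grad
```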
@downloadableram2666 2 days ago
@@algorithmicsimplicity I should have figured it was doing much the same thing, just a lot slower and more 'random.' Your videos are awesome though, I'm trying to go into machine learning after doing aerospace embedded systems.
@Paraxite7 3 days ago
I finally understand MAMBA! I've been trying to get my head around it for months, but now I see that approaching it the way the original paper presented it wasn't the best way. Thank you.
@heyalchang 3 days ago
Thanks! That was a really excellent explanation.
@algorithmicsimplicity 3 days ago
Thank you so much!
@chaseghoste8671 3 days ago
man, what can I say
@laurentkloeble5872 3 days ago
Very original. I learned so much in a few minutes. Thank you.
@nothreeshoes1200 5 days ago
Please make more videos. They’re fantastic!
@user-ou3ts4hl7p 6 days ago
Very good video. I now know the straightforward reasons why the diffusion idea emerged and why diffusion is intrinsically better than the auto-regression algorithm.
@fractalinfinity 7 days ago
I get it now! 🎉 thanks!
@alenqquin4509 8 days ago
A very good job, I have deepened my understanding of generative AI
@freerockneverdrop1236 8 days ago
Neural networks can be simple, if you are Neo :)
@Real-HumanBeing 8 days ago
In exactly the same way as regular AI, by containing the dataset.
@yushuowang7820 9 days ago
VAE: what about me?
@lialkalo4093 9 days ago
very good explanation
@freerockneverdrop1236 10 days ago
This is very nice, but I think one thing is either wrong or unclear. At 11:40, he says that in order to minimize the number of steps needed to generate the image, we generate multiple pixels in one step that are scattered around and independent of each other. This way we will not get a blurred image. This works in the later steps, when enough pixels have been generated to guide the generation of the next batch of pixels. E.g., we know it is a car and we just try to add more details. But in the early steps, when we don't know what it is, if multiple pixels are generated we fall back to the averaging issue again, e.g. one pixel is for a car but another one is for a bridge. Therefore, I think in the early steps we can only generate one pixel per step. What do you think?
@algorithmicsimplicity 10 days ago
You are correct that at the early steps there is large amount of uncertainty over the value of each pixel. But what matters is how much this uncertainty is reduced by knowing the value of the previously generated pixels. Even at the first step, knowing the value of one pixel does not help that much in determining the value of far away pixels. Let's say one pixel in the center is blue, does that make you significantly more confident about the color of the top left pixel? Not really.
@freerockneverdrop1236 7 days ago
@@algorithmicsimplicity Hi, glad to see your reply. Let me elaborate. Let's create just the first two pixels in two different ways and see the difference. The first approach creates them one by one; the second creates both in one step. First approach: at the beginning there are no pixels, so when we create one there is no problem, there can be no inconsistency. Then we create the second pixel based on the first one; there are still a lot of possibilities, but they are constrained by the first pixel. In the second approach, both pixels are created in the same step, so they cannot constrain each other. Then there can be more combinations, and some of them include inconsistencies. To avoid this, we should generate pixels one by one until the theme of the image is set; then we can create batch by batch. A simplified example: if the model is trained on images containing only green pixels of different strengths, or only red pixels of different strengths, then approach one will generate a second pixel of the same color as the first (though probably with a different strength), but approach two could generate one red and one green.
@Levy1111 10 days ago
I do hope you'll soon get at least a six-figure subscriber count. The quality of your videos (both in terms of education and presentation) is top-notch; people need you to become popular (at least within our small tech bubble).
@official_noself 11 days ago
480p? 2023? Are you kidding me?
@LeYuzer 12 days ago
Tip: turn on 1.5x speed.
@pshirvalkar 12 days ago
A fantastic teacher!!! Thanks! Can you please cover Bayesian Markov chain Monte Carlo? That would be very helpful!
@algorithmicsimplicity 12 days ago
Thanks for the suggestion, I will put it on my TODO list.
@alexanderterry187 12 days ago
How do the models deal with having different numbers of inputs? E.g. the text label provided can be any length, or not provided at all. I'm sure this is a basic question, but whenever I've used NNs previously they've always had a constant number of inputs or been reapplied to a sequence of data that has the same dimension at each step.
@algorithmicsimplicity 12 days ago
For image input, the input is always the same size (same image size and same channels), and the output is always the same size (1 pixel). For text, you can also treat the inputs as all being the same size by padding smaller inputs up to a fixed max length, though transformers can also operate on sequences of different lengths. The output for text is always the same size (a probability distribution over tokens in the vocabulary).
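A minimal sketch of the padding idea mentioned in this reply; `PAD_ID` and `MAX_LEN` are illustrative choices, not values from the video:

```python
# Pad variable-length token sequences up to a fixed maximum length so
# the model always sees same-sized input.
PAD_ID = 0
MAX_LEN = 8

def pad(tokens, max_len=MAX_LEN, pad_id=PAD_ID):
    tokens = tokens[:max_len]                          # truncate if too long
    return tokens + [pad_id] * (max_len - len(tokens))

# Ragged inputs become one uniform shape:
batch = [pad([5, 9, 2]), pad([7])]
```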
@karigucio 12 days ago
So the transformation applied to the weights is not purely about initialization? Instead, in the expression w = exp(-exp(a)*exp(ib)), the numbers a and b are the learned parameters, not w, right?
@algorithmicsimplicity 12 days ago
Yes a and b are the learned parameters.
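A small numpy sketch of the parameterization quoted in the question above; the values of a and b are illustrative, and the magnitude remark follows directly from the formula:

```python
import numpy as np

# a and b are the learned real parameters; the recurrent weight is
# derived from them as w = exp(-exp(a) * exp(i*b)).
a, b = -1.0, 0.5

w = np.exp(-np.exp(a) * np.exp(1j * b))

# The inner exponent has real part -exp(a)*cos(b), so
# |w| = exp(-exp(a)*cos(b)), which stays below 1 whenever cos(b) > 0.
magnitude = abs(w)
```

Gradients flow through a and b during training, so the derived w is never updated directly.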
@matthewfynn7635 13 days ago
I have been working with machine learning models for years, and this is the first time I have truly understood, through visualisation, the use of ReLU activation functions! Great video.
@jaimeduncan6167 13 days ago
Out of nothing? no, it grabs people's work and creates a composite with variations.
@algorithmicsimplicity 13 days ago
It isn't correct to say it creates a 'composite' with variations, models can generalize outside of their training dataset in certain ways, and generative models are capable of creating entirely new things that aren't present in the training dataset.
@MichaelBrown-gt4qi 13 days ago
I've started binge watching all your videos. 😁
@fergalhennessy775 14 days ago
do u have a mewing routine bro love from north korea
@MichaelBrown-gt4qi 14 days ago
This is a great video. I have watched videos in the past (years ago) that talked about auto-regression and, more recently, about diffusion, but it's nice to see why and how there was such a jump between the two. Amazing! However, I feel this video is a little incomplete with no mention of the enhancer model that "cleans up" the final generated image. That enhancing model is able to create a larger image while cleaning up the six fingers gen AI is so famous for. While not technically part of the diffusion process (because it has no random noise), it is a valuable addition to image gen for anyone trying to build their own model.
@capcadaverman 14 days ago
Not made from nothing. Made by training on real people’s intellectual property. 😂
@algorithmicsimplicity 14 days ago
My image generator was trained on data licensed to be used for training machine learning models.
@capcadaverman 14 days ago
@@algorithmicsimplicity not everyone is so ethical
@telotawa 14 days ago
could diffusion work on text generation?
@algorithmicsimplicity 14 days ago
Yes, it absolutely can! Instead of adding normally distributed noise, you randomly mask tokens with some probability, see e.g. arxiv.org/abs/2406.04329 . That said, it tends to produce slightly worse-quality text than auto-regression (actually this is true for images as well; it's just that on images auto-regression takes too long to be viable).
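A minimal sketch of the masking-based corruption described in this reply; the `[MASK]` symbol and the masking probability are illustrative assumptions:

```python
import random

# Instead of Gaussian noise, each token is independently replaced by a
# mask token with some probability.
MASK = "[MASK]"

def mask_tokens(tokens, p, rng):
    return [MASK if rng.random() < p else t for t in tokens]

rng = random.Random(0)
corrupted = mask_tokens(["the", "cat", "sat", "down"], p=0.5, rng=rng)
# A denoising model is then trained to predict the original tokens at
# the masked positions.
```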
@LordDoucheBags 14 days ago
What did you mean by causal architectures? Because when I search online I get stuff about causal inference, so I’m guessing there’s a different and more popular term for what you’re referring to?
@julioalmeida4645 14 days ago
Damn. Amazing piece
@dmitrii.zyrianov 14 days ago
Hey! Thanks for the video, it is very informative! I have a question. At 18:17 you say that an average of a bunch of noise is still a valid noise. I'm not sure why it is true here. I'd expect the average of a bunch of noise to be just 0.5 value (if we map rgb values to 0..1 range)
@algorithmicsimplicity 14 days ago
Right, the average is just the center of the noise distribution which, say the color values are mapped from -1 to 1, is 0. This average doesn't look like noise (it is just a solid grey image), but if you ask what the probability of this image is under the noise distribution, it actually has the highest probability. The noise distribution is a normal distribution centered at 0, so the input which is all 0 has the highest probability. So the average image still lies within the noise distribution, as opposed to natural images, where the average moves outside the data distribution.
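A quick numerical check of this point; the image size and sample count are illustrative:

```python
import numpy as np

# The average of many zero-centered Gaussian noise "images" is close to
# the all-zero image, which is the highest-density point of the noise
# distribution (even though it no longer *looks* noisy).
rng = np.random.default_rng(0)

samples = rng.standard_normal((1000, 8, 8))   # 1000 noise images
average = samples.mean(axis=0)                # close to all zeros

max_abs = float(np.abs(average).max())        # shrinks as samples grow
```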
@dmitrii.zyrianov 13 days ago
Thank you for the reply, I think I got it now
@simonpenelle2574 14 days ago
Amazing content I now want to implement this
@dadogwitdabignose 15 days ago
Great video, man. Suggestion: can you create a video on how generative transformers work? This has been really bothering me, and hearing an in-depth explanation of them, like your video here, would be helpful!
@algorithmicsimplicity 15 days ago
Generative transformers work in exactly the same way as generative CNNs. It doesn't matter what backbone you use the idea is the same, you will use auto-regression or diffusion to train a transformer to undo the masking/noising process.
@dadogwitdabignose 15 days ago
@@algorithmicsimplicity which is more efficient to use and how do they handle text data to map text into tensors?
@algorithmicsimplicity 15 days ago
@@dadogwitdabignose I explain how transformer classifiers work in this video: kzbin.info/www/bejne/oYivlpdupJqAaLs . As for which is more efficient, it depends on the data. Usually for text data transformers will be more efficient (for reasons I explain in that video), and for images CNNs will be more efficient.
@Kavukamari 15 days ago
"i can do eleventy kajillion computations every second" "okay, what's your memory throughput"
@deep.space.12 15 days ago
If there will be a longer version of this video, it might be worth mentioning VAE as well.
@algorithmicsimplicity 15 days ago
Thanks for the suggestion.
@wormjuice7772 16 days ago
This has helped me so much wrapping my head around this whole subject! Thank you for now, and the future!
@codybarton2090 16 days ago
Crazy video
@gameboyplayer217 16 days ago
Nicely explained
@snippletrap 17 days ago
Fantastic explanation. Very intuitive
@ibrahimaba8966 17 days ago
Thank you for this beautiful work!
@algorithmicsimplicity 17 days ago
Thank you very much!
@boogati9221 17 days ago
Crazy how two separate ideas ended up converging into one nearly identical solution.
@andrewy2957 6 days ago
Totally agree. I feel like that's pretty common in math, robotics, and computer science; it just shows how every field in STEM is interconnected.
@mattshannon5111 17 days ago
Wow, it requires really deep understanding and a lot of work to make videos this clear that are also so correct and insightful. Very impressive!
@vibaj16 18 days ago
wait, can this be used as a ray tracing denoiser? That is, you'd plug your noisy ray traced image into one of the later steps of the diffusion model, so the model tries to make it clear?
@algorithmicsimplicity 18 days ago
Yep you could definitely do that, you would probably need to train a model on some examples of noisy ray traced images though.
@Maxawa0851 17 days ago
Yeah, but this is very slow, though.
@antongromek4180 18 days ago
Actually, there is no LLM, etc - but 500 million nerds - sitting in basements all over the world.
@artkuts4792 18 days ago
I still don't get how the scoring model works. So before, you were labeling the important pairs by hand, giving each a score based on the semantic value the pair has for a given context, but then it's done automatically by a CNN. How does it define the score, though (and it's context-free, isn't it)?
@algorithmicsimplicity 18 days ago
The entire model is trained end-to-end to minimize the training loss. To start off with, the scoring functions are completely random, but during training they will change to output scores which are useful, i.e. which cause the model's final prediction to better match the training labels. In practice it turns out that what these scoring functions learn while trying to be useful is very similar to the 'semantic scoring' that a human would do.
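A tiny numpy sketch of what "completely random to start" means for a dot-product-style scoring function; the dimensions, values, and exact form of the scoring function are illustrative assumptions, not necessarily the one in the video:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

W_q = rng.standard_normal((d, d))   # learned weights, random at init
W_k = rng.standard_normal((d, d))   # learned weights, random at init

x_i = rng.standard_normal(d)        # representation of one word
x_j = rng.standard_normal(d)        # representation of another word

# No hand-written rule: the pair's score is whatever the current
# weights give, and training nudges W_q and W_k (via the end-to-end
# loss) until the scores become useful for the final prediction.
score = float((W_q @ x_i) @ (W_k @ x_j))
```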
@lusayonyondo9111 19 days ago
wow, this is such an amazing resource. I'm glad I stuck around. This is literally the first time this is all making sense to me.
@istoleyourfridgecall911 19 days ago
Hands down the best video that explains how these models work. I love that you explain these topics in a way that resembles how the researchers created these models. Your video shows the thinking process behind them, and combined with great animated examples it is so easy to understand. You really went all out. If only YouTube promoted these kinds of videos instead of low-quality brainrot made by inexperienced teenagers.