Flow Matching for Generative Modeling (Paper Explained)

  34,978 views

Yannic Kilcher

1 month ago

Flow matching is a more general method than diffusion and serves as the basis for models like Stable Diffusion 3.
Paper: arxiv.org/abs/2210.02747
Abstract:
We introduce a new paradigm for generative modeling built on Continuous Normalizing Flows (CNFs), allowing us to train CNFs at unprecedented scale. Specifically, we present the notion of Flow Matching (FM), a simulation-free approach for training CNFs based on regressing vector fields of fixed conditional probability paths. Flow Matching is compatible with a general family of Gaussian probability paths for transforming between noise and data samples -- which subsumes existing diffusion paths as specific instances. Interestingly, we find that employing FM with diffusion paths results in a more robust and stable alternative for training diffusion models. Furthermore, Flow Matching opens the door to training CNFs with other, non-diffusion probability paths. An instance of particular interest is using Optimal Transport (OT) displacement interpolation to define the conditional probability paths. These paths are more efficient than diffusion paths, provide faster training and sampling, and result in better generalization. Training CNFs using Flow Matching on ImageNet leads to consistently better performance than alternative diffusion-based methods in terms of both likelihood and sample quality, and allows fast and reliable sample generation using off-the-shelf numerical ODE solvers.
Authors: Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, Matt Le
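
As a concrete companion to the abstract, here is a minimal sketch of the conditional flow matching objective with the OT path (roughly Eqs. 22-23 of the paper, from memory); the network, toy data, and sigma_min value are illustrative choices, not the authors' implementation:

```python
# Minimal sketch of Conditional Flow Matching with the OT path.
# The network, toy data, and sigma_min are illustrative choices,
# not the authors' setup.
import torch
import torch.nn as nn

dim, sigma_min = 2, 1e-4
model = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(), nn.Linear(128, dim))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def cfm_loss(x1):
    """CFM loss for a batch of data samples x1 of shape (B, dim)."""
    x0 = torch.randn_like(x1)            # sample from the source Gaussian
    t = torch.rand(x1.shape[0], 1)       # uniform time in [0, 1]
    # OT conditional path: linear interpolation with shrinking noise
    xt = (1 - (1 - sigma_min) * t) * x0 + t * x1
    # The conditional vector field along this path is constant in t
    target = x1 - (1 - sigma_min) * x0
    pred = model(torch.cat([xt, t], dim=-1))   # v_theta(x_t, t)
    return ((pred - target) ** 2).mean()

for step in range(1000):                 # simulation-free training loop
    x1 = 2.0 + 0.1 * torch.randn(256, dim)    # toy "data" distribution
    opt.zero_grad()
    cfm_loss(x1).backward()
    opt.step()
```

The "simulation-free" claim of the abstract is visible here: training never integrates an ODE; it only samples (t, x0, x1) and regresses on a closed-form conditional vector field.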
Links:
Homepage: ykilcher.com
Merch: ykilcher.com/merch
YouTube: / yannickilcher
Twitter: / ykilcher
Discord: ykilcher.com/discord
LinkedIn: / ykilcher
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: www.subscribestar.com/yannick...
Patreon: / yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Comments: 74
@zerotwo7319 1 month ago
A Jedi himself is teaching us about generative AI. I couldn't be more grateful.
@amedeobiolatti216 1 month ago
These are not the papers you are looking for 🖐
@MilesBellas 1 month ago
Hu-Po = Yoda ?😅
@blacksages 1 month ago
Man, I have a presentation to do on this paper in a few days and I've been stuck on it; you just make it so much clearer, thank youuu! All the step-by-step explanations and reminders you put in your video are so helpful. I've been through Y. Lipman's presentation and he just glosses over these things because they're too obvious, but you don't, and I'm so grateful!
@guillaumevermeillesanchezm2427 1 month ago
I see a video from Yannic on a Monday, I click like.
@jonatan01i 1 month ago
"that's a dog in a hat. I'm very very sorry"
@ArkyonVeil 1 month ago
Thank you for the in-depth analysis. I personally have only a passing interest in the content of these videos, but I find listening to them a relaxing experience. And as a bonus, I learn something useful every now and then. Cheers
@diga4696 1 month ago
Best birthday gift ever!
@YannicKilcher 1 month ago
Happy birthday
@xplained6486 1 month ago
Insane video, Yannic, your explanation was superb! Keep up the great work
@Kram1032 1 month ago
Very cool stuff. Interesting how, in the optimal transport version, the shape (in their examples) does indeed get matched sooner, but initially it looks kind of small and only then reaches its full size. I guess that amounts to hitting the shape sooner than the distribution is able to spread out in full, whereas in the original diffusion process you'd first do the spreading out and only *then* hone in on the result
@sergiomanuel2206 28 days ago
Thank you so much Yannic!! Amazing explanation for such a complicated topic!!!!
@loukasa 1 month ago
Great explanation Yannic
@simonstrandgaard5503 1 month ago
Great explanations.
@JTMoustache 1 month ago
Damn.. that data probability path formalism is awesome.
@user-xk6rg7nh8y 1 month ago
Awesome, so interesting!! It is really helpful :)) Thanks!!
@ljh0412 1 month ago
I was waiting for this. Thank you Yannic. Hopefully you'll also check out the Bespoke Solvers paper, which is used to speed up flow matching in AudioBox from Meta.
@kev2582 1 month ago
Great walkthrough as always. This paper shines with its abstraction/generalization and mathematical rigor. What is missing is the qualitative difference between the diffusion probability paths and the OT approach. Since this paper has aged a bit, it would be interesting to look up where the authors are now. My hunch is that straight-line path finding will be qualitatively worse for image generation compared to diffusion models.
@sebastianp4023 28 days ago
Question: Do you have a video/opinion on gMLPs from the paper "Pay Attention to MLPs" Liu et al. 2021?
@OperationDarkside 1 month ago
Would a desert sand dune and wind analogy work to visualize the probability density flow and the vector field? The grains of sand are the probability at one point, the dunes are the density distribution in 2D, and the wind is the vector field.
@kaikapioka9711 1 month ago
Thx bud!
@IsraelMendoza-OOOOOO 1 month ago
God Bless You brother ❤
@DeepThinker193 1 month ago
Omg he's wearing a hoodie. Is he hacking?
@timeTegus 1 month ago
yes
@novantha1 1 month ago
Wait, so the essence of this paper is that we can define a source Gaussian distribution and translate that into a target distribution based on a learned vector field which indicates a direction of flow, essentially. Notably, this is... maybe not a deterministic process, but certainly a finite one, in contrast to traditional diffusion denoising. But... how do we encode the images in our dataset as a Gaussian distribution? How do we get the source distribution? Is it just noise "tokenized" as a Gaussian distribution? Is it a constant? Is it conditional on the prompt, like a latent LLM embedding (this last one would be wild; I would imagine it would be more effective for the LLM embedding to condition the target distribution, but I digress)? I feel like I understand the process here, but I have no idea how I'd go about implementing this.
@drdca8263 1 month ago
I believe the images are the points in R^d, not the distributions. For each point in the training set (each image in the training set), I think they associate a probability distribution which is a Gaussian with that image as its mean and a very small standard deviation. So the distribution associated with a particular training image is "this starting image, plus a very small amount of noise".
@u2b83 1 month ago
@drdca8263 Karpathy made an offhand remark a few years ago that for high-dimensional points (in R^d) you can effectively recover the exact point just by knowing the distribution. The "concentration of measure" phenomenon suggests that in high-dimensional spaces, points tend to be closer to the surface of a hypersphere than to its center. This implies that for a given distribution, many points will have similar distances from the mean of the distribution, making the space effectively "smaller" in some intuitive sense than one might expect. This phenomenon can sometimes allow for predictions or reconstructions of data points based on less information than would be necessary in lower dimensions.
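
A tiny sketch of the per-sample view described in this thread; the shapes and the sigma_min value are made up for illustration:

```python
# Per-sample view: a training image is a point in R^d, and its
# conditional target distribution is a narrow Gaussian centered on it.
# Shapes and sigma_min here are illustrative, not from the paper.
import torch

sigma_min = 1e-4
x1 = torch.rand(3 * 64 * 64)        # one training image, flattened to a point in R^d
# Conditional target p_1(x | x1) = N(x1, sigma_min^2 I): the image plus tiny noise
near_x1 = x1 + sigma_min * torch.randn_like(x1)
# Conditional source p_0(x) = N(0, I), the same for every x1
x0 = torch.randn_like(x1)
```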
@CalebCranney 9 days ago
Here's a video that I thought did an excellent job explaining the concept of normalizing flow from the coding perspective: kzbin.info/www/bejne/r6m5lKGrh9d-p7M. Then this one has some code that matches the diagrams in the Hans video: kzbin.info/www/bejne/mKaciI1mh6t6Zrc. I just spent a number of hours trying to grasp the concept of flow, and these were what made it start to click for me.
@timothy-ul9wp 1 month ago
I wonder how "straight" the flow matching path actually is during inference. Since the model doesn't actually have information from previous steps, I assume the path will always point to the mean over all choices of x_0? (in Eq. 23)
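
For what it's worth, the marginal field the network regresses onto is the posterior-weighted average of the per-sample fields (around Eq. 8 in the paper, if memory serves), which would explain early steps pointing at a mean over many candidates and the trajectory straightening as the posterior concentrates:

```latex
% Marginal vector field as a posterior average of conditional fields:
u_t(x) = \int u_t(x \mid x_1)\,\frac{p_t(x \mid x_1)\,q(x_1)}{p_t(x)}\,dx_1
       = \mathbb{E}_{q(x_1 \mid x_t = x)}\bigl[\,u_t(x \mid x_1)\,\bigr]
```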
@TheRohr 1 month ago
Thanks for the video! Two open questions: (1) We still need lots of data to get a good estimate of the probability distribution, right? How much should we expect, and what should the dataset look like? Which is related to (2): What is actually meant by a data point or sample here? I understand that for diffusion we have an image that becomes noisy. But what would be the 2-D Gaussian for an RGB image? Or is a sample here something different from an image?
@fireinthehole2272 1 month ago
Hi Kilcher, could you do "ReFT: Representation Finetuning for Language Models"? It's really interesting.
@mikaellindhe 1 month ago
"Hey why don't you just go toward the target" seems like a reasonable optimization
@punkdigerati 1 month ago
Like Atz and Jewel Kilcher?
@LouisChiaki 1 month ago
Hmm... What is their choice of sigma_min? Is the end conclusion simply that we should scale down the noise by (1 - sigma_min)?
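
For reference, here is how sigma_min enters the OT conditional path in the paper; it only needs to be small enough that p_1(x | x_1) concentrates tightly at x_1 (I wouldn't quote the exact experimental value from memory). The (1 - sigma_min) factor mentioned above shows up in both the interpolant and the regression target:

```latex
% OT conditional path: mean and standard deviation are linear in t
\mu_t(x_1) = t\,x_1, \qquad \sigma_t(x_1) = 1 - (1 - \sigma_{\min})\,t
% A sample along the path, with x_0 \sim \mathcal{N}(0, I):
x_t = \bigl(1 - (1 - \sigma_{\min})\,t\bigr)\,x_0 + t\,x_1
% Conditional vector field evaluated along the path (constant in t):
u_t(x_t \mid x_1) = x_1 - (1 - \sigma_{\min})\,x_0
% At t = 1 the residual noise is sigma_min * x_0, so p_1 concentrates at x_1.
```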
@jabowery 1 month ago
UNCLE TED!!!
@TiagoTiagoT 1 month ago
Are they basically using the butterfly effect to disturb a standardized gaussian distribution into the desired result?
@SofieSimp 28 days ago
Do you have a recording of your Stable Diffusion 3 presentation?
@eriglac 1 day ago
I'd like to join the Saturday discussions. Where do I find that info?
@SouravMazumdar-ki7vv 1 month ago
Can someone say which approach is being discussed at 5:20?
@vangos154 1 month ago
One of the disadvantages of flow-based models is that they require reversible layers, which limits the DNN architectures that can be used. Isn't that a problem anymore?
@xandermasotto7541 17 days ago
Continuous normalizing flows are always invertible. It's just integrating an ODE forwards vs. backwards.
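
A quick sketch of that point, with a stand-in for a trained vector field: the same Euler loop run with time reversed approximately undoes the forward integration (exactly so in the continuous-time limit):

```python
# CNF invertibility: sampling integrates dx/dt = v(x, t) forward in time;
# running the same steps in reverse recovers the starting point up to
# discretization error. `v` is a stand-in for a trained field v_theta.
import torch

def v(x, t):
    return torch.tanh(x) * (1.0 - t)      # placeholder dynamics, not a learned field

def integrate(x, forward=True, steps=1000):
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt if forward else 1.0 - k * dt
        x = x + (dt if forward else -dt) * v(x, t)
    return x

x0 = torch.randn(8, 2)
x1 = integrate(x0, forward=True)          # "noise -> sample"
x0_back = integrate(x1, forward=False)    # "sample -> noise"
print((x0 - x0_back).abs().max())         # small for fine step sizes
```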
@andylo8149 1 month ago
Given that flow matching is completely deterministic, I don't see how it is a generalisation of diffusion models. Sure, the deterministic probability flow induced by a diffusion model is a special kind of flow matching, but the training objective of a diffusion model is inherently stochastic. I think diffusion models and flow matching are different classes of models.
@abhimanyu30hans 1 month ago
For some reason I get "Unable to accept invite" from your Discord invite link.
@EsotericAI 1 month ago
Sorry, but there are too many formulas in that paper ;P Anyway, I kind of lost track at the beginning of what was going on: it started out nicely with images, and suddenly it was all about points flowing. All that was going through my mind was "What points are you talking about? Pixels?" Haha, guess I will have to watch this again when my state of mind is more up for it :D
@Blooper1980 1 month ago
I wish I could understand this.
@tornikeonoprishvili6361 1 month ago
Damn, the paper is math-dense. Watching this, I feel like I'm being dragged along by a professional sprinter I just can't keep up with.
@nevokrien95 1 month ago
Israel mentioned
@ScottzPlaylists 1 month ago
@YannicKilcher What hardware/software are you using❓ It seems to be a tablet and pen, but the details would be interesting. Would "How to Yannic a paper" make a good video❓ 😄 I'd watch it. Keep up the quality content❗
@Python_Scott 1 month ago
👍 I wondered the same... Make the video please. Or just answer here.
@AGIBreakout 1 month ago
👍I'd watch that, and Thumbs it UP 👍 an odd number of times ❗
@NWONewsGod 1 month ago
Me Too!!!!!!!
@NWONewsGod 1 month ago
@AGIBreakout Ha, ha... "odd number of times" would work too!!
@NWONewsGod 1 month ago
@Python_Scott Something different and useful! Yes, count me in. ☺
@mullachv 1 month ago
Can't be over-prepared for the solar eclipse
@ttul 1 month ago
This one is going to take me several passes…
@MrNightLifeLover 1 month ago
Published in 2022? Looks like I missed something :/
@JohnViguerie 1 month ago
Very hand-wavy
@BooleanDisorder 1 month ago
Obvious Labrador Retriever! 01:33
@AndrewRafas 1 month ago
At 20:53 what you say and what you mark in the paper do not match. v() is the vector field, not the other way around.
@YannicKilcher 1 month ago
u() is the actual vector field; v() is the vector field learned by the neural network
@drdca8263 1 month ago
It seems to me like this kind of procedure should have many applications outside of images... but I don't know what. Specifically, this should be applicable whenever we want to learn a way to sample from a particular (but unknown) probability distribution. So, "generative AI" type stuff, I guess. Maybe quantizing like in language models makes this less applicable to language models? Idk. What about world-model stuff? Or, like, learning a policy? Hm, while that does involve selecting actions at random, those are often more discrete? Though I guess not always. If one is doing a continuous control task, then sampling from a continuous family of possible actions may be the thing to do. Uh. Hm, so if you started with a uniform distribution over a continuous family of actions and wanted to evolve it towards a good distribution given the current scenario? Hm, no, I guess this probably isn't especially applicable to that, because, like, how do you obtain the samples from the target distribution? There must be *something* other than image generation that this applies straightforwardly to...
@robmacl7 1 month ago
1: Probability path go Woom! 2: Waifus 3: profit
@drdca8263 1 month ago
Ugh, I wish “generating images of attractive women” wasn’t such a large fraction of the use of such models. I don’t think it is good for the person doing the viewing. Beetles and broken beer bottles, and all that.
@wolpumba4099 1 month ago
*Abstract*
This video delves into the technical aspects of flow matching for generative models, contrasting it with traditional diffusion models. It explores the concept of morphing probability distributions from a source to a target, emphasizing the significance of conditional flows and the role of vector fields in guiding this transformation. The video delves into the mathematical underpinnings of flow matching, introducing key objects such as probability density paths and time-dependent vector fields. It demonstrates how these concepts are operationalized through the conditional flow matching objective, allowing for the training of neural networks to predict vector fields for data points. Finally, the video explores specific instances of flow matching, including its relationship to diffusion models and the advantages of the optimal transport path for efficient and robust sampling.
*Summary*
*Introduction to Flow Matching*
* 0:00 - Introduction to flow matching for generative models and its application in image generation, specifically text-to-image tasks.
* 1:06 - Comparison of flow matching with traditional diffusion-based models used in image generation.
* 2:29 - Explanation of the diffusion process as a multi-step process of image generation involving the gradual denoising of random noise to produce a target image.
* 5:46 - Introduction to flow matching as a generalization of the diffusion process, where the focus shifts from defining a fixed noising process to directly learning the morphing of a source distribution into a target distribution.
*Mathematical Framework*
* 6:04 - Illustration of morphing a simple Gaussian distribution into a data distribution, highlighting the challenge of the unknown target distribution and the use of Gaussian mixture models as an approximation.
* 10:52 - Introduction of the concept of a probability density path as a time-dependent function that defines the probability density at a given point in data space and time.
* 13:41 - Explanation of the time-dependent vector field, denoted as V, which determines the direction and speed of movement for each point in the data space to achieve the desired distribution transformation.
* 17:54 - Demonstration of how the flow, representing the path of each point along the vector field over time, is determined by the vector field and the initial starting point.
*Learning the Flow*
* 19:26 - Explanation of how the vector field is set to generate the probability density path by ensuring its flow satisfies a specific equation.
* 20:31 - Introduction of the concept of regressing the flow, which involves training a neural network to predict the vector field for each given position and time.
* 21:56 - Highlighting the ability to define probability density paths and vector fields in terms of individual samples, enabling the construction of conditional probability paths based on specific data points.
* 26:16 - Demonstration of how marginalizing over conditional vector fields, weighted appropriately, can yield a total vector field that guides the transformation of the entire source distribution to the target distribution.
*Conditional Flow Matching*
* 29:40 - Acknowledging the intractability of directly computing the marginal probability path and vector field, leading to the introduction of the conditional flow matching objective.
* 30:48 - Explanation of conditional flow matching, where flow matching is performed on individual samples by sampling a target data point and a corresponding source data point, and then regressing on the vector field associated with that specific sample path.
* 33:30 - Introduction of the choice to construct probability paths as a series of normal distributions, with time-dependent mean and standard deviation functions, allowing for interpolation between the source and target distributions.
*Optimal Transport and Diffusion Paths*
* 38:43 - Exploration of special instances of Gaussian conditional probability paths, including the recovery of the diffusion objective by selecting specific mean and standard deviation functions.
* 41:21 - Introduction of the optimal transport path, which involves a straight-line movement between the source and target samples, contrasting it with the curvy paths characteristic of diffusion models.
* 44:08 - Visual comparison of the vector fields and sampling trajectories for diffusion and optimal transport paths, highlighting the efficiency and robustness of the optimal transport approach.
*Conclusion*
* 46:48 - Recap of the key differences between flow matching and diffusion models, emphasizing the flexibility and efficiency of flow matching in learning probability distribution transformations.
* 47:56 - Reiteration of the process of using a learned vector field to move samples from the source distribution to the target distribution, achieving the desired transformation.
* 53:37 - Explanation of how the knowledge about the data set is incorporated into the vector field predictor, enabling it to guide the flow of the entire source distribution to the target distribution.
I used Gemini 1.5 Pro. Token count: 12,628 / 1,048,576.
@eitanporat9892 1 month ago
I feel like this paper is a very convoluted and long-winded way of saying "move in straight lines"; the mathematical part is obvious and not very interesting. Your explanation was great - I just dislike when people write math for the sake of writing math in an ML paper.
@not_a_human_being 27 days ago
Another attempt to sprinkle some "statistics and theory" on machine learning. This will fail.