Expectation Maximization for the Gaussian Mixture Model | Full Derivation

5,372 views

Machine Learning & Simulation

Comments: 16
@agrawal.akash9702 10 months ago
this is legitimately such a great explanation. thanks!
@MachineLearningSimulation 9 months ago
You're very welcome! 😊
@MachineLearningSimulation 3 years ago
There was an error in the hand-written M-Step at the beginning of the video. For the first 3 minutes I was able to overlay it. Please refer to this as the correct expression for the M-Step.
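For reference, the standard M-Step update equations for a univariate Gaussian Mixture Model, written with the responsibilities r_{nk} from the E-Step, are the following (this is the common textbook form and may differ slightly in notation from the hand-written derivation in the video):

% M-Step updates for a univariate GMM, using the E-Step responsibilities r_{nk};
% standard textbook notation, not necessarily the video's.
\begin{align}
  N_k &= \sum_{n=1}^{N} r_{nk}, \\
  \pi_k^{\text{new}} &= \frac{N_k}{N}, \\
  \mu_k^{\text{new}} &= \frac{1}{N_k} \sum_{n=1}^{N} r_{nk} \, x_n, \\
  \left(\sigma_k^{\text{new}}\right)^2 &= \frac{1}{N_k} \sum_{n=1}^{N} r_{nk} \left(x_n - \mu_k^{\text{new}}\right)^2 .
\end{align}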
@vslaykovsky 2 years ago
11:30 Isn't it a lower bound of the marginal log-likelihood instead?
@MachineLearningSimulation 2 years ago
Hey, are you referring to the Q function?
@patrickg.3602 1 year ago
How are you sure that the stationary points of Q are maxima? Couldn't they be saddle points or minima as well? Or did you just skip the part where you have to check the second derivatives?
@MachineLearningSimulation 1 year ago
That's a great question! :) There was no specific check for this in the video. For theoretical investigations, you can consult the following paper: kzbin.info/www/bejne/jJuXk2eupM-Dg9k Pragmatically though, EM is often observed to behave robustly if well initialized. Since an EM fit is usually quite fast (compared to other ML methods), it is reasonable to start from multiple initial conditions and select the model with the best score (or otherwise best properties). For instance, check out this follow-up video on sieving: kzbin.info/www/bejne/jJuXk2eupM-Dg9k I can also recommend the documentation of scikit-learn: scikit-learn.org/stable/modules/mixture.html
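A minimal sketch of that multi-restart idea, assuming scikit-learn is available (the toy data and the manual loop are illustrative only, not the exact procedure from the sieving video; GaussianMixture's n_init parameter does the same internally):

import numpy as np
from sklearn.mixture import GaussianMixture

# Toy 1D data from two Gaussians (placeholder, not the video's data).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 200)]).reshape(-1, 1)

# Fit from several random initializations and keep the model with the best
# average log-likelihood on the training data.
best_model, best_score = None, -np.inf
for seed in range(10):
    gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=seed).fit(x)
    score = gmm.score(x)  # mean log-likelihood per sample
    if score > best_score:
        best_model, best_score = gmm, score

print(best_model.weights_, best_model.means_.ravel())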
@nickelandcopper5636 3 years ago
Is it possible to have the Gaussian distributions be latent and the class be non-latent? Basically, the continuous variable would be latent now. What would that look like?
@MachineLearningSimulation 3 years ago
That's a valid question, but it is rather uncommon to do this in practice. At least, I haven't seen it. What would be your application? In my understanding, the EM algorithm works best for mixture distributions, which have a latent discrete part (the Categorical distribution) and a conditioned, observed continuous part (which could also be a distribution other than the Normal/Gaussian, but commonly the Gaussian is used, giving the Gaussian Mixture Model). However, generally speaking, you can build any DGM you like. It is just that many DGMs come with huge difficulties in training them. A more general way of training DGMs is Variational Inference (kzbin.info/www/bejne/mqnah4CbgJ5rbrs ) or MCMC (no video yet), which can also handle scenarios the EM cannot. In fact, the EM algorithm is identical to Variational Inference if we can analytically express the posterior, which we can for GMMs. But again, regarding your proposal, I do not think it would make a lot of sense to have the latent variable be a leaf node in the DGM. The way I understand latent variables is that you use them to model an unobserved cause of something, not an unobserved effect.
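For the GMM, that analytically available posterior is just the responsibility of component k for data point x_n (standard notation, possibly differing slightly from the video):

% Posterior over the latent class, i.e. the E-Step responsibilities.
r_{nk} = p(z_n = k \mid x_n)
       = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \sigma_k^2)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \sigma_j^2)}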
@bartosz5592 3 years ago
Hi, what about the EM algorithm for a single bivariate Gaussian with missing values?
@MachineLearningSimulation 3 years ago
Hey, I answered your similar comment on the other video. Was it referring to the same thing?
@bartosz5592 3 years ago
@MachineLearningSimulation Yes, thank you.
@sulasrisuddin294 1 year ago
What about the syntax in R if we want to apply this to a survival mixture model?
@MachineLearningSimulation 1 year ago
Thanks for the comment. :) Unfortunately, I am not familiar with survival mixture models.
@user-or7ji5hv8y 3 years ago
Just wondering: could such an EM approach work well in cases where X is high-dimensional?
@MachineLearningSimulation 3 years ago
Yes, although that of course depends on how "high-dimensional". For a reasonable number of dimensions (2 to 100-ish) you can use the EM for the Gaussian Mixture Model where the Gaussians are multivariate. This introduces additional degrees of freedom, e.g. choosing a full covariance or just a diagonal one. I will cover this in the future once I have also introduced the Multivariate Normal in my other playlist. Stay tuned for that ;)
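As an illustration of those degrees of freedom, here is a minimal sketch using scikit-learn's GaussianMixture on placeholder multivariate data (an assumption of convenience; the channel's own follow-up video uses NumPy and TensorFlow Probability instead):

import numpy as np
from sklearn.mixture import GaussianMixture

# Placeholder 5-dimensional data drawn from two multivariate Gaussians.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.multivariate_normal(np.zeros(5), np.eye(5), 400),
    rng.multivariate_normal(np.full(5, 3.0), 0.5 * np.eye(5), 400),
])

# "full" fits a dense covariance per component (more parameters),
# "diag" restricts each component to a diagonal covariance (cheaper in higher dimensions).
for cov_type in ("full", "diag"):
    gmm = GaussianMixture(n_components=2, covariance_type=cov_type, random_state=0).fit(X)
    print(cov_type, gmm.bic(X))  # BIC allows comparing the two parameterizations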
Implementing the EM for the Gaussian Mixture in Python | NumPy & TensorFlow Probability
20:25
Deriving the EM Algorithm for the Multivariate Gaussian Mixture Model
1:13:08
Machine Learning & Simulation
8K views
Expectation Maximization Algorithm | Intuition & General Derivation
29:47
Machine Learning & Simulation
8K views
Introduction to Machine Learning - 09 - Clustering and expectation-maximization
53:27
Tübingen Machine Learning
7K views
Mean Field Approach for Variational Inference | Intuition & General Derivation
25:40
Machine Learning & Simulation
10K views
Gaussian Mixture Model | Intuition & Introduction | TensorFlow Probability
17:43
Machine Learning & Simulation
5K views
EM Algorithm : Data Science Concepts
24:08
ritvikmath
78K views
Clustering (4): Gaussian Mixture Models and EM
17:11
Alexander Ihler
291K views
27. EM Algorithm for Latent Variable Models
51:17
Inside Bloomberg
20K views