Expectation Maximization Algorithm | Intuition & General Derivation

  Views: 8,070

Machine Learning & Simulation

Comments: 33
@MachineLearningSimulation 3 years ago
Error at 13:20: It is a lower bound, not an upper bound. Maximizing an upper bound is not meaningful. See also @Flemming's comment for more details.
@user-or7ji5hv8y 3 years ago
Wow, I am still amazed at how EM works. It’s really brilliant probability.
@MachineLearningSimulation 3 years ago
Yes, thanks for sharing my enthusiasm for it :D I do think Probability Theory, the EM Algorithm and also Math in general are extremely beautiful, if done correctly (which is my goal for the channel). There is a lot more you can do with EM, like Hidden Markov Models, and there are also more things to consider, like avoiding poor local optima by sieving, which we will be doing in a future video. Stay tuned ;)
@mohammadjoshaghani5388 2 years ago
@MachineLearningSimulation Really beautiful and cool! Thanks for sharing.
@flemming9262 3 years ago
Very well-produced video! But log is concave, so you flipped the sign/direction of Jensen's inequality. In other words, you are finding a lower bound on the log-likelihood. BTW, that is in fact arguably desirable, as maximizing a lower bound is informative, while maximizing an upper bound is not. Maybe that should be clarified for people learning this stuff.
@P4el1co 3 years ago
You are correct, this should be noted in the video. Maximizing an upper bound makes no sense, because it doesn't guarantee you any optimal result. You need a tight lower bound that touches your function.
@MachineLearningSimulation 3 years ago
Thanks a lot for the feedback :) You are absolutely right, it has to be a lower bound. As you correctly note, maximizing an upper bound is not meaningful. Thanks a lot for pointing it out :) P.S.: Sorry for the late reply, I was a little busy the last few days.
@algorithmo134 3 years ago
The sign should be log >=
@algorithmo134 3 years ago
@MachineLearningSimulation The sign should be >=
@algorithmo134 3 years ago
The sign of Jensen's inequality is reversed for a concave function.
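For readers following this thread, the corrected direction of the bound can be written out as below. This is a sketch, not a transcript of the video: the symbols W (observed words), T (latent thoughts), θ (parameters) and q(T) (any distribution over the latent variable with q(T) > 0) are assumed from the comments here. Because log is concave, Jensen's inequality moves the log inside the expectation with a "≥", which yields a lower bound on the log-likelihood.

```latex
\log p(W; \theta)
  = \log \sum_{T} q(T)\, \frac{p(W, T; \theta)}{q(T)}
  \;\ge\; \sum_{T} q(T)\, \log \frac{p(W, T; \theta)}{q(T)}
```

Maximizing the right-hand side pushes the log-likelihood up from below, which is why a lower bound is the useful one.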
@todianmishtaku6249 2 years ago
Amazing, lovely video. Great job. I feel a bit unlucky that I did not come across your channel earlier.
@MachineLearningSimulation 2 years ago
Nice, thanks so much 😊 I'm happy to help and it's amazing to hear that my approach to teaching is helpful.
@pravingaikwad1337 1 year ago
At 3:50, theta bar has two components, right? (You said three components.)
@MachineLearningSimulation 1 year ago
Great question! :) I think it depends on how you count parameters. We have one parameter for the "thoughts" distribution and then there are two parameters for the "words", based on the value of the thoughts distribution. Given a good thought, the probability of a good word is different than given a bad thought. If you only look at the node level of the directed graphical model, then there are two parameters: one scalar parameter and one two-dimensional vector parameter. If you concatenate the scalar and the two-dimensional vector into a three-dimensional vector (which is what I wanted to express at the timestamp you mentioned; I think it was a bit hidden), then there are three scalar parameters in total. Hope that helps :). Let me know if it is still unclear.
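The parameter counting in this reply can be sketched in a few lines of Python. This is a minimal illustration, assuming NumPy; the names `theta_t`, `theta_w`, `theta_bar` and the numeric values are hypothetical and not taken from the video.

```python
import numpy as np

def sample_thoughts_and_words(theta_t, theta_w, n, seed=0):
    """Ancestral sampling: first the thought T, then the word W given T."""
    rng = np.random.default_rng(seed)
    # T ~ Bernoulli(theta_t): one scalar parameter for the thoughts
    thoughts = rng.binomial(1, theta_t, size=n)
    # W | T ~ Bernoulli(theta_w[T]): one parameter per value of the thought
    words = rng.binomial(1, theta_w[thoughts])
    return thoughts, words

theta_t = 0.7                     # assumed value: P(good thought)
theta_w = np.array([0.2, 0.9])    # assumed values: P(good word | bad / good thought)

# Concatenating the scalar and the 2-vector gives three scalar parameters.
theta_bar = np.concatenate(([theta_t], theta_w))

thoughts, words = sample_thoughts_and_words(theta_t, theta_w, n=1000)
print(theta_bar.shape, thoughts[:5], words[:5])
```

Counted per node there are two parameters (a scalar and a vector); counted as scalars there are three, which matches both readings in the thread.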
@lucavisconti1872 2 years ago
The video explains the algorithm formally and very clearly. My question is: what if we have a mix of missing data, i.e. some missing Words and some missing Thoughts?
@MachineLearningSimulation 2 years ago
Hi, thanks a lot for the comment and the interesting question :). I do not have a good idea of how one would handle such cases. The EM algorithm presented here is particularly powerful for mixture models. What you describe would be a more general probabilistic model. This is unfortunately beyond my knowledge, so I cannot give you a good answer.
@imvijay1166 2 years ago
Hi Felix, this is a nice video on EM, thanks for that. I have a question: I don't clearly understand why we have to take only the posterior as q(T). Why not something else? Why does only the posterior suit q(T)?
@MachineLearningSimulation 2 years ago
Hi, thanks for the comment and the kind words :). I hope I can answer your question sufficiently; it has been some time since I uploaded the video. In a more general setting, the Expectation Maximization algorithm is a tool to train latent-variable models. More concretely, it can only train those latent-variable models for which the posterior is tractable. As such, it is a special case of Variational Inference (see also my video on VI: kzbin.info/www/bejne/fqm0ameCbdNjrLc ). Latent-variable models can be of various kinds. Here in the video, I present a super simplistic Bernoulli-Bernoulli model. In this model, the latent variable is just a binary scalar. However, it can have arbitrary shapes. Commonly, what you see is that the observed variable is a (high-dimensional) image (e.g. 28x28 pixels with 3 color channels is already more than 1000-dimensional) and the latent space might be 30-dimensional. Though, for those kinds of models, the EM algorithm might only work if you are interested in clustering with GMMs. Hence, maybe as a first answer to your question, the T (or more generally Z) as presented in the video can also be high-dimensional instead of just a binary scalar. If your question was more related to why it is just one T and one W, or why we do not have a more complicated graphical model: this is just a modelling assumption. Typically speaking, you can find learning methods for many more complicated graphs, but those are out of scope for this video. Hope that helped. :) Let me know if something is still unclear.
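On the question of why the posterior is used for q(T), one standard argument is that the lower bound equals the log-likelihood minus the KL divergence from q to the posterior, so the bound touches the log-likelihood exactly when q is the posterior. The following is a small numerical sketch of that fact for the Bernoulli-Bernoulli model, assuming NumPy; the parameter values and the names `theta_t`, `theta_w`, `lower_bound` are hypothetical.

```python
import numpy as np

theta_t = 0.7                    # assumed P(T = 1)
theta_w = np.array([0.2, 0.9])   # assumed P(W = 1 | T = 0), P(W = 1 | T = 1)
w = 1                            # a single observed word

prior = np.array([1 - theta_t, theta_t])      # p(T = 0), p(T = 1)
lik = theta_w if w == 1 else 1 - theta_w      # p(W = w | T)
joint = prior * lik                           # p(W = w, T)
log_marginal = np.log(joint.sum())            # log p(W = w)
posterior = joint / joint.sum()               # p(T | W = w)

def lower_bound(q):
    """Jensen lower bound: sum_T q(T) * log(p(W, T) / q(T))."""
    return np.sum(q * (np.log(joint) - np.log(q)))

bound_posterior = lower_bound(posterior)             # tight
bound_other = lower_bound(np.array([0.5, 0.5]))      # strictly below

print(log_marginal, bound_posterior, bound_other)
```

Any other choice of q still gives a valid lower bound, just a looser one, which is the setting Variational Inference works in when the posterior is not tractable.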
@imvijay1166 2 years ago
@MachineLearningSimulation Thank you!
@MachineLearningSimulation 2 years ago
@imvijay1166 You're welcome! :)
@ryanyu512 2 years ago
The video is of high quality. It would be highly appreciated if the summation symbol were written as just Σ. It is a bit confusing when I look at your written summation symbol. I thought that it was summing 1 to 1 (haha). But this confusion does not degrade your video quality. Thanks
@MachineLearningSimulation 2 years ago
Thanks a lot for the kind words! :) I can understand that the notation can be a bit cluttered from time to time. Back when I created this video, I thought it could be helpful to not leave out certain parts or shorten the notation. It's always a compromise :D. I will take your point into consideration for future videos, thanks :)
@ananthakrishnank3208 8 months ago
At 10:27, I don't think that is right. The summation is over the whole product (q * p/q), and we cannot conveniently apply the summation to just q alone.
@kartikkamboj295 2 years ago
Hi there! Can you please explain why we have a parameter vector for 'words' but just a single parameter for 'thoughts'? Thanks in advance.
@MachineLearningSimulation 2 years ago
Hi, this is because the Bernoulli distribution for the words is conditioned on the thoughts. Depending on the thought, we have a different distribution for the word. :) Take a look at my video on Ancestral Sampling, I think that should clear things up. kzbin.info/www/bejne/e5nSiqZtbN2onLM
@EngRiadAlmadani 1 year ago
The only one who made me understand this evil (E(P/q) = Σ q * P/q). Thank you!
@MachineLearningSimulation 1 year ago
You're welcome :) I am honored, thanks for the kind words.
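The identity mentioned in this thread is the definition of an expectation under q, written out for a discrete latent variable. The notation below is an assumption based on the comment: P stands for the joint p(W, T; θ) and the sum runs over the values of T. The sum applies to the whole product q · (P/q); the factor q(T) is what turns that sum into an expectation of the ratio.

```latex
\mathbb{E}_{T \sim q}\!\left[ \frac{p(W, T; \theta)}{q(T)} \right]
  = \sum_{T} q(T)\, \frac{p(W, T; \theta)}{q(T)}
  = \sum_{T} p(W, T; \theta)
  = p(W; \theta)
```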
@user-or7ji5hv8y 3 years ago
I think I see why theta_k is associated with responsibilities, instead of theta_k+1.
@MachineLearningSimulation 3 years ago
Nice! Good job. I also think that this is one of my big takeaways from the EM algorithm: the fact that we somehow have to solve a chicken-and-egg problem, which calls for an iterative algorithm. Let me know if something was not clear enough :)
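The chicken-and-egg structure described here can be sketched as a loop: responsibilities are computed with the current parameters theta_k (E-step), and those responsibilities then produce theta_k+1 (M-step). The code below is a minimal sketch for the Bernoulli-Bernoulli model, assuming NumPy; the data, the starting values and the function names are hypothetical. With only one binary word per data point the parameters are not uniquely determined, so this only illustrates the alternation and the non-decreasing log-likelihood.

```python
import numpy as np

def joint_table(words, theta_t, theta_w):
    """p(W = w_n, T = t) for every data point n (rows) and thought t (columns)."""
    prior = np.array([1 - theta_t, theta_t])
    lik = np.where(words[:, None] == 1, theta_w[None, :], 1 - theta_w[None, :])
    return prior[None, :] * lik

def log_likelihood(words, theta_t, theta_w):
    """Sum over data points of log p(W = w_n), with T marginalized out."""
    return np.log(joint_table(words, theta_t, theta_w).sum(axis=1)).sum()

def em_step(words, theta_t, theta_w):
    # E-step: responsibilities p(T = t | W = w_n) under the CURRENT parameters theta_k
    joint = joint_table(words, theta_t, theta_w)
    resp = joint / joint.sum(axis=1, keepdims=True)
    # M-step: responsibility-weighted maximum-likelihood estimates give theta_k+1
    new_theta_t = resp[:, 1].mean()
    new_theta_w = (resp * words[:, None]).sum(axis=0) / resp.sum(axis=0)
    return new_theta_t, new_theta_w

words = np.array([1, 1, 0, 1, 0, 1, 1, 1, 0, 1])    # assumed toy observations
theta_t, theta_w = 0.5, np.array([0.3, 0.6])        # assumed initial guess

history = [log_likelihood(words, theta_t, theta_w)]
for _ in range(20):
    theta_t, theta_w = em_step(words, theta_t, theta_w)
    history.append(log_likelihood(words, theta_t, theta_w))

# Probability of a good word implied by the fitted parameters
marginal_w = (1 - theta_t) * theta_w[0] + theta_t * theta_w[1]
print(history[0], history[-1], marginal_w)
```

Each iteration uses the old parameters to fill in the missing thoughts softly and then refits the parameters as if those soft assignments were observed.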