Not sure if this is complete or correct, but here is a list of every concept used:
- derivatives and partial derivatives
- integration; an integral that is zero by symmetry (odd/even functions)
- Lagrange multipliers
- completing the square: taking the polynomial form with a, b, c and rewriting it in vertex form
- substitution
- a system of linear equations (I think 4 equations, 3 unknowns)
- other, more common tools: powers, the natural exponential e, factoring, basically algebra
- stats: mean, standard deviation, variance, z-score
@MachineLearningSimulation what do you think of this list? Wanna add anything?
@watching4410 19 days ago
Would like to see how this relates to entropy, i.e. the idea that the Gaussian maximizes entropy. Is there a distribution that minimizes it?
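For what it's worth, here is a minimal numerical sketch of that claim (my own illustration, assuming NumPy and SciPy are available; the comparison distributions are arbitrary choices): among a few distributions sharing the same variance, the Gaussian has the largest differential entropy. There is no minimizer, by the way: splitting the mass into ever narrower spikes keeps the variance fixed while the differential entropy goes to minus infinity.

import numpy as np
from scipy import stats

sigma = 1.0  # common standard deviation for all candidates

candidates = {
    # Laplace with variance sigma^2 needs scale b = sigma / sqrt(2);
    # uniform with variance sigma^2 needs width w = sigma * sqrt(12).
    "gaussian": stats.norm(loc=0.0, scale=sigma),
    "laplace": stats.laplace(loc=0.0, scale=sigma / np.sqrt(2)),
    "uniform": stats.uniform(loc=-sigma * np.sqrt(3), scale=sigma * np.sqrt(12)),
}

for name, dist in candidates.items():
    # .var() confirms the matched variance; .entropy() is the differential entropy in nats
    print(f"{name:8s}  var={dist.var():.3f}  entropy={dist.entropy():.4f}")

# For sigma = 1 this prints roughly 1.419 (Gaussian), 1.347 (Laplace), 1.242 (uniform).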
@Dhadheechi 4 months ago
Thank you so much for this explanation! I had no problem with taking the derivatives of the functional and getting ln(p) as a function of the lambdas. The real novel ideas (at least for me) are writing ln(p) as a completed square, carefully choosing the order in which you substitute it into the three constraints (first the mean constraint, then the normalisation constraint, then the variance constraint), and also making simple substitutions like y = x - mu. They seem trivial, but before watching this video I was banging my head over how to do these substitutions. Your video was really useful in introducing these small but useful tricks for deriving the Gaussian from the maximum entropy principle.
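For reference, the key steps look roughly like this (my own notation; the numbering of the Lagrange multipliers may differ from the video):

% Stationarity of the Lagrangian gives the log-density as a quadratic in x:
\ln p(x) = -1 + \lambda_1 + \lambda_2 x + \lambda_3 (x - \mu)^2
% Substituting y = x - \mu and completing the square in y:
\lambda_2 y + \lambda_3 y^2
  = \lambda_3 \left( y + \frac{\lambda_2}{2\lambda_3} \right)^2 - \frac{\lambda_2^2}{4\lambda_3}
\quad\Rightarrow\quad
p(y) \propto \exp\!\left[ \lambda_3 \left( y + \frac{\lambda_2}{2\lambda_3} \right)^2 \right]
% The mean constraint then forces \lambda_2 = 0, normalisation requires \lambda_3 < 0,
% and the variance constraint finally fixes \lambda_3 = -1/(2\sigma^2).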
@MachineLearningSimulation 4 months ago
You're very welcome 😊 Thanks for this kind comment. I also remember when I struggled with the derivation as I worked through Bishop's PRML book. He skipped over quite a few details, so I created this video with all of them :D. I'm glad you appreciated it. :)
@Dhadheechi 4 months ago
@@MachineLearningSimulation Oh I see... I am working through Bishop's new Deep Learning book, and it turns out that the first 2-3 chapters are the same in both books. Every time I have difficulty with a topic in chapter 3 of the new book (Standard Distributions), like deriving the conditional and marginal of the multivariate distribution or deriving the MLE for the same, you have a video covering it! Really appreciate your work, man... reading Bishop is proving to be quite a challenge for me as an undergrad who has just completed his first year. Gonna watch your Calculus of Variations videos and your scientific Python workshop next :)
@MachineLearningSimulation 4 months ago
@@Dhadheechi You're welcome! :) Good luck on your learning journey and thanks again for the nice comments also under the other videos.
@shreyasjaiswal4739 3 years ago
21:12 We can also explain it as the definite integral of an odd function being 0: since p(y) is even and y is odd, their product is odd.
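Spelled out (my notation), that symmetry argument reads:

\int_{-\infty}^{\infty} y\, p(y)\, dy
  = \int_{0}^{\infty} \big( y\, p(y) + (-y)\, p(-y) \big)\, dy
  = \int_{0}^{\infty} y \big( p(y) - p(-y) \big)\, dy
  = 0
\quad \text{whenever } p(-y) = p(y)
% i.e. the integrand y p(y) is odd, so the contributions from y and -y cancel.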
@MachineLearningSimulation 3 years ago
Yes, that's of course correct :)
@Camptonweat 1 year ago
Note that this integral is just the mean of p(y). It turns out to be zero because y is centred (noting that lambda_1 ends up being zero), so the reasoning in the video is a little circular. I think it is sufficient to simply say that the limits must converge to zero to preserve normalisation; that way you don't need to make any leading assumptions about p(y) being symmetric.
@user-or7ji5hv8y 3 years ago
Didn’t realize functionals could be so useful. Thanks
@MachineLearningSimulation 3 years ago
Thanks too for your feedback. There are a lot of nice applications for them, not only in Probabilistic Machine Learning but also in the Finite Element Method for Structural Mechanics Problems.
@ttommy516 1 year ago
Thanks so much for your detailed derivation! I suppose it might be better if a short explanation of the maximum entropy principle were added to the video, which would better highlight the motivation.
@MachineLearningSimulation 1 year ago
You're very welcome :) Indeed, that could have been a great motivation. Maybe content for a future video ;)
@bmw009a 1 year ago
A wonderful video and math journey, thank you very much
@MachineLearningSimulation 1 year ago
Thank you 😊 I'm happy you enjoyed it. It was a topic that bugged me for a long time, and I was happy once I could conclude it in a video.
@user-or7ji5hv8y 3 years ago
The concept here seems really profound. I'm just trying to see the intuition for why maximizing entropy gets us to the right pdf. Is there a way to see why that should work? I can see it with a coin: you maximize the entropy by setting the probability to 0.5, which corresponds to having the least amount of information about the coin. Does maximizing entropy work for deriving other distributions as well, as long as you set appropriate constraints?
@MachineLearningSimulation 3 years ago
Regarding the last question: yes, you can derive a lot of pdfs/pmfs from this maximum entropy principle given the right constraints. Take a look at this table on Wikipedia: en.wikipedia.org/wiki/Maximum_entropy_probability_distribution#Other_examples
Regarding the first question, I must admit that I don't have a good intuition beyond some mathematical considerations. Since the differential entropy is the integral of -p log(p), I think it makes sense that the distribution will somehow contain an exponential, but that is not specific to the Gaussian/Normal. I will think about it again and might come back to the thread if I can come up with a better intuition. Maybe someone else has something nice?
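To make the coin example from the question concrete (a standard one-line calculation, not taken from the video):

H(p) = -p \ln p - (1 - p) \ln(1 - p), \qquad
\frac{dH}{dp} = \ln\frac{1 - p}{p} = 0 \;\Rightarrow\; p = \tfrac{1}{2},
\qquad
\frac{d^2 H}{dp^2} = -\frac{1}{p(1 - p)} < 0
% so p = 1/2 is the entropy-maximizing bias, matching the "least information" intuition.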
@mpost909 1 year ago
Love the videos! Keep it up! Do you by any chance get your video ideas partly from Bishop's Pattern Recognition and Machine Learning?
@MachineLearningSimulation 1 year ago
Thanks a lot 😊 Yes, the videos on probabilistic machine learning are inspired by Bishop's book. This very video is essentially the solution to one of the exercises, actually one that plagued me for over two years until I finally figured it out. Hence, I thought it'd be cool to have a detailed video about it 😉 Glad you enjoyed it.
@kennettallgren640 2 years ago
Thank you for your nice derivation
@MachineLearningSimulation 2 years ago
You're welcome 😊
@simonesabbadini6648 10 months ago
Thank you Very Very Very Much.
@MachineLearningSimulation 10 months ago
You're very welcome! :)
@noot_2 2 years ago
Where did the minus sign from the entropy expression go? I think it flips the signs on a couple of expressions. Thank you nonetheless, I was struggling to understand this.
@MachineLearningSimulation 2 years ago
You're welcome :) Do you have a timestamp when it got lost? It's been some time since I created the video.
@noot_2 2 years ago
@@MachineLearningSimulation Never mind, as I was writing this I noticed that I had completely missed the part about taking the argmin of -H(p). Thank you so much for your time.
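For anyone else who trips over the same sign, the point spelled out (my notation):

\operatorname*{arg\,max}_{p} \; H[p]
  = \operatorname*{arg\,min}_{p} \; \big( -H[p] \big)
  = \operatorname*{arg\,min}_{p} \; \int p(x) \ln p(x)\, dx
% A derivation that minimizes -H therefore carries the opposite sign in front of the
% entropy term compared with one that maximizes H; the stationary point is the same.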
@MachineLearningSimulation 2 years ago
@@noot_2 You're welcome :)
@mirandasuryaprakash3269 2 years ago
Isn't that λ2·µ² at 11:49? Do we get the same result solving this?
@MachineLearningSimulation 2 years ago
Hi, it's been some time since I uploaded the video. I think at 11:42 I display a note that it should be different, and I correct it later in the video. It also seems to be correct in the PDF on GitHub: raw.githubusercontent.com/Ceyron/machine-learning-and-simulation/main/english/essential_pmf_pdf/univariate_normal_maximum_entropy_derivation.pdf Or do you mean something different? :) Please let me know, I'd be happy to help (and fix the mistake).
@djbuffonage9500 11 months ago
Can't the Gaussian be a minimum?
@MachineLearningSimulation 10 months ago
Valid question, indeed. We only showed that it is an extremum under the prescribed constraints. You can find an argument for why it is a maximum in chapter 1.6 of Chris Bishop's "Pattern Recognition and Machine Learning": www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf
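A short sketch of one standard argument (my summary, not a quote from the book): the entropy functional is concave, so the stationary point found under the linear moment constraints is a global maximum rather than a minimum.

H[p] = \int -p(x) \ln p(x)\, dx, \qquad
\frac{d^2}{dp^2} \big( -p \ln p \big) = -\frac{1}{p} < 0 \quad \text{for } p > 0
% so the integrand is strictly concave in p, H[p] is a concave functional, and because the
% normalisation, mean, and variance constraints are all linear in p, any stationary point
% of the constrained problem is a global maximum.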