
The intuition behind the Hamiltonian Monte Carlo algorithm

60,443 views

Ben Lambert


1 day ago

Explains the physical analogy that underpins the Hamiltonian Monte Carlo (HMC) algorithm, then shows that HMC can be viewed as a specific type of Metropolis-Hastings sampler.
The paper by Michael Betancourt I mention is "A Conceptual Introduction to Hamiltonian Monte Carlo", 2018, ArXiv, and is available here: arxiv.org/pdf/.... The Radford Neal paper is, "MCMC using Hamiltonian dynamics", Chapter 5 in the "Handbook of Markov Chain Monte Carlo" by Brooks et al., 2011.
This video is part of a lecture course which closely follows the material covered in the book, "A Student's Guide to Bayesian Statistics", published by Sage, which is available to order on Amazon here: www.amazon.co....
For more information on all things Bayesian, have a look at: ben-lambert.co.... The playlist for the lecture course is here: • A Student's Guide to B...
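For readers who want to map the physical analogy onto code, below is a minimal HMC sketch in Python. It is illustrative only: the standard-normal target, the step size eps, and the number of leapfrog steps are placeholder choices, not the settings used in the video.

```python
import numpy as np

def hmc_step(theta, log_p, grad_log_p, eps=0.1, n_steps=20):
    """One HMC transition: resample momentum, simulate Hamiltonian
    dynamics with the leapfrog integrator, then accept/reject as in
    Metropolis-Hastings."""
    m = np.random.normal(size=theta.shape)   # "flick" the particle: m ~ N(0, I)
    H_old = -log_p(theta) + 0.5 * m @ m      # H = potential + kinetic energy

    theta_new, m_new = theta.copy(), m.copy()
    for _ in range(n_steps):                 # leapfrog: half-step, full-step, half-step
        m_new += 0.5 * eps * grad_log_p(theta_new)
        theta_new += eps * m_new
        m_new += 0.5 * eps * grad_log_p(theta_new)
    m_new = -m_new                           # flip momentum so the proposal is symmetric

    H_new = -log_p(theta_new) + 0.5 * m_new @ m_new
    # Exact dynamics would conserve H, giving acceptance probability 1;
    # rejections arise only from leapfrog discretization error.
    if np.random.uniform() < np.exp(H_old - H_new):
        return theta_new
    return theta

# Example: sample from a standard normal target (log density up to a constant).
log_p = lambda th: -0.5 * th @ th
grad_log_p = lambda th: -th
draws = [np.array([3.0])]
for _ in range(1000):
    draws.append(hmc_step(draws[-1], log_p, grad_log_p))
```

The momentum flip at the end is the detail discussed at length in the comments below; it leaves H unchanged, so in practice it can be skipped.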

Comments: 47
@NeverHadMakingsOfAVarsityAthle · 1 day ago
This really is a fantastic video! Reading a few papers and book chapters I struggled to understand some of the details, but your explanations made it all click for me. It's educators like you that make learning such concepts so much more fun and rewarding! Thank you!
@GabeNicholson · 3 years ago
For those who don't know the research and esoteric academic work on this stuff: just imagine how difficult this would be to learn from academic documents. Ben has saved us hours upon hours of work and frustration.
@TanThongtan · 4 years ago
This is incredible. I've struggled reading through both papers mentioned but finally got an intuitive idea of how HMC works 4 mins into this video.
@anamariatiradogonzalez · 7 months ago
Don’t cry, dear. It’s easy. Lol
@vinayramasesh2959 · 6 years ago
This was an excellent video, thanks! I hadn't understood until now that the only reason that proposals in HMC would ever be rejected is due to a slight increase in the value of H due to the fact that we're only approximately integrating the Equations of Motion.
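For reference, the standard HMC acceptance rule makes this point explicit (stated here for clarity; notation follows the video's (theta, M)). Since the joint density over (theta, M) is proportional to exp(-H(theta, M)), the Metropolis acceptance probability is

alpha = min(1, exp(H(theta, M) - H(theta*, M*)))

If the equations of motion were integrated exactly, H would be conserved, so H(theta*, M*) = H(theta, M) and alpha = 1; the leapfrog discretization error is the only source of rejections.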
@Zorothustra · 6 years ago
That's a very intuitive explanation of the HMC sampler, great job! Thanks for sharing.
@SpartacanUsuals · 6 years ago
Thank you!
@haowenwu5046 · 6 years ago
Ben Lambert No, thank you! This video is really clear and helpful!
@richasrivastava999 · 2 years ago
Thanks for this video. I was trying hard to understand the physical analogy of HMC, and this video really helped.
@Rudolf-ul1zh · 2 months ago
I am actually a bit confused about the explanation, specifically about the relationship to statistical mechanics. With the exception of the initialization of the momentum, there are no random fluctuations in the dynamics (as opposed to Ito processes). The Boltzmann distribution result holds specifically for Ito processes (it can be derived from the Fokker-Planck equation), but not for such deterministic dynamics. I would be happy about a clarification.
@ana_log_y · 4 years ago
Thank you for such well-prepared and well-explained material!
@thomasrobatsch2582 · 3 years ago
I highly appreciate the effort you put into creating these animations!
@leowatson1589 · 1 year ago
Excellent video! But one question: at 17:10, where does the normalizing constant 1/z come from? The integral over the support of the normal's pdf is 1, and the other terms on the RHS are simply constant w.r.t. the variable of integration. Is the LHS not a valid pdf at this point, hence why we obtain the normalizing constant?
@kunalarekar3405 · 2 months ago
@SpartacanUsuals - Why wouldn't Gibbs sampling work in this scenario? Since we are sampling from a multidimensional probability space, why not just use Gibbs sampling? How is HMC different from Gibbs sampling? I am having a tough time understanding this; it would be great if you could answer. By the way, great explanation. It did help me understand and clarify the basic concepts of HMC.
@matakos22 · 3 years ago
Thank you for this, it's pure gold. Highly appreciated.
@peymantehrani7810 · 5 years ago
Thanks for your wonderful video! Here are my questions: first, I didn't understand why changing M* to -M* makes the conditional probability change from 0 to 1; and also, why is the path from (theta,M) to (theta*,M*) deterministic?
@tboutelier · 5 years ago
The path from (theta,M) to (theta*,M*) is deterministic because it is the result of integrating the particle's path through the parameter space, according to the conservation of the total energy. This is purely mechanical reasoning, and nothing probabilistic is used for this step. Understanding the change of sign of M* is still a bit magic for me! ^_^
@mikolajwojnicki2169 · 4 years ago
The -M* is a clever trick: you essentially reverse the motion. Imagine throwing a ball with a certain momentum m1 from a certain position x1. After a time t it will end up at a place x2 with momentum m2. If you then throw the ball from x2 but with momentum -m2, after time t it will end up at x1. So from (x1, m1) you go to (x2, m2), but report the result as (x2, -m2), because from (x2, -m2) you get back to (x1, -m1), which you would report as (x1, m1). (You don't actually go back in the algorithm; it's just to make sure there is no bias in the proposed values. See Metropolis-Hastings.)
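This reversal property can be checked numerically. Here is a minimal Python sketch (illustrative only; it assumes a simple quadratic potential U(x) = x^2/2 rather than anything from the video):

```python
import numpy as np

def leapfrog(x, m, grad_U, eps=0.1, n_steps=50):
    """Leapfrog-integrate Hamiltonian dynamics under potential U."""
    for _ in range(n_steps):
        m = m - 0.5 * eps * grad_U(x)   # half-step for momentum
        x = x + eps * m                 # full step for position
        m = m - 0.5 * eps * grad_U(x)   # half-step for momentum
    return x, m

grad_U = lambda x: x                    # U(x) = x^2 / 2, so grad U(x) = x
x1, m1 = 1.0, 0.5
x2, m2 = leapfrog(x1, m1, grad_U)       # throw the ball: (x1, m1) -> (x2, m2)
xb, mb = leapfrog(x2, -m2, grad_U)      # throw it back with flipped momentum
print(np.allclose([xb, -mb], [x1, m1])) # True: (x1, m1) recovered up to roundoff
```

Because the leapfrog integrator is symmetric in time, the momentum flip makes the proposal its own inverse, which is exactly the symmetry the Metropolis-Hastings correction needs.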
@waltryley4025
@waltryley4025 2 жыл бұрын
Brilliant work, thank you for posting.
@rwtig · 4 years ago
Great video, thanks Ben. It would have been good to have seen a visualisation showing the -m trick working; it took me a long time to satisfy this in my mind (essentially thinking through how the equation of motion applied to the proposal point, and then flipping, will get us back to the original point). Also, it is very interesting that in an ideal world this essentially never rejects a proposal, as the original point and proposal should have the same energy. Maybe a visualisation driving that point home would also have been useful. Once I felt I understood these concepts, the algorithm made sense to me.
@rwtig · 4 years ago
Of course, it is very important for the -m* trick that the joint distribution is independent in m and theta, so that we can be certain we have symmetry for paths at m and -m*.
@hanyingjiang6864 · 3 years ago
I have read like 100 blogs and none of them states things as clearly as this video.
@YYHo-kw2bi · 1 year ago
Thank you~ I am watching this video in preparation for my master's program.
@Peledziuko · 5 years ago
Thank you for this video, it goes very nicely with the papers referenced. I was wondering if you have a video or could recommend one that goes into more depth on solving for the path of the particle? I am still struggling to fully understand what that means, otherwise I am clear on other steps :)
@MrPigno87 · 3 years ago
Thank you for this great explanation!
@nickbishop7315 · 2 years ago
So HMC almost always accepts (rather than rejects) proposals as long as the momentum term is set correctly initially?
@khan.saqibsarwar · 7 months ago
Thank you for the great explanation.
@qazaqtatar · 3 years ago
Excellent lecture!
@tomlindstrom6698 · 5 years ago
This presentation is fantastic. Is there any chance you'd be able to expand it to relate the sampling concepts to warning messages in Stan and the necessary tweaking (tree depth, adapt_delta, etc.)?
@JaGWiREE · 6 years ago
What was the intuition or reason for choosing a bimodal? Was it just to make the position and momentum graph easier to visualize both components as convergence occurs?
@SpartacanUsuals · 6 years ago
Thanks for your comment. Yes, it was a somewhat arbitrary choice. I mainly chose this rather than a unimodal Gaussian so that the paths were more interesting (i.e. not just circles). Best, Ben
@NilavraPathak · 6 years ago
This is awesome! Can you please do one for variational inference?
@SpartacanUsuals · 6 years ago
Thanks! Yes, one is currently being planned. Cheers, Ben
@ErickChaplin · 3 years ago
Hi, I have a question. Could you send me the Wolfram program that you presented here? I have to give a presentation for my PhD and it would be useful! Thank you so much.
@chenxin4741 · 4 years ago
Very nice visualization.
@JuanIgnaciodeOyarbide · 4 years ago
Thank you for the video. I just have a question: for the bimodal posterior, wouldn't you receive warnings from Stan regarding the chains' performance? I think it was Betancourt who said that, in the presence of bimodality, HMC is not appropriate.
@mururaj4022 · 4 years ago
Amazing video to explain the intuition! Love it! I have a question though. Mathematically, I understand why we need to flip m to -m, i.e. to make the proposal symmetric. But in effect, since we are going to throw away m, as we are only interested in theta, would we still need to flip it? It doesn't affect the r value (since Gaussian) or the next momentum value (since we are squaring it).
@mururaj4022 · 4 years ago
OK, I found the answer in Neal's paper: "... This negation need not be done in practice, ..."
@emmanuelniyigena4343 · 6 years ago
Yes, very helpful, Mr. Lambert. Thank you :)
@samw.6550 · 6 years ago
Great job! Thanks a lot! :)
@richardpinter9218 · 4 years ago
I love you, thank you!
@renyf6093 · 5 years ago
Thanks for the nice job~🍺 It's precise and easy to understand.
@fakeeconomist · 2 years ago
genius
@asvz7375 · 6 years ago
What is x?
@SpartacanUsuals · 6 years ago
Thanks for your comment. 'x' is the data sample. So p(X|theta) is the likelihood. Hope that helps! Best, Ben