Optimal Transport and Information Geometry for Machine Learning and Data Science

12,356 views

Optimal transport and information geometry provide two distinct frameworks for studying the distance between probability measures. Although these are separate theories, there are many connections between them and they both have applications to data science and machine learning.
This video is adapted from a talk at the SIAM Conference on Mathematics of Data Science (MDS22).
At several points in the recording the main microphone cut out so I had to use back-up audio. Sorry for those audio hiccups.
0:00 Introduction
0:30 Introduction to Optimal Transport
7:08 Introduction to Information Geometry
12:21 Natural Gradients
13:12 Entropy Regularized Optimal Transport
16:38 Conclusion and Further Reading
References (in order of their appearance):
Khan, Gabriel, and Jun Zhang. "When optimal transport meets information geometry." Information Geometry (2022): 1-32.
arxiv.org/abs/2206.14791
Kantorovich, Leonid V. "On the translocation of masses." Journal of mathematical sciences 133, no. 4 (2006): 1381-1382.
Brenier, Yann. "Polar factorization and monotone rearrangement of vector‐valued functions." Communications on pure and applied mathematics 44, no. 4 (1991): 375-417.
Gangbo, Wilfrid, and Robert J. McCann. "The geometry of optimal transportation." Acta Mathematica 177, no. 2 (1996): 113-161.
Peyré, Gabriel, and Marco Cuturi. "Computational optimal transport: With applications to data science." Foundations and Trends® in Machine Learning 11, no. 5-6 (2019): 355-607.
arxiv.org/abs/1803.00567
Smith, Lewis. “A gentle introduction to information geometry” www.robots.ox.ac.uk/~lsgs/pos...
Nielsen, Frank. "An elementary introduction to information geometry." Entropy 22, no. 10 (2020): 1100.
arxiv.org/abs/1808.08271
Amari, Shun-ichi, and Hiroshi Nagaoka. Methods of information geometry. Vol. 191. American Mathematical Soc., 2000.
Santambrogio, Filippo. "Optimal transport for applied mathematicians." Birkhäuser, NY 55, no. 58-63 (2015): 94.
Villani, Cédric. Optimal transport: old and new. Vol. 338. Berlin: Springer, 2009.
cedricvillani.org/sites/dev/f...
If you are interested in the fugue excerpt at the start of the video, you can download a rough version of the sheet music here.
differentialgeometri.files.wo...
#MachineLearning #DataScience #InformationGeometry #OptimalTransport

Comments: 21

@jackkinseth2936 1 year ago

That room brings back memories...

@lexinwonderland5741 1 year ago

Wow, the idea of information geometry and using probability spaces as topological spaces is something I can't believe I've never seen til now, this is amazing! I hope to see more lectures about it!!

@GabeKhan 1 year ago

Thanks! I’ll try to post another talk on information geometry in the not too distant future, but I’ve got a few other projects that are taking up my time first.

@asdf56790 2 months ago

That's an amazing introduction! Thank you :)

@GabeKhan 1 month ago

You're very welcome!

@NoNTr1v1aL 1 year ago

Absolutely amazing video! Already shared your video and paper to my Statistics sir who works in Information Geometry.

@GabeKhan 1 year ago

Thanks for the kind remark!

@naromsky 5 months ago

Great presentation. Thanks.

@GabeKhan 4 months ago

Glad you enjoyed it!

@geriskenderi 10 months ago

Fantastic video. I always struggled with introductions to optimal transport, since they often start by describing the problem in a purely mathematical way and move directly on to Wasserstein spaces, Sinkhorn divergences, etc. I wanted to ask if you had any suggestions or tips for researchers in learning methods (such as myself) who would like to move more toward researching mathematical modeling. How much of it is background and reading, and how much can one compensate for by just learning and then focusing on the computational aspect? Thanks again for the video, great presentation.

@GabeKhan 10 months ago

Thanks for the comment and question! As a bit of a disclaimer, my background before learning OT was in differential geometry, so the resources that I found helpful might not be the best ones for you. My recommendation for learning optimal transport with the minimal amount of mathematical theory is to take a look at Computational Optimal Transport by Gabriel Peyré and Marco Cuturi. They made a ton of code available, which should be really helpful for doing numerical calculations involving OT. optimaltransport.github.io/resources/ In general, Gabriel Peyré does a lot of great computational work and is very mindful about making his code available. As far as mathematical modeling goes, I think it requires a combination of theoretical and computational skills. You definitely don’t need to become an expert on the theoretical side of things, but you want a solid enough foundation to make sure that your computations make sense. However, doing lots of calculations is a great way to learn a subject and can make the theory much more intuitive.

@arnold-pdev 4 months ago

Nice, but I disagree with the characterization of the Monge problem as difficult relative to the Kantorovich problem. The issue is existence, not difficulty.

@GabeKhan 4 months ago

Thanks for the comment! For fairly general costs and measures, Brenier's theorem (or Gangbo-McCann's version) does tell us that the solution to the Kantorovich problem will be a solution to the Monge problem. In that sense, you are absolutely correct that these problems are equivalent in difficulty. However, once you formulate the Kantorovich problem, it's fairly straightforward to show that a solution exists. On the other hand, it took over 200 years after Monge's original work to find sufficient conditions for his problem to have solutions, and these results rely on background from PDEs and convex analysis. From a more practical perspective, Kantorovich's work shows that there is a dual problem to OT which can be solved by linear programming. Most approaches to computing optimal transport basically find ways to approximate this problem, and often do so by adding an extra entropy term to make the solution more diffuse so that it is easier to "find." In other words, something like the Sinkhorn algorithm "regularizes" the OT problem so that the solution is not induced by a Monge map, and instead has larger support. On the other hand, solving the Monge problem comes down to numerically solving a hairy PDE, and it's an active area of research to develop algorithms for this. In the past few years, there has been some progress in this direction, but there is still a lot to be done.
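
The entropy regularization described in this reply can be illustrated with a minimal NumPy sketch of Sinkhorn iterations. This is not code from the video or from the cited references; the function name `sinkhorn`, the toy histograms `a` and `b`, the grid `x`, and the values of `eps` and `n_iters` are all assumptions chosen purely for illustration.

```python
import numpy as np


def sinkhorn(a, b, C, eps=0.1, n_iters=2000):
    """Approximate an entropy-regularized transport plan between histograms a and b.

    a, b : 1-D arrays of positive weights, each summing to 1.
    C    : cost matrix with C[i, j] = cost of moving mass from bin i of a to bin j of b.
    eps  : regularization strength (hypothetical default); larger means a more diffuse plan.
    """
    K = np.exp(-C / eps)      # Gibbs kernel, strictly positive entries
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)       # rescale rows to match the marginal a
        v = b / (K.T @ u)     # rescale columns to match the marginal b
    # The plan has the form diag(u) K diag(v)
    return u[:, None] * K * v[None, :]


# Toy problem (illustrative values): five bins on [0, 1] with squared-distance cost.
x = np.linspace(0.0, 1.0, 5)
C = (x[:, None] - x[None, :]) ** 2
a = np.full(5, 0.2)
b = np.array([0.1, 0.1, 0.2, 0.3, 0.3])

P = sinkhorn(a, b, C)
regularized_cost = float(np.sum(P * C))  # transport cost of the regularized plan
```

Every entry of `P` is strictly positive, so the regularized plan spreads the mass of each source bin over all target bins. This is the sense in which the regularized solution has larger support than a plan induced by a Monge map. Shrinking `eps` makes the plan more concentrated, at the price of slower convergence and potential numerical underflow in `K`.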

@forheuristiclifeksh7836 6 months ago

13:19

@leojack1225 10 months ago

nice, but I don't see the utility in ML and data science from the video.

@GabeKhan 10 months ago

Fair point; I definitely stayed on the theoretical side since the talk was only 20 minutes. If you are interested in a much deeper dive into using OT to study data, I highly recommend Peyré and Cuturi's "Computational Optimal Transport." Gabriel Peyré also has a lot of code on his website for numeric computations.

@leojack1225 10 months ago

@GabeKhan Having worked for quite a few years in math, mainly stochastic processes and probability, I am quite skeptical about what mathematicians say is useful. Mainly they want to prove theorems, so they form niche communities around tailored techniques, looking for places where they can apply them. At least nowadays, in my experience, I have never met a mathematician in academia sitting in front of a genuinely new real-world problem and trying to say something about it. The only exception is a guy outside academia. But I know that F. Nielsen is a mathematician at Sony, and he also has a book that interests me a lot, entitled "HPC with MPI for Data Science", since I just started a new career in AI, HPC and data science. So now I am very careful about what mathematicians claim is important or useful. I know their behavior, independently of their fame.

@GabeKhan 10 months ago

There’s a lot of truth to what you say. As a mathematician, my paycheck depends on me proving theorems. Real world problems often aren’t amenable to mathematical proofs and instead are solved by other methods. So although I try to think about problems which come from statistics/finance/physics, it’s definitely the case that I’m looking for interesting math in those questions rather than trying to use math to answer questions from other fields. As such, when asked to give talks for more practical areas like ML, I generally don’t discuss my own research. But even with that disclaimer, I still highly recommend that book. Marco Cuturi works for Apple and a lot of the collaborations in that group involve people in industry as well as those who aren’t mathematicians by background. And from their work on imaging/DS, I do think that their work is useful in the actual sense (not just the mathematical sense).

@leojack1225 10 months ago

@GabeKhan Thanks, the good thing with mathematicians is that you can discuss things rationally. With physicists that is not possible: each of them has their own religion, and you cannot explain to them what does not make sense in what they say. That is why I got my PhD in math, even though I was originally a physicist.

@leojack1225 10 months ago

@GabeKhan I will definitely go through the book, which I was not aware of.