The Karush-Kuhn-Tucker (KKT) Conditions and the Interior Point Method for Convex Optimization

Рет қаралды 135,478

Күн бұрын

Пікірлер: 330

@VisuallyExplained 2 жыл бұрын

Thank you to every one who watched the video and spotted a typo or correction. I will group them in this comment so new viewers can have easy access to them. - At 12:50 by "grad_f and grad_g are inversely proportional", I mean grad_f and grad_g are proportional to each other with a negative coefficients. - At 13:47, the correct feasibility equation for x is g(x) \le 0, and not g(x) \ge 0 as stated in the video. This typo goes away starting from 15:11

@artashesasoyan6272 2 жыл бұрын

I worked out the gradient by hand (17:00) and found out that our solutions don't match, you have an extra ' - ' (minus sign) in the argument of log. The equation should be : grad[ f(x) - t log( +g(x)) ] and not grad[ f(x) - t log( -g(x)) ]. The gradient of log(g(x)) is grad(g(x))/g(x), no minus sign.

@VisuallyExplained 2 жыл бұрын

@@artashesasoyan6272 Thanks for working this out. Your reasoning would be correct if g were positive, but in this case g

@Yu-ChenLiu Жыл бұрын

At 05:44, P(y) = max u*y subject to u≥0 results in "max{0,...,∞}=∞, if y>0" and "max{-∞,...,0}=0, if y≤0".

@HelloWorlds__JTS 7 ай бұрын

It's confusing after 6:46 when you say "u goes first", but that's actually when X "goes" first, etc. You do the same thing again at 9:48, where it is particularly misleading. Love your content! I hope you are somehow finding time to make more. The world could really use more videos giving concise explanations related to convex optimization. I'm particularly interested in manifold learning.

@fjturner123 4 ай бұрын

@@HelloWorlds__JTS I thought the same thing - at 9:48, it is not meant to mean that u is optimised after x? Otherwise I agree - the video series is really excellent - thank you so much for bringing out the visual intuition of this topic and saving us days of pouring though textbooks! It's fantastic.

@睡睡-d7l 2 жыл бұрын

20mins content better than my professor for half a semester. Thank you

@YuxuanHarry Жыл бұрын

The author must be a genius for making such a great video! Only a man with deep understanding of optimization can explain it virtually

@pejT47 3 жыл бұрын

The best video on Lagrangian method that I've ever seen! Great work, thank you!

@VisuallyExplained 3 жыл бұрын

Wow, thanks!

@ulissemini5492 3 жыл бұрын

you should submit this to 3b1b's summer of math exposition contest

@gtorres94 Жыл бұрын

I can't thank you enough for this video and all the content you produce. This has to become the standard for teaching mathematics in schools. It makes everything so much clear, learning becomes so much more efficient. People in education should look at this and reward people like you who innovate and outpeform any classic math teacher. Thank you once again.

@apaarsadhwani 3 жыл бұрын

Fantastic video! Please keep more coming, these are super-useful! I have actually worked through Boyd's book and the reason I still prefer this is, it's so much quick to refresh your memory with a short video like this. I worked through Boyd's book many years ago and barely remember much now (except that it was fun!).. I suddenly need to recall duality/IP as a quick reference, it's not practical to read that book (or even Boyd's slides). This video is just perfect for that. Another use case I see is, before you deep dive into a convex optimization book, watching this video will give you a great idea and intuition for what's coming next!

@VisuallyExplained 3 жыл бұрын

Thanks, I am glad you find them useful!

@l10n919 4 ай бұрын

This is an excellent explanation of this subject. Last year I took an Intro to Optimization course at my university and had to fight tooth and nail to get the grade I wanted. This trilogy has really increased my intuition on the subject and helped me better understand the more complex math behind KKT and other optimality math theorems which you omitted in this video. Thanks!

@meeseeks1489 Жыл бұрын

This magic! How can you represent such difficult concepts so beautifully! This is best youtube video ever

@aashishchauhan1989 3 жыл бұрын

It is mind blowing to see all these ideas visually. Keep it coming, thank you

@ElectronsSoul 3 жыл бұрын

Excellent job! This is a new subject for me and it felt really intuitive and interesting all the way through. I hope your channel get the exposure and success that this material deserves.

@govindchari4586 3 жыл бұрын

I have spent a great deal of time trying to understand this topic, and this video series is the single best resource I have ever come across.

@VisuallyExplained 3 жыл бұрын

Thanks. I am really glad it was helpful

@HadiAhmed7546 Жыл бұрын

one of the best series of vids posted on the internet

@lucadelaat8837 Жыл бұрын

What a fantastic series! Thank you so much! Keep up the beautiful work! This is the future of math education at work.

@darrenho3655 3 жыл бұрын

Amazing visuals with great explanations. I'm so grateful for channels with high quality content like this.

@VisuallyExplained 3 жыл бұрын

Much appreciated! I am glad you liked the channel.

@cowkeydev 2 жыл бұрын

Amazing video, could not understand this for the life of me but this helped tremendously. Videos like this must take a long time to make, but I feel that they will be used for generations. Thank you :)

@engiboye9893 3 жыл бұрын

I can't wait till this channel explodes and becomes very popular and I can say that I was here in the beginning. Thank you for the amazing content, keep it up!

@The1ceCube Жыл бұрын

Thank you so much for putting in all these hours to make a video like that. But it really helped me to understand the topic a lot better!

@shwetadwivedi3404 2 ай бұрын

So beautiful explanation, I was never this curious for studying this subject as much I'm now after watching this video 🙇‍♀️. Thank you !

@MintaoYE 2 жыл бұрын

The best explanation of the duality problem and KKT condition that I have seen! Thank you

@hyperduality2838 2 жыл бұрын

From a convergent, convex (lens) or syntropic perspective everything looks divergent, concave or entropic -- the 2nd law of thermodynamics. According to the 2nd law of thermodynamics all observers have a syntropic perspective. My syntropy is your entropy and your syntropy is my entropy -- duality! Syntropy (prediction) is dual to increasing entropy -- the 4th law of thermodynamics. Homology (convergence, syntropy) is dual to co-homology (divergence, entropy). Teleological physics (syntropy) is dual to non teleological physics (entropy). Duality creates reality. "Always two there are" -- Yoda. Points are dual to lines -- the principle of duality in geometry.

@HiepNguyen-ud8qe 4 ай бұрын

I'm so amazed by your contribution and dedication in your video. I just want you to know that your series help me a lot. Thank you. I admire your work.

@lonewolfgaming5245 3 ай бұрын

Thank you. I have been searching all over for a video with proper explanation and a solved example.

@林泓均-q4j 2 жыл бұрын

Why I can only give one like to this video?! This video is awesome!!! Thanks for making it!

@shafiqreshid4288 3 жыл бұрын

Thankyou for this :) I don't think there is a better resource in internet for this topic:) Please keep the videos coming, I already binge watched all of your existing videos, Your way of storytelling is just amazing!

@VisuallyExplained 3 жыл бұрын

Thank you! Will do!

@ARROWS2030 Жыл бұрын

🎯 Key Takeaways for quick navigation: 00:00 🧠 *Convex optimization problems involve minimizing convex objective functions subject to constraints. Duality provides a useful perspective in solving such problems efficiently.* 01:27 🚢 *In a practical example of ship navigation, convex optimization is applied to minimize the objective function while considering constraints, turning it into a convex optimization problem.* 03:20 🔄 *Introducing a penalty function helps eliminate constraints, but choosing an appropriate penalty function is crucial. The "zero-infinity" penalty function is one such example.* 05:54 🔄 *Penalty functions can be approximated with linear penalties. The max of these linear penalties transforms the problem into a min-max problem, introducing the concept of dual problems.* 08:59 🔄 *Deriving the dual problem involves introducing Lagrangian multipliers, leading to a lower bound on the optimal value of the primal problem, establishing strong duality under certain assumptions.* 10:36 🎓 *The Karush-Kuhn-Tucker (KKT) conditions are essential in convex optimization, providing necessary conditions for optimality, with feasible solutions satisfying a set of equations and inequalities.* 14:29 ⚙️ *The KKT conditions reduce solving an optimization problem to solving equations and inequalities, facilitating the development of general-purpose optimization solvers.* 16:22 🔄 *The interior point method perturbs the KKT conditions to make them easier to solve, using a parameter "t" that is gradually reduced. It leverages two key insights to navigate the perturbed conditions efficiently.* 20:54 🔄 *The interior point method follows the "central path," gradually moving towards smaller "t" values until it reaches the solution. The central path stays within the interior of the feasible region, justifying the method's name.* Made with HARPA AI

@fikriansyahadzaka6647 2 жыл бұрын

Wow! great explanation. This is one topic that I find it intimidating when reading the book, but you explain it beautifully. Keep up the good work man!

@VisuallyExplained 2 жыл бұрын

It's always nice to hear that this was useful, thanks!!

@baby2k1 Жыл бұрын

thanks for the visualization, you made it so simpler for us to understand the problem and also understand what prof said in lecture

@anandsudhi7071 2 жыл бұрын

I must say, the insight that the visual approach provided just made it so intuitive. This is quite useful. Keep up the great work.

@VisuallyExplained 2 жыл бұрын

Awesome, thank you!

@nish6106 3 жыл бұрын

This video is just absoloutely amazing, for anyone just learning optimization. especially the KKT conditions its after 3 months that I have finally understood the actual intuition behind them

@kiaranr 3 жыл бұрын

These are like how I imagined math lectures would look in the future. Great work, instant sub.

@VisuallyExplained 3 жыл бұрын

Welcome aboard!

@jaimejaime9800 2 жыл бұрын

Amazing job! I think that in a lot of subjets, there are many encysted text-book explanations , with a bottom-top approach that overwhelms and trap newcomers and practicioners, making this knowledge a specialized tool. Channels like yours unblock that bottleneck, democratizing very useful insights and tools and speeding up progress, thanks!

@dididd Жыл бұрын

20min better than a whole 1hour lecture from my professor

@hakimarab2436 19 күн бұрын

what video, optimization from scratch. probably best course video ever had

@rs-ov6ie 9 ай бұрын

You are just amazing..Thank you so much

@farhanhyder6378 2 жыл бұрын

Great video, one of the best and most intuitive ones I have seen. I think you could have included a short discussion on what does it mean for the multiplier (u or v) to be binding.

@alirezaasghari4547 2 жыл бұрын

In fact, the virtual videos are incredible when it comes to learning new stuff, specially in math problems.

@lavanyapadala3119 2 жыл бұрын

the way you are presenting equations using animations its amazing sir. even a person with tiny knowledge in math can easily understand it.

@johngray6436 7 ай бұрын

I've finally known where the hell Lagrangian comes from Such a great video

@voxelsofsorrow 2 жыл бұрын

absolutely outstanding! thank you so much. as a discrete algorithm designer unexpectedly thrown into the trenches of solving a difficult bi-level mixed-integer linear optimization problem, this was very intuitive and much less scary than the Wikipedia article.

@VisuallyExplained 2 жыл бұрын

Yay, glad this was helpful!

@yahugh59 Жыл бұрын

Your insights and visualizations are immensely helpful!

@yatshunlee Ай бұрын

This is so amazing. I wish I discovered the video two years ago back then I was having my Convex Opt class lol

@1bvideo1 Жыл бұрын

Your explanation is genius! Thank you for taking the time to create such a beautiful explanation. You make learning fun.

@balbiyadsaad1607 10 ай бұрын

Second time I watch this video, fantastic content! Analogies are a piece of art.

@ayushjangid 3 жыл бұрын

That's the best video i have watched till now. Thanks a lot

@RajivSambasivan 3 жыл бұрын

Fantastic video!. You have really motivated the ideas so well and you have beautifully developed the intuition through the narrative. Kudos!

@shyambhagwat 2 жыл бұрын

Hello Bachir! , This is so amazing ! I can just say - god bless you !!! Best Duality explanation so far !!

@VisuallyExplained 2 жыл бұрын

Thanks a lot!!!

@hyperduality2838 2 жыл бұрын

@philippemaincon9702 2 жыл бұрын

Great presentation. You make it so simple...!

@jackvial5591 2 ай бұрын

Loved this series. Great intro to the topic. I’m one step closer to understanding G-FOLD

@marsag3118 3 жыл бұрын

Very nice series(and channel in general). I am a big fan of the work of Prof. Boyd, and this series makes the concept very intuitive and nicer to think about. Great work!

@VisuallyExplained 3 жыл бұрын

Thank you, I appreciate that!

@almerchant7595 2 жыл бұрын

Excellent video. I am trying to understand what you say at 5:45. " you consider all the u s" Doe this mean that we are selecting a different u for a given value of y, i.e. for y> 0 we select u=inf and for y

@VisuallyExplained 2 жыл бұрын

This is a fantastic question, and is at the heart of what duality is all about. Let me try to clarify. At 5:45, since min_x is outside of max_u, we are indeed seeing different values of u depending on the value of y. But you'll notice that when y=0, (i.e., when x is exactly on the circle, a property that the optimal x will satisfy), u can take any value it wants and it won't matter anyway since it is multiplied by 0 anyway. BUT, when you switch the order of the min and max, there is only one value of u that forces x to satisfy the constraint without penalizing it beyond that, and that value is 1/sqrt(2). Note that, if the optimal x were inside the circle, or in other terms, if the optimal y was strictly positive, then u would be forced to take the value zero. This is in fact one of the ways to test if a constraint is "tight" or not: you look at the corresponding dual variable, and check if it is equal to zero.

@almerchant7595 2 жыл бұрын

@@VisuallyExplained Thanks for the example. Would it be fair to say that since there exists an x inside the circle y < 0 ( i.e. x is a strictly feasible point) we can conclude strong duality exists (slater's constraint qualification) => "complementary slackness" can be assumed i.e. either u=0 or y=0(constraint is "active").

@cormackjackson9442 2 жыл бұрын

@@almerchant7595 Super helpful question and answer. But I feel like there is a typo in the authors response? Surely if the constraint was slack, i.e. x₁² + x₂² < 1, then y = x₁² + x₂² -1 < 0 I.e. y is strictly negative, not positive.

@arccos0160 2 жыл бұрын

elegant explanation! It should be recommended to whoever wants to learn optimization theory.

@jayantnema9610 2 жыл бұрын

this is the greatest thing I have ever seen! sooo good!!!!

@stlaurent26 2 жыл бұрын

Making a complex math concept simple ... well done!

@brahimerraji9332 2 жыл бұрын

At 16:54, I think there is a type, it's grad of (f(x) + t log (-g(x))) instead of minus

@jameschen2308 2 жыл бұрын

I was about to say

@VisuallyExplained 2 жыл бұрын

Thanks for watching carefully. I thought so too at the beginning, but then i realized that grad log(-g) is equal to g/grad_g, and not -g/grad_g.

@brahimerraji9332 2 жыл бұрын

@@VisuallyExplained Duh 😅😅

@behnamplays 3 жыл бұрын

I don’t comment much on yt, but these series are awesome! Thanks and keep us the good work!

@sudelal2144 11 ай бұрын

thank you for the amazing background music! It helps me sleep immediately.

@greatuser2560 3 жыл бұрын

thank you , this serie of videos helped me a lot to understand and deepen my knowledge of these concepts. keep up the great work

@VisuallyExplained 3 жыл бұрын

You're very welcome!

@Darkev77 3 жыл бұрын

At 13:48, why should the scalar "u" (Lagrange multiplier) be non-negative? Shouldn't it actually be negative? Fantastic video!

@VisuallyExplained 3 жыл бұрын

Thanks! “u” should be nonnegative because there is a minus sign in front of it. Overall, we want gradf and gradg to be inversely proportional to each other

@Darkev77 3 жыл бұрын

@@VisuallyExplained Hey thanks a lot for the clarification. Is it *necessary* for df and dg to be *inversely* proportional? I thought in Lagrange Multipliers, their gradients would point along the same direction (share tangent point) and hence proportional (though the multiplier can absorb the sign accordingly). Where's the flaw in my intuition? Also, in most literature "g_i(x)" is = 0 accordingly. Would having "g_i" >= 0 and "u"

@VisuallyExplained 3 жыл бұрын

@@Darkev77 Yes, it is necessary, otherwise you can move x a little bit in direction that makes both g and f decrease, so you make f even smaller without violating g(x) = 0, "min" + ">=" gives u

@Darkev77 3 жыл бұрын

@@VisuallyExplained Wow!!! Thank you so much, this is crystal clear. Do you happen to have a website where you post articles regarding optimization, because your explanation is really good (and I'm taking optimization RN and our professor is not so good :/). Thanks a lot for your time!

@VisuallyExplained 3 жыл бұрын

@@Darkev77 Thanks for the nice words, I am glad this was helpful :-) I don't have anything like that for now, but maybe in the future...

@process6996 3 жыл бұрын

Amazing intuition behind KKT!

@nguyen7272 Жыл бұрын

really amazing videos, visually and intuitively explanation are really important to me, thank you for doing these great videos

@hobbylper6264 5 ай бұрын

great video! thank you so much for explaining this topic in a very easy to understand way :)

@matthiashoffmann8814 3 жыл бұрын

Very nice video! I was looking forward to part 3 a lot. One question: You said the first KKT condition is x to be feasible, shouldn't g(x) then be

@VisuallyExplained 3 жыл бұрын

Thank you! And you are right, "g(x) >= 0" is a typo.

@user-krnujdp Жыл бұрын

Something I'm confused about, is that don't we only need to consider the active inequality constraints when determining a new search direction? So in your example at 11:40, wouldn't we just ignore g(x) since it is strictly satisfied, i.e. g(x) < 0. The local optimimum we seek might be on the edge of the problem domain anyway when g(x) = 0 or near by, so there is no reason to avoid the boundary by moving along -\del g if the constraint is still satisfied and that's the direction -\del f(x) tells us the best local direction to go in order to improve the objective. What am I missing? Thanks! Great video!

@VisuallyExplained Жыл бұрын

Maybe I can clear the confusion. The condition "grad_f(x) = -u*grad_g(x)" at 11:40 does not tell you which direction you should follow to search for the optimal x, it simply tells a condition that the optimal x should satisfy. If it happens that the optimal x is strictly inside the feasible region, then that corresponds to the special case 'u=0'. But your question is valid when we are searching for the optimal x and we consider new search directions, for example around 18:00 . Why do we even look at grad_g(x) if the current x is far from the boundary? Well, if you don't, and you greedily follow -grad_f(x), two things can happen: 1. You reach a point with zero gradient (grad_f(x)=0). In this case, you are done! 2. You reach the boundary of the feasible set (where g(x).= 0). What do you do then? Intuitively, you want to "stay on the boundary" and somehow "follow -grad_f(x)". It's not clear how to exactly do that, and it might be numerically challenging to do so. (but maybe still possible!) So instead, we choose to follow a combination of -grad_f(x) and grad_g(x), EVEN inside the the feasible set. But we give more weight to grad_g(x) as we approach the boundary (we do this by dividing grad_g(x) by g(x)). The beauty of the interior point method is that this "trade off" between objective function "f" and constraint "g" works beautifully in practice, and provably converges to the optimal solution.

@abhijeetkumar1177 Жыл бұрын

could anyone please explain how the first condition become: min(x1 + x2); at timee 1:11

@xiaowang4578 Жыл бұрын

Love the video! Student should really start from your video than the standard textbooks!!!!!!

@hyukppen 2 жыл бұрын

18:32 - I don't understand why Newton's method very slowly converges when t -> 0 . Because our objective function is : f-tlog(-g), even if the t is equal to zero, the gradient of f(x) remains. Could you explain about the slowness?

@VisuallyExplained 2 жыл бұрын

Very good question! You are right that if apply newton with t = 0, we would be minimizing f(x), but that's not really what we want, because we want to find an "x" that satisfies the constraint g(x)

@hyukppen 2 жыл бұрын

@@VisuallyExplained Thank you for your reply! Do you mean that Newton converges very slowly not because the gradient is too small, but because the gradient is rather too big? (jumping too far around a stationary point just like gradient descent method with large learning late)

@VisuallyExplained 2 жыл бұрын

@@hyukppen To be exact, since Newton is a second order method, what determines the speed of convergence is how fast the Hessian varies. For example, in the extreme case of quadratic functions, the hessian is constant and Newton converges in one step (no matter how big the gradient is). For our case, if t is too small, the hessian will vary too much near the boundary.

@hyukppen 2 жыл бұрын

@@VisuallyExplained aha! Based on your example, my simple IPM code (MATLAB) is as below: ------------------------------------------------ xk=[0;0]; tl=1; for l=1:50 tl1=0.8*tl; for k=1:5 x1=xk(1); x2=xk(2); t=tl1; g=[ 1+t*(2*x1)/(-x1^2-x2^2+1) ; 1+t*(2*x2)/(-x1^2-x2^2+1) ]; H=[t*(2*x1^2-2*x2^2+2)/(1-x1^2-x2^2)^2 t*4*x1*x2/(1-x1^2-x2^2)^2 ; ... t*4*x1*x2/(1-x1^2-x2^2)^2 t*(-2*x1^2+2*x2^2+2)/(1-x1^2-x2^2)^2]; xk1=xk-inv(H)*g; xk=xk1; end tl=tl1; end ------------------------------------------------ If I set initial t to be very small, inv(H) which is the inverse of the Hessian diverges while gradient "g" remains same as (1,1). I think this is the point you explained. Am I right?

@VisuallyExplained 2 жыл бұрын

@@hyukppen Correct! But even the gradient will diverge when you get really close to the boundary.

@theberkeleyboss 3 жыл бұрын

Great video. I've always wanted to see this topic visually explained. There's just one part I'm a bit confused about: In the dual problem, how do we know what value of "u" to try first? It sounds like we can take any value of "u", and then solve for the decision variables based on that value.

@VisuallyExplained 3 жыл бұрын

Thank you Andy for you for your comment! I am not sure I understand what you mean by "what value of "u" to try first?". Maybe I can be more helpful if you point me to the timestamp in the video that is confusing to you.

@theberkeleyboss 3 жыл бұрын

@@VisuallyExplained At 5:38, I'm wondering why are we considering all the values of U, and taking the maximum of the corresponding linear penalties? If we're finding the max, wouldn't the maximum of (u x y) always be infinity?

@VisuallyExplained 3 жыл бұрын

@@theberkeleyboss I see. Indeed, max u*y would be infinity, unless y = 0. (That's what I try to explain at 5:49). But in any case, the optimal "u" is + infinity anyway. At this point, you are right that it looks kind of silly that we consider all the values of "u" when we know that the optimal u is equal to infinity. The reason we do it however is that (i) there is no harm in do it as we are not changing the optimal value, and (ii) we gain a lot once we switch the order of the min and max. And note that once we switch the min and max, it is no longer optimal to take u=infinity (at 7:30 for example, the optimal u is sqrt(2)/2). Let me know if this doesn't make sense, and I can expand a bit more.

@theberkeleyboss 3 жыл бұрын

@@VisuallyExplained After watching this video about 10 times, I finally get primal/dual problem differences. This is a difficult and somewhat new topic for me, but I learn something new every time I watch it. I'm down to the last concept that's giving me an issue: Around 13:10, you said, "and you might wonder if this condition is also sufficient in the sense that if for some point x you succeed in finding a scalar u that makes this condition hold, then that makes x an optimal solution ... this implication is not quite true." Can you give a brief numeric example of when this would be true? I think I know what you mean, but I need to see an example. I'm guessing this probably relates to the complementary slackness condition.

@VisuallyExplained 3 жыл бұрын

@@theberkeleyboss I am glad this was helpful. I know this topic is very hard to grasp at the beginning (which is why I made this series in the first place). Let me just say that your feedback is extremely valuable to me for making new videos, so thanks for that! For the statement at 13:10, a trivial example would be an unconstrained optimization problem (i.e., f convex and g = 0). In that case, x is an opt solution iff grad_f(x) = 0 iff grad_f(x) + u grad_g(x) = 0 (simply because grad_g = 0). Or maybe you wanted a counterexample to that statement? In any case, this implication becomes true if you add the complementary slackness as you pointed out (i.e., u*g(x) = 0) AND primal and dual feasibility (i.e., g(x) >= 0 and u >= 0). All these conditions together (the gradient conditions+complementary slacknesss+primal and dual feasibility) are the KKT conditions. (See slide 10 of www.cs.cmu.edu/~ggordon/10725-F12/slides/16-kkt.pdf for a proof)

@alawimidher5899 3 жыл бұрын

at 19:14 the assumption that was made is that since t is very large the log term dominates, but as we decrease t in later iterations won't that make the assumption of "dominant log term" invalid?

@VisuallyExplained 3 жыл бұрын

That is very true, but that's actually a good thing. As you make t small, the dominant term becomes f(x), which is the function that we want to minimize. Keep in mind though, that as long as "t" is not exactly zero, the term -t*log(-g) equals infinity on the boundary of the feasible set, which is the reason why x(t) never crosses that boundary and remains feasible at all times.

@SumanKumar-rf1xb Жыл бұрын

excellent explanation. can you also explain the weak duality theorem in detail. Why dual problem gives lower bound than primal. Is penalty function same as lagrange multiplier

@chadwicklin2091 7 ай бұрын

Just sth aside but I really like the background music. Gives me a sense of calm and peace for learning.

@nathant9510 3 жыл бұрын

Amazing video! Video's like this make me excited to learn, keep it up!

@leticiapiuccomarques7747 2 жыл бұрын

Very clear and concise explanation. Thank you so much.

@phogbinh 3 жыл бұрын

Subscribed. Your content is invaluable to CS grad students.

@PokHoTse Жыл бұрын

My professor did teach quite well but I'm missing some intuitive visualizations. All makes sense now thanks to your video!

@-T.K.- 7 ай бұрын

Awesome video! This is very very helpful (as I'm going to take the convex optimization class exam tomorrow...) However, I am a bit confused at around 6:30. You mentioned that the minimizer x goes first and the maximizer u goes second in the expression at 6:45. I think in mathematics, the expression is evaluated inside-first? So in this case the inner part, maximizer u, would be the first player, and the minimizer x would be the second. I'm not sure if I understand this correctly...

@pabloo.o1912 Жыл бұрын

Thank you very much for this video!! I'm just getting started with semidefinite programming

@martinsanchez-hw4fi 2 жыл бұрын

Awesome video, but in 12:50 it is not that the gradients are inversely proportional, but only that they are proportional by a negative constant

@VisuallyExplained 2 жыл бұрын

Correct! I should have been more precise. I guess it depends on which operation one has in mind when talking about "inverse". If it's "*", then "inversely proportional" is "y = C/x", if it's "+", then it's "y=-C x". You are right that the former is much more widely used, but since grad f and grad g are both vectors, only the latter interpretation is possible here.

@zyzhang1130 2 жыл бұрын

extremely helpful tysm. Two questions though: 1. at 7:06 , why that quantity is convex? because if you take max of u it is no longer linear right? 2. at 16:07, if we know u * g(x) =0, we can discuss case by case (i.e. when u = 0 and when g(x) = 0 ). Why that wouldn't work?

@jianliu5258 2 жыл бұрын

Fantastic! However, in 9:38, why the dual function gives a lower bound of the prime?

@nivethanyogarajah1493 3 жыл бұрын

Holy?! The NIP derivation makes so much sense!!

@blackguardian89 Жыл бұрын

Amazing video! Thank you! I hope there will be a lot more optimization videos from you in the future!

@bouchraer-rabbany4370 Жыл бұрын

It was genuinely helpful. Thank you for the insightful teaching

@mariestrand9918 2 ай бұрын

One thing I'm confused about: 6:45. Wouldn't "max" be evaluated first before "min", since it stands closer to the function (or is "embedded" inside "min"?)

@qr-ec8vd 2 жыл бұрын

have you thought about covering primal dual interior point method? That would be great!

@alawimidher5899 3 жыл бұрын

at 17:33 why did you assume that a negative log is minus infinity? I realize that a negative log is complex and the domain of minimization is real numbers, but is there something else i am missing for it to be minus infinity?

@VisuallyExplained 3 жыл бұрын

That's a good question. In optimization, we like to pretend that the "-log" function takes the value +infinity when its argument is negative, because that makes it an ideal penalty function. In practice, that does not really have a big importance because we never actually feed it a negative number anyway. What actually matters is that -log(x) --> +infinity as x approaches zero from the right. What happens beyond zero is not relevant, but it is convenient to imagine it is +infinity.

@alawimidher5899 3 жыл бұрын

@@VisuallyExplained A very clear answer, thank you for the excellent video and thank you for the very appreciated commitment to answer our questions.

@hanabenrabah93 Жыл бұрын

Thank you for this quality content ! Best explanation on this topic

@sanmore101 3 жыл бұрын

thanks mate. was having some doubts after going through Boyd's book... now many things are clear

@VisuallyExplained 3 жыл бұрын

You are most welcome

@uede 3 жыл бұрын

Wonderfully explained, I am in awe

@VisuallyExplained 3 жыл бұрын

Happy to hear that!

@aswathik4709 2 жыл бұрын

you put soo much effort. subscribed right away!!

@cormackjackson9442 2 жыл бұрын

This really is an amazing series! Well done. If possible, a quick answer to this questions would be amazing. @5.55 "We can take the maximum outside since x1 + x2 does not depend on you." I'm confused how we are able to take the Max out, it feels like our choice of x1 and x2, does depend on u, and hence x1 + x2 depends on you. My thoughts at this point in the video.... 1) Our function Max(u•y) depends on the constraint y, yes? Because if y is negative we take u to be 0, if y is positive we take u to be infinity. 2) Minimising our whole function, with respect to x1 and x2, depends on x1,x2, and u yes? Because if u = 0 we can just pick negative infinity values for x1 and x2, but if u is positive, we should pick values of x1 and x2 that satisfy the constraint. 3) So then how can we say x1 + x2 does not depend on u?

@VisuallyExplained 2 жыл бұрын

Thanks Cormack! Let me try to answer your question. I think your confusion comes from the order of the operations "max" and "min". 1) Max_u(u•y) does not depend on the constraints on y. This is a function of y, it has no knowledge of the constraints we want to impose on y. You plug in a value of y, you get a value. 2) When we take the min, we suppose that we have already taken the max w.r.t to u, so the function inside the min depends only on x1 and x2. the variable u doesn't exist anymore at this point. 3) see 2) above Here is different explanation: Let's say we want to evaluate min_x x1+x2 + max_u u(x1^2+x2^2-1), and let me call the function inside the min as f(x1, x2), so f(x1, x2) = x1+x2 + max_u u(x1^2+x2^2-1), and the original problem is equivalent to min_x f(x1,x2) Note that, and this is important, "f" is a function of (x1, x2) only. This is means that if you give me values for (x1, x2), I can give you the value of f(x1,x2). For example, let's say you gave me (x1=1, x2=3), then I need to evaluate f(1, 3) = 1+3 + max_u u(1^2+3^2-1), which is of course the same as max_u 1+3 u(1^2+3^2-1), which is what happens at 5:55. Evaluating f(x1,x2) requires taking a max wrt "u", but the value of f(x1,x2) doesn't depend on any specific value of "u" Now that we can evaluate f(x1, x2) for all values of x1 x2, we can take the min w.r.t (x1,x2), which means we find the couple (x1,x2) that makes f(x1,x2) as small as possible. Hope that answers your question.

@cormackjackson9442 2 жыл бұрын

@@VisuallyExplained OMG you actually replied same day. While working at Two Sigma, what are you, super man?! (I was so impressed by the Vid I checked your about hahah, very cool stuff!)

@cormackjackson9442 2 жыл бұрын

@@VisuallyExplained Hi! Thanks again so much for taking the time!! Even if you can't reply, writing this out has been very helpful. Based on your replies to myself and others, I have tried to break down exactly what I think is the process is. Note: Before I wasn't saying "constraint on y" I was saying, y is the constraint (e.g. in your video they are the same green colour)... 1. y is our constraint i.e.y(x) = x₁² + x₂² - 1 ≤ 0 2. We could define a function h(u, y(x)) = u•y = u(x₁² + x₂² - 1) 3. MAX_u≥o (u•y) Is saying MAX_u≥o h(u, y(x)), which means get the max value of (u(x₁² + x₂² - 1)) with respect to u for values of u ≥o ? 4. IMPORTANT 5:54 When maximising h(u, y(x)) we chose only u? Our x and hence y(x) value is already chosen, and u responds to it (as x moves first)? 1. y > 0 ⇒ u = ∞ 2. y < 0 ⇒ u = 0 3. y = 0 ⇒ u = Anything 4. (With these values we have recovered the original penalty function with our new penalty function: P(y) = MAX_u≥o h(u, y(x)) = MAX_u≥o(u•y)) 5. When taking MAX_u≥o(u•y)), the reason we chose only u, rather than choosing x and u, is that, as ‘x goes first’ u get’s to respond to x, and hence our choice of u is based on already knowing x. X player knows this and so chooses an x s.t. y = 0. Therefore here, u can be anything. (I have tried to draw on your comments to previous questions by @Rob Marks about this game order). 6. Tying this to your original answer to my second point -- “2) When we take the min, we suppose that we have already taken the max w.r.t to u, so the function inside the min depends only on x1 and x2. the variable u doesn't exist anymore at this point.” -- This is the case because although it feels backwards, the notion of U going second means, that it’s move is based off the X information, and so we kind of compute U first working backwards from X. I.e. working backwards from X, the second player U is computed first, with information about X (even though U is second) and so U is computed and no longer a variable. So as you say there remains no u anymore. 7. Because U is second, x₁ & x₂ choices do no depend on U, and so we can move the MAX_u≥o outside. 8. IMPORTANT: Now we change the order (Why we are allowed to do this is not clear to me)?? 9. Now that X goes second our optimal values of x₁ & x₂ Do depend on U, hence the value @7:15 x₁ = x₂ = -1/2u 10. U player no longer responds to x, so again working backwards from the second mover X (to the first mover U) we plug in X’s optimal responses into the function, which in turn gives us the function in terms of U. 11. Maximising this value for U, we find that there is only 1 specific U value, that prompts that optimal response from X, and this is u = 1/√2 12. y(x) = 0 and u = 1/√2 > 0 , tells us that the constraint is binding. This makes sense given x₁ + x₂, is unbounded bellow, and hence relaxing the constraint should give us a greater minimum. 13. (My understanding here is that this whole player 1 player 2 working backwards from the optimal move, is the formal treatment we would expect if written in a formal game theory setting). Have I missed anything important? Or worse have I missed the point completely?? Thanks again for the amazing help!

@yancomduvida 2 жыл бұрын

Excellent video as everyone already pointed out! However, I have a question on minute 5:57, we are using a linear function u.y, instead of the 0 ~infinity penalty function. My question is the following: Since we are going to maximize for U and it must be greater or equal 0, aren't we going to have the same problem of the +infinity case? I understand that the previous one had only 2 possible values, which were 0 and + infinity, but now for every slight violation of the constraint the maximum of the function will be +infinity. How is that not the same problem as before?

@VisuallyExplained 2 жыл бұрын

That is absolutely true. Up to that point, we have simply reformulated the 0-infinity penalty function in terms of a min-max, and we still have "the same problem as before". Things become interesting when we switch the order of the min and max. Now X goes second, so he can adapt to what "u" is. The higher (but still finite) "u" is, the closer the value of X will be to satisfying the constraints. In this case, even if you take the max w.r.t to "u", you will get a finite value, and not +infinity.

@yancomduvida 2 жыл бұрын

@@VisuallyExplained Oh now I see what you mean! If we find U first, there is no way it will be +infinity. But I think I'm still struggling to understand this exchange of minmax to maxmin. For example, in the working example we found out the max U and only afterwards we get X, which depends on the value of our U. I can't see how this effectively assures that our restriction is going to be respected, since we did not know if the X chosen would violate the constraint. What I mean is that the primal problem, which is minimize for X and then maximize for U makes sense, since we can se the restrictions being violated and U working to penalize it or not. However, for the dual problem it seems weird that we are working out our penalize function before knowing what the X is going to be. I think I need to deep-dive on this part, if you have anything that could help me tackle this misunderstanding, it would be really great!! I appreciate your response

@adrianom 24 күн бұрын

at 9:45 you say that "in the dual problem, the X player, which tries to make our objective function small, goes second, and therefore has the advantage". But it actually goes first, as you first solve the inner minimization and then the outer maximization - isn't it?

@sebydocky5080 10 ай бұрын

Exceptional video.... It's so clear Bravo.

@tomxiao 2 жыл бұрын

Awesome video, thanks for make this available.

@ankitbioinfo 10 ай бұрын

Beautifullly explained.

@brahimerraji9332 2 жыл бұрын

Awesome and illustrative, thank you.

@danielscott6302 Жыл бұрын

Thank you very much for this video. It has been very helpful 😊. Keep it up 👍. I just have a question about the gradients v_i of the "penalty functions" for the equality constraints h_i (at 8.31 in the video). Instead of finding the max of v_i >= 0, shouldn't we instead find the maximum of |v_i| (the absolute value of the v_i's)? Considering the figure at 5.39 in the video, for constrains that require equality to zero (i.e., the h_i's), surely we would want to consider negative slopes in a way that make the corresponding penalty function effectively +infinity for all values except for where these constraints are equal to zero. Please let me know where I may be going wrong.

@liwei9136 Ай бұрын

at 9:24, If g_i \le 0 and h_i equals 0, then shouldn't u_j \le 0 and v_i has no bound to max the objective?

@zarzavattzarzavatt9309 3 жыл бұрын

Thank you for the great explanation. Even I understood something :), but not why "because if g were positive log(-g) would be plus infinity" 17:28.

@VisuallyExplained 3 жыл бұрын

Thanks! What I meant is that if you start with a negative g, and you start increasing it while tracking the value of log(-g), then log(-g) will start decreasing. And as soon as you hit g=0, log(-g) will be equal to -infinity. This is simply because log(y) tends to -infinity as y tends to 0 from the right.

@zarzavattzarzavatt9309 3 жыл бұрын

@@VisuallyExplained thx! i was confused by the sign, it's about the whole -log(-g) thing ...

@MrChieditel Жыл бұрын

@@VisuallyExplained but zarzavatt is right: if g were positive, log(-g) would be non-defined, not plus infinity. We talk about minute 17:00. Maybe you meant "if g were positive, -log(g) would be plus infinity"... Even though, this also isn't completely true, it would be the case only for g →0 from the right... !