Thank you to everyone who watched the video and spotted a typo or correction. I will group them in this comment so new viewers can have easy access to them.
- At 12:50, by "grad_f and grad_g are inversely proportional", I mean grad_f and grad_g are proportional to each other with a negative coefficient.
- At 13:47, the correct feasibility condition for x is g(x) \le 0, and not g(x) \ge 0 as stated in the video. This typo goes away starting from 15:11.
@artashesasoyan6272 • 2 years ago
I worked out the gradient by hand (17:00) and found that our solutions don't match; you have an extra '-' (minus sign) in the argument of the log. The equation should be grad[ f(x) - t log( +g(x)) ] and not grad[ f(x) - t log( -g(x)) ]. The gradient of log(g(x)) is grad_g(x)/g(x), no minus sign.
@VisuallyExplained • 2 years ago
@@artashesasoyan6272 Thanks for working this out. Your reasoning would be correct if g were positive, but in this case g is negative everywhere inside the feasible region, so the argument of the log has to be -g for it to be defined.
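For anyone who wants to double-check the sign numerically, here is a minimal sketch in Python (the 1-D constraint g(x) = x^2 - 1 is my own toy choice, not the one from the video):
------------------------------------------------
import math

def g(x):
    return x**2 - 1          # feasible region: g(x) < 0, i.e. |x| < 1

def barrier(x, t=1.0):
    return -t * math.log(-g(x))   # only defined where g(x) < 0

x0, h, t = 0.5, 1e-6, 1.0
numeric = (barrier(x0 + h, t) - barrier(x0 - h, t)) / (2 * h)
analytic = -t * (2 * x0) / g(x0)  # -t * grad_g(x)/g(x), with grad_g(x) = 2x
print(numeric, analytic)          # both ~ 1.3333, so the signs agree
------------------------------------------------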
@Yu-ChenLiu • 1 year ago
At 05:44, P(y) = max u*y subject to u≥0 results in "max{0,...,∞}=∞, if y>0" and "max{-∞,...,0}=0, if y≤0".
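A quick way to see this in code (a toy sketch; the finite grid of slopes stands in for "all u \ge 0"):
------------------------------------------------
def P(y, u_grid=(0, 1, 10, 100, 1000)):
    # approximate max over u >= 0 of u*y with a finite grid of slopes;
    # letting the grid grow recovers the exact 0/+infinity penalty
    return max(u * y for u in u_grid)

print(P(0.1))   # 100.0 -> grows toward +infinity as larger u are allowed
print(P(-0.1))  # 0.0   -> for y <= 0 the best slope is u = 0
print(P(0.0))   # 0.0   -> when y = 0, every u gives the same value
------------------------------------------------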
@HelloWorlds__JTS • 7 months ago
It's confusing after 6:46 when you say "u goes first", but that's actually when X "goes" first, etc. You do the same thing again at 9:48, where it is particularly misleading. Love your content! I hope you are somehow finding time to make more. The world could really use more videos giving concise explanations related to convex optimization. I'm particularly interested in manifold learning.
@fjturner123 • 4 months ago
@@HelloWorlds__JTS I thought the same thing - at 9:48, isn't it meant to mean that u is optimised after x? Otherwise I agree - the video series is really excellent - thank you so much for bringing out the visual intuition of this topic and saving us days of poring through textbooks! It's fantastic.
@睡睡-d7l • 2 years ago
20 minutes of content, better than half a semester with my professor. Thank you!
@YuxuanHarry • 1 year ago
The author must be a genius for making such a great video! Only someone with a deep understanding of optimization could explain it so visually.
@pejT47 • 3 years ago
The best video on Lagrangian method that I've ever seen! Great work, thank you!
@VisuallyExplained • 3 years ago
Wow, thanks!
@ulissemini5492 • 3 years ago
you should submit this to 3b1b's summer of math exposition contest
@gtorres94 • 1 year ago
I can't thank you enough for this video and all the content you produce. This has to become the standard for teaching mathematics in schools. It makes everything so much clearer; learning becomes so much more efficient. People in education should look at this and reward people like you, who innovate and outperform any classic math teacher. Thank you once again.
@apaarsadhwani • 3 years ago
Fantastic video! Please keep more coming, these are super-useful! I have actually worked through Boyd's book, and the reason I still prefer this is that it's so much quicker to refresh your memory with a short video like this. I worked through Boyd's book many years ago and barely remember much now (except that it was fun!). I suddenly need to recall duality/IP as a quick reference, and it's not practical to reread that book (or even Boyd's slides). This video is just perfect for that. Another use case I see: before you deep-dive into a convex optimization book, watching this video will give you a great idea of, and intuition for, what's coming next!
@VisuallyExplained • 3 years ago
Thanks, I am glad you find them useful!
@l10n919 • 4 months ago
This is an excellent explanation of this subject. Last year I took an Intro to Optimization course at my university and had to fight tooth and nail to get the grade I wanted. This trilogy has really strengthened my intuition on the subject and helped me better understand the more complex math behind KKT and the other optimality theorems which you omitted from this video. Thanks!
@meeseeks1489 • 1 year ago
This is magic! How can you represent such difficult concepts so beautifully! This is the best YouTube video ever.
@aashishchauhan1989 • 3 years ago
It is mind blowing to see all these ideas visually. Keep it coming, thank you
@ElectronsSoul • 3 years ago
Excellent job! This is a new subject for me, and it felt really intuitive and interesting all the way through. I hope your channel gets the exposure and success that this material deserves.
@govindchari4586 • 3 years ago
I have spent a great deal of time trying to understand this topic, and this video series is the single best resource I have ever come across.
@VisuallyExplained • 3 years ago
Thanks. I am really glad it was helpful
@HadiAhmed7546 • 1 year ago
one of the best series of vids posted on the internet
@lucadelaat8837 • 1 year ago
What a fantastic series! Thank you so much! Keep up the beautiful work! This is the future of math education at work.
@darrenho3655 • 3 years ago
Amazing visuals with great explanations. I'm so grateful for channels with high quality content like this.
@VisuallyExplained • 3 years ago
Much appreciated! I am glad you liked the channel.
@cowkeydev • 2 years ago
Amazing video. I could not understand this for the life of me, but it helped tremendously. Videos like this must take a long time to make, but I feel that they will be used for generations. Thank you :)
@engiboye9893 • 3 years ago
I can't wait till this channel explodes and becomes very popular and I can say that I was here in the beginning. Thank you for the amazing content, keep it up!
@The1ceCube • 1 year ago
Thank you so much for putting in all these hours to make a video like this. It really helped me to understand the topic a lot better!
@shwetadwivedi3404 • 2 months ago
Such a beautiful explanation! I was never as curious about this subject as I am now, after watching this video 🙇♀️. Thank you!
@MintaoYE • 2 years ago
The best explanation of the duality problem and KKT condition that I have seen! Thank you
@hyperduality2838 • 2 years ago
From a convergent, convex (lens) or syntropic perspective everything looks divergent, concave or entropic -- the 2nd law of thermodynamics. According to the 2nd law of thermodynamics all observers have a syntropic perspective. My syntropy is your entropy and your syntropy is my entropy -- duality! Syntropy (prediction) is dual to increasing entropy -- the 4th law of thermodynamics. Homology (convergence, syntropy) is dual to co-homology (divergence, entropy). Teleological physics (syntropy) is dual to non teleological physics (entropy). Duality creates reality. "Always two there are" -- Yoda. Points are dual to lines -- the principle of duality in geometry.
@HiepNguyen-ud8qe • 4 months ago
I'm so amazed by your contribution and the dedication in your videos. I just want you to know that your series helped me a lot. Thank you. I admire your work.
@lonewolfgaming5245 • 3 months ago
Thank you. I have been searching all over for a video with a proper explanation and a solved example.
@林泓均-q4j • 2 years ago
Why can I only give one like to this video?! This video is awesome!!! Thanks for making it!
@shafiqreshid4288 • 3 years ago
Thank you for this :) I don't think there is a better resource on the internet for this topic :) Please keep the videos coming; I have already binge-watched all of your existing videos. Your way of storytelling is just amazing!
@VisuallyExplained • 3 years ago
Thank you! Will do!
@ARROWS2030 • 1 year ago
🎯 Key Takeaways for quick navigation:
00:00 🧠 Convex optimization problems involve minimizing convex objective functions subject to constraints. Duality provides a useful perspective in solving such problems efficiently.
01:27 🚢 In a practical example of ship navigation, convex optimization is applied to minimize the objective function while considering constraints, turning it into a convex optimization problem.
03:20 🔄 Introducing a penalty function helps eliminate constraints, but choosing an appropriate penalty function is crucial. The "zero-infinity" penalty function is one such example.
05:54 🔄 Penalty functions can be approximated with linear penalties. The max of these linear penalties transforms the problem into a min-max problem, introducing the concept of dual problems.
08:59 🔄 Deriving the dual problem involves introducing Lagrangian multipliers, leading to a lower bound on the optimal value of the primal problem, establishing strong duality under certain assumptions.
10:36 🎓 The Karush-Kuhn-Tucker (KKT) conditions are essential in convex optimization, providing necessary conditions for optimality, with feasible solutions satisfying a set of equations and inequalities.
14:29 ⚙️ The KKT conditions reduce solving an optimization problem to solving equations and inequalities, facilitating the development of general-purpose optimization solvers.
16:22 🔄 The interior point method perturbs the KKT conditions to make them easier to solve, using a parameter "t" that is gradually reduced. It leverages two key insights to navigate the perturbed conditions efficiently.
20:54 🔄 The interior point method follows the "central path," gradually moving towards smaller "t" values until it reaches the solution. The central path stays within the interior of the feasible region, justifying the method's name.
Made with HARPA AI
@fikriansyahadzaka6647 • 2 years ago
Wow! Great explanation. This is one topic that I found intimidating when reading the book, but you explain it beautifully. Keep up the good work, man!
@VisuallyExplained • 2 years ago
It's always nice to hear that this was useful, thanks!!
@baby2k1 • 1 year ago
Thanks for the visualization; you made it so much simpler for us to understand the problem and also to understand what the prof said in lecture.
@anandsudhi7071 • 2 years ago
I must say, the insight that the visual approach provided just made it so intuitive. This is quite useful. Keep up the great work.
@VisuallyExplained • 2 years ago
Awesome, thank you!
@nish6106 • 3 years ago
This video is just absolutely amazing for anyone just learning optimization, especially the KKT conditions. It's after 3 months that I have finally understood the actual intuition behind them.
@kiaranr • 3 years ago
These are like how I imagined math lectures would look in the future. Great work, instant sub.
@VisuallyExplained • 3 years ago
Welcome aboard!
@jaimejaime9800 • 2 years ago
Amazing job! I think that in a lot of subjects there are many entrenched textbook explanations, with a bottom-up approach that overwhelms and traps newcomers and practitioners, making this knowledge a specialized tool. Channels like yours unblock that bottleneck, democratizing very useful insights and tools and speeding up progress. Thanks!
@dididd • 1 year ago
20 minutes, better than a whole 1-hour lecture from my professor.
@hakimarab2436 • 19 days ago
What a video: optimization from scratch. Probably the best course video I've ever had.
@rs-ov6ie • 9 months ago
You are just amazing. Thank you so much!
@farhanhyder6378 • 2 years ago
Great video, one of the best and most intuitive ones I have seen. I think you could have included a short discussion of what it means for the multiplier (u or v) to be binding.
@alirezaasghari4547 • 2 years ago
In fact, visual videos are incredible when it comes to learning new stuff, especially for math problems.
@lavanyapadala3119 • 2 years ago
The way you present equations using animations is amazing, sir. Even a person with tiny knowledge of math can easily understand it.
@johngray6436 • 7 months ago
I've finally learned where the hell the Lagrangian comes from. Such a great video!
@voxelsofsorrow • 2 years ago
absolutely outstanding! thank you so much. as a discrete algorithm designer unexpectedly thrown into the trenches of solving a difficult bi-level mixed-integer linear optimization problem, this was very intuitive and much less scary than the Wikipedia article.
@VisuallyExplained • 2 years ago
Yay, glad this was helpful!
@yahugh59 • 1 year ago
Your insights and visualizations are immensely helpful!
@yatshunlee • 1 month ago
This is so amazing. I wish I had discovered this video two years ago, back when I was taking my Convex Opt class lol
@1bvideo1 • 1 year ago
Your explanation is genius! Thank you for taking the time to create such a beautiful explanation. You make learning fun.
@balbiyadsaad1607 • 10 months ago
Second time I've watched this video; fantastic content! The analogies are a piece of art.
@ayushjangid • 3 years ago
That's the best video I have watched till now. Thanks a lot!
@RajivSambasivan • 3 years ago
Fantastic video! You have really motivated the ideas so well, and you have beautifully developed the intuition through the narrative. Kudos!
@shyambhagwat • 2 years ago
Hello Bachir! This is so amazing! I can just say: God bless you!!! Best duality explanation so far!!
@VisuallyExplained • 2 years ago
Thanks a lot!!!
@philippemaincon9702 • 2 years ago
Great presentation. You make it so simple...!
@jackvial5591 • 2 months ago
Loved this series. Great intro to the topic. I’m one step closer to understanding G-FOLD
@marsag3118 • 3 years ago
Very nice series (and channel in general). I am a big fan of the work of Prof. Boyd, and this series makes the concepts very intuitive and nicer to think about. Great work!
@VisuallyExplained • 3 years ago
Thank you, I appreciate that!
@almerchant7595 • 2 years ago
Excellent video. I am trying to understand what you say at 5:45: "you consider all the u's". Does this mean that we are selecting a different u for a given value of y, i.e., for y > 0 we select u = inf, and for y \le 0 we select u = 0?
@VisuallyExplained • 2 years ago
This is a fantastic question, and it is at the heart of what duality is all about. Let me try to clarify. At 5:45, since min_x is outside of max_u, we are indeed seeing different values of u depending on the value of y. But you'll notice that when y = 0 (i.e., when x is exactly on the circle, a property that the optimal x will satisfy), u can take any value it wants, and it won't matter since it is multiplied by 0 anyway. BUT, when you switch the order of the min and max, there is only one value of u that forces x to satisfy the constraint without penalizing it beyond that, and that value is 1/sqrt(2). Note that if the optimal x were inside the circle, or in other terms, if the optimal y was strictly positive, then u would be forced to take the value zero. This is in fact one of the ways to test if a constraint is "tight" or not: you look at the corresponding dual variable and check if it is equal to zero.
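For readers who want to verify these numbers themselves, here is a minimal sketch (assuming the cvxpy package is installed; the variable names are mine):
------------------------------------------------
import cvxpy as cp

x = cp.Variable(2)
circle = cp.sum_squares(x) <= 1        # the constraint y <= 0
prob = cp.Problem(cp.Minimize(cp.sum(x)), [circle])
prob.solve()

print(x.value)            # ~ [-0.7071, -0.7071]: y = 0, the constraint is tight
print(circle.dual_value)  # ~ 0.7071 = 1/sqrt(2), the optimal u
------------------------------------------------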
@almerchant7595 • 2 years ago
@@VisuallyExplained Thanks for the example. Would it be fair to say that since there exists an x inside the circle with y < 0 (i.e., x is a strictly feasible point), we can conclude that strong duality holds (Slater's constraint qualification) => "complementary slackness" can be assumed, i.e., either u = 0 or y = 0 (the constraint is "active")?
@cormackjackson9442 • 2 years ago
@@almerchant7595 Super helpful question and answer. But I feel like there is a typo in the author's response? Surely if the constraint were slack, i.e. x₁² + x₂² < 1, then y = x₁² + x₂² - 1 < 0, i.e. y is strictly negative, not positive.
@arccos0160 • 2 years ago
Elegant explanation! It should be recommended to anyone who wants to learn optimization theory.
@jayantnema9610 • 2 years ago
this is the greatest thing I have ever seen! sooo good!!!!
@stlaurent26 • 2 years ago
Making a complex math concept simple ... well done!
@brahimerraji9332 • 2 years ago
At 16:54, I think there is a typo: it should be grad of (f(x) + t log(-g(x))) instead of minus.
@jameschen2308 • 2 years ago
I was about to say
@VisuallyExplained • 2 years ago
Thanks for watching carefully. I thought so too at the beginning, but then I realized that grad log(-g) is equal to grad_g/g, and not -grad_g/g.
@brahimerraji9332 • 2 years ago
@@VisuallyExplained Duh 😅😅
@behnamplays • 3 years ago
I don't comment much on YT, but this series is awesome! Thanks, and keep up the good work!
@sudelal2144 • 11 months ago
thank you for the amazing background music! It helps me sleep immediately.
@greatuser2560 • 3 years ago
thank you, this series of videos helped me a lot to understand and deepen my knowledge of these concepts. keep up the great work
@VisuallyExplained • 3 years ago
You're very welcome!
@Darkev77 • 3 years ago
At 13:48, why should the scalar "u" (Lagrange multiplier) be non-negative? Shouldn't it actually be negative? Fantastic video!
@VisuallyExplained • 3 years ago
Thanks! "u" should be nonnegative because there is a minus sign in front of it. Overall, we want grad_f and grad_g to be proportional with a negative coefficient (what the video loosely calls "inversely proportional").
@Darkev77 • 3 years ago
@@VisuallyExplained Hey, thanks a lot for the clarification. Is it *necessary* for df and dg to be *inversely* proportional? I thought in Lagrange multipliers their gradients would point along the same direction (share a tangent point) and hence be proportional (though the multiplier can absorb the sign accordingly). Where's the flaw in my intuition? Also, in most literature it's "g_i(x) \le 0" and "u_i \ge 0" accordingly. Would having "g_i \ge 0" and "u \le 0" work as well?
@VisuallyExplained • 3 years ago
@@Darkev77 Yes, it is necessary; otherwise you could move x a little bit in a direction that makes both g and f decrease, so you would make f even smaller without violating g(x) \le 0. And yes, "min" + ">=" gives u \le 0.
@Darkev77 • 3 years ago
@@VisuallyExplained Wow!!! Thank you so much, this is crystal clear. Do you happen to have a website where you post articles on optimization? Your explanations are really good (and I'm taking optimization right now and our professor is not so good :/). Thanks a lot for your time!
@VisuallyExplained • 3 years ago
@@Darkev77 Thanks for the nice words, I am glad this was helpful :-) I don't have anything like that for now, but maybe in the future...
@process6996 • 3 years ago
Amazing intuition behind KKT!
@nguyen7272 • 1 year ago
really amazing videos; visual and intuitive explanations are really important to me. thank you for making these great videos
@hobbylper6264 • 5 months ago
great video! thank you so much for explaining this topic in a very easy-to-understand way :)
@matthiashoffmann8814 • 3 years ago
Very nice video! I was looking forward to part 3 a lot. One question: you said the first KKT condition is for x to be feasible; shouldn't g(x) then be \le 0?
@VisuallyExplained • 3 years ago
Thank you! And you are right, "g(x) >= 0" is a typo.
@user-krnujdp • 1 year ago
Something I'm confused about: don't we only need to consider the active inequality constraints when determining a new search direction? So in your example at 11:40, wouldn't we just ignore g(x) since it is strictly satisfied, i.e. g(x) < 0? The local optimum we seek might be on the edge of the problem domain anyway, where g(x) = 0 or nearby, so there is no reason to avoid the boundary by moving along -\del g if the constraint is still satisfied and -\del f(x) tells us the best local direction to go in order to improve the objective. What am I missing? Thanks! Great video!
@VisuallyExplained • 1 year ago
Maybe I can clear up the confusion. The condition "grad_f(x) = -u*grad_g(x)" at 11:40 does not tell you which direction you should follow to search for the optimal x; it simply states a condition that the optimal x should satisfy. If it happens that the optimal x is strictly inside the feasible region, then that corresponds to the special case u = 0. But your question is valid when we are searching for the optimal x and we consider new search directions, for example around 18:00. Why do we even look at grad_g(x) if the current x is far from the boundary? Well, if you don't, and you greedily follow -grad_f(x), two things can happen: 1. You reach a point with zero gradient (grad_f(x) = 0). In this case, you are done! 2. You reach the boundary of the feasible set (where g(x) = 0). What do you do then? Intuitively, you want to "stay on the boundary" and somehow "follow -grad_f(x)". It's not clear how exactly to do that, and it might be numerically challenging. (But maybe still possible!) So instead, we choose to follow a combination of -grad_f(x) and grad_g(x), EVEN inside the feasible set, but we give more weight to grad_g(x) as we approach the boundary (we do this by dividing grad_g(x) by g(x)). The beauty of the interior point method is that this trade-off between the objective function f and the constraint g works beautifully in practice, and provably converges to the optimal solution.
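To make that trade-off concrete, here is a small sketch of the idea in Python (my own toy code: plain gradient descent on the barrier objective f(x) - t*log(-g(x)) instead of the Newton steps used in the video, just to keep it short):
------------------------------------------------
import numpy as np

def grad_barrier(x, t):
    # objective f(x) = x1 + x2, constraint g(x) = ||x||^2 - 1 <= 0
    g = x @ x - 1
    grad_f = np.array([1.0, 1.0])
    grad_g = 2 * x
    # grad of f - t*log(-g) is grad_f - t*grad_g/g; the barrier term
    # blows up as g -> 0, which is what keeps x away from the boundary
    return grad_f - t * grad_g / g

x, t, step = np.zeros(2), 1.0, 0.01
for _ in range(80):
    for _ in range(500):       # roughly minimize for the current t
        x = x - step * grad_barrier(x, t)
    t *= 0.95                  # then shrink t and repeat
print(x)  # ~ [-0.70, -0.70], approaching the optimum [-0.7071, -0.7071]
------------------------------------------------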
@abhijeetkumar1177 • 1 year ago
Could anyone please explain how the first condition becomes min(x1 + x2) at time 1:11?
@xiaowang4578 • 1 year ago
Love the video! Students should really start from your video rather than the standard textbooks!!!!!!
@hyukppen • 2 years ago
18:32 - I don't understand why Newton's method converges very slowly when t -> 0. Because our objective function is f - t*log(-g), even if t is equal to zero, the gradient of f(x) remains. Could you explain the slowness?
@VisuallyExplained • 2 years ago
Very good question! You are right that if we applied Newton with t = 0, we would be minimizing f(x), but that's not really what we want, because we want to find an "x" that satisfies the constraint g(x) \le 0.
@hyukppen • 2 years ago
@@VisuallyExplained Thank you for your reply! Do you mean that Newton converges very slowly not because the gradient is too small, but because the gradient is rather too big? (jumping too far around a stationary point, just like gradient descent with a large learning rate)
@VisuallyExplained • 2 years ago
@@hyukppen To be exact, since Newton is a second order method, what determines the speed of convergence is how fast the Hessian varies. For example, in the extreme case of quadratic functions, the hessian is constant and Newton converges in one step (no matter how big the gradient is). For our case, if t is too small, the hessian will vary too much near the boundary.
@hyukppen • 2 years ago
@@VisuallyExplained Aha! Based on your example, my simple IPM code (MATLAB) is as below:
------------------------------------------------
xk = [0; 0];   % start at the center of the disk
tl = 1;        % initial barrier parameter t
for l = 1:50
    tl1 = 0.8*tl;              % shrink t
    for k = 1:5                % a few Newton steps at the current t
        x1 = xk(1); x2 = xk(2); t = tl1;
        % gradient of x1 + x2 - t*log(1 - x1^2 - x2^2)
        g = [1 + t*(2*x1)/(-x1^2 - x2^2 + 1);
             1 + t*(2*x2)/(-x1^2 - x2^2 + 1)];
        % Hessian of the same barrier objective
        H = [t*(2*x1^2 - 2*x2^2 + 2)/(1 - x1^2 - x2^2)^2, t*4*x1*x2/(1 - x1^2 - x2^2)^2; ...
             t*4*x1*x2/(1 - x1^2 - x2^2)^2, t*(-2*x1^2 + 2*x2^2 + 2)/(1 - x1^2 - x2^2)^2];
        xk1 = xk - inv(H)*g;   % Newton step
        xk = xk1;
    end
    tl = tl1;
end
------------------------------------------------
If I set the initial t to be very small, inv(H) (the inverse of the Hessian) diverges while the gradient g stays at (1,1). I think this is the point you explained. Am I right?
@VisuallyExplained • 2 years ago
@@hyukppen Correct! But even the gradient will diverge when you get really close to the boundary.
@theberkeleyboss • 3 years ago
Great video. I've always wanted to see this topic visually explained. There's just one part I'm a bit confused about: In the dual problem, how do we know what value of "u" to try first? It sounds like we can take any value of "u", and then solve for the decision variables based on that value.
@VisuallyExplained • 3 years ago
Thank you Andy for you for your comment! I am not sure I understand what you mean by "what value of "u" to try first?". Maybe I can be more helpful if you point me to the timestamp in the video that is confusing to you.
@theberkeleyboss • 3 years ago
@@VisuallyExplained At 5:38, I'm wondering why we are considering all the values of u and taking the maximum of the corresponding linear penalties? If we're finding the max, wouldn't the maximum of (u x y) always be infinity?
@VisuallyExplained • 3 years ago
@@theberkeleyboss I see. Indeed, max u*y would be infinity, unless y \le 0. (That's what I try to explain at 5:49.) But in any case, the optimal "u" is +infinity anyway. At this point, you are right that it looks kind of silly to consider all the values of "u" when we know that the optimal u is equal to infinity. The reason we do it, however, is that (i) there is no harm in doing it, as we are not changing the optimal value, and (ii) we gain a lot once we switch the order of the min and max. And note that once we switch the min and max, it is no longer optimal to take u = infinity (at 7:30, for example, the optimal u is sqrt(2)/2). Let me know if this doesn't make sense, and I can expand a bit more.
@theberkeleyboss • 3 years ago
@@VisuallyExplained After watching this video about 10 times, I finally get primal/dual problem differences. This is a difficult and somewhat new topic for me, but I learn something new every time I watch it. I'm down to the last concept that's giving me an issue: Around 13:10, you said, "and you might wonder if this condition is also sufficient in the sense that if for some point x you succeed in finding a scalar u that makes this condition hold, then that makes x an optimal solution ... this implication is not quite true." Can you give a brief numeric example of when this would be true? I think I know what you mean, but I need to see an example. I'm guessing this probably relates to the complementary slackness condition.
@VisuallyExplained • 3 years ago
@@theberkeleyboss I am glad this was helpful. I know this topic is very hard to grasp at the beginning (which is why I made this series in the first place). Let me just say that your feedback is extremely valuable to me for making new videos, so thanks for that! For the statement at 13:10, a trivial example would be an unconstrained optimization problem (i.e., f convex and g = 0). In that case, x is an optimal solution iff grad_f(x) = 0 iff grad_f(x) + u grad_g(x) = 0 (simply because grad_g = 0). Or maybe you wanted a counterexample to that statement? In any case, the implication becomes true if you add complementary slackness as you pointed out (i.e., u*g(x) = 0) AND primal and dual feasibility (i.e., g(x) \le 0 and u >= 0). All these conditions together (the gradient condition + complementary slackness + primal and dual feasibility) are the KKT conditions. (See slide 10 of www.cs.cmu.edu/~ggordon/10725-F12/slides/16-kkt.pdf for a proof)
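For reference, here are all four conditions in one place, using the sign convention g(x) \le 0 from the pinned correction:
grad_f(x) + u * grad_g(x) = 0   (stationarity)
u * g(x) = 0                    (complementary slackness)
g(x) \le 0                      (primal feasibility)
u \ge 0                         (dual feasibility)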
@alawimidher5899 • 3 years ago
At 19:14, the assumption made is that since t is very large, the log term dominates; but as we decrease t in later iterations, won't that make the assumption of a "dominant log term" invalid?
@VisuallyExplained • 3 years ago
That is very true, but that's actually a good thing. As you make t small, the dominant term becomes f(x), which is the function that we want to minimize. Keep in mind though, that as long as "t" is not exactly zero, the term -t*log(-g) equals infinity on the boundary of the feasible set, which is the reason why x(t) never crosses that boundary and remains feasible at all times.
@SumanKumar-rf1xb • 1 year ago
Excellent explanation. Can you also explain the weak duality theorem in detail? Why does the dual problem give a lower bound on the primal? Is the penalty function the same as the Lagrange multiplier?
@chadwicklin2091 • 7 months ago
Just something of an aside, but I really like the background music. It gives me a sense of calm and peace for learning.
@nathant9510 • 3 years ago
Amazing video! Videos like this make me excited to learn; keep it up!
@leticiapiuccomarques7747 • 2 years ago
Very clear and concise explanation. Thank you so much.
@phogbinh • 3 years ago
Subscribed. Your content is invaluable to CS grad students.
@PokHoTse • 1 year ago
My professor did teach quite well, but I was missing some intuitive visualizations. It all makes sense now thanks to your video!
@-T.K.- • 7 months ago
Awesome video! This is very, very helpful (as I'm taking the convex optimization class exam tomorrow...). However, I am a bit confused around 6:30. You mentioned that the minimizer x goes first and the maximizer u goes second in the expression at 6:45. I think in mathematics the expression is evaluated inside-first? So in this case the inner part, the maximizer u, would be the first player, and the minimizer x would be the second. I'm not sure if I understand this correctly...
@pabloo.o1912 • 1 year ago
Thank you very much for this video!! I'm just getting started with semidefinite programming
@martinsanchez-hw4fi • 2 years ago
Awesome video, but at 12:50 it is not that the gradients are inversely proportional, only that they are proportional with a negative constant.
@VisuallyExplained • 2 years ago
Correct! I should have been more precise. I guess it depends on which operation one has in mind when talking about "inverse". If it's "*", then "inversely proportional" is "y = C/x", if it's "+", then it's "y=-C x". You are right that the former is much more widely used, but since grad f and grad g are both vectors, only the latter interpretation is possible here.
@zyzhang1130 • 2 years ago
Extremely helpful, tysm. Two questions though: 1. At 7:06, why is that quantity convex? Because if you take the max over u, it is no longer linear, right? 2. At 16:07, if we know u*g(x) = 0, we can argue case by case (i.e., when u = 0 and when g(x) = 0). Why wouldn't that work?
@jianliu5258 • 2 years ago
Fantastic! However, at 9:38, why does the dual function give a lower bound on the primal?
@nivethanyogarajah1493 • 3 years ago
Holy?! The NIP derivation makes so much sense!!
@blackguardian89 • 1 year ago
Amazing video! Thank you! I hope there will be a lot more optimization videos from you in the future!
@bouchraer-rabbany4370 • 1 year ago
It was genuinely helpful. Thank you for the insightful teaching
@mariestrand9918 • 2 months ago
One thing I'm confused about at 6:45: wouldn't "max" be evaluated before "min", since it stands closer to the function (or is it "embedded" inside "min")?
@qr-ec8vd • 2 years ago
Have you thought about covering the primal-dual interior point method? That would be great!
@alawimidher5899 • 3 years ago
At 17:33, why did you assume that the log of a negative number is minus infinity? I realize that the log of a negative number is complex and the domain of minimization is the real numbers, but is there something else I am missing for it to be minus infinity?
@VisuallyExplained • 3 years ago
That's a good question. In optimization, we like to pretend that the "-log" function takes the value +infinity when its argument is negative, because that makes it an ideal penalty function. In practice, that does not really have a big importance because we never actually feed it a negative number anyway. What actually matters is that -log(x) --> +infinity as x approaches zero from the right. What happens beyond zero is not relevant, but it is convenient to imagine it is +infinity.
@alawimidher5899 • 3 years ago
@@VisuallyExplained A very clear answer; thank you for the excellent video and for your much-appreciated commitment to answering our questions.
@hanabenrabah93 • 1 year ago
Thank you for this quality content ! Best explanation on this topic
@sanmore101 • 3 years ago
thanks mate. was having some doubts after going through Boyd's book... now many things are clear
@VisuallyExplained • 3 years ago
You are most welcome
@uede • 3 years ago
Wonderfully explained, I am in awe
@VisuallyExplained • 3 years ago
Happy to hear that!
@aswathik4709 • 2 years ago
You put so much effort in. Subscribed right away!!
@cormackjackson9442 • 2 years ago
This really is an amazing series! Well done. If possible, a quick answer to this question would be amazing. At 5:55: "We can take the maximum outside since x1 + x2 does not depend on u." I'm confused about how we are able to take the max out; it feels like our choice of x1 and x2 does depend on u, and hence x1 + x2 depends on u. My thoughts at this point in the video... 1) Our function Max(u•y) depends on the constraint y, yes? Because if y is negative we take u to be 0; if y is positive we take u to be infinity. 2) Minimising our whole function with respect to x1 and x2 depends on x1, x2, and u, yes? Because if u = 0 we can just pick negative-infinity values for x1 and x2, but if u is positive, we should pick values of x1 and x2 that satisfy the constraint. 3) So then how can we say x1 + x2 does not depend on u?
@VisuallyExplained • 2 years ago
Thanks, Cormack! Let me try to answer your question. I think your confusion comes from the order of the operations "max" and "min". 1) Max_u(u•y) does not depend on the constraints on y. It is a function of y; it has no knowledge of the constraints we want to impose on y. You plug in a value of y, you get a value. 2) When we take the min, we suppose that we have already taken the max w.r.t. u, so the function inside the min depends only on x1 and x2; the variable u doesn't exist anymore at this point. 3) See 2) above. Here is a different explanation: let's say we want to evaluate min_x x1 + x2 + max_u u(x1^2+x2^2-1), and let me call the function inside the min f(x1, x2), so f(x1, x2) = x1 + x2 + max_u u(x1^2+x2^2-1), and the original problem is equivalent to min_x f(x1,x2). Note that, and this is important, f is a function of (x1, x2) only. This means that if you give me values for (x1, x2), I can give you the value of f(x1,x2). For example, let's say you gave me (x1=1, x2=3); then I need to evaluate f(1, 3) = 1 + 3 + max_u u(1^2+3^2-1), which is of course the same as max_u 1 + 3 + u(1^2+3^2-1), which is what happens at 5:55. Evaluating f(x1,x2) requires taking a max w.r.t. u, but the value of f(x1,x2) doesn't depend on any specific value of u. Now that we can evaluate f(x1, x2) for all values of (x1, x2), we can take the min w.r.t. (x1,x2), which means we find the couple (x1,x2) that makes f(x1,x2) as small as possible. Hope that answers your question.
@cormackjackson9442 • 2 years ago
@@VisuallyExplained OMG you actually replied same day. While working at Two Sigma, what are you, super man?! (I was so impressed by the Vid I checked your about hahah, very cool stuff!)
@cormackjackson9442 • 2 years ago
@@VisuallyExplained Hi! Thanks again so much for taking the time!! Even if you can't reply, writing this out has been very helpful. Based on your replies to myself and others, I have tried to break down exactly what I think the process is. Note: before, I wasn't saying "constraint on y"; I was saying y is the constraint (e.g., in your video they are the same green colour)...
1. y is our constraint, i.e., y(x) = x₁² + x₂² - 1 ≤ 0
2. We could define a function h(u, y(x)) = u•y = u(x₁² + x₂² - 1)
3. MAX_u≥0 (u•y) is saying MAX_u≥0 h(u, y(x)), which means: get the max value of u(x₁² + x₂² - 1) with respect to u, over values of u ≥ 0?
4. IMPORTANT (5:54): when maximising h(u, y(x)) we choose only u? Our x, and hence y(x), is already chosen, and u responds to it (as x moves first)?
  1. y > 0 ⇒ u = ∞
  2. y < 0 ⇒ u = 0
  3. y = 0 ⇒ u = anything
  4. (With these values we have recovered the original penalty function with our new penalty function: P(y) = MAX_u≥0 h(u, y(x)) = MAX_u≥0 (u•y))
5. When taking MAX_u≥0 (u•y), the reason we choose only u, rather than choosing x and u, is that, as "x goes first", u gets to respond to x, and hence our choice of u is based on already knowing x. The X player knows this and so chooses an x s.t. y = 0. Therefore here, u can be anything. (I have tried to draw on your comments to previous questions about this game order.)
6. Tying this to your original answer to my second point -- "2) When we take the min, we suppose that we have already taken the max w.r.t. u, so the function inside the min depends only on x1 and x2; the variable u doesn't exist anymore at this point." -- This is the case because, although it feels backwards, the notion of u going second means that its move is based on the information about x, and so we kind of compute u first, working backwards from x. I.e., working backwards from x, the second player u is computed first, with information about x (even though u is second), and so u is computed and no longer a variable. So, as you say, there remains no u anymore.
7. Because u is second, the choices of x₁ & x₂ do not depend on u, and so we can move the MAX_u≥0 outside.
8. IMPORTANT: now we change the order (why we are allowed to do this is not clear to me)??
9. Now that x goes second, our optimal values of x₁ & x₂ DO depend on u, hence the value at 7:15: x₁ = x₂ = -1/2u
10. The u player no longer responds to x, so again working backwards from the second mover x (to the first mover u), we plug x's optimal responses into the function, which in turn gives us the function in terms of u.
11. Maximising this value over u, we find that there is only one specific value of u that prompts that optimal response from x, and this is u = 1/√2
12. y(x) = 0 and u = 1/√2 > 0 tell us that the constraint is binding. This makes sense given that x₁ + x₂ is unbounded below, and hence relaxing the constraint should give us a smaller minimum.
13. (My understanding here is that this whole player-1/player-2 working backwards from the optimal move is the formal treatment we would expect in a formal game-theory setting.)
Have I missed anything important? Or, worse, have I missed the point completely?? Thanks again for the amazing help!
@yancomduvida • 2 years ago
Excellent video, as everyone has already pointed out! However, I have a question about minute 5:57: we are using the linear function u•y instead of the 0/+infinity penalty function. My question is the following: since we are going to maximize over u and it must be greater than or equal to 0, aren't we going to have the same problem as in the +infinity case? I understand that the previous one had only 2 possible values, 0 and +infinity, but now for every slight violation of the constraint the maximum of the function will be +infinity. How is that not the same problem as before?
@VisuallyExplained • 2 years ago
That is absolutely true. Up to that point, we have simply reformulated the 0-infinity penalty function in terms of a min-max, and we still have "the same problem as before". Things become interesting when we switch the order of the min and max. Now X goes second, so he can adapt to what "u" is. The higher (but still finite) "u" is, the closer the value of X will be to satisfying the constraints. In this case, even if you take the max w.r.t to "u", you will get a finite value, and not +infinity.
@yancomduvida • 2 years ago
@@VisuallyExplained Oh, now I see what you mean! If we find u first, there is no way it will be +infinity. But I think I'm still struggling to understand this exchange of min-max for max-min. For example, in the worked example we found the max u first, and only afterwards do we get x, which depends on the value of our u. I can't see how this effectively ensures that our constraint is going to be respected, since we did not know whether the chosen x would violate it. What I mean is that the primal problem, which minimizes over x and then maximizes over u, makes sense, since we can see the constraints being violated and u working to penalize them or not. However, for the dual problem it seems weird that we work out our penalty function before knowing what x is going to be. I think I need to deep-dive into this part; if you have anything that could help me tackle this misunderstanding, it would be really great!! I appreciate your response.
@adrianom • 24 days ago
At 9:45 you say that "in the dual problem, the X player, which tries to make our objective function small, goes second, and therefore has the advantage". But doesn't it actually go first, as you first solve the inner minimization and then the outer maximization?
@sebydocky5080 • 10 months ago
Exceptional video... it's so clear. Bravo!
@tomxiao • 2 years ago
Awesome video, thanks for making this available.
@ankitbioinfo • 10 months ago
Beautifully explained.
@brahimerraji9332 • 2 years ago
Awesome and illustrative, thank you.
@danielscott6302 • 1 year ago
Thank you very much for this video. It has been very helpful 😊. Keep it up 👍. I just have a question about the multipliers v_i of the "penalty functions" for the equality constraints h_i (at 8:31 in the video). Instead of taking the max over v_i >= 0, shouldn't we instead take the maximum over the |v_i| (the absolute values of the v_i's)? Considering the figure at 5:39 in the video, for constraints that require equality to zero (i.e., the h_i's), surely we would want to consider negative slopes too, in a way that makes the corresponding penalty function effectively +infinity for all values except where these constraints are equal to zero. Please let me know where I may be going wrong.
@liwei9136 • 1 month ago
At 9:24: if g_i \le 0 and h_i equals 0, then shouldn't u_j \le 0, and v_i have no bound, in order to maximize the objective?
@zarzavattzarzavatt9309 • 3 years ago
Thank you for the great explanation. Even I understood something :), but not why "because if g were positive, log(-g) would be plus infinity" at 17:28.
@VisuallyExplained • 3 years ago
Thanks! What I meant is that if you start with a negative g, and you start increasing it while tracking the value of log(-g), then log(-g) will start decreasing. And as soon as you hit g=0, log(-g) will be equal to -infinity. This is simply because log(y) tends to -infinity as y tends to 0 from the right.
@zarzavattzarzavatt9309 • 3 years ago
@@VisuallyExplained thx! i was confused by the sign, it's about the whole -log(-g) thing ...
@MrChieditel • 1 year ago
@@VisuallyExplained But zarzavatt is right: if g were positive, log(-g) would be undefined, not plus infinity. We're talking about minute 17:00. Maybe you meant "if g were positive, -log(g) would be plus infinity"... even though this also isn't completely true; it would only be the case for g → 0 from the right!
@sherifmostafa4922 • 3 years ago
Amazing series! Can you make a series on solving non-convex optimization problems using convex strategies?
@RajivSambasivan • 3 years ago
Actually, a simple exposition of a set of tricks to identify non-convexity would be a great start.