Thank you. I've learned a lot from Andrew Ng, but this explanation shows his genius: so far I have never found a more lucid one.
@creativeuser9086 a year ago
Question for the kind-hearted people who understood: in the geometric margin, y is either 1 or -1, right? But wasn't it mentioned before that y is either 1 or 0? I got a little mixed up on that.
@bucketofjava5754 a year ago
Yes, y is either 1 or -1. y being 1 or 0 was a convention of the other binary classification algorithms (like logistic regression).
@tariqkhan1518 a year ago
It is because we multiply y by (w^T x + b), so the sign of the product tells us the class of the data point.
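A small worked note for this thread, written out in the standard SVM notation from the lecture (nothing here beyond the usual definition of the functional margin):

```latex
% With labels y^{(i)} \in \{-1, +1\}, the functional margin of example i is
\[
\hat{\gamma}^{(i)} = y^{(i)}\left( w^{\top} x^{(i)} + b \right).
\]
% The product is positive exactly when the prediction w^{\top} x^{(i)} + b and
% the label y^{(i)} have the same sign, i.e. when example i is classified
% correctly, and larger values mean a more confident classification.
% With the \{0, 1\} labels used for logistic regression this sign trick would
% not work, which is why the SVM formulation switches to \{-1, +1\}.
```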
@mavaamusicmachine2241 2 years ago
Very clear, thank you for the explanation
@MuhammadAhsan-hq2bc 9 months ago
The audio is bad. I am relying on the captions to follow Andrew Ng.
@aramuradyan2138 a year ago
Where can I find the lecture notes for CS229?
@merlinm1892 a year ago
If you Google it, it pops up.
@haoranlee8649 11 months ago
I am coming
@_soundwave_ a year ago
What is that weird symbol he wrote during the L1 soft-margin SVM? He called it c-i, I guess.
@safiyajd a year ago
ξ (xi), a Greek letter representing the slack we allow for margin violations and misclassifications.
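To spell out where the ξ_i live, this is the L1 soft-margin objective in its standard form (as in the CS229 notes):

```latex
\[
\min_{w,\, b,\, \xi} \;\; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{m} \xi_i
\quad \text{s.t.} \quad
y^{(i)}\left( w^{\top} x^{(i)} + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0.
\]
% \xi_i is the slack for example i: \xi_i = 0 means the example meets the
% hard-margin constraint, 0 < \xi_i \le 1 means it sits inside the margin but
% is still correctly classified, and \xi_i > 1 means it is misclassified.
% C controls how heavily these violations are penalized.
```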
@soumyadeepsarkar2119 a year ago
1:00:00
@NANGUNOORISRIRAM 2 months ago
Where are the lecture notes?
@Adnan_19946 a month ago
cs229.stanford.edu/main_notes.pdf
@creativeuser9086 a year ago
What happened here? 17:51
@zer1230 a year ago
You fell asleep during the class
@mrpotatohed4 5 months ago
@@zer1230 close your eyes for 2 seconds and never catch back up
@SphereofTime 5 months ago
28:00
@KipIngram 2 years ago
Is it really hard to prove that w is a linear combination of the training samples? The training samples represent some set of vectors in a vector space. They might not span all of the space; in fact they can't span a subspace of dimension greater than N, where N is the number of training samples (and they might span less, if there's any linear dependence among them). But they span some subspace of dimension M. The vector w defines a hyperplane that divides that space into classes. Well, the training samples span that subspace, and therefore *any* vector in that subspace is a linear combination of them. That leaves the non-spanned dimensions to consider, but your training samples convey no information whatsoever about those, so there's nothing to do about them. I hope linear algebra was a prerequisite for this class.
@louisb8718 2 years ago
Unfortunately your proof is incorrect. You have to remember that the hyperplane is defined as w^T x + b = 0 (it is affine). You need to prove that the w that solves the optimization problem lies in this span.
@JohnDoe-nz3rn a year ago
"Is it really hard to prove that w is a linear combination of the training samples?" Well, let's use our critical thinking skills to answer this question. Given that the professor, multiple times, says that the result is too difficult to prove in this lecture, and he says the lecture notes prove the result (called Representer Theorem), and that upon looking at the notes or Googling this theorem and looking on Wikipedia, you can see for yourself that the proof is really much more involved than what you are making it out to be: maybe it IS that hard. And maybe your failed attempt to convert his drawings for intuition #2 and intuition #3 into a proof is not actually a genius, shortcut proof that he and everybody else somehow missed out on.
@timgoppelsroeder121 a year ago
The intuition seems correct, but this is hardly a mathematical proof.
@kendroctopus a year ago
He did say this; it was essentially intuition #2.
@mohakkhetan 8 months ago
@@timgoppelsroeder121 Genuine question: why is it not a proper proof and just an intuition?
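For anyone following this thread, here is a sketch of the step that turns the intuition into math; it comes from the Lagrangian of the hard-margin problem (the lecture notes do this carefully, and the fully general statement is the Representer Theorem):

```latex
% Hard-margin primal:  \min_{w,b} \tfrac{1}{2}\lVert w \rVert^{2}
%                      s.t.  y^{(i)}\left( w^{\top} x^{(i)} + b \right) \ge 1.
% Its Lagrangian, with multipliers \alpha_i \ge 0, is
\[
\mathcal{L}(w, b, \alpha) = \frac{1}{2}\lVert w \rVert^{2}
  - \sum_{i=1}^{m} \alpha_i \left[ y^{(i)}\left( w^{\top} x^{(i)} + b \right) - 1 \right].
\]
% Setting the gradient with respect to w to zero gives
\[
\nabla_w \mathcal{L} = w - \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)} = 0
\;\;\Longrightarrow\;\;
w = \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)},
\]
% so the optimal w lies in the span of the training examples. Making this a
% rigorous proof still requires justifying strong duality / the KKT conditions,
% which is why it is more than the geometric intuition alone.
```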
@posthocprior a year ago
This is a confusing lecture.
@artemkondratyev2805 a year ago
You might try the MIT version of it: kzbin.info/www/bejne/lYHamZyNra1-btE; it really helped me.
@jumblemaksotov3910 a month ago
It is so scary to watch these lectures with this bad sound quality, as an ad may pop up and blow your ears out 😂😂
@ruvikm8788 2 years ago
He seems to be explaining to himself.
@Emanuel-oz1kw 3 months ago
18:09
@Vikermajit a year ago
Poor audio
@ariakiz7532 2 years ago
hi
@creativeuser9086 a year ago
Kernel
@KipIngram 2 years ago
The stuff at the beginning here seems silly to me. We're solving for a LINE in the classification space; it's just standard knowledge that the equation of a line can be multiplied by anything and still be the equation of the same line, so it seems wasteful to devote so much attention to that idea. It's something that should just be taken for granted, to my mind. And if you don't move the line, you haven't changed the classifier. Obvious. I feel like a good bit of what I'm seeing here is "mathematically overdressed." Most of the basic ideas we're talking about are simple applications of elementary probability theory. There's really no need to layer them with more than the minimum of mathematical notation. Obviously you want to state things correctly and mathematically, but once you've done that there's no need for further "decoration." If it's taken too far it starts to make those simple ideas more opaque, and that certainly doesn't seem like a good idea.
Edit: The rest of the lecture (kernel trick) *more* than made up for the first part. 🙂
@ericchen6648 a year ago
The point of the scaling comments wasn't that scaling the parameters of the line does not change the line. Rather, it was used to build intuition for how to convert the non-convex, intractable optimization problem into another form that can be solved with Lagrange multipliers.
@JohnDoe-nz3rn a year ago
A) He spends from 7:00 to 8:30 explaining that scaling the vector w and the real number b by the same constant does not change the line. You say it's "wasteful to apply so much attention to that idea", but it's not necessarily obvious to everyone, and he's just going slowly so that everybody can follow; he's not spending that much time, really.
B) For the rest of the introduction, he is talking about a SPECIFIC clever choice of scaling (which makes ||w|| = 1/gamma) that converts the problem from something previously difficult to solve into something solvable. You entirely missed the point of the scaling talk, despite him repeatedly stating the point in lecture 6 and here in lecture 7. None of this has to do with a "simple application of elementary probability theory", and the notation he uses is already as simple as it can be to express the idea. If anything, he is simplifying a lot of the math involved and not deriving everything in detail (he points to extra resources such as the lecture notes for those interested in reading further). You might benefit from rewatching the lectures and paying more attention.
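To make the point about that specific choice of scaling concrete, this is the chain of reformulations the lecture walks through (standard derivation, as in the CS229 notes):

```latex
% Maximize the geometric margin directly:
\[
\max_{\gamma,\, w,\, b} \;\; \gamma
\quad \text{s.t.} \quad
y^{(i)}\left( w^{\top} x^{(i)} + b \right) \ge \gamma, \qquad \lVert w \rVert = 1.
\]
% The constraint \lVert w \rVert = 1 makes this non-convex. Because (w, b) can
% be rescaled without moving the decision boundary, choose the scaling that
% sets the functional margin to 1; the geometric margin is then
% \gamma = 1 / \lVert w \rVert, and maximizing it is equivalent to
\[
\min_{w,\, b} \;\; \frac{1}{2}\lVert w \rVert^{2}
\quad \text{s.t.} \quad
y^{(i)}\left( w^{\top} x^{(i)} + b \right) \ge 1,
\]
% which is a convex quadratic program with linear constraints and can be
% handed to a standard QP solver (or attacked via the dual and Lagrange
% multipliers, which is where the kernel trick enters).
```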