Thank you. I've learned a lot from Andrew Ng, but this explanation shows his genius: so far I have never found a more lucid one.
@creativeuser9086 a year ago
Question for the kind-hearted people who understood: in the geometric margin, y is either 1 or -1, right? But wasn't it mentioned before that y is either 1 or 0? I got a little mixed up on that.
@bucketofjava5754 a year ago
Yes, y is either 1 or -1. y being 1 or 0 was a convention of the other binary classification algorithms (like logistic regression).
@tariqkhan1518 a year ago
It is because we multiply y by (w^T x + b), so the sign of the product tells us the class of the data point.
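A small worked note for this thread, written out in the standard SVM notation from the lecture (nothing here beyond the usual definition of the functional margin):

```latex
% With labels y^{(i)} \in \{-1, +1\}, the functional margin of example i is
\[
\hat{\gamma}^{(i)} = y^{(i)}\left( w^{\top} x^{(i)} + b \right).
\]
% The product is positive exactly when the prediction w^{\top} x^{(i)} + b and
% the label y^{(i)} have the same sign, i.e. when example i is classified
% correctly, and larger values mean a more confident classification.
% With the \{0, 1\} labels used for logistic regression this sign trick would
% not work, which is why the SVM formulation switches to \{-1, +1\}.
```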
@mavaamusicmachine2241 2 years ago
Very clear, thank you for the explanation
@MuhammadAhsan-hq2bc 9 months ago
The audio is bad. I am relying on the captions to follow Andrew Ng.
@aramuradyan2138 a year ago
Where can I find the lecture notes for CS229?
@merlinm1892 a year ago
If you Google it, it pops up.
@haoranlee8649 11 months ago
I am coming
@_soundwave_ a year ago
What is that weird symbol he wrote during the L1 soft-margin SVM? He called it c-i, I guess.
@safiyajd a year ago
ξ (xi), a Greek letter representing the slack we allow for margin violations and misclassifications.
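To spell out where the ξ_i live, this is the L1 soft-margin objective in its standard form (as in the CS229 notes):

```latex
\[
\min_{w,\, b,\, \xi} \;\; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{m} \xi_i
\quad \text{s.t.} \quad
y^{(i)}\left( w^{\top} x^{(i)} + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0.
\]
% \xi_i is the slack for example i: \xi_i = 0 means the example meets the
% hard-margin constraint, 0 < \xi_i \le 1 means it sits inside the margin but
% is still correctly classified, and \xi_i > 1 means it is misclassified.
% C controls how heavily these violations are penalized.
```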
@soumyadeepsarkar2119 a year ago
1:00:00
@NANGUNOORISRIRAM 2 months ago
Where are the lecture notes?
@Adnan_19946 a month ago
cs229.stanford.edu/main_notes.pdf
@creativeuser9086 a year ago
What happened here? 17:51
@zer1230 a year ago
You fell asleep during the class
@mrpotatohed4 5 months ago
@@zer1230 close your eyes for 2 seconds and never catch back up
@SphereofTime 5 months ago
28:00
@KipIngram 2 years ago
Is it really hard to prove that w is a linear combination of the training samples? The training samples represent some set of vectors in a vector space. They might not span all of the space; in fact they can't span a subspace of dimension greater than N, where N is the number of training samples (and they might span less, if there's any linear dependence among them). But they span some subspace of dimension M. The vector w defines a hyperplane that divides that space into classes. Well, the training samples span that subspace, and therefore *any* vector in that subspace is a linear combination of them. That leaves the non-spanned dimensions to consider, but your training samples convey no information whatsoever about those, so there's nothing to do about them. I hope linear algebra was a prerequisite for this class.
@louisb8718 2 years ago
Unfortunately your proof is incorrect. You have to remember that the hyperplane is defined as w^T x + b = 0 (it is affine). You need to prove that the w that solves the optimization problem lies in this span.
@JohnDoe-nz3rn a year ago
"Is it really hard to prove that w is a linear combination of the training samples?" Well, let's use our critical thinking skills to answer this question. Given that the professor, multiple times, says that the result is too difficult to prove in this lecture, and he says the lecture notes prove the result (called Representer Theorem), and that upon looking at the notes or Googling this theorem and looking on Wikipedia, you can see for yourself that the proof is really much more involved than what you are making it out to be: maybe it IS that hard. And maybe your failed attempt to convert his drawings for intuition #2 and intuition #3 into a proof is not actually a genius, shortcut proof that he and everybody else somehow missed out on.
@timgoppelsroeder121 a year ago
The intuition seems correct, but this is hardly a mathematical proof.
@kendroctopus a year ago
He did say this; it was essentially intuition #2.
@mohakkhetan 8 months ago
@@timgoppelsroeder121 Genuine question: why is it not a proper proof and just an intuition?
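For anyone following this thread, here is a sketch of the step that turns the intuition into math; it comes from the Lagrangian of the hard-margin problem (the lecture notes do this carefully, and the fully general statement is the Representer Theorem):

```latex
% Hard-margin primal:  \min_{w,b} \tfrac{1}{2}\lVert w \rVert^{2}
%                      s.t.  y^{(i)}\left( w^{\top} x^{(i)} + b \right) \ge 1.
% Its Lagrangian, with multipliers \alpha_i \ge 0, is
\[
\mathcal{L}(w, b, \alpha) = \frac{1}{2}\lVert w \rVert^{2}
  - \sum_{i=1}^{m} \alpha_i \left[ y^{(i)}\left( w^{\top} x^{(i)} + b \right) - 1 \right].
\]
% Setting the gradient with respect to w to zero gives
\[
\nabla_w \mathcal{L} = w - \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)} = 0
\;\;\Longrightarrow\;\;
w = \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)},
\]
% so the optimal w lies in the span of the training examples. Making this a
% rigorous proof still requires justifying strong duality / the KKT conditions,
% which is why it is more than the geometric intuition alone.
```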
@posthocprior a year ago
This is a confusing lecture.
@artemkondratyev2805 a year ago
You might try the MIT version of it: kzbin.info/www/bejne/lYHamZyNra1-btE; it really helped me.
@jumblemaksotov3910 a month ago
It is so scary to watch these lectures with this bad sound quality, as an ad may pop up and blow your ears out 😂😂
@ruvikm8788 2 years ago
He seems to be explaining to himself.
@Emanuel-oz1kw 3 months ago
18:09
@Vikermajit a year ago
Poor audio
@ariakiz7532 2 years ago
hi
@creativeuser9086 a year ago
Kernel
@KipIngram 2 years ago
The stuff at the beginning here seems silly to me. We're solving for a LINE in the classification space; it's just standard knowledge that the equation of a line can be multiplied by anything and still be the equation of the same line, so it seems wasteful to devote so much attention to that idea. It's something that should just be taken for granted, to my mind. And if you don't move the line, you haven't changed the classifier. Obvious. I feel like a good bit of what I'm seeing here is "mathematically overdressed." Most of the basic ideas we're talking about are simple applications of elementary probability theory. There's really no need to layer them with more than the minimum of mathematical notation. Obviously you want to state things correctly and mathematically, but once you've done that there's no need for further "decoration." If it's taken too far it starts to make those simple ideas more opaque, and that certainly doesn't seem like a good idea.
Edit: The rest of the lecture (kernel trick) *more* than made up for the first part. 🙂
@ericchen6648 a year ago
The point of the scaling comments wasn't that scaling the parameters of the line does not change the line. Rather, it was used to build intuition for how to convert the non-convex, intractable optimization problem into another form that can be solved with Lagrange multipliers.
@JohnDoe-nz3rn a year ago
A) He spends from 7:00 to 8:30 explaining that scaling the vector w and the real number b by the same constant does not change the line. You say it's "wasteful to apply so much attention to that idea", but it's not necessarily obvious to everyone, and he's just going slowly so that everybody can follow; he's not spending that much time, really.
B) For the rest of the introduction, he is talking about a SPECIFIC clever choice of scaling (which makes ||w|| = 1/gamma) that converts the problem from something previously difficult to solve into something solvable. You entirely missed the point of the scaling talk, despite him repeatedly stating the point in lecture 6 and here in lecture 7. None of this has to do with a "simple application of elementary probability theory", and the notation he uses is already as simple as it can be to express the idea. If anything, he is simplifying a lot of the math involved and not deriving everything in detail (he points to extra resources such as the lecture notes for those interested in reading further). You might benefit from rewatching the lectures and paying more attention.
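To make the point about that specific choice of scaling concrete, this is the chain of reformulations the lecture walks through (standard derivation, as in the CS229 notes):

```latex
% Maximize the geometric margin directly:
\[
\max_{\gamma,\, w,\, b} \;\; \gamma
\quad \text{s.t.} \quad
y^{(i)}\left( w^{\top} x^{(i)} + b \right) \ge \gamma, \qquad \lVert w \rVert = 1.
\]
% The constraint \lVert w \rVert = 1 makes this non-convex. Because (w, b) can
% be rescaled without moving the decision boundary, choose the scaling that
% sets the functional margin to 1; the geometric margin is then
% \gamma = 1 / \lVert w \rVert, and maximizing it is equivalent to
\[
\min_{w,\, b} \;\; \frac{1}{2}\lVert w \rVert^{2}
\quad \text{s.t.} \quad
y^{(i)}\left( w^{\top} x^{(i)} + b \right) \ge 1,
\]
% which is a convex quadratic program with linear constraints and can be
% handed to a standard QP solver (or attacked via the dual and Lagrange
% multipliers, which is where the kernel trick enters).
```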