Lecture 4 - Perceptron & Generalized Linear Model | Stanford CS229: Machine Learning (Autumn 2018)

273,122 views

Stanford Online

4 years ago

For more information about Stanford’s Artificial Intelligence professional and graduate programs, visit: stanford.io/ai
Anand Avati
PhD Candidate and CS229 Head TA
To follow along with the course schedule and syllabus, visit:
cs229.stanford.edu/syllabus-au...

Comments: 78
@badassopenpolling 1 year ago
In my view, real examples were missing from this lecture. Examples help clarify the ideas.
@PingCheng-wf2pb 3 months ago
I think this lecture is well organized and easy to follow, which helped me immensely. I started by reading the notes, but I still didn't understand GLMs. After checking out the comments here, I decided to watch Ng's video (the 2008 version). Unfortunately, it was pretty much like the notes. Then I went back to this video and found it amazing! For instance, the conclusion about the "learning update rule" is handy but missing from the notes. Also, the explanation of the "assumptions/design choices" is clearer than in the notes, which gives me a more concrete feel. The examples around the 59-minute mark are also incredibly great. I hope you enjoy this video and don't get swayed by the negative comments.
@samurai_coach 1 year ago
51:40 Memo: in conclusion, just pick the distribution (Gaussian, Bernoulli, and so on) that matches the type of data you are modeling, and plug in the corresponding value of h_theta(x) to train the model.
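A minimal cheat-sheet version of that memo, using the canonical response functions from the CS229 GLM recipe (assuming the natural parameter is the linear score η = θᵀx):

```latex
% Gaussian y  =>  linear regression:
h_\theta(x) = \mathbb{E}[y \mid x] = \eta = \theta^T x
% Bernoulli y  =>  logistic regression:
h_\theta(x) = \mathbb{E}[y \mid x] = \frac{1}{1 + e^{-\theta^T x}}
```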
@wingedmechanism 4 months ago
For anyone who needs it, the updated link for the notes is here: cs229.stanford.edu/lectures-spring2022/main_notes.pdf
@Dimi231 3 months ago
Have you seen the PS1 homework they have to do anywhere? The programming and math that they upload on scope?
@vedikagupta407 23 days ago
Thank you very much :)
@samurai_coach 1 year ago
59:30 A good question that sums up what he's been explaining in the lecture.
@dimensionentangled4514 2 years ago
We begin by learning about perceptrons. This is motivated by the previous discussions on logistic regression, where we use the sigmoid function. In the case of perceptrons, we use a modified function in place of the sigmoid. Next, we look at what exponential families are, along with some related examples; this is more of a statistics topic. Next, we learn about GLMs (Generalised Linear Models); the appearance of the sigmoid function in logistic regression becomes apparent from this discussion. Finally, we study softmax regression via the use of cross-entropy (defined therein).
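A minimal sketch of the perceptron step described in that summary (assuming the hard-threshold hypothesis and update rule from the CS229 notes; variable names are illustrative):

```python
import numpy as np

def step(z):
    # The perceptron swaps the sigmoid for a hard threshold: g(z) = 1{z >= 0}.
    return 1.0 if z >= 0 else 0.0

def perceptron_update(theta, x, y, lr=0.1):
    # Same update form as logistic regression, but with the step function:
    # theta := theta + lr * (y - h_theta(x)) * x
    h = step(theta @ x)
    return theta + lr * (y - h) * x

# Toy usage: one update on an example labeled 0 that theta = 0 misclassifies as 1.
theta = np.zeros(3)
x = np.array([1.0, 2.0, -1.0])  # first entry plays the role of the intercept
y = 0.0
theta = perceptron_update(theta, x, y)
print(theta)  # [-0.1 -0.2  0.1]
```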
@McAwesomeReaper 7 months ago
Perceptron is one of my favorite Autobots.
@judychen9693 4 months ago
People are too harsh on this lecturer. Even though he's a senior student, he's still learning; he hasn't got decades of experience like Dr. Ng. I think he did a fine job delivering this lecture.
@RakibHasan-ee2cd 4 months ago
He was actually quite clear for two-thirds of the lecture, except maybe at the start.
@PingCheng-wf2pb 3 months ago
42:00 "Given an x, we get an exponential family distribution, and the mean of that distribution will be the prediction that we make for a given new x"!!!!!
@alpaslankurt9394 4 months ago
How can we find the lecture notes? Is there any chance I can get them? The lecture notes page says 404 Not Found.
@McAwesomeReaper 7 months ago
Hopefully in the last 5 years someone at Stanford took it upon themselves to attach some magnets to the bottom of these panes.
@mrpotatohed4 11 months ago
Great lecture, really helped clarify GLMs for me.
@user-pf8pe2ed1y 11 months ago
At about 1:07: "These ys and xs would have been sampled." I thought that, for sufficient statistics, the Bernoulli distribution would not need to be sampled; it is assumed to have enough data, as a GLM?
@saraelshafie 3 months ago
How do I get the updated lecture notes? It gives a 404 page on the Stanford website.
@finnfinn2002 1 month ago
cs229.stanford.edu/main_notes.pdf
@closingtheloop2593 4 months ago
Him writing an expression saying "sum of class triangle, square, circle" is comedic gold. I died. 1:21:00
@MAS-cz4mf 4 months ago
At 9:14, it should be theta transpose that is perpendicular to the line.
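For context, the geometry behind that observation is a one-line check: any two points on the decision boundary θᵀx = 0 differ by a vector orthogonal to θ, so θ itself is perpendicular to the boundary:

```latex
\theta^T x_1 = 0,\;\; \theta^T x_2 = 0
\;\Longrightarrow\;
\theta^T (x_1 - x_2) = 0
```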
@OK-lj5zc 6 months ago
OMG, this course.
Lectures 1-3: easy, breezy positivity with Andrew Ng.
Lecture 4: getting hit in the head with a textbook.
Hope it doesn't keep escalating like this...
@user-ut3fk8gw3t 4 months ago
At 26:36 in the video there is a mistake in the expression for phi: the numerator should be e to the power eta, not 1. Just cross-multiply the fractions to check.
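For reference, inverting the Bernoulli natural parameter gives two equivalent forms, which is an easy way to check which numerator goes with which denominator:

```latex
\eta = \log\frac{\phi}{1-\phi}
\;\Longrightarrow\;
\phi = \frac{e^{\eta}}{1+e^{\eta}} = \frac{1}{1+e^{-\eta}}
```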
@projectbravery 1 year ago
This was really good; it did take a while to get through compared to the previous lectures with Andrew.
@calebvantassel1936 1 year ago
How? Its duration is within five minutes of the other lectures.
@fahyen6557 1 year ago
@@calebvantassel1936 What the hell are you talking about?
@UsmanKhan-tb5zy 9 months ago
@@fahyen6557 The duration of the lecture is almost the same.
@ian-haggerty 2 months ago
Does anyone have the problem sets available?
@Minato-gn1tz 1 year ago
After the probabilistic interpretation topic in the last lecture, everything has just gone over my head. Can anyone please suggest a good resource for learning statistics and probability at this level?
@TusharAnandfg 10 months ago
Look up some books used in first-year probability and statistics courses.
@anubhavkumarc 5 months ago
It's a master's-level course, so it makes sense that it assumes you know undergrad statistics. Probably go through some undergrad stats courses.
@anubhavkumarc 5 months ago
I'll add that they list prerequisites on the course website, with specific courses mentioned, so go through that; it'll help.
@rahulpadhy6325 1 year ago
What is the practical use of the properties explained in exponential families?
@fahyen6557 1 year ago
Finding the expected value and variance, duh.
@anubhavkumarc 5 months ago
Optimizing the likelihood is much easier in exponential families (that is, you train the model more easily), and the expectation (that is, our hypothesis/prediction) and the variance are also much easier to find computationally (because derivatives are in general less computationally expensive than integrals).
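Concretely, for an exponential family with sufficient statistic T(y) = y and log-partition a(η), the mean and variance that comment mentions come from derivatives rather than integrals:

```latex
\mathbb{E}[y;\eta] = \frac{\partial}{\partial \eta}\, a(\eta),
\qquad
\operatorname{Var}(y;\eta) = \frac{\partial^2}{\partial \eta^2}\, a(\eta)
```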
@harshagarwal2517 11 days ago
Can anyone please send the problem sheets, as I am unable to log in through Piazza? Thanks for your help.
@vikrantkhedkar6451 10 months ago
How can I get the problem sets?
@adityak7144 9 months ago
27:18 Wouldn't the "a(η)" function (the log-partition) be log(1 + e^(-η)) + η, instead of just log(1 + e^(-η))?
@ShaluSarojKumar 9 months ago
Exactly my question! 😵‍💫
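For what it's worth, assuming the standard Bernoulli parameterization from the notes (a(η) = −log(1−φ) with φ = 1/(1+e^(−η))), the two forms differ by exactly η:

```latex
a(\eta) = -\log(1-\phi)
        = -\log\frac{e^{-\eta}}{1+e^{-\eta}}
        = \eta + \log\left(1+e^{-\eta}\right)
        = \log\left(1+e^{\eta}\right)
```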
@dsazz801 1 year ago
A great lecture! Thank you!
@nanunsaram 1 year ago
Thank you!
@marcoreichel5194 1 year ago
Man, this guy's lecture was really disorganised and confusing. Why didn't he just follow Andrew Ng's notes?
@PingCheng-wf2pb 3 months ago
Disagree! I think this lecture is well organized and clear. For example, the conclusion about the "learning update rule" is particularly useful but unfortunately missing from the notes. Also, the explanation of the "assumptions/design choices" is clearer than in the notes, which gives me a more concrete feel.
@gracefulmango1234 9 months ago
lovely
@creativeuser9086 1 year ago
What is h(theta) for the last example of softmax regression?
@rijrya 1 year ago
It's the vector of all c different logits, normalized, i.e. the thing he writes on the board at 1:17:21. Search up "softmax regression" and click the first stanford.edu link for a better explanation.
@creativeuser9086 1 year ago
@@rijrya Correct. I'm also wondering why we wouldn't train k different binary logistic classifiers instead of the softmax, especially since we can't train the model on input that is not in any of the k classes (say we want to classify the input as either dog, cat, or mouse, and we input a horse); a binary classifier would output 0 for each of the dog, cat, and mouse classifiers, but for softmax p(y) = 0, which makes the likelihood 0 no matter what, so we can't train.
@rijrya 1 year ago
@@creativeuser9086 It probably comes down to an efficiency issue, as creating a binary classification model for each class would be very inefficient, especially as the number of classes increases. Also, since there would be a lot of redundancy in the data used to train each model (i.e. the same data is used multiple times for separate models), I think you might run into overfitting issues. For the example you suggested, I think the solution that still incorporates softmax regression would be to have the classes dog, cat, mouse, and none of the above; then this would classify a horse with better results.
@rijrya 1 year ago
I looked it up, and it seems that k binary classifiers are typically preferred over softmax only when the classes aren't mutually exclusive, e.g. {dog, cat, animal}; in that case softmax would not work very well.
@creativeuser9086 1 year ago
@@rijrya I see. Regarding efficiency, there shouldn't be a difference between k binary classifiers and softmax, since we use the data once to train all k classifiers in parallel. The number of parameters and gradient computations is the same.
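To make the thread's original question concrete, here is a minimal sketch of the softmax hypothesis (illustrative names only; Theta is a k-by-n parameter matrix, one row of parameters per class, as in the lecture's setup):

```python
import numpy as np

def softmax_hypothesis(Theta, x):
    """h_theta(x): a vector of k class probabilities (the normalized logits)."""
    logits = Theta @ x                    # one logit theta_c^T x per class c
    logits -= logits.max()                # shift for numerical stability
    exp_logits = np.exp(logits)
    return exp_logits / exp_logits.sum()  # exp-normalize: entries sum to 1

# Toy usage: 3 classes (e.g. triangle, square, circle), intercept + 2 features.
Theta = np.array([[ 1.0, -0.5,  0.2],
                  [-0.3,  0.8,  0.1],
                  [ 0.0,  0.0,  0.0]])
x = np.array([1.0, 2.0, 0.5])  # first entry is the intercept term
print(softmax_hypothesis(Theta, x))  # P(y = c | x) for each class c
```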
@haoranlee8649 6 months ago
great lectures
@AditiYadav-jm8zc 5 months ago
wowwwww lecture
@priyapandey8951 1 year ago
The course schedule and syllabus links provided point to notes links that are not working. Can anybody provide the correct link?
@bernarddanice1294 1 year ago
Do you still need it?
@priyapandey8951 1 year ago
@@bernarddanice1294 Yes, definitely; I need it for my exams.
@user-bv4eh6tt9e 11 months ago
@@bernarddanice1294 Sorry, do you still have these links?
@harshgupta8936 11 months ago
@@bernarddanice1294 Yes, if you have them.
@harshgupta8936 11 months ago
If you get them from somewhere, please send them.
@akintoyefelix5124 10 months ago
Kind of....
@yong_sung 9 months ago
1:00:35
@guiuismo 4 months ago
What is bro waffling about
@5MrSlavon 1 month ago
Softmax regression: kzbin.info/www/bejne/n4u3lqmXZbyGirMsi=1cHP27fQImm027xh&t=4101
@kmishy 1 year ago
23:02 Sir, it's a PMF, not a PDF.
@antonyprinz4744 1 year ago
Probability density function = PDF.
@kmishy 1 year ago
@@antonyprinz4744 The Bernoulli distribution is for a discrete random variable, and the PMF is what is defined for discrete random variables.
@joshmohanty585 5 months ago
The distinction between PMF and PDF is entirely artificial; they are both Radon-Nikodym derivatives.
@nabeel123ful 11 months ago
This dude obviously didn't have the same in-depth knowledge of this important topic as his professor; he just wrote down whatever was in the lecture notes. He couldn't provide insightful comments on what he wrote, which I guess is mainly because he hadn't done any real-world projects or research on these topics and thus didn't deeply understand the material he was lecturing on. I mean, he knows the stuff, so what he lectured was right, but he didn't fully understand it, so his audience would feel confused and bored. Sorry for being too harsh on him, but this topic deserves a much better lecture.
@amulya1284 1 year ago
This lecture sure does disappoint 😢
@badassopenpolling 1 year ago
I disagree with your comment. The professor has done a good job and explained the algorithms very well. You got free lectures from a reputable university that has some of the best people. Show some respect!!
@amulya1284 1 year ago
@@badassopenpolling You are right! I got overwhelmed at the beginning, when I commented; the topic he is teaching is also very mathematical... but I personally preferred other profs! This guy knows his stuff, but I have seen better explanations of the same content :/🫠
@alienfunbug 1 year ago
@@badassopenpolling He's entitled to not being thrilled with the content delivery. Just because it's free, reputable, and in-depth doesn't mean there is no room for improvement. I can 100% assure you the instructor would agree; there's always room for growth. A perfect message means nothing if it is not received by its audience.
@creativeuser9086 1 year ago
@@badassopenpolling What is h_theta for the last example of softmax regression?
@nowornever7990 1 year ago
@@creativeuser9086 Theta is the set of parameters (here in a 2D plane, because he considered two features, x1 and x2) which defines a straight line, theta1*x1 + theta2*x2 + constant (or a plane in n dimensions). This straight line (or plane) helps us decide whether a point belongs to a given class or not. If we plug a point X (here, its values x1 and x2) into this equation, the output will be less than zero if the point does not belong to the class, and greater than zero if it is a possible candidate for that class.
@odedgilad9761 1 year ago
There was a student who was asked to write one sentence again ("more bigger"), and it wasted 10 seconds of my time. Do you realize that the total time wasted across all the viewers of this video is about 45 days? O-: This man needs to go to jail!
@chinthalaadireddy2165 9 months ago
🤣
@OK-lj5zc 6 months ago
🤣 👏
@vishnumahesh5988 4 months ago
Dude, seriously? You wasted your own valuable 10 seconds by commenting here.
@leeris19 26 days ago
@@vishnumahesh5988 Right? AHAHAH