Wow, I never thought of Naive Bayes as an online classifier. Dear Professor, you rock!
@hamedalipour1012 · 5 years ago
I don't have the words to thank you sir
@ahindrakandarpa7910 · 3 years ago
Love your passion for the subject sir. I absolutely enjoy it when you teach.
@rbhambriiit · 2 years ago
great teaching style!
@nilslorentzon863 · 5 years ago
I love your lectures! Is there any way to get the Matlab code and name files, or maybe post them in the description? I would really appreciate it!
@doyourealise · 3 years ago
amazing and funny :)
@abhisheksingla2260 · 4 years ago
Hahaha. Great classification at 40:06
@StevenSarasin · 1 year ago
Another awesome video! But the superscript for the product around the 16-minute mark is written as m when it should be d: there are d words to take the product over, while m is the number of words in the email.
@prince839 · 4 years ago
Can you please tell me what would happen in sentiment analysis with a naive Bayes classifier when the test set contains a word that never occurred in the training set?
@kilianweinberger698 · 4 years ago
Well, typically you just drop it in that case.
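For context, a minimal sketch of what "just drop it" looks like at prediction time (hypothetical numbers and variable names; the real model would estimate these from training counts with smoothing):

```python
import math

# Hypothetical smoothed word probabilities over a shared training vocabulary.
vocab = {"free", "viagra", "meeting"}
log_prior = {"spam": math.log(0.4), "ham": math.log(0.6)}
log_theta = {
    "spam": {"free": math.log(0.4), "viagra": math.log(0.5), "meeting": math.log(0.1)},
    "ham":  {"free": math.log(0.2), "viagra": math.log(0.1), "meeting": math.log(0.7)},
}

def predict(words):
    # Words never seen during training are simply dropped from the product.
    words = [w for w in words if w in vocab]
    return max(log_prior,
               key=lambda c: log_prior[c] + sum(log_theta[c][w] for w in words))

print(predict(["free", "viagra", "blockchain"]))  # "blockchain" is ignored -> spam
```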
@abunapha · 5 years ago
Starts at 1:20
@itachi4alltime · 5 years ago
11:59: shouldn't it be "d" instead of "m"?
@kilianweinberger698 · 5 years ago
Yes! You are right, alpha ranges from 1 .. d
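For reference, the corrected product (as I understand the lecture's notation; a sketch that drops the multinomial coefficient, which doesn't affect the classification):

```latex
P(\mathbf{x} \mid y = c) \;\propto\; \prod_{\alpha=1}^{d} \theta_{\alpha c}^{\,x_\alpha},
\qquad m = \sum_{\alpha=1}^{d} x_\alpha,
```

where d is the vocabulary size, x_α the count of word α in the email, and m the total number of words in the email.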
@KimAnh-de7xb · 1 year ago
Thank you professor for your awesome lecture. I have a question regarding the Gaussian naive Bayes classifier: can we use it when the distribution is not approximated well by a Gaussian distribution? Wish you the best!
@travohuy · 2 years ago
Dear Prof. Weinberger, I have a small question. To derive the formula for \hat{\theta}_{\alpha c} at 15:42, it seems that we implicitly assume emails are independent of each other. Hence, implicitly, we construct one bag of words for all spam emails and one bag of words for all non-spam emails (not one bag per email), and then estimate theta from there. In other words, there are two assumptions: the order of words in each spam email does not matter, and how the words are distributed across the spam emails does not matter (i.e. a word can appear many times in one email and not at all in others, or equally often in each email). Does my thinking make sense? Thanks.
@xiaoweidu4667 · 3 years ago
Algorithms are only about assumptions, assumptions, and damn assumptions. Assumptions are philosophical in nature.
@user-kf9tp2qv9j · 3 years ago
Hi Kilian, for the cases at the end of the video where NB doesn't classify perfectly: what would happen if both x and o were drawn from multivariate Gaussian distributions whose two dimensions are correlated? That actually breaks the independence assumption; would it still work? Best wishes, and waiting for your opinion.
@kilianweinberger698 · 3 years ago
Yes, it would. However, NB works surprisingly well even if this assumption is broken. In practice it might just over-weight those correlated features. As an example, assume I classify news articles by topic. Given the topic/label is "politics" the words "President" and "Biden" are both likely, but they are not independent. NB would assume they are conditionally independent and say that given that both words are present, it is now extremely likely that the label is "politics". However in practice, given that you have "President" in your text, the word "Biden" won't add much information, because they are so correlated, so NB is over-confident.
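The over-confidence effect can be reproduced numerically: if a perfectly correlated copy of a feature is treated as independent, the posterior is pushed toward the extreme (illustrative numbers, not from the lecture):

```python
def posterior(likelihoods, prior=0.5):
    """Naive-Bayes posterior for class "politics" vs "other",
    treating every likelihood factor as independent."""
    p, q = prior, 1 - prior
    for l_pol, l_other in likelihoods:
        p *= l_pol
        q *= l_other
    return p / (p + q)

# Suppose P(word | politics) = 0.8 and P(word | other) = 0.2.
one_word = posterior([(0.8, 0.2)])               # 0.8
# A perfectly correlated duplicate ("President" next to "Biden") adds no
# real information, but NB multiplies it in again anyway:
two_words = posterior([(0.8, 0.2), (0.8, 0.2)])  # ~0.94, over-confident
print(one_word, two_words)
```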
@user-kf9tp2qv9j · 3 years ago
@@kilianweinberger698 wow, it’s amazing, thanks for your advice
@yizhiwang9632 · 3 years ago
Hi professor, I have a question regarding Gaussian Naive Bayes. In the lecture you use the pdf of an estimated Gaussian distribution to represent P(X | y = c). Isn't the probability that X takes any particular value x equal to 0 if X follows a continuous distribution? Are we using the pdf here simply because we just want to know what P(X | y = c) is proportional to?
@kilianweinberger698 · 3 years ago
Yes, sorry, I was probably a little sloppy there. For discrete distributions, Maximum likelihood maximizes the probability of the data. For continuous distributions, it maximizes the density of the data (the probability is always 0). The math is exactly the same, but the terminology has to change ...
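Concretely, the quantities involved can be sketched as follows (consistent with the reply above; for a single continuous feature, with n_c examples in class c):

```latex
\hat{\mu}_c = \frac{1}{n_c} \sum_{i : y_i = c} x_i, \qquad
\hat{\sigma}_c^2 = \frac{1}{n_c} \sum_{i : y_i = c} (x_i - \hat{\mu}_c)^2, \qquad
p(x \mid y = c) = \frac{1}{\sqrt{2\pi \hat{\sigma}_c^2}}
    \exp\!\left(-\frac{(x - \hat{\mu}_c)^2}{2 \hat{\sigma}_c^2}\right).
```

At test time only the relative sizes of these densities across classes matter, so the fact that any point probability is zero causes no problem.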
@user-or7ji5hv8y · 3 years ago
I wonder if the notations can be made easier to follow.
@joonho0 · 4 years ago
If we use a hash function rather than a pre-defined dictionary, the smoothing constant 'd' would be infinite. How should we handle this in practice?
@BrunsterCoelho · 4 years ago
I guess you don't need the smoothing if you have the hash function (at least you don't need the smoothing for dealing with unseen words - you might still want it for data acquisition/count reasons).
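One common resolution (a sketch, not from the video): with the hashing trick, words are mapped into a fixed number of buckets B, so the effective dimensionality is B rather than unbounded, and +1 smoothing over B buckets remains well defined:

```python
import hashlib

B = 1024  # fixed number of hash buckets; plays the role of d

def bucket(word: str) -> int:
    # A stable hash, so train and test agree on the mapping.
    return int(hashlib.md5(word.encode()).hexdigest(), 16) % B

def smoothed_probs(words):
    counts = [1] * B  # +1 Laplace smoothing over B buckets, not over all possible strings
    for w in words:
        counts[bucket(w)] += 1
    total = sum(counts)
    return [c / total for c in counts]

probs = smoothed_probs(["free", "free", "meeting"])
print(sum(probs))  # a valid distribution over the B buckets
```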
@jiviteshsharma1021 · 4 years ago
Hi Kilian, for your projects, do the students build their own models and classifiers using the actual math and formulas, or are they allowed to use libraries with built-in models such as sklearn?
@kilianweinberger698 · 4 years ago
They are not allowed to use sklearn, just basic data structures / commands from numpy. My experience is that until you implement an algorithm yourself from the ground up, you don’t really understand it... (The assignments do guide them through it, though.)
@jiviteshsharma1021 · 4 years ago
@@kilianweinberger698 Thank you professor, I will be implementing them from scratch myself now :))
@jiviteshsharma1021 · 4 years ago
@@kilianweinberger698 One last thing professor, just to clarify: by the assignments you mean the homeworks, right? Thank you
@ayushmalik7093 · 2 years ago
Hi Professor, after removing stopwords, can we use the naive Bayes probabilities for result interpretation?
@kilianweinberger698 · 2 years ago
Yes, the NB probabilities give you a reasonable explanation why a classifier made a certain prediction. It can also reveal how NB over-emphasizes correlated features (because of its class conditional independence assumption that is not met). E.g. if it classifies a document as being about politics, it may reach this conclusion because the three words “President Joe Biden” are all in the document, and all three are predictive of politics - ignoring the fact that they are highly correlated and “Biden” is often surrounded by “President” even in non-political documents.
@esakkiponraj.e5224 · 3 years ago
Hello Kilian, could you let me know how I can use NB if my data contains both categorical & continuous features?
@kilianweinberger698 · 3 years ago
Yes, you just need to use different distributions to model these different dimensions.
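A hybrid model along those lines can be sketched like this (hypothetical numbers; a smoothed frequency estimate for the categorical dimension, a per-class Gaussian for the continuous one, and the per-dimension log-densities simply add up under the independence assumption):

```python
import math

def gauss_logpdf(x, mu, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

# Per class: P(category | c) for the discrete dimension,
# (mu, var) for the continuous dimension, and a class prior.
model = {
    "a": {"cat": {"red": 0.7, "blue": 0.3}, "mu": 0.0, "var": 1.0, "prior": 0.5},
    "b": {"cat": {"red": 0.2, "blue": 0.8}, "mu": 3.0, "var": 1.0, "prior": 0.5},
}

def log_score(c, color, value):
    m = model[c]
    # Independence: log P(y) + log P(color | y) + log p(value | y)
    return (math.log(m["prior"]) + math.log(m["cat"][color])
            + gauss_logpdf(value, m["mu"], m["var"]))

def predict(color, value):
    return max(model, key=lambda c: log_score(c, color, value))

print(predict("red", 0.2))   # -> "a"
print(predict("blue", 2.9))  # -> "b"
```

The "multiplication" from the SO answers is exactly the sum of log terms here: each dimension contributes its own (probability or density) factor to the same class-conditional product.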
@esakkiponraj.e5224 · 3 years ago
@@kilianweinberger698 Thanks for your reply. After modelling with different distributions, how can one combine them into a single model that can be used for prediction? I found some SO answers related to this question, and some suggested multiplying the predicted probability values from the different models. I can't understand this multiplication of the predicted probabilities. Could you please explain whether this is right, and if it is, what the real reason is?
@omalve9454 · 1 year ago
Greetings Professor! What features did you use for the gender classifier?
@jachawkvr · 4 years ago
How do we decide if naive bayes is a good choice given a dataset? The algorithm seems to work well even if the assumption does not hold, so testing the assumption doesn't really help us decide this.
@kilianweinberger698 · 4 years ago
Good question! If the naive Bayes assumption doesn't hold, the classifier typically still does well when the following condition holds: features indicative towards one class stay indicative towards that class, independent of all the other features given in the instance. Imagine a spam vs non-spam email classifier. Naive Bayes assumes that certain words are more likely given that the email is spam vs non-spam, i.e. these words are then indicative towards spam and their log(P(word|spam)) are positive (if class spam=+1). At the end, when we do the classification, we sum up the log probabilities log(P(word|y)) for each y (spam / not-spam). Because the naive Bayes assumption doesn't actually hold, these log-probabilities may be a little too high or a little too low, but it is unlikely that a spammy word is suddenly estimated as indicative towards not-spam. The classifier breaks when features interact in a way that makes them change which class they support, i.e. if word W together with word V is highly indicative towards spam, but word W without word V is highly indicative towards non-spam. You can actually test that for pairs of words empirically. For larger sets it quickly becomes infeasible to check. Hope this helps.
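The pairwise check mentioned at the end might be sketched empirically like this (a toy helper with made-up data, not the lecture's code): compare whether word W is associated with spam above the base rate both with and without word V present.

```python
def flips_class(emails, labels, w, v):
    """Empirically check whether word w supports different classes
    depending on whether word v is present (hypothetical helper)."""
    def spam_rate(keep):
        idx = [i for i, e in enumerate(emails) if keep(e)]
        return sum(labels[i] for i in idx) / len(idx) if idx else None

    with_v = spam_rate(lambda e: w in e and v in e)
    without_v = spam_rate(lambda e: w in e and v not in e)
    base = sum(labels) / len(labels)
    if with_v is None or without_v is None:
        return False  # not enough data to compare the two contexts
    # w "supports spam" in a context if spam there is more likely than the base rate.
    return (with_v > base) != (without_v > base)

emails = [{"cheap", "watch"}, {"cheap"}, {"watch"}, {"meeting"}]
labels = [1, 0, 0, 0]  # 1 = spam
print(flips_class(emails, labels, "cheap", "watch"))  # "cheap" flips with "watch"
```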
@jachawkvr · 4 years ago
Thank you so much for explaining this! This helps me understand the naive bayes algorithm a little better as well.
@anantbansal5901 · 5 months ago
@@kilianweinberger698 'these words are then indicative towards spam and their log(P(word|spam)) are positive (if class spam=+1)' - how can log(P(word|spam)) be positive, assuming it is a natural log and noting that P(.) is always less than or equal to 1?
@anantbansal5901 · 5 months ago
Oh, I guess I got the catch here: it should be log(P(word|spam)/P(word|not spam)), right?
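For the record, the quantity that can change sign is indeed the log-ratio (a sketch of the resolution, as I read the thread):

```latex
\log \frac{P(\text{word} \mid \text{spam})}{P(\text{word} \mid \text{not spam})} > 0
\iff P(\text{word} \mid \text{spam}) > P(\text{word} \mid \text{not spam}),
```

while each individual log P(word|spam) is always at most 0.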
@bharasiva96 · 4 years ago
At 32:00, shouldn't the summation go from 1 to n_c? I don't follow what n represents here.
@kilianweinberger698 · 4 years ago
Yes that should be n_c. Thanks for pointing it out!
@coffeenerd4932 · 1 year ago
@@kilianweinberger698 Isn't your original summation from 1..n correct, because the indicator function filters out all elements from other classes? If we go from 1..n_c, the index would no longer match up with the training examples...