Best lecture on word2vec. It covers everything the papers are ambiguous about: the notation, and the explanation of what to optimize and why.
@autripat 8 years ago
The Skip-gram model discussion starts at 17:20 (we transition away from the "intractable" continuous bag of words model). The Skip-gram training objective is to learn word vector representations that are good at predicting nearby words (context). The GloVe (Global Vectors for Word Representation) model starts at 54:36.
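For reference, here is the Skip-gram objective as stated in the word2vec papers. The symbols (u for output vectors, v for input vectors, window size m) are a commonly used convention and an assumption here; the lecture's exact notation may differ.

```latex
% Skip-gram (Mikolov et al., 2013): maximize the average log-probability
% of the context words within a window of size m around each center word.
\max \; \frac{1}{T} \sum_{t=1}^{T} \; \sum_{\substack{-m \le j \le m \\ j \ne 0}}
  \log p(w_{t+j} \mid w_t),
\qquad
p(c \mid w) = \frac{\exp\!\left(u_c^{\top} v_w\right)}
                   {\sum_{w'=1}^{|V|} \exp\!\left(u_{w'}^{\top} v_w\right)}
```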
@niteshroyal30 8 years ago
Thanks, Professor, for such a wonderful lecture on word2vec.
@wanminghuang1722 8 years ago
Thank you so much. This is much easier to understand.
@paolofreuli1686 7 years ago
Awesome lecture!
@m.farahmand7440 8 years ago
Thanks for the informative lecture. At 7:26, shouldn't it be gradient ascent? After all, we are trying to maximize the likelihood function.
@yangli7741 6 years ago
I think 7:26 is just gradient descent, and the person who suggested that the summation sign shouldn't be there actually understood it wrong, because Prof. Ghodsi may have used confusing notation. In the log-likelihood, the summation over $w$ runs over every word in the training set (the word to predict given context $c$). However, when taking the derivative, the vector we differentiate with respect to can be any word in the vocabulary, i.e., any column of the weight matrix $W'$ to be learned, so we should use a different notation for it, e.g., $v_{w^*}$. Accordingly, the summation over $w$ should exist in the first place, because $w^*$ and $w$ are not the same thing. The later removal of the summation in the update rule, $v_{w^*} \leftarrow v_{w^*} - \eta \left(1 - p(w \mid c)\right) \frac{\partial\, v_c^{\top} v_w}{\partial v_{w^*}}$, can be seen as changing from GD to SGD. The only reason the final result doesn't go wrong is that the partial derivative $\frac{\partial\, v_c^{\top} v_w}{\partial v_{w^*}}$ is zero whenever $w^* \neq w$; that is, during the SGD step, only $v_w$ is updated.
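For readers following along, here is a minimal worked version of this argument, assuming the standard softmax parameterization $p(w \mid c) = \exp(v_c^{\top} v_w) / \sum_{w'} \exp(v_c^{\top} v_{w'})$ (the lecture's exact symbols may differ). It also bears on the ascent-vs-descent question above: descending the negative log-likelihood is the same as ascending the log-likelihood.

```latex
% Log-likelihood of one (context c, target w) pair under the softmax:
\log p(w \mid c) = v_c^{\top} v_w
    - \log \sum_{w'} \exp\!\left(v_c^{\top} v_{w'}\right)

% Derivative with respect to an arbitrary output vector v_{w^*}
% (w^* need not equal w, hence the distinct symbol):
\frac{\partial \log p(w \mid c)}{\partial v_{w^*}}
  = \mathbb{1}[w^* = w]\, v_c - p(w^* \mid c)\, v_c
  = \left(\mathbb{1}[w^* = w] - p(w^* \mid c)\right) v_c

% The first term vanishes whenever w^* \ne w, which is why dropping
% the summation over w in a single SGD step still gives a valid update.
```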
@cem9927 6 years ago
If we have 4 words in the dictionary, we will have 4 v_w values, and in the gradient descent update we will update each v_w separately, right? (See the sketch at the end of this thread.)
@tejasduseja 4 years ago
@@yangli7741 Thanks, I had the same confusion in mind.
@imanshojaei7784 4 years ago
@@yangli7741 Aren't the labels (i.e., the empirical probabilities) also missing from the formulation?
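Picking up @cem9927's question above: a minimal numpy sketch (illustrative names, not the lecture's code) of one full-softmax SGD step on a toy 4-word vocabulary. With the full softmax, every output vector receives the gradient (p(w'|c) - 1[w'=w]) v_c, so yes, each v_w is updated separately.

```python
# Minimal sketch of one SGD step for the softmax Skip-gram gradient
# above, on a toy 4-word vocabulary. All names (V, d, lr, W_in, W_out)
# are assumptions for illustration, not taken from the lecture.
import numpy as np

rng = np.random.default_rng(0)
V, d, lr = 4, 3, 0.1             # vocab size, embedding dim, learning rate

W_in = rng.normal(size=(V, d))   # context (input) vectors v_c, one per row
W_out = rng.normal(size=(V, d))  # output vectors v_w, one per row

c, w = 1, 2                      # one observed (context, target) pair

scores = W_out @ W_in[c]                 # v_c . v_{w'} for every word w'
p = np.exp(scores - scores.max())        # numerically stable softmax
p /= p.sum()                             # p(w' | c) for every word w'

# Gradient of -log p(w|c): each output vector w' gets the update
# direction (p(w'|c) - 1[w'=w]) * v_c, so all four rows of W_out
# move in general -- each one separately, as asked above.
onehot = np.eye(V)[w]
grad_out = np.outer(p - onehot, W_in[c])   # shape (V, d)
grad_in_c = W_out.T @ (p - onehot)         # gradient for v_c, shape (d,)

W_out -= lr * grad_out
W_in[c] -= lr * grad_in_c

new_p = np.exp(W_out @ W_in[c])
new_p /= new_p.sum()
print("p(w|c) before: %.3f  after: %.3f" % (p[w], new_p[w]))
```

Running this should show p(w|c) increase after the step, since we descended the negative log-likelihood of the observed pair.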
@stolzenable 8 years ago
Thank you for this lecture! It is very understandable. I wonder if the slides from this lecture are available somewhere.
@stolzenable 8 years ago
+Alexey Grigorev With a bit of googling, I found them here: uwaterloo.ca/data-science/deep-learning
@aseefzahir3977 6 years ago
says "page not found"
@srujohn652 3 years ago
@Alexey Grigorev The page is still not found.
@rajupowers 8 years ago
@7:20 - how can we factor out v_c? There is no summation in the right term.
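A possible resolution, assuming the slide at 7:20 shows the derivative with respect to an output vector $v_{w^*}$ (the exact expression on the slide isn't quoted here): both terms of that derivative are scalar multiples of $v_c$, which is what lets $v_c$ factor out without any summation remaining.

```latex
% Derivative w.r.t. an output vector: both terms are multiples of v_c,
% so v_c factors out of the whole expression.
\frac{\partial \log p(w \mid c)}{\partial v_{w^*}}
  = \mathbb{1}[w^* = w]\, v_c - p(w^* \mid c)\, v_c
  = \underbrace{\left(\mathbb{1}[w^* = w] - p(w^* \mid c)\right)}_{\text{scalar}}\, v_c

% By contrast, the derivative w.r.t. the context vector v_c keeps its
% summation, because each term carries a different vector v_{w'}:
\frac{\partial \log p(w \mid c)}{\partial v_c}
  = v_w - \sum_{w'} p(w' \mid c)\, v_{w'}
```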