Understanding Word2Vec

77,579 views

Jordan Boyd-Graber

Days ago

Comments: 63
@bodenseeboys 4 years ago
orange sweater over orange polo - my man is rocking the full lobster swagger
@JordanBoydGraber 4 years ago
It works well with my green screen. Plus, it is the school color for both Caltech and Princeton (so showing my school pride).
@navneethegde5999 4 years ago
Nice presentation, perfect blend of pace, voice quality and slide data. Information is not repeated unnecessarily.
@exxzxxe 4 years ago
Exceptionally well done. Thank you!
@mohammadsalah2307 4 years ago
Best explanation I've ever watched; much better than the Stanford lecture, in my opinion.
@JordanBoydGraber 4 years ago
Thanks! That's high praise. Chris and Dan know much more than I do, but I like to think that my ignorance helps me sometimes explain things better, because I know what confuses people (from experience).
@dipaco_ 8 months ago
This is an amazing video. Very intuitive. Thank you.
@alecrobinson7124 4 years ago
Good god, it's nice to watch an informative video not done in the style of Siraj.
@JordanBoydGraber 4 years ago
I've been making ML YouTube videos since long before Siraj ...
@alecrobinson7124 4 years ago
@JordanBoydGraber Touché, very true. Siraj should have copied yours, then.
@wahabfiles6260 3 years ago
@alecrobinson7124 Siraj just pretends! His videos are not informative.
@trexmidnite 3 years ago
Those numbers are nothing but a particular vector.
@hiepnguyen034 5 years ago
best word2vec explanation I have seen so far
@taylorsmurphy 5 years ago
I can't believe I already watched all these videos somehow. Oh wait, there's a partial red bar on the bottom of most thumbnails for some reason. 😋
@JordanBoydGraber 4 years ago
I know. YouTube added this feature after I adopted my Beamer template, and it's impossible to fix on old videos.
@mahdiamrollahi8456 3 years ago
Great explanation of word2vec, especially negative sampling...
@junmeizhong9526 4 years ago
For negative sampling, the negative examples are usually word pairs that keep the same focus word and pair it with a number of randomly sampled noisy context words. But here it seems to be done the other way around. Please let me know whether the two ways are the same or whether this is a mistake.
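For context, in the original word2vec paper the negatives do keep the focus word fixed and replace the *context* word with noise words drawn from the unigram distribution raised to the 3/4 power. A minimal sketch of that sampling step (the toy corpus, k, and the names are illustrative, not from the video):

import numpy as np
from collections import Counter

corpus = "the dog barks at the cat the cat ignores the dog".split()
k = 5  # negatives drawn per positive (focus, context) pair

counts = Counter(corpus)
words = list(counts)
probs = np.array([counts[w] ** 0.75 for w in words])
probs /= probs.sum()                      # unigram^(3/4) noise distribution

def negative_pairs(focus, rng=np.random.default_rng(0)):
    noise = rng.choice(words, size=k, p=probs)
    return [(focus, c) for c in noise]    # same focus word, noisy context words

print(negative_pairs("dog"))              # e.g. [('dog', 'the'), ('dog', 'cat'), ...]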
@Han-ve8uh 3 years ago
At 11:00, what does "Features" and "Evidence" refer to? How is that formula similar to logistic regression? (I was expecting some e^()/1+e^() on the RHS). In the same formula, what does c' refer to? Is it all the words that are NOT in the context of a particular word w? How did this formula become the 6 sigmoids at 12:00?
@JordanBoydGraber 3 years ago
1) The sigma function encodes the exponential you're looking for. 2) The features and evidence are the word and context vectors. 3) c' are the negative samples. 4) The (w, c) pair is akin to the positive examples in logistic regression, while c' is like the negative examples.
@Han-ve8uh 3 years ago
@JordanBoydGraber For 3), aren't the negative samples the focus word, as shown at 12:30? I'm confused because sometimes the negative sample is a context word and sometimes the focus word. Does this depend on whether CBOW or skip-gram is used? (As in: negative sampling with CBOW corrupts the focus word, and negative sampling with skip-gram corrupts the context words.)
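For reference, the skip-gram negative-sampling objective for one positive pair (w, c) with sampled negatives c', in the standard form from the word2vec paper (the slide's notation may differ slightly):

\log \sigma(v_c \cdot v_w) \;+\; \sum_{c' \in \text{negatives}} \log \sigma(-v_{c'} \cdot v_w)

Each term is one of the sigmoids at 12:00: one for the true context word plus one per negative sample, and in this form the negatives replace the context word while the focus word w stays fixed.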
@ruizhenmai1194 5 years ago
At 3:42 the similarities should be |V| x 1 if you multiply W v^T that way.
@xruan6582 4 years ago
I totally agree with you. We should avoid such casual expressions, which could be very misleading in a more complex scenario.
@navneethegde5999 4 years ago
I think it can be represented both ways, as a column or a row vector. However, I think a row vector is more efficient to store in memory.
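For what it's worth, a quick shape check of the product being discussed (a minimal sketch; W, v, and the sizes are my own, not the slide's):

import numpy as np

V, d = 10_000, 300                 # vocabulary size, embedding dimension
W = np.random.randn(V, d)          # one row per vocabulary word
v = np.random.randn(d)             # embedding of the focus word

similarities = W @ v               # one dot product per vocabulary word
print(similarities.shape)          # (10000,), i.e. a |V| x 1 column of scores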
@coc2912 1 year ago
Your video helps me a lot.
@alayshah1995 4 years ago
Richard Hendricks from Pied Piper? Yes!
@cu7695 5 years ago
Nice explanation of NLP terms. I would like to learn more about the probability distributions and their effect on some real data sets.
@xruan6582 4 years ago
10:13 Should the first equation be p(c|w; θ) rather than log(p(c|w; θ))?
@JordanBoydGraber 4 years ago
Yes, that's right. Sorry!
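For reference, the corrected equation, i.e. the standard skip-gram softmax (presumably what the slide intends):

p(c \mid w; \theta) = \frac{\exp(v_c \cdot v_w)}{\sum_{c' \in C} \exp(v_{c'} \cdot v_w)}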
@vinayreddy8683 4 years ago
I'm still confused about the n-gram model and the skip-gram model. Did he make a mistake, or am I confused? Basically, n-gram models use the previous n-1 words to predict the nth word, which means they somehow use context words to predict the target word. Here in this video he said skip-gram uses the target (focus) word to predict the context words. The two seem to contradict each other! Any expert opinion on this is highly appreciated.
@amarnathjagatap2339 4 years ago
Ultimate reeeeee baba
@amarnathjagatap2339 4 years ago
Smash that like button, man
@AlysiaLi-f9u 1 year ago
Nice explanation and thank you!
@leliaglass1568 5 years ago
thank you for the video! Very helpful!
@DebangaRajNeog 4 years ago
Great explanation!
@JordanBoydGraber 3 years ago
On the slide numbered 16, the sum should be over f(w'), not f(w).
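For context, that sum is presumably the normalizer of the negative-sampling noise distribution from the word2vec paper, where f is the unigram frequency and 3/4 is the exponent Mikolov et al. report working best:

P_n(w') = \frac{f(w')^{3/4}}{\sum_{w''} f(w'')^{3/4}}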
@gabrield801 4 years ago
Ignoring the negative samples, why do we need to optimize dot products by gradient descent rather than merely counting the occurrences of context words for each occurrence of each focus word in the training data (and then normalizing)?
@JordanBoydGraber 4 years ago
That's a great question! What you're proposing is essentially PMI, which word2vec is an approximation of (projected into a lower dimension). word2vec is throwing some information away through this projection, but it seems to help.
@gabrield801 4 years ago
@JordanBoydGraber I see, it's a lower dimension because you simply initialize random vectors (of arbitrary, lower length) and consider dot products, rather than having a (# of words)-long vector for each word. Thanks a ton!
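A minimal sketch of the counting approach being discussed, which is essentially PMI computed from a co-occurrence table (the toy corpus, window size, and names are my own illustration):

import numpy as np
from collections import Counter

corpus = "the dog barks at the cat the cat ignores the dog".split()
window = 2

vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

pairs = Counter()
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if i != j:
            pairs[(w, corpus[j])] += 1          # count (focus, context) co-occurrences

total = sum(pairs.values())
w_marg, c_marg = Counter(), Counter()
for (w, c), n in pairs.items():
    w_marg[w] += n
    c_marg[c] += n

pmi = np.zeros((len(vocab), len(vocab)))
for (w, c), n in pairs.items():
    pmi[idx[w], idx[c]] = np.log((n / total) / ((w_marg[w] / total) * (c_marg[c] / total)))

# word2vec's embeddings behave like a low-rank factorization of a (shifted)
# version of this matrix -- the "projection into a lower dimension" mentioned above.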
@zahrash7864 2 years ago
What is the sigmoid sum over W·c used for? Don't we just need the softmax on every row of the C·W matrix?
@JordanBoydGraber 2 years ago
But a word has multiple words in its context; we need to consider each word's effect.
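A minimal sketch of what that sum over context words looks like (the names and window size are mine, not the slide's):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d = 50
rng = np.random.default_rng(0)
w_vec = rng.normal(size=d)               # focus-word embedding
context_vecs = rng.normal(size=(4, d))   # one row per word in the context window

# Each context word contributes its own sigmoid term; the objective sums
# their logs rather than taking a single softmax over one row of C.W.
score = sum(np.log(sigmoid(c @ w_vec)) for c in context_vecs)
print(score)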
@compilationsmania451 4 years ago
10:20 In the probability function you're using exp(v_c · v_w). But didn't you say that the context and focus words have different vectors? Then why are we choosing the context and focus words from the same vector v?
@JordanBoydGraber 4 years ago
@michael jo That's right! The "v" means that it's for the same word type (e.g., "dog") but from two different matrices.
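In code terms, a minimal sketch of the two matrices (the matrix names and the index for "dog" are hypothetical):

import numpy as np

V, d = 10_000, 300
rng = np.random.default_rng(0)
W_focus = rng.normal(size=(V, d))     # embeddings used when a word is the focus
W_context = rng.normal(size=(V, d))   # embeddings used when a word is in the context

dog = 1234                            # hypothetical vocabulary index for "dog"
v_w = W_focus[dog]                    # v_dog as a focus word
v_c = W_context[dog]                  # v_dog as a context word -- a different vector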
@wilfredomartel7781 2 months ago
🎉
@BrunoCPunto 3 years ago
Great explanation
@hgkjhjhjkhjk7270 5 years ago
Upload more stuff; your videos are good.
@sinaubarengari 4 years ago
Hi, my name is Ari and I am from Indonesia. Could you explain the sent2vec model (Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features) the way you made a video about word2vec?
@GoracyKanal 4 years ago
Great explanation
@oleksandrboiko7261 4 years ago
The red line at the bottom of the thumbnail makes it look like you have already seen the video, so you skip it.
@JordanBoydGraber 4 years ago
I know. I recorded the videos before YouTube started doing this ... my new videos won't have it.
@pardisranjbarnoiey6356 5 years ago
Thank you! But please get rid of that red bar; the thumbnail gets confusing.
@JordanBoydGraber 5 years ago
Haha. I never thought about that odd interaction with YouTube. I don't want everyone to think they've watched 2/3 of all of my videos. :)
@JP-re3bc 5 years ago
It would be helpful if at 9:56 you talked a bit about what exactly d means.
@JordanBoydGraber 5 years ago
It's the length of the embedding. It really doesn't mean much other than the size of the representation that you're using. I.e., how complicated your model is going to be.
@cyrilgarcia2485 4 years ago
Wait, did I miss how the words are vectorized?
@JordanBoydGraber 4 years ago
Each word has a corresponding vector; it's initialized randomly and then updated, as discussed at 13:09.
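A minimal sketch of that initialize-then-update loop (the learning rate, gradient form, and names are illustrative; this is the per-pair logistic update used with negative sampling, not necessarily the exact slide):

import numpy as np

V, d = 10_000, 300
rng = np.random.default_rng(0)
W_focus = rng.normal(scale=0.01, size=(V, d))     # random initialization
W_context = rng.normal(scale=0.01, size=(V, d))
lr = 0.025

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgd_step(focus, context, label):
    """One update for a positive (label=1) or negative (label=0) pair of word indices."""
    v_w = W_focus[focus].copy()
    v_c = W_context[context].copy()
    g = sigmoid(v_w @ v_c) - label                # gradient of the logistic loss
    W_focus[focus]     -= lr * g * v_c
    W_context[context] -= lr * g * v_w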
@mdazimulhaque 5 years ago
Thank you for the detailed explanation.
@username-notfound9841 3 years ago
I like the part where you almost said *Bit* correctly. 7:24
@isleofdeath 4 years ago
Apart from some errors (the theta parameter never occurs on the right side of your equations, and it is even incorrect, since the "probability" given by exp(...)/sum(exp(...)) IS basically the theta parameter), worse is that it looks like you copied most of the math from the Stanford lecture on NLP and did not even give them credit. BTW, the theta parameter is explained in that lecture...
@JordanBoydGraber 11 months ago
I did draw on Yoav Goldberg's lectures (and credited him). I suspect the Stanford folks did the same, but the equations themselves come from the original word2vec paper. Using Theta as a general catchall for parameters of a model is quite common in ML.
@kevin-fs5ue 5 years ago
10:07
@KoltPenny 5 years ago
Really cool videos... but I just can't get out of my head that you sound like the Jewish kid in Big Mouth.