Introduction to NLP | GloVe Model Explained

69,088 views

Normalized Nerd

A day ago

Comments: 99
@sashimidimsums · 3 years ago
Just wanna say that your explanations are awesome. Really helped me understand NLP better than reading a book.
@NormalizedNerd · 3 years ago
Thanks!! :D
@javitxuaxtrimaxion4526 · 3 years ago
Awesome video!! I've just arrived here after reading the GloVe paper and your explanation is utterly perfect. I'll surely come back to your channel whenever I have doubts about Machine Learning or NLP. Good job!
@alh7839 · 2 years ago
Man, your video is great! Best explanation on the whole internet!
@addoul99 · 2 years ago
Fantastic summary of the paper. I just read it and I am pleasantly surprised at how much of the paper's math you covered in detail in this video! Great work!
@sachinsarathe1143 · 3 years ago
Very nicely explained, buddy... I was going through many articles but was not able to understand the math behind it. Your video certainly helped. Keep up the good work.
@NormalizedNerd · 3 years ago
Happy to help man!
@revolutionarydefeatism · 3 years ago
Perfect! Thanks, there are not many useful videos on KZbin.
@sasna8800 · 3 years ago
This is the best explanation I have seen for GloVe, thank you a million times.
@NormalizedNerd · 3 years ago
❤️❤️
@riskygamiing · 3 years ago
I was reading the paper and somewhat struggling with what certain parts of the derivation were or why we needed them, but this video is great. Thanks so much!
@bhargavaiitb · 4 years ago
Thanks for the explanation. Feels like you explained better than the paper itself.
@NormalizedNerd · 4 years ago
Thanks a lot man!!
@kindaeasy9797 · 4 months ago
10:48 No, we don't have a vector on one side of the equation; we have scalar values on both sides, basic math.
@popamaji · A year ago
This is excellent, but I wish you had also mentioned the training steps: what exactly are the input and output tensors, and what shape do they have?
@vitalymegabyte · 3 years ago
Guy, thank you very much, it was a fucking masterpiece that made my 22 minutes at the railway station really profitable :)
@kavinvignesh2832 · 4 months ago
Based on what algorithm or model is the GloVe model trained with this cost function? Linear regression?
@parukadli · A year ago
Is the embedding for a word fixed in GloVe, or is it generated each time depending on the dataset given for training the model?
@dodoblasters · 3 years ago
5:50 2+1+1=3?
@karimdandachi9200 · 2 years ago
He meant 4.
@parukadli · 4 years ago
Nice explanation... which is better, GloVe or Word2vec?
@NormalizedNerd · 4 years ago
That depends on the dataset. I recommend trying both.
@arunimachakraborty1175 · 3 years ago
Very good explanation. Thank you :)
@NormalizedNerd · 3 years ago
Thanks a lot!
@ijeffking · 4 years ago
Very well explained. Keep it up! Thank you.
@NormalizedNerd · 4 years ago
Thank you, more videos are coming :)
@ijeffking · 4 years ago
@@NormalizedNerd Looking forward to it......
@sarsoura716 · 3 years ago
Good video, thanks for your efforts. I wish it had less explanation of the GloVe cost function and more elaborate testing of word similarity using the GloVe model.
@NormalizedNerd · 3 years ago
You can copy the code and test it more ;)
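For anyone who wants to try the similarity test themselves, here is a minimal sketch (not the notebook from the video): it loads a pretrained GloVe text file and compares a few words with cosine similarity. The file name glove.6B.50d.txt is a placeholder; use whichever pretrained file you download from the Stanford GloVe page.

```python
# Minimal word-similarity check with pretrained GloVe vectors (sketch, not the video's code).
import numpy as np

def load_glove(path):
    """Read a GloVe text file into a {word: vector} dictionary."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

vectors = load_glove("glove.6B.50d.txt")          # placeholder path
print(cosine(vectors["ice"], vectors["solid"]))   # expected: relatively high
print(cosine(vectors["ice"], vectors["fashion"])) # expected: lower
```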
@sujeevan9047 · 3 years ago
Can you do a video on the BERT word embedding model? It is also important.
@edwardrouth · 4 years ago
Nice work! Just subscribed (y) :) Just a quick question out of curiosity: are "GloVe" and "Poincare GloVe" the same model? All the best for your channel.
@NormalizedNerd · 4 years ago
Thank you, man! No, they are different. Poincare GloVe is a more advanced approach. In normal GloVe, the words are embedded in Euclidean space. But in Poincare GloVe, the words are embedded in hyperbolic space! Although the latter one uses the basic concepts of the original GloVe.
@edwardrouth · 4 years ago
@@NormalizedNerd It's totally worth subscribing to your channel. Looking forward to new videos from you on DS. Btw, I am also from West Bengal, currently in Germany ;)
@NormalizedNerd · 4 years ago
@@edwardrouth Oh great! Nice to meet you. More interesting videos are coming ❤️
@khadidjatahri7428 · 3 years ago
Thanks for this well-explained video. I have one question: can you please explain why you take only the numerator portion F(w_i.w_k) and ignore the denominator?
@revolutionarydefeatism · 3 years ago
You can take the denominator instead! We need just one of them.
@bhrzali · 3 years ago
Wonderful explanation! Just a question. Why do we calculate the ratio p(k|ice)/p(k|steam)?
@NormalizedNerd · 3 years ago
The ratio is better at distinguishing relevant words from irrelevant words than the raw probabilities, and it also discriminates between relevant words. If we didn't take the ratio and worked with raw probabilities, the numbers would be too small.
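To make the ratio argument concrete, here is a toy sketch (the counts below are invented, not the numbers from the paper or the video): compute P(k|i) = X_ik / X_i from a small co-occurrence matrix and look at the ratio.

```python
# Toy co-occurrence matrix and probability ratios (made-up counts for illustration only).
import numpy as np

words = ["ice", "steam", "solid", "gas", "water"]
idx = {w: i for i, w in enumerate(words)}

# X[i, j] = how often word j appears in the context of word i
X = np.array([
    [0, 2, 8, 1, 6],   # ice
    [2, 0, 1, 7, 6],   # steam
    [8, 1, 0, 0, 2],   # solid
    [1, 7, 0, 0, 2],   # gas
    [6, 6, 2, 2, 0],   # water
], dtype=float)

def p(k, i):
    """P(k | i) = X_ik / X_i, the probability of seeing k in the context of i."""
    return X[idx[i], idx[k]] / X[idx[i]].sum()

for k in ["solid", "gas", "water"]:
    ratio = p(k, "ice") / p(k, "steam")
    print(f"P({k}|ice)/P({k}|steam) = {ratio:.2f}")
# 'solid' gives a large ratio, 'gas' a small one, 'water' stays close to 1:
# the ratio separates relevant from irrelevant context words.
```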
@83vbond · 4 years ago
Good explanation. It got too technical for me after the middle, but then the code and the graph clarified things. Just one thing: you keep calling the pipe | symbol 'slash' ("j slash i", "k slash ice", etc.), which isn't accurate (I think you would know this if you have studied all this). It's better to use 'given', as in "j given i", as it's actually said, or just say 'pipe' after explaining the first time that this is what the symbol is called. 'Slash' is used to mean division, and also to mean 'one or the other', neither of which is applicable here, and the symbol isn't a slash anyway. This can cause confusion for some viewers.
@NormalizedNerd · 4 years ago
Yes, pipe would be a better choice.
@jibbygonewrong2458 · 3 years ago
It's Bayes. Anyone exposed to stats understands w/o the verbiage.
@TNTsundar · 3 years ago
You should read that as “probability of i GIVEN j”. The pipe symbol is read as ‘given’.
@maximuskumar502 · 4 years ago
Nice explanation 👍. One quick question about your video: which software and hardware are you using for the digital board?
@NormalizedNerd · 4 years ago
Thank you. I use Microsoft OneNote and a basic pen tablet. Keep supporting!
@longhoang5137 · 3 years ago
I laughed when you said 2+1+1=3 xD
@NormalizedNerd · 3 years ago
LOL XD
@alh7839 · 2 years ago
I was looking for this comment ^^
@gosleeeeep · 2 years ago
Same here, lol
@rumaisalateef784 · 4 years ago
Beautifully explained, thank you!
@NormalizedNerd · 4 years ago
Happy to hear. Keep supporting :D
@trieunguyenhai49 · 4 years ago
Thank you so much, but isn't X_{love} equal to 4, not 3?
@NormalizedNerd · 4 years ago
@TRIỀU NGUYỄN HẢI Thanks for pointing this out. Yes X_{love} = 4.
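If you want to sanity-check counts like X_love yourself, here is a rough sketch of window-based co-occurrence counting. The toy corpus, tokenisation, and window size of 1 are assumptions, not necessarily the exact setup used in the video.

```python
# Sketch: count co-occurrences of each word within a symmetric window (window size assumed).
from collections import defaultdict

corpus = ["i love nlp", "i love to make videos"]
window = 1
X = defaultdict(float)   # X[(w, c)] = co-occurrence count of word w with context word c

for sentence in corpus:
    tokens = sentence.split()
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                X[(w, tokens[j])] += 1

# Row sum X_love = total co-occurrences of "love"; with these assumptions it comes out to 4.
print(sum(count for (w, _), count in X.items() if w == "love"))
```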
@ToniSkit · 7 months ago
This was great
@momona4170 · 3 years ago
I still don't quite understand the part where ln(X_i) was absorbed into the biases; please enlighten me.
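For readers stuck on the same step, this is how the paper handles it, sketched in LaTeX (notation as in Pennington et al., 2014):

```latex
\begin{align*}
  w_i^\top \tilde{w}_k &= \log P_{ik} = \log X_{ik} - \log X_i .
\end{align*}
% The term \log X_i does not depend on k, so it can be folded into a bias b_i for word i;
% adding a symmetric bias \tilde{b}_k for the context word restores the symmetry:
\begin{align*}
  w_i^\top \tilde{w}_k + b_i + \tilde{b}_k &= \log X_{ik} .
\end{align*}
```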
@Sarah-ik8tt · 3 years ago
Hello, thank you for your explanation. Can you please share the Google Colab link asap?
@SAINIVEDH · 4 years ago
@ 19:13: That is a weighting function because log(X_ij) blows up when X_ij is zero and the equation goes crazy. More details at towardsdatascience.com/light-on-math-ml-intuitive-guide-to-understanding-glove-embeddings-b13b4f19c010
@NormalizedNerd · 4 years ago
The article says f(X_ij) prevents log(X_ij) from being NaN which is not true. f(X_ij) actually puts an upper limit on co-occurrence frequencies.
@robinshort6430 · 2 years ago
Often X_ij is zero, and in these cases ln(X_ij) diverges to minus infinity. How do you treat this issue?
@NormalizedNerd · 2 years ago
Good point. So, here's how they tackled the problem. They defined the weighting function f like this:
f(X_ij) = (X_ij / X_max)^alpha   [if X_ij < X_max]
        = 1                      [otherwise]
So you see, when X_ij = 0, f(X_ij) is 0. That means the whole cost term becomes 0; we don't even need to compute ln(X_ij) in this case. They addressed two problems with f: 1) not giving too much importance to word pairs that co-occur very frequently, and 2) avoiding ln(0). I hope this makes sense. Please tell me if anything is not clear.
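A minimal sketch of that weighting function and one cost term in code. x_max = 100 and alpha = 0.75 are the values suggested in the paper; treat them as hyperparameters.

```python
# Sketch of the GloVe weighting function f and a single weighted cost term.
import numpy as np

def f(x, x_max=100.0, alpha=0.75):
    """0 at x = 0, grows as (x / x_max)^alpha, capped at 1 for very frequent pairs."""
    return (x / x_max) ** alpha if x < x_max else 1.0

def cost_term(x_ij, w_i, w_k, b_i, b_k):
    """Weighted squared error for one word/context pair; zero when x_ij == 0,
    so ln(0) is never evaluated."""
    if x_ij == 0:
        return 0.0
    return f(x_ij) * (np.dot(w_i, w_k) + b_i + b_k - np.log(x_ij)) ** 2
```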
@robinshort6430 · 2 years ago
@Normalized Nerd This is true only assuming that zero times infinity is zero! Just kidding, I just want to point out that programming zero times infinity gives (rightly) an error in numpy, so I have to write this as an if condition. Everything else is clear, thank you very much for your great work and for your answer!
@robinshort6430 · 2 years ago
@@NormalizedNerd Is X_max a hyperparameter?
@WahranRai · 2 years ago
Your examples are not related: "I love NLP..." and P(k|ice), etc. It would be useful to use the same sentences...
@md.tarekhasan2206 · 3 years ago
Can you please make videos on ELMo, fastText, and BERT also? It'll be helpful.
@NormalizedNerd · 3 years ago
I'll try in the future :)
@psic-protosysintegratedcyb2422 · 4 years ago
Good introduction!
@NormalizedNerd · 4 years ago
Glad it was helpful!
@fezkhanna6900 · 4 years ago
Fantastic video
@NormalizedNerd · 4 years ago
Thanks!
@CodeAshing · 4 years ago
Bruh you explained well
@NormalizedNerd · 4 years ago
Thanks man!!
@kindaeasy9797 · 4 months ago
Well, I think by corpus you mean document, but let me tell you, a corpus has repeated words as well; to form a corpus you just join all the documents.
@Nextalia · 3 years ago
I fail to see where the vectors come from... :-( I follow all the explanation without any problem, but... once you define J, where are the vectors coming from? Is there any neural network involved? Same problem when reading the article or any other explanations. They all try to explain where that J function comes from, and then, magically, we have vectors we can compare to each other :-( Any help on that would be greatly appreciated. Thanks!
@NormalizedNerd · 3 years ago
The authors introduced the word vectors very subtly. Here's the deal: at 9:50, we assume that there exists a function F which takes the word vectors and produces a scalar quantity! And no, we don't have neural networks here. Everything is based on the co-occurrence matrix.
@Nextalia · 3 years ago
@@NormalizedNerd Thanks for your answer. I found a publication that explains very well what to do after "discovering" that function: thesis.eur.nl/pub/47697/Verstegen.pdf I was somehow sure that GloVe was based on neural networks (as word2vec is), but it is not the case. However, it is a bit like a neural network, since the vectors are created the same way the weights of a NN are trained: stochastic gradient descent.
@SwapravaNath · 2 years ago
The vectors are actually the parameters that one is optimizing over. Actually, the objective function J should have been written with the arguments being the vector representations of the words -- which are the optimization variables. For certain choices of the F function, e.g., softmax, the optimization becomes mathematically easy. And then it is just a multivariable optimization problem, and a natural algorithm to solve it is gradient descent (and more). Ref: kzbin.info/www/bejne/e4PMk6qnqJ6jaZo [Stanford course on NLP]
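To make that concrete, here is a bare-bones sketch of "the vectors are the optimization variables": plain gradient descent on the GloVe cost over a given co-occurrence matrix. It is illustrative only (the real implementation uses AdaGrad and sparse data structures), and X, dim, and the hyperparameters are placeholders.

```python
# Sketch: train GloVe-style vectors by gradient descent on the weighted least-squares cost.
import numpy as np

def train_glove(X, dim=10, epochs=50, lr=0.05, x_max=100.0, alpha=0.75, seed=0):
    rng = np.random.default_rng(seed)
    V = X.shape[0]
    W  = rng.normal(scale=0.1, size=(V, dim))   # word vectors (the optimization variables)
    Wc = rng.normal(scale=0.1, size=(V, dim))   # context vectors
    b, bc = np.zeros(V), np.zeros(V)

    nonzero = np.argwhere(X > 0)                # only nonzero co-occurrences contribute
    for _ in range(epochs):
        for i, j in nonzero:
            x = X[i, j]
            weight = min((x / x_max) ** alpha, 1.0)
            diff = W[i] @ Wc[j] + b[i] + bc[j] - np.log(x)
            g = weight * diff
            dW_i, dWc_j = g * Wc[j], g * W[i]   # gradients computed before any update
            W[i]  -= lr * dW_i
            Wc[j] -= lr * dWc_j
            b[i]  -= lr * g
            bc[j] -= lr * g
    return W + Wc                               # the paper sums the two sets of vectors

# Usage: embeddings = train_glove(X) for any nonnegative co-occurrence matrix X.
```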
@ccuuttww · A year ago
p(Love, I) = 2/3?
@TheR4Z0R996 · 4 years ago
Great explanation, thanks a lot my friend :)
@NormalizedNerd · 4 years ago
Glad that it helped :D...keep supporting!
@eljangoolak · 2 years ago
quackuarance metrics? I don't understand what that is
@bikideka7880 · 4 years ago
Good explanation, but please use a bigger cursor; a lot of YouTubers miss this.
@NormalizedNerd · 4 years ago
Thanks for the suggestion :D
@kekecoo5681 · 4 years ago
Where did e come from?
@NormalizedNerd · 4 years ago
e^x follows our condition. e^(a-b) = e^a/e^b
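In the paper's notation, the point is that exp turns a difference of dot products into the required ratio (sketched in LaTeX):

```latex
\[
  F\!\bigl(w_i^\top \tilde{w}_k - w_j^\top \tilde{w}_k\bigr)
  = \frac{\exp\bigl(w_i^\top \tilde{w}_k\bigr)}{\exp\bigl(w_j^\top \tilde{w}_k\bigr)}
  = \frac{P_{ik}}{P_{jk}}, \qquad \text{with } F = \exp .
\]
```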
@u_luana.j · 3 years ago
5:50 ..?
@sakibahmed2373 · 4 years ago
Hello there, first of all thank you for adding such informative videos to help beginners in the DS field. I am trying to reproduce the code from GitHub for the Stanford GloVe model. Link ---> github.com/stanfordnlp/GloVe The problem is, if I execute all the statements as mentioned in the "Readme", I get the respective files it should provide me, "cooccur.bin" & "vocab.txt". The latter does have the list of words with frequencies, but the former is empty, and no error is reported in the console. For me it's very weird and I don't understand what I am doing wrong. Could you please help me with this? N.B.: I am new to ML and still learning! Best regards.
@NormalizedNerd · 4 years ago
"cooccurrence.bin" should contain the co-occurrence records (the trained word vectors end up in "vectors.txt"). Make sure that the training actually started. You should see logs like...
vector size: 50
vocab size: 71290
x_max: 10.000000
alpha: 0.750000
05/08/20 - 06:02.16AM, iter: 001, cost: 0.071222
05/08/20 - 06:02.45AM, iter: 002, cost: 0.052683
05/08/20 - 06:03.14AM, iter: 003, cost: 0.046717
...
I'd suggest you try this on Google Colab once.
@sakibahmed2373 · 4 years ago
@@NormalizedNerd Hi, Thank you for your response. I never tried colab before. But what i noticed in colab is that i have to upload notebook files which i cant see in the glove project that i cloned. However I am using an online editor "repl.it". First i ran "make" command which created the "build" folder & subsequently "./demo.sh". Running this script creates a "cooccurence.bin" file but as i mentioned earlier its empty. Did i missed something here ? I am sure i missing something very small and important 😒 Below are the logs from the terminal..  make mkdir -p build gcc -c src/vocab_count.c -o build/vocab_count.o -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic gcc -c src/cooccur.c -o build/cooccur.o -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic src/cooccur.c: In function ‘merge_files’: src/cooccur.c:180:9: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result] fread(&new, sizeof(CREC), 1, fid[i]); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ src/cooccur.c:190:5: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result] fread(&new, sizeof(CREC), 1, fid[i]); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ src/cooccur.c:203:9: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result] fread(&new, sizeof(CREC), 1, fid[i]); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ gcc -c src/shuffle.c -o build/shuffle.o -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic src/shuffle.c: In function ‘shuffle_merge’: src/shuffle.c:96:17: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result] fread(&array[i], sizeof(CREC), 1, fid[j]); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ src/shuffle.c: In function ‘shuffle_by_chunks’: src/shuffle.c:161:9: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result] fread(&array[i], sizeof(CREC), 1, fin); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ gcc -c src/glove.c -o build/glove.o -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic src/glove.c: In function ‘load_init_file’: src/glove.c:86:9: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result] fread(&array[a], sizeof(real), 1, fin); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ src/glove.c: In function ‘glove_thread’: src/glove.c:182:9: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result [-Wunused-result] fread(&cr, sizeof(CREC), 1, fin); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ gcc -c src/common.c -o build/common.o -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic gcc build/vocab_count.o build/common.o -o build/vocab_count -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic gcc build/cooccur.o build/common.o -o build/cooccur -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic gcc build/shuffle.o build/common.o -o build/shuffle -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic gcc build/glove.o build/common.o -o build/glove -lm -pthread -O3 -march=native -funroll-loops -Wall -Wextra -Wpedantic  ./demo.sh mkdir -p build --2020-05-08 17:04:13-- mattmahoney.net/dc/text8.zip Resolving mattmahoney.net (mattmahoney.net)... 67.195.197.75 Connecting to mattmahoney.net (mattmahoney.net)|67.195.197.75|:80... connected. HTTP request sent, awaiting response... 
200 OK Length: 31344016 (30M) [application/zip] Saving to: ‘text8.zip’ text8.zip 100%[======>] 29.89M 1.97MB/s in 15s 2020-05-08 17:04:29 (1.95 MB/s) - ‘text8.zip’ saved [31344016/31344016] Archive: text8.zip inflating: text8 $ build/vocab_count -min-count 5 -verbose 2 < text8 > vocab.txt BUILDING VOCABULARY Processed 17005207 tokens. Counted 253854 unique words. Truncating vocabulary at min count 5. Using vocabulary of size 71290. $ build/cooccur -memory 4.0 -vocab-file vocab.txt -verbose 2 -window-size 15 < text8 > cooccurrence.bin COUNTING COOCCURRENCES window size: 15 context: symmetric max product: 13752509 overflow length: 38028356 Reading vocab from file "vocab.txt"...loaded 71290 words. Building lookup table...table contains 94990279 elements. Processing token: 200000./demo.sh: line 43: 114 Killed $BUILDDIR/cooccur -memory $MEMORY -vocab-file $VOCAB_FILE -verbose $VERBOSE -window-size $WINDOW_SIZE < $CORPUS > $COOCCURRENCE_FILE
@NormalizedNerd · 4 years ago
@Sakib Ahmed repl is probably not a good idea for DL stuff. Try to use Colab/Kaggle. You can directly clone the GitHub repo in Colab. I've created a Colab notebook. Run this by yourself. It works perfectly! colab.research.google.com/drive/1BA-GRHQOsXrYwmkalQyejsnVE8zmoyH2?usp=sharing
@sakibahmed2373 · 4 years ago
@@NormalizedNerd Thank you so much! It really worked... 😊 (y)
@NormalizedNerd · 4 years ago
@@sakibahmed2373 Do share this channel with your friends :D Enjoy machine learning.
@atomic7680 · 4 years ago
G-Love 😂
@NormalizedNerd · 4 years ago
Haha...Exactly what I thought when I learned the word for the first time!
@BloggerMnpr · A year ago
.
@TheMurasaki1 · 4 years ago
"I love to make videos": sorry to say this, but is it correct English?
@kaustavdatta4748 · 4 years ago
Not the best English. But the model doesn't care as it will learn whatever you (or the dataset) teach it. The author's English doesn't impact the explanation of the model's workings.
@Schaelpy · A year ago
Good video, but the wrong pronunciation of GloVe is killing me, man.
@ToniSkit · 7 months ago
You mean the right ❤
@harshitatiwari8019 · 4 years ago
Reduce the number of ads. An ad like every minute. Google has made KZbin a money-sucking machine. So irritating.
Introduction to NLP | GloVe & Word2Vec Transfer Learning
21:12
Normalized Nerd
11K views
Introduction to NLP | Word Embeddings & Word2Vec Model
23:10
Normalized Nerd
38K views
Word2Vec, GloVe, FastText- EXPLAINED!
13:20
CodeEmporium
27K views
Vectoring Words (Word Embeddings) - Computerphile
16:56
Computerphile
300K views
Word Embedding and Word2Vec, Clearly Explained!!!
16:12
StatQuest with Josh Starmer
358K views
Embeddings - EXPLAINED!
12:58
CodeEmporium
9K views
Variational Autoencoders | Generative AI Animated
20:09
Deepia
48K views
Understanding Word2Vec
17:52
Jordan Boyd-Graber
78K views
A Beginner's Guide to Vector Embeddings
8:29
Colin Talks Tech
37K views
Denoising Diffusion Probabilistic Models | DDPM Explained
29:29
ExplainingAI
49K views