To learn more about Lightning: lightning.ai/ NOTE: A lot of people ask for the math at 13:16 to be clarified. In that example we have 3,000,000 inputs, each connected to 100 activation functions, for a total of 300,000,000 weights on the connections from the inputs to the activation functions. We then have another 300,000,000 weights on the connections from the activation functions to the outputs, so 300,000,000 + 300,000,000 = 600,000,000 = 2 * 300,000,000 weights in total. Support StatQuest by buying my books The StatQuest Illustrated Guide to Machine Learning, The StatQuest Illustrated Guide to Neural Networks and AI, or a Study Guide or Merch!!! statquest.org/statquest-store/
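The arithmetic in the note above can be spelled out in a few lines (the vocabulary size and hidden-layer size are the ones from the video):

```python
# Weight count for the video's Word2Vec-style network.
vocab_size = 3_000_000   # words and phrases in the vocabulary (inputs and outputs)
hidden = 100             # activation functions in the hidden layer

input_side = vocab_size * hidden    # weights: inputs -> activation functions
output_side = hidden * vocab_size   # weights: activation functions -> outputs
total = input_side + output_side    # same count on both sides, hence "times 2"

print(input_side)   # 300,000,000
print(total)        # 600,000,000 = 2 * 300,000,000
```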
@karanacharya188 ай бұрын
In simple words, word embeddings are a by-product of training a neural network to predict the next word. By focusing on that single objective, the weights themselves (the embeddings) can be used to understand the relationships between the words. This is actually quite fantastic! As always, great video @statquest!
@statquest8 ай бұрын
bam! :)
@joeybasile5457 ай бұрын
Not necessarily just the next word, though. Your statement is more specific than the general idea.
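The idea in this thread, embeddings falling out of next-word prediction, can be sketched with a tiny toy network. Everything here (the three-phrase corpus, the 2-dimensional embedding size, the learning rate) is made up for illustration; this is a sketch of the principle, not the video's actual 3,000,000-word Word2Vec setup:

```python
import numpy as np

# Toy corpus of (context word, next word) pairs, echoing the video's examples.
corpus = [("troll2", "is"), ("is", "great"), ("gymkata", "is")]
vocab = sorted({w for pair in corpus for w in pair})
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 2                        # vocabulary size, embedding dimension

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, D))   # input -> hidden weights = the embeddings
W_out = rng.normal(scale=0.1, size=(D, V))  # hidden -> output weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for _ in range(500):                        # plain gradient descent on cross-entropy
    for ctx, nxt in corpus:
        h = W_in[idx[ctx]]                  # hidden layer = the context word's row
        p = softmax(h @ W_out)              # predicted next-word probabilities
        grad = p.copy()
        grad[idx[nxt]] -= 1.0               # dLoss/dlogits for cross-entropy
        g_in = W_out @ grad                 # gradient w.r.t. the hidden layer
        W_out -= 0.1 * np.outer(h, grad)
        W_in[idx[ctx]] -= 0.1 * g_in

# The "by-product": each row of W_in is now that word's embedding.
for w in vocab:
    print(w, np.round(W_in[idx[w]], 2))
```

After training, the network predicts "great" after "is", and the embedding rows for "troll2" and "gymkata" (which appear in the same context) tend to end up near each other.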
@NoNonsense_01 Жыл бұрын
Probably the most important concept in NLP. Thank you for explaining it so simply and rigorously. Your videos are a thing of beauty!
@statquest Жыл бұрын
Wow, thank you!
@exxzxxe10 ай бұрын
Josh, this is the absolutely clearest and most concise explanation of embeddings on YouTube!
@statquest10 ай бұрын
Thank you very much!
@davins909 ай бұрын
totally agree
@chad5615 Жыл бұрын
Keep up the amazing work (especially the songs) Josh, you're making life easy for thousands of people!
@statquest Жыл бұрын
Wow! Thank you so much for supporting StatQuest! TRIPLE BAM!!!! :)
@rachit7185 Жыл бұрын
This channel is literally the best thing that has happened to me on YouTube! Way too excited for your upcoming video on transformers, attention and LLMs. You're the best Josh ❤
@statquest Жыл бұрын
Wow, thanks!
@MiloLabradoodle Жыл бұрын
Yes, please do a video on transformers. Great channel.
@statquest Жыл бұрын
@@MiloLabradoodle I'm working on the transformers video right now.
@liuzeyu3125 Жыл бұрын
@@statquest Can't wait to see it!
@SergioPolimante11 ай бұрын
StatQuest is by far the best machine learning channel on YouTube to learn the basic concepts. Nice job
@statquest11 ай бұрын
Thank you!
@harin01737 Жыл бұрын
I was struggling to understand NLP and DL concepts and thinking of dropping my classes, and BAM!!! I found you, and now I'm writing a paper on neural program repair using DL techniques.
@statquest Жыл бұрын
BAM! :)
@ashmitgupta80396 ай бұрын
Was literally struggling to understand this concept, and then I found this goldmine.
@statquest6 ай бұрын
Bam! :)
@awaredz0078 ай бұрын
Wow!! This is the best definition of word embedding I have ever heard or seen, right at 09:35. Thanks for the clear and awesome video. You guys rock!!
@statquest8 ай бұрын
Thanks! :)
@pushkar260 Жыл бұрын
That was quite informative
@statquest Жыл бұрын
BAM! Thank you so much for supporting StatQuest!!! :)
@JawadAhmadCodes4 ай бұрын
Oh my gosh, StatQuest is surely the greatest channel I've found to learn the whole universe in a simple way. WOW!
@statquest4 ай бұрын
Thank you! :)
@manuelamankwatia65568 ай бұрын
This is by far the best video on embeddings. A whole university course is broken down into 15 minutes.
@statquest8 ай бұрын
Thanks!
@yuxiangzhang2343 Жыл бұрын
So good!!! This is literally the best deep learning tutorial series I've found… after a very long search on the web!
@statquest Жыл бұрын
Thank you! :)
@mannemsaisivadurgaprasad8987 Жыл бұрын
One of the best videos I've seen so far regarding embeddings.
@statquest Жыл бұрын
Thank you!
@myyoutubechannel28584 ай бұрын
In the first 19 seconds my mans explains Word Embedding more simply and elegantly than anything else out there on the internet.
@statquest4 ай бұрын
Thanks!
@tanbui7569 Жыл бұрын
Damn, when I first learned about this 4 years ago, it took me two days to wrap my head around these weights and embeddings well enough to implement them in code. Just now I needed to refresh myself on the concepts, since I haven't worked with them in a while, and your video illustrated what I learned (two whole days' worth, back then) in just 16 minutes!! I wish this video had existed earlier!!
@statquest Жыл бұрын
Thanks!
@TropicalCoder Жыл бұрын
That was the first time I actually understood embeddings - thanks!
@statquest Жыл бұрын
bam! :)
@wizenith Жыл бұрын
haha, I love your opening and your teaching style! When we think something is extremely difficult to learn, everything should begin with singing a song; that makes the day more beautiful to begin with (heheh, actually I am not just teasing, lol, I really like it). Thanks for sharing your thoughts with us
@statquest Жыл бұрын
Thanks!
@acandmishra9 ай бұрын
Your work is extremely amazing and so helpful for new learners who want to get into the details of how deep learning models work, instead of just knowing what they do!! Keep it up!
@statquest9 ай бұрын
Thanks!
@noadsensehere91953 ай бұрын
This is the only video I could find to understand this basic NLP concept! Thanks
@statquest3 ай бұрын
Thanks!
@haj5776 Жыл бұрын
The phrase "similar words will have similar numbers" in the song will stick with me for a long time, thank you!
@statquest Жыл бұрын
bam!
@pichazai8 ай бұрын
this channel is the best ML resource on the entire internet
@statquest8 ай бұрын
Thank you!
@user-wr4yl7tx3w Жыл бұрын
This is the best explanation of word embedding I have come across.
@statquest Жыл бұрын
Thank you very much! :)
@DanielDias-vl2js4 ай бұрын
Thank goodness I found this channel! You've got great content and an excellent teaching methodology here!
@statquest4 ай бұрын
Thanks!
@mycotina6438 Жыл бұрын
BAM!! StatQuest never lie, it is indeed super clear!
@statquest Жыл бұрын
Thank you! :)
@flow-saf Жыл бұрын
This video explains the source of the multiple dimensions in a word embedding in the simplest possible way. Awesome. :)
@statquest Жыл бұрын
Thanks!
@rathinarajajeyaraj1502 Жыл бұрын
This is one of the best sources of information.... I always find videos a great source of visual stimulation... thank you.... infinite baaaam
@statquest Жыл бұрын
BAM! :)
@exxzxxe9 ай бұрын
Hopefully everyone following this channel has Josh's book. It is quite excellent!
@statquest9 ай бұрын
Thanks for that!
@dreamdrifter Жыл бұрын
Thank you Josh, this is something I've been meaning to wrap my head around for a while and you explained it so clearly!
@statquest Жыл бұрын
Glad it was helpful!
@mazensaaed86355 ай бұрын
I promise I'll become a member of your channel when I get my first data science job
@statquest5 ай бұрын
BAM! Thank you very much! :)
@EZZAHIRREDOUANE8 ай бұрын
Great presentation. You saved my day after I had watched several other videos, thank you!
@statquest8 ай бұрын
Glad it helped!
@gustavow5746 Жыл бұрын
the best video I've seen on this topic so far. Great content! Congrats!!
@statquest Жыл бұрын
Wow, thanks!
@ananpinya835 Жыл бұрын
StatQuest is great! I learn a lot from your channel. Thank you very much!
@statquest Жыл бұрын
Glad you enjoy it!
@channel_SV Жыл бұрын
It's so nice to google something and realize there's a StatQuest about your question, when you were certain there hadn't been one just a little while ago
@statquest Жыл бұрын
BAM! :)
@muthuaiswaryaaswaminathan4079 Жыл бұрын
Thank you so much for this playlist! Got to learn a lot of things in a very clear manner. TRIPLE BAM!!!
@statquest Жыл бұрын
Thank you! :)
@wellwell8025 Жыл бұрын
Way better than my university slides. Thanks
@statquest Жыл бұрын
Thanks!
@MarvinMendesCabral Жыл бұрын
Hey Josh, I'm a Brazilian student and I love watching your videos. Every one of your explanations is so good and fun to watch. I just wanted to say thank you, because over the last few months you've made me smile in the middle of studying. So, thank you!!! (sorry for the bad English hahaha)
@statquest Жыл бұрын
Muito obrigado!!! :)
@alexdamado5 ай бұрын
Thanks for posting. It is indeed a clear explanation and helped me move forward with my studies.
@statquest5 ай бұрын
Glad it was helpful!
@FullStackAmigo Жыл бұрын
Absolutely the best explanation that I've found so far! Thanks!
@statquest Жыл бұрын
Thank you! :)
@fouadboutaleb4157 Жыл бұрын
Bro, I have my master's degree in ML, but trust me, you explain it better than my teachers ❤❤❤ Big thanks
@statquest Жыл бұрын
Thank you very much! :)
@RaynerGS Жыл бұрын
I admire your work a lot. Salute from Brazil.
@statquest Жыл бұрын
Muito obrigado! :)
@michaelcheung6290 Жыл бұрын
Thank you statquest!!! Finally I started to understand LSTM
@statquest Жыл бұрын
Hooray! BAM!
@mykolalebid62792 ай бұрын
Thank you for your excellent work. A video on negative sampling would be a valuable addition.
@statquest2 ай бұрын
I'll keep that in mind.
@lfalfa8460 Жыл бұрын
I love all of your songs. You should record a CD!!! 🤣 Thank you very much again and again for the elucidating videos.
@statquest Жыл бұрын
Thanks!
@mamdouhdabjan9292 Жыл бұрын
Hey Josh. A great new series that I, and many others, would be excited to see is Bayesian statistics. I would love to watch you explain the intricacies of that branch of stats. Thanks as always for the great content, and keep up the neural-network related videos. They are especially helpful.
@statquest Жыл бұрын
That's definitely on the to-do list.
@mamdouhdabjan9292 Жыл бұрын
@@statquest looking forward to it.
@mahdi132 Жыл бұрын
Thank you sir. Your explanation is great and your work is much appreciated.
@statquest Жыл бұрын
Thanks!
@alfredoderodt6519 Жыл бұрын
You are a beautiful human! Thank you so much for this video! I was finally able to understand this concept! Thanks so much again!!!!!!!!!!!!! :)
@statquest Жыл бұрын
Glad it was helpful!
@ajd3fjf4hsjd35 ай бұрын
Fantastically simple, and complete!
@statquest5 ай бұрын
Thanks!
@ah89971 Жыл бұрын
When I watched this, I had only one question: why did all the others fail to explain this, if they fully understood the concept?
@statquest Жыл бұрын
bam!
@rudrOwO Жыл бұрын
@@statquest Double Bam!
@meow-mi333 Жыл бұрын
Bam the bam!
@eqe-kui-nei3 ай бұрын
@@ah89971 A lot of people in this industry (even with a PhD) actually don't.
@bancolin1005 Жыл бұрын
BAM! Thanks for your video, I finally understand what negative sampling means ~
@statquest Жыл бұрын
Happy to help!
@familywu3869 Жыл бұрын
Thank you very much for your excellent tutorials, Josh! I have a question: at around 13:30 of this video, you mention multiplying by 2. I am not sure why 2? I mean, if there are more than 2 outputs, should we multiply by the number of output nodes instead of 2? Thank you in advance for your clarification.
@statquest Жыл бұрын
If we have 3,000,000 words and phrases as inputs, and each input is connected to 100 activation functions, then we have 300,000,000 weights going from the inputs to the activation function. Then from those 100 activation function, we have 3,000,000 outputs (one per word or phrase), each with a weight. So we have 300,000,000 weights on the input side, and 300,000,000 weights on the output side, or a total of 600,000,000 weights. However, since we always have the same number of weights on the input and output sides, we only need to calculate the number of weights on one side and then just multiply that number by 2.
@surojit9625 Жыл бұрын
@@statquest Thanks for explaining! I also had the same question.
@jwilliams8210 Жыл бұрын
Ohhhhhhhhh! I missed that the first time around! BTW: (Stat)Squatch and Norm are right: StatQuest is awesome!!
@ColinTimmins Жыл бұрын
Thank you so much for these videos. It really helps with the visuals because I am dyslexic… Quadruple BAM!!!! lol 😊
@statquest Жыл бұрын
Happy to help!
@natuchips985 ай бұрын
You literally saved my life
@statquest5 ай бұрын
bam! :)
@ramzirebai3661 Жыл бұрын
Thank you so much Mr. Josh Starmer, you are the only one who makes ML concepts easy to understand. Can you please explain GloVe?
@statquest Жыл бұрын
I'll keep that in mind.
@eamonnik Жыл бұрын
Hey Josh! Loved seeing your talk at BU! Appreciate your videos :)
@statquest Жыл бұрын
Thanks so much! :)
@vpnserver407 Жыл бұрын
highly valuable video and book tutorial, thanks for putting this kind of special tutorial out here.
@statquest Жыл бұрын
Glad you liked it!
@aoliveiragomes Жыл бұрын
Thanks!
@statquest Жыл бұрын
BAM!!! Thank you so much for supporting StatQuest!!! :)
@avishkaravishkar1451 Жыл бұрын
For those of you who find it hard to understand this video, my recommendation is to watch it at a slower pace and take notes. It will really make things much clearer.
@statquest Жыл бұрын
0.5 speed bam!!! :)
@m3ow21 Жыл бұрын
I love the way you teach!
@statquest Жыл бұрын
Thanks!
@ericvaish88414 ай бұрын
Great explanation my man!!
@statquest4 ай бұрын
Thank you!
@wenqiangli7544 Жыл бұрын
Great video for explaining word2vec!
@statquest Жыл бұрын
Thanks!
@nouraboub48055 ай бұрын
Good, thank you so much; this playlist is the best ❤️😍
@statquest5 ай бұрын
Glad you enjoy it!
@AliShafiei-ui8tn Жыл бұрын
the best channel ever.
@statquest Жыл бұрын
Double bam! :)
@ar_frzАй бұрын
This was lovely! thank you.
@statquestАй бұрын
Thanks!
@danish5326 Жыл бұрын
Thanks for enlightening us Master.
@statquest Жыл бұрын
Any time!
@tomoki-v6o Жыл бұрын
My favourite topic, it's magic. Bam!!
@statquest Жыл бұрын
:)
@LakshyaGupta-ge3wj Жыл бұрын
Absolutely mind blowing and amazing presentation! For Word2Vec's strategy for increasing context, does it employ the 2 strategies in "addition" to the 1-output-for-1-input basic method we talked about throughout the video, or are they replacements? Basically, are we still training the model to predict "is" for "Gymkata" in the same neural network, along with predicting "is" for a combination of "Gymkata" and "great"?
@statquest Жыл бұрын
Word2Vec uses one of the two strategies presented at the end of the video.
@lexxynubbers Жыл бұрын
Machine learning explained like Sesame Street is exactly what I need right now.
@statquest Жыл бұрын
bam!
@周子懿-y5r Жыл бұрын
Thank you Josh for this great video. I have a quick question about the Negative Sampling: If we only want to predict A, why do we need to keep the weights for "abandon" instead of just ignoring all the weights except for "A"?
@statquest Жыл бұрын
If we only focused on the weights for "A" and nothing else, then training would cause all of the weights to make every output = 1. In contrast, by adding some outputs that we want to be 0, training is forced to make sure that not every single output gets a 1.
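The selection step described in that reply can be sketched in a few lines. The mini vocabulary, the single negative sample, and the 100-unit hidden layer mirror the video's example, but the code itself is just an illustrative sketch of the idea, not Word2Vec's actual implementation:

```python
import random

vocab = ["aardvark", "a", "abandon", "ability", "able", "about"]
target = "a"    # the one word whose output we want to be 1
k = 1           # number of randomly sampled "negative" words whose output we want to be 0

random.seed(42)
negatives = random.sample([w for w in vocab if w != target], k)

# Each training step only touches weights for the target and the sampled negatives,
# instead of weights for every word in the vocabulary:
words_to_update = [target] + negatives

# Weights optimized this step: 100 from the input word to the hidden layer,
# plus 100 from the hidden layer to each selected output word.
hidden = 100
weights_this_step = hidden + hidden * len(words_to_update)
print(words_to_update)
print(weights_this_step)   # 300, as in the video's example
```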
@ishaqpaktinyar776611 ай бұрын
you da bessssst, saved me alota time and confusion :..)
@statquest11 ай бұрын
Thanks!
@akashbarik5806Ай бұрын
@statquest "5:30" I'm not sure if I'm right, but after researching a bit I found that the number of activation functions has nothing to do with the number of associations with each word. The number of activation functions depends on the structure of your neural network, and the number of vector representations depends solely on how you want to embed the words. In simple terms, you can have a 3-vector representation of a word and use only 2 activation functions. I may be wrong, but that's what I found.
@statquestАй бұрын
To create 3 embedding values per input with only 2 activation functions, you could connect all of the inputs to 1 activation function and put 1 weight on each input, but then you'd need to connect all of the inputs to the other activation function and use 2 weights for each input. The problem with that second activation function is that input * w1 * w2 = input * (w1 * w2) = input * w3, so I believe you'd end up with the equivalent of just 2 embedding values per input in the end. I believe this is why neural networks are always designed to have one weight per input per activation function.
@akashbarik5806Ай бұрын
@@statquest Thanks a lot for the clarification
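A quick numeric check of the algebra in that reply: two weights stacked on the same input, with no activation function in between, collapse into a single weight (the input and weight values here are arbitrary):

```python
import math

x = 3.0              # an arbitrary input value
w1, w2 = 0.4, -1.7   # two weights applied in sequence, with no nonlinearity between

# Chaining the two weights is the same as using one combined weight w3 = w1 * w2.
assert math.isclose(x * w1 * w2, x * (w1 * w2))
print(x * w1 * w2)   # ≈ -2.04
```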
@saisrisai9649 Жыл бұрын
Thank you Statquest!!!!
@statquest Жыл бұрын
Any time!
@pedropaixaob11 ай бұрын
This is an amazing video. Thank you!
@statquest11 ай бұрын
Thanks!
@MannyBernabe3 ай бұрын
Great work. Thank you.
@statquest3 ай бұрын
Thanks!
@c.nbhaskar4718 Жыл бұрын
great stuff as usual ..BAM * 600 million
@statquest Жыл бұрын
Thank you so much! :)
@janapalaswathi426210 ай бұрын
Awesome explanation..
@statquest10 ай бұрын
Thanks!
@pakaponwiwat2405 Жыл бұрын
Wow, Awesome. Thank you so much!
@statquest Жыл бұрын
You're very welcome!
@MadeyeMoody492 Жыл бұрын
Great video! Was just wondering why the outputs of the softmax activation at 10:10 are just 1s and 0s. Wouldn't that only be the case if we applied ArgMax here, not SoftMax?
@statquest Жыл бұрын
In this example the data set is very small and, for example, the word "is" is always followed by "great", every single time. In contrast, if we had a much larger dataset, then the word "is" would be followed by a bunch of words (like "great", or "awesome", or "horrible", etc.) and not followed by a bunch of other words (like "ate", or "stand", etc.). In that case, the softmax would tell us which words had the highest probability of following "is", and we wouldn't just get 1.0 for a single word that could follow the word "is".
@MadeyeMoody492 Жыл бұрын
@@statquest Ohh ok, that clears it up. Thanks!!
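The point in that reply can be illustrated with a short softmax computation (the raw scores below are made-up numbers, not values from the video):

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability, then normalize the exponentials.
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# When one next word dominates the training data (as "great" always follows "is"
# in the video's tiny corpus), its raw score ends up much larger, and softmax
# pushes its probability toward 1, so the outputs look like 1s and 0s:
print(softmax([8.0, -2.0, -2.0]))

# With a bigger corpus, several words plausibly follow "is", the scores are
# closer together, and softmax yields a genuine probability distribution:
print(softmax([2.0, 1.5, 0.5]))
```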
@auslei Жыл бұрын
Love this channel.
@statquest Жыл бұрын
Glad to hear it!
@yasminemohamed5157 Жыл бұрын
Awesome as always. Thank you!!
@statquest Жыл бұрын
Thank you! :)
@vicadegboye6846 ай бұрын
Thanks sooooo much for your videos. Let me not belabor the praise, as it's been established that you are triple bam! 🙂 Meanwhile, I've understood every single thing in your deep learning series up until this video. I'm still a bit confused about the negative sampling part. I don't understand how using "aardvark" to predict "a" and "abandon" somehow means we are excluding "abandon". This is the only concept I haven't understood in the 17 videos of this neural network/deep learning playlist. I would appreciate your help.
@statquest6 ай бұрын
The idea is that there is one word for which we want the final output value to be 1 and everything else needs to be 0s. However, rather than focusing on every single output, we just focus on the one word that we want the output to be 1 and just a handful of words that we want the output to be 0, rather than all of them.
@oliverlee28193 ай бұрын
@@statquest So does this mean negative sampling is implemented in each round of backpropagation optimization? I am not quite sure about this part either. I guess a more detailed (but simplified) demo would clarify this concept better. Or maybe some articles to reference?
@statquest3 ай бұрын
@@oliverlee2819 Yes, you do negative sampling every single time.
@oliverlee28193 ай бұрын
@@statquest So the words that "we don't want to predict" means the words whose predicted output value (probability) we just want to be zero, right? Is this done via the teacher forcing method, forcing the output of one word to be 1 and the outputs of the words we don't want to predict to be zero?
@statquest3 ай бұрын
@@oliverlee2819 The first part is correct. The second part is a little off. This isn't technically teacher forcing. We're just focusing on the 1 word we want the output to be 1 and a handful of words we want the output to be 0.
@jingzhouzhao86096 ай бұрын
Great video, in high quality!! Just wondering about the "times 2" at 13:27: I saw 4 neurons in the output layer, so why not "times 4"?
@statquest6 ай бұрын
A lot of people ask for the math at 13:16 to be clarified. In that example we have 3,000,000 inputs (only the first 4 are shown...), each connected to 100 activation functions, for a total of 300,000,000 weights on the connections from the inputs to the activation functions. We then have another 300,000,000 weights on the connections from the activation functions to the outputs (only 4 outputs are shown, but there are 3,000,000). 300,000,000 + 300,000,000 = 2 * 300,000,000
@ParthPandey-j2h Жыл бұрын
At 13:30, why did we multiply by 2 while calculating the number of weights required? 3 million (words) * 100 (activations/word) * 2?
@statquest Жыл бұрын
Because we have the same number of weights on connections going to the activation function as we have of weights going from the activation functions.
@ParthPandey-j2h Жыл бұрын
@@statquest Is it like wX+b, so taking b
@statquest Жыл бұрын
@@ParthPandey-j2h If we have 3,000,000 words and phrases as inputs, and each input is connected to 100 activation functions, then we have 300,000,000 weights going from the inputs to the activation function. Then from those 100 activation function, we have 3,000,000 outputs (one per word or phrase), each with a weight. So we have 300,000,000 weights on the input side, and 300,000,000 weights on the output side, or a total of 600,000,000 weights. However, since we always have the same number of weights on the input and output sides, we only need to calculate the number of weights on one side and then just multiply that number by 2.
@ParthPandey-j2h Жыл бұрын
Wow, thanks for the reply @@statquest Double Bam!!
@denismarcio9 ай бұрын
Extremely well taught! Congratulations.
@statquest9 ай бұрын
Muito obrigado! :)
@MrAhsan99 Жыл бұрын
I've watched this video multiple times but am unable to understand a thing. I'm sure I am dumb and Josh is great!
@statquest Жыл бұрын
Maybe you should start with the basics for neural networks: kzbin.info/www/bejne/eaKyl5xqZrGZetk
@vicadegboye6846 ай бұрын
This is the most challenging video of the series so far, IMO. I've watched it several times too, and I understand everything apart from the last part on negative sampling. And yes, I've watched and understood every single video (16 of them on the playlist up to this point) before this one in the series. This is the first time this has happened to me with his videos.
@study-tp4ts Жыл бұрын
Great video as always!
@statquest Жыл бұрын
Thanks again!
@MaskedEngineerYH Жыл бұрын
Keep going statquest!!
@statquest Жыл бұрын
That's the plan!
@tupaiadhikari Жыл бұрын
Great explanation. Please make a video on how we connect the output of an embedding layer to an LSTM/GRU for classification, say for sentiment analysis
@statquest Жыл бұрын
I show how to connect it to an LSTM for language translation here: kzbin.info/www/bejne/gmmrfKqbj66Co8k
@tupaiadhikari Жыл бұрын
@@statquest Thank You Professor Josh !
@neemo8089 Жыл бұрын
Thank you so much for the video! I have one question: at 15:09, why do we only need to optimize 300 weights? For one word with 100 * 2 weights? I'm not sure how to understand the '2' either.
@statquest Жыл бұрын
At 15:09 there are 100 weights going from the word "aardvark" to the 100 activation functions in the hidden layer. There are then 100 weights going from the activation functions to the sum for the word "A" and 100 weights going from the activation functions to the sum for the word "abandon". Thus, 100 + 100 + 100 = 300.
@neemo8089 Жыл бұрын
Thank you!@@statquest
@minhmark.018 ай бұрын
thanks for your tutorial!!!
@statquest8 ай бұрын
You're welcome!
@manpower96412 ай бұрын
hmm, where did the x2 on the weights come from (min 13:30)? Thank you :)
@statquest2 ай бұрын
We have 3,000,000 inputs, and each input has 100 weights going to the hidden layer. We then have 100 weights going from the hidden layer to the 3,000,000 outputs. The total number of weights that we need to train, is thus the sum of the weights from the input to the hidden layer (3,000,000 * 100), plus the weights from the hidden layer to the outputs (3,000,000 * 100). Thus, we can write it as 3,000,000 * 100 * 2.
@mgeich59 күн бұрын
Thanks for the great video, as always. One question about the vocabulary size: 3M sounds like a huge vocabulary by today's standards. Is that the actual number?
@statquest8 күн бұрын
These days vocabularies are in the 30,000 range.
@The-Martian73 Жыл бұрын
Mr. Starmer, I think you really loved Troll 2 😅
@statquest Жыл бұрын
:)
@ItIsJan Жыл бұрын
Hey, I might be stupid or something, but with the negative sampling thing, in the second step (after the activation functions), you first say we don't want to predict all the other words except "a", and then for some reason we randomly select a word that we don't want to predict. But I thought we don't want to predict any of the other words? And then we ignore all the weights that go to words we don't want to optimize? I might be misunderstanding something (14:18)
@statquest Жыл бұрын
With a small vocabulary, when optimizing the weights and biases, we include the weights that lead to every word - the word that we want to predict and all of the words we don't want to predict. With negative sampling, we only include the word we want to predict and a subset of words we don't want to predict.
@paranoid_android847011 ай бұрын
I think there's a small mistake at 14:57. He says that we don't want to predict 'abandon' and yet he includes it in the list. I think he meant to say 'aardvark' instead. [edit]: The video is correct! Read bottom reply if you have the same question.
@statquest11 ай бұрын
The video is correct at that time point. At that point we are selecting words we do want to predict, meaning we want their output values to be 0 instead of 1. However, we only select a handful of words that we want to have the predictions be 0 instead of all of the words we do not want to predict.
@paranoid_android847011 ай бұрын
@@statquest After carefully rewatching the video a couple of times, I noticed a misunderstanding of the word "predict" on my part. If I understand correctly, saying we don't want to predict specific words entails calculating their outcomes in the output layer so we can reduce their values through backpropagation. Before, I understood it as "we don't want to 'predict', as in calculate the values, for specific words".
@statquest11 ай бұрын
@@paranoid_android8470 I agree - the wording could be improved since it is slightly ambiguous as to what it means to predict and not to predict.
@ang3dang24 ай бұрын
Can you do one for wav2vec? It seemingly taps into the same concepts as word2vec, but the equations are so much more complex.
@statquest4 ай бұрын
I'll keep that in mind.
@enchanted_swiftie Жыл бұрын
Does this mean the neural net used to get the embeddings can only have a single layer? I mean: 1. Say there are 100 words total in the corpus. 2. First hidden layer (say I set the embedding size to 256). 3. Then another layer to predict the next word, which will be 100 words again. Here, to plot the graph, or to use cosine similarity to measure how close two words are, I will simply have to use the 256 weights of both words from the first hidden layer, right? So does that mean we can only have a single layer to optimize? Can't we add 2, 3, 50 layers? And if we can, the weights of which layer should we take as the embeddings to compare the words? Will you please guide me? Thanks! You are a gem as always 🙌
@statquest Жыл бұрын
There are no rules in neural networks, just guidelines. Most of the advancements in the field have come from people doing things differently and new. So feel free to try "multilayer word embedding" if you would like. See what happens! You might invent the next transformer.
@enchanted_swiftie Жыл бұрын
@@statquest Haha, yes, but... then the weights of which layer should be used? 🤔😅 Yeah, I can use any, since there are no strict rules; maybe take the mean or something... but if there are established embedding models, may I know what the standard is? Thanks 🙏👍
@statquest Жыл бұрын
@@enchanted_swiftie The standard is to use a single set of weights that go to activation functions.
@enchanted_swiftie Жыл бұрын
@@statquest Oops, okay... 😅
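The cosine-similarity comparison mentioned earlier in this thread can be sketched like this (the 2-dimensional embedding values below are invented for illustration; real embeddings would come out of a trained network):

```python
import math

def cosine_similarity(u, v):
    # Dot product of the two vectors divided by the product of their lengths.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical 2-d embeddings: "troll2" and "gymkata" appear in similar
# contexts, so their (made-up) vectors point in a similar direction.
troll2 = [2.1, -0.9]
gymkata = [2.0, -1.1]
great = [-1.5, 1.8]

print(cosine_similarity(troll2, gymkata))  # close to 1: similar words
print(cosine_similarity(troll2, great))    # negative: dissimilar direction
```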
@gabrielrochasantana9 ай бұрын
Amazing lecture, congrats. The audio was also made with NLP (natural language processing), right?
@statquest9 ай бұрын
The translated overdubs were.
@nimitnag64975 ай бұрын
Hey Josh, thanks for this amazing video; it was an amazing explanation of a cool concept. However, I have a question. If my corpus also has a document that states "Troll 2 is bad!", will the words "bad" and "awesome" share similar embedding vectors? If not, can you please give an explanation? Thank you so much for helping out
@statquest5 ай бұрын
It's possible that they would, since it occurs in the exact same context. However, if you have a larger dataset, you'll get "bad" in other, more negative contexts, and you'll get "awesome" in other, more positive contexts, and that will, ultimately, affect the embeddings for each word.
@nimitnag64975 ай бұрын
@@statquest Thank you so much Josh for your quick reply
@nimitnag64975 ай бұрын
Do you have a Discord group or any other forum where we can ask questions?