Word Embedding and Word2Vec, Clearly Explained!!!

Views: 340,582

StatQuest with Josh Starmer

Comments: 561
@statquest 1 year ago
To learn more about Lightning: lightning.ai/ Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/ NOTE: A lot of people ask for the math at 13:16 to be clarified. In that example we have 3,000,000 inputs, each connected to 100 activation functions, for a total of 300,000,000 weights on the connections from the inputs to the activation functions. We then have another 300,000,000 weights on the connections from the activation functions to the outputs. 300,000,000 + 300,000,000 = 2 * 300,000,000 = 600,000,000
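The arithmetic in the note above can be checked with a few lines of Python. This is only a sketch of the count described in the comment; the variable names are made up for illustration.

```python
# Counting the weights in the example from the pinned note.
vocab_size = 3_000_000    # words and phrases used as inputs (and as outputs)
hidden_size = 100         # activation functions each input connects to

input_weights = vocab_size * hidden_size     # inputs -> activation functions
output_weights = hidden_size * vocab_size    # activation functions -> outputs
total_weights = input_weights + output_weights

print(f"{input_weights:,} + {output_weights:,} = {total_weights:,}")
# prints: 300,000,000 + 300,000,000 = 600,000,000
```

Because both sides always have the same number of weights, counting one side and multiplying by 2 gives the same total.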
@karanacharya18 6 months ago
In simple words, word embeddings are the by-product of training a neural network to predict the next word. By focusing on that single objective, the weights themselves (embeddings) can be used to understand the relationships between the words. This is actually quite fantastic! As always, great video @statquest!
@statquest 6 months ago
bam! :)
@joeybasile545 6 months ago
Not necessarily just the next word. Your statement is specific.
@NoNonsense_01 1 year ago
Probably the most important concept in NLP. Thank you for explaining it so simply and rigorously. Your videos are a thing of beauty!
@statquest 1 year ago
Wow, thank you!
@exxzxxe 9 months ago
Josh, this is absolutely the clearest and most concise explanation of embeddings on YouTube!
@statquest 9 months ago
Thank you very much!
@davins90 7 months ago
totally agree
@SergioPolimante 10 months ago
StatQuest is by far the best machine learning channel on YouTube to learn the basic concepts. Nice job
@statquest 10 months ago
Thank you!
@myyoutubechannel2858 3 months ago
In the first 19 seconds my mans explains Word Embedding more simply and elegantly than anything else out there on the internet.
@statquest 3 months ago
Thanks!
@rachit7185 1 year ago
This channel is literally the best thing that has happened to me on YouTube! Way too excited for your upcoming video on transformers, attention and LLMs. You're the best Josh ❤
@statquest 1 year ago
Wow, thanks!
@MiloLabradoodle 1 year ago
Yes, please do a video on transformers. Great channel.
@statquest 1 year ago
@@MiloLabradoodle I'm working on the transformers video right now.
@liuzeyu3125 1 year ago
@@statquest Can't wait to see it!
@ashmitgupta8039 4 months ago
Was literally struggling to understand this concept, and then I found this goldmine.
@statquest 4 months ago
Bam! :)
@harin01737 1 year ago
I was struggling to understand NLP and DL concepts, thinking of dropping my classes, and BAM!!! I found you, and now I'm writing a paper on neural program repair using DL techniques.
@statquest 1 year ago
BAM! :)
@JawadAhmadCodes 2 months ago
Oh my Gosh, StatQuest is surely the greatest channel I found to learn the whole universe in simple way. WOW!
@statquest 2 months ago
Thank you! :)
@haj5776 1 year ago
The phrase "similar words will have similar numbers" in the song will stick with me for a long time, thank you!
@statquest 1 year ago
bam!
@tanbui7569 1 year ago
Damn, when I first learned about this 4 years ago, it took me two days to wrap my head around these weights and embeddings well enough to implement them in code. Just now, I needed to refresh my memory of the concepts since I have not worked with them in a while, and your video illustrated what I learned (over two whole days in the past) in just 16 minutes!! I wish this video had existed earlier!!
@statquest 1 year ago
Thanks!
@channel_SV 1 year ago
It's so nice to google a question and realize that there is a StatQuest about it, when you were certain there hadn't been one some time before
@statquest 1 year ago
BAM! :)
@mannemsaisivadurgaprasad8987 1 year ago
One of the best videos I've seen so far regarding embeddings.
@statquest 1 year ago
Thank you!
@yuxiangzhang2343 1 year ago
So good!!! This is literally the best deep learning tutorial series I find… after a very long search on the web!
@statquest 1 year ago
Thank you! :)
@manuelamankwatia6556 7 months ago
This is by far the best video on embeddings. A whole university course is broken down into 15 minutes
@statquest 7 months ago
Thanks!
@TropicalCoder 1 year ago
That was the first time I actually understood embeddings - thanks!
@statquest 1 year ago
bam! :)
@pichazai 6 months ago
this channel is the best resource of ML in the entire internet
@statquest 6 months ago
Thank you!
@mycotina6438 1 year ago
BAM!! StatQuest never lie, it is indeed super clear!
@statquest 1 year ago
Thank you! :)
@awaredz007 6 months ago
Wow!! This is the best definition of word embedding I have ever heard or seen. Right at 09:35. Thanks for the clear and awesome video. You guys rock!!
@statquest 6 months ago
Thanks! :)
@user-wr4yl7tx3w 1 year ago
This is the best explanation of word embedding I have come across.
@statquest 1 year ago
Thank you very much! :)
@noadsensehere9195 2 months ago
This is just the video I was looking for to understand this basic NLP concept! Thanks
@statquest 2 months ago
Thanks!
@ah89971 1 year ago
When I watched this, I had only one question: why did all the others fail to explain this if they fully understood the concept?
@statquest 1 year ago
bam!
@rudrOwO 11 months ago
@@statquest Double Bam!
@meow-mi333 10 months ago
Bam the bam!
@eqe-kui-nei 2 months ago
@@ah89971 A lot of people in this industry (even with a PhD) actually don't.
@dreamdrifter 1 year ago
Thank you Josh, this is something I've been meaning to wrap my head around for a while and you explained it so clearly!
@statquest 1 year ago
Glad it was helpful!
@wizenith 1 year ago
haha I love your opening and your teaching style! when we think something is extremely difficult to learn, everything should begin with singing a song, that make a day more beautiful to begin with ( heheh actually I am not just teasing lol, I really like that ) thanks for sharing your thoughts with us
@statquest 1 year ago
Thanks!
@fouadboutaleb4157 1 year ago
Bro, I have my master's degree in ML, but trust me, you explain it better than my teachers ❤❤❤ Big thanks
@statquest 1 year ago
Thank you very much! :)
@avishkaravishkar1451 11 months ago
For those of you who find it hard to understand this video, my recommendation is to watch it at a slower pace and make notes of the same. It will really make things much more clear.
@statquest 11 months ago
0.5 speed bam!!! :)
@acandmishra 7 months ago
Your work is amazing and so helpful for new learners who want to go into the details of how deep learning models work, instead of just knowing what they do!! Keep it up!
@statquest 7 months ago
Thanks!
@chad5615 1 year ago
Keep up the amazing work (especially the songs) Josh, you're making life easy for thousands of people!
@statquest 1 year ago
Wow! Thank you so much for supporting StatQuest! TRIPLE BAM!!!! :)
@flow-saf 11 months ago
This video explains the source of the multiple dimensions in a word embedding, in the most simple way. Awesome. :)
@statquest 11 months ago
Thanks!
@exxzxxe 7 months ago
Hopefully everyone following this channel has Josh's book. It is quite excellent!
@statquest 7 months ago
Thanks for that!
@rathinarajajeyaraj1502 1 year ago
This is one of the best sources of information.... I always find videos a great source of visual stimulation... thank you.... infinite baaaam
@statquest 1 year ago
BAM! :)
@FullStackAmigo 1 year ago
Absolutely the best explanation that I've found so far! Thanks!
@statquest 1 year ago
Thank you! :)
@MarvinMendesCabral 1 year ago
Hey Josh, i'm a brazilian student and i love to see your videos, it's such a good and fun to watch explanation of every one of the concepts, i just wanted to say thank you, cause in the last few months you made me smile beautiful in the middle of studying, so, thank you!!! (sorry for the bad english hahaha)
@statquest 1 year ago
Muito obrigado (thank you very much)!!! :)
@DanielDias-vl2js 3 months ago
Thank goodness I found this channel! You've got great content and an excellent teaching methodology here!
@statquest 3 months ago
Thanks!
@mykolalebid6279 26 days ago
Thank you for your excellent work. A video on negative sampling would be a valuable addition.
@statquest 26 days ago
I'll keep that in mind.
@lfalfa8460 11 months ago
I love all of your songs. You should record a CD!!! 🤣 Thank you very much again and again for the elucidating videos.
@statquest 11 months ago
Thanks!
@mazensaaed8635 4 months ago
I promise I'll become a member of your channel when I get my first data science job
@statquest 4 months ago
BAM! Thank you very much! :)
@ananpinya835 1 year ago
StatQuest is great! I learn a lot from your channel. Thank you very much!
@statquest 1 year ago
Glad you enjoy it!
@mamdouhdabjan9292 1 year ago
Hey Josh. A great new series that I, and many others, would be excited to see is bayesian statistics. Would love to watch you explain the intricacies of that branch of stats. Thanks as always for the great content and keep up with the neural-network related videos. They are especially helpful.
@statquest 1 year ago
That's definitely on the to-do list.
@mamdouhdabjan9292 1 year ago
@@statquest looking forward to it.
@muthuaiswaryaaswaminathan4079 1 year ago
Thank you so much for this playlist! Got to learn a lot of things in a very clear manner. TRIPLE BAM!!!
@statquest 1 year ago
Thank you! :)
@gustavow5746 1 year ago
the best video I saw about this topic so far. Great Content! Congrats!!
@statquest 1 year ago
Wow, thanks!
@michaelcheung6290 1 year ago
Thank you statquest!!! Finally I started to understand LSTM
@statquest 1 year ago
Hooray! BAM!
@wellwell8025 1 year ago
Way better than my University slides. Thanks
@statquest 1 year ago
Thanks!
@EZZAHIRREDOUANE 7 months ago
Great presentation, You saved my day after watching several videos, thank you!
@statquest 7 months ago
Glad it helped!
@LakshyaGupta-ge3wj 1 year ago
Absolutely mind blowing and amazing presentation! For the Word2Vec's strategy for increasing context, does it employ the 2 strategies in "addition" to the 1-Output-For-1-Input basic method we talked about in the whole video or are they replacements? Basically, are we still training the model on predicting "is" for "Gymkata" in the same neural network along with predicting "is" for a combination of "Gymkata" and "great"?
@statquest 1 year ago
Word2Vec uses one of the two strategies presented at the end of the video.
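For reference, the two strategies are usually called continuous bag of words (CBOW) and skip-gram. Below is a minimal sketch of the training pairs each one might build from the "Gymkata is great" example in the question. The function names and the `window` parameter are hypothetical, and real implementations add details that are left out here.

```python
def cbow_pairs(tokens, window=1):
    """Continuous bag of words: surrounding words -> the word in the middle."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            pairs.append((context, target))
    return pairs


def skip_gram_pairs(tokens, window=1):
    """Skip-gram: the word in the middle -> each surrounding word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


tokens = ["Gymkata", "is", "great"]
print(cbow_pairs(tokens))       # includes (["Gymkata", "great"], "is")
print(skip_gram_pairs(tokens))  # includes ("is", "Gymkata") and ("is", "great")
```

In this sketch a model would be trained on one list of pairs or the other, which matches the reply above: the strategies are alternatives rather than additions.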
@ramzirebai3661 1 year ago
Thank you so much, Mr. Josh Starmer, you are the only one who makes ML concepts easy to understand. Can you please explain GloVe?
@statquest 1 year ago
I'll keep that in mind.
@lexxynubbers 1 year ago
Machine learning explained like Sesame Street is exactly what I need right now.
@statquest 1 year ago
bam!
@eamonnik 1 year ago
Hey Josh! Loved seeing your talk at BU! Appreciate your videos :)
@statquest 1 year ago
Thanks so much! :)
@alexdamado 4 months ago
Thanks for posting. It is indeed a clear explanation and helped me move forward with my studies.
@statquest 4 months ago
Glad it was helpful!
@exxzxxe 9 months ago
You ARE the Batman and Superman of machine learning!
@statquest 9 months ago
:)
@周子懿-y5r 11 months ago
Thank you Josh for this great video. I have a quick question about the Negative Sampling: If we only want to predict A, why do we need to keep the weights for "abandon" instead of just ignoring all the weights except for "A"?
@statquest 11 months ago
If we only focused on the weights for "A" and nothing else, then training would cause all of the weights to make every output = 1. In contrast, by adding some outputs that we want to be 0, training is forced to make sure that not every single output gets a 1.
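A small sketch of the point made in this reply: if training only ever sees outputs that should be 1, a lazy model that outputs (almost) 1 for every word looks perfect, and it only gets penalized once some 0 targets are included. The assumption here, not taken from the video, is that each output is scored independently with binary cross entropy; the output values are made up.

```python
import math


def binary_cross_entropy(target, predicted, eps=1e-12):
    """Loss for one output whose desired value is 0 or 1."""
    p = min(max(predicted, eps), 1 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))


def total_loss(outputs, examples):
    """Sum the loss over (word, desired value) pairs."""
    return sum(binary_cross_entropy(t, outputs[w]) for w, t in examples)


# Hypothetical lazy model: every output is pushed toward 1.
always_one = {"A": 0.999, "abandon": 0.999, "aardvark": 0.999}

positives_only = [("A", 1)]
with_negatives = [("A", 1), ("abandon", 0)]

print(total_loss(always_one, positives_only))  # close to 0: looks perfect
print(total_loss(always_one, with_negatives))  # much larger: "abandon" should be 0
```

With the extra 0 target in place, lowering the loss requires pushing the output for "abandon" down, so not every output can drift to 1.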
@bancolin1005 1 year ago
BAM! Thanks for your video, I finally realize what the negative sampling means ~
@statquest 1 year ago
Happy to help!
@RaynerGS 1 year ago
I admire your work a lot. Salute from Brazil.
@statquest 1 year ago
Muito obrigado (thank you very much)! :)
@mahdi132 1 year ago
Thank you sir. Your explanation is great and your work is much appreciated.
@statquest 1 year ago
Thanks!
@familywu3869 1 year ago
Thank you very much for your excellent tutorials! Josh. Here I have a question, at around 13:30 of this video tutorial, you mentioned to multiply by 2. I am not sure why 2? I mean if there are more than 2 outputs, will we multiply the number of output nodes, instead of 2? Thank you for your clarification in advance.
@statquest 1 year ago
If we have 3,000,000 words and phrases as inputs, and each input is connected to 100 activation functions, then we have 300,000,000 weights going from the inputs to the activation functions. Then from those 100 activation functions, we have 3,000,000 outputs (one per word or phrase), each connected to every activation function by a weight. So we have 300,000,000 weights on the input side, and 300,000,000 weights on the output side, or a total of 600,000,000 weights. However, since we always have the same number of weights on the input and output sides, we only need to calculate the number of weights on one side and then just multiply that number by 2.
@surojit9625 1 year ago
@@statquest Thanks for explaining! I also had the same question.
@jwilliams8210 11 months ago
Ohhhhhhhhh! I missed that the first time around! BTW: (Stat)Squatch and Norm are right: StatQuest is awesome!!
@vpnserver407 1 year ago
highly valuable video and book tutorial, thanks for putting this kind of special tuts out here .
@statquest 1 year ago
Glad you liked it!
@The-Martian73 1 year ago
mr.Starmer I think you really loved Troll 2 😅
@statquest 1 year ago
:)
@alfredoderodt6519 1 year ago
You are a beautiful human! Thank you so much for this video! I was finally able to understand this concept! Thanks so much again!!!!!!!!!!!!! :)
@statquest 1 year ago
Glad it was helpful!
@ColinTimmins 1 year ago
Thank you so much for these videos. It really helps with the visuals because I am dyslexic… Quadruple BAM!!!! lol 😊
@statquest 1 year ago
Happy to help!
@tupaiadhikari 1 year ago
Great Explanation. Please make a video on how do we connect the output of an Embedding Layer to an LSTM/GRU for doing classification for say Sentiment Analysis
@statquest 1 year ago
I show how to connect it to an LSTM for language translation here: kzbin.info/www/bejne/gmmrfKqbj66Co8k
@tupaiadhikari 1 year ago
@@statquest Thank You Professor Josh !
@vicadegboye684 4 months ago
Thanks sooooo much for your videos. Let me not belabor the praise as it's been established that you are triple bam! 🙂 Meanwhile, I've understood every single thing in your deep learning series up till this video. I'm still a bit confused about the negative sampling thing. I don't understand the idea of how using "aardvark" to predict "a" and "abandon" somehow means we are excluding "abandon". The concept is the only thing I've not understood in the 17 videos of this neural network/deep learning playlist. I would appreciate your help.
@statquest 4 months ago
The idea is that there is one word for which we want the final output value to be 1 and everything else needs to be 0s. However, rather than focusing on every single output, we just focus on the one word that we want the output to be 1 and just a handful of words that we want the output to be 0, rather than all of them.
@oliverlee2819 1 month ago
@@statquest So does this mean negative sampling is implemented in each round of backpropagation optimization? I am not quite sure about this part either. I guess a more detailed (but simplified) demo would clarify this concept better. Or maybe there are some articles to reference?
@statquest 1 month ago
@@oliverlee2819 Yes, you do negative sampling every single time.
@oliverlee2819 1 month ago
@@statquest So the word that "we don't want to predict", means the words that we just want their predicted output value (prob) to be zero right? Is this done via teacher forcing method to force the output of one word to be 1, and the words that we don't want to predict to be zero?
@statquest 1 month ago
@@oliverlee2819 The first part is correct. The second part is a little off. This isn't technically teacher forcing. We're just focusing on the 1 word we want the output to be 1 and a handful of words we want the output to be 0.
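The selection step discussed in this thread can be sketched as follows: for each training step, keep the one word whose output should be 1 and draw a fresh handful of words whose outputs should be 0. The function name, the tiny vocabulary, and the uniform random draw are all assumptions made for illustration; actual word2vec implementations may pick the negatives differently.

```python
import random


def pick_training_outputs(vocab, target_word, num_negatives, rng):
    """Return {word: desired output} for one positive and a few negatives."""
    candidates = [word for word in vocab if word != target_word]
    negatives = rng.sample(candidates, num_negatives)  # drawn without replacement
    desired = {target_word: 1}
    for word in negatives:
        desired[word] = 0
    return desired


# Hypothetical tiny vocabulary; a real one would have millions of entries.
vocab = ["a", "aardvark", "abandon", "abacus", "zebra", "zoo"]
rng = random.Random(0)

# A new handful of negatives is drawn on every training step.
for step in range(3):
    desired = pick_training_outputs(vocab, "a", 2, rng)
    print(step, desired)
```

Only the weights connected to the words in `desired` would be updated in that step; every other output is simply ignored.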
@natuchips98 3 months ago
You literally saved my life
@statquest 3 months ago
bam! :)
@enchanted_swiftie 1 year ago
Does this mean the neural net used to get the embeddings can only have a single layer? I mean: 1. Say the corpus has 100 words in total. 2. The first hidden layer (say I set the embedding size to 256). 3. Then another layer to predict the next word, which will be 100 words again. Here, to plot the graph, or to use cosine similarity to see how close two words are, I simply have to use the 256 weights of both words from the first hidden layer, right? So does that mean we can only have a single layer to optimise? Can't we add 2, 3, 50 layers? And if we can, then the weights of which layer should we take as the embeddings to compare the words? Will you please guide? Thanks! You are a gem as always 🙌
@statquest 1 year ago
There are no rules in neural networks, just guidelines. Most of the advancements in the field have come from people doing things differently and new. So feel free to try "multilayer word embedding" if you would like. See what happens! You might invent the next transformer.
@enchanted_swiftie 1 year ago
@@statquest Haha, yes but... then weights of which layer should be used? 🤔😅 Yeah, I can use any as there are no strict rules, may take mean or something... but if there are any embedding models... may I know what is the standard? Thanks 🙏👍
@statquest 1 year ago
@@enchanted_swiftie The standard is to use a single set of weights that go to activation functions.
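As a sketch of what this thread describes: each word's embedding is its row of weights going from the input to the activation functions, and two words can be compared with cosine similarity. The weight values below are made up, and only 2 weights per word are used instead of 256 to keep the example readable.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical input-to-activation-function weights, one row per word.
embeddings = {
    "Troll 2": [0.9, -1.2],
    "Gymkata": [1.0, -1.1],
    "is": [-0.8, 0.3],
}

similar = cosine_similarity(embeddings["Troll 2"], embeddings["Gymkata"])
different = cosine_similarity(embeddings["Troll 2"], embeddings["is"])
print(similar, different)  # the first value is near 1, the second is negative
```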
@enchanted_swiftie 1 year ago
@@statquest Oops, okay... 😅
@ajd3fjf4hsjd3 3 months ago
Fantastically simple, and complete!
@statquest 3 months ago
Thanks!
@AliShafiei-ui8tn 1 year ago
the best channel ever.
@statquest 1 year ago
Double bam! :)
@anhnguyenvan5806 1 year ago
I am sorry but you do not turn on advertisements to get money from KZbin, do you? And thank you so much for your effort to make videos. You make the inequality in approaching knowledge lesser and lesser. I am very grateful. Hope you always have the happiest life!
@statquest 1 year ago
Thank you!
@NoNonsense_01 1 year ago
Why should he not turn on advertisement? Do you sell your services without any compensation? If you are so bothered about advertisement there is a subscription for that.
@kanakorn 2 months ago
@@anhnguyenvan5806 Membership is available :-)
@anonymousgreen5080 8 days ago
@NoNonsense_01 they never meant to turn on lol. They said they are grateful because of no ads. Other videos often make money from ads. This channel doesn't do that. It helps in providing users with aid in many ML topics. That's why he is appreciating the channel.
@MrAhsan99 11 months ago
Watched this video multiple times but I'm unable to understand a thing. I'm sure I am dumb and Josh is great!
@statquest 11 months ago
Maybe you should start with the basics for neural networks: kzbin.info/www/bejne/eaKyl5xqZrGZetk
@vicadegboye684 4 months ago
This is the most challenging video of the series so far IMO. I've watched it several times too, but I understand everything apart from the last part on negative sampling. And yes, I've watched and understood every single video (16 of them on the playlist up to this point) before this one in the series. This is my first time of experiencing this in his videos.
@NewMateo 1 year ago
Great vid. So you're going to do a vid on transformer architectures? That would be incredible if so. Btw bought your book. Finished it in like 2 weeks. Great work on it!
@statquest 1 year ago
Thank you! My video on Encoder-Decoders will come out soon, then Attention, then Transformers.
@thomasstern6814 1 year ago
@@statquest When the universe needs you most, you provide
@mariafernandaruizmorales2322 1 year ago
It would also be nice to have a video about the difference between LM (linear regression models) and GLM (Generalized Linear Models). I know they're different but I don't quite understand the difference when interpreting them or programming them in R. THAAANKS!
@statquest 1 year ago
Linear models are just models based on linear regression and I describe them here in this playlist: kzbin.info/aero/PLblh5JKOoLUIzaEkCLIUxQFjPIlapw8nU Generalized Linear Models is more "generalized" and includes Logistic Regression kzbin.info/aero/PLblh5JKOoLUKxzEP5HA2d-Li7IJkHfXSe and a few other methods that I don't talk about like Poisson Regression.
@mariafernandaruizmorales2322 1 year ago
@@statquest Thanks Josh!! I'll watch them all 🤗
@m3ow21 1 year ago
I love the way you teach!
@statquest 1 year ago
Thanks!
@saisrisai9649 11 months ago
Thank you Statquest!!!!
@statquest 11 months ago
Any time!
@anonymushadow282 1 year ago
Finally someone explains to me how the conversion actually happens. Everyone tells me "use a network that does it for you automatically", but I want to know what that network is doing internally... finallyyyy
@statquest 1 year ago
Muchas gracias (thank you very much)!!! BAM!
@mariafernandaruizmorales2322 1 year ago
Please make a video about the metrics for prediction performance: RMSE, MAE and R SQUARED. 🙏🏼🙏🏼🙏🏼 YOURE THE BEST!
@statquest 1 year ago
The first video I ever made is on R-squared: kzbin.info/www/bejne/aHK0fKCtZpmgfq8 NOTE: Back then I didn't know about machine learning, so I only talk about R-squared in the context of fitting a straight line to data. In that context, R-squared can't be negative. However, with other machine learning algorithms, it is possible.
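To make the note about negative values concrete, here is a minimal sketch that computes R-squared as 1 minus the ratio of the squared residuals to the squared differences from the mean. The data values are made up; the last set of predictions is deliberately worse than just predicting the mean.

```python
def r_squared(actual, predicted):
    """R-squared = 1 - (sum of squared residuals) / (sum of squares around the mean)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual = [1.0, 2.0, 3.0]
print(r_squared(actual, [1.0, 2.0, 3.0]))  # 1.0: perfect predictions
print(r_squared(actual, [2.0, 2.0, 2.0]))  # 0.0: no better than the mean
print(r_squared(actual, [3.0, 2.0, 1.0]))  # -3.0: worse than the mean
```

A least-squares straight line can always do at least as well as the mean on the data it was fit to, which is why the value cannot go negative in that setting.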
@kimsobota1324 11 months ago
I appreciate the knowledge you've just shared. It explains many things to me about neural networks. I have a question though: if you are randomly assigning a value to a word, why not try something easier? For example, in Hebrew, each of the letters of the Alef-Bet is assigned a value. These values are added together to form the sum of a word. It is the context of the word in a sentence that forms the block. Sabe? Take a look at Gematria; Hebrew has been doing this for thousands of years. Just a thought.
@statquest 11 months ago
Would that method result in words used in similar contexts to have similar numbers? Does it apply to other languages? Other symbols? And can we end up with multiple numbers per symbol to reflect how it can be used or modified in different contexts?
@kimsobota1324 11 months ago
I wish I could answer that question better than to tell you context is EVERYTHING in Hebrew, a language that has but doesn't use vowels, since all who use the language understand the consonant-based word structures. Not only that, but in the late 1890s Rabbis from Ukraine and Azerbaijan developed a mathematical code that was used to predict word structures from the Torah that were accurate to a value of 0.001%. Others have tried to apply it to other books like Alice in Wonderland and could not duplicate the result. You can find more information on the subject in a book called The Bible Code, which gives much more information as well as the formulas the Jewish mathematicians created. While it is a poor citation, I have included this Wikipedia link: en.wikipedia.org/wiki/Bible_code#:~:text=The%20Bible%20code%20(Hebrew%3A%20%D7%94%D7%A6%D7%95%D7%A4%D7%9F,has%20predicted%20significant%20historical%20events. The book is available on Amazon if you find it piques your interest. Please let me know if this helps. @@statquest
@kimsobota1324 11 months ago
@statquest, I have not heard back from you about the Wiki?
@nashwin2315 4 months ago
I guess we can also take the weights from the activation function to the softmax function? That would also be two weights per word and the intuition is the same --> similar words will have similar weights.
@statquest 4 months ago
To be honest, I don't know if that would work out. It's possible that no one knows - I don't think they have worked out why word embedding networks work the way they do. Regardless, it sounds like a fun thing to try and see what happens.
@tomoki-v6o 1 year ago
My favourite topic its magic. Bam!!
@statquest 1 year ago
:)
@张超-o2z 9 months ago
Hey, Josh! Absolutely amazing series!!! If I understand correctly, the input weights of a specific word (e.g., gymkata) are its coordinates in multi-dimensional space? The coordinates can be used to calculate cosine similarity to find similar meanings as well(e.g., girl queen, guy king)? And is that true the philosophy applies to LLMs such as GPT embeddings? GPT Text-embeddings-ada-002 has 1536 dimensions, which means there are 1536 nodes in the 1st hidden layer?
@statquest 9 months ago
In theory it applies to LLMs, but those networks are so complex that I'm not 100% sure they do. And a model with 1536 dimensions has 1536 nodes in the first layer.
@张超-o2z 9 months ago
You mean 1536 dimensions, not 1546, right? @@statquest
@statquest 9 months ago
@@张超-o2z yep
@benhargreaves5556 1 year ago
I struggled with this video series and it's only been with 3Blue1Brown's incredibly comprehensive and clear videos on deep learning that I've been able to understand gradient descent, backpropagation and basic feed-forward networks. Just different learning and training styles I guess.
@statquest 1 year ago
That makes sense to me. I made these videos because 3blue1brown's videos didn't help me understand any of these topics. So if 3blue1brown works for you, bam!
@vicadegboye684 4 months ago
@@statquest TBH, 3B1B videos are great but I often find it difficult to understand some concepts. I read the comments and see lots of positive reviews and then wonder if I'm the one who is dumb for not understanding some of the things he's explaining. I guess more than half of the people who positively review a math video just do it because of the crowd effect. I guess people get carried away by the cool graphics/visualizations which are often good but sometimes insufficient on their own to clearly explain concepts. That said, 3B1B is a great channel and I appreciate it being free. But, the truth still remains that his videos are not the clearest to understand. I've understood every single thing in your deep learning series up till this video. I'm still a bit confused about the negative sampling thing. I don't understand the idea of how using "aardvark" to predict "a" and "abandon" somehow means we are excluding "abandon". The concept is the only thing I've not understood in the 17 videos of this neural network/deep learning playlist. I would appreciate your help.
@statquest 4 months ago
@@vicadegboye684 The idea is that there is one word for which we want the final output value to be 1 and everything else needs to be 0s. However, rather than focusing on every single output, we just focus on the one word that we want the output to be 1 and just a handful of words that we want the output to be 0, rather than all of them.
@danish5326 1 year ago
Thanks for enlightening us Master.
@statquest 1 year ago
Any time!
@BalintHorvath-mz7rr 8 months ago
Awesome video! This time, I feel I miss one step through. Namely, how do you train this network? I mean, I get that we want the network as such that similar words have similar embeddings. But what is the 'Actual' we use in our loss function to measure the difference from and use backpropagation with?
@statquest 8 months ago
Yes
@balintnk 8 months ago
@@statquest haha I feel like I didn't ask the question well :D How would the network know, without human input, that Troll 2 and Gymkata is very similar and so it should optimize itself so that ultimately they have similar embeddings? (What "Actual" value do we use in the loss function to calculate the residual?)
@statquest 8 months ago
@@balintnk We just use the context that the words are used in. Normal backpropagation plus the cross entropy loss function where we use neighboring words to predict "troll 2" and "gymkata" is all you need to use to get similar embedding values for those. That's what I used to create this video.
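To illustrate where the "actual" value comes from, here is a minimal sketch: the training text itself supplies it, since the word that really appears next to the input is the one whose desired output is 1. The vocabulary and raw output values are made up, and the sketch only computes the loss and its gradient with respect to the raw outputs, not a full training loop.

```python
import math


def softmax(raw):
    """Turn raw output values into probabilities that sum to 1."""
    largest = max(raw)
    exps = [math.exp(x - largest) for x in raw]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probabilities, actual_index):
    """Loss is the negative log of the probability given to the actual word."""
    return -math.log(probabilities[actual_index])


vocab = ["Troll 2", "is", "great", "Gymkata"]

# The training text "Troll 2 is great" gives the pair:
# input "Troll 2" -> actual neighboring word "is". No human labeling is needed.
actual_index = vocab.index("is")

raw_outputs = [0.1, 0.4, -0.3, 0.2]  # hypothetical outputs of an untrained network
probabilities = softmax(raw_outputs)
loss = cross_entropy(probabilities, actual_index)

# For softmax with cross entropy, the gradient with respect to each raw output
# is (predicted probability - actual), where actual is 1 for "is" and 0 otherwise.
gradient = [p - (1.0 if i == actual_index else 0.0) for i, p in enumerate(probabilities)]
print(loss, gradient)
```

Since "Troll 2" and "Gymkata" show up next to the same neighboring words in the training text, backpropagation nudges their weights in similar directions, which is how they end up with similar embeddings.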
@nimitnag6497 3 months ago
Hey Josh , thanks for this amazing video. It was an amazing explanation of a cool concept. However I have a question. If in a corpus , I also have a document that states Troll 2 is bad!. Will the word bad and awesome share the similar embedding vector? If not can you please give an explanation. Thank you so much for helping around
@statquest 3 months ago
It's possible that they would, since it occurs in the exact same context. However, if you have a larger dataset, you'll get "bad" in other, more negative contexts, and you'll get "awesome" in other, more positive contexts, and that will, ultimately, affect the embeddings for each word.
@nimitnag6497 3 months ago
@@statquest Thank you so much Josh for your quick reply
@nimitnag6497 3 months ago
Do you have any discord groups or any other forum where can ask questions ?
@statquest 3 months ago
@@nimitnag6497 Unfortunately not.
@hepark 1 month ago
I thought one should be using ArgMax during the validation step, after the optimization of weights and bias was already made. Anyway, this was an interesting video, I learned a lot.
@statquest 1 month ago
After optimization, you can use ArgMax, but SoftMax allows us to pick words based on a distribution and that can make things more interesting.
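The difference described in this reply can be sketched as follows: ArgMax always returns the single most likely word, while the SoftMax probabilities can be used as weights for a random draw. The vocabulary and raw output values are made up for illustration.

```python
import math
import random


def softmax(raw):
    """Turn raw output values into probabilities that sum to 1."""
    largest = max(raw)
    exps = [math.exp(x - largest) for x in raw]
    total = sum(exps)
    return [e / total for e in exps]


vocab = ["great", "awesome", "bad"]
raw_outputs = [2.0, 1.0, 0.5]  # hypothetical outputs of a trained network
probabilities = softmax(raw_outputs)

# ArgMax: always the same, most likely word.
argmax_word = vocab[probabilities.index(max(probabilities))]

# Sampling from the SoftMax distribution: usually "great", sometimes another word.
rng = random.Random(0)
sampled_words = rng.choices(vocab, weights=probabilities, k=5)

print(argmax_word)
print(sampled_words)
```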
@wenqiangli7544 1 year ago
Great video for explaining word2vec!
@statquest 1 year ago
Thanks!
@MaskedEngineerYH 1 year ago
Keep going statquest!!
@statquest 1 year ago
That's the plan!
@paranoid_android8470 9 months ago
I think there's a small mistake at 14:57. He says that we don't want to predict 'abandon' and yet he includes it in the list. I think he meant to say 'aardvark' instead. [edit]: The video is correct! Read bottom reply if you have the same question.
@statquest 9 months ago
The video is correct at that time point. At that point we are selecting words we do not want to predict, meaning we want their output values to be 0 instead of 1. However, we only select a handful of the words whose predictions should be 0, rather than all of the words we do not want to predict.
@paranoid_android8470 9 months ago
@@statquest After carefully rewatching the video a couple of times, I noticed a misunderstanding of the word "predict" on my part. If I understand correctly, saying we don't want to predict specific words means we still calculate their values in the output layer so that we can reduce them through backpropagation. Before, I understood it as "we don't want to 'predict', as in calculate the values, for specific words".
@statquest 9 months ago
@@paranoid_android8470 I agree - the wording could be improved since it is slightly ambiguous as to what it means to predict and not to predict.
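The idea discussed in this thread, pushing the outputs of only a handful of unwanted words toward 0, can be sketched as follows. This is a simplified illustration, not the video's code: the vocabulary, the hidden values `h`, the number of sampled words `k`, and the learning rate are all hypothetical, each selected word is scored with a sigmoid, and only the output-side weights are updated.

```python
import math
import random

rng = random.Random(0)

vocab = ["aardvark", "abandon", "a", "troll2", "is", "great", "gymkata", "zebra"]
H = 2  # hidden values per word (hypothetical tiny size)

# One output weight vector per word in the vocabulary.
W_out = {w: [rng.uniform(-0.1, 0.1) for _ in range(H)] for w in vocab}
before = {w: v[:] for w, v in W_out.items()}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def negative_sampling_step(h, target, k=2, lr=0.1):
    """Update the target word (label 1) and k random other words (label 0)."""
    candidates = [w for w in vocab if w != target]
    negatives = rng.sample(candidates, k)
    for word, label in [(target, 1.0)] + [(w, 0.0) for w in negatives]:
        score = sigmoid(sum(a * b for a, b in zip(h, W_out[word])))
        err = score - label
        for i in range(H):
            W_out[word][i] -= lr * err * h[i]
    return negatives


h = [0.5, -0.3]  # hypothetical hidden values for the input word
negatives = negative_sampling_step(h, target="is")
updated = [w for w in vocab if W_out[w] != before[w]]
print("words pushed toward 0:", negatives)
print("words whose weights changed:", updated)
```

Only the target word and the sampled words have their weights touched in this step; every other word in the vocabulary is left alone, which is where the savings come from.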
@ar_frz 13 days ago
This was lovely! thank you.
@statquest 12 days ago
Thanks!
@gabrielrochasantana 8 months ago
Amazing lecture, congrats. The audio was also made with NLP (Natural Language Processing), right?
@statquest 8 months ago
The translated overdubs were.
@ang3dang2 2 months ago
Can you do one for wav2vec? It seemingly taps into the same concept as word2vec, but the equations are so much more complex.
@statquest 2 months ago
I'll keep that in mind.
@NikitaBorisov-g2h 10 months ago
This guy really loves Troll 2!
@statquest 10 months ago
bam!
@TheFunofMusic 1 year ago
Love this :D Notifications gang here :)
@statquest 1 year ago
Thanks!
@MannyBernabe 1 month ago
Great work. Thank you.
@statquest 1 month ago
Thanks!
@ericvaish8841 3 months ago
Great explanation my man!!
@statquest 3 months ago
Thank you!
@phobiatheory3791 1 year ago
Hi, I love your videos! They're really well explained. Could you please make a video on partial least squares (PLS)?
@statquest 1 year ago
I'll keep that in mind.
@shamshersingh9680 7 months ago
Hi Josh, again the best explanation for the concept. However, I have a doubt. As per the explanation, word embeddings are the weights associated with each word between the input and the activation function layer. These weights are obtained after training on a large text corpus like Wikipedia. When I train another model using these embeddings on another set of data, the weights (embeddings) will change during backpropagation while training. So the embeddings will not remain the same and will change with every model we train. Is this the correct interpretation, or am I missing something here?
@statquest 7 months ago
When you build a neural network, you can specify which weights are trainable and which should be left as is. This is the basis of "fine-tuning" a model - just training specific weights rather than all of them. So, you can do that. Or you can just start from scratch - don't pre-train the word embeddings, but train them when you train everything else. This is what most large language models, like ChatGPT, do.
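As a rough sketch of what "specifying which weights are trainable" means, the toy update step below skips any group of weights marked as frozen. The parameter names, values, gradients, and learning rate are all hypothetical; real frameworks expose the same idea through their own settings.

```python
def sgd_step(params, grads, trainable, lr=0.1):
    """Apply one gradient descent step, skipping frozen parameter groups."""
    for name, values in params.items():
        if not trainable[name]:
            continue  # frozen: leave pre-trained values exactly as they are
        params[name] = [v - lr * g for v, g in zip(values, grads[name])]
    return params


# Hypothetical pre-trained embeddings plus new weights for a downstream task.
params = {"embeddings": [0.2, -1.1, 0.7], "output_weights": [0.5, 0.5]}
grads = {"embeddings": [0.1, 0.1, 0.1], "output_weights": [1.0, -1.0]}
trainable = {"embeddings": False, "output_weights": True}

sgd_step(params, grads, trainable)
print(params)
```

Setting `trainable["embeddings"]` to `True` instead would let the embeddings change along with everything else, which corresponds to the train-from-scratch or full fine-tuning case described in the reply.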
@preet111 3 months ago
Hi Josh, thanks for making such great videos like this. I wanted to ask why we don't have a bias here. Could it help in getting better word embeddings?
@statquest 3 months ago
The bias would just be a constant offset for all of the word embeddings, so we might as well just add 0 (or not use any bias).
@preet111 2 months ago
@@statquest got it, thanks for all the great videos, you helped me in getting my dream job again.
@statquest 2 months ago
@@preet111 Congratulations!!! TRIPLE BAM!!!
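The point about the bias made earlier in this thread can be checked with a small sketch. The weights and bias values below are made up. With a one-hot input, the hidden values for a word are just that word's weights plus the bias, so the bias shifts every word by the same amount and the differences between words do not change.

```python
def hidden_values(word_index, weights, bias):
    # A one-hot input means only the selected word's row contributes.
    return [w + b for w, b in zip(weights[word_index], bias)]


def difference(i, j, weights, bias):
    a = hidden_values(i, weights, bias)
    b = hidden_values(j, weights, bias)
    return [x - y for x, y in zip(a, b)]


# Hypothetical embeddings for three words, two values each.
weights = [[0.3, -1.2], [0.4, -1.0], [-2.0, 0.9]]
bias = [5.0, 5.0]
no_bias = [0.0, 0.0]

with_bias = difference(0, 2, weights, bias)
without_bias = difference(0, 2, weights, no_bias)
print(with_bias, without_bias)
```

Both differences come out the same up to floating-point rounding, so the relationships between the words are carried entirely by the weights.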
@RAMPALSINGH-bf3cp 3 months ago
i like the way he talks
@statquest 3 months ago
:)
@JohnDoe-r3m 1 year ago
That's awesome! But how would the multilingual word2vec be trained? Would the training dataset simply include corpus of two (or more) languages? or would additional NN infrastructure be required?
@statquest 1 year ago
Are you asking about something that can translate one language to another? If so, then, yes, additional infrastructure is needed and I'll describe it in my next video in this series (it's called "sequence2sequence").
@JohnDoe-r3m 1 year ago
@@statquest Not exactly, it's more like having similar words from multiple languages mapped within the same vector space. So, for example, "king" and its translations in French, German and Spanish would appear to be the same.
@statquest 1 year ago
@@JohnDoe-r3m Hmmm... I'm not sure how that would work because the English word "king" and the Spanish translation, "rey", would be in different contexts. For example, the English "king" would be in a phrase like "all hail the king", and the Spanish version would be in a sentence that had completely different words (even if they meant the same thing).
@c.nbhaskar4718 1 year ago
great stuff as usual ..BAM * 600 million
@statquest 1 year ago
Thank you so much! :)
@nouraboub4805 4 months ago
Good, thank you so much for this playlist, it's the best ❤️😍
@statquest 4 months ago
Glad you enjoy it!