Transformer Embeddings - EXPLAINED!

33,335 views

CodeEmporium

Comments: 52
@CodeEmporium 2 years ago
Hope you all liked the video. Also, if you want to know more about stratascratch (the data science interview prep website), check out the link in the description. Cheers!
@yogeshsharma4308 2 years ago
Saw a few videos of yours for my master's research paper. I have started to become a fan of your multi-pass approach and the way you stitch together all the nuances. Wow!!!
@timtanida7994 2 years ago
Great video! I think the reason why concatenation works better than addition in the multi-head attention module is that after the concatenation, there is a (fully-connected) linear layer that transforms the concatenated vectors back to the embedding dimension. So in your example of 8 heads, we have a concatenated vector of dimension 16, which gets transformed back to a dimension of 2. And since the weights in the linear layer are learned, it can give more or less weight to specific heads. Meaning if one head creates better encodings than others, then it will get higher weights. Whereas if we add all encodings from all heads, all of them essentially get the same weight, and there is no discrimination.
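To make the point above concrete, here is a minimal PyTorch sketch (the toy shapes and variable names are illustrative, not taken from the video): the concatenated head outputs pass through a learned output projection, whereas plain addition weights every head equally.

```python
# Contrast "concat + learned projection" with plain addition of head outputs.
# Toy sizes follow the example above: 8 heads, each producing 2-dim vectors per token.
import torch
import torch.nn as nn

num_heads, head_dim, d_model = 8, 2, 2
seq_len = 5

# Pretend each head already produced its output: (heads, seq_len, head_dim)
head_outputs = torch.randn(num_heads, seq_len, head_dim)

# Option A: concatenate along the feature axis -> (seq_len, heads * head_dim),
# then let a learned linear layer mix/weight the heads back down to d_model.
concat = head_outputs.permute(1, 0, 2).reshape(seq_len, num_heads * head_dim)
out_proj = nn.Linear(num_heads * head_dim, d_model)   # the W_O projection
combined = out_proj(concat)                           # (seq_len, d_model)

# Option B: plain addition gives every head the same fixed weight of 1.
added = head_outputs.sum(dim=0)                       # (seq_len, head_dim)
```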
@kevon217 1 year ago
This is the best intuitive explanation I’ve encountered so far. Great channel!
@CodeEmporium 1 year ago
Thanks a ton for the compliment :)
@TURBOKNUL666 2 years ago
This is an insanely good video. Thanks for clarifying bits and pieces. This high level perspective really helps.
@CodeEmporium 2 years ago
Thank you for the comment and for watching :)
@andrewchang7194 2 years ago
This was a really good video. I’ve gone through the Attention is all you need paper and read a lot about seq2seq models, but it’s really nice to solidify my understanding with a higher-level overview. Appreciate this.
@CodeEmporium 2 years ago
A much appreciated comment. Thank you so much
@l_urent 2 years ago
As usual, nice video. Vectors are concatenated because they are split prior to entering the multi-head attention. Hence the (parallelizable) "fork" we see on the diagram. Also, the concatenation does not augment the dimensionality compared to that of the input embeddings, i.e. it actually restores it, thus allowing the residual link's addition, otherwise impossible "as is" due to non-conformable operands. All this is more obvious when taking everything from the matrix point of view. Regarding the addition of the positional and the meaning-space embeddings, the more convenient way to think about it is first to keep in mind that we are dealing with (mathematical space) coordinates. Would you add a longitude to a latitude? Same intuition here: positional vector components inject (additively, along each meaning axis across words) deterministic structure between the words of the sequence. This is supposed to uniquely identify the sequence for the downstream network to deal with. Put differently, the only things that matter are the (sine/cosine-based) differences we have between words. In practice (or after normalization), embedding components are (unit-variance) values centered on 0, just like sine and cosine outputs. Having a meaning-space component of, say, 50, is extremely unlikely, or quickly cleaned up by "(add and) norm".
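For the coordinate intuition above, a small PyTorch sketch of the standard sine/cosine positional encodings being added to the token embeddings (shapes and names here are illustrative, not the video's code):

```python
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Sine/cosine encodings as in 'Attention Is All You Need'."""
    positions = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)   # (seq_len, 1)
    dims = torch.arange(d_model)                                          # (d_model,)
    angle_rates = 1.0 / torch.pow(10000.0, (2 * (dims // 2)) / d_model)   # per-dimension frequency
    angles = positions * angle_rates                                      # (seq_len, d_model)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angles[:, 0::2])   # even dimensions: sine
    pe[:, 1::2] = torch.cos(angles[:, 1::2])   # odd dimensions: cosine
    return pe

seq_len, d_model = 6, 512
token_embeddings = torch.randn(seq_len, d_model)        # stand-in "meaning-space" vectors
encoder_input = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```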
@radiradev3013 2 years ago
Why are you showing a word2vec embedding used as an input to the transformer? I thought it was common to use a tokenizer and let the model learn an embedding using the encoder part of the transformer.
@krranaware8396 2 years ago
We can use already-learned embeddings to convert the input and output tokens, as mentioned in the Transformer paper.
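A hedged sketch of what that looks like in practice, assuming a PyTorch-style setup: the tokenizer produces integer ids, an nn.Embedding table is learned jointly with the model, and pretrained word2vec-style vectors can optionally be used only as an initialization.

```python
import torch
import torch.nn as nn

vocab_size, d_model = 10_000, 512            # hypothetical sizes
embedding = nn.Embedding(vocab_size, d_model)

# Integer ids from a tokenizer (values here are made up): (batch, seq_len)
token_ids = torch.tensor([[12, 845, 9031, 7]])
x = embedding(token_ids)                     # (1, 4, 512) learned embeddings

# Optional: start from pretrained vectors instead of random initialization.
pretrained = torch.randn(vocab_size, d_model)          # stand-in for word2vec weights
embedding_from_w2v = nn.Embedding.from_pretrained(pretrained, freeze=False)
```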
@anan7s294 2 years ago
I’ve been bombarded with data analyst courses, so I landed here, but I’m a bit confused about the Google courses: Data Analyst, Project Management, Digital Marketing. I have a master’s degree in geography and no work experience, but I have done O Level (Foundation of IT) and I can work with HTML. My interests are automobiles, tech gadgets, travelling and learning new skills. What do you reckon would be the most suitable job from Google’s certificates and other certificates?
@mehular0ra 2 years ago
Great content! Please keep making these
@CodeEmporium 2 years ago
Thanks a ton! And I shall :)
@LiaAnggraini1 2 years ago
Another awesome content. Thank you so much ❤️
@viratmani7011 2 years ago
How can multi-headed attention differentiate between word vectors and position embeddings?
@AbdurRahman-mv4vz 1 year ago
Thank you for this helpful video.
@CodeEmporium 1 year ago
My pleasure
@user-wr4yl7tx3w 1 year ago
Can I ask: is the material covered in this video something that a data scientist would learn and use? Or would it be ML engineers? Or just ML scientists? Just wondering where such understanding of the Transformer architecture falls.
@CodeEmporium 1 year ago
Good question. If you’re talking about just using these architectures, I would say Machine Learning Engineers would use them. If you are talking about developing new ideas for constructing the architectures themselves, it might fall to researchers (they have many titles: AI researchers, Deep Learning Researchers, ML researchers). The meaning of “Data Science” has changed over the years. These days, it comes across as a fuzzy title that could mean a Data Analyst, Data Engineer, ML Engineer and everything in between. So do data scientists use Transformers? It depends. If the data scientist is more of a machine learning engineer, then yes. If they are Data Analysts or Data Engineers, I doubt they would need to use this beyond curiosity.
@rayeneaminaboukabouya4782 2 years ago
Thank you so much! So if we want to use this in the context of an image, do we just replace the words with patches?
@CodeEmporium 1 year ago
That’s an interesting thought! I’m sure with some modifications to the input, that could be possible!
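That is essentially what Vision Transformers do; a rough sketch (sizes are illustrative, not from the video) of turning an image into a sequence of patch embeddings that the rest of the Transformer can treat like token embeddings:

```python
import torch
import torch.nn as nn

d_model, patch_size = 512, 16
# A strided convolution splits the image into non-overlapping patches and
# linearly projects each patch to d_model in one step.
patchify = nn.Conv2d(3, d_model, kernel_size=patch_size, stride=patch_size)

image = torch.randn(1, 3, 224, 224)                # (batch, channels, H, W)
patches = patchify(image)                          # (1, 512, 14, 14)
patch_tokens = patches.flatten(2).transpose(1, 2)  # (1, 196, 512): a "sentence" of patches
```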
@sanoturman 2 years ago
Very well done, thank you. If that's not a secret, what software is used to design this beautiful video (including the music)?
@CodeEmporium 2 years ago
Thanks for watching! I used Camtasia Studio for this
@Data_scientist_t3rmi 2 years ago
Hello, could you tell me why at 11:00 you used 8 vectors? Why not 9 or 10? Thank you in advance.
@CodeEmporium 1 year ago
8 was just the official number of heads in the main paper. You can technically use as many as you wish. I assume they didn’t see noticeable differences even when increasing this number.
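For reference, a small PyTorch sketch with the base configuration from "Attention Is All You Need" (d_model = 512 split across 8 heads of 64 dimensions each); any head count that divides d_model would also work:

```python
import torch
import torch.nn as nn

d_model, num_heads = 512, 8
assert d_model % num_heads == 0     # each head gets d_model // num_heads = 64 dimensions

mha = nn.MultiheadAttention(embed_dim=d_model, num_heads=num_heads, batch_first=True)
x = torch.randn(2, 10, d_model)     # (batch, seq_len, d_model)
out, attn_weights = mha(x, x, x)    # self-attention: out has shape (2, 10, 512)
```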
@mehular0ra 2 years ago
Can you make a video on neural dependency parsers also?
@divyanshoze8962 2 years ago
That's some lovely content, thank you.
@CodeEmporium 2 years ago
You are very welcome!
@mehular0ra 2 years ago
btw which tool do you use to make these animations?
@CodeEmporium 2 years ago
Camtasia Studio
@felixakwerh5189 2 years ago
Thank you 🙏🏽
@proxyme3628 2 years ago
Why does multi-head attention generate 8 word vectors for each word?
@CodeEmporium 1 year ago
Because we assumed 8 attention heads in multi-head attention. This could have been any other number in theory. I mentioned 8 because that was how many heads were used in the main paper.
@aneekdas3056 2 years ago
Extremely helpful. Thanks a ton.
@CodeEmporium 2 years ago
Very welcome:)
@waisyousofi9139 2 years ago
Thanks! Please make a video on what happens in the decoder part.
@CodeEmporium 2 years ago
I'll make a video on GPT at some point that may address this
@clairewang8370 2 years ago
@CodeEmporium Looking forward to your GPT video!!! 😍😍😍😍😍
@0xmahdirostami 2 years ago
Thank you
@CodeEmporium 2 years ago
Super welcome
@tolbaahmed 2 years ago
💖
@CodeEmporium 2 years ago
:)
@jayseb 1 year ago
From the get-go, the French translation isn't right...
@adityarajora7219 2 years ago
Can you make the same for BERT?
@CodeEmporium 2 years ago
I have in the past. You want to click on the "i" at the top right-hand side of the video. This should have links.
@ScriptureFirst 2 years ago
TIGBA! Please post a link to your script, even if you didn't stick to it strictly or even veered completely off. 🙏🏼
@WhatsAI 2 years ago
Great video as always; I especially loved the previous one about Transformers and this follow-up! I just wanted to say that I love your videos and you are one of the reasons, along with Yannick, that I started to make some myself. I'd love to collaborate on a project with you someday if you'd be interested! Added you on Discord if you'd like to chat.
@CodeEmporium 2 years ago
This makes my day. So glad there are more educators like yourself out there! Let's do this! We can chat collabs soon too :)
@asyabenali3779 2 years ago
Can I contact you, please? I am interested in this field and have encountered some problems; can you help me by email?