It's obvious that the host of this channel is a teacher, since he knows how to teach.
@InterpretingInterpretability 2 months ago
Is this from a specific visualization program? Could you share a repo for it, please?
@wilfredomartel7781 4 months ago
🎉
@kenseno5108 4 months ago
I am a newbie, so this question probably doesn't make sense to you guys. What I am not able to understand is why we need the Q, K, V matrices. The concept of Q, K, V is clear. But the word vectors are learnable, right? Instead of new Q, K, V matrices, what if we just learn the word vectors? Of course the network would be very limited then. Is the only reason for adding the Q, K, V matrices to get more depth, or is there some other reason? Please help me with this query.
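A minimal numpy sketch of the difference (the 3-word, 4-dimensional setup and the random matrices are made up for illustration, not taken from the video): the raw word vectors already give similarity scores on their own, but the Q, K, V projections add parameters that are learned specifically for the attention step, so the model can learn what to compare (Q, K) separately from what to pass along (V).

import numpy as np

rng = np.random.default_rng(0)
d = 4                          # toy embedding size
X = rng.normal(size=(3, d))    # three word vectors, one per row

# Without Q/K/V: scores come straight from the embeddings, so the
# attention step itself has nothing of its own to learn.
scores_plain = X @ X.T

# With Q/K/V: three learned projections give each word separate
# "asking" (Q), "matching" (K) and "content" (V) representations.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv
scores = Q @ K.T / np.sqrt(d)                                         # scaled dot products
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # row-wise softmax
out = weights @ V              # contextualized vectors, same shape as X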
@benwillslee2713 5 months ago
Thanks bro, it's the BEST explanation of attention I have seen so far (I have to say that I have seen many others). Looking forward to the other parts, even though it's been almost 4 years since this Part 1!
@abhijitambhore5858 5 months ago
Thanks bro please do the second part
@liberate7604 5 months ago
Best explanation ever🥇🥇🥇🥇
@rishishrestham8681 6 months ago
please do the second part🥺, this was a great explanation, very intuitive.
@suriyars4487 6 months ago
Bro, at 26:39 there is a mistake in your slide. I think you need to remove the "Weigh original words" block, since that action is already being done in the "Matrix Multiplication" step between the weight matrix and the value matrix.
@suriyars4487 6 months ago
More precisely, it should be the "softmax" block!!
@suriyars4487 7 months ago
Bro dropped a masterpiece and disappeared !
@VishalKumar-qu4he 7 months ago
Great video... it explains why we need attention, step by step.
@ffilez 7 months ago
I like this msa vid
@JesusSavedMe99 7 months ago
Msa playlist 💀
@JeunEbat 7 months ago
so why is this on msa’s playlist?
@kameshsingh7867 8 months ago
Very helpful, thanks.
@rollingstone1784 8 months ago
@arkaung, @ark_aung: there is an error at 13:00. s_1 is a row vector, so it should be written in bold (just like v_1); s_1 represents the first row in the histogram. The components s_11, ..., s_1n are scalars (normal font), the small boxes in the histogram.
14:00: again, s_1 is a vector.
15:00: the weights w_i are vectors as well.
17:45: the y_i are vectors.
20:50: maybe matrix notation would help here. V, the set of all vectors v_i, is a matrix of dimension 3x50. The vector v_2 has dimension 1x50. The matrix multiplication v_2 * V^T gives (1x50)*(50x3) = s_2 (1x3). Normalization gives w_2 (1x3). The matrix multiplication y_2 = w_2 * V gives (1x3)*(3x50) = (1x50).
Remark: it should be noted that the last step is a "right multiplication" (matrix x vector), so in matrix notation it is V^T * w_2^T, resulting in a vector y_2^T of dimension (50x3)*(3x1) = 50x1. By transposing this vector we get y_2 (1x50).
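For anyone who wants to check those shapes numerically, here is a small numpy sketch of the same unparameterized attention step (random numbers stand in for the actual 50-dimensional embeddings):

import numpy as np

rng = np.random.default_rng(0)
V = rng.normal(size=(3, 50))         # the three 50-dimensional word vectors, stacked as rows
v2 = V[1:2]                          # the second word, kept as a 1x50 row vector

s2 = v2 @ V.T                        # (1x50)(50x3) -> 1x3 raw similarity scores
w2 = np.exp(s2) / np.exp(s2).sum()   # normalize the scores (softmax), still 1x3
y2 = w2 @ V                          # (1x3)(3x50) -> 1x50 contextualized vector
print(s2.shape, w2.shape, y2.shape)  # (1, 3) (1, 3) (1, 50)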
@jynxzoid 9 months ago
What is this doing here in the MSA Top Playlist?
@ahmadabousetta 9 months ago
Wonderful. I enjoyed your explanation. Thank you.
@paveltolmachev1898 9 months ago
By the way, I have watched about 10 videos on attention, and this is the best one so far. Trust me.
@paveltolmachev1898 9 months ago
I don't really understand why we need three different matrices for query, key and value. Why not have just one? My reasoning is as follows. Let's say we have word embeddings {v_i}. To adjust for context, we need to "nudge" these vectors around so that they incorporate the contextual information. As mentioned in the video, we could just use the embeddings {v_i} themselves for computing the similarity scores, but then there would be no learnable parameters. However, one can simply introduce a matrix M that acts as a feature extractor, f_i = M v_i, extracting useful features from the word embedding. Then, to compute the contextual similarity scores, we can just use the vectors f_i directly: s_ij = (f_i, f_j). There seems to be absolutely no need for any other matrices. What am I missing here?
Instead, we have three matrices which essentially do this: Mq extracts query information {q_i}, Mk extracts key information, giving us a set {k_i}, the similarities are computed as s_ij = (q_i, k_j) and then softmaxed, and the values {u_i} are computed as Mv v_i. We update our original embeddings by replacing v_i with h_i = sum_j s_ij u_j. For now I see no reason for this complexity, and it makes me paranoid.
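One concrete thing the single-matrix version cannot do (a small numpy illustration, not something claimed in the video): with one shared M, the score s_ij = (M v_i)·(M v_j) = v_i^T M^T M v_j is always symmetric, s_ij = s_ji, whereas separate query and key matrices let "how much word i attends to word j" differ from "how much word j attends to word i". The separate value matrix similarly decouples what a word passes on to others from the representation used for matching.

import numpy as np

rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(5, d))               # five toy word embeddings, one per row

# One shared feature extractor M: the score matrix is forced to be symmetric.
M = rng.normal(size=(d, d))
F = X @ M
S_single = F @ F.T
print(np.allclose(S_single, S_single.T))  # True, always

# Separate query/key projections: the score matrix can be asymmetric.
Wq, Wk = rng.normal(size=(d, d)), rng.normal(size=(d, d))
S_qk = (X @ Wq) @ (X @ Wk).T
print(np.allclose(S_qk, S_qk.T))          # False for generic random weights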
@fir3drag0n1984 9 months ago
why only a single part.. :/
@karannchew2534 9 months ago
Help please! I'm lost from 24:17, where the 50x50 matrices are inserted. Why are the matrices inserted???
@Jess.1.4.12 9 months ago
Why is this on msa playlist😭
@MinsungsLittleWorld 10 months ago
Me - scrolls through MSA's playlist
Also me - finds this video in the playlist, which is not even related to MSA
@ZarifaahmedNoha-ey9kh 7 months ago
Bro, me too, I was just scrolling and found this.
@AliahDonald 10 months ago
im here from msa…
@siddharth-gandhi 10 months ago
The BEST source of information I've come across on the internet about the intuition behind the Q,K and V stuff. PLEASE do part 2! You are an amazing teacher!
@ucsdvc 11 months ago
This is the most intuitive but non-hand-wavy explanation of self-attention mechanism I’ve seen! Thanks so much!
@rdgopal 11 months ago
This is by far the most intuitive explanation that I have come across. Great job!
@DouwedeJong 11 months ago
Thanks for making this video. When they say BERT has 24 layers, 1024-dimensional hidden representations, 16 attention heads in each self-attention module, and 340M parameters, what do those numbers correspond to in terms of this video?
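For what it's worth, those numbers hang together with a rough back-of-the-envelope count. This is only a sketch and assumes the standard BERT-large configuration (30,522-token vocabulary, 512 positions, 4096-dimensional feed-forward layers), none of which is covered in the video; biases, LayerNorm and the pooler are omitted.

hidden, layers, heads, vocab, ffn, max_pos = 1024, 24, 16, 30522, 4096, 512

print(hidden // heads)                        # 64: each of the 16 heads works on a 64-dim slice of the 1024-dim vector
embeddings = (vocab + max_pos + 2) * hidden   # token + position + segment embeddings
attention = 4 * hidden * hidden               # Wq, Wk, Wv plus the output projection, per layer
feed_forward = 2 * hidden * ffn               # the two dense layers of the FFN block, per layer
total = embeddings + layers * (attention + feed_forward)
print(round(total / 1e6))                     # ~334, close to the quoted 340M (the rest sits in the omitted pieces)

Roughly speaking, the 1024-dimensional hidden representations play the same role as the 50-dimensional word vectors in the video's example, and each of the 24 layers repeats a (multi-head) self-attention block like the one shown here.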
@RahimPasban 11 months ago
this is one of the greatest videos that I have ever watched about Transformers, thank you!
@unclecode 11 months ago
Such a brilliant explanation. Part 1 came out 3 years ago, and nothing after that! Sad.
@osamutsuchiyatokyo a year ago
I believe it is at least one of the clearest presentations of multi-head attention.
@Jaybearno a year ago
Sir, you are an excellent instructor. Thank you for making this.
@atulshuklaa a year ago
You are awesome man 🎉🎉
@Boredreturn a year ago
why is this in msa top videos
@MrMacaroonable a year ago
Can you elaborate on the part around 24:17 where you introduced the Q, K, V? You mentioned "..... we want to preserve the dimension...". What's the intuition for that?
@darylgraf9460 a year ago
I'd just like to add my voice to the chorus of praise for your teaching ability. Thank you for offering this intuition for the scaled dot product attention architecture. It is very helpful. I hope that you'll have the time and inclination to continue providing intuition for other aspects of LLM architectures. All the best to you and yours!
@JonathanUllrich a year ago
This video solved a month-long understanding problem I had with attention. Thank you so much for this educational and didactic masterpiece!
@izkaoix a year ago
why tf is this in msas playlist
@saimasideeq7254 a year ago
really helpful explanation
@youtube1o24 a year ago
Very decent work. Please make a part 2 and part 3 of this series.
@sibyjoseplathottam4828 a year ago
This is undoubtedly one of the best and most intuitive explanations of the Self-attention mechanism. Thank you very much!
@neilharvyn1417 a year ago
Transformers are robots that change into cars. The Autobots are the heroes, the villains are the Decepticons, and the animals that change into robots are the Maximals; the Predacons and Terrorcons are villains.
@dr.p.m.ashokkumar5344 a year ago
Very nice explanation... congrats, and continue your good work.
@benjaminhobbs2923 a year ago
amazing thank you
@koladearisekola3650 a year ago
This is the best explanation of the attention mechanism I have seen on the internet. Great Job !
@pavan1006 a year ago
I owe you a lot for such a clear explanation of the math involved.