Informer attention Architecture - FROM SCRATCH!

3,757 views

CodeEmporium

1 day ago

Comments
@LeoLan-vv1nq 7 months ago
Amazing work, can't wait for the next episode!
@mohamadalikhani2665 5 months ago
Thank you for your amazing content. Where can I access the drawio file?
@lfyang9603 2 months ago
Thank you so much for the videos.
@neetpride5919 7 months ago
Why aren't the padding tokens appended during data preprocessing, before the inputs are turned by the feedforward layer into the key, query, and value vectors?
@slayer_dan 7 months ago
Adding padding before forming K, Q, and V vectors would insert extra tokens into the input sequences, altering their lengths and potentially distorting the underlying data structure. As a result, the subsequent computation of K, Q, and V vectors would incorporate these padding tokens, affecting the model's ability to accurately represent the original data.

During the attention calculation, these padding tokens would influence the attention scores, potentially diluting the focus on the actual content of the input sequences. This could lead to less effective attention patterns and hinder the model's ability to learn meaningful representations from the data.

Furthermore, applying padding after forming K, Q, and V vectors allows for the efficient use of masking techniques to exclude padding tokens from the attention mechanism. By setting the attention scores corresponding to padding positions to negative infinity before the softmax operation, the model effectively ignores these tokens during attention calculation. This approach preserves the integrity of the input sequences, ensures accurate attention computations, and maintains the model's focus on relevant information within the data.

P.S. I used ChatGPT to format my answer because it can do this thing better.
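A minimal PyTorch sketch of the masking idea described in the reply above; the names here (masked_attention, pad_mask) are illustrative, not from the video:

import torch
import torch.nn.functional as F

# Sketch: exclude padding positions from attention by setting their scores to -inf before softmax.
# q, k, v: (batch, seq_len, d_model); pad_mask: (batch, seq_len), True where the token is padding.
def masked_attention(q, k, v, pad_mask):
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5                       # (batch, seq_len, seq_len)
    scores = scores.masked_fill(pad_mask.unsqueeze(1), float("-inf"))   # mask out pad keys
    weights = F.softmax(scores, dim=-1)                                 # pad keys get ~0 weight
    return weights @ v

# Tiny usage example: the last token of the sequence is padding
q = k = v = torch.randn(1, 4, 8)
pad_mask = torch.tensor([[False, False, False, True]])
out = masked_attention(q, k, v, pad_mask)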
@neetpride5919 7 months ago
@slayer_dan How could it possibly save computing power to pad the matrices with multiple 512-element vectors, rather than simply appending tokens to the initial sequence of tokens?
@deltamico 6 months ago
Take it with a grain of salt, but I think if you hardcode the mask so that padding is not attended to, you don't need to learn that extra behavior for the [pad] token, so it's more stable.
@jarhatz 5 months ago
@neetpride5919 Multiplying matrices on the GPU can be optimized by sizing them so that they fit more cleanly in GPU cache. For example, suppose you have two skinny, tall matrices that you want to multiply together. Sometimes the kernel operations across one (tall) axis can be the bottleneck in compute time. There are cases where padding matrices with 0s to uniform square shapes, or to multiples of the cache block size, can speed up the kernel operations on the GPU.
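A small PyTorch sketch of the shape-padding idea mentioned above; the block size of 128 and the pad_to_multiple helper are assumptions for illustration, and whether this actually helps depends on the hardware and the kernel:

import torch
import torch.nn.functional as F

# Sketch: zero-pad a 2-D matrix so both dimensions are multiples of a block size,
# which can map more cleanly onto GPU tile/cache sizes. Zero padding does not change
# the values in the product, so the result can simply be cropped back afterwards.
def pad_to_multiple(x, block=128):
    rows, cols = x.shape
    pad_r = (-rows) % block                   # extra rows needed to reach a multiple of block
    pad_c = (-cols) % block                   # extra cols needed to reach a multiple of block
    return F.pad(x, (0, pad_c, 0, pad_r))     # pad on the right and bottom with zeros

a = torch.randn(1000, 64)                     # "skinny tall" matrices
b = torch.randn(64, 1000)
c = (pad_to_multiple(a) @ pad_to_multiple(b))[:1000, :1000]   # crop back to the original shape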
@samson6707 4 months ago
Can I find the flowchart graphic of the Informer model on GitHub? And is draw.io free?
@ajaytaneja111 4 months ago
Hi Ajay, again the best videos. It's 3 AM here and I'm watching your Informer video! Do you think using Informers instead of Transformers in an LLM would result in more contextual responses than using Transformers? I'm trying to answer the question of how Informers would improve the efficiency of an LLM from an end-user point of view.
@CodeEmporium 4 months ago
I believe if we use ProbSparse attention (Informer) over full attention (Transformer), we could see some improvements from an end-user standpoint: (1) faster response times for very long sequences, since the operation is O(n log n) instead of quadratic in the input length, (2) the ability to handle longer sequences, and (3) faster training (again, if the training data has long sequences).
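A rough PyTorch sketch of the ProbSparse idea behind these points, loosely following the Informer paper: score each query with a cheap max-minus-mean measure over a few sampled keys, run full attention only for the top-u queries, and give the remaining "lazy" queries the mean of V. The function name and the factor of 5 are illustrative assumptions:

import math
import torch
import torch.nn.functional as F

def probsparse_attention(q, k, v, factor=5):
    B, L_Q, D = q.shape
    L_K = k.size(1)
    u = min(L_Q, int(factor * math.ceil(math.log(L_Q))))           # number of "active" queries
    n_sample = min(L_K, int(factor * math.ceil(math.log(L_K))))    # sampled keys per query

    # Cheap importance measure: max score minus mean score over a random sample of keys
    idx = torch.randint(L_K, (n_sample,))
    sample_scores = q @ k[:, idx, :].transpose(-2, -1) / math.sqrt(D)   # (B, L_Q, n_sample)
    measure = sample_scores.max(dim=-1).values - sample_scores.mean(dim=-1)
    top_q = measure.topk(u, dim=-1).indices                             # (B, u)

    # Default output for the "lazy" queries: the mean of V
    out = v.mean(dim=1, keepdim=True).expand(B, L_Q, D).clone()

    # Full attention only for the selected queries: O(u * L_K) instead of O(L_Q * L_K)
    q_active = torch.gather(q, 1, top_q.unsqueeze(-1).expand(-1, -1, D))  # (B, u, D)
    scores = q_active @ k.transpose(-2, -1) / math.sqrt(D)                # (B, u, L_K)
    out.scatter_(1, top_q.unsqueeze(-1).expand(-1, -1, D), F.softmax(scores, dim=-1) @ v)
    return out

# Usage: self-attention over a length-96 sequence
x = torch.randn(2, 96, 64)
y = probsparse_attention(x, x, x)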
@ajaytaneja111 4 months ago
@CodeEmporium, thanks a lot, Ajay. I think point no. 2 also implies the responses being "more" than with full attention, doesn't it?
@Ishaheennabi 7 months ago
Love from Kashmir, India, bro! ❤❤❤
@adelAKAdude 6 months ago
Great video, thanks. Question... in the third quiz question... how do you sample a subset of keys/queries "depending on importance"?
@-beee- 7 months ago
I would love it if the quizzes had answers in the comments eventually. I know this is a fresh video, but I want to check my work, not just have a discussion 😅
@rpraver1 7 months ago
Also, as always, great video. Hoping in the future you deal with encoder-only and decoder-only transformers...
@CodeEmporium 7 months ago
Yep! For sure. Thank you so much!
@sudlow3860 7 months ago
With regard to the quiz, I think it is B, D, B. Not sure how this is going to launch a discussion though. You present things very well.
@CodeEmporium 7 months ago
Ding ding ding! Good work on the quiz! While this may or may not spark a discussion, just wanted to say thanks for participating :)
@dumbol8126 7 months ago
Is this the same as what TimesFM uses?
@AmirthaAmirtha-m2b 6 months ago
Can you suggest an interactive AI neural network model for a school project? Your videos are nice and I understand them easily. Please tell.
@rpraver1 7 months ago
Not sure if it's just me, but starting at about 4:50 your graphics are so dark... maybe go to a white background or light gray, like your original PNG...
@CodeEmporium 7 months ago
Yea. Let me try brightening them up for future videos if I can. Thanks for the heads up
@eadweard. 7 months ago
In answer to your question, I can either: A) mono-task or B) screw up several things at once
@theindianrover2007 7 months ago
cool!
@CodeEmporium 7 months ago
Thank you 🙏