Your Deep Learning playlist is pure gold! The intuition and simplicity you bring to complex concepts are amazing. As a dedicated student who's watched it all, I can say it's top-notch quality. Thank you for this video series.
@FindingTheSuccess-w2b4 ай бұрын
Sir... please make videos regularly.....🙏🙏
@bhaveshshrivastava30242 ай бұрын
Brother, the videos can't be uploaded regularly, and that's exactly why the tutorial quality is so top notch... Sir, please take your time and upload comfortably, even once every two weeks.
@sowmyaraoch3 ай бұрын
This entire playlist is so intuitive and you've made all complex concepts so simple. Please continue this playlist.
@aneessohail10084 ай бұрын
Sir, kindly upload the videos in this series regularly; no one else has a course like yours ❤❤
@jooeeemusic79634 ай бұрын
Sir I'm waiting for these videos every single day. Please Upload on regular basis sir.
@drrabiairfan9933 ай бұрын
Even the authors of the original paper couldn't explain it this well... an absolutely amazing and illustrative explanation of the transformer. Without a doubt, the best explanation available anywhere.
@amanagrawal41984 ай бұрын
Great!! Watched the full DL playlist; a great resource for understanding the whole of deep learning.
@ersushantkashyap2 ай бұрын
Nitish Sir, just as you always say, "Khol ker rakhunga aapke samne" (I'll lay it all open in front of you), you did exactly that in this video. Thank you so very much.
@Shisuiii692 ай бұрын
Seriously bro 💯
@harshsingh78426 күн бұрын
What a great explanation, man. Loved it.
@sohaibahmed44393 ай бұрын
Superb curriculum management and teaching style! Thanks!
@ParthivShah3 ай бұрын
Thank You Very Much sir for continuing this playlist.
@RdXDeveloper3 ай бұрын
Sir maintains quality, not quantity. That's why he takes time over every video. Thank you so much, sir.❤️🩹
@anuradhabalasubramanian9845Ай бұрын
How brilliant you are, Sir! A super guru for us! Great explanation, Sir!
@nikhilgupta68034 ай бұрын
as usual....awesome and simple
@zeeshanahmed864024 күн бұрын
Hi Nitish sir, your deep learning playlist is absolutely mind-blowing. Please also upload videos on fine-tuning encoder-only transformers, decoder-only transformers, and encoder-decoder transformers, and please upload videos on LangChain as well.
@paragbharadia28952 ай бұрын
Huge respect, and a lot more lessons to learn from all the videos you have posted! Thank you, sir!
@mukul36342 ай бұрын
I am amazed; now I feel there is nothing easier than transformers. I am a mechanical engineer, and even I understood it quite well, sir. It even feels easier than linear or logistic regression. Now I can teach this concept to any five-year-old child.
@sukumarane2302Ай бұрын
So great and appreciable! You made the complex task of explaining the transformer architecture simple... Thank you, sir.
@videoediting03 ай бұрын
Marvelous explanation in a very simplified way, great man.
@tirthadebnath24974 ай бұрын
Your tutorials are really gold for higher studies.
@nikhilraj38403 ай бұрын
One of the best transformer explanation playlists; you are amazing.
@saurabhkaushik82824 ай бұрын
Great explanation, sir! I watched the entire Transformer series, and you made it so easy to understand. Many thanks! Looking forward to the decoder parts.
@narasimhasaladi72 ай бұрын
The combination of the add and norm operations in the residual connections of transformer encoders provides these benefits: improved gradient flow, preservation of information, enhanced learning, increased stability, faster convergence, and better generalization.
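(A minimal NumPy sketch of that add & norm step, for readers who want to see it concretely; the random vectors and the simple layer_norm helper below are illustrative, not taken from the video or from any particular library.)

```python
import numpy as np

def layer_norm(v, eps=1e-6):
    """Normalize a single token vector to zero mean and unit variance."""
    return (v - v.mean()) / (v.std() + eps)

d_model = 512
x = np.random.randn(d_model)             # original token vector entering the sub-layer
sublayer_out = np.random.randn(d_model)  # e.g. the multi-head attention output for this token

# "Add & Norm": the residual addition keeps the original information flowing,
# and layer normalization keeps the resulting values in a stable range.
y = layer_norm(x + sublayer_out)
print(round(y.mean(), 4), round(y.std(), 4))  # approximately 0 and 1
```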
@laxminarayangaidhane70633 ай бұрын
Wonderful explanation... I was getting bored watching the previous few videos, but after completing them I understood the current video easily 😊. You have explained it very nicely.
@ai_pie10003 ай бұрын
Brother, the way you teach is exactly the way our mind finds it easiest to absorb hard concepts. ❤
@khatiwadaAnish3 ай бұрын
Thank you so much. You made this topic so simple that even I feel confident enough to teach it to others.
@just__arif2 ай бұрын
Great Explanation!
@SBhupendraAdhikari3 ай бұрын
Thanks Sir for such beautiful explanation
@RdXDeveloper4 ай бұрын
Sir, thanks a lot for this effort ❤. You are just awesome, sir. Your free courses are more valuable than paid ones. One of the best YouTube channels there is. ❤
@SrideviSutraya2 ай бұрын
Very good explanation
@manjeet44184 ай бұрын
Thank You Sir For Detailed Explanation ❤
@nomannosher89284 ай бұрын
always the best explanation.
@SamiUllah-ql9my4 ай бұрын
Sir, I have been waiting for this video for a very long time. I love your teaching style; I can't find anyone who teaches better than you.
@trickydotworld3 ай бұрын
Thank you very much. Waiting for Decoder part
@virajkaralay88444 ай бұрын
Absolute banger video on the transformer encoder, can't wait for the decoder video to drop.
@amitbohra92834 ай бұрын
Sir great video thanks, waiting eagerly for the second part.
@gender1213 ай бұрын
Waiting anxiously for the remaining videos ..please bring them soon.
@princekhunt13 ай бұрын
Nice explanation 👌
@imteyazahmad96164 ай бұрын
Amazing 🤗, please upload videos regularly. Waiting for next video on decoder
@SandeepSingh-yx2si4 ай бұрын
Very Good Explanation.
@ujjawalagrawal4 ай бұрын
Wow great sir thanks for preparing the video
@ayushparwal22102 ай бұрын
interesting video sir thanks to you.
@arpittalmale64404 ай бұрын
They use the residual connection around each sub-layer because, by adding the original contextual vector back after each operation, the model can preserve the meaning of the sentence it was given. If they did not use it, there is a high chance that after passing through the attention layers the model would lose the context of each word with respect to the others, because at the output the model computes a loss value based on its objective, and this loss is then backpropagated to update the model weights, including the word embedding vectors. There is also the concept of "teacher forcing": during training the model is fed the actual ground-truth output (target sequence), which can help stabilize training and accelerate convergence by providing more accurate and consistent signals.
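(As an illustration of the teacher-forcing idea mentioned above, here is a small, hedged PyTorch sketch using a toy one-layer "decoder"; the vocabulary size, token IDs, and module names are made up for the example and this is not the actual transformer decoder.)

```python
import torch
import torch.nn as nn

# Toy decoder step: predicts the next token's logits from the current ground-truth token.
vocab_size, d_model = 100, 32
embed = nn.Embedding(vocab_size, d_model)
to_logits = nn.Linear(d_model, vocab_size)
loss_fn = nn.CrossEntropyLoss()

target = torch.tensor([[5, 17, 42, 8]])    # ground-truth output sequence (batch of 1)

# Teacher forcing: at every step the *true* previous token is fed in,
# regardless of what the model predicted at the step before.
decoder_input = target[:, :-1]             # tokens fed to the decoder (shifted right)
decoder_labels = target[:, 1:]             # tokens the decoder must predict

logits = to_logits(embed(decoder_input))   # shape (1, 3, vocab_size)
loss = loss_fn(logits.reshape(-1, vocab_size), decoder_labels.reshape(-1))
loss.backward()                            # gradients flow back into the weights and embeddings
print(loss.item())
```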
@LMessii103 ай бұрын
Brilliance ❤ 👏
@arjunsingh-dn2fo15 күн бұрын
Sir, as we learned in the boosting algorithm, we use residuals to capture the difference between actual and predicted values. So, sir, I think this residual connection is doing something similar here: it works with the difference between the actual embedding and the contextually aware embedding, and if there is a vanishing-gradient problem, as you said, it can simply pass the original embedding on to the next feed-forward neural network. Sir, what's your opinion on this?
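(A tiny NumPy sketch contrasting the two ideas in this question; the numbers and the stand-in sublayer function are purely illustrative.)

```python
import numpy as np

# Gradient boosting "residual": the *next* weak learner is fit to the error (actual - predicted).
actual = np.array([3.0, 5.0, 7.0])
predicted = np.array([2.5, 5.5, 6.0])
boosting_residual = actual - predicted   # becomes the target for the next model

# Transformer "residual connection": the input is *added back* to the sub-layer output,
# so the block only has to learn a correction on top of the original embedding.
x = np.random.randn(4)                   # toy 4-dim token embedding
sublayer = lambda v: 0.1 * v             # stand-in for an attention / FFN output
y = x + sublayer(x)                      # skip connection: output = input + learned change

print(boosting_residual, y)
```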
@shubhamgattani535721 күн бұрын
Thank you god!
@anonymousman30144 ай бұрын
Sir, one more time I am requesting you to complete the deep learning playlists ASAP. Please Sir🙏.
@Shubham_gupta184 ай бұрын
Please continue on this playlist Nitish sir, and regularly upload the videos just a humble request. The placement season is coming soon and we need you.
@vimalshrivastava65864 ай бұрын
Thank you for this wonderful video.❤
@chinmoymodakturjo52934 ай бұрын
Kindly drop videos regularly and complete the series please !
@chiragsharma14284 ай бұрын
Finally the wait is over. Thanks a lot, Sir.
@Deepak-ip1se4 ай бұрын
Very nice video!!
@harshmohan54113 ай бұрын
Sir, I think the reason for the residual connection is so that the information from the positional encoding doesn't get lost, because, as you said, the original transformer uses 6 encoder blocks, so the residual path keeps reminding the transformer about the positions, I think.
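(For reference, a small NumPy sketch of the sinusoidal positional encoding from the "Attention Is All You Need" paper, which is added to the token embeddings once before the first encoder block; whether the residual paths are mainly there to carry this information forward is the commenter's hypothesis, not something this sketch settles.)

```python
import numpy as np

def positional_encoding(seq_len, d_model=512):
    """Sinusoidal positional encoding: one d_model-dim vector per position."""
    positions = np.arange(seq_len)[:, np.newaxis]   # shape (seq_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]        # shape (1, d_model)
    angles = positions / np.power(10000, (2 * (dims // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])           # sine on even dimensions
    pe[:, 1::2] = np.cos(angles[:, 1::2])           # cosine on odd dimensions
    return pe

embeddings = np.random.randn(3, 512)                # "How", "are", "you" as 512-dim vectors
encoder_input = embeddings + positional_encoding(3) # added once, before encoder block 1
print(encoder_input.shape)                          # (3, 512)
```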
@peace-it4rg3 ай бұрын
Sir, I think the ResNet-style connections may have been used so that sparse embeddings or matrices are not generated, keeping the architecture a bit denser and more stable so the network can keep learning; otherwise it might overfit. What are your thoughts on this?
@myself40243 ай бұрын
🎯 Key points for quick navigation:

00:00 *📚 Introduction to Transformer Architecture*
- The video begins with an introduction to the Transformer architecture, highlighting key components already covered such as self-attention, multi-head attention, positional encoding, and layer normalization.
- The focus will now shift to a detailed exploration of the Transformer architecture, particularly the encoder part.
- The teaching approach involves understanding individual components first before delving into the complete architecture.

03:07 *🛠️ Prerequisites and Preparation*
- Emphasis on prerequisites for understanding the Transformer architecture, including prior knowledge of self-attention, multi-head attention, positional encoding, and layer normalization.
- The presenter has created a series of videos covering these foundational topics and recommends reviewing them to grasp the upcoming content on encoder and decoder architectures.
- The current video will focus specifically on the encoder architecture, while the decoder will be covered in subsequent videos.

05:06 *📊 Detailed Explanation of Encoder Architecture*
- The video starts the detailed exploration of the Transformer encoder architecture, using a complex diagram to represent the entire Transformer model, including both encoder and decoder.
- The presenter acknowledges the complexity of the diagram and aims to break down the encoder architecture in an accessible way for better understanding.

05:49 *🗺️ Simplified Transformer Architecture*
- The video simplifies the complex Transformer architecture diagram into two main components: the encoder and the decoder.
- A basic representation shows that the Transformer consists of an encoder box and a decoder box.
- The simplified model helps in understanding that there are multiple encoder and decoder blocks within these components.

07:13 *🏗️ Multi-Block Structure*
- The simplified model is expanded to show multiple encoder and decoder blocks, with six blocks of each in the original Transformer model as per the "Attention Is All You Need" paper.
- Each block within the encoder and decoder is identical, meaning understanding one block applies to all others.
- The focus will be on understanding a single encoder block to grasp the entire architecture.

09:11 *🔍 Detailed Encoder Block Breakdown*
- The detailed view of an encoder block reveals it consists of two main components: a self-attention block and a feed-forward neural network.
- The self-attention block is described as multi-head attention, and the feed-forward neural network is a key part of the encoder block's functionality.
- Additional components such as layer normalization and residual connections are also part of the encoder block's architecture.

10:18 *📈 Actual Encoder Block Architecture*
- The actual architecture of an encoder block is shown, including the self-attention (multi-head attention) and feed-forward neural network blocks.
- The diagram includes additional elements like layer normalization and residual connections, highlighting the complexity beyond the simplified model.
- The video emphasizes understanding the detailed connections and components within an encoder block.

11:48 *🔄 Sequential Processing of Encoder Blocks*
- Outputs from one encoder block serve as inputs for the next encoder block, continuing through all blocks until the final output is sent to the decoder.
- The process involves multiple encoder blocks (six in the original Transformer model) that are sequentially connected.
- The main goal is to understand the functioning of these blocks by examining the processing within each one.

12:29 *🧩 Introduction to Detailed Example*
- A new page is introduced to explain the encoder architecture with a detailed example sentence.
- The goal is to track how an example sentence (e.g., "How are you") moves through the encoder and understand the encoding process.
- The explanation will involve breaking down each step and how the input sentence is processed within the encoder.

13:40 *✍️ Initial Operations on Input*
- Before the main encoding, the input sentence undergoes three key operations: tokenization, text vectorization, and positional encoding.
- Tokenization breaks the sentence into words, and text vectorization converts these words into numerical vectors using embeddings.
- Positional encoding adds information about word positions to maintain the sequence order.

14:51 *🔢 Tokenization and Vectorization*
- Tokenization splits the sentence into individual words, creating tokens like "How," "are," and "you."
- Text vectorization converts these tokens into 512-dimensional vectors using embeddings, which represent each word numerically.
- Positional encoding is applied to integrate information about word positions into the vectors.

17:25 *📍 Positional Encoding*
- Positional encoding provides positional information by generating a vector for each position in the sentence.
- These positional vectors are added to the word vectors to ensure the model can understand the order of words.

18:30 *🧩 Positional Encoding and Input Vector Integration*
- Positional encoding adds information about word positions to the input vectors to maintain the sequence order.
- This process integrates positional vectors with word vectors to ensure that the model understands the word sequence.

19:04 *🔄 Introduction to Encoder Block Operations*
- Detailed examination of the operations within the first encoder block, focusing on multi-head attention and normalization.
- Introduction of a new diagram to explain the functioning of these operations.

20:08 *🧠 Multi-Head Attention Mechanism*
- Multi-head attention applies multiple self-attention mechanisms to capture diverse contextual information.
- This process generates contextually aware vectors for each input word by considering the surrounding words.

22:01 *➕ Addition and Normalization*
- After multi-head attention, addition and normalization are applied to maintain dimensional consistency and improve stability.
- A residual connection is used, where the original input vectors are added to the output of the multi-head attention block.

25:28 *📏 Layer Normalization Explained*
- Layer normalization standardizes each vector by calculating the mean and standard deviation of its components, adjusting them to a fixed range.
- This helps stabilize the training process by ensuring that values remain within a defined range, preventing large fluctuations.

27:00 *🔄 Purpose of Residual Connections*
- Residual connections (or skip connections) are used to add the original input vectors back to the output of the multi-head attention block.
- This mechanism helps in maintaining the flow of gradients and preserving the original information during training.

28:35 *🧠 Feed-Forward Network in Encoder*
- Introduction to the feed-forward neural network within the encoder block, including its architecture and function.
- The network consists of two layers: the first with 2048 neurons using ReLU activation and the second with 512 neurons using linear activation.

32:22 *📊 Feed-Forward Network Processing*
- The feed-forward network processes vectors by increasing their dimensionality, applying transformations, and then reducing the dimensionality back.
- The first layer increases the vector size from 512 to 2048, and the second layer reduces it back to 512.

35:04 *🔄 Skip Connections and Normalization*
- A skip connection bypasses the feed-forward network, adding the original vectors to its output.
- After the addition, layer normalization is applied again to the resulting vectors.

38:01 *🔁 Encoder Block Repetition*
- The output vectors from one encoder block become the input for the next encoder block.
- Each encoder block contains its own set of parameters for weights and biases, even though the architecture is identical across blocks.

39:18 *🔄 Summary of Encoder Processing*
- A quick summary of the transformer encoder process from input to output.
- Input sentences undergo tokenization, embedding, and positional encoding.

41:34 *❓ Questions and Residual Connections*
- Discussion of the importance of residual connections in the encoder blocks.
- Residual connections help stabilize training by allowing gradients to flow more effectively through deep networks.

45:55 *🔍 Alternative Path in Multi-Head Attention*
- Discussion of providing an alternate path in case multi-head attention fails to perform effectively.
- Residual connections allow the use of the original features if the transformations are detrimental.

48:00 *🧩 Feed-Forward Neural Networks in Transformers*
- Exploration of why feed-forward neural networks are used in transformers alongside multi-head attention.

52:03 *🔢 Number of Encoder Blocks in Transformers*
- Multiple encoder blocks are used in transformers to effectively understand and represent language.
- A single encoder block does not provide satisfactory results for language comprehension.

Made with HARPA AI
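(Since the summary above walks through multi-head attention, add & norm, and the feed-forward network, here is a minimal PyTorch-style sketch of one encoder block using the dimensions mentioned in the video, d_model = 512 and a 2048-unit hidden layer; the class name and structure are illustrative, not the paper's reference implementation.)

```python
import torch
import torch.nn as nn

class SimpleEncoderBlock(nn.Module):
    """Illustrative encoder block: multi-head attention -> add & norm -> FFN -> add & norm."""
    def __init__(self, d_model=512, num_heads=8, d_ff=2048):
        super().__init__()
        self.mha = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),   # 512 -> 2048, ReLU activation
            nn.ReLU(),
            nn.Linear(d_ff, d_model),   # 2048 -> 512, linear activation
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        attn_out, _ = self.mha(x, x, x)   # self-attention: queries, keys, values all come from x
        x = self.norm1(x + attn_out)      # residual (skip) connection, then layer norm
        ffn_out = self.ffn(x)             # position-wise feed-forward network
        x = self.norm2(x + ffn_out)       # second residual connection, then layer norm
        return x

# Example: one sentence with 3 tokens ("How", "are", "you"), each a 512-dim vector.
tokens = torch.randn(1, 3, 512)
block = SimpleEncoderBlock()
out = block(tokens)
print(out.shape)  # torch.Size([1, 3, 512]) -- same shape in, same shape out,
                  # so the output can feed the next of the six stacked encoder blocks
```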
@not_amanullah4 ай бұрын
This is helpful 🖤🤗
@dataninjaa4 ай бұрын
I was desperately waiting for your videos; I didn't even wait this eagerly for Mirzapur season 3.
@koushik76044 ай бұрын
Wonderful! This is too good.
@tannaprasanthkumar91194 ай бұрын
It was amazing sir
@kushagrabisht95964 ай бұрын
Great content sir. Please launch deep learning course fast
@electricalengineer55404 ай бұрын
much awaited video
@PawanAgrawal30124 ай бұрын
Good one. Please make a dedicated playlist on PyTorch for building neural networks.
@SachinBareth-d2k3 ай бұрын
very helpful
@vinayakbhat95302 ай бұрын
excellent
@durgeshameta2544 ай бұрын
You're a genius.
@kunaldere-g8l3 ай бұрын
Sir, I remember one thing you said when starting the transformer topic: that the transformer architecture feels like it was dropped in from the future.
@gender1214 ай бұрын
Okay, we are expecting the decoder soon as well... hopefully not too long a wait.
@himanshurathod40864 ай бұрын
The man with 0 haters
@srinivaspadhy98214 ай бұрын
Sir, please bring the decoder architecture video soon; I have an interview coming up. Thank you. May you grow even faster.
@AllInOne-gn4ve4 ай бұрын
Thanks a lot ❤❤❤❤! please sir, continue this playlist🙏🙏🙏🙏
@the_princekrrazz4 ай бұрын
Sir, please upload videos in this series regularly. A humble request, please.
@KumR4 ай бұрын
Thanks a lot Nitish. Great Video. Can I make a suggestion? Can you do one live QnA session to clarify any doubts?
@karanhanda77713 ай бұрын
Bro, I like your videos and your way of teaching. But when you refer to an older topic, please add a link to it, sir 🙏. It would make things a bit easier for us.
@meetpatel87334 ай бұрын
That was a great video... but I have a question on the multi-head attention part. In the previous video on multi-head attention, two self-attention blocks were used for the "money bank" example, and two vectors were generated per word (Ymoney1, Ybank1 and Ymoney2, Ybank2), so four vectors in total for the two words. But here, in the main architecture, you said a 512-dimensional vector is the input to the multi-head attention block and it gives out a vector of the same 512 dimensions. I don't know if my question is silly or not; if you can explain that, please do. But all the videos were great. Thank you.
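(One common way to reconcile this, sketched below with illustrative PyTorch code: each of the h heads works in a d_model/h-dimensional subspace, the per-head outputs are concatenated back to d_model = 512, and a final linear projection keeps the output the same size as the input. The head count of 8 and the names below are assumptions based on the original paper, not on the video.)

```python
import torch
import torch.nn as nn

d_model, num_heads = 512, 8
head_dim = d_model // num_heads                     # 64 dimensions per head

x = torch.randn(1, 2, d_model)                      # e.g. the two tokens "money" and "bank"

# Pretend each head has already done its attention computation and simply projects the
# 512-dim input down to its own 64-dim contextual output per token (Ymoney_i, Ybank_i).
head_outputs = [nn.Linear(d_model, head_dim)(x) for _ in range(num_heads)]

# Concatenating the 8 heads restores 512 dimensions per token ...
concat = torch.cat(head_outputs, dim=-1)            # shape (1, 2, 512)

# ... and a final output projection W_O mixes the heads, keeping the 512-dim shape.
W_O = nn.Linear(d_model, d_model)
output = W_O(concat)
print(output.shape)                                 # torch.Size([1, 2, 512])
```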
@RamandeepSingh_044 ай бұрын
Thank you so much sir ❤🎉
@KumR3 ай бұрын
Hi Nitish, I understand that GPT uses this transformer architecture. Do all LLMs use it?
@Thebeautyoftheworld11114 ай бұрын
Please make a playlist on Gen ai 🙏
@ESHAANMISHRA-pr7dh4 ай бұрын
Thank you sir for the video. I request you to please complete the playlist 🙏🙏🙏🙏
@AmitBiswas-hd3js4 ай бұрын
Please Sir, complete this transformer series ASAP.
@not_amanullah4 ай бұрын
Upload regularly 🙏
@lokeshsharma41773 ай бұрын
Just like "Inputs" are there any prior operations for "output" before it goes to Decoders?
@not_amanullah4 ай бұрын
Thanks ❤️
@AllInOnekh4 ай бұрын
Finally .. thank you sir
@aiforeveryone3 ай бұрын
Great
@farhankarim86242 ай бұрын
GOAT 👑
@rishabhkumar43603 ай бұрын
waiting for part 2
@Harshh8113 ай бұрын
Sir, when are you dropping the next lecture(Decoder)?
@darkwraith88673 ай бұрын
Sir, please make a video on state space models and the Mamba architecture.
@03_afshanahmedkhan394 ай бұрын
Ordinary people are waiting for Mirzapur season 3; legends are waiting for the next video in this playlist. Please, sir, upload the next video.
@shivshakti_1111_4 ай бұрын
Sir, in the next video please cover the vision transformer.
@wewesusui95284 ай бұрын
Sir please make videos regularly 🙏
@Arpi4574 ай бұрын
Finally transformer is here.
@SAPTAPARNAKUNDU-g9d4 ай бұрын
Sir, when will the video on the BERT architecture come out?
@LMessii103 ай бұрын
Gold mine for AI/ML
@sparshverma16994 ай бұрын
Thankyou for this sir
@sharangkulkarni17594 ай бұрын
Finally
@akashrathore13883 ай бұрын
Sir, please create a supplementary video on modules and packages, because that part is a little confusing; it's the only topic left from the Python playlist.