Your Deep Learning playlist is pure gold! The intuition and simplicity you bring to complex concepts are amazing. As a dedicated student who's watched it all, I can say it's top-notch quality. Thank you for this video series.
@FindingTheSuccess-w2b4 ай бұрын
Sir... please make videos regularly.....🙏🙏
@bhaveshshrivastava30242 ай бұрын
Brother, the videos can't be uploaded regularly, and that's exactly why the tutorial quality is so top notch... Sir, please take your time and upload comfortably, even once every two weeks.
@sowmyaraoch3 ай бұрын
This entire playlist is so intuitive and you've made all complex concepts so simple. Please continue this playlist.
@aneessohail10084 ай бұрын
Sir, kindly upload the videos in this series regularly; no one else has a course like yours ❤❤
@jooeeemusic79634 ай бұрын
Sir I'm waiting for these videos every single day. Please Upload on regular basis sir.
@drrabiairfan9933 ай бұрын
Even the authors of the original paper couldn't explain it this well... an absolutely amazing and illustrative explanation of the transformer. Without a doubt, the best explanation available anywhere.
@amanagrawal41984 ай бұрын
Great!! Watched the full DL playlist; a great resource for understanding the whole of deep learning.
@ersushantkashyap2 ай бұрын
Nitish Sir, just as you always say, "Khol ker rakhunga aapke samne" (I'll lay it all open in front of you), you did exactly that in this video. Thank you so very much.
@Shisuiii692 ай бұрын
Seriously bro 💯
@harshsingh78426 күн бұрын
What a great explanation, man. Loved it.
@sohaibahmed44393 ай бұрын
Superb curriculum management and teaching style! Thanks!
@ParthivShah3 ай бұрын
Thank You Very Much sir for continuing this playlist.
@RdXDeveloper3 ай бұрын
Sir maintains quality, not quantity. That's why he takes time over every video. Thank you so much, sir.❤️🩹
@anuradhabalasubramanian9845Ай бұрын
How brilliant you are, Sir! A super guru for us! Great explanation, Sir!
@nikhilgupta68034 ай бұрын
as usual....awesome and simple
@zeeshanahmed864024 күн бұрын
Hi Nitish sir, your deep learning playlist is absolutely mind-blowing. Please also upload videos on fine-tuning encoder-only transformers, decoder-only transformers, and encoder-decoder transformers, and please upload videos on LangChain as well.
@paragbharadia28952 ай бұрын
Huge respect, and a lot more lessons to learn from all the videos you have posted! Thank you, sir!
@mukul36342 ай бұрын
I am amazed; now I feel there is nothing easier than transformers. I am a mechanical engineer, and even I understood it quite well, sir. It even feels easier than linear or logistic regression. Now I can teach this concept to any five-year-old child.
@sukumarane2302Ай бұрын
So great and appreciable! You made the complex task of explaining the transformer architecture simple... Thank you, sir.
@videoediting03 ай бұрын
Marvelous explanation in a very simplified way, great man.
@tirthadebnath24974 ай бұрын
Your tutorials are really gold for higher studies.
@nikhilraj38403 ай бұрын
One of the best transformer explanation playlists; you are amazing.
@saurabhkaushik82824 ай бұрын
Great explanation, sir! I watched the entire Transformer series, and you made it so easy to understand. Many thanks! Looking forward to the decoder parts.
@narasimhasaladi72 ай бұрын
The combination of the add and norm operations in the residual connections of transformer encoders provides these benefits: improved gradient flow, preservation of information, enhanced learning, increased stability, faster convergence, and better generalization.
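(A minimal NumPy sketch of that add & norm step, for readers who want to see it concretely; the random vectors and the simple layer_norm helper below are illustrative, not taken from the video or from any particular library.)

```python
import numpy as np

def layer_norm(v, eps=1e-6):
    """Normalize a single token vector to zero mean and unit variance."""
    return (v - v.mean()) / (v.std() + eps)

d_model = 512
x = np.random.randn(d_model)             # original token vector entering the sub-layer
sublayer_out = np.random.randn(d_model)  # e.g. the multi-head attention output for this token

# "Add & Norm": the residual addition keeps the original information flowing,
# and layer normalization keeps the resulting values in a stable range.
y = layer_norm(x + sublayer_out)
print(round(y.mean(), 4), round(y.std(), 4))  # approximately 0 and 1
```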
@laxminarayangaidhane70633 ай бұрын
Wonderful explanation... I was getting bored watching the previous few videos, but after completing them I understood the current video easily 😊. You have explained it very nicely.
@ai_pie10003 ай бұрын
Brother, the way you teach is exactly the way our mind finds it easiest to absorb hard concepts. ❤
@khatiwadaAnish3 ай бұрын
Thank you so much. You made this topic so simple that even I feel confident enough to teach it to others.
@just__arif2 ай бұрын
Great Explanation!
@SBhupendraAdhikari3 ай бұрын
Thanks Sir for such beautiful explanation
@RdXDeveloper4 ай бұрын
Sir, thanks a lot for this effort ❤. You are just awesome, sir. Your free courses are more valuable than paid ones. One of the best YouTube channels there is. ❤
@SrideviSutraya2 ай бұрын
Very good explanation
@manjeet44184 ай бұрын
Thank You Sir For Detailed Explanation ❤
@nomannosher89284 ай бұрын
always the best explanation.
@SamiUllah-ql9my4 ай бұрын
Sir, I have been waiting for this video for a very long time. I love your teaching style; I can't find anyone who teaches better than you.
@trickydotworld3 ай бұрын
Thank you very much. Waiting for Decoder part
@virajkaralay88444 ай бұрын
Absolute banger video on the transformer encoder, can't wait for the decoder video to drop.
@amitbohra92834 ай бұрын
Sir great video thanks, waiting eagerly for the second part.
@gender1213 ай бұрын
Waiting anxiously for the remaining videos ..please bring them soon.
@princekhunt13 ай бұрын
Nice explanation 👌
@imteyazahmad96164 ай бұрын
Amazing 🤗, please upload videos regularly. Waiting for next video on decoder
@SandeepSingh-yx2si4 ай бұрын
Very Good Explanation.
@ujjawalagrawal4 ай бұрын
Wow great sir thanks for preparing the video
@ayushparwal22102 ай бұрын
interesting video sir thanks to you.
@arpittalmale64404 ай бұрын
They use the residual connection around each sub-layer because, by adding the original contextual vector back after each operation, the model can preserve the meaning of the sentence it was given. If they did not use it, there is a high chance that after passing through the attention layers the model would lose the context of each word with respect to the others, because at the output the model computes a loss value based on its objective, and this loss is then backpropagated to update the model weights, including the word embedding vectors. There is also the concept of "teacher forcing": during training the model is fed the actual ground-truth output (target sequence), which can help stabilize training and accelerate convergence by providing more accurate and consistent signals.
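(As an illustration of the teacher-forcing idea mentioned above, here is a small, hedged PyTorch sketch using a toy one-layer "decoder"; the vocabulary size, token IDs, and module names are made up for the example and this is not the actual transformer decoder.)

```python
import torch
import torch.nn as nn

# Toy decoder step: predicts the next token's logits from the current ground-truth token.
vocab_size, d_model = 100, 32
embed = nn.Embedding(vocab_size, d_model)
to_logits = nn.Linear(d_model, vocab_size)
loss_fn = nn.CrossEntropyLoss()

target = torch.tensor([[5, 17, 42, 8]])    # ground-truth output sequence (batch of 1)

# Teacher forcing: at every step the *true* previous token is fed in,
# regardless of what the model predicted at the step before.
decoder_input = target[:, :-1]             # tokens fed to the decoder (shifted right)
decoder_labels = target[:, 1:]             # tokens the decoder must predict

logits = to_logits(embed(decoder_input))   # shape (1, 3, vocab_size)
loss = loss_fn(logits.reshape(-1, vocab_size), decoder_labels.reshape(-1))
loss.backward()                            # gradients flow back into the weights and embeddings
print(loss.item())
```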
@LMessii103 ай бұрын
Brilliance ❤ 👏
@arjunsingh-dn2fo15 күн бұрын
Sir, as we learned in the boosting algorithm, we use residuals to capture the difference between actual and predicted values. So, sir, I think this residual connection is doing something similar here: it works with the difference between the actual embedding and the contextually aware embedding, and if there is a vanishing-gradient problem, as you said, it can simply pass the original embedding on to the next feed-forward neural network. Sir, what's your opinion on this?
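(A tiny NumPy sketch contrasting the two ideas in this question; the numbers and the stand-in sublayer function are purely illustrative.)

```python
import numpy as np

# Gradient boosting "residual": the *next* weak learner is fit to the error (actual - predicted).
actual = np.array([3.0, 5.0, 7.0])
predicted = np.array([2.5, 5.5, 6.0])
boosting_residual = actual - predicted   # becomes the target for the next model

# Transformer "residual connection": the input is *added back* to the sub-layer output,
# so the block only has to learn a correction on top of the original embedding.
x = np.random.randn(4)                   # toy 4-dim token embedding
sublayer = lambda v: 0.1 * v             # stand-in for an attention / FFN output
y = x + sublayer(x)                      # skip connection: output = input + learned change

print(boosting_residual, y)
```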
@shubhamgattani535721 күн бұрын
Thank you god!
@anonymousman30144 ай бұрын
Sir, one more time I am requesting you to complete the deep learning playlists ASAP. Please Sir🙏.
@Shubham_gupta184 ай бұрын
Please continue on this playlist Nitish sir, and regularly upload the videos just a humble request. The placement season is coming soon and we need you.
@vimalshrivastava65864 ай бұрын
Thank you for this wonderful video.❤
@chinmoymodakturjo52934 ай бұрын
Kindly drop videos regularly and complete the series please !
@chiragsharma14284 ай бұрын
Finally the wait is over. Thanks a lot, Sir.
@Deepak-ip1se4 ай бұрын
Very nice video!!
@harshmohan54113 ай бұрын
Sir, I think the reason for the residual connection is so that the information from the positional encoding doesn't get lost, because, as you said, the original transformer uses 6 encoder blocks, so the residual path keeps reminding the transformer about the positions, I think.
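(For reference, a small NumPy sketch of the sinusoidal positional encoding from the "Attention Is All You Need" paper, which is added to the token embeddings once before the first encoder block; whether the residual paths are mainly there to carry this information forward is the commenter's hypothesis, not something this sketch settles.)

```python
import numpy as np

def positional_encoding(seq_len, d_model=512):
    """Sinusoidal positional encoding: one d_model-dim vector per position."""
    positions = np.arange(seq_len)[:, np.newaxis]   # shape (seq_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]        # shape (1, d_model)
    angles = positions / np.power(10000, (2 * (dims // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])           # sine on even dimensions
    pe[:, 1::2] = np.cos(angles[:, 1::2])           # cosine on odd dimensions
    return pe

embeddings = np.random.randn(3, 512)                # "How", "are", "you" as 512-dim vectors
encoder_input = embeddings + positional_encoding(3) # added once, before encoder block 1
print(encoder_input.shape)                          # (3, 512)
```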
@peace-it4rg3 ай бұрын
Sir, I think the ResNet-style connections may have been used so that sparse embeddings or matrices are not generated, keeping the architecture a bit denser and more stable so the network can keep learning; otherwise it might overfit. What are your thoughts on this?
@myself40243 ай бұрын
🎯 Key points for quick navigation:

00:00 *📚 Introduction to Transformer Architecture*
- The video begins with an introduction to the Transformer architecture, highlighting key components already covered such as self-attention, multi-head attention, positional encoding, and layer normalization.
- The focus will now shift to a detailed exploration of the Transformer architecture, particularly the encoder part.
- The teaching approach involves understanding individual components first before delving into the complete architecture.

03:07 *🛠️ Prerequisites and Preparation*
- Emphasis on prerequisites for understanding the Transformer architecture, including prior knowledge of self-attention, multi-head attention, positional encoding, and layer normalization.
- The presenter has created a series of videos covering these foundational topics and recommends reviewing them to grasp the upcoming content on encoder and decoder architectures.
- The current video will focus specifically on the encoder architecture, while the decoder will be covered in subsequent videos.

05:06 *📊 Detailed Explanation of Encoder Architecture*
- The video starts the detailed exploration of the Transformer encoder architecture, using a complex diagram to represent the entire Transformer model, including both encoder and decoder.
- The presenter acknowledges the complexity of the diagram and aims to break down the encoder architecture in an accessible way for better understanding.

05:49 *🗺️ Simplified Transformer Architecture*
- The video simplifies the complex Transformer architecture diagram into two main components: the encoder and the decoder.
- A basic representation shows that the Transformer consists of an encoder box and a decoder box.
- The simplified model helps in understanding that there are multiple encoder and decoder blocks within these components.

07:13 *🏗️ Multi-Block Structure*
- The simplified model is expanded to show multiple encoder and decoder blocks, with six blocks of each in the original Transformer model as per the "Attention Is All You Need" paper.
- Each block within the encoder and decoder is identical, meaning understanding one block applies to all others.
- The focus will be on understanding a single encoder block to grasp the entire architecture.

09:11 *🔍 Detailed Encoder Block Breakdown*
- The detailed view of an encoder block reveals it consists of two main components: a self-attention block and a feed-forward neural network.
- The self-attention block is described as multi-head attention, and the feed-forward neural network is a key part of the encoder block's functionality.
- Additional components such as layer normalization and residual connections are also part of the encoder block's architecture.

10:18 *📈 Actual Encoder Block Architecture*
- The actual architecture of an encoder block is shown, including the self-attention (multi-head attention) and feed-forward neural network blocks.
- The diagram includes additional elements like layer normalization and residual connections, highlighting the complexity beyond the simplified model.
- The video emphasizes understanding the detailed connections and components within an encoder block.

11:48 *🔄 Sequential Processing of Encoder Blocks*
- Outputs from one encoder block serve as inputs for the next encoder block, continuing through all blocks until the final output is sent to the decoder.
- The process involves multiple encoder blocks (six in the original Transformer model) that are sequentially connected.
- The main goal is to understand the functioning of these blocks by examining the processing within each one.

12:29 *🧩 Introduction to Detailed Example*
- A new page is introduced to explain the encoder architecture with a detailed example sentence.
- The goal is to track how an example sentence (e.g., "How are you") moves through the encoder and understand the encoding process.
- The explanation will involve breaking down each step and how the input sentence is processed within the encoder.

13:40 *✍️ Initial Operations on Input*
- Before the main encoding, the input sentence undergoes three key operations: tokenization, text vectorization, and positional encoding.
- Tokenization breaks the sentence into words, and text vectorization converts these words into numerical vectors using embeddings.
- Positional encoding adds information about word positions to maintain the sequence order.

14:51 *🔢 Tokenization and Vectorization*
- Tokenization splits the sentence into individual words, creating tokens like "How," "are," and "you."
- Text vectorization converts these tokens into 512-dimensional vectors using embeddings, which represent each word numerically.
- Positional encoding is applied to integrate information about word positions into the vectors.

17:25 *📍 Positional Encoding*
- Positional encoding provides positional information by generating a vector for each position in the sentence.
- These positional vectors are added to the word vectors to ensure the model can understand the order of words.

18:30 *🧩 Positional Encoding and Input Vector Integration*
- Positional encoding adds information about word positions to the input vectors to maintain the sequence order.
- This process integrates positional vectors with word vectors to ensure that the model understands the word sequence.

19:04 *🔄 Introduction to Encoder Block Operations*
- Detailed examination of the operations within the first encoder block, focusing on multi-head attention and normalization.
- Introduction of a new diagram to explain the functioning of these operations.

20:08 *🧠 Multi-Head Attention Mechanism*
- Multi-head attention applies multiple self-attention mechanisms to capture diverse contextual information.
- This process generates contextually aware vectors for each input word by considering the surrounding words.

22:01 *➕ Addition and Normalization*
- After multi-head attention, addition and normalization are applied to maintain dimensional consistency and improve stability.
- A residual connection is used, where the original input vectors are added to the output of the multi-head attention block.

25:28 *📏 Layer Normalization Explained*
- Layer normalization standardizes each vector by calculating the mean and standard deviation of its components, adjusting them to a fixed range.
- This helps stabilize the training process by ensuring that values remain within a defined range, preventing large fluctuations.

27:00 *🔄 Purpose of Residual Connections*
- Residual connections (or skip connections) are used to add the original input vectors back to the output of the multi-head attention block.
- This mechanism helps in maintaining the flow of gradients and preserving the original information during training.

28:35 *🧠 Feed-Forward Network in Encoder*
- Introduction to the feed-forward neural network within the encoder block, including its architecture and function.
- The network consists of two layers: the first with 2048 neurons using ReLU activation and the second with 512 neurons using linear activation.

32:22 *📊 Feed-Forward Network Processing*
- The feed-forward network processes vectors by increasing their dimensionality, applying transformations, and then reducing the dimensionality back.
- The first layer increases the vector size from 512 to 2048, and the second layer reduces it back to 512.

35:04 *🔄 Skip Connections and Normalization*
- A skip connection bypasses the feed-forward network, adding the original vectors to its output.
- After the addition, layer normalization is applied again to the resulting vectors.

38:01 *🔁 Encoder Block Repetition*
- The output vectors from one encoder block become the input for the next encoder block.
- Each encoder block contains its own set of parameters for weights and biases, even though the architecture is identical across blocks.

39:18 *🔄 Summary of Encoder Processing*
- A quick summary of the transformer encoder process from input to output.
- Input sentences undergo tokenization, embedding, and positional encoding.

41:34 *❓ Questions and Residual Connections*
- Discussion of the importance of residual connections in the encoder blocks.
- Residual connections help stabilize training by allowing gradients to flow more effectively through deep networks.

45:55 *🔍 Alternative Path in Multi-Head Attention*
- Discussion of providing an alternate path in case multi-head attention fails to perform effectively.
- Residual connections allow the use of the original features if the transformations are detrimental.

48:00 *🧩 Feed-Forward Neural Networks in Transformers*
- Exploration of why feed-forward neural networks are used in transformers alongside multi-head attention.

52:03 *🔢 Number of Encoder Blocks in Transformers*
- Multiple encoder blocks are used in transformers to effectively understand and represent language.
- A single encoder block does not provide satisfactory results for language comprehension.

Made with HARPA AI
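(Since the summary above walks through multi-head attention, add & norm, and the feed-forward network, here is a minimal PyTorch-style sketch of one encoder block using the dimensions mentioned in the video, d_model = 512 and a 2048-unit hidden layer; the class name and structure are illustrative, not the paper's reference implementation.)

```python
import torch
import torch.nn as nn

class SimpleEncoderBlock(nn.Module):
    """Illustrative encoder block: multi-head attention -> add & norm -> FFN -> add & norm."""
    def __init__(self, d_model=512, num_heads=8, d_ff=2048):
        super().__init__()
        self.mha = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),   # 512 -> 2048, ReLU activation
            nn.ReLU(),
            nn.Linear(d_ff, d_model),   # 2048 -> 512, linear activation
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        attn_out, _ = self.mha(x, x, x)   # self-attention: queries, keys, values all come from x
        x = self.norm1(x + attn_out)      # residual (skip) connection, then layer norm
        ffn_out = self.ffn(x)             # position-wise feed-forward network
        x = self.norm2(x + ffn_out)       # second residual connection, then layer norm
        return x

# Example: one sentence with 3 tokens ("How", "are", "you"), each a 512-dim vector.
tokens = torch.randn(1, 3, 512)
block = SimpleEncoderBlock()
out = block(tokens)
print(out.shape)  # torch.Size([1, 3, 512]) -- same shape in, same shape out,
                  # so the output can feed the next of the six stacked encoder blocks
```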
@not_amanullah4 ай бұрын
This is helpful 🖤🤗
@dataninjaa4 ай бұрын
I was desperately waiting for your videos; I didn't even wait this eagerly for Mirzapur season 3.
@koushik76044 ай бұрын
Wonderful! This is too good.
@tannaprasanthkumar91194 ай бұрын
It was amazing sir
@kushagrabisht95964 ай бұрын
Great content sir. Please launch deep learning course fast
@electricalengineer55404 ай бұрын
much awaited video
@PawanAgrawal30124 ай бұрын
Good one. Please make a dedicated playlist on PyTorch for building neural networks.
@SachinBareth-d2k3 ай бұрын
very helpful
@vinayakbhat95302 ай бұрын
excellent
@durgeshameta2544 ай бұрын
You're a genius.
@kunaldere-g8l3 ай бұрын
Sir, I remember one thing you said when starting the transformer topic: that the transformer architecture feels like it was dropped in from the future.
@gender1214 ай бұрын
Okay, we are expecting the decoder soon as well... hopefully not too long a wait.
@himanshurathod40864 ай бұрын
The man with 0 haters
@srinivaspadhy98214 ай бұрын
Sir, please bring the decoder architecture video soon; I have an interview coming up. Thank you. May you grow even faster.
@AllInOne-gn4ve4 ай бұрын
Thanks a lot ❤❤❤❤! please sir, continue this playlist🙏🙏🙏🙏
@the_princekrrazz4 ай бұрын
Sir, please upload videos in this series regularly. A humble request, please.
@KumR4 ай бұрын
Thanks a lot Nitish. Great Video. Can I make a suggestion? Can you do one live QnA session to clarify any doubts?
@karanhanda77713 ай бұрын
Bro, I like your videos and your way of teaching. But when you refer to an older topic, please add a link to it, sir 🙏. It would make things a bit easier for us.
@meetpatel87334 ай бұрын
That was a great video... but I have a question on the multi-head attention part. In the previous video on multi-head attention, two self-attention blocks were used for the "money bank" example, and two vectors were generated per word (Ymoney1, Ybank1 and Ymoney2, Ybank2), so four vectors in total for the two words. But here, in the main architecture, you said a 512-dimensional vector is the input to the multi-head attention block and it gives out a vector of the same 512 dimensions. I don't know if my question is silly or not; if you can explain that, please do. But all the videos were great. Thank you.
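(One common way to reconcile this, sketched below with illustrative PyTorch code: each of the h heads works in a d_model/h-dimensional subspace, the per-head outputs are concatenated back to d_model = 512, and a final linear projection keeps the output the same size as the input. The head count of 8 and the names below are assumptions based on the original paper, not on the video.)

```python
import torch
import torch.nn as nn

d_model, num_heads = 512, 8
head_dim = d_model // num_heads                     # 64 dimensions per head

x = torch.randn(1, 2, d_model)                      # e.g. the two tokens "money" and "bank"

# Pretend each head has already done its attention computation and simply projects the
# 512-dim input down to its own 64-dim contextual output per token (Ymoney_i, Ybank_i).
head_outputs = [nn.Linear(d_model, head_dim)(x) for _ in range(num_heads)]

# Concatenating the 8 heads restores 512 dimensions per token ...
concat = torch.cat(head_outputs, dim=-1)            # shape (1, 2, 512)

# ... and a final output projection W_O mixes the heads, keeping the 512-dim shape.
W_O = nn.Linear(d_model, d_model)
output = W_O(concat)
print(output.shape)                                 # torch.Size([1, 2, 512])
```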
@RamandeepSingh_044 ай бұрын
Thank you so much sir ❤🎉
@KumR3 ай бұрын
Hi Nitish, I understand that GPT uses this transformer architecture. Do all LLMs use it?
@Thebeautyoftheworld11114 ай бұрын
Please make a playlist on Gen ai 🙏
@ESHAANMISHRA-pr7dh4 ай бұрын
Thank you sir for the video. I request you to please complete the playlist 🙏🙏🙏🙏
@AmitBiswas-hd3js4 ай бұрын
Please Sir, complete this transformer series ASAP.
@not_amanullah4 ай бұрын
Upload regularly 🙏
@lokeshsharma41773 ай бұрын
Just like "Inputs" are there any prior operations for "output" before it goes to Decoders?
@not_amanullah4 ай бұрын
Thanks ❤️
@AllInOnekh4 ай бұрын
Finally .. thank you sir
@aiforeveryone3 ай бұрын
Great
@farhankarim86242 ай бұрын
GOAT 👑
@rishabhkumar43603 ай бұрын
waiting for part 2
@Harshh8113 ай бұрын
Sir, when are you dropping the next lecture(Decoder)?
@darkwraith88673 ай бұрын
Sir, please make a video on state space models and the Mamba architecture.
@03_afshanahmedkhan394 ай бұрын
Ordinary people are waiting for Mirzapur season 3; legends are waiting for the next video in this playlist. Please, sir, upload the next video.
@shivshakti_1111_4 ай бұрын
Sir, in the next video please cover the vision transformer.
@wewesusui95284 ай бұрын
Sir please make videos regularly 🙏
@Arpi4574 ай бұрын
Finally transformer is here.
@SAPTAPARNAKUNDU-g9d4 ай бұрын
Sir, when will the video on the BERT architecture come out?
@LMessii103 ай бұрын
Gold mine for AI/ML
@sparshverma16994 ай бұрын
Thankyou for this sir
@sharangkulkarni17594 ай бұрын
Finally
@akashrathore13883 ай бұрын
Sir, please create a supplementary video on modules and packages, because that part is a little confusing; it's the only topic left from the Python playlist.