Vision Transformer Basics

31,763 views

Samuel Albanie

1 day ago

Comments: 46
@rldp
@rldp 11 months ago
This is one of the best explanations of not just ViT, but transformers in general that I have watched. Excellent video
@whale27
@whale27 11 months ago
Unbelievable quality. Happy to be here before this channel blows up.
@SamuelAlbanie1
@SamuelAlbanie1 11 months ago
Thanks!
@capsbr2100
@capsbr2100 8 months ago
Goodness, what a remarkable video. This is by far the best explanation video I have watched about vision transformers.
@thetechnocrack
@thetechnocrack 9 months ago
This is one of the cleanest explanations of ViTs I have come across. Amazing work, Samuel! Inspiring.
@aakashsharma3179
@aakashsharma3179 1 month ago
I have never been so HOOKED while watching a video that goes fairly "deep" into such topics. Very well presented. Keep up the good work, man.
@newbie8051
@newbie8051 1 month ago
Took an Image Processing course last semester, and one of the topics the prof suggested learning about was ViTs. I knew the basics of Deep Learning and over the summer explored autoencoders. Starting from August, I delved into the Transformer architecture, and now ViT seems very simple: we just divide the input image into patches, line these patches up in a sequence, convert them to embeddings, add positional vectors, and boom, feed it to the transformer module.
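To make the pipeline in this comment concrete, here is a minimal PyTorch sketch of the patchify-embed-position step (all sizes and names are illustrative assumptions, not code from the video):

```python
import torch
import torch.nn as nn

# Illustrative sizes: one 224x224 RGB image, 16x16 patches, 768-dim embeddings.
image = torch.randn(1, 3, 224, 224)
patch_size, embed_dim = 16, 768

# A strided convolution slices the image into non-overlapping patches and
# linearly projects each patch to an embedding in a single operation.
to_patches = nn.Conv2d(3, embed_dim, kernel_size=patch_size, stride=patch_size)
tokens = to_patches(image).flatten(2).transpose(1, 2)  # (1, 196, 768)

# Learned positional embeddings are added so the encoder can recover patch order.
pos_embed = nn.Parameter(torch.zeros(1, tokens.shape[1], embed_dim))
tokens = tokens + pos_embed  # sequence of embeddings, ready for the encoder
```

The strided convolution is a common implementation trick: it performs "divide into patches, then apply a linear projection" in a single call.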
@jesusalpaca7170
@jesusalpaca7170 8 months ago
For a beginner like me, I would say this is the intro video we were waiting for :')
@srinjoy.bhuiya
@srinjoy.bhuiya 5 months ago
One of the greatest explanations of transformer concepts for a Computer Vision researcher.
@thecheekychinaman6713
@thecheekychinaman6713 9 months ago
I was studying up on Transformers and ViTs half a year ago, and recently checked back to find this (to my surprise). Great clear explanations, can tell CAML is in great hands!
@continuallearning8366
@continuallearning8366 1 year ago
Excellent video! Honored to be here before it goes viral 🙏🏾
@RayWang-m6b
@RayWang-m6b 11 months ago
Thank you for making this wonderful video. So clear! Please continue your awesome video work!
@piclkesthedrummer6439
@piclkesthedrummer6439 6 months ago
This is by far one of the most accurate yet understandable and intuitive explanations of such a hard concept. You did a better job of explaining it than the authors! Very impressive!
@PotatoKaboom
@PotatoKaboom 1 year ago
I've held guest lectures on the inner workings of transformers myself, but I still learned a bunch from this! Everything after 22:15 was very exciting to watch, very well presented and easy to understand! Very well done, I subscribed for more :)
@abhimanyuyadav2685
@abhimanyuyadav2685 11 months ago
Your weekly AI news was really useful. Please bring it back!
@vil9386
@vil9386 9 months ago
Wow, this video helped me a lot in understanding Attention and ViT. Packed with all the logic needed to design a solution using the latest methods as of this day.
@aminkarimi1068
@aminkarimi1068 6 months ago
The best video to easily understand ViT.
@MdAkmolMasud
@MdAkmolMasud 5 months ago
The best explanation of ViT..
@mattsong6875
@mattsong6875 1 year ago
Thanks for such an informative and educational video.
@이연우-i2n
@이연우-i2n 11 months ago
🎯 Key Takeaways for quick navigation:

00:00 🧠 The Evolution of AI and Computer Vision
- General methods leveraging computation prove most effective in AI development.
- Evolution from handcrafted features to Convolutional Neural Networks (CNNs) and then to Transformers, showcasing a reduction in inductive biases and an increase in data-driven approaches.

01:09 🤖 Neural Network Architectures
- Importance of network architecture in building intelligent machines.
- Distinction between network architecture and network parameters, focusing on resource limitations and efficient design.

02:32 💡 Introduction to Transformers
- Transformers' dominance in AI, initially in Natural Language Processing (NLP) and then in Computer Vision.
- Discussion on why Transformers took time to transition from NLP to Computer Vision.

03:57 🌐 Understanding Transformers: Encoder and Decoder
- Explanation of the Transformer architecture with its encoder and decoder components.
- Different variants of Transformers: Encoder-only, Decoder-only, and Encoder-Decoder architectures.

05:33 🔍 Applying Transformers to Computer Vision
- Vision Transformers (ViT) process images by slicing them into patches, using position embeddings and Transformer encoders.
- The methodology of transforming images into a sequence of embeddings for the Transformer encoder.

07:08 🔗 Multi-Head Attention in Transformers
- Detailed explanation of the multi-head attention mechanism in Transformers.
- Role of queries, keys, and values in facilitating communication between different embeddings.

09:12 🧩 Transformer Encoder Blocks and Scaling
- The structure and function of Transformer encoder blocks, including multi-head attention and MLP.
- Importance of residual connections and layer normalization in optimizing Transformer models.

11:05 🚀 Scaling and Hardware Influence in AI
- The impact of scaling and hardware advancements on Transformer model performance.
- Discussion on the exponential increase in computational resources for training large models.

13:50 🛠 MLP and Optimization in Transformers
- Role of the multi-layer perceptron (MLP) in the Transformer architecture for independent processing of embeddings.
- Importance of non-linearities like ReLU and GELU in Transformer models.

15:00 ⚙️ Residual Connections and Layer Normalization
- Implementation and significance of residual connections and layer normalization in Transformers.
- These components facilitate gradient flow and stable learning in deep network training.

17:05 🌐 Positional Embeddings in Transformers
- Explanation of positional embeddings in Transformers, necessary for maintaining spatial information in sequences.
- Different methods of implementing positional embeddings in Transformer models.

19:27 🔄 Cross Attention and Causal Attention in Transformers
- Discussion of ...

Made with HARPA AI
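To accompany the 07:08 takeaway on multi-head attention, here is a minimal single-head scaled dot-product attention sketch in PyTorch (shapes and the function name are illustrative assumptions; full multi-head attention adds learned query/key/value projections and runs several such heads in parallel):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """Each query attends to all keys; the output is a weighted mix of values."""
    d_k = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d_k**0.5  # (batch, seq, seq) similarities
    weights = F.softmax(scores, dim=-1)          # each row sums to 1
    return weights @ v

# Toy example: a batch of 4 patch embeddings of dimension 8.
x = torch.randn(1, 4, 8)
out = scaled_dot_product_attention(x, x, x)  # self-attention: q, k, v share a source
print(out.shape)  # torch.Size([1, 4, 8])
```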
@251_satyamrai4
@251_satyamrai4 2 months ago
Beautifully explained.
@sbdzdz
@sbdzdz 11 months ago
Very well presented!
@rmmajor
@rmmajor 7 months ago
That is a masterpiece of a video! Many thanks for your work!
@ShravanKumar147
@ShravanKumar147 4 months ago
Beautifully put together. Keep it going @Sam
@soylentpink7845
@soylentpink7845 1 year ago
Very good video - both its contents and presentation!
@gnorts_mr_alien
@gnorts_mr_alien 7 months ago
man, what a video. thank you!
@siriuscoding
@siriuscoding 8 days ago
superb
@minute_machine_learning5362
@minute_machine_learning5362 6 months ago
great explanation
@shyb8079
@shyb8079 6 months ago
Thank you for your content.
@zainbaloch5541
@zainbaloch5541 7 months ago
Thank you so much!
@amoghjain
@amoghjain 11 months ago
Thank you so very much for sharing your insights and intuition behind soooo many concepts.
@SamuelAlbanie1
@SamuelAlbanie1 11 months ago
Glad it was helpful!
@plutophy1242
@plutophy1242 1 month ago
excellent contents and slides!!!
@SamuelAlbanie1
@SamuelAlbanie1 1 month ago
Thanks!
@EigenA
@EigenA 8 months ago
Great work!
@flamboyanta4993
@flamboyanta4993 1 year ago
Excellent and clearly communicated. Thanks. A question about 20:05: when discussing positional embeddings, the legend of the waves says dim 4, ..., dim 7. Here, does dim refer to the length of the patch embedding D? As in, we'll get as many sine waves as D dims?
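For readers with the same question: in the standard sinusoidal scheme from "Attention Is All You Need", each embedding dimension is a sine or cosine wave of a different frequency, so a D-dimensional embedding gives D such waves, and "dim 4" ... "dim 7" in the legend index individual embedding dimensions. A minimal NumPy sketch, assuming that standard scheme rather than the video's exact code:

```python
import numpy as np

def sinusoidal_positions(num_positions, dim):
    """One sine/cosine wave per embedding dimension, frequency varying with dim."""
    pos = np.arange(num_positions)[:, None]  # (P, 1) token positions
    i = np.arange(dim // 2)[None, :]         # (1, D/2) frequency indices
    freqs = 1.0 / (10000 ** (2 * i / dim))   # earlier dims oscillate faster
    pe = np.zeros((num_positions, dim))
    pe[:, 0::2] = np.sin(pos * freqs)        # even dims: sine
    pe[:, 1::2] = np.cos(pos * freqs)        # odd dims: cosine
    return pe

pe = sinusoidal_positions(196, 768)  # e.g. 196 patches, 768-dim embeddings
# pe[:, 4] and pe[:, 7] would be the "dim 4" and "dim 7" waves in such a plot.
```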
@tomrichter9021
@tomrichter9021 9 months ago
Great video
@geomanisgod
@geomanisgod 8 months ago
A+++ quality from other planets.
@miraclemaxicl
@miraclemaxicl 8 months ago
More Compute Is All You Need
@flamboyanta4993
@flamboyanta4993 1 year ago
Another question: at 30:00, you discuss how early attention layers tend to focus on local features and deeper ones on more global features of the input. I didn't understand the significance of the x-axis (sorted attention head). Is this just a count of how many attention heads there are in the respective block? That would suggest that in the large-data regime, even early attention blocks with 14+ heads will also tend to observe features globally. Is this correct? Thank you in advance!
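A note that may help: in the ViT paper, the quantity behind that figure is each head's "mean attention distance" (how far, on average, a patch attends in image space), and heads are sorted by this value within each layer, which gives the "sorted attention head" x-axis. A rough sketch of how such a quantity could be computed (illustrative assumptions, not the paper's exact code):

```python
import torch

def mean_attention_distance(attn, patch_grid):
    """attn: (heads, seq, seq) attention weights over a patch_grid x patch_grid image.
    Returns, per head, the average spatial distance (in patch units) between each
    query patch and the patches it attends to, weighted by attention."""
    coords = torch.stack(torch.meshgrid(
        torch.arange(patch_grid), torch.arange(patch_grid), indexing="ij"
    ), dim=-1).reshape(-1, 2).float()    # (seq, 2) patch coordinates
    dist = torch.cdist(coords, coords)   # (seq, seq) pairwise distances
    return (attn * dist).sum(-1).mean(-1)  # (heads,)

# Toy example: 12 heads over a 14x14 patch grid (196 tokens), sorted per layer.
attn = torch.softmax(torch.randn(12, 196, 196), dim=-1)
print(mean_attention_distance(attn, patch_grid=14).sort().values)
```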
@capsbr2100
@capsbr2100 8 months ago
So for someone approaching this now, working on resource-constrained devices for both training and inference, does it make more sense to just stick to CNNs?
@iez
@iez 9 months ago
any ViTs that are open source?
@felipesuarez5041
@felipesuarez5041 3 months ago
Crazy how transformers are beating all these other classical architectures like CNNs, that have been used since ancient Greece times.
@AKD-le2kb
@AKD-le2kb 5 months ago
w