Adding Self-Attention to a Convolutional Neural Network! : PyTorch Deep Learning Tutorial Section 13

1,460 views

Luke Ditria

1 day ago

TIMESTAMPS:
0:00 Introduction
0:22 Attention Mechanism Overview
1:20 Self-Attention Introduction
3:02 CNN Limitations
4:09 Using Attention in CNNs
6:30 Attention Integration in CNN
9:06 Learnable Scale Parameter
10:14 Attention Implementation
12:52 Performance Comparison
14:10 Attention Map Visualization
14:29 Conclusion
In this video I show how we can add Self-Attention to a CNN in order to improve the performance of our classifier!
Donations
www.buymeacoffee.com/lukeditria
The corresponding code is available here!
github.com/LukeDitria/pytorch...
Discord Server:
/ discord
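
For reference, here is a minimal sketch of the kind of self-attention block the video builds: a SAGAN-style PyTorch module with the learnable scale parameter mentioned in the timestamps. The class name, channel-reduction factor, and layer choices are illustrative assumptions; see the linked repository for the actual implementation.

```python
# Hypothetical sketch only; not the repository's exact code.
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    """SAGAN-style self-attention over a 2D feature map (assumed formulation)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        # 1x1 convs project the feature map into query, key and value spaces
        self.query = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        # Learnable scale: starts at 0, so the block begins as an identity mapping
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)  # (B, HW, C//r)
        k = self.key(x).flatten(2)                    # (B, C//r, HW)
        v = self.value(x).flatten(2)                  # (B, C, HW)
        attn = torch.softmax(q @ k, dim=-1)           # (B, HW, HW) attention map
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * out                   # residual, scaled by gamma
```

Because gamma is initialised to zero, the network starts out behaving like the plain CNN and only leans on attention as much as training finds useful, which is the "learnable scale parameter" idea from the timestamps.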

Comments: 9
@esramuab1021
@esramuab1021 16 days ago
Thank you
@profmoek7813
@profmoek7813 1 month ago
Masterpiece. Thank you so much 💗
@yadavadvait
@yadavadvait 1 month ago
Good video! Do you think this experiment of adding the attention head so early on can extrapolate well to graph neural networks?
@LukeDitria
@LukeDitria 1 month ago
Hi, thanks for your comment! Yes, Graph Attention Networks do what you are describing!
@thouys9069
@thouys9069 1 month ago
Very cool stuff. Any idea how this compares to Feature Pyramid Networks, which are typically used to enrich the high-res early convolutional layers? I would imagine that an FPN works well if the thing of interest is "compact", i.e. can be captured well by a square crop, whereas attention would work even for non-compact things. Examples would be donuts with large holes and little dough, or long sticks, etc.
@LukeDitria
@LukeDitria 1 month ago
I believe Feature Pyramid Networks are primarily for object detection, and are a way of bringing fine-grained information from earlier layers deeper into the network with big residual connections; they still rely on multiple conv layers to combine spatial information. What we're trying to do here is mix spatial information early in the network. With attention, the model can also choose exactly how to do that.
@unknown-otter
@unknown-otter 1 month ago
I'm guessing that adding self-attention in deeper layers would have less of an impact due to each value having a greater receptive field? If not, then why not add it at the end, where it would be less expensive? Setting aside the fact that we could incorporate it in every conv block if we had infinite compute.
@LukeDitria
@LukeDitria 1 month ago
Thanks for your comment! Yes, you are correct: in terms of combining features spatially, it won't have as much of an impact if the features already have a large receptive field. The idea is to add it as early as possible, and yes, you could add it multiple times throughout your network, though you would probably stop once your feature map is around 4x4 or so (see the placement sketch after this thread).
@unknown-otter
@unknown-otter 1 month ago
Thanks for the clarification! Great video
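
Following up on the exchange above, a minimal sketch of where such a block might sit: early in a small classifier, while the feature map is still large. This assumes the SelfAttention2d module sketched earlier; the layer sizes and input resolution are illustrative, not the network from the video.

```python
# Hypothetical placement sketch; layer sizes are illustrative.
import torch.nn as nn

class SmallCNNWithAttention(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),          # e.g. 32x32 -> 16x16
            SelfAttention2d(64),      # mix spatial information while the map is still large
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),          # 16x16 -> 8x8
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pool to (B, 256, 1, 1)
        )
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))
```

Further SelfAttention2d blocks could be added after later stages, but as the reply notes, there is little to gain once the feature map is down to around 4x4.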