Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation (Paper Explained)

  15,182 views

Yannic Kilcher

1 day ago

Comments: 33
@herp_derpingson 4 years ago
27:00 I think a better interpretation would be: "When I am at this position, I am more important, or less important." Also, attention-based models are inherently interpretable compared to convolution-based models, so I think these will win out in the long run. Perhaps we can have a hybrid of CNN and attention.
@socratic-programmer 4 years ago
To an extent convolutional models can also be analysed to see what parts were the most excited (and contributed to the final prediction). The other main advantage - and the reason I think we will at least have some hybrid of conv + attention - is that convolutions are much more parameter-efficient than FC or self-attention layers.
@redjammie8342 4 years ago
@@socratic-programmer Also, local connectivity at low-level visual features makes perfect sense.
@Kram1032 4 years ago
It's gonna take a lot of doing to make that feasible, but I'm really curious what could happen with this attentional type of processing for multimodal data. Like, imagine you could scrape the web like they did for GPT-3, but include not just text but also images. Entire illustrated books. Embedded videos with spoken language. Language is fundamentally dependent on the real world. It's crazy how far we can get with *just* text, but I'd imagine a lot of things could be easily disambiguated if words aren't just typed but also heard, or seen in the context of other stuff. So making attention more efficient for images is a solid step towards something like this, and I'm really looking forward to what'll come of it.
@felipemello1151 4 years ago
Google actually has a trained NN that accepts all sorts of inputs (images, text, etc.). The idea was to have a single model for everything. I can't remember the name of it, though.
@Kram1032 4 years ago
@@felipemello1151 Google Brain I think, but I'd imagine there was quite some progress since
@jasdeepsinghgrover2470 4 years ago
I think we are very close to something like this, but positional embedding should then become something more general, like context embedding. Something like an image caption should be associated with both the image and the text referring to the image. Maybe after that, this will be possible.
@alceubissoto 4 years ago
Thanks for the video Yannic. Amazing explanation!
@binjianxin7830 4 years ago
When convolutions go deep, they seem not only to be more efficient but also to condense information in various abstract and profound ways. Certainly the attention layers need more efficiency.
@PaganPegasus 2 years ago
7:55 Yannic just predicted the Perceiver architecture. Madman.
@jahcane3711 4 years ago
Beautiful. Thank you Yannic
@TechVizTheDataScienceGuy 4 years ago
Nicely explained! 👍
@marcussky 4 years ago
Check out Tabnet... Attention is coming for tabular data as well...
@whatdl6002 4 years ago
Are we a couple of million dollars of Neural Architecture Search away from the end of convolutions???
@blizzard072 3 years ago
As the subscript implies, there seems to be a positional embedding r_p for every output position o. Then I'm not sure if that would be memory-friendly... Having relative positional embeddings for every pixel seems intense.
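On the memory point above: the axial factorization is exactly what keeps this manageable, since each attention pass (and any relative positional term) only spans a single row or column, so cost grows as O(HW·(H+W)) rather than O((HW)²) for global 2D attention. A minimal NumPy sketch of the row/column factorization, with the paper's learned relative positional encodings omitted for brevity and all names and shapes purely illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend_rows(x, Wq, Wk, Wv):
    """1D self-attention along the width axis: each pixel of the (H, W, C)
    feature map attends only over the W positions in its own row."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv                      # (H, W, C) each
    scores = np.einsum('hwc,hvc->hwv', q, k) / np.sqrt(q.shape[-1])
    return np.einsum('hwv,hvc->hwc', softmax(scores), v)  # (H, W, C)

def attend_cols(x, Wq, Wk, Wv):
    """Same attention, but along the height axis (each pixel attends over
    its column), implemented by transposing the spatial axes."""
    return np.swapaxes(attend_rows(np.swapaxes(x, 0, 1), Wq, Wk, Wv), 0, 1)

def axial_attention(x, Wq, Wk, Wv):
    """Column pass followed by row pass: after both, every output pixel has
    a full-image receptive field at O(H*W*(H+W)) cost instead of O((H*W)^2)."""
    return attend_rows(attend_cols(x, Wq, Wk, Wv), Wq, Wk, Wv)
```

Note the sketch shares one set of projection weights across both passes just to keep it short; in the paper each axial layer has its own parameters plus the relative positional terms on queries, keys, and values.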
@shrutishrestha8296 4 years ago
Is there any code using this for segmentation?
@YannicKilcher 4 years ago
I don't think so
@seyeeet8063 3 years ago
Can someone explain to me what "axial" means? :) I have a hard time getting it.
@sahilriders 3 years ago
Did you check out the MaX-DeepLab paper? It would be nice if you could make a video on that.
@YannicKilcher 3 years ago
thanks!
@trevormartin1944 4 years ago
Does anyone know what Yannic uses to be able to draw and edit over the PDFs?
@YannicKilcher 4 years ago
OneNote
@jackeown 4 years ago
You should do a video on TabNet for tabular data using neural nets. I feel like there's a lot there and the explanations online kind of suck.
@GyuHobbyRC 4 years ago
I enjoyed a great video, let's be friends 😊😊😊😊 Let's be friends!!!~~^^
@az8134 3 years ago
Attention is the new MLP when you are rich
@freddiekalaitzis5708 4 years ago
In times when SOTA is unfortunately king to young reviewers, I can appreciate the authors' need to perform well at least within some class of models. Imagine the frustration when all you offer to the community is a competitive alternative, only for the reviewer to retort it's not the best tool by an arbitrary margin. Great video.
@monstrimmat 4 years ago
"What's a good number?"
@Lee-vs5ez 4 years ago
So many tricks for reducing computational cost lately. Intuitive, but also questionable.
@autonomous2010 4 years ago
Yep. A lot of approaches scale very poorly, requiring exponentially more resources the more data you have. So there's a lot of experimenting to try to get around that major limitation.
@mariomariovitiviti 4 years ago
These names are getting out of hand
@qimingzhong1044 4 years ago
With transformers dominating the leaderboards, lightweight neural networks might be a thing of the past.
@redjammie8342 4 years ago
What do you mean by "lightweight neural network"?