27:00 I think a better interpretation would be: "When I am at this position, I am more important, or less important." Also, attention-based models are inherently interpretable compared to convolution-based models, so I think these will win out in the long run. Perhaps we can have a hybrid of CNN and attention.
@socratic-programmer 4 years ago
To an extent convolutional models can also be analysed to see what parts were the most excited (and contributed to the final prediction). The other main advantage - and the reason I think we will at least have some hybrid of conv + attention - is that convolutions are much more parameter-efficient than FC or self-attention layers.
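A back-of-the-envelope sketch of that parameter-efficiency point (my own illustrative numbers, not from the video or paper): a conv layer's parameter count is independent of spatial size, a fully-connected layer over the same feature map is not, and self-attention's parameters are also size-independent, but its score matrix grows quadratically with the number of pixels.

```python
# Rough counts for a 32x32x64 feature map (illustrative assumptions only).
C, H, W, k = 64, 32, 32, 3

conv_params = k * k * C * C    # 3x3 conv: 36,864, independent of H and W
fc_params = (H * W * C) ** 2   # dense layer over the whole map: ~4.3e9
attn_params = 4 * C * C        # Q, K, V + output projections: 16,384
attn_scores = (H * W) ** 2     # attention-matrix entries per head: ~1.05e6

print(conv_params, fc_params, attn_params, attn_scores)
```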
@redjammie8342 4 years ago
@@socratic-programmer Also, local connectivity for low-level visual features makes perfect sense.
@Kram1032 4 years ago
It's gonna take a lot of doing to make that feasible, but I'm really curious what could happen with this attentional type of processing for multimodal data. Like, imagine you could scrape the web like they did for GPT-3, but include not just text but also images. Entire illustrated books. Embedded videos with spoken language. Language is fundamentally dependent on the real world. It's crazy how far we can get with *just* text, but I'd imagine a lot of things could be easily disambiguated if words aren't just typed but also heard, or seen in the context of other stuff. So making attention more efficient for images is a solid step towards something like this, and I'm really looking forward to what'll come of it.
@felipemello1151 4 years ago
Google actually has a trained NN that accepts all sorts of inputs (images, text, etc.). The idea was to have a single model for everything. I can't remember the name of it, though.
@Kram1032 4 years ago
@@felipemello1151 Google Brain, I think, but I'd imagine there has been quite some progress since then.
@jasdeepsinghgrover2470 4 years ago
I think we are very close to something like this, but positional embeddings should then become something more general, like context embeddings. An image caption, for example, should be associated with both the image and the text referring to the image. Maybe after that, this will be possible.
@alceubissoto 4 years ago
Thanks for the video, Yannic. Amazing explanation!
@binjianxin7830 4 years ago
When convolutions go deep, they seem not only to be more efficient but also to condense information in various abstract and profound ways. Attention layers certainly need to become more efficient.
@PaganPegasus 2 years ago
7:55 Yannic just predicted the Perceiver architecture. Madman.
@jahcane3711 4 years ago
Beautiful. Thank you, Yannic
@TechVizTheDataScienceGuy 4 years ago
Nicely explained! 👍
@marcussky 4 years ago
Check out TabNet... Attention is coming for tabular data as well...
@whatdl6002 4 years ago
Are we a couple of million dollars of Neural Architecture Search away from the end of convolutions???
@blizzard072 3 years ago
As the subscript implies, there seems to be a positional embedding r_p for every output position o. I'm not sure that would be memory-friendly... Having relative positional embeddings for every pixel seems intense.
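If I read the paper right, the embeddings are indexed by the relative offset p − o within the axial span and shared across all output positions, so only 2L − 1 vectors are learned per axis rather than one per pixel. A minimal PyTorch sketch of one 1D axial step (my own simplification under that reading, not the authors' code: it keeps only the query-side positional term, while the paper also adds key- and value-side embeddings; all names here are mine):

```python
import torch
import torch.nn.functional as F

L_axis, d = 128, 32                 # axis length, head dimension
q = torch.randn(L_axis, d)          # queries along one row/column
k = torch.randn(L_axis, d)
v = torch.randn(L_axis, d)
r = torch.randn(2 * L_axis - 1, d)  # ONE embedding per offset p - o

# offsets[o, p] = (p - o), shifted to be a valid index into r
offsets = (torch.arange(L_axis)[None, :]
           - torch.arange(L_axis)[:, None] + (L_axis - 1))
r_rel = r[offsets]                  # (L_axis, L_axis, d), shared across positions

logits = q @ k.T + torch.einsum('od,opd->op', q, r_rel)  # content + positional bias
y = F.softmax(logits / d ** 0.5, dim=-1) @ v             # (L_axis, d) outputs
```

Indexing by offset rather than absolute position is what keeps the table at 2L − 1 entries instead of L² per axis.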
@shrutishrestha8296 4 years ago
Is there any code using this for segmentation?
@YannicKilcher 4 years ago
I don't think so
@seyeeet8063 3 years ago
Can someone explain to me what axial means? :) I have a hard time getting it
@sahilriders 3 years ago
Did you check out the MaX-DeepLab paper? It would be nice if you could make a video on that.
@YannicKilcher 3 years ago
thanks!
@trevormartin1944 4 years ago
Does anyone know what Yannic uses to be able to draw and edit over the PDFs?
@YannicKilcher 4 years ago
OneNote
@jackeown 4 years ago
You should do a video on TabNet for tabular data using neural nets. I feel like there's a lot there and the explanations online kind of suck.
@GyuHobbyRC 4 years ago
I enjoyed a great video, let's be friends 😊😊😊😊 Let's be friends!!!~~^^
@az8134 3 years ago
Attention is the new MLP when you are rich
@freddiekalaitzis5708 4 years ago
In times when SOTA is unfortunately king to young reviewers, I can appreciate the authors' need to perform well at least within some class of models. Imagine the frustration when all you offer the community is a competitive alternative, only for a reviewer to retort that it's not the best tool by some arbitrary margin. Great video.
@monstrimmat 4 years ago
"What's a good number?"
@Lee-vs5ez 4 years ago
So many tricks for reducing computational cost lately. Intuitive but also questionable
@autonomous2010 4 years ago
Yep. A lot of approaches scale very poorly, requiring quadratically (or worse) more compute as the input grows. So there's a lot of experimenting to try to get around that major limitation.
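For scale, a quick bit of arithmetic on why the axial trick helps (my numbers, not from the video): full 2D self-attention compares every pixel with every other pixel, while axial attention only attends along each pixel's row and column.

```python
# Pairwise-comparison counts for an H x W feature map (illustrative only).
H = W = 128

full_pairs = (H * W) ** 2      # full self-attention: ~2.7e8 comparisons
axial_pairs = H * W * (H + W)  # row + column attention: ~4.2e6 comparisons

print(full_pairs, axial_pairs, full_pairs // axial_pairs)  # ~64x fewer
```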
@mariomariovitiviti 4 years ago
These names are getting out of hand
@qimingzhong1044 4 years ago
With transformers dominating the leaderboards, lightweight neural networks might be a thing of the past.