ConvNeXt: A ConvNet for the 2020s - Paper Explained (with animations)

22,599 views

AI Coffee Break with Letitia

1 day ago

Comments: 30
@Chris9-z7v 3 years ago
Hello Letitia, thank you for this great video! Correct me if I am wrong, but I am pretty sure I have seen the 1/4 size ratio you talk about at 12:38 in both the original ViT paper and the "Training data-efficient image transformers & distillation through attention" paper. In the original ViT paper they use this MLP block ratio in almost all of their experiments without mentioning it explicitly, while in the second one they mention the 1/4 ratio of the MLP block on page 5 of the paper. I am a newbie in deep learning and transformers though, so take everything I say with a grain of salt 😅
@AICoffeeBreak 3 years ago
Thanks! Yes, it's Table 1 in the ViT paper. We totally misunderstood what that factor of 4 was referring to while making the video. 🙈
@charlesfoster6326 3 years ago
Tacking on, an expansion ratio of 3 or 4 in the MLP is also pretty standard in transformers for natural language tasks.
@CristianGarcia 3 years ago
They went all in with the storytelling in this paper; they even extracted the core design choices as "wisdom bits". I really don't believe they arrived at the final architecture this way, but reading the "linear improvement story" was very entertaining.
@DerPylz 3 years ago
First coffee bean of the year!! 🎉 Congrats on the 11k subs!
@Kartik_C 3 years ago
Thank you Miss Coffee Bean! The 60 sec explanation of translational equivariance was amazing!
@MrMadmaggot 9 months ago
3:32 I love how SKEWED that graph is, ma'am. It's just nuts.
@ElQaheryProductions 2 years ago
This is a really nice way of reviewing papers! Keep it up!
@hannesstark5024 3 years ago
Congrats on 10k subscribers! ^^
@AICoffeeBreak 3 years ago
Yes! Thank you! 🤝 Means a lot from an early subscriber like yourself.
@marverickbin 1 year ago
Can't wait to try a U-Net with a ConvNeXt backbone.
@pourmohammaddeveloper2034 2 years ago
many many thanks from Iran
@RodrigoCh 2 years ago
Fastest 20 minutes ever! Thank you for the clear explanation. I especially like how you animate the explanation! May I ask what you use to make the animations? Maybe you could add an FAQ section; I can imagine you get this question a lot.
@AICoffeeBreak 2 years ago
Thanks, this comment makes us very happy! I do not want to make an FAQ section: comments and questions are good for making the Algorithm believe it should push us further up into your recommendations. I animate everything but Ms. Coffee Bean in good old PowerPoint (yeah, tools are what you make of them 🙈). Ms. Coffee Bean is animated in the video editing software kdenlive (open source and available for all operating systems).
@giantbee9763 2 years ago
Great point on how we jumped right into transformers and forgot to precisely pin down the effect of small tweaks. Great video again! :D
@edwardbrown2873 3 years ago
Love this. Superb. Keep it up!
@AICoffeeBreak 3 years ago
Thank you! Will do! 😀
@giantbee9763 2 years ago
I think what they might have meant by "inverted bottleneck": key, value, query and the residual connections :D Though would you call that an inverted bottleneck? What do you think @letitia?
@AICoffeeBreak 2 years ago
No, it is a tiny detail that concerns how the MLP layer is built: d -> 4d -> d. Here is Aleksa explaining this (link with the right timestamp: kzbin.info/www/bejne/n5XMep-ifraCnLc )
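To make that d -> 4d -> d shape concrete, here is a minimal NumPy sketch of such an inverted-bottleneck MLP block (all names and sizes are illustrative, and ReLU stands in for the GELU that transformers typically use):

```python
import numpy as np

# Illustrative inverted-bottleneck MLP: expand d -> 4d, then project 4d -> d.
d = 96                                       # embedding width (made up for the example)
rng = np.random.default_rng(0)
W1 = rng.standard_normal((d, 4 * d)) * 0.02  # expansion weights: d -> 4d
W2 = rng.standard_normal((4 * d, d)) * 0.02  # projection weights: 4d -> d

def mlp_block(x):
    # x: (tokens, d) -> (tokens, d); the hidden layer is 4x wider than the input
    h = np.maximum(x @ W1, 0.0)  # (tokens, 4d), ReLU nonlinearity
    return h @ W2                # back down to (tokens, d)

x = np.ones((10, d))
y = mlp_block(x)
print(y.shape)  # (10, 96)
```

The "inverted" part is exactly this widening in the middle: a classic ResNet bottleneck narrows and then re-expands, while the transformer MLP (and the ConvNeXt block) does the opposite.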
@AICoffeeBreak 2 years ago
I missed the point there in the video when talking about inverted bottlenecks. I thought about the Swin Transformer 🙈
@giantbee9763 2 years ago
@@AICoffeeBreak That's right! I forgot about how the position-wise feedforward layer is constructed, which indeed is an inverted bottleneck.
@butterkaffee910 3 years ago
LeCun must be so happy right now
@AICoffeeBreak 3 years ago
Absolutely. 😆
@hyunkim2172 2 years ago
Many thanks!
@hararani 1 year ago
Hello Letitia, thank you so much for your video; it's a great inspiration for my thesis. If you don't mind, can I ask you a question? In your opinion, is it possible for me, as a newbie, to write a research paper comparing ViT, DeiT and ConvNeXt for image classification on 10,000 images? The models are fairly new, and not many papers have implemented them yet. Thank you.
@JapiSandhu 2 years ago
Can ConvNeXt be used for video classification with time series data? Could there be a 3D-ConvNeXt, like there are 3D CNNs?
@AICoffeeBreak 2 years ago
I do not see why this wouldn't be extendable to video. :)
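One way to picture the extension: ConvNeXt's depthwise 7x7 spatial convolution gains a temporal axis, e.g. a depthwise 3x7x7 kernel sliding over (frames, height, width). Below is a toy NumPy sketch of that depthwise 3D convolution with "valid" padding; all sizes and names are illustrative, and this is not an actual 3D-ConvNeXt implementation:

```python
import numpy as np

# Toy depthwise 3D convolution: each channel is filtered by its own (t, h, w) kernel,
# the direct 3D analogue of ConvNeXt's depthwise 7x7 spatial conv.
def depthwise_conv3d(x, k):
    # x: (T, H, W, C) video; k: (t, h, w, C), one kernel per channel; "valid" padding
    T, H, W, C = x.shape
    t, h, w, _ = k.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1, C))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for l in range(out.shape[2]):
                patch = x[i:i+t, j:j+h, l:l+w, :]               # (t, h, w, C) window
                out[i, j, l] = (patch * k).sum(axis=(0, 1, 2))  # sum per channel only
    return out

video = np.ones((8, 16, 16, 4))   # 8 frames, 16x16 pixels, 4 channels (made up)
kernel = np.ones((3, 7, 7, 4))    # depthwise 3x7x7 kernel
out = depthwise_conv3d(video, kernel)
print(out.shape)  # (6, 10, 10, 4)
```

Because the convolution is depthwise, channels never mix here; the pointwise d -> 4d -> d layers of the block would handle channel mixing, exactly as in the 2D version.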
@TimScarfe 3 years ago
Awesome 🔥🔥😎😎
@gabrieldealca4829 2 years ago
What is the best state-of-the-art architecture for regression tasks involving images?