Another great video. It is really helpful that you provide context for the design choices. Would love to see more videos where you explain how theory transfers to code.
@chankhavu 3 years ago
I like how in your videos, you not only explain the details within the paper but also the more "meta" stuff that is harder for people to grasp without reading through a lot of papers. Reading and understanding one paper is easy. Developing an intuitive understanding of a whole research subfield and its general directions is the hard part.
@TheAIEpiphany 3 years ago
Thanks! Yes, this one was rich in contextual information: DanNet, the diagram correction from Twitter, and the Swin Transformer, mainly, I guess? Well, it's oftentimes hard to understand a specific paper without having all the necessary context - and it takes time to accumulate it.
@yevhendiachenko3703 2 years ago
Thank you! The video is excellent. I like that you mix code and paper in the explanation, and that you provide context and highlight the most essential parts.
@TheAIEpiphany 2 years ago
Thank you!
@gauravlochab9614 2 years ago
Thanks for the amazing explanation. Yes, mixing the code and the paper boosts implementation speed manyfold. I love your work, you are awesome!
@TheAIEpiphany 2 years ago
Thank you man
@lalitmrinki 2 years ago
Thank you for such an in-depth explanation. Your plan of explaining the history and convergence first, and then going through the paper and code, is a great way for learners to understand the concepts deeply. It's very important to select the important portions of the paper for further exposition and to leave out unnecessary boilerplate. I liked that you didn't say "go and read the paper yourself"!
@armingh9283 1 year ago
Thank you. Very informative
@TheAIEpiphany 3 years ago
We need to start working on reasoning - perception is converging, we're out of ideas lol. Bad jokes aside - at this point, it seems that CNN priors are quite adequate (in the case of natural images). A hybrid approach (initial stages CNN-like, later stages transformer-like) seems to be the way to go, but the game is still on.
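(A minimal, hypothetical sketch of the hybrid layout mentioned above - early convolutional stages followed by transformer blocks over the resulting tokens - assuming PyTorch; the HybridNet name and all hyperparameters are illustrative only, not the video's or the ConvNeXt paper's recipe.)

```python
import torch
import torch.nn as nn

class HybridNet(nn.Module):
    def __init__(self, num_classes=1000, dim=256, depth=4, heads=8):
        super().__init__()
        # Early, CNN-like stages: strided convs downsample 224x224 -> 14x14
        self.conv_stages = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=4, stride=4), nn.BatchNorm2d(64), nn.GELU(),
            nn.Conv2d(64, 128, kernel_size=2, stride=2), nn.BatchNorm2d(128), nn.GELU(),
            nn.Conv2d(128, dim, kernel_size=2, stride=2), nn.BatchNorm2d(dim), nn.GELU(),
        )
        # Later, transformer-like stages over the 14*14 = 196 tokens
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim,
            batch_first=True, norm_first=True,
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        x = self.conv_stages(x)           # (B, dim, H/16, W/16)
        x = x.flatten(2).transpose(1, 2)  # (B, N, dim) token sequence
        x = self.transformer(x)
        return self.head(x.mean(dim=1))   # global average pool over tokens

logits = HybridNet()(torch.randn(1, 3, 224, 224))  # -> shape (1, 1000)
```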
@ansariyusuf4774 2 years ago
The mix of paper and code is great!
@francomarchesoni9004 2 years ago
agreed
@manub.n6223 2 years ago
Thank you so much for the brilliant explanation
@TheAIEpiphany 2 years ago
Thanks! 🚀
@adityakane5669 3 years ago
Excited for this one!
@MrSebastian12358 2 years ago
Thanks a lot for your amazing effort.
@PritishMishra 2 years ago
Thanks very much for the awesome content!
@luna0609 3 years ago
This was a great video. The best I've seen about explaining a research paper. 👏
@TheAIEpiphany 3 years ago
Hah I don't know about that but thanks! 😂
@sushantgautam773 2 years ago
Nice explanation. By the way, could I know which software you are using to show multiple things in one view?
@НиколайНовичков-е1э 2 years ago
Thank you!
@enip239 3 years ago
Very nice content! I hadn't even noticed they use the old ResNet top-1 accuracy instead of Wightman's. That makes the model less comparable to the SOTAs.
@marvlousdasta2566 2 years ago
Great video as always. What software are you using to present and annotate the paper?
@TheAIEpiphany 2 years ago
Thanks! OneNote.
@Kenny4PresidentFTW 2 years ago
This channel's videos are amazing.
@oncedidactic 3 years ago
always semirants!
@TheAIEpiphany 3 years ago
My made-up word just got its first validation - it's an official word from now on!
@oncedidactic 3 years ago
ayyyyyyyyyyyyyyyyyyyy :D
@mritunjaymusale 3 years ago
Was the pre-training they did on ImageNet-22k supervised, or unsupervised the way the transformer papers do it?
@TheAIEpiphany 3 years ago
Supervised - same as ImageNet 1k. :)
@JapiSandhu 2 years ago
Can this be used for video classification?
@mahmoodkashmiri 2 years ago
What tool do you use to read research papers on Ubuntu? Thank you!
@TheAIEpiphany 2 years ago
I use OneNote on Windows!
@mahmoodkashmiri 2 years ago
Thank You 😊
@eng_ajy5091 2 years ago
Hi, first of all I would like to thank you for your excellent and wonderful videos on artificial intelligence. I am a PhD student working on fast video captioning, and I hope to reach real-time captioning, but I am confused by the many articles, techniques, and algorithms in this field. I need your help in guiding me to choose the right path among the existing methods (traditional CNN, Transformer, YOLO, self-attention only, or some combination, or others) while maintaining a trade-off between speed and accuracy.