DETR: End-to-End Object Detection with Transformers (ECCV 2020)

23,181 views

Nicolas Carion

This is the talk associated with the ECCV 2020 oral paper "End-to-End Object Detection with Transformers" by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
Github: github.com/fac...
Paper: arxiv.org/abs/...
Blog: / end-to-end-object-dete...

Comments: 22
@kvnptl4400 4 months ago
A very nice presentation with clear visualizations and easy-to-understand explanations! Great Work!!🌟🌟🌟🌟🌟 Smooth animations 👌
@syedabdul8509 3 years ago
Excellent explanation! But what I most want to know from this video is: how did you create those cool animations, like @1:58-@2:20 and @8:00-@8:05?
@praveen9083 3 years ago
I'm expecting this answer too!
@nicollenunes4459 9 months ago
@praveen9083 Me too!
@QuintinMassey 2 years ago
Outstanding work. I’m also very interested in the, arguably more difficult, small object detection problem.
@MarioHari 4 years ago
Nice work! A small correction to what you said: "Semantic segmentation labels each pixel in the whole image. It is not restricted to only pixels in the background".
@nicolascarion3111 4 years ago
You're right, my statement is imprecise. I meant that semantic annotations of foreground classes are not used in the panoptic task.
@MarioHari 4 years ago
@nicolascarion3111 thank you so much :)
@ujjalkrdutta7854 2 years ago
@nicolascarion3111 Can we then say that panoptic segmentation = instance segmentation + semantic segmentation, minus the semantic annotations of foreground classes?
@Ramakrishnan-bq9is 3 years ago
Thanks for sharing! Could you please explain what you mean by "fully differentiable", and how other methods might not be fully differentiable?
@goldenshale 2 years ago
This is an end-to-end neural network defined by functions which all have derivatives. In the R-CNN family of algorithms you have one procedure that produces a bunch of region proposals, then you crop out these regions and feed them to a classifier, and then you run another algorithm to prune out overlapping and low-confidence predictions. Since there are multiple steps with logical rather than mathematical implementations, you can't take derivatives all the way through to back-propagate information through the whole system.
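To make this concrete, here is a minimal numpy sketch of the last of those logical steps, greedy non-maximum suppression (NMS), which DETR removes. This is an illustrative implementation, not the code from any particular detector; the `argsort` and hard IoU thresholding are the discrete operations that have no useful gradient.

```python
import numpy as np

def iou(a, b):
    # Boxes given as [x1, y1, x2, y2].
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, discard lower-scoring
    boxes that overlap it too much, repeat. The sorting and the hard
    keep/discard decision are logical steps, not differentiable ones."""
    order = np.argsort(scores)[::-1]  # indices, highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = np.array([j for j in rest
                          if iou(boxes[i], boxes[j]) <= iou_thresh])
    return keep
```

DETR sidesteps this entirely: its set-based bipartite-matching loss trains the model to emit each object exactly once, so no suppression pass is needed at inference.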
@rohinim7707 4 years ago
Amazing! What was the main motivation behind using a sequence model for object detection?
@redjammie8342 4 years ago
It is not a sequence model. It was successfully used for sequences, but it's not a sequence model by definition.
@chandrahasp6697 9 months ago
Really good work!
@ujjalkrdutta7854 2 years ago
Elegant explanation, liked it.
@Nino234mff 3 years ago
Thank you for the great work and the presentation!
@ZobeirRaisi 4 years ago
What does this mean: "since the transformer is permutation-equivariant, some extra care is required to retain the 2D structure of the image"?
@nicolascarion3111 4 years ago
The transformer isn't aware of the 2D structure of the image, because 1) we flatten it and 2) permuting the inputs of a transformer simply permutes its outputs (permutation equivariance). That's why we add 2D positional encodings. This is similar to what is done in NLP to retain the order of the sentence.
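The idea can be sketched in a few lines of numpy. This is a simplified illustration in the spirit of DETR's fixed sine encodings, not the actual implementation: half the channels encode the row index and half the column index, and the result is added to the flattened feature map before it enters the transformer, so two spatial positions that would otherwise look identical after flattening get distinct inputs.

```python
import numpy as np

def positional_encoding_2d(h, w, d):
    """Fixed sine/cosine 2D positional encodings (simplified sketch).
    `d` must be divisible by 4: half the channels encode the row (y)
    index, half the column (x) index, each with sin/cos pairs at
    geometrically spaced frequencies. Returns an (h*w, d) array to add
    to the flattened (h*w, d) feature map."""
    assert d % 4 == 0
    d_half = d // 2
    freqs = 1.0 / (10000 ** (np.arange(0, d_half, 2) / d_half))
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    def encode(pos):  # (h, w) integer grid -> (h, w, d_half)
        ang = pos[..., None] * freqs
        return np.concatenate([np.sin(ang), np.cos(ang)], axis=-1)

    pe = np.concatenate([encode(ys), encode(xs)], axis=-1)  # (h, w, d)
    return pe.reshape(h * w, d)
```

Because every spatial location now carries a unique additive signature, permuting the flattened tokens no longer destroys the information about where each feature came from.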
@ZobeirRaisi 4 years ago
@nicolascarion3111 Thanks for your explanation. I have another question: right now DETR produces rectangular bbox outputs because of the rectangular bboxes in the COCO dataset. If we had polygon bboxes (8 points), which parts of the architecture would have to be modified to output polygon-shaped bboxes?
@nicolascarion3111 4 years ago
@ZobeirRaisi Well, you need to modify the regression head as well as the loss and matching function (GIoU may not make sense anymore, so you'll likely have to stick to L1). For this kind of question, it's best to open an issue on our GitHub. Thanks!
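The matching function mentioned above is worth unpacking. Below is a hypothetical, brute-force numpy sketch of DETR-style bipartite matching with an L1 box cost (the actual implementation uses the Hungarian algorithm via `scipy.optimize.linear_sum_assignment`); the point of the reply is visible here: switching from rectangles (4 coordinates) to polygons (8 coordinates) only changes the width of the coordinate vectors, while the matching logic itself is untouched.

```python
import itertools
import numpy as np

def match_l1(pred_boxes, gt_boxes):
    """Bipartite matching between predictions and ground truth under an
    L1 cost, brute force over permutations (fine for a handful of boxes;
    assumes len(pred_boxes) >= len(gt_boxes), as in DETR where the fixed
    number of queries exceeds the object count). Each row is a flat
    coordinate vector: 4 values for a rectangle, 8 for a quadrilateral.
    Returns (prediction_index, ground_truth_index) pairs."""
    # cost[i, j] = L1 distance between prediction i and ground truth j.
    cost = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)
    n_pred, n_gt = cost.shape
    best = min(itertools.permutations(range(n_pred), n_gt),
               key=lambda perm: sum(cost[p, g] for g, p in enumerate(perm)))
    return [(p, g) for g, p in enumerate(best)]
```

Once each prediction is matched to a target, the regression loss is computed only on matched pairs, which is what removes the need for anchors and NMS.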