Anchor Boxes | Essentials of Object Detection

9,551 views

Kapil Sachdeva


1 year ago

This tutorial highlights challenges in object detection training, especially how to associate a predicted box with the ground truth box.
It then shows and explains the need for injecting some domain/human knowledge as a starting point for the predicted box.

Comments: 28
@user-gf8xk7zz3h 3 months ago
Really helpful video, especially for someone new to Object Detection
@anghuynhnguyen9625 5 months ago
This list of videos is amazing. It helps me understand more about the task and also its components. I am looking forward to the Anchor Boxes Generation video. Hope it will be up soon!
@KapilSachdeva 4 months ago
🙏
@kappa12385 1 year ago
Waiting for an intuitive explanation of mAP. Although I know it mathematically, I am trying to understand its real meaning. Will be waiting for your thoughts on it. Thanks a lot for making such high-quality content.
@KapilSachdeva 1 year ago
Yes. Will explain mAP as part of this series. It is on my list.
@varunanand14064 1 year ago
Hi, thank you so much for the amazingly intuitive explanation building up the motivation for anchor boxes from foundational concepts! Just had one quick question regarding the criteria used to assign anchor boxes to ground-truths. Do we assign a ground-truth to the anchor box it has the highest IoU with, or do we assign an anchor box to the ground-truth it has the highest IoU with? The former approach may result in an anchor box being assigned to multiple ground-truths (if we have objects very close together in the image) which may confuse the model when it tries to learn optimal offsets. The latter approach may result in some ground-truths not being covered but this can be mitigated by having "enough" anchors. Am I right in my understanding and can you please elaborate a bit on this?
@KapilSachdeva 1 year ago
We assign an anchor box (or anchor boxes) to a ground truth. This means one or more anchor boxes can be associated with a ground truth box, but a given anchor box cannot be associated with more than one ground truth box.
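A minimal sketch of that assignment rule, assuming corner-format boxes `(x1, y1, x2, y2)`. The function names, the 0.5 IoU threshold, and the "every ground truth keeps its best anchor" step are illustrative assumptions (common in SSD/Faster R-CNN-style matchers), not necessarily what the video uses. Because each anchor stores a single index, it can never belong to two ground truths, while one ground truth can collect several anchors.

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) corner format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_anchors(anchors, gts, pos_thresh=0.5):
    """Map each anchor to at most one ground-truth index (-1 = unassigned).

    pos_thresh is a hypothetical hyperparameter chosen for illustration.
    """
    assignment = [-1] * len(anchors)
    if not anchors or not gts:
        return assignment
    # Step 1: each anchor takes the ground truth it overlaps most, if IoU >= pos_thresh.
    for i, anchor in enumerate(anchors):
        ious = [iou(anchor, gt) for gt in gts]
        best_gt = max(range(len(gts)), key=lambda g: ious[g])
        if ious[best_gt] >= pos_thresh:
            assignment[i] = best_gt
    # Step 2 (assumption): every ground truth keeps its single best-overlapping anchor,
    # even below the threshold. If two ground truths share the same best anchor,
    # the later one wins in this simple sketch.
    for g, gt in enumerate(gts):
        best_anchor = max(range(len(anchors)), key=lambda i: iou(anchors[i], gt))
        assignment[best_anchor] = g
    return assignment


anchors = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30)]
gts = [(0, 0, 10, 10), (21, 21, 31, 31)]
print(assign_anchors(anchors, gts))  # [0, -1, 1]
```

The middle anchor overlaps the first ground truth with an IoU of only about 0.14, so it stays unassigned and contributes no box-regression loss.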
@srikanthramakrishna1073 1 year ago
If I understand correctly, we pick the one anchor box that has the highest IoU with the ground truth and then fine-tune its center coordinates, width and height to get the final predictions. Is NMS still required, since we go with a single prediction?
@KapilSachdeva 1 year ago
NMS is not used during training. It is only used during the test phase, as we do not have the ground truth during test/inference.
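At inference, a plain greedy non-maximum suppression could look like the sketch below. The name `nms` and the 0.5 threshold are illustrative assumptions; many real detectors also run NMS separately per class, which this sketch omits.

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) corner format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest-scoring box still in play
        keep.append(best)
        # Drop the remaining boxes that overlap the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

No ground truth is involved anywhere here: the predictions are filtered only against each other, which is why this step belongs to inference.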
@mohammadyahya78 1 year ago
At 8:06, you said, "we will be placing these 3 prior bounding boxes on all cells no matter if they have ground truth bounding boxes or not". Does this apply to YOLO as well, please?
@KapilSachdeva 1 year ago
Yes
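A minimal sketch of placing the same 3 priors on every cell of a 13×13 grid with stride 32, regardless of where the ground truths are. The three (width, height) pairs in pixels and the function name are hypothetical values chosen for illustration, and centering each anchor on its cell is an assumption.

```python
def make_anchor_grid(grid_size=13, stride=32,
                     anchor_sizes=((32, 64), (96, 96), (192, 128))):
    """Return corner-format anchors (x1, y1, x2, y2) for every cell of the grid.

    anchor_sizes are hypothetical (width, height) pairs in pixels.
    """
    anchors = []
    for row in range(grid_size):
        for col in range(grid_size):
            # Center of this cell in image pixels.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for w, h in anchor_sizes:
                anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchor_grid()
print(len(anchors))  # 507 = 13 * 13 * 3
print(anchors[0])    # (0.0, -16.0, 32.0, 48.0)
```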
@lakshaydulani 1 year ago
If I understood it correctly, anchor boxes are instructions about the dimensions for a particular class, e.g. a cow's anchor box will be wide with a low height, and a giraffe's anchor box will be tall. This is done to refine the number of predictions with our prior knowledge of the domain, right sir?
@KapilSachdeva 1 year ago
Yes. Since we know the classes of objects we intend to predict, we will create the anchor boxes with widths and heights that are suitable for those objects. I will describe this in the follow-up tutorial.
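One known way to pick such widths and heights from data, swapped in here and not necessarily the method of the follow-up tutorial, is the k-means clustering on ground-truth box sizes described for YOLOv2, using 1 − IoU as the distance. The sketch below assumes boxes given as (w, h) pairs compared as if sharing a center, and uses the first k boxes as a deterministic initialization; both are simplifying assumptions.

```python
def wh_iou(a, b):
    """IoU of two (w, h) boxes assumed to share the same center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(sizes, k, iters=20):
    """Cluster (w, h) pairs into k anchor shapes with distance 1 - IoU."""
    centroids = list(sizes[:k])  # simple deterministic init (an assumption)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for s in sizes:
            # Smallest 1 - IoU distance == largest IoU.
            best = max(range(k), key=lambda c: wh_iou(s, centroids[c]))
            clusters[best].append(s)
        new = []
        for c, members in enumerate(clusters):
            if members:
                new.append((sum(m[0] for m in members) / len(members),
                            sum(m[1] for m in members) / len(members)))
            else:
                new.append(centroids[c])  # keep an empty cluster's centroid
        if new == centroids:
            break
        centroids = new
    return centroids


# Wide "cow-like" boxes and tall "giraffe-like" boxes (made-up sizes).
sizes = [(100, 30), (10, 90), (110, 30), (12, 100), (90, 30), (8, 80)]
print(kmeans_anchors(sizes, k=2))  # [(100, 30), (10, 90)]
```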
@mohammadyahya78 1 year ago
Great video. I did not understand the equation at 2:48. What does 416/13 = 32 mean?
@KapilSachdeva 1 year ago
It shows that the 13×13 feature map was obtained at a stride of 32, i.e., the image dimensions (416) were reduced by a factor of 32.
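The stride arithmetic can be made concrete: with stride 32, each feature-map cell corresponds to a 32×32 pixel patch of the 416×416 input. The helper names below are hypothetical; mapping a box center to the cell that contains it is how YOLO-style detectors pick the responsible cell, but treat that usage as an assumption here.

```python
def stride_of(image_size, grid_size):
    """Downsampling factor between the input image and the feature map."""
    return image_size // grid_size  # 416 // 13 = 32


def cell_for_pixel(x, y, stride):
    """(row, col) of the feature-map cell containing pixel (x, y)."""
    return int(y // stride), int(x // stride)


def cell_extent(row, col, stride):
    """Pixel region (x1, y1, x2, y2) covered by a cell."""
    return col * stride, row * stride, (col + 1) * stride, (row + 1) * stride


s = stride_of(416, 13)
print(s)                         # 32
print(cell_for_pixel(100, 200, s))  # (6, 3)
print(cell_extent(6, 3, s))      # (96, 192, 128, 224)
```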
@harshith_takkala 1 year ago
So each cell in the feature map has n anchor boxes, and each of the n corresponds to one of (m+1) classes, where m is the number of ground-truth classes. It's not necessary that n = m?
@KapilSachdeva 1 year ago
Not at all. n does not have to be the same as m.
@mohammadyahya78 1 year ago
Second question: at 4:00, why would comparing the ground-truth BB with all predictions cause the network to make many predictions?
@KapilSachdeva 1 year ago
The network will still make many predictions, but we only want to compute the loss for the few of them that are near the ground truth box. Also think about the case where you have many ground truth objects in an image: if you compare all anchor boxes with all ground truth boxes, the network will not learn anything.
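A minimal sketch of "computing the loss only for the few predictions near a ground truth": the box-regression term is masked so that only anchors with an assignment (index other than -1) contribute. The squared-error loss, the averaging over positives, and the function name are illustrative assumptions; real detectors combine this with objectness and class terms.

```python
def masked_box_loss(pred_offsets, target_offsets, assignment):
    """Mean squared offset error over assigned anchors only (assignment != -1)."""
    total = 0.0
    num_pos = 0
    for pred, target, gt in zip(pred_offsets, target_offsets, assignment):
        if gt == -1:
            continue  # unmatched anchor: contributes no box-regression loss
        total += sum((p - t) ** 2 for p, t in zip(pred, target))
        num_pos += 1
    return total / max(num_pos, 1)  # avoid division by zero when nothing matched


preds = [(0.1, 0.1, 0.0, 0.0), (5, 5, 5, 5), (0.0, 0.0, 0.2, 0.0)]
targets = [(0, 0, 0, 0)] * 3
print(masked_box_loss(preds, targets, [0, -1, 1]))  # ~0.03; the wild middle prediction is ignored
```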
@mohammadyahya78 1 year ago
Does YOLOv2 do something similar to what you explained at 6:39, please?
@KapilSachdeva 1 year ago
Yes
@srikanthramakrishna1073 1 year ago
So when you say three predictions per grid cell, did you mean you have three classes as an example?
@KapilSachdeva 1 year ago
No, it means that 3 bounding boxes are predicted per cell. For each bounding box there will be 4 numbers for the offsets, 1 for objectness, and m numbers for the classes. For example, if you have 5 classes then m = 5.
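Following that description, each cell outputs n × (4 + 1 + m) numbers, where n (anchors per cell) and m (classes) are independent. The sketch below splits one cell's flat vector into per-anchor pieces; the ordering (offsets, then objectness, then class scores) and the helper names are assumptions, since real implementations lay out channels differently.

```python
def output_channels(num_anchors, num_classes):
    """Numbers predicted per grid cell: n * (4 offsets + 1 objectness + m classes)."""
    return num_anchors * (4 + 1 + num_classes)


def split_cell_prediction(cell_vector, num_anchors, num_classes):
    """Split one cell's flat prediction into per-anchor dicts (assumed layout)."""
    per_box = 4 + 1 + num_classes
    if len(cell_vector) != num_anchors * per_box:
        raise ValueError("vector length does not match n * (5 + m)")
    boxes = []
    for a in range(num_anchors):
        chunk = cell_vector[a * per_box:(a + 1) * per_box]
        boxes.append({"offsets": chunk[0:4],
                      "objectness": chunk[4],
                      "class_scores": chunk[5:]})
    return boxes


print(output_channels(3, 5))            # 30 numbers per cell
print(13 * 13 * output_channels(3, 5))  # 5070 numbers for a 13x13 grid
print(split_cell_prediction(list(range(30)), 3, 5)[1])
# {'offsets': [10, 11, 12, 13], 'objectness': 14, 'class_scores': [15, 16, 17, 18, 19]}
```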
@mohammadyahya78 1 year ago
At 7:11, why can we now apply a criterion for selecting the bounding box when we could not before?
@KapilSachdeva 1 year ago
Not sure I understood your question.
@BySpyZ 1 year ago
I don't understand something: we don't have the truth box when we predict an object, do we? How can we know where the ground truth is without its coordinates? Thanks.
@KapilSachdeva 1 year ago
I am not sure I understand your comment completely, but here is an attempt. During training you do have the ground truth boxes. The concept of anchor boxes is that you start with some assumed prediction boxes and regress them towards the ground truth boxes. The reason for assuming these anchor boxes at the start of training is that the image (and feature) space is vast.
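"Regressing an anchor towards the ground truth" needs some parametrization of the offsets. The sketch below uses the YOLOv2-style one, in grid units: b_x = σ(t_x) + c_x, b_y = σ(t_y) + c_y, b_w = p_w·e^(t_w), b_h = p_h·e^(t_h), where (c_x, c_y) is the cell and (p_w, p_h) the prior. Other detectors use different encodings, and the function names are hypothetical. During training, `encode` turns a ground-truth box into target offsets; at inference, `decode` turns predicted offsets back into a box.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def decode(t, cell, prior):
    """Predicted offsets (tx, ty, tw, th) -> box (bx, by, bw, bh) in grid units."""
    tx, ty, tw, th = t
    cx, cy = cell   # column, row of the cell
    pw, ph = prior  # anchor (prior) width and height
    return sigmoid(tx) + cx, sigmoid(ty) + cy, pw * math.exp(tw), ph * math.exp(th)


def encode(box, cell, prior):
    """Ground-truth box -> target offsets; inverse of decode.

    Assumes the box center lies strictly inside the cell (0 < bx - cx < 1).
    """
    bx, by, bw, bh = box
    cx, cy = cell
    pw, ph = prior
    fx, fy = bx - cx, by - cy
    return (math.log(fx / (1.0 - fx)), math.log(fy / (1.0 - fy)),
            math.log(bw / pw), math.log(bh / ph))


print(decode((0.0, 0.0, 0.0, 0.0), (3, 6), (1.5, 2.0)))  # (3.5, 6.5, 1.5, 2.0)
gt = (3.25, 6.75, 3.0, 1.0)
print(decode(encode(gt, (3, 6), (1.5, 2.0)), (3, 6), (1.5, 2.0)))  # ~ the ground-truth box
```

Zero offsets reproduce the prior centered in its cell, which is exactly the "assumed starting box" the reply describes; learning moves the offsets away from zero.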
@BySpyZ 1 year ago
@KapilSachdeva Thank you very much for your reply. Would it be possible to contact you? I have to give a presentation and would like some advice, please.