C4W3L09 YOLO Algorithm

216,665 views

Take the Deep Learning Specialization: bit.ly/2PQaZNs
Check out all our courses: www.deeplearning.ai
Subscribe to The Batch, our weekly newsletter: www.deeplearning.ai/thebatch
Follow us:
Twitter: / deeplearningai_
Facebook: / deeplearninghq
Linkedin: / deeplearningai

Comments: 66

@ujjalkrdutta7854 3 years ago

I have read multiple blog posts on YOLO, along with the original paper, but this video provides the intuition at a different level. Amazing!

@andresfernandoaranda5498 4 years ago

The same concept is used in YOLOv3, but instead of a softmax activation over all classes, logistic regression is applied to each class independently (meaning an object can belong to two classes at once).
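A minimal sketch of the difference described above, in plain Python. The logits are made-up numbers and the function names are hypothetical; this is not taken from any YOLO implementation. A softmax forces the class scores to compete and sum to 1, while a per-class sigmoid scores each class on its own, so two classes can both come out above 0.5.

```python
import math

def softmax(logits):
    """Mutually exclusive class probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def independent_class_scores(logits):
    """One logistic score per class; the scores need not sum to 1."""
    return [sigmoid(z) for z in logits]

# Hypothetical raw class scores for one predicted box (3 classes).
logits = [2.0, 1.5, -3.0]
soft = softmax(logits)
indep = independent_class_scores(logits)
print("softmax:", soft)
print("per-class sigmoid:", indep)
```

With these logits the first two sigmoid scores are both above 0.5, which is the multi-label case the comment refers to.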

@vaibhavsingh1049 5 years ago

So, at 1:49 we give the pc value to the 2nd anchor box, not the 1st, because it had the higher IoU. To generalize: check whether there is an object in the grid cell; if there is, assign the associated pc value to the anchor box with the highest IoU.
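The rule above can be sketched as follows. This is only an illustration under stated assumptions: the anchor shapes are invented, boxes are compared by shape only (as if they shared the same center), and `shape_iou` and `best_anchor` are hypothetical helper names.

```python
def shape_iou(wh_a, wh_b):
    """IoU of two boxes that share the same center, given (width, height)."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter
    return inter / union

def best_anchor(box_wh, anchors):
    """Index of the anchor whose shape overlaps the ground-truth box most."""
    ious = [shape_iou(box_wh, a) for a in anchors]
    return max(range(len(anchors)), key=lambda i: ious[i])

# Hypothetical anchors: index 0 is tall and narrow, index 1 is wide and flat.
anchors = [(0.4, 1.2), (1.3, 0.5)]
car_wh = (1.1, 0.4)         # wide object
pedestrian_wh = (0.3, 1.0)  # tall object
print(best_anchor(car_wh, anchors), best_anchor(pedestrian_wh, anchors))
```

The slot of the chosen anchor gets pc = 1 and the box coordinates; the other anchor slot in that cell keeps pc = 0.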

@PoRouS22 5 years ago

Thank you very much for all your YOLO videos. They are just great :)

@PoRouS22 3 years ago

@Kohen Dominick ...

@manuel783 3 years ago

YOLO algorithm *CORRECTION* At time 5:00, for the slide titled "Outputting the non-max suppressed output" the text should read "For each grid cell" instead of "For each grid call".

@sahil-7473 3 years ago

I am not clear on how it works at inference time. How can I map the model's output bounding box back to the original image coordinates? Could you give the mathematics for computing it?

@haroldsu1696 6 years ago

Thank you, Andrew!

@Denmark_ 5 years ago

Is this YOLO or YOLO 9000? According to the YOLO paper, I think the y should be 3x3x((2x5)+3), so y is 3x3x13. Is this right?

@waqasmalik4657 5 years ago

Is YOLO a deep learning algorithm?

@ngantrieuninh9871 4 years ago

I read in some documents that YOLO uses HSV. Can you explain why?

@haojiechen4284 5 years ago

Clear and good enough. Thank you.

@nipunaviduranga6614 4 years ago

How do you define the anchor boxes' boundaries?

@manikantabandla3923 1 year ago

Is Non-Max suppression used during training?

@GaganDaroach 3 years ago

Is this a graduate or undergraduate level course?

@BSelm05 3 years ago

The best AI teacher, thank you

@zatizsumkoolshyte 4 years ago

Amazing educator

@guardrepresenter5099 5 years ago

Can someone tell me: at training time we use the anchor box terminology, and it becomes bounding box at prediction time, is that right? At prediction time anchor boxes are not used, only bounding boxes, right?

@AISHORTS9797 4 years ago

Yes, you are correct. The anchor box is only used to check the IoU match with the ground-truth bounding box. If the IoU between an anchor box and the ground-truth box of a particular object is greater than 0.5, we use that anchor, and its label is [object confidence = 1 (for the object with IoU > 0.5), the bounding box coordinates of that object, class label 1 for that object's class and 0 for the remaining classes].

@Sniper-rl3xq 1 year ago

Amazing tutorial!! Thank you so much

@sandipansarkar9211 3 years ago

Great explanation

@ritwek98 3 years ago

Thank you!!

@theunknown2090 5 years ago

How do I get the programming exercise?

@maheshmedam1672 6 years ago

@5:28 How come the bounding box sizes are different? How is the bounding box size changing?

@akanksharathore3946 5 years ago

Have the same question. How exactly are the bounding boxes being predicted at every step?

@pradeepkumar-qo8lu 4 years ago

Bounding boxes are learned: they are basically a continuous (regression) output, so they change in size. Anchor boxes do not; they are fixed in size.

@lakshmanvengadesan9096 3 years ago

Let's say I have an object that spans 3 of the grid cells. Then the outputs of all 3 of those grid cells should be identical, with the same values of bx, by, bh, bw. Am I correct?

@MrAmgadHasan 1 year ago

Not really. The object should be assigned to the cell that contains the object's center. The remaining cells should predict 'background'.
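A small sketch of that assignment rule, assuming a 3x3 grid and pixel coordinates with the origin at the top-left corner. `assign_cell` is a hypothetical helper, not code from the course.

```python
def assign_cell(cx, cy, img_w, img_h, grid=3):
    """Return (row, col, bx, by): the grid cell that contains the box center
    and the center's offset inside that cell, each in [0, 1]."""
    gx = cx * grid / img_w  # center x measured in grid-cell units
    gy = cy * grid / img_h  # center y measured in grid-cell units
    col = min(int(gx), grid - 1)  # clamp a center lying on the right edge
    row = min(int(gy), grid - 1)  # clamp a center lying on the bottom edge
    return row, col, gx - col, gy - row

# Hypothetical object centered at pixel (250, 50) in a 300x300 image.
row, col, bx, by = assign_cell(250, 50, 300, 300)
print(row, col, bx, by)
```

Only cell (row, col) receives pc = 1 for this object, even if the object's extent covers neighbouring cells.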

@maximlopin 4 years ago

Thank you

@lorryzou9367 1 year ago

If we divide the image into 3*3=9 small boxes, why do we still need the box coordinate variables bx, by, bh, bw?

@maker72460 2 months ago

The object may not lie at the center of that grid cell. The bounding box coordinates specify the tight-fitting box for the object.

@rpcruz 6 years ago

This algorithm simplified the bounding box regression by having a 3x3 (or some other) grid output, right? What I didn't understand is how anchor boxes are used in this algorithm...

@smart_world7928 6 years ago

YOLO and Faster R-CNN have something in common: anchor boxes, which are used to simulate the well-known image pyramids commonly seen in conventional SVM classifier training. Regardless of why we use anchors, YOLO v2 uses 5 anchor boxes (Faster R-CNN uses 9) for each cell of the 3x3 grid here. Faster R-CNN does not apply its 9 anchors per grid cell; it slides them over conventional feature maps produced by an intermediate layer of a CNN. As far as I understand, YOLO uses 5 bounding boxes at each cell of the 13x13 output feature map and predicts 5 coordinates for each bounding box. Since they constrain the location prediction (by using a grid, which Faster R-CNN does not use), the parametrization is easier to learn and it makes the network more stable. Hope you got the point. ;)

@beandog5445 5 years ago

@@smart_world7928 k

@azharhussian4326 3 years ago

Yeah, I didn't get it either. It just increases the size of the output: instead of making one prediction per cell, the algorithm now makes two predictions per cell. I don't understand the purpose of the predefined boxes here.

@thelonespeaker 3 years ago

@@azharhussian4326 Did you get it in the end? Been through every post on the internet about anchor boxes and no one has come close to correctly explaining how they are used in the process. Pretty frustrating.

@azharhussian4326 3 years ago

@@thelonespeaker Yeah, still kind of stuck. But anchor boxes are usually used in the loss functions.

@adityarajora7219 4 years ago

At 6:01, how is this lady's bounding box made, given that there is a separate CNN for each grid cell? Can somebody explain?

@sanjivgautam9063 4 years ago

2 bounding boxes from 2 anchor boxes. Maybe this question comes from your question on the previous video.

@alexter-sarkisov8321 4 years ago

OK, so how many objects can one cell of YOLOv1 predict? The article says 'we only predict one set of class probabilities per grid cell regardless of the number of boxes'. It seems that the article skirts around the fact that the model can only predict at most 1 object/cell, but the wording above does not exclude, for example, the case when all B objects belong to the same class. So how many?

@mager8460 1 year ago

I think the answer is: One cell can predict, at maximum, one object for every anchor box.

@ricoaditya1 3 years ago

How do you get the values of c1, c2, c3?

@polimetakrylanmetylu2483 2 years ago

c1, c2 and c3 are the classification part of the algorithm. They basically mean 'if the bounding box intersects an object, what is its type'. During training, your data should be annotated, so each bbox should have a position and a class if applicable. When you train your NN, you check whether a box is over some IoU with an object, and if it is, then you train c1, c2, c3 like any other classifier.

@omihalikar7799 5 years ago

What happens when the expected training output was close to bounding box 1 but the output of the network was 2, and the coordinates of the box in the expected output were incorrectly marked close to 1 whereas they should have been close to 2?

@danielkusuma6473 2 years ago

Thanks for the video, it brought me back to light:) I however still have a question: In the YOLO v1 paper it is described that the final convolutional output layer is a tensor of 7x7x1024 dimension (Darknet), then the detection follows, where grid cells of dimension 7x7 are defined. My assumption here is: since the dimension of the conv output is the same as the grid's, can one say that one grid cell represents one pixel, hence the detection proceeds one 'pixel' at a time?

@MrAmgadHasan 1 year ago

One grid cell represents a small 'crop' (e.g. 20x20 pixels) of the image, not necessarily a single pixel. Another thing to note: the algorithm processes all grid cells simultaneously in one shot. It doesn't process them sequentially.

@supriamir5251 1 year ago

@@MrAmgadHasan How can YOLO predict the final output for 3 different scales? YOLOv3 has 3 scales with different feature maps.

@mehranmehralian4608 5 years ago

@0:56 I think something is wrong. According to the YOLO paper we have [S,S,(B * 5 + C)], which means each cell has C classes, but here you said that each anchor box has C classes, or [S,S,B*(5+C)].

@yumik4990 5 years ago

[S,S,(B*5+C)] is YOLOv1. He is talking about YOLOv2. The two models are pretty different in input encoding and also in the definition of loss, if I understood correctly. fairyonice.github.io/Part_4_Object_Detection_with_Yolo_using_VOC_2012_data_loss.html
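To make the two shape formulas from this exchange concrete, here is a short sketch that just evaluates them. The function names are hypothetical; S is the grid size, B the number of boxes or anchors per cell, and C the number of classes.

```python
def shared_class_output_shape(S, B, C):
    """[S, S, B*5 + C]: one set of C class scores shared by the whole cell."""
    return (S, S, B * 5 + C)

def per_anchor_output_shape(S, B, C):
    """[S, S, B*(5 + C)]: every anchor slot carries its own C class scores."""
    return (S, S, B * (5 + C))

# Settings used in the video: 3x3 grid, 2 anchors, 3 classes.
print(shared_class_output_shape(3, 2, 3))
print(per_anchor_output_shape(3, 2, 3))
```

With S=3, B=2, C=3 the first formula gives the 3x3x13 mentioned in an earlier comment and the second gives the 3x3x16 used in the video.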

@aicoding2010 2 years ago

@@yumik4990 YOLOv1 predicts 2 boxes for each grid cell. As far as I can see, YOLOv1 doesn't predefine 2 anchor boxes; it predicts the boxes based on the grid cell. During training, YOLOv1 assigns an object only to the box with the highest IoU in its grid cell. I don't know how we can do this in the training process: in the 1st epoch box 1 can have the higher IoU, but in the 2nd epoch box 2 can have the higher IoU.

@dhidhi1000 4 years ago

How is this grid cell segmentation actually encoded in the neural network? Is it encoded at all? If I understood correctly, the segmentation is only encoded into the training data, and the network is supposed to "learn" to output the y=3x3x16 that matches the locations of the objects relative to the grid cell on the training data. In other words, the network has no information about any image grid.

@TragicGFuel 13 days ago

In the previous videos, it's shown that the grid is actually the "cut down" version of the image after it has been through multiple convolution layers. That's the grid!

@RH-mk3rp 1 year ago

What are the values of the don't-care question marks? Is it up to the labeler or is there a convention?

@vaneEAE 10 months ago

Did you get the answer?

@jamieabw4517 7 months ago

@@vaneEAE From my research it seems it's just up to the labeler; I haven't found any convention anywhere.

@vaneEAE 7 months ago

@@jamieabw4517 I saw that these terms are not considered in the loss function. Therefore, it is of no interest to know what value these terms take.
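A minimal sketch of why the don't-care values never matter, assuming a plain squared-error loss on a single anchor slot laid out as [pc, bx, by, bh, bw, c1, c2, c3]. Real YOLO losses weight their terms differently, and `slot_loss` is a hypothetical name.

```python
def slot_loss(target, pred):
    """Squared-error loss for one anchor slot [pc, bx, by, bh, bw, c1, c2, c3].

    When the target says "no object" (pc == 0), only the pc term is scored,
    so the remaining target entries are never read.
    """
    pc_term = (target[0] - pred[0]) ** 2
    if target[0] == 0:
        return pc_term
    rest = sum((t - p) ** 2 for t, p in zip(target[1:], pred[1:]))
    return pc_term + rest

# The don't-care entries can hold anything, even None, without affecting the loss.
empty_target = [0] + [None] * 7
pred = [0.5, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0]
print(slot_loss(empty_target, pred))
```

Because the branch for pc == 0 returns early, whatever the labeler wrote in the other seven positions has no effect on training.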

@alexdalton4535 2 years ago

What if an object spans more than one grid cell?

@ab452 1 year ago

They often do. The bh and bw ground truths are defined taking into account the size of the original image, meaning that an object in a cell can have a bh and bw that go beyond the boundaries of the cell. That's not a problem, because you do regression towards these values. The cell serves only to mark where the object should be detected. If the center point of the object is in one cell, then the target vector for that cell is the only one that will have bh, bw, bx, by for that object.
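As a rough illustration of the point above, the sketch below maps one cell's prediction back to pixel coordinates. It assumes bx and by are offsets inside the cell (0 to 1) and bw and bh are fractions of the full image width and height; other parametrizations exist, and `decode_box` is a hypothetical helper.

```python
def decode_box(row, col, bx, by, bw, bh, img_w, img_h, grid=3):
    """Convert one cell's prediction into pixel corners (x1, y1, x2, y2).

    Assumed convention: bx, by are offsets inside cell (row, col);
    bw, bh are fractions of the whole image width and height.
    """
    cx = (col + bx) / grid * img_w  # box center x in pixels
    cy = (row + by) / grid * img_h  # box center y in pixels
    w = bw * img_w
    h = bh * img_h
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

# Hypothetical prediction from the middle cell of a 3x3 grid on a 300x300 image.
x1, y1, x2, y2 = decode_box(1, 1, 0.5, 0.5, 0.5, 0.2, 300, 300)
print(x1, y1, x2, y2)
```

The middle cell covers pixels 100 to 200 horizontally, yet the decoded box is wider than that, so one cell can report an object that spills into its neighbours.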

@AutomationExpertry 4 years ago

Source code?

@munzutai 4 years ago

Just wanted to let you know that this video has been ripped and re-uploaded: kzbin.info/www/bejne/aYHZZ2mYntaWZ6c

@neileapenninan8706 4 years ago

I know, I thought that too! Actually, Andrew Ng is the founder of Deeplearning.ai, so technically it isn't a reupload; he must have just wanted to consolidate everything.