3 Region Proposal Network | Faster R-CNN

31,948 views

A day ago

Region Proposal Network explained, with a detailed code walkthrough.
GitHub link: github.com/Aar...
Basic idea of Faster R-CNN: • 1 Object Detection Usi...
Data Preparation for Faster R-CNN: • 2 Faster R-CNN | Objec...
VGG Network explained in detail: • Expalined VGG-16 With ...
If you have any questions about what we covered in this video, feel free to ask in the comment section below and I'll do my best to answer them.
Please consider clicking the SUBSCRIBE button to be notified of future videos, and thank you all for watching.
Channel: www.youtube.co....
Support my channel 🙏 by LIKING, SHARING & SUBSCRIBING
Check out the complete Machine Learning playlist: www.youtube.co....
Check out the complete Deep Learning playlist: www.youtube.co....
Subscribe to my channel: www.youtube.co....
Contact: aarohisingla1987@gmail.com
A Faster R-CNN object detection network consists of a feature extraction network, typically a pretrained CNN, followed by two trainable subnetworks.
The first is a Region Proposal Network (RPN), which, as its name suggests, generates object proposals; the second predicts the actual class of each proposed object.
The architecture of Faster R-CNN is complex.
Given an input image, we want to obtain:
a list of bounding boxes.
a label assigned to each bounding box.
a probability for each label and bounding box.
We will use VGG as a base network for extracting features.
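As a rough illustration of what the VGG base network hands to the RPN: if VGG16 is truncated after its last 3x3 convolution (block5_conv3 in Keras naming, before the fifth max-pool), the input has gone through four 2x2, stride-2 max-pools, so the feature map is about 1/16 of the image per side, with 512 channels. The sketch below assumes "same"-padded convolutions and flooring at each pool (as in Keras' VGG16); the helper names and the +0.5 centring convention are illustrative assumptions. The second helper shows how a feature-map cell maps back to pixel coordinates in the original image.

```python
def vgg16_feature_map_size(height: int, width: int, num_pools: int = 4) -> tuple[int, int]:
    """Spatial size of VGG16 features when the network is cut before its 5th max-pool.

    ASSUMPTION: 3x3 convs use 'same' padding (size unchanged) and each
    2x2/stride-2 max-pool floors the size, as in Keras' VGG16.
    """
    for _ in range(num_pools):
        height, width = height // 2, width // 2
    return height, width


def cell_center_in_image(row: int, col: int, stride: int = 16) -> tuple[float, float]:
    """Image-space (x, y) centre of feature-map cell (row, col).

    ASSUMPTION: the +0.5 centring convention; implementations differ slightly.
    """
    return ((col + 0.5) * stride, (row + 0.5) * stride)


if __name__ == "__main__":
    print(vgg16_feature_map_size(224, 224))   # (14, 14), each cell has 512 channels
    print(vgg16_feature_map_size(600, 1000))  # (37, 62)
    print(cell_center_in_image(0, 0))         # (8.0, 8.0)
```

A 224x224 input therefore yields the 14x14 maps that appear in the RPN output shapes discussed in the comments.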
Anchor Boxes:
Anchor boxes are one of the most important concepts in Faster R-CNN. Anchors are fixed bounding boxes of different sizes and aspect ratios, placed throughout the image, that the RPN uses as references when it first predicts object locations.
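To make "different sizes and ratios" concrete, here is a minimal sketch assuming the configuration from the original Faster R-CNN paper: 3 scales (box areas of 128², 256² and 512² pixels) times 3 aspect ratios (1:2, 1:1, 2:1), giving 9 anchors centred on every feature-map cell, with a stride of 16 pixels. The function names, the ordering and the centring convention are illustrative assumptions; real implementations differ in these details. Note that anchors are plain image-space boxes computed in advance; no learned layer produces them.

```python
import math


def base_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """(width, height) pairs with area scale**2 and height / width == ratio."""
    sizes = []
    for s in scales:
        for r in ratios:
            sizes.append((s / math.sqrt(r), s * math.sqrt(r)))
    return sizes


def generate_anchors(fm_height, fm_width, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """All anchors as (x1, y1, x2, y2) in original-image pixel coordinates.

    Order: row-major over feature-map cells, then the anchor shapes per cell.
    """
    sizes = base_anchors(scales, ratios)
    anchors = []
    for row in range(fm_height):
        for col in range(fm_width):
            # Centre of this feature-map cell, projected back onto the image.
            cx, cy = (col + 0.5) * stride, (row + 0.5) * stride
            for w, h in sizes:
                anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


if __name__ == "__main__":
    anchors = generate_anchors(14, 14)
    print(len(anchors))  # 1764 = 14 * 14 cells * 9 anchors
    print(anchors[4])    # (-120.0, -120.0, 136.0, 136.0): 256x256 square centred at (8, 8)
```

Anchors near the border extend past the image; the paper ignores such cross-boundary anchors during training.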
Non-Maximum Suppression (NMS):
NMS is the second stage of filtering, used to get rid of overlapping boxes: even after filtering by thresholding the class scores, we still end up with many overlapping boxes.
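A minimal greedy NMS sketch in plain Python: sort the boxes by score, keep the best one, discard every remaining box whose IoU (intersection over union) with it exceeds a threshold, and repeat. The default threshold of 0.7 is the value the Faster R-CNN paper reports for RPN proposals; treat it, the (x1, y1, x2, y2) box format and the helper names as assumptions here.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.7):
    """Greedy non-maximum suppression; returns indices of kept boxes, best first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


if __name__ == "__main__":
    boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 300, 300)]
    scores = [0.9, 0.8, 0.75]
    print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 heavily (IoU ~0.92)
```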
#RPN #RegionalProposalNetwork #Faster-Rcnn #R-cnn #RCNN #PifordTechnologies #AI #ArtificialIntelligence #DeepLearning #ConvolutionalNeuralNetwork #CNN #ComputerVision

Comments: 80

@justinamichael316 11 months ago

Clear, with loads of information. Thank you.

@CodeWithAarohi 11 months ago

Glad it was helpful!

@SimbarasheWilliamMutyambizi a month ago

Wonderful explanation. Thank you so much.

@CodeWithAarohi a month ago

I'm glad you found it helpful!

@billiemage890 3 years ago

There isn't much content about Faster R-CNN on YouTube. Your explanations are very good and help me a lot, thanks.

@CodeWithAarohi 3 years ago

Glad my video helped.

@omrastogi5258 4 years ago

Great work, thanks for this video.

@CodeWithAarohi 4 years ago

Welcome

@moekhan5158 3 years ago

Good explanation. Thank you

@CodeWithAarohi 3 years ago

You are welcome!

@shanmugasundari9729 15 days ago

Yes ma'am, your videos are insightful. Thank you for sharing your knowledge. I have one doubt, ma'am: in the RPN, how does bounding box regression happen? How does it compare the bounding box information of the original image with the feature map? The coordinates of the feature map are different from those of the original image, right? Thanks in advance, ma'am ☺

@ss-dy1tw 3 years ago

You're simply amazing. Very simple explanation. Your videos are really good for quick understanding and for refreshing all the concepts quickly. I am pretty sure you will reach great heights in your career. You rock, Aarohi.

@CodeWithAarohi 3 years ago

Glad my videos are helpful, and thanks for appreciating my work. Comments like this motivate me to work even harder.

@mondal1839 4 years ago

Friendly explanation. Got a plethora of information for my undergraduate thesis.

@CodeWithAarohi 4 years ago

Glad my video is helpful.

@ashadahmad6493 4 years ago

Hi, the explanation is good. Thank you so much for making this video.

@CodeWithAarohi 4 years ago

You are welcome

@nag9502 2 years ago

Your voice is very low from video No. 8 onward, but all the videos are very useful. Thank you very much.

@CodeWithAarohi 2 years ago

Sorry about that. I fixed it in my recent videos.

@prasantas2195 3 years ago

It's a superb one, Aarohi. Apart from some audio issues in your upload, it's all good. You have amazing clarity of thought. Thanks a lot.

@CodeWithAarohi 3 years ago

Thank you for appreciating my work and for your feedback on the audio issue.

@saravananbaburao3041 4 years ago

Awesome explanation👍👍 Becoming a fan of your lectures 👍👍

@CodeWithAarohi 4 years ago

Thank you 😊

@surflaweb 4 years ago

Perfect explanation!

@CodeWithAarohi 4 years ago

Thank you

@ziaulhassan9042 2 years ago

Your videos are amazing, love from Pakistan. Please just try to improve the audio.

@CodeWithAarohi 2 years ago

Glad my videos are helpful. Yes, I have improved the audio now 😊

@vinothbose 4 years ago

Excellent explanation

@CodeWithAarohi 4 years ago

Thanks

@israrahmed7951 4 years ago

@@CodeWithAarohi Ma'am, I have emailed you my sample dataset; please look into it. Thank you.

@yagmuraktas2423 2 years ago

Hello, thank you for this clear and simple explanation! I still have a question and I can't find the answer anywhere. It is said that anchor boxes are generated with fixed scale and aspect-ratio factors, say 3 aspect ratios (1:1, 1:2, 2:1) and 3 scales (128, 256 and 512) for each anchor point, giving 9 anchor boxes. I can't understand where these scales and aspect ratios are applied. We just apply a 3x3 convolution, which gives us anchor boxes, and then apply parallel CNNs for the objectness score and the box location. How do we actually apply this box size/scale condition and obtain the locations? I can't get it.

@himanshumangoli6708 2 years ago

In the first lecture on Faster R-CNN you said that we apply anchor boxes of various sizes. So after applying the VGG CNN, we apply a convolution? My question is: at which point do we apply anchor boxes to the image: after the convolution that follows the VGG CNN, or somewhere else?

@samratsinghrathore7240 3 years ago

From the image it seems that classification and regression happen in parallel. How does that work? If there is no object in the image, what does the regression do? Where would the CNN performing regression draw the bounding box? In short, it seems these classification and regression layers must collaborate, but the model diagram shows them working in parallel, so how do they collaborate?

@krishnakrish9658 2 years ago

I have a doubt, can you please answer: how does R-CNN work for plant leaf disease detection?

@CodeWithAarohi 2 years ago

The process is similar. Just provide your plant dataset to the algorithm as input.

@krishnakrish9658 2 years ago

@@CodeWithAarohi I mean, on what basis does R-CNN decide the particular regions? How can I message you to clarify my doubts? I have many doubts about this, please help me.

@namandalsania 2 years ago

Can we use any other algorithm in place of VGG in the pretrained CNN step?

@CodeWithAarohi 2 years ago

Yes

@ThiruSankaravelu 4 years ago

Thanks a lot

@CodeWithAarohi 4 years ago

Happy to help

@shireeshkumar6631 2 years ago

@Code With Aarohi, can we use any model other than VGG here, say ResNet or something else?

@CodeWithAarohi 2 years ago

Yes

@vijaysreen4912 3 years ago

Hi Aarohi, thank you so much for such a clear explanation. Could you also please help with testing this on new test images?

@CodeWithAarohi 3 years ago

Yes, soon.

@prasaddalavi9683 3 years ago

I have one simple doubt: if the CNN output is a flat 1D array, how does the RPN process it by applying a convolution?

@CodeWithAarohi 3 years ago

The input to the RPN is the feature maps extracted from VGG. Here we are not using VGG's last layers, the ones for which the feature maps need to be flattened into 1D. This VGG is a partial network: say we use only the first 10 layers of VGG; the feature map produced after those 10 layers is given to the RPN for further processing.

@prasaddalavi9683 3 years ago

@@CodeWithAarohi Got it! Thank you for your quick response, Aarohi!

@bluebox6307 3 years ago

In the case of a binary classifier, shouldn't the dimensions of the classifier be Conv, 1*1, 1*9 then?

@loftyTHEOWNER 3 years ago

Yes, that's what I think too, but the paper from Microsoft, my professor and even she explain that, in theory, the classification part has half as many channels as the regression part. I can't find an explanation for that.
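One way to reconcile the two channel counts: the Faster R-CNN paper describes the cls layer as a two-class softmax (object vs. not object) with 2k outputs per location, against 4k regression outputs, and mentions logistic regression with k outputs as an alternative. The two are equivalent, because a two-way softmax is just a sigmoid of the difference of its two logits. A plain-Python sketch with hypothetical helper names, assuming k = 9 anchors:

```python
import math


def softmax2(bg_logit: float, fg_logit: float) -> float:
    """Foreground probability from a two-class softmax (2 outputs per anchor)."""
    m = max(bg_logit, fg_logit)  # subtract the max for numerical stability
    e_bg = math.exp(bg_logit - m)
    e_fg = math.exp(fg_logit - m)
    return e_fg / (e_bg + e_fg)


def sigmoid(z: float) -> float:
    """Foreground probability from a single logistic output (1 output per anchor)."""
    return 1.0 / (1.0 + math.exp(-z))


def head_channels(k: int, use_softmax: bool) -> tuple[int, int]:
    """(cls channels, reg channels) of the RPN head for k anchors per location."""
    cls = 2 * k if use_softmax else k
    return cls, 4 * k


if __name__ == "__main__":
    print(head_channels(9, use_softmax=True))   # (18, 36): the paper's 2k / 4k layout
    print(head_channels(9, use_softmax=False))  # (9, 36): the sigmoid variant
    # Same probability either way: softmax over (bg, fg) == sigmoid(fg - bg).
    bg, fg = 0.3, 1.7
    print(softmax2(bg, fg), sigmoid(fg - bg))
```

So "2 * 9" and "1 * 9" classifier heads carry the same information; the softmax version simply spends one extra output per anchor on the "not object" logit.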

@danaali1710 3 years ago

Hi, thank you, please answer my question. If my project involves detecting only one class of object, should I annotate my images and create bounding boxes?

@CodeWithAarohi 3 years ago

Yes

@DeepakSingh-le6di 3 years ago

x_class will be 9*14*14. These are feature maps, so how are you predicting classes from them? Similarly for x_regr, which is 36*14*14. Can you explain this?

@shivamkumar-qp1jm 2 years ago

I think she should use a pooling layer but didn't mention it here.
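A hedged sketch of how such maps can be read without any pooling: each spatial position (y, x) of the 14x14 map is one feature-map cell, and each of the 9 channels of x_class holds the objectness score of one of the 9 anchors centred on that cell, so the map encodes 14 * 14 * 9 = 1764 separate anchor scores. x_regr does the same with 4 regression values per anchor. The channels-first layout and the grouping "channels 4a..4a+3 belong to anchor a" are assumptions; Keras code typically uses channels-last tensors and implementations may order channels differently. The helper names are hypothetical.

```python
def flatten_rpn_outputs(x_class, x_regr, num_anchors=9):
    """Turn channels-first RPN maps into one record per anchor.

    x_class: nested list of shape (num_anchors, H, W), objectness per anchor.
    x_regr:  nested list of shape (4 * num_anchors, H, W); ASSUMPTION: channels
             4*a .. 4*a+3 hold (tx, ty, tw, th) for anchor a.
    Returns a list of (y, x, anchor_index, score, (tx, ty, tw, th)).
    """
    height, width = len(x_class[0]), len(x_class[0][0])
    records = []
    for y in range(height):
        for x in range(width):
            for a in range(num_anchors):
                score = x_class[a][y][x]
                deltas = tuple(x_regr[4 * a + i][y][x] for i in range(4))
                records.append((y, x, a, score, deltas))
    return records


def zeros(c, h, w):
    """Hypothetical helper: a (c, h, w) nested list filled with 0.0."""
    return [[[0.0] * w for _ in range(h)] for _ in range(c)]


if __name__ == "__main__":
    k, H, W = 9, 14, 14
    x_class, x_regr = zeros(k, H, W), zeros(4 * k, H, W)
    x_class[4][7][3] = 0.95  # pretend anchor 4 at cell (7, 3) looks like an object
    recs = flatten_rpn_outputs(x_class, x_regr, k)
    print(len(recs))  # 1764 = 14 * 14 * 9 anchors, each with its own score
    best = max(recs, key=lambda r: r[3])
    print(best[:4])   # (7, 3, 4, 0.95)
```

The highest-scoring records are then decoded into boxes and passed through NMS to form the proposals.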

@wenbofeng4516 3 years ago

I can't find another explanation that makes more sense to me! Thank you! One more question: why do we use the feature map of an image as the input to the RPN, rather than the image itself? How has it been shown to perform better? Thank you!

@CodeWithAarohi 3 years ago

We give feature maps instead of the whole image because it is a better idea to work on the selected parts of the image where objects are present instead of processing the whole image.

@anig8298 3 years ago

Feature maps come from the base network, in this case a pretrained VGG network. VGG weights are already trained on the ImageNet dataset, which has millions of images in 1000 classes. So these pretrained weights already respond to the most frequently occurring shapes in images, such as circles, ovals, triangles, lines and other geometric features. Faster R-CNN leverages this and proposes regions of the feature map containing such shapes as interesting regions where an object of interest is most likely present. In short, pretrained weights can filter out unnecessary parts of the input and reduce further processing...

@abdelrahmanabdelhadi4195 3 years ago

Hello, this is an amazing video. I have one question, though: why didn't you use a pretrained VGG instead of building one from scratch?

@CodeWithAarohi 3 years ago

Glad you liked my video. There is no specific reason for choosing this VGG instead of a pretrained VGG; you can use a pretrained VGG network.

@pretish97 3 years ago

Hello ma'am, great video, thanks for that. Could you please explain how to train Faster R-CNN with the VGG model and test new images on it?

@CodeWithAarohi 3 years ago

Okay, sure.

@RajKumar-jk9dm 2 years ago

The volume of this lecture is very, very low; please keep this in mind while recording other lectures.

@CodeWithAarohi 2 years ago

I took care of the low volume after this video, but thanks for sharing.

@jaysoni7812 3 years ago

Hello ma'am, your explanation is pretty amazing, but I still have 3 questions. 1) Why do you use a partial VGG function? A VGG16 model with weights is already available in the keras.applications library. 2) In some other Faster R-CNN tutorials I have seen people use a COCO-pretrained model from TensorFlow Object Detection. I understand your RPN network, which you wrote yourself, but I'm confused between your code and the pretrained model. From ANNs to CNNs we used to write the code and neurons ourselves, so why does this COCO model come into the picture of object detection, and what is its main use? 3) How can we test and save this Faster R-CNN model for future use or for creating an API?

@CodeWithAarohi 3 years ago

Hi, answer to your 1st question: you can use the trained VGG16 model that is already available, but I am using this VGG here to make the concept clear, in case anyone wants to use their own VGG16 network. "Partial VGG16" simply means that I want to pick the features from the middle of the CNN. If you want, you can pick features from the last layer of VGG16 instead. Answer to your 2nd question: everyone writes code in their own way. Use the COCO model if you want to build an object detector for the categories available in the COCO dataset; otherwise there is no need for it. 3) To save the model, use model.save('path/to/location'). To load the saved model, use from tensorflow import keras, then model = keras.models.load_model('path/to/location').

@jaysoni7812 3 years ago

@@CodeWithAarohi Okay, I got it. In a real-world scenario, which would you prefer: the trained VGG16 or the partial VGG16? Also, I'm confused about the COCO model: can you please tell me why people use it? I think it's better to train our own model, because everyone has a different dataset. What would you prefer in a real-world scenario?

@loftyTHEOWNER 3 years ago

@@jaysoni7812 COCO is good at segmenting different parts of the image and classifying them too. But when 2 dogs, for example, are close together, COCO identifies them as a single region. To separate these segments we use bounding boxes, so we can then color the 2 dogs with different colors.

@sahhaf1234 4 years ago

Hi, the explanation is good, but it seems the code has 17 layers, not 16. Also, is the code written in TensorFlow?

@CodeWithAarohi 4 years ago

The code is written in TensorFlow.

@sahhaf1234 4 years ago

@@CodeWithAarohi Thanks, Aarohi. You are great.

@CodeWithAarohi 4 years ago

@@sahhaf1234 Welcome

@tugbaevci9737 3 years ago

Please turn on auto-generated subtitles.

@sruthikuriakose7711 3 years ago

Thank you! I just have one issue: the audio seems too low (only for this video in the Faster R-CNN playlist), and there aren't any transcripts. Do you have a blog or any useful links I could refer to?

@CodeWithAarohi 3 years ago

Sorry about that... I don't have a blog.

@ArunKumar-sg6jf 4 years ago

How do you decide the number of neurons in the conv layers, and by which method?

@CodeWithAarohi 4 years ago

I am not deciding the number of neurons myself. We are using the numbers given in the Faster R-CNN paper.

@anig8298 3 years ago

It is mostly decided by experimentation, balancing the trade-off between accuracy and time (model complexity).

@toonepali9814 4 years ago

Thank you for the explanation. Subscribed. Can you help with my confusion, though? In the "red box" for the RPN, the bottom conv layer is used to predict whether there is an object or not, and I believe that once it finds the object it applies the bounding box (hence 2*9 in that layer). If so, why do we need the regressor step to adjust the bounding box?

@CodeWithAarohi 4 years ago

The bottom conv layer (the classifier) gives the probability that an object is present. The regression layer is for the box coordinates.

@loftyTHEOWNER 3 years ago

@@CodeWithAarohi You are not answering the question, though. @TooNepali, do you think so? That the second dimension in the classification selects the bbox? But then it looks weird, because each pixel of the previous activation map gets a 2x9 output, in which I put a classification score in the first column (1 for each anchor), and I can't understand the use of the second column.
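On the regression questions in this thread (and the earlier one about feature-map vs. image coordinates): in the Faster R-CNN paper the regression branch does not output absolute coordinates. For each anchor it predicts four offsets (tx, ty, tw, th) relative to that anchor, whose centre and size are already in image pixels, so the box never has to be converted from feature-map coordinates. The classifier decides which anchors are worth keeping; the regressor nudges those anchors toward the object. During training the paper multiplies the regression loss by the anchor's ground-truth label, so regression is only learned on anchors labelled as objects. The sketch below implements this parameterization in plain Python; the function names and the (cx, cy, w, h) box format are assumptions for illustration.

```python
import math


def encode(anchor, box):
    """Regression targets (tx, ty, tw, th) for a ground-truth box w.r.t. an anchor.

    Both arguments are (cx, cy, w, h) in original-image pixels.
    """
    xa, ya, wa, ha = anchor
    x, y, w, h = box
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))


def decode(anchor, deltas):
    """Apply predicted offsets to an anchor; returns a (cx, cy, w, h) box."""
    xa, ya, wa, ha = anchor
    tx, ty, tw, th = deltas
    return (xa + tx * wa, ya + ty * ha, wa * math.exp(tw), ha * math.exp(th))


if __name__ == "__main__":
    anchor = (8.0, 8.0, 256.0, 256.0)  # 256x256 anchor centred on feature cell (0, 0)
    gt = (40.0, 24.0, 200.0, 300.0)    # hypothetical ground-truth box
    t = encode(anchor, gt)
    print([round(v, 4) for v in t])
    print([round(v, 4) for v in decode(anchor, t)])  # recovers the ground-truth box
```

Predicted offsets of zero leave the anchor unchanged, which is why an anchor with no nearby object is simply discarded by its low objectness score rather than "drawn" somewhere by the regressor.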