YOLOv2 (YOLO9000) and YOLOv3 Explained (53:49)
Faster R-CNN PyTorch Implementation (1:07:36)
Fast R-CNN Explained | ROI Pooling (33:10)
R-CNN Explained (33:56)
10 months ago

Comments
@alivecoding4995 2 hours ago
:) thanks!!!
@alivecoding4995 2 hours ago
Does YOLOv7 have a similar architecture?
@Explaining-AI an hour ago
Hello, there are some similarities in terms of the presence of pyramid pooling and top-down/bottom-up pathways, but the designs of those blocks are quite different. Also, YOLOv7 uses E-ELAN rather than CSP residual blocks. If you are interested, do take a look at this paper: arxiv.org/pdf/2304.00501. It provides the highlights and changes across all the different YOLO versions. For YOLOv7, refer to Figure 16 (page 21).
@Coldgpu 22 hours ago
yoyo
@ansalrobinson a day ago
Can you please share the prediction code?
@Explaining-AI 2 hours ago
Hello, the repo (github.com/explainingai-code/SSD-PyTorch/blob/main/tools/infer.py) has prediction as well as evaluation code.
@Bioinforere99 a day ago
Best explanation of Denoising Diffusion Probabilistic Models!
@jeffmacleod5194 3 days ago
So much great info in 8 minutes. Thank you so much!
@Explaining-AI 3 days ago
Thank you for the appreciation :)
@moiirani8827 4 days ago
The quality of the writing is too poor to see the equations.
@坨坨王 5 days ago
Thank you so much, this video is very helpful to me. You are very generous.
@Explaining-AI 4 days ago
I'm happy that the video ended up being of help to you :)
@davidjennicson5614 6 days ago
Hey, I am a new subscriber. Can you explain the implementations of LayoutLMv3 and UDOP and help with implementing them from scratch?
@nickbakker9747 7 days ago
Can you explain why there is a break on line 114 in train_torchvision_frcnn.py? It looks like, because of the break, it will only use one batch and then break out of the epoch. I really like your videos, thanks!
@Explaining-AI 7 days ago
The only explanation is my oversight :D I must have been debugging something before pushing the code and ended up forgetting to remove the 'break' at the end. My apologies for the confusion, and thank you so much for pointing it out. I have fixed it now in the repo.
@luisangeld9894 10 days ago
I am trying to use this model to train on COCO, but I am having issues using it. It seems the model is very much structured to be trained on PASCAL VOC. Any idea how I can adapt it to COCO? Great video.
@Explaining-AI 3 days ago
Hello, apologies for the late reply. I think the model should work once you set the right number of classes, but you would need changes in the dataset class. If you are still facing problems after making the dataset class changes (or if you need help with that), please open an issue on the repo and I can try to help resolve it.
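For reference, a hedged sketch of pulling COCO boxes and labels into the kind of (boxes, labels) form a VOC-style dataset class typically returns; the paths are placeholders and the exact target format the repo's dataset class expects is an assumption:

```python
import torch
from torchvision.datasets import CocoDetection

# placeholder paths; point these at your COCO download
ds = CocoDetection(root='coco/train2017',
                   annFile='coco/annotations/instances_train2017.json')
img, anns = ds[0]  # PIL image and a list of COCO annotation dicts

# COCO stores boxes as [x, y, w, h]; convert to [x1, y1, x2, y2]
boxes = torch.tensor([[x, y, x + w, y + h]
                      for x, y, w, h in (a['bbox'] for a in anns)])
# Note: COCO category ids are non-contiguous (1-90), so you may need to
# remap them to a contiguous 0..num_classes-1 range for the classifier head.
labels = torch.tensor([a['category_id'] for a in anns])
```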
@alivecoding4995 12 days ago
If you look at the latent space images at 37:26, you cannot believe the decoder can regenerate the original image from them, as there is simply a lot of missing information. Any explanation of how it does it? At first I thought this was due to original image information leaking through skip-connections between the down and up blocks, but we are not using those in the auto-encoder.
@frimlinso1894 12 days ago
In the sampling algorithm (algo 2), I don't understand why we have to add noise z back in. Can anyone explain this to me?
@Explaining-AI 9 days ago
In the reverse process, at each time step we have a distribution P(x_{t-1}|x_t), which is a Gaussian N(mu_{t-1}, sigma). We use the prediction of the noise at each timestep to compute the predicted mean, mu_theta. The adding-noise part is actually the reparameterization trick for sampling from the predicted P(x_{t-1}) distribution: we sample a random noise z, scale it by sigma, and shift it by the mean of this predicted distribution. Also, if we straightaway used mu_theta (so always returned the mean instead of sampling from P(x_{t-1}) via the reparameterization trick), then the entire reverse process would end up being deterministic.
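To make this concrete, here is a minimal sketch of one reverse step under the reparameterization trick; `noise_model` and the schedule values are toy placeholders, not the repo's actual code:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)        # toy linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def noise_model(x_t, t):                      # placeholder for the trained eps_theta network
    return torch.zeros_like(x_t)

def reverse_step(x_t, t):
    eps = noise_model(x_t, t)
    # predicted mean mu_theta of P(x_{t-1} | x_t)
    mean = (x_t - betas[t] * eps / torch.sqrt(1 - alpha_bars[t])) / torch.sqrt(alphas[t])
    if t == 0:
        return mean                           # final step: return the mean directly
    z = torch.randn_like(x_t)                 # reparameterization: z ~ N(0, I)
    sigma = torch.sqrt(betas[t])              # one common choice of variance
    return mean + sigma * z                   # scale by sigma, shift by the predicted mean

x = torch.randn(1, 3, 32, 32)                 # start from pure noise x_T
for t in reversed(range(T)):
    x = reverse_step(x, t)
```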
@frimlinso1894 8 days ago
@Explaining-AI that makes sense, thank you very much!
@Martingrossman78 16 days ago
Hi, great explanation of R-CNN with very useful insights which are often skipped. I am especially grateful for the answers to questions like "Why SVM?" and "Why different IoU thresholds?".
@Explaining-AI 15 days ago
Happy that you found the explanation helpful!
@rannvijaysingh1 16 days ago
Brother, so good. Keep it up!
@arizmohammadi5354 17 days ago
Can we train on our dataset with VQ-GAN (without the transformer) and then use it in train_ddpm_vqve?
@Explaining-AI 15 days ago
Hello, I might be misunderstanding your question, so do let me know if that's the case. But for stable diffusion we don't need the transformer. In the repo (github.com/explainingai-code/StableDiffusion-PyTorch?tab=readme-ov-file#training), what you are mentioning is exactly what I implemented: train VQVAE + PerceptualLoss + Discriminator (the same as VQ-GAN without the transformer, which is only needed for generating new latent images) on a dataset. Once the auto-encoder part of the VQ-GAN is trained, we save the latent representations for all the training images using the trained encoder of the VQ-GAN. Finally, we use these latent representations to train the LDM. We don't need to train the transformer, as that's for generating new latent images, for which we are using the DDPM.
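As a toy, self-contained illustration of this two-stage recipe (tiny stand-in modules, not the repo's actual classes; the quantization, perceptual, and discriminator losses are omitted for brevity):

```python
import torch
import torch.nn as nn

enc = nn.Conv2d(3, 4, kernel_size=8, stride=8)           # toy encoder: image -> latent
dec = nn.ConvTranspose2d(4, 3, kernel_size=8, stride=8)  # toy decoder: latent -> image
eps_model = nn.Conv2d(4, 4, 3, padding=1)                # stand-in for the U-Net noise predictor
images = torch.rand(16, 3, 64, 64)                       # placeholder dataset

# Stage 1: train the auto-encoder part
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
for _ in range(10):
    loss = nn.functional.mse_loss(dec(enc(images)), images)
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: cache latents with the frozen encoder, then train the DDPM on them
with torch.no_grad():
    latents = enc(images)
T = 1000
alpha_bars = torch.cumprod(1 - torch.linspace(1e-4, 0.02, T), dim=0)
opt2 = torch.optim.Adam(eps_model.parameters(), lr=1e-3)
for _ in range(10):
    t = torch.randint(0, T, (latents.size(0),))
    ab = alpha_bars[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(latents)
    noisy = torch.sqrt(ab) * latents + torch.sqrt(1 - ab) * noise
    # a real noise predictor would also condition on t
    loss = nn.functional.mse_loss(eps_model(noisy), noise)
    opt2.zero_grad(); loss.backward(); opt2.step()
```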
@arizmohammadi5354 11 days ago
@Explaining-AI Thank you so much for your kind reply. Yeah, you are right.
@maximestudio2513 18 days ago
Can you explain to us how the Multi-view Diffusion base model works, please?
@Explaining-AI 15 days ago
Hello, I have added this to my list. But since I am not familiar with it as of now, it will take me some time to cover it.
@maximestudio2513 15 days ago
@Explaining-AI thank you, you are amazing
@hammadkhan7927 19 days ago
Can you please share the notes for all the object detection videos?
@yoverale 20 days ago
Just what I needed. Thanks 🙏🏻
@hoangduong5954 21 days ago
Please answer me: how do they train a CLASS-SPECIFIC bounding box regressor? Do they feed the class as an input to one model and regress the bounding box, or do they build multiple models (if they detect 6 classes, they build 6 models) and train each model as a regressor for a specific class? Please answer me.
@Explaining-AI 21 days ago
Hello, I have tried to explain this a bit; do let me know if it does not clarify everything for you. This is how the official R-CNN repo does it: we create as many box regressor models as there are classes, and then train each of these regressors separately using the proposals assigned to the respective class. github.com/rbgirshick/rcnn/blob/master/bbox_regression/rcnn_train_bbox_regressor.m#L76 During inference, given the predicted class for a proposal, we use the trained regressor for that class to modify the proposal. github.com/rbgirshick/rcnn/blob/master/bbox_regression/rcnn_test_bbox_regressor.m#L58-L65
@Explaining-AI 21 days ago
Btw, you could also do this with one FC layer. Let's say you have 10 classes. Then your bounding box regressor FC layer predicts 10 × 4 = 40 values; these are tx, ty, tw, th for all 10 classes. During training, the bounding box regression loss is computed between the ground truth transformation targets and the predicted values at the indexes corresponding to the ground truth class. At inference, you take the class index with the highest predicted probability; the predicted tx, ty, tw, th are then the 4 values (out of the 40) corresponding to this most probable class.
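A minimal sketch of this single-FC-layer variant (toy tensors and hypothetical names, not the code from any specific repo):

```python
import torch
import torch.nn as nn

num_classes, feat_dim, n = 10, 256, 8
bbox_head = nn.Linear(feat_dim, num_classes * 4)     # 4 box deltas per class

feats = torch.randn(n, feat_dim)                     # per-proposal features (toy)
gt_classes = torch.randint(0, num_classes, (n,))     # assigned GT class per proposal
gt_deltas = torch.randn(n, 4)                        # targets tx, ty, tw, th

deltas = bbox_head(feats).view(n, num_classes, 4)

# Training: compute the loss only on the slice for the ground-truth class
pred_for_gt = deltas[torch.arange(n), gt_classes]    # (n, 4)
loss = nn.functional.smooth_l1_loss(pred_for_gt, gt_deltas)

# Inference: pick the slice for the highest-scoring class
cls_scores = torch.randn(n, num_classes)             # stand-in classifier output
pred_deltas = deltas[torch.arange(n), cls_scores.argmax(dim=1)]
```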
@hoangduong5954 20 days ago
@Explaining-AI thank you a lot!!!! I fully understand it now. So they do train multiple models and choose the model based on the class. That's crazy though!
@adeirman2705 21 days ago
Please create a YOLO panoptic video, sir. It would be a huge help and it has so many applications.
@Explaining-AI 21 days ago
Added this to my list. Will try to get to this as soon as I can.
@arizmohammadi5354 22 days ago
It was great! Good luck!
@BlueQuantum 22 days ago
Why is only Gaussian noise added, and not Rician, Laplacian, etc.? There are so many other probability distributions.
@Explaining-AI 21 days ago
Hello, I have replied to something similar here (highlighted comment): kzbin.info/www/bejne/fmWYnXlqqLqan6c&lc=Ugznn1UksOPa3NfWLXR4AaABAg
@Kamalsai369 22 days ago
Bro, no one on this platform explains as clearly as you. Thank you for providing these lectures free of cost. I think even in paid courses no one can explain this much. Thank you again!
@Explaining-AI 21 days ago
Really happy that you found the explanation helpful :)
@rishidixit7939 24 days ago
Subscribed
@sartq_333 24 days ago
One of the finest videos on YOLO available on the internet. It contains an intuitive as well as a detailed explanation (right from the research paper). Concepts like these are hard to explain in so much detail. Thanks a lot for the amazing work, cheers!
@Explaining-AI 21 days ago
Thank you for this comment :)
@Mahan_Veisi a month ago
Fantastic video! You’re undoubtedly on your way to becoming one of the top lecturers in Generative AI. I’m excited to see more of your work in the future!
@Explaining-AI a month ago
Thank you so much for your words of encouragement and support :)
@Andrey41k a month ago
Thank you very much for the video, it is very interesting. Though I have one question, about the 15:40 timestamp: you mention that there may be a situation where a ground truth box doesn't have a big IoU with any of the anchor boxes. How do we pick these anchor boxes? (I just can't figure out which methodology we have to follow when picking the dimensions of the anchor boxes.)
@Explaining-AI a month ago
Thank You! For Faster R-CNN, that is the reason why we add low-overlap anchor boxes as well (if they are indeed the best anchor boxes available). Here the authors did not tune the anchor box selection for the dataset at all; they just pick a set which captures a large enough variation in terms of scale and aspect ratio. Models like YOLOv2 keep the anchor box strategy but use k-means to pick the best anchor boxes. Once you run k-means on your ground truth box dimensions, you end up with cluster centres that are good representatives of the box dimensions in your dataset. These cluster centres then become a good choice for your anchor boxes' widths and heights.
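A minimal sketch of the k-means anchor selection on random stand-in data (note that YOLOv2 actually uses an IoU-based distance; plain Euclidean k-means on (width, height) is shown here for simplicity):

```python
import numpy as np
from sklearn.cluster import KMeans

wh = np.random.rand(500, 2) * 300        # stand-in ground-truth (width, height) pairs

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(wh)
anchors = kmeans.cluster_centers_        # 5 representative (width, height) anchors
print(np.round(anchors, 1))
```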
@alivecoding4995 a month ago
How does self-attention work in convnets (instead of transformers)? 😊
@Explaining-AI a month ago
After a reshape of the input, the self-attention works exactly the same as in transformers. Assume you have a BxCxHxW feature map at a certain stage of the network. During self-attention you reshape it into Bx(H*W)xC. Now it becomes very similar to how you would have seen it in transformers: H*W is the number of grid cells (tokens) and C is the embedding dimension of each token. We just compute attention between all spatial grid cells.
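A minimal sketch of this reshape-then-attend idea using PyTorch's built-in MultiheadAttention (the video's own implementation may differ in detail):

```python
import torch
import torch.nn as nn

B, C, H, W = 2, 64, 16, 16
x = torch.randn(B, C, H, W)                   # conv feature map

attn = nn.MultiheadAttention(embed_dim=C, num_heads=4, batch_first=True)

tokens = x.flatten(2).transpose(1, 2)         # B x (H*W) x C: grid cells as tokens
out, _ = attn(tokens, tokens, tokens)         # self-attention over all spatial cells
out = out.transpose(1, 2).view(B, C, H, W)    # back to the conv feature map layout
```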
@alivecoding4995 29 days ago
Thank you 😊
@robbegeusens1302 a month ago
Great video, but why do you add 1e-6 when calculating your IoU?
@Explaining-AI 29 days ago
Thank You! That is just to ensure the IoU method never ends up doing a division by 0, say in some degenerate case where the bounding box area is zero (for both the GT and the prediction). It just makes the IoU computation numerically stable no matter what the predicted and ground truth boxes are.
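An illustrative IoU sketch (not the repo's exact code) showing where such an epsilon goes:

```python
import torch

def iou(box1, box2, eps=1e-6):
    # boxes are (x1, y1, x2, y2)
    x1 = torch.max(box1[..., 0], box2[..., 0])
    y1 = torch.max(box1[..., 1], box2[..., 1])
    x2 = torch.min(box1[..., 2], box2[..., 2])
    y2 = torch.min(box1[..., 3], box2[..., 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area1 = (box1[..., 2] - box1[..., 0]) * (box1[..., 3] - box1[..., 1])
    area2 = (box2[..., 2] - box2[..., 0]) * (box2[..., 3] - box2[..., 1])
    # eps keeps the division finite even for degenerate zero-area boxes
    return inter / (area1 + area2 - inter + eps)

print(iou(torch.tensor([0., 0., 10., 10.]), torch.tensor([5., 5., 15., 15.])))  # ~0.1429
```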
@raihanpahlevi6870 a month ago
What do you mean by top-k proposals = 2000? Is it that from a single image we take 2000 proposals?
@Explaining-AI 29 days ago
Hello @raihanpahlevi6870, yes that's correct. 2000 proposals are taken from a single image.
@yusuphajuwara1490 a month ago
Thanks for this wonderfully intuitive video! It provided a fantastic breakdown of the fundamentals of diffusion models. Let me try to answer your question about why the reverse process in diffusion models is also a (reverse) diffusion with Gaussian transitions.

1. Forward diffusion introduces noise gradually. Remember the β term? In the forward process, β is chosen to be very small (close to 0). This ensures that Gaussian noise is added gradually to the data over many steps. Each step introduces only a tiny amount of noise, meaning the transition from the original image to pure noise happens slowly and smoothly. This gradual noise addition is crucial because it preserves the structure of the data for longer, making it easier for the reverse process to reconstruct high-quality images. If we added large amounts of noise in one go, as in VAEs, the original structure would be harder to recover, leading to blurrier reconstructions.

2. Reverse diffusion needs "Gaussian-like" inputs. The forward process only involves adding isotropic Gaussian noise at each step, so the model learns to work with samples that are progressively noised in a Gaussian way. However, in the reverse process, when the model predicts the noise at each step, the resulting sample isn't guaranteed to remain Gaussian-like. To fix this, after subtracting the model's predicted noise, we add a small Gaussian noise with a carefully chosen variance. This step helps "Gaussianize" the sample, ensuring it aligns with what the model expects at the next time step. The small added noise smooths out any irregularities and makes the reverse process more stable, resulting in higher-quality outputs.

3. Step-by-step noise removal. The reverse process works by removing noise step by step, moving from pure noise back to a clean image (closer to x0). This gradual approach is crucial because predicting small changes (i.e., removing a little noise at a time) is much easier for the model than trying to reconstruct the clean image in one big jump. This is why diffusion models produce sharper and more realistic images compared to VAEs, where predictions often result in blurry outputs due to the lack of such gradual refinement.
@mariolinovalencia7776 a month ago
Excellent. Complete, clear, and to the point.
@GouravJoshi-z7j a month ago
I am new to this field. Can anyone provide me with the prerequisites for understanding this video?
@Explaining-AI 29 days ago
Hello @GouravJoshi-z7j, I think this list covers the prerequisites:
- Gaussian distribution and its properties (mean/variance of the sum of two independent Gaussians)
- Reparameterization trick
- Maximum likelihood estimation
- Variational lower bound
- Bayes' theorem and conditional independence
- KL divergence, and the KL divergence between two Gaussians
- VAE (because the video assumes knowledge of it, which it perhaps shouldn't)
I may have missed something, so in case there is some aspect of the video that you aren't able to understand even after that, please do let me know.
@alivecoding4995 a month ago
Thank you very much!!! Great explanation ❤
@Explaining-AI a month ago
Thank You :)
@Mahan_Veisi a month ago
Thank you! It was amazing. While there is limited content available on diffusion models, you did a really nice job. ❤
@Explaining-AI a month ago
Thank you for your kind words :)
@abdulkarimatrash a month ago
Excellent work! Please don't stop :)
@Explaining-AI a month ago
Thank you so much for your support
@comunedipadova1790 a month ago
Please don't use music in the background, it's very distracting, thanks.
@Explaining-AI a month ago
Thank you for the feedback. Have taken care of this in my recent videos.
@hieuaovan7101 a month ago
Nice video with great images that explain things clearly <3
@Explaining-AI a month ago
Thank You :)
@tvwatch9669 a month ago
This is insanely underrated.
@awayxu a month ago
So why is the reverse process also a diffusion process with the same Gaussian form? Does anyone know? 😢
@drakkhein a month ago
Any updates on the idea of generating videos with longer frames?
@Explaining-AI a month ago
Hello, I am currently working on a couple of object detection videos. After that, I will be creating one which covers techniques for longer frame generation, interpolation, and other things.
@kijudaa8781 a month ago
It is the best lecture on object detection and metrics that I have ever seen, thanks a lot ❤
@Explaining-AI a month ago
Thank you for your kind words; I am glad that the video ended up being of help to you.
@maximestudio2513 a month ago
Nice job
@noahjob8261 a month ago
Could you please use ResNet as the backbone in the SSD architecture and add the necessary scripts to your GitHub repo?
@princekhunt1 a month ago
Please never stop teaching us.
@princekhunt1 a month ago
Nice tutorials. Keep on creating videos like this; we will support you.
@Explaining-AI a month ago
Thank you for your support :)
@ggg-m8e5c a month ago
What if the training dataset is made of rectangular images?
@princekhunt1 a month ago
Nice tutorial
@princekhunt1 a month ago
Nice tutorial, keep it up!
@Explaining-AI a month ago
Thank You!