Hello, there are some similarities in terms of the presence of pyramid pooling and top-down/bottom-up pathways, but the design of those blocks is quite different. Also, YOLOv7 uses E-ELAN rather than CSP residual blocks. If you are interested then do take a look at this paper - arxiv.org/pdf/2304.00501 . It provides highlights and changes of all the different YOLO versions. For YOLOv7, refer to Figure 16 (Page 21).
@Coldgpu 22 hours ago
yoyo
@ansalrobinson a day ago
Can you please share the prediction code?
@Explaining-AI 2 hours ago
Hello, the repo (github.com/explainingai-code/SSD-PyTorch/blob/main/tools/infer.py) has prediction as well as evaluation code.
@Bioinforere99 a day ago
Best explanation of Denoising Diffusion Probabilistic Models!
@jeffmacleod5194 3 days ago
So much great info in 8 minutes. Thank you so much!
@Explaining-AI 3 days ago
Thank you for the appreciation :)
@moiirani8827 4 days ago
The quality of the writing is too poor to see the equations.
@坨坨王 5 days ago
Thank you so much, this video is very helpful to me. You are very generous.
@Explaining-AI 4 days ago
I'm happy that the video ended up being of help to you :)
@davidjennicson5614 6 days ago
Hey, I am a new subscriber. Can you explain the implementations of LayoutLMv3 and UDOP and help with implementing them from scratch?
@nickbakker9747 7 days ago
Can you explain to me why there is a break on line 114 in train_torchvision_frcnn.py? It now looks like, because of the break, it will only use one batch and then break out of the epoch. I really like your videos, thanks!
@Explaining-AI 7 days ago
The only explanation is my oversight :D I must have been debugging something before pushing the code, and ended up forgetting to remove the 'break' at the end. My apologies for the confusion, and thank you so much for pointing it out. Have fixed it now in the repo.
@luisangeld9894 10 days ago
I am trying to use this model to train on COCO, but I am having issues using it. It seems the model is very much structured to be trained on PASCAL VOC. Any idea how I can adapt it to COCO? Great video!
@Explaining-AI 3 days ago
Hello, apologies for the late reply. I think the model should work once you set the right number of classes, but you would need changes in the dataset class. If you are still facing problems after making the dataset class changes (or if you need help with that), can you please open an issue on the repo and I can try to help resolve it.
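For reference, a minimal sketch of what a COCO dataset class might look like, using pycocotools; the class name, paths, and return format here are illustrative assumptions, not code from the repo:

```python
import os
import torch
from PIL import Image
from pycocotools.coco import COCO

class CocoDetectionDataset(torch.utils.data.Dataset):
    def __init__(self, img_dir, ann_file):
        self.img_dir = img_dir
        self.coco = COCO(ann_file)
        self.img_ids = sorted(self.coco.getImgIds())
        # COCO category ids are non-contiguous; remap them to 1..N
        cat_ids = sorted(self.coco.getCatIds())
        self.cat_to_label = {c: i + 1 for i, c in enumerate(cat_ids)}

    def __len__(self):
        return len(self.img_ids)

    def __getitem__(self, idx):
        img_id = self.img_ids[idx]
        info = self.coco.loadImgs(img_id)[0]
        image = Image.open(os.path.join(self.img_dir, info['file_name'])).convert('RGB')
        anns = self.coco.loadAnns(self.coco.getAnnIds(imgIds=img_id))
        boxes, labels = [], []
        for a in anns:
            x, y, w, h = a['bbox']  # COCO boxes are xywh; convert to x1y1x2y2
            boxes.append([x, y, x + w, y + h])
            labels.append(self.cat_to_label[a['category_id']])
        target = {'boxes': torch.tensor(boxes, dtype=torch.float32),
                  'labels': torch.tensor(labels, dtype=torch.int64)}
        return image, target
```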
@alivecoding4995 12 days ago
If you look at the latent space images at 37:26, you cannot believe the decoder can regenerate the original image from them, as there is simply a lot of missing information. Any explanation on how it does it? First I thought this was due to original image information leaking through skip-connections between the down and up blocks, but we are not using those in the auto-encoder.
@frimlinso1894 12 days ago
In the sampling algorithm (algo 2), I don't understand why we have to add noise z back in. Can anyone explain this to me?
@Explaining-AI 9 days ago
In the reverse process, at each timestep we have a distribution P(x_t-1 | x_t), which is a Gaussian N(mu_t-1, sigma). We use the prediction of noise at each timestep to compute the predicted mean mu_theta. The adding-noise part is actually the reparameterization trick to sample from the predicted P(x_t-1) distribution, which is why we sample a random noise z, scale it by sigma and then shift it by the mean of this predicted distribution. Also, if we straightaway used mu_theta (so always returned the mean instead of using the reparameterization trick to sample from P(x_t-1)), then the entire reverse process would end up being deterministic.
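As a rough sketch (not the repo's exact code), one reverse step of Algorithm 2 could look like this; eps_model and the schedule tensor names are illustrative, and sigma^2 = beta_t is just one common choice of variance:

```python
import torch

@torch.no_grad()
def reverse_step(eps_model, x_t, t, alphas, alphas_cumprod, betas):
    # Predict the noise component present in x_t at timestep t
    eps = eps_model(x_t, torch.tensor([t]))
    # Predicted mean mu_theta of P(x_{t-1} | x_t)
    mean = (x_t - betas[t] * eps / torch.sqrt(1 - alphas_cumprod[t])) / torch.sqrt(alphas[t])
    if t == 0:
        return mean  # no noise is added back at the final step
    # Reparameterization trick: sample from N(mu_theta, sigma^2)
    # by scaling z ~ N(0, I) by sigma and shifting by the mean
    sigma = torch.sqrt(betas[t])
    z = torch.randn_like(x_t)
    return mean + sigma * z
```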
@frimlinso1894 8 days ago
@Explaining-AI that makes sense, thank you very much!
@Martingrossman78 16 days ago
Hi, great explanation of RCNN with very useful insights which are often skipped. I am especially grateful for answering questions like "Why SVM?", "Why different IoU thresholds?", etc.
@Explaining-AI 15 days ago
Happy that you found the explanation helpful!
@rannvijaysingh1 16 days ago
Brother, so good, keep it up!
@arizmohammadi5354 17 days ago
Can we train our dataset with VQ-GAN (without the transformer) and then use it in train_ddpm_vqvae?
@Explaining-AI 15 days ago
Hello, I might be misunderstanding your question, so do let me know if that's the case. But for stable diffusion we don't need the transformer. In the repo (github.com/explainingai-code/StableDiffusion-PyTorch?tab=readme-ov-file#training), what you are mentioning is exactly what I implemented: train VQVAE + PerceptualLoss + Discriminator (same as VQ-GAN without the transformer, which is only needed for generating new latent images) on a dataset. Once the auto-encoder part of VQGAN is trained, we then save the latent representations for all the training images, using the trained encoder of VQGAN. Finally, we use these latent representations to train the LDM. We don't need to train the transformer, as that is for generating new latent images, for which we are using DDPM.
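A minimal sketch of the "save latents" stage might look like this; the encode() interface and its two-value return are assumptions about the model API, not the repo's actual code:

```python
import torch

@torch.no_grad()
def save_latents(vqvae, data_loader, out_path, device='cuda'):
    # Encode every training image with the frozen VQVAE encoder
    # and store the latents for the later LDM training stage.
    vqvae.eval()
    latents = []
    for images, _ in data_loader:
        z, _ = vqvae.encode(images.to(device))  # assumed to return (latents, quantization info)
        latents.append(z.cpu())
    torch.save(torch.cat(latents, dim=0), out_path)
```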
@arizmohammadi5354 11 days ago
@@Explaining-AI Thank you so much for your kind reply. Yeah, you are right.
@maximestudio2513 18 days ago
Can you explain to us how the Multi-view Diffusion Base Model works, please?
@Explaining-AI 15 days ago
Hello, have added this to my list. But since I am not familiar with it as of now, it will take me some time to cover this.
@maximestudio2513 15 days ago
@@Explaining-AI thank you, you are amazing
@hammadkhan7927 19 days ago
Can you please share the notes for all the object detection videos?
@yoverale 20 days ago
Just what I needed. Thanks 🙏🏻
@hoangduong5954 21 days ago
Please answer me: how do they train a CLASS-SPECIFIC bounding box regressor? Do they feed the class as an input to one model that regresses the bounding box, or do they build multiple models (if they detect 6 classes, they build 6 models), with each model trained as a bounding box regressor for a specific class? Please answer me.
@Explaining-AI 21 days ago
Hello, I have tried to explain a bit on this, do let me know if this does not clarify everything for you. This is how the official RCNN repo does it: we create as many box regressor models as there are classes, then we train each of these regressors separately using the proposals assigned to the respective classes - github.com/rbgirshick/rcnn/blob/master/bbox_regression/rcnn_train_bbox_regressor.m#L76 . During inference, given the predicted classes for proposals, we use the trained regressor for that class to modify the proposal - github.com/rbgirshick/rcnn/blob/master/bbox_regression/rcnn_test_bbox_regressor.m#L58-L65
@Explaining-AI 21 days ago
Btw, you could also do this with one fc layer. Let's say you have 10 classes. Then your bounding box regressor fc layer predicts 10 x 4 = 40 values. These are tx, ty, tw, th for all 10 classes. Then during training, the bounding box regression loss is computed between the ground truth transformation targets and the predicted values at the indexes corresponding to the ground truth class. At inference, you take the class index with the highest predicted probability value. The predicted tx, ty, tw, th are then the 4 values (out of the 40) corresponding to this most probable class.
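A minimal PyTorch sketch of that single-fc-layer variant (the dimensions and names here are illustrative):

```python
import torch
import torch.nn as nn

num_classes = 10
feat_dim = 2048  # illustrative per-proposal feature dimension

cls_head = nn.Linear(feat_dim, num_classes)      # class scores
reg_head = nn.Linear(feat_dim, num_classes * 4)  # (tx, ty, tw, th) per class

feats = torch.randn(8, feat_dim)                 # 8 proposals
scores = cls_head(feats)
deltas = reg_head(feats).view(8, num_classes, 4)  # proposals x classes x 4

# Training: take the deltas at the ground-truth class index for the loss
gt_cls = torch.randint(0, num_classes, (8,))
train_deltas = deltas[torch.arange(8), gt_cls]   # 8 x 4, compared to targets

# Inference: take the deltas of the highest-scoring class
pred_cls = scores.argmax(dim=1)
pred_deltas = deltas[torch.arange(8), pred_cls]  # 8 x 4
```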
@hoangduong5954 20 days ago
@@Explaining-AI Thank you a lot!!!! I fully understand it now. So they do train multiple models and choose the model based on class. That's crazy though!
@adeirman2705 21 days ago
Please create a YOLO panoptic video, sir; it would be a huge help and it has so many applications.
@Explaining-AI 21 days ago
Added this to my list. Will try to get to this as soon as I can.
@arizmohammadi5354 22 days ago
It was great! Good luck!
@BlueQuantum 22 days ago
Why is only Gaussian noise added, and not Rician, Laplacian, etc.? There are so many other probability distributions.
@Explaining-AI 21 days ago
Hello, have replied to something similar here (highlighted comment) - kzbin.info/www/bejne/fmWYnXlqqLqan6c&lc=Ugznn1UksOPa3NfWLXR4AaABAg
@Kamalsai369 22 days ago
Bro, no one on this platform has explained things as clearly as you. Thank you for providing these lectures free of cost. I think even in paid courses no one can explain this much. Thank you again!
@Explaining-AI 21 days ago
Really happy that you found the explanation helpful :)
@rishidixit7939 24 days ago
Subscribed
@sartq_333 24 days ago
One of the finest videos on YOLO available on the internet. It contains an intuitive as well as detailed explanation (right from the research paper). Concepts like these are hard to explain in so much detail. Thanks a lot for the amazing work, cheers!
@Explaining-AI 21 days ago
Thank you for this comment :)
@Mahan_Veisi a month ago
Fantastic video! You’re undoubtedly on your way to becoming one of the top lecturers in Generative AI. I’m excited to see more of your work in the future!
@Explaining-AI a month ago
Thank you so much for your words of encouragement and support :)
@Andrey41k a month ago
Thank you very much for the video, it is very interesting. Though, I have one question about the 15:40 timestamp. You mention that there may be a situation where a ground truth box doesn't have a big IoU with any of the anchor boxes. How do we pick these anchor boxes (I just can't get which methodology we have to follow when picking the dimensions for anchor boxes)?
@Explaining-AI a month ago
Thank You! For Faster R-CNN, that is the reason why we add low-overlap anchor boxes as well (if they are indeed the best anchor box available). Here the authors did not tune anchor box selection at all for a dataset; they just pick ones which capture a large enough variation in terms of scale and aspect ratio. In models like YOLOv2, they use the anchor box strategy but use k-means to pick the best anchor boxes. So once you run k-means on your ground truth box dimensions, you end up with cluster centres that are good representatives of the box dimensions in your dataset. These cluster centres then become a good choice for your anchor box widths and heights.
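A rough sketch of that YOLOv2-style anchor selection, clustering ground truth width/height pairs with k-means under a 1 - IoU distance (the function names and the mean-based centroid update are illustrative choices, not any paper's exact code):

```python
import numpy as np

def iou_wh(wh, anchors):
    # IoU between boxes and anchors assuming both share a corner,
    # so only widths/heights matter. wh: (N, 2), anchors: (K, 2)
    inter = np.minimum(wh[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(wh[:, None, 1], anchors[None, :, 1])
    union = (wh[:, 0] * wh[:, 1])[:, None] + anchors[:, 0] * anchors[:, 1] - inter
    return inter / union

def kmeans_anchors(wh, k, iters=100, seed=0):
    wh = np.asarray(wh, dtype=np.float64)
    rng = np.random.default_rng(seed)
    anchors = wh[rng.choice(len(wh), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = np.argmax(iou_wh(wh, anchors), axis=1)  # nearest = highest IoU
        for j in range(k):
            if np.any(assign == j):  # keep old centre if a cluster empties
                anchors[j] = wh[assign == j].mean(axis=0)
    return anchors
```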
@alivecoding4995 a month ago
How does self attention work in convnets (instead of transformers)? 😊
@Explaining-AI a month ago
After a reshape of the input, the self attention works exactly the same as in transformers. Assume you have a B x C x H x W feature map at a certain stage of the network. Then during self attention you reshape it into B x (H*W) x C. Now it becomes very similar to how you would have seen it in transformers: H*W is the number of grid cells (tokens) and C is the embedding dimension of each token. We just compute attention between all spatial grid cells.
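A minimal single-head sketch of that reshape-then-attend idea; the 1x1-conv projections and residual connection are common choices assumed here, not taken from any specific model:

```python
import torch
import torch.nn as nn

class SpatialSelfAttention(nn.Module):
    # Single-head self-attention over the spatial grid cells
    # of a B x C x H x W feature map.
    def __init__(self, channels):
        super().__init__()
        self.qkv = nn.Conv2d(channels, channels * 3, kernel_size=1)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.scale = channels ** -0.5

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)
        # Reshape B x C x H x W -> B x (H*W) x C: tokens are grid cells
        q = q.flatten(2).transpose(1, 2)
        k = k.flatten(2).transpose(1, 2)
        v = v.flatten(2).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + self.proj(out)  # residual connection
```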
@alivecoding4995 29 days ago
@ Thank you 😊
@robbegeusens1302 a month ago
Great video, but why do you add 1E-6 when calculating your IOU?
@Explaining-AI 29 days ago
Thank You! That is just to ensure the iou method never ends up doing a division by 0, like, say, in some degenerate case where the bounding box area is zero (of both gt and prediction). That just makes the iou computation numerically stable no matter what the predicted and ground truth boxes are.
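For illustration, a typical IoU computation with that epsilon guard might look like this (a generic sketch, not the repo's exact code):

```python
def iou(box1, box2, eps=1e-6):
    # Boxes given as (x1, y1, x2, y2)
    x1 = max(box1[0], box2[0])
    y1 = max(box1[1], box2[1])
    x2 = min(box1[2], box2[2])
    y2 = min(box1[3], box2[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    # eps guards against division by zero when both areas are degenerate
    return inter / (area1 + area2 - inter + eps)
```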
@raihanpahlevi6870 a month ago
What do you mean by top-k proposals 2000? Is it that from a single image we take 2000 proposals?
@Explaining-AI 29 days ago
Hello @raihanpahlevi6870, yes, that's correct. 2000 proposals are taken from a single image.
@yusuphajuwara1490 a month ago
Thanks for this wonderfully intuitive video! It provided a fantastic breakdown of the fundamentals of diffusion models. Let me try to answer your question about why the reverse process in diffusion models is also a (reverse) diffusion with Gaussian transitions.

Why Reverse Diffusion Uses Gaussian Transitions

1. Forward Diffusion Introduces Noise Gradually
Remember the β term? In the forward process, β is chosen to be very small (close to 0). This ensures that Gaussian noise is added gradually to the data over many steps. Each step introduces only a tiny amount of noise, meaning the transition from the original image to pure noise happens slowly and smoothly. This gradual noise addition is crucial because it preserves the structure of the data for longer, making it easier for the reverse process to reconstruct high-quality images. If we added large amounts of noise in one go, like in VAEs, the original structure would be harder to recover, leading to blurrier reconstructions.

2. Reverse Diffusion Needs "Gaussian-Like" Inputs
The forward process only involves adding isotropic Gaussian noise at each step. This means the model learns to work with samples that are progressively noised in a Gaussian way. However, in the reverse process, when the model predicts the noise at each step, the resulting sample isn't guaranteed to remain Gaussian-like. To fix this, after subtracting the model's predicted noise, we add a small Gaussian noise with a carefully chosen variance. This step helps "Gaussianize" the sample, ensuring it aligns with what the model expects at the next time step. This small added noise smoothens any irregularities and makes the reverse process more stable, resulting in higher-quality outputs.

Step-by-Step Noise Removal
The reverse process works by removing noise step-by-step, moving from pure noise back to a clean image (closer to x0). This gradual approach is crucial because predicting small changes (i.e., removing a little noise at a time) is much easier for the model than trying to reconstruct the clean image in one big jump. This is why diffusion models produce sharper and more realistic images compared to VAEs, where predictions often result in blurry outputs due to the lack of such gradual refinement.
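The small-β point can be made concrete with a short sketch of the usual DDPM schedule quantities; the linear schedule values below are the common defaults from the paper, used here as an assumption:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)  # small, slowly growing per-step noise
alphas = 1.0 - betas
alphas_cumprod = torch.cumprod(alphas, dim=0)

# Variance of the small Gaussian noise added back at each reverse step:
# the true posterior variance of q(x_{t-1} | x_t, x_0), often called "beta tilde"
alphas_cumprod_prev = torch.cat([torch.ones(1), alphas_cumprod[:-1]])
posterior_variance = betas * (1 - alphas_cumprod_prev) / (1 - alphas_cumprod)
```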
@mariolinovalencia7776 a month ago
Excellent. Complete, clear and to the point
@GouravJoshi-z7j a month ago
I am new to this field. Can anyone provide me with the prerequisites to understand this video?
@Explaining-AI 29 days ago
Hello @GouravJoshi-z7j, I think this list covers the prerequisites:
- Gaussian distribution and its properties (mean/variance of adding two independent Gaussians)
- Reparameterization trick
- Maximum Likelihood Estimation
- Variational Lower Bound
- Bayes theorem, conditional independence
- KL Divergence, KL divergence between two Gaussians
- VAE (because the video incorrectly assumes knowledge about it)
I may have missed something, so in case there is some aspect of the video that you aren't able to understand even after that, please do let me know.
@alivecoding4995 a month ago
Thank you very much!!! Great explanation ❤
@Explaining-AI a month ago
Thank You :)
@Mahan_Veisi a month ago
Thank you! It was amazing. While there is limited content available for diffusion models, you did a pretty nice job. ❤
@Explaining-AI a month ago
Thank you for your kind words :)
@abdulkarimatrash a month ago
Excellent Work ! Please Don't Stop :)
@Explaining-AI a month ago
Thank you so much for your support
@comunedipadova1790 a month ago
Please don't use music in the background, it's very distracting, thanks.
@Explaining-AI a month ago
Thank you for the feedback. Have taken care of this in my recent videos.
@hieuaovan7101 a month ago
Nice video with great images that explain things clearly <3
@Explaining-AI a month ago
Thank You :)
@tvwatch9669 a month ago
This is insanely underrated.
@awayxu a month ago
So why is the reverse process also a diffusion process with the same Gaussian form? Does anyone know? 😢
@drakkhein a month ago
Any updates on the idea of generating videos with longer frames?
@Explaining-AI a month ago
Hello, currently working on a couple of object detection videos. Post that, I will be creating one which includes techniques for longer frame generation, interpolation, and other stuff.
@kijudaa8781 a month ago
It is the best lecture on object detection and metrics that I have ever seen, thanks a lot❤
@Explaining-AI a month ago
Thank you for your kind words and am glad that the video ended up being of help to you.
@maximestudio2513 a month ago
Nice job
@noahjob8261 a month ago
Could you please use ResNet as the backbone in the SSD architecture and add the necessary scripts to your GitHub repo?
@princekhunt1 a month ago
Please never stop teaching us.
@princekhunt1 a month ago
Nice tutorials. Keep on creating videos like this; we will support you.
@Explaining-AI a month ago
Thank you for your support :)
@ggg-m8e5c a month ago
What if the training dataset is made of rectangular images?