Hello, there are some similarities in terms of the presence of pyramid pooling and top-down/bottom-up pathways, but the design of those blocks is quite different. Also, YOLOv7 uses E-ELAN rather than CSP residual blocks. If you are interested then do take a look at this paper - arxiv.org/pdf/2304.00501 . It provides highlights and changes of all the different YOLO versions. For YOLOv7, refer to Figure 16 (Page 21).
@Coldgpu 22 hours ago
yoyo
@ansalrobinson a day ago
Can you please share the prediction code?
@Explaining-AI 2 hours ago
Hello, the repo (github.com/explainingai-code/SSD-PyTorch/blob/main/tools/infer.py) has prediction as well as evaluation code.
@Bioinforere99 a day ago
Best explanation of Denoising Diffusion Probabilistic Models!
@jeffmacleod5194 3 days ago
So much great info in 8 minutes. Thank you so much!
@Explaining-AI 3 days ago
Thank you for the appreciation :)
@moiirani8827 4 days ago
The quality of the writing is too poor to see the equations.
@坨坨王 5 days ago
Thank you so much, this video is very helpful to me. You are very generous.
@Explaining-AI 4 days ago
I'm happy that the video ended up being of help to you :)
@davidjennicson5614 6 days ago
Hey, I am a new subscriber. Can you explain the implementations of LayoutLMv3 and UDOP and help with implementing them from scratch?
@nickbakker9747 7 days ago
Can you explain to me why there is a break on line 114 in train_torchvision_frcnn.py? It now looks like, because of the break, it will only use one batch and then break out of the epoch. I really like your videos, thanks!
@Explaining-AI 7 days ago
The only explanation is my oversight :D I must have been debugging something before pushing the code, and ended up forgetting to remove the 'break' at the end. My apologies for the confusion, and thank you so much for pointing it out. Have fixed it now in the repo.
@luisangeld9894 10 days ago
I am trying to use this model to train on COCO, but I am having issues using it. It seems the model is very much structured to be trained on PASCAL VOC. Any idea how I can adapt it to COCO? Great video!
@Explaining-AI 3 days ago
Hello, apologies for the late reply. I think the model should work once you set the right number of classes, but you would need changes in the dataset class. If you are still facing problems after making the dataset class changes (or if you need help with that), can you please open an issue on the repo and I can try to help resolve it.
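For reference, a minimal sketch of what a COCO dataset class might look like, using pycocotools; the class name, paths, and return format here are illustrative assumptions, not code from the repo:

```python
import os
import torch
from PIL import Image
from pycocotools.coco import COCO

class CocoDetectionDataset(torch.utils.data.Dataset):
    def __init__(self, img_dir, ann_file):
        self.img_dir = img_dir
        self.coco = COCO(ann_file)
        self.img_ids = sorted(self.coco.getImgIds())
        # COCO category ids are non-contiguous; remap them to 1..N
        cat_ids = sorted(self.coco.getCatIds())
        self.cat_to_label = {c: i + 1 for i, c in enumerate(cat_ids)}

    def __len__(self):
        return len(self.img_ids)

    def __getitem__(self, idx):
        img_id = self.img_ids[idx]
        info = self.coco.loadImgs(img_id)[0]
        image = Image.open(os.path.join(self.img_dir, info['file_name'])).convert('RGB')
        anns = self.coco.loadAnns(self.coco.getAnnIds(imgIds=img_id))
        boxes, labels = [], []
        for a in anns:
            x, y, w, h = a['bbox']  # COCO boxes are xywh; convert to x1y1x2y2
            boxes.append([x, y, x + w, y + h])
            labels.append(self.cat_to_label[a['category_id']])
        target = {'boxes': torch.tensor(boxes, dtype=torch.float32),
                  'labels': torch.tensor(labels, dtype=torch.int64)}
        return image, target
```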
@alivecoding4995 12 days ago
If you look at the latent space images at 37:26, you cannot believe the decoder can regenerate the original image from them, as there is simply a lot of missing information. Any explanation on how it does it? First I thought this was due to original image information leaking through skip-connections between the down and up blocks, but we are not using those in the auto-encoder.
@frimlinso1894 12 days ago
In the sampling algorithm (algo 2), I don't understand why we have to add noise z back in. Can anyone explain this to me?
@Explaining-AI 9 days ago
In the reverse process, at each timestep we have a distribution P(x_t-1 | x_t), which is a Gaussian N(mu_t-1, sigma). We use the prediction of noise at each timestep to compute the predicted mean mu_theta. The adding-noise part is actually the reparameterization trick to sample from the predicted P(x_t-1) distribution, which is why we sample a random noise z, scale it by sigma and then shift it by the mean of this predicted distribution. Also, if we straightaway used mu_theta (so always returned the mean instead of using the reparameterization trick to sample from P(x_t-1)), then the entire reverse process would end up being deterministic.
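As a rough sketch (not the repo's exact code), one reverse step of Algorithm 2 could look like this; eps_model and the schedule tensor names are illustrative, and sigma^2 = beta_t is just one common choice of variance:

```python
import torch

@torch.no_grad()
def reverse_step(eps_model, x_t, t, alphas, alphas_cumprod, betas):
    # Predict the noise component present in x_t at timestep t
    eps = eps_model(x_t, torch.tensor([t]))
    # Predicted mean mu_theta of P(x_{t-1} | x_t)
    mean = (x_t - betas[t] * eps / torch.sqrt(1 - alphas_cumprod[t])) / torch.sqrt(alphas[t])
    if t == 0:
        return mean  # no noise is added back at the final step
    # Reparameterization trick: sample from N(mu_theta, sigma^2)
    # by scaling z ~ N(0, I) by sigma and shifting by the mean
    sigma = torch.sqrt(betas[t])
    z = torch.randn_like(x_t)
    return mean + sigma * z
```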
@frimlinso1894 8 days ago
@Explaining-AI that makes sense, thank you very much!
@Martingrossman78 16 days ago
Hi, great explanation of RCNN with very useful insights which are often skipped. I am especially grateful for answering questions like "Why SVM?", "Why different IoU thresholds?", etc.
@Explaining-AI 15 days ago
Happy that you found the explanation helpful!
@rannvijaysingh1 16 days ago
Brother, so good, keep it up!
@arizmohammadi5354 17 days ago
Can we train our dataset with VQ-GAN (without the transformer) and then use it in train_ddpm_vqvae?
@Explaining-AI 15 days ago
Hello, I might be misunderstanding your question, so do let me know if that's the case. But for stable diffusion we don't need the transformer. In the repo (github.com/explainingai-code/StableDiffusion-PyTorch?tab=readme-ov-file#training), what you are mentioning is exactly what I implemented: train VQVAE + PerceptualLoss + Discriminator (same as VQ-GAN without the transformer, which is only needed for generating new latent images) on a dataset. Once the auto-encoder part of VQGAN is trained, we then save the latent representations for all the training images, using the trained encoder of VQGAN. Finally, we use these latent representations to train the LDM. We don't need to train the transformer, as that is for generating new latent images, for which we are using DDPM.
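A minimal sketch of the "save latents" stage might look like this; the encode() interface and its two-value return are assumptions about the model API, not the repo's actual code:

```python
import torch

@torch.no_grad()
def save_latents(vqvae, data_loader, out_path, device='cuda'):
    # Encode every training image with the frozen VQVAE encoder
    # and store the latents for the later LDM training stage.
    vqvae.eval()
    latents = []
    for images, _ in data_loader:
        z, _ = vqvae.encode(images.to(device))  # assumed to return (latents, quantization info)
        latents.append(z.cpu())
    torch.save(torch.cat(latents, dim=0), out_path)
```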
@arizmohammadi5354 11 days ago
@@Explaining-AI Thank you so much for your kind reply. Yeah, you are right.
@maximestudio2513 18 days ago
Can you explain to us how the Multi-view Diffusion Base Model works, please?
@Explaining-AI 15 days ago
Hello, have added this to my list. But since I am not familiar with it as of now, it will take me some time to cover this.
@maximestudio2513 15 days ago
@@Explaining-AI thank you, you are amazing
@hammadkhan7927 19 days ago
Can you please share the notes for all the object detection videos?
@yoverale 20 days ago
Just what I needed. Thanks 🙏🏻
@hoangduong5954 21 days ago
Please answer me: how do they train a CLASS-SPECIFIC bounding box regressor? Do they feed the class as an input to one model that regresses the bounding box, or do they build multiple models (if they detect 6 classes, they build 6 models), with each model trained as a bounding box regressor for a specific class? Please answer me.
@Explaining-AI 21 days ago
Hello, I have tried to explain a bit on this, do let me know if this does not clarify everything for you. This is how the official RCNN repo does it: we create as many box regressor models as there are classes, then we train each of these regressors separately using the proposals assigned to the respective classes - github.com/rbgirshick/rcnn/blob/master/bbox_regression/rcnn_train_bbox_regressor.m#L76 . During inference, given the predicted classes for proposals, we use the trained regressor for that class to modify the proposal - github.com/rbgirshick/rcnn/blob/master/bbox_regression/rcnn_test_bbox_regressor.m#L58-L65
@Explaining-AI 21 days ago
Btw, you could also do this with one fc layer. Let's say you have 10 classes. Then your bounding box regressor fc layer predicts 10 x 4 = 40 values. These are tx, ty, tw, th for all 10 classes. Then during training, the bounding box regression loss is computed between the ground truth transformation targets and the predicted values at the indexes corresponding to the ground truth class. At inference, you take the class index with the highest predicted probability value. The predicted tx, ty, tw, th are then the 4 values (out of the 40) corresponding to this most probable class.
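A minimal PyTorch sketch of that single-fc-layer variant (the dimensions and names here are illustrative):

```python
import torch
import torch.nn as nn

num_classes = 10
feat_dim = 2048  # illustrative per-proposal feature dimension

cls_head = nn.Linear(feat_dim, num_classes)      # class scores
reg_head = nn.Linear(feat_dim, num_classes * 4)  # (tx, ty, tw, th) per class

feats = torch.randn(8, feat_dim)                 # 8 proposals
scores = cls_head(feats)
deltas = reg_head(feats).view(8, num_classes, 4)  # proposals x classes x 4

# Training: take the deltas at the ground-truth class index for the loss
gt_cls = torch.randint(0, num_classes, (8,))
train_deltas = deltas[torch.arange(8), gt_cls]   # 8 x 4, compared to targets

# Inference: take the deltas of the highest-scoring class
pred_cls = scores.argmax(dim=1)
pred_deltas = deltas[torch.arange(8), pred_cls]  # 8 x 4
```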
@hoangduong5954 20 days ago
@@Explaining-AI Thank you a lot!!!! I fully understand it now. So they do train multiple models and choose the model based on class. That's crazy though!
@adeirman2705 21 days ago
Please create a YOLO panoptic video, sir; it would be a huge help and it has so many applications.
@Explaining-AI 21 days ago
Added this to my list. Will try to get to this as soon as I can.
@arizmohammadi5354 22 days ago
It was great! Good luck!
@BlueQuantum 22 days ago
Why is only Gaussian noise added, and not Rician, Laplacian, etc.? There are so many other probability distributions.
@Explaining-AI 21 days ago
Hello, have replied to something similar here (highlighted comment) - kzbin.info/www/bejne/fmWYnXlqqLqan6c&lc=Ugznn1UksOPa3NfWLXR4AaABAg
@Kamalsai369 22 days ago
Bro, no one on this platform has explained things as clearly as you. Thank you for providing these lectures free of cost. I think even in paid courses no one can explain this much. Thank you again!
@Explaining-AI 21 days ago
Really happy that you found the explanation helpful :)
@rishidixit7939 24 days ago
Subscribed
@sartq_333 24 days ago
One of the finest videos on YOLO available on the internet. It contains an intuitive as well as detailed explanation (right from the research paper). Concepts like these are hard to explain in so much detail. Thanks a lot for the amazing work, cheers!
@Explaining-AI 21 days ago
Thank you for this comment :)
@Mahan_Veisi a month ago
Fantastic video! You’re undoubtedly on your way to becoming one of the top lecturers in Generative AI. I’m excited to see more of your work in the future!
@Explaining-AI a month ago
Thank you so much for your words of encouragement and support :)
@Andrey41k a month ago
Thank you very much for the video, it is very interesting. Though, I have one question about the 15:40 timestamp. You mention that there may be a situation where a ground truth box doesn't have a big IoU with any of the anchor boxes. How do we pick these anchor boxes (I just can't get which methodology we have to follow when picking the dimensions for anchor boxes)?
@Explaining-AI a month ago
Thank You! For Faster R-CNN, that is the reason why we add low-overlap anchor boxes as well (if they are indeed the best anchor box available). Here the authors did not tune anchor box selection at all for a dataset; they just pick ones which capture a large enough variation in terms of scale and aspect ratio. In models like YOLOv2, they use the anchor box strategy but use k-means to pick the best anchor boxes. So once you run k-means on your ground truth box dimensions, you end up with cluster centres that are good representatives of the box dimensions in your dataset. These cluster centres then become a good choice for your anchor box widths and heights.
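A rough sketch of that YOLOv2-style anchor selection, clustering ground truth width/height pairs with k-means under a 1 - IoU distance (the function names and the mean-based centroid update are illustrative choices, not any paper's exact code):

```python
import numpy as np

def iou_wh(wh, anchors):
    # IoU between boxes and anchors assuming both share a corner,
    # so only widths/heights matter. wh: (N, 2), anchors: (K, 2)
    inter = np.minimum(wh[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(wh[:, None, 1], anchors[None, :, 1])
    union = (wh[:, 0] * wh[:, 1])[:, None] + anchors[:, 0] * anchors[:, 1] - inter
    return inter / union

def kmeans_anchors(wh, k, iters=100, seed=0):
    wh = np.asarray(wh, dtype=np.float64)
    rng = np.random.default_rng(seed)
    anchors = wh[rng.choice(len(wh), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = np.argmax(iou_wh(wh, anchors), axis=1)  # nearest = highest IoU
        for j in range(k):
            if np.any(assign == j):  # keep old centre if a cluster empties
                anchors[j] = wh[assign == j].mean(axis=0)
    return anchors
```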
@alivecoding4995 a month ago
How does self attention work in convnets (instead of transformers)? 😊
@Explaining-AI a month ago
After a reshape of the input, the self attention works exactly the same as in transformers. Assume you have a B x C x H x W feature map at a certain stage of the network. Then during self attention you reshape it into B x (H*W) x C. Now it becomes very similar to how you would have seen it in transformers: H*W is the number of grid cells (tokens) and C is the embedding dimension of each token. We just compute attention between all spatial grid cells.
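A minimal single-head sketch of that reshape-then-attend idea; the 1x1-conv projections and residual connection are common choices assumed here, not taken from any specific model:

```python
import torch
import torch.nn as nn

class SpatialSelfAttention(nn.Module):
    # Single-head self-attention over the spatial grid cells
    # of a B x C x H x W feature map.
    def __init__(self, channels):
        super().__init__()
        self.qkv = nn.Conv2d(channels, channels * 3, kernel_size=1)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.scale = channels ** -0.5

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)
        # Reshape B x C x H x W -> B x (H*W) x C: tokens are grid cells
        q = q.flatten(2).transpose(1, 2)
        k = k.flatten(2).transpose(1, 2)
        v = v.flatten(2).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + self.proj(out)  # residual connection
```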
@alivecoding4995 29 days ago
@ Thank you 😊
@robbegeusens1302 a month ago
Great video, but why do you add 1E-6 when calculating your IOU?
@Explaining-AI 29 days ago
Thank You! That is just to ensure the iou method never ends up doing a division by 0, like, say, in some degenerate case where the bounding box area is zero (of both gt and prediction). That just makes the iou computation numerically stable no matter what the predicted and ground truth boxes are.
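For illustration, a typical IoU computation with that epsilon guard might look like this (a generic sketch, not the repo's exact code):

```python
def iou(box1, box2, eps=1e-6):
    # Boxes given as (x1, y1, x2, y2)
    x1 = max(box1[0], box2[0])
    y1 = max(box1[1], box2[1])
    x2 = min(box1[2], box2[2])
    y2 = min(box1[3], box2[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    # eps guards against division by zero when both areas are degenerate
    return inter / (area1 + area2 - inter + eps)
```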
@raihanpahlevi6870 a month ago
What do you mean by top-k proposals 2000? Is it that from a single image we take 2000 proposals?
@Explaining-AI 29 days ago
Hello @raihanpahlevi6870, yes, that's correct. 2000 proposals are taken from a single image.
@yusuphajuwara1490 a month ago
Thanks for this wonderfully intuitive video! It provided a fantastic breakdown of the fundamentals of diffusion models. Let me try to answer your question about why the reverse process in diffusion models is also a (reverse) diffusion with Gaussian transitions.

Why Reverse Diffusion Uses Gaussian Transitions

1. Forward Diffusion Introduces Noise Gradually
Remember the β term? In the forward process, β is chosen to be very small (close to 0). This ensures that Gaussian noise is added gradually to the data over many steps. Each step introduces only a tiny amount of noise, meaning the transition from the original image to pure noise happens slowly and smoothly. This gradual noise addition is crucial because it preserves the structure of the data for longer, making it easier for the reverse process to reconstruct high-quality images. If we added large amounts of noise in one go, like in VAEs, the original structure would be harder to recover, leading to blurrier reconstructions.

2. Reverse Diffusion Needs "Gaussian-Like" Inputs
The forward process only involves adding isotropic Gaussian noise at each step. This means the model learns to work with samples that are progressively noised in a Gaussian way. However, in the reverse process, when the model predicts the noise at each step, the resulting sample isn't guaranteed to remain Gaussian-like. To fix this, after subtracting the model's predicted noise, we add a small Gaussian noise with a carefully chosen variance. This step helps "Gaussianize" the sample, ensuring it aligns with what the model expects at the next time step. This small added noise smoothens any irregularities and makes the reverse process more stable, resulting in higher-quality outputs.

Step-by-Step Noise Removal
The reverse process works by removing noise step-by-step, moving from pure noise back to a clean image (closer to x0). This gradual approach is crucial because predicting small changes (i.e., removing a little noise at a time) is much easier for the model than trying to reconstruct the clean image in one big jump. This is why diffusion models produce sharper and more realistic images compared to VAEs, where predictions often result in blurry outputs due to the lack of such gradual refinement.
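The small-β point can be made concrete with a short sketch of the usual DDPM schedule quantities; the linear schedule values below are the common defaults from the paper, used here as an assumption:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)  # small, slowly growing per-step noise
alphas = 1.0 - betas
alphas_cumprod = torch.cumprod(alphas, dim=0)

# Variance of the small Gaussian noise added back at each reverse step:
# the true posterior variance of q(x_{t-1} | x_t, x_0), often called "beta tilde"
alphas_cumprod_prev = torch.cat([torch.ones(1), alphas_cumprod[:-1]])
posterior_variance = betas * (1 - alphas_cumprod_prev) / (1 - alphas_cumprod)
```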
@mariolinovalencia7776 a month ago
Excellent. Complete, clear and to the point
@GouravJoshi-z7j a month ago
I am new to this field. Can anyone provide me with the prerequisites to understand this video?
@Explaining-AI 29 days ago
Hello @GouravJoshi-z7j, I think this list covers the prerequisites:
- Gaussian distribution and its properties (mean/variance of adding two independent Gaussians)
- Reparameterization trick
- Maximum Likelihood Estimation
- Variational Lower Bound
- Bayes theorem, conditional independence
- KL Divergence, KL divergence between two Gaussians
- VAE (because the video incorrectly assumes knowledge about it)
I may have missed something, so in case there is some aspect of the video that you aren't able to understand even after that, please do let me know.
@alivecoding4995 a month ago
Thank you very much!!! Great explanation ❤
@Explaining-AI a month ago
Thank You :)
@Mahan_Veisi a month ago
Thank you! It was amazing. While there is limited content available for diffusion models, you did a pretty nice job. ❤
@Explaining-AI a month ago
Thank you for your kind words :)
@abdulkarimatrash a month ago
Excellent Work ! Please Don't Stop :)
@Explaining-AI a month ago
Thank you so much for your support
@comunedipadova1790 a month ago
Please don't use music in the background, it's very distracting, thanks.
@Explaining-AI a month ago
Thank you for the feedback. Have taken care of this in my recent videos.
@hieuaovan7101 a month ago
Nice video with great images that explain things clearly <3
@Explaining-AI a month ago
Thank You :)
@tvwatch9669 a month ago
This is insanely underrated.
@awayxu a month ago
So why is the reverse process also a diffusion process with the same Gaussian form? Does anyone know? 😢
@drakkhein a month ago
Any updates on the idea of generating videos with longer frames?
@Explaining-AI a month ago
Hello, currently working on a couple of object detection videos. Post that, I will be creating one which includes techniques for longer frame generation, interpolation, and other stuff.
@kijudaa8781 a month ago
It is the best lecture on object detection and metrics that I have ever seen, thanks a lot❤
@Explaining-AI a month ago
Thank you for your kind words and am glad that the video ended up being of help to you.
@maximestudio2513 a month ago
Nice job
@noahjob8261 a month ago
Could you please use ResNet as the backbone in the SSD architecture and add the necessary scripts to your GitHub repo?
@princekhunt1 a month ago
Please never stop teaching us.
@princekhunt1 a month ago
Nice tutorials. Keep on creating videos like this; we will support you.
@Explaining-AI a month ago
Thank you for your support :)
@ggg-m8e5c a month ago
What if the training dataset is made of rectangular images?