Vision Transformer for Image Classification Using Transfer Learning

Views: 13,894

Days ago

Comments: 122

@JKaks-gr5zm 7 months ago

I am getting the error "ModuleNotFoundError: No module named 'going_modular'" even though the going_modular folder and the notebook are under the same folder. I am working in Colab. Please help, Ma'am.

@Ikramkrt 6 months ago

I have the same problem, but in Jupyter. Did you resolve it?

@Ritam_Goswami_ 3 months ago

I am currently having problems with epochs not running; training keeps taking a very long time. What should I do?

@Ritam_Goswami_ 3 months ago

Just install the module from the directory in which the module is present, in a different cell.

@anishmgeorge207 3 months ago

Madam, I have one doubt. Here we use a pretrained model and train it again with our dataset. So my questions are: where do we get the pretrained model from, and on which dataset was it trained? Also, after retraining the model with our dataset, the weights will all be changed, right?

@soravsingla6574 11 months ago

Hello Ma’am Your AI and Data Science content is consistently impressive! Thanks for making complex concepts so accessible. Keep up the great work! 🚀 #ArtificialIntelligence #DataScience #ImpressiveContent 👏👍

@CodeWithAarohi 11 months ago

My pleasure 😊

@Vibhu-ts8dh 5 months ago

Ma'am, how do I save and then load the model? After saving and loading the model, I am not able to get the same predictions. Are there any resources I can refer to in order to learn about this?

@ambikajadoonanan2852 1 year ago

Good day. Thank you for this wonderful demo. I have a few questions: 1. Are there any other existing vision transformer models that you know of? 2. How do I go about training a model on images paired with nutritional values stored in a certain column range of a separate Excel file, and outputting the predicted values when it is applied to a single image? The name of each image is also listed against its values in the Excel file. Many thanks in advance for the assistance. :)

@sanjoetv5748 1 year ago

Please make a landmark detection video with a vision transformer. I greatly need to finish this project, and the task is to create a 13-landmark detector using a vision transformer. I can't find any resources that teach how to do landmark detection with a vision transformer. This channel is my only hope.

@hulkbaiyo8512 1 year ago

I combined your code with my own training code and added a learning rate scheduler and GPU memory garbage collection. The training results and speed are now much better, with no worries about the GPU running out of memory.

@CodeWithAarohi 1 year ago

Sounds great!

@harshavenkatesh4409 1 year ago

"Could not generate a random directory for manager socket": how do I resolve this error?

@sandhyarani-wk4mn 1 year ago

Ma'am, I am getting a "no module found" error when importing engine from going_modular. I have downloaded the folder and copied it into the directory. Please help, Ma'am.

@CodeWithAarohi 1 year ago

Check the location of the going_modular folder and your Jupyter notebook. Both should be under the same folder.
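The "same folder" advice works because Python resolves `import going_modular` against the directories listed in `sys.path`, and a notebook kernel normally searches its own working directory. In Colab the working directory is typically `/content` rather than the Drive folder that holds the notebook, which may explain the reports above. A minimal sketch of a workaround, assuming nothing about the repository beyond the folder name; `find_package_dir` and `add_to_import_path` are hypothetical helper names, not part of the tutorial code:

```python
import sys
from pathlib import Path


def find_package_dir(package_name, search_root="."):
    """Return the folder that CONTAINS package_name (shallowest match), or None."""
    matches = [p for p in Path(search_root).rglob(package_name) if p.is_dir()]
    if not matches:
        return None
    # Prefer the outermost folder when the name is nested inside itself.
    shallowest = min(matches, key=lambda p: len(p.parts))
    return shallowest.parent


def add_to_import_path(project_dir):
    """Put project_dir at the front of sys.path so packages inside it import."""
    resolved = str(Path(project_dir).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


if __name__ == "__main__":
    parent = find_package_dir("going_modular")
    if parent is None:
        print("going_modular not found below", Path.cwd())
    else:
        print("adding to sys.path:", add_to_import_path(parent))
```

After `add_to_import_path` has run in one cell, `from going_modular.going_modular import engine` in a later cell should resolve, provided the folder layout matches the repository.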

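@YashSharma-le3mo 1 year ago

Hi Ma'am, I have CUDA available, but it is giving an assertion error. I am unable to run with CUDA.

@CodeWithAarohi 1 year ago

Check your PyTorch version and whether it was compiled with CUDA: torch.version.cuda is None for a CPU-only build, and torch.cuda.is_available() should return True.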

@dr.noushathshaffi7515 1 year ago

I've been searching for this tutorial for a long time, and I can't express how thankful I am, Aarohi! Your YouTube channel is an absolute gem, and it truly deserves a multitude of subscriptions. The way you effortlessly share your expertise is not only enlightening but also engaging. Keep up the exceptional work!

@CodeWithAarohi 1 year ago

Thank you for your heartwarming comment 🙂

@MaryBrockyn 10 months ago

Hi again, when I print the summary of the Vision Transformer, the input shapes for each layer start with 32. I understand that the very first input [32, 3, 224, 224] means we originally have an image of size 224x224 with 3 colour channels. What does the 32 mean? Is that the batch size, and if so, do I have to change that value if I change my batch size for training?

@CodeWithAarohi 10 months ago

Yes, you are correct! The "32" in the input shape [32, 3, 224, 224] refers to the batch size. It is only the example batch size given to the summary call; the model itself accepts any batch size, so you do not have to change it when you train with a different one.

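@MaryBrockyn 10 months ago

Hello again, how can I save the model to use it later on again?

@CodeWithAarohi 10 months ago

You need to do something like this (for a plain PyTorch model such as the torchvision ViT used here):
# save model
MODEL_PATH = 'custom-model.pth'
torch.save(model.state_dict(), MODEL_PATH)
# load model: rebuild the same architecture (including the new classifier head) first, then load the weights
model.load_state_dict(torch.load(MODEL_PATH, map_location=DEVICE))
model.to(DEVICE)
model.eval()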
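Several comments in this thread ask how to save a trained model and get the same predictions back. For a plain PyTorch model, the usual route is to save the `state_dict`, rebuild the identical architecture, load the weights and switch to `eval()` mode. A minimal sketch under stated assumptions: `build_model` is a hypothetical stand-in (a tiny network, not the tutorial's ViT) so the example runs without downloading weights; in the notebook it would recreate the ViT and its replaced head.

```python
import os
import tempfile

import torch
from torch import nn


def build_model(num_classes=2):
    # Hypothetical stand-in for the tutorial's model. The architecture built
    # here must match the one whose weights were saved, head included.
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(3 * 8 * 8, 16),
        nn.ReLU(),
        nn.Dropout(0.5),
        nn.Linear(16, num_classes),
    )


def save_weights(model, path):
    # Save only the learned parameters, not the whole Python object.
    torch.save(model.state_dict(), path)


def load_weights(path, num_classes=2, device="cpu"):
    model = build_model(num_classes)
    state = torch.load(path, map_location=device)
    model.load_state_dict(state)
    model.to(device)
    model.eval()  # disables dropout so predictions are deterministic
    return model


if __name__ == "__main__":
    trained = build_model().eval()
    weights_path = os.path.join(tempfile.mkdtemp(), "custom-model.pth")
    save_weights(trained, weights_path)
    restored = load_weights(weights_path)
    batch = torch.rand(4, 3, 8, 8)
    with torch.inference_mode():
        print(torch.allclose(trained(batch), restored(batch)))
```

Predictions that differ after reloading are often caused by skipping `eval()`, by rebuilding the model with a different head, or by preprocessing the inference images differently from the training images.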

@MaryBrockyn 10 months ago

@CodeWithAarohi Thank you!

@올라쿤레아요데지오몰 10 months ago

Hi Aarohi, you made it look easy. I have a challenge: I am getting this error: ModuleNotFoundError: No module named 'helper_functions'

@CodeWithAarohi 10 months ago

You can get the helper_functions.py file from here and paste it in your directory: github.com/AarohiSingla/Image-Classification-Using-Vision-transformer

@올라쿤레아요데지오몰 10 months ago

@CodeWithAarohi Thank you. It worked! One more thing: which activation function did you use, and at what stage did you implement it?

@soravsingla6574 11 months ago

Code with Aarohi is the best YouTube channel for Artificial Intelligence #BestChannel #YouTubeChannel #ArtificialIntelligence #CodeWithAarohi #DataScience #Engineering #MachineLearning #DataAnalysis #BestLearning #LearnDataScience #DataScienceCourse #ArtificialIntelligenceCourse #Codewithaarohi #CodeWithAarohi

@joshuahentinlal205 1 year ago

Ma'am, I have a problem importing engine from going_modular. Can you help, please?

@CodeWithAarohi 1 year ago

Download the going_modular folder from github.com/AarohiSingla/Image-Classification-Using-Vision-transformer and put it in your current working directory.

@joshuahentinlal205 1 year ago

@CodeWithAarohi Thanks a lot, Ma'am, it really helped me. One more enquiry: using your code on my dataset of just 2000 images, I had been training for more than an hour but not even 1 epoch completed. It seems to run like an endless loop. Can you please help?

@rajatchakraborty2058 5 months ago

When I am trying to predict an image for my dataset it is showing "The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 0" error. Can anyone please help

@CodeWithAarohi 5 months ago

This means that you're trying to perform an operation that requires the two tensors to have the same size along their first dimension, but they don't match. For example, tensor "a" might have a shape of [4, X] and tensor "b" a shape of [3, Y]; the error is raised because the first dimension of "a" (4) does not match the first dimension of "b" (3). At prediction time a common cause is an image with 4 channels (for example an RGBA PNG) being normalized with a 3-value mean and std; if that is the case here, opening the image with Image.open(path).convert("RGB") before applying the transform usually fixes it.

@НиколайНовичков-е1э 1 year ago

Thank you! Your video is very informative!

@CodeWithAarohi 1 year ago

Glad it was helpful!

@devavratpro7061 1 year ago

Hi, thanks for your great video. I would like to train the model with a different input size, such as 448x448. However, the model only accepts a 224x224 input size and gives an error otherwise. How can I make the necessary changes?

@CodeWithAarohi 1 year ago

You'll need to adapt the architecture to accommodate the larger input size. The key components to modify include:
1- Patch size: in the original ViT, the input image is divided into non-overlapping patches of 16x16 pixels. For a 448x448 input you'll need to adjust the patch size accordingly; one option is a patch size of 28x28 (448/16 = 28).
2- Number of patches: this depends on the input size and the patch size. For a 448x448 input and 28x28 patches, you'll have 16x16 = 256 patches.
3- Embedding dimension: adjust it to suit your needs. It is a separate hyperparameter (768 in ViT-Base) and does not have to scale with the input size.
4- Transformer blocks: you may want to adjust the number of transformer blocks. More blocks may give better performance on the larger input.
Keep in mind that the pretrained patch-projection and position-embedding weights were learned for 16x16 patches on a 14x14 grid, so after changing the patch size those parts have to be trained again (or the position embeddings interpolated).
Example using PyTorch and the Hugging Face Transformers ViT for a 448x448 input size:
from transformers import ViTConfig, ViTForImageClassification, ViTImageProcessor
# Image processor that resizes inputs to the desired size
image_processor = ViTImageProcessor(size={"height": 448, "width": 448})
# Model configuration with the adjusted image and patch size
config = ViTConfig(image_size=448, patch_size=28, num_labels=1000)  # adjust num_labels, hidden_size, num_hidden_layers, etc. as needed
model = ViTForImageClassification(config)  # randomly initialized, not pretrained
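The patch arithmetic in the reply above can be checked with a few lines of plain Python. This is only a sketch of the counting, not of any library's internals; `patch_grid` and `sequence_length` are hypothetical helper names, and the extra token counted is the class token that ViT prepends to the patch sequence.

```python
def patch_grid(image_size, patch_size):
    """Return (patches per side, total patches) for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


def sequence_length(image_size, patch_size, class_token=True):
    """Number of tokens the transformer encoder sees for one image."""
    _, total = patch_grid(image_size, patch_size)
    return total + (1 if class_token else 0)


if __name__ == "__main__":
    for image_size, patch_size in [(224, 16), (448, 16), (448, 28)]:
        per_side, total = patch_grid(image_size, patch_size)
        print(image_size, patch_size, per_side, total,
              sequence_length(image_size, patch_size))
    # 224 with 16x16 patches: 14 per side, 196 patches, 197 tokens
    # 448 with 16x16 patches: 28 per side, 784 patches, 785 tokens
    # 448 with 28x28 patches: 16 per side, 256 patches, 257 tokens
```

The numbers show why a 224x224 checkpoint cannot be reused unchanged at 448x448: either the token count changes (position embeddings no longer fit) or the patch size changes (the patch projection no longer fits).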

@maharaniizza4601 11 months ago

Hi Ms. Aarohi, thank you so much for your video. May I ask: if I want to add an early stopping callback, is it correct to modify the engine file in the epoch loop section? Thank you

@CodeWithAarohi 11 months ago

Yes, correct
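A minimal sketch of what such an early stopping check could look like, in plain Python so that it is independent of the training code. `EarlyStopping` and `epochs_until_stop` are hypothetical names, not part of the tutorial's engine file; the assumption is that the epoch loop computes a validation (test) loss once per epoch and can call `step` with it.

```python
class EarlyStopping:
    """Signal a stop when the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_until_stop(val_losses, patience=3, min_delta=0.0):
    """Replay a list of per-epoch losses; return the epoch index that triggers a stop."""
    stopper = EarlyStopping(patience=patience, min_delta=min_delta)
    for epoch, loss in enumerate(val_losses):
        if stopper.step(loss):
            return epoch
    return None


if __name__ == "__main__":
    simulated = [0.90, 0.70, 0.72, 0.71, 0.73, 0.60]
    print(epochs_until_stop(simulated, patience=3))
```

Inside a real epoch loop, the equivalent would be `if stopper.step(test_loss): break` after the evaluation step; the branch where the loss improves is also a natural place to save the best weights so far.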

@nandiniloku7747 1 year ago

Thank you, very good explanation. Which pretrained model are you using here? Is it the same as a CNN pretrained model, or are you using only the weights of the pretrained model?

@CodeWithAarohi 1 year ago

You can check this: github.com/pytorch/vision/blob/main/torchvision/models/vision_transformer.py Here check class ViT_B_16_Weights(WeightsEnum):

@gitgat-wx4vq 6 months ago

import torch
import maxvit
# from .maxvit import MaxViT, max_vit_tiny_224, max_vit_small_224, max_vit_base_224, max_vit_large_224
# Tiny model
network: maxvit.MaxViT = maxvit.max_vit_tiny_224(num_classes=1000)
input = torch.rand(1, 3, 224, 224)
output = network(input)
My purpose is to give an image (1, 3, 224, 224) as input and generate a description of it as output. How should I do that? What should I add to this code?

@CodeWithAarohi 6 months ago

To achieve this, you'll need to use a different model architecture and approach, as image classification models like MaxViT are not designed for generating textual descriptions.

@rohitsk5300 1 year ago

How can I extract the trained model for making an app?

@CodeWithAarohi 1 year ago

MODEL_PATH = 'custom-model.pth'
torch.save(model.state_dict(), MODEL_PATH)

@rohitsk5300 1 year ago

@CodeWithAarohi I'm not sure how to make that work. My code is almost the same as the code explained in the video. What exactly should be done to save the model and load it back?

@fatematujjohora6163 1 year ago

How do I install going_modular? Please answer me.

@CodeWithAarohi 1 year ago

going_modular is a folder in my repo. You need to put it in your current working directory.

@imrankhan-el2zp 4 months ago

How do I resolve this issue?
ModuleNotFoundError Traceback (most recent call last)
Cell In[1], line 6
      4 from torch import nn
      5 from torchvision import transforms
----> 6 from helper_functions import set_seeds
ModuleNotFoundError: No module named 'helper_functions'

@CodeWithAarohi 4 months ago

Paste the helper_functions.py file into the folder where your Jupyter notebook is.

@MaryBrockyn 1 year ago

Thanks for the tutorial! Is there a quick way to classify all images in a folder with the trained model and also get the confusion matrix and other metrics for them?

@MaryBrockyn 1 year ago

Also, I am wondering how to convert the images that I want to classify into the proper input shape. Can you help with that? Thanks in advance!

@CodeWithAarohi 1 year ago

Use a transform like this, with image_size = (224, 224) for the pretrained ViT:
image_transform = transforms.Compose([
    transforms.Resize(image_size),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@MaryBrockyn 1 year ago

@CodeWithAarohi Thank you! Do you maybe also have an answer to my first question? (Is there a quick way to classify all images in a folder with the trained model and also get the confusion matrix and other metrics such as accuracy, recall and F1 score for them?)

@Sunil-ez1hx 1 year ago

Thank you so much, Ma'am, for this amazing video

@CodeWithAarohi 1 year ago

Thanks for liking

@soravsingla8782 10 months ago

Awesome

@kadapallanithin 1 year ago

Thanks for the video

@CodeWithAarohi 1 year ago

welcome

@DataTheory92 1 year ago

Can you make lectures on MLOps, please?

@CodeWithAarohi 1 year ago

Will try

@hussamsarfraz7952 8 months ago

Thanks a lot

@CodeWithAarohi 8 months ago

Most welcome

@loveofmylifesoumyarashmi9972 8 months ago

Do you have code to print the accuracy, F1 score, precision and recall?

@CodeWithAarohi 8 months ago

Will create a separate video on it.
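In the meantime, the metrics requested throughout this thread can be sketched in plain Python. The assumption is that you have already collected two lists of class indices from the test set, the true labels and the predicted labels (for a PyTorch model the predictions would typically be the argmax over the output logits). The function names are hypothetical; scikit-learn's `confusion_matrix` and `classification_report` compute the same quantities.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """matrix[t][p] counts samples whose true class is t and predicted class is p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_metrics(matrix):
    """Return a list of dicts with precision, recall and F1 for each class."""
    n = len(matrix)
    results = []
    for c in range(n):
        tp = matrix[c][c]
        fp = sum(matrix[r][c] for r in range(n)) - tp  # predicted c, actually other
        fn = sum(matrix[c]) - tp                       # actually c, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append({"precision": precision, "recall": recall, "f1": f1})
    return results


def misclassified_indices(y_true, y_pred):
    """Indices of the test samples the model got wrong."""
    return [i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p]


if __name__ == "__main__":
    y_true = [0, 0, 0, 1, 1, 1, 1, 0]
    y_pred = [0, 0, 1, 1, 1, 0, 1, 0]
    cm = confusion_matrix(y_true, y_pred, num_classes=2)
    print(cm, accuracy(cm), per_class_metrics(cm), misclassified_indices(y_true, y_pred))
```

`misclassified_indices` also covers the question of which test samples were predicted correctly and which were not, as long as the lists keep the same order as the test dataset.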

@salihsalur4855 4 months ago

Do you have code for the F1 score, precision...

@sathishkumars4463 1 year ago

Awesome upload. How do I save the model or weights which I can load and perform inference later?

@debjitdas1714 7 months ago

Very informative tutorial, thank you. I have the following questions and doubts:
1) During training, how do I save only the best model after each epoch (e.g. based on the lowest validation loss), and load that best model after training is complete, for future use?
2) How do I generate the confusion matrix and also the F1 score, precision and recall?
3) How do I identify which test samples are predicted correctly and which are not?
4) After the initial 4-5 epochs, the gap between training loss and test loss, or between train accuracy and test accuracy, keeps increasing, so the model needs further fine-tuning. Please suggest how to do that.

@salihsalur4855 4 months ago

Hello, could you answer question 2? Do you have code for the F1 score, precision...

@priyanshupandey3148 1 year ago

Please upload the notebooks. They are not there.

@CodeWithAarohi 1 year ago

github.com/AarohiSingla/Image-Classification-Using-Vision-transformer

@priyanshupandey3148 1 year ago

@CodeWithAarohi Thank you very much!

@ericobeng3139 5 months ago

Thank you for this great video. Can this be applied to video datasets? Or do you have a link to a video on training ViT on a video dataset? Thank you.

@CodeWithAarohi 5 months ago

Yes, ViT can be applied to video datasets. While ViT was initially designed for processing static images, researchers have extended its application to video data by incorporating temporal information.

@safiullah353 9 months ago

A problem occurs at "from going_modular.going_modular import engine" that I am unable to handle. Please help me with this.

@CodeWithAarohi 9 months ago

What error are you getting?

@shrikar7341 7 months ago

If I were to give an image for prediction, let's say an image of an orange, but the only classes are dandelion and daisy, what will the prediction be?

@CodeWithAarohi 7 months ago

If you have added a background class for random images that are not part of these 2 classes, the model will label the orange image as background. If you only have these 2 classes, the model will still assign one of them to the orange image, so its prediction will not be meaningful in this case.
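The reason a two-class model always "recognizes" an orange as a daisy or a dandelion is that softmax spreads a total probability of 1 over the known classes only. A minimal sketch in plain Python; `predict` and its `threshold` argument are hypothetical, and the threshold is only a heuristic since networks can be confidently wrong on unfamiliar images.

```python
import math


def softmax(logits):
    """Convert raw model outputs into probabilities that sum to 1."""
    highest = max(logits)
    exps = [math.exp(z - highest) for z in logits]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, class_names, threshold=None):
    """Return (label, probability); 'unknown' if the top probability is below threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if threshold is not None and probs[best] < threshold:
        return "unknown", probs[best]
    return class_names[best], probs[best]


if __name__ == "__main__":
    classes = ["daisy", "dandelion"]
    orange_logits = [0.2, 0.1]  # made-up outputs for an image from neither class
    print(predict(orange_logits, classes))
    print(predict(orange_logits, classes, threshold=0.9))
```

Without a threshold the first call still returns one of the two flower names. Adding an explicit background class to the training data, as the reply suggests, is the more dependable fix.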

@Shaggysus 7 months ago

Hi, this video is so helpful. I'm facing an issue with helper_functions. How can I resolve it?

@CodeWithAarohi 7 months ago

Download helper_functions.py file from here and paste it in your working directory: github.com/AarohiSingla/Image-Classification-Using-Vision-transformer

@PawanKumar-fu2fh 5 months ago

ModuleNotFoundError: No module named 'going_modular'

@CodeWithAarohi 5 months ago

going_modular is a folder. You need to put it in your current working directory and please check the path of it.

@FERNANDOVALLE-ig8gl 5 months ago

Could you add how to calculate the confusion matrix and other metrics please?

@mehedihasanshojib5831 1 year ago

I prepared my dataset like you did. But when I try to train, it gives "OSError: Caught OSError in DataLoader worker process 0" and "image file is truncated (40 bytes not processed)". I followed your code exactly and just used my own dataset. Can you tell me how to fix it?

@harshavenkatesh4409 1 year ago

Did you figure it out?

@sharifimroz6231 1 year ago

Could you please share the dataset link?

@teetanrobotics5363 1 year ago

Awesome tutorials. Aarohi, could you please make a code tutorial for video super-resolution using ESRGAN?

@CodeWithAarohi 1 year ago

Sure, I will do the video after finishing the videos already in the pipeline.

@shresthjain7557 1 year ago

How do I download that dataset?

@CodeWithAarohi 1 year ago

You can prepare your dataset by creating 2 folders and then put some images in those folders.

@pranavdubal-c9j 6 months ago

I am getting the error: module 'torchvision.models' has no attribute 'ViT_B_16_Weights'
      1 # 1. Get pretrained weights for ViT-Base
----> 2 pretrained_vit_weights = torchvision.models.ViT_B_16_Weights.DEFAULT
      3
      4 # 2. Setup a ViT model instance with pretrained weights
      5 pretrained_vit = torchvision.models.vit_B_16(weights=pretrained_vit_weights).to(device)
AttributeError: module 'torchvision.models' has no attribute 'ViT_B_16_Weights'
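This AttributeError usually points to an older torchvision install: the `*_Weights` enums such as `ViT_B_16_Weights` arrived with the multi-weight API in torchvision 0.13. Note also that the model builder is spelled `vit_b_16` in lowercase, so `vit_B_16` in the pasted cell would fail on its own. A minimal sketch of a version check using only the standard library; the helper names are hypothetical:

```python
import re
from importlib import metadata

MIN_VERSION = (0, 13)  # torchvision release that introduced the *_Weights enums


def parse_major_minor(version_string):
    """'0.15.2+cu118' -> (0, 15)"""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized version: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def has_weights_enum(version_string):
    """True if this torchvision version should provide ViT_B_16_Weights."""
    return parse_major_minor(version_string) >= MIN_VERSION


def installed_torchvision_version():
    """Return the installed torchvision version string, or None if it is missing."""
    try:
        return metadata.version("torchvision")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_torchvision_version()
    if version is None:
        print("torchvision is not installed")
    elif has_weights_enum(version):
        print(version, "supports ViT_B_16_Weights")
    else:
        print(version, "is too old; upgrade torch and torchvision together")
```

If the check reports an old version, upgrading torch and torchvision as a matching pair is the usual remedy.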

@danielasefa8087 11 months ago

Thanks so much, I was waiting for this video from you.

@CodeWithAarohi 11 months ago

Hope you like it!

@sidharthpisharody 9 months ago

Ma'am, is it possible to implement the paper "GA-Nav: Efficient Terrain Segmentation for Robot Navigation in Unstructured Outdoor Environments"? I tried it, but there is a "ModuleNotFoundError: No module named 'mmcv._ext'" error that I am not able to fix. If you could show it, it would be very helpful.

@CodeWithAarohi 9 months ago

I will try, but after finishing the work already in the pipeline.

@AbdulQadeerRasooli-l8k 10 months ago

Hi, thanks for your great video. I ran into this error: ModuleNotFoundError: No module named 'going_modular'. How do I download the going_modular folder from your link? I could not download this folder.

@CodeWithAarohi 9 months ago

You can get the folder from here: github.com/AarohiSingla/Image-Classification-Using-Vision-transformer

@dipankarporey2171 1 year ago

Could you please make one single video completely on the "Attention" architecture (including self-attention)? Thank you for these videos.

@CodeWithAarohi 1 year ago

Sure!

@shounakdas1001 10 months ago

Thanks Aarohi, it is brilliant. Great help for learning ViT

@CodeWithAarohi 10 months ago

Glad it was helpful!

@cyreneschannel5017 1 year ago

I like your video on image classification with a vision transformer. Can you also make a video on using a vision transformer with a video dataset?

@CodeWithAarohi 1 year ago

Sure

@ericobeng3139 5 months ago

@CodeWithAarohi Has the video on using ViT for videos been made yet?

@YashSharma-le3mo 1 year ago

Ma'am, what does it actually mean that you modified the classifier head and paused all the other layers?

@CodeWithAarohi 1 year ago

Modified the Classifier Head: Modifying the classifier head means that you are changing the architecture or parameters of the top layers responsible for making predictions. This can include adding or removing layers, changing the number of neurons, or making other architectural changes to better suit your specific task.
Paused All Other Layers: "Pausing" or "freezing" layers means that you are preventing the weights of the layers in the feature extraction backbone from being updated during training. In other words, you are keeping these layers fixed and not allowing them to learn new features during fine-tuning.
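In PyTorch terms, freezing means setting `requires_grad = False` on the backbone parameters, and modifying the head means assigning a new layer, whose parameters are trainable by default. A minimal sketch under stated assumptions: `TinyBackboneModel` is a hypothetical stand-in with a `heads` attribute (the attribute name torchvision's ViT uses for its classifier), kept small so the example runs without downloading pretrained weights.

```python
import torch
from torch import nn


class TinyBackboneModel(nn.Module):
    """Hypothetical stand-in for a pretrained ViT: a backbone plus a `heads` classifier."""

    def __init__(self, embed_dim=32, num_classes=1000):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * 8 * 8, embed_dim), nn.ReLU()
        )
        self.heads = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        return self.heads(self.backbone(x))


def prepare_for_transfer(model, embed_dim, num_classes):
    # 1. Freeze ("pause") every existing parameter.
    for param in model.parameters():
        param.requires_grad = False
    # 2. Replace the classifier head; the new layer is trainable by default.
    model.heads = nn.Linear(embed_dim, num_classes)
    return model


def trainable_parameter_names(model):
    return [name for name, p in model.named_parameters() if p.requires_grad]


if __name__ == "__main__":
    model = prepare_for_transfer(TinyBackboneModel(), embed_dim=32, num_classes=2)
    print(trainable_parameter_names(model))
    # Only the trainable parameters need to be handed to the optimizer.
    optimizer = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=1e-3
    )
    print(model(torch.rand(4, 3, 8, 8)).shape)
```

Because only the small head is updated, the pretrained features are preserved and training needs far less data and compute than training the whole network.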

@YashSharma-le3mo 1 year ago

@CodeWithAarohi OK, Ma'am. Thank you.

@soravsingla6574 11 months ago

Well done

@CodeWithAarohi 11 months ago

Thanks

@neelshah1651 10 months ago

Thank you for such great content!!

@CodeWithAarohi 10 months ago

Glad you enjoy it!

@MaryBrockyn 10 months ago

Hello again, I am wondering why you are using categorical cross-entropy as the loss function. I tried to use binary cross-entropy instead, as it is a binary classification problem. I used loss_fn = torch.nn.BCELoss(). Somehow it does not work with your model. Do you have any idea why?

@MaryBrockyn 10 months ago

I am receiving this error: "Using a target size (torch.Size([4])) that is different to the input size (torch.Size([4, 2])) is deprecated. Please ensure they have the same size."

@CodeWithAarohi 10 months ago

The reason for using categorical cross-entropy is that it is well-suited for multi-class classification problems.

@CodeWithAarohi 10 months ago

The error you're encountering indicates a mismatch between the size of your target labels and the size of the model's output: the model produces two logits per image (shape [4, 2]) while your targets have shape [4], and BCELoss needs both to have the same shape.

@MaryBrockyn 10 months ago

@CodeWithAarohi But we are dealing with a binary problem, and not a multi-class classification problem, right? So that's why I assume BCE would be a better loss function.

@MaryBrockyn 10 months ago

Also, my program runs perfectly fine with CrossEntropyLoss(). As soon as I simply change the loss to BCELoss, I get the error.
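The two losses are closer than this exchange suggests. For two classes, cross-entropy over two logits (z0, z1) equals binary cross-entropy on the single logit z1 - z0, so a two-output head with CrossEntropyLoss is already a valid way to train a binary classifier. Using BCE instead would mean changing the head to one output and using float targets of the same shape, typically with `torch.nn.BCEWithLogitsLoss`. A minimal sketch of the equivalence in plain Python; the function names are hypothetical:

```python
import math


def cross_entropy_two_logits(logits, target):
    """Cross-entropy for one sample: logits = [z0, z1], target is class index 0 or 1."""
    highest = max(logits)
    log_sum_exp = highest + math.log(sum(math.exp(z - highest) for z in logits))
    return log_sum_exp - logits[target]


def bce_with_one_logit(logit, target):
    """Binary cross-entropy on a single logit (numerically stable form); target is 0.0 or 1.0."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


if __name__ == "__main__":
    z0, z1 = 0.3, 1.2
    for label in (0, 1):
        ce = cross_entropy_two_logits([z0, z1], label)
        bce = bce_with_one_logit(z1 - z0, float(label))
        print(label, ce, bce)  # the two values agree up to rounding
```

The shape error in the thread comes from feeding the [4, 2] output of the two-class head to a loss that expects one value per target, not from cross-entropy being unsuitable for binary problems.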

@pifordtechnologiespvtltd5698 7 months ago

Amazing

@CodeWithAarohi 7 months ago

Thanks

@cyberhard 1 year ago

Excellent as usual! How well do vision transformers compare to traditional CNNs for image classification?

@DataTheory92 1 year ago

Vision transformers have been reported to match or outperform CNNs on image tasks, particularly when pretrained on large datasets.

@DataTheory92 1 year ago

For more complex tasks we now have LLMs, which some see as making classic ML and ordinary neural networks outdated. First understand why a framework was designed and how it operates, then implement it in code. You will understand more.