Pytorch Data Augmentation using Torchvision

Views: 42,532

Comments: 52

@AladdinPersson 4 years ago

Just wanted to mention something I didn't bring up in the video. You should be a bit careful when adding transformations to your training pipeline: more is not always better. In some cases it might actually ruin your results. Take a simple scenario with the MNIST dataset: using RandomHorizontalFlip or RandomVerticalFlip might totally destroy your model, since the transformations change the actual digit in the image and the model ends up training on incorrect samples.

@InturnetHaetMachine 3 years ago

5:34 I don't know why, but I laughed out loud so much when that cat image popped up while you were talking about vertical flip.

@joshlazor6208 4 years ago

Two things: 1) How would you calculate the mean and standard deviation of the images to use with Normalize? 2) What does resizing to 256, 256 do? Does it resize the image to 256x256 pixels?

@AladdinPersson 4 years ago

1) You would go through each image in your dataset and calculate the mean and standard deviation for each channel. Then you pass those values to Normalize so that each channel ends up with a mean of 0 and a standard deviation of 1. 2) Yes.

@joshlazor6208 4 years ago

@AladdinPersson Do you have a video on finding the mean and standard deviation of different images? I'm not sure how to do that.

@AladdinPersson 4 years ago

@joshlazor6208 I don't have a video on that unfortunately, but I think you can work it out yourself if you spend some time on it ;) What you want to do is use torch.mean and torch.std on the input tensors. Try googling these things too: there are a lot of answers on the PyTorch forum, and you can also ask questions there. Here's a thread that might be useful: discuss.pytorch.org/t/about-normalization-using-pre-trained-vgg16-networks/23560/6
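
A minimal sketch of the per-channel calculation discussed in this thread, assuming the images are already loaded as one float tensor of shape (N, C, H, W) with values in [0, 1]. The random tensor and the helper name `channel_stats` are placeholders for illustration and do not come from the video.

```python
import torch


def channel_stats(images):
    """Return per-channel mean and std for a batch shaped (N, C, H, W)."""
    # Reduce over batch, height and width so one value per channel remains.
    mean = images.mean(dim=(0, 2, 3))
    std = images.std(dim=(0, 2, 3))
    return mean, std


# Stand-in for a real dataset: 8 random RGB images of 32x32 pixels.
torch.manual_seed(0)
images = torch.rand(8, 3, 32, 32)
mean, std = channel_stats(images)

# Same arithmetic that Normalize performs: (x - mean) / std, per channel.
normalized = (images - mean[None, :, None, None]) / std[None, :, None, None]
print(mean, std)
```

For a dataset that does not fit in memory, the same statistics would have to be accumulated batch by batch while iterating over a DataLoader; that variant is not shown here.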

@akarshrastogi3682 3 years ago

Hi, an important query: if we don't go through that loop that saves more images to a folder, our images will essentially be "replaced" with changed versions and not "added" to the list of already existing images, right? When we feed the transforms list to a DataLoader, does the number of images to train on increase manyfold, or does it stay the same, with different changes applied to the same images according to those probabilities?

@zhitongchen7549 5 months ago

A question I ran into: in an image segmentation task, the labels are images as well. If we apply random augmentation to the images, it is not really possible to get the corresponding labels transformed into the same layout, right? With a random crop, for example, we cannot be sure that the label and the image have been cropped at the same place.
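
One common way around this is to draw the random parameters once and apply them to both the image and the mask. The sketch below does that with plain tensor slicing; the function name `paired_random_crop` and the toy data are assumptions made for illustration, not something shown in the video.

```python
import torch


def paired_random_crop(image, mask, size):
    """Crop image (C, H, W) and mask (H, W) at the same random location."""
    crop_h, crop_w = size
    _, height, width = image.shape
    # Draw the crop offsets once, then reuse them for both tensors.
    top = torch.randint(0, height - crop_h + 1, (1,)).item()
    left = torch.randint(0, width - crop_w + 1, (1,)).item()
    image_crop = image[:, top:top + crop_h, left:left + crop_w]
    mask_crop = mask[top:top + crop_h, left:left + crop_w]
    return image_crop, mask_crop


# Toy data: the mask is derived from the image, so alignment is checkable.
torch.manual_seed(0)
image = torch.rand(3, 64, 64)
mask = (image[0] > 0.5).long()

image_crop, mask_crop = paired_random_crop(image, mask, (32, 32))
# The cropped mask still matches the cropped image pixel for pixel.
print(torch.equal(mask_crop, (image_crop[0] > 0.5).long()))
```

The same idea carries over to flips and rotations: decide the random outcome once inside `__getitem__`, then apply it to the image and its mask together.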

@holandaraf 4 years ago

Hi Aladdin, great video. I am having trouble understanding something related to data augmentation itself. For example, in your video, you do something like this: suppose you have 20 images and you apply all of those transforms. You still end up with 20 images, but they are all rotated, grayscaled, etc. Wouldn't it work better if you had your 20 original images plus the new transformed images? You would have more data and more generalization, I suppose. Not sure if I was able to make myself clear, sorry. Thanks in advance!

@BeansGiveMeGas 3 years ago

Scroll down to the last comment on this page; the answer is there in detail.

@tanmay8639 3 years ago

Can someone explain this?

@muhammedjaabir2609 2 years ago

@tanmay8639 Imagine you have 256 images and your batch size is 16. In the first epoch, the first batch is fed into the model for training; say __getitem__() returns the original images for those 16 images. After all the iterations finish and the second epoch starts, it fetches those same 16 images again. Notice that there's a `p=` for the augmentations, so this time it might augment an image. So basically you are feeding new images to the model.

@tanmay8639 2 years ago

@muhammedjaabir2609 Hi, so by this you mean that the more epochs there are, the more variety of input images the model sees?
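
The behaviour described in this thread can be sketched with a small custom dataset. This is an illustration under assumptions: `FlipDataset`, the random tensors, and the choice of a horizontal flip are hypothetical stand-ins, not the code from the video.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class FlipDataset(Dataset):
    """Hypothetical dataset that flips each image with probability p."""

    def __init__(self, images, labels, p=0.5):
        self.images = images
        self.labels = labels
        self.p = p

    def __len__(self):
        # Augmentation does not add samples: the length never changes.
        return len(self.images)

    def __getitem__(self, index):
        image = self.images[index]
        # A fresh random decision is made every time the image is loaded.
        if torch.rand(1).item() < self.p:
            image = torch.flip(image, dims=[-1])  # horizontal flip
        # The label is untouched, so it stays attached to the variant.
        return image, self.labels[index]


torch.manual_seed(0)
images = torch.rand(256, 3, 8, 8)
labels = torch.randint(0, 10, (256,))
loader = DataLoader(FlipDataset(images, labels), batch_size=16)

for epoch in range(2):
    seen = sum(batch.shape[0] for batch, _ in loader)
    print(f"epoch {epoch}: {seen} images")
```

The count printed for each epoch equals the dataset size; what changes between epochs is which images happen to be flipped.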

@caot681 3 years ago

I followed this video and the previous video, "How to build custom Datasets for Text in pytorch", and noticed this: if we use the transform this way and then use random_split to create a train set and a validation set, the validation set gets the same transforms as the train set (which is not what I want). I think the validation set should get something like Resize and Normalize, but not RandomHorizontalFlip, RandomRotation, etc. How can I apply different transforms to the train set and the validation set for the custom Datasets in your previous video, "How to build custom Datasets for Text in pytorch"?
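
One possible approach, sketched here under assumptions, is to build the base dataset without any transform, split it, and wrap each split in a thin dataset that applies its own transform. `TransformedSubset`, `train_transform`, and `val_transform` are hypothetical names, and the two functions are simple stand-ins for real torchvision pipelines.

```python
import torch
from torch.utils.data import Dataset, TensorDataset, random_split


class TransformedSubset(Dataset):
    """Hypothetical wrapper that applies a transform to one split only."""

    def __init__(self, subset, transform=None):
        self.subset = subset
        self.transform = transform

    def __len__(self):
        return len(self.subset)

    def __getitem__(self, index):
        image, label = self.subset[index]
        if self.transform is not None:
            image = self.transform(image)
        return image, label


def train_transform(image):
    # Stand-in for random augmentation: flip half of the time.
    if torch.rand(1).item() < 0.5:
        image = torch.flip(image, dims=[-1])
    return image


def val_transform(image):
    # Deterministic preprocessing only, no random augmentation.
    return (image - 0.5) / 0.5


torch.manual_seed(0)
# The base dataset is created WITHOUT a transform.
base = TensorDataset(torch.rand(100, 3, 8, 8), torch.randint(0, 2, (100,)))
train_subset, val_subset = random_split(base, [80, 20])

train_set = TransformedSubset(train_subset, train_transform)
val_set = TransformedSubset(val_subset, val_transform)
print(len(train_set), len(val_set))
```

Each wrapped split can then be handed to its own DataLoader, so random augmentation only ever touches the training samples.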

@actzful 4 years ago

Hello, thanks for the awesome video. I have one question: when using the dataloader, how is the transformation done? For example, say you have 2 transformations and 10 images. Does the dataloader turn them into 30 images (including the original images), or not?

@AladdinPersson 4 years ago

Those are great questions, I'll try my best to explain. Each image you pass through gets mapped to one transformed image. Using several transforms therefore doesn't extend the dataset; rather, over a couple of epochs the model has seen many variants of the original images. When using as many transforms with probabilities as I did in the video, it's unlikely the model will ever see the original images; on each pass it will see new, distorted versions instead. Effectively this augments the data if you pass an image many, many times, but in a single pass there's just a single variant of every image. Hopefully that was clear, it can be a bit confusing. Edit: Also, if you have several transformations (like the two you mentioned), the image will pass through both of them, and the resulting image is the one with both transformations applied.

@actzful 4 years ago

@AladdinPersson Thanks for the great explanation! Another question: when you pass ToTensor() or Normalize to Compose, does the data get transformed immediately, or only when you iterate through epochs with the DataLoader? By "immediately" I mean that the resulting dataset would already be tensors and normalized in the first iteration of training. I guess I'm not quite clear on whether those transformations are applied sequentially or randomly. For example, with Compose[Resize -> RandomCrop -> ColorJitter -> ToTensor -> Normalize], would it train with Resize in epoch 1, with RandomCrop in epoch 2, and so on? Does it work this way or differently? Is the purpose of the transform argument in ImageFolder essentially just to increase the number of data samples, or can it be used as a plain transformation of the data as well? Say all the training data comes in different shapes and sizes and I want to resize everything to 224 and convert it to Grayscale(3): will the resulting dataset always be 224 AND Grayscale(3), or will it sometimes be 224 with Grayscale(3) and sometimes 224 without Grayscale(3)?

@AladdinPersson 4 years ago

Yeah, when you send in Compose([transform1, transform2, ...]) it will perform each transform sequentially. If you, for example, use two transforms, the first being a Resize to 32x32 and the second a conversion to grayscale, the resulting image will always be 32x32 grayscale. That is why you usually use Random transforms (like we did in the video): on each pass the results will (probably) differ from each other, which augments the data over several epochs.

@mauriciog8158 2 years ago

I have a question: after the transformations are done, are the labels of the original images automatically matched to the new data, or do I have to do something else to assign labels to the new images?

@RahulKumar-mh4bk 4 years ago

Thanks for the great video! I was wondering if we could make small cropped images/patches beforehand and pass them through the network. I am a complete beginner; could you provide some references or explain how I could make those patches? Also, in data augmentation, can we specify the number of output images after the transform?

@AladdinPersson 4 years ago

I'm not sure what exactly you want to do. Do you want to crop the images and add the crops to the dataset? Why do you want to do it that way? In data augmentation, the transforms are applied sequentially, and there is no way to say exactly how many augmented images there will be.

@nikolayandcards 4 years ago

What comes first: labeling or augmentation? For example, I can't find a dataset for a specific task and I have to make my own, but I am too lazy to label 4000 images manually. I thought I could label only about 50 images and then use data augmentation to turn them into 250 in total (images + labels/segmentation masks). Is this correct, or do I have to perform data augmentation first (from 50 to 250) and then label all 250 images manually?

@AladdinPersson 4 years ago

Yeah, if you label 50 images and use data augmentation (and a lot of it, check out: arxiv.org/abs/1909.13719) then you'll effectively get a much larger dataset. I would recommend using a network pretrained on ImageNet or something, and you could probably achieve some decent results. Let me know how it goes for you.

@jonesbbq307 4 years ago

How does RandomGrayscale work? If your input images are a mix of 3-channel and 1-channel images, how do you write your model to account for that?

@AladdinPersson 4 years ago

It's going to repeat the single grayscale channel across RGB, such that R==G==B. You will still have 3 channels, but all of them will be identical.

@jonesbbq307 4 years ago

@AladdinPersson I see. Also, I am having trouble figuring out how to save the images into different folders.

@skyli3945 4 years ago

Isn't the normalization already included in the ToTensor() function? I learnt from a Udemy course that ToTensor already handles transforming images to tensors (with the color channel first, whereas the image is originally stored with the color channel last) and normalizes them. Plus, I don't think we need ToPILImage() at the start for the image transformations to work?

@AladdinPersson 4 years ago

You're right that ToTensor also does some normalization. ToTensor divides by 255 and in that way puts everything in [0, 1]. We want the mean to be 0 and the standard deviation to be 1, so we do a bit of extra normalization. I'm quite sure we need ToPILImage; if I remember correctly, we will get an error otherwise.

@katyhessni 5 months ago

Thanks

@hizircanbayram9898 4 years ago

Does transforms.Compose apply the transforms in order? Thank you

@AladdinPersson 4 years ago

transforms.Compose is just there so that we can apply all the transformations we add to the list. Specifically, all of them are applied sequentially, so if we have transforms.Compose([ transform1, transform2, ]) then transform1 is applied first, then transform2.

@mahussain1 4 years ago

Hi Aladdin, I have a question: does applying a random transformation (flip, crop, rotate, etc.) with p=1 mean it is applied to all tensors/PIL images that are passed? Or is it that, with p=1, any randomly chosen PIL image/tensor will be transformed because it is now selected, while the skipped tensors/PIL images are preserved as they were before? Please help me out. Thanks in advance.

@AladdinPersson 4 years ago

If you set p=1 it's applied to all images that are getting passed/loaded

@mahussain1 4 years ago

@AladdinPersson Thank you so much!

@nerdygeek7425 4 years ago

@AladdinPersson I have a query. I am working on a custom dataset of images with variable sizes. I tried to apply transforms.Resize(224,224) but got an error saying "Unknown resampling filter (224)". What happened, and how can I overcome it? Thanks in advance

@nerdygeek7425 4 years ago

Lol. Figured it out. I didn't put parentheses around the size. transforms.Resize((224,224)) is the correct call. Previously it set the parameter "interpolation" to 224, which caused the resampling error.

@AladdinPersson 4 years ago

@nerdygeek7425 Hehe, was just about to say that :)

@zunwang1687 4 years ago

Hello, I'm a little confused about the normalization process. In linear regression, for each feature, we subtract its mean and divide by its std. With images, every image is an observation, so I think we have 3*224*224 features. So why don't we center and normalize the data per pixel within each channel, instead of only doing it per channel?

@AladdinPersson 4 years ago

That's an interesting perspective! I'm not entirely sure, but just thinking about it, I can imagine that if you normalized every pixel separately, that could destroy the correlation between different pixel values and hence ruin the structure of the original image. By subtracting the same value over the entire image, the distribution of the pixel values stays the same, so the structure and the main contents of the original image remain, although the image may be altered in brightness/color etc.

@zunwang1687 4 years ago

@AladdinPersson Wow, that's an awesome interpretation, thanks!

@hoangdz6558 4 years ago

I have strictly followed your videos (both the custom Dataset one and the transforms one), but my code could not generate any images. Can you explain why? Thanks

@AladdinPersson 4 years ago

Perhaps you could share your code, so I could more easily see what could have gone wrong. Actually, I think it would be best if you made a post on the PyTorch forum: discuss.pytorch.org/ Link the post here and I can respond there (it's much easier to read code over there).

@narayanareddybattula5465 4 years ago

Did you check that your folder structure looks like this? ./train/class_name/image_*.jpg

@venkatesanr9455 4 years ago

Great explanation and informative content. Any application involving end-to-end processing, such as NLP tasks or others, would help a lot. Would you like to share your plans for future videos?

@AladdinPersson 4 years ago

Thanks, I appreciate you saying that. My recent videos have been geared towards NLP, and I've done a couple of videos on torchtext. Is there anything more specific you're looking for in NLP preprocessing?

@venkatesanr9455 4 years ago

@AladdinPersson Yes, I have seen those and they were good to learn from. Do you have plans to do NLP projects like named entity recognition, a chatbot, or anything else?

@AladdinPersson 4 years ago

Yeah, for sure. I'm not sure in what order I'll be doing them, as I have a few other concepts I want to explore as well, like image captioning etc.

@rajukurapati1639 4 years ago

Why isn't this playlist in a sequential order?

@AladdinPersson 4 years ago

What do you mean? Why it's not Pytorch tutorial 1, 2, etc.? I made the videos to be stand-alone, so some videos might build on another one, but they don't necessarily have to be watched in any sequential order.

@rajukurapati1639 4 years ago

@AladdinPersson You were mentioning that "In the next video, we will be using this and this", so I was a little confused.

@AladdinPersson 4 years ago

@rajukurapati1639 I understand, that is odd and was a mistake on my part. Perhaps I was confused at the time about how to structure the videos.