Semi-supervised Learning explained

  92,372 views

deeplizard

Comments: 89
@deeplizard 7 years ago
Machine Learning / Deep Learning Tutorials for Programmers playlist: kzbin.info/aero/PLZbbT5o_s2xq7LwI2y8_QtvuXZedL6tQU Keras Machine Learning / Deep Learning Tutorial playlist: kzbin.info/aero/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL
@ruggieroseccia2790 5 years ago
How can people say that this is a "very well done video"??? It doesn't explain anything; it barely even makes sense! Who guarantees that the labeled data are enough to correctly fit the NN? If I am able to fit the model, why should I care about labeling more data? What about overfitting? What if the NN mislabels the unlabeled data?
@tymothylim6550 4 years ago
Thank you very much for this video! I learnt a lot from this, and find semi-supervised learning a great way to utilize unlabelled data! Great work!
@hughculling2486 1 year ago
Thanks, I've definitely got a clearer idea now.
@sergiu-danielkopcsa2328 5 years ago
The cat in the middle (1:30) is the best xD Nice series though, thanks a lot!
@TheMISBlog 3 years ago
Very informative video, thanks!
@hunttingbuckley4560 4 years ago
After pseudo-labeling, do you validate the outcome? Or do you remove data whose prediction was under some threshold? E.g. run the unlabeled data through the model and then only use newly labeled data that exceeds 80% or 90% confidence (for example).
@tuheditz 4 months ago
3 years?? 💀 Bro, did you get the answer?
@rohitjagannath5331 7 years ago
Good explanation. Very crisp and clear, with a good example.
@deeplizard 7 years ago
Thank you, rohit!
@VaibhavJawale13081977 6 years ago
This video is really helpful, as the explanation is very clear.
@abdulhameedmalik4299 6 months ago
Best video
@sgartner 7 years ago
If the unlabeled portion vastly outnumbers the labeled portion, it seems like you're taking a risk pushing through the pseudo-labeled content as it could very well contain a larger number of incorrectly labeled items than the original set. Isn't this going to be counter productive? Is there a way to avoid this without manually evaluating a significant percentage of the giant data set?
@deeplizard 7 years ago
Hey Scott, thanks for watching! Yeah, you’re right that we could be taking a risk of mislabeling data by using pseudo-labeled samples in our training set. Something we could do to lessen the risk is to only include the pseudo-labeled samples in our training set that received a predicted probability for a particular category that was higher than X%. For example, we could make a rule to only include pseudo-labeled samples in the training set that received a prediction for a specific category of, say, 80% or more. This doesn’t completely strip out the risk of mislabeling, but it does decrease it. The samples that didn't make the cut due to not having a prediction that met the X% rule could then be predicted on again after the model was retrained with a larger data set that included the first round of pseudo-labeled samples. Also, before going through the pseudo-labeling process, we need to ensure that our model is performing well during training and validation (“well-performing” is subjective here). Additionally, the labeled data that the model was initially trained on should be a decent representation of the full data set. For example, we’d be in trouble if we were training on images of cats and dogs, but the only labeled dogs we had were of larger breeds, like Labs or Boxers. If the remaining unlabeled data that we end up pseudo-labeling had images of Chihuahuas and Pomeranians, then you can imagine that these small breeds may become mislabeled as cats since the model was never trained to recognize small dogs as actually being dogs. Hope this helps!
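The X% rule described in the reply above can be sketched in a few lines. This is a hypothetical illustration: `select_pseudo_labels` and the toy probability array are not from the video; the array stands in for a trained model's predicted class probabilities.

```python
import numpy as np

def select_pseudo_labels(probs, threshold=0.8):
    """Keep only samples whose top predicted probability meets the threshold.

    probs: (n_samples, n_classes) array of predicted class probabilities.
    Returns (kept_indices, pseudo_labels) for the confident samples only.
    """
    confidence = probs.max(axis=1)               # top probability per sample
    keep = np.where(confidence >= threshold)[0]  # confident samples
    pseudo_labels = probs[keep].argmax(axis=1)   # their predicted classes
    return keep, pseudo_labels

# Example: three unlabeled samples, two classes (cat=0, dog=1).
probs = np.array([[0.95, 0.05],   # confident cat -> kept
                  [0.60, 0.40],   # uncertain     -> held back for a later round
                  [0.10, 0.90]])  # confident dog -> kept
kept, labels = select_pseudo_labels(probs, threshold=0.8)
print(kept.tolist(), labels.tolist())  # [0, 2] [0, 1]
```

The held-back samples (index 1 here) would be predicted on again after retraining on the enlarged training set, as the reply suggests.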
@EDeN99 5 years ago
This lady is very good, I need a video on the code walk-through. This can also be called automatic annotation.
@diogo9610 5 years ago
@@deeplizard I had the same question, and I understand what you're saying, but isn't supervised learning always preferable to this method? Or is unsupervised learning used when, for instance, we don't have enough time to label all the data? Thanks, and keep up the great videos!
@Njali5 4 years ago
@@diogo9610 Sometimes you may not have the information pre-labeled. For example, if you look at medical data, say cancer suppressor genes: so far we only know of about 1,500 of them, but the human genome consists of more than 33,000 genes. In such cases, semi-supervised learning is pretty helpful, since the amount of labeled data is very small.
@lucylu19881120 4 years ago
I think this needs some more in-depth thought. If one model trained on the labeled data is used to generate the pseudo-labels for the rest of the data, why would you even bother to train a new model at all? Your label-producing model seems to be the ground truth already, and your newly trained model would only be something extremely close to your original model. I.e., if there's some inherent flaw in your original model causing errors in your labels (e.g. good training error, bad testing error, even if you have high confidence in your results), it will get propagated through. This whole pseudo-labeling idea didn't really make sense to me, unless you're doing an ensemble of models to minimize potential error. But even then... I thought it would make more sense if a model were first trained on the unlabeled data and then fine-tuned with the labeled data.
@sourabhkumar8896 7 years ago
Thank you so much! This is the best explanation of machine learning I have ever seen. Please make more, longer videos on machine learning.
@deeplizard 7 years ago
Thank you, Sourabh! I'm glad you found the explanation helpful! This particular video is part of my series of machine learning and deep learning concepts covered in this playlist: kzbin.info/aero/PLZbbT5o_s2xq7LwI2y8_QtvuXZedL6tQU Additionally, my playlist below covers machine learning and deep learning concepts with code using the neural network API, Keras: kzbin.info/aero/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL Just wanted to share these with you in case you had not seen them already since you requested more videos!
@cemregokalp7888 5 years ago
Thank you so much for the detailed & helpful explanation!! Besides, the background rocks :D
@deeplizard 5 years ago
Thank you, Cemre!
@konm 3 years ago
Nice and simple! Thanks a lot for your effort!
@Rainbow-jk6ok 5 months ago
I wish I had known about your course 6 years ago. Please do a full course from scratch.
@sahilseewal5509 5 years ago
Nice explanation with a good example.
@qusayhamad7243 4 years ago
Thank you very much for this clear and helpful explanation.
@aymanehar4386 2 years ago
Very interesting videos. I am just wondering: in pseudo-labeling, why do we retrain the model on the labeled dataset that it has already been trained on? Thanks for the interesting content.
@rezaxxx 7 years ago
Good explanation! Thanks for these videos. You should have a much bigger crowd.
@deeplizard 7 years ago
Thanks, Major REX!
@markcrook-rumsey8639 6 years ago
Super helpful, thanks!
@deeplizard 6 years ago
You're welcome, Mark!
@alexanderyau6347 6 years ago
Thank you for your video, it helped me a lot. My question is: why do we need semi-supervised learning? If the trained model is not good enough, the pseudo-labels may not be correct for the unlabeled data, so the performance of the model later trained with the pseudo-labeled data may not be good enough either.
@moonman239 2 years ago
If we train with pseudolabels, how can we be sure our new weights will be significantly better than the old ones? It seems to me that we might as well just use the old weights.
@isaquemelo8134 6 years ago
Thank you for these videos! I found your channel today and have already watched a bunch of videos. Btw, you have one of the best explanations I've ever seen.
@deeplizard 6 years ago
Thank you, Isaque - Happy to have you here! Glad you found the channel!
@paragjp 4 years ago
This requires a more detailed explanation of the autoencoder requirements. I only understood that one reason is to reduce noise. Thanks.
@Ashutosh_Dayal 5 years ago
Great explanation 😊
@jiaojiaowang3140 6 years ago
Very clear explanation!
@gideonfaive6886 4 years ago
{
  "question": "The unlabeled data gets their labels from………………",
  "choices": [
    "prediction from trained model with labeled data",
    "Pseudo labeling",
    "Unsupervised learning",
    "prediction from trained model with unlabeled data"
  ],
  "answer": "prediction from trained model with labeled data",
  "creator": "Faiveg",
  "creationDate": "2020-04-04T20:58:18.238Z"
}
@deeplizard 4 years ago
Thanks, Gideon! Just added your question to deeplizard.com/learn/video/b-yhKUINb7o :)
@kartikpodugu 1 year ago
Pseudo-labeling may be error-prone, right? Since the model is trained on limited data, the network may provide wrong labels. How do we deal with this?
@kevinyang2556 4 years ago
Thanks for the clear explanation! I was wondering, if we were to provide that semi-supervised model with a completely different animal to test on, like a bird, what approaches are there to tell the user that the input is neither cat nor dog? I know you mentioned some models can provide probabilities of being assigned cat or dog, so is it possible that some model could say there's
@jthlzs4236 3 years ago
Hello, I have a question: does self-supervised learning belong to unsupervised learning or semi-supervised learning?
@noCOtwo 4 years ago
So let's say you train on the labeled data, then pseudo-label the unlabeled data. Now, do you reset the neural net weights before you train on the new, larger data set?
@deeplizard 4 years ago
I don't believe there is any set rule for this. You could go either way, but I would likely choose to not reset the weights and just pick up training on the larger data set after adding the pseudo labeled data.
@maximeletutour4673 2 years ago
There is a big risk of overfitting on wrong labels. I did that in one of my projects, using only predictions with a high probability, and it failed anyway.
@nhatlequocnhat7183 7 years ago
Thank you for a good video.
@Akavall 6 years ago
Is the model trained on hand-labeled data + pseudo-labeled data more accurate than the model trained on hand-labeled data alone? I would assume so, but I can't think of a compelling reason why that is. If the model trained only on hand-labeled data is really bad, then the pseudo-labeling will be bad, and the hand-labeled + pseudo-labeled model will also be bad. Thanks a lot for the videos!
@moonman239 2 years ago
From the other comments, I gather that we can set a rule that only the data which was actually labeled or that the model was X% confident about will be included in the training set. Then we retrain the model on that training set and hopefully can label the unlabeled data much more accurately.
@AnnunciateMe 4 years ago
I tried to implement pseudo-labeling, and what seems problematic and unrealistic is this: if I predict the pseudo-labels with the current classifier and treat those pseudo-labels as the true values, then the predicted values during training will equal the pseudo-labels, since I am using exactly the same model. So imagine a simple example here, without the step to concatenate the labeled data: pseudo_labels = model.predict(unlabelled_images), then model.fit(unlabelled_images, pseudo_labels). What the model will do when fitting is "predict" values based on unlabelled_images and compare them with pseudo_labels, which are set as the true values, and which in turn will be equal, since we just predicted them from the same exact model. Am I missing something?
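A small numeric sketch of why that fit is not a no-op (toy numbers, not from the video): the hard argmax pseudo-label differs from the model's soft output, so the cross-entropy loss against it is nonzero, and training still sharpens the prediction toward the confident class. Framing pseudo-labeling this way, as a form of entropy minimization, is a common interpretation and is assumed here for illustration.

```python
import numpy as np

# Hypothetical soft prediction from the current model for one unlabeled sample.
soft_pred = np.array([0.7, 0.3])

# Hard pseudo-label: one-hot of the argmax, i.e. [1, 0].
pseudo_label = np.eye(2)[soft_pred.argmax()]

# Cross-entropy of the model's own soft output against its hard pseudo-label.
# It is nonzero (-log 0.7 ~= 0.357), so gradient descent still has something
# to do: it pushes the prediction toward the confident class.
loss = -np.sum(pseudo_label * np.log(soft_pred))
print(round(float(loss), 4))  # 0.3567
```

So fitting on hard pseudo-labels is not the identity operation the comment worries about, though it can indeed reinforce the model's existing mistakes, as other comments in this thread point out.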
@radagon6004 4 years ago
Is semi-supervised learning used in driverless cars or credit card fraud detection systems?
@TechnicalStoner 3 years ago
Can't we just use an unsupervised algorithm, like any clustering algorithm, to cluster the unlabeled data, and then use the labeled data for training?
@thespam8385 5 years ago
{
  "question": "Semi-supervised learning employs _______________ to create labels for the remaining unlabeled data.",
  "choices": [
    "pseudo-labeling",
    "autoencoders",
    "validation sets",
    "optimizers"
  ],
  "answer": "pseudo-labeling",
  "creator": "Chris",
  "creationDate": "2019-12-12T04:16:26.512Z"
}
@deeplizard 5 years ago
Thanks, Chris! Just added your question to deeplizard.com
@007shibendu 5 years ago
I found the video very informative, but I'm confused about the final dataset being created. Is it tested against the initial training set? What tests are performed to verify its authenticity?
@Waleed-qv8eg 6 years ago
Thanks again! Does that mean the pseudo-labeling task is just to label data that will become part of the training set? What is the main point? I agree with Rex: "You should have a much bigger crowd"! KEEP IT UP, WE LOVE YOUR VIDEOS.
@deeplizard 6 years ago
Thank you, الانترنت لحياة أسهل! Yes, pseudo-labeling is used to increase the size of your labeled training set when the original amount of labeled data you have is insufficient or small relative to the amount of unlabeled data that you have access to.
@Waleed-qv8eg 6 years ago
deeplizard I got it 👍 Thanks
@hamidawan687 3 years ago
Can you please explain what algorithm must be used for pseudo-labeling? Is KNN suitable, or are there other algorithms as well? Doesn't pseudo-labeling through a KNN classifier or K-means clustering require multiple scans of the dataset?
@PritishMishra 4 years ago
The best example of semi-supervised learning is Google Photos! It asks us for the name of a person and automatically recognizes them in other photos.
@habibleila405 4 years ago
Thank you for this presentation. I'm just wondering: when we take the output of the model trained on the labeled data, there may be errors, so the pseudo-labels on the unlabeled data won't be 100% correct, right? How can we deal with that?
@ikeif 3 years ago
Wouldn't this process of pseudo-labeling lead to overfitting?
@jideilori 3 years ago
Okay, I understand semi-supervised learning; how about self-supervised? In my mind, I was seeing semi-supervised learning as self-supervised. I really hope you have a video that explains it, or the difference. 🙃
@justchill99902 6 years ago
Hey! Nice explanation. Question: suppose we have a dataset of 1000 images of different cats and dogs. Hand-labeling all 1000 images would be tedious, so we label, say, only 300 "different dogs and cats" so that our training data has variety. Now we train the model and pass the other 700 images in for pseudo-labeling. I understand that, based on what the model learned from the 300 images in the training dataset, it predicts labels for the 700 images. But then we again have to hand-label those images with the predictions, right? Then what is the point? I don't think I follow... 🤔 Is the purpose "automatic labeling" here, or what is it? Thanks Lizzy!
@deeplizard 6 years ago
Yes, your understanding of the pseudo-labeling concept is correct. Rather than manually labeling the remaining 700 images with the generated pseudo-labels, you would likely write some program that would automate this process for you based on the model's predictions.
@justchill99902 6 years ago
@@deeplizard aah right! Got it now :) Thanks :)
@gerelbatbatgerel1187 5 years ago
ty
@ashishshrma 6 years ago
Thanks, that was a very nice explanation. Can you suggest some links to learn more about semi-supervised learning?
@deeplizard 6 years ago
Thanks, Kintsugi! I don't have any specific links right off hand, but if you google "semi-supervised learning arxiv," you will get a list of published papers on the topic that you may be interested in checking out.
@hiroshiperera7107 6 years ago
Heloooo.... Can you please let me know how we can validate the unlabeled data from our model?
@deeplizard 6 years ago
Hey Hiroshi - Long time, no see! :) Validation requires that you have the labels for the validation set, so you can't validate with unlabeled data.
@hiroshiperera7107 6 years ago
Yes... my good old friend. :) Thanks for replying. I have a small question: I have a few people who label images, and I also have some images that have been labeled by experts. If I need to evaluate the new people, and the expert-labeled images are few in quantity, what would be a good approach to evaluate the new labelers? Would semi-supervised learning be a good approach? Once the network is trained with labeled and unlabeled data, can I use the new model to evaluate their labels?
@AshwathSalimath 6 years ago
Can you help us with the implementation code?
@deeplizard 6 years ago
Hey Ashwath - What type of code implementation? Check out our Keras playlist below. It has many videos showing how to implement the code for a range of neural network topics and projects. kzbin.info/aero/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL
@AshwathSalimath 6 years ago
I wanted to know the implementation for semi-supervised learning. Thank you.
@deeplizard 6 years ago
I see. I will add that to my list of potential topics for future videos. Thanks!
@connergesbocker9902 1 year ago
💟
@jonyejin 2 years ago
This notion feels very weird to me. First you train with labeled data, then you put unlabeled data into the model to get inferences for it, and once it is 'pseudo-labeled', you train again with all the data. What if the model's pseudo-labeling is mostly wrong?
@Intu11110 4 years ago
Do you know how many girls are interested in AI? Barely any. Thanks for the material on your channel.
@vsiegel 3 years ago
I do not see how this makes sense. It cannot learn anything new if the unlabeled samples get partially mislabeled, I think. If they are correctly labeled, it already works as intended. If not, it learns to make the same errors, maybe with more confidence, I would think. It would make perfect sense if the pseudo-labeled data were manually corrected, or if samples causing errors were removed.
@kougamishinya6566 3 years ago
I don't see the unsupervised learning part here. It's just two rounds of supervised learning. Unsupervised would have been clustering the unlabeled data with a UL algorithm like K-means, then using the labeled data to map labels onto the clusters, and then training a new supervised model on the whole training set made up of true and pseudo-labels. How is what you described actually using USL? If you just train a regular SL model and then use it to predict more unlabeled data, that doesn't make it UL. Predicting labels is literally the whole point of the SL model.
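The cluster-then-map scheme described in this comment can be sketched with plain NumPy. Everything below (the tiny 1-D dataset, the two labeled points, the `kmeans` helper) is a made-up illustration, not code from the video:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Very small 1-D k-means: returns a cluster assignment per point."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]  # distinct initial centers
    for _ in range(iters):
        # Assign each point to its nearest center, then recompute centers.
        assign = np.abs(X[:, None] - centers[None, :]).argmin(axis=1)
        centers = np.array([X[assign == j].mean() for j in range(k)])
    return assign

# Two well-separated 1-D clusters; only one point of each is labeled.
X = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
labeled_idx, labels = [0, 3], [0, 1]  # point 0 -> class 0, point 3 -> class 1

assign = kmeans(X, k=2)
# Map each cluster to the label of the labeled point(s) that fall in it.
cluster_to_label = {assign[i]: y for i, y in zip(labeled_idx, labels)}
pseudo = [cluster_to_label[c] for c in assign]
print(pseudo)  # [0, 0, 0, 1, 1, 1]
```

A real implementation would use a majority vote over all labeled members of each cluster; with one labeled point per cluster, the direct mapping above suffices for the sketch.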
@55555basim 5 years ago
May God bless your work. Your words are gold and extremely useful.
@italoherreramoya6444 5 years ago
Ahhh dude, you've got the Omnitrix :v
@naheliegend5222 3 years ago
And where is the unsupervised learning in this method? Your example is just a classic supervised one.
@DragonofStorm 5 years ago
In other words: at the cost of possible errors, more detailed feature combinations are learned.
@sharangkulkarni1759 2 years ago
Didn't make any sense.