Active Learning. The Secret of Training Models Without Labels.

8,822 views

Underfitted

1 day ago

A large part of the success of supervised machine learning systems is the existence of large quantities of labeled data. Unfortunately, in many cases, creating these labels is difficult, expensive, and time-consuming.
An obvious solution is to use machine learning to aid in the creation of the labels, but this presents a chicken and egg problem: how do we build a model to create labels before labeling our data to train that model?
Active Learning is one solution: a semi-supervised learning technique for building better-performing machine learning models using fewer training labels.
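As a rough illustration of the idea (this is a sketch using scikit-learn, not code from the video, and `label_fn` stands in for whatever human labeling step you use), a typical uncertainty-sampling loop trains a model, scores the unlabeled pool, and asks a human to label only the examples the model is least confident about:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning_loop(X_labeled, y_labeled, X_pool, label_fn, rounds=5, batch=10):
    """Uncertainty sampling: train on the labeled set, score the unlabeled
    pool, and send the least-confident examples to a (human) labeler."""
    model = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        model.fit(X_labeled, y_labeled)
        probs = model.predict_proba(X_pool)
        confidence = probs.max(axis=1)            # probability of the top class
        query = np.argsort(confidence)[:batch]    # least-confident examples
        new_labels = np.array([label_fn(x) for x in X_pool[query]])
        X_labeled = np.vstack([X_labeled, X_pool[query]])
        y_labeled = np.concatenate([y_labeled, new_labels])
        X_pool = np.delete(X_pool, query, axis=0)
    return model
```

The point of querying the least-confident examples is that each manual label is spent where the model learns the most, which is how active learning gets away with far fewer labels overall.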
Paper mentioned in the video:
Active Learning Literature Survey. burrsettles.com/pub/settles.a...
🔔 Subscribe for more stories: www.youtube.com/@underfitted?...
📚 My 3 favorite Machine Learning books:
• Deep Learning With Python, Second Edition - amzn.to/3xA3bVI
• Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow - amzn.to/3BOX3LP
• Machine Learning with PyTorch and Scikit-Learn - amzn.to/3f7dAC8
Twitter: twitter.com/svpino
Disclaimer: Some of the links included in this description are affiliate links where I'll earn a small commission if you purchase something. There's no cost to you.

Comments: 52
@tecbrain 26 days ago
Fantastic video. The truth is that now I'm going to work through the code to understand it. Thanks for the work you do to help us.
@hasanx8317 23 days ago
Duplicated records in the data have a significant meaning: a record that appears repeatedly in the past will probably appear repeatedly in the future. It's a VIP record, and knowing how to handle it well means you've succeeded at a large share of what you're supposed to do. So having duplicate data should somehow eventually make the model very accurate at predicting its related label, more accurate than for unique records.
@thecouchman2112 Year ago
Really helpful video, thanks. One small thing though, the sound effects on the title screens were a bit loud imo :)
@underfitted Year ago
Noted! Thanks for the feedback!
@underfitted Year ago
GOOD ONE!
@emeebritto 2 months ago
yaa... >.
@miguelduqueb7065 Year ago
Nice video! You can also use a similar approach to compare models and keep the one that performs best. Here is how: a few years ago I was collecting data in a chemistry lab in order to fit some models. Each experiment took a day to complete, so I started with a simple factorial design, fitted all the models to the initial data set, and then predicted the point of maximum divergence between the models. That point was used as the next experiment, and the models were refitted thereafter. This procedure was repeated several times. Computing uncertainty in your predictions is similar, but with only one model.
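The procedure this commenter describes is essentially query-by-committee: fit several models and query the point where they disagree most. A minimal sketch under that reading (scikit-learn, with an invented function name and an arbitrary choice of committee members):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

def most_divergent_point(models, X_labeled, y_labeled, X_candidates):
    """Query-by-committee: fit each model, then pick the candidate point
    where the committee's predicted probabilities disagree the most."""
    all_probs = []
    for model in models:
        model.fit(X_labeled, y_labeled)
        all_probs.append(model.predict_proba(X_candidates))
    all_probs = np.stack(all_probs)                   # (n_models, n_points, n_classes)
    disagreement = all_probs.std(axis=0).sum(axis=1)  # spread across the committee
    return int(np.argmax(disagreement))               # index of the next "experiment"

committee = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(random_state=0),
    RandomForestClassifier(n_estimators=25, random_state=0),
]
```

With a single model, as the commenter notes, you would replace the committee's disagreement with the model's own predictive uncertainty.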
@underfitted Year ago
Thanks for sharing!
@sahanakaweraniyagoda9866 Year ago
This is lit 🔥. Love this practical approach to Machine learning. Keep doing the amazing work 👏👏
@underfitted Year ago
Thanks! Much more coming!
@fikriansyahadzaka6647 Year ago
Nice video! Could you also explain semi-supervised learning? There aren't many videos that clearly explain the progress made so far in semi-supervised learning, even though the topic has become more popular nowadays.
@Param3021 Year ago
Another nice video! Learned a new concept - *Active Learning*
@underfitted Year ago
Glad to hear that!
@maheshBasavaraju Year ago
Loved the idea of smart labeling. Very cool.
@knutjagersberg381 Year ago
Love it, world class content! Also agree. A thought: Why not start with few shot or zero shot learning before active learning?
@underfitted Year ago
If you have a model capable of zero-shot, absolutely!
@jubakala Year ago
Thanks! This was exactly what I needed at the moment! (:
@vidyachandran944 Year ago
Great content! Thank you :)
@JoaquinRevello Year ago
Excellent Video. This channel is going to be huge soon
@jayantghadge4027 Year ago
This method to me seems a little bit like boosting. I might be wrong though, but boosting is what came to my mind after watching the video.
@jainamshroff4998 Year ago
A very good video!
@mahendrakumargohil6384 Year ago
Excellent Information 👍👍
@underfitted Year ago
Glad it was helpful!
@brunoras Year ago
Super insightful, I'm using these ideas right now!
@123arskas Year ago
If you've made it public (for smaller-scale projects), please give the link to its repo. Thank you!
@underfitted Year ago
Wonderful!
@kemalariboga Year ago
Great content!
@underfitted Year ago
Thanks!
@roshanaryal7786 Year ago
Hi, Santiago! Love your content! Could you please make a video on how to start machine learning as a beginner with some programming experience? I've been doing web dev but want to transition into ML. I will appreciate your response 😊
@underfitted Year ago
It's coming soon!
@erdi749 Year ago
I love your videos, nice and extremely informative! Just a quick comment: is it possible not to have those "bommmm!" sounds? They make it impossible to listen to your videos in a car or with headphones. Thank you!
@underfitted Year ago
Thanks, Erdi! Yes, if you watch my last few videos, I've improved the audio, including removing that particular sound 😏
@lorenzoleongutierrez7927 Year ago
Great explanation, thanks! Do you have any examples of labeling services that provide this approach? Greetings!
@fobaogunkeye3551 Year ago
Lovely video, Santiago! Quick question: how do we label the low-confidence data that the model initially had a hard time predicting, since we also didn't know what the label was in the first place? How do we know which label/class to use for that low-confidence data when we retrain?
@underfitted Year ago
We will start by labeling some of the data manually. The goal is to seed the process to start generating automatic labels.
@dimasveliz6745 Year ago
dynamic! Liked it more!
@underfitted Year ago
Cool, thanks!
@juan.o.p. Year ago
Very interesting
@underfitted Year ago
Glad you think so!
@123arskas Year ago
I have some queries. There's no proper practical implementation of it, is there? The paper talks about the proposed methods along with practical issues. Since your videos are straight to the point and you try to keep things simple, I just want to know whether you've found a practical implementation of it in Python, etc. If so, do give a link to it in the description. Thank you.
@underfitted Year ago
Yeah, I've personally used Active Learning multiple times. It's a very practical way to decide how to label a dataset.
@Param3021 Year ago
1:03 - We need to Build a Model to Label the data we need, to Build a Model 🤯
@underfitted Year ago
Yup :)
@sodipepaul9370 Year ago
Wow.
@underfitted Year ago
Wow indeed
@CarlosBCU Year ago
Hi, maybe a silly question but how you calculate the confidence after step 2?
@underfitted Year ago
Assuming you are using a classification model, for example, that will be the confidence (probability) returned by the model. More specifically, the softmax value corresponding to the highest predicted class.
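To make that concrete, here is a small illustrative sketch (not from the video) of turning a classifier's logits into softmax probabilities and taking the top-class probability as the confidence score:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # shift for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum(axis=1, keepdims=True)

logits = np.array([[2.0, 0.1, 0.1],    # model is fairly sure about class 0
                   [0.6, 0.5, 0.4]])   # model is unsure
probs = softmax(logits)
confidence = probs.max(axis=1)  # probability of the predicted class
# the second example has the lower confidence, so it is the better labeling candidate
```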
@CarlosBCU Year ago
@@underfitted many thanks for your answer! What if we are running a regression?
@modakad 29 days ago
@@underfitted Answering CarlosBCU's question on confidence: I don't think your answer sufficiently clarifies the approach. Let's take an example. Suppose we have two classes, class 0 and class 1. For observation A the softmax vector is [0.92, 0.08], and for observation B it's [0.60, 0.40] (remember, softmax gives a vector of values that all add up to 1). Which observation should we pick? Not observation A. Observation B is where the model has low confidence: the model separates its predictions by a margin of only 0.2 (abs(0.60 - 0.40)), while for observation A the separation is much higher.
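The margin criterion this comment describes (difference between the two highest class probabilities; query the smallest margin) can be sketched like this, using the comment's own example numbers:

```python
import numpy as np

# Softmax outputs for the two observations in the comment above
probs = np.array([[0.92, 0.08],    # observation A: confident
                  [0.60, 0.40]])   # observation B: uncertain

top_two = np.sort(probs, axis=1)[:, ::-1][:, :2]  # two highest probabilities per row
margin = top_two[:, 0] - top_two[:, 1]            # top-1 minus top-2
query = int(np.argmin(margin))                    # smallest margin = most uncertain
# query == 1, i.e. observation B is the one to send for labeling
```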
@modakad 29 days ago
@@CarlosBCU I think the answer would be to choose the observations with the higher error (RMSE, MSE, etc.).
@modakad 29 days ago
If you are using a sigmoid loss function, it would be trickier.