I wondered what would happen if he used a hand sanitizer lol
@psijkopupa6853 4 years ago
@@advaitgogte6385 he is a drummer
@bjarnij3782 4 years ago
@@psijkopupa6853 or a yard worker lol
@grumpyae86 3 years ago
I swear to god I was just about to say that lol
@jingyiwang5113 10 months ago
Thank you for this explanation of k-fold cross-validation! It is really helpful! 😃
@garybutler1672 3 years ago
1. Train/Validation 2. Either 3. K-Fold CV

I've seen a lot of answers I disagree with in the comments, so I'll explain. First, the terminology: it's Train/Validation when the split is used to train the model. The Test set should be taken out prior to doing the Train/Validation split and remain separate throughout training. The Test set is then used to test the trained model.

Second, the answers. 1. Obviously training will take longer doing it 10 times. 2. While training took longer, you are running the same size model in production; all other things being equal, the run times of the two already-trained models should be equal. 3. The improved accuracy is why you would want to use K-Fold CV.

If I'm wrong, please explain. I'll probably never see your comments, but you could help someone else.
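[Editor's note: a minimal sketch of the split Gary describes, using scikit-learn; the iris dataset and the split sizes here are just illustrative placeholders.]

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)

    # Hold out the test set first; it stays untouched until the very end.
    X_trainval, X_test, y_trainval, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)

    # Split the remainder into train and validation for model development.
    X_train, X_val, y_train, y_val = train_test_split(
        X_trainval, y_trainval, test_size=0.25, random_state=0)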
@lordblanck7923 3 years ago
I don't think the test set is separated prior to training. It's more like there are 10 equal-size but randomly assigned groups. Say you choose group 1: every other group is combined into the training set while group 1 is the test set, and you get a value. Then you repeat with group 2: group 2 is the test set while the other groups combine into the training set. You repeat this for every group and collect the values.
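[Editor's note: that rotation looks roughly like this with scikit-learn's KFold; the dataset and classifier are placeholders.]

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold

    X, y = load_iris(return_X_y=True)
    scores = []
    # Each group takes a turn as the held-out test bin; the rest form the training set.
    for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
        model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[test_idx], y[test_idx]))
    print(np.mean(scores))  # the averaged accuracy mentioned in the video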
@ahmedmustahid4936 1 year ago
"3. The improved accuracy is why you would want to use K-Fold CV." I don't think accuracy necessarily improves when K-fold CV is used. K-fold CV is used to reduce the variation in metric values across different train/test sets.
@thorbenburig4763 1 month ago
Dead thread, but maybe it would be more precise to say that K-fold CV maximizes the accuracy of the accuracy estimate itself :D So instead of a reported accuracy of maybe 99%, K-fold CV will give you 95%, but that figure will be closer to what you will get once your model is deployed in production.
@dorsolomon7251 7 years ago
It's obvious that the answers are: train/test, train/test, and then cross-validation. Cross-validation runs the training "k" times, so it's "k" times slower, but on the other hand it's more accurate.
@tamvominh3272 5 years ago
So which model should I take to run a demo on a data point? If k=10, I will have 10 models. Do I use all 10 models and vote for the final label? Is that right? Thank you so much.
@ericklestrange6255 4 years ago
It is, but sometimes you come here after reading tons of complicated stuff and you just want things chewed up for you; that's why we come to videos. Thanks for the results.
@whalingwithishmael7751 5 years ago
Simple and beautiful
@DoughyBoy 5 years ago
Why is it so hard to find a simple, concrete, by-hand example of k-fold cross-validation? All the documentation I can find is very generalized, with no practical examples anywhere.
@Jsheng007 6 years ago
Interesting to see your video presented like this. Would you mind sharing how you present your drawing this way?
@reedsutton8039 4 years ago
This is incorrect. You should correct this video, as you're encouraging people to mix their train and test sets, which is a cardinal sin of machine learning. Every time you say "test set", you should be saying "validation set". The test set can only be used once, and cannot be used to inform hyperparameters.
@kias87 2 years ago
The voice sounds like Sebastian Thrun. Great guy :)
@ijyoyo 2 years ago
Interesting video. Thanks for sharing.
@apericube27 7 years ago
K-fold cross-validation runs k learning experiments, so at the end you get k different models... Which one do you choose?
@tahaait7236 7 years ago
You take the average of the testing accuracy; he said that at the end.
@apericube27 7 years ago
I am not talking about accuracy but about models; you can't always "average" models. I guess there are two options: 1. Cross-validation builds k models, so you only get an estimate of the accuracy, and you have to build a model on the whole training set afterward to obtain your final model. 2. Cross-validation builds a single model on the whole training set and then estimates the accuracy on the k subsets.
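[Editor's note: for what it's worth, option 1 matches how scikit-learn's tools are typically used; a minimal sketch, with the classifier and dataset as arbitrary placeholders.]

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    model = DecisionTreeClassifier(max_depth=3, random_state=0)

    # CV is used only to estimate how well this model setup performs...
    print(cross_val_score(model, X, y, cv=10).mean())

    # ...then one final model is fit on all the training data and kept.
    final_model = model.fit(X, y)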
@paolofazzini3146 6 years ago
I had the very same doubt, and I found it strange that I was unable to find a quick answer even after browsing several sources. However, I think option 1 of your two hypotheses makes the most sense: cross-validation is meant to find out whether the MODEL (and not its parameters) is the right fit for the sample data; for instance, it should find out whether, say, a polynomial has the right number of parameters (= the right degree) and does not cause overfitting. Once you know, by this method, how good your model is, you can use the whole set to train the chosen model and get the best parameters, as you say.
@FernandoWittmann 6 years ago
Paolo is correct. We actually use cross-validation for evaluating hyperparameters rather than for getting a final performance estimate on unseen data. 20% of the dataset should still be reserved for testing rather than cross-validation being an "alternative" way of testing. Cross-validation is then applied to the training set, which is split into training and validation subsets in a cross-validated way. More details here: scikit-learn.org/stable/modules/cross_validation.html
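[Editor's note: a sketch of the workflow Fernando describes; the SVC and its parameter grid are arbitrary examples.]

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    # The test set is reserved up front and never touched during tuning.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)

    # Cross-validation happens inside the training set to pick hyperparameters.
    search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5)
    search.fit(X_train, y_train)

    print(search.score(X_test, y_test))  # the test set is used exactly once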
@regivm123 6 years ago
Thanks for the question, Guillaume... I too was struggling with this. Thanks, Fernando and Paolo, for the responses.
@9MeiRoleplay 7 years ago
This video is really useful, thank you very much. It helped me a lot.
@junbozhao9675 4 years ago
Yeah. Very clear.
@sanika6916 1 year ago
Thank you so much, very informative.
@snk2288 7 years ago
The test bin is different every time, so how do you average the results? Can you please provide a detailed explanation of this?
@ericklestrange6255 4 years ago
That's because you aren't using a random seed in your classification algorithm: random_state=1.
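[Editor's note: to make both points concrete, a sketch: each fold yields one score, the reported figure is their mean, and fixing random_state (as suggested above) makes the folds reproducible. The classifier is a placeholder.]

    from sklearn.datasets import load_iris
    from sklearn.model_selection import KFold, cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    cv = KFold(n_splits=10, shuffle=True, random_state=1)  # same bins every run
    scores = cross_val_score(KNeighborsClassifier(), X, y, cv=cv)
    print(scores)         # ten per-fold accuracies, one per test bin
    print(scores.mean())  # the single number usually reported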
@randa7892 7 years ago
Do all 10 folds have to be the same size? What is the effect if they are of different sizes?
@ivoriankoua3916 6 years ago
Some parts can have fewer instances than the others. Suppose that in the same example your dataset size is 207. Integer division gives 207 / 10 = 20 and 207 % 10 = 7, so the folds cannot all be equal: the 7 leftover samples have to be distributed among them.
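[Editor's note: you can check how a library actually handles the remainder; a sketch with scikit-learn's KFold, which spreads the 7 leftover samples over the first folds rather than creating an extra one.]

    import numpy as np
    from sklearn.model_selection import KFold

    X = np.arange(207).reshape(-1, 1)  # 207 samples, as in the example above
    sizes = [len(test) for _, test in KFold(n_splits=10).split(X)]
    print(sizes)  # [21, 21, 21, 21, 21, 21, 21, 20, 20, 20]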
@Tyokok 5 years ago
Thanks for the video! Quick (silly) question: in any of these validation methods, every time you change the training data, do you re-fit the model? If so, each validation step is with respect to a different model fit. Then how do you determine your final model?
@knowlen 4 years ago
You re-train on 100% of the data. (FYI for future viewers.)
@fishertech 2 years ago
@@knowlen So if you average the 10 separate performance metrics, does that mean that in the future, for a prediction task, you must also use each of the k separate models and take the average prediction?
@knowlen 2 years ago
@@fishertech Not necessarily. In practice, ensembles do tend to perform better, but they scale poorly (in compute and memory w.r.t. the performance gains). Cross-validation reveals good hyperparameters. Once we have the values, we just train a single model on the full data and expect performance similar to the averaged ensemble from the cross-validation phase.
@rishabmacherla3326 1 year ago
@@knowlen We won't be re-training on 100% of the data every time; instead, we train only on the k-1 blocks of the data. Say we have 20 records in a dataset and split them into 4 parts. In the first iteration, K1 is the test set and K2, K3, K4 are used to train. In the next iteration, K2 is the test set and K1, K3, K4 are used to train, and so on. Doing this, we can observe that we train the model on all the possible data values in the dataset, so there is no point in training on 100% of the dataset every time.
@knowlen 1 year ago
@@rishabmacherla3326 I meant that once we've cross-validated all our models, we select the top-performing one and train it on 100% of the data. If you already know the best model, you don't need to re-run cross-validation for incremental changes to data pulled from the same target distribution; that's just a waste of compute.
@sumitdam9642 5 years ago
Can anybody provide the link to the video by Katie that describes the training and test sets?
@samanthabentley2152 1 month ago
At 1:01 I was startled by the voice 😂
@muzhao-r9v 3 months ago
I have a question: this method creates ten models and validates ten times on ten different test bins, so what is the final model?
@ryanmccauley211 7 years ago
Great explanation, thanks!
@omidasadi2264 5 years ago
Hi bro, thanks for sharing this lesson... Just a question: which application did you use to make this tutorial? It's amazing... your text appears on and above your hand.
@conexionesmentales5444 2 years ago
By any chance, did you ever get a response about which application was used?
@omidasadi2264 2 years ago
@@conexionesmentales5444 No, but I'd be glad to hear about it.
@shahi_gautam 5 years ago
I have a small dataset of 48 samples. If I apply an MLP using 6-fold CV, do I still need a validation set to avoid biased results on the small dataset? Please advise.
@ericklestrange6255 4 years ago
My book says that smaller sets require a bigger k (number of folds), and the opposite for large sets because of computational cost. However, it also seems counterintuitive to me: with an already super small dataset, dividing it by a big number means you end up with practically individual samples, so you can't correlate... (?)
@باسلبنعبدالله-ع5د 5 years ago
Thank you very much, the video is helpful.
@theawesomeNZ 5 years ago
But this doesn't solve the issue of choosing the bin size, i.e. the trade-off between training set and test set size (although you are now using all the data for both tasks at some point).
@quubands4018 1 year ago
When performing cross-validation, the value of K refers to the number of folds that the data is divided into. The choice of K depends on the size of the dataset and the desired level of precision in the performance estimate.

If the dataset is small, a larger value of K can be used to ensure that the model is trained and tested on as many data points as possible. However, if the dataset is large, a smaller value of K can be used to reduce the computational complexity of the cross-validation process.

A commonly used value of K is 10, which means that the data is divided into 10 equal parts, with each part used as a test set once and the remaining parts used as a training set. However, other values of K can be used depending on the specific dataset and the goals of the analysis.

It is important to note that the choice of K can affect the estimated performance of the model, with higher values of K leading to a lower bias but higher variance in the estimate. Therefore, it is often recommended to perform multiple rounds of cross-validation with different values of K to obtain a more robust estimate of model performance.
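[Editor's note: a sketch of that last suggestion: run the same model with several values of K and compare the means and spreads. Dataset and model are placeholders.]

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)
    for k in (3, 5, 10):
        scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=k)
        print(k, round(scores.mean(), 3), round(scores.std(), 3))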
@mohashobak7454 4 years ago
So is this a supervised, unsupervised, or semi-supervised algorithm?
@TTBOn00bKiLleR 3 years ago
It's not about the algorithm you train and run inference with; it's about which data you choose to train and test each of them on, so that the model produces the most accurate results.
@shwetaredkar734 5 years ago
In K-fold CV, the model in each fold computes an average result. So is the entire 10-fold CV an average of averages? And what does "5 times 10-fold CV" mean? How is it different from normal 10-fold CV? Can someone help me understand this?
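[Editor's note: "5 times 10-fold CV" usually means repeating the 10-fold procedure five times with different random partitions and averaging all 50 scores, which reduces the variance of the estimate. A sketch with scikit-learn's RepeatedKFold; the model and data are placeholders.]

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import RepeatedKFold, cross_val_score

    X, y = load_iris(return_X_y=True)
    cv = RepeatedKFold(n_splits=10, n_repeats=5, random_state=0)
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
    print(len(scores), scores.mean())  # 50 scores in total, one overall mean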
@yogeshwarshendye4857 3 years ago
Won't this make the model specialized to the data that we have?
@thesiberian9971 7 years ago
What I don't get is: say you've picked the 1st bin as your test set for the first run and the rest as your training set. Hasn't the model already learned everything in the training set for the rest of the runs? What's the point of using all k bins when they've already been used before?
@ferkstkojtt 7 years ago
So basically you build k different models. Afterwards you compare the models on their average error to see how they differ. In the last step you are supposed to create the best model out of these k models, but I don't fully understand whether you just pick one model or combine them into one super model...
@BigBadBurrow 7 years ago
You have k completely separate experiments, with a new/untrained network each time, but in each case with different train/test data. You then take an average across all experiments and that is your error rate. It's just a more robust way of testing the network.
@asadmohammed706 6 years ago
Just as it was said at the beginning of the video, if we only split the data into 2 parts (i.e. the train and test datasets), we might not extract information to the maximum extent. If we split the data into k parts and then perform cross-validation on the different subsets, we gain higher accuracy. If we do 10-fold CV we get 10 results, i.e. 10 different training and test accuracies, and we choose the best one, so we are able to find the best subsets and combination.
@xordux7 6 years ago
You learn in the first run and then unlearn it. Then you choose another bin, learn from it in the second run, and unlearn it again... this cycle goes on until k cycles are completed.
@TehTechExpert 6 years ago
1:15 - 1:18 Someone please tell me what he said. He was speaking English, then he just jumbled his words up.
@mr.s7767 6 years ago
Check the subtitles!
@seouljazzylife 4 years ago
"whereas in the work that Katie showed you": he's referring to the video another person did.
@rlalduhsaka6746 6 years ago
So, what is the difference from train_test_split with test_size=0.1?
@charismaticaazim 6 years ago
10% of the data is used for testing & 90% is used for training.
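[Editor's note: the practical difference in one sketch: a single 90/10 split gives one estimate from one test bin, while 10-fold CV rotates the 10% bin so every point is tested once. The classifier and dataset are placeholders.]

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score, train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # One 90/10 split: a single accuracy estimate.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=0)
    print(DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr).score(X_te, y_te))

    # 10-fold CV: ten estimates, averaged, with every sample tested once.
    print(cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10).mean())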
@Sthern34 5 years ago
Thanks, clear
@lionheart5078 5 years ago
Do a simple practical example by hand, not just theory all the time. People understand better when there are actual numbers and you go through the entire procedure, even if it's a trivial example.
@BitsOfBoris 6 years ago
Nice!
@oliveryoung6501 9 years ago
What do you mean by data points? Do you mean instances?
@chirathabey7729 8 years ago
+olie tim Yes, problem instances, data tuples, data points, and records are all the same.
@patrickdoherty9116 2 years ago
That huge popped blister on his hand is lowkey distracting
@Sergiogccm 4 years ago
Liked because of the blister, made with a barbell.
@AhmedGamal-xi3vj 3 years ago
Can anyone share the answers to those questions, please?
@kostas_x 9 months ago
kzbin.info/www/bejne/d3Wxd36fds-gjaM
@weizeyin6772 6 years ago
Hey guys from ECON704.
@bodilelbrink 1 year ago
Nobody else distracted by the wound?
@ytber8699 6 years ago
I know it's a very old video, but still, it's not necessary to show your hand while writing.
@wint7627 6 years ago
Well, it helps us see what he's pointing at.
@ranit_ 5 years ago
Focus on the content and you will not notice the hand anymore. :)
@c0t556 5 years ago
Why does his hand bother you???
@julianfbeck 4 years ago
ihhhhh
@EllieOK 3 years ago
Can someone help you?
@StEvUgnIn 2 years ago
You're missing a part of your skin, sir.
@fellipealcantara6856 3 years ago
Can't watch it... the blister is too annoying.
@killvampires 7 years ago
I think the answers are train/test, train/test, and then 10-fold CV. Also, please don't make a video with a nasty open sore on your hand. Wear a glove or something.
@fuu812 7 years ago
Please don't be rude. Also, don't comment if your surname sounds like chew. Use an alias or something.
@phum126 7 years ago
LOL you're complaining about a sore on his hand...try and be more of an uptight bitch bahahah smh
@TEAdog77 6 years ago
Shouldn't a model based on a simple train/test split have the same run time on new data as a model based on a cross-validation approach?
@ranit_ 5 years ago
@alex chow I'm down-voting this remark. Stop spreading hatred and focus on the content, please. His knowledge is much deeper than the "open sore". Happy learning!
@SuperMixedd 5 years ago
@@ranit_ I'm downvoting your response to his remark. He has every right to express his opinion, and moreover, he is right that the open sore is distracting at the very least, or simply disgusting if we're being completely candid.
@lovemormus 6 years ago
The hand is so annoying.
@j.u.m.p.s.c.a.r.e 6 years ago
What are you even saying? I can't understand anything!