How to select the best model using cross validation in python

Views: 68,569

Krish Naik

A day ago

Comments: 79

@vaibhavkhobragade9773 3 years ago

This video helps me to clear all my doubts regarding cross-validation and data leakage.

@VVV-wx3ui 5 years ago

I have done a course and also did a bit of prediction using ML and DL, including ANN, CNN and LSTM. However, now I understand which libraries to use for different cases. Thanks for coming up with such videos. Please do more; I am your subscriber.

@srinivasanganesan9294 3 years ago

Krish, A very simple explanation of how CV can be used in algorithm selection. Very well done.

@drvren030 3 years ago

got an exam in a couple of hours, and this video cleared up a LOT of things! thank you for going into the concept, and using that to explain what's going on in your code. kudos man, kudos

@pranavakailash8751 3 years ago

That is Clean AF! Thanks for this video, Really appreciated

@Nikhil-jj7xf 5 years ago

Explained with simplicity. Thanks, Krish.

@infidos 4 years ago

Awesome and clean, simple explanation.

@asadkhan-kk2ru A year ago

Excellent

@sivakumarprasadchebiyyam9444 A year ago

Hi, it's a very good video. Could you please let me know whether cross validation is done on the train data or on the total data?

@l2mbenop346 5 years ago

Can you increase the font size of the editor? It's very small and eye-straining to read on mobile.

@tarabalam9962 A year ago

Great teaching

@muskangupta5873 4 years ago

best video 🙌 keep posting sir you are awesome

@akashpoudel571 5 years ago

Thank you, sir, for a lucid explanation.

@chinedumezeakacha1604 4 years ago

Very apt and straight to the point. Thanks for sharing

@divyanshuanand9990 5 years ago

Excellent. Thanks for the video.

@VenkatDinesh02 3 years ago

Krish, for a logistic regression problem we should use the mode, right? You used the mean here. Why?

@amankapri 4 years ago

Very Good Explanation

@TJ-wo1xt 2 years ago

gr8 explanation.

@animemodeactivated6404 4 years ago

Hi Krish, after selecting the model, how do we select the best chunk of data for training, given that different splits of the data give different accuracy? It would be very helpful if you could post a video on this.

@EcommerceAdvices 3 years ago

Hi Anime, I think that after selecting the best model with a good average accuracy, we don't need to split again, i.e. we now train on the whole dataset and save that model. What do you say?
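
A minimal sketch of the workflow suggested in this reply, assuming scikit-learn: compare candidates by their mean cross-validated accuracy, then refit the winner on all of the data. The iris dataset, the two candidate models and the names `candidates`, `mean_scores`, `best_name` and `final_model` are illustrative assumptions, not taken from the video.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hypothetical candidate models, chosen only for illustration.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
}

# Mean 10-fold accuracy per candidate. cross_val_score fits clones,
# so the estimators stored in `candidates` stay unfitted here.
mean_scores = {
    name: cross_val_score(model, X, y, cv=10).mean()
    for name, model in candidates.items()
}

# Pick the candidate with the highest mean score and refit it on all data.
best_name = max(mean_scores, key=mean_scores.get)
final_model = candidates[best_name].fit(X, y)
print(best_name, round(mean_scores[best_name], 3))
```

The cross-validated mean is the performance estimate here; the refit model has seen every row, so scoring it again on the same data would be optimistic.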

@salvador9431 5 years ago

Is it OK to use your train_x and train_y data in your cross validation? Or is it better to use your whole x and y variables?

@generationwolves 5 years ago

The X and y variables. The whole point of using cross validation techniques is to try various combinations of train and test sets from your original dataset, and to find out how effective your algorithm is across these combinations.
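
As a rough illustration of this reply, assuming scikit-learn, passing the full X and y to `cross_val_score` lets the function build the train and test combinations itself. The iris dataset and the logistic regression model are stand-ins picked for the sketch.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Full feature matrix and target; no manual train/test split is done here.
X, y = load_iris(return_X_y=True)

model = LogisticRegression(max_iter=1000)

# cv=10 runs ten experiments, each holding out a different tenth of the data.
scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")

print(scores)         # one accuracy value per fold
print(scores.mean())  # the single number used to compare models
```

If the same scores are also used to tune hyperparameters, a separate held-out test set is still advisable, as a later reply in this thread points out.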

@anubabu4187 3 years ago

Nice video, sir. How do we compute a cross-validated score for a non-standard metric, for example specificity?

@malkitsingh2654 3 years ago

Don of the data science community

@borngenius4810 3 years ago

Excellent explanation. So if I am using cross_val_score instead of train_test_split, I don't need to compute and analyse metrics like precision, recall, F1 and ROC? Is just getting accuracy.mean() good enough? P.S. I am new to DS, so I have probably mixed up a few things.
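
One possible way to look at more than accuracy, sketched under the assumption that scikit-learn's `cross_validate` is acceptable here: it accepts several scorer names at once. The breast cancer dataset and the scaled logistic regression pipeline are illustrative choices, not the setup from the video.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Binary classification data, so precision/recall/F1/ROC AUC are well defined.
X, y = load_breast_cancer(return_X_y=True)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Built-in scorer names; each one yields a "test_<name>" array in the result.
metrics = ["accuracy", "precision", "recall", "f1", "roc_auc"]
results = cross_validate(model, X, y, cv=5, scoring=metrics)

for name in metrics:
    print(name, round(results[f"test_{name}"].mean(), 3))
```

Mean accuracy alone can hide poor recall on a minority class, so the extra metrics are worth checking whenever the classes are imbalanced.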

@YavuzDurden 2 years ago

Sir, how can I get the data splits that were validated? Why are we applying cross validation if we can't select the data from the highest-scoring split? Thank you.

@louerleseigneur4532 3 years ago

Thanks Krish

@ajaykushwaha-je6mw 3 years ago

I have a question. In cross validation we perform multiple experiments based on the cv value. In K-fold we do the same thing. What is the difference between these two?

@asadkhan-kk2ru A year ago

Very good

@Hellow_._ A year ago

How do we use other CV techniques in code, like stratified CV, time series CV and leave-one-out CV?

@swethanandyala 2 years ago

Hi sir. When we use cv=10, it simply applies K-fold sampling. Can we import stratified K-fold and pass it as cv when working with a dataset that has class imbalance, since stratified sampling gives the same ratio of classes in the train and validation data?
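
A hedged sketch of what this comment describes, assuming scikit-learn: a `StratifiedKFold` object can be passed directly as `cv`. Note that, per the scikit-learn documentation, an integer `cv` already uses stratified folds when the estimator is a classifier and the target is binary or multiclass; plain `KFold` is used otherwise. The synthetic 90/10 imbalanced data below is made up for the illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic imbalanced data: 90 samples of class 0, 10 of class 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Each test fold keeps the 9:1 class ratio, i.e. 18 negatives and 2 positives.
positives_per_fold = [int(y[test_idx].sum()) for _, test_idx in skf.split(X, y)]
print(positives_per_fold)

# The splitter object is passed in place of an integer cv.
scores = cross_val_score(LogisticRegression(), X, y, cv=skf)
print(scores.mean())
```

Passing the splitter explicitly is mainly useful for controlling shuffling and the random seed, or for documenting the intent.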

@Raja-tt4ll 4 years ago

very nice video. Thank you.

@SararithMAO A year ago

If I just want to apply K-fold cross validation, I don't need to do a train test split, right?

@simplify1411 2 years ago

Sir, what if the total number of observations is 107 or 191, or any prime number? How is the data split using k-fold CV?
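
For what it is worth, scikit-learn's `KFold` does not require the sample count to be divisible by the number of folds: the first `n_samples % n_splits` folds get one extra sample. A small sketch with 107 dummy rows (the data itself is a placeholder):

```python
import numpy as np
from sklearn.model_selection import KFold

# 107 placeholder rows; 107 is prime, so no fold count divides it evenly.
X = np.arange(107).reshape(-1, 1)

kf = KFold(n_splits=10)

# Size of the held-out fold in each of the ten experiments.
fold_sizes = [len(test_idx) for _, test_idx in kf.split(X)]
print(fold_sizes)
```

Here 107 % 10 is 7, so seven folds hold 11 samples and the remaining three hold 10.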

@tiagosilvacorrea9004 5 years ago

Very Good! Thanks

@akpovoghoigherighe964 6 years ago

This is very useful.

@denischo2133 3 years ago

What should I do if I want to apply MinMaxScaler or StandardScaler so that it is fit on the train set and only transforms the test set inside cross_val_score? The rule of thumb is to fit these techniques on the train data only and then apply them to the test data, so how can I do this? cross_val_score doesn't have a specific argument for it.
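
One common way to get this behaviour, sketched under the assumption that a scikit-learn `Pipeline` fits the use case: wrap the scaler and the model together, and `cross_val_score` will refit the scaler on the training part of every fold. The dataset and model below are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# The pipeline is treated as one estimator: in each fold, the scaler is
# fit on the training rows only and then applied to the held-out rows.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```

Swapping `StandardScaler` for `MinMaxScaler` works the same way; scaling the whole dataset before calling `cross_val_score` would leak test-fold statistics into training.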

@Hiyori___ 3 years ago

great tutorial

@SumitKumar-uq3dg 5 years ago

In cross validation we are running different models and taking the mean of all the accuracies. So which model will be our final model?

@MasterofPlay7 4 years ago

can you output the model summary and the confusion matrix using cross_val_score?

@laxminarasimhaduggaraju2671 5 years ago

I am following your videos. The way you explain is simply awesome. Many thanks for sharing the information and knowledge about data science.

@shahadiqbal176 5 years ago

Have you done it using decision tree, random forest or naive Bayes?

@nagandranathvemishetti9247 3 years ago

Sir, will it work for a multi-class problem?

@markmorillo2954 3 years ago

Great

@niteshsrivastava6504 4 years ago

Does the cross_val_score function use hyperparameters and stratified folds?

@BiranchiNarayanNayak 6 years ago

When should we use train_test_split() and when cross_val_score() on the dataset? I have seen that most programs use train_test_split with a 70%/30% or 60%/40% train/test split and then fit the model. So which is the best approach?

@janvonschreibe3447 5 years ago

There is not really a neat rule. A rule of thumb is to take the same ratio as your test/train set ratio

@ravikshdikola6089 3 years ago

If the difference between the train scores and the cross_validate scores is negative, does that mean the model performs very well?

@rajusrkr5444 4 years ago

Excellent video

@auroshisray9140 3 years ago

Thank you sir

@ANILKUMAR-qd8lx 6 years ago

Please can you explain feature selection in a model?

@shaiksajid613. 2 years ago

What is that accuracy? Is it train accuracy or test accuracy?

@arunkumaracharya9641 4 years ago

You said that if cv = 10, then ten experiments are conducted, but you did not say in what ratio the train and test samples are split. Also, there are different interpretations of random_state on the internet. If random_state = None, the sample changes, and if random_state = any integer, the sample does not change between runs, whichever integer you choose. But in your case the sample did change when different integers were used. Please clarify.

@maynorhernandez746 A year ago

In the case of cv = 10 the ratio is train = 0.9, test = 0.1, because you split the "cake" into 10 pieces. Take cv = 5, for instance: the cake is split into 5 pieces, so you have 4 pieces for training and 1 piece for testing. So you will have train = 4/5 = 0.8 and test = 1/5 = 0.2. For cv = 4, train = 3/4 = 0.75 and test = 1/4 = 0.25. I hope this clarifies it.
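
The arithmetic in this reply can be checked with a short sketch, assuming scikit-learn's `KFold` and 100 placeholder rows so that the fold sizes read directly as percentages:

```python
import numpy as np
from sklearn.model_selection import KFold

# 100 placeholder rows, so sizes map one-to-one onto percentages.
X = np.zeros((100, 1))

split_sizes = {}
for k in (4, 5, 10):
    # Look at the first of the k experiments; all folds are equal here.
    train_idx, test_idx = next(KFold(n_splits=k).split(X))
    split_sizes[k] = (len(train_idx), len(test_idx))

print(split_sizes)
```

In general each experiment trains on (k - 1)/k of the data and tests on the remaining 1/k.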

@mekalamadhankumar3224 3 years ago

It is difficult to work out which model is suitable for the data, because we would have to try all the models to check which gives good accuracy. This leads to a big problem in the coding part.

@Data_mata A year ago

❤❤

@piyushaneja7168 3 years ago

Sir, I am confused about this. We select one part of the dataset for testing and the rest for training. In the next iteration we select another part for testing, which was already used for training in the previous iteration. If so, won't it give an incorrect accuracy, since the model has already seen the data? Or am I missing some point?

@zulfiquarshaikh3461 3 years ago

Bro, in the second iteration the data that goes into training and testing is picked again, so it is not the same split as before, and the second iteration has unseen data for testing.

@piyushaneja7168 3 years ago

@@zulfiquarshaikh3461 Okay, thank you bro!

@kareemel-tantawy8355 4 years ago

Is k-fold cross validation used to decide which model is best for regression and classification only, or can I also use it to decide which model is best for clustering?

@chinedumezeakacha1604 4 years ago

Just for classification. Not used for clustering I think.

@sunnysavita9071 5 years ago

Sir, you didn't define test_size in train_test_split().

@rohithn2056 4 years ago

If you don't define it, the train_test_split function automatically uses a 75:25 ratio.
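
This default can be verified with a few lines, assuming scikit-learn's `train_test_split` called with neither `test_size` nor `train_size`; the arrays are placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 100 placeholder samples with two features each.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# No test_size or train_size given, so the documented default of 0.25 applies.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

print(len(X_train), len(X_test))
```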

@nguyenluu3082 2 years ago

Can you explain the effect of random_state to the accuracy?

@jackdaws7125 2 years ago

Each random state is a different randomization of the train-test split of the data. The accuracy changes because in each case the split was done differently and led to different results, which is why a single split is quite unreliable and cross validation helps us solve it.
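
A minimal sketch of this explanation, assuming scikit-learn: the same integer `random_state` reproduces the same split, different integers give different splits (and possibly different accuracies), and the cross-validated mean does not depend on one particular split. The helper `holdout_accuracy` and the iris data are hypothetical choices for the illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)


def holdout_accuracy(seed):
    """Accuracy on one train/test split drawn with the given random_state."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return model.score(X_te, y_te)


# One accuracy per seed; each seed selects a different 75/25 split.
holdout_scores = {seed: holdout_accuracy(seed) for seed in range(5)}

# Mean over ten folds, which averages out the luck of any single split.
cv_mean = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10).mean()

print(holdout_scores)
print(cv_mean)
```

Calling `holdout_accuracy` twice with the same seed returns the same value, which is the reproducibility that a fixed `random_state` provides.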

@yogendrasaikiran4486 3 years ago

I am unable to use that cross validation function on my system.

@markmorillo2954 3 years ago

Nice video

@akashpoudel571 5 years ago

Sir, in general does cross validation always come first, and then tuning the parameters?

@krishnaik06 5 years ago

First hyperparameter tuning, then cross validation.

@akashpoudel571 5 years ago

@@krishnaik06 Ok sir..

@akashpoudel571 5 years ago

@@krishnaik06 Could you upload some more algorithms with the meaning of their parameters? Just a video on hyperparameters for logistic regression and regression, if you have time, sir.

@AmitVerma-yg8pp 4 years ago

I am a little confused with cv folds, and no.of values in X and Y dataset.

@KrishnaMishra-fl6pu 3 years ago

If you take your k-fold value as 5, then CV will perform 5 experiments. Suppose there are 50 records and you took the k-fold value as 5. Then the size of the test data would be 50/5, i.e. 10:
Exp1 ==> test data = df[0:10,:]
Exp2 ==> test data = df[10:20,:]
Exp3 ==> test data = df[20:30,:]
Exp4 ==> test data = df[30:40,:]
Exp5 ==> test data = df[40:50,:]
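
The five experiments listed in this reply can be reproduced with a short sketch, assuming scikit-learn's `KFold` without shuffling and 50 placeholder records:

```python
import numpy as np
from sklearn.model_selection import KFold

# 50 placeholder records, identified by their row index.
records = np.arange(50)

# Without shuffling, KFold takes consecutive blocks as the test fold.
test_ranges = [
    (int(test_idx[0]), int(test_idx[-1]))
    for _, test_idx in KFold(n_splits=5).split(records)
]
print(test_ranges)
```

Each pair is the first and last row index of the held-out block, so rows 0 to 9 form the first test fold, rows 10 to 19 the second, and so on; the other 40 rows are used for training in that experiment.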

@samueleboh8965 5 years ago

Thanks

@MasterofPlay7 4 years ago

can you output the model summary and the confusion matrix using cross_val_score?

@chinedumezeakacha1604 4 years ago

No, I wouldn't think so. I think cross validation is a quick way of determining which ML algorithm is most suitable. Once you pick whichever one returns a high CV score, you can then produce the model summary and the confusion matrix using the confusion matrix library.

@MasterofPlay7 4 years ago

@@chinedumezeakacha1604 Actually, I tried it and you can do it.
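
The commenter does not say how this was done. One way that is known to work in scikit-learn is `cross_val_predict`, which returns an out-of-fold prediction for every sample; those predictions can then be fed to `confusion_matrix`. `cross_val_score` itself returns only scores. The dataset and model below are assumptions made for the sketch.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

X, y = load_iris(return_X_y=True)

# Each sample is predicted by the model that did NOT see it during training.
y_pred = cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=10)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y, y_pred)
print(cm)
```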

@prashanthpandu2829 5 years ago

I have a doubt: should cross_val_score be used on the train dataset or on the whole dataset?

@venkilfc 4 years ago

@@generationwolves If you use cross validation to tune your hyper parameters and improve your model, then you shouldn't apply cross validation on the entire dataset but only on the training data. Test data must be always independent. Otherwise it will result in data leakage. If you just want to have an overall look of the scores of the splits then you can apply on the whole dataset.
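
A sketch of the setup this reply recommends, assuming scikit-learn's `GridSearchCV`: hold out a test set first, run cross validation for tuning on the training part only, and touch the test set once at the end. The dataset, the pipeline and the grid of `C` values are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# The test set is set aside before any tuning happens.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Hypothetical grid; make_pipeline names the step "logisticregression".
param_grid = {"logisticregression__C": [0.1, 1.0, 10.0]}

# 5-fold cross validation runs on the training data only.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

# A single final check on data that played no part in tuning.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Because the test rows never influence the choice of `C`, the final accuracy is not inflated by the tuning step.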