This video helped me clear all my doubts regarding cross-validation and data leakage.
@VVV-wx3ui5 years ago
I have done a course and also made a few predictions using ML and DL, including ANN, CNN and LSTM. However, only now do I understand which libraries to use for different cases. Thanks for coming up with such videos. Please do more, I am your subscriber.
@srinivasanganesan92943 years ago
Krish, a very simple explanation of how CV can be used in algorithm selection. Very well done.
@drvren0303 years ago
Got an exam in a couple of hours, and this video cleared up a LOT of things! Thank you for going into the concept, and using that to explain what's going on in your code. Kudos man, kudos.
@pranavakailash87513 years ago
That is clean AF! Thanks for this video, really appreciated.
@Nikhil-jj7xf5 years ago
Explained with simplicity. Thanks Krish.
@infidos4 years ago
Awesome and clean, simple explanation.
@asadkhan-kk2ru a year ago
Excellent
@sivakumarprasadchebiyyam9444 a year ago
Hi, it's a very good video. Could you please let me know if cross-validation is done on the train data or on the total data?
@l2mbenop3465 years ago
Can you increase the font size of the editor? It's very small and eye-straining to read on mobile.
@tarabalam9962 a year ago
Great teaching
@muskangupta58734 years ago
Best video 🙌 keep posting sir, you are awesome
@akashpoudel5715 years ago
Thank you sir, for a lucid explanation.
@chinedumezeakacha16044 years ago
Very apt and straight to the point. Thanks for sharing.
@divyanshuanand99905 years ago
Excellent. Thanks for the video.
@VenkatDinesh023 years ago
Krish, for a logistic regression problem we should use the mode, right? You used the mean here. Why?
@amankapri4 years ago
Very good explanation
@TJ-wo1xt2 years ago
gr8 explanation.
@animemodeactivated64044 years ago
Hi Krish, after selecting the model, how do we select the best chunk of data for training, as different splits of the data will give different accuracy? It would be very helpful if you could post a video on this.
@EcommerceAdvices3 years ago
Hi Anime, I think after selecting the best model by good average accuracy, we don't need to split again, i.e. now train on the whole dataset and make/save a model. What say?
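A minimal sketch of the workflow suggested in this reply: compare candidate algorithms by their mean cross-validation accuracy, then refit the winner once on all the data. The synthetic dataset and the two candidate models are assumptions for illustration, not the ones used in the video.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for the dataset from the video (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Hypothetical candidates; any scikit-learn estimators could go here.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

# Mean 10-fold accuracy per candidate. The folds only serve the comparison.
mean_scores = {
    name: cross_val_score(model, X, y, cv=10).mean()
    for name, model in candidates.items()
}
best_name = max(mean_scores, key=mean_scores.get)

# Refit the chosen algorithm once on all available data; this is the model to save.
final_model = candidates[best_name].fit(X, y)
print(best_name, round(mean_scores[best_name], 3))
```

The individual fold models are thrown away; cross-validation picks the algorithm, not a particular chunk of data.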
@salvador94315 years ago
Is it OK to use your train_x and train_y data in your cross-validation? Or is it better to use your whole x and y variables?
@generationwolves5 years ago
The X and y variables. The whole point of using cross-validation techniques is to try various combinations of train and test sets from your original dataset, and find out how effective your algorithm is for any of these combinations.
@anubabu41873 years ago
Nice video, sir. How do we cross-validate a non-standard metric, for example specificity?
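One possible way to do this, sketched under the assumption of a binary problem with labels 0 and 1: specificity is the recall of the negative class, so `make_scorer` can wrap `recall_score` with `pos_label=0` and be passed as `scoring`. The data and model are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import cross_val_score

# Placeholder binary dataset with labels 0 and 1 (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Specificity = TN / (TN + FP), i.e. recall computed for the negative class (label 0).
specificity = make_scorer(recall_score, pos_label=0)

scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=5, scoring=specificity
)
print(scores.mean())
```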
@malkitsingh26543 years ago
Don of the data science community
@borngenius48103 years ago
Excellent explanation. So if I am using cross_val_score instead of train_test_split, I don't need to find and analyse metrics like precision, recall, F1 and ROC? Is just getting accuracy.mean() good enough? P.S. I am new to DS, so I have probably mixed up a few things.
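Accuracy is only the default score for classifiers. As a hedged sketch, `cross_validate` accepts several metric names at once, so precision, recall, F1 and ROC AUC can be averaged over the folds as well. The dataset and model below are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Placeholder binary dataset (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

metrics = ["accuracy", "precision", "recall", "f1", "roc_auc"]
results = cross_validate(
    LogisticRegression(max_iter=1000), X, y, cv=5, scoring=metrics
)

# Each metric comes back under the key "test_<name>", one value per fold.
mean_metrics = {name: results[f"test_{name}"].mean() for name in metrics}
for name, value in mean_metrics.items():
    print(f"{name}: {value:.3f}")
```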
@YavuzDurden2 years ago
Sir, how can I get the datasets that were validated? Why are we applying cross-validation if we can't select the data from the highest-scoring splits? Thank you.
@louerleseigneur45323 years ago
Thanks Krish
@ajaykushwaha-je6mw3 years ago
I have a question. In cross-validation we perform multiple experiments based on the cv value. In K-fold we also do the same thing. What is the difference between these two?
@asadkhan-kk2ru a year ago
Very good
@Hellow_._ a year ago
How do we use other CV techniques in code, like stratified CV, time series CV and leave-one-out CV?
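A short sketch of one way to do this in scikit-learn: the `cv` argument of `cross_val_score` accepts a splitter object instead of an integer. The synthetic data is an assumption, and `TimeSeriesSplit` is only meaningful when the rows are in time order, which is not the case for this toy data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    LeaveOneOut,
    StratifiedKFold,
    TimeSeriesSplit,
    cross_val_score,
)

# Placeholder data (assumption); real time series rows must be sorted by time.
X, y = make_classification(n_samples=120, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000)

splitters = {
    "stratified": StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    "time_series": TimeSeriesSplit(n_splits=5),  # trains on the past, tests on the next block
    "leave_one_out": LeaveOneOut(),              # one experiment per row
}

n_scores = {}
for name, cv in splitters.items():
    scores = cross_val_score(model, X, y, cv=cv)
    n_scores[name] = len(scores)
    print(name, len(scores), round(scores.mean(), 3))
```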
@swethanandyala2 years ago
Hi sir, when we use cv=10 it simply applies K-fold sampling. Can we import StratifiedKFold and pass it as cv while working with a dataset that has class imbalance, since stratified sampling gives the same ratio of classes in the train and validation data?
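A minimal sketch of passing `StratifiedKFold` as `cv`, using a made-up dataset with 90 negatives and 10 positives (an assumption). Note that for a classifier with an integer `cv`, scikit-learn already uses stratified folds by default; passing the splitter explicitly mainly gives control over shuffling and the seed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Made-up imbalanced data: 90 rows of class 0, 10 rows of class 1 (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Share of the minority class in each validation fold; it matches the overall 10%.
positive_share = [y[test_idx].mean() for _, test_idx in skf.split(X, y)]
print(positive_share)

# The same splitter object can be handed straight to cross_val_score.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=skf)
print(scores)
```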
@Raja-tt4ll4 years ago
Very nice video. Thank you.
@SararithMAO a year ago
If I just want to apply K-fold cross-validation, I don't need to do a train-test split, right?
@simplify14112 years ago
Sir, what if the total number of observations is 107 or 191 or any prime number? How do we split using K-fold CV?
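The number of rows does not have to be divisible by k. A quick check with scikit-learn's `KFold` on 107 dummy rows shows that the folds are simply allowed to differ in size by one.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(107).reshape(-1, 1)  # 107 dummy rows, a prime number

# The first (107 % 5) = 2 folds get one extra row each.
fold_sizes = [len(test_idx) for _, test_idx in KFold(n_splits=5).split(X)]
print(fold_sizes)  # [22, 22, 21, 21, 21]
```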
@tiagosilvacorrea90045 years ago
Very good! Thanks
@akpovoghoigherighe9646 years ago
This is very useful.
@denischo21333 years ago
What should I do if I want to apply MinMaxScaler or StandardScaler so that it is fit on the train set and only transforms the test set within cross_val_score? The rule of thumb is to apply these techniques to train and test separately, so how can I do this? cross_val_score doesn't have a specific argument for it.
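One common answer, sketched here with placeholder data: wrap the scaler and the model in a pipeline and pass the pipeline to `cross_val_score`. In every fold the scaler is then fit on that fold's training part only and merely applied to the validation part.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder dataset (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# The whole pipeline is re-fit per fold, so no scaling statistics leak
# from the validation rows into training.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```

Swapping `StandardScaler()` for `MinMaxScaler()` works the same way.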
@Hiyori___3 years ago
Great tutorial
@SumitKumar-uq3dg5 years ago
In cross-validation we are running different models and taking the mean of all the accuracies. So which model will be our final model?
@laxminarasimhaduggaraju26715 years ago
I am following your videos. The way you explain is simply awesome. Many thanks for sharing the information and knowledge about DS.
@shahadiqbal1765 years ago
Have you done it using decision tree, random forest, naive Bayes?
@nagandranathvemishetti92473 years ago
Sir, will it work for a multi-class problem?
@markmorillo29543 years ago
Great
@niteshsrivastava65044 years ago
Does the cross_val_score function use hyperparameters and stratified folds?
@BiranchiNarayanNayak6 years ago
When should we use train_test_split() and when cross_val_score() on the dataset? Most of the programs I have seen use train_test_split with a 70%/30% or 60%/40% train/test split and fit the model. So which is the best approach?
@janvonschreibe34475 years ago
There is not really a neat rule. A rule of thumb is to take the same ratio as your test/train set ratio.
@ravikshdikola60893 years ago
If the difference between train scores and cross_validate scores is negative, does that mean the model performs very well?
@rajusrkr54444 years ago
Excellent video
@auroshisray91403 years ago
Thank you sir
@ANILKUMAR-qd8lx6 years ago
Could you please explain feature selection for a model?
@shaiksajid613.2 years ago
What is that accuracy? Is it train accuracy or test accuracy?
@arunkumaracharya96414 years ago
You said that if cv = 10, then ten experiments are conducted, but you did not say in what ratio the train and test samples are split. Also, there are different interpretations of random_state on the internet: if random_state = None the sample changes, and if random_state is an integer the sample does not change, irrespective of the integer you choose. But in your case the sample did change when an integer was used. Please clarify.
@maynorhernandez746 a year ago
In the case of cv = 10 the ratio is train = 0.9, test = 0.1, because you split the "cake" into 10 pieces. Take cv = 5, for instance: the cake is split into 5 pieces, so you have 4 pieces for training and 1 piece for testing, giving train = 4/5 = 0.8 and test = 1/5 = 0.2. For cv = 4, train = 3/4 = 0.75 and test = 1/4 = 0.25. I hope this clarifies it.
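The "cake" arithmetic can be checked directly with `KFold`. This small sketch uses 200 dummy rows (an arbitrary choice that divides evenly by 10, 5 and 4) and reports the train and test fractions of the first fold for each k.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.zeros((200, 1))  # 200 dummy rows; only the row count matters here

ratios = {}
for k in (10, 5, 4):
    # Look at the first of the k experiments; the others have the same sizes here.
    train_idx, test_idx = next(KFold(n_splits=k).split(X))
    ratios[k] = (len(train_idx) / len(X), len(test_idx) / len(X))
    print(f"cv={k}: train={ratios[k][0]}, test={ratios[k][1]}")
```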
@mekalamadhankumar32243 years ago
It is difficult to work out which model suits the data, because we have to try all the models to check which gives good accuracy. This leads to a big problem in the coding part.
@Data_mata a year ago
❤❤
@piyushaneja71683 years ago
Sir, I am confused about this. We select a part of the dataset for testing and the rest for training. In the next iteration we select for testing a part that was already used for training in the previous iteration. If so, it won't give a correct accuracy, as the model has already seen the data. Or am I missing some point?
@zulfiquarshaikh34613 years ago
Bro, the data that goes into training and testing is picked again for the second iteration, so it is not the same as in the first. The model is trained afresh in each iteration, so the second iteration still tests on unseen data.
@piyushaneja71683 years ago
@@zulfiquarshaikh3461 Okay, thank you bro!
@kareemel-tantawy83554 years ago
Is K-fold cross-validation used to decide which model is best for regression and classification only, or can I also use it to decide which model is best for clustering?
@chinedumezeakacha16044 years ago
Just for classification. Not used for clustering, I think.
@sunnysavita90715 years ago
Sir, you didn't define test_size in train_test_split().
@rohithn20564 years ago
If you don't define it, the train_test_split function automatically uses a 75:25 ratio.
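This default is easy to confirm with 100 dummy rows: when neither `test_size` nor `train_size` is given, `train_test_split` holds out 25% of the data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)  # 100 dummy rows
y = np.arange(100)

# No test_size or train_size given, so the default 25% test share applies.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
print(len(X_train), len(X_test))  # 75 25
```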
@nguyenluu30822 years ago
Can you explain the effect of random_state on the accuracy?
@jackdaws71252 years ago
Each random state is a different randomization of the train-test split of the data. So the reason the accuracy changes is that in each case the split was done differently and led to different results, which is why it's quite unreliable, and cross-validation helps us solve that.
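A rough illustration of this point, using synthetic data and a logistic regression as stand-ins (both are assumptions): the same model is scored on five single splits that differ only in `random_state`, and then once with 10-fold cross-validation, which averages over many splits.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Placeholder dataset (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# One accuracy per seed; only the way the rows are split changes.
single_split_scores = []
for seed in range(5):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    single_split_scores.append(model.score(X_te, y_te))
print("single splits:", [round(s, 3) for s in single_split_scores])

# Cross-validation reports an average over 10 different train/test splits.
cv_mean = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10).mean()
print("10-fold mean:", round(cv_mean, 3))
```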
@yogendrasaikiran44863 years ago
I am unable to use that cross-validation function on my system.
@markmorillo29543 years ago
Nice video
@akashpoudel5715 years ago
Sir, in general, does cross-validation always come first and then tuning the parameters?
@krishnaik065 years ago
First hyperparameter tuning, then cross-validation.
@akashpoudel5715 years ago
@@krishnaik06 OK sir.
@akashpoudel5715 years ago
@@krishnaik06 Could you upload some more algorithms with the meaning of their params? Just a video on hyperparameters for the logistic algorithm, regression... if you have time, sir.
@AmitVerma-yg8pp4 years ago
I am a little confused with cv folds and the number of values in the X and Y datasets.
@KrishnaMishra-fl6pu3 years ago
If you take your k-fold value as 5, then CV will perform 5 experiments. Suppose there are 50 records and you take k = 5. Then the size of the test data is 50/5, i.e. 10.
Exp1 ==> test data = df.iloc[0:10, :]
Exp2 ==> test data = df.iloc[10:20, :]
Exp3 ==> test data = df.iloc[20:30, :]
Exp4 ==> test data = df.iloc[30:40, :]
Exp5 ==> test data = df.iloc[40:50, :]
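The five experiments listed in this reply can be reproduced with `KFold` on 50 dummy rows. This sketch assumes no shuffling, which is the `KFold` default; with `shuffle=True` the test rows would no longer be contiguous blocks.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(50).reshape(-1, 1)  # 50 dummy records

test_ranges = []
for i, (train_idx, test_idx) in enumerate(KFold(n_splits=5).split(X), start=1):
    first, last = int(test_idx[0]), int(test_idx[-1])
    test_ranges.append((first, last))
    # The remaining 40 rows form the training set of this experiment.
    print(f"Exp{i}: test rows {first}..{last}, train rows: {len(train_idx)}")
```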
@samueleboh89655 years ago
Thanks
@MasterofPlay74 years ago
Can you output the model summary and the confusion matrix using cross_val_score?
@chinedumezeakacha16044 years ago
No, I wouldn't think so. I think cross-validation is a quick way of determining which ML algorithm is most suitable. Once you pick whichever one returns a high CV score, you can then do the model summary and the confusion matrix using the confusion matrix library.
@MasterofPlay74 years ago
@@chinedumezeakacha1604 Actually, I tried it and you can do it.
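The reply does not say how it was done, so the following is only one possible route and not necessarily the commenter's: `cross_val_score` itself returns just the scores, but the related `cross_val_predict` returns an out-of-fold prediction for every row, which can be fed to `confusion_matrix`. Data and model are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

# Placeholder binary dataset (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Each row is predicted by a model that did not see it during training.
y_pred = cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=5)

cm = confusion_matrix(y, y_pred)
print(cm)
```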
@prashanthpandu28295 years ago
I have a doubt: do you have to use cross_val_score on the train dataset or on the whole dataset?
@venkilfc4 years ago
@@generationwolves If you use cross-validation to tune your hyperparameters and improve your model, then you shouldn't apply cross-validation to the entire dataset but only to the training data. The test data must always be independent; otherwise it will result in data leakage. If you just want an overall look at the scores of the splits, then you can apply it to the whole dataset.
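A sketch of the leakage-safe setup described in this reply, with placeholder data and an illustrative grid for `C` (both assumptions): a test set is held out first, cross-validation for tuning runs on the training part only, and the test set is scored once at the end.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Placeholder dataset (assumption).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Hold out the test set before any tuning happens.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 5-fold cross-validation over an illustrative grid, on the training data only.
param_grid = {"C": [0.01, 0.1, 1, 10]}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
search.fit(X_train, y_train)

# The untouched test set gives the final, independent estimate.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```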