Wow, this was amazing. I'm working on machine learning models to diagnose early leakage of valves in piston diaphragm pumps. Thanks Chanin. Really love your videos.
@aimenbaig6201 3 years ago
I love your calm teaching style! It's relaxing.
@DataProfessor 3 years ago
Thank you! 😊
@Ghasforing2 4 years ago
This was a lucid and complete discussion of hyperparameter tuning. Thanks for sharing, Professor.
@DataProfessor 4 years ago
Thank you for watching and glad it was helpful 😊
@ajifyusuf7624 4 years ago
This video, I think, is one of the best explanations of hyperparameter tuning.
@DataProfessor 4 years ago
Thanks for the kind words 😊
@MarsLanding91 4 years ago
Superb video. Very insightful. Question - how are you picking the numbers for the parameters? max_features_range = np.arange(1,6,1) - why did you decide to start at 1 and end at 6? Why are you incrementing by 1 and not by 2, for example? Would love to hear your thoughts on this.
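As a side note on the quoted line: np.arange(start, stop, step) excludes the stop value, so the range tries max_features from 1 through 5. The bounds themselves are a modeling choice, not a rule; a quick sketch:

```python
import numpy as np

# np.arange(start, stop, step) excludes `stop`,
# so arange(1, 6, 1) tries max_features = 1, 2, 3, 4, 5
max_features_range = np.arange(1, 6, 1)

# incrementing by 2 would simply test fewer candidate values: 1, 3, 5
coarse_range = np.arange(1, 6, 2)

print(max_features_range.tolist())  # [1, 2, 3, 4, 5]
print(coarse_range.tolist())        # [1, 3, 5]
```

A coarser step trades search resolution for fewer model fits, which is why a step of 1 is a natural default when the range is small.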
@aiuslocutius9758 2 years ago
Thank you for explaining this concept in an easy-to-understand manner.
@DataProfessor 2 years ago
You're very welcome!
@CatBlack01 3 years ago
Clear explanation and presentation. Love the analogies and error fixing.
@DataProfessor 3 years ago
Much appreciated! Glad to hear!
@WaliSayed 4 months ago
Very clear, and the details are explained in a simple way. Thank you!
@JBhavani777 2 years ago
I'm late in watching this, but it was worth it, sir. Thank you so much for the gem... keep teaching; a very elaborate explanation.
@jorge1869 4 years ago
Hello Dr.! I have read many of your works because I have a line of research related to the development of machine learning-based tools, mainly the prediction of peptides with different activities. Currently I use Python for development and, of course, to publish my papers; I am also learning R because I have noticed this language has good libraries for calculating molecular descriptors, for instance "protr". I would appreciate a video tutorial explaining key steps such as data splitting, training, cross-validation, and testing in R using the "CARET" library, if possible. Greetings and success for this awesome YouTube channel!
@DataProfessor 4 years ago
Thanks JF for the comment and for reading my research work. How did you discover this YouTube channel? (So that I can use this information to better promote the channel.) Yes, we also use the protr package in R for some of our peptide/protein QSAR work. In that case, I might make a video about calculating descriptors of peptides/proteins or even compounds in future videos. In the meantime, please check out the video "Machine Learning in R: Building a Classification Model" as well as 13 other R video tutorials explaining the machine learning model building process in a step-by-step manner. kzbin.info/www/bejne/moPUpX-uj7uFq9k
@jorge1869 4 years ago
Dr., thank you so much for your reply. I discovered your channel here on YouTube while looking for machine learning tutorials in R. When you mentioned your name in one of your videos, where you give an excellent lecture on drug discovery, I quickly looked up your profile on ResearchGate, and that's how I realized it was you.
@DataProfessor 4 years ago
Thanks JF for the insights, they're very helpful.
@jgubash100 3 years ago
Liked the contour plots, I'll have to try those too.
@donrachelteo9451 3 years ago
Yes indeed, this is one of the best explanations of hyperparameter tuning. Just needed clarification: how do we decide the range of values to run in grid search? Hope you can also do a video on manual tuning vs. automatic grid search tuning. Thanks 👍
@DataProfessor 3 years ago
Thanks for the suggestion! I'll put it on my to-do list.
@donrachelteo9451 3 years ago
@@DataProfessor thanks professor
@GeraldTalton 2 years ago
Great video, always helps to see the visualization
@sudhakarsingha283 3 years ago
This is a video with a detailed discussion of hyperparameter tuning.
@DataProfessor 3 years ago
Thank you for watching!
@madhawagunathilake8304 2 years ago
Thank you Prof. for your very insightful and helpful lecture!!
@geoffreyanderson4719 3 years ago
A thought experiment: If the generating process continued a lot longer and made far more than 200 examples, what would this do to the tuned final model's predictions? I am talking about the model that was developed on the 200 examples. That is, what happens when it is tried on that new data? Keep in mind that sklearn's make_classification() by design produces noise only, no signal.
@geoffreyanderson4719 3 years ago
Thank you for making good content; that is what attracted me to the channel, Data Professor. I say the following only with a constructive purpose. There is no signal to find in a random dataset like that sampled by make_classification. Is this correct? Thus the RF is fitting itself to noise only. It's using completely spurious associations. You would prefer to avoid fitting to noise components in real life as much as possible. Fitting to noise is pure variance error.
@dearcharlyn 3 years ago
Another amazing tutorial, well explained and comprehensible! Thank you, Data Professor! I am currently working on COVID-19 predictor models. :)
@DataProfessor 3 years ago
Thanks! Appreciate the kind words!
@josiel.delgadillo 2 years ago
How do you use GridSearchCV with a custom estimator? I can't seem to make it work.
@hejarshahabi114 4 years ago
Thanks for your video. I also have a question regarding max features, which you mentioned at 11:48. By max features, what do you mean? Do you mean the maximum number of independent elements like x1, x2, ..., xn?
@DataProfessor 4 years ago
Thanks for watching! Yes, max_features can be set to all of the features. The max_features parameter determines how many features scikit-learn considers when performing a node split. More details are provided here: scikit-learn.org/stable/modules/ensemble.html#random-forest-parameters
@hejarshahabi114 4 years ago
@@DataProfessor Thank you very much for your quick response. Please keep making videos on such topics; you are doing great, and I've learnt many things from your channel. BIG LIKE
@DataProfessor 4 years ago
@@hejarshahabi114 Thanks, and greatly appreciate the support 😊
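The thread above can be made concrete with a minimal sketch of how max_features and n_estimators are searched together with GridSearchCV; the dataset and grid values here are illustrative stand-ins, not the video's exact settings:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# toy classification data (the video also uses make_classification)
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=5, random_state=42)

param_grid = {
    "max_features": np.arange(1, 6, 1),   # features considered at each node split
    "n_estimators": [50, 100, 200],       # number of trees in the forest
}

grid = GridSearchCV(RandomForestClassifier(random_state=42),
                    param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

GridSearchCV fits one model per grid cell per cross-validation fold, so keeping each range small keeps the search affordable.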
@amiralx88 3 years ago
Really nice and clean code. I've learned a lot from your video about how to optimize mine. Thanks!
@infinitygeospatial1972 2 years ago
Great video. Very explanatory. Thank you!
@joseluisbeltramone599 2 years ago
Fantastic explanation, Sir (as always). Thank you very much!
@DataProfessor 2 years ago
You are very welcome
@sofluzik 4 years ago
Lovely. How relevant are the confusion matrix, classification report, AUC score, and ROC to the score mentioned above?
@DataProfessor 4 years ago
Hi Rajaram, this article does a good job of providing a detailed distinction between the various metrics for classification: neptune.ai/blog/f1-score-accuracy-roc-auc-pr-auc
@sofluzik 4 years ago
@@DataProfessor thank you sir
@muskanmishra6625 1 year ago
Very well explained, thank you so much 🙂
@gabrielcornejo2206 2 years ago
Great tutorial, thank you very much. I have a question: how could I know which are the best 3 features to use to build the best model with 140 n_estimators?
@nibrad9712 5 months ago
Why did you choose max_features to be 5 while n_estimators is 200? More specifically, how do I choose these params?
@cahayasatu9201 4 years ago
Thanks for a great tutorial. May I know how to see/identify the 2 features that produce the best accuracy?
@DataProfessor 4 years ago
Hi, if using random forest, the feature importance plot will allow us to see which features contributed the most to the prediction. The shap library also adds this capability to any ML algorithm.
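The feature-importance approach described in that reply can be sketched as follows; the data and the top-3 cutoff are purely illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# toy data with 6 features, 3 of them informative
X, y = make_classification(n_samples=200, n_features=6,
                           n_informative=3, random_state=0)
rf = RandomForestClassifier(random_state=0).fit(X, y)

# importances sum to 1; rank features from most to least important
ranked = sorted(enumerate(rf.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
top3 = [idx for idx, _ in ranked[:3]]
print("top 3 feature indices:", top3)
```

Plotting rf.feature_importances_ as a bar chart gives the importance plot the reply mentions; the shap library offers a model-agnostic alternative.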
@limzijian98 2 years ago
Hi, just wanted to ask: how do you determine n_estimators for a record size of 2 million?
@budoorsalem1168 3 years ago
Thank you for your great video. Have you done hyperparameter tuning for different algorithms like decision trees, ANN, GBR?
@DataProfessor 3 years ago
The first step is to figure out which hyperparameters you want to optimize. You can do that by going to the API documentation, looking up the algorithm function you want to use, seeing which hyperparameters there are, and adapting accordingly as shown in this video. For example, in Random Forest, the 2 hyperparameters we chose for optimization are max_features and n_estimators. For ANN, you may choose to optimize the learning rate, momentum, number of nodes in the hidden layer, etc.
@budoorsalem1168 3 years ago
@@DataProfessor Thank you so much, this really helped me.
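The ANN case mentioned in that reply can be sketched with scikit-learn's MLPClassifier. The grid values below are illustrative, and momentum only takes effect with the 'sgd' solver:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=200, random_state=1)

# hyperparameter names come straight from the MLPClassifier API docs
param_grid = {
    "learning_rate_init": [0.001, 0.01],
    "momentum": [0.8, 0.9],
    "hidden_layer_sizes": [(10,), (50,)],
}

grid = GridSearchCV(MLPClassifier(solver="sgd", max_iter=500, random_state=1),
                    param_grid, cv=3)
grid.fit(X, y)
print(grid.best_params_)
```

The search machinery is identical to the Random Forest case; only the estimator and the named hyperparameters change.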
@eyimofepinnick 3 years ago
Nice tutorial. Now that I've done all this, how can I apply the model, i.e., use what we've done to predict the X_test data, or predict new data if we create an API?
@DM-py7pj 2 years ago
Is it not important to know which features were used when GridSearch tells you the optimal number of features? And what then, when, over different runs, you get a different n_features?
@kailee3491 2 years ago
Where can I find the environment requirements?
@bryanchambers1964 2 years ago
I have a very large dataset: 356 columns. I reduced it to 75 using PCA and retained 99.8% of the variance. I built a clustering model and it works outstandingly well; I identified 3 clusters out of 8 to which potential customers belong. But my machine learning model is garbage: an ROC-AUC score barely greater than 0.5. I am surprised, because if the clustering model works very well, shouldn't the machine learning model work well too? I was wondering if you had any suggestions.
@DanielRong795 2 years ago
May I ask what ROC-AUC is?
@AbhishekSingh-vl1dp 1 year ago
How will we decide how much of the data to split into the train set and the test set?
@budoorsalem8378 3 years ago
Thank you so much, Professor, for this good information; it helped a lot. I was wondering if we can do hyperparameter tuning in random forest regression for continuous data.
@DataProfessor 3 years ago
Hi, by continuous data are you referring to the Y variable? If so, then the answer would be yes.
@budoorsalem1168 3 years ago
@@DataProfessor Yes, the target dependent variable is not categorical; it takes different numbers.
@DataProfessor 3 years ago
@@budoorsalem1168 Hyperparameter tuning can be performed for both categorical and numerical Y variables (classification and regression, respectively).
@budoorsalem1168 3 years ago
@@DataProfessor ok thank you so much
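The regression case discussed in this thread can be sketched by swapping in RandomForestRegressor; the same grid-search machinery works on a numerical target (grid values illustrative; the default score for regressors is R²):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# toy regression data with a continuous target
X, y = make_regression(n_samples=200, n_features=8,
                       noise=10.0, random_state=0)

param_grid = {"max_features": [2, 4, 6], "n_estimators": [50, 100]}

grid = GridSearchCV(RandomForestRegressor(random_state=0),
                    param_grid, cv=5)  # folds are scored by R^2 for regressors
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

Other regression metrics such as negative mean squared error can be selected via GridSearchCV's scoring parameter.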
@dennislam1501 1 year ago
What is the minimum sample size for decent tuning? 10,000? 1,000? 100,000? Data rows, I mean.
@dreamphoenix 2 years ago
Thank you
@張稚辰 3 years ago
Awesome video, thanks!
@DataProfessor 3 years ago
Thank you
@SyedZion 3 years ago
Can you please explain the same concept with RandomizedSearch?
@isaacvergara6792 3 years ago
Awesome video!
@guoqiang7215 4 years ago
I am working on a spam mail dataset and am now trying hyperparameter tuning on the model.
@DataProfessor 4 years ago
Thanks for sharing, sounds like an interesting project.
@franklintello9702 2 years ago
I am still trying to find one with real data, because all these automatically generated datasets are sometimes hard to apply.
@levithanprimal2410 3 years ago
How am I watching this for free? Thanks Professor!
@DataProfessor 3 years ago
Glad it was helpful, and yes, we have free data science content here; would appreciate it if you share with a friend or two 😆
@MinhHua-zu2pl 7 months ago
Please make the screen font bigger. Thank you!
@shivamkrathghara3340 3 years ago
Why 81k? It should be more than 810k. Thank you, Professor!