Hyperparameter Tuning of Machine Learning Model in Python

Views: 45,222

Data Professor

Comments: 78
@AI_Boy99 1 year ago
Wow, this was amazing. I'm working on machine learning models to diagnose early leakage of valves in piston diaphragm pumps. Thanks Chanin. Really love your videos.
@aimenbaig6201 3 years ago
I love your calm teaching style! It's relaxing.
@DataProfessor 3 years ago
Thank you! 😊
@Ghasforing2 4 years ago
This was a lucid and complete discussion of hyperparameter tuning. Thanks for sharing, Professor.
@DataProfessor 4 years ago
Thank you for watching and glad it was helpful 😊
@ajifyusuf7624 4 years ago
This video, I think, is one of the best explanations of hyperparameter tuning.
@DataProfessor 4 years ago
Thanks for the kind words 😊
@MarsLanding91 4 years ago
Superb video. Very insightful. Question: how are you picking the numbers for the parameters? For max_features_range = np.arange(1,6,1), why did you decide to start at 1 and end at 6? Why are you incrementing by 1 and not by 2, for example? Would love to hear your thoughts on this.
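For readers with the same question, here is a minimal sketch of what that range expands to. Only `max_features_range = np.arange(1,6,1)` comes from the comment above; `n_estimators_range`, `coarse_max_features_range` and `n_candidates` are illustrative names and values assumed for this example, not taken from the video.

```python
import numpy as np

# np.arange(start, stop, step) excludes `stop`, so this yields 1, 2, 3, 4, 5.
max_features_range = np.arange(1, 6, 1)

# ASSUMPTION: an illustrative range for the number of trees (10, 20, ..., 200).
n_estimators_range = np.arange(10, 210, 10)

# A step of 2 covers the same interval with fewer candidates (1, 3, 5).
coarse_max_features_range = np.arange(1, 6, 2)

# The grid that a grid search would enumerate exhaustively.
param_grid = dict(max_features=max_features_range,
                  n_estimators=n_estimators_range)

# Every combination is fitted once per cross-validation fold.
n_candidates = len(max_features_range) * len(n_estimators_range)

print(max_features_range.tolist())
print(coarse_max_features_range.tolist())
print(n_candidates)
```

The step size is a cost trade-off: a finer step evaluates more combinations, and each one is refit for every cross-validation fold, so a coarse first pass followed by a finer pass around the best region is a common compromise.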
@aiuslocutius9758 2 years ago
Thank you for explaining this concept in an easy-to-understand manner.
@DataProfessor 2 years ago
You're very welcome!
@CatBlack01 3 years ago
Clear explanation and presentation. Love the analogies and error fixing.
@DataProfessor 3 years ago
Much appreciated! Glad to hear!
@WaliSayed 4 months ago
Very clear, and the details are explained in a simple way. Thank you!
@JBhavani777 2 years ago
It's late to be watching this, but it was worth it, sir. Thank you so much for this gem... keep teaching; very elaborate explanation.
@jorge1869 4 years ago
Hello Dr!!! I have read many of your works because I also have a line of research related to the development of tools based on machine learning, mainly prediction of peptides with different activities. Currently, I use Python to develop and, of course, publish my papers. I am also learning R because I have noticed this language has good libraries to calculate molecular descriptors, for instance "protr". I would appreciate a video tutorial explaining key steps such as data splitting, training, cross-validation and testing in R using the "caret" library, if possible. Greetings and success for this awesome YouTube channel!
@DataProfessor 4 years ago
Thanks JF for the comment and for reading my research work. How did you discover this YouTube channel? (so that I can use this information to better promote the channel) Yes, we also use the protr package in R for some of our peptide/protein QSAR work. In that case, I might make a video about calculating the descriptors of peptides/proteins or even compounds in future videos. In the meantime, please check out the following video "Machine Learning in R: Building a Classification Model" as well as 13 other R video tutorials explaining the machine learning model building process in a step-by-step manner. kzbin.info/www/bejne/moPUpX-uj7uFq9k
@jorge1869 4 years ago
Dr., thank you so much for your reply. I discovered your channel here on YouTube while looking for machine learning tutorials in R. When you mentioned your name in one of your videos, where you give an excellent lecture on drug discovery, I quickly looked up your profile on ResearchGate, and that's how I realized it was you.
@DataProfessor 4 years ago
JF, thanks for the insights, they are very helpful.
@jgubash100 3 years ago
Liked the contour plots, I'll have to try those too.
@donrachelteo9451 3 years ago
Yes indeed, this is one of the best explanations of hyperparameter tuning. Just need one clarification: how do we decide the range of values to run in grid search? Hope you can also do a video on Manual Tuning vs Auto Grid Search Tuning. Thanks 👍
@DataProfessor 3 years ago
Thanks for the suggestion! I'll put it on my to-do list.
@donrachelteo9451 3 years ago
@DataProfessor thanks, professor
@GeraldTalton 2 years ago
Great video, it always helps to see the visualization.
@sudhakarsingha283 3 years ago
This is a video with a detailed discussion of hyperparameter tuning.
@DataProfessor 3 years ago
Thank you for watching!
@madhawagunathilake8304 2 years ago
Thank you Prof. for your very insightful and helpful lecture!!
@geoffreyanderson4719 3 years ago
A thought experiment: If the generating process continued a lot longer and made far more than 200 examples, what would this do to the tuned final model's predictions? I am talking about the model that was developed on the 200 examples. That is, what happens when it is tried on that new data? Keep in mind that sklearn's make_classification() by design produces noise only, no signal.
@geoffreyanderson4719 3 years ago
Thank you for making good content; that is what attracted me to the channel, Data Professor. I say the following only with constructive purpose. There is no signal to find in a random dataset like that sampled by make_classification. Is this correct? Thus the RF is fitting itself to noise only. It's using completely spurious associations. You would prefer to avoid fitting to noise components in real life as much as possible. Fitting to noise is pure variance error.
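On the "is this correct?" question above, one way to check is empirically. According to the scikit-learn documentation, `make_classification` places clusters of points around the vertices of a hypercube in an `n_informative`-dimensional subspace, so the informative features do carry class signal. The sketch below compares cross-validated accuracy on the generated labels against the same data with shuffled labels. The dataset settings (200 samples, 10 features, `random_state=1`) are assumptions for illustration, not necessarily those used in the video.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# ASSUMPTION: illustrative dataset settings, not confirmed by the video.
X, y = make_classification(n_samples=200, n_features=10, n_informative=2,
                           n_redundant=0, random_state=1)

rf = RandomForestClassifier(n_estimators=100, random_state=1)

# Cross-validated accuracy on the labels as generated.
real_acc = cross_val_score(rf, X, y, cv=5).mean()

# Baseline: shuffling the labels destroys any feature/label relationship,
# so this approximates what fitting pure noise looks like.
rng = np.random.default_rng(1)
y_shuffled = rng.permutation(y)
shuffled_acc = cross_val_score(rf, X, y_shuffled, cv=5).mean()

print(f"generated labels: {real_acc:.2f}")
print(f"shuffled labels:  {shuffled_acc:.2f}")
```

If the generated labels score clearly above the shuffled baseline, the model is picking up real structure rather than noise alone. The remaining non-informative features are noise, so the concern about variance error still applies to them.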
@dearcharlyn 3 years ago
Another amazing tutorial, well explained and comprehensible! Thank you data professor! I am currently working on COVID-19 predictor models. :)
@DataProfessor 3 years ago
Thanks! Appreciate the kind words!
@josiel.delgadillo 2 years ago
How do you use GridSearchCV with a custom estimator? I can’t seem to make it work.
@hejarshahabi114 4 years ago
Thanks for your video. I also have a question regarding the max features that you mentioned at 11:48. What do you mean by max features? Do you mean the maximum number of independent variables, like x1, x2, ..., xn?
@DataProfessor 4 years ago
Thanks for watching! Yes, when max_features equals the total number of features. max_features is the parameter that scikit-learn uses to determine how many features to consider when performing a node split. More details are provided here: scikit-learn.org/stable/modules/ensemble.html#random-forest-parameters
@hejarshahabi114 4 years ago
@DataProfessor thank you very much for your quick response. Please keep making videos on such topics; you are doing great and I've learnt many things from your channel. BIG LIKE
@DataProfessor 4 years ago
@hejarshahabi114 Thanks, I greatly appreciate the support 😊
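The max_features discussion in this thread can be sketched as a small grid search. This is a minimal example under assumed settings (a synthetic 200-sample, 10-feature dataset, 3-fold cross-validation, a short list of tree counts), not the exact code from the video.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# ASSUMPTION: synthetic data standing in for the video's dataset.
X, y = make_classification(n_samples=200, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

# max_features: how many features each node split may consider (1..5 here).
# n_estimators: how many trees the forest contains.
param_grid = {
    "max_features": list(range(1, 6)),
    "n_estimators": [10, 50, 100],
}

# Every combination is scored with 3-fold cross-validation on the training set.
grid = GridSearchCV(RandomForestClassifier(random_state=1), param_grid, cv=3)
grid.fit(X_train, y_train)

print(grid.best_params_)
print(round(grid.best_score_, 3))
```

Note that max_features caps the number of candidate features examined at each split; every feature is still available to the forest as a whole.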
@amiralx88 3 years ago
Really nice and clean code. I've learned a lot from your video about how to optimize mine. Thanks
@infinitygeospatial1972 2 years ago
Great video. Very explanatory. Thank you
@joseluisbeltramone599 2 years ago
Fantastic explanation, Sir (as always). Thank you very much!
@DataProfessor 2 years ago
You are very welcome
@sofluzik 4 years ago
Lovely. How relevant are the confusion matrix, classification report, AUC score and ROC to the score mentioned above?
@DataProfessor 4 years ago
Hi Rajaram, this article does a good job of providing a detailed distinction between the various metrics for classification: neptune.ai/blog/f1-score-accuracy-roc-auc-pr-auc
@sofluzik 4 years ago
@DataProfessor thank you sir
@muskanmishra6625 1 year ago
Very well explained, thank you so much 🙂
@gabrielcornejo2206 2 years ago
Great tutorial, thank you very much. I have a question: how can I know which are the best 3 features to use to build the best model with 140 n_estimators?
@nibrad9712 5 months ago
Why did you choose max_features as 5 and n_estimators as 200? More specifically, how do I choose these params?
@cahayasatu9201 4 years ago
Thanks for a great tutorial. May I know how to see/identify which 2 features produce the best accuracy?
@DataProfessor 4 years ago
Hi, if using random forest, the feature importance plot will allow us to see which features contributed the most to the prediction. The shap library also adds this capability to any ML algorithm.
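As a rough sketch of the feature-importance suggestion above: a fitted scikit-learn random forest exposes `feature_importances_`, which can be ranked to see which columns contributed most. The dataset and the choices `max_features=2` and `top_k = 2` are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# ASSUMPTION: synthetic data; substitute your own X and y.
X, y = make_classification(n_samples=200, n_features=10, random_state=1)

# max_features=2 means each split considers 2 randomly drawn features.
# It does NOT select a fixed pair of features for the whole model.
rf = RandomForestClassifier(n_estimators=100, max_features=2, random_state=1)
rf.fit(X, y)

# Impurity-based importances: one value per column, summing to 1.
importances = rf.feature_importances_

# Column indices of the top_k most important features, highest first.
top_k = 2
top_idx = np.argsort(importances)[::-1][:top_k]

for i in top_idx:
    print(f"feature {i}: importance {importances[i]:.3f}")
```

This also addresses a common confusion in the questions here: the tuned max_features value is a count used at every split, so the "best 2 features" are not a property of the grid search result; the importances are the place to look.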
@limzijian98 2 years ago
Hi, just wanted to ask, how do you determine n_estimators for a record size of 2 million?
@budoorsalem1168 3 years ago
Thank you for your great video. Have you done hyperparameter tuning for different algorithms like decision tree, ANN, GBR?
@DataProfessor 3 years ago
The first step is to figure out which hyperparameters you want to optimize. You can do that by going to the API documentation, looking up the algorithm function that you want to use, seeing which hyperparameters there are, and adapting accordingly as shown in this video. For example, in random forest, the 2 hyperparameters that we chose for optimization are max_features and n_estimators. For an ANN, you may choose to optimize the learning rate, momentum and number of nodes in the hidden layer, etc.
@budoorsalem1168 3 years ago
@DataProfessor thank you so much, this really helped me
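The "look up the hyperparameters, then adapt the grid" advice in this thread might look like the following for a decision tree. `get_params()` lists the tunable names of any scikit-learn estimator; the particular grid values and the synthetic dataset are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(random_state=1)

# Step 1: see which hyperparameters this estimator exposes.
available = sorted(tree.get_params())
print(available)

# Step 2: pick a few of them and define candidate values.
# ASSUMPTION: these values are illustrative, not recommendations.
param_grid = {
    "max_depth": [2, 4, 6, None],
    "min_samples_leaf": [1, 5, 10],
}

# Step 3: run the same grid-search recipe as for random forest.
X, y = make_classification(n_samples=200, n_features=10, random_state=1)
grid = GridSearchCV(tree, param_grid, cv=5)
grid.fit(X, y)

print(grid.best_params_)
```

The same three steps carry over to other estimators: only the estimator class and the keys of `param_grid` change.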
@eyimofepinnick 3 years ago
Nice tutorial. Now that I've done all this, how can I apply the model, e.g. use what we've done to predict the X_test data, or predict on new data if we create an API?
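One possible answer to the question above, as a sketch: with the default `refit=True`, `GridSearchCV` retrains the best configuration on the full training set and exposes it as `best_estimator_`, which can then predict on held-out or new data. The dataset and grid values below are assumed for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# ASSUMPTION: synthetic data and a small illustrative grid.
X, y = make_classification(n_samples=200, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

param_grid = {"max_features": [1, 3, 5], "n_estimators": [50, 100]}
grid = GridSearchCV(RandomForestClassifier(random_state=1), param_grid, cv=3)
grid.fit(X_train, y_train)

# The best configuration, already refit on all of X_train.
best_model = grid.best_estimator_

# Predict on data the search never saw and score it.
y_pred = best_model.predict(X_test)
test_acc = accuracy_score(y_test, y_pred)
print(round(test_acc, 3))
```

Calling `grid.predict(X_test)` is equivalent, since it delegates to the refit best estimator. For serving behind an API, the fitted `best_model` is typically saved once (for example with joblib) and loaded by the service, which then calls `predict` on incoming rows with the same columns and preprocessing as the training data.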
@DM-py7pj 2 years ago
Is it not important to know which features when GridSearch tells you the optimal number of features? And what then when, over different runs, you get different n_features?
@kailee3491 2 years ago
Where can I find the environment requirements?
@bryanchambers1964 2 years ago
I have a very large dataset with 356 columns. I reduced it to 75 using PCA and retained 99.8% of the variance. I built a clustering model and it works outstandingly: I identified 3 clusters out of 8 to which potential customers belong. But my machine learning model is garbage, with a ROC-AUC score barely greater than 0.5. I am surprised, because if the clustering model works very well, then shouldn't the machine learning model work well too? I was wondering if you had any suggestions.
@DanielRong795 2 years ago
May I ask what ROC-AUC is?
@AbhishekSingh-vl1dp 1 year ago
How do we decide how much of the data to split into the train set and the test set?
@budoorsalem8378 3 years ago
Thank you so much, Professor, for this good information; it helped a lot. I'm wondering if we can do hyperparameter tuning in random forest regression for continuous data.
@DataProfessor 3 years ago
Hi, by continuous data are you referring to the Y variable? If so, then the answer would be yes.
@budoorsalem1168 3 years ago
@DataProfessor yes, the target dependent variable is not categorical; it takes different numeric values
@DataProfessor 3 years ago
@budoorsalem1168 Hyperparameter tuning can be performed for both categorical and numerical Y variables (classification and regression, respectively).
@budoorsalem1168 3 years ago
@DataProfessor ok thank you so much
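To illustrate the regression case discussed in this thread, here is a minimal sketch that swaps in `RandomForestRegressor` and an R² scorer while keeping the same grid-search recipe. The synthetic dataset from `make_regression` and the grid values are assumptions for the example.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# ASSUMPTION: synthetic continuous target; substitute your own X and y.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0,
                       random_state=1)

param_grid = {"max_features": [1, 3, 5], "n_estimators": [50, 100]}

# Same procedure as classification, but with a regressor and a
# regression metric (R^2) instead of accuracy.
grid = GridSearchCV(RandomForestRegressor(random_state=1), param_grid,
                    cv=3, scoring="r2")
grid.fit(X, y)

print(grid.best_params_)
print(round(grid.best_score_, 3))
```

Other regression scorers, such as "neg_mean_squared_error", can be passed through the same `scoring` argument.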
@dennislam1501 1 year ago
What is the minimum sample size for decent tuning? 10000? 1000? 100000? Data rows, I mean.
@dreamphoenix 2 years ago
Thank you
@張稚辰 3 years ago
Awesome video thanks
@DataProfessor 3 years ago
Thank you
@SyedZion 3 years ago
Can you please explain the same concept with RandomizedSearch?
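For the randomized variant asked about above, a hedged sketch: scikit-learn's `RandomizedSearchCV` samples a fixed number of combinations from distributions instead of trying every grid point. The distributions, `n_iter=10` and the synthetic dataset are assumptions for illustration, not settings from the video.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# ASSUMPTION: synthetic data standing in for a real dataset.
X, y = make_classification(n_samples=200, n_features=10, random_state=1)

# randint(low, high) draws integers from low up to high - 1.
param_distributions = {
    "max_features": randint(1, 6),     # 1..5
    "n_estimators": randint(10, 201),  # 10..200
}

# Only n_iter random combinations are evaluated, not the full grid.
search = RandomizedSearchCV(RandomForestClassifier(random_state=1),
                            param_distributions, n_iter=10, cv=3,
                            random_state=1)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

The trade-off against grid search is budget control: the cost is set by `n_iter` regardless of how wide the ranges are, at the price of possibly missing the single best grid point.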
@isaacvergara6792 3 years ago
Awesome video!
@guoqiang7215 4 years ago
I am working on a spam mail dataset and am now trying to apply hyperparameter tuning to the model.
@DataProfessor 4 years ago
Thanks for sharing, sounds like an interesting project.
@franklintello9702 2 years ago
I am still trying to find one with real data, because all these automatically generated datasets are sometimes hard to apply.
@levithanprimal2410 3 years ago
How am I watching this for free? Thanks Professor!
@DataProfessor 3 years ago
Glad it was helpful, and yes, we have free data science content here. Would appreciate it if you share with a friend or two 😆
@MinhHua-zu2pl 7 months ago
Please make the screen font bigger, thank you.
@shivamkrathghara3340 3 years ago
Why 81k? It should be more than 810k. Thank you, professor.
@DataProfessor 3 years ago
Haha, thanks for the support!
@yingzisilver9085 1 year ago
Thank you