Mastering Hyperparameter Tuning with Optuna: Boost Your Machine Learning Models!

11,264 views

Ryan Nolan Data

1 day ago

In this comprehensive tutorial, we delve deep into the world of hyperparameter tuning using Optuna, a powerful Python library for optimizing machine learning models. Whether you're a data scientist, machine learning enthusiast, or just looking to improve your model's performance, this video is packed with valuable insights and practical tips to help you harness the full potential of Optuna.
Interested in discussing a Data or AI project? Feel free to reach out via email or simply complete the contact form on my website.
Grab the code: ryannolandata.com/optuna-hype...
📧 Email: ryannolandata@gmail.com
🌐 Website & Blog: ryannolandata.com/
🍿 WATCH NEXT
Scikit-Learn and Machine Learning Playlist: • Scikit-Learn Tutorials...
Vid 1:
Vid 2:
Vid 3:
MY OTHER SOCIALS:
👨‍💻 LinkedIn: / ryan-p-nolan
🐦 Twitter: / ryannolan_
⚙️ GitHub: github.com/RyanNolanData
🖥️ Discord: / discord
📚 *Practice SQL & Python Interview Questions: stratascratch.com/?via=ryan
WHO AM I?
As a full-time data analyst/scientist at a fintech company specializing in combating fraud within underwriting and risk, I've transitioned from my background in Electrical Engineering to pursue my true passion: data. In this dynamic field, I've discovered a profound interest in leveraging data analytics to address complex challenges in the financial sector.
This KZbin channel serves as both a platform for sharing knowledge and a personal journey of continuous learning. With a commitment to growth, I aim to expand my skill set by publishing 2 to 3 new videos each week, delving into various aspects of data analytics/science and Artificial Intelligence. Join me on this exciting journey as we explore the endless possibilities of data together.
*This is an affiliate program. I may receive a small portion of the final sale at no extra cost to you.

Comments: 38
@RyanNolanData · 1 day ago
Want to grab the code? I have an article here: ryannolandata.com/optuna-hyperparameter-tuning/
@neilansh · 5 months ago
I found out about Optuna while working on a Kaggle competition. This video will help me a lot in Kaggle competitions. Thanks a lot Ryan 👍💯
@RyanNolanData · 5 months ago
No problem, that’s where I found it also
@richardgibson1872 · 5 months ago
You mean you just copy-pasted someone else's code in the code tab, right? Stop the BS
@RyanNolanData · 5 months ago
@richardgibson1872 ? I have two screens and prep the code for each video
@harishgehlot__ · 3 months ago
@RyanNolanData Hi, I'm actually facing a "no space left on device" issue while tuning a Prophet model, like below:
OSError: [Errno 28] No space left on device: '/tmp/tmp_m3laolf/prophet_model7a78vylx'
Some internal tmp folder is filling up. Can you help a little, if possible?
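For anyone hitting the same error: Prophet's backend writes temporary model files through Python's temp-directory machinery, so one hedged workaround (untested against Prophet here; the scratch path is an illustrative assumption) is to redirect the temp directory to a volume with free space before fitting:

```python
import os
import tempfile

# Assumption: the current working directory sits on a volume with free space
scratch = os.path.join(os.getcwd(), "prophet_tmp")
os.makedirs(scratch, exist_ok=True)

# Redirect both the environment variable and Python's cached temp dir,
# so libraries that call tempfile.mkdtemp() write here instead of /tmp
os.environ["TMPDIR"] = scratch
tempfile.tempdir = scratch

print(tempfile.gettempdir())
```

Cleaning the scratch directory between trials also helps when each trial leaves model files behind.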
@ritamchatterjee8785 · 7 months ago
Can't see the right part of the code; some terms aren't readable. Please provide the GitHub link, or else make your recording screen larger. BTW, great videos
@RyanNolanData · 7 months ago
Thanks for the feedback. I plan on doing a bulk upload of video files to GitHub in the near future!
@ruihanli7241 · 3 months ago
Great video! Could you explain more about the hyperparameter importance here and what kind of insights you can learn from it?
@RyanNolanData · 3 months ago
Thanks, and I'm not sure if I did in my other hyperparameter tuning video
@GHOOSTHUN · 4 months ago
Great content, thanks
@RyanNolanData · 4 months ago
Thank you
@deepsuchak.09 · 7 months ago
Hey man, I've been following you for a while.. Big fan!
@RyanNolanData · 7 months ago
Thank you! I have a new project video coming out soon, but I'm behind on next week's videos
@masplacasmaschicas6155 · 4 months ago
I gotta give Optuna a try. I usually just use GridSearchCV or RandomizedSearchCV for hyperparameter tuning
@RyanNolanData · 4 months ago
That’s what I used in the past until I discovered optuna
@anatolyalekseev101 · 7 months ago
Thank you for honestly sharing the results! Could it be that train_test_split accidentally created a split with an unbalanced target? Another reason for a worse OOS result I can think of is optimizing for mean CV scores without taking variance into account. The third one I suspect is a missing sensitivity study: we found the peak on the train set, but maybe in its vicinity there were only valleys or even cliffs (we averaged across data splits but not across neighboring parameters). And the last option is the simple absence of early stopping: the last model may simply be overfit. Going to recreate your example and find out )
@RyanNolanData · 7 months ago
Let me know, I'm definitely curious. I didn't look into it too much, as I wanted to show how to use Optuna more than to build a great model.
@anatolyalekseev101 · 7 months ago
@RyanNolanData My conclusion is that what we observed is due to the underlying estimator's (RandomForest) intrinsic random behavior and the small dataset size (under 300 records). Apparently cv=5 is not enough to compensate for that. I wrote a function to fit a forest (without a random seed) n times on your exact train/test split, and it seems that only after n=50 do average scores stop jumping back and forth by more than 1 percent. So, while all I said above might still hold (I checked only the train/test data distributions; there is a skew in the histograms, but not a terrible one), solutions for this demo could be:
1) As a quick fix for the demo: use a fixed random seed for the estimator inside objective() and for the final model too (not only for the reference model). You did a good job specifying the random seed for train_test_split and the first model (I always keep forgetting that), but you still missed providing seeds to the cv (an implicit None was used) and to the model inside the objective function, so there was no full reproducibility, unfortunately.
2) More realistic for production: use, say, cv=ShuffleSplit(test_size=0.5, n_splits=50). It takes longer, but you get much less risk of worse pointwise OOS estimates like in the video :-)
A few more notes:
1) Tuning n_estimators for forests makes little sense, as higher values will almost always give better (at least not worse) scores. Adding n_estimators to the tuning set can make sense if you adjust pure ML scores by runtime, though.
2) Choosing RandomSampler for the first demo is odd, as the default TPE sampler is the intelligent one; otherwise, what's the reason for using anything beyond sklearn's RandomizedSearchCV in the first place? I would expect Random to be worse than TPE on average.
3) If you are not going to modify the found best params, instantiating like final_model=RandomForestRegressor(**study.best_params) is a cleaner approach, and you won't forget any params
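The fixes this comment recommends can be sketched in a few lines. The dataset is illustrative and the `best_params` dict is a stand-in for `study.best_params` after a real optimization run, not values from the video:

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import ShuffleSplit, cross_val_score

X, y = load_diabetes(return_X_y=True)

# Seed the CV splitter too, not just train_test_split and the model,
# so every trial is scored against the same folds
cv = ShuffleSplit(test_size=0.5, n_splits=10, random_state=42)

# Stand-in for study.best_params after optimization
best_params = {"n_estimators": 100, "max_depth": 10, "min_samples_leaf": 2}

# Unpack the tuned params directly so none are silently dropped
final_model = RandomForestRegressor(random_state=42, **best_params)
score = cross_val_score(final_model, X, y, cv=cv, scoring="r2").mean()
print(round(score, 3))
```

Bumping `n_splits` toward 50, as suggested, trades runtime for a much more stable score estimate on a small dataset.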
@anatolyalekseev101 · 7 months ago
@RyanNolanData It's funny. I repeated the optimization with your seeds and sampler but used cv=ShuffleSplit(test_size=0.5, n_splits=50). It took 30 minutes, but the params {'n_estimators': 940, 'max_depth': 45, 'min_samples_split': 3, 'min_samples_leaf': 1} still won! So the histogram skew between train and test remains the most probable reason now, without excluding the other two (cliffs/no smoothness in the objective function near the best params, and no early stopping).
@becavas · 3 months ago
Good work. Why is Optuna better than GridSearchCV or RandomizedSearchCV?
@RyanNolanData · 3 months ago
It tries to home in on the exact best answer. Use whatever works best for you, but in Kaggle comps a lot of people use Optuna
@amirrezaroohbakhsh4578 · 5 months ago
One of a kind.
@RyanNolanData · 5 months ago
Thanks
@hosseiniphysics8346 · 3 months ago
Thanks a lot
@RyanNolanData · 3 months ago
No problem
@vishnukp6470 · 7 months ago
Where can I get the code files?
@RyanNolanData · 7 months ago
I plan on doing a dump on GitHub of all my files in the future so stay tuned!
@introvertwhiz-ll2ip · 1 month ago
Sir, I don't know why you didn't share the code after making this learning project. Please share it, sir. Q1) At the end of the video you didn't get the results, so what does production code really look like? Do you do any kind of hyperparameter tuning, or do you go by your knowledge of the different parameters and your intuition to get better tuning? Could you please share your knowledge of production-level code, sir?
@deniz-gunay · 4 months ago
You should zoom in more
@RyanNolanData · 4 months ago
I do in newer videos
@deniz-gunay · 4 months ago
@RyanNolanData It was just irony :D It would be better if you could zoom out 👍
@INDIAN-wk7bg · 5 months ago
XGBoost is working better
@RyanNolanData · 5 months ago
Nice
@umairnazir5579 · 1 month ago
Zero performance