Tune xgboost more efficiently with racing methods

7,515 views

Julia Silge

Comments: 27
@MattRosinski · 3 years ago
Thanks Julia! Love the inclusion of a linear model for imputing speed and angle!
@mattm9069 · 3 years ago
Your blogs have helped me so much. Tidymodels for life!
@mkklindhardt · 3 years ago
Thank you Julia! I have been waiting for this unknowingly for too long. Great pleasure to follow your videos and always very insightful! Congratulations with your new space :)
@alexandroskatsiferis · 3 years ago
Another splendid screencast Julia!
@pabloormachea3404 · 3 years ago
Impressive! Thanks so much for the educational video; it makes tidymodels very appealing!
@deannanuboshi1387 · 2 years ago
Great video! Do you know how to get prediction or confidence intervals in R? Thanks!
@JuliaSilge · 2 years ago
An algorithm like xgboost doesn't involve math that can produce one natively (unless I am mistaken) but you can use resampling to create those kinds of intervals: markjrieke.github.io/workboots/
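The workboots approach mentioned above can be sketched roughly like this; this is a minimal sketch assuming workboots' `predict_boots()`/`summarise_predictions()` interface, with `train` and `test` standing in for your own data splits:

```r
library(tidymodels)
library(workboots)  # bootstrap prediction intervals for workflows

# A minimal xgboost workflow; `train` and `test` are placeholder data splits
wf <-
  workflow() %>%
  add_formula(mpg ~ .) %>%
  add_model(boost_tree(mode = "regression") %>% set_engine("xgboost"))

# Fit the workflow on many bootstrap resamples of the training data,
# then summarise the spread of predictions into an interval
set.seed(123)
preds <-
  wf %>%
  predict_boots(n = 200, training_data = train, new_data = test) %>%
  summarise_predictions()
```

The interval here comes from resampling rather than from xgboost's own math, which is why this works for algorithms with no native interval support.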
@gkuleck · 1 year ago
Hi Julia, nice video on a topic that I find intrinsically interesting as a baseball AND tidymodels fan. I did run into an error when executing tune_race_anova(): `Error in test_parameters_gls(): ! There were no valid metrics for the ANOVA model.` I am not sure how to fix this, and I have been careful to follow the scripts. Any idea what might be causing the error?
@JuliaSilge · 1 year ago
When you see an error like that, it usually means your models are not able to fit/train. If you ever run into trouble with a workflow set or racing method like this, I recommend trying to just plain _fit_ the workflow on your training data one time, or use plain old `tune_grid()`. You will likely get a better understanding of where the problems are cropping up.
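The debugging strategy above can be sketched like this; `xgb_wf` and `folds` are placeholders for the workflow and resamples from your own script:

```r
library(tidymodels)

# Step 1: does the workflow fit at all, one time, on the training data?
single_fit <- fit(xgb_wf, data = train)

# Step 2: does it tune with plain old grid search, before adding racing?
set.seed(234)
grid_res <- tune_grid(xgb_wf, resamples = folds, grid = 5)

# Surface any warnings/errors captured during the resampled fits
collect_notes(grid_res)
```

If step 1 or 2 already fails, the racing error is just a downstream symptom of models that never trained.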
@juliantagell1891 · 3 years ago
Cheers Julia, great video! I have been wondering about xgboost a bit lately, regarding using tidymodels vs. using the underlying xgboost package directly with xgb.train(). I've heard mention that xgb.train() has an "automatic stop" that limits the number of trees when no more improvement is detected. This seems pretty helpful (and a great processing-time saver) rather than having to pre-specify the number of trees used. But I'm certainly not a pro at xgboost, so I was just wondering your opinion. I like that tidymodels can be applied to all models, but I was wondering if, in doing so, this comes at a cost (for xgboost tuning, specifically).
@JuliaSilge · 3 years ago
Yes, you can specify this (and even tune it to find the best value) in tidymodels; we call this early stopping parameter `stop_iter`: parsnip.tidymodels.org/reference/details_boost_tree_xgboost.html I used it in the last episode of SLICED I was on (with the Spotify dataset) if you want to watch that and see it in action, but I'll try to put together a tutorial/blog post demoing it soon.
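Following the parsnip documentation linked above, early stopping looks roughly like this (the `validation` engine argument holds out part of the training set for xgboost's internal evaluation):

```r
library(tidymodels)

# Stop adding trees after 10 iterations without improvement on a
# 20% validation holdout that xgboost carves out internally
xgb_spec <-
  boost_tree(trees = 500, stop_iter = 10) %>%
  set_engine("xgboost", validation = 0.2) %>%
  set_mode("regression")

# `stop_iter` can also be tuned like any other hyperparameter:
xgb_spec_tune <- boost_tree(trees = 500, stop_iter = tune())
```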
@hansmeiser6078 · 3 years ago
Thank you Julia! I was asking myself what the benefit would be. Can you tell us something about the advantages of tune_sim_anneal() too? And when is it better to fill the grid parameter with a grid rather than an integer?
@JuliaSilge · 3 years ago
When you use an integer, the tune package uses a space-filling design rather than a regular grid design for the possible parameters to try. You can read about these two kinds of grids here: www.tmwr.org/grid-search.html#grids We write a bit about iterative search with simulated annealing here: www.tmwr.org/iterative-search.html#simulated-annealing
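The two options contrasted above look like this in code; `xgb_wf` and `folds` are placeholders for your own workflow and resamples:

```r
library(tidymodels)
library(finetune)

# An integer asks tune for a space-filling design of that size
race_res <- tune_race_anova(xgb_wf, resamples = folds, grid = 20)

# Equivalent in spirit: build a space-filling grid yourself with dials
xgb_grid <- grid_latin_hypercube(trees(), learn_rate(), size = 20)

# A regular grid instead crosses a few levels of each parameter
xgb_reg_grid <- grid_regular(trees(), learn_rate(), levels = 5)
race_reg <- tune_race_anova(xgb_wf, resamples = folds, grid = xgb_reg_grid)
```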
@hansmeiser6078 · 3 years ago
@@JuliaSilge But when I fill the grid parameter with grid_latin_hypercube() or grid_max_entropy(), that would be space-filling too, or do I misunderstand this?
@hansmeiser6078 · 3 years ago
Simulated annealing is tough stuff... hope you make a video about it.
@JuliaSilge · 3 years ago
@@hansmeiser6078 Yes, that's right. If you put an integer, then it uses `grid_latin_hypercube()` to make a semi-random space-filling grid as a default: tune.tidymodels.org/reference/tune_grid.html#parameter-grids
@hansmeiser6078 · 3 years ago
@@JuliaSilge In a regression case, which is better for tune_bayes(), tune_sim_anneal(), or tune_race_anova(): providing an externally built grid (maybe grid_latin_hypercube() or grid_regular()), or an integer? Where is the benefit? Could we avoid overhead or some redundancy?
@AndreaDalseno · 3 years ago
Thank you very much once more for your videos, Julia. Another question for you: is there a way to have a progress bar or something like that to monitor the tuning process (that may take a long time to run)?
@JuliaSilge · 3 years ago
We don't have support for a progress bar due to how we use parallel workers (considering using the future package for this, though, which may open up other options) but you can set various `verbose` options in `control_race()` that may give you some of what you want: finetune.tidymodels.org/reference/control_race.html
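The `control_race()` options mentioned above can be set like this; `xgb_wf` and `folds` are placeholders:

```r
library(finetune)

race_res <- tune_race_anova(
  xgb_wf,
  resamples = folds,
  grid = 20,
  control = control_race(
    verbose = TRUE,       # log progress as models are fit
    verbose_elim = TRUE   # log when parameter combinations are eliminated
  )
)
```

Note that with parallel workers the per-model logging may still be swallowed; the elimination messages are the most reliably visible part.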
@AndreaDalseno · 3 years ago
@@JuliaSilge Thank you very much for your kind reply. I tried to use control_grid(verbose = TRUE) in the random forest example, just before fitting the grid, but I couldn't get any output (with parallel processing). Could you kindly share an example? I will check out the future package.
@JuliaSilge · 3 years ago
@@AndreaDalseno Ah I'm sorry I wasn't more clear; we are considering adding support for the future package which will likely allow for better progress messaging in the... future. I'm not sure if the `verbose` option will work right now. Here is an example to try: github.com/tidymodels/tune/issues/377
@AndreaDalseno · 3 years ago
@@JuliaSilge thank you very much for your hint. I did: regular_res
@JuliaSilge · 3 years ago
@@AndreaDalseno Yes, you can read more about the current status of how parallel processing works here: tune.tidymodels.org/articles/extras/optimizations.html#parallel-processing-1
@recordyao · 3 years ago
Hi Julia. Great tutorial! I think it's a great time-saving solution for tuning random grid points. It would be awesome if tune_race_anova() could work with tune_bayes(), in that once the best grid points are selected by tune_race_anova(), they could be passed as "initial" into tune_bayes() to fine-tune the best. But currently it does not work, as tune_race_anova() only finishes one point that fits all folds, and tune_bayes() needs at least as many as there are tuning parameters. Is there a way around this? Again, great work! :)
@JuliaSilge · 3 years ago
Ah no, this doesn't currently work, as the infrastructure for tune_bayes() currently expects all the tuning parameters to have been evaluated completely on resamples. You could post an issue on the repo asking if tune_bayes() could be changed to accept the subset, and we could discuss it there: github.com/tidymodels/tune/issues
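The pattern that IS supported today is seeding tune_bayes() with complete tune_grid() results, where every candidate was evaluated on every resample; a sketch, with `xgb_wf` and `folds` as placeholders:

```r
library(tidymodels)

# Complete grid results: every candidate evaluated on every resample
set.seed(345)
grid_res <- tune_grid(xgb_wf, resamples = folds, grid = 10)

# Those full results can seed the Bayesian search via `initial`;
# racing results can't be used here because most candidates were
# eliminated before being evaluated on all the folds
bayes_res <- tune_bayes(
  xgb_wf,
  resamples = folds,
  initial = grid_res,
  iter = 20
)
```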
@recordyao · 3 years ago
@@JuliaSilge Thanks for pointing me to the right place. It would be awesome if the two could be combined, but of course, that would be a lot of work for the developers. We users take things for granted, haha.
@jacquesboubou · 1 year ago
Thank you so much! Great presentation. I have learned a lot.