Great stuff. Amazing R programming and data analysis skills!
3 years ago
5:12 your excitement for that distribution LOL
@IamHopeman47 2 years ago
Thanks for the great screencast! For me the little introduction to the finetune package was a big eye-opener. Very smart approach. Looking forward to more of your content (especially the advanced topics).
@whynotfandy 3 years ago
I love the new lights! It's a great addition for your intros and outros.
@N1loon 3 years ago
Amazing stuff... I really admire your knowledge in this super complex field. I think it would be cool if you did an episode mainly centered on the nuances of tuning. This video offered a good general introduction to tuning principles but didn't go into details such as finding the right balance between over- and underfitting, operating with grids, etc. Just an idea! Anyway, really love your content, Julia!
@JuliaSilge 3 years ago
For now, you might check out some of my other blog posts that focus on hyperparameter tuning, like these: juliasilge.com/blog/scooby-doo/ juliasilge.com/blog/sf-trees-random-tuning/
@hnagaty 2 years ago
Great screencast and very useful. Many thanks.
@seadhna 3 years ago
Great video as always! Would it be possible to use native parsnip functions to clean the features instead of doing base string manipulation in a custom function? I think in another video you cleaned the tokens within the recipe step.
@JuliaSilge 2 years ago
We definitely have tidymodels functions for lots of text manipulation, like those in textrecipes: textrecipes.tidymodels.org/ Or functions like: recipes.tidymodels.org/reference/step_dummy_multi_choice.html But sometimes there isn't something that fits your particular data out of the box, in which case you can extend tidymodels like I walked through in this screencast.
@pokelytics4352 3 years ago
Fantastic content as always! Quick question, what are your thoughts on training multiple models using the top parameters from HP tuning, and ensembling the predictions/is there an easy way to do something like this with tidymodels? Thanks!!!
@JuliaSilge 3 years ago
Yep, absolutely a great approach to work toward a bit of performance gain! You can implement ensembling in tidymodels with the stacks package: stacks.tidymodels.org/
@pokelytics4352 3 years ago
@JuliaSilge Great, thanks so much!
@codygoggin1097 2 years ago
Great video Julia! What would be the proper function to use in order to fit this best model onto new data and view these predictions?
@JuliaSilge 2 years ago
You use the `predict()` function! workflows.tidymodels.org/reference/predict-workflow.html
@davidjackson7675 3 years ago
Thanks, that is interesting as always.
@d_b_ 1 year ago
Is one takeaway from 44:03 that I should create a short-play game for older people with few players that has printed miniatures and is based on deductive fantasy animal war?
@JuliaSilge 1 year ago
HA, I think so, yep!
@ammarparmr 3 years ago
Thank you, fantastic video! If you don't mind, I have a question regarding "mtry": how did we end up with mtry greater than 6 (the number of all the predictors)? Maybe I am confused about the concept.
@JuliaSilge 3 years ago
After the feature engineering, there are a lot more predictors, 30 from the board game category alone. The data that goes into xgboost is the _processed_ data, not the data in its pre-feature-engineering form.
@ammarparmr 3 years ago
@JuliaSilge Well explained, thank you so much!
@avnavcgm 3 years ago
Great video! What would then be the best way to 'save' the best trained model so you can predict with new observations in the future that aren't in the train/test split?
@JuliaSilge 3 years ago
You can _extract_ the workflow from the trained "last fit" object and then save that as a binary, like with `readr::write_rds()`. I show some of that at the end of this blog post: juliasilge.com/blog/chocolate-ratings/
@jaredwsavage 2 years ago
Great video Julia. Just a quick question. Have you tried using lightgbm or catboost with boost_trees? They are available in the treesnip package and generally run much faster than xgboost.
@JuliaSilge 2 years ago
HA oh I have had SUCH installation issues with both of those. 🙈 I have a Mac M1 and you can see the current situation for catboost here: github.com/catboost/catboost/issues/1526 I'll have to dig up the lightgbm problems somewhere. Anyway, those are great implementations if you can get them to install!
@jaredwsavage 2 years ago
@JuliaSilge Wow, as a Windows user I'm usually the one on the wrong end of installation issues. 😁
@charithwijewardena9493 2 years ago
Hi Julia I have a question. I'm trying to get my head around the concept of data leakage. You build your model with the outcome being "average", but before you do your split you do EDA on everything. Are we not gaining insight into the test set by doing that? Should we be doing EDA only AFTER splitting our data? Thanks. :)
@JuliaSilge 2 years ago
This is definitely an important thing to think carefully about and make good decisions on. On the one hand, anything you do before the initial split could lead to data leakage. On the other hand, you need to understand something about your data in order to even create that initial split (like stratified sampling). It's most important that anything that you will literally use in creating predictions (like feature engineering) is done after data splitting.
@charithwijewardena9493 2 years ago
Cool, thank you for the reply. 🙏🏽
@ndiyekebonye208 2 years ago
I'm still getting an error at `tune_race_anova()` despite updating all my packages and installing the latest versions of dials and finetune. Is there a way to overcome this?
@JuliaSilge 2 years ago
If you are having trouble with one of the racing functions, I recommend just trying plain old `fit()` with your workflow, or perhaps using `tune_grid()`. Those functions will help you diagnose where the model failures are happening.
@ndiyekebonye208 2 years ago
@JuliaSilge Thank you so much. Will surely try this.
@russelllavery2281 1 year ago
cannot read the fonts
@PA_hunter 3 years ago
Would it be bad if I used tidymodels steps for non-ML data wrangling? Haha!
@JuliaSilge 3 years ago
I think some people do this for sure. Some things to keep in mind are that it is set up for learning from training data and applying to testing data, so I'd keep that design top of mind for using in other contexts. You can see this blog post where I used recipes for unsupervised work, without heading into a predictive model: juliasilge.com/blog/cocktail-recipes-umap/
@danmungai555 3 years ago
Enlightening
@barasatu123 3 years ago
Would you please make a video on the funmodeling package and its different functions?