Thank you for taking the time to make these videos! They have been an immense help in my R journey!
@alelust7170 4 years ago
Thanks, Julia! You always bring an interesting library into your analysis!
@jennyhansen 1 year ago
Thank you, Julia. This was tremendously helpful for me!
@umber_wall 8 days ago
Thank you, learned so much!
@TURALOWEN 2 years ago
usemodels package is magic!
@yangyang6008 1 year ago
Hi Julia, thank you for the great tutorial! For the training set cross-validation, what is the difference between "bootstraps" and "vfold_cv"? Which method is more appropriate for training a machine learning model? Thank you.
@JuliaSilge 1 year ago
You can check out this chapter for the differences: www.tmwr.org/resampling.html#resampling-methods Also, this Cross Validated answer by Max tells you a bit about when you might choose one over the other: stats.stackexchange.com/a/18355/133241 If you have enough data, cross validation is usually the best bet.
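A minimal base R sketch of the difference between the two resampling schemes. This is not how rsample implements `bootstraps()` or `vfold_cv()`; the row count, seed, and variable names are assumptions chosen for illustration.

```r
# Sketch in base R (not the rsample implementation): contrast one bootstrap
# resample with 10-fold cross-validation on n = 100 row indices.
set.seed(123)
n <- 100
rows <- seq_len(n)

# Bootstrap: draw n rows WITH replacement for the analysis set; rows that are
# never drawn ("out-of-bag") form the assessment set.
boot_analysis <- sample(rows, size = n, replace = TRUE)
boot_assessment <- setdiff(rows, boot_analysis)

# V-fold CV: shuffle the rows once, then assign each row to one of v folds.
v <- 10
fold_id <- sample(rep(seq_len(v), length.out = n))
cv_assessment <- split(rows, fold_id)   # each fold is held out exactly once
cv_analysis <- lapply(cv_assessment, function(held_out) setdiff(rows, held_out))

length(boot_analysis)     # same size as the original data, with duplicates
length(boot_assessment)   # varies by resample; about 37% of rows on average
lengths(cv_assessment)    # n / v rows per fold
```

With v-fold cross-validation every row is assessed exactly once and each assessment set holds only n / v rows, which is why the held-out folds look small on a small data set.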
@yangyang6008 1 year ago
@@JuliaSilge Hi Julia, thank you very much for your explanation!
@JorgeThomasM 1 year ago
Hi @JuliaSilge! Would volume = height * width * depth be a sort of interaction / new variable? Thanks so much for all these wonderful sessions.
@JuliaSilge 1 year ago
Yeah, for sure! We'd call that "feature engineering" because you are creating a custom feature from the original variables based on your domain knowledge of how furniture works. 😄
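As a small illustration of that kind of feature engineering, here is a base R sketch. The data frame and its values are hypothetical; only the column names `height`, `width`, and `depth` come from the question above.

```r
# Hypothetical furniture data; column names follow the question above.
furniture <- data.frame(
  height = c(80, 45, 200),
  width  = c(60, 120, 90),
  depth  = c(40, 60, 35)
)

# Feature engineering: a new predictor built from the original dimensions.
furniture$volume <- with(furniture, height * width * depth)
furniture
```

Inside a recipe, the same idea could probably be expressed with `step_mutate()`, so that the new feature is computed consistently for training and new data.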
@seaniam 4 years ago
Love these videos - thanks Julia!
@prod.kashkari3075 4 years ago
Thank god for your course and book, I was seriously struggling trying to learn tidymodels from the docs. One thing, in your course, do you want to maybe add how to use the “stacks” package for stacking models and building ensemble learners?
@MoCtheFirst 2 years ago
When using `predict()` at the end (24:49), I get the error: "Workflow has not yet been trained. Do you need to call `fit()`?" Any suggestions as to what went wrong? Thanks for all the input!
@JuliaSilge 2 years ago
If you want to walk through the blog post to follow along, you can call `predict()` on the fitted workflow that is inside of `final_res`: juliasilge.com/blog/sf-trees-random-tuning/ You can check out my latest blog post for a more explicit example of how to do this: juliasilge.com/blog/chocolate-ratings/
@davidjackson7675 4 years ago
What about calculating the square inches of the top?
@MattBirch 2 years ago
This is awesome. Thanks!
@artathearta 4 years ago
10:43 Why did vfold_cv give you small testing folds?
18:25 I got an error:
```
...
All models failed. See the `.notes` column.
...
Warning message:
This tuning result has notes. Example notes on model fitting include:
preprocessor 1/1: Error in UseMethod("prep"): no applicable method for 'prep' applied to an object of class "c('step_clean_levels', 'step')"
preprocessor 1/1: Error in UseMethod("prep"): no applicable method for 'prep' applied to an object of class "c('step_clean_levels', 'step')"
preprocessor 1/1: Error in UseMethod("prep"): no applicable method for 'prep' applied to an object of class "c('step_clean_levels', 'step')"
```
I followed your steps exactly, and I even tried directly copying and pasting your code from your blog post.
EDIT: I was able to fix this problem by commenting out the workflow() command and instead piping the recipe through prep() after step_knnimpute, and then setting up tune_grid to take in the model as its object and ranger_recipe as its preprocessor.
@JuliaSilge 4 years ago
With that many folds on data this small, that's just how cross-validation works! You can read about bootstrap vs. cross-validation here: stats.stackexchange.com/questions/18348/differences-between-cross-validation-and-bootstrapping-to-estimate-the-predictio I forgot to mention in the video that `step_clean_levels()` is in the development version of textrecipes, so you'll need to install from GitHub to be able to use that function: `devtools::install_github("tidymodels/textrecipes")`
@artathearta 4 years ago
@@JuliaSilge I uninstalled the copy of textrecipes I got from CRAN and installed it from GitHub, and now it doesn't work even if I use prep() 😆 It's all good, I'll still follow along and hope it eventually works on my computer, or I'll put it on my MacBook. Still a great video! Excited to use {usemodels}!
@artathearta 4 years ago
@@JuliaSilge Okay, while I was filling out the steps for submitting a bug on textrecipes, I discovered that it works with the workflow() object if I remove `doParallel::registerDoParallel()` before running `tune_grid`.
@JuliaSilge 4 years ago
@@artathearta Hmmmm, can you make sure you have the most up-to-date version of tune from CRAN? That contains bug fixes for parallel processing on Windows.
@artathearta 4 years ago
@@JuliaSilge tune_0.1.2. I can open a GitHub issue if you'd like.
@psxcl9817 2 years ago
Hello Julia! Thanks for your video. I would like to know whether I can obtain the importance values of each feature from the vip package instead of plotting them. I did not find the relevant function in vip.
@JuliaSilge 2 years ago
Do you mean you want to get the importance values as a dataframe, rather than a visualization? You can use `vi()` for that: koalaverse.github.io/vip/reference/vi.html
@seunghoonlee5275 2 years ago
Thank you so much Julia! It's a great video. I wonder whether I can use a weight variable in random forest analysis (or in the tidymodels packages in general). Could you recommend any materials?
@JuliaSilge 2 years ago
Yes, this has been a focus of the tidymodels team this year! You can read more here: www.tidyverse.org/blog/2022/05/case-weights/ Since that post, much of the case weight work has been released to CRAN.
@seunghoonlee5275 2 years ago
@@JuliaSilge Thank you so much Julia! I will go over the link.
@ROCK962 4 years ago
Hi Julia! Thank you for your awesome tutorials. I am trying to replicate the Palmer Penguins episode, but I am having a problem with the bootstrapping step. When I run the bootstraps function from rsample, R is creating empty splits. Do you know what could be the issue?
@JuliaSilge 4 years ago
Wow, no, I haven't seen that before. Can you work on creating a reprex: www.tidyverse.org/help/ And then posting the problem on RStudio Community? rstd.io/tidymodels-community
@mkklindhardt 3 years ago
Hi Julia, Once again thank you for your amazing videos and your great enthusiasm. I have some questions. 1) Why do you use knn imputation? You did not really explain why you did not go for linear or mean imputation. 2) Can usemodels also be used to prepare my data (recipe, workflow, prep etc.) for a linear mixed model? Ultimately I would like to use the same data setup for comparing different regression models, such as linear mixed models (stepwise AIC regression), kNN regression, and random forest regression, as well as XGBoost. Is it possible to have the same data setup for all my models? I guess that's needed when comparing model performance and evaluating models? Or am I wrong? Thank you
@JuliaSilge 3 years ago
Choosing nearest neighbors for imputation over something like linear imputation or just a single value (mean/median) is similar to making that choice for modeling overall; it lets you use nonlinear, more complex relationships in the data for the imputation. I think this paper is a pretty nice discussion: www.ncbi.nlm.nih.gov/pmc/articles/PMC4959387/ You can see the models that are currently supported in usemodels here: usemodels.tidymodels.org/reference/index.html If you are interested in comparing quite a number of models, you might check out using the tidyposterior package, as described in this chapter: www.tmwr.org/compare.html
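To make the contrast concrete, here is a simplified base R sketch on toy data. It is not the recipes implementation of nearest-neighbor imputation: it uses plain absolute distance on a single predictor, and the data, `k`, and variable names are assumptions for illustration.

```r
# Toy data: y depends nonlinearly on x (y = x^2); one y value is missing.
dat <- data.frame(x = 1:10)
dat$y <- dat$x^2
dat$y[9] <- NA   # the true value would be 81

missing_row <- which(is.na(dat$y))
observed <- dat[-missing_row, ]

# Mean imputation: one value for every missing entry, ignoring x.
mean_impute <- mean(observed$y)

# Nearest-neighbor imputation (simplified): average y over the k observed rows
# whose x is closest to the x of the row with the missing value.
k <- 2
dist_to_missing <- abs(observed$x - dat$x[missing_row])
neighbors <- observed[order(dist_to_missing)[seq_len(k)], ]
knn_impute <- mean(neighbors$y)

c(mean = mean_impute, knn = knn_impute)
```

Because the neighbors come from the same region of `x`, the nearest-neighbor estimate follows the curve, while the overall mean does not.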
@mkklindhardt 3 years ago
Appreciated @@JuliaSilge! Is it "fair" to compare linear regression models with machine learning regression models? 1) Are there specific areas, generally, that one needs to be aware of when comparing linear mixed models with machine learning models (e.g. random forest, XGBoost and kNN)? Such as changes in predictor variables, continuous vs. factor variables, etc.? 2) Are there tidymodels ways I can deal with or prevent collinearity and high correlation between variables before I perform the linear regression modelling? Perhaps like an AIC stepwise regression? Is that the same as the vip() function? But then my predictors for the linear model will change compared to the ones in the ML regression modelling, right? Sorry for the many questions. Hope they are somehow clear. Hope you had a good weekend, Julia. Your help is precious to me! Best regards, Kamau
@JuliaSilge 3 years ago
@@mkklindhardt Yep, there is nothing wrong with comparing linear models with models that can account for more complex, non-linear behavior. If you are thinking about comparing models, I recommend reading in detail this section, as well as Chapters 10 and 11: www.tmwr.org/software-modeling.html#model-types In tidymodels, we have preprocessing steps to filter out variables that are highly correlated or a linear combination of each other: recipes.tidymodels.org/reference/index.html#section-step-functions-filters We don't recommend using stepwise regression, for the reasons outlined here: www.stata.com/support/faqs/statistics/stepwise-regression-problems/ More on that here: stats.stackexchange.com/questions/20836/algorithms-for-automatic-model-selection/20856#20856
@panagiotischionas5828 4 years ago
Hi Julia, really love your work. A quick question: since you take the log of price as input to your model, if you want to show the actual price predicted by the model, how would you do that?
@JuliaSilge 4 years ago
I used log10(), so you can do 10^price to get it back. 👍
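A short base R sketch of that back-transformation. The prices and the offsets standing in for model predictions are hypothetical values, not output from the video's model.

```r
# Hypothetical prices; the model in the video was fit on log10(price).
price <- c(25, 199, 1500)
log_price <- log10(price)

# Stand-in for model predictions on the log10 scale (assumed values).
predicted_log_price <- log_price + c(0.05, -0.02, 0.01)

# Undo the log10 transform to return to the original price scale.
predicted_price <- 10^predicted_log_price

# Sanity check: back-transforming the untouched log values recovers the input.
all.equal(10^log_price, price)
```

Note that `log()` in R is the natural log, so a model fit on `log(price)` would need `exp()` instead of `10^`.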
@JamesLee1 4 years ago
Hello Julia, thanks for the video. I'm a big fan. Could you please let me know how to make HTML/notebook outputs from R Markdown better looking? When I use your tidytuesday Rmd file from your GitHub, the resulting HTML file has the default single-spaced, very small calibre font text. But your website has ~1.5-spaced, big custom font text that's pretty. I don't intend to publish my HTML report on GitHub or online because my work data is sensitive. Could I still make HTML outputs with the same formatting as your website? Is the Wowchemy Academic theme only for publishing online through GitHub? I would like to change the HTML text formatting locally.
@JuliaSilge 4 years ago
My website uses Hugo and I'm sure you don't want to get that set up just for individual reports. Instead, take a look at some of the styling options you have for HTML reports. There are built-in options using Bootswatch: bookdown.org/yihui/rmarkdown/html-document.html#appearance-and-style Or other contributed formats like html_pretty and html_clean: rmarkdown.rstudio.com/formats.html
@shahidraza5571 3 years ago
Can you point me to a source where I can learn the random forest algorithm for predicting a groundwater contamination map due to fluoride, using RStudio along with QGIS?
@sjrigatti 4 years ago
Hi. This is great. I work with survival data a lot and I was wondering how an analysis like this would differ with a survival object as the outcome. Is it just a matter of changing the mode of the ranger fit?
@JuliaSilge 4 years ago
No, actually, we still have a bit of work to do for survival models. We have some notes sketched out here: github.com/tidymodels/planning/tree/master/survival-analysis And there are some proofs of concept floating around in various repos. This is something we will work more on in 2021, so look for survival support next year!
@sjrigatti 4 years ago
@@JuliaSilge this seems like something Dr. Harrell at Vanderbilt would be interested in working on. Has he contributed anything at this point?
@JuliaSilge 4 years ago
@@sjrigatti Not at this point, but an interesting idea!
@mindlessgreen 4 years ago
What does it mean to stratify by a continuous variable? And why?
@JuliaSilge 4 years ago
You can read a little more about that here: www.tmwr.org/splitting.html#splitting-methods
@mindlessgreen 4 years ago
@@JuliaSilge Thank you! You also mentioned in another video about stratifying by quantiles. This is very helpful. Thank you so much.
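One way to picture stratifying by quantiles is the base R sketch below. It is an illustration under assumptions (simulated skewed outcome, quartile bins, a 75% training proportion), not the rsample code behind the `strata` argument.

```r
# Sketch in base R of stratifying on a continuous outcome: bin it into
# quartiles, then sample the same proportion from each bin.
set.seed(234)
n <- 200
price <- rlnorm(n, meanlog = 5, sdlog = 1)   # hypothetical skewed outcome

# Quartile bins; include.lowest keeps the minimum value in the first bin.
bins <- cut(price,
            breaks = quantile(price, probs = seq(0, 1, by = 0.25)),
            include.lowest = TRUE)

# Draw 75% of the rows within each bin for the training set.
train_rows <- unlist(lapply(split(seq_len(n), bins), function(idx) {
  sample(idx, size = round(0.75 * length(idx)))
}), use.names = FALSE)
test_rows <- setdiff(seq_len(n), train_rows)

table(bins[train_rows])   # each quartile is represented in training
table(bins[test_rows])    # and in testing
```

Sampling within bins keeps the cheap and the expensive ends of the outcome present in both sets, which a purely random split of a skewed variable does not guarantee.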