Your videos are very informative! I love that you take the time to show the data first and explain what the variables are. And the fact that you explain the tidy functions and even repeat a bit of what you said in earlier videos is great! You use just the right amount of detail for me at least. Thank you.
@mygeorgyboy · 2 years ago
Very nice example. You show the whole process; very illustrative. Thank you, Julia!
@mathteacher1729 · 4 years ago
Thank you so much for this video (and for all your videos). I've been using R for about two or three years and this was just the right amount of detail and exposition for me. Your workflow is clean and easy to follow, I like how you used the help function, and your overall layout is nice too (console in the top right). I look forward to trying XGBoost on some data sets now! :)
@mehdi1270 · 3 years ago
Thank you so much Julia for all your tutorial videos. They are easy to follow and very informative... just great! Please keep posting them. I hope you can find some time to post a video on neural network optimization with Keras in R. I can even start a petition for that. LOL
@BonifaceMakone · 4 years ago
These videos are super informative. Keep them coming. Thanks
@ecbytes · 4 years ago
Hi Julia, this video was amazing and very informative! Would you be able to help us find resources for (or post a video about :) ) the math behind these models? E.g., gradient descent for XGBoost models. Thank you very much for posting these videos! I am learning a ton!
@davidjackson7675 · 4 years ago
I always learn something from your videos.
@nikolaostziokas6847 · 4 months ago
Great work, please keep it up! As an idea for another video, it would be nice to use the bonsai package to train a LightGBM regression model and perhaps compare it against an XGBoost and a random forest model.
@alanjiang2930 · 3 years ago
Watched more than half of your videos within one week. Don't even want to blink! I saw you plotted XGB importance; I wonder if there is a tidymodels way to plot SHAP values from XGB. Thanks, Julia!
@JuliaSilge · 3 years ago
If you are only doing xgboost, you might try the SHAPforxgboost package: cran.r-project.org/package=SHAPforxgboost (it takes a bit of munging the model to get it to work with that package) For modeling in general, I like DALEX for explainability, which also supports tidymodels: modeloriented.github.io/DALEXtra/reference/explain_tidymodels.html We have a chapter in process on explainability in our upcoming book, so keep your eyes out for that: www.tmwr.org/
@alanjiang2930 · 3 years ago
@@JuliaSilge Got it. Thanks for the direction! Again, amazing video series! Really really tidy.
@flachboard84 · 3 years ago
Very helpful video! I look forward to following this example in a future project!
@talitabac · 3 years ago
Amazing video, super clear! Thank you, Julia!
@gkuleck · 1 year ago
Hi Julia! Great video. Have you done a video on multiclass classification? I am struggling to find guidance for this kind of problem with text classification. Thanks!!
@JuliaSilge · 1 year ago
Check out these two:
- juliasilge.com/blog/nber-papers/
- juliasilge.com/blog/multinomial-volcano-eruptions/
@gkuleck · 1 year ago
Thank you!
@haraldurkarlsson1147 · 3 years ago
Very nice presentation of xgboost by the way.
@JerryWho49 · 1 year ago
Great video, thanks. But I’ve got a question. Say, my local computer is too small to fit a model fast enough. How would I train a model in the cloud? Do you have any best practices?
@JuliaSilge · 1 year ago
One of the easiest ways to go is to use RStudio on SageMaker: posit.co/blog/getting-started-rstudio-sagemaker/
@haraldurkarlsson1147 · 3 years ago
Julia, I ran this model on a new Mac mini and it produced results in about 7 minutes. Much faster than my old Mac desktop, which I did not dare run it on.
@PA_hunter · 3 years ago
Similar time here
@haraldurkarlsson1147 · 3 years ago
I should mention that the mini ran this quietly; I heard no noise suggesting it was overworked. The unit is also cool to the touch.
@luisfernandocuestasanchez4343 · 3 years ago
You are the most amazing person I've ever come across. Thanks a lot. Blessings =)
@angvl8793 · 1 year ago
Hi Julia! Great video as always :) Can I ask you something, please? At around 34:08, if we don't want to use your xgb_grid and instead pass something else to the grid argument of tune_grid(), say grid = 50, is that OK? In general, is it OK to set grid to a number? Thank you very much!
@JuliaSilge · 1 year ago
Yes, that argument can take a couple of different kinds of values, either a dataframe or an integer value: tune.tidymodels.org/reference/tune_grid.html You can read a bit more about this here: www.tmwr.org/grid-search.html#evaluating-grid
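As a rough illustration of the two kinds of `grid` values, here is a minimal sketch. It assumes the tidymodels meta-package is installed and uses the `two_class_dat` example data from modeldata with an rpart decision tree as a quick stand-in for the video's xgboost workflow; `tree_wf`, `folds`, and `grid_df` are hypothetical names, not objects from the video.

```r
library(tidymodels)

set.seed(123)
# Hypothetical stand-in data: numeric predictors A and B, two-level outcome Class
folds <- vfold_cv(modeldata::two_class_dat, v = 3, strata = Class)

# A fast model with two tunable parameters (stand-in for the video's xgboost spec)
tree_spec <- decision_tree(cost_complexity = tune(), tree_depth = tune()) %>%
  set_engine("rpart") %>%
  set_mode("classification")

tree_wf <- workflow() %>%
  add_formula(Class ~ .) %>%
  add_model(tree_spec)

# Option 1: an integer; tune_grid() builds a space-filling design of up to this many candidates
res_int <- tune_grid(tree_wf, resamples = folds, grid = 5)

# Option 2: a data frame whose column names match the tune() parameters
grid_df <- grid_regular(cost_complexity(), tree_depth(), levels = 2)
res_df <- tune_grid(tree_wf, resamples = folds, grid = grid_df)

show_best(res_df, metric = "roc_auc")
```

With an integer, tune picks the candidate values for you; with a data frame, you control exactly which combinations are evaluated.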
@angvl8793 · 1 year ago
@@JuliaSilge Thank you again ! :) .
@lucaskramer438 · 4 years ago
Great explanation, but I have one question: when you call last_fit() you use your split object. In my case I was only given the training and testing sets, so I don't have a split object. Is there any way to call last_fit() anyway? Thanks!
@JuliaSilge · 4 years ago
You can't call last_fit() directly if you don't have the split, but you *can* manually do what it is a wrapper for, which is train one last time on the training data and then evaluate one last time on the testing data.
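The manual equivalent described above might look like this sketch: fit the finalized workflow once on the training set, then predict on and score the testing set. The data, the split into `train_df`/`test_df`, and the logistic regression workflow are hypothetical stand-ins (assuming tidymodels and modeldata are installed), not the video's model.

```r
library(tidymodels)

set.seed(123)
# Hypothetical stand-ins for separately provided training and testing sets
df <- modeldata::two_class_dat
in_train <- sample(nrow(df), size = floor(0.75 * nrow(df)))
train_df <- df[in_train, ]
test_df  <- df[-in_train, ]

# Stand-in for the finalized workflow (e.g. after finalize_workflow())
final_wf <- workflow() %>%
  add_formula(Class ~ .) %>%
  add_model(logistic_reg())

# 1. Train one last time on the full training data
final_fit <- fit(final_wf, data = train_df)

# 2. Evaluate one last time on the testing data
test_preds <- augment(final_fit, new_data = test_df)
test_metrics <- bind_rows(
  accuracy(test_preds, truth = Class, estimate = .pred_class),
  roc_auc(test_preds, truth = Class, .pred_Class1)
)
test_metrics
```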
@faiazrummankhan5589 · 3 years ago
All your videos are such a great learning resource for real-world EDA and modeling. I was just wondering what theme you are using in RStudio?
@JuliaSilge · 3 years ago
It's one of the themes available via rsthemes: www.garrickadenbuie.com/project/rsthemes/
@JoseAyerdis · 4 years ago
If RStudio crashes with an error about "Initializing libomp.dylib, but found libomp.dylib already initialized" when you fit the final workflow, you can use this workaround on OS X: `Sys.setenv(KMP_DUPLICATE_LIB_OK = TRUE)`
@geilin2394 · 4 years ago
These vids are great. Can we see a classification model with calibration curves, and then recalibrate it, within the tidymodels framework? How long did the hyperparameter tuning take here?
@badrGamer11 · 3 years ago
Always amazing content, thank you!
@haraldurkarlsson1147 · 3 years ago
Julia, I was able to follow along and everything looked fine until the final ROC curve: I get a mirror image of your curve. I have combed through the code and found nothing wrong, and my confusion matrix is similar to yours, so it seems like a systematic error. When I looked at the data that generates the curve, I noticed that my specificity numbers are somehow switched: your table starts with a specificity of 1, but mine starts at 0, so my values look more like 1 - specificity. I am puzzled.
@JuliaSilge · 3 years ago
You can look at the first comment at the relevant blog post here: juliasilge.com/blog/xgboost-tune-volleyball/ Since I published this blog post, there was a change in yardstick in version 0.0.7: github.com/tidymodels/yardstick/blob/master/NEWS.md#yardstick-007 that changed how to choose which level (win or lose) is the "event". You can change this by using the `event_level` argument for functions like `roc_curve()`: yardstick.tidymodels.org/reference/roc_curve.html
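To see why the curve flips, here is a small sketch with made-up predictions, assuming a recent yardstick. Because the factor levels are `lose` then `win`, the default `event_level = "first"` treats `lose` as the event while `.pred_win` is the probability of `win`, so the estimate is read backwards.

```r
library(yardstick)

# Hypothetical predictions; "lose" is the first factor level, "win" the second
preds <- tibble::tibble(
  truth = factor(c("lose", "win", "win", "lose", "win", "lose"), levels = c("lose", "win")),
  .pred_win = c(0.2, 0.9, 0.7, 0.4, 0.8, 0.1)
)

# Default: "lose" is the event, so the win probabilities are scored backwards
auc_first <- roc_auc(preds, truth, .pred_win)

# Declare the second level ("win") as the event to match the .pred_win column
auc_second <- roc_auc(preds, truth, .pred_win, event_level = "second")

# The same argument un-mirrors the curve before autoplot()
roc_pts <- roc_curve(preds, truth, .pred_win, event_level = "second")
```

Here the made-up predictions separate the classes perfectly, so the default gives an AUC of 0 and `event_level = "second"` gives 1.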
@artathearta · 4 years ago
At 48:44, my autoplot was flipped across the diagonal (the y = x line). I wonder why.
@JuliaSilge · 4 years ago
It's because of a global change in how yardstick finds the "first" or base level event: juliasilge.com/blog/xgboost-tune-volleyball/#comment-5015180544
@YannC-p1q · 5 months ago
It would be amazing if you did a video using nested data (instead of having a nominal variable, nest by it and fit a model for each of its levels, for example), also using map_workflow etc. Great as always!
@deltax7159 · 8 months ago
What appearance theme are you using here?
@JuliaSilge · 8 months ago
I use one of the themes from rsthemes: www.garrickadenbuie.com/project/rsthemes/ I think Oceanic Plus? There are lots of nice ones available in that package.
@haraldurkarlsson1147 · 3 years ago
Julia, I do like Markdown, but for testing out code I prefer R scripts simply because I make a lot of mistakes. So I am curious to know why you work in Markdown. Is it because you have already written and debugged your code and would like to save the lesson in a nicer format?
@JuliaSilge · 3 years ago
No, I work in R Markdown regularly. In R I basically am either building package code or I am working in R Markdown. I'm a huge believer in the idea of "literate programming" as a real way to work. I make a lot of mistakes too, but I don't think that reduces the value of combining narrative and code in one document.
@haraldurkarlsson1147 · 3 years ago
I am working on setting up a class for students in my department and am quite torn on whether to go the Markdown or R script route. Since most of the class work will be around coding and simply learning how to use R, I am inclined to start with the regular setup (scripts) and then move on to Markdown later. Thanks.
@JuliaSilge · 3 years ago
@@haraldurkarlsson1147 The person I know who has thought the most about this is Mine Çetinkaya-Rundel; you can see one of her resources for teaching here: datasciencebox.org/ She recommends teaching R Markdown to emphasize reproducible analyses.
@haraldurkarlsson1147 · 3 years ago
I see. Thanks a lot for the tip.
@haraldurkarlsson1147 · 3 years ago
Julia, I will take a deeper dive into the datasciencebox. However, I will be teaching grad students who should already have some inkling of basic statistics concepts. Most have already worked with data, done some data processing, and generated tables and graphs. I would like to teach them R to simplify their lives and hopefully give them a valuable new skill for their current or future work. As grad students, they have the science part covered.
@raminziaei6411 · 4 years ago
Thanks a lot Julia. I really love your videos. Do you have any plans for making a video on neural networks and tuning them in tidymodels? That would be awesome if possible. Please continue these videos. They are really great. Cheers
@Matthew-px9nu · 4 years ago
Julia, thank you for these great videos, keep it up! Quick question: after using last_fit(), what are the workflow steps if I want to predict on NEW data? last_fit() doesn't really work on new data that wasn't in the original split. Thank you!
@JuliaSilge · 4 years ago
Once you get to last_fit(), check out the objects that are inside of it. One of the columns contains a *fitted model* that can be used on new data. In fact, that fitted model is used on the testing data to compute the metrics!
@Matthew-px9nu · 4 years ago
@@JuliaSilge Thank you Julia! One last quick question: I noticed you always run the commands from the Rmd notebook in the console. What button do you click to run them in the console instead of in the notebook?
@JuliaSilge · 4 years ago
@@Matthew-px9nu That's probably my most used keyboard shortcut! Ctrl+Shift+Enter for a chunk, Cmd+Enter for a line. In RStudio, you can find them under Tools -> Keyboard Shortcuts Help, but there's just a handful that I use regularly.
@vincentpepe1064 · 4 years ago
@@JuliaSilge Hi Julia! Where exactly do I find this? The columns I have are splits, id, .metrics, .notes, .predictions, and .workflow. I can't find the fitted model in .workflow either, so I'm not sure where it is. Thanks!
@JuliaSilge · 4 years ago
@@vincentpepe1064 The .workflow is a *fitted* workflow at this point. For example, try tidying it or predicting on it. I show how to tidy it here: juliasilge.com/blog/palmer-penguins/
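A minimal sketch of pulling the fitted workflow out of `last_fit()` results and using it on new data, assuming tidymodels and modeldata are installed. The data and the logistic regression workflow are hypothetical stand-ins for the video's xgboost model; recent versions of tune also offer `extract_workflow()` as an alternative to indexing the `.workflow` column directly.

```r
library(tidymodels)

set.seed(123)
# Hypothetical stand-in data and workflow
data_split <- initial_split(modeldata::two_class_dat, strata = Class)
wf <- workflow() %>%
  add_formula(Class ~ .) %>%
  add_model(logistic_reg())

final_res <- last_fit(wf, data_split)

# The .workflow list column holds the workflow already fitted on the training set
fitted_wf <- final_res$.workflow[[1]]

# Tidy it (for this stand-in model: the logistic regression coefficients)
tidy(fitted_wf)

# Predict on new data; a few testing rows stand in for genuinely new observations
new_obs <- head(testing(data_split), 3)
predict(fitted_wf, new_data = new_obs)
predict(fitted_wf, new_data = new_obs, type = "prob")
```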
@amahoela730 · 3 years ago
Does anyone know how to save the workflow for later use? I have problems with it since it is not of class 'xgb.Booster', whereas using saveRDS() might result in compatibility issues with future package versions.
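One option that may help is the bundle package, which prepares models holding native references (such as xgboost boosters) for `saveRDS()` and restores them with `unbundle()`. This is a sketch under assumptions: it uses a hypothetical small xgboost workflow on `mtcars`, assumes tidymodels, xgboost, and bundle are installed, and does not by itself guarantee compatibility with future package versions.

```r
library(tidymodels)
library(bundle)

set.seed(123)
# Hypothetical stand-in: a small xgboost regression workflow on mtcars
xgb_wf <- workflow() %>%
  add_formula(mpg ~ .) %>%
  add_model(boost_tree(trees = 20) %>% set_engine("xgboost") %>% set_mode("regression"))
xgb_fit <- fit(xgb_wf, data = mtcars)

# Bundle the fitted workflow so its native xgboost object can be serialized
path <- tempfile(fileext = ".rds")
saveRDS(bundle(xgb_fit), path)

# Later (possibly in a fresh R session): read it back and unbundle before predicting
restored_fit <- unbundle(readRDS(path))
predict(restored_fit, new_data = head(mtcars))
```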
@pokelytics4352 · 4 years ago
Great video! Is there any difference between “pivot_longer” and “gather”? They look identical to me, just with the arguments having different names, but want to make sure I’m not missing something.
@JuliaSilge · 4 years ago
You can read this blog post that introduced the pivot verbs: www.tidyverse.org/blog/2019/09/tidyr-1-0-0/
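A small sketch comparing the two verbs on hypothetical data, assuming tidyr is installed. They produce the same key/value pairs, but the row order differs: `gather()` (now superseded) stacks column by column, while `pivot_longer()` keeps each row's values together.

```r
library(tidyr)

# Hypothetical wide data: one id column and two measurement columns
wide <- tibble::tibble(id = 1:2, a = c(10, 20), b = c(30, 40))

# Older verb: key/value pairs, gathering every column except id
long_gather <- gather(wide, key = "name", value = "value", -id)

# Newer verb: the same reshape, with more explicit argument names
long_pivot <- pivot_longer(wide, cols = -id, names_to = "name", values_to = "value")

long_gather$value  # column by column: a's values, then b's
long_pivot$value   # row by row: id 1's values, then id 2's
```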
@pokelytics4352 · 4 years ago
Julia Silge oh awesome thanks!
@Simonsayztaga · 4 years ago
Do you have a course on tidymodels?? Video Course or Tutorials?
@JuliaSilge · 4 years ago
You can check out this interactive course on tidymodels: supervised-ml-course.netlify.app/
@artathearta · 4 years ago
@@JuliaSilge Amazing resource, thank you
@shamsulhoquekhan933 · 2 years ago
Can someone tell me why we used sample_prop inside the search grid?
@JuliaSilge · 2 years ago
It's what proportion of the total available sample is used for modeling within one boosting iteration: dials.tidymodels.org/reference/trees.html#details
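For context, here is a sketch of a grid in the style used in the video, assuming dials is installed (the other parameters are illustrative). `sample_prop()` describes a proportion between 0.1 and 1, which suits xgboost's per-iteration subsampling, whereas dials' `sample_size()` is a row count whose range stays unknown until it is finalized against the data.

```r
library(dials)

set.seed(123)
# Name the column sample_size so it matches boost_tree(sample_size = tune()),
# but draw its values from the proportion-based sample_prop() parameter
xgb_grid <- grid_latin_hypercube(
  tree_depth(),
  min_n(),
  sample_size = sample_prop(),
  size = 5
)
xgb_grid
```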
@edneideramalho2363 · 1 year ago
You are the best!
@tamaraabzhandadze2712 · 3 years ago
Thank you for the great tutorial. I have been having a problem with my ROC curve. Namely, when I run the code `final_res_r %>% collect_predictions() %>% roc_curve(dependent_var, .pred_dependent_var) %>% autoplot()`, I get the error "Can't subset columns that don't exist. x Column `.pred_dependent_var` doesn't exist." I cannot understand how to solve the problem. What am I doing wrong?
@JuliaSilge · 3 years ago
Hmmmm, do you see the column with the predicted class probability in it, after you run `collect_predictions()`? You can check out the documentation for `roc_curve()` here: yardstick.tidymodels.org/reference/roc_curve.html And if you continue to have trouble, I recommend creating a reprex and posting it on RStudio Community: rstd.io/tidymodels-community It's often easier to get help with coding problems in a format like that rather than comments.
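A likely cause: the probability columns returned by `collect_predictions()` are named after the *levels* of the outcome factor, not after the outcome column itself. This base R sketch, with a hypothetical yes/no outcome, shows the naming pattern; with such an outcome you would pass `.pred_no` (or `.pred_yes` together with `event_level = "second"`) to `roc_curve()`.

```r
# Hypothetical outcome column; substitute your own factor outcome
outcome <- factor(c("yes", "no", "yes", "no"))

# Class-probability columns are named ".pred_" followed by each factor level
prob_cols <- paste0(".pred_", levels(outcome))
prob_cols
```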
@tamaraabzhandadze2712 · 3 years ago
@@JuliaSilge Dear Julia! Just amazing to read your response :) I have solved that problem :) However, another problem that I could not solve was related to variable importance. I managed to create a figure, but I cannot get the actual values per variable. I tried varImp(model_name) and xgb.importance(model = model_name), but I just get lovely red text, without the results :)
@JuliaSilge · 3 years ago
@@tamaraabzhandadze2712 I typically use the vip package for variable importance, as I show in this blog post: juliasilge.com/blog/xgboost-tune-volleyball/
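To get the numbers behind the plot, the vip package's `vi()` returns the importance scores as a tibble, which `vip()` then plots. This sketch assumes tidymodels, xgboost, and vip are installed and uses a hypothetical small xgboost workflow on `two_class_dat` rather than the video's model.

```r
library(tidymodels)
library(vip)

set.seed(123)
# Hypothetical stand-in: a small xgboost classification workflow
xgb_fit <- workflow() %>%
  add_formula(Class ~ .) %>%
  add_model(boost_tree(trees = 50) %>% set_engine("xgboost") %>% set_mode("classification")) %>%
  fit(data = modeldata::two_class_dat)

# vi() gives the scores as a table; vip() would plot the same scores
importance_tbl <- xgb_fit %>%
  extract_fit_parsnip() %>%
  vi()
importance_tbl
```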
@tamaraabzhandadze2712 · 3 years ago
@@JuliaSilge Thank you! I have actually posted the question there as well :) I read your answer and got the results :) Now I just have to decide on a cutoff for choosing some of the ten features. P.S. I did factor analysis as well and could identify 3 variables with good loadings, but that was a bit easier since there are cutoffs for loadings :) For XGBoost I have no idea what to do :)
@dudeadulto · 4 years ago
Hi, I'm getting a warning when the tune_grid() function runs: ! Fold01: model 1/20: The `x` argument of `as_tibble.matrix()` must have colum... I found in a GitHub issue that it's related to "name repairing". Do you have any idea whether it really affects the results of the tuning process, or if there's an update/solution for it?
@JuliaSilge · 4 years ago
Hmmmm, do you want to make sure all your packages are updated? That sounds like a message from an older version of the packages. If you are still getting that warning, I recommend creating a reprex and posting on RStudio Community: community.rstudio.com/c/ml/15
@dudeadulto · 4 years ago
@@JuliaSilge After reading your response, I updated all my packages, and the error still occurs, but the process seems to keep running. I will let it finish and see if it affects the results of tune_grid().
@wecsleyprates3205 · 4 years ago
Hey Julia, congrats again! This error shows up: xgb_res
@JuliaSilge · 4 years ago
You need to *install* xgboost, actually; you don't have the package installed: install.packages("xgboost")
@wecsleyprates3205 · 4 years ago
@@JuliaSilge Yeah... but I don't know what is happening: when I try to install the xgboost package, I get an error telling me that xgboost is not available for my R version. My RStudio is the current version.
@JuliaSilge · 4 years ago
@@wecsleyprates3205 Ah, a classic problem that folks run into when things get borked! Check out this SO question + answers: stackoverflow.com/questions/25721884/how-should-i-deal-with-package-xxx-is-not-available-for-r-version-x-y-z-wa
@wecsleyprates3205 · 4 years ago
Thanks @@JuliaSilge... Do you know what the error below means? Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘predict’ for signature ‘"xgb.Booster"’
@JuliaSilge · 4 years ago
@@wecsleyprates3205 That sounds like xgboost still isn't getting loaded correctly to me. Could you try creating a reprex showing your problem and posting on RStudio Community? rstd.io/tidymodels-community
@hansmeiser6078 · 3 years ago
Is .pred_win = .pred_class ?
@JuliaSilge · 3 years ago
No, .pred_win should be a class probability (like a number) and .pred_class should be the predicted class (like the factor level).
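To make the distinction concrete, here is a base R sketch with hypothetical probabilities. It assumes, for illustration, that the predicted class for a two-level outcome is `win` whenever `.pred_win` exceeds 0.5; treat the exact handling of a tie at 0.5 as an assumption.

```r
# Hypothetical class probabilities for the "win" level: plain numbers in [0, 1]
pred_win <- c(0.82, 0.35, 0.51, 0.07)

# The predicted class is a factor; this sketch assumes a 0.5 cutoff for two classes
pred_class <- factor(ifelse(pred_win > 0.5, "win", "lose"), levels = c("lose", "win"))

data.frame(.pred_win = pred_win, .pred_class = pred_class)
```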