Effective Resampling for Machine Learning in Tidymodels {rsample} R package reviews

4,778 views

yuzaR Data Science

Comments: 48
@hansmeiser6078 1 year ago
I am watching R-tutorials since 7 years now. Your visual representations and explanations are the *best* I've seen so far.
@chacmool2581 1 year ago
Wow. Precocious child. When I was seven, I watched cartoons. 🤡
@hansmeiser6078 1 year ago
@chacmool2581 Not since I was seven years old 🙂
@yuzaR-Data-Science 1 year ago
Wow, thank you, Hans! That means the world to me!
@517127 1 year ago
I must agree. Even paid courses don't show this quality.
@muhammedhadedy4570 6 months ago
What I love most about this YouTube channel is that the quality of the free tutorials is much better than that of many paid ones. I must admit that you have a talent for turning such a complex topic into something very easy to follow. A true professor you are. I really don't know how to thank you. I would be so grateful if you created a tutorial on machine learning with the tidymodels package. From the bottom of my heart, thank you. ❤❤❤❤
@yuzaR-Data-Science 6 months ago
Wow, thanks! The most positive feedback I've ever got! 🙏🙏🙏 I already tried to get my hands on tidymodels, but it is still a bit chaotic for me. I do plan to cover it in the future, though. So, stay tuned!
@aiz_i564 5 months ago
Thank you! It helped me a lot! Excellent explanation!
@yuzaR-Data-Science 5 months ago
Glad you enjoyed it!
@wbdill 1 year ago
Even though I know very little about modeling, it is clear that this package kicks ass! The animations in this video illustrating the concepts being described are superb!
@yuzaR-Data-Science 1 year ago
Thank you for the nice feedback and for watching, Brian!
@rodrigonehara3143 1 year ago
Incredibly useful channel, love your work.
@yuzaR-Data-Science 1 year ago
Much appreciated! That motivates me to make more!
1 year ago
Excellent! Great animations and clear explanations. Thanks!
@yuzaR-Data-Science 1 year ago
Glad you liked it! Thank you for watching!
@SUNILYADAV-tv5ze 4 months ago
Nice lecture on resampling. Please make a video on simulation studies.
@yuzaR-Data-Science 4 months ago
Thanks 🙏 Sunil, I will. But it'll take some time, because I first want to cover frequentist stats and then come to simulation.
@AnimeshSharma1977 1 year ago
Very well explained 👍 Thanks for sharing 🙏 Just to be clear, for cross-validation, which model is going to run over the test data created at the very beginning?
@yuzaR-Data-Science 1 year ago
Yes, correct. The model created on the training set will be run over the test set. Thank you for the nice feedback and for watching!
@AnimeshSharma1977 1 year ago
@yuzaR-Data-Science Thanks for the feedback 👍 I was asking you to specify, if you may: since there will be N models, one for each of the N folds, there will be a statistic for each parameter of the model, won't there? If so, which one is used?
@joshstat8114 1 year ago
Thanks for the vid, sir. Can you create a video about using tidymodels for time series modeling and analysis?
@yuzaR-Data-Science 1 year ago
Thanks, Josh! Sure, it'll take some time, but they are definitely on my list.
@tarasst6887 1 year ago
Fantastic videos
@yuzaR-Data-Science 1 year ago
Glad you like them! Thanks for watching!
@abdulmusa6162 1 year ago
Kudos to our boss, thanks a million, sir, the guru of data scientists in the universe. Sir, could you make a tutorial on handling class imbalance when dealing with binary classification?
@yuzaR-Data-Science 1 year ago
Good suggestion. I just ran a random forest on a large dataset with massive class imbalance (many zeros and few ones) and solved it this way:
library(randomForest)
set.seed(1)
fit = randomForest(response ~ predictor1 + predictor2 + ... + , data = data, importance = T, scale = T, mtry = 5, ntree = 10000, sampsize = c(1000, 1000))
So "sampsize" is the argument that helps. I then got results similar to the logistic regression in terms of the importance of variables and their interactions.
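A minimal runnable sketch of the "sampsize" idea from the reply above, under stated assumptions: the data frame `dat`, the predictors `x1` and `x2`, and the simulated outcome `y` are hypothetical stand-ins, not the data used in the reply. For a classification forest, a vector-valued `sampsize` draws that many rows per class for every tree (the outcome is used as the stratifying variable when `strata` is not supplied), so each tree sees a balanced sample. Note that the tree-count argument of `randomForest()` is spelled `ntree`.

```r
library(randomForest)

set.seed(1)

# Hypothetical imbalanced data: many zeros, few ones
n   <- 2000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- factor(rbinom(n, 1, plogis(-3 + 1.5 * dat$x1)), levels = c(0, 1))

# Size of the rarer class; used as the per-class sample size for each tree
n_minority <- min(table(dat$y))

fit <- randomForest(
  y ~ x1 + x2,
  data       = dat,
  ntree      = 500,
  sampsize   = c(n_minority, n_minority),  # one entry per class level
  importance = TRUE
)

print(table(dat$y))   # class counts in the full data
print(fit$confusion)  # out-of-bag confusion matrix with per-class error
```

Setting both entries to the minority-class count is one simple choice; any pair of values no larger than the corresponding class sizes can be tried.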
@abdulmusa6162 1 year ago
Thanks so much, sir, for shedding more light. You are one in a million.
@yuzaR-Data-Science 1 year ago
You are very welcome!
@yurisilvadesouza3059 1 year ago
Could I use this approach to remove some bias caused by different sampling efforts? Example: I have a dataset with monthly video records of animal interactions with vertebrate latrines. However, some spots were recorded in different months because some latrines weren't there when I installed my equipment, meaning the latrines were sampled with different sampling efforts. I am looking for a way to correct for this, but since stats are new to me, I was wondering whether the approach you use could work in my situation. Please don't stop doing this, we need this kind of informative and didactic video 🙏🙏🙏. Thank you again for this one!
@yuzaR-Data-Science 1 year ago
Thanks for the cool feedback! With the months, it depends. If you want to capture the variance (difference) from month to month, then the "strata = " argument is the way to go. So that's for resampling. But it is also useful to check out a mixed-effects model, where Month would be your random effect, so the model accounts for month. But if you don't care about the month and just want to average out the monthly effect, then bootstrapping 1000-2000 times would do the trick. Cheers
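A rough sketch of the two resampling suggestions in the reply above (the "strata = " argument and bootstrapping about 1000 times), using rsample. Everything about the data is an assumption made for illustration: the data frame `records`, its columns `month` and `visits`, and the simulated values are hypothetical. Whether this actually corrects for unequal sampling effort depends on the study design, so treat it only as a demonstration of the mechanics.

```r
library(rsample)

set.seed(1)

# Hypothetical records with unequal sampling effort across months
records <- data.frame(
  month  = factor(sample(c("Jan", "Feb", "Mar", "Apr"), 200, replace = TRUE,
                         prob = c(0.35, 0.30, 0.20, 0.15))),
  visits = rpois(200, lambda = 3)
)

# 1000 bootstrap resamples; strata = month resamples within each month,
# so every resample keeps the original number of rows per month
boots <- bootstraps(records, times = 1000, strata = month)

# Mean number of visits in each bootstrap resample
boot_means <- vapply(
  boots$splits,
  function(s) mean(analysis(s)$visits),
  numeric(1)
)

# Percentile interval for the overall mean
ci <- quantile(boot_means, c(0.025, 0.975))
print(ci)
```

Dropping `strata = month` gives the plain bootstrap, in which the number of rows per month varies from resample to resample.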
@chacmool2581 1 year ago
Good to know about Monte Carlo CV in that package. Where does Monte Carlo fit in the bias-variance continuum vis-a-vis bootstrapping and k-fold cross-validation? I tend to use 'caret' instead, where I dial in these model performance tests via trainControl. We need a vid on sub-space clustering. 😉
@yuzaR-Data-Science 1 year ago
Thanks for the many suggestions! I'll do my best to cover ML in the tidyverse ASAP, including sub-space clustering ;). For now, I am not sure about your Monte Carlo question; I think it will depend on the data. Tidymodels was created by Max Kuhn, the same person who created caret, so it should contain everything caret has.
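The sketch below does not settle the bias-variance question raised above; it only shows, on the built-in `mtcars` data, how the three resampling schemes mentioned in this thread are set up in rsample and how their held-out sets differ. The helper `held_out_sizes()` is a hypothetical name introduced for this example.

```r
library(rsample)

set.seed(1)

mc    <- mc_cv(mtcars, prop = 0.75, times = 25)  # Monte Carlo CV: repeated random splits
folds <- vfold_cv(mtcars, v = 10)                # v-fold CV: each row is held out exactly once
boots <- bootstraps(mtcars, times = 25)          # bootstrap: held-out set = rows not drawn

# Number of held-out (assessment) rows in every resample of a resampling object
held_out_sizes <- function(rset) {
  vapply(rset$splits, function(s) nrow(assessment(s)), integer(1))
}

print(held_out_sizes(mc))
print(held_out_sizes(folds))
print(held_out_sizes(boots))
```

In Monte Carlo CV the held-out sets of different resamples can overlap, in v-fold CV they never do, and in the bootstrap their size varies from resample to resample.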
@chacmool2581 1 year ago
@yuzaR-Data-Science Another topic suggestion: Tweedie GLMs, for working with hard right-skewed distributions with and without (hard & soft) zeros. Zero-inflated models.
@yuzaR-Data-Science 1 year ago
Thanks, mate! Zero-inflated models are definitely on the list, because I use them often. Not sure I'll do Tweedie anytime soon though, because nobody in my field (medicine) is using them.
@chacmool2581 1 year ago
@yuzaR-Data-Science Yeah, no need to look at the whole Tweedie family, but if you are faced with zeros frequently and use zero-inflated models, the Tweedie of the 1 < epsilon < 2 family for zero and positive continuous data may be worth a look.
@yuzaR-Data-Science 1 year ago
@chacmool2581 Cool! Thanks for your ideas, Chac! I'll keep it in mind and will explore it when I have positive continuous data.
@mohamedaddani3718 1 year ago
Hello. Can you do a demo on how to do bootstrapping in R? Also, the website link is not working. Thank you.
@yuzaR-Data-Science 1 year ago
Hey, I actually did a video on bootstrapping before this one, so just browse my YouTube channel and you'll find it. And the link is working perfectly, I just checked it. Thank you for watching!
@samirhajiyev6905 5 months ago
At 13:25 you used rand_forest() instead of lm() on code line 124. Could you please clarify?
@yuzaR-Data-Science 5 months ago
Thanks for noticing! I think I wanted to use a random forest for classification; I just used "lm" in the name of the object where I saved the model. Thanks for watching!
@galan8115 1 year ago
Regarding this, yuzaR, I was wondering if there is something along the lines of keeping the % of classes in the train-test split, but for numerical values, in order to draw two populations with the most similar means, SDs and so on. Thank you :D
@yuzaR-Data-Science 1 year ago
Oh, that's a great question! I honestly never tried or needed it, and I would be interested myself. I have never seen this in a tutorial. I could ask the creators of the package, but at the moment I can't think of a case where I would need it. The thing is, if you stratify on the numeric predictor and use a few breaks (strata = horsepower, breaks = 10), you'll make the distributions of the testing and training sets very similar, and then the means and SDs will be very similar too. Hope that helps. Thank you for watching!
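A small sketch of the stratified split on a numeric variable described in the reply above (strata = horsepower, breaks = 10). The data frame `engine_data` and its simulated `horsepower` values are hypothetical; the point is only to compare the mean and SD of the two sets after the split.

```r
library(rsample)

set.seed(1)

# Hypothetical right-skewed numeric variable
engine_data <- data.frame(horsepower = rgamma(1000, shape = 4, scale = 30))

# The numeric variable is binned into quantile groups (breaks = 10)
# and the split is made within each bin
strat_split <- initial_split(engine_data, prop = 0.75,
                             strata = horsepower, breaks = 10)
train <- training(strat_split)
test  <- testing(strat_split)

# Compare the two sets
summary_tbl <- rbind(
  train = c(mean = mean(train$horsepower), sd = sd(train$horsepower)),
  test  = c(mean = mean(test$horsepower),  sd = sd(test$horsepower))
)
print(round(summary_tbl, 1))
```

Running the same code without `strata` and `breaks` gives a plain random split to compare against.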
@galan8115 1 year ago
@yuzaR-Data-Science Yep, I just edited my comment to add the thank you because I had just watched that part! You, sir, are really fast at answering comments!! Thank you again and have a good day
@yuzaR-Data-Science 1 year ago
@galan8115 You are welcome!
@serhatakay8351 1 year ago
So how can we use CV-sampled or bootstrapped models for prediction on a predefined test set?
@yuzaR-Data-Science 1 year ago
As at the beginning of the video with the train-test split. CV and bootstrapping can be used to find the best model using ONLY the training set. You then fit this model to the training set and use it to predict the response variable in the test set, which gives the final R2 or RMSE. Hope that helps.
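The workflow in the reply above can be sketched roughly as follows, using rsample and base R `lm()` on the built-in `mtcars` data. The two candidate formulas and the helpers `rmse()` and `cv_rmse()` are assumptions made for this example, not the models from the video. The per-fold models are only used to compare candidates; the model that finally meets the test set is refit on the whole training set.

```r
library(rsample)

set.seed(1)

# 1. Hold out a test set once, at the very beginning
split <- initial_split(mtcars, prop = 0.75)
train <- training(split)
test  <- testing(split)

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# 2. Compare candidate models by cross-validation on the training set only
folds      <- vfold_cv(train, v = 5)
candidates <- list(small = mpg ~ wt, large = mpg ~ wt + hp)

cv_rmse <- function(form) {
  per_fold <- vapply(folds$splits, function(s) {
    fit      <- lm(form, data = analysis(s))  # fit on the analysis part of the fold
    held_out <- assessment(s)                 # evaluate on the held-out part
    rmse(held_out$mpg, predict(fit, newdata = held_out))
  }, numeric(1))
  mean(per_fold)                              # average over the folds
}

cv_results <- vapply(candidates, cv_rmse, numeric(1))
best_name  <- names(which.min(cv_results))

# 3. Refit the chosen model on the whole training set, then score it once on the test set
final_fit <- lm(candidates[[best_name]], data = train)
test_rmse <- rmse(test$mpg, predict(final_fit, newdata = test))

print(cv_results)
print(test_rmse)
```

The same loop works with `bootstraps(train, times = ...)` in place of `vfold_cv()`, since both return a `splits` column that `analysis()` and `assessment()` understand.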
@serhatakay8351 1 year ago
@yuzaR-Data-Science Exactly, that definitely helps. I wish there were a method to combine the values obtained from CV, a kind of ensembling. Thanks for the reply and the videos you share.
@yuzaR-Data-Science 1 year ago
You are very welcome! Thank you for watching!