Comments
@laurareinoso4684 14 hours ago
Hi Yury, I always enjoy watching your videos! I'm working with fishing data from 2005 to 2023, but I'm still unsure which type of regression would work best for my data. Some species aren't reported in certain years, but I don't think it's due to errors or outliers. I've also noticed some data that seems off, like 486 instead of 4.86, likely because of how it was reported. Do you have any recommendations on how to handle this? Thanks a lot!
@yuzaR-Data-Science 12 hours ago
Thanks, Laura! I appreciate your feedback! I studied fisheries too, by the way :) in Kiel, Germany. Now, without seeing the data and understanding the hypothesis, one can't say which type of regression to use. Outliers and reporting mistakes should be resolved first, of course ;) So consult a statistician or epidemiologist from your department.
@tighthead03 1 day ago
Thanks for your content, this is super useful. How can performance be used with tidymodels glm models? Any examples you can point to please?
@yuzaR-Data-Science 12 hours ago
That's a great question. Unfortunately not. That's why I am still hesitant to make tidymodels videos: they don't yet work together with the most practical stats packages. And since I do more stats than ML, I will cover stats topics first before going into ML and AI.
@tighthead03 10 hours ago
@yuzaR-Data-Science thanks for the feedback, keep up the excellent videos 👍
@yuzaR-Data-Science 3 hours ago
I will 🙏
@Violetblue1307 2 days ago
I really like the way you explain code, beginning at a simple level and gradually increasing the complexity. I also understand every argument you wrote, so interesting! Previously, I used to copy code from online sources or from classes, but I couldn't understand all of it, so I couldn't design graphs by myself. Now that has changed. Thank you!
@yuzaR-Data-Science 1 day ago
I am so glad to hear that, Violet! Thanks a lot! I felt the same by going through lots of R content, that's why I started to produce my own R content ... it was actually only for me in the beginning, so, I learn better, but then I started to get more and more positive feedback, like yours ;) and being helpful by producing useful content makes me kind of happy :)
@nguyentho9467 5 days ago
Can you make a video about the path analysis process?
@yuzaR-Data-Science 4 days ago
what is "path analysis process"?
@nguyentho9467 5 days ago
I'm so impressed by your knowledge and videos, thank you so much.
@yuzaR-Data-Science 4 days ago
Glad you enjoyed it! I just sent you the link in the other comment too: we.tl/t-tBLvcJ55xT
@kydaviddoyle1969 7 days ago
Another great video!! One question: what is the source of the tables you use to show the interpretation of the p-value, the Bayes factor, and the size of the correlation coefficient?
@yuzaR-Data-Science 6 days ago
Good question! Generally, you can pick any published interpretation and cite it as your reference. It does not have to be the one from the video. However, if you want to use those, here are three references:
Hinkle DE, Wiersma W, Jurs SG. Applied Statistics for the Behavioral Sciences. 5th ed. Boston: Houghton Mifflin; 2003.
Jeffreys H. Theory of Probability. 3rd ed. Oxford: Oxford University Press; 1961.
Raiola G, Di Tore P. Statistical study on bodily communication skills in volleyball to improve teaching methods. Journal of Human Sport and Exercise. 2012;7. doi:10.4100/jhse.2012.72.12.
@juniorsouza4826 11 days ago
Please post a video about mixed models, suggesting the best data treatments, packages and graphing. Thank you very much! Your channel notifications are my favorite!
@yuzaR-Data-Science 11 days ago
Thanks for your nice words! I will definitely do multiple videos on mixed models, since I use them every day. It'll take a while because I already have a long list of videos to be done, but mixed models will be there, I promise!
@juniorsouza4826 10 days ago
@yuzaR-Data-Science Amazing! I'm looking forward to the mixed model series.
@yuzaR-Data-Science 10 days ago
@juniorsouza4826 cool :)
@drmedleech 11 days ago
Amazing! Thank you.
@yuzaR-Data-Science 11 days ago
Glad you liked it! 🙏
@Ange-y1k 11 days ago
First of all, thank you. The video may be old, but I'm only watching it now. How do you go about post-hoc testing when comparing two variables with more than two categories? (I am having an error saying: 'x' must have 2 columns)
@yuzaR-Data-Science 10 days ago
Hi, your error message is most likely due to having only 1 category, not >2. Sure, you can do post-hocs easily. Here is how, though I think I talked about it in the video:
install.packages("rstatix")
library(rstatix)
contingency_table <- table(mtcars$cyl, mtcars$am)
contingency_table
pairwise_prop_test(contingency_table)
@Ange-y1k 10 days ago
@yuzaR-Data-Science Thank you for your answer. I used the same code but replaced the variables with my own, each with 3 categories. It didn't work and gave me that error.
@yuzaR-Data-Science 8 days ago
Again, >2 categories is not the problem. Here is the proof for two categorical variables with 3 categories each: ggstatsplot::ggbarstats(mtcars, cyl, gear). Something is wrong in your data, maybe in your table, or some packages are not installed.
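One possible reading of the error in this thread, offered as an assumption rather than a diagnosis: base R's stats::pairwise.prop.test() stops with "'x' must have 2 columns" whenever it is handed a matrix that does not have exactly two columns, so a 3 x 3 table would trigger it if pairwise_prop_test() relies on that function. The sketch below is a base-R alternative for a table with more than two columns. It compares each pair of rows with Fisher's exact test and applies a Holm correction. The mtcars variables and the object names (tab, row_pairs, posthoc) are illustrative only.

```r
# 3 x 3 contingency table: cylinders by gears (illustrative data)
tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# All pairs of row categories: ("4","6"), ("4","8"), ("6","8")
row_pairs <- combn(rownames(tab), 2, simplify = FALSE)

# Each pair of rows gives a 2 x 3 sub-table that fisher.test() can handle
posthoc <- data.frame(
  group1 = vapply(row_pairs, `[`, character(1), 1),
  group2 = vapply(row_pairs, `[`, character(1), 2),
  p      = vapply(row_pairs,
                  function(pr) fisher.test(tab[pr, ])$p.value,
                  numeric(1))
)

# Holm correction for the three pairwise comparisons
posthoc$p_adj <- p.adjust(posthoc$p, method = "holm")
posthoc
```

Swapping fisher.test() for chisq.test() would follow the same pattern; Fisher's test is used here only because some cell counts in this table are small.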
@martian_heidegger 11 days ago
More like vulva plots.
@mayar9078 12 days ago
Thank you ❤
@yuzaR-Data-Science 11 days ago
You are very welcome 🙏
@akindefisayo3267 12 days ago
Please can you make a comprehensive tutorial on plotting and interpreting Taylor diagrams in R? Can we use R to plot them? What does the data arrangement need to look like to achieve such a graphical representation?
@yuzaR-Data-Science 10 days ago
Hey, thanks for the nice words! Glad my content is useful! :) To be honest, I never needed the Taylor diagram before, but I'll put it on the list and might do a tutorial in the future. Hope you'll stick around for some time. Cheers
@akindefisayo3267 10 days ago
@yuzaR-Data-Science I actually need the Taylor diagram to evaluate some model performance in my current Master's thesis and I can't find any tutorial on it. That's why I asked. Thank you.
@yuzaR-Data-Science 8 days ago
Well, Taylor diagrams are on the list. Till then, if you need to evaluate some model performance, there is no better way I know than the {performance} package. I have done a video on it too. It's a bit old, but I still use the performance package every day.
@akindefisayo3267 12 days ago
Hello. Thank you for these highly educational and helpful resources. They have really helped me in my research in graduate school. Please can you make a comprehensive tutorial on plotting and interpreting Taylor diagrams in R? Can we use R to plot them? What does the data arrangement need to look like to achieve such a graphical representation?
@yuzaR-Data-Science 10 days ago
Hey, thanks for the nice words! Glad my content is useful! :) To be honest, I never needed the Taylor diagram before, but I'll put it on the list and might do a tutorial in the future. Hope you'll stick around for some time. Cheers
@khaledf3977 13 days ago
Nice
@yuzaR-Data-Science 13 days ago
Thanks!
@shrikantdeshmukh7951 15 days ago
Can I export it to csv or xlsx?
@yuzaR-Data-Science 15 days ago
I don't think so. Maybe with the {gt} package. Please let me know when you figure it out. Cheers
@Ange-y1k 15 days ago
Thank you very much for this work and the clarity of the explanations. However, I would like to know if it is possible to demonstrate using splines as a reviewer recently told me that using polynomials is obsolete.
@yuzaR-Data-Science 15 days ago
Yes, absolutely! Just use gam instead of glm and apply s(numeric_predictor), as I showed in the video.
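A minimal sketch of that suggestion, under two assumptions the comment does not confirm: that gam() comes from the mgcv package, and that mtcars can stand in for the real data. It fits a quadratic polynomial with glm() and a smooth term with gam(), then compares them by AIC.

```r
library(mgcv)  # recommended package that ships with standard R installations

# Polynomial version: a fixed quadratic shape for hp
poly_fit <- glm(mpg ~ poly(hp, 2), data = mtcars)

# Spline version: s() estimates the smooth shape of hp from the data
spline_fit <- gam(mpg ~ s(hp), data = mtcars)

summary(spline_fit)          # effective degrees of freedom of the smooth
AIC(poly_fit, spline_fit)    # rough comparison of the two fits
```

For a non-Gaussian outcome, the same family argument used in glm() (for example family = binomial) can be passed to gam().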
@jeandenys7 18 days ago
Thanks, Dr. I found this after your comment yesterday on LinkedIn. Very clear!
@yuzaR-Data-Science 17 days ago
Excellent! I just saw and answered that comment, Jean! Thanks a lot for such generous feedback! I hope the other videos will also be useful for you. Warm welcome to my channel! :)
@jammaningas8850 18 days ago
Very easy to follow! Nice!
@yuzaR-Data-Science 18 days ago
Glad you think so! 🙏 you might like other videos too 😉
@juniorsouza4826 19 days ago
AMAZING! I was watching your video content to learn how to report tests and interpret results and you delivered much more than I expected. Please never stop posting videos exploring the diversity of tools and applications of the R Language. Thank you.
@yuzaR-Data-Science 17 days ago
Thanks so much for such positive feedback! :) I'll try my best to post more often! The quality-quantity trade-off is important though :) Hope the other videos also deliver more than they promise. Please always feel free to give feedback, especially when you think I can improve something on the video production side.
@eimienwanlanibhagui4859 22 days ago
Thank you for your video. I have a question. What happens when visual/qualitative and quantitative/numeric inspections of regression assumptions disagree? The Residuals, Std. Residuals, and Sqrt(Std. Residual) plots are all not horizontal (they show a pattern), yet the BP test (check_heteroscedasticity()) reports p > 0.05. Which one should you go with, given that it is suggested that you perform both visual and numerical checks?
@yuzaR-Data-Science 17 days ago
Good question! You'll rarely meet all the assumptions perfectly. Visual diagnostics are more important than tests with their p-values. Take the Shapiro-Wilk normality test: with lots of data it will almost always be significant, even when the distribution is very close to normal. You might then remove outliers, transform your data, or even consider a different type of model, like QR or GAM.
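The large-sample behaviour of the Shapiro-Wilk test can be sketched in base R. The data here are simulated for illustration only: a t distribution with 10 degrees of freedom is an assumed example of a mild departure from normality, and the sample sizes are arbitrary. The p-values depend on the seed, so none are quoted.

```r
set.seed(1)

# Same mildly heavy-tailed distribution, two sample sizes
small <- rt(50,   df = 10)
large <- rt(5000, df = 10)   # shapiro.test() accepts at most 5000 values

# The larger sample has far more power to flag the same small deviation
p_small <- shapiro.test(small)$p.value
p_large <- shapiro.test(large)$p.value
c(n50 = p_small, n5000 = p_large)

# Visual check: how far do the points actually stray from the line?
qqnorm(large)
qqline(large)
```

The Q-Q plot shows how large the deviation is in practice, which a p-value on its own cannot.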
@eimienwanlanibhagui4859 16 days ago
@yuzaR-Data-Science Thank you. I read in the comments of this video, or of the QR one, that you don't transform data because of the difficulty in interpretation. But in the literature, almost everyone does data transformation of some sort: log(x), log(x+1), Box-Cox, square root, 1/x etc. I have seen only a few studies where they back-transformed the data to its original scale and then applied a bias-correction technique. I think I will explore QR, GAM, and one I just recently came across: MARS, Multivariate Adaptive Regression Splines (I think this is what the acronym stands for). It is quite interesting that a non-normal dataset subjected to regression can comply with the assumptions of regression in quantitative tests yet fail the visual checks, especially the one for homogeneity of variance. Well, my sample size is < 30 (28 or 29) and I think a sample size of 30 is the minimum often recommended for adequate statistical power. The variables in the data are environmental ones and they are known not to be normally distributed. Finally, these variables represent mean values.
@yuzaR-Data-Science 16 days ago
Sure, most of the literature I know (and I know by far not all of it!) says you can transform, but doesn't go on to explain that you need to transform back, or any other next step. MARS is interesting, I'll look it up, thanks for the tip. If you have <30 observations, the normality test should be OK. But still, visual inspection is the way I go, because there will always be some assumption unmet. If you know your variables are normally distributed, then you can use LM or a t-test, and there is no need to go to non-parametric methods. Some of my colleagues go to non-parametric by default, which is another extreme that I avoid. Cheers
@eimienwanlanibhagui4859 16 days ago
@yuzaR-Data-Science Thank you once again. The variables are not normally distributed. Histograms show that, and both the Shapiro-Wilk and Anderson-Darling tests confirmed it. Then why LM? You have to follow orders 😁. There are high and low values of the dependent variables in the data which cannot be removed. The data has been aggregated [mean] to a study location [N = 28]. That is why I want to explore QR, which I learnt from your video. I had thought about GLM and GAM prior to your comment on the latter. By the way, let me thank you for your video on the R package that allows you to compare the performance of models, the one that lets you do a spider plot of the model evaluation metrics. Great work 🙌.
@yuzaR-Data-Science 16 days ago
Thanks again, mate! I greatly appreciate your feedback. Yeah, I know those stupid orders you have to follow :) One of my favorites: "we've always done it like that". Brrr, goosebumps. QR might help, but it's pretty data hungry. So, don't give up on QR when it produces huge CIs or if some quantiles do not work. I use it for my next paper and the results it delivers are massive and insightful. Maybe try some simpler non-parametric methods (Mann-Whitney or just median regression = 0.5 quantile in QR) so that your order-givers are not overwhelmed.
@45tanviirahmed82 23 days ago
I have a request! Can you please talk about all the assumptions of the statistical tests commonly used in research in one single video (like the different t-tests, ANOVA, regressions and their non-parametric counterparts)? This would help me a lot to stop making mistakes when choosing the model. Before jumping into any kind of test, we need to meet the assumptions, right?
@yuzaR-Data-Science 17 days ago
Wow, that one is deep! Thanks! We don't need to meet them, we need to check them. The assumptions help to decide which test to use. I usually don't force the assumptions, e.g. by transforming data, but find the most suitable test for the real data. A single video on assumptions would be a whole documentary, and I might do this one day. But there are so many methods, each of them has numerous assumptions, and there are so many opinions about their importance, that I also get frustrated about them. Until I make such a video, don't try to meet them, but understand them and adjust by using Quantile Regression, Robust Regression, or Bootstrapped Regression ... all topics I have already covered on my channel. Sorry for the late reply, I was on holiday. Hope that was helpful.
@mmdigital123 23 days ago
Great videos, perhaps the best in R community I have ever seen.
@yuzaR-Data-Science 23 days ago
Thank you so much 🙏 that means a lot to me! ☺️
@laurareinoso4684 25 days ago
The confidence intervals are not shown in the graph of the Normality of Residuals. Do you know how I can visualize them, or does it have to do with the package itself?
@yuzaR-Data-Science 24 days ago
Sure, one or another dependency package is probably missing. Just update all the packages in RStudio and install all the dependencies that are suggested to you, and your CIs will appear.
@laurareinoso4684 14 days ago
@yuzaR-Data-Science thank you! It worked :)
@yuzaR-Data-Science 14 days ago
Glad it did ;)
@laurareinoso4684 25 days ago
Hi, I’m Laura from Colombia. Your videos have been of great help for my research project which is obligatory to graduate from a major in Ecology. Thank you so much
@yuzaR-Data-Science 24 days ago
Glad it was helpful! :)
@yinanzhang2764 26 days ago
Thanks so much, looking forward to your upcoming video.
@yuzaR-Data-Science 24 days ago
Most welcome!
@saifalshehhi281 26 days ago
Thank you very much
@yuzaR-Data-Science 24 days ago
You are welcome!
@Bader-gs4ig 26 days ago
very very informative. thanks mate
@yuzaR-Data-Science 24 days ago
Glad you enjoyed it, thanks for watching mate!
@hoppybrewologist 1 month ago
Love your work - can you show PCA plots sometime please
@yuzaR-Data-Science 1 month ago
Thank you so much! :) I plan to do PCA for sure, but I don't know whether I can manage it this year. The to-do list is kind of long already, and I plan to cover lots of modelling stuff first, so supervised methods. I will then come to unsupervised methods, like PCA etc.
@lkobzik 1 month ago
Please renew the link to the code for this video, I just missed the window, joining today but it is 8 days after your link which lasted 7 days...Also in addition to my concrete positive feedback by joining, please accept enthusiastic compliments on the style and content of your videos!
@yuzaR-Data-Science 1 month ago
Hey hey 👋 thanks for joining the community and for such nice feedback on my style! That's always very helpful, since people have different tastes, and when I make content that resonates with others, that's very fulfilling! I just updated the link for multivariable regression. Enjoy the article and let me know when you need other PDFs with code too. A very warm welcome and thank you for being here! 🙏
@lkobzik 28 days ago
I saw a message saying you renewed the link, thank you, but the renewed link (now 3 days old) still gives an expired message from wetransfer...sorry to be a bother....also your reply here has disappeared so I hope I am not hallucinating 🙂but there are 2 messages for the same video in the group perks page so I think I am sane
@lkobzik 28 days ago
@yuzaR-Data-Science I refreshed the page and your reply is back, so that is progress. Please see my prior message.
@yuzaR-Data-Science 27 days ago
Hey hey, WeTransfer reduced the number of days to 3 (here is what they say: "Transfer expires in 3 days"), so I have to adjust all the messages. I'll try to handle that issue asap. Until then, I renewed the link on the community tab. But here it is also for your convenience: we.tl/t-lSFy6cX2Ct
@ritaboateng8874 1 month ago
Can you share the code?
@yuzaR-Data-Science 1 month ago
Sure, Rita. For this video it is a one-liner: ggstatsplot::ggbetweenstats(ISLR::Wage, education, wage, "np"). But if you want the code for other videos and a description of all the details, consider joining the channel, because I send members the PDF of the blog post that I write for every video. Cheers
@robertmarbun 1 month ago
any chance you share the code? Thanks
@yuzaR-Data-Science 1 month ago
Sure, Robert. For this video it is a one-liner: ggstatsplot::ggbetweenstats(ISLR::Wage, education, wage, "np"). But if you want the code for other videos and a description of all the details, consider joining the channel, because I send members the PDF of the blog post that I write for every video. Cheers
@robertmarbun 1 month ago
@yuzaR-Data-Science I already joined and subscribed to your channel
@yuzaR-Data-Science 1 month ago
Dear Robert, thanks for subscribing. "Join" is a different button. It lets you support me monthly, but it is of course not necessary. You can just pause every video and write down the code. It's usually not much, and then you don't have to pay anything; YouTube is free. It's only when you want the explanations and the code from the video in the form of a blog article that you might choose to support me. I provide such PDFs to members upon request. Kind regards!
@Danxexcel-e1e 1 month ago
Hi Yury, I know that you had a great blog (YuzaR blog), but now I find that it no longer exists. Is that right? Or do we need a membership to access it? It was a wonderful blog.
@ramoda13 1 month ago
Nice and very helpful video.
@yuzaR-Data-Science 1 month ago
Glad you like it! Thanks for watching!
@jeandenys7 1 month ago
Love this work! Practical.
@yuzaR-Data-Science 1 month ago
Glad you like it! Thanks for watching!
@mauriciomorales3165 1 month ago
Hi Yury, AMAZING WORK WITH THIS! It explains a lot!! I'm performing some logistic regression analysis right now and would like to ask you a few questions. Is it possible to contact you by email, GitHub or Twitter? If not, I can add more context in this comment. Your help would be great and really appreciated!
@yuzaR-Data-Science 1 month ago
Thanks a lot, Mauricio, for such nice feedback! I can try to answer your questions here in the YouTube comments to the best of my abilities and as time allows. Cheers
@mauriciomorales3165 1 month ago
@yuzaR-Data-Science Thank you! I checked some of your papers, and they helped me a lot with some of my questions! Here is the context: I have a model that predicts severity (disease vs. non-disease) based on a genotype, for example AA vs. AG|GG. That is the basic model. For my second model I add sex and age, then obesity, and the last one includes more comorbidities: allergies, arterial hypertension and so on. I perform a mixed selection process and select the best model based on the AIC. All good so far. Here are my questions:
First, some people report both the crude OR and the adjusted OR. By definition, I know that the adjusted OR is the one you get when you add more independent variables to the model. However, is there a way to put the crude and the adjusted ORs all in one model? For example, with the functions glmulti::glmulti() and finalfit::fit2df() I can put the crude and the adjusted ORs for all my variables in one table, so it looks like there is a way to calculate and plot both. Could you provide more information about this?
Second, do you know if there is a way to check for confounders using code? I read the book Applied Logistic Regression, where the author mentions that you can check this via interactions between variables. However, I'm not sure whether I can perform a test or make a plot that would tell me "this is a confounder in your model".
Last question: in the stats::glm() function, you can define an interaction with ":" or "*". Imagine that I need to set an interaction between genotype and obesity, because I know that in my data obese people have a higher probability of having the disease when they have a specific genotype. One option is to allow the interaction between these variables, glm(severity ~ genotype * obesity + other variables + ...), or to use ":" instead. My question is: is this interaction correct in the biological sense? That is, does the interaction really represent the combined condition of genotype and obesity? I understand the meaning in the code, but I do not know whether I can extrapolate that to the biological meaning. Any comments are really appreciated here! Thank you for your time and help!!!!
@mauriciomorales3165 1 month ago
I forgot to mention, that most of my variables are binaries, age is the only one with numerical data type
@yuzaR-Data-Science 1 month ago
Well, first, there is no difference between odds ratios from a multivariable logistic model and adjusted odds ratios: all odds ratios from a multivariable regression are adjusted for the confounders you put in. So, don't worry about this terminology too much. Secondly, to control for confounders, you can use techniques such as:
Stratification: dividing the sample into groups based on the confounder and analyzing the relationship between the independent and dependent variables within each group.
Statistical adjustment: including the confounder as a covariate in the statistical model.
Finally, use "*" instead of ":" because "*" includes the ":" term implicitly anyway. Hope that helps!
@yuzaR-Data-Science 1 month ago
Don't forget to convert the binary variables to factors then ;)
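The points in this thread (crude versus adjusted odds ratios, "*" versus ":", and binary variables as factors) can be sketched in base R. This is an illustration under stated assumptions, not the commenter's actual analysis: mtcars stands in for the disease data, vs plays the role of the binary outcome, am the binary predictor, and wt the covariate. The object names are hypothetical.

```r
# Illustrative stand-in data: binary variables converted to factors
d <- transform(
  mtcars,
  vs = factor(vs, labels = c("v_shaped", "straight")),   # binary outcome
  am = factor(am, labels = c("automatic", "manual"))     # binary predictor
)

# Crude (unadjusted) OR: the predictor on its own
crude <- glm(vs ~ am, data = d, family = binomial)

# Adjusted OR: the same predictor with a covariate in the model
adjusted <- glm(vs ~ am + wt, data = d, family = binomial)

# Both odds ratios for the same predictor side by side
or_table <- data.frame(
  crude_OR    = exp(coef(crude))["ammanual"],
  adjusted_OR = exp(coef(adjusted))["ammanual"]
)
or_table

# `a * b` expands to `a + b + a:b`, so both main effects are kept
main_and_interaction <- attr(terms(vs ~ am * wt), "term.labels")
main_and_interaction
```

A large gap between the crude and adjusted odds ratio is one informal hint that the added covariate acts as a confounder. Whether an interaction term is biologically meaningful remains a subject-matter judgement that the code cannot settle.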