GLM in R

52,405 views

Kasper Welbers

3 years ago

In this video we walk through a tutorial for Generalized Linear Models in R. The main goal is to show how to use this type of model, focusing on logistic regression, and talk a bit about why it's a good tool to know.
The tutorial discusses both GLM and multilevel models, but the video has been split into two parts.
github.com/ccs-amsterdam/r-co...
We also have a more dedicated tutorial for GLM in R. It's best viewed via the GitHub HTML preview, but YouTube messes up these links, so you'll have to find the link on this page under 'statistical analysis - generalized linear models'.
github.com/ccs-amsterdam/r-co...
A great book on GLM (I'm not sure whether the digital version should be freely available, but hey, I just stumbled upon this PDF):
www.utstat.toronto.edu/~brunne...

Comments: 30
@randomdude4411 8 days ago
This is a brilliant tutorial on GLM in R, with a very good step-by-step breakdown of all the information that is understandable for a beginner.
@djyi2174 2 years ago
Thank you so much for the tutorial.
@philip_che 3 years ago
Thank you for these videos!
@kariiamba7324 2 years ago
Thank you for this helpful video.
@ammarparmr 2 years ago
Very well explained! However, in my opinion, using the coefficients in the summary is by far much easier to understand than the tab_model approach.
@kasperwelbers 2 years ago
Hi Ammar, sorry I missed this comment, but I would like to break a lance for odds ratios ;). The benefit of the log odds ratios is, I think, only that the sign corresponds to the effect direction. But the values themselves are very hard to interpret. With odds ratios you can say things like "for a unit increase in x, the odds of y increase by a factor of 2 (i.e. twice the odds)". Is there a benefit of using log odds ratios that I'm overlooking?
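(A tiny worked example of that difference, using a hypothetical coefficient value rather than one from the video:)
b <- 0.693       ## hypothetical coefficient on the log odds scale
exp(b)           ## odds ratio of about 2: a unit increase in x doubles the odds of y
## on the log odds scale the sign (0.693 > 0) still shows the direction of the effect,
## but the size of the effect is much harder to read off directly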
@user-gd2yz3dj3b 2 years ago
Hi Kasper, thank you for the wonderful video. I have a question about R2 and adjusted R2 for GLM models in R. How can we get R2 and adjusted R2 in the R console? I cannot find these values when I run summary(). Is there any specific code to get them?
@kasperwelbers 2 years ago
Hi, great question! The thing is, there actually isn't an R2 or adjusted R2 for GLM. Instead, to evaluate model fit, it is more common to compare models (in the second link in the description, see logistic regression -> interpreting model fit and pseudo R2). There ARE, however, also some 'pseudo R2' measures, such as the R2 Tjur seen in the video. These measures try to imitate the property of R2 as a measure of explained variance. You'll never get these scores in the basic glm output though, because there are many possible pseudo R2 measures. But there are packages that implement them. For instance, the 'performance' package has an r2() function which calculates a (pseudo) R2 for different types of models. I'd also recommend reading about the model comparison approach though (if you don't know about it already), because journals often like to see this rather than, or in addition to, some pseudo R2.
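(A minimal sketch of what that could look like, assuming a hypothetical logistic model m fitted with glm:)
## m <- glm(y ~ x, family = binomial, data = d)   ## hypothetical model and data
install.packages('performance')   ## only needed once
library(performance)
r2(m)   ## returns a pseudo R2 suited to the model type (e.g. Tjur's R2 for logistic regression)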
@user-gd2yz3dj3b 2 years ago
@@kasperwelbers Thank you so much for the quick reply! It was really helpful and easy to understand :) One more question! I will be using GLM in my master's thesis. Which would you recommend? 1. Report the AIC value (and write something like "this model had the smallest AIC value") 2. Try calculating pseudo R2 measures and report them
@kasperwelbers 2 years ago
@@user-gd2yz3dj3b I'd actually recommend reporting deviance AND some pseudo R2. The pseudo R2 is nice to help along interpretation, but deviance is more appropriate, and it also provides a nice test of whether adding variables to a model gives a significant increase in fit. Say you have models of increasing complexity (i.e. adding variables): m0, m1 and m2. For GLMs, you can then use: anova(m0, m1, m2, test = "Chisq"). In the output, the deviance column for the m1 row tells you how much the deviance decreased compared to m0, and the Pr(>Chi) column tells you whether this improvement was significant (and the same for m2 compared to m1). Alternatively, you could use sjPlot's tab_model and just add the AIC and/or deviance directly to the table: tab_model(m0, m1, m2, show.aic = T, show.dev = T).
@user-gd2yz3dj3b 2 years ago
@@kasperwelbers Thank you so much, Kasper! I will try calculating deviance and pseudo R2 using the code you suggested :) Can I ask another question via email or something? I’m sorry to be a pain, but I think you can answer another big question I have🙇‍♂️
@kasperwelbers 2 years ago
@@user-gd2yz3dj3b No problem! I do however prefer to keep questions based on these videos confined to YouTube (and not too big). Especially at the moment, with the whole corona teaching situation, I'm swamped with emails, and I do need to prioritize my direct students. For bigger questions, I also think it's best to find someone at your uni (ideally your supervisor or someone in the same department). Not only because they can presumably invest more time, but also because for more specific problems there tend to be differences across disciplines/traditions in how to do statistics.
@hm.91 2 years ago
Thank you!
@MyPimpstyle 1 year ago
Hi Kasper, what/how much does the intercept tell us in this case?
@kasperwelbers 1 year ago
Good question! It's similar to ordinary regression, in that it just means: the expected value of y if x (or all x-es in a multiple regression) is zero. This is mainly interpretable if there is a clear interpretation of what x=0 means. For instance, say your model is: having_fun = intercept + b*beers_drank. In that case, the intercept is the expected fun you have if you haven't had any beers.

Now say we have a binomial model. Our dependent variable is binary, namely whether or not a person had a hangover the day after a party. This time, the effect is more like (but not exactly, I'm ignoring the link function): hangover = intercept * b^beers_drank. Notice that ^ in b^beers_drank. That's the multiplicative part: we expect that the odds of having a hangover increase by a 'factor of b' for every unit increase in beers. But what's most relevant for us now is that anything raised to the power zero is 1! So b^0 (zero beers) is 1. So here as well, it means that when x is zero, the intercept is just our expected value.

If we've transformed our coefficients to odds ratios, then if we haven't had any beers, the intercept would represent the odds that someone had a hangover. So if the intercept is 2, it would mean that the odds that someone who didn't have any beers has a hangover are 2-to-1, so a probability of 0.66 (odds of 2-to-1 means 2 people out of 3). That sounds weird, but they probably had whisky instead. I don't know how much that helped. The key takeaway is that, like with ordinary regression, the intercept is mainly interpretable if you have a clear idea of what x=0 means.
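(A small sketch of that last step in R, assuming a hypothetical fitted logistic model m:)
## m <- glm(hangover ~ beers_drank, family = binomial, data = d)   ## hypothetical model and data
b0 <- coef(m)[1]   ## intercept on the log odds scale
exp(b0)            ## the odds of a hangover at zero beers
plogis(b0)         ## the same intercept as a probability; e.g. plogis(log(2)) = 2/3 for odds of 2-to-1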
@audreyq.nkamngangk.7062 8 months ago
Thank you for the tutorial. Is it possible to create a GLM model where the variable to explain has 3 modalities?
@kasperwelbers 8 months ago
If I understand you correctly, I think it's indeed possible to model a dependent variable with a tri-modal distribution with glm. Actually, you might not even need glm for that. Whether a distribution is multimodal is a separate matter from the distribution family. A tri-modal distribution might be a mixture of three normal distributions, three binomial distributions, etc. Take the following simulation as an example. Here we create a y variable that is affected by a continuous variable x, and a factor with three groups. Since there is a strong effect of the group on y, this results in y being tri-modal.

## simulate 3-modal data
n = 1000
x = rnorm(n)
group = sample(1:3, n, replace=T)
group_means = c(5,10,15)
y = group_means[group] + x*0.4 + rnorm(n)
hist(y, breaks=50)

m1 = lm(y ~ x)
m2 = lm(y ~ as.factor(group) + x)

summary(m1) ## bad estimate of x (should be around 0.4)
plot(m1, 2) ## error is non-normal
summary(m2) ## good estimate after controlling for group
plot(m2, 2) ## error is normal after including group
@954giggles 2 years ago
Do you need to install any packages to run the glm code?
@kasperwelbers 2 years ago
The glm function is in the stats package, which comes shipped with the basic R installation, so you don't necessarily need other packages. But in the tutorial I do use some packages for convenience, such as the sjPlot package for making a regression table. If you run this without sjPlot the results are the same, but you'll need to do some calculations yourself. For instance, logistic regression gives log odds ratio coefficients, so you'd need to take the exponent (exp function) to get the odds ratios. TL;DR: you don't need to install packages, but it does make life easier.
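(For instance, a minimal base-R-only sketch, with a hypothetical data frame d containing a binary outcome y and a predictor x:)
m <- glm(y ~ x, family = binomial, data = d)   ## fit a logistic regression
summary(m)      ## coefficients are on the log odds scale
exp(coef(m))    ## take the exponent to get odds ratios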
@michellelaurendina 3 months ago
THANK. YOU.
@rubyanneolbinado95 2 months ago
Hi, why is RStudio producing different results even though I am using the same call and data?
@kasperwelbers 1 month ago
Hi! Do you mean vastly different results, or very small differences? I do think some of the multilevel stuff could potentially differ due to random processes in fitting the model, but if so the differences should be really minor.
@draprincesa01 1 year ago
How can I visualize the results if some variables are factors, like yes or no?
@kasperwelbers 1 year ago
I think sjPlot handles those pretty nicely! There are some great explanations on the website, under the regression plots tab: strengejacke.github.io/sjPlot/
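(A hedged sketch of what that could look like, assuming a hypothetical logistic model m with a yes/no factor predictor called treatment:)
## m <- glm(outcome ~ treatment + age, family = binomial, data = d)   ## hypothetical model and data
library(sjPlot)
plot_model(m)                                        ## forest plot of the (odds ratio) coefficients
plot_model(m, type = "pred", terms = "treatment")    ## predicted probabilities per factor level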
@JT-ph3hk 1 year ago
Use the function str(yourbasename). If the variable is not yet a factor, you can transform it with something like: yourbasename$name_of_the_factor <- as.factor(yourbasename$name_of_the_factor)
@DavidKoleckar 4 months ago
nice audio bro. you record in bathroom?
@kasperwelbers 4 months ago
Ahaha, not sure whether that's a question or a burn 😅. This is just a Blue Yeti mic in the home office I set up during the COVID lockdowns. The room itself has pretty nice acoustic treatment, but I was still figuring out in a rush how to make recordings for lectures/workshops, and it was hard to get clear audio without keystrokes coming through.
Understanding the glm family argument (in R)
16:15
Kasper Welbers
18K views
Learn R in 39 minutes
38:56
Equitable Equations
565K views
How to interpret (and assess!) a GLM in R
17:36
Chloe Fouilloux
22K views
What is GLMM, and When Should You Use It?
1:01:13
statistics complete course
6K views
GLM Binomial Classification Logistic Funtion R
16:24
CradleToGraveR
7K views
GLM Intro - 1 - Linear Models vs. Generalized Linear Models
5:24
Meerkat Statistics
145K views
Understanding Generalized Linear Models (Logistic, Poisson, etc.)
20:19
Generalized Linear Models I
20:59
Methods in Experimental Ecology I
48K views
Multilevel models in R
21:34
Kasper Welbers
16K views
GLM Intro - 4 - Link Function
8:46
Meerkat Statistics
37K views
Multiple Regression in R, Step by Step!!!
7:43
StatQuest with Josh Starmer
72K views