Design Matrices For Linear Models, Clearly Explained!!!

12,206 views

StatQuest with Josh Starmer

1 year ago

In order to use general linear models (GLMs), you need to create design matrices. At first, these can seem intimidating, but this StatQuest puts together a bunch of examples and illustrates them all so that they are clearly explained.
If you'd like to support StatQuest, please consider...
Patreon: / statquest
...or...
KZbin Membership: / @statquest
...buy my book, a study guide, a t-shirt or hoodie, or a song from the StatQuest store...
statquest.org/statquest-store/
...or just donating to StatQuest!
www.paypal.me/statquest
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
/ joshuastarmer
#StatQuest
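As a rough illustration of the description's point, here is a minimal Python/NumPy sketch (not code from the video) that builds a design matrix for comparing two groups of mice and fits it by least squares. The measurements are made up, and the "intercept plus mutant offset" column layout is one common choice, assumed here for illustration.

```python
import numpy as np

# Made-up measurements for illustration: three control mice, three mutant mice.
control = np.array([2.0, 3.0, 4.0])
mutant = np.array([5.0, 6.0, 7.0])
y = np.concatenate([control, mutant])

# Design matrix: one row per mouse.
# Column 0 is all 1s, so every mouse gets the control mean.
# Column 1 is 1 only for mutant mice, "turning on" the mutant-minus-control difference.
X = np.column_stack([
    np.ones(len(y)),
    np.concatenate([np.zeros(len(control)), np.ones(len(mutant))]),
])

# Least-squares estimates of the parameters in y = X @ beta.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(X)
print(beta)  # approximately [3. 3.]: control mean, then mutant minus control
```

Each column of the design matrix pairs with one parameter, and the 0s and 1s decide which parameters apply to which rows.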

Comments: 36
@statquest 1 year ago
Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
@accountname1047 1 year ago
As a pure mathematician who studied a fair bit of design theory, I find the use of designs in statistics fascinating.
@statquest 1 year ago
:)
@ahmednafir2286 1 year ago
Amazing content, thank you so much Josh 🙌
@statquest 1 year ago
Glad you liked it!
@user-ps7pg8pd5n 1 year ago
God, this is so useful and just saved my module! Thank you so much. Oh, I love StatQuest.
@statquest 1 year ago
Hooray! BAM! :)
@user-ps7pg8pd5n 1 year ago
Hey Josh, what if the two predictors have interactions? In R this is written like this: lm(Expression ~ type * weight, data). Is there a video explaining this topic, hopefully? :)
@statquest 1 year ago
@user-ps7pg8pd5n Not yet.
@marvinbcn2 1 year ago
Brilliant as usual! I'm just wondering: when comparing the two regression lines, how would you handle the design matrix if the slope is different for normal and mutant mice? Would it be acceptable to split the third column into two, with 0 and non-zero values to "turn on and off" the slope corresponding to each type?
@statquest 1 year ago
If you have enough data, you can estimate different slopes for each line. If you don't have enough data (to estimate all of the parameters), then you can use something called "mixed models", which just make assumptions about what is going on instead of using data to make estimates.
@MrCracou 1 year ago
In his case the model is just:
y = beta0 + beta1*quantitative + beta2*qualitative + epsilon
To create a model with different slopes, you need something like:
y = beta0 + beta1*quantitative + beta2*qualitative + beta3*quantitative*qualitative + epsilon
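MrCracou's second equation can be made concrete with a small NumPy sketch (an illustration, not from the video). The data are made up and noise-free, with a control line of size = 1 + 2*weight and a mutant line of size = 3 + 0.5*weight; the variable names are hypothetical. The fourth column, weight times the 0/1 type indicator, is the beta3*quantitative*qualitative interaction term. For reference, R's formula `type * weight` expands to the two main effects plus this product column.

```python
import numpy as np

# Made-up, noise-free data: control is size = 1 + 2*weight,
# mutant is size = 3 + 0.5*weight, so the two slopes differ.
weight = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
mutant = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)  # qualitative variable as 0/1
size = np.where(mutant == 1, 3 + 0.5 * weight, 1 + 2 * weight)

# Columns: intercept (beta0), weight (beta1), mutant offset (beta2),
# weight*mutant interaction (beta3).
X = np.column_stack([np.ones_like(weight), weight, mutant, weight * mutant])
beta, *_ = np.linalg.lstsq(X, size, rcond=None)

b0, b1, b2, b3 = beta
print("control intercept and slope:", b0, b1)           # about 1 and 2
print("mutant intercept and slope:", b0 + b2, b1 + b3)  # about 3 and 0.5
```

Dropping the fourth column gives MrCracou's first, parallel-lines model, where both types share the slope beta1.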
@muditgupta5968 1 year ago
Hey Josh, amazing content. I wanted to request that you create a StatQuest on k-mer counting and NLP. Thanks
@statquest 1 year ago
I'll keep that in mind.
@MrCracou 1 year ago
I tend not to introduce it that way. When I begin with linear regression, I introduce dummy variables, and then I make students notice that ANOVA is just a specific case of linear regression with those dummy variables, and that the test is just a partial F-test (Fisher test). I hope that you will explain contrasts, as they often cause errors among students.
@statquest 1 year ago
This is actually part 3 in the series.
Part 1 is linear regression: kzbin.info/www/bejne/pJyVdIR_idKSm9E
Part 1.5 is multiple regression: kzbin.info/www/bejne/sHq3enmKqM6phJo
Part 2 is t-tests and ANOVA: kzbin.info/www/bejne/hHeYkJWqhMZ2n8k
...and then this one.
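To illustrate MrCracou's point that one-way ANOVA is a special case of linear regression with dummy variables, the NumPy sketch below (made-up data; the helper names `rss` and `partial_f` are hypothetical) fits an intercept-only model and a dummy-coded model, computes the F statistic from their residual sums of squares, and checks it against the textbook one-way ANOVA formula. It is a sketch under those assumptions, not the video's own code.

```python
import numpy as np

def rss(X, y):
    """Sum of squared residuals after a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def partial_f(X_fancy, X_simple, y):
    """F = ((SS_simple - SS_fancy) / (p_fancy - p_simple)) / (SS_fancy / (n - p_fancy))."""
    n = len(y)
    p_fancy, p_simple = X_fancy.shape[1], X_simple.shape[1]
    ss_fancy, ss_simple = rss(X_fancy, y), rss(X_simple, y)
    return ((ss_simple - ss_fancy) / (p_fancy - p_simple)) / (ss_fancy / (n - p_fancy))

# Made-up data: three groups of three measurements each.
groups = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0]), np.array([6.0, 8.0, 10.0])]
y = np.concatenate(groups)
labels = np.repeat([0, 1, 2], 3)

# Dummy (treatment) coding: intercept = group 0 mean, one 0/1 column per other group.
X_fancy = np.column_stack([np.ones_like(y), labels == 1, labels == 2]).astype(float)
X_simple = np.ones((len(y), 1))  # intercept only: a single overall mean

F = partial_f(X_fancy, X_simple, y)

# The same F from the textbook one-way ANOVA table, for comparison.
grand = y.mean()
ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
F_anova = (ss_between / (len(groups) - 1)) / (ss_within / (len(y) - len(groups)))
print(F, F_anova)  # both approximately 13.5
```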
@Salvador_Dali 1 year ago
Hey Josh! Thanks a bunch for this awesome video! A question about the last example (batch effect): is this basically what is considered to be the interaction term in a multifactorial ANOVA? P.S. I love the Illustrated Guide to Machine Learning! Keep up the good work and please make more of these!
@statquest 1 year ago
Interactions are a little different. An example of an interaction effect would be at 7:12 if mutants had a different slope than controls.
@MrErluz 9 months ago
@statquest Are you planning to make a video that explains the interaction term? Great videos, by the way.
@statquest 9 months ago
@MrErluz It's on the to-do list, but probably not for a while.
@leedongsik2 1 year ago
I'm a beginner. Your explanation is surprisingly clear, but I am confused because dummy variable coding and design matrices look very similar. Can you tell me what the difference is, if possible?
@statquest 1 year ago
I believe they are the same.
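One way to see the relationship: dummy variables are the 0/1 columns that encode a categorical variable, and they become columns of the design matrix, which can also hold an intercept column and continuous measurements. The NumPy sketch below (made-up data, illustrative names) builds two common dummy codings for the same genotype labels; they give different parameters but identical fitted values.

```python
import numpy as np

# Made-up labels and measurements for six mice.
genotype = np.array(["control", "control", "control", "mutant", "mutant", "mutant"])
y = np.array([1.0, 2.0, 3.0, 7.0, 8.0, 9.0])

# Dummy variables: 0/1 indicators built from the labels.
is_control = (genotype == "control").astype(float)
is_mutant = (genotype == "mutant").astype(float)

# Coding A: one dummy per group, no intercept. Parameters are the two group means.
X_a = np.column_stack([is_control, is_mutant])
# Coding B: intercept plus one dummy. Parameters are the control mean and the difference.
X_b = np.column_stack([np.ones_like(y), is_mutant])

beta_a, *_ = np.linalg.lstsq(X_a, y, rcond=None)
beta_b, *_ = np.linalg.lstsq(X_b, y, rcond=None)
print(beta_a)  # approximately [2. 8.]
print(beta_b)  # approximately [2. 6.]
# Different parameters, same fitted values:
print(np.allclose(X_a @ beta_a, X_b @ beta_b))  # True
```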
@bjurv 1 year ago
StatQuest, could you please number all your fantastic videos so it would be easier to find their order in your series of lectures? E.g., right now I have a hard time finding your first video, "GLM PART 1", in this series on GLMs.
@statquest 1 year ago
You can find all of my videos, organized, here: statquest.org/video-index/
@user-ju9wx1fv5u 8 months ago
First of all, your videos are the best thing that exists on the internet. I just bought your linear regression study guide. Secondly, if I expected the lines for the control and mutant mice to have different slopes, could I create a fourth column in my matrix? The third column would have the x values for the control mice in the first four rows and then zeros in the last four rows. The fourth column would have zeros in the first four rows and then the x values for the mutant mice in the last four rows. In my equation, I would have the slope for the control mice multiplied by column three plus the slope for the mutant mice multiplied by column four. Then, when I calculate my F value, the number of parameters for that equation (p-fancy) would be 4, and I could calculate whether it fits better than any simpler version. In my data I am working with a situation like this, and I would like to know if this is all valid.
@statquest 8 months ago
If you are expecting different slopes, then you have something called an "interaction", and you can add an "interaction term" to your equation. For details on how to do this, see: stats.stackexchange.com/questions/19271/different-ways-to-write-interaction-terms-in-lm
p.s. Thank you for supporting StatQuest!!! BAM! :)
@user-ju9wx1fv5u 13 days ago
@statquest Thank you so much! My graduate studies pulled me in a different direction the last few months, but now I am back on this question and I have one more problem. Let's say I wanted to see whether the relationship between weight and size for control mice is stronger than the relationship between weight and size for mutant mice. In other words, looking at the plot at 8:12, does the green line fit the green points better than the red line fits the red points?
Edit: I know I could find a p-value for each line, see which one has a smaller p-value, and compare their R squared values. But this does not tell me how confident I am that one correlation is actually better.
@statquest 13 days ago
@user-ju9wx1fv5u What you're asking is whether or not there is an interaction between the status, mutant vs control, and the things we measured. To test for this, you would add an interaction term to your equation, and it's something that deserves a whole video to explain. In the meantime, check out this link: developer.nvidia.com/blog/a-comprehensive-guide-to-interaction-terms-in-linear-regression/#:~:text=An%20important%2C%20and%20often%20forgotten,value%20of%20another%20independent%20variable.
@user-ju9wx1fv5u 12 days ago
@statquest Thanks! I was able to use the links you gave me to figure out the stuff from my first comment. I think I have a solution to my second comment too. I will support this channel on Patreon because it has been extremely helpful to me.
@statquest 12 days ago
@user-ju9wx1fv5u BAM! :)
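The layout proposed in the first comment of this thread (a weight column that is zero for mutants plus a weight column that is zero for controls) is one way of writing an interaction model: with an intercept and a mutant column, it spans the same space as the intercept, weight, mutant, and weight*mutant layout, so it gives the same fit with p-fancy = 4. The NumPy sketch below checks this on made-up numbers (all names are illustrative). The mutant slope minus the control slope equals the interaction coefficient.

```python
import numpy as np

# Made-up data: four control mice (rows 0-3), then four mutant mice (rows 4-7).
weight = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
mutant = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
size = np.array([1.2, 1.9, 3.1, 3.8, 2.5, 2.9, 3.6, 3.9])
ones = np.ones_like(weight)

# Layout from the comment: separate weight columns that are zero outside each group.
X_sep = np.column_stack([ones, mutant, weight * (1 - mutant), weight * mutant])
# Standard interaction layout: intercept, weight, mutant, weight*mutant.
X_int = np.column_stack([ones, weight, mutant, weight * mutant])

b_sep, *_ = np.linalg.lstsq(X_sep, size, rcond=None)
b_int, *_ = np.linalg.lstsq(X_int, size, rcond=None)

# Same fitted values, so the same sum of squared residuals and the same p_fancy = 4.
print(np.allclose(X_sep @ b_sep, X_int @ b_int))  # True
# Mutant slope minus control slope equals the interaction coefficient.
print(b_sep[3] - b_sep[2], b_int[3])
```

The interaction layout is often preferred because its last coefficient directly measures the slope difference, which is the quantity an F-test against the parallel-lines model asks about.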
@andreadelcortona6230 1 year ago
Thanks
@statquest 1 year ago
:)
@stevebarratt888 1 year ago
I think something is slightly blurry in the explanation for those (like me) who come to this with hypothesis testing, rather than models, at the center of their thinking: you're not completely explicit about why you want to compare different/simpler models, and whether/why this constitutes the basis of a hypothesis test. How does this sound? To test whether the measured weight* is significantly different between mouse genotypes, one starts by constructing a comprehensive GLM (with multiple parameters, if necessary) to predict the measured weight. Then, by applying the F = ... equation, you compare the fit of this first GLM with another one that specifically leaves out the parameter of interest, mouse genotype, but includes all the other parameters. This isolates the influence of the genotype parameter in predicting mouse weight.
* I guess in this example it's actually weight relative to size.
@statquest 1 year ago
Sounds good to me.
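stevebarratt888's procedure can be sketched in a few lines of NumPy. Under made-up data, and treating size as the measured response with weight as the other parameter (both assumptions for illustration), the fancy model has an intercept, a weight slope, and a mutant offset, and the simple model drops only the genotype column. The F statistic measures how much the genotype column reduces the sum of squared residuals; turning it into a p-value would use the F distribution with (p_fancy - p_simple, n - p_fancy) degrees of freedom, which this sketch does not compute.

```python
import numpy as np

# Made-up data: four control mice, then four mutant mice.
weight = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
mutant = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
size = np.array([1.1, 2.0, 2.9, 4.1, 2.0, 3.1, 3.9, 5.0])
ones = np.ones_like(weight)

def ss_residuals(X, y):
    """Sum of squared residuals around the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

X_fancy = np.column_stack([ones, weight, mutant])  # includes genotype
X_simple = np.column_stack([ones, weight])         # same model minus genotype

n = len(size)
p_fancy, p_simple = X_fancy.shape[1], X_simple.shape[1]
ss_fancy = ss_residuals(X_fancy, size)
ss_simple = ss_residuals(X_simple, size)

F = ((ss_simple - ss_fancy) / (p_fancy - p_simple)) / (ss_fancy / (n - p_fancy))
print(ss_simple, ss_fancy, F)  # F is roughly 210 for these made-up numbers
```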