Using Linear Models for t-tests and ANOVA, Clearly Explained!!!

415,367 views

StatQuest with Josh Starmer

This StatQuest shows how the methods used to determine if a linear regression is statistically significant (covered in part 1) can be applied to t-tests and ANOVA. It also introduces the concept of a "design matrix". Part 1 of this series on GLMs (general linear models) is here: • Linear Regression, Cle...
For a complete index of all the StatQuest videos, check out:
statquest.org/...
If you'd like to support StatQuest, please consider...
Patreon: / statquest
...or...
YouTube Membership: / @statquest
...buy my book, a study guide, a t-shirt or hoodie, or a song from the StatQuest store...
statquest.org/...
...or just donating to StatQuest!
www.paypal.me/...
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:
/ joshuastarmer
Correction:
7:40 The F-statistics need parentheses around the SS differences for the equations to be correct: (SS(mean)-SS(fit))/(p_fit-p_mean)
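Written out in full, the corrected F-statistic would look roughly like the block below. This is a sketch that assumes the notation from part 1 of the series: n is the number of data points, and p_fit and p_mean are the numbers of parameters in the fitted model and in the mean-only model. Only the numerator is quoted in the correction above; the denominator is filled in from the standard definition.

```latex
F = \frac{\bigl(SS(\text{mean}) - SS(\text{fit})\bigr) \,/\, (p_{\text{fit}} - p_{\text{mean}})}
         {SS(\text{fit}) \,/\, (n - p_{\text{fit}})}
```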
#statquest #regression

Comments: 401
@statquest 4 years ago
Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
@falaksingla6242 2 years ago
Hi Josh, Love your content. Has helped me to learn a lot & grow. You are doing awesome work. Please continue to do so. Wanted to support you but unfortunately your PayPal link seems to be dysfunctional. Please update it.
@elnurazhalieva1262 4 years ago
Rarely do I recommend a YouTube channel to someone, but this channel is a must-watch!
@statquest 4 years ago
Thank you! :)
@redaaitouahmed8250 5 years ago
You're making the life of a student so much easier and happier ... Thankkkkk youuuuuu !!!
@statquest 5 years ago
You're welcome!!! :)
@luig2121 1 year ago
I literally watch your videos as if I'm watching TV. I don't know how you've pulled this off but you are incredible
@statquest 1 year ago
Wow, thank you!
@hadihadiyar1185 4 years ago
Hi, I got my master's in Epidemiology and was trying to review statistics when I found your channel. You are awesome, you really make statistics easy to understand, TRIPLE BAM for you
@statquest 4 years ago
Thank you very much! :)
@Russet_Mantle 4 years ago
This is a really smooth transition from linear models to ANOVA, which is sadly not covered in many stats textbooks.
@statquest 4 years ago
Thanks!
@PunmasterSTP 6 months ago
I remember learning about t-tests well before linear regression, but it's cool seeing things applied in a different way, especially while going into the deeper concepts. This whole playlist is a stats and machine-learning goldmine!
@statquest 6 months ago
BAM! Yes, usually t-tests are taught before linear regression, but I like teaching them in this order (regression first) since the extension of a t-test into ANOVA is way more obvious.
@PunmasterSTP 6 months ago
@@statquest That sounds like a good plan.
@justind6931 6 years ago
It actually took me a while to realize the F-statistic shown in this video is the same as the standard t-statistic. Great vid!
@statquest 6 years ago
Thanks!!! I know, it's a little weird to look at a t-test from this perspective, but it shows how the F-statistic is a generalization of T-statistics. (Here's a cool hint - just like the F-statistic is a generalization of T-statistics, Chi-square statistics are a generalization of normal statistics....)
@zahrahadavand2290 2 years ago
Awesome, there's nothing that can't be understood when you explain it, thanks a millionnnn
@statquest 2 years ago
Thank you very much! :)
@amorrismusic 3 years ago
Never in my life has learning math been easier. Excellent work Josh!
@statquest 3 years ago
Thank you very much!!! :)
@justalittleguy733 1 year ago
i am seriously failing my beginner stats course because try as i might the lectures are quite literally incomprehensible. i owe you my life!!! thank you for these amazing videos -- i feel like this is the first time ALL semester I am understanding something!
@statquest 1 year ago
HOORAY! I'm glad the videos are helpful.
@baharehbehrooziasl9517 5 months ago
The interesting thing about this video is that it taught me something that I hadn't noticed I didn't know!
@statquest 5 months ago
bam! :)
@rwei2049 7 years ago
this is the clearest explanation of design matrices I've ever seen!! Thank you soooo much Joshua!
@DamianEQuijanoA 5 years ago
I speak very little English, but your teaching methodology (it is very professional) is magnificent. Even though it is in English, I manage to understand it better than all the statistics classes in Spanish. You make an enormous effort to make your classes intuitive and easy to understand for people who are not experts in statistics. Congratulations.
@statquest 5 years ago
Thank you very much!!!!
@xxMissCaprIce 1 year ago
I think you might have just saved my life. This is so clearly explained, thank you!
@statquest 1 year ago
Glad it helped!
@Hajar1992ful 4 months ago
Thank you for your amazing videos Josh. You make us smarter!
@statquest 4 months ago
Glad you like them!
@howardip7965 5 years ago
Your videos are very well-prepared and informative. Great teaching materials. You are so generous. Thanks a million.
@statquest 5 years ago
Thank you very much! :)
@Sn-nw6zb 6 years ago
Wow, this is a smart way to explain the ANOVA test. It looked so complicated at first, but now it looks straightforward once you see how it resembles linear regression. Great video!!!
@statquest 6 years ago
Hooray!!! I'm so glad you like this video - it's one of my all time favorites. :)
@statquest 6 years ago
Hooray! :)
@_Chafia 6 years ago
I hope you will have time to answer in just a few words, please! R-squared tells us how useful x is for predicting y, so how do we use it in the case of a t-test or ANOVA? We just talk about F and p; can we say it explains some % of the variance between treatments, or is it useless!? Thank you so much Mr. Starmer
@statquest 6 years ago
This is a great question. The traditional way to teach and perform t-tests (and ANOVA) only results in 't' or 'F' statistics and a p-value - no R-squared. However, as you see in this video, it's easy to also report R-squared - you just have to want to do it. t-tests and ANOVA are just like regression, and R-squared tells you the same thing - it gives you an estimate of the magnitude of the difference. The p-value just tells you whether it is significant. If you did a t-test and got a small p-value, but also a small R-squared, then you could easily deduce that there's not a huge difference between the two groups (even if it is statistically significant). In contrast, if you did a t-test and got a small p-value and a large R-squared, then you would know that there's a big difference between the two groups. So we can see that R-squared is useful even for the t-test. I suspect that one reason presenting R-squared with t-test results is rare is that, with t-tests, it is easy and very common to plot the data - so people will show you their data and give you the p-value. Seeing the data is sort of like a "visual R-squared" - you can see if the data are very close to each other or far apart.
@_Chafia 6 years ago
THANK YOU SO MUCH.... YOU ARE VERY KIND SIR. If you'll allow me to summarize: "significant p-value + R-squared" = how big the difference is. Really GREAT! Thanks again & Good luck!
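The R-squared discussed in this thread can be computed from the same two sums of squares that go into the F-statistic. Here is a minimal sketch in plain Python; the gene-expression values are made up, and the function names (`mean`, `sum_of_squares`, `r_squared_two_groups`) are hypothetical, not from the video.

```python
def mean(values):
    return sum(values) / len(values)

def sum_of_squares(values, center):
    """Sum of squared residuals of `values` around `center`."""
    return sum((v - center) ** 2 for v in values)

def r_squared_two_groups(control, mutant):
    """R-squared for a t-test set up as a linear model.

    SS(mean): residuals around the single overall mean.
    SS(fit):  residuals around each group's own mean.
    """
    everything = control + mutant
    ss_mean = sum_of_squares(everything, mean(everything))
    ss_fit = (sum_of_squares(control, mean(control))
              + sum_of_squares(mutant, mean(mutant)))
    return (ss_mean - ss_fit) / ss_mean

# Hypothetical gene-expression measurements
control = [2.0, 3.0, 4.0, 3.0]
mutant = [6.0, 7.0, 8.0, 7.0]
print(r_squared_two_groups(control, mutant))
```

A value near 1 would mean the two group means account for most of the variation; a value near 0 would mean the groups overlap heavily, even if the p-value happens to be small.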
@user-bz7fj1fk2m 4 years ago
You are blessed and STAY BLESSED. You significantly changed my life with STAT!!!
@statquest 4 years ago
Thank you very much! :)
@SreenikethanI 3 months ago
StatBlessed :D
@lilmoesk899 7 years ago
Thanks for the video. I'll have to watch this one a couple more times to fully digest it. It's the first time I've heard of a design matrix, so I'll have to spend some time looking into that.
@clarasavary6265 6 years ago
Thank you very much for all your clear explanations. It's a real pleasure to listen to you and learn more about Statistics !
@statquest 6 years ago
You're welcome! I'm glad to hear you think the videos are helpful. :)
@autumnp4077 4 years ago
Really appreciate the refresher of the regression on the side of the t-test! REINFORCEMENT FOR THE WIN!
@statquest 4 years ago
Yes! :)
@charlotteiosson6235 6 years ago
These videos are brilliant! I'm completing my PhD and there really isn't enough statistics support available which is as accessible as these videos (and considering we're meant to be doing research, that's not really good enough!) - thanks!
@redcat7467 2 years ago
I just watched a video on Confidence Intervals from back in 2015 and the song was pretty much the same, yet what a difference!
@statquest 2 years ago
:)
@mohammadalidastgheib2688 2 years ago
Thank you for your clear explanations.
@statquest 2 years ago
Bam! :)
@emmafoley8987 6 years ago
I've really had trouble understanding what a t test *is* and this was super helpful.
@statquest 6 years ago
Hooray!!!! :)
@seanpitcher8957 1 year ago
Bought the book. Nicely done and useful!
@statquest 1 year ago
Awesome, thank you!
@markaitkin 6 years ago
Love your videos. I have 3 requests... 1. Degrees of freedom 2. Linear regression with regularisation 3. Log linear regression and why coefficient indicates % change Thanks so much!
@statquest 6 years ago
Thanks so much! The degrees of freedom StatQuest is high, high on the to-do list. It is never far from my mind. I have it about 1/2 done in my head, but the second half is tricky - some situations are easier to illustrate than others - but it's just a matter of setting aside time just for it and nothing else and it will get done. The good news is that I'm maybe 1 or 2 months away from doing StatQuests on ridge, lasso and elastic-net regression - all examples of linear regression (or, more generally, generalized linear regression since these ideas can be applied to logistic regression) with regularization. So that's sure to happen soon (just as soon as I can!) The last one, log-linear regression, is the logical follow-up to logistic regression. I may do a "big picture/main ideas" StatQuest on that as soon as I can. It's on the list!
@markaitkin 6 years ago
StatQuest with Josh Starmer thanks for your reply. Can't wait for the next videos
@usfbge 4 months ago
Hi Josh, Your videos are amazing, easy to follow and understand. Just wondering if you could upload a video on GLMM and LMM models and when to use which model? This will help to clarify.
@statquest 4 months ago
I hope to do that one day, however, it will probably be a while since I'm writing a book on neural networks right now.
@aickoyvesschumann3400 4 years ago
Great video! I think you should put parentheses around your SS differences in the F-statistics to have correct equations; (SS(mean)-SS(fit))/(p_fit-p_mean). Divisions have generally a higher priority than differences, but you want to first subtract and then divide.
@statquest 4 years ago
Great suggestion! I've added your correction to a pinned comment that will be easy for other people to find.
@markobe08 5 years ago
I will just go on a liking spree on all of your videos
@statquest 5 years ago
Hooray! :)
@alvarorodriguez3552 4 years ago
Best statistics teacher on internet!!!!
@statquest 4 years ago
Thank you very much!!!! :)
@369standrealfine 5 years ago
Thank you so much for your videos.
@statquest 5 years ago
Thanks!
@mariaaureliano8411 4 years ago
Thank you! Really great and helpful videos!
@statquest 4 years ago
Glad you like them!
@alexanderkononov4068 5 years ago
Maaan! I found you, I found glm, finally! Thanks!
@statquest 5 years ago
Hooray! :)
@shichengguo8064 4 years ago
Hi Josh, It's time to bring linear mixed models. Thankkkk Youuuuu!!!
@statquest 4 years ago
I'll keep that topic in mind.
@junmingzheng7456 5 years ago
OMG, now that's how ANOVA and linear regression are connected.
@woodypham6474 4 years ago
What else can I say about this clip? You're the best
@statquest 4 years ago
Hooray!!! :)
@ronykroy 5 years ago
I keeep coming here to hear the Baaaaam !! :)
@statquest 5 years ago
Hooray! :)
@zzzluke8906 1 year ago
Your videos are extremely helpful! Can you go through things like kruskal-wallis test and why it is not sensitive to normal distribution? If you can share some insights on chi-squared test etc, it would be really helpful too!
@statquest 1 year ago
I'll keep those topics in mind.
@albertrodrigo2432 2 years ago
It would be a triple BAM if you could do a quick Stat Quest about residual diagnosis in linear models!
@statquest 2 years ago
I'll keep that in mind.
@rookiedrummer6838 3 years ago
Thanks @Josh, I have some questions: 1] Suppose we have 5 independent variables and a label. How does ANOVA calculate the p-value for each feature in this case? 2] Does it fit a regression for each independentVariable~Label separately and then calculate the p-value?
@statquest 3 years ago
I describe how p-values are calculated for individual features in these videos: kzbin.info/www/bejne/sHq3enmKqM6phJo kzbin.info/www/bejne/nqDOcn-aftinbs0 The concepts apply to ANOVA in the exact same way.
@Dekike2 5 years ago
First of all, Thank you so much, Josh, for the time you spend sharing your knowledge about statistics. Students need more people like you... I wanted to ask something likely silly, can you make an ANOVA with an unbalanced sample? What can I do if some categories have more data than others? Thanks again, Josh!! I am looking forward to hearing from you!!!
@statquest 5 years ago
ANOVA works fine with unbalanced samples. You just have more rows in your design matrix for one category than another.
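To illustrate that unequal group sizes need no special handling, here is a small sketch in plain Python that computes the F-statistic from the two sums of squares for any number of groups. The data values and the function name `anova_f` are hypothetical, and the sketch assumes at least some variation within the groups (otherwise SS(fit) is zero and the ratio is undefined).

```python
def mean(values):
    return sum(values) / len(values)

def anova_f(groups):
    """F-statistic comparing one overall mean against one mean per group.

    `groups` is a list of lists; the groups may have different sizes.
    """
    everything = [v for group in groups for v in group]
    n = len(everything)      # total number of data points
    p_fit = len(groups)      # one mean per group
    p_mean = 1               # just the overall mean

    grand_mean = mean(everything)
    ss_mean = sum((v - grand_mean) ** 2 for v in everything)

    ss_fit = 0.0
    for group in groups:
        group_mean = mean(group)
        ss_fit += sum((v - group_mean) ** 2 for v in group)

    return ((ss_mean - ss_fit) / (p_fit - p_mean)) / (ss_fit / (n - p_fit))

# Hypothetical, unbalanced data: 5, 3 and 2 measurements per group
groups = [[2.0, 3.0, 4.0, 3.0, 3.0], [6.0, 7.0, 8.0], [4.0, 6.0]]
print(anova_f(groups))
```

Each extra measurement simply adds one more row to the design matrix and one more residual to the sums; the formula itself does not change.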
@rahuldey6369 3 years ago
1) So are we basically comparing the variability of each data point of that categorical feature around the sample mean to the variability of individual data points around the grouped mean? Or how can I explain in a simple sentence what these tests are and what we can infer? 2) This is a univariate analysis, right? 3) In the figures of Gene Expression you've taken 4 data points as an example. What are those? I mean to say, how can I interpret those? Are those control and mutant categories encoded? This was the only video that dared to visualize what the t-test & ANOVA are
@statquest 3 years ago
1) Yes, that's the main idea 2) The t-test is univariate. However, this series of videos also gives many multivariate examples. 3) Those 4 data points reflect how many mRNA transcripts are measured. If that doesn't mean anything to you, just imagine we counted something, like green apples, at 4 different grocery stores.
@rahuldey6369 3 years ago
@@statquest In that sense, those green apples are the dependent variable in our dataset, and we are grouping them by 4 different grocery stores?
@statquest 3 years ago
@@rahuldey6369 yes
@rahuldey6369 3 years ago
@@statquest Thank you so much for the clarification. Best wishes. Looking forward to learning more from you
@Kaaaaaaaam 6 years ago
These videos are great! Thanks!
@ashokmulchandani2841 6 years ago
I love your voice both while singing and explaining statistical concepts. Thanks a ton for these videos. Would you mind if I requested videos on the following topics? 1) 2 or more factor ANOVA (to be used for reducing the number of independent variables) 2) Linear Multiple regression (to be used for reducing the number of independent variables) 3) DOE and Taguchi
@statquest 6 years ago
Glad you like the videos! I've added Taguchi, DOE and 2 or more factor ANOVA to my to-do list. I believe that my video on Multiple Regression in R may already satisfy your second request: kzbin.info/www/bejne/nqDOcn-aftinbs0
@ashokmulchandani2841 6 years ago
StatQuest with Josh Starmer Thanks 😀
@USER_GBME 12 days ago
Hi, love your videos. Just a quick check to see if I'm still on track. In the previous videos, I thought that you mentioned 'degrees of freedom' as the expression (n-Pfit)/(Pfit-Pmean). If so, in the ANOVA example, since Pfit = 5 and Pmean = 1, do the 'degrees of freedom' equal (n-5)/4? If not, I think I need a solid explanation of this matter.
@statquest 11 days ago
Linear models have 2 different degrees of freedom - one for the numerator of the equation (n-pfit) and one for the denominator (pfit-pmean).
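As a small sketch of that bookkeeping, the two degrees of freedom can be kept as a pair rather than collapsed into one ratio. In the F-statistic from the pinned correction, (p_fit - p_mean) divides the variation explained by the fit and (n - p_fit) divides the variation left over; in the commenter's fraction the same two quantities appear the other way up. The helper name `f_degrees_of_freedom` and the sample size below are hypothetical.

```python
def f_degrees_of_freedom(n, p_fit, p_mean=1):
    """Return the two degrees of freedom of the F-statistic as a pair.

    n      : number of data points
    p_fit  : parameters in the fitted model (5 group means in the ANOVA example)
    p_mean : parameters in the mean-only model (just the overall mean)

    The first value, p_fit - p_mean, scales SS(mean) - SS(fit);
    the second value, n - p_fit, scales SS(fit).
    """
    if n <= p_fit:
        raise ValueError("need more data points than fitted parameters")
    return p_fit - p_mean, n - p_fit

# Hypothetical sample size: 5 groups with 4 measurements each
print(f_degrees_of_freedom(n=20, p_fit=5))  # → (4, 15)
```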
@hongdalin5953 6 years ago
Hi Joshua, thanks for sharing. These videos go step by step and make so much more sense to me than the hideous textbooks. I was wondering if you could make a video on repeated measures ANOVA, broken into small pieces. Thanks in advance.
@kartikeyachaudhary4983 4 years ago
Bro, thank you so much man......
@statquest 4 years ago
Thanks! :)
@mook1481 3 years ago
please do a MANOVA video !! this was so useful, Im doing a 2x2x3 MANOVA for my research project and would really appreciate a video :)
@statquest 3 years ago
I'll keep that in mind.
@somasundar8030 3 years ago
You are the best
@statquest 3 years ago
Thanks!
@urdeathisnear885 4 years ago
Hi Josh, great work on these videos, very helpful! One question: is it safe to say that ANOVA is just a generalized t-test for >2 groups?
@statquest 4 years ago
Sure, I think that is a safe thing to say.
@lillycarinabenedikt65 3 years ago
I really like all of your videos :) Could you please put ads in the beginning and end and less in the middle? If there is an ad in between every 5 minutes, it's very distracting and I need so much time to get back to the topic and to my concentration.
@statquest 3 years ago
I'm sorry about the ads. Unfortunately YouTube sticks those in the middle automatically and it's not something that I can control.
@karannchew2534 3 years ago
1:36 The goal of t-test is to compare means (eg two groups or categories of data) and see if they are significantly different. 9:23 ANOVA. Compare three or more groups of data.
@statquest 3 years ago
bam!
@BeefLoverMan 3 years ago
This channel is a gift from the math gods. Question: I'm having a hard time linking this to Design of Experiments methods. It seems like it should be an easy connection, but I somehow can't quite work it out in my head. How would one use this to calculate the explained variation by individual terms of a linear model? 1 term == 1 "category"? And how do degrees of freedom factor into it?
@statquest 3 years ago
The next video in this series may help you understand how to design experiments: kzbin.info/www/bejne/eaKveKmtnpJohsU
@hsinyenwu 7 years ago
Thanks so much for this video!!! Never heard anyone explain those concepts so well. Do you have any plan to make videos about multiple comparisons adjustment?
@oliseh2285 4 years ago
Amazing video Josh!!! Could you also do a video of two-way ANOVA with block design and calculating the significance of the factors, their interaction, block and the residuals? It would be great!
@statquest 4 years ago
I'll keep it in mind.
@oliseh2285 4 years ago
@@statquest that will be awesome. Triple BAM!!!
@vtd2024 7 years ago
Hey Joshua, why should we sing lolz. Love your lesson man!
@thomasamet5853 3 years ago
Thank you so much Josh for all your amazing content and great silly songs. I can't wrap my head around the reason you say the fit equation is written out like: y = mean_control + mean_mutant at 6:48 and 9:05. I would have written something like y = mean_control * x + mean_mutant * (1-x), with x taking the value 1 or 0. Any explanation on that from you or someone else is appreciated.
@statquest 3 years ago
Because my equation is being multiplied by the design matrix, it is essentially the exact same thing that you have.
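To see why the two forms agree, here is a small sketch in plain Python. It assumes a design matrix with one 0/1 column per group, as in the video, and made-up group means of 3 and 7; the function names `predict` and `predict_with_switch` are hypothetical.

```python
def predict(design_matrix, parameters):
    """Multiply each design-matrix row by the parameter vector."""
    return [sum(x * p for x, p in zip(row, parameters)) for row in design_matrix]

def predict_with_switch(x_values, mean_control, mean_mutant):
    """The commenter's form: x is 1 for control mice and 0 for mutant mice."""
    return [mean_control * x + mean_mutant * (1 - x) for x in x_values]

mean_control, mean_mutant = 3.0, 7.0   # hypothetical group means
X = [[1, 0], [1, 0], [0, 1], [0, 1]]   # column 1: control, column 2: mutant
x_values = [row[0] for row in X]       # the second column is just 1 - x

print(predict(X, [mean_control, mean_mutant]))                   # → [3.0, 3.0, 7.0, 7.0]
print(predict_with_switch(x_values, mean_control, mean_mutant))  # → [3.0, 3.0, 7.0, 7.0]
```

The 0s and 1s in the design matrix do the switching, so each row ends up using only the mean for its own group.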
@thomasamet5853 3 years ago
​@@statquest Bam!! Thank you for the explanation
@lin1450 2 years ago
Could you please make a video about "Granger Causality" for time series? That would be a triple bam!!
@statquest 2 years ago
I'll keep that in mind.
@lin1450 2 years ago
@@statquest Thank you so much. I really love and highly appreciate your content!! Helped me a lot!
@LEK-0000 5 years ago
He explains very simple concepts .
@antoniovivaldi1 8 days ago
Dude i have a makeup exam in five days, wish me luck ^^
@statquest 8 days ago
Good luck!!
@VCC1316 2 years ago
... the mutant mice are just normal mice that have a specific gene that has been knocked-out, and live in the sewers with 4 turtles. Also, this is really a fantastic intro to ANOVA, hats off.
@statquest 2 years ago
Thank you!
@TheAugustinePark 4 years ago
At 4:20 of the video, you mention the reason we combine the two lines of best fit into a single equation is to make the steps for computing "F" identical for regression and the t-test meaning a computer can do it automatically. In terms of what this actually looks like, I think this means having a single equation means one value for SS(fit) (instead of 2) which means we can use the "F" equation for regression. Is my reasoning correct? Also, why does a single equation mean a computer can do it automatically? Why could a computer not do it automatically if we had 2 equations? Thanks I love your videos!
@statquest 4 years ago
Sure, a modern computer can handle more than one equation. But back in the day memory was limited and that limited the number of tests a computer could perform. So the original idea was to unify as much of linear models as possible into a single framework called "General Linear Models", with the idea that one equation could be used in a general setting on a computer without having to check a bunch of different conditions. In the early days, different conditions meant different look-up tables for figuring out the p-values and since computers had very little memory, this limited what they could do.
@JimRohn-u8c 3 years ago
Can you do a Video on Tukey-Kramer HSD please. I’m a Chemist and we use that at work but I’m having a difficult time getting an intuitive understanding of it. Thank you for this channel!
@statquest 3 years ago
I'll keep that in mind.
@LouisChiaki 3 years ago
Design matrix sounds like just doing the one-hot encoding in ML for categorical features.
@statquest 3 years ago
It's similar, but the differences are important. For example, design matrices typically have the first column dedicated to an overall or base mean value, and usually one-hot encoding does not do this.
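A short sketch of the difference described in this reply, in plain Python. The function names `one_hot` and `with_baseline_column` are hypothetical, and treating the first category as the baseline is an assumption made for illustration.

```python
def one_hot(labels, categories):
    """One 0/1 column per category, with no shared column."""
    return [[1 if label == c else 0 for c in categories] for label in labels]

def with_baseline_column(labels, categories):
    """First column is all 1s (the baseline mean); the remaining columns
    flag membership in each non-baseline category."""
    return [[1] + [1 if label == c else 0 for c in categories[1:]]
            for label in labels]

labels = ["control", "control", "mutant", "mutant"]
categories = ["control", "mutant"]   # assumption: first category is the baseline

print(one_hot(labels, categories))               # → [[1, 0], [1, 0], [0, 1], [0, 1]]
print(with_baseline_column(labels, categories))  # → [[1, 0], [1, 0], [1, 1], [1, 1]]
```

With the baseline layout, the first parameter is the control mean and the second is the offset of the mutant mean from it, so a test on the second parameter asks directly whether the two groups differ.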
@yenhoeooi9 3 years ago
Hi Josh, great video here. Would really appreciate it if you had a StatQuest on the F-statistic/F-value and also on degrees of freedom. It's kinda hard for me to grasp the concept of these two topics.
@statquest 3 years ago
The first video in this series explains F-statistics and f-values: kzbin.info/www/bejne/pJyVdIR_idKSm9E
@Tyokok 5 years ago
Hi Josh, quick Q. Isn't the test you explained here an F-test? Doesn't the t-test use t-score = (slope beta - 0)/standard error, and then get the p-value from a t-table? Or are they the same thing? A little confused here. Thank you!
@statquest 5 years ago
This is a great question. The t-test is just a specific type of F-test. If you have statistics software, you can compare the results and see that the p-values are the same (however, the F-statistic itself will be the square of the t-statistic. Why the square? Because, as you saw in the first video in this series, the F-statistic can never be negative, but the t-statistic can). There are multiple ways to calculate a t-test; using an F-test is my favorite because it is much more flexible. Does that make sense?
@Tyokok 5 years ago
@@statquest I knew you would take it to a further level. So basically the two tests are both significance tests of hypotheses about model parameters, just using different methods, so the p-values should refer to the same thing. BAM! Thank you so much!
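The "F is the square of t" relationship from this thread can be checked numerically without any statistics software. Below is a sketch in plain Python that computes the classic equal-variance t-statistic and the linear-model F-statistic on the same made-up data; the function names are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def ss(values):
    """Sum of squared residuals around the values' own mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def student_t(a, b):
    """Classic equal-variance (Student's) two-sample t-statistic."""
    pooled_var = (ss(a) + ss(b)) / (len(a) + len(b) - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))

def linear_model_f(a, b):
    """F-statistic from comparing one overall mean with two group means."""
    ss_mean = ss(a + b)
    ss_fit = ss(a) + ss(b)
    n, p_fit, p_mean = len(a) + len(b), 2, 1
    return ((ss_mean - ss_fit) / (p_fit - p_mean)) / (ss_fit / (n - p_fit))

control = [2.0, 3.0, 4.0, 3.0]   # hypothetical values
mutant = [6.0, 7.0, 8.0, 7.0]

t = student_t(control, mutant)
f = linear_model_f(control, mutant)
print(t, t ** 2, f)   # t is negative here, while t squared and F agree
```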
@asifsaad3296 1 year ago
thanks for making it unavailable without payment, people like you are making money off of such an open source platform!
@statquest 1 year ago
I am really sorry! I don't know what is going on. I've contacted YouTube and have not heard anything back. This is breaking my heart because I never wanted this to happen, but somehow it is. I am sorry and doing everything I can to fix this.
@itsIs263 5 years ago
The introduction songs seem to be composed by Phoebe Buffay
@statquest 5 years ago
Considering how memorable her songs were, I'll take that as a compliment.
@itsIs263 5 years ago
StatQuest with Josh Starmer hahaha ,sure it is. And thank you for making stats so crystal clear and funny 💐
@forooghfarajzade8206 1 year ago
supppppppppppppper perfect thaaaanks
@statquest 1 year ago
Thanks!
@tomrandolph3179 2 years ago
I am absolutely the least mathematically minded person you will ever meet. Can you do a StatQuest explaining basic statistics terminology so that a sixth-grader could grasp the concepts?
@statquest 2 years ago
I've already got a bunch of videos that go through the basics. Start at the top of this list and work your way down: statquest.org/video-index/#statistics
@urdeathisnear885 4 years ago
Hi Josh, upon reviewing this, I'm wondering why you say you're using a t-test, but you actually calculate an F-statistic? In this case, isn't the two group case you show an F-test (i.e. a two group ANOVA) ?
@statquest 4 years ago
t-test = two group ANOVA. In other words, a t-test is just a specific example of ANOVA, and an ANOVA is just a specific example of general linear models. In this case, the F-statistic is just the square of the t-statistic that we would have gotten for a t-test and the p-values are the exact same. There are two ways to do a t-test, the way most people teach it and by using a general linear model, both give you the exact same results.
@urdeathisnear885 4 years ago
@@statquest Great, thanks for explaining the relationship between them, very helpful! But technically, because in this video you are comparing the ratios of variances and not the difference between means across groups, this is an f-test, not a t-test, right? Or does t-test not necessarily imply comparing the difference between means (though I've seen this in multiple other resources) ?
@statquest 4 years ago
@@urdeathisnear885 In both the t-test and in ANOVA, we are testing to see if the difference between (or among) the means is statistically significant. The concepts are the exact same. The differences in the equations are just technical details. In other words, if someone asked me to give them directions from my house to the grocery store, I could give them multiple routes to get there - all of them, however, would qualify as "directions from my house to the grocery store".
@urdeathisnear885 4 years ago
@@statquest Sure, but in your analogy, there is likely one, optimal route to the grocery store, right? So to take the reverse approach and go from real-world to stats analogy, I guess a related question I have is: there are two types (F, T) of tests that yield two different statistics that share the same concepts, but surely there may be times where it's preferable to use one over the other, else why would there be two separate tests? If so, could you maybe give a simple example of when you'd prefer one over the other? Thanks, this feedback is really helpful!
@statquest 4 years ago
@@urdeathisnear885 Ah, I have to be careful with my analogies. The F-test and Student's t-test yield different, but mathematically related, statistics. The F-distribution generalizes the t-distribution, just like the F-test generalizes Student's t-test, and it can be shown, mathematically, that a 2 sample ANOVA is equivalent to Student's t-test. So there is no difference and no reason to choose one over the other. That said, the Student's t-test was later modified (updated) by Welch to allow for unequal variances in the two groups. So there is a difference between Welch's t-test and a 2 sample ANOVA - and this is important. If you think you have different variances, then you need to use Welch's t-test (not Student's t-test or an F-test).
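To make the Student versus Welch distinction concrete, here is a sketch in plain Python that computes both statistics on the same made-up data. The function names are hypothetical, and the degrees of freedom for Welch's test use the Welch-Satterthwaite approximation; no p-values are computed here.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def student_t(a, b):
    """Student's t-statistic: pools the two variances (assumes they are equal)."""
    pooled = (((len(a) - 1) * sample_variance(a) + (len(b) - 1) * sample_variance(b))
              / (len(a) + len(b) - 2))
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / len(a) + 1 / len(b)))

def welch_t(a, b):
    """Welch's t-statistic and its approximate degrees of freedom.

    Each group keeps its own variance, so unequal variances are allowed.
    """
    va = sample_variance(a) / len(a)
    vb = sample_variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical data: a tight group of 6 and a widely spread group of 3
control = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
mutant = [0.0, 10.0, 20.0]

print(student_t(control, mutant))
print(welch_t(control, mutant))
```

When the two groups have the same size, the two statistics coincide and only the degrees of freedom differ; with unequal sizes and unequal spreads, as in the made-up data above, the statistics themselves diverge.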
@ai1888 6 years ago
Will the F-statistic calculated from this method be equal to the t-statistic? I understand that you are trying to standardize the way to calculate the t-test by using methods from linear regression, but does it produce the same values that a regular t-test does?
@benedettodiciaccio3024 6 years ago
According to this website [ onlinecourses.science.psu.edu/stat501/node/297/ ], the t-statistic and F-statistic produce equivalent p-values when the F-statistic's numerator degrees of freedom is 1. The relationship is t^2(n-p) = F(1,n-p), which apparently means the p-values for each will be identical. I don't know why that is, but videos on the relationship between those two distributions may help. Anyway, I assume the relationship applies here, where df = 1 for the F-statistic numerator when comparing two groups. As a side note, most p-values for slopes in multiple linear regression are calculated with t-tests. However, F-tests comparing the variance between models with and without the slope produce an identical p-value due to the above-mentioned relationship. Thinking of slope significance in terms of how much more variance the model explains with vs without the slope seems much more intuitive to me, and I'm glad I found these videos.
@glaswasser 3 years ago
can you make a statquest about linear mixed models / random effects? I'm extremely confused about them, when to use them and how to interpret the results...
@statquest 3 years ago
I'll keep that in mind.
@Denise_lili 2 years ago
Thank you for making these amazing videos! Questions: Are we calculating R squared for the t-test and ANOVA as well? Additionally, if the p-value is small, does that mean both 1) the fitted lines are statistically significant and 2) Two categories' means are significantly different from each other?
@statquest 2 years ago
Yes, you are calculating R^2 for the t-test and ANOVA. If the p-value is small, then using two means and fitted lines results in a significant reduction in residuals compared to using a single mean and a single fitted line. Thus, the two categories' means are significantly different from each other.
@Denise_lili 2 years ago
@@statquest Do you mean compare R squared between 1) Control mice and 2) Control mice and mutant mice?
@statquest
@statquest 2 жыл бұрын
@@Denise_lili No, we are comparing using one mean that is calculated from all of the data - control and murant - to using two means, one for control and another one for mutant.
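The comparison described in the reply above (one overall mean versus one mean per group) can be sketched in a few lines. This is an illustrative sketch with made-up numbers; R^2 is computed as (SS(mean) - SS(fit)) / SS(mean), following Part 1 of the series, and all names are assumptions.

```python
from statistics import mean

# Hypothetical measurements, invented for illustration.
control = [2.0, 3.0, 4.0]
mutant = [6.0, 7.0, 8.0]
all_mice = control + mutant


def sum_sq_residuals(observed, predicted):
    """Sum of (observed - predicted)^2 over paired values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


# Model 1: every mouse is predicted by the single overall mean.
overall_mean = mean(all_mice)
ss_mean = sum_sq_residuals(all_mice, [overall_mean] * len(all_mice))

# Model 2: each mouse is predicted by the mean of its own group.
predictions = [mean(control)] * len(control) + [mean(mutant)] * len(mutant)
ss_fit = sum_sq_residuals(all_mice, predictions)

# Fraction of the variation around the overall mean explained by using two means.
r_squared = (ss_mean - ss_fit) / ss_mean

print(ss_mean, ss_fit, r_squared)
```

With these numbers SS(mean) is 28 and SS(fit) is 4, so using two means removes most of the residual variation.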
@MR-yi9us 2 years ago
@@statquest I also have a silly question: since we are calculating F-statistics, why do we call it a t-test?
@statquest 2 years ago
@@MR-yi9us If you think of the t-distribution like a knife, then the F-distribution is a Swiss army knife. It does what the t-distribution does, but much more. That being said, when the t-test was first created, they were only thinking about that one problem, and not the broader class of problems (like ANOVA), and so the t-distribution was enough to get the job done. Thus, it's called a t-test. However, because the F-distribution does everything the t-distribution does and more, we use the F-distribution here to be consistent among all the different things we can do.
@ravikiranrao05 3 years ago
Such a great video, Josh. I really enjoy your videos. Can you please recommend a textbook that reflects your way of teaching? Are there any that I'll be hooked on reading (just like your videos)? Thanks
@statquest 3 years ago
I'm writing my own book right now. I hope it is out next year.
@ravikiranrao05 3 years ago
@@statquest Woah! Looking forward to reading that.
@RishiRajKoul 2 years ago
If you have a video on time series, please share it or make one.
@statquest 2 years ago
I'll keep that in mind!
@chinmayagarg3977 4 years ago
We can fit a vertical line passing through all the points of the control data, which will give the least sum of squared residuals, @3:09, right? If that's the case, then why did we fit a horizontal line? Thanks in advance. P.S.: The channel is awesome. Recommended it to many.
@statquest 4 years ago
Sure, a vertical line would minimize the squared residuals, but you can't use it to make predictions. What Gene Expression value would you predict with a vertical line? All of them, and that makes vertical lines useless.
@teammdyss a year ago
@@statquest Sorry, wouldn't that effectively mean that for the t-test we're not really looking for a best fit? Calling something a fit when it's really not makes things confusing. In all the examples you show, the so-called "fit" is represented as a "mean", so wouldn't "just find the equation for the mean line" be a better rule of thumb than talking about "least squares"? Head melting right now
@statquest a year ago
@@teammdyss Maybe a better way to say it is "best fit given some restrictions", and those restrictions are 1) the number of parameters we want to use and 2) we want a model that is useful for making predictions.
@nr7507 2 years ago
Thank you, I had a few questions. At 6:37, is there a reason we did not include the residuals in the overall equation of y? Also, why do we need the y equation at 6:13 to create a design matrix? Is it not just a matrix where the number of ones corresponds to the number of data points for control and zero for mutant, and vice versa for the next set of entries? Also, does the sample size have to be the same per category to create a design matrix? Great Tutorial!
@statquest 2 years ago
1) This equation simply represents what goes into the design matrix. The residual is the difference between this equation and what is observed. 2) The equation just illustrates how we create the design matrix and what it represents. 3) You don't need to have equal numbers of samples for each category (they can be different).
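To illustrate point 3 in the reply above, here is a small sketch of a design matrix with unequal group sizes (3 control mice, 4 mutant mice). The data are made up, and the least-squares solution is worked out by hand through the 2x2 normal equations rather than with any particular library, so treat it as an illustration of the idea and not as how statistical software does it internally.

```python
# Hypothetical measurements with different numbers of mice per group.
control = [2.0, 3.0, 4.0]
mutant = [6.0, 7.0, 8.0, 9.0]
y = control + mutant

# Design matrix: column 0 switches the control mean on, column 1 the mutant mean.
X = [[1, 0] for _ in control] + [[0, 1] for _ in mutant]

# Least squares via the normal equations (X^T X) b = X^T y for two columns.
xtx = [[sum(row[i] * row[j] for row in X) for j in range(2)] for i in range(2)]
xty = [sum(row[i] * value for row, value in zip(X, y)) for i in range(2)]

det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
b_control = (xty[0] * xtx[1][1] - xtx[0][1] * xty[1]) / det
b_mutant = (xtx[0][0] * xty[1] - xtx[1][0] * xty[0]) / det

# Residuals: observed value minus the value the equation predicts for that row.
residuals = [value - (row[0] * b_control + row[1] * b_mutant)
             for row, value in zip(X, y)]

print(b_control, b_mutant)  # the fitted coefficients are the two group means
```

The coefficients come out as the control mean and the mutant mean, and the residuals never appear in the equation itself; they are whatever is left over after the prediction.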
@beautyisinmind2163 2 years ago
Hi Professor Josh, ANOVA (F-test) is often used in filter methods for feature selection. Theory says ANOVA should be used for feature selection when the target is binary, but in practice I have seen people also use ANOVA when the target is multi-class. So can ANOVA (F-test) also be applied if our target is not binary and has multiple classes? Another question: ANOVA assumes features are normally distributed, but in practice most of the time we encounter data that are not fully normal. In such cases, does it matter much if we apply it anyway, or is a transformation compulsory?
@statquest 2 years ago
ANOVA is really only intended to be used when the dependent variable is continuous.
@yimingshao4240 2 years ago
Thanks a lot for your video, it's really helpful! But I have a question: why can the equation of y be written as y = mean (control) + mean (mutant)? Where are the residuals in each set of data?
@statquest 2 years ago
I'm not sure I understand your question. The residual for each measurement is paired with that measurement, so it is easy to keep track of.
@mrcharm767 2 years ago
Great goldmine I found!!! Btw, one concern: don't you think the t-test is somewhat showing dependence among the independent variables when we consider SS(fit)?
@statquest 2 years ago
I'm not sure I understand your question. Can you rephrase it?
@mrcharm767 2 years ago
@@statquest Sure, thanks for replying. At timestamp 6:00 we see we are taking residuals referencing both independent variables, right? So there is a reference between them; does that mean they are dependent? Please let me know if I'm clear.
@statquest 2 years ago
@@mrcharm767 Again, I'm not really sure what you are asking. The independent variables are independent. We are using the residuals to determine if a model with 2 independent variables results in statistically significantly smaller residuals than a model with just one independent variable.
@mrcharm767 2 years ago
@@statquest In short, if I ask: does the calculation depend on the distance between these 2 independent variables?
@statquest 2 years ago
@@mrcharm767 Regardless of the distances between the two variables, we calculate the squared residuals using the same formula: (observed - predicted)^2
@brunog.campos3236 5 years ago
If the t-test indicates that mutant and control are different, but the ANOVA indicates that there was no difference between the groups, what should I do?
@scottrjjh 2 years ago
I'm sure I'm missing something here, but why do we even need to do the whole design matrix thing? To get SS(fit) aren't we just adding the means of each group then taking the sum of squares of the residuals from that horizontal line?
@statquest 2 years ago
The design matrix makes it easy to do these tests with a computer.
@aaryan9058 2 months ago
Hey Josh, could you please answer this? If I calculate the p-value using this method and also using Student's t-test, will it be the same? If yes, why? If not, why?
@statquest 2 months ago
It will be the same. The F-distribution (with 1 degree of freedom in the numerator) is just the square of the t-distribution. For more details: coursekata.org/preview/book/fd645e20-5a0d-482e-ad16-ee689acb7431/lesson/15/6#:~:text=The%20F%2DDistribution%20and%20T%2DDistribution%20are%20Actually%20the%20Same&text=The%20reason%20is%20that%20fundamentally,get%20exactly%20an%20F%2Ddistribution!
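For readers who want the reasoning behind the reply above, here is a short derivation sketch. It uses the standard textbook construction of the t-distribution, with Z a standard normal variable and V an independent chi-squared variable with ν degrees of freedom; these symbols are introduced here for illustration and do not appear in the video.

$$
T = \frac{Z}{\sqrt{V/\nu}} \sim t_{\nu}
\quad\Longrightarrow\quad
T^{2} = \frac{Z^{2}/1}{V/\nu} \sim F(1, \nu),
\qquad \text{because } Z^{2} \sim \chi^{2}_{1}.
$$

$$
P\left(|T| \ge |t|\right) = P\left(T^{2} \ge t^{2}\right) = P\left(F_{1,\nu} \ge t^{2}\right)
$$

So the two-sided p-value from a t-test with ν degrees of freedom equals the p-value from an F-test with 1 and ν degrees of freedom evaluated at F = t^2.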
@silentsuicide4544 2 years ago
I don't think I get it completely. When we have two features like in the first example, "t-test" is written over the graph, but we are calculating an F-score, which gives us the p-value via the F-distribution. However, the definition of a t-test is any hypothesis test in which the test statistic follows a t-distribution under the null hypothesis. My question is: why is it called a "t-test" if we are using the F-score and F-distribution to get the p-value?
@statquest 2 years ago
The F-distribution is a generalization of the t-distribution. In other words, the F-distribution can do everything we can do with a t-distribution and more.
@shamshersingh9680 7 months ago
Hi Josh, at timestamp 6:48, when you write the equation y = mean of control + mean of mutant, where have the residuals gone? How will we get the value of y using this equation without residuals? y = mx + c in linear regression helps us get y values from a given x, and the same concept is being applied here. So why are we dropping the residuals?
@statquest 7 months ago
We drop the residuals because it doesn't make any sense to include them in the predictions we make with this equation. The residuals only make sense when we are evaluating how well the model fits the data. But with predictions based on new data, we don't know the actual values, so we don't know the residuals.
@kautsarfadlyfirdaus1879 4 years ago
Thank you for the amazing video, as always. If you have time to spare, I want to ask: how do we test the model with new data? If I understand correctly, we just need to plug the new data into the following equation: y = switch*mean_control + switch*mean_mutant. Edit: when I watched the video again, it seemed like the purpose is to find out whether the difference between the means is significant or not. Am I correct?
@statquest 4 years ago
The purpose of the t-test is to determine if there is a significant difference between mice with the normal gene and mice with the mutant gene. However, we can also use the model to make predictions with new data. If my test tells me that there is a significant difference between normal and mutant mice, if you tell me you have a mutant mouse, I can tell you that the gene expression should be the mean of the mutant mice. If my test tells me that there is not a significant difference, then I will use the mean of all the mice, normal and mutant, as my prediction.
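The prediction rule in the reply above can be written out as a small function. This is a sketch with made-up data; the function name `predict` and the `significant` flag (standing in for the outcome of the test) are assumptions for illustration.

```python
from statistics import mean

# Hypothetical training data, invented for illustration.
control = [2.0, 3.0, 4.0]
mutant = [6.0, 7.0, 8.0]

mean_control = mean(control)
mean_mutant = mean(mutant)
mean_all = mean(control + mutant)


def predict(group, significant):
    """Predict gene expression for a new mouse in `group` ("control" or "mutant").

    `significant` is assumed to be the result of the t-test / F-test:
    True if the two-mean model fits significantly better than one mean.
    """
    if not significant:
        # No significant difference: fall back to the mean of all mice.
        return mean_all
    # The "switches" from the design matrix: exactly one of them is 1.
    control_switch = 1 if group == "control" else 0
    mutant_switch = 1 if group == "mutant" else 0
    return control_switch * mean_control + mutant_switch * mean_mutant


print(predict("mutant", significant=True))
print(predict("mutant", significant=False))
```

No residual appears in the prediction because the actual value for a new mouse is not known yet.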
@kautsarfadlyfirdaus1879 4 years ago
@@statquest I see, now I understand better. Thank you, Mr. Josh.
@wisamtariq4412 5 years ago
Many thanks, great channel! I have a question, please: is the t-test approach here what's called "one-way ANOVA", and is the F-test for "factorial ANOVA", since there are more levels for the categorical variable?
@danielsobczynski2107 2 years ago
Hi Josh, great video as always. Just wanted to ask, what happens to the residual in the equations earlier in the video that had "+ residual" in them? Thanks so much for your help, definitely learning a lot
@statquest 2 years ago
What time point, minutes and seconds, are you asking about? However, I'm guessing that you are asking about the difference between the equation that perfectly fits the data (because it includes the means + the residuals) and the equation that generates the residuals (because it only includes the means). The equation that does not include the residuals is the one we use to make predictions with future data.
@danielsobczynski2107 2 years ago
@@statquest Thanks Josh, that is the point I was asking about. I will review the video once more.
@raghavgaur8901 4 years ago
Hi Josh, I just wanted to confirm: if we have data with very high cardinality, then we would use ANOVA, and if we have data with only two categories, then we would use a t-test, right?
@statquest 4 years ago
When you only have 2 categories, you use a t-test. When you have more than 2 categories, you use ANOVA. However, as you can see, a t-test is just a special case of ANOVA.
@raghavgaur8901 4 years ago
@@statquest Thanks for answering
@cristianleoni6852 4 years ago
Great job as usual, but this is still quite a confusing topic for me. Will p_mean always be one? Also, is there a nice explanation for the formula of the F value? And how does the F value relate to the p-value?
@statquest 4 years ago
Did you watch part 1 in this series? If not, it should answer all of your questions: kzbin.info/www/bejne/pJyVdIR_idKSm9E
@keewonlee3787 5 years ago
this is dope
@statquest 5 years ago
Thanks! :)
@eye_oph 2 years ago
Hi Josh, great video as always. Just wanted to ask: how do we do post hoc tests in linear models, just like post hoc tests in ANOVA, to explore differences between two groups? Thank you.
@statquest 2 years ago
Post-hoc tests with ANOVA are just a matter of defining your "design matrices", which I illustrate in the next video in this series: kzbin.info/www/bejne/eaKveKmtnpJohsU
@eye_oph 2 years ago
@@statquest If there are three drugs (drug A, drug B, and drug C) and we use drug A as the reference level, we then use dummy coding to compare B vs. A and C vs. A in the linear model. In that model, we can test B vs. A and C vs. A by calculating the p-value of each coefficient. However, it seems that we cannot test the difference between B and C in the above linear model? Thank you for your reply.
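One common way to get at the B vs. C question raised above is to test the difference between the two dummy-coded coefficients (equivalently, to refit the model with B or C as the reference level). The sketch below uses made-up data and the standard one-way ANOVA contrast, with the standard error based on the pooled residual variance from all three groups; it is an illustration under those assumptions, not a method taken from the video, and it ignores multiple-comparison corrections.

```python
from statistics import mean

# Hypothetical responses for three drugs, invented for illustration.
drug_a = [1.0, 2.0, 3.0]
drug_b = [4.0, 5.0, 6.0]
drug_c = [6.0, 7.0, 8.0]
groups = [drug_a, drug_b, drug_c]

mean_a, mean_b, mean_c = (mean(g) for g in groups)

# Dummy coding with A as the reference level:
# intercept = mean of A, each coefficient = that group's mean minus A's mean.
coef_b = mean_b - mean_a
coef_c = mean_c - mean_a

# The B vs. C difference is the difference between the two coefficients.
b_vs_c = coef_b - coef_c  # same value as mean_b - mean_c

# Pooled residual variance from the full three-group model.
n = sum(len(g) for g in groups)
ss_fit = sum((v - mean(g)) ** 2 for g in groups for v in g)
mse = ss_fit / (n - len(groups))

# Standard error and t-statistic for the B vs. C contrast.
std_err = (mse * (1 / len(drug_b) + 1 / len(drug_c))) ** 0.5
t_value = b_vs_c / std_err

print(b_vs_c, t_value)
```

The p-value would then come from a t-distribution with n - 3 degrees of freedom (here 6), or equivalently from F(1, n - 3) using t^2.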
@chongliwinston5225 4 years ago
Dear Josh, just to make sure: F value = ((SS(mean)-SS(fit))/(p(fit)-p(mean)))/(SS(fit)/(n-p(fit))), right? Not F value = (SS(mean)-(SS(fit)/(p(fit)-p(mean))))/(SS(fit)/(n-p(fit)))? In other words, the numerator is "SS(mean)-SS(fit)" over (p(fit)-p(mean)), instead of SS(mean) minus "SS(fit) over (p(fit)-p(mean))", right?
@statquest 4 years ago
That's correct - I was a little sloppy with the parentheses when I made these videos.
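Written as a stacked fraction, the version confirmed in the reply above (and in the correction note in the video description) is:

$$
F = \frac{\left(SS(\text{mean}) - SS(\text{fit})\right) / \left(p_{\text{fit}} - p_{\text{mean}}\right)}{SS(\text{fit}) / \left(n - p_{\text{fit}}\right)}
$$

Here n is the number of measurements, p_fit is the number of parameters in the fitted model, and p_mean is the number of parameters in the model that uses only the overall mean.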
@chongliwinston5225 4 years ago
StatQuest with Josh Starmer Noooo, this is very helpful, I really appreciate it. I had to make sure just because I am not that familiar with this.
@vanya.antonov 5 years ago
Hello, Joshua! I am a bit confused at 7:42. If I understand correctly, you estimate the t-test p-value by computing the F-value (and using the F-distribution?). However, according to Wikipedia, the test statistic in a t-test follows Student's t-distribution (and not the F-distribution). So, I was wondering if the t-test you describe here is the same as the standard t-test from Wikipedia?
@juliar5741 4 years ago
I have the same question here. @StatQuest
@minederguy4932 4 years ago
How do you calculate the residuals for the equation + design matrix? Wouldn't that involve subtracting a matrix from a scalar?
@statquest 4 years ago
The design matrix is just a general way to specify how each measurement fits into the equation.
@TheAugustinePark 4 years ago
In terms of when we should use linear regression vs. t-tests vs. ANOVA for testing our data: is linear regression for when our independent variable is continuous, while t-tests and ANOVA are for when our independent variable is discrete (e.g. categorical variables)? Thank you!
@statquest 4 years ago
Technically, it is all linear regression. However, they give it different names. t-tests are when you have two distinct groups and ANOVA is when you have more than 2 distinct groups.
@minahabibi1008 2 years ago
I totally understand your explanation, but I did not understand the idea or concept behind why we need to relate ANOVA or the t-test to regression.
@statquest 2 years ago
We have a single method, linear models, that does all of these things and much more. It's like a Swiss army knife for statistics. To see more cool things you can do with linear models, see: kzbin.info/www/bejne/eaKveKmtnpJohsU