Explaining The One-Sample t-Test

7,489 views

Very Normal

5 months ago

An intuitive guide to the one-sample t-test and why it looks the way it does
Stay updated with the channel and some stuff I make!
👉 verynormal.substack.com
👉 very-normal.sellfy.store

Comments: 36
@hiradnisari 5 months ago
Brilliant video as usual. Keep up the nice work; you clearly seem to be headed toward being a great educational content creator geared toward statistics.
@s.k_525 5 months ago
Love your videos. I just studied this topic today in class, and this video was like a revision. ❤🎉
@eddiea8468 1 month ago
Great video! This channel is awesome!
@K33go175 5 months ago
Absolutely love your videos. I was interested in biostats and thought I'd ask where you're enrolled!
@smolpup2532 5 months ago
Great videos, don't change when you blow up.
@Unaimend 5 months ago
Hey, thanks for the video. Will you also go into Bayesian or Monte Carlo simulation-based things?
@very-normal 5 months ago
Yesss, Monte Carlo sooner than Bayesian, but both are in the works
@Unaimend 5 months ago
@very-normal Nice. Again, thank you for this video. It was quite a good explanation of where all those formulas come from. I was always just given the formula to calculate the t-statistic but never knew why I had to use that specific formula; this video explained it really well.
@xavierlarochelle2742 4 months ago
You mention that you're leaning toward the Bayesian approach to statistical data analysis, which is cool because it's also the case for me! I noticed you don't have a video on Bayesian statistics on your channel. Are you planning on doing one? And if so, are you going to take a shot at explaining the frequentist-Bayesian divide? It seems like a hot topic right now, and I'm still looking for good ways to popularize the issue. Love your channel!
@very-normal 4 months ago
Yeah, thanks for watching! Part of what makes it hard is that I'm still sorting out how to include stuff like MCMC and Stan when we move past stuff like conjugate priors. Maybe I'm overcomplicating it lol. And yeah, I've been brainstorming how to approach a frequentist-Bayesian video as well; it would be a great long-form video.
@xavierlarochelle2742 4 months ago
@very-normal I don't think it's possible to overcomplicate MCMC haha. Personally, I like the way McElreath puts it in his book Statistical Rethinking: he frames MCMC (and Stan) as a new and improved method of estimating the posterior distribution, the earlier methods being the analytical approach (working out the math = hard), grid approximation (computationally intensive), and quadratic approximation (limited to normally distributed posteriors). This might be an oversimplification, however; I'm new to this stuff.
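As a quick aside on the grid-approximation idea mentioned above, here is a minimal sketch in Python (a hypothetical coin-flip example of my own, not code from the video or from Statistical Rethinking):

```python
import numpy as np

# Grid approximation of the posterior for a coin's bias p after observing
# 6 heads in 9 flips, with a flat prior. (Hypothetical example.)
grid = np.linspace(0, 1, 1000)            # candidate values of p
prior = np.ones_like(grid)                # flat prior evaluated on the grid
likelihood = grid**6 * (1 - grid)**3      # binomial likelihood, up to a constant
unnormalized = prior * likelihood
posterior = unnormalized / unnormalized.sum()   # normalize so it sums to 1

# Posterior mean of p; with a flat prior this is close to (6 + 1) / (9 + 2) ≈ 0.64
print("posterior mean of p:", np.sum(grid * posterior))
```

The three steps (lay out a grid, score each point by prior × likelihood, normalize) are exactly what quadratic approximation and MCMC replace once the parameter space gets too big for a grid.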
@lbsl7778 3 months ago
First of all, thank you for your videos; I find them great for expanding and reinforcing what I learn in my statistics classes. Just one question: at 12:04, how did you calculate the null distribution?
@very-normal 3 months ago
Thanks! That's exactly what I had in mind with these videos, so I'm glad they've been helpful. The dataset I simulated has 30 observations, so this results in a null t-distribution with 29 degrees of freedom. One degree of freedom is "spent" estimating the sample mean that's used in the test statistic, so that's why it's n − 1 (where n is the sample size).
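For concreteness, here is a minimal sketch in Python of the point above (the dataset is simulated with assumed values, not the video's actual data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=100, scale=15, size=30)    # hypothetical dataset with n = 30

n = len(data)
mu0 = 100                                        # hypothetical null mean
t_stat = (data.mean() - mu0) / (data.std(ddof=1) / np.sqrt(n))
df = n - 1                                       # 29 degrees of freedom
p_value = 2 * stats.t.sf(abs(t_stat), df=df)     # two-sided p-value from the null t-distribution

# scipy's built-in one-sample t-test gives the same answer
print(stats.ttest_1samp(data, popmean=mu0))
print(t_stat, df, p_value)
```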
@lbsl7778 3 months ago
@very-normal Thanks for answering so fast! Not even my teachers are that quick. So, I assume simulating the dataset you mentioned involves some hard math or something like that, and that's why you didn't get into how to simulate it?
@very-normal 3 months ago
Oh no, it's not hard math. I actually generated data from a specific normal distribution and rounded the values to make them look like integers, so that it would look like plausible data. Originally, I didn't think it would be helpful to show how I simulated it, but it ended up coming up!
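A minimal sketch of that kind of simulation in Python (the mean, SD, and sample size below are placeholders of my own, not the values used in the video):

```python
import numpy as np

rng = np.random.default_rng(42)

# Draw from a specific normal distribution, then round so the values
# look like plausible integer measurements.
raw = rng.normal(loc=120, scale=10, size=30)    # hypothetical mean, SD, and n
data = np.round(raw).astype(int)

print(data)
print("sample mean:", data.mean(), "sample SD:", data.std(ddof=1))
```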
@lbsl7778 3 months ago
I'll watch a video about how you simulated it. Anyway, thanks again! @very-normal
@GG-ot8zg 5 months ago
I know that the sample variance is the estimator for the population variance, but if we are testing whether we can reject H0, why don't we use the H0 mean in the calculation of the standard deviation in the denominator? It's a random thought. I can also imagine the impact: if we use the H0 mean instead of the sample mean for the sample variance, we will get a higher variance, since we won't be using the value that minimizes the deviations in that sample, so I guess the t-statistic will be smaller and the test less likely to reject. Anyway, I was wondering if there is a reason related to the t-statistic's distribution maybe being biased if we don't do that, and so on. Does that make sense?
@very-normal 5 months ago
That's a good question. I don't know the precise answer, but I feel it's because we'd still like the sample variance to just be calculated from the data. Using the null mean in the calculation would tie the variance to the null hypothesis, but I think we are meant to use only the data in the variance. You're definitely right that it would probably have downstream effects on the error rates of our decision-making.
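One way to see those downstream effects is a small Monte Carlo comparison. The sketch below (my own construction, not anything from the video) estimates how often each version rejects a true null hypothesis at the 5% level:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, mu0, n_sims = 30, 0.0, 20_000
crit = stats.t.ppf(0.975, df=n - 1)        # nominal two-sided 5% critical value

reject_usual = 0
reject_null_var = 0
for _ in range(n_sims):
    x = rng.normal(mu0, 1.0, size=n)       # data generated under H0
    xbar = x.mean()
    s_usual = x.std(ddof=1)                                # spread around the sample mean
    s_null = np.sqrt(np.sum((x - mu0) ** 2) / (n - 1))     # spread around the null mean
    reject_usual += abs((xbar - mu0) / (s_usual / np.sqrt(n))) > crit
    reject_null_var += abs((xbar - mu0) / (s_null / np.sqrt(n))) > crit

# The usual t-test should reject about 5% of the time; the null-mean version
# typically rejects less often, i.e. it behaves more conservatively, which matches
# the intuition in the question above.
print("usual t-test rejection rate:      ", reject_usual / n_sims)
print("null-mean-variance rejection rate:", reject_null_var / n_sims)
```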
@Unaimend 5 months ago
And another question: in your video on the normal distribution, you mentioned that the mean and variance of the normal are independent and that this is important for the t-distribution, but I don't think you explicitly discussed this here. So why does it matter, and what would change if they were dependent?
@very-normal 5 months ago
I'm happy that you're keeping past material in mind! That's what I want people to do. I had originally intended to mention this again in the next video just to shorten this one. In short, if the sample mean and sample variance are dependent in some way, it suggests there may be some covariance between them. My educated guess is that this will contribute to higher variance in the resulting distribution, more than what is predicted by the CLT. This will result in more faulty decisions downstream (type-I and type-II errors). I'm sure the sampling distribution will still look bell-shaped, but it would have even fatter tails than the t-distribution. This could be checked via simulation; it could be interesting to look at.
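A minimal sketch of that simulation check in Python (my own construction): it compares how often |T| exceeds the nominal t cutoff when the data are Normal (sample mean and variance independent) versus exponential (sample mean and variance dependent):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, n_sims = 10, 50_000
crit = stats.t.ppf(0.975, df=n - 1)   # nominal two-sided 5% cutoff

def t_stat(x, mu):
    return (x.mean() - mu) / (x.std(ddof=1) / np.sqrt(len(x)))

# Normal data: sample mean and variance are independent, so the t-distribution is exact.
normal_T = np.array([t_stat(rng.normal(0.0, 1.0, n), 0.0) for _ in range(n_sims)])
# Exponential data with mean 1: sample mean and variance are dependent.
expo_T = np.array([t_stat(rng.exponential(1.0, n), 1.0) for _ in range(n_sims)])

# The first rate should sit near 0.05; the second typically drifts away from it,
# showing that the t-distribution no longer describes the tails well.
print("Normal data:     ", np.mean(np.abs(normal_T) > crit))
print("Exponential data:", np.mean(np.abs(expo_T) > crit))
```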
@Unaimend 5 months ago
@very-normal Thanks for the reply. That sounds quite right. I love that you try to connect intuition with the math. Sadly, not many teachers are able to do this.
@immunostatst3435 1 month ago
This video series is really fantastic, thank you! One truly enlightening concept has been the way you explain the relationship between data, random variables, probability distributions, and statistics, and how all of these concepts are foundational for NHST.

One naive clarification question. A difference I've noted between the videos in this series and similar introductory teaching materials* is that you don't seem to focus much on the concept of the 'sampling distribution,' and the distinction between data derived from a single sample and the frequentist concept of repeated sampling from the population. Is it accurate to say that the 'distribution of *the* sample mean' that you refer to in this video is a 'distribution of *repeated* sample means' (and the same for the distribution of sample variances)? In other words, doesn't the CLT allow us to approximate the 'mean of sample means' rather than the mean of a single sample of n observations?

Apologies if this is simply semantics, but historically, when I've tried to apply NHST concepts in the lab (as in: I've done an experiment one time, e.g. collecting the body weight of 10 mice on a high-fat diet, and now I want to make an inference about all mice on a high-fat diet), I've confused these concepts. (*Apologies if this is an unfair/inaccurate characterization of your videos and I've missed something; certainly possible!)
@very-normal 1 month ago
Thanks for watching! Hopefully I can clarify a little bit here. Yes, I think the way you've connected it is correct. When I talk about "the distribution of the sample mean," I'm also implicitly referring to the fact that the sample mean is a random variable and will vary if we collect different datasets. This idea does extend to the sample variance: it is estimated from data, so it will have its own (sampling) distribution.

Just because something is a random variable doesn't mean its distribution is easy to describe. But in the case of the sample mean, the CLT tells us we can approximate its distribution with a Normal distribution. The mean of this distribution ("mean of sample means") is the population-level mean, which is usually the thing you want to know. It's not quite that the CLT lets you "approximate" the population mean; rather, it tells you that the sample mean you actually see is "close" to this population mean. Unfortunately, with NHST you have no way to confirm what the population mean actually is, only what it isn't.

The quantity you'd be interested in is the average body weight of lab mice (presumably on some intervention that will alter it). This is a population-level quantity you want to know, but you can only collect data from a subset of this population, so the sample average you calculate will be a little different from the population-level one. Ten mice isn't a lot, but theoretically, if you use many more mice, the average weight of your sample will get closer to the population weight you want. I hope this clarifies a bit more!
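To make the "one sample versus repeated samples" distinction concrete, here is a minimal sketch in Python (the population mean and SD for the mice are made-up numbers for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

# Pretend the whole population of high-fat-diet mice has mean 45 g and SD 5 g.
pop_mean, pop_sd, n = 45.0, 5.0, 10

# One experiment = one sample of 10 mice; in practice this is all you get.
one_sample = rng.normal(pop_mean, pop_sd, size=n)
print("sample mean from the one experiment you ran:", one_sample.mean())

# The sampling distribution describes what would happen if you could repeat
# the experiment many times.
many_means = rng.normal(pop_mean, pop_sd, size=(100_000, n)).mean(axis=1)
print("mean of repeated sample means:", many_means.mean())   # close to 45 (the population mean)
print("SD of repeated sample means:  ", many_means.std())    # close to 5 / sqrt(10) ≈ 1.58
```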
@immunostatst3435 1 month ago
@very-normal Yes it does, thanks so much for the fast and thoughtful reply!
@becktronics 5 months ago
Great video! I feel like you're supplementing all the topics I was shaky on when I was in college. Student's t-test and t-distribution had so many strange notational decisions that it confused me: I knew it was a distribution, but at the same time you have to calculate a t value with n − 1 degrees of freedom. Your example helped me finally tie together p-values and distinguish between samples and populations, and the code example was great for confirming that I was actually following which symbols signified what.

I think I'm still a little fuzzy on where the t-distribution actually comes from. How do we compare the t-statistic to the t-distribution generated from our null hypothesis? Is the p-value itself that value? Keep up the awesome content! You're helping demystify statistics for a much wider audience :) Are there any PDEs that get solved stochastically using statistical methods? I'll try not to pack too much into one comment, but you've got me hooked!
@very-normal 5 months ago
Thanks! That's what I was hoping people would get out of this series of videos. I chose to omit the origin of the t-distribution in the video because I felt it was too technical without giving much insight. With some mathematical manipulation, the t-statistic can be shown to be the ratio of two random variables: 1) a standard normal and 2) a function of a chi-squared random variable (specifically, the square root of a chi-squared random variable with n − 1 degrees of freedom divided by n − 1). Taken together, this ratio produces the t-distribution. Because standardization is such a common operation, this ratio appears very frequently in statistics, so much so that it's more convenient for us to give it its own name so that it's easier to refer to.

And yes, you're right! The p-value comes from seeing where our test statistic (here, the t-statistic) falls within the null distribution (here, a t-distribution with n − 1 degrees of freedom). I've always felt that these old naming conventions really hurt how we learn about this test. Hope this helps!
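Written out in standard notation (this formula isn't shown in the thread itself: μ0 is the null mean, σ the population SD, S the sample SD, and Z and V are independent when the data are Normal):

```latex
Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} \sim \mathcal{N}(0, 1),
\qquad
V = \frac{(n-1) S^2}{\sigma^2} \sim \chi^2_{n-1},
\qquad
T = \frac{\bar{X} - \mu_0}{S / \sqrt{n}} = \frac{Z}{\sqrt{V / (n-1)}} \sim t_{n-1}.
```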
@becktronics 5 months ago
@very-normal Thank you for your comprehensive response! I never knew it was related to the chi-squared distribution. In school, I had to use statistics to quantify different transport phenomena and thermodynamic experiments. Unfortunately, the t-test didn't make any sense to me during the labs, or it'd be covered in ~5 minutes during the pre-lab lecture, so oftentimes I'd just have a mean, SD, and RMS. Curve fitting was a huge thing too that I never quite understood, but at least the libraries are there for that :) Do you have plans for a chi-squared video in the future? I'd love to see what you cook up!
@very-normal 5 months ago
Yeahhh, I've been dabbling with the more advanced function estimators and it's tough. I'm glad I don't have to be the one to implement those models lol. And yeah! My goal with this series is to try to cover most introductory topics in statistics and see how I feel from there, so tests involving chi-squared statistics are definitely in that scope. Thanks again for your continued viewership!
@luizhenriqueamaralcosta629 5 months ago
Amazing video, but I would like to see some of the calculations.
@very-normal 5 months ago
What kind of calculation did you have in mind?
@Unaimend 5 months ago
8:13 But usually I don't know the population mean and SD, so I guess in the real world one would use the sample mean/SD? Never mind, it's explained like 10 seconds later.
@Unaimend 5 months ago
At 7:53, why do you divide sigma^2 by n? This probably has something to do with the CLT. EDIT: It's discussed around minute 9 of the video on the normal distribution.
@very-normal 5 months ago
Right! The CLT implies that the variance of the sample mean is the population variance divided by the sample size. In more technical terms, it's the variance of the test statistic, not of the original data.
@Unaimend 5 months ago
@very-normal I am not sure I understand the last sentence regarding the variance. The CLT says that the distribution of x-bar will converge to N(mu, sigma^2/n), where mu and sigma are the "true" population mean and standard deviation of the X_i. So by "variance of the test statistic", do you mean the variance of the distribution of the X_i's? And thank you for taking the time to answer all my questions.
@very-normal 5 months ago
Ah, sorry for the confusion. In this context, the test statistic itself is x-bar, the sample mean. As you mentioned, sigma-squared divided by n is the variance of the distribution of x-bar (aka the test statistic). The variance of the individual X_i's is the population variance, since they are the data. Since x-bar is calculated from the data, it itself is a random variable and therefore also has a distribution, and thankfully, the CLT tells us what this distribution is. The parameters of this distribution are related to the population parameters, but are not always equal to them (as seen in the variances). I hope this helps clarify! This is a sticking point for even many of the graduate students I work with.
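A minimal simulation sketch of that last point in Python (arbitrary population parameters of my own choosing):

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, n = 10.0, 2.0, 25

# Each X_i has variance sigma^2, while x-bar has variance sigma^2 / n.
samples = rng.normal(mu, sigma, size=(200_000, n))
xbars = samples.mean(axis=1)

print("variance of the individual X_i's:", samples.var())   # close to sigma^2 = 4
print("variance of x-bar:               ", xbars.var())     # close to sigma^2 / n = 0.16
print("sigma^2 / n:                     ", sigma**2 / n)
```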
@Unaimend 5 months ago
@very-normal I think I got it. Thanks again :) I guess I'll find out whether I really understood it when we discuss the two-sample t-test.