p-hacking and power calculations

45,347 views

StatQuest with Josh Starmer

A day ago

Comments: 74
@statquest
@statquest 2 years ago
Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
@lilmoesk899
@lilmoesk899 7 years ago
Thanks so much for your videos. I'm not sure why, but "power" has been one of those concepts in statistics that I haven't ever really been able to get my head around. This video definitely helps!
@esperanzazagal7241
@esperanzazagal7241 4 years ago
I want to give you a hug. Thank you for explaining this.
@statquest
@statquest 4 years ago
Hooray! Thank you very much! NOTE: I have an updated version of this video here: kzbin.info/www/bejne/fnWmgIiOeph7g68
@ricardoveiga007
@ricardoveiga007 7 months ago
Highly educational and entertaining! Thanks :)
@statquest
@statquest 7 months ago
Thanks!
@jackyhuang6034
@jackyhuang6034 4 years ago
Josh, can you do a video on effect size? Thanks
@statquest
@statquest 4 years ago
I'll keep that in mind.
@mariadelujancalcagno
@mariadelujancalcagno 7 years ago
You are a genius at explaining things! Thank you!
@LiquidLithe
@LiquidLithe 6 years ago
Excellent video! I'm attempting to replicate the first part (with the increased false positives) and am having a bit of trouble. Basically, I'm taking n samples of an x and a y from the same random Gaussian (with mean 0 and variance 1), and storing those in lists to access later. I'm then t-testing those n samples, storing all indices with p-values between 0.05 and 0.1, and then using those indices to go back and add one random value (sampled from the same Gaussian) to both the original x and the original y. Is that the right process? I'm finding the proportion of false positives increases, but not by nearly as much as 30%!
@statquest
@statquest 6 years ago
Thanks for double checking my numbers! To be honest, I think that on average the number is closer to being between 7% and 15%. I just re-ran my original R code and got something close to 30%, but with different "seed" values, I see that that value isn't very common. Here's the R code: set.seed(326) mu
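For anyone who wants to try this at home, here is a minimal R sketch of that kind of simulation (a reconstruction, not the original code, which is cut off above; the sample size of 3 per group, the 0.05 to 0.1 window for "almost significant" p-values, and the seed are assumptions taken from the video and this thread):

set.seed(326)

run.test <- function() {
  x <- rnorm(3)   # both samples come from the same N(0, 1) distribution,
  y <- rnorm(3)   # so any p-value < 0.05 here is a false positive
  list(x = x, y = y, p = t.test(x, y)$p.value)
}

experiments <- replicate(1000, run.test(), simplify = FALSE)
p.values <- sapply(experiments, function(e) e$p)
sum(p.values < 0.05)   # false positives before any p-hacking (roughly 5% of 1000)

# p-hacking step: for every "almost significant" test, add one more random
# measurement to each sample and run the t-test again
almost <- which(p.values > 0.05 & p.values < 0.1)
new.p <- sapply(experiments[almost], function(e) {
  t.test(c(e$x, rnorm(1)), c(e$y, rnorm(1)))$p.value
})
mean(new.p < 0.05)   # fraction of the re-tested experiments that now look "significant"

The fraction you get at the end bounces around a lot from seed to seed, which is consistent with the 7% to 30% spread described above.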
@LiquidLithe
@LiquidLithe 6 years ago
Thanks for the code! I see now where my simulation differed from yours (I was doing it in Python). Cheers and keep up the beautiful work!
@alexhaowenwong6122
@alexhaowenwong6122 a year ago
Seems power is akin to resolution on a camera lens: being able to focus well enough to see that two small or distant objects that appear to be one are in fact distinct objects.
@statquest
@statquest a year ago
I like that analogy. Very cool.
@brianp9657
@brianp9657 6 years ago
Thanks again for all the videos! I had a question about effect size. If Effect Size = (mean of experimental group - mean of control) / standard deviation, is there a way to quantify "large" effect sizes in this way vs. "smaller" - or is there some other way to quantify effect size?
@statquest
@statquest 6 years ago
It really depends on what you are studying. In biology, a 2-fold change is considered "large"; in other fields, 2-fold is small. So you need to know the context of the data that you are working with.
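For reference, the standardized version of the formula asked about above (Cohen's d) can be computed in a couple of lines of R. This is a hedged sketch with made-up numbers, and the 0.2 / 0.5 / 0.8 labels are Cohen's rough conventions, which, as noted above, may not suit every field:

control      <- c(24.1, 25.3, 23.8, 26.0, 24.7)   # made-up control weights
experimental <- c(27.2, 28.1, 26.5, 29.0, 27.8)   # made-up treated weights

pooled.sd <- sqrt((var(control) + var(experimental)) / 2)   # equal group sizes

d <- (mean(experimental) - mean(control)) / pooled.sd
d   # roughly 0.2 is often called "small", 0.5 "medium", 0.8 "large"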
@safecomiguel
@safecomiguel 6 years ago
Josh, these are amazing! Thank you! Idea for a future StatQuest: how to build a predictive model for a rare event. I have seen different approaches, such as over-sampling the minority class, under-sampling the majority class, incorporating the prior probabilities to adjust the y-intercept due to the over- or under-sampling, and others such as SMOTE, gradient boosting, and other modeling techniques. Heck, just an explanation of why the y-intercept needs to be adjusted when over- or under-sampling would be a great video.
@tanzine91
@tanzine91 5 years ago
Hi Josh, I need some help understanding "(1) Use your results as preliminary data and do a proper power calc", and maybe also (2). What I understood from your vid was: we can say a t-test with sample size = N at 5% alpha works as intended if, when we perform many t-tests, say 1000 times on bootstrapped samples from the population, 5% of the time there will be p-values below 0.05.
@statquest
@statquest 5 years ago
I believe you are correct. If you are expecting a difference but do not detect one (the p-value is not small enough), use the current data as preliminary data for a power analysis. This will determine the proper sample size, 'N'. Now re-do the experiment collecting 'N' samples and see what you get.
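As a hedged sketch of what that power calculation might look like in R (the preliminary numbers below are invented; power.t.test() is in base R's stats package):

prelim.control <- c(24.1, 25.3, 23.8)   # made-up preliminary weights
prelim.diet    <- c(26.0, 27.2, 25.5)

est.delta <- mean(prelim.diet) - mean(prelim.control)             # estimated difference in means
est.sd    <- sqrt((var(prelim.control) + var(prelim.diet)) / 2)   # rough pooled standard deviation

# how many mice per group for an 80% chance of p < 0.05 if the difference is real?
power.t.test(delta = est.delta, sd = est.sd,
             sig.level = 0.05, power = 0.8,
             type = "two.sample")$n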
@TheSpades21
@TheSpades21 4 years ago
Copied and pasted from Wikipedia on the power of a test: "For a type II error probability of β, the corresponding statistical power is 1 − β. For example, if experiment E has a statistical power of 0.7, and experiment F has a statistical power of 0.95, then there is a stronger probability that experiment E had a type II error than experiment F. This reduces experiment E's sensitivity to detect significant effects. However, experiment E is consequently more reliable than experiment F due to its lower probability of a type I error." Is this not saying that the test with the lower power inherently has a lower probability of a type I error and thus a lower p-value? Isn't this also saying that power = the probability that you will get a high p-value (the opposite of what Josh says in this video)? Please help me understand how I am wrong.
@statquest
@statquest 4 years ago
No, that's not saying that low power = low p-value. To see how alpha and beta are related, see: www.theanalysisfactor.com/confusing-statistical-terms-1-alpha-and-beta/#:~:text=%CE%B1%20(Alpha)%20is%20the%20probability,1%20%E2%80%93%20%CE%B2%20is%20power).
@TheSpades21
@TheSpades21 4 years ago
@@statquest Thank you for this great resource! I still do not understand, however, how my previous assertion that high power = high alpha = high p-value is wrong. I will go through my reasoning; could you please tell me where it is faulty? Regarding the hypothesis testing graph (in the resource you sent me), as you decrease the threshold for rejecting the null hypothesis (H0), you are simultaneously decreasing beta but increasing alpha. So doesn't increasing power by decreasing beta inherently increase alpha? Also, since alpha is related to the p-value, doesn't increasing alpha also increase the p-value? So, by the transitive property, doesn't increasing power = increasing the p-value? Thanks for your help!
@statquest
@statquest 4 years ago
@@TheSpades21 Alpha is simply the threshold that we use to decide if our p-value is small enough to reject the null hypothesis. In other words, p-values are calculated independently of alpha. So whatever you set alpha to has no effect on the p-value. If I get some data and do a t-test and calculate a p-value = 0.00001, I get that value regardless of whether alpha = 0.05, 0.01 or 0.9.
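A tiny R illustration of that point (the data is made up): the p-value comes out of the t-test itself, and alpha only shows up when you compare the two.

set.seed(42)
x <- rnorm(10, mean = 0)
y <- rnorm(10, mean = 3)

p <- t.test(x, y)$p.value   # computed without any reference to alpha
p

p < 0.05   # decision if alpha = 0.05
p < 0.01   # decision if alpha = 0.01 (same p-value, different threshold)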
@TheSpades21
@TheSpades21 4 years ago
@@statquest Got it, so alpha is a threshold set by the researchers. So researchers are able to increase power by 2 methods: 1) increasing the alpha threshold, or 2) increasing the sample size, decreasing variation, or increasing the effect size. This leads me to a follow-up question: are there any ways to increase the probability of correctly accepting the null hypothesis, aka (1 - alpha), and does this affect the p-value?
@statquest
@statquest 4 years ago
@@TheSpades21 If the null is true, then you can decrease alpha.
@himals1277
@himals1277 6 years ago
What do you mean when you say you added "bogus data"? Where did you get it from??
@statquest
@statquest 6 years ago
The "totally bogus" data are the samples that are all from the same type of mouse. In the example, we have an inbred mouse strain; I weighed 3 of those mice and called them "sample #1", and then I weighed another 3 mice and called them "sample #2". The t-test tests whether both samples come from the same distribution (or, in this case, the same mouse strain). If I get a small p-value, then the t-test thinks the two samples came from 2 different distributions. So all of the data in this example comes from the same distribution (the weights of mice that are all the same type). I call this "bogus data" because we already know what the t-test should do, and the t-test should conclude "there is no difference between the two samples". The trick is that even when we know the answer, and the answer is "there is no difference between the two samples", the t-test can fail and conclude the opposite. The example here shows how the t-test can be forced to fail if you just keep adding additional measurements to datasets that give you "almost significant" p-values. The data is "bogus" because I know that the mice are all from the same strain. Does this make sense?
@himals1277
@himals1277 6 years ago
Yes, a lot clearer now. Thanks!
@tanzine91
@tanzine91 5 years ago
@@statquest Hi, even when you add bogus data to the 0.05 < p < 0.1 datasets and they all become "significant", the total % of significant p-values is only around 10% (the sum of the first and second bar frequencies divided by 1000). How did you get 30%? Thanks.
@jorgemedina8377
@jorgemedina8377 5 years ago
Thank you for the video. I had a question: How does one determine the power of a test relative to some other test? If we are deciding what test to use, how can we determine which test is the most powerful one?
@ah2522
@ah2522 4 years ago
The power (aptly named) is basically your ability to reject the null hypothesis when you SHOULD reject the null hypothesis. Framing your tests in that context should answer your question, regardless of what test you're using.
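To make that concrete, here is a short R sketch (the effect size of 2, SD of 1, and 5 measurements per group are arbitrary choices): simulate lots of experiments where the null hypothesis really is false and count how often the test rejects it.

set.seed(1)
rejected <- replicate(1000, {
  control <- rnorm(5, mean = 0, sd = 1)
  treated <- rnorm(5, mean = 2, sd = 1)   # the null hypothesis is truly false here
  t.test(control, treated)$p.value < 0.05
})
mean(rejected)   # the fraction of rejections estimates the power of the test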
@miguel.gargicevich
@miguel.gargicevich 4 years ago
At 3:06, where you show all the p-values of the different sets, you mention the occasional "false positive". Don't you want to have p-values smaller than 0.05?
@statquest
@statquest 4 years ago
We only want p-values less than 0.05 if there is a true difference. If there is no difference, then a p-value < 0.05 will be a false positive. I have a new video coming out soon that will illustrate this concept better.
@miguel.gargicevich
@miguel.gargicevich 4 years ago
@@statquest If there is a true difference between the different sets? Great, looking forward to the video.
@statquest
@statquest 4 years ago
@@miguel.gargicevich If there is a true difference, then we want a small p-value.
@miguel.gargicevich
@miguel.gargicevich 4 years ago
@@statquest I see now!! Thank you!
@chauphamminh1121
@chauphamminh1121 6 years ago
Great video! Thanks so much for the tutorial. So understandable. But I still have a problem with the last part. If increasing the sample size helps to distinguish the 2 distributions, how big should the sample size be? Let's re-use your example with diet mice and non-diet ones. First, I draw 2 sample sets (weigh some mice from each group), calculate the SD, and write down the size N for each sample. With the formula SE = SD / sqrt(N), I can estimate the SE for each sample. So what's next? I'm not sure how we can choose an appropriate N, because intuitively you could just keep raising the size to get a better result, right? Hoping you can help me out. Thanks in advance! Btw, keep up the good work.
@statquest
@statquest 6 years ago
Essentially, once you know the approximate location of the means and the approximate standard deviations, you can increase "N" (the sample size) to see what effect that has on the standard error (you can see how much it shrinks) and then know when you have N large enough to do the experiment. However, I don't recommend doing these calculations by hand. Instead, I would plug the means and the standard deviations into a program that calculates power, and then increase N until there is a good chance that my data will get a small p-value if there really is a difference between the two samples. Does that make sense?
@chauphamminh1121
@chauphamminh1121 6 years ago
@@statquest Thank you for the reply. I get the idea, but the part "increase N until there is a good chance that the data will get a small p-value" still confuses me. Is it that if I keep increasing the size, the result will be more accurate? How can we pin down N by watching how the SE changes? I mean, how can we detect a "good chance" to pick N, or is it just a hunch? Thanks!
@statquest
@statquest 6 years ago
@@chauphamminh1121 Yes, the more you increase the sample size, the more precise your estimate for the mean will be. When your sample size is small, your estimate for the means is not very precise, and you cannot be confident that one mean is different from another. However, when the sample size is large, your estimate can be very precise. With a precise estimate for the means, you can have more confidence that they are different. So, you use a program to determine how much confidence you can have that the two means are different for different values of N, and then pick the value of N that gives you enough confidence that the means are different. Here, the "confidence" is tied to the p-value: the program will tell you how likely you are to get a small p-value for different values of N (assuming there really is a difference). Choose a value for N that gives you a good chance of a p-value less than 0.05. Does that make sense?
@chauphamminh1121
@chauphamminh1121 6 years ago
@@statquest Thanks a lot. It makes sense now. Thanks!!
@statquest
@statquest 6 years ago
@@chauphamminh1121 Hooray! :)
@beckwilde
@beckwilde 6 years ago
great explanation
@luisluiscunha
@luisluiscunha 6 years ago
Chorus... Stat...Quest...Stat...Quest ahahahahah Thanks man!
@statquest
@statquest 6 years ago
You're binging! Awesome!!! :)
@SoupCannot
@SoupCannot 4 years ago
I think this video has the potential to confuse people about the concept of effect size. In some contexts, effect size can just be the difference between means, as it is presented here. In my field, it's much more common to talk about effect size as Cohen's d, which takes into account the variation in the data by having the standard deviation in the denominator; so "effect size" and "variation in the data" aren't necessarily two separate concepts, depending on how effect size is defined.
@statquest
@statquest 4 years ago
What's your field?
@SoupCannot
@SoupCannot 4 years ago
@@statquest Neuro-imaging. Overall, love your videos, by the way. Take home message at the end is great for scientific integrity.
@statquest
@statquest 4 years ago
@@SoupCannot Very cool! I'll keep that in mind. I'm working on a new "p-hacking" video that I hope comes out in the next month.
@ltoco4415
@ltoco4415 4 years ago
@@statquest Eagerly waiting for the new video on p-hacking. Will that cover power as well?
@statquest
@statquest 4 years ago
@@ltoco4415 I've split this content into 2 videos so I can cover a little more ground (and hopefully do it in a way that makes more sense).
@coolsonic8982
@coolsonic8982 6 years ago
Great video
@statquest
@statquest 6 years ago
Thank you! :)
@pulutogo8266
@pulutogo8266 4 years ago
How did you calculate the 30%?
@statquest
@statquest 4 years ago
What time point, minutes and seconds, are you referring to?
@pulutogo8266
@pulutogo8266 4 years ago
@@statquest At around 4 minutes, you said that when you add a new value to data with a p-value between 0.05 and 0.1, the new t-test on that data is more likely to be significant, at a rate of about 30%. Is that a simulated result? If not, could you please explain how you got this value? Thank you so much for all the videos, you are really a great teacher.
@statquest
@statquest 4 years ago
@@pulutogo8266 At 2:41 I generated 1,000 datasets from a normal distribution and used them to do 1,000 t-tests and create 1,000 p-values. Of those 1,000, 53 were < 0.05. Then, for all of the t-tests that gave me a p-value between 0.05 and 0.1, I added one more random value to each sample and re-did the t-test. 30% of the new t-tests gave me p-values < 0.05. Does that make sense?
@pulutogo8266
@pulutogo8266 4 years ago
@@statquest Yes, it makes sense, thank you for answering the question. Have a nice day.
@MDMAx
@MDMAx 2 years ago
11:21 "to the one true mean" :D
@statquest
@statquest 2 years ago
:)
@MDMAx
@MDMAx 2 years ago
Lol, just do more experiments. Very informative and descriptive lesson. Ty!
@statquest
@statquest 2 years ago
Thanks!
@abhaydadhwal1521
@abhaydadhwal1521 6 years ago
What is a good p-value, Josh...?
@statquest
@statquest 6 years ago
It depends on a lot of things. People generally think anything less than 0.05 is fine. However, for me, I would be cautious with a p-value that is very close to 0.05 (like 0.04). But it largely depends on the field you are working in.
@stoic-999
@stoic-999 3 years ago
Hi guys, have any of you gone through the probability MOOCs by NN Taleb, where he critiques p-values and presents very good content on p-value hacking?
@statquest
@statquest 3 years ago
Not me. However, I've since updated this video with a bunch of new ones. You can find a complete list of my videos here: statquest.org/video-index/
@abhaydadhwal1521
@abhaydadhwal1521 6 years ago
What's this t-test that you're mentioning again and again, Josh...?
@statquest
@statquest 6 years ago
A t-test is a statistical test that is used to determine if two samples come from the same overall population or from two different populations. For example, you might have a weight loss drug and want to compare people who took the drug to people who didn't. You could use a t-test to determine if there is a fundamental difference between the population that took the drug and those that didn't.
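A bare-bones R sketch of that drug example (the weights are invented, just to show what the test looks like in code):

placebo <- c(82.1, 79.5, 85.3, 80.8, 83.0)   # made-up weights for the control group
drug    <- c(74.6, 77.2, 72.9, 75.8, 76.4)   # made-up weights for the drug group

t.test(drug, placebo)   # a small p-value suggests the two samples come from
                        # different populations (i.e., the drug made a difference)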
@chrismalone8470
@chrismalone8470 4 years ago
You forgot to explain p-hacking.
@statquest
@statquest 4 years ago
I've actually updated this video. See: kzbin.info/www/bejne/fnWmgIiOeph7g68
@임선생-m9s
@임선생-m9s 2 years ago
Ugh!
@statquest
@statquest 2 years ago
:)