Why the p-Value fell from Grace: A Deep Dive into Statistical Significance

136,131 views

DATAtab 1 day ago

Comments: 412
@datatab 1 month ago
If you like, please find our e-Book here: datatab.net/statistics-book
@airman122469 2 months ago
Why the p-value fell from grace: “because people abused the crap out of it to state flatly false shit”
@anthonyandrade5851 2 months ago
Well put!! I wrote "p-value is useless, because 1) people use it as a bad answer to the wrong question; 2) people think it's magic; and 3) people are unethical". But forget about it, you've nailed it!
@vecernicek2 2 months ago
Pretty much. I don't like this way of thinking - people don't understand p-values and how to interpret them, therefore we must do away with p-values. No, just freaking learn to use your brain and how to interpret data!
@graciasthanks4771 2 months ago
You may be right. My personal opinion, however, is that the main problem is improper use by unethical or incompetent people (or both). Proper use of the p-value is highly valuable (pun intended) for certain problems. Thus, the best solution IMHO is enforcement of proper use (which unfortunately journals don't seem to care for) combined with evaluation of alternate and complementary analyses when practical.
@brentmabey3181 2 months ago
Goodhart's Law (originally an economics thing, but I think it applies to the magic p-value).
@DaveE99 2 months ago
@@brentmabey3181 Kinda like how any stable biological or psychosocial aspect of life will be used to maintain and advance the system of contained opposition within larger counter-insurgency tactics, so that a system can maintain power. It's like a sort of four-dimensional chess figuring this out, as reasons for things can exist on different levels, but the higher levels always build on top of what already existed and simply amplify it.
@larsbitsch-larsen6988 2 months ago
The problem is not the p-value; it is understanding that statistics is only a tool based on math, not a medical conclusion. We assign numbers to the project and then assume that the calculation is the answer. But the calculation is only the answer to the numbers we have assigned, and that is not necessarily the medical question. Assigning numbers is based on assumptions; sometimes they are good assumptions, other times they are completely wrong, which means, basically, garbage in, garbage out.
@martian8987 1 month ago
Not to mention that people were using statistical tests incorrectly, without understanding their data, just doing a one-tailed test or whatever; there are criteria!
@mytech6779 1 month ago
The art of good judgment is a necessary skill.
@RichiefromPhilly 2 months ago
I’m a PhD and I’ve worked in the pharmaceutical industry for 25 years. It never ceases to amaze me that physicians lack a basic understanding of statistics.
@leolucas1980 2 months ago
Physicists also usually don't know much about inferential statistics. They just look at how well a theoretical curve fits the experimental data.
@KarlFredrik 2 months ago
Even worse in social sciences. But then their expertise isn't in mathematics. Think it's good to scrap it since it's pretty much used to give a veneer of being scientific while it's often really not.
@fgg3841 2 months ago
@@KarlFredrik I was about to say the same. It's painfully obvious when reading published studies from the social sciences that most social science researchers don't understand statistics. You'd think they'd swallow their pride and ask someone from the statistics department for advice. But no, they're the "experts." Makes me question what else the "experts" are blind to. But don't question their expertise or you're "anti-science".
@tj7935 2 months ago
Microbiologist here. Same! I'm blown away by how often I ask someone why they used the statistical test they used, and their answer is "it's what my advisor wanted" or "it's what I've used historically". That's not how you're supposed to choose your statistical test.
@TheThreatenedSwan 2 months ago
Most academics don't understand basic statistics.
@ComMed101 2 months ago
Fantastic video. Your clarity of statistical concepts is only bested by your rare capability of simplifying complicated concepts into lucid chunk-sized explanations. Thank you for your amazing work!
@datatab 2 months ago
Hi! Thank you very much for the nice comment! Yes, we make a lot of effort to explain everything as simply as possible, but it's also a hell of a lot of work.
@silver6054 2 months ago
I think some of the bias against the p-value, especially perhaps in the social sciences, has come from the prevalence of p-hacking. This, as you mentioned, is basically trolling through the data, finding something with a "good" p-value, and then testing the appropriate null hypothesis (which, by construction, you can then reject!). While this is, as you said, a bad scientific approach, I assume the most honest among them think it's OK because "there really is something significant there, see!" without realizing that it breaks the underlying model (when there is no real effect, 5% of tests will still reach p = 0.05 by random chance). Your solution to some of the ills, replication studies, is a problem in some areas, as in academia there is often little reward for repeating a study, and if it does find the same result as the original, it may be hard to publish ("We already knew that"). Perhaps for larger-scale things, like drug treatments, this is less of an issue, but even there a company is more likely to test their "me too" drug to show its effectiveness rather than repeat studies of older treatments (in case they turn out to show the old is perfectly effective!).
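A minimal sketch of that multiple-testing trap (assuming Python with numpy/scipy; the 20-outcomes-per-study setup and group sizes are invented for illustration):

```python
# Pure noise: every null hypothesis is true, yet most "studies" that
# measure 20 outcomes find at least one p < 0.05 by chance alone.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_studies, n_outcomes, n = 1000, 20, 50
false_hits = 0
for _ in range(n_studies):
    # two groups drawn from the SAME distribution, 20 outcomes each
    a = rng.normal(0, 1, size=(n_outcomes, n))
    b = rng.normal(0, 1, size=(n_outcomes, n))
    pvals = stats.ttest_ind(a, b, axis=1).pvalue
    if (pvals < 0.05).any():
        false_hits += 1

print(f"studies with >=1 'significant' outcome: {false_hits / n_studies:.0%}")
# expected around 1 - 0.95**20, i.e. roughly 64%
```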
@datatab 2 months ago
Thank you very much for your detailed feedback! I completely agree with you! Difficult topic, but the p-value itself is probably not to blame for that. We will try to address some of the solutions in a follow-up video, but part of it is also replication. Thanks and regards, Hannah
@AndrewBlucher 2 months ago
The failure of many "sciences" to replicate studies is a basic one, calling into question any claim to be science at all. In my view the root cause is the whole science-journal system, where publishing houses publish for profit work that has already been paid for and done by academics. They don't publish confirmation reports, nor do they publish reports where "we tested this idea but it didn't work". It's not just the social sciences; any field with "science" in its name is also suspect. Computer Science, for example. As rife with fraud as any "soft" science.
@drblukerjr1953 2 months ago
The other issue is this: with Big Data and its huge number of observations, many, many relationships can be seen as statistically significant (not attributable to random chance) when those relationships are merely artifacts of sample (or population) size. The issue of p-value harvesting is also in play here, but no one with any professional ethics and statistical integrity would accept such findings outside of a rigorous multivariate analysis that reveals the magnitude, i.e., strength, of the presumed or indicated relationship, for instance by using a simple regression coefficient.
@kodowdus 2 months ago
The solution to multiple comparisons was laid out back in 1991 in the fledgling journal Epidemiology in an editorial by the prominent methodologist Charles Poole entitled "Multiple comparisons? No problem!".
@fgg3841 2 months ago
@@kodowdus I'm fairly certain that there were known solutions to the problem of multiple comparisons decades before 1991. This is the problem of academia being so insular.
@jonathanlivengood767 2 months ago
Another reason I've seen for abandoning the p-value (especially from the Bayesian crowd) is that p-values don't answer typical research questions. Critics say that researchers (implicitly) want to know the probability of their hypothesis given their data. That is, they want the posterior probability of their hypothesis. But p-values are not posterior probabilities. The p-value gives you the probability of data at least as extreme as those observed conditional on the hypothesis. That looks backwards relative to the goal of figuring out the probability of the hypothesis. The criticism is related to the misinterpretation problem: If a researcher wants to know the probability of their hypothesis, they may be more likely to misinterpret the p-value as a posterior probability.
@ArgumentumAdHominem 2 months ago
The problem is, though, that the original question you state cannot be answered at all, by either frequentist or Bayesian approaches. To have a posterior, one must have a prior. But there is no way in the universe of knowing a prior on a scientific hypothesis. Heck, something as established as the universal law of gravity was proven wrong by Einstein's theory of relativity. The data collected over thousands of years of observation was right, the fit was excellent, but the hypothesis was still wrong. P-values are great for quick-and-dirty identification of potential candidates for effects in a giant pile of data. They tell you that something is off and might be interesting, but don't tell you what exactly. Bayesian methods are great at delving deeper and infusing prior knowledge about the functioning of the world into a causal model. P-values are thus better for exploratory analysis, and Bayesian methods for confirmatory, IMHO, as our ultimate goal is to get to a causal model of how something works, not whether it works. But expecting poor biologists to do Bayesian modeling for every experiment is excessive, as there is a significant overhead in complexity.
@mtaur4113 2 months ago
​@@ArgumentumAdHominem This. The most vocal Bayesian supremacists always strike me as way overconfident about having solved an immortal problem that will always be with us. The p-value answers a specific hypothetical question no one was really interested in. Bayes gives a fake answer to an ill-formed question we *only wish* we could ask and answer. Choose your poison. Heck, a lot of the time, if your prior probability is assigned 0.5, Bayes and the p-value are *identical*. The interpretation is heuristic at best, with the double-edged convenience and danger that comes with it.
@DaveE99 2 months ago
@@ArgumentumAdHominem What sort of Bayesian overhead are you talking about?
@ArgumentumAdHominem 2 months ago
@@DaveE99 Doing a t-test requires somewhat less skill and time than designing a DAG model, fitting it and interpreting results.
@littlepigism 2 months ago
"The p-value does not answer a scientific question" is the most "scientific gatekeeping" statement I have read in a long time.
@marksegall9766 2 months ago
Good video. Some additional background: the false positive (alpha, or Type I error) and false negative (beta, or Type II error) must always be kept in mind while doing hypothesis testing. No matter which decision we make, there is always a chance we make one of these errors.

The 0.05 is the alpha value. Alpha is the false positive rate: the probability that the sample statistic lands in the most extreme 5% of the distribution by chance. We must always decide on an alpha BEFORE running the experiment; otherwise we run the risk of p-hacking (picking an alpha that makes the results significant). The false positive is the 'boy who cried wolf' error: we say there is an effect when there really isn't.

To avoid false positives we can lower the alpha, say to 0.01 or 0.001. But the problem is that by lowering the alpha, we increase the chance of a false negative. A false negative is when we say the drug has no effect on weight loss when it does have an effect; we miss the chance of finding a useful drug to treat serious diseases.

Replication is the part of the modern scientific method that helps decrease the probability of making both false positives and false negatives. After reading the comments: a meta-analysis is a systematic summary of all the replications done in a field. There is a difference between 'statistical significance' and 'business significance'. Strong inference, the design of experiments to test two competing theories, is the best way to make sure your experiments are scientific. en.wikipedia.org/wiki/Strong_inference
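A small sketch of that alpha/beta trade-off (assuming Python with statsmodels; the effect size d = 0.5 and n = 50 per group are invented illustration values):

```python
# As alpha is lowered to protect against false positives, power drops
# and beta (the false negative rate) rises, for a fixed effect and n.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for alpha in (0.05, 0.01, 0.001):
    power = analysis.power(effect_size=0.5, nobs1=50, alpha=alpha)
    print(f"alpha={alpha:<6} power={power:.2f} beta={1 - power:.2f}")
```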
@datatab 2 months ago
Many many thanks for your additional comments! Regards Hannah
@custos3249 2 months ago
It's ok. You can call out psychology by name. No need to throw extra words at it.
@02052645 2 months ago
I generally align with Bayesian statistics. My feeling about p-values is that they should not be used to draw conclusions in hypothesis tests, but they are useful in getting a gut feel for the implausibility of data given some hypothesis. I find them useful in evaluating my priors: "My intuition tells me that a p-Value of 1% corresponds to a posterior probability of 75% for the alternative hypothesis so I should select a prior for which this is the case." They're also useful as a quick gut check: if your p-Value is so small you need to use scientific notation to express it you know ahead of time more or less what a full Bayesian analysis will conclude.
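One way to make that gut check concrete is the Sellke-Bayarri-Berger minimum-Bayes-factor calibration; a sketch assuming a 50/50 prior (the bound -e·p·ln(p) holds for p < 1/e):

```python
# Best-case posterior probability of the alternative implied by a p-value,
# under a 50/50 prior: the Bayes factor for H0 is at least -e * p * ln(p).
import math

def max_posterior_h1(p, prior_h1=0.5):
    assert p < 1 / math.e                 # calibration only valid below 1/e
    min_bf_h0 = -math.e * p * math.log(p)
    prior_odds = prior_h1 / (1 - prior_h1)
    post_odds = prior_odds / min_bf_h0    # maximum posterior odds for H1
    return post_odds / (1 + post_odds)

for p in (0.05, 0.01, 0.001):
    print(f"p = {p}: P(H1 | data) <= {max_posterior_h1(p):.2f}")
# p = 0.05 caps the posterior for H1 near 0.71 -- far from "95% sure".
```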
@S.Qamar113 2 months ago
I really love your clear and straightforward explanations with examples. This is the first time I've truly understood the p-value. Kudos to you!
@datatab 2 months ago
Glad it was helpful! Many thanks for your nice feedback! Regards Hannah
@holmavik6756 2 months ago
P-values remain highly relevant but the ”magical” critical values 0.10, 0.05 and 0.01 should only be used in very specific cases, and not as some kind of universal measures of what is ”relevant” or not
@ihorb7346 2 months ago
So, if there is no decision threshold, what is the p-value relevant for?
@holmavik6756 2 months ago
@@ihorb7346 to quantify the extent to which the data agree with the null hypothesis
@mattzobian 2 months ago
P-values are based on a lot of assumptions. Abusing, misusing, or ignoring these assumptions is where we most often trip up and potentially abuse the concept.
@rogerwitte 2 months ago
Thanks for this. I think there is at least one additional thing wrong with the way we treat p-values: namely, we think 1 in 20 is a small probability. This is, in part, because cognitively we cannot intuit probabilities that are close to 0 or 1. There is a large difference between probabilities of 1 in a hundred and 1 in a million, but we just think both probabilities are small.
@toddcoolbaugh9978 2 months ago
I just commented along a similar line. In the physical sciences it's not unheard of to achieve p-values of .01 or lower. So I get very suspicious of research using "physical science" tools (e.g., GC-MS) showing p-values sometimes larger than 0.2 (a 1-in-5 chance of a false rejection of the null hypothesis) and claiming "strong" evidence for their alternative hypothesis.
@davidlean8674 2 months ago
1. Nice Summary of p-value. Good for people learning stats. 2. "If the facts don't conform to the theory they must be disposed of"
@HughCStevenson1 2 months ago
Coming from an engineering background I've often found 0.05 to be quite a high probability! Certainly just an indication...
@graham5250 2 months ago
Thank you! A comment on the 0.05 threshold pointed out by @HughCStevenson1 - 50 years ago, I was taught that it developed, along with much of modern statistical methodology, from the study of agricultural fertilizers; a farmer would accept one failure in their time in charge of the farm, typically 20 years, hence 1/20 chance of the observed benefit being by chance. With drugs, it strikes me that using a higher value for beneficial effects (particularly of a low cost treatment) would make sense, BUT a lower value for harmful effects. Would statins have been more or less widely prescribed if this had been done?
@gcewing 1 month ago
Yeah... Would you trust a bridge if the engineer that designed it said "It'll be fine, it only has a 5% chance of falling down."
@therealjezzyc6209 1 month ago
but 0.05 sounds small, until you realize it is literally just 1/20. That is actually a crazy high number at massive scales
@guilhermepimenta_prodabel 2 months ago
There are two main considerations when using the p-test. First, the samples must follow a normal distribution, and second, the samples should have equal variances. To address these, start by performing a normality test, such as the Shapiro-Wilk test. Next, conduct a test for homogeneity of variances, like the F-test. If both conditions are met, you can proceed with the t-test. However, in my experience, many samples fail the normality test, necessitating the use of a non-parametric test. Non-parametric tests are generally more robust. Even if all three tests are passed, there's still a risk of being misled due to multiple testing. To mitigate this, it's important to adjust the p-value threshold each time you conduct an additional test. The problem is not in the test itself, but in the need for more robust scientific methodology.
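A sketch of that decision flow in Python with scipy (the data are invented, and Levene's test stands in for the F-test as a common robust choice — an assumption rather than the commenter's exact recipe):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(10.0, 2.0, 40)   # illustrative samples
b = rng.normal(11.0, 2.0, 40)

normal = all(stats.shapiro(x).pvalue > 0.05 for x in (a, b))
equal_var = stats.levene(a, b).pvalue > 0.05   # variance homogeneity check

if normal:
    # Student's t-test if variances look equal, Welch's t-test otherwise
    result = stats.ttest_ind(a, b, equal_var=equal_var)
else:
    result = stats.mannwhitneyu(a, b)          # non-parametric fallback
print(result)
```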
@johncampbell388 1 month ago
I think you meant to say t-test, not p-test, right?
@necaro 2 months ago
This is pure gold!! The clarity of the explanation is outstanding! Congratulations Prof.
@m.zillch3841 1 month ago
P
@dr.gordontaub1702 2 months ago
Wonderful video. Very well explained. My 'Engineering Statistics' course starts in a couple of weeks (my ninth year teaching this class) and I will post a link to this video on the course canvas site when we get to hypothesis testing as an excellent explanation of the meaning of the p-value and its strength and weaknesses.
@key_coffee 2 months ago
Thank you for another engaging and powerful video. When I first read "p-value is not scientific", I was pre-empting discussion of how significance levels themselves are arbitrary and seemingly without rationale. How did we come to decide that a significant event can be observed by chance once in only 20 random samples? (when considering p < 0.05 as significant; ditto for 10% or 1% significance levels). This is grounded in the central limit theorem, but whether expressed in terms of 5% or 1.96SD, the thresholds seem more convenient than scientific. Nevertheless, these standards are important for the universal interpretation and continuation of research and I'm glad for it, though directly interpreting the p-value as a probability may help to meaningfully discuss confidence in a result regardless of where it sits relative to the significance level. Looking forward to the next!
@MAamir-m2c 2 months ago
Fantastic insights. Great job done. With these debates, we expand beyond simple interpretations.
@marcydoyle9279 2 months ago
Excellent explanation. We were always taught that a p-value was only used for an initial study, for supporting or not supporting further studies, because it is not accurate enough on its own to come to any satisfactory conclusion.
@datatab 2 months ago
Thank you! That's a great point. A p-value should indeed be seen as just one piece of the puzzle in research. Many thanks for your feedback! Regards Hannah
@ArgumentumAdHominem 2 months ago
This depends on the aim of the study. If the question is: whether something has an effect, then p-values are great. If the question is: why something has an effect, one needs more powerful machinery.
@berdie_a 2 months ago
Just wanna share my thoughts on this:
#1 "Rejecting the null hypothesis means that the alternative is probably true" is not an inaccurate description. Hypotheses are formulated such that they partition the parameter space. This means that once you reject H0, the only option left would be Ha. A more accurate description of the trueness of the alternative hypothesis would also consider the power of the test involved. So though the p-value alone does not paint a full picture, it's not entirely wrong to say that the data is in support of the alternative.
#3 Variability is also accounted for in these tests. Highly variable data will yield less significant results, since the sampling distribution (and in turn the test-statistic distribution) they produce would be flatter, resulting in a much smaller critical region (or even a wider acceptance region). Sample size is also accounted for in these statistical tests. The standard error of an estimator is a function of the sample size n. If n is small then the s.e. is large, yielding more nonsignificant test-statistic values.
I guess the issue is p-hacking, and due diligence or lack thereof.
@georgehugh3455 2 months ago
I'm glad you put out #1 so well. #3 is true also. but it depends on what these naysayers are proposing to actually value after they trash out good ol' p
@gcewing 1 month ago
One needs to be careful about what Ha actually is, though. The Ha says that the observed difference was not due to chance. It does NOT say that the difference was caused by the drug you're testing!
@berdie_a 1 month ago
@@gcewing Mathematically, the Ha is simply a partition of the parameter space. As for "The Ha says that the observed difference was not due to chance. It does NOT say that the difference was caused by the drug you're testing": you are right, the rejection of H0 means that the observed difference is unlikely under H0. Mathematically, none of these tests concludes causation. This is the purpose of the literature. Statistical tests and experimental designs are simply powerful tools that help verify this.
@hedleypanama 2 months ago
#HoldIt Sir Austin Bradford Hill stated several arguments in favor of causality. Among them, eight have a refutation each. Only one argument can be considered a "criterion", because it lacks a refutation: consistency! Consistency means that several or more of the studies point in the same direction. This is in the same line as the main argument of the refutation of p-values. It was written in the 1960s.
@darkwingscooter9637 2 months ago
p-values have been critiqued as unscientific since inception. It's amazing how it came to be used as a modern day oracle.
@KarlaKloppstock 2 months ago
Great video. One of the underlying issues may be that non-statisticians who depend on statistical results have, understandably, a very hard time grasping and accepting what these values actually tell them. In other words, as people want to know "the truth", it is already difficult to accept the notion of likelihood, let alone its twisted sister: an approximation of such a likelihood expressed in terms of the likelihood of the opposite not being true. From a practical standpoint it must be so confusing. All you want is a yes or no; maybe you still accept an "80%, perhaps", but a "high confidence that there is a high probability that the opposite is not true" must be so confusing. Your video should be taught in every school, because the basic concepts are what most people struggle with.
@caty863 2 months ago
*"high confidence that there is a high probability that the opposite is not true"* That must be the most convoluted sentence I've ever seen. If this is the language that you folks have to deal with on a daily basis, then I am sorry for you guys!
@mcyte314 2 months ago
Thanks for the clear explanation. This is basically the 101 of statistics, yet many researchers have no clue about it. Notably, I was NOT taught this most important insight in my biostatistics course, but only by my thesis supervisor, who had read Intuitive Biostatistics by Harvey Motulsky and therefore knew it.
@IvanovBR 2 months ago
Thank you for putting this in such a simple and clear way. My statistics professors should have learnt from you!
@HemantNagwekar 2 months ago
Thank you for clarifying the meaning of the p-value. Data makes sense when shared with a data story. Using just a p-value for a decision is like missing the context of an outcome.
@Felipe_2097 2 months ago
I just found your channel and I really appreciate all the information being well organized in the description: a quick note on what the video is about, plus references.
@teodorocriscione4399 2 months ago
The assumption about the distribution of a population characteristic can also be a misleading factor. For example, in social phenomena we rarely observe normal distributions. I think this is an important piece of information to add to the critiques of the p-value, especially when we are trying to compare results from different studies.
@eduardosuela7291 1 month ago
Averages of whatever random variable tend to be normal, because they are additions. Variables that come from many multiplicative random factors tend to be lognormal. Many times we don't focus on the underlying original value, but on a composite, like an average. Makes sense?
@justinstephenson9360 1 month ago
Great instructional video. I particularly enjoyed the explanation that a low p-value merely indicates that the null hypothesis is unlikely to be true, but that it does not say anything about the alternative hypothesis. I see this all the time in "independent" studies of traffic data. It does not matter whether the study is for low-traffic neighbourhoods, cycling lanes, pedestrianisation, or even road widening and the like to improve vehicular traffic flow; the same basic flaws appear (do not get me started on the assumptions the researchers use to even get to a significant p-value): if the p-value is low enough, the study concludes not that the null hypothesis is untrue but that their alternative hypothesis must be true.
@doctorg2571 2 months ago
Thank you for your brilliant explanation. The banning of the use of p-values by the psychology journal you mentioned suggests to me that (a) the academics who publish in it are not competent in statistics and (b) the journal's peer reviewers aren't either. I recommend that competent researchers avoid submitting articles to such journals. As an aside, I have encountered more weakness in stats competence within the psychology discipline than any other. Psychology had better watch out: it is on the cusp of being discredited by real scientists such as me who don't want them giving science a bad name. Dr G. PhD, MSc (Statistics) with distinction.
@moc5541 2 months ago
Econ Journal Watch publishes peer-reviewed articles on economics. In the current online issue (free) you will find the article "Revisiting Hypothesis Testing With the Sharpe Ratio." Although the focus of the article is on a measure of financial performance, not medical treatment performance, the author references research into medical statistics and thoroughly explains the pitfalls of just relying on the p-value.
@FelipeSantos-sw4kk 2 months ago
That's a good class. We should teach people how to write the null hypothesis.
@richardambler2546 2 months ago
The easiest way to understand the p-value in this example, I think, is in terms of simulations: randomly assign fifty subjects to one group and the remaining fifty to the other group; calculate the difference in means between the two groups; repeat many times, and then determine the proportion of differences thus calculated that were at least as great as the actual difference encountered; that's approximately the p-value. As you say, it's not the p-value's fault that it is often abused or misunderstood in statistics, and it remains a useful tool for identifying potential effects. Great video!
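That simulation, as a minimal sketch in Python with numpy (the group data are placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)
drug = rng.normal(-2.0, 3.0, 50)      # placeholder weight changes, drug group
placebo = rng.normal(0.0, 3.0, 50)    # placeholder, placebo group

observed = abs(drug.mean() - placebo.mean())
pooled = np.concatenate([drug, placebo])

count, n_perm = 0, 10_000
for _ in range(n_perm):
    rng.shuffle(pooled)               # randomly reassign subjects to groups
    diff = abs(pooled[:50].mean() - pooled[50:].mean())
    if diff >= observed:              # "at least as great" as the actual difference
        count += 1

print(f"approximate two-sided p-value: {count / n_perm:.4f}")
```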
@pshehan1 2 months ago
Not surprised that a journal in social psychology would ban the p-value. It is useful for those in the hard sciences. I have a PhD in nuclear magnetic resonance and most of my later work was in medical research. When in 2013 "skeptics" began proclaiming a "pause" in rising global temperature beginning in 1998, they would only trust RSS satellite data, which began in 1979, claiming surface data was corrupted. I would routinely point out the lack of statistical significance. The "pause" was dropped as a topic of discussion when the 1998 extreme el nino year was balanced by the 2016 el nino year. Mind you, the trends over 1998 to 2016 were not statistically significant, but "skeptics" don't do statistical significance. I was told I was only trying to confuse people with those "plus minus thingies." The whole record is significant, and agrees with surface data within the statistical uncertainty. Ironically, surface data has lower uncertainties.
RSS satellite data version 4.0 (Skeptical Science temperature trend calculator):
1979-1998 trend: 0.116 ±0.161 °C/decade (2σ)
1998-2015 trend: 0.085 ±0.202 °C/decade (2σ)
1998-2016 trend: 0.116 ±0.187 °C/decade (2σ)
1979-2024 trend: 0.210 ±0.050 °C/decade (2σ)
GISTEMP v4 surface data:
1979-2024 trend: 0.188 ±0.035 °C/decade (2σ)
HadCRUT4 surface data:
1979-2024 trend: 0.171 ±0.031 °C/decade (2σ)
Berkeley surface data:
1979-2024 trend: 0.192 ±0.029 °C/decade (2σ)
@jaimeduncan6167 1 month ago
A very good explanation of the issue. It is important to emphasize that the rejection of the null hypothesis does not provide insight into the underlying mechanism. Pharmaceuticals often have secondary effects, and it is conceivable that the drug in question may have an unpleasant taste, prompting individuals to consume large quantities of water before eating. This increased water intake could result in reduced food consumption. Consequently, it may be more prudent to recommend drinking water before meals rather than advocating for a drug with unknown secondary effects.
@mytech6779 2 months ago
The main problem with using P-value is that the significant value cutoff is completely arbitrary. This is not a problem with P-value directly, but it is a problem with application. Why is 0.049 worthy of further investigation but 0.051 is not, or 0.01, or any other value?
@rei8820 1 month ago
How would you solve this problem?
@mytech6779 1 month ago
@@rei8820 Just include the calculated p value and let the reader decide if it is sufficiently significant for their purposes.
@rei8820 1 month ago
@@mytech6779 It doesn't solve the problem. It just makes things even more arbitrary.
@mytech6779 1 month ago
@@rei8820 I am using "arbitrary" to mean not based on an intrinsic property; there is nothing more arbitrary than 0.05, because there is nothing intrinsically limiting about that value. Individuals each have their own criteria based on specific goals; you can't know what would be significant for all of those future readers at the time of publication. There is also the declining art of exercising judgement. Having the experience and knowledge to read a situation and a set of results and form an opinion is critical in all fields. I have seen far too many bureaucrats and papers that try to eliminate judgment with arbitrarily drawn hard lines, and they just end up with obvious misclassifications because their algorithm failed to include or properly weight some parameter, or couldn't account for complex interactions of parameters. The ability to step back, look at the big picture and think, "Hmm, something is off. What is wrong with this picture?"
@alaksiejstankievicx 2 months ago
The other way of implicit p-hacking is to use the p-value as a stopping criterion for an experiment (never do this): you can just stop too early on a small sample which is quirky by accident, and the p-value will delude you into thinking that what you have found is significant. It is also a problem of the frequentist approach, which operates on probability spaces that do not conform well with how most people think; the Bayesian approach is more natural in this way and does not fine you for data re-treatment. However, it has its own dangerous caveats.
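A minimal sketch of that "peek and stop" trap (assuming Python with numpy/scipy; the batch size and number of peeks are invented):

```python
# Under a TRUE null, peeking after every batch and stopping at p < 0.05
# inflates the false positive rate far above the nominal 5%.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
stopped_early, n_experiments = 0, 1000
for _ in range(n_experiments):
    a, b = [], []
    for _ in range(20):                # peek after every 5 new subjects per group
        a.extend(rng.normal(0, 1, 5))
        b.extend(rng.normal(0, 1, 5))
        if stats.ttest_ind(a, b).pvalue < 0.05:
            stopped_early += 1         # falsely declared "significant"
            break

print(f"false positive rate with optional stopping: {stopped_early / n_experiments:.0%}")
```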
@eduardosuela7291 1 month ago
The machine-learning approach goes along the same lines: not accepting the "first island" of accuracy and trying to go beyond it with more cases. Also, the train/test paradigm is something philosophically very useful. It helps prevent pitfalls and self-cheating.
@georgehugh3455 2 months ago
_"'By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.' Analysis of the data should not end with the calculation of a p-value. Depending on the context, p-values can be supplemented by other measures of evidence that can more directly address the effect size and its associated uncertainty (e.g., effect size estimates, confidence and prediction intervals, likelihood ratios, or graphical representations)."_ [American Statistical Association]
@pacotaco1246 2 months ago
The null hypothesis is like the serious version of the "nothing ever happens" meme
@downformexico5470 2 months ago
Great Video! I like your presentation style. Calm, slow paced and very good emphasis. Keep it going!
@nickstaresinic4031 2 months ago
Clear, concise, and *very* engagingly presented! You've got a fan. Regarding the content, I'll just 'second' many of the well-stated earlier comments: The problem isn't the p-value _per se_; it's the way in which it's misunderstood by many who perfunctorily apply it -- owed, in large part, to the imprecise way that it's taught in many Elementary Stats courses. (In that regard, good of you to take care to emphasize the essential "...a difference of [x] **or more**...")
@fthomasdigaetano3886 2 months ago
What a fantastic video! Thank you! Were I still working with students in a research methods course, your video would be mandatory. Excellent!
@marcioissaonakane4185 2 months ago
Excellent video. Clear explanations and very nice graphical elements. Congratulations!!
@kodowdus 2 months ago
It should also be pointed out that 1) the correct application of "significance testing" relies on the use of power calculations to determine an appropriate sample size, and these calculations in turn rely on the (somewhat arbitrary) pre-specification of what would constitute a meaningful effect size, and 2) power calculations typically do not take into account subgroup analyses, which are critical for purposes of determining the extent to which the results of a given study are generalizable.
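A sketch of the pre-specification step in point 1) (assuming Python with statsmodels; the "meaningful" effect size d = 0.4 and the 80%-power target are invented choices):

```python
# Solve for the per-group sample size needed to detect a pre-specified
# smallest meaningful effect at alpha = 0.05 with 80% power.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.4, alpha=0.05, power=0.8)
print(f"required sample size per group: {n_per_group:.0f}")
# Halving the "meaningful" effect to d = 0.2 roughly quadruples the required n,
# which is why that pre-specification matters so much.
```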
@riadhosein7362 2 months ago
You are so awesome!!! Thanks for making such wonderful and informative videos! You have a divine gift for teaching complicated concepts in a stepwise and easy-to-follow manner. Much love and good wishes to you.
@datatab 2 months ago
Thank you very much for your kind feedback!!! And we're pleased that you think we explain the topics simply. That's exactly what we try to do with a lot of effort. Regards, Hannah
@AndersonMattozinhosdeCastro 2 months ago
OMG: this video is the greatest explanation of the p-value ever! Thanks!
@datatab 1 month ago
Glad it was helpful! And thanks!!! Regards Hannah
@Eta_Carinae__ 2 months ago
Some things: I'm aware of situations, like residuals testing, where doing a hypothesis test actually ends up incentivising smaller samples. In these cases, p-values might be problematic. If you're going to criticise p-values for incentivising researchers to come up with ad hoc null hypotheses, it's probably fair to criticise Bayesian methods for incentivising ad hoc prior distributions. You can't do Bayesian stats without loading on the assumptions about the prior distribution.
@andrewhancock2451 2 months ago
This is an excellent review of the basics. Much needed (at least for me)
@robertarvanitis8852 2 months ago
Clear and well presented. The job of statistics is to identify sources of variability and put them all to common measure. That demands an appropriate modesty. The uncertainty is only half of what we need to make decisions. We still must reckon the potential benefit against the cost if we are wrong. We can not remove risk, only make considered judgements.
@sirpancho 2 months ago
Loved it! As a scientist from a different field, I think this is a great explanation.
@tuajeaw 2 months ago
Great video. Thank you. I have only one small nitpick: a bigger sample size doesn't always translate to a more credible result; you also need to consider how the samples are picked. For an easy example, 100 randomly picked samples probably generate better results than 500 biased, hand-picked samples.
@hakimsenane7875 2 months ago
Thank you for this clear and great reminder. Yes, carefully looking at the fully qualified context before reaching a conclusion (that you might re-visit under new data) is always important. What are the alternative option(s) proposed by the critics of p-value?
@MateusCavalcanteFonseca 1 month ago
The best video on the topic of the p-value that I have ever seen in my life.
@andrewj22 1 month ago
9:02 If the only results ever published are those with low p-values, then when replication studies are done, only those with low p-values will be published too. This means we have an unrepresentative sample of published studies informing our assessment of hypotheses. After enough time and studies, eventually all null hypotheses will be proven wrong unless we start publishing studies with negative results.
@hu5116 2 months ago
Got an idea. I'm not an overt statistician, although I have had some training and experience in such. It seems to me that the p-value is missing something, and I think this could be the key to the criticisms you mention. That is, I think we also need a "confidence" value OF the p-value. If people reported such a confidence in addition to the p-value, then I think that might address at least some of the issues people have with the p-value. Again, great video!
@joshuaprince6927 2 months ago
I came here so ready to rip this video apart for uncritically disparaging p-values. This is why we watch videos until the end! Great video about proper applications of hypothesis testing, and some easy misapplications!
@andrewclimo5709 2 months ago
Excellent!!! I'd forgotten everything I knew about stats, and now I can see there is a place I might actually use a p-value!!
@perpalmgren2820 2 months ago
Again an absolutely wonderful clear and pedagogical video.❤️🥰🙏🏻
@datatab 2 months ago
Hi Per, many many thanks for your nice feedback : )
@PM2022 2 months ago
Based on your video, it seems that the criticism/rejection of the p-value is really about the fact that a single study won't be conclusive anyway, so what's the point of calculating a p-value (given that, as you said, its function is to generalize one's findings beyond the sample)? So it seems the critics are saying (though you are not presenting the criticism that way) that the researcher should stop at reporting the sample-based findings instead of generalizing them (via the p-value). As for generalization, that can be done via meta-studies down the line, when enough individual studies self-restricted to their samples have accumulated. I hope you could comment on this.
@datatab 2 months ago
Thank you for your insightful comment! You raise a crucial point about the limitations of p-values and the role of individual studies versus meta-studies. The primary criticism of p-values is that they can be misleading when used in isolation. Meta-analyses can be incredibly valuable for generalizing findings across multiple studies. By combining results from different studies, meta-analyses can provide a more comprehensive understanding of an effect and mitigate the limitations of individual studies. In order to carry out a meta-analysis, there must of course be many individual studies, whereby it is important that all relevant figures are named so that researchers can carry out a meta-analysis. We will discuss this topic in part in the following video. Regards Hannah
@sturlamolden 2 months ago
She did not really grasp the issue. Classical statistics do not answer the questions people are likely to want to ask, but rather some nonsense recondite questions. The p-value does not have the meaning people think it has. What it means is: (1) if the null hypothesis is correct, and (2) you repeat the same experiment many times, then (3) a certain proportion of the experiments will yield an effect size equal to or larger than the observed one. There are several weird assumptions here.

First: "If the null hypothesis is correct..." What if the null hypothesis is wrong? The p-value is only meaningful if the null hypothesis is correct. But since it can be wrong, we never know if the p-value is valid or not. It then follows it can only be used to reject a true null hypothesis, but not a false one, which is nonsense. The null hypothesis might be false, and if it is, the p-value is invalid. It is a logical fallacy. Traditionally the p-value was thought of as "evidence against H0". Consider a "q-value" that is similar except it is "evidence against HA": now we assume HA is true and compute the fraction of equal or smaller effect sizes in an infinite repetition of the experiment. In general p + q ≠ 1; in fact we can even think of a situation where both are small.

Second, the p-value assumes we repeat the same experiment many times with a true null hypothesis. Only an idiot would do that. So we do not need to calculate this, as we have no use for this information.

Third, it takes into account larger effect sizes than the one we obtained. We did not observe larger effect sizes than the observed, so why should we care about them? In mathematical statistics this means that the p-value violates the likelihood principle, which is the fundamental law of statistics. The likelihood principle was, ironically, discovered by the inventor of the p-value. The likelihood principle says that statistical evidence is proportional to the likelihood function.

Fourth, if you fix the significance level to 0.05 and run a Monte Carlo, the p-values that reach significance will on average be 0.025. It is inconsistent.

The summed effect of this weirdness is that the p-value can massively exaggerate the statistical evidence and is invalid for any inference if the null hypothesis happens to be false. In conclusion we cannot use it.

There is a silly answer to this: what if we compute the probability that the null hypothesis is true, given the observations we have made? That is what we want to know, right? Can we do this? Yes. Welcome to Bayesian statistics. The theory was developed by the French mathematician Laplace, a century before the dawn of "classical" statistics and p-values. There was only a minor problem: it often resulted in equations that had to be solved numerically, by massive calculations, and modern computers were not invented. Classical statistics developed out of a need to do statistics with the tools at hand around 1920: female secretaries (they were actually called "computers"), mechanical calculators that could add and subtract, and slide rules that could compute and invert logarithms. With these tools one could easily compute things like sums of squares. To compute a square, you would take the logarithm with the slide rule, type it in on the calculator, push the add button, type it in again, pull the handle, then use the slide rule to inverse-log the number it produced. Write it down on a piece of paper. Repeat for the next number. Now use the same mechanical calculator to sum them up. Et voilà: a sum of squares.

The drawback was that classical statistics did not answer the questions that were asked, but it was used for practical reasons. Today we have powerful computers everywhere, and efficient algorithms for computing Bayesian statistics have been developed, e.g. Markov chain Monte Carlo. Therefore we can just compute the probability that the null hypothesis is true, and be done with it. The main problem is that most researchers think they do this when they compute the p-value. They do not. Convincing them otherwise has thus far been futile. Many statisticians are pulling their hair out in frustration.

Then there is a second take on this as well: maybe statistical inference (i.e. hypothesis testing) is something we should not be doing at all? What if we focus on descriptive statistics, estimation, and data visualization? If we see the effect size, we see it. There might simply be a world where no p-values are needed, classical or Bayesian. This is what the journal Epidemiology enforced for over a decade. Then the zealot editor retired, and the p-values snuck back in.

Related to this is the futility of testing a two-sided null hypothesis. It is known to be false a priori; i.e., the probability of the effect size being exactly zero is, well, also exactly zero. All you have to do then to reject any null hypothesis is to collect enough data. This means that you can always get a "significant result" by using a large enough data set. Two-sided testing is the most common use case for the p-value, but also where it is most misleading. In this case a Bayesian approach is not any better, because the logical fallacy is in the specification of the null hypothesis. With a continuous probability distribution a point probability is always zero, so a sharp null hypothesis is always known to be false. This leads to a common abuse of statistics, often seen in the social sciences: obtaining "significant results" by running statistical tests on very large data sets. Under such conditions, any test will come out "significant" and then be used to make various claims. It is then common to focus on the p-value rather than the estimated effect size, which is typically so small that it has no practical consequence. This is actually pseudo-science. It is a good reason to just drop the whole business of hypothesis testing and focus on descriptive statistics and estimation.
@VitorFM 2 months ago
@@sturlamolden Thanks for a detailed text and info. I always saw the p-test as so empty... It starts with the assumption that 5% is a good target. Based on nothing! I was always challenged by the idea that the p-value threshold was the choice of someone, and everybody adopted it without ever questioning the real meaning! At best it is used to create steps between small groups to classify them, losing the infinite possibilities in a big population.
@VitorFM 2 months ago
@@sturlamolden By the way, what about the medical classification of groups based on steps, like IQ classification? Does it make any sense?
@larslindgren3846 2 months ago
@@sturlamolden I don't understand what you propose. While Bayesian methods are very powerful, they require a probability distribution of the effect before the experiment. Generally there is no objective way to summarize all the evidence available before the experiment in a probability distribution. Therefore the probability distribution of the effect size cannot be calculated after the experiment, regardless of how much computational power you have. I think a Bayesian example calculation can in many cases help the understanding of the results and complement the p-value. The Bayesian results will, however, always depend on more arbitrary assumptions than the p-value.
@stefaandumez2319 2 months ago
I was expecting to find included some elements of Bayesian thinking as a solution to the problems with the focus on the p-value (as explained in 'Bernoulli's Fallacy' by A. Clayton). I listened to this book and it made a lot of sense to me, but it could use your clarifying touch to explain and/or dispel.
@datatab 2 months ago
Thank you for your thoughtful comment! I'm glad to hear you found 'Bernoulli's Fallacy' by A. Clayton insightful. Bayesian thinking indeed offers a valuable perspective on addressing issues related to the traditional reliance on p-values. I will put it on my to-do list and try to make a video for it! Regards Hannah
@ucchi9829 2 months ago
Not worth reading imo.
@stefaandumez2319 2 months ago
@@ucchi9829 Why? Besides the ideological spin, many of his statistical arguments seem to make sense. But I'm not a statistician and am willing to be convinced that it doesn't?
@ucchi9829 2 months ago
@@stefaandumez2319 I don't think he's a statistician either. One reason is his very unreasonable views on people like Fisher. There's a great paper by Stephen Senn that debunks his characterization of Fisher as a racist. The other reason would be his unconvincing criticisms of frequentism.
@carloshortuvia5988 2 months ago
You named it in passing: causality shouldn't be taken for granted. Let's say a drug has a far-fetched biochemical effect; a certain effect can be mistakenly attributed to it when there are background interactions/noise which haven't been factored in, rendering either conclusion void.
@toddcoolbaugh9978 2 months ago
I went to an environmental chemistry talk at an environmental justice conference that analyzed dust collected under couches in low-income homes. A graph of flame retardant concentration vs. cancer incidence was shown to support banning flame retardants. The statistical output from Excel indicated that 1) there was about a 40% chance the correlation was coincidental, and of course 2) it gave no support to causation, since the ONLY thing they analyzed for was the flame retardant. To me, this was an irresponsible presentation aimed at an unsophisticated audience by a researcher who, by their own admission, arrived at a conclusion before collecting the data and was unwilling to let poor statistics get in the way of the party line.
@oetgaol 2 months ago
The trouble with abandoning the p-value and calling it quits is that it won't remove the incentives in research that lead to p-hacking. So any replacement will still be subject to the same bad incentives; you might just end up in a situation where people will be p-replacement-hacking. Also, the p-value threshold is set at 0.05 quite arbitrarily; a discussion could be had about whether we need to define stricter cutoff points for significance (e.g., 0.01, 0.001, or even the 5 sigma that is used in astronomy, for instance).
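For scale, a quick sketch converting such cutoffs into normal-distribution sigmas (assuming Python with scipy; the two-sided convention is a choice):

```python
from scipy import stats

for sigma in (1.96, 2.58, 3.29, 5.0):
    p = 2 * stats.norm.sf(sigma)      # two-sided tail probability
    print(f"{sigma} sigma -> p = {p:.2e}")
# 1.96 sigma is the familiar p = 0.05; the 5-sigma rule is p ~ 5.7e-7.
```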
@soilsurvivor 2 months ago
Excellent explanation. Thank you!
@creambuncreambun4511 2 months ago
Thank you very much for this wonderful explanation
@matthewleitch1 1 month ago
The video correctly states the definition of the p-value. What the p-value is not is any kind of sensible probability that people want to know. We want to know the probability of each hypothesis being the truth. People tend to think, wrongly, that the p-value is the probability that the null hypothesis is true. And then there's the arbitrary 5% level and the inherent bias towards the null hypothesis. We now have the technology to do better than p-values and we should move on.
@alexandredamiao1365 2 months ago
Fantastic video! Very clear and concise! Thank you for this content!
@303ks 1 month ago
Thank you for this video, but I was hoping you would expand a little more on p-hacking; even though you did include some of its elements in the video (without actually naming it as such), I believe it warranted more. Perhaps a future video could be done on p-hacking specifically.
@DrJohnnyJ 2 months ago
The original, great article was "The significance of statistical significance tests in marketing research" by Sawyer and Peters (1983). Another abused measure is using R² instead of the S.E. and b.
@Swarm47 1 month ago
Bayes factors are definitely better. Aside from getting a more granular look at probability for the alternative, you can also get a level of evidence for the null hypothesis. Huge help if you are trying to show that groups might be the same (if you want to combine them) or that another variable isn't confounding anything. Really irks me when papers switch the conservative criterion to a very liberal one (in terms of final results) when they want the null to be true.
@praveensurapaneni4272 2 months ago
A very good presentation. I only wish that the video had started by stating that the p in p-value stands for probability; then it would have been much easier for everyone to understand the concept. 🙏🏾
@georgbrindlinger1008 2 months ago
Very good video and content, thanks. I love how excited you get about statistics :)
@BumbleTheBard 2 months ago
P-values are highly prone to misinterpretation, and there are other problems than the ones you list here. Confidence intervals are also just as prone to misinterpretation and are frequently misused. Perhaps you could do a video on those (e.g., Richard Morey et al., "The fallacy of placing confidence in confidence intervals," Psychon. Bull. Rev. (2016) 23:103-123). People have a tendency to calculate confidence intervals and then misinterpret them as Bayesian credible intervals.
@lauritzmelchoir9275 2 months ago
This content is Statistics 101, so it is staggering that it has to be repeated for the benefit of people who are so ignorant that they have no business doing research in the first place.
@martinhuhn7813 2 months ago
Well, this was a nice explanation of a lot of statistics, but it also revealed how misleading the p-value can really be. To get to the core of the problem, you have to understand that there are basically no two natural things you could compare that do not have the slightest difference at all. Choose the sample size of your test big enough, and that difference will show up as (statistically) significant. So, ironically, the fact that a particular difference in the data is statistically significant tells you MORE about potential relevance if the sample size was small. Why? Because with small sample sizes, significance can only happen as a result of relatively big and relatively consistent differences. If we care about relevance and meaning, p-values are just the wrong tool. You can use them, individually consider all the other factors and then make an educated guess about relevance, but why should you? There are better tools (Hedges' g, Cohen's d) for that purpose. That does not mean that p-values are worthless. A high p-value instantly tells you that any difference you measured would likely also have occurred as a result of random chance. That's good to know. At least it keeps scientists from talking too confidently about a lot of effects they supposedly measured, which could easily be random variation in the samples, and it is fine to routinely do that.
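A sketch of that large-sample point (assuming Python with numpy/scipy; the means, SD, and n are invented so that the shift is trivially small):

```python
# With n = 100,000 per group, a 0.3-point shift on a 15-point SD
# (Cohen's d = 0.02) is highly "significant" yet practically meaningless.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
a = rng.normal(100.0, 15, 100_000)
b = rng.normal(100.3, 15, 100_000)

t, p = stats.ttest_ind(a, b)
pooled_sd = np.sqrt((a.std(ddof=1) ** 2 + b.std(ddof=1) ** 2) / 2)
d = (b.mean() - a.mean()) / pooled_sd   # Cohen's d

print(f"p = {p:.3g} (significant), Cohen's d = {d:.3f} (negligible)")
```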
@justliberty4072 2 months ago
It would be interesting to see a discussion of how sample selection bias and population/sample non-normality couple with use of the t-test and p-value to make these problems worse.
@tracielee7857 2 months ago
Excellent explanation of a complicated topic. Do you think some companies p-hack by using very large sample sizes? A large enough sample (say, 10,000 or more) will often result in significant p-values even if the effect size is minor, but then they can make claims in their advertising like, "Our flu medicine is proven to significantly reduce the duration of flu symptoms." The fine print has to state what they mean by reduced duration (I saw one that said half a day).
@berdie_a 2 months ago
Having a large sample means that it would be closer to the truth. It's also common practice to include more samples, given proper adjustment in the statistical tests to account for the increase in the Type I error rate. See: adaptive bioequivalence. The deception here is not the p-value, but rather the non-reporting (or misreporting) of the effect size.
@nox5555 2 months ago
@@berdie_a In theory. In reality it means that they have way more room to fudge the data...
@ValidatingUsername 2 months ago
Add a deep-knowledge course on the p-value, p-hacking, removing data points, finding outliers, and including reports on removed outliers to any research position. Discussing the p-value is almost an umbrella now for ethical reporting and interpretation of trends in a data set.
@DrJuanTaco 2 months ago
No Sir, we in healthcare will continue to use this inappropriately to make your medical decisions, thanks.
@estebanchicas6340 2 months ago
*Tests the use of vitamin C for the treatment of 50 cancers with a sample size of 10 patients for each cancer*
*Finds an improvement in 1 cancer with p = 0.0499*
Headline: "VITAMIN C CURES CANCER, STUDY SHOWS!1!1!1!"
@zephsmith3499
@zephsmith3499 2 ай бұрын
Discuss confidence intervals next, please.
@ImperviousSC
@ImperviousSC 2 ай бұрын
Yes, I agree this would be a great topic and I think this is extremely important.
@stegsjenga5088
@stegsjenga5088 2 ай бұрын
Yes I would watch this
@reed13k73
@reed13k73 2 ай бұрын
An additional criticism I would add is people using p-values from tests designed for normal data on non-normal data. They get irritated when you point out that the comparison is invalid and that they used the wrong test method.
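One concrete cost of that mismatch, sketched in Python (simulated lognormal data with a genuine shift; the sample size and shift are chosen for illustration): on heavily skewed data the t-test can trail a rank-based test badly in power, so "no significant difference" may just mean "wrong tool".

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_sims, n = 2000, 15
hits_t = hits_u = 0
for _ in range(n_sims):
    a = rng.lognormal(0.0, 1.0, n)
    b = rng.lognormal(0.7, 1.0, n)    # genuine upward shift in group b
    hits_t += stats.ttest_ind(a, b).pvalue < 0.05
    hits_u += stats.mannwhitneyu(a, b, alternative="two-sided").pvalue < 0.05
print(f"t-test power:       {hits_t / n_sims:.2f}")
print(f"Mann-Whitney power: {hits_u / n_sims:.2f}")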
@johnanderson7076
@johnanderson7076 2 ай бұрын
There's probably some of that. Part of the problem, though, is that human analysis is multivariate. When designing an experiment, it is best to change one thing and measure the effect. Humans are subject to a variety of influences, from their own traits and from other people, to varying degrees. It's very difficult to design experiments to eliminate those effects. It's essentially analogous to the many-body problem in physics.
@sanderd17
@sanderd17 2 ай бұрын
8:25 A p-value of 0.05 is only 1/20. So if you have 20 researchers testing the same thing, it's quite likely that one of them gets a significant p-value by pure chance, and only that research will get published. Similarly, if you study 20 possible effects of one drug (blood pressure, cholesterol, weight, ...), it's also very likely that one p-value comes out significant. And then researchers are again pushed to publish that one result.
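The same point as a quick simulation (Python; 20 teams, 30 subjects per group, all effects truly null, numbers assumed for illustration):

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_sims, teams, n = 2000, 20, 30
someone_published = 0
for _ in range(n_sims):
    pvals = [stats.ttest_ind(rng.normal(0, 1, n), rng.normal(0, 1, n)).pvalue
             for _ in range(teams)]
    someone_published += min(pvals) < 0.05   # the one lucky lab writes it up
print(f"runs where at least one team got p < 0.05: {someone_published / n_sims:.2f}")

# Analytically: 1 - 0.95**20 is about 0.64.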
@hu5116
@hu5116 2 ай бұрын
Great video! Best explanation I have ever heard.
@StrivetobeDust
@StrivetobeDust 2 ай бұрын
The biggest problem is clear: the sample must be random. Searching for a null that is falsified by the data AFTER the data has been seen violates the randomization requirements.
@ucchi9829
@ucchi9829 2 ай бұрын
The sample need not be a random sample. You can deploy randomization. See Ronald Fisher's DOE book or George Box's book on DOE.
@mhmhmhmhmhmhmmhmh
@mhmhmhmhmhmhmmhmh 2 ай бұрын
Excellent job. It is also important to separate substantive hypotheses (the new drug has/has no effect) from statistical hypotheses (the means do or do not differ). Also how strict should one be with model assumption violations?
@sitrakaforler8696
@sitrakaforler8696 2 ай бұрын
00:07 Understanding the importance and calculation of the p-value
02:36 Understanding the significance of the p-value in hypothesis testing
05:03 Understanding the significance of p-values in hypothesis testing
07:38 Misinterpretations of p-values
10:16 P-values can be misused, leading to low-quality research
12:52 P-values should be banned and not used in research
15:25 The p-value combines effect size, sample size, and data variance for objective assessment
17:56 Importance of quality in research and statistical software
@cadekachelmeier7251
@cadekachelmeier7251 2 ай бұрын
I think that p-values are particularly easy to misinterpret. A test spits out a probability, so it's really easy to misread it as "the probability the effect isn't real". Especially since a lower p-value does correspond to a lower probability that the effect isn't real, and the difference is pretty subtle. Having a culture where p < 0.05 is treated as the bright line between "real" and "not real" only makes that worse.
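The distinction in numbers, as a sketch with assumed inputs (10% of tested hypotheses are true, 80% power, alpha = 0.05); p < 0.05 is not "95% chance the effect is real":

prior, power, alpha = 0.10, 0.80, 0.05
p_sig = power * prior + alpha * (1 - prior)         # P(significant result)
ppv = power * prior / p_sig                         # P(effect real | significant)
print(f"P(effect is real | p < 0.05) = {ppv:.2f}")  # about 0.64, not 0.95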
@eagle43257
@eagle43257 2 ай бұрын
The p-value is not dead; it is neither misleading nor unscientific.
@johnlv12
@johnlv12 2 ай бұрын
Totally agree
@Rome101yoav
@Rome101yoav 2 ай бұрын
New to the channel. Looks great! Quick question: how likely are your videos to get proper subtitles?
@АнастасіяТертична-с7г
@АнастасіяТертична-с7г Ай бұрын
Those who blame the mistreatment of the p-value are no different from those who seek scapegoats. This is a natural phenomenon, I proclaim. 100 years ago, the average researcher could read, analyze and digest all the publications in their own and related fields. There have never been so many people on this planet as today. Even if we assume the percentage of academia in the population stays the same (it's increasing), we arrive at an unprecedented number of researchers. With the exponential growth of publications, researchers either have to adjust their toolkit (i.e., the scientific method) or adapt otherwise. May AI help us.
@vdinh143
@vdinh143 Ай бұрын
Correct me if I'm wrong, but a single number cannot possibly represent three metrics to their full extent, can it? You can take an average of some numbers to represent them, but an average of 10 doesn't tell you whether the original data had numbers in the range of 5 to 15 or -1,000,000 to 1,000,000.
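A quick sketch confirms the intuition (Python/scipy, with summary statistics invented for illustration): two very different effect-size/sample-size/variance combinations land on essentially the same p-value, so the mapping is many-to-one.

from scipy import stats

# arguments: mean1, sd1, n1, mean2, sd2, n2
big_effect_small_n = stats.ttest_ind_from_stats(10, 20, 20, 0, 20, 20)
tiny_effect_huge_n = stats.ttest_ind_from_stats(1, 10, 500, 0, 10, 500)
print(f"big effect, n=20 per group:   p = {big_effect_small_n.pvalue:.3f}")
print(f"tiny effect, n=500 per group: p = {tiny_effect_huge_n.pvalue:.3f}")

# Both come out near 0.12: the single number cannot tell you which world you
# are in; you have to look at effect size, n and variance separately.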
@yusmanisleidissotolongo4433
@yusmanisleidissotolongo4433 2 ай бұрын
Thanks so much for the video. The title itself misled me. I think further clarification is needed here on what it means to achieve significance or not. I have been told that researchers always want to achieve significant differences. However, I think that not achieving it in multiple replications does not necessarily imply that an error was committed. Is significance always achievable?
@datatab
@datatab 2 ай бұрын
Thank you for your comment and for watching the video! You bring up an important point about the interpretation of statistical significance. Statistical significance, often indicated by a p-value less than 0.05, suggests that the observed effect is unlikely to have occurred by chance under the null hypothesis. However, achieving significance is not always guaranteed, nor should it be the sole aim of research. While achieving significance can be an indicator of an effect, it is not always achievable or necessary. The focus should be on the overall evidence, consistency across studies, and the practical relevance of the findings. If there's interest, I can delve deeper into these aspects in future content. Regards Hannah
@yusmanisleidissotolongo4433
@yusmanisleidissotolongo4433 2 ай бұрын
@@datatab Thanks so much, Hannah. My inquiry was pointing to these questions: Is achieving no significance wrong? Are different researchers investigating the same problem committing errors if they do not achieve significance?
@Gaborio1
@Gaborio1 2 ай бұрын
I loved the video: informative and engaging. But I think you are also missing another criticism. The problem is not the p-value per se, but the comparison to an arbitrary threshold to make a decision.
@alejandrobermudez8799
@alejandrobermudez8799 2 ай бұрын
I have a question: does rejecting the null hypothesis imply accepting the alternative hypothesis?
@ronaldanane5451
@ronaldanane5451 2 ай бұрын
Thank you for the video. Given (a) effect size, (b) sample size, and (c) variability of the data, can you kindly clarify your point on "variability"?
@junfon7097
@junfon7097 2 ай бұрын
Please do your own research!
@ronaldanane5451
@ronaldanane5451 2 ай бұрын
@@junfon7097 needless exclamation ‼️
@berdie_a
@berdie_a 2 ай бұрын
Less variable data = more sensitive to small changes. Highly variable data = less sensitive.
@davidbastow5629
@davidbastow5629 2 ай бұрын
@ronaldanane5451 Look at the chart at 2:25 in the video. Each person icon represents one result, as depicted by their vertical position on the graph. The 5 kg difference is between the averages of the two groups. But if you look at the lowest person in the left-hand group, their score is very similar to the highest person in the right-hand group. A few seconds later, she demonstrates what it would look like if there were more variance in the groups. We now see that the lowest person on the left has the same score as the highest person on the right. So with high variance, we can be less sure that there is actually a difference between the two groups. Another way to put it: the variance tells us how well the average represents the scores of the group, i.e. how close the individual scores were to the average that summarises them.
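A small simulation of that picture (Python; the 5 kg gap is from the video, the two sds and n = 15 are assumptions for illustration):

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, diff = 15, 5.0
for sd in (2.0, 15.0):
    a = rng.normal(80, sd, n)
    b = rng.normal(80 + diff, sd, n)
    print(f"sd = {sd:>4}:  p = {stats.ttest_ind(a, b).pvalue:.4f}")

# With sd = 2 the 5 kg gap is unmistakable (tiny p); with sd = 15 the groups
# overlap so much that the same gap typically looks like noise.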
@ronaldanane5451
@ronaldanane5451 2 ай бұрын
@@davidbastow5629 Thank you, great man.
@RichardAmesMusic
@RichardAmesMusic 2 ай бұрын
I don’t think p values ever saw much use in the hard sciences. They’re a tool of the soft sciences that tries to extract some insight where there probably isn’t any. What were the p values on the measurements used to verify Newton’s and Einstein’s theories of gravitation?
@qmiscm
@qmiscm 2 ай бұрын
Einstein’s theory of relativity and everything else Einstein has polluted science with is absurd. Einstein must not be part of any scientific or social discourse. Unfortunately, there is no one place among the zillions of completely confused Einstein worshipers where I can present my unequivocal argument to this effect and bring about the inevitable overhaul of physics. I have to go under every erroneous comment like I’m doing here, which is inhuman when you think about it. Einstein is the greatest fraud in history, and this must become common knowledge sooner or later. How do I prove the above? It seems the easiest way is to go on X (formerly known as Twitter) and search for the handle bryghtlyght. And no, proving the catastrophic error of relativity-a theory that invalidates itself-has nothing to do with statistics. It is an absolute scientific argument that commands the immediate removal of relativity from physics in its entirety, without replacement, along with all of its supposed progeny.
@chrisyoung9316
@chrisyoung9316 2 ай бұрын
Very nicely explained. Thanks!
@Darisiabgal7573
@Darisiabgal7573 Ай бұрын
I want to give an example of why the p-value both works and fails. Before I give the example, note that one usually looks for a significant difference and then follows up with correlations; significance is highly context dependent, and correlation does not imply a directional dependency.

Suppose I divide the human genome into a billion segments. I have a disease, and my first test is of a nucleotide variant on the tip of chromosome 1. I do the test and fail to reject the null hypothesis with an alpha of 0.05. Because I failed to reject it, I move to the next variant, repeat, and again fail to reject. At the fourth variant my p-value is 0.02, and therefore I reject the null hypothesis. This rejection, however, is in error. It is in error because my de facto alpha is now roughly 0.2, so I should have chosen a threshold of 0.05/4 = 0.0125. This is called the Bonferroni correction, and the problem in the 80s and 90s was that people would publish one association, then a different association, and so on. By dividing their association studies into multiple publications, the required correction was not evident.

However, there is an additional problem. Let's say those 4 sites I examined are not independent (they aren't, because they are close together). The correction assumes the different tests are independent. That is not much of a problem here, but watch what happens if we test 1 billion variant sites. The threshold for 1 billion tests is 0.05 × 10⁻⁹, i.e. 0.00000000005. So we aptly get rid of false positives but can greatly increase false negatives. This is a risk we have to take. The problem becomes rather obvious: to run a study that can achieve such a low p-value we need to increase its power, and in many cases the required sample exceeds the number of patients available to study.

But that's not the only problem. There is an assumption that the sites are not tightly linked. Now suppose that variants 1, 2 and 3 are in linkage disequilibrium; then 1, 2 and 3 are surrogates for each other and only one test among them is necessary. The corrected p-value threshold should then be 0.05/2 = 0.025, and the null hypothesis is to be rejected for the p-value of 0.02 at site 4. The problem is this: for many disease sites in the human genome we do not know the linkage structure, and it's a computational nightmare to correct for the LD between adjacent sites within the Bonferroni correction.

So there have been some rather large genome-wide association studies (GWAS) of gene-mediated diseases. In many cases we have prior knowledge of environmental and genetic relative risk via identical-twin studies. However, the genome-wide association studies and environmental-risk studies are only capable of accounting for a small amount of that risk (e.g. the DAISY study of type 1 diabetes). Some genetic studies, like those on rheumatoid arthritis, despite an incomplete list of risk-causing genes, have managed to formulate a list of genes predictive of disease that cannot be improved on by adding more risk genes, indicating a complex relationship between genes of lower risk. There are other problems in GWAS that are beyond the scope of this discussion.

So here's the answer to this problem that I arrived at. When we look at something, we have a certain perspective; the best critique of a single perspective is the classical story of people sitting in a cave, looking at images on a wall and thinking these are reality.
The best we can do with a single perspective is to create a confidence interval along a line; as we look at that line, we have a distribution of probabilities that the interval between two points is correct. If we took another perspective, we might narrow the line, or the line might grow in a different dimension. But if we assume the first result was multidimensional, then the second perspective has narrowed the probability distribution in those dimensions. If we looked from yet another perspective, we could narrow the distribution further. This is where the p-value fails: if a study is the only way chosen to look at a result, without increasing power or choosing a different analytical methodology, then we are locked into the initial confidence of our result. For p-values near 0.05 this is generally poor. Conclusion: a p-value is valuable, but only if we understand the methods and problems associated with obtaining that value. Moreover, the weaker the confidence, the greater the need for experiments using different approaches.
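A compact sketch of the correction described above (Python; 1,000 simulated null variants rather than a billion, and full independence assumed, which real linked variants would violate):

import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
m = 1000                      # number of variant sites tested, all truly null
pvals = np.array([stats.ttest_ind(rng.normal(0, 1, 50),
                                  rng.normal(0, 1, 50)).pvalue
                  for _ in range(m)])
print(f"hits at alpha = 0.05: {np.sum(pvals < 0.05)}")                      # about 50 false positives
print(f"hits at Bonferroni alpha = 0.05/{m}: {np.sum(pvals < 0.05 / m)}")   # usually 0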