Dance p 3 Mar09

79,747 views

Geoff Cumming

A day ago

I use a simulation from my ESCI software to illustrate the enormous variability in the p value, simply because of sampling variability. That's the dance of the p value. Never trust a p value--it's too unreliable! Use estimation, not NHST! Most researchers don't appreciate just how unreliable the p value is!
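Geoff's simulation is easy to reproduce outside ESCI. Here is a minimal Python sketch, assuming the setup discussed in the comments (two independent groups of N = 32 and a medium true effect, Cohen's d = 0.5; the seed and replication count are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, d, reps = 32, 0.5, 1000          # per-group N, true effect, replications

pvals = []
for _ in range(reps):
    control = rng.normal(0.0, 1.0, n)        # control group
    treatment = rng.normal(d, 1.0, n)        # true mean shifted by d SDs
    pvals.append(stats.ttest_ind(treatment, control).pvalue)
pvals = np.array(pvals)

print(f"p ranged from {pvals.min():.5f} to {pvals.max():.2f}")
print(f"share of replications with p < .05: {(pvals < 0.05).mean():.0%}")
```

Every run repeats the identical experiment, yet p typically spans several orders of magnitude and only about half the replications reach p < .05: the dance.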

Comments
@Silverwing_99 9 years ago
Dear Prof Cumming, thank you so much for this video and your outstanding book, "Understanding the New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis". This was truly an astounding piece of work, so well written that even a simple clinician like myself could understand it. Thanks, Johnny
@Oldmaneatingatwix100 11 years ago
Currently reading your book "Understanding the New Statistics". Great stuff - proving very helpful for constructing my thesis. It has encouraged me to incorporate both a narrative and a meta-analytic element into my literature review. I now appreciate that this approach gives a much more accurate analysis of the evidence. I am also responsible for developing e-learning materials for my undergrad stats course - the simulations and examples you give are a great inspiration. Thank you for writing this book.
@gracegirl04 13 years ago
FANTASTIC!!! As an epidemiologist I say THANK YOU for educating people about the "sacredness" of the p-value from research
@ctwardy 14 years ago
Good introduction to statistics, confidence intervals, and problems with P-values and null-hypothesis testing. Very well done -- Geoff put a lot of work into this.
@EricVance 10 years ago
I played this video during our LISA (Laboratory for Interdisciplinary Statistical Analysis) lab meeting at Virginia Tech today. None of my 20 students were surprised about the "Dance of the P-values", because they simulate p-values in their Bayesian class with Prof. Scotland Leman and already know about the unreliability of p-values. So what's next? Yes, we should all hug confidence intervals and report them in the context of the effect size of the phenomena we're testing. I think a major lesson is that small samples lead to variable results.
@geoffdcumming 10 years ago
Thanks Eric. Well done to your students (and you!) for the insight via Bayes. Going Bayesian may well be increasingly the way of the future. I choose to advocate CIs and classical estimation because it seems to me more immediately accessible and more likely to be taken up. I just haven't seen Bayesian materials yet that seem to me sufficiently accessible for beginners. Yes, small samples lead to variable results. BUT the dance of the p values is just as wide for large samples. Yes, true! Conditional on an initial p of e.g. .05, the dance of subsequent p is just as wide, whatever the N. See my 2008 Perspectives on Psychological Science article for theory and simulation. Also, CIs vary of course, but the width of any single CI gives a reasonable idea of the width of the dance (unless N is tiny), whereas a single p value tells us very little indeed. And we have evidence that many (probably most) researchers tend to greatly underestimate the width of the p interval (the prediction interval for p).
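The conditional point can be illustrated with a deliberately simplified model (an assumption for illustration, not the derivation in the 2008 article): suppose the initial study gave exactly z = 1.96 (two-tailed p = .05), and take the true standardized effect to be that observed value; the replication z is then roughly normal around 1.96 with unit SD.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Replication z scores under the simplifying assumption described above
z_rep = rng.normal(1.96, 1.0, 100_000)
p_rep = 2 * stats.norm.sf(np.abs(z_rep))     # two-tailed replication p

lo, hi = np.percentile(p_rep, [10, 90])
print(f"80% prediction interval for replication p: ({lo:.4f}, {hi:.2f})")
```

Nothing here depends on N, since everything is expressed in z units; that is why the prediction interval for p stays wide however large the sample.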
@geoffdcumming 12 years ago
Thanks Michael. Yes, the CIs bounce around, but any single CI (in practice we have only one) gives us some idea about the whole dance, because the length of our CI gives us some idea of how much bouncing around there is in the dance. Knowing our single p value gives us virtually no idea about the dance of the p values. Possibly surprising is that the dance of the p values is similarly wide, whatever the sample size. Hard to grasp, but true. See chap 5 in the book, or my 2008 article.
@geoffdcumming 11 years ago
Thanks for that--I wish you well with your thesis. And for the e-learning materials. I'm currently at the Annual Convention of the Association for Psychological Science, in Washington DC, giving a workshop and two posters, and doing a book signing. There is a ton of interest. The new statistics is definitely the way of the future. I hope the book and ESCI serve you well! Geoff
@abraxaf 11 years ago
Thanks for checking it and for your recommendation! I found that Altman mentions the median CI in "Practical Statistics...", although he warns that for small sample sizes the CIs will be twice the width of mean CIs, and that a CI for a median difference is not so easily calculated as for a mean difference. Looks like I'm going to be winsorizing soon! Thanks again for your help.
@jeffreyflint 11 years ago
I think what you are saying is that the results improve (converge) the more iterations of the study you perform. That is true whether you use CIs or p-values. If I see many CIs of the same study, I agree that my eye can estimate the probable or mean CI in a way that averaging p-values cannot do. But having many CIs to study is a luxury. The whole problem is what to do when you have a single trial. I think the point is that you are a Bayesian, which I appreciate.
@Komelsky 13 years ago
Great, thank you! The biggest problem with this example though (in my view) is that the N is inadequate for the effect size. Studying effects of this size with these typical N values is just insane! And that's even before considering how many variables - in a real experiment, in practice - you won't be able to control.
@geoffdcumming 11 years ago
Thanks Jeffrey. Informally, I'm an everyday Bayesian, as pretty much everyone is. But my statistics advocacy is for CIs not credible intervals, and I use no analysis of Bayesian priors, etc. Yes, if we have a single study, we have a challenge, but CIs are still superior to p (Coulson ref). One way to think about their advantage is that they give better info of what a replication is likely to give. Even if we have only a single study. Geoff
@mthompsoNZ 12 years ago
While I haven't read the references you cite, I prefer to agree with David. The basic problem is that the sample size is too small to reliably observe the effect. You are also playing 'roulette' with the CIs, as there is no guarantee of where in the CI the true mean actually is.
@WikiofScience 10 years ago
Although I agree with Cumming's call for CIs and meta-analysis, I disagree with some of the assumptions in this video. I commented on that in a recent article in Frontiers in Psychology, and here are some excerpts from that comment: Firstly, Cumming's "dance of p's ...is not suitable for representing Fisher's ad hoc approach (and, by extension, most NHST projects). It is, however, adequate for representing Neyman-Pearson's repeated sampling approach". The role of the p-value is different for each approach, being, under Neyman-Pearson's approach, "a simple count of significant tests irrespective of their p-value". Secondly, as it turns out, Cumming's simulation is "a textbook example of what to expect given power" under Neyman-Pearson's approach. For example, 52% of tests should be significant at α = .05 in the long run when power is set to 52%. Thirdly, Cumming doesn't compare p's and CIs fairly. "To be fair, a comparison of both requires equal ground. At interval level, CIs compare with power". While Cumming's simulation reveals that about 95% of sample statistics fall within the population CI (out of 95% expected), 52% of those sample statistics are statistically significant (out of 52% expected). Furthermore, "at point estimate level, means (or a CI bound) compare with p-values, and Cumming's figure reveals a well-choreographed dance between those. Namely, CIs are not superior to Neyman-Pearson's tests when properly compared although, as Cumming discussed, CIs are certainly more informative." ---- Perezgonzalez JD (2015). Frontiers in Psychology (doi: 10.3389/fpsyg.2015.00034, journal.frontiersin.org/Journal/10.3389/fpsyg.2015.00034/full).
@geoffdcumming 10 years ago
Thanks JDP, and I did read your comments in Frontiers. Yes, Fisher and N-P articulated very different procedures, based on different philosophies, and, yes, common practice is a mish-mash of the two. Fisher, to his credit, thought in terms of more than one experiment, and regarded a p value as useful input to a decision, to be made after considering all circumstances. But he also said something like (from memory) "we should believe a result when we know how to run an experiment so it's likely to get p < .05". Which is strong evidence that he did not appreciate how drastically p dances. If most of a series of replications are to give p < .05, power would have to be extremely high. Your para 3: Sure, the % of replications giving p < .05 equals power, as it must. Sure, CIs behave correspondingly. Sure, we can translate back and forth between a CI and a p value (if we know the mean and N also). The key point of the dance is that a researcher's usual situation is to have a single result. The p value for that, often calculated to 2 or 3 decimal places, is seductively but misleadingly precise. It shouts certainty--an effect exists or it doesn't. Of course textbooks warn us, but still... Researchers don't appreciate that any p value is unreliable, in the sense that it's highly likely to be very different on replication. The precision of p is indeed seductive but misleading. In dramatic and valuable contrast, a single CI puts it in our face that there's uncertainty, and it quantifies that uncertainty by having length. Sure, the CI bounces with replication, but any single CI gives us an idea of how much bouncing. I'm glad you agree that CIs are more informative. (They are also central for estimation, which is a far superior approach to both Fisher p, and N-P decision making.) May all your confidence intervals be short! Geoff
@st33pestasc3nt 10 years ago
Geoff Cumming CIs are extremely informative, yet there's no reason one cannot show both CIs and NHST. NHST can also measure this "bouncing" if a p-value is presented in conjunction with a post-hoc power estimate. And p-value need not vary so much on replication with high power.
@geoffdcumming 14 years ago
@ctwardy Hi Charles, thanks! Yep, my first experiment with Camtasia. One day I'll do a super slick version. In the meantime I'm writing the book 'Introduction to The New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis' (Routledge, 2011). Geoff
@DavidDiez 12 years ago
What's with the war on the p-value? The p-value, like every other statistical tool, provides one way to view the data, not the complete picture. If someone uses the p-value as a be-all-end-all, that is user error, not the fault of the tool. The p-value should not be strung up as the scapegoat for publication bias or poor promotion decisions, as the video seems to suggest. It also shouldn't be the only way data is summarized, but I don't know any statisticians advocating a "p-value only" stance.
@bakubaka4482 5 years ago
Alt-Hype brought me here.
@Fortastius 4 years ago
Which video was it where he directed people to this video? I had to find this video manually.
@fringeelements 3 years ago
Something that should be stated: psychology isn't some particularly pathological field. When you say "typical of psychology", most people by default trust tagged authorities, and so will look for ANYTHING to contain the critique. It shouldn't be like this; they should be able to say, "you know, I don't actually know whether any other field IS better than psychology in terms of mean or median sample size and/or effect size, so I can't ipso facto declare this problem DOES NOT also apply to every other field." But because it is like this, it's important to pre-respond to that containment tactic that most people will engage in.
@geoffdcumming 3 years ago
Yep, p values are *highly* variable, whatever the research field! Psychology drank the NHST and p value kool-aid more than half a century ago, and distinguished scholars have been giving cogent critiques of NHST and how it's used ever since. Lots of other research fields have their own version of the same story. It's about time we all moved on, to estimation, whether that is classical (using confidence intervals) or Bayesian (credible intervals), for just about all research situations. If you would like to play with the dance of the p values for yourself, check out Gordon Moore's great 'esci web' software. Runs in any browser and is wonderfully fast: www.esci.thenewstatistics.com/ then choose 'dances', then click at red 9 (bottom panel at left). Click at '?' at top right of left panel to see tips that pop up on mouse hover--to help figure out what you can do. Enjoy! Geoff
@geoffdcumming 11 years ago
Ah, have just looked at the UCL stuff. Interesting, and new to me. But simple--the method assumes the ranks follow a normal distribution. Probably not a bad way to go, tho' for small datasets the CI endpoints are limited to being at one of the original data values, which could be a bit arbitrary. So, that rank approach could be fine, tho' I think robust would usually be my choice. Geoff
@michaelchirico9530 10 years ago
"quite large for PSYCHOLOGY"
@geoffdcumming 11 years ago
Thanks Jeffrey for those 2. Consider the dance of the CIs--the bouncing-around sequence of CIs with replication. Our CI is one from such a sequence. Our CI tells us about the sequence because its length gives some idea of the extent of bouncing around. Our CI indicates where, most likely (83% chance) the mean of a replication would fall. Not Bayesian, just coming straight from the simulation of dancing CIs. Our CI is much more informative than any single p value. Hug a confidence interval today!
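The 83% figure can be checked by simulation; a sketch (the exact percentage on any run depends mildly on N and the seed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, reps = 32, 20_000
tcrit = stats.t.ppf(0.975, n - 1)

hits = 0
for _ in range(reps):
    original = rng.normal(0.0, 1.0, n)
    replication = rng.normal(0.0, 1.0, n)    # identical experiment, new sample
    m = original.mean()
    half = tcrit * original.std(ddof=1) / np.sqrt(n)   # 95% CI half-width
    hits += (m - half) <= replication.mean() <= (m + half)

print(f"replication mean fell inside the original 95% CI "
      f"{hits / reps:.1%} of the time")
```

On average the original 95% CI captures the mean of a replication roughly 83% of the time, matching the figure Geoff quotes.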
@geoffdcumming 11 years ago
Yes, a transformation is often a good strategy if there's a reason for choosing a particular transformation, e.g. log if we suspect a multiplicative process. Or if we have a very large dataset and the transformation does bring the shape quite close to normal. In which case transform, then find the mean and CI, then back-transform the centre and two limits of the CI. Or use a robust method. I don't know of any convincing work on CIs with ranks. May all your confidence intervals be short! Geoff
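A minimal sketch of that transform-then-back-transform recipe, with made-up skewed (lognormal) data; the sample size and parameters are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
data = rng.lognormal(mean=1.0, sigma=0.8, size=40)   # skewed toy data

logged = np.log(data)                                # 1. transform
m = logged.mean()
half = stats.t.ppf(0.975, len(data) - 1) * logged.std(ddof=1) / np.sqrt(len(data))

centre, lo, hi = np.exp([m, m - half, m + half])     # 2. back-transform all three
print(f"centre {centre:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

Note that the back-transformed centre is the geometric mean, and the interval comes out asymmetric on the original scale, as it should for skewed data.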
@geoffdcumming 11 years ago
A reply to zenomons, who asked if software is available. (I don't know why that comment isn't visible.) YES! To dance with p, go to thenewstatistics.com and download ESCI ("ESS-key": Exploratory Software for CIs), for Windows or Mac. Use the 'Dance p' page of the 'ESCI chapters 5-6' module. Free. Runs under Excel, simple! That site has info about: Cumming, G. (2012). "Understanding The New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis". New York: Routledge. Enjoy! Geoff
@jeffreyflint 11 years ago
CIs and p-values are mathematically equivalent. Rejection by CI and rejection by p-value are two ways of expressing the same thing. In both cases, there needs to be honesty to choose alpha prior to the experiment. In both cases, experiments with a low sample size will give less repeatable results.
@geoffdcumming 13 years ago
@Kalid0scop3 See comment I've just posted. There seems to be a block on stating a URL, so I had to omit the triple-w at the start and use the word 'dot', but I'm sure folks can figure it out. Enjoy. Geoff
@OblateSpheroid 2 years ago
Thank you for your work.
@geoffdcumming 2 years ago
Thanks Oblate! You might also like two videos on significance roulette. Either search for that term here on YouTube, or go to tiny.cc/SigRoulette1 and tiny.cc/SigRoulette2. Yep, p values are scarily unreliable! Geoff
@RobinBeaumont 13 years ago
Great, but rather than just showing histograms of the p values for various power levels, I wonder whether it would be good to show the actual distributions, as you have in your excellent 2008 article. Also, I think you mention that when the null hypothesis is true the p distribution is uniform; just out of interest, does the power level/effect size correspond to a parameter of the p value distribution somehow?
@ctwardy 14 years ago
@geoffdcumming Do let me know when that's out. Best wishes. -Charles
@st33pestasc3nt 10 years ago
Straw man? That's not at all what p-values are supposed to mean. Those are errors of human interpretation. Stats profs TRY to teach students how to use NHST, but most are abysmal teachers and most students blot out unpleasant Stats class memories anyway. Therein lies the problem. Then researchers go on to p-hack or over-rely on p-values on their rushed journey to academic fame, not knowing what they're doing. The p-value is not the strength of evidence. It's simply the probability of observing an effect of that size due to sampling error given that the null hypothesis is true, i.e. the chance of making a Type I error if you take that effect seriously. No one ever said it was a standalone beacon of definitive proof, just a marker for Type I error. But wait, why number the errors? Oh yeah, there's a Type II error too. Anyone remember that from Stats class? From most publications in psychology, social sciences and life sciences, you'd think not. Given there are two types of error in NHST, modern researchers' single-minded obsession with Type I error seems incredibly naive. Type II error vastly increases as sample size shrinks and as you try to detect smaller "true" effect sizes. Both types contribute to whether one may be wrong in interpreting sample data. In your simulation, the p-value is not very replicable due to Type II error. You're not measuring Type I error (since the null hypothesis is not true by design). Under your conditions, replicability of a low p-value would be a measure of statistical power (the chance of getting a low p-value given there is a true effect; the complement of Type II error). But with a low sample size (n=32), Type II error can be very high and power can be very low. So there is a high chance of observing a "not significant" p-value upon replication, despite the true effect in the population. The test simply lacks power to consistently detect the effect. This is all you're showing with the "dance". Remember that p > 0.05 does not mean no effect.
It just means your sample lacked the evidence to show it ("fail to reject H0", not "accept H0"). The p-value is very consistent with its definition if simulated properly (i.e. under the null hypothesis). If the true effect size is zero, you'd see 5% or fewer p-values < 0.05 after a "large" number of replications. Exactly what one would expect. What your simulation has uncovered is simply the elephant in the room with NHST, the issue of statistical power that too many researchers ignore. A meta-analysis of recent behavioural ecology publications showed the median power was well under 40% (i.e. chance of Type II error > 60%!!). NHST is not at fault here. Its rules clearly indicate the result is garbage, if one chooses to remember them. Also note most classical statistics rely on asymptotic (large-sample) properties. NHST invokes an assumption of Normality of the mean (or regression coefficient or whatever). While the Central Limit Theorem proves that true asymptotically, it may be very far from Normal with small samples. With n=32 or lower, you not only get more Type II errors but these large-sample properties also start to break down and parametric model assumptions may fail. There is a much deeper flaw in experimental design if relying too much on large-sample behaviour with small-sample data, though that seems common practice in psychology for whatever reason. Also note Type I and Type II errors relate only to sampling error. Then there's misspecification (applying the wrong statistical distribution or model to the data). And non-sampling errors also exist, often dwarfing sampling error. A dedicated attempt at error analysis, model diagnostics and a search for latent biases/confounders should precede any hypothesis testing. Otherwise garbage in, garbage out. NHST is fine if used properly. However, in using it most researchers appear to be more clueless than Alicia Silverstone. The method is not wrong, just blindly misused.
@MichaelBakunin25 6 years ago
Sure: for a medium effect size you need n=64 per group at minimum (power = 0.80), not n=32.
@MichaelBakunin25 6 years ago
And we don't need the Type I & II error rhetoric. From Fisher's point of view, we simply don't have enough evidence to reject the null if p > 0.05, so a redesign or a larger sample would be very much appreciated.
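Both sample sizes can be checked with a standard analytic power calculation for a two-sided, two-sample t test (the per-group design is an assumption, and power_two_sample is just an illustrative helper):

```python
import numpy as np
from scipy import stats

def power_two_sample(n, d, alpha=0.05):
    """Power of a two-sided, two-sample t test with n per group and true effect d."""
    df = 2 * n - 2
    ncp = d * np.sqrt(n / 2)                 # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)

print(f"n=32 per group, d=0.5: power = {power_two_sample(32, 0.5):.2f}")
print(f"n=64 per group, d=0.5: power = {power_two_sample(64, 0.5):.2f}")
```

With d = 0.5, n = 32 per group gives power near .50 (hence the roughly half-significant dance in the video), while n = 64 per group reaches the conventional .80.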
@abraxaf 11 years ago
I'm a big fan of your book, and I'm using this video for teaching this week. This video leaves no doubt that p is a capricious devil. How do you deal with data that fail the normality assumption? I have some data that are skewed; is it safe to use mean + CI in this case? It is easy to calculate a p-value (using Wilcoxon rank sum etc.), but I can't see anything in your book on how to deal with non-normal data.
@DavidDiez 11 years ago
"Would you like your tenure dependent on winning this version of p roulette?" If that's what is being done, it's a red flag for deeper problems with publication bias and tenure decisions. Those problems will not be fixed by switching to confidence intervals.
@geoffdcumming 11 years ago
Thanks David. I agree that switching to CIs does not solve all problems, but it is still a great step forward. There's more in my tutorial article to appear in January in 'Psychological Science'. Just released online: tiny.cc/tnswhyhow Enjoy... Geoff
@DavidDiez 11 years ago
Geoff Cumming Thanks for sharing the article -- I read the abstract and skimmed the rest. It has lots of good stuff :) A quotation (from another paper) cited early in the article: "The current practice of focusing exclusively on a dichotomous reject-nonreject decision strategy of null hypothesis testing can actually impede scientific progress" Is it actually common in your field for people to publish a p-value without a confidence interval? I can't recall a time where I've seen that happen in the statistics / biostatistics literature that I've reviewed. It really never should happen. However, it still isn't clear to me that moving to confidence intervals does anything (I do love some of the other ideas you mention, e.g. prespecification of studies). Another quotation: "If p reveals truth, and we replicate the experiment-doing everything the same except with a new random sample-then replication p, the p value in the second experiment, should presumably reveal the same truth." If a researcher believes replicating p-values is the purpose of replicating a study, that is alarming and represents a failure in the researcher's scientific education IMO. Again, the issues raised in the video that concern me have nothing to do with p-values but rather with how journals make publication decisions and how that affects tenure. A recent Economist article (www.economist.com/news/briefing/21588057-scientists-think-science-self-correcting-alarming-degree-it-not-trouble) includes the following quotation that highlights the main issue in my mind: "researchers and the journals in which they publish are not very interested in negative results. They prefer to accentuate the positive, and thus the error-prone. Negative results account for just 10-30% of published scientific literature, depending on the discipline."
I'm still unclear how confidence intervals address publication bias, since a CI can be used to evaluate whether a result is statistically significant just as easily as a p-value can. Switching does nothing to address publication / tenure biases, which seemed to be the heart of the issue in the video. I think our views align closely on identifying the primary problem (publication bias), but we disagree on one segment of the proposed solution (I don't think CIs offer any benefit over p-values in fixing this problem). Is that a fair assessment?
@lorenzobraschi2010 10 years ago
Excellent video, making a point worth making. I only hate (a little) that you used Comic Sans for the text :D Anyway, I've been following your New Statistics approach quite carefully.
@abraxaf 11 years ago
Many thanks for your suggestions. Did you get a chance to look at the UCL link? They say one can rank-transform the data, find the 95% confidence limits of the median and then back-transform to the original scale. (Sadly they give no reference for their method.)
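For comparison, there is a standard distribution-free CI for the median that needs no transformation at all, built from order statistics and the Binomial(n, 1/2) distribution. A sketch (not necessarily the UCL method; note the endpoints are limited to observed data values, as Geoff mentions above):

```python
import numpy as np
from scipy import stats

def median_ci(data, conf=0.95):
    """Conservative distribution-free CI for the population median."""
    x = np.sort(np.asarray(data, dtype=float))
    n = len(x)
    alpha = 1 - conf
    # largest d with P(Binom(n, 1/2) <= d - 1) <= alpha/2 gives coverage >= conf
    d = max(int(stats.binom.ppf(alpha / 2, n, 0.5)), 1)
    return x[d - 1], x[n - d]        # order statistics x_(d) and x_(n+1-d)

rng = np.random.default_rng(5)
sample = rng.lognormal(1.0, 0.8, 41)         # skewed toy data
lo, hi = median_ci(sample)
print(f"median {np.median(sample):.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

Because the counts are discrete, the interval is slightly conservative, and for small n it can be wide, consistent with Altman's warning quoted earlier in the thread.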
@meechos 4 years ago
Has anyone found a Jupyter notebook for this? If not, I may have a go at it; let me know if you are interested in having a look. Thanks!
@geoffdcumming 4 years ago
Dimitris, thanks for your interest! I don't know about such a notebook, but we recently released a JavaScript version by Gordon Moore. Here's the blog post: thenewstatistics.com/itns/2020/08/03/gordons-dances-vivid-simulations-bring-statistical-ideas-alive/ ...scroll down to item 4 for the dance of the p values. To access esci-web and play with the dances yourself, go to our site: thenewstatistics.com/itns/ ...then at the ESCI menu click 'ESCI on the Web'. Click to open 'dances', then the bottom checkbox, in Panel 9. For tips, click '?' at the top. Control sampling with the buttons in Panel 3. Turn on sound, adjust speed..., change N and effect size... Enjoy! Geoff P.S. For further statistical diversion, search YouTube for 'significance roulette' to find two videos.
@jeffreyflint 11 years ago
On the other hand, if you were to use a previous study's CI as a notion of what a future study's results might be (replication?), that is clearly a Bayesian notion.
@melvin6228 10 years ago
Wait wait wait, this is all fine and dandy. But what if there is no effect? What does the distribution of p-values look like then? I mean, this only proves that a medium effect size with n = 32 cannot be found 48% of the time. Isn't the p-value supposed to be 'the final arbiter' in the sense that it is hard to get a low p-value when there really is no effect? I make the following conventional bet: if the effect size is low (e.g. .1), then the chance of obtaining a p-value smaller than .05 will be around 10%.
@gunnarenglund7445 10 years ago
With an effect size of zero, the p-value is uniformly distributed on the interval (0, 1).
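That uniformity is easy to verify by simulating with the null true by design (two groups drawn from the same population; the group size and replication count are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, reps = 32, 5000

# Both groups come from the SAME population, so the null is true by design
p = np.array([
    stats.ttest_ind(rng.normal(0, 1, n), rng.normal(0, 1, n)).pvalue
    for _ in range(reps)
])

print(f"share with p < .05: {(p < 0.05).mean():.3f}")   # about .05
print(f"share with p < .50: {(p < 0.50).mean():.3f}")   # about .50
```

Under a uniform distribution P(p < x) = x, so about 5% of p values fall below .05 and about half below .50; rerunning with a small true effect (e.g. d = .1) would settle the 10% bet above the same way.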
@muffinman1 a year ago
very interesting. Thanks!
@geoffdcumming a year ago
Thanks! You may be interested in Significance Roulette, possibly an even more dramatic demo of the craziness of p values. Search YouTube for 'Significance Roulette' to find two videos. Enjoy! Lots more in my books, especially the intro book. See www.thenewstatistics.com Geoff
@geoffdcumming 11 years ago
Thanks Jeffrey. Mathematically equivalent, yes, but not at all equivalent in terms of useful information provided. Psychologically not at all equivalent. Cognitive evidence that people are likely to make much better interpretations with CIs than p values is Coulson et al., at tiny.cc/cisbetter Also p values vary greatly for any N. Surprisingly, the prediction interval for p is very wide, with width independent of N. Cumming 2008 in Perspectives on Psychological Science. Geoff
@geoffdcumming 10 years ago
Reply to Sudipto Sen: Thanks for contributing. Yes indeed! There is evidence that p values just a little below .05 are indeed more prevalent than those just a little above. No doubt "p-hacking" contributes to this, meaning all sorts of dubious selection, tweaking, and other poor data analytic practices. Far better to simply not to use p values at all. Estimation using CIs is just the most easily available of a number of good alternatives. May all your confidence intervals be short! Geoff
@geoffdcumming 13 years ago
To dance with the p values yourself, go to thenewstatistics.com and download ESCI ("ESS-key": Exploratory Software for Confidence Intervals), for Windows or Mac. Use the 'Dance p' page of the 'ESCI chapters 5-6' module. All free. Runs under Excel, simple to use. That site also has info about my book: Cumming, G. (2012). "Understanding The New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis". New York: Routledge. Enjoy! Geoff
@jaredRStudio 7 years ago
Thanks for this Geoff. If you were reporting both CIs and p-values, which interpretation would you trust when they conflict, i.e. if the CI doesn't cross zero but the p-value is greater than 0.05?
@geoffdcumming 7 years ago
Thanks Jared. First, in most simple cases the CI and the p value are based on exactly the same statistical model, and therefore there always must be consistency: p=.05 when the 95% CI extends exactly to zero (or other null hypothesised value); p < .05 when zero is outside the CI; and p > .05 when zero is inside the CI. In more complex cases, with measures other than means (e.g. some standardised effect size measures), calculating p values and CIs can involve approximations, and sometimes slightly different approximate calculation approaches are most common for p values, and CIs. Therefore there can sometimes be slight inconsistencies. Note, however, two further vital points: 1. I advise strongly against interpreting CIs merely in terms of inclusion or otherwise of zero. That wastes so much of the information a CI provides, and amounts to mere dichotomous decision making, which is one of the terrible things about NHST and p values. Interpret the whole interval, and don't use p values (or dichotomous decision making) at all! 2. Even if you wish to take note of whether or not the CI includes zero, it's vital to remember the enormous sampling variability of the p value: CIs bounce around because of sampling variability, but at least the extent of any single interval makes this uncertainty salient. In stark contrast (the whole point of the dance demo) the p value varies remarkably, but any single p value hides this. So there's nothing very special about the precise position of the end of a CI, just as there's nothing very special about the precise value of any p. Simply don't take note of either, or how they compare--regard them as fuzzy. Lots more in my books, especially the intro book. See www.thenewstatistics.com If you have the CI, then the p value adds nothing and is likely to mislead. Simply don't use p values! (The journal 'Political Analysis' recently banned p values from its pages.) Geoff
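For the simple one-sample case, the exact consistency Geoff describes can be verified directly; a sketch:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, trials, agree = 20, 1000, 0

for _ in range(trials):
    x = rng.normal(0.2, 1.0, n)              # arbitrary true mean of 0.2
    m = x.mean()
    half = stats.t.ppf(0.975, n - 1) * x.std(ddof=1) / np.sqrt(n)
    ci_excludes_zero = abs(m) > half         # 95% CI excludes the null value
    p = stats.ttest_1samp(x, 0.0).pvalue
    agree += ci_excludes_zero == (p < 0.05)

print(f"CI and p gave the same verdict in {agree}/{trials} samples")
```

Because both are built from the same t statistic, the 95% CI excludes zero exactly when p < .05, with no exceptions.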
@jeffreyflint 11 years ago
I don't see it that way since there is no assertion of truth in any study, so it doesn't make sense to talk about replication with regards to CIs. The only thing known for sure is that for alpha = 5%, if the null hypothesis is true, then 95% of studies will accept the null hypothesis. In any one trial, there is no way to know whether the CI is an indication of "truth"; it is only a 95% binomial variable.
@geoffdcumming 13 years ago
@Komelsky Insane, yes, but typical for much published research in psychology and other disciplines. Cell biology routinely uses N=3! It is statistical power that determines the distribution of the p value. Power of only around .5 is typical of much published research. Crazy! Explore any other N, or power, using the Dance p simulation yourself--I've just posted a comment with details. Geoff
@DominicFlynn 13 years ago
What's that tune? I'm sure I've heard it somewhere.
@basta84 10 years ago
I don't see the issue: your null hypothesis is false. Some samples give you enough evidence to indeed reject the null hypothesis, while other samples do not. What's the big deal?
@geoffdcumming 11 years ago
Glad you like the book ('Understanding the New Statistics...', Routledge). And the video! (Info at thenewstatistics.com) For many non-normal cases, using robust statistics is the way to go. Use the 'trimmed mean' and 'winsorized SD', and then calculate CIs, which are as informative as ever! Rand Wilcox has several books on the topic. ESCI could include cool pics of robust methods, but I'll need another lifetime! Best of luck with it. Geoff
@Kalid0scop3 13 years ago
Could you post your simulation program?
@geoffdcumming 11 years ago
Thanks Jeffrey. Lots of vital issues there for discussion! Most basic for me is the enormous value of shifting from dichotomous reject vs don't-reject (NHST, p) to estimation (CIs), which is most informative for building a cumulative quantitative discipline. Prediction relates to replication, which is at the core of science, and a CI does much better than p in giving info about replication. There is info about the book, intro articles, podcast, video, etc. at thenewstatistics.com Geoff
@jeffreyflint 11 years ago
Psychologically not equivalent I can accept. I think people should choose the tool they are most capable and comfortable with. However, your statement that "the prediction interval for p is very wide" implies that the same is not true for CIs. That can't be if the two are mathematically equivalent. Also, why is a prediction interval for a p-value a reasonable metric? The discipline is either to reject or accept the hypothesis using either a p-value or a CI. The p-value histogram is not relevant.
@geoffdcumming 12 years ago
Thanks David! I disagree! The best explanation, with evidence, of why p values are so terrible and damaging is the chapter by Rex Kline at tiny.cc/klinechap3 Lots more in my book 'Understanding the New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis'. Info at thenewstatistics.com Evidence that confidence intervals are better than p, and indeed that it is better to use just CIs, without p, at tiny.cc/cisbetter Enjoy! Regards, Geoff