First of all, I loved the dance. Both my partner and I laughed at the random (or not so random) honking paired with emotionally labile dancers. A wonderful program. (Sorry in advance, neither of us is a statistician; we're just trying to learn enough to do our job well.) My partner and I have a bit of a disagreement resulting from watching this video. We were wondering in what applications a p-value shouldn't be trusted. Effect size & CI are a given as extremely important & required context. Assume bias is negligible & the sample size is sufficient for the field of research, thanks to a wonderfully constructed experiment...

One argument was that a p-value is not a great instrument for planning a study, but is better for planning management. For example, if I did a study & achieved p = 0.05 showing a positive correlation between poking you in the right eye & improving your tax returns, then for anyone who came in with poor tax returns I could poke them in the eye & feel there's a 19/20 likelihood of improving their tax returns. However, if I attempted to prove this association with further research I might have difficulties.

The other interpretation was that a p-value is a poor tool for both research planning & developing management: the poor predictability of where the p-value may fall in any one experiment means that its utility, even once found to be 'significant', lends poor predictive value for both.
@peterm3989 1 year ago
I’m very pleased to have stumbled across this video and channel!
@nicolenew1708 1 year ago
❤❤❤
@muffinman1 1 year ago
very interesting. Thanks!
@geoffdcumming 1 year ago
Thanks! You may be interested in Significance Roulette, possibly an even more dramatic demo of the craziness of p values. Search KZbin for 'Significance Roulette' to find two videos. Enjoy! Lots more in my books, especially the intro book. See www.thenewstatistics.com Geoff
@OblateSpheroid 2 years ago
Thank you for your work.
@geoffdcumming 2 years ago
Thanks Oblate! You might also like two videos on significance roulette. Either search for that term here at KZbin, or go to tiny.cc/SigRoulette1 and tiny.cc/SigRoulette2. Yep, p values are scarily unreliable! Geoff
@aliciaramir0 3 years ago
Best statistics video ever! 💞
@geoffdcumming 3 years ago
Thanks very much Norma! Clearly, you are a highly intelligent and insightful person! Do you also know two videos of a different demo of just how crazy p values can be? Suppose you do an initial study and calculate p = .01. What p value would an exact replication (exactly the same, just with a new random sample) give? Turns out that a VERY wide range of p values are perfectly possible. If initial p = .05, then replication p will, of course, be, on average, a bit larger, but still there is massive uncertainty. For a demo, with explanation, search KZbin for 'significance roulette' and find two videos. Enjoy! Geoff
@Houshalter 3 years ago
This is totally backwards. You are looking at the probability of getting a good p value if an effect is real. We want the opposite: how much more likely is an effect, given a p value? If you ran such a simulation, you'd find the p value correlates with the existence of an effect better than any other measure. It's random because data are random. Sometimes a few patients given a good drug will drop dead anyway. Your p value will be high because it should be. It's not a convincing result; get more data.
@geoffdcumming 3 years ago
Thanks for the comment! You are correct that, with p values, we have to deal with weird, counter-intuitive backwards logic! That's at the heart of the problem of statistical significance testing. It's why it's so hard to understand intuitively. Yes, as researchers, we see a p value and would love to know the effect size in the population. Sure, a larger ES will, on average, give a smaller p, BUT there is huge variation in the p value, simply because of sampling variability. That's what the dance illustrates. You can play for yourself with the wonderful 'esci web' software built by Gordon Moore. Runs in any browser: www.esci.thenewstatistics.com/ Click 'dances', then 'dance of the p values' at red 9, bottom panel on left. Click '?' top right in left hand panel to see tips on mouse hover--to explain what's going on. Set small or large ES, small or large N, and you'll see p values dance wildly. Sure, smaller N gives p that is, on average, larger, etc, etc, but in just about every situation p varies widely. The single value of p gives no idea of the extent of uncertainty, whereas a CI does: the length gives us excellent info about the amount of uncertainty. That's all explained in multiple ways, and illustrated, in my 2008 article: tiny.cc/pintervals Here's another way to think about the issue: Suppose you do an initial study and calculate p = .01. What p value would an exact replication (exactly the same, just with a new random sample) give? Turns out that a very wide range of p values are perfectly possible. If initial p = .05, then replication p will, of course, be, on average, a bit larger, but still there is massive uncertainty. That's also explained in that 2008 article, with formulas. For a demo, with explanation, search KZbin for 'significance roulette' and find two videos. Enjoy! Geoff
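For readers who want to try this outside ESCI, here is a minimal Python sketch of the kind of simulation the dance illustrates. It is my own simplified setup, not the ESCI software: two independent groups of n = 32, population effect size 0.5 in SD units, sigma = 1 assumed known, so a z-test stands in for the t-test (the two behave almost identically here), and the function name is mine.

```python
# A minimal sketch of the "dance of the p values" (my simplified setup,
# not ESCI): every simulated study has the SAME true effect and the same
# n, yet the p values sprawl wildly.
import math
import random

def one_study_p(n=32, delta=0.5, rng=random):
    """Simulate one two-group experiment and return its two-sided p value."""
    treat = [rng.gauss(delta, 1.0) for _ in range(n)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    diff = sum(treat) / n - sum(control) / n
    se = math.sqrt(2.0 / n)                    # SE of the mean difference
    z = diff / se
    return math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p from |z|

random.seed(42)
ps = sorted(one_study_p() for _ in range(1000))
frac_sig = sum(p < .05 for p in ps) / len(ps)
print(f"10th pctile p = {ps[100]:.4f}, median p = {ps[500]:.3f}, "
      f"90th pctile p = {ps[900]:.3f}, fraction p < .05 = {frac_sig:.2f}")
```

The p values span several orders of magnitude, and only about half reach .05, even though the design never changes; that is the point of the dance.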
@KingOfKings34 3 years ago
Ryan Faulk sent me here
@geoffdcumming 3 years ago
Thank you to Ryan! Hope you enjoy, and wonder... Geoff
@fringeelements 3 years ago
Something that should be stated: psychology isn't some particularly pathological field. When you say "typical of psychology", most people by default trust tagged authorities, and so will look for ANYTHING to contain the critique. It shouldn't be like this; they should be able to say, "you know, I don't actually know whether any other field IS better than psychology in terms of mean or median sample size and/or effect size, so I can't ipso facto declare this problem DOES NOT also apply to every other field." But because that's how most people react, it's important to pre-empt the containment tactic they will engage in.
@geoffdcumming 3 years ago
Yep, p values are *highly* variable, whatever the research field! Psychology drank the NHST and p value kool-aid more than half a century ago, and distinguished scholars have been giving cogent critiques of NHST and how it's used ever since. Lots of other research fields have their own version of the same story. It's about time we all moved on, to estimation, whether that is classical (using confidence intervals) or Bayesian (credible intervals), for just about all research situations. If you would like to play with the dance of the p values for yourself, check out Gordon Moore's great 'esci web' software. Runs in any browser and is wonderfully fast: www.esci.thenewstatistics.com/ then choose 'dances', then click at red 9 (bottom panel at left). Click at '?' at top right of left panel to see tips that pop up on mouse hover--to help figure out what you can do. Enjoy! Geoff
@erikgarcia863 3 years ago
I wish I could double like this. Amazing demonstration and fantastic educator!
@geoffdcumming 3 years ago
Thanks Erik, very nice of you to say so! In case of interest, I'll mention that Bob and I are working on a second ed. of our intro textbook, with totally new software. Full info, our blog, and download of the new software (some still being refined) at thenewstatistics.com Also, here's part of a recent reply to an earlier comment: It's such an important idea that the p value is simply not reliable, and not nearly enough folks understand that. Confidence intervals are WAY more informative and useful! We now have software that anyone can run in a browser that allows exploration of the dance of the p values. At our site thenewstatistics.com go to the ESCI menu and click on 'ESCI on the web'. Click (top left) on 'dances' and then explore as you wish. Click '?' (top right) to get mouse-over tips. Click to open Panel 9, Dance of the p values, then explore as you wish. You may also care to search at KZbin for 'Significance Roulette' for yet more demos of how crazy p values are, and see that it's even more crazy that anyone uses them to make any decision that matters. Enjoy! Geoff
@madihaismail2954 3 years ago
Are the book and software paid?
@geoffdcumming 3 years ago
Madiha, Thanks for your interest. At our website www.thenewstatistics.com you can find full details of:
--- my first book, Understanding the New Statistics, published by Routledge in 2012
--- our (with Bob Calin-Jageman) intro textbook, Introduction to the New Statistics, published by Routledge in 2016
--- all versions of the ESCI software, for free download
--- the esci software, which runs in jamovi, all written in R and open source---all a free download---to serve as the basis for the second edition of our intro textbook, to be published by Routledge next year
--- our statistics blog
Enjoy, Geoff
@aymanazizuddin183 3 years ago
Watching this in 2021 and wow what a great explanation
@geoffdcumming 3 years ago
Thanks Mohammad, glad you liked it! It's such an important idea that the p value is simply not reliable, and not nearly enough folks understand that. Confidence intervals are WAY more informative and useful! We now have software that anyone can run in a browser that allows exploration of the dance of the p values. At our site www.thenewstatistics.com go to the ESCI menu and click on 'ESCI on the web'. Click (top left) on 'dances' and then explore as you wish. Click '?' (top right) to get mouse-over tips. Click to open Panel 9, Dance of the p values, then explore as you wish. You may also care to search at KZbin for 'Significance Roulette' for yet more demos of how crazy p values are, and see that it's even more crazy that anyone uses them to make any decision that matters. Enjoy! Geoff
@cezreycor 4 years ago
Great video and clearly explained. Many thanks for your help Geoff!
@geoffdcumming 4 years ago
Thanks! As you probably appreciate, for me the key is pictures, good pictures, or, even better, moving pictures, as in dynamic simulations. If I can see a picture I feel I have a chance of understanding. For diversion, you may enjoy searching KZbin for 'dance of the p values' and 'significance roulette' for videos with demos of just how crazy it is to use or rely on p values and statistical significance! May all your confidence intervals be short, Geoff
@cezreycor 4 years ago
@@geoffdcumming Many many thanks again, fantastic learning platform and hope many more students appreciate the hard work you put in your videos! Thanks for the 'dance of the p-values' and 'significance roulette' suggestions, awesome material!
@randvids6072 4 years ago
Hi Geoff, you mentioned the width of the CI is predictive in some way of future CI widths, but what value is that to a researcher? Won't the range of the CI bounce around as shown in your simulation, and doesn't that imply that the confidence interval itself will miss the true mean, as it shifts along the axis of possible means, some percentage of the time? Perhaps as often as the p-value shifts?
@geoffdcumming 4 years ago
Thanks Randvids, You are absolutely correct that both CIs and p values dance around with replication--as the simulation illustrates. The large extent of the dancing, in both cases, is probably way more than most folks would predict or expect. Yep, the world is full of random variation, unless we're lucky enough to be able to work with samples that are huge. By definition, a 95% confidence interval will miss the true population value on 5% of occasions (assuming random sampling, etc)--these are displayed red in ESCI. So, in a lifetime of seeing and interpreting CIs, some unknown 5% will miss what they are estimating. (In real life, those intervals don't come red, unfortunately!) BUT there is a vital difference between the dancing of CIs and p values. Any single CI gives a pretty good idea how wide, how frenetic, the dance is. In real life we get only a single CI, not part of the dance, so it's highly valuable that any one CI gives us a good idea of the extent of uncertainty. We can be 95% confident (no more, but no less) that our single CI has landed so that it includes the true population value. Hooray! In stark contrast, any single p value gives us virtually no information about the dance it came from. The next p value in the dance may be much bigger, or much smaller. But a p value is a single value, sometimes even reported to 3 decimal places (!), which shouts 'accurate', 'trustworthy' at us--despite it telling us very little. In contrast, the single CI makes the uncertainty salient--its length shouts 'there is uncertainty', 'there is doubt'. It even quantifies the extent of that uncertainty, so we can be very happy if we get a very short CI, and be appropriately disappointed and circumspect if we get a very long CI--indicating that our study may have been pretty useless. Overall, it's of great value to a researcher to know how precise any result is. The CI gives the best information available in the data on that. 
We now have software that anyone can run in a browser that allows exploration of the dance of the p values. At our site www.thenewstatistics.com go to the ESCI menu and click on 'ESCI on the web'. Click (top left) on 'dances' and then explore as you wish. Click '?' (top right) to get mouse-over tips. Click to open Panel 9, Dance of the p values, then explore as you wish. You may also care to search at KZbin for 'Significance Roulette' for yet more demos of how crazy p values are, and see that it's even more crazy that anyone uses them to make any decision that matters. Enjoy! Geoff
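Geoff's point about the 5% of "red" intervals is easy to verify numerically. The sketch below is my own example, not ESCI: with sigma known, the 95% CI for a mean is mean ± 1.96·sigma/√n; over many replications about 5% of such intervals miss the true mean, while each single interval's length honestly displays that one study's uncertainty (the function name is mine).

```python
# A small check (my example, not ESCI): long-run coverage of a 95% CI for
# a mean, sigma assumed known. Roughly 5% of intervals should be "red",
# i.e. miss the true population mean.
import math
import random

def ci_for_mean(sample, sigma=1.0, z=1.959964):
    """95% CI for the mean, sigma assumed known."""
    n = len(sample)
    m = sum(sample) / n
    half = z * sigma / math.sqrt(n)
    return m - half, m + half

random.seed(7)
TRUE_MEAN, N, REPS = 0.5, 32, 2000
hits = 0
for _ in range(REPS):
    lo, hi = ci_for_mean([random.gauss(TRUE_MEAN, 1.0) for _ in range(N)])
    hits += lo <= TRUE_MEAN <= hi
print(f"coverage = {hits / REPS:.3f}")
```

The coverage comes out close to .95, as the definition of a 95% CI promises (assuming random sampling and a correct model).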
@niels_bal 4 years ago
Brilliant video
@geoffdcumming 4 years ago
Thanks Niels! We now have software that anyone can run in a browser that allows exploration of the dance of the p values. At our site www.thenewstatistics.com go to the ESCI menu and click on 'ESCI on the web'. Click (top left) on 'dances' and then explore as you wish. Click '?' (top right) to get mouse-over tips. Click to open Panel 9, Dance of the p values, then explore as you wish. You may also care to search at KZbin for 'Significance Roulette' for yet more demos of how crazy p values are, and see that it's even more crazy that anyone uses them to make any decision that matters. Enjoy! Geoff
@meechos 4 years ago
Has anyone found a Jupyter notebook for this? If not, I may have a go at it; let me know if you'd be interested in having a look. Thanks!
@geoffdcumming 4 years ago
Dimitris, Thanks for your interest! I don't know of such a notebook, but we recently released a javascript version by Gordon Moore. Here's the blog post: thenewstatistics.com/itns/2020/08/03/gordons-dances-vivid-simulations-bring-statistical-ideas-alive/ (scroll down to item 4 for dance of the p values). To access esci-web and play with the dances yourself, go to our site: thenewstatistics.com/itns/ then at the ESCI menu click 'ESCI on the Web'. Click to open 'dances' then the bottom checkbox, in Panel 9. For tips, click '?' at the top. Control sampling with the buttons in Panel 3. Turn on sound, adjust speed..., change N and effect size... Enjoy! Geoff P.S. For further statistical diversion, search KZbin for 'significance roulette', to find two videos.
@milrione8425 4 years ago
You are the best ever... really...
@margaretsilveriovillalba4947 4 years ago
I love how he explains this so clearly. Clears out my anxiety about stats! Thanks so much Geoff!
@geoffdcumming 4 years ago
Thanks Margaret for your warm words. As you probably appreciate, for me the key is pictures, good pictures, or, even better, moving pictures, as in dynamic simulations. If I can see a picture I feel I have a chance of understanding. May all your confidence intervals be short! Geoff
@claire2247 4 years ago
Dr. Cumming, this is a job well done. :)
@geoffdcumming 4 years ago
Thanks very much Claire! With Open Science practices marching ahead, meta-analysis only becomes more important, more central. Even beginning students should learn about at least the basics of meta-analysis, and the forest plot makes it all visual and simple. Bob and I are working at the moment on the second edition of our intro text 'Introduction to the new statistics: Estimation, Open Science, and beyond'. May all your confidence intervals be short! Geoff
@milrione8425 4 years ago
You are the best instructor ever... Why did I just find this T_T Where were you... hahah
@fringeelements 4 years ago
??? He's right here. And you can bask in his instruction right here.
@lucasteel8263 4 years ago
Thanks for this really clear and interesting video! :)
@geoffdcumming 4 years ago
Thanks Luca! You prompted me to do a quick blog post: thenewstatistics.com/itns/2020/07/29/reminder-significance-roulette-still-tells-us-a-p-value-cant-be-trusted/ Geoff
@nikolaikrot8516 4 years ago
amazing!
@geoffdcumming 4 years ago
Thanks Nikolai! I agree. I was blown away when I first started playing with these simulations, inspired by a 1995 paper by Frank Schmidt. That led to my videos and a paper explaining how the enormous variability in the p value remains true, even for large N, large true effects. (Yes, the average p may be smaller for large N, large effects, high power, but the amount of dancing around is still enormous.) See tiny.cc/pintervals For more, you may care to search KZbin for 'significance roulette', to find two videos. Enjoy! Geoff
@CienciacomCerteza 4 years ago
@@geoffdcumming Thanks for the video, really exciting and worthy. I'm curious about what paper of F. Schmidt did you talk about...?
@geoffdcumming 4 years ago
@@CienciacomCerteza The article by Frank Schmidt, details below. It was published in the first volume of the new journal 'Psych Methods'. The editor was much criticised for publishing an article that so strongly 'undermined' what was then regarded as best practice. History has vindicated both Schmidt and the editor: the article has received more than 1400 citations! See especially Tables 1 and 2, and the discussion around there. Sampling variability is often so large, and has effects that most researchers don't have good intuitions about! Thanks for asking. Enjoy! Geoff

Schmidt, F. L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychological Methods, 1(2), 115-129. doi:10.1037/1082-989X.1.2.115
@AngryDrake 4 years ago
That's pretty amazing Excel magic there.
@geoffdcumming 4 years ago
Thanks! For more, you may care to search KZbin for 'significance roulette', to find two videos. Enjoy! Geoff
@mr_jimjamyam8679 5 years ago
Hey sir I’m in your class
@KirillBezzubkine 5 years ago
Thx man. Those vids are good
@hendrikbruns3580 5 years ago
It would have been important to also mention that the number of observations drawn from the population (i.e. the sample size) increases the "predictability" of inferences from the p-value.
@geoffdcumming 5 years ago
Hendrik, Thanks for your comment. For a given effect size in the population, if we increase N (sample size) then the p values we get are on average smaller. Yes, that's certainly true. But the perhaps surprising thing is that they still bounce around to a very large extent. Statistical power sets the sampling distribution of the p value, so if we increase power by increasing N and/or increasing population effect size, then we shift to a different sampling distribution of p, with a lower mean. But the p interval (meaning the interval within which 80% of p values will lie) is still very long. Consider another approach. Suppose we know only that an initial study gives p=.05. Then a single exact replication will give p that's likely to be very different from that initial p. The distribution of such a replication p is derived, illustrated and explained in my 2008 article below. The remarkable thing is that this distribution does not depend on the N of the initial study--assuming the replication has the same N (and everything else) as the initial study. It depends just on that initial p value. Hard to grasp, I know--it took me ages--but note that getting p=.05, for example, with very large N will happen only if the observed effect size is very small. In which case, a replication will give, most likely, a slightly different (but also small) effect size, and this is likely to have a very different p. I think the best way to appreciate just how unreliable p is, even for large samples, is to watch two videos: at KZbin, search for 'significance roulette'. A p value is never to be trusted, even with large or very large N! Geoff

I discuss all this, with pictures, simulations, and formulas, in tiny.cc/pintervals

Cumming, G. (2008). Replication and p intervals: p values predict the future only vaguely, but confidence intervals do much better. Perspectives on Psychological Science, 3, 286-300. doi:10.1111/j.1745-6924.2008.00079.x
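A back-of-envelope simulation makes the replication point concrete. This is my own simplification, not the full derivation in the 2008 article: suppose an initial study just reached two-tailed p = .05, so its observed z was about 1.96, and suppose (optimistically) that the true effect exactly equals the observed one. An exact replication's z is then Normal(1.96, 1), and its p value still varies enormously. The article's derivation, which also allows for uncertainty in the true effect, gives an even wider spread.

```python
# My simplified replication model (an assumption, not the 2008 paper's
# exact derivation): replication z ~ Normal(1.96, 1), replication p is
# two-sided. Watch how far the replication p values range.
import math
import random

random.seed(3)
REPS = 10_000
rep_ps = sorted(
    math.erfc(abs(random.gauss(1.96, 1.0)) / math.sqrt(2.0))  # two-sided p
    for _ in range(REPS)
)
frac_nonsig = sum(p > .05 for p in rep_ps) / REPS
print(f"10th pctile = {rep_ps[REPS // 10]:.5f}, "
      f"90th pctile = {rep_ps[9 * REPS // 10]:.3f}, "
      f"fraction with p > .05 = {frac_nonsig:.2f}")
```

Even under this optimistic assumption, roughly half the replications of an initial p = .05 result are not themselves significant.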
@bakubaka4482 5 years ago
Alt-Hype brought me here.
@Fortastius 4 years ago
Which video was it where he directed people to this video? I had to find this video manually.
@miodraglovric5093 5 years ago
Geoff, what you have done here is a very misguided analysis. You have simply shown that statistical testing based on inadequate sample sizes does not make sense. Question: did you purposefully choose sample sizes of 32, leading to a power of only 0.52? Your last sentence in the video should have been: this is a classic case for not performing underpowered studies, not that p-values have erratic behavior. In addition, you have selected huge standard deviations (relative to the means), hence there is a substantial amount of overlap between the two populations. Suggestion 1: Increase the sample sizes, achieve better power, and your p-value dance will be totally different. Suggestion 2: Decrease both standard deviations to 10 and keep the sample sizes unchanged (n=32). You will be highly disappointed when you see the results: no p-dances at all. Almost all p-values will be < 0.05. Your video about "p-dances" should be retitled to "how to intentionally perform bad statistics".
@geoffdcumming 5 years ago
Miodrag, thank you for your comment. You are correct that different power will give different proportions of replications that give p<.05. That's true by the definition of power. Sure, with different power, the p value dances are different--they traverse generally lower p values (if power is much higher) and generally higher p values (if power is much lower). But, alas, whatever the power, p values dance wildly. That's the surprising fact about p values, and what the dances video (and ESCI software) is intended to illustrate. In my 2008 paper at tiny.cc/pintervals I discuss the nature and extent of p value dances from a number of different perspectives. I also derive, explain and illustrate what I call 'p intervals'. A p interval is the 80% prediction interval for the p value, following an initial study. These intervals are surprisingly wide (we have empirical evidence to back that claim--published researchers tend to severely underestimate the length of these intervals; see econtent.hogrefe.com/doi/abs/10.1027/1614-2241/a000037 ). For example, following a study that gave a one-tailed p = .01, the p interval for the two-tailed p given by a replication is (.000006, .22), meaning that there is a 10% chance of p<.000006 and a 10% chance of p>.22. Lots of p values dance beyond that already long interval! I chose N=32 and SMD = 0.5 (the effect size in the population) because they are typical for many research fields in social and behavioural science. Surveys of power have shown that median power to find a true effect of that size has been around .5 for the last 50 years or so. In that sense, it's roughly 'typical'. For a different exploration and demonstration of the extreme variation in the p value, see my two videos on Significance Roulette. Simply search KZbin for 'significance roulette'. Or the links are: kzbin.info/www/bejne/hZSteqCJZpudiJY kzbin.info/www/bejne/imSYfIapgdxjf68 My summary would be that any use of p values is 'performing bad statistics'!
See also the most recent 3 blog posts at www.thenewstatistics.com There's lots happening at the moment in the movement towards better inferential techniques in the 'post p<.05 world'. And that's great progress. Roll on the new statistics! Geoff
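The disputed power figure is easy to check. Below is my own calculation (a normal approximation; the exact noncentral-t answer is a touch lower, and the function name is mine): for two independent groups of n = 32 each and a true standardized effect of 0.5, power comes out close to the 0.52 both sides cite.

```python
# A quick power check (my calculation, normal approximation rather than
# the exact noncentral t): two independent groups, n per group, true
# standardized effect delta, alpha = .05 two-sided.
import math
from statistics import NormalDist

def two_group_power(n, delta, alpha=0.05):
    nd = NormalDist()
    crit = nd.inv_cdf(1 - alpha / 2)   # 1.96 for alpha = .05
    nc = delta * math.sqrt(n / 2)      # mean of the z statistic
    return nd.cdf(-crit + nc) + nd.cdf(-crit - nc)

print(f"n = 32, delta = 0.5: power = {two_group_power(32, 0.5):.2f}")
print(f"n = 128, delta = 0.5: power = {two_group_power(128, 0.5):.2f}")
```

Quadrupling n does push power above .9, as Miodrag suggests, but Geoff's point stands: even then the individual p values keep jumping around, just over a generally lower range.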
@geoffdcumming 5 years ago
@@miodraglovric5093 Miodrag, Thank you for this further comment, and especially for your positive and encouraging remarks. Your 3 paras:

1. Yes, Trafimow & Marks really threw out the baby with the bathwater! In my view it was a great move to outlaw NHST, but they should have continued to evaluate for possible publication manuscripts that used classical or Bayesian estimation. Actually, I hope NHST and p values will quietly wither away, as people grow to understand that they are superfluous and damaging. I hope that heavy-handed banning won't be necessary, but I may easily be wrong. The paper you mention is by committed Bayesians. I know 3 of them well, and have had many discussions with them over the years. We agree on many things, but not on the value of single CIs. They take the hard line, also taken by strict Frequentists, that only the formal definition of a CI can be used for interpretation. Meaning that we can only recite 'my interval comes from a notional infinite sequence of intervals, 95% of which include the population value'. Which on Planet Earth is not much help. In my books and papers I advocate and explain a broader, more pragmatic approach, based on familiarity with dances--the dance of the CIs, dance of the p values. In most practical cases it's quite reasonable to say 'I'm reasonably confident that my interval includes the true value, although I always keep in mind that it may be red'. (In ESCI's dances of CIs, those intervals that miss the population value are shown in red.)

2. Yes, move away from any dichotomous decision making. Having an interval null hypothesis is an advance, but doesn't go the full way to estimation, which I advocate. I'm not a fan of renaming CIs as compatibility intervals, although that may be one of several not-too-bad ways to think of CIs.

3. Yes, high power gives different dances of p values, but even with very high power, when virtually all p values are less than .05, there is still large and erratic jumping around.
The sig roulette videos are relevant. Below is a table based on the 2008 paper I mentioned (I hope it displays reasonably well).

pobt    p interval           P(replication p > .05)
.001    (.0000003, .14)      .17
.01     (.00001, .41)        .33
.05     (.0002, .65)         .50
.2      (.002, .79)          .67  (i.e., a .33 chance of p < .05)

'pobt' is the two-tailed p value given by an initial study (of any size, power, true effect size). The 'p interval' is the 80% prediction interval for the two-tailed p given by an exact replication. The third column is the probability that the exact replication gives p > .05. So even obtaining p = .001 does not guarantee that an exact replication will give even p < .05: there is about a 1 in 6 risk that a replication is not even significant at .05. Yes, the p value is alarmingly erratic. It dances furiously! Geoff
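The third column of Geoff's table can be reproduced analytically. The sketch below is my reconstruction of the model (an assumption on my part, though it matches the table's numbers, and the function name is mine): given the initial two-tailed p, take the replication's z statistic as Normal(z_initial, √2), where the √2 arises because the initial study's effect estimate is itself noisy.

```python
# My reconstruction of the replication-p model behind the table: given an
# initial two-tailed p, the replication z is Normal(z_initial, sqrt(2)).
import math
from statistics import NormalDist

def prob_replication_nonsig(p_initial, alpha=0.05):
    """P(an exact replication gives p > alpha), given the initial p."""
    nd = NormalDist()
    z0 = nd.inv_cdf(1 - p_initial / 2)   # |z| of the initial study
    crit = nd.inv_cdf(1 - alpha / 2)     # 1.96
    s = math.sqrt(2.0)
    # P(|z_rep| < crit) with z_rep ~ Normal(z0, sqrt(2))
    return nd.cdf((crit - z0) / s) - nd.cdf((-crit - z0) / s)

for p0 in (.001, .01, .05, .2):
    print(f"initial p = {p0}: P(replication p > .05) = "
          f"{prob_replication_nonsig(p0):.2f}")
```

Run it and the four probabilities come out .17, .33, .50 and .67, matching the table.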
@shaylagene 6 years ago
Where can I get that t-shirt!?
@geoffdcumming 6 years ago
Sorry, collector's item! There's also a blue version, for the intro book, but I'm afraid I haven't been able to arrange on online store to sell that one either :-( For fun, try searching, at KZbin, for 'significance roulette'. Enjoy--and thanks for your interest! Geoff
@olliepontone5695 6 years ago
10/10
@geoffdcumming 6 years ago
Thank you Ollie! You may also be interested in Significance Roulette. There are two videos--the easiest way to find them is by searching KZbin for 'significance roulette'. Enjoy! Geoff
@bmatrixxx 6 years ago
Thank you Geoff. That was a nice summary
@geoffdcumming 6 years ago
Thanks! Meta-analysis just keeps on getting more and more important, in this age of the replication crisis and rise of Open Science. Bring on more replication, more meta-analysis, and open data and materials. And may all your confidence intervals be short!
@wknajafi 6 years ago
thanks for amazing videos sharing. I liked your simple and warm presentations
@lindsaycumming816 6 years ago
Thanks! Hey, there's nothing more gripping than a good statistics video, no?! BTW, note that the 'number of views' stated by KZbin is way too low, because the main viewing of all these videos that go with our intro statistics textbook 'Introduction to the New Statistics: Estimation, Open Science, and Beyond' is via the publisher's website that goes with the book, and that site hosts its own copies of the videos--so all that video watching is not reflected in the KZbin viewing numbers. Geoff
@trongxhesse4742 6 years ago
Is it possible to have the first sample mean directly on the bottom, like in the version before?
@geoffdcumming 6 years ago
Sorry René, I don't understand what you are after. In this video the empirical distribution of sample means--the heap of green dots--is sitting on the lower horizontal axis. That's what ESCI gives us standardly. Geoff
@jishesyoutubechannel 7 years ago
How many of y'all teachers sent you here zzzzzzzzz
@geoffdcumming 7 years ago
Josh, I hope you had sweet dreams, of p values leaping all over the place! To teachers who find this video useful--good on you, way to go! You might also find two more recent videos diverting--more attempts by me to make dramatic some basic statistical ideas. You could use the two links below, or go to KZbin and search for 'Significance Roulette'. Enjoy (as you weep...). Sweet dreams... Geoff tiny.cc/SigRoulette1 tiny.cc/SigRoulette2
@jaredRStudio 7 years ago
Thanks for this Geoff, I was wondering if you were reporting both CI and p-values what interpretation would you trust when they conflict, so if CI don't cross zero but the p value is greater than 0.05?
@geoffdcumming 7 years ago
Thanks Jared. First, in most simple cases the CI and the p value are based on exactly the same statistical model, and therefore there always must be consistency: p=.05 when the 95% CI extends exactly to zero (or other null hypothesised value); p < .05 when zero is outside the CI; and p > .05 when zero is inside the CI. In more complex cases, with measures other than means (e.g. some standardised effect size measures), calculating p values and CIs can involve approximations, and sometimes slightly different approximate calculation approaches are most common for p values, and CIs. Therefore there can sometimes be slight inconsistencies. Note, however, two further vital points: 1. I advise strongly against interpreting CIs merely in terms of inclusion or otherwise of zero. That wastes so much of the information a CI provides, and amounts to mere dichotomous decision making, which is one of the terrible things about NHST and p values. Interpret the whole interval, and don't use p values (or dichotomous decision making) at all! 2. Even if you wish to take note of whether or not the CI includes zero, it's vital to remember the enormous sampling variability of the p value: CIs bounce around because of sampling variability, but at least the extent of any single interval makes this uncertainty salient. In stark contrast (the whole point of the dance demo) the p value varies remarkably, but any single p value hides this. So there's nothing very special about the precise position of the end of a CI, just as there's nothing very special about the precise value of any p. Simply don't take note of either, or how they compare--regard them as fuzzy. Lots more in my books, especially the intro book. See www.thenewstatistics.com If you have the CI, then the p value adds nothing and is likely to mislead. Simply don't use p values! (The journal 'Political Analysis' recently banned p values from its pages.) Geoff
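The consistency Geoff describes for the simple case can be checked directly. The sketch below is my own example (function names mine), built on a z-based mean with sigma known, so the CI and the p value really do share exactly the same model: p < .05 holds exactly when the 95% CI excludes zero.

```python
# A small consistency check (my example): for a mean with sigma known,
# the two-sided p for H0: mu = 0 is below .05 exactly when the 95% CI
# excludes zero, because both come from the same z statistic.
import math
import random
from statistics import NormalDist

ND = NormalDist()
CRIT = ND.inv_cdf(0.975)   # 1.96

def p_and_ci(sample, sigma=1.0):
    """Two-sided p for H0: mu = 0, plus the 95% CI, sigma known."""
    n = len(sample)
    m = sum(sample) / n
    se = sigma / math.sqrt(n)
    p = 2 * (1 - ND.cdf(abs(m) / se))
    return p, (m - CRIT * se, m + CRIT * se)

random.seed(11)
for _ in range(500):
    mu = random.uniform(-1, 1)          # vary the true effect
    p, (lo, hi) = p_and_ci([random.gauss(mu, 1.0) for _ in range(20)])
    assert (p < 0.05) == (lo > 0 or hi < 0)
print("p < .05 agreed with '95% CI excludes 0' in all 500 simulated studies")
```

With approximate methods for other effect-size measures the match can be slightly off, as the reply notes, which is one more reason to interpret the whole interval rather than its endpoints.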
@markremark 7 years ago
Such an eye opener! I'm so glad you came across all of this! The implications are just staggering!
@geoffdcumming 7 years ago
Thanks Mark! I couldn't agree more--I would say that, wouldn't I?! But, yes, most (almost all?) researchers simply don't appreciate how little information a p value gives, and how very different the p value given by a replication is likely to be. Regards, Geoff P.S. Info about our intro statistics textbook, and our statistics blog, are at: www.thenewstatistics.com
@markremark 7 years ago
Brilliant!
@geoffdcumming 7 years ago
Thank you! P.S. You may like two more recent videos: search KZbin for 'significance roulette'. Geoff
@lizard4250 7 years ago
That is a sick freakin beat my dude
@geoffdcumming 7 years ago
Thanks Liz! May all your confidence intervals be short... Geoff
@jaredRStudio 7 years ago
Thanks for this Geoff - do you have any spreadsheets for a mixed design? Thanks
@geoffdcumming 7 years ago
Sorry, no. ESCI doesn't extend that far. In Chapter 15 of ITNS (the intro book, see www.thenewstatistics.com ) we discuss, briefly, designs with one repeated-measure IV and one independent-groups IV, but without ESCI to do the data analysis for an example. Thanks for your interest, and best of luck. Geoff
@isak6626 7 years ago
Hi, Does the first book add anything to the second?
@geoffdcumming 7 years ago
Isak, The first book, Understanding The New Statistics (2012), is more technical in places, with formulas and discussion about the non-central t distribution, and about how precision for planning is calculated. The second book, Introduction to The New Statistics (2017), starts from the very beginning and covers the intro course, or a bit more. So the answer to your question is 'not really, except for a bit of techie stuff, formulas, etc'. On the other hand, there's lots in the second book that wasn't in the first---despite the second being an intro book. The second has chapters on regression, and on two-way designs, which weren't in the first. The biggest addition is that the second book integrates Open Science all through, starting from Chapter 1. The first intro book to do so! Thanks for your interest. I hope ITNS serves you well--together with its improved ESCI and a truckload of materials (videos, data sets, quizzes, examples...) at the companion website. Geoff
@isak6626 7 years ago
Thank you!
@zaimahbegum-diamond1660 7 years ago
Somebody at a party😂😂😂
@ziadmoh4110 4 months ago
like if somebody gonna mention standard error at a party!
@furli13 8 years ago
Hi Geoff, is the estimation process affected by Type 1 or 2 errors when making multiple comparisons (e.g. in a crossover with more than just pre and post measurements)?
@geoffdcumming 8 years ago
Hi Nico, Thanks for your query. In a sense, estimation side-steps Type 1 and 2 errors, which are defined only in the context of NHST and its black-and-white dichotomous decision making. Much better to permit the full range of options in between, which estimation considers! But the multiple comparison problem is general, and applies also to estimation. If we examine ten 95% CIs and choose the one that is most interesting, we risk 'seeing faces in the clouds', cherry picking, capitalizing on chance. There is a heightened risk that our chosen CI is one of the 5% that don't include the true value--and in ESCI would be red. We discuss that further in our new intro textbook, info at www.thenewstatistics.com All the best, Geoff
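The cherry-picking risk described above can be made concrete with a quick simulation. A Python sketch (the setup is hypothetical: ten groups of n = 20 drawn from a normal population with known σ, and we 'pick' the group with the largest mean as the most interesting one):

```python
import math
import random

random.seed(2)
MU, SIGMA, N = 0.5, 1.0, 20
HALF = 1.959964 * SIGMA / math.sqrt(N)  # half-width of a 95% z-interval

def one_trial():
    """Draw ten group means; report whether (a) a pre-chosen CI and
    (b) the cherry-picked (largest-mean) CI each cover the true MU."""
    means = [sum(random.gauss(MU, SIGMA) for _ in range(N)) / N
             for _ in range(10)]
    picked = max(means)  # the 'most interesting' of the ten comparisons
    covers = lambda m: m - HALF <= MU <= m + HALF
    return covers(means[0]), covers(picked)

trials = [one_trial() for _ in range(4000)]
cov_first = sum(a for a, _ in trials) / len(trials)
cov_picked = sum(b for _, b in trials) / len(trials)
print(f"coverage, pre-chosen CI:    {cov_first:.3f}")   # close to .95
print(f"coverage, cherry-picked CI: {cov_picked:.3f}")  # well below .95
```

A CI chosen in advance covers the true value about 95% of the time, but the CI around the largest of ten means misses far more often: selection after looking inflates the error rate, exactly the 'faces in the clouds' problem.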
@animel8ve 8 years ago
really really helpful
@halabishki 8 years ago
UR A FKN ANIMAL. thanks so much
@IanRBryce 9 years ago
Some questions: Up front you mentioned estimation being good... but it is not mentioned again. Does this mean confidence intervals? And... your presentation seems to assume known statistics. How would you handle drug testing, where only the null hypothesis statistics are known (only random effects), because the positive hypothesis (that the drug improves outcomes) is of unknown strength and cannot be modelled? That is why drug trials use only the null hypothesis, with p = .05 set in advance.
@Silverwing_99 9 years ago
Dear Prof Cumming, thank you so much for this video and your outstanding book "Understanding The New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis". This was truly an astounding piece of work, so well written that even a simple clinician like myself could understand it. Thanks, Johnny
@michaelchirico9530 10 years ago
"quite large for PSYCHOLOGY"
@st33pestasc3nt 10 years ago
Straw man? That's not at all what p-values are supposed to mean. Those are errors of human interpretation. Stats profs TRY to teach students how to use NHST, but most are abysmal teachers and most students blot out unpleasant Stats class memories anyway. Therein lies the problem. Then researchers go on to p-hack or over-rely on p-values on their rushed journey to academic fame, not knowing what they're doing.

A p-value is not the strength of evidence. It's simply the probability of observing an effect that size (or larger) due to sampling error given the null hypothesis is true, i.e. the chance of making a Type I error if you take that effect seriously. No one ever said it was a standalone beacon of definitive proof, just a marker for Type I error. But wait, why number the errors? Oh yeah, there's a Type II error too. Anyone remember that from Stats class? From most publications in psychology, the social sciences and the life sciences, you'd think not.

Given there are two types of error in NHST, modern researchers' single-minded obsession with Type I error seems incredibly naive. Type II error vastly increases as sample size shrinks and as you try to detect smaller "true" effect sizes. Both types contribute to whether one may be wrong in interpreting sample data.

In your simulation, the p-value is not very replicable due to Type II error. You're not measuring Type I error (since the null hypothesis is not true by design). Under your conditions, replicability of a low p-value would be a measure of statistical power (the chance of getting a low p-value given there is a true effect; the complement of Type II error). But with a small sample (n = 32), Type II error can be very high and power can be very low. So there is a high chance of observing a "not significant" p-value upon replication, despite the true effect in the population. The test simply lacks the power to consistently detect the effect. This is all you're showing with the "dance". Remember that p > 0.05 does not mean no effect. It just means your sample lacked the evidence to show it ("fail to reject H0", not "accept H0").

The p-value is very consistent with its definition if simulated properly (i.e. under the null hypothesis). If the true effect size is zero, you'd see 5% or fewer p-values < 0.05 after a large number of replications. Exactly what one would expect. What your simulation has uncovered is simply the elephant in the room with NHST: the issue of statistical power, which too many researchers ignore. A meta-analysis of recent behavioural ecology publications showed the median power was well under 40% (i.e. chance of Type II error > 60%!!). NHST is not at fault here. Its rules clearly indicate the result is garbage, if one chooses to remember them.

Also note most classical statistics rely on asymptotic (large-sample) properties. NHST invokes an assumption of Normality of the mean (or regression coefficient, or whatever). While the Central Limit Theorem proves that is true asymptotically, it may be very far from Normal with small samples. With n = 32 or lower, you not only get more Type II errors but these large-sample properties also start to break down, and parametric model assumptions may fail. There is a much deeper flaw in experimental design if one relies too much on large-sample behaviour with small-sample data, though that seems common practice in psychology for whatever reason.

Also note Type I and Type II errors relate only to sampling error. Then there's misspecification (applying the wrong statistical distribution or model to the data). And non-sampling errors also exist, often dwarfing sampling error. A dedicated attempt at error analysis, model diagnostics and a search for latent biases/confounders should precede any hypothesis testing. Otherwise: garbage in, garbage out. NHST is fine if used properly. However, in using it most researchers appear to be more clueless than Alicia Silverstone. The method is not wrong, just blindly misused.
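The power arithmetic behind the "dance" is easy to reproduce. A Python sketch (assumptions: a z-test on the difference of means with known σ = 1 stands in for the t-test used in the video, with per-group n = 32 and a true medium effect d = 0.5; with n = 32 the two tests behave almost identically):

```python
import math
import random

random.seed(3)
N, D = 32, 0.5  # per-group n, true standardized effect (a 'medium' effect)

def p_value_one_replication():
    """Simulate one two-group experiment and return its two-sided p value
    from a z-test on the difference of means (sigma known, = 1.0)."""
    g1 = [random.gauss(D, 1.0) for _ in range(N)]
    g2 = [random.gauss(0.0, 1.0) for _ in range(N)]
    diff = sum(g1) / N - sum(g2) / N
    se = math.sqrt(2.0 / N)  # SE of the difference of two means
    z = diff / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

ps = sorted(p_value_one_replication() for _ in range(2000))
sig = sum(p < 0.05 for p in ps) / len(ps)
print(f"power (share of p < .05):  {sig:.2f}")  # near .50 -- a coin flip
print(f"p-value range, middle 80%: {ps[200]:.4f} to {ps[1800]:.4f}")
```

Even with a real medium effect, only about half the replications reach p < .05, and the p values sprawl over a huge range: exactly the low-power regime described above.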
@MichaelBakunin25 6 years ago
Sure: for a medium effect size the minimum is n = 64 per group (power = 0.80), not n = 32.
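The n = 64 figure above can be checked with the usual normal-approximation power formula. A sketch (the normal approximation makes the power a touch higher than an exact t-test calculation, but the conclusion is the same):

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def power_two_group(n_per_group, d):
    """Approximate power of a two-sided, alpha = .05 two-group test of a
    standardized effect d, using the normal approximation."""
    z_crit = 1.959964                       # two-sided critical value
    ncp = d / math.sqrt(2.0 / n_per_group)  # noncentrality parameter
    return phi(ncp - z_crit) + phi(-ncp - z_crit)

for n in (32, 64, 128):
    print(f"n = {n:3d} per group, d = 0.5 -> power ~ {power_two_group(n, 0.5):.2f}")
```

With d = 0.5 this gives roughly 0.52 at n = 32 and 0.81 at n = 64 per group, matching the conventional power-analysis tables.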
@MichaelBakunin25 6 years ago
And we do not need the Type I & II error rhetoric. From Fisher's point of view we simply do not have enough evidence to reject the null if p > 0.05. So a redesign, or an increased sample, would be very much appreciated.
@joshhyyym 10 years ago
I like this guy. Often statistics is presented as so exacting... but it isn't; it shows trends and levels of noise, and to some degree it's far more valuable to get some analysis of data than to worry about minutiae indefinitely.
@melvin6228 10 years ago
Wait wait wait, this is all fine and dandy. But what if there is no effect: what does the distribution of p-values look like then? I mean, this only proves that a medium effect size with n = 32 cannot be found 48% of the time. Isn't the p-value supposed to be 'the final arbiter' in the sense that it is hard to get a low p-value when there really is no effect? I make the following conventional bet: if the effect size is low (e.g. .1), then the chance of obtaining a p-value smaller than .05 will be around 10%.
@gunnarenglund7445 10 years ago
With a zero effect size the p-value is uniformly distributed on the interval (0, 1).
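That uniformity is easy to confirm by simulation. A Python sketch (assuming a z-test on a mean with known σ and a true null of zero, so the p value's null distribution is exactly uniform):

```python
import math
import random

random.seed(4)

def null_p_value(n=32):
    """p value of a two-sided z-test when H0 is exactly true (true mean 0)."""
    m = sum(random.gauss(0.0, 1.0) for _ in range(n)) / n
    z = m / (1.0 / math.sqrt(n))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

ps = [null_p_value() for _ in range(5000)]
# Under H0 the p value is Uniform(0, 1): each decile holds ~10% of the
# p's, and p < .05 happens ~5% of the time -- the Type I error rate.
deciles = [sum(lo <= p < lo + 0.1 for p in ps) / len(ps)
           for lo in (i / 10 for i in range(10))]
frac_sig = sum(p < 0.05 for p in ps) / len(ps)
print("share per decile:", [f"{d:.3f}" for d in deciles])
print(f"share below .05: {frac_sig:.3f}")
```

Each decile holds close to 10% of the p values, and about 5% fall below .05, so under a true null the nominal Type I error rate is honoured; the trouble shown in the dance arises once there is a real effect and power is low.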