Non-statistician here again, and I completely followed your line of thought. I've struggled for umpteen years to understand why anyone would use nonparametric statistics for analysis, but in my case I got stuck in the quagmire of thinking that nonparametric means something like NO PARAMETERS! Which just sounds like throwing a broad net 🥅 to catch what you can. Nonetheless, I've "artificially" created ranks from raw data and applied the same statistical RANKING techniques for analysis, but I never felt comfortable about it. Truth be told, it always left a taste in my mouth, as if my boss was manipulating "stuff" to push it through some procedure just to get things to work. And I'd keep quiet because, well, I'm not a statistician, and I don't want to sound silly. Now, after viewing this video, I don't feel too bad at all. 👏 Excellent video once again. ✨
@QuantPsych 1 month ago
Thanks! I wouldn't say it's unethical to do nonparametrics, it's just suboptimal. I used to feel the same ick about transformations (e.g., log transformations). It felt like cheating. It's not, but I don't think it's really optimal.
@trini-rt6xn 1 month ago
@QuantPsych I swear you can read minds. You're absolutely right about log transformations. 😆 And I completely understand, and feel comfortable now that you have said that they have their place. I appreciate the "space" you've given me to express myself freely, and I believe this is so because you are able to think outside the box. BAM!👏
@TheBjjninja 9 months ago
Dr. Fife 'won't stop, can't stop'
@StatisticsSupreme 2 years ago
But what if your variable is ordinal? Are ranks not the best way to model ordinal data?
@QuantPsych 2 years ago
Good point! I hadn't thought of that.
@seejendo3290 2 years ago
I officially need a video from you on what you’re referring to when you say “convergence issues” - the internet is not explaining this well for non-math folk. Pretty please?
@seejendo3290 2 years ago
And maybe just some examples of rank based models and other non-parametric models, how they’re supposed to be used, and how they’re used badly in the wild.
@QuantPsych 2 years ago
It's on my to-do list :)
@mahmoudhamza6765 2 years ago
Thanks a lot for the videos in general. I am starting to become addicted to your channel. Also, thanks a million for the open discussion. I have a few questions, but I will briefly describe what I think first. Please correct me if I am wrong.
First, using the normal distribution assumption is very tempting since we have an arsenal of classical tests that are based on it. When the assumption is violated, we have multiple options:
1- Use the central limit theorem to assume the normality of the sampling distribution. This has limitations:
a- It is limited to certain statistics, such as means and proportions.
b- It needs a large enough sample size. That's vague. Sometimes a sample as large as 500 observations is not enough with highly skewed data or extreme outliers.
2- Nonparametric rank tests [problem: not modeling the data anymore].
3- Bootstrapping: I mean here that, for a test of the difference between two groups, we can create a distribution of the mean/median/sd/variance based on each sample and then compare these distributions.
4- Other methods: glm, quantile regression, robust stats.
My questions are:
1- When is bootstrapping not enough for comparing differences between group(s)?
2- For the other methods in item 4, which one do you use/prefer? A video/resource/thoughts would be highly appreciated.
3- This package in R, www.danieldsjoberg.com/gtsummary/, is saving me a lot of time. It can very easily present summary stats and models as elegant tables in R. However, it uses nonparametric rank tests as the default for comparing groups in **table1 summary stats**. Is this acceptable for descriptive statistics (table 1, patients' characteristics) in a statistical analysis?
Sorry for the long comment.
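[Editor's illustration] The bootstrap comparison described in option 3 above can be sketched roughly as follows. This is one common variant (a percentile bootstrap of the difference in a statistic), not necessarily what the commenter had in mind. The function name `bootstrap_diff_ci` and the data are made up for illustration, and only the Python standard library is assumed.

```python
import random
import statistics

def bootstrap_diff_ci(group_a, group_b, stat=statistics.mean,
                      n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for stat(group_b) - stat(group_a)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(stat(resample_b) - stat(resample_a))
    diffs.sort()
    # Read the interval off the sorted bootstrap differences.
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Made-up, right-skewed illustrative data (not from any real study).
control = [1.2, 0.8, 1.5, 0.9, 7.4, 1.1, 1.3, 0.7, 2.0, 1.0]
treated = [2.1, 1.9, 2.6, 9.8, 2.4, 1.8, 3.0, 2.2, 2.7, 2.5]

low, high = bootstrap_diff_ci(control, treated)
observed = statistics.mean(treated) - statistics.mean(control)
print(f"observed difference: {observed:.2f}, 95% CI: ({low:.2f}, {high:.2f})")
```

Passing `stat=statistics.median` gives the same procedure for medians. Note that this yields an interval for a difference, not a model of the data, which is the limitation raised in the reply.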
@QuantPsych 2 years ago
A couple comments:
re: central limit theorem. Yes, technically, models are quite robust to normality violations (because of CLT). But, they're not robust to nonlinearity. Unfortunately, nonlinearity and non-normality go hand in hand.
re: bootstrapping. Modern robust methods use bootstrapping to estimate probabilities. But, again, that's not a model. I think it's fine for a quick and dirty estimation method, but it's probably better to find the right model. I use generalized linear models. I believe there's a playlist on my channel for those.
re: gtsummary. Looks like a cool package for preparing tables. I'll have to check it out. If I understand your question, I don't think it's a problem to do rank tests for a basic demographics table. That's not really your model. I do gripe about doing tests of these, but not because of the type of model chosen. I'm more concerned about people getting distracted from what the actual paper is about.
@mahmoudhamza6765 2 years ago
@@QuantPsych That was insightful. Thanks!
@seanmahoney7707 3 months ago
Thanks for your videos, which I've just discovered and really like. I'd learned that non-parametric data exists (e.g., from linguistics, a perception of a person's accentedness on a scale of 1 to 9, where it's impossible for the layman listener to quantify another person's accent via a scaled metric). Surely non-parametric tests are best for non-parametric data, no? I'll have to look into robust methods, loess lines, and random forests. Thank you for the hints to alternatives, and for your ever-present sense of humour. Keep up the wonderful work!
@QuantPsych 3 months ago
There's no such thing as non-parametric data. The term "parametric" means you have a model trying to estimate a population value (e.g., a slope or an intercept). It sounds like you mean non-ratio or non-interval data. In that situation, an ordered logistic might be the appropriate parametric model.
@AbdullahN8 2 years ago
Thanks for the insight. Can you give practical real-life examples in R where nonparametric tests are routinely used in biostatistics, and when to use robust methods, loess, or random forests in those situations?
@QuantPsych 2 years ago
I'm not sure that loess/robust/RF are the alternative for the situations I'm talking about. I would probably do like a gamma regression model instead of a Mann-Whitney. But, I'll think about doing another video.
@toad8427 2 years ago
The loud music bit, the “oh snap” 😂😂
@MikkoHaavisto1 1 year ago
how had you never heard of ordinal variables? that is stuff for the first statistics course...
@olenapo4895 1 month ago
Your videos are my new before-bed ASMR
@QuantPsych 1 month ago
Ha! My loud voice?
@galenseilis5971 2 years ago
Thanks for elaborating on your perspective, Dustin. I'll be happy to respond in time. Hopefully not an entire year later though! ;-) I'll post something back here to ping you when I have posted a response.
@QuantPsych 2 years ago
Deal :)
@olgierd245 1 year ago
I wonder what the response would be
@galenseilis5971 1 year ago
@@olgierd245 I am working on a response when time allows.
@galenseilis5971 1 year ago
@@QuantPsych Hmm, well it took almost a year... Somehow the time slips away. I've put a response on my blog. I won't post it here because I think KZbin will automatically delete it anyway, but it should be easy to find. I'll also try to send the link to your Rowan University email.
@galenseilis5971 1 year ago
@@olgierd245 The response can be found on my blog.
@WeirdPatagonia 9 months ago
Can I get some insight? I am kind of desperate, and it seems that the Wilcoxon test is my only way out. I obtained a dataset from a pre-post intervention, with n=10 and no control group. The measurement was conducted using a scale ranging from 0 to 12 to assess the outcome of a physical test before and after the intervention. The objective is to determine if there is a difference due to the intervention. I conducted a Wilcoxon test for paired data, which yielded a significant result - a good start. However, no linear model met the assumptions (not surprising). I attempted a Huber regression, but it didn't yield any changes in the outcome. I also tried modeling the difference between post- and pre-intervention scores, and then dichotomizing them into 1 (improved) and 0 (not improved). However, it appears that I lost information, as the result turned out to be non-significant. Thus, it seems to me that the only analysis I can perform that adequately accounts for paired data and a small sample size, and doesn't rely on assumptions of normality, is the Wilcoxon test.
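[Editor's illustration] For readers unfamiliar with the paired Wilcoxon test mentioned above, here is a minimal sketch of an exact two-sided signed-rank test using only the Python standard library. The scores are hypothetical (they are not the commenter's data), the function name is made up, and the handling of zeros and ties (drop zeros, use midranks) is one common convention among several.

```python
from itertools import product

def wilcoxon_signed_rank_exact(pre, post):
    """Exact two-sided Wilcoxon signed-rank test for paired data.

    Zero differences are dropped and tied absolute differences get
    midranks. Returns (W_plus, p_value).
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    abs_sorted = sorted(abs(d) for d in diffs)

    def midrank(value):
        # Average of the 1-based positions this value occupies when sorted.
        first = abs_sorted.index(value) + 1
        count = abs_sorted.count(value)
        return first + (count - 1) / 2

    rank_list = [midrank(abs(d)) for d in diffs]
    w_plus = sum(r for r, d in zip(rank_list, diffs) if d > 0)
    expected = sum(rank_list) / 2
    observed_dev = abs(w_plus - expected)

    # Under the null, each difference is equally likely to be positive or
    # negative, so enumerate all 2^n sign patterns.
    extreme = 0
    total = 0
    for signs in product((0, 1), repeat=len(diffs)):
        w = sum(r for r, s in zip(rank_list, signs) if s)
        total += 1
        if abs(w - expected) >= observed_dev - 1e-12:
            extreme += 1
    return w_plus, extreme / total

# Hypothetical 0-12 scores for 10 participants (illustration only).
pre = [3, 5, 4, 6, 2, 7, 5, 4, 6, 3]
post = [6, 7, 4, 9, 5, 8, 9, 6, 7, 7]

w_plus, p_value = wilcoxon_signed_rank_exact(pre, post)
print(f"W+ = {w_plus}, exact two-sided p = {p_value:.4f}")
```

The test only ever sees the signs and ranks of the differences, not their sizes, which is the sense in which the channel author says rank tests are not modeling the data.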
@QuantPsych 9 months ago
Have you plotted your data? Don't use statistical tests to determine if you've violated normality. Look at the plot and see if the fitted line passes through the data. If it doesn't, then you can use generalized linear models instead of a Wilcoxon test.
@dimitrioskioroglou4316 6 months ago
I totally agree with you... ranks are not that useful. The way I think of it is that ranks result from an underlying latent process. We need to understand and properly model the process, not the ranks, which represent a snapshot. It is not the easiest thing to do. But better trying tricky stuff than chasing ghosts.
@pedropequeno7353 1 year ago
Thanks for putting my thoughts into words, maybe I am not going crazy
@christheatheist2832 2 months ago
I feel bad for laughing at "rebel without a correlation". I am suitably ashamed of myself.
@StatisticsSupreme 2 years ago
To me ranks are models - not transformations. With a transformation you can go back to the original data, even if you have lost the original data, because you have a transformation formula. With ranks, once you have lost the original data, you cannot go back. Same with other models.
@QuantPsych 2 years ago
I'm not sure what you mean. Ranks are models even though you can untransform them?
@StatisticsSupreme 2 years ago
@@QuantPsych Ranks are models. They are not transformations. Because you cannot untransform them.
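[Editor's illustration] The invertibility point in this exchange can be shown with a small sketch: a log transformation can be undone with `exp`, whereas two different datasets can share the same ranks. The helper `ranks` and the data are made up for illustration and assume no ties.

```python
import math

def ranks(values):
    """Rank values from 1 (smallest) to n; assumes no ties for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

# Two made-up datasets with very different spacing.
data_a = [1.0, 2.0, 3.0, 4.0]
data_b = [1.0, 2.0, 3.0, 400.0]

# A log transformation is one-to-one: exp() recovers the original values.
logged = [math.log(x) for x in data_b]
recovered = [math.exp(v) for v in logged]
print(all(math.isclose(r, x) for r, x in zip(recovered, data_b)))

# Ranking is many-to-one: both datasets collapse to the same ranks,
# so the ranks alone cannot tell us which dataset we started with.
print(ranks(data_a), ranks(data_b))
```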
@pipertripp 2 years ago
So are we mostly talking about using models for explanation or maybe inference vs prediction here? I'm guessing that you're more interested in trying to explain a phenomenon mathematically and so the nonparametric models like RF aren't super useful b/c they don't yield something that explains the phenomenon with a closed form expression like a GLM would? Sorry, I'm really new to statistics and this is over my head right now, but definitely interesting.
@QuantPsych 2 years ago
I suppose that's a fair assessment. Yes, if you're just doing prediction, maybe parametric models don't matter as much.
@dryinpan9860 2 years ago
You know what would really show Galen? Some forecasting methods in FLEXPLOT... I'm so sorry, I just want to see it.
@QuantPsych 2 years ago
Persistent one, aren't you :) You can file a feature request on github. It's been over 15 years since I've done any forecasting, but maybe it won't be too hard to modify flexplot to handle that.
@dryinpan9860 2 years ago
@@QuantPsych Oow sorry to post this in the wrong area. It doesn't HAVE to be in flexplot. It would be great to see you do a series on forecasting in R and working with time series data! Thank you again for all your teachings.
@ikitoki 9 months ago
You make me feel guilty for using rank-based, non-parametric tests in the past. But I did not know any better. This is what they taught me to use when the sample groups are so small that I can't test for the normality of the distribution. They also told me that in general, non-parametric tests are less powerful than parametric ones, so I thought it would be better to use a less powerful test and only report the most significant results. I actually thought I was being conservative in using rank-based, non-parametric tests.
@QuantPsych 9 months ago
I used to use those a lot too. I don't know that I'd go so far as saying they're bad, they're just not modeling the data. I prefer to model the data.
@JakeCo-pf6ty 1 year ago
I suppose this would be less about the model and more about the inferential procedure, but nonparametric methods like the bootstrap can be quite useful and, in some cases, not a pit stop but the end goal (or the best general test for a certain quantity). Think of mediation models and the indirect effect: the product of regression coefficients is typically not normal (except in very large samples), so the bootstrap serves as a good-to-great alternative that won't break down when methods like the Sobel test do. That isn't to say there isn't a parametric procedure for this (you could look at the regression coefficients jointly, or use PRODCLIN, developed by MacKinnon, for the product of the coefficients), but these methods would break down when assumptions are violated all the same. In other words, are theoretical sampling distributions always (or ideally) better than empirical sampling distributions that don't try to force a form or shape onto a particular problem? I would say no, but I can see your perspective.
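[Editor's illustration] The bootstrap for an indirect effect mentioned above might look roughly like the sketch below. It assumes a simple X -> M -> Y mediation model with simulated data, and all names (`slope`, `residuals`, `indirect_effect`, `bootstrap_indirect_ci`) are hypothetical helpers written for this example. The partial slope of M in Y ~ X + M is obtained by residualizing both M and Y on X, which gives the same coefficient as the two-predictor regression.

```python
import random
import statistics

def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals of y after regressing it on x (with intercept)."""
    b = slope(x, y)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]

def indirect_effect(x, m, y):
    """Product a*b: a from M ~ X, b = partial slope of M in Y ~ X + M."""
    a = slope(x, m)
    b = slope(residuals(x, m), residuals(x, y))
    return a * b

def bootstrap_indirect_ci(x, m, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the indirect effect, resampling cases."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Simulated data for illustration only (assumed effects: a = 0.5, b = 0.4).
sim = random.Random(1)
x = [sim.gauss(0, 1) for _ in range(100)]
m = [0.5 * xi + sim.gauss(0, 1) for xi in x]
y = [0.4 * mi + 0.1 * xi + sim.gauss(0, 1) for xi, mi in zip(x, m)]

low, high = bootstrap_indirect_ci(x, m, y)
print(f"indirect effect: {indirect_effect(x, m, y):.3f}, "
      f"95% CI: ({low:.3f}, {high:.3f})")
```

The interval comes from the empirical distribution of the resampled products, so no normal shape is imposed on a*b, which is the contrast with the Sobel test drawn in the comment.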