There are various standard deviation rules going around; in my opinion (and others'), they are all invalid for the same reason the 2 SD rule is invalid: how many outliers you identify in a distribution of data that contains no outliers will depend on sample size. Ideally, an outlier rule should be relatively independent of sample size.
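The sample-size dependence can be checked with a quick simulation. This is a minimal sketch, assuming clean standard-normal data with no true outliers; the function names are hypothetical. Because roughly 4.6% of a normal distribution lies beyond 2 SD, the 2 SD rule flags about that fraction at every n, so the *count* of "outliers" grows in proportion to sample size.

```python
import random
import statistics


def count_2sd_flags(data):
    """Count points lying more than 2 sample standard deviations from the sample mean."""
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    return sum(1 for x in data if abs(x - mean) > 2 * sd)


def simulate(sizes, seed=1):
    """Draw clean standard-normal samples (no true outliers) and count 2 SD flags."""
    rng = random.Random(seed)
    results = {}
    for n in sizes:
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        results[n] = count_2sd_flags(sample)
    return results


if __name__ == "__main__":
    for n, flagged in simulate([20, 200, 2000, 20000]).items():
        print(f"n={n:>6}: {flagged:>4} flagged ({flagged / n:.1%})")
```

The printed percentage stays near 4.6% while the absolute count rises with n, which is the behaviour the comment objects to.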
@RianRizvi · 8 years ago
While I agree that rejecting observations beyond two standard deviations does deform the sample distribution, I don't agree with the main point that this technique is *always* a mistake. The technique is valid for comparing datasets that share a common distribution curve. Consider Normal datasets A and B, each truncated by rejecting readings beyond 2 standard deviations. If the mean of truncated dataset A is greater than the mean of truncated dataset B, it follows that the mean of A should be greater than the mean of B. A prototypical use case is inference from readings that are occasionally wrong. For example, suppose you have thermometers in the field that, on occasion and at random, produce wildly high readings, and you want to find the site with the highest mean temperature. The truncated datasets will give more accurate results.
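The thermometer scenario can be sketched as a simulation. Everything here is an assumption for illustration: two hypothetical sites with true means of 20 and 21, readings with SD 1, a 5% chance that any reading is replaced by a wild value drawn uniformly from 50 to 80, and a single pass of 2 SD truncation.

```python
import random
import statistics


def truncate_2sd(data):
    """Keep only points within 2 sample standard deviations of the sample mean (one pass)."""
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    return [x for x in data if abs(x - mean) <= 2 * sd]


def site_readings(true_mean, n, wild_rate, rng):
    """Hypothetical thermometer: normal readings, occasionally replaced by a wild high value."""
    readings = []
    for _ in range(n):
        if rng.random() < wild_rate:
            readings.append(rng.uniform(50.0, 80.0))  # assumed form of a wild reading
        else:
            readings.append(rng.gauss(true_mean, 1.0))
    return readings


rng = random.Random(7)
sites = {"A": site_readings(20.0, 5000, 0.05, rng),
         "B": site_readings(21.0, 5000, 0.05, rng)}

for name, data in sites.items():
    raw = statistics.fmean(data)
    trimmed = statistics.fmean(truncate_2sd(data))
    print(f"site {name}: raw mean {raw:.2f}, 2 SD-truncated mean {trimmed:.2f}")
```

Under these assumptions the raw means are pulled up by roughly 2 degrees, while the truncated means land close to 20 and 21 and keep the sites in the right order. The result depends on the wild readings being rare and far from the bulk; with frequent or moderate contamination, a single 2 SD pass would not separate them this cleanly.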
@lucyk2634 · 1 year ago
Totally agree, but why did you cut the video in the middle of your sentence? I was looking forward to what you were going to say.
@Hautchen · 8 years ago
Your criticism of this approach assumes that the identified outliers will be removed from the dataset, but this isn't always why outliers need to be identified.
@nenadantic4362 · 6 years ago
In effect, this creates a platykurtic sample distribution. Seems quite silly.
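The platykurtic effect is easy to see numerically. This is a minimal sketch, assuming a large standard-normal sample and one pass of 2 SD truncation; the helper names are hypothetical. A normal distribution has excess kurtosis 0, while a normal truncated at ±2 SD has excess kurtosis of roughly −0.6, so trimming turns a mesokurtic sample into a clearly light-tailed one.

```python
import random
import statistics


def excess_kurtosis(data):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    mean = statistics.fmean(data)
    m2 = statistics.fmean([(x - mean) ** 2 for x in data])
    m4 = statistics.fmean([(x - mean) ** 4 for x in data])
    return m4 / m2 ** 2 - 3.0


def truncate_2sd(data):
    """Keep only points within 2 sample standard deviations of the sample mean (one pass)."""
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    return [x for x in data if abs(x - mean) <= 2 * sd]


rng = random.Random(3)
sample = [rng.gauss(0.0, 1.0) for _ in range(20000)]
print(f"before truncation: {excess_kurtosis(sample):+.2f}")
print(f"after truncation:  {excess_kurtosis(truncate_2sd(sample)):+.2f}")
```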