Removing Outliers From a Dataset

  384,603 views

Siobhan O'Toole

1 day ago

Comments: 47
@theg.ksinstinct9344 4 months ago
Great video! I did not know this easy way to get z-scores; I always computed them using the formula. A more appropriate way to deal with outliers, when you don't want to keep them and just want to remove them: get the z-score for your variable of interest, just as the lady did. Check the z-score and the variable together, sorting by the z-score in both ascending and descending order if you want, just as the lady did. Then head to the Data menu, choose Select Cases, select "If condition is satisfied", and write the formula in the formula box - ABS(z_price)
@JustKevStockholm 12 years ago
Boxplots are also very useful - from a visual standpoint - when checking for outliers :)
@WInspire 7 years ago
You should do a formal test for outliers, not a visual inspection of a boxplot.
@menzir 9 years ago
Do you have any references I can use to cite this method in my thesis?
@rsx22Z 5 years ago
How can I determine the critical value that delimits the outliers? For instance, your value was 3.29, but why >?
@iwillsearchtheuniverse5371 3 years ago
yess im still confused abt this
@RectalDesign 7 years ago
God bless you. Every piece of documentation and every guide I've read tells me "just delete the crazy values lol."
@myfictionalthoughts 4 years ago
Thanks for the advice, but how do I know which ones are the actual outliers? My data is not as simple as the data in this video.
@benotoole1990 5 years ago
trying to do this for university and we have the same surname, the universe works in weird ways
@martinbrestovansky7813 5 years ago
thanks Siobhan, really useful video, but what if I have outliers on both sides of the histogram? How can I set up the filter in Variable View Missing Column in this case?
@sharist9343 5 years ago
Hey, thanks for the amazing video. Would it somehow be possible to cite the techniques you used? I would really like to use this in my thesis. Kind regards :)
@Electrify85 9 years ago
Why 3.29? Wasn't 1.96 the key number?
@SiobhanPhD 9 years ago
Benjamin Smith 1.96 and 3.29 are both very important numbers in stats. 95% of "normal" data will be within 1.96 standard deviations of the mean. We use 1.96 for lots of things, like 95% confidence intervals, alphas of .05... For outliers, however, 1.96 would be far too stringent - we'd eliminate data points that really aren't outliers, points that really do belong in our data set. So to make sure we are only excluding data that really are extreme, we use 3.29. 99.9% of "normally distributed" data will be within 3.29 standard deviations of the mean. So if you have a data point that has a z value of, let's say, 3.6, it is an extremely unlikely data point. Granted, it isn't an impossible one - but it is highly unlikely and therefore considered an outlier.
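The coverage figures quoted in the reply above can be checked numerically. This is a minimal sketch using the standard-library `statistics.NormalDist`; the helper name `central_coverage` is made up for illustration and does not come from the video.

```python
from statistics import NormalDist


def central_coverage(z_cutoff: float) -> float:
    """Share of a standard normal distribution within +/- z_cutoff SDs of the mean."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return std_normal.cdf(z_cutoff) - std_normal.cdf(-z_cutoff)


# Compare the two cutoffs discussed in the thread.
for cutoff in (1.96, 3.29):
    print(f"|z| < {cutoff}: {central_coverage(cutoff):.4f}")
```

The two printed shares come out at roughly 95% and 99.9%, which only holds to the extent that the data really are close to normally distributed.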
@chanothebest 9 years ago
+Siobhan O'Toole Using 3.29 for my data, mine is still not normally distributed. So would it be right to use 1.96 to make it more normally distributed?
@angelperdomo5 8 years ago
+Jamie Frederick Howard Look at a z distribution table. These are standardized scores; 0 lies in the middle (the mean).
@mattadoritmo 8 years ago
Holy crap! Thank you so much for this! This just helped solve a problem I've been working on for a couple days
@Brickkzz 8 years ago
Is there an automated way of removing outliers from both the positive and negative ends? I have a data set of 120 000 cases and hundreds of outliers...
@Brickkzz 8 years ago
+Bob. Okay, I found a solution :) Use DATA -> SELECT CASES... Then in the Select window, choose "If condition is satisfied", click IF, then type the name of your Z variable < 3.29 and the name of your Z variable > -3.29, e.g. ZVALUE < 3.29 and ZVALUE > -3.29. This way SPSS will only use the central 99.9% of your cases for analyses :)
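For anyone working outside SPSS, the same two-sided filter can be sketched in plain Python. This is an illustration rather than the method from the video: it assumes the z-scores are computed with the sample standard deviation (n - 1), and the data, `Z_CUTOFF`, `z_scores` and `split_by_cutoff` are all made-up names.

```python
from statistics import mean, stdev

Z_CUTOFF = 3.29  # cutoff discussed in the thread


def z_scores(values):
    """Standardize values using the sample standard deviation (n - 1); an assumption."""
    m = mean(values)
    sd = stdev(values)
    return [(v - m) / sd for v in values]


def split_by_cutoff(values, cutoff=Z_CUTOFF):
    """Return (kept, flagged): kept has |z| < cutoff, flagged is everything else."""
    kept, flagged = [], []
    for value, z in zip(values, z_scores(values)):
        (kept if abs(z) < cutoff else flagged).append(value)
    return kept, flagged


# Made-up data: 30 ordinary scores plus one extreme value.
scores = [48, 49, 50, 51, 52] * 6 + [200]
kept, flagged = split_by_cutoff(scores)
print(len(kept), flagged)
```

Testing `abs(z) < cutoff` covers both tails in one condition, which is equivalent to the pair `ZVALUE < 3.29 and ZVALUE > -3.29`.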
@zivleong5151 9 years ago
If I delete it, do the previous tests, such as the normality test and the computed mean, need to be redone, or will they update automatically?
@ivantau9753 7 years ago
Well, I believe you first need to test the data for outliers, and once you have done that and removed the outliers, then you can test the data...
@PurpleRawr2 10 years ago
Great video. I don't understand the 'id' section though. Is this just the IV? What if I have two IVs? Thanks!
@Mysticflamee 10 years ago
ID is a subject number.
@SiobhanPhD 10 years ago
Gizem ateş Yes exactly, it's just a subject or respondent identification (ID) number. A lot of my students want to use the numbers that are listed in SPSS as ID numbers, but that doesn't work. Once you sort the data, those numbers no longer line up with the same cases/subjects and then you are lost. If you find a data entry error, you sometimes have to go through hundreds of files to figure out which one it is and correct the data entry error. And everyone, EVERYONE, I've ever worked with has made at least a couple of data entry errors. Your IVs will go in columns labeled with whatever names you want to give them, somewhere to the right of the ID number. Hope that helps.
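A small sketch of why the row number cannot stand in for an ID: sorting changes each case's row position, while a stored `id` travels with the case. The records and the field names `id` and `pc_mom` below are made up for illustration.

```python
# Made-up cases: each one carries its own ID alongside the measured value.
cases = [
    {"id": 101, "pc_mom": 3.2},
    {"id": 102, "pc_mom": 9.7},
    {"id": 103, "pc_mom": 2.8},
    {"id": 104, "pc_mom": 3.5},
]

# Row positions before sorting (1-based, like the row numbers in a data grid).
row_before = {case["id"]: row for row, case in enumerate(cases, start=1)}

# Sort by the variable of interest, largest first.
sorted_cases = sorted(cases, key=lambda case: case["pc_mom"], reverse=True)
row_after = {case["id"]: row for row, case in enumerate(sorted_cases, start=1)}

# The ID stays attached to the case even though its row number changes.
for case in sorted_cases:
    cid = case["id"]
    print(f"id={cid}: row {row_before[cid]} -> row {row_after[cid]}")
```

After the sort, the extreme value sits in the first row, and only the stored ID says which respondent's paper record to check.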
@ScroogeMD22 9 years ago
But if I exclude a case for a certain variable, shouldn't I exclude this case for all other variables too?
@Arch0s 7 years ago
I have been looking for an answer to exactly this for some time now. My thinking is that it depends on why you are removing the outlier; as you have probably read, you shouldn't exclude a value just because it lies too many standard deviations from the mean. You need to look at your data and consider whether the outlier is the product of an error in data recording. In that case, I suspect you could just remove that one data point, e.g. a height recorded as 182 m, which is impossible but reasonable if it was meant in cm: a recording error. In the social sciences, perhaps confounding or external variables affected that particular case's data. If so, I suspect the correct method would be to remove the case entirely, as you could argue that the external factors may have affected all of that case's data. I would very much welcome any other thoughts on this.
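The two options weighed in this comment, dropping one value versus dropping the whole case, can be sketched as follows. The records, field names and helper functions are hypothetical, and which option is appropriate remains the judgment call described above.

```python
# Made-up cases; height is in cm, and 18200 stands in for a recording error.
cases = [
    {"id": 1, "height_cm": 182.0, "score": 14},
    {"id": 2, "height_cm": 18200.0, "score": 15},
    {"id": 3, "height_cm": 175.0, "score": 13},
]
SUSPECT_ID = 2


def blank_single_value(rows, case_id, field):
    """Keep the case, but mark one value as missing (None)."""
    return [
        {**row, field: None} if row["id"] == case_id else dict(row)
        for row in rows
    ]


def drop_whole_case(rows, case_id):
    """Remove the case from every analysis."""
    return [dict(row) for row in rows if row["id"] != case_id]


cell_wise = blank_single_value(cases, SUSPECT_ID, "height_cm")
case_wise = drop_whole_case(cases, SUSPECT_ID)
print(cell_wise[1])
print(len(case_wise))
```

Blanking a single value keeps the case's other variables available, while dropping the case removes it from every analysis; both functions return copies, so the original records stay intact.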
@MrKKrid 5 years ago
How do I define the PC_mom column? Does it use the mean or something?
@KittyFIxX 11 years ago
i love the deleting technique :)
@SiobhanPhD 10 years ago
It's not my go-to move - but sometimes it's the easiest way. A quick fix when I just need to move on!
@vkunst85 9 years ago
Thank you for your video, do you perhaps know who I can cite for this method?
@novitaasastr 8 years ago
Hello, I want to ask: if I want to do the run test, do I have to use the new data or the old data that still has the outlier? *Sorry for my bad English
@ivantau9753 7 years ago
I believe that once you find the outlier, you just need to remove it from the existing data. Imagine someone conducting an experiment having to throw away all his data just because of the outliers; that would be ridiculous... If I understood your question correctly...
@whiteshadow59 11 years ago
thank you very nice explanation, been frustrated all day, thanks
@chunnjh94 10 years ago
Very useful video! Thanks for uploading!
@chiangtongseng777 10 years ago
what is PC_mom?
@SiobhanPhD 10 years ago
Truly, I don't recall exactly but it was something like Parental Compassion, which we calculated separately for Mom and Dad so we had a PC_Mom and a PC_Dad for that analysis and this video just shows the PC_Mom.
@jamietan3288 9 years ago
+Siobhan O'Toole Is PC_mom a mean or a sum? Is it an item or a variable?
@AbdulRazak-cz3rq 7 years ago
Yes, I want to know that too. +Siobhan O'Toole What is PC_mom? The mean of pc_mom1, pc_mom2, etc.?
@weihuang6207 2 years ago
Very helpful. Thank you so much!
@SiobhanPhD 12 years ago
Quite true.
@SiobhanPhD 12 years ago
But I'm a numbers girl myself!
@acru518 4 years ago
Thanks a lot
@ShahbazAhmedENG 3 years ago
Use Hoaglin's method, 1987.
@azapura 11 years ago
nicely explained but you assume too much of the average semi-educated student...
@cahpintermail 1 year ago
Hi, I wonder if you can give me some literature references for this deletion of outliers. Thank you in advance.