A Two Step Transformation to Normality in SPSS

139,594 views

Days ago

Comments: 271

@mikemmoon 9 years ago

Wow! I have been trying every transformation under the sun for several of my variables for 2 straight weeks with no luck. This is like magic. Now I just might finish my PhD dissertation by the end of the summer after all. Many thanks!

@abigailfulton9185 5 years ago

For anyone who doesn't know: you use the series mean and standard deviation, as he does in the video. IT WORKS AND HE SAVED MY LIFE!

@chukwuemekaemenekwe746 9 years ago

Gary, your tutorial just saved my day. I'd been struggling with different transformation techniques. Seeing yours just brightened my day. Thanks!!!

@oscaronam7862 9 years ago

Thanks a lot, Gary. I had been struggling to normalize my skewed data, but when I used the two steps you explain so clearly in your paper and video, my data became normal, as confirmed by the Kolmogorov-Smirnov and Shapiro-Wilk tests. Very helpful video!

@gftempleton 9 years ago

+Oscar Onam That's great, Oscar. Good luck.

@aristrolltle8580 2 years ago

Sometimes the KS and SW tests reject the normality hypothesis even though the skewness and kurtosis values are OK.

@ellieking4132 5 years ago

I was really struggling to work out how to make my data normally distributed so I could do my analysis, and this video has saved me. Thank you so much for taking the time to share this method with us and to answer our queries! I really appreciate it :)

@riderho1 5 years ago

Ellie King, where did he get the second and third values? (,?,?)

@ellieking4132 5 years ago

@riderho1 My understanding (and what I used) is that the second and third values are the mean and SD of the variable you are transforming.

@khalidrehman4202 7 years ago

Thank you so much... this video is very informative. My data were not normally distributed, but after watching this video I applied this procedure, and now my data are normal.

@arturogarcialomeli5745 4 years ago

You saved my day, thanks a lot. I think if you want to obtain the mean and the standard deviation, you need to process your data before applying this method. You are going to be cited in my thesis!!!!!

@asmaae1993 4 years ago

Hi Mr. Arturo, do you have any idea how we can transform the data back from this form when we want to report the results? Thank you

@learningwithms8293 2 years ago

Thanks a lot, Dr. Templeton. It is really helpful. I used the process you described and found it useful not only for me but also for my entire department.

@gftempleton 9 years ago

People often ask why their sample size is reduced by 1 when using this technique. This happens because of the first step: its results range from 1/n to 1. Step 2 only works on fractions strictly between 0 and 1, so it skips the 1 (which belongs to the largest value) and leaves that case missing. To fix this, replace that Step 1 result of 1 with 1-(1/n) and apply Step 2 to it. For example, if you start with a sample of 1,000, the Two-Step will likely result in a sample of 999. To recover the missing record, find it (it's the case with a "1" from the first step), replace the 1 with 1-(1/1000), or 1-.001, or .999, and apply Step 2. This won't change the results much, but it ensures every case is used. Of course, I'd put a small note in the paper about any extra transformation step needed.
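For readers working outside SPSS, here is a minimal Python sketch of the two steps with the 1-(1/n) substitution described above. The names `fractional_rank` and `two_step` are hypothetical, and the tie handling (tied values share their mean rank) is an assumption about how SPSS's fractional rank treats ties, not something stated in this thread.

```python
from statistics import NormalDist, fmean, stdev


def fractional_rank(values):
    """Step 1 sketch: rank / n, with tied values sharing their mean rank.

    Assumed (not confirmed here) to mirror SPSS's default handling of ties.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return [r / n for r in ranks]


def two_step(values, mean=None, sd=None):
    """Two-Step sketch: fractional rank, then the inverse normal.

    mean/sd default to the original series mean and SD (normalized original units).
    """
    n = len(values)
    mean = fmean(values) if mean is None else mean
    sd = stdev(values) if sd is None else sd
    probs = fractional_rank(values)
    # A unique largest value gets a Step 1 result of exactly 1, which the
    # inverse normal cannot accept, so use 1 - 1/n instead of dropping the case.
    probs = [1 - 1 / n if p >= 1 else p for p in probs]
    target = NormalDist(mean, sd)
    return [target.inv_cdf(p) for p in probs]


data = [3.0, 1.0, 250.0, 7.0, 2.0, 40.0]  # hypothetical right-skewed sample
print(two_step(data))
```

With no ties, the substitution gives the largest case the same Step 1 value, (n-1)/n, as the second-largest case, so those two cases end up with the same transformed value.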

@Marie-sh6zm 8 years ago

Hi, I would like to ask: how many times am I allowed to normalize the same data? Thanks in advance!

@gftempleton 7 years ago

In my opinion, there are no rules as long as you report exactly what you have done. Let the reviewers or advisors help you.

@amadeo3844 7 years ago

Is there an automated way to do this replacement? I have over one hundred columns of ranked data, and each one has a value of "1.00". Do I need to go in manually, find the 1.00 in each column, and change it to .999?

@aaljumaili 4 years ago

This video has great value. Thank you so much, Gary, for saving my day

@gftempleton 9 years ago

When using this technique in research, it may help in the peer review process to cite the published article referenced at the end of the video: Templeton, G.F. 2011. "A Two-Step Approach for Transforming Continuous Variables to Normal: Implications and Recommendations for IS Research," Communications of the AIS, Vol. 28, Article 4.

@charlottekik6475 8 years ago

Thank you so much!

@gftempleton 8 years ago

You're welcome, Charlotte!

@melissacagle3707 5 years ago

Hi Gary! Should I transform my variables like this before conducting a Principal Component Analysis to form an index? Thank you for your video!

@veronicawong9023 4 years ago

Thank you so much... it's 2021 now and your video saved my life!

@Dailyvidsboom 3 years ago

@veronicawong9023 Where are you from?

@jjabb 4 years ago

Thank you. This saved my life. I had been struggling with numerous transformations, but none of them worked. This one worked for me. I also used the 1-(1/n). You get a citation from me.

@shahmahmood3908 9 years ago

My question is: where did you get the values for the second question mark (?) and the third question mark (?)? You didn't show where the series mean and standard deviation that you copied and pasted from Notepad came from. I would really appreciate it if you could help me with this. IDF.NORMAL(RDistanc,?,?)

@norwegianresearchtraininginsti 4 years ago

Did you find the answer to your question? Please share it with me. He mentioned he copied them from Notepad; that is what I heard.

@juliabachmann639 4 years ago

@norwegianresearchtraininginsti You can find it in his paper, which says: To accomplish Step 2 in Excel, use the NORMINV() function, having the following syntax: NORMINV(Step 1 result, imposed mean, imposed standard deviation), where Step 1 result = the result of Step 1, which must be in probability form; Imposed mean = mean of the variable resulting from the transformation (!!); Imposed standard deviation = standard deviation of the resulting variable (!!)

@norwegianresearchtraininginsti 4 years ago

@juliabachmann639 I will check his paper; I am interested in that method

@doancongthanh93 4 years ago

@juliabachmann639 I think it's the mean of the variable that has been transformed. You can see these values in the histogram chart at 2:33

@juliabachmann639 4 years ago

@doancongthanh93 Ah, okay, I see. Thank you, that's very helpful!!

@alexjimenez9940 4 years ago

What a great solution! Thank you very much, Gary, for your help!

@marhan2757 9 years ago

I guess the interpretation does not change much because of this transformation, i.e., standard deviations and means stay the same, while kurtosis and skewness improve significantly. This technique also solves the problem of outliers (that are not actually outliers). Thanks a lot for such a great solution!

@Afra.Rezagholizadeh 4 years ago

I did all the steps several times and used my own data series' mean and standard deviation, his numbers, and 0, 1, but every time this error appears:

At least one of the arguments to the IDF.NORMAL function is out of range. The first argument (probability) must be positive and less than one. The third argument must be positive. The result has been set to the system-missing value.

What should I do? When I used 0, 1 and my own numbers, a new series of normalized data appeared despite this error, but I don't know whether it is reliable...

@gftempleton 4 years ago

Did you use three arguments? The syntax for the second step requires 1) the result of step 1 (this is in fractional or probability form), 2) the mean, and 3) the standard deviation.

@Afra.Rezagholizadeh 4 years ago

@gftempleton Thanks for your answer. Yes, I did. This error came up for both data series, but new columns were also added to my SPSS worksheet! I'm going to use them, but given these errors I don't know how reliable they are...

@FooodConfusion 3 years ago

Wow, so easy! Loved your way of explaining: simple and to the point

@jaimeb.384 7 years ago

What mean and standard deviation are you using? It is not clear in the video.

@kamalpreetrakhra8071 9 years ago

Hello Dr. Templeton, I am a PhD student and have found that this technique greatly improves the skewness and kurtosis values. However, the data are still not normally distributed. I have also tried log10 and natural log (ln) transformations. Is there anything else I can use? I don't have the option of dichotomizing the data. Could I please have a copy of your article for further details?

@tsedesiree 2 years ago

Hi Kamalpreet, I have the same problem. I applied Dr. Templeton's technique, but my data are still skewed. Did you figure out how to transform the data to a normal distribution? I want to perform a three-way ANOVA, so I can't just use the Kruskal-Wallis test. I'd appreciate it if you could get back to me, thanks :)

@duckhunterforex7577 2 years ago

@tsedesiree Same here. I hope you have already found the answer and will share it with us all; looking forward to it. Thank you in advance

@10VGomez 2 years ago

Hi, I'm also interested in normalizing a variable. I have tried ln(x), log10(x), 1/x, sqrt(x), and this method, but nothing works. I have heard about the Johnson transformation method. I haven't tried it yet, but I've read that it works almost always, since it finds an optimal function to normalize your data. I'll try it and let you know. If somebody knows how to use this method in SPSS, please share the info =)

@smrutimokal7452 4 years ago

Thanks, Gary, for the wonderful video and the article. I always have trouble normalizing data, since transformations like the log usually don't work... but this is great... simply wonderful. Thank you again

@gftempleton 4 years ago

I'm glad it helped, Smruti.

@gftempleton 4 years ago

Awesome to hear that it worked for you, Smruti. Good luck with your research.

@fatimabezbiq4212 2 years ago

Do we apply the transformation only to the dependent variable, or to all the variables in our model?

@gftempleton 1 year ago

There is no rule when you are trying to satisfy the assumptions of the test. You should just report all procedures.

@fatimabezbiq4212 1 year ago

@gftempleton Thank you

@monicaalas4421 5 years ago

Once I have normality, how can I run a regression with the original data (taking the normalized data into account) so that I can use it in my predictive model?

@riderho1 5 years ago

Monica Alas Hi Monica, would you mind sharing how you obtained the predictive model and the criteria you took into consideration, like the correlation matrix, etc.?

@fatosakbulut3171 5 years ago

Hi Mr. Templeton. I have 8 groups of data to analyze. Some of them are normal and some are not. Should I apply your method to all of the groups in order to compare them in a one-way ANOVA? Please, please help...

@abigailfulton9185 5 years ago

Where do you get the series mean and the standard deviation from? Can anyone please help!

@gftempleton 5 years ago

Calculate the mean (average) and standard deviation from the original data. Use those in the second step if you want to approximate the original units.

@oliviapenaramirez4379 4 years ago

Thanks, Gary. But every time I run the transformation, this error appears:

At least one of the arguments to the IDF.NORMAL function is out of range. The first argument (probability) must be positive and less than one. The third argument must be positive. The result has been set to the system-missing value.

What should I do? Where do I copy the mean and the standard deviation from? Thanks in advance

@gftempleton 4 years ago

Step 2 will not work on 0s or 1s. If the problem is a 0, convert it to 1/n as an estimate. If the problem is a 1, convert it to 1-(1/n) as an estimate.
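The SPSS warning quoted above lists two conditions: the probability must be strictly between 0 and 1, and the standard deviation must be positive. The Python sketch below mimics that check (returning `None` where SPSS would set system-missing) and applies the 1/n and 1-(1/n) substitutions from the reply. Both helper names are hypothetical. SPSS's fractional rank itself starts at 1/n, so a 0 would usually come from a different ranking formula.

```python
from statistics import NormalDist


def checked_inv_cdf(p, mean, sd):
    """Inverse normal of p, or None (standing in for system-missing) when the
    argument conditions quoted in the SPSS warning are not met."""
    if not (0 < p < 1) or sd <= 0:
        return None
    return NormalDist(mean, sd).inv_cdf(p)


def clamp_step1(probs):
    """Replace Step 1 results of 0 with 1/n and of 1 with 1 - 1/n."""
    n = len(probs)
    fixed = []
    for p in probs:
        if p <= 0:
            fixed.append(1 / n)
        elif p >= 1:
            fixed.append(1 - 1 / n)
        else:
            fixed.append(p)
    return fixed


# Hypothetical Step 1 output containing both problem values
probs = [0.0, 0.25, 0.5, 0.75, 1.0]
print([checked_inv_cdf(p, 0, 1) for p in probs])               # None at both ends
print([checked_inv_cdf(p, 0, 1) for p in clamp_step1(probs)])  # no None
```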

@ker329 1 year ago

Hi Gary. I followed all the steps, but I got warning 4940: at least one of the arguments to the IDF.NORMAL function is out of range! Do you know why?

@fatihaelagri7753 3 years ago

Hello. My question: I have a series whose distribution is not normal. I tried the logarithmic transformation in EViews, but the Jarque-Bera p-value is still below 0.05. So what transformation should I use in EViews?

@gftempleton 3 years ago

EViews has each step. The first is a fractional rank (the rank expressed as a proportion) and the second is the normal inverse function. The "Normal (Gaussian)" entry appears to show it here: www.eviews.com/help/helpintro.html#page/content/mathapp-Statistical_Distribution_Functions.html

@AmrArafat 4 years ago

Can you perform this method in Stata, please? Thanks

@jayaprakashsalian1804 6 years ago

How do we get the values back from the transformed data? I.e., after I perform the transformation, I run a regression using the normalized values; after getting the results, I need to know how to recover the actual data from the transformed data. For a logarithmic transform we use the base to get the value back; how do we do it here?

@aishasebunya2675 4 years ago

Gary, thank you so much. This is awesome. All other methods failed for my work. I really appreciate this and will of course cite you :)

@gftempleton 4 years ago

I'm glad it worked for you, Aisha. Good luck with your research.

@alice-nckucsielee8265 4 years ago

@gftempleton Thank you so much, but a lot of people are asking about the mystery of the mean and SD XD

@gftempleton 4 years ago

@alice-nckucsielee8265 I'm not sure I understand your question. Units are interpreted as "normalized x." I hope that helps.

@alice-nckucsielee8265 4 years ago

@gftempleton Hi Gary, thanks for replying. I meant the mean and standard deviation we have to put into the formula at 1:38. What should we put there: the original mean and SD, or the mean and SD after the step 1 transformation? :)

@tarignassr6690 6 years ago

Hi, you didn't show where the series mean and standard deviation that you copied and pasted from Notepad came from.

@tonaliayengia6710 1 month ago

Can anyone please tell me where the series mean and standard deviation in the last step came from?

@norwegianresearchtraininginsti 4 years ago

My question is the same as Shah's from 4 years ago: where did you get the values you entered for the series mean and standard deviation?

@franck-paulinpehnmayo7148 2 years ago

From the original data... mean and SD

@researchory 8 years ago

Thanks, Gary. How do I interpret the coefficients after converting the dependent variable using the IDF.NORMAL function? For example, if the independent variable increases by one unit, how does that affect the dependent variable? Thanks,

@gftempleton 8 years ago

Assuming you transform using the series mean and standard deviation, interpret the results exactly as you would the original units. I would note that you normalized the original units. Alternatively, you can transform using mean=0 and sd=1 and interpret the results as standardized normal original units.

@mouradelhanafii8198 4 years ago

If we need to describe this procedure in the data analysis or results, what exactly should we mention along with the reference?

@gftempleton 4 years ago

If you are asking about units, just say the results are in normalized units. Of course, you would explain in the methods that you used the Two-Step. Models using original and transformed (e.g., natural log or Two-Step) variables are separate models (i.e., they have different error terms) and should be interpreted differently (this is not obvious to some, but models are commonly treated distinctly). Regarding the method, step 1 is simply a fractional rank, and step 2 is the inverse normal function applied to the results of step 1.
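A question that comes up elsewhere in the thread is why Step 2 is needed at all. The Python sketch below (hypothetical helper names and data, assuming no ties) suggests an answer: Step 1 on its own turns any tie-free sample into the evenly spaced grid 1/n, 2/n, ..., 1, which is a flat (uniform) distribution rather than a bell curve. Step 2 maps those evenly spaced probabilities onto normal quantiles, which spread out toward the tails.

```python
from statistics import NormalDist


def step1_fractional_rank(values):
    # Hypothetical helper; assumes no ties, so each rank is the 1-based sorted position.
    n = len(values)
    position = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [position[v] / n for v in values]


def step2_inverse_normal(probs, mean=0.0, sd=1.0):
    n = len(probs)
    dist = NormalDist(mean, sd)
    # A Step 1 result of 1 is replaced by 1 - 1/n, as suggested in the thread.
    return [dist.inv_cdf(min(p, 1 - 1 / n)) for p in probs]


skewed = [1.0, 1.5, 2.0, 2.2, 3.0, 4.0, 6.0, 9.0, 15.0, 40.0]  # hypothetical, sorted
p = step1_fractional_rank(skewed)
z = step2_inverse_normal(p)
for x, pi, zi in zip(skewed, p, z):
    print(f"{x:6.1f}  step1={pi:.1f}  step2={zi:+.3f}")
```

Note that the Step 1 column is the same evenly spaced grid no matter how skewed the input is; only Step 2 gives it a normal shape.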

@mouradelhanafii8198 4 years ago

Gary Templeton THANK YOU SO MUCH, indeed.

@alice-nckucsielee8265 4 years ago

I found that when my variable is transformed through the fractional rank, the cases that become 1 are blank after the IDF.NORMAL transformation. Has anyone encountered the same situation?

@alice-nckucsielee8265 4 years ago

I found that those are the top (largest) values in their group.

@gftempleton 4 years ago

Any Step 1 result of "1" becomes missing after Step 2, so convert that 1 to 1-1/n. Then apply Step 2. Report this in your methods.

@alice-nckucsielee8265 4 years ago

@gftempleton I think I got it!!!! Thanks for your kindness in replying so fast; I really needed help urgently. You are awesome!!!!

@linduchyable 8 years ago

Hello, I have a problem I need help with. Is removing outliers from a variable more than once considered manipulating or changing the data? I have a variable, loans for public: its mean is .17093, SD .955838, skewness 7.571, kurtosis 61.436, and most of the cases for this loan are outliers. After several rounds of ranking and replacing the missing values with the mean, I reach this output: mean .2970, SD .22582, skewness 2.301, kurtosis 3.885, and it ends up positively skewed. I don't know what to do: should I keep it this way, take the very first one, or continue, knowing that the 5th, 10th, 25th, 50th, and 75th percentiles end up with the same number, .2072? And I still have to do the regression. Please help :(

@陳姿蓉-r7s 1 year ago

Hello, I followed the steps above, and one of the fractional rank values was 1; it became missing data (no value shown) after the transformation. I don't know how to solve this problem 😅 Looking forward to your reply!

@陳姿蓉-r7s 1 year ago

I found the answer in a previous comment! Thanks!

@saro4761 8 years ago

Thanks, Gary, for this absolutely great video.

@jayaprakashksalian 6 years ago

Hi, I just have one doubt: if we want to convert back to absolute values, how can we do that? For example, I have a regression model, I converted the dependent variable, and now I want to see what the absolute value of y will be.

@gftempleton 6 years ago

It depends on the units you use. There are two basic uses of the Two-Step: 1) convert to standardized units (use mean=0 and sd=1 in the second step) or 2) convert to normalized original units (use the original series mean and sd). So the interpretation depends on usage. If you use the second option, there is no reason to convert back, as you are already in the original units (just normalized).
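The two uses can be compared directly. In the Python sketch below (hypothetical data, no ties assumed, with a Step 1 result of 1 replaced by 1-(1/n)), Step 2 is run once with mean=0 and sd=1 and once with the original series mean and SD. The two results differ only by the linear map x_norm = m + s * z, which is why the choice of units changes the scale of the output but not its shape.

```python
from statistics import NormalDist, fmean, stdev

original = [3.1, 0.4, 12.0, 1.7, 5.5, 0.9, 2.2, 30.0]  # hypothetical skewed variable, no ties
n = len(original)
position = {v: i + 1 for i, v in enumerate(sorted(original))}
# Step 1 (fractional rank), with a result of 1 replaced by 1 - 1/n
probs = [min(position[v] / n, 1 - 1 / n) for v in original]

# Use 1: standardized units (mean=0, sd=1 in the second step)
z = [NormalDist(0, 1).inv_cdf(p) for p in probs]

# Use 2: normalized original units (original series mean and SD in the second step)
m, s = fmean(original), stdev(original)
x_norm = [NormalDist(m, s).inv_cdf(p) for p in probs]

# The two versions differ only by a linear rescaling
for a, b in zip(x_norm, z):
    assert abs(a - (m + s * b)) < 1e-9
print(f"mean={m:.3f}  sd={s:.3f}")
```

Because the map is linear, an OLS slope fitted with the second version as the dependent variable is s times the slope fitted with the first, while t statistics and p-values are unchanged.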

@jayaprakashksalian 6 years ago

Gary Templeton thanks a lot

@jayaprakashksalian 6 years ago

Just one more question to be sure: after applying this method, the regression equation remains the same, i.e., y = b0 + b1*x1 + b2*x2, and it doesn't change the way it does with a log transformation?

@gftempleton 6 years ago

Reverting back isn't necessary if you transform using the original series mean and standard deviation. You are already in the original units. Also, remember that using the exponential function to revert back to original units from logged units is problematic when some original values are negative (or zero). In that case, the natural log would produce missing values. To avoid this, researchers shift the values so all of them are positive, then apply the natural log transformation. This means reverting back using the exponential is useless unless the preconditioning is reversed appropriately. The natural log has many flaws and is inferior to the Two-Step in achieving normality and achieving significant results. See: Templeton, G.F. and Burney, L. 2017. "Using a Two-Step Transformation to Address Non-Normality from a Business Value of Information Technology Perspective," Journal of Information Systems, Vol. 31, No. 2, pp. 149-164.
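The point about reversing the preconditioning can be illustrated with a small Python sketch. The data and the particular shift (1 minus the minimum, so the smallest value maps to log(1) = 0) are hypothetical choices; the sketch only shows that `exp()` alone lands in the shifted units, while undoing the shift as well recovers the original values.

```python
import math

values = [-4.0, -1.0, 0.0, 2.5, 10.0]  # hypothetical data with negatives and a zero

# math.log() raises ValueError for values <= 0 (in SPSS these cases become
# missing values), so the series is shifted first. This shift is one common,
# hypothetical choice: it maps the minimum to log(1) = 0.
shift = 1 - min(values)
logged = [math.log(v + shift) for v in values]

naive = [math.exp(y) for y in logged]              # ignores the shift: wrong units
recovered = [math.exp(y) - shift for y in logged]  # undo the log, then the shift

for v, a, b in zip(values, naive, recovered):
    print(f"original={v:+6.2f}  exp only={a:6.2f}  exp minus shift={b:+6.2f}")
```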

@jayaprakashksalian 6 years ago

I am really thankful for your response. Thanks a lot

@fizzaabidi.3094 1 year ago

Everything is OK, but after using this method I get outliers. What should I do?

@kanya1998 9 years ago

Much appreciated, Mr. Gary, it works perfectly well!

@gftempleton 9 years ago

+mahirwe anthony Great, and good luck!

@amalhussein9960 4 years ago

Thanks, Gary Templeton, for this informative video. After doing the two steps, how can we interpret the output of the regression analysis?

@gftempleton 4 years ago

Not original units, but normalized units. Example: if you transform assets to normal and put it in an equation, it is interpreted as normalized assets. It's the same as with any transformation.

@farhaniqbal3421 4 years ago

Thanks, Gary. One question is about the Shapiro test. Once I transformed my variables, all of them improved in terms of skewness and kurtosis. However, the Shapiro-Wilk test still shows a non-normal distribution (p

@gftempleton 4 years ago

Feel free to try other transformations (e.g., natural log, power transformations, truncating, winsorizing). However, that is time-consuming. If you can find a statistical package that uses Box-Cox, which tests many different power options, that may be a good use of time. However, reviewers of your work could also accept that you attempted a normality transformation that improved the situation. In the worst case, you'll have to use non-parametric procedures (which, coincidentally, use transformations, usually ranking).
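To show what "tests many different power options" means, here is a minimal plain-Python sketch of the Box-Cox idea: transform with each λ on a grid and keep the λ with the highest profile log-likelihood. The grid, the helper names, and the sample are assumptions for illustration; packages such as SciPy (`scipy.stats.boxcox`) estimate λ properly. Box-Cox requires strictly positive values.

```python
import math
from statistics import NormalDist, pvariance


def boxcox(x, lam):
    """Box-Cox power transform: (x**lam - 1) / lam, or log(x) when lam == 0."""
    if any(v <= 0 for v in x):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in x]
    return [(v ** lam - 1) / lam for v in x]


def profile_loglik(x, lam):
    """Normal log-likelihood of the transformed data, up to an additive constant."""
    n = len(x)
    y = boxcox(x, lam)
    return -n / 2 * math.log(pvariance(y)) + (lam - 1) * sum(math.log(v) for v in x)


def best_lambda(x, grid=None):
    """Try each power option on the grid and keep the most likely one."""
    if grid is None:
        grid = [i / 10 for i in range(-20, 21)]  # -2.0 .. 2.0 in steps of 0.1
    return max(grid, key=lambda lam: profile_loglik(x, lam))


# Hypothetical log-normal-shaped sample: exp() of evenly spaced normal quantiles
sample = [math.exp(NormalDist().inv_cdf((i - 0.5) / 50)) for i in range(1, 51)]
lam = best_lambda(sample)
print(lam)
```

On this sample the search settles on λ = 0, i.e. the natural log, which is what you would expect for data that are exactly log-normal in shape.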

@madiharazzam1098 7 years ago

This is not for Likert scales. How do I transform Likert-scale data? Please help

@elijahd.spragueph.d8905 1 year ago

Can these steps be used after converting ordinal questions to scale variables in SPSS?

@dashama 9 years ago

Loved your video! Blessings and Love, Dashama

@afaquehussain7678 4 years ago

Q1. I have 3 dependent variables. Two of them are within the normal range of skewness, i.e., +1 to -1, and have kurtosis within +3 to -3, but the third dependent variable is not within the normal range of skewness or kurtosis. I want to transform that variable with a square root transform to run parametric tests. So the question is: can I transform only that one variable and run parametric tests on the variables, or should I transform all three variables before doing the tests? Should I transform all three variables together even though two of them are already normally distributed? Will it create problems to transform only the one non-normal variable? Q2. Can I infer and interpret normality from skewness and kurtosis alone, rather than running the Shapiro-Wilk test?

@MrFantastic161 8 years ago

Thank you so much for this! I'm currently doing my dissertation and the non-normal data kind of shot me in the foot for the proposed analytical methods. Much appreciated!

@gershomhabile7215 2 years ago

Very useful information, but I'm getting lost where you copy the mean and standard deviation from, so I'm stuck. Kindly explain where to copy the mean and the standard deviation from. You only mentioned that you copy them from your Notepad, but what about me? Where do I copy them from? I'm stuck; someone please help ASAP.

@franck-paulinpehnmayo7148 2 years ago

It's from the original data

@l.briant3537 6 years ago

Great video, Gary, thank you. Like others here, I have some variables giving an "out of range" error:

At least one of the arguments to the IDF.NORMAL function is out of range. The first argument (probability) must be positive and less than one. The third argument must be positive. The result has been set to the system-missing value.

Why is this the case?

@sepiahell1417 3 years ago

Same :( Can anyone help, please?

@eda1976bdy 9 years ago

Yes, it is a great explanation, but I have tried it on my own data and was unable to get normality. I've used log10 and sqrt. The results are still the same: a bit of change, but no change in normality. What should I do? Please advise. Thank you

@oumelkhirmoulay1416 2 years ago

Thank you very much for this video. Please, I have a question about the nature of this transformation. I want to write a sentence explaining the transformation method, like "the data were arcsine transformed." If I use this method, RV.Normal, what should I say?

@gftempleton 2 years ago

Just say it was transformed to normal using a two-step procedure described in Templeton (2011). The full reference is at the end of the video.

@ibrahimsaid28 6 years ago

Thank you for posting this. Which method is better for normalizing data? And what if all methods (log, ln, sqrt, trunc) fail to normalize my data?

@theasset7472 1 month ago

What type of transformation is it? What is it called?

@gftempleton 1 month ago

I call it "the two-step"

@mostafajerari7560 2 years ago

Thank you for your effort. I would like to know how to achieve normality for several variables at once (not one by one). Thanks again.

@accazen5674 1 year ago

How do I report this test in APA style?

@gftempleton 1 year ago

Cite this article: Templeton, G. F. (2011). A Two-Step Approach for Transforming Continuous Variables to Normal: Implications and Recommendations for IS Research. Communications of the Association for Information Systems, 28, pp-pp. doi.org/10.17705/1CAIS.02804

@spz145 9 years ago

Hi, thanks for sharing. I tried the method, and it works to normalize the dataset. However, why is the sample size reduced after the procedure? For example, why was the sample size reduced from 6843 to 6842 above? Would that affect the conclusion?

@alauddinmohammad1517 7 years ago

Please give us a reasonable explanation of why the sample decreases after the two-step process. Why does a missing value appear? How should we explain this problem in a research paper? Thank you in advance.

@porscheboddicker1443 9 years ago

Why do we need two steps? Can't I just use the fractional rank? For example, my BMI variable was skewed and we wanted to run a GEE with it. Can't I just use the fractional rank as my new BMI?

@chetanasanghavi1576 4 years ago

How do I write an equation using the IDF.NORMAL function? For log I can use: Returns = a + b0 log(Beta) + b1 log(Leverage). How do I write the equation using this function?

@gftempleton 4 years ago

I personally use "TS", as in... TS(Beta) TS(Leverage)

@chetanasanghavi1576 4 years ago

@gftempleton OK, thank you

@gftempleton 4 years ago

@chetanasanghavi1576 You can also look at articles that were published using the procedure: scholar.google.com/scholar?oi=bibs&hl=en&cites=6281913823923514896

@franciscosanchez-narvaez9474 6 years ago

Thank you, Gary, your tutorial is very clear and helpful

@gftempleton 6 years ago

Thanks, Francisco!

@schummanr 8 years ago

Thanks for the video and the reference, Prof. Templeton. When computing the fractional rank of some of my variables, I end up with a value of 1 (the highest value of that variable), which then creates an "out of range" error in the IDF.NORMAL function, as the first argument must be greater than 0 and less than 1. This does not happen with all variables, just some. Any hints as to why this happens and how to address it in this transformation to normality would be appreciated. Thanks

@amadeo3844 7 years ago

I too would like to know how to correct this problem, as it results in missing data that I would like to avoid.

@l.briant3537 6 years ago

Hi Fidel Vila, I think I've worked it out. I think there must be some rounding errors, which means that the probability (first argument) ends up being interpreted as out of range (it has to be between 0 and 1). I'm not sure how this happens (perhaps the mean and SD need to be calculated to more significant figures, but I've tried this and it doesn't remove the error), but I have worked out a fix that is a bit of a "fudge". Say you have a variable X to be normalised, with mean MEAN and standard deviation SD, and suppose you have computed the fractional rank and made a variable RX. You then do the following:

COMPUTE X_norm=IDF.NORMAL(RX/1.001,MEAN,SD).
EXECUTE.

Dividing RX by 1.001 ensures that the variable is kept within the allowed range. (Although I repeat: I am not sure why it is interpreted as out of range. As far as I can see, my variables all fall between 0 and 1, so it must be due to rounding errors in the mean and SD calculations.) Hope this helps!
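An alternative explanation, consistent with Gary's replies elsewhere in this thread, is that the out-of-range value is not a rounding error in MEAN or SD but the largest case's fractional rank being exactly 1. The Python sketch below (hypothetical data, no ties, with MEAN=0 and SD=1 standing in for the real values) compares the RX/1.001 fudge with the 1-(1/n) replacement.

```python
from statistics import NormalDist

values = [4.0, 9.0, 1.0, 7.0, 2.0]  # hypothetical variable X, no ties
n = len(values)
position = {v: i + 1 for i, v in enumerate(sorted(values))}
rx = [position[v] / n for v in values]  # fractional rank RX
print(max(rx))  # the largest case gets exactly 1.0, which IDF.NORMAL rejects

dist = NormalDist(0, 1)  # hypothetical MEAN = 0, SD = 1

# The RX/1.001 fudge: every case shrinks slightly; the top case lands at 1/1.001
fudge = [dist.inv_cdf(p / 1.001) for p in rx]

# The 1 - 1/n replacement: only the top case changes
fixed = [dist.inv_cdf(1 - 1 / n if p == 1 else p) for p in rx]

for x, a, b in zip(values, fudge, fixed):
    print(f"{x:4.1f}  RX/1.001 -> {a:+.3f}   1-1/n -> {b:+.3f}")
```

Both avoid the missing case. With the fudge, the top case always lands at about z = 3.09 whatever n is, while the other cases move only slightly; with 1-(1/n), the top case ties with the second-largest.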

@ellieking4132 5 years ago

@l.briant3537 Thank you SO SO much for this! I was genuinely despairing about the method not working, and your solution worked perfectly!!!

@annisahermawanputri6959 3 years ago

Hello, I have a question. I used this method to fix my normality problem and it works, but I still struggle with heteroscedasticity and autocorrelation. Can I apply another transformation after doing this two-step transformation? Thank you

@gftempleton 3 years ago

Annisa, if your original variables were extremely non-normally distributed, you likely had those problems before the transformation. I would think the Two-Step helped those conditions. There are infinite transformations you can take, but the first one I'd try is the Two-Step, because it attempts to transform your variable to perfect normality, which is usually a good thing for dealing with outliers. In short, you can do whatever you want as long as you report it to your advisor, teacher, or reviewers.

@annisahermawanputri6959 3 years ago

@gftempleton Yes, I agree that this two-step transformation is amazing; it does make my variables normally distributed, thanks to you :) I don't know what is happening, since I am not a statistician, but my variables still don't pass the Durbin-Watson and Glejser tests. But yes, I will ask my lecturer for a solution. Thank you again!

@gftempleton 3 years ago

@annisahermawanputri6959 Many studies are published without technically achieving normality based on diagnostic tests. Many interpretations of parametric procedures (i.e., those assuming normality) actually work with what is called "asymptotic normality," which is not technical or exact normality, but near normality. So you only need asymptotic normality, not exact normality, to do parametric tests (like Pearson's r, factor analysis, and linear regression).

@sasali6727 7 years ago

Gary, I ran the procedure several times in both SPSS and Excel using the same data set. Apparently, the outputs are inconsistent, and I'm not sure what might cause the difference. I double-checked the formula as described in your paper. Here are the Excel formulas:

To get the percent rank:
=IF(B4="","",IF(PERCENTRANK(B$2:B$50,B4)=1,0.9999,IF(PERCENTRANK(B$2:B$50,B4)=0,0.0001,PERCENTRANK(B$2:B$50,B4))))

To get the inverse of the cumulative normal distribution:
=IF(B115="","",NORMINV(B115,0,1))

Running the data set with outliers replaced by the mean, versus running it on the original data, produces some significant changes. So replacing outliers with means doesn't look like a reliable method to apply. Now I am thinking of winsorizing my original data. Do you have any recommendations on it, so as not to miss a single outlier? My data are both hugely negatively skewed and have outliers, which makes it hard to figure out the best approach. I am thinking about robust statistics as well, given my data. Any thoughts on that? Huge thanks.
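One possible source of the SPSS/Excel mismatch is that the two Step 1 formulas are not the same. The Python sketch below compares them under stated assumptions: SPSS-style fractional rank as rank / n (as described in this thread), and Excel's PERCENTRANK as the count of values below x divided by n - 1 (an assumption about Excel's behavior; its default truncation to three digits is ignored here). The Excel route in the comment then maps 0 and 1 to 0.0001 and 0.9999, which pushes the extremes far into the tails.

```python
from statistics import NormalDist


def spss_fraction(values):
    # SPSS-style fractional rank: rank / n (assumes no ties)
    n = len(values)
    pos = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [pos[v] / n for v in values]


def excel_percentrank(values):
    # Assumption: PERCENTRANK = (count of values below x) / (n - 1),
    # ignoring Excel's default truncation to three digits
    n = len(values)
    return [sum(w < v for w in values) / (n - 1) for v in values]


data = [2.0, 5.0, 1.0, 9.0, 4.0]  # hypothetical
n = len(data)
std = NormalDist(0, 1)

# SPSS route, with a Step 1 result of 1 replaced by 1 - 1/n
spss_z = [std.inv_cdf(min(p, 1 - 1 / n)) for p in spss_fraction(data)]

# Excel route from the comment: 1 -> 0.9999, 0 -> 0.0001, then NORMINV(p, 0, 1)
excel_p = [0.9999 if p == 1 else 0.0001 if p == 0 else p for p in excel_percentrank(data)]
excel_z = [std.inv_cdf(p) for p in excel_p]

for x, a, b in zip(data, spss_z, excel_z):
    print(f"{x:4.1f}  SPSS-style={a:+.3f}  Excel-style={b:+.3f}")
```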

@progress410 5 years ago

Thank you very much for this wonderful method. May I ask a question, please? I have data that include 2 groups in 1 variable. When I use this method, should I split the file by group? When I compare the mean difference with a t-test, the result changes. For example, I have walking speed for 1,500 persons, including persons with heart disease (N=245) and persons without heart disease. When I transform to a normal distribution, can I do it once (N=1500), or should I split the file into heart disease and non-heart-disease groups? Neither group's data are normally distributed. I look forward to hearing from you soon. Thank you very much.

@gftempleton 5 years ago

I'm not 100% sure I understand your question. Assuming you will be standardizing the data along with the Two-Step (mean=0, sd=1)... if you want to test the difference between two groups, I would stack them all together and transform one time. If you use the original series mean and standard deviation, you'd have to compute the values for both groups, then normalize separately.
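A small Python sketch suggests why stacking matters in the standardized case. With hypothetical data and no ties, transforming each group separately with the same imposed mean=0 and sd=1 gives both groups the same set of transformed values, so the group difference a t-test would look for disappears; transforming the stacked data once keeps it.

```python
from statistics import NormalDist, fmean


def two_step_standardized(values):
    # Hypothetical helper: fractional rank (assumes no ties), 1 -> 1 - 1/n,
    # then the inverse normal with mean=0 and sd=1
    n = len(values)
    position = {v: i + 1 for i, v in enumerate(sorted(values))}
    dist = NormalDist(0, 1)
    return [dist.inv_cdf(min(position[v] / n, 1 - 1 / n)) for v in values]


group_a = [1.2, 1.5, 1.9, 2.4, 3.0, 5.5]  # hypothetical walking speeds, group A
group_b = [0.6, 0.8, 1.0, 1.1, 1.4, 2.0]  # hypothetical walking speeds, group B

# Stacked: transform all cases together, then split back into groups
pooled = two_step_standardized(group_a + group_b)
pooled_a, pooled_b = pooled[:len(group_a)], pooled[len(group_a):]

# Separate: transform each group on its own
sep_a, sep_b = two_step_standardized(group_a), two_step_standardized(group_b)

print(f"stacked:  A={fmean(pooled_a):+.3f}  B={fmean(pooled_b):+.3f}")
print(f"separate: A={fmean(sep_a):+.3f}  B={fmean(sep_b):+.3f}")
```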

@progress410 5 жыл бұрын

@@gftempleton thank you very much for your suggession.

@muhammadarie4382 6 жыл бұрын

can negative value data be transformed through this way?

@Dennis-J316 5 жыл бұрын

From where you wrote the value for the second question mark (?) and from the third question mark (?)

@gftempleton 5 жыл бұрын

First ? is the mean, second is the standard deviation

@haziziesa4534 3 years ago

Dr. Templeton, where do the mean and SD come from? Do you get them from the original (non-normal) data?

@gftempleton 3 years ago

Yes, both the mean and SD come from the original data.
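For readers asking about the two "?" values: the sketch below computes them from the untransformed variable and passes them to the inverse-normal step. The SPSS function IDF.NORMAL(prob, mean, stddev) performs the same inverse-normal calculation. This is a minimal Python sketch using only the standard library; the tie handling (mean rank) and the 1 - 1/n replacement are assumptions, and the data are made up.

```python
import statistics
from statistics import NormalDist


def fractional_ranks(values):
    """Step 1 sketch: rank / n, ties given their mean rank (an assumption)."""
    n = len(values)
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    ranks = [(first[x] + last[x]) / 2 / n for x in values]
    # A rank of exactly 1 has no finite inverse; use 1 - 1/n as discussed in this thread.
    return [1 - 1 / n if p == 1 else p for p in ranks]


def two_step_original_units(values):
    """Step 2 sketch: inverse normal CDF using the ORIGINAL series mean and SD."""
    mean = statistics.mean(values)   # mean of the untransformed data
    sd = statistics.stdev(values)    # sample SD of the untransformed data
    dist = NormalDist(mean, sd)
    return [dist.inv_cdf(p) for p in fractional_ranks(values)]


skewed = [2, 5, 9, 40]   # made-up, right-skewed toy data
print(statistics.mean(skewed), round(statistics.stdev(skewed), 3))
print([round(y, 2) for y in two_step_original_units(skewed)])
```

In this toy example the value with fractional rank 0.5 lands exactly on the original mean (14), which is what "keeping the original mean and SD" looks like at the center of the distribution.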

@mycorfish 6 years ago

Very nice method indeed! However, after using the method, the Levene test result shows a non-normal distribution. Can you explain this for us, please?

@gftempleton 6 years ago

You likely have one or more of three distribution attributes that are preventing normalization using the Two-Step: 1) too few levels (possible values), 2) a mode with too many instances, and/or 3) a mode too far from the middle of the distribution. Absent those three characteristics, it would work perfectly.
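The effect of an inflated frequency can be seen directly: because step 1 works on ranks, every tied value gets the same fractional rank and therefore the same output, so a stack of identical values stays a stack after the transformation. A minimal Python sketch with made-up data, assuming tied values share their mean rank and the top rank of 1 is replaced by 1 - 1/n:

```python
from collections import Counter
from statistics import NormalDist


def two_step_z(values):
    """Two-Step sketch with mean 0, SD 1; ties share their mean rank (assumption)."""
    n = len(values)
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    z = NormalDist(0.0, 1.0)
    out = []
    for x in values:
        p = (first[x] + last[x]) / 2 / n
        out.append(z.inv_cdf(1 - 1 / n if p == 1 else p))
    return out


# Made-up data with an inflated frequency: six zeros among eleven values.
data = [0, 0, 0, 0, 0, 0, 1, 2, 4, 7, 15]
out = two_step_z(data)

# Every zero receives the same rank, so it receives the same transformed value:
# the stack is moved along the axis, not spread out.
print(Counter(round(y, 3) for y in out).most_common(1))
```

No rank-based method can split tied values apart, which is why few levels or a heavy mode limit how normal the result can become.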

@mycorfish 6 years ago

Thank you very much! I'll try that. My next worry, however, is this: is it possible to rectify the data in these ways and still say the transformed data and the original data are the same?

@gftempleton 6 years ago

You can do whatever you want as long as you say what you did in the report. Generally, there's not a whole lot of freedom associated with rectifying these issues. In some disciplines, it would make sense to, for example, delete zeroes. Such action may help with the normality attainment afforded by the Two-Step. However, it is only ethical to say what was done in the report. If it makes sense, I think your reviewers will be on board with it.

@joshuaveniegas 5 years ago

Does this method address heteroskedasticity?

@zeric_raiz 7 years ago

Thank you very much for such an instructive article and for the follow-up video. I've been able to normalize my data following your method, but I still have a question. I have a non-normally distributed metric variable, "SOM", with values for the years 2002 and 2012 (a nominal variable named SAMPLE takes the value '1' for samples collected in 2002 and '2' for samples collected in 2012). I'm now able to 'globally' normalize my SOM variable with the 'Two-Step Transformation', BUT when I run ANALYZE --> DESCRIPTIVE STATISTICS --> EXPLORE with a split file or a 'Factor List' by the variable SAMPLE and re-run the normality tests, I find that only the 2012 samples are normally distributed; the 2002 samples are not, and I don't know how to resolve this. This is one particular case, but I also need to split my variable SOM even further (e.g., "date collected AND soil type" or "date collected AND soil type AND cultivation system"). Or is this a non-issue because the 'global variable SOM' is now normally distributed? I keep running into this issue and simply cannot find the answer. If you find the time to enlighten me on the issue, it will help a lot. Thanks anyway for such a great transformation.

@aminfarzaneh8142 8 years ago

Thank you so much for the method. I cannot normalize my data. Can I have the data set you used?

@j-m.s.6646 6 years ago

Would using this method after the fact be considered a linear-linear regression, a log-log regression, or something else? Also, would rescaling the transformed variables back to, say, a scale from one to ten be considered good practice for easing interpretation?

@batould1786 4 years ago

Hi, does this work for a dataset with negative values?

@gftempleton 4 years ago

Absolutely - no restrictions.
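Step 1 depends only on the order of the values, which is why negative values need no preconditioning: shifting a variable below zero, or across zero, leaves its fractional ranks unchanged. Step 2 then uses the variable's own mean and SD, whatever their sign. A minimal Python check with made-up values, assuming tied values share their mean rank:

```python
def fractional_ranks(values):
    """Step 1 sketch: rank / n with tied values sharing their mean rank (assumption).

    A top rank of exactly 1 would still need replacing (e.g. by 1 - 1/n)
    before the inverse-normal step; that is omitted here.
    """
    n = len(values)
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[x] + last[x]) / 2 / n for x in values]


positive = [3.0, 8.5, 1.2, 20.0, 8.5]
shifted = [x - 100 for x in positive]   # same data moved entirely below zero
mixed = [x - 8.5 for x in positive]     # negative, zero and positive values

print(fractional_ranks(positive))
print(fractional_ranks(shifted) == fractional_ranks(positive))
print(fractional_ranks(mixed) == fractional_ranks(positive))
```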

@kinghyari 9 years ago

Hello, may I apply this method to variables containing negative and positive values?

@gftempleton 9 years ago

kinghyari Certainly - no preconditioning is necessary.

@ensarifadi457 6 years ago

Dear Gary, can you please tell me what the implications are of using this technique on a Likert scale? For instance, I have used a Likert scale in which 1 is strongly disagree and 7 is strongly agree. Does it invert the relation, or what? Thanks

@sams1856 10 years ago

Thanks Gary. This video shows how to transform continuous variables toward normality. Can I use this technique for Likert-scale variables? Regards

@gftempleton 10 years ago

The technique will transform any variable toward normality, except for binary variables. That being said, there will be a variety of results depending on the situation. I have experienced positive, yet less beneficial results when applying the two-step to Likert scale data. While ORIGINAL data based on Likert-based items are often significantly non-normal, summing the items usually results in fairly good normality. Because non-normality isn't usually a terrible problem with Likert-based data, transforming toward normal doesn't help that much. On the other hand, transforming highly continuous data, especially ratios, will often yield tremendous downstream benefits in scientific testing. Good luck!

@madiharazzam1098 7 years ago

You are so nice to be this responsive, Mr. Gary :) So can you please explain how we can transform non-normal Likert-scale data to normal?

@ivduudt2980 9 years ago

How do I back-transform after an ANOVA?

@mahanesti3990 6 years ago

Hi Mr. Templeton, thank you for your method of transforming to normality. What should we call this transformation method?

@saharsarhan 7 years ago

How can I calculate the series mean in SPSS?

@linyuliao3417 6 years ago

Thank you for sharing this video. Can I ask a question? How is the first step related to the second step?

@mohammadalkurdi4460 6 years ago

Praise be to God! I have tried several approaches on my data, but every time it stayed non-normal; I used log and also square root, and they were not useful. Then I used your approach and my data became normal. But my question is: does this approach always transform data to normality? Thank you, Dr.

@fazlihaleem6603 9 years ago

Can we do such transformations for ordinal (ranked) data?

@gftempleton 9 years ago

Fazal Haleem You can, but ranked data is uniformly distributed. Therefore, I would expect an incremental improvement in results. There is a wide range of distributional situations out there, so you won't know until you try.

@allenlee6593 8 years ago

Does anyone know how to convert back to the original number after the transformation? Let's say x is 50 before the transformation. The video seems useless without conversion information.

@gftempleton 8 years ago

+Allen Lee If you are asking how to interpret transformed values, do so exactly as you would the original units, except call them something like "normalized units." That's the same as what is done with log-transformed data: instead of original units, results are in log original units. If you can only stand interpreting in original units, use original units in all analyses (of course, results will likely be different).

@risausa4796 2 years ago

Hi Gary! Thanks for this video. Where did you get the values for the MEAN and STANDARD DEVIATION?

@gftempleton 2 years ago

Two options: 1) the original variable mean and standard deviation or 2) 0 for mean, 1 for standard deviation (z-scores).
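The two options differ only by a change of units: the inverse-normal step returns mean + SD times the standard normal score, so the original-units version is a linear rescaling of the z-score version and has the same shape. A minimal Python check with made-up data, assuming tied values share their mean rank and a top rank of 1 is replaced by 1 - 1/n:

```python
import statistics
from statistics import NormalDist


def two_step(values, mean, sd):
    """Two-Step sketch: fractional rank (mean rank for ties, 1 -> 1 - 1/n),
    then the inverse normal CDF with the given mean and SD."""
    n = len(values)
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    dist = NormalDist(mean, sd)
    ranks = [(first[x] + last[x]) / 2 / n for x in values]
    return [dist.inv_cdf(1 - 1 / n if p == 1 else p) for p in ranks]


data = [1.5, 2.0, 2.2, 3.1, 4.8, 9.7, 30.4]   # made-up skewed values
m, s = statistics.mean(data), statistics.stdev(data)

option1 = two_step(data, m, s)       # 1) original mean and SD
option2 = two_step(data, 0.0, 1.0)   # 2) z-scores (mean 0, SD 1)

# Each option-1 value equals mean + SD * (matching option-2 value).
for a, b in zip(option1, option2):
    print(round(a, 3), round(m + s * b, 3))
```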

@lanuit9733 4 years ago

Thank you so much for this video!!! You saved my life! Thank you, and thanks again!

@gftempleton 4 years ago

Glad to help - good luck!

@NoeWanKenobi 7 years ago

Thank you so much for your easy and helpful explanation! You really saved my life (and thesis, which are the same thing right now) :P

@mohammedkhalid9799 6 years ago

THANK YOU, Mr. Gary

@minaorang5094 5 years ago

Thank you for the video and your paper! I used the two-step method for my non-normal data, and all of them turned into normal distributions! My only remaining concern is whether I am allowed to use this method for my data, which are based on 4- and 5-point Likert scales. I read in your article that this method is mostly for variables with more levels (up to 100)! I would appreciate it if you could tell me whether I can use this method for 4- and 5-point Likert scales or not! Thanks in advance!

@pauls1571 7 years ago

Hi Gary. First, thanks for your informative video. I was dealing with a few very non-normal distributions, and this method worked wonderfully in normalizing the data. That said, I have one question for you, the answer to which I cannot seem to figure out. In the video description, you note that "This approach retains the original series mean and standard deviation to improve the interpretation of results." However, I have not found this to be the case. Although the means and SDs of the transformed variables are quite similar to the original series' means and SDs, they are not perfectly retained. At least, this was true in cases where a value of 1 was generated by the first (fractional rank-order) step, even when I used the formula you mentioned in a reply to another comment below to replace the 1 (i.e., replace the 1 with 1-(1/n)). Any clarification here would be much appreciated. Thanks again for your informative video.

@gftempleton 7 years ago

Another reason they aren't exactly the original mean and standard deviation is inflated frequencies (stacks of the same value) that lie some distance from the mean. If there were no 'same values' in the dataset, the resulting mean and standard deviation would be exactly the mean and standard deviation of the original set. The approach at least "tries" to do that.

@gftempleton 7 years ago

1-(1/n) is a close approximation that keeps researchers from losing a record. Consider it part of the procedure, just like the two steps. You may be right that it causes the mean and standard deviation parameters to vary slightly. I don't think it would affect interpretations much. Sample size is a bigger issue in a lot of cases. This should be up to the researcher to decide.
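A small worked case shows the effect discussed in this exchange. With ten untied values and the mean-0, SD-1 option, the fractional ranks are 0.1, ..., 0.9 plus a top rank of 1 that must be replaced by 1 - 1/n = 0.9. The nine ranks 0.1 to 0.9 are symmetric around 0.5, so their z-scores average to 0, and the duplicated 0.9 alone pulls the mean above 0; at this sample size the coarse grid of ranks also leaves the SD below 1. A minimal Python check using only the standard library:

```python
import statistics
from statistics import NormalDist

n = 10
# Untied data: the fractional ranks are 1/n, 2/n, ..., n/n.
ranks = [i / n for i in range(1, n + 1)]
# A rank of exactly 1 has no finite inverse, so it is replaced by 1 - 1/n,
# which duplicates the second-highest rank.
ranks[-1] = 1 - 1 / n

std_normal = NormalDist(0.0, 1.0)
z = [std_normal.inv_cdf(p) for p in ranks]
print(round(statistics.mean(z), 3), round(statistics.stdev(z), 3))
```

With larger samples both deviations shrink, which is consistent with the point that sample size often matters more.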

@aishwarypawar7728 5 years ago

Hello, I have 4 non-normal variables in my dataset. Do I need to perform these steps individually for each of the 4 non-normal variables, or is there another method?

@m.roussel1757 9 years ago

Thank you for this helpful video. I was wondering: what is the use of the bootstrap option in Amos? Is it better to perform a transformation, or is bootstrapping sufficient when performing a CFA?

@gftempleton 9 years ago

+Miriam Roussel This would make the subject of a good research paper. I believe bootstrapping has its own weaknesses, as does any transformation. I prefer using normalized, real data.

@johannesmeixner3454 9 years ago

Is this transformation related or equivalent to a Box-Cox transformation?

@gftempleton 9 years ago

I am not a statistician, but I'm confident they are different. This transformation optimizes normality while retaining the original series mean and standard deviation (for interpretative purposes). I don't believe Box-Cox transformations can perform as well. You can always try and compare.

@eshanWONG 6 years ago

Hi, I used the same method on all three of my non-normal variables, and it worked for only two of them. Are there any assumptions or characteristics the data need for a successful transformation to normality? I tried other ways of transforming too (log10, etc.), but I still can't get a normal distribution for that particular variable. Do you have any recommendations I can try? Thank you

@gftempleton 6 years ago

Yishan - for the one variable that won't transform to normal, the problem may be the characteristics of the original distribution. If you don't have many levels (e.g., binary=2 levels) or if there is an "inflated frequency" (e.g., stacks of zeroes or other values), then it won't transform to normal regardless of what method you use. I would do the best you can, tell the reviewers what you did, and hope for the best.

@eshanWONG 6 years ago

Thanks for the clarification. That variable measures participants' depression scores, and 120 of the 1,000+ cases have a value of zero. I'm trying to retain all the data, as I think a zero score can be meaningful in the research. Using the 'two-step approach', I was able to reduce the skewness from 1.312 to 0.184 (std. error = .076), which gives a z value within the range of ±2.58. Despite the 'non-normal' shape of the histogram and the significant p value of the Kolmogorov-Smirnov test, can I still assume that the data meet the assumption of normality (based on the z value) and proceed with the transformed variable?
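The check described in this comment divides the skewness by its standard error and compares the result with ±2.58, the two-sided 1% critical value of the standard normal. The Python sketch below reproduces that arithmetic with the quoted figures. The standard-error formula is the usual one for sample skewness; the adjusted skewness function is assumed, not confirmed here, to match what SPSS reports. It only reproduces the calculation and does not settle whether the normality assumption is met.

```python
import math


def adjusted_skewness(values):
    """Sample skewness G1 (adjusted Fisher-Pearson), assumed to match SPSS output."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


def skewness_se(n):
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


# Figures quoted in the comment: skewness 0.184 with a standard error of .076.
z = 0.184 / 0.076
print(round(z, 2), abs(z) < 2.58)
print(round(skewness_se(1000), 4))   # SE for n = 1000, for comparison
```

The SE for n = 1000 comes out close to the quoted .076, consistent with a sample of a little over 1,000 cases.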

@MrLunadecancer 8 years ago

Sorry, but when I try to apply the formula 1-(1/n), the result is the same or minus another value.

@ellieseager589 2 years ago

You just saved me. Thank you!

@Belcebub69 7 years ago

Thanks, Gary, what you have done is very helpful. Are you aware of any critical peer reviews or papers regarding this method of yours? Thanks for the answer.

@gftempleton 7 years ago

This paper has been peer reviewed. It will be published in print in early August: aaajournals.org/doi/abs/10.2308/isys-51510?code=aaan-site

@HucheshBudihal 9 years ago

Thank you, sir, for this video. I referred to your paper, but I am facing a small problem. I did the calculation in Excel, in SPSS, and manually (percentile rank = 1-(rank of xi/n)) as given in your article, and I got different values in step one and step two with each of the methods you mentioned in the article. I think the values should be the same, but that is not what I get. Please respond; it would help me a lot...

@lllv1989 8 years ago

This is an amazing method. I'm wondering if there's added value in winsorizing or otherwise capping variables before the transformation. Some of my clinical variables have a case or two with extreme outliers and are also non-normal. Using the means and standard deviations for these variables seems a little weird to me, because the Ms and SDs before winsorizing don't seem to be within the range of values usually seen in my patient population. If I winsorize before transforming, the Ms and SDs seem a little more representative... Am I completely off here?

@OmisileKehindeOlugbenga 7 years ago

Thanks a lot for saving my day. But you did not mention initially that I would need a 1-(1/n) transformation before the final inversion. Thanks all the same.

@radina3737 6 years ago

First of all, thank you very much for this approach; it saves me a lot of time and effort. My question is: I have a dependent variable measuring "click intention", which can range from 0 to 100. After normalizing the data, however, I get 3 negative results and 2 above 100. Is it acceptable to keep it this way? Thank you very much!
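Step 2 draws values from an unbounded normal distribution with the original mean and SD, so the most extreme ranks can map outside a bounded scale such as 0 to 100. The sketch computes the possible extremes for hypothetical values (mean 55, SD 25, n = 200; not the commenter's data), assuming untied data, so that fractional ranks run from 1/n up to 1 - 1/n after the replacement discussed earlier in this thread.

```python
from statistics import NormalDist


def two_step_range(mean, sd, n):
    """Smallest and largest possible Two-Step outputs for n untied values.

    Assumes fractional ranks run from 1/n up to 1 - 1/n
    (after replacing the top rank of 1).
    """
    dist = NormalDist(mean, sd)
    return dist.inv_cdf(1 / n), dist.inv_cdf(1 - 1 / n)


# Hypothetical 0-100 "click intention" variable: mean 55, SD 25, 200 respondents.
low, high = two_step_range(55.0, 25.0, 200)
print(round(low, 1), round(high, 1))
```

With these made-up figures, the extremes fall below 0 and above 100, which matches the kind of out-of-range values described in the comment.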