Timeline of case screening: Missing data in rows 00:48 Unengaged responses 10:50 Outliers (on continuous variables) 16:28 Variable Screening: Missing data in columns Skewness & Kurtosis 20:38
@alexanderstevens2077 2 years ago
FYI, SPSS can count missing values per case using the NMISS() function. Call the result TotalMissing. I used Compute Variable, and the Numeric Expression was NMISS(PRFEXP,EFFEXP,APPSAT,EXP2USE,INT2USE,TRST2,TRST3,TRST4,INFQ1,INFQ2,INFOQUL,INFOSAT,APPQ7,APPQ3,SOCINF,OrgAssurances). You can then compute PercentMissing using RND((TotalMissing/numvars)*100), where numvars is the number of ordinal variables of interest. For me, numvars = 16. I mention this because it lets the researcher stay within SPSS rather than copying and pasting into Excel. Also, Analyze > Multiple Imputation > Analyze Patterns produces a succinct summary of missingness in both pie chart and table form. I found this very useful.
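For readers working outside SPSS, the same per-case count can be sketched in plain Python. This is only an illustration of the arithmetic described above: the function names and the example case are hypothetical, missing answers are assumed to be stored as `None`, and rounding is done half-up.

```python
from typing import Optional, Sequence


def count_missing(row: Sequence[Optional[float]]) -> int:
    """Count missing (None) values in one case, similar in spirit to NMISS()."""
    return sum(1 for value in row if value is None)


def percent_missing(row: Sequence[Optional[float]]) -> int:
    """Missing values as a whole-number percentage of the screened variables."""
    pct = count_missing(row) / len(row) * 100
    return int(pct + 0.5)  # round half up; pct is never negative


# Hypothetical case with 16 screened variables, three of them unanswered.
case = [4, 5, None, 3, 4, 4, 2, None, 5, 5, 4, 3, None, 4, 5, 3]
total_missing = count_missing(case)
pct_missing = percent_missing(case)
print(total_missing, pct_missing)
```

Cases whose percentage exceeds whatever cutoff you have chosen can then be reviewed by hand before deciding whether to drop them.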
@Gaskination 2 years ago
Thanks!
@alexanderstevens2077 2 years ago
What are your recommendations/thoughts about using Expectation Maximization to impute rather than median for ordinal Likert-scaled vars?
@Gaskination 2 years ago
I have never used expectation maximization. So, not sure. My preferred method for Likert scales is to look at the scale it is on, and for that case, see how they responded to the other items on that scale. Then take the average within that scale for that row. So, if they answered 4, 4, _, 4, 3, I would probably replace the missing value with a 4.
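The within-scale replacement described above can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function name is hypothetical, missing items are assumed to be `None`, and rounding the row mean to the nearest scale point is an assumption based on the "4, 4, _, 4, 3" example.

```python
from statistics import mean
from typing import List, Optional


def impute_within_scale(items: List[Optional[int]]) -> List[int]:
    """Fill one respondent's missing items with the rounded mean of the
    items they did answer on the same scale."""
    answered = [v for v in items if v is not None]
    if not answered:
        raise ValueError("respondent answered no items on this scale")
    fill = int(mean(answered) + 0.5)  # round half up to a whole scale point
    return [fill if v is None else v for v in items]


# The example from the comment: 4, 4, _, 4, 3
filled = impute_within_scale([4, 4, None, 4, 3])
print(filled)
```

Here the answered items average 3.75, so the gap is filled with 4.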
@davirami7607 3 years ago
You're a legend James! Thanks
@subhrosarkar4366 8 years ago
Hi Dr. Gaskin, I really appreciate your videos. What standard deviation should I use to check for potentially unengaged responses on a 7-point Likert-type scale?
@Gaskination 8 years ago
Something small, like 0.700. This is not a published threshold. It is just a tool for finding those who might be unengaged. The true test is then to go and look at their responses to see if they appear unengaged.
@adrianas8812 7 months ago
@@Gaskination Hello, I have a question. What if I have scales with different possible values, and for one respondent some of the scales show an SD of 0 but others don't? Should I then delete that respondent, or does it mean that the person was paying attention after all? I would really appreciate your help. :)
@Gaskination 7 months ago
@@adrianas8812 I would recommend manually inspecting any suspicious cases to determine whether they were paying attention. This may include whether there is any variance on their responses, as well as how long it took them to complete the survey.
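The screening step discussed in this thread can be sketched as a per-respondent standard deviation check. This is only a rough illustration: the 0.7 value is the informal screening number mentioned above (not a published threshold), the respondent IDs and answers are made up, and the sample standard deviation is an assumption about which SD is meant.

```python
from statistics import stdev
from typing import Dict, List

SD_THRESHOLD = 0.7  # informal screening value from the thread, not a published cutoff


def flag_low_variance(responses: Dict[str, List[int]],
                      threshold: float = SD_THRESHOLD) -> List[str]:
    """Return IDs of respondents whose answers barely vary.

    Flagged cases are candidates for manual inspection, not automatic deletion.
    """
    return [rid for rid, answers in responses.items()
            if stdev(answers) < threshold]


# Hypothetical 7-point Likert answers for three respondents.
survey = {
    "r1": [4, 4, 4, 4, 4, 4, 4, 4],  # straight-lined
    "r2": [2, 6, 5, 3, 7, 4, 1, 5],  # varied
    "r3": [5, 5, 6, 5, 5, 5, 6, 5],  # nearly constant
}
to_inspect = flag_low_variance(survey)
print(to_inspect)
```

When scales use different ranges, it is probably safer to run the check separately per scale and then look at the flagged rows, along with completion time, before removing anyone.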
@klangliss 8 years ago
Hi James, I've seen you mention in previous comments that you do not need to detect outliers for Likert-type questions. Could you please point me towards some published texts supporting this? Thanks in advance, Katie
@oblivion9568 3 months ago
That slowmo voice "But it will be like this" 😂
@abdulmoeed4661 2 years ago
What variable type (nominal or ordinal) should we use for "Age Groupings and Income Grouping"? I am taking Age and Income in groups, like Age (0-15, 16-30, and so on) and Income (
@Gaskination 2 years ago
A purist will say these are nominal measures if the intervals are not equal. If the intervals are equal (e.g., by tens or twenties), then you could probably list them as ordinal. Whenever possible, it is better to use exact numbers, rather than binned groupings. If you are just looking at directions of effects (e.g., age has a positive effect on job satisfaction), then you can use them as ordinal.
@abdulmoeed68 3 years ago
Hello, you set the measurement level to Scale for all variables. I watched another video that said you have to choose "Ordinal" as the Measure for a five-point Likert scale. Can you clarify the difference between "Ordinal" and "Scale"?
@Gaskination 3 years ago
Likert scales are ordinal. Correct. SPSS does not treat them differently though for the analyses I typically do.
@corinnemcnally6486 4 years ago
Thank you for sharing these videos. Is there any reason to use the same number of scale anchors for every scale included in a questionnaire? For example, should I use a scale of 1-5 for every item/measure? Or is it okay for some to be 1-5 and others to be 1-7? I have been told that having a different number of Likert scale anchors for different variables can cause problems when conducting SEM. However, I notice that in your dataset some scales are 1-5 and others 1-7. Can you confirm that proceeding this way is okay? Thank you.
@Gaskination 4 years ago
If the scales do not differ greatly, then there should be little, if any, effect. The problem comes when you have very short scales (e.g., fewer than 5 levels) combined with very long ones (e.g., 10+). Statistically, there isn't much of a difference, but bias is introduced when the scale does not offer enough precision. For example, on a scale of 1-4, I might really be a 2.5 but must choose 2 or 3.
@sohaismail2503 4 years ago
I believe SPSS already reports excess kurtosis, not kurtosis itself. It already subtracts 3 from the value. So the reported values should be compared to zero, not 3. For example, using Stata would give exactly those kurtosis values + 3.
@Gaskination 4 years ago
REALLY? That would surprise me. But I don't know if you are right. I read this: stats.idre.ucla.edu/spss/output/descriptive-statistics/ but it isn't super clear about it. If it does subtract three (or at least shift the value closer to zero by three units, considering the value might be negative to begin with), then we should be more conservative in our interpretation. However, I still don't know if what you say is correct. If you can find any reference supporting this, please let me know so that I can correct my videos. Thanks!
@sohaismail2503 4 years ago
@@Gaskination Here is a reference I found. Check page 149, equation 4.3 and the paragraph below. Hope this can help. books.google.com.eg/books?id=ZLyPPFqVtX4C&pg=PT175&lpg=PT175&dq=spss+excess+kurtosis&source=bl&ots=BbpqHV3TrD&sig=ACfU3U36EzqcDJrPe9sxiOLiG3ryFpE62w&hl=en&sa=X&ved=2ahUKEwiZ0sbHlrzpAhVC1BoKHRbmDE4Q6AEwDnoECAoQAQ#v=onepage&q=spss%20excess%20kurtosis&f=false
@sohaismail2503 4 years ago
@@Gaskination In particular I discovered this because I followed your analysis using Stata instead and I got much higher values of kurtosis.
@sohaismail2503 4 years ago
@@Gaskination One thing to note is that if kurtosis is computed in its raw form (where a normal distribution would have the value of 3), then there is no way the values could be negative, since the formula (raised to the fourth power) would not allow it. Given that your values are sometimes negative, this must be excess kurtosis, which is compared to zero rather than 3 (a normal distribution has zero excess kurtosis). So negative values indicate a flatter distribution and positive values a more peaked one. Acceptable values would be around +/- 1 or 2.
@Gaskination 4 years ago
@@sohaismail2503 Correct. The value of 3 is the distance from zero (plus or minus) that is considered "too much kurtosis" (too flat or too peaked). Thanks.
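The distinction discussed in this thread can be illustrated with a short sketch. It uses the simple population (moment) formulas, which is an assumption made for clarity; statistical packages typically apply a small-sample correction, so their reported values will differ somewhat from these. The function names and data are hypothetical.

```python
from typing import Sequence


def raw_kurtosis(values: Sequence[float]) -> float:
    """Fourth central moment divided by the squared variance.

    Never negative; a normal distribution gives about 3.
    """
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    return m4 / m2 ** 2


def excess_kurtosis(values: Sequence[float]) -> float:
    """Raw kurtosis minus 3, so a normal distribution gives about 0."""
    return raw_kurtosis(values) - 3.0


# A flat, evenly spread set of 7-point answers.
flat = [1, 2, 3, 4, 5, 6, 7]
raw = raw_kurtosis(flat)
excess = excess_kurtosis(flat)
print(raw, excess)
```

For this flat example the raw value is positive while the excess value is negative, which matches the argument above: a negative kurtosis in the output implies that 3 has already been subtracted.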
@purnimanandy4405 4 years ago
Hello, I am an absolute beginner with SPSS and SEM. Despite responses being in numbers, in the variable view my variables are showing up as string and not numeric. How can I correct that please?
@Gaskination 4 years ago
The Variable View shows the variables in rows, along with their characteristics (e.g., type, values, labels). If the type is listed as "string", then you can click on that type and change it to numeric. Data View will show the numeric values of your variables.
@hebaelsayedelbdawyahmedhas8529 1 year ago
Hello James, thank you for your amazing videos. I learned a lot from you. I need to know what middle score I can enter in place of blank responses when I use a Likert scale. Would it be 2.5 for a 5-point Likert scale and 3.5 for a 7-point Likert scale? Could you please confirm?
@Gaskination 1 year ago
It is best to use the median if the variable is skewed and the mean if it is roughly normal. But don't use the mean or median of the scale itself (its midpoint). Instead, use the mean or median of the observed values on that scale.
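The difference between the scale midpoint and the median of the observed values can be sketched as below. The data are invented, missing answers are assumed to be `None`, and `median_low` is used only so that the fill value is always an actual scale point; that choice is an assumption, not something stated in the thread.

```python
from statistics import median_low
from typing import List, Optional

# Hypothetical answers to one 5-point item; None marks a blank response.
observed: List[Optional[int]] = [5, 4, None, 5, 3, 5, None, 4, 5, 5]

answered = [v for v in observed if v is not None]
fill_value = median_low(answered)      # median of what respondents actually chose
scale_midpoint = (1 + 5) / 2           # midpoint of the 1-5 scale, shown for contrast

imputed = [fill_value if v is None else v for v in observed]
print(fill_value, scale_midpoint, imputed)
```

In this skewed example the observed median is 5, whereas the scale midpoint is 3, so filling blanks with the midpoint would pull the item toward a value few respondents chose.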
@biskolok 5 years ago
this is incredibly helpful. thanks doctor
@VanTran-cc2kk 7 years ago
Hi James, I used mean replacement to replace missing values. But when I ran the descriptive analysis again afterwards, SPSS did not recognize the replaced values (for example, there are different categories for an original score of 2 and a newly replaced score of 2). Also, when I reverse-scored some variables, SPSS did not reverse the new values that had been missing. Do you know how I could solve this problem? Thank you!
@Gaskination 7 years ago
The only thing I can think of is if you created a new variable when you replaced missing values. This is the default option. So, then when you conducted subsequent analyses, you may have used the original variables, rather than the new ones that included the imputed values.
@nehayadav7183 7 years ago
The video is of great help, Prof. I would like to know: if there are pre-established scales for the model that I am trying to confirm, is it required that I do an EFA and then a CFA? It is possible that the factor structure may change in that case. Please guide me. Can you share some references where only CFA is used, and not EFA, with pre-existing scales?
@Gaskination 7 years ago
I always do an EFA prior to CFA because EFA reveals discriminant validity issues better than CFA.
@ugogirlss 8 years ago
Hello, thank you for your wonderful videos. Do the same principles apply to using longitudinal data for growth curve modeling? Would I check the results for each variable at each time point separately (wide data format) or would I check the results for each variable for all time points simultaneously (long data format)? Thank you so much.
@Gaskination 8 years ago
Any variables you'll be using in regression based statistics should be checked first for normality (and also all the missing data and erroneous data stuff).
@sajeeb2005 8 years ago
Hi, Gaskin Sir. From your previous video on data cleaning, I learned that a standard deviation below 0.5 across the scale items indicates an unengaged response, and that such cases should be deleted. But in this video, some cases had an SD below 0.5 and you deleted only those with a value of 0. Which approach should I follow for unengaged responses? Please suggest.
@Gaskination 8 years ago
+Sajeeb Shrestha The 0.5 for standard deviation is only an indicator of potential unengaged responses. Each case must then be visually inspected to determine if they were truly unengaged.
@sajeeb2005 8 years ago
+James Gaskin thanks sir.
@sofiaguardado9650 7 years ago
Hi! I have interviews with Likert-scale variables. I have read that we have to check for normality and multicollinearity before the analysis. I know you checked kurtosis and skewness, but it is still not clear to me what to do if my data are not normal and if the VIFs for multicollinearity are high.
@Gaskination 7 years ago
If not normal, then on a Likert, you cannot transform. However, if you have multiple indicators for a latent factor, then you can remove the nonnormal ones. This should only be done for extreme non-normality though (e.g., Skewness or Kurtosis > 3.3). If VIF is high, then try to do an EFA with the indicators for just the two factors that have high VIF. See what the crossloading items are and see if you can remove the ones that have the strongest crossloading. This might help.
@manuelgarcia4864 6 years ago
For non-normal data, what type of discrepancy estimate is recommended in AMOS? Would you recommend some type of bootstrapping procedure? Do you have a video where you deal with this type of data? Would you recommend to use SmartPLS? Thanks in advance.
@Gaskination 6 years ago
Bootstrapping is generally considered to mitigate some of the limitations of non-normal data (but can only do so much - it cannot remove all the limitations). SmartPLS has similar benefits and caveats.
@manuelgarcia4864 6 years ago
Thanks so much for answering so quickly!
@Sahity 8 years ago
Hi Dr. Gaskin, I was wondering if I can do a data transformation (log10) right after the data screening, because I noticed that the alphas rose for the transformed variables in the SEM analysis. Is this right, or is it artificial?
@Gaskination 8 years ago
Renan Ogiwara, that is fine if the variables are continuous. But if they are ordinal, then it doesn't make much sense.
@Sahity 8 years ago
James Gaskin thanks a lot
@ASTROKALYMNOS 7 years ago
Dear James, thanks a lot for your awesome and precious videos. Can you please suggest a way of constructing a reliable scale as a measuring instrument in the social sciences? Shall I follow your EFA procedure? Thanks a lot.
@Gaskination 7 years ago
Do you mean to make your own new set of measures? That process is slightly different for formative or reflective. For formative, create a diverse set of measures that capture the full range of dimensions of the construct. For reflective, choose the most important one of those many dimensions, and find a way to ask several similar questions around that one dimension.
@errnanadhirah8925 3 years ago
Hello Sir Gaskin. This is such a helpful video. However, I would like to know whether there is a rule of thumb for identifying unengaged responses, or whether we rely only on 0 variance. Do you have any literature on this? Thank you, Sir, for your help.
@Gaskination 3 years ago
There isn't really a rule of thumb. Some guidelines are to check reverse-coded questions for consistency and to check if someone just put the same number all the way across. Otherwise, it is hard to determine. You can also look at how long it took them to complete your survey. If not a reasonable amount of time, they probably weren't engaged.
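One of the guidelines above, checking reverse-coded questions for consistency, can be sketched as follows. The reversal formula (scale maximum plus scale minimum, minus the answer) is standard; everything else here is hypothetical, including the function names and the `tolerance` of 2 scale points, which is an illustrative assumption rather than a recommended cutoff.

```python
def reverse_score(value: int, scale_min: int = 1, scale_max: int = 7) -> int:
    """Reverse-score a Likert answer, e.g. 7 becomes 1 on a 1-7 scale."""
    return scale_max + scale_min - value


def inconsistent_pair(item: int, reversed_item: int, tolerance: int = 2,
                      scale_min: int = 1, scale_max: int = 7) -> bool:
    """True when a regular item and its reverse-worded twin disagree by more
    than `tolerance` points after the reversed item is re-scored."""
    rescored = reverse_score(reversed_item, scale_min, scale_max)
    return abs(item - rescored) > tolerance


# Answering 7 to both a statement and its reverse-worded version looks suspicious.
suspicious = inconsistent_pair(item=7, reversed_item=7)
# Answering 6 and 2 is consistent once the reversed item is re-scored to 6.
consistent = inconsistent_pair(item=6, reversed_item=2)
print(suspicious, consistent)
```

As with the other checks in this thread, a flagged pair is only a prompt to look at the respondent's full set of answers and completion time.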
@emmaversity 5 years ago
Hi, thanks for the video. Could you please share the link to the dataset you used in the video? I would really love to practice with it. Thanks in anticipation of your reply.
@Gaskination 5 years ago
The dataset is available on the homepage of statwiki. Best of luck to you.
@hasithaperera2079 3 years ago
Hi, thank you for these wonderful lessons. When it comes to kurtosis, it is clear why we can choose a range between 3 and -3. But for skewness, doesn't it have to be 0, or is there a reason to choose the same 3 to -3 range for skewness as well? (At the same time, I am guessing that we cannot get a normal distribution (skewness = 0) for ordinal scales. I hope my guess is okay.)
@Gaskination 3 years ago
We allow some tolerance in all statistical measures. The tolerance published for skewness and kurtosis varies by which study you look at. I think it is Hair et al 2010 ("Multivariate data analysis") that suggests +/- 3.00.
@hasithaperera2079 3 years ago
@@Gaskination Thank you very much. Love the videos.
@alexanderstevens2077 2 years ago
@@Gaskination Do you have any more recent publications that you prefer? 2010 is now more than a decade and I do respect your guidance. Thanks!
@Gaskination 2 years ago
@@alexanderstevens2077 There is an eighth edition published in 2018.
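The tolerance check discussed in this thread can be sketched as a skewness calculation followed by a comparison against +/- 3. The bias-adjusted sample skewness formula below is a common one, but whether it matches your software exactly is an assumption; the variable names and data are invented, and the 3.0 tolerance is the value mentioned above.

```python
from math import sqrt
from typing import Dict, List, Sequence

TOLERANCE = 3.0  # the +/- 3 guideline mentioned in the thread


def sample_skewness(values: Sequence[float]) -> float:
    """Bias-adjusted sample skewness; 0 for a perfectly symmetric sample.

    Requires at least 3 values that are not all identical.
    """
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    g1 = m3 / m2 ** 1.5
    return sqrt(n * (n - 1)) / (n - 2) * g1


def outside_tolerance(data: Dict[str, List[float]],
                      tolerance: float = TOLERANCE) -> List[str]:
    """Names of variables whose absolute skewness exceeds the tolerance."""
    return [name for name, values in data.items()
            if abs(sample_skewness(values)) > tolerance]


# Hypothetical 7-point items: one symmetric, one with nearly everyone answering 1.
items = {
    "symmetric_item": [1, 2, 3, 4, 5, 6, 7],
    "piled_up_item": [1] * 30 + [7],
}
flagged = outside_tolerance(items)
print(flagged)
```

A symmetric item gives a skewness of 0 and passes, while the item where almost everyone chose the same end of the scale is flagged.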
@meghafatabhoy6437 6 years ago
Hello Dr. Gaskin, I have found your videos incredibly helpful in learning factor analysis! I am planning to run an EFA for my graduate thesis and wanted to ask whether you have references available for your data screening methods.
@Gaskination 6 years ago
You'll find some useful ones here: statwiki.kolobkreations.com/index.php?title=References Focus on the general topics and miscellaneous sections.
@MitsosDA 7 years ago
Hi, I'm screening my dataset. So far, I have cleaned it by omitting low-response and unengaged cases. I'm kind of stuck in the next phase, not knowing how to proceed from here. 'Analyzing frequencies' shows that most of my variables still have missing values, up to 5 percent. The dataset has 280 respondents and 66 variables. Some variables are missing up to 12 values (not like the example, where there are only 2 missing values). These variables are necessary because they make up some of the latent variables. Should I just go on with my analysis, or should I replace these missing values using multiple imputation (EM, regression, ...)? I've tried EM, but the EM means indicate that the data are not missing randomly. What confuses me is that when I perform a separate Little's MCAR test before the EM imputation, the result says that the data are missing randomly. Am I doing something wrong?
@MitsosDA 7 years ago
I'm a bit confused, on the IBM website it says that MI is ok when not MCAR www.ibm.com/support/knowledgecenter/en/SSLVMB_sub/spss/tutorials/mva_describe_rerun_mcartest.html . While other sources says don't do anything; I have a lot of indicator variables which need to be aggregated into different scales, and I will be performing regression techniques as well. Can I just go on aggregating and perform regression techniques with the missing values in my dataset?
@Gaskination 7 years ago
Examine missing data by rows. In most cases, if data is missing for multiple variables, it will be missing for the same respondent (row). If a row is missing too many values, just delete the row. That should get you into a better position to replace missing values with mean, median, or predicted values.
@MitsosDA 7 years ago
Yes I already checked for rows and deleted the respondents with high missing values (+10 percent); also unengaged respondents (n=1). I deleted 10 out of 290 respondents that way. I ran a Missing value analysis to check for patterns (via MI). To summarize: 22pct of cases, 0,604pct of values and 24pct of variables have incomplete data.
@MitsosDA 7 years ago
So if a respondent is missing two scores in a three-item scale, for example, should I impute those scores or delete that row? I'm starting with a theory that contains 15 latent variables, which are measured by 66 variables in total. I also have 6 controls. Thanks for your response! PS: I also tried an EFA, but that was a real mess, with factors loading highly on each other, which they aren't supposed to do according to the theoretical model. I guess I should just carry on with CFA instead of EFA. I received some instructions for the data analysis that don't mention EFA or CFA. The instructions are: 1/ Aggregate the items into scales (calculate Cronbach's alpha via reliability analysis). 2/ Perform descriptives. 3/ Check the correlations between variables (before building the regression model). 4/ Build the regression model and run collinearity diagnostics (VIF), including the variables that are relevant to me; moderations can also be tested. 5/ Options to include other testing (via SPSS)?? Does this seem like a legitimate order or way of doing an analysis? PS: The imputation is really worrying me because I don't want to make any unnecessary or erroneous adjustments to the dataset. Also, when performing tests, will it include only the last imputed dataset? I've noticed that the frequency table counts all the data (x5). Sorry for the lengthy response/question, and thanks in advance!
@Gaskination 7 years ago
At that point then, I would recommend running the model with and without imputed missing values. See if the models differ much. If there is no difference, then just impute the data.
@ReemaAbuShaheen 6 years ago
Thanks for the great series. I learnt SEM through your amazing videos. I have a question: when I want to measure skewness and kurtosis, is it statistically right to calculate the average for each variable (which consists of several questions/items, e.g., the learning variable has learning1, learning2, learning3) and then calculate the skewness for the new averaged variable rather than for each item?
@Gaskination 6 years ago
Usually it is done at the item-level. However, if you are going to use averages for your causal analysis, then doing it at the averaged level is fine.
@mohammadhossain5729 8 years ago
Thanks, Gaskin sir, for the very amazing video. It would be very helpful if you could advise me: can I use a dataset designed for AHP (Analytical Hierarchy Process) in SEM? Under AHP, Saaty's 9-point scale is used instead of a Likert scale.
@Gaskination 8 years ago
I'm not familiar with AHP, but from the little I learned through an internet search, I don't think SEM is a good solution for AHP.
@mohammadhossain5729 8 years ago
Thank you, professor.
@mekdesatinaf8584 4 years ago
Thanks, Gaskin sir, for the very amazing video. Sir, I am new to SEM, so I really need to learn it from scratch. How can you help me? I need to do an analysis for my study, but I have no idea where I should start.
@Gaskination 4 years ago
One good place to start is with this SEM series and the StatWiki. If you want something more interactive and with assessments and more videos, you could try the online SEM course: kzbin.info/www/bejne/ZoTTiKZoiaiaj5o
@elkiza10 8 years ago
Sorry, I'm new to statistics. Do you have tutorials on how to perform validity and reliability tests in AMOS?
@Gaskination 8 years ago
You can search my channel and find many many videos about this topic. Here is the most recent one: kzbin.info/www/bejne/eXelhnh5j8yIq9E
@elkiza10 8 years ago
James Gaskin ok. Thanks doc. It helps me a lot. I use mediation variables
@Laurisa2012 8 years ago
Hi James, great videos. A massive help for my data analysis. Thanks so much. However, I have a problem. I replaced my missing data exactly as you explained, but when I then go to the Variable View, the values (e.g., female=1, male=2) are gone and instead it only says None. Do you know what the problem could be? And if the problem is very difficult to detect, is it OK to just enter the variable values again? I guess at the moment my numbers don't have a meaning to SPSS, do they? It doesn't even say 1=1, etc. I have mainly Likert scales. I would be very thankful for your help :) :) I also just tried it with one of your datasets, and after replacing, the values were None as well, so at least my dataset doesn't seem to be the problem.
@gaskinstories7726 8 years ago
When you impute new values, SPSS simply creates a new variable (even if it is replacing an existing one), so you need to recreate the value labels. Just click on None and then enter the values in the window that pops up. Another way to do it is to create a new variable that doesn't replace the existing one, and then copy the value labels from the old one to the new one.
@Gaskination 8 years ago
oops. that was me logged into a different channel.
@VenaMarissa 8 years ago
10:37 InfoAcq_3 still has 2 missing values
@Gaskination 8 years ago
+Vena Marissa Thanks for catching that. I didn't end up catching it until a couple videos later...
@tetlleyplus 7 years ago
and Age too, ID 31
@humming_strumming2310 5 years ago
Thank you so much, Sir Gaskin. Your videos are incredibly amazing, and the way you explain things is undoubtedly awesome. Sir, I also have one question: what are the various ways in which we can form factors in factor analysis? I am working on a dataset, and after many iterations, only 6 out of 7 became independent factors and one is still left. How can I fix this problem?
@Gaskination 5 years ago
You can constrain the factors to exactly the number you want. If using SPSS, just click on the extraction button (in the EFA window) and then at the bottom, choose to extract an exact number (rather than based on eigenvalues).
@humming_strumming2310 5 years ago
@@Gaskination Thank you, sir, for your reply. I chose this option as well. Since I have 7 variables, I set the value to 7, but one factor is spread widely across all the others.
@Gaskination 5 years ago
@@humming_strumming2310 This video might help: kzbin.info/www/bejne/pZbShaOOnrihmcU
@muradali6032 3 years ago
Please write in a comment the Excel formula used for the missing values.
@Gaskination 3 years ago
It's just the median of the column.
@ievamasevic7360 3 years ago
Hello! Is there a reference for skewness and kurtosis level of +/-3? :) Your videos are amazing! Wish I had you as my lecturer!
@Gaskination 3 years ago
I'm not totally sure, but it might be this one: Sposito, V. A., Hand, M. L., & Skarpness, B. (1983). On the efficiency of using the sample kurtosis in selecting optimal Lp estimators. Communications in Statistics - Simulation and Computation, 12(3), 265-272.
@ievamasevic7360 3 years ago
@@Gaskination Thank you so much!!!!!
@ievamasevic7360 3 years ago
@@Gaskination Hi James! I had a question regarding EFA. I have two factors that converge separately, but within the correlation matrix their correlation is higher than .700. Is it OK to assume this is a second-order construct with two dimensions? Theoretically it also makes sense. Thank you!
@ievamasevic7360 3 years ago
Also, is it OK to use a mixture of extraction methods within one EFA, for example starting with ML and finishing with PCA? Thank you! I just bought your SEM online course! Life changer!!!!
@Gaskination 3 years ago
@@ievamasevic7360 Since the EFA is purely exploratory, you can use whatever methods you would like to use in order to better understand your data. Glad you like the course!