Two-step Cluster Analysis in SPSS

Views: 205,102

Comments: 221

@Gaskination 4 years ago

Here's a fun pet project I've been working on: udreamed.com/. It is a dream analytics app. Here is the YouTube channel where we post a new video almost three times per week: kzbin.info/door/iujxblFduQz8V4xHjMzyzQ Also available on iOS: apps.apple.com/us/app/udreamed/id1054428074 And Android: play.google.com/store/apps/details?id=com.unconsciouscognitioninc.unconsciouscognition&hl=en Check it out! Thanks!

@duran099 10 years ago

Thank you! I enjoyed the back and forth of your problem shooting at the start for which variables to use. Made it more real, and gave some context from a theory perspective.

@blackchallice 12 years ago

WOW, you have explained cluster analysis very clearly. This is the first time I'm learning CA and I totally get it. Thank you!

@vide0gameCaster 9 years ago

Dude you don't understand how this vid helped me for my statistics exam. I aced my test thanks to you! You just gained a subscriber!

@lefkiospaik 11 years ago

Great presentation! Moreover the "not suitable" variables you chose in the beginning, really helped a lot to understand more on the cluster analysis. Thanks

@samirsarsamss 10 years ago

Many thanks dear James Gaskin for this helpful video, please go ahead with other different aspects or even tools.

@Gaskination 12 years ago

Funny you should ask! I was just considering doing this yesterday. I will probably do a K-means cluster, and also show how to segment the data and explore clusters for sub-populations. This is definitely on my to do list.

@TulioMaia 12 years ago

Thank you so much! I'm a starter on SPSS. I'm an R user, but I'm gonna start SPSS from now! Thanks again!

@krismatthews7550 9 years ago

You seriously just saved my Quantitative Analysis project :] THANK YOU!

@JohnParavantis 5 years ago

If I may, at 9:01 I would like to correct your reference to the boxplot: the middle line does indeed represent the median, but the left and right edges of the box lie at the first and third quartiles respectively. So, rather than representing one standard deviation below and above the mean, the box represents the middle 50% of the observations. Thank you very much for the video, very lucid explanation of swamping variables, still very useful in 2019!
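
To make the boxplot correction above concrete, here is a minimal Python sketch. The data are made up for illustration, and the "inclusive" quartile method is only one of several conventions, so SPSS's box edges may differ slightly on the same data.

```python
from statistics import mean, quantiles, stdev

# Hypothetical, right-skewed sample (not from the video's dataset).
values = [2, 4, 4, 5, 6, 7, 8, 9, 12, 30]

# Box edges are the first and third quartiles; the middle line is the median.
q1, med, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1  # width of the box, covering roughly the middle 50% of observations

# For comparison: one standard deviation either side of the mean.
lower = mean(values) - stdev(values)
upper = mean(values) + stdev(values)

print(f"box: {q1} to {q3} (median {med}, IQR {iqr})")
print(f"mean +/- 1 SD: {lower:.2f} to {upper:.2f}")
```

With a skewed sample like this one the two intervals differ noticeably, which is why the box should not be read as the mean plus or minus one standard deviation.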

@Gaskination 5 years ago

Thanks!

@vshapoval 11 years ago

I do not have questions, but I found your video extremely helpful, with a very good explanation. So I only wanted to say thank you. Your video was a great help. =)

@yuriveneziani8029 7 years ago

Amazing explanation... clear and direct! Thank you!

@Xirukah 12 years ago

You're a great guy!! I study SPSS in college at three levels: Introduction to Data Analysis, Univariate Data Analysis and Multivariate Data Analysis for the 3rd level. At the moment I'm on the 3rd and this process is really useful! Thank you!!

@ЕленаПономарёва-л5х6ъ 4 years ago

thank you for awesome explanation! wish you good luck! I've found all your videos very very very helpful

@Gaskination 11 years ago

You can certainly try k-means. It just depends on what your research intentions are. I actually prefer k-means over two-step. I just learned two-step first, so that's what I made the video for. I should probably make one for k-means sometime...

@Gaskination 11 years ago

Not a stupid question because I had to look up the answer :) The SPSS help manual says that the two-step cluster analysis assumes normally distributed data for all continuous variables, but that tests have shown it to be robust enough to handle non-normal data fairly well.

@Gaskination 11 years ago

Look at the sig value. If it is less than 0.05, then the groups are significantly different on that variable of comparison. If the cluster quality is poor, then you might try a three factor model. Not sure you can rely on the cluster groups when they are poor. This means that the membership assignment was inconsistent based on the indicators used for the clustering, e.g., sometimes males went into cluster 1, sometimes into cluster 2.

@ildilovasz2982 4 years ago

Thank you for this video, very clear and it helped me write my thesis.

@talhelmt 11 years ago

Thanks! I appreciate the time you put into making this.

@AlbertGavino 9 years ago

Great simple video on 2-step clustering (great for categorical or binary variables) with some continuous variables. But I like 2-step since it creates its own clusters, which I don't have to specify (unlike in K-means).

@MarcRodrigues10 6 years ago

Thank you! This video helped me a lot, especially with the results analysis.

@Gaskination 11 years ago

That is what I meant, but those are undesirable sample sizes. You might also look at indicator importance to see if one variable is swamping out the others. If so, you might consider removing it. Or you can try K-means clustering... I haven't made any video for that yet...

@mxm001 9 years ago

Thank you SO much, James. This was very helpful.

@Gaskination 12 years ago

Did you double click it? You have to double click it to make it show up.

@TheCopginger 12 years ago

That's great indeed! Well, I also have some ideas on how you could make it better from a learner's point of view: 1. Explaining why to use a specific methodology for clustering. 2. Building it up from basic to advanced methodology. 3. Probably using data across industries/sectors. I don't know how much time you have to spend on these or whether you would want to; however, I can provide you data which will enhance the quality of your analysis (and of course your self-marketing value).

@snailbby6664 7 years ago

"These are the ones you'll probably punish by making them managers" 😂

@adidash3247 11 years ago

At 11:18, the column "TSC_5282", what does the number represent?

@Gaskination 11 years ago

It represents the cluster (group) that record belongs to.

@adidash3247 11 years ago

Thank You...

@koenovisch 11 years ago

James, do you know a video in which the IPA (importance/performance analysis) is being explained? Have you made such a video?

@tomh3675 10 years ago

Thanks for the video, do you have an example of doing a cluster analysis as a way of illustrating factor analysis/factor scores?

@Gaskination 10 years ago

No, but I do have several videos about how to do factor analysis and extract factor scores.

@emindeger. 4 years ago

Hi thank you very much for this video series. I have a question, I would appreciate it if you answer. Do we need to normalize the data in spss?

@tekonen 10 years ago

Thanks for sharing your knowledge!

@koenovisch 11 years ago

Thank you for your reaction! I will continue looking for it!

@petradubajovamarinakova9268 10 years ago

Your video helped me. Thank you very much :)

@azianwacko 9 years ago

Hello James, can you explain evaluation fields and whether something like a scale of mental health would go in there?

@Gaskination 9 years ago

+Thomas Chan Evaluation fields are used to see differences in evaluation variables based on cluster membership. It is sort of like doing an ANOVA on those variables, using the cluster membership as the factoring variable. The evaluation variables will not be used to determine cluster membership.
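
The "sort of like doing an ANOVA" idea above can be sketched outside SPSS. The following is a minimal pure-Python illustration, assuming made-up rows of (cluster membership, evaluation score); the function name `one_way_f` and the data are hypothetical, and this is not a reproduction of what SPSS computes internally.

```python
from collections import defaultdict
from statistics import mean


def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of numbers."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical rows: (saved cluster membership, evaluation variable score).
rows = [(1, 10), (1, 12), (1, 11), (2, 20), (2, 22), (2, 21)]

# Group the evaluation variable by cluster; it plays no part in forming clusters.
by_cluster = defaultdict(list)
for cluster, score in rows:
    by_cluster[cluster].append(score)

cluster_means = {c: mean(scores) for c, scores in by_cluster.items()}
f_stat = one_way_f(list(by_cluster.values()))
print(cluster_means, f_stat)
```

A large F relative to its critical value suggests the clusters differ on the evaluation variable, even though that variable was never used to assign membership.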

@sureshpatel3992 3 years ago

Hello James, can Two-step Cluster Analysis handle mixed variable type? Eg. some variables that are output of factor analysis (that will have negative values too), and some binary variables?

@Gaskination 3 years ago

Yes. The two-step method can handle all types of variables. The only thing you need to watch out for is highly skewed or kurtotic variables, or discrete (categorical/nominal) variables without adequate representation from each group/category.

@sureshpatel3992 3 years ago

@@Gaskination thanks so much for your reply, this would really help!

@spss-for-research6518 9 years ago

I have a dumb problem and I wonder if someone could help me. The SPSS shows the cluster comparisons only for the inputs, but NOT for the descriptive variables. It just shows a message: "the cluster comparison view encountered a problem and cannot display correctly" or something like that. Why? I can't figure out.

@Gaskination 9 years ago

spss-for-research I'm not sure. It may have something to do with the variables included. Try removing one variable at a time to see if you can identify which one is causing the problem. If it isn't that, then it may be a conflict in one of the libraries being utilized to run the analysis. If that is the case, then you might need to reinstall SPSS, or you might need to update your java or .NET version (not sure which one SPSS uses).

@Thanh-ThaoTPham 7 years ago

Hi James, thanks for your valuable sharing. However, is there any source for the acceptable size of smallest cluster and threshold of ratio of sizes? Thanks in advance.

@Gaskination 7 years ago

I'm not sure. I'm really not an expert on cluster analysis. Those numbers just "feel" right, which I realize is not very scientific of me. I guess they feel right because they are practically useful - i.e., clusters of those sizes are usable in subsequent analyses and cluster ratios of that proportion break the data up into roughly equivalent groups.

@Thanh-ThaoTPham 7 years ago

Thanks so much for your reply. Anw, I really love your tutorial series ^^

@nihonbunka 7 years ago

Is it possible to analyse cluster NOT around central concepts like intelligence or years on the job but upon family relationship (binary relationship closeness in a network with the absence of commonalities, as is the case in real families).

@Gaskination 7 years ago

That's an interesting idea, but I don't know how to do it in a two-step. You might be able to do it with multiple alignment algorithms, but I'm not sure if SPSS has those...

@nihonbunka 7 years ago

Thank you very much indeed. I have found a partial solution in the software here socnetv.org/downloads which has a network analysis network community detection algorithm which can be used on the correlation matrix produced by SPSS factor analysis. Others have had the idea before journals.plos.org/plosone/article?id=10.1371/journal.pone.0051558 using a different community detection algorithm Full statement of problem and partial solution www.talkstats.com/showthread.php/69145-Family-Relationship-version-of-Factor-analysis-for-Japanese-Groups?p=199672&highlight=#post199672

@Gaskination 7 years ago

cool! Thanks!

@Zopzuita 12 years ago

I can't double-click since the model viewer doesn't show up at all. It writes the clusters in the column but that's it, even though I activated the option... Any ideas what could be wrong? Thanks a lot in advance!

@chrisnahm 7 years ago

Really enjoyed this and was very helpful. Thank you!

@mcole6234 11 years ago

James, very informative. You mention the need for over 30 in the smallest cluster and between 2 and 3 for the largest-to-smallest ratio. I am doing a PhD and wondered where these numbers came from. Do you have an academic reference (or references) I could cite? Also, at the end of the video you ran an ANOVA from the newly formed variables in SPSS. I ran different analyses and never had more than 4 clusters, but there were 5 new variables, all with uninformative names. How do I know which ones to use?

@azianwacko 8 years ago

Hello again James, can you explain how the analysis actually creates the clusters? I've tried using it for categorical variables and I'm not fully understanding just how it determines the clusters. Thank you

@Gaskination 8 years ago

Here are some resources to help you understand 2 step cluster analysis better: 1. www.ibm.com/support/knowledgecenter/SSLVMB_21.0.0/com.ibm.spss.statistics.help/idh_twostep_main.htm 2. www.spss.ch/upload/1122644952_The%20SPSS%20TwoStep%20Cluster%20Component.pdf 3. www.ryerson.ca/~rmichon/mkt700/SPSS/TwoStep%20Cluster%20Analysis.htm 4. kzbin.info/www/bejne/aH3dY5WLYth1faM

@polisherci 8 years ago

Hey, can you run a regression clustered by a certain variable on SPSS? like the regress ... cluster (.. ) command in stata?

@Gaskination 8 years ago

I'm not sure. I haven't used STATA much. You can run a cluster analysis, and then use those clusters as grouping variables when running regressions.

@Gaskination 12 years ago

Thanks for the ideas. I just do these when the need arises or when I have the time. I'll probably have some time to do a couple next week. I have some data that has grouping variables, so no need to send me yours. Thank you though.

@TulioMaia 12 years ago

About the database you've used: where did you get it? Is it in the program itself?

@hem135 6 years ago

Hi James - This video is very helpful, thank you! Within the model viewer, I can see the average silhouette statistic for the cluster result. My understanding is this number is the average fit across item in the cluster. Is there a way to find the silhouette data for each item separately? For context, I'm using cluster analysis to identify exemplar scenarios for different types of behavior. I'm clustering scenarios based on participant ratings (e.g., this scenario represents X behavior, yes/no). I'd like to compare fit across a few different types of participant groups using an ANOVA of the silhouettes for each item. Thanks in advance!

@Gaskination 6 years ago

If there is a way, I'm not sure how to do it.
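
For readers with the same question, one possible workaround is to export the clustering variables and the saved cluster membership and compute a silhouette value per row outside SPSS. The sketch below assumes continuous inputs and plain Euclidean distance; the function name `silhouette_per_item` and the sample data are hypothetical, and the resulting values will not necessarily match the average silhouette shown in the SPSS model viewer, which may be computed differently.

```python
from math import dist
from statistics import mean


def silhouette_per_item(points, labels):
    """Return one silhouette value per row (Euclidean distance, >= 2 clusters)."""
    clusters = set(labels)
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Distances to the other members of this row's own cluster.
        same = [dist(p, q) for j, (q, lab) in enumerate(zip(points, labels))
                if lab == own and j != i]
        if not same:  # a singleton cluster gets a silhouette of 0 by convention
            scores.append(0.0)
            continue
        a = mean(same)  # cohesion: mean distance within the own cluster
        b = min(        # separation: mean distance to the nearest other cluster
            mean([dist(p, q) for q, lab in zip(points, labels) if lab == other])
            for other in clusters if other != own
        )
        scores.append((b - a) / max(a, b))
    return scores


# Hypothetical exported rows and their saved cluster membership.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
labels = [1, 1, 2, 2]
scores = silhouette_per_item(points, labels)
print(scores)
```

Values near 1 indicate a row that sits well inside its cluster; values near 0 or below indicate a row close to, or better matched by, a neighboring cluster.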

@zhexiongtao2167 11 years ago

really interesting and helpful! Hope you can also make one for K-Means

@DrMMRaziq 12 years ago

Dear do you have a tutorial of logistic regression? Would be great!

@DrMMRaziq 11 years ago

What if one of the item after applying post hoc shows a non significant p value e.g. you differentiate clusters on a variable, and then find that two of the clusters do not significantly differ on one item.

@brandonknettel545 11 years ago

Hi there, thanks for the informative video. I ran this analysis for my data in two different ways and each time I got a single-cluster solution. I'm assuming that that is an indication that my participants are homogenous on the variables being studied, but when I run ANOVAs I am getting significant group differences. Is my best bet to run a k-group cluster analysis and force a distinction?

@OPaixao13 8 years ago

Hi James How can I get the Cubic Criterion Values at different number of clusters under consideration?? I think it's also a good way to justify why X number of clusters instead of Y, right??

@Gaskination 8 years ago

I'm not sure. I've never heard of the cubic criterion. Best of luck to you.

@Gaskination 12 years ago

Glad to be helpful. Hope you'll subscribe and tell your friends. :)

@TheAce0 7 years ago

You mention that when having SPSS determine clusters automatically, Euclidean distance measurement is more appropriate but when specifying the number of clusters, Log-likelihood is preferred. Could you perhaps elaborate on why this is the case? Would you know any papers that go into a bit of detail about this?

@Gaskination 7 years ago

oooh, this has been a while. The literature I read at the time suggested these things, but I can't remember which articles and books I read, or what they had to say about it. Sorry about that. If cluster analysis was something I did more often, I would have a better answer for you. But I haven't done a cluster analysis again since making this video...

@TheAce0 7 years ago

Ah, okay, fair enough. I'm dealing with cluster analysis right now and need to figure out which parameters are appropriate and why :)

@DisconnectHack 9 years ago

Hi James, did you say "swarming variable" or "swapping variable"? I couldn't figure it out, and I have tried looking for definitions for both, only found "swapping variable" for computer science, were you talking about the same ?

@Gaskination 9 years ago

+DisconnectHack Swamping. I don't know what the technical term would be (or if there is one).

@DisconnectHack 9 years ago

+James Gaskin Thanks James, it appears there isn't one.

@rajeshpandit3634 9 years ago

Great video. I just want to check whether the variables you put both continuous and categorical, do you standardize them? Standardize I mean Z Normal variables as you are putting scale, binary, categorical variables together

@Gaskination 9 years ago

+Rajesh Pandit SPSS automatically standardizes all continuous variables when doing a 2-step cluster analysis. You can see this in the options area when doing the 2-step.
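
As an illustration of what standardizing a continuous variable means here, a minimal z-score sketch follows. The function name `z_scores` and the data are hypothetical, and using the sample standard deviation is an assumption; the thread does not say which form SPSS applies internally.

```python
from statistics import mean, stdev


def z_scores(values):
    """Rescale values to mean 0 and (sample) standard deviation 1."""
    m = mean(values)
    s = stdev(values)  # assumption: sample SD (n - 1 in the denominator)
    return [(v - m) / s for v in values]


# Hypothetical continuous variable, e.g. years on the job.
years = [1, 2, 3, 4, 5]
z = z_scores(years)
print(z)
```

After rescaling, variables measured on very different scales contribute comparably to the distance calculations instead of the widest-ranging one dominating.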

@cynthiagallagher75 8 years ago

Is there a video that provides more detail on interpreting the clusters themselves? It would be helpful to understand how the clusters are being selected and how the clusters are developed.

@Gaskination 8 years ago

The only other two-step cluster analysis video that I have is part of the Rosen College SEM Boot Camp: kzbin.info/www/bejne/aH3dY5WLYth1faM

@nassimfard867 9 years ago

Thanks for the videos. Can you please tell me if a set of data can be clustered by only one variable? And if yes, is two-step clustering or k-means clustering more appropriate? I want to categorize a set of data into three groups based on one variable, and I don't know how to define the cut-off or range for each category. I would be glad if you can help me.

@Gaskination 9 years ago

Nassim Fard If it is just one variable, then clustering algorithms won't help. If the variable is categorical, then just group them based on the category values. For example, if the variable is religion, then group them by which religion they affiliate with. If the variable is continuous or ordinal, then make logical cutoff points into low, med, high.
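
One way to turn the low/med/high suggestion above into cut-offs is to split at the terciles, sketched below. This is only one possible choice (theory-driven cut-offs may be more defensible), and the function name `three_groups` and the data are hypothetical.

```python
from statistics import quantiles


def three_groups(values):
    """Label each value low/med/high using tercile cut points."""
    low_cut, high_cut = quantiles(values, n=3, method="inclusive")
    return [
        "low" if v <= low_cut else "med" if v <= high_cut else "high"
        for v in values
    ]


# Hypothetical single continuous variable.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
groups = three_groups(scores)
print(list(zip(scores, groups)))
```

Tercile splits give roughly equal group sizes, which keeps later group comparisons balanced, but they ignore any natural breaks in the data.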

@thomasbulitta3817 9 years ago

Hi James, thank you for that video. It was very helpful. Do you know what actually happens "inside" SPSS when you run this two-step cluster analysis? Which forms of clustering are used? Single linkage and hierarchical cluster analysis?

@Gaskination 9 years ago

+Thomas Bulitta It performs a hierarchical and a non-hierarchical step. I'm not sure which specific algorithms, but I bet the SPSS manual says.

@joseedupont2409 10 years ago

Very helpful! What version of SPSS are you using?

@Gaskination 10 years ago

Probably v20 or 21 in this video. Maybe 19...

@sugun1993 6 years ago

Thank you for the quick tutorial. I am performing two-step clustering on data from a recent study but want to somehow fit this new data into the clusters generated from past data. Kind of like supervised learning, but unfortunately neither the coefficients of the model of the past data nor the data itself are available. Is there a way to solve this or is this case hopeless? P.S. To get the project done in time, without access to any tools, I tried to put the new records into clusters manually, respecting the features/characteristics of the previously generated clusters. Since time is my major constraint and the data is just 40 new entries, I have already done it (could you give me some idea about my options to justify the job done this way?). But I am just curious to know the right way.

@Gaskination 6 years ago

If the new data is using the exact same variables as the original data, then you can simply add the new rows to the dataset and re-run the cluster analysis. That is the easiest way. If the new data is not using the same variables, then there is no statistical way to cluster them along the same lines.

@gs19921 8 years ago

Thank you for this video. I have done 4 different k-means clusterings and I need a method that chooses the best cluster analysis. Can I do it with two-step or another method?

@Gaskination 8 years ago

+gs19921 Two step will provide a "fit" measure to let you know if the clustering solution was good. You can also examine the AIC (try to minimize it).

@TheCopginger 12 years ago

Thanks Mr. Gaskination! would you also show much more complicated (both in terms of data and procedure) segmentation.

@DaDonnyZhang 10 years ago

Great video! Thank you so much!

@Zopzuita 12 years ago

Great video! I only have a problem with the model viewer - it doesn't show up. The results are written in the column in my table but the output misses the interactive graphics. Does anybody else have the same problem? Any ideas how to fix this? Thanks!!!

@Gaskination 11 years ago

1. No references come to mind. When you run comparisons later on between clusters, if one cluster is much larger than another, then this will affect the critical ratio (t, f, or z statistic) since critical ratios are sensitive to sample size. Thus, working with similar sizes is ideal when making comparisons. 2. SPSS makes n+1 groups, where the extra 1 is those who did not fit in anywhere else. To figure out which clusters are which, look at the cluster output number in the output window.

@MrNicks86 10 years ago

Thanks for the great video - very useful! I was just wondering if you could explain (in a nutshell) the difference between this Two-Step cluster analysis and k-means? Thanks

@Gaskination 10 years ago

The main difference is that two-step allows you to distinguish between categorical and continuous variables, and it processes them differently. Whereas k-means just treats them all the same. So, if you have categorical variables, two-step would be a more accurate clustering.

@MrNicks86 10 years ago

Thanks for your reply. So with continuous data like domestic energy use, would k means be more appropriate? And is it right to say that k means treats each variable as independent to the next, which in the case of domestic energy use is not quite the case? Many thanks again!

@Gaskination 10 years ago

Nicholas Samson Unfortunately, I'm not an expert in cluster analyses. So your question surpasses my immediate knowledge. I would just have to look it up. I know that there are some good documents and articles that discuss the differences between two-step and k-means. I just googled it. Best of luck to you.

@MrNicks86 10 years ago

Thanks James!

@Gaskination 12 years ago

I don't yet, but people keep asking for one, so I should probably do one.

@educationalconsultant9880 4 years ago

Can I use cluster analysis for stepwise classification, e.g. first classify into asymptomatic and symptomatic, then classify within asymptomatic in terms of symptoms?

@Gaskination 4 years ago

I think it should be possible. You could do the classification and save the cluster membership number. Then, filter the dataset so that only the rows that are part of asymptomatic clusters remain. Then, cluster again to see if they cluster by symptom. Another route would be to just use evaluation variables in the two-step clustering. These variables aren't used to determine membership in clusters, but each cluster is evaluated post hoc by symptoms.

@educationalconsultant9880 4 years ago

@@Gaskination Thank you very much for your reply

@Jemoeder86 10 years ago

Very informative! Thanks

@harsin009 6 years ago

Can these profiles really be used as a moderator in SEM analysis? Because I thought SEM only uses continuous variables since it analyzes relationship between multiple variables through regression analysis. For a while, I thought you were referring to Hierarchical Regression Analysis. Thank you!

@Gaskination 6 years ago

It can be used as a multigroup moderator for multigroup analysis, which is a form of moderation.

@mldsg72 9 years ago

James, nice job, very well done! Do you mind to make a little comment about AIC and BIC on 2-step cluster?

@Gaskination 9 years ago

Marcelo Gabriel I was not aware you could generate AIC and BIC in SPSS during a 2-step cluster analysis. I've gone back to fiddle with it, but I can't figure out if it is possible.

@mldsg72 9 years ago

James, thanks for your reply. At least on versions 20 and 22, you must check the "Clustering Criterion" by choosing BIC or AIC. I'm more inclined to consider AIC than BIC due to its characteristics. Your comment would be nice. Regards

@Gaskination 9 years ago

Marcelo Gabriel Thanks for pointing me to that. I played with it and looked into it and it appears that the results are often the same (with my data), but that in general, AIC is preferred to BIC. Here is an informative explanation of why as well as some useful references: en.wikipedia.org/wiki/Akaike_information_criterion#Comparison_with_BIC
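
For readers comparing the two criteria discussed above, the standard textbook definitions are given below, where k is the number of estimated parameters, n the number of observations and L-hat the maximized likelihood of the model. How SPSS counts parameters for a two-step solution is not stated in this thread, so treat this as general background rather than a description of its internals.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L},
\qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}
```

Lower is better for both. The BIC penalty per parameter, ln n, exceeds the AIC penalty of 2 once n is 8 or more, so BIC tends to favor fewer clusters on all but very small samples.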

@louizekahina9985 10 years ago

Thank you for your video. I have a problem with my statistics. I ran two factor analyses for two questions in my questionnaire. After that I ran a cluster analysis with the factor scores of the second factor analysis. The first time I got 2 clusters, but I saw that in the cluster membership column, which is automatically created by the system, there was (-1) as a cluster membership. I didn't understand why. I ran the cluster analysis another time, but this time I deleted the scores from the first factor analysis and kept just the factor scores from the second analysis that I needed for the cluster analysis. This time I got 3 clusters. My question: is there any relation between the two factor analyses I did before? In my clustering I just use the scores from one analysis. Why did I get different results if the scores were just variables with no interaction between them?

@Gaskination 10 years ago

The -1 means it didn't fit in any cluster. I don't understand your other questions.

@louizekahina9985 10 years ago

James Gaskin Thank you for your answer. For the other questions, I found out why I had different results. I want to know if it's possible to explain more about clusters generated by using factor scores and not the variables in our variable list. Thank you

@louizekahina9985 10 years ago

Hello, if I have a ratio of sizes of 3.05, can I keep the 3 clusters I got, or are the cluster sizes not well balanced because this ratio is greater than 3? Thank you

@Gaskination 10 years ago

Louize Kahina That ratio is fine. Also, as for using factor scores in cluster analysis, this is fine because the factor scores are just weighted averages based on the factor loadings. So, this is totally fine and requires no special interpretation.
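
The ratio of sizes discussed above (largest cluster divided by smallest) can be checked by hand from the saved membership column. The sketch below uses made-up membership values and assumes that rows labeled -1 (cases that did not fit any cluster) are left out of the ratio; whether SPSS excludes them in its own summary is an assumption here, and the function name `ratio_of_sizes` is hypothetical.

```python
from collections import Counter


def ratio_of_sizes(membership, outlier_label=-1):
    """Return largest cluster size divided by smallest, ignoring outliers."""
    sizes = Counter(m for m in membership if m != outlier_label)
    return max(sizes.values()) / min(sizes.values())


# Hypothetical saved membership column: three clusters plus four unassigned rows.
membership = [1] * 61 + [2] * 20 + [3] * 30 + [-1] * 4
ratio = ratio_of_sizes(membership)
print(round(ratio, 2))
```

The same count also shows whether the smallest cluster clears whatever minimum size is needed for later comparisons.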

@louizekahina9985 10 years ago

James Gaskin Thank you. I was asking about the interpretation of the clusters with regard to the two factor scores I used. I didn't understand what exactly the medians for each factor mean. Are these the clusters' centers?

@TheCopginger 12 years ago

By the way, I was performing cluster analysis based on your video. However, I have few questions to ask you 1. Is it possible to assign weightage to individual record while performing segmentation? 2. If there is already weightage available for individual record (based on other criterion) how to make use of that in the segmentation process?

@JessicaRodrigues-wz3xo 7 years ago

Hi! How can I choose variables that are significant to use on it? There´s a statistical test to help? I have a lot of variables and I wanna know how I should choose them, if it has a criteria.

@Gaskination 7 years ago

Usually it is chosen theoretically, rather than statistically.

@JessicaRodrigues-wz3xo 7 years ago

Thank you for responding! I have several variables to draw a social and demographic profile of my population. Theoretically all these variables are important, but when I do the analysis with all of them, the results are not good. In other versions of SPSS there was a cut in those variables, a critical value, but I do not know how to identify this in SPSS 22. Can you help me, please?

@Gaskination 7 years ago

Jéssica Rodrigues you can look at the cluster quality or at the variable importance graph. These will give you indications of the overall value of the variables for clustering into groups.

@shahzadfarid6446 7 years ago

Sir, please upload detailed lectures on optimal scaling in SPSS (i.e. MCA, CATPCA and non-linear canonical correlation). These lectures are not available on YouTube. I searched in your channel, with the hope ..., but unfortunately ....

@Gaskination 7 years ago

I have never done those, so I cannot make videos on them. Any time I learn a new analysis, I make a video for it. If I ever have occasion to do these, I'll make videos for them. Best of luck to you.

@sticky924 10 years ago

Thank you for this video, it is very helpful

@ΑλεξάνδραΘεοφίλη-ο3ε 6 years ago

HELP! I am using the spss v.17 and I don't get the model index ... what is going wrong?

@Gaskination 6 years ago

I'm not sure what you mean by model index. Do you mean you are not getting the silhouette index? I'm not sure what might be causing that either way though... Sorry about that.

@Gaskination 11 years ago

I have not. Best of luck. But, basically it is like an R-squared analysis. It shows how much of the variance is being explained by each indicator.

@tayeenulhoque1637 10 years ago

Can you please explain or suggest which cluster analysis should be applied to Likert-scale ordinal data? Is it K-means, hierarchical or two-step? Is it necessary to conduct CATPCA (categorical principal component analysis) prior to starting the cluster analysis, and can you please tell me how to proceed with the cluster analysis after CATPCA? I have four exogenous variables which contain 20 items.

@Gaskination 10 years ago

Usually we would use factor analysis for this kind of data. However, if you want to do a cluster, then I would do the EFA first and generate factor scores for each construct. Then use these factor scores in a cluster analysis. 2-step or k-means each offer slightly different features and analyses, so you could try both.

@tayeenulhoque1637 10 years ago

Thank you Mr. James.. i really appreciate your valuable comments

@yifanli4312 5 years ago

Thank you! This video is very helpful!

@Jbrandalise 6 years ago

Hi James, I'm doing a two-step cluster analysis and my ratio of sizes was nearly 18.0. Is there something in the literature talking about this? Thank you so much!

@Gaskination 6 years ago

I can't remember which literature talks about it. The concern is that you just want to make sure you have adequate representation from each group.

@Jbrandalise 6 years ago

James Gaskin I have 20 observations and there is one cluster with 19 and another with just 1. Okay, it is China in this one, and my theme is international competitiveness. I think it's fine, but is there some test that I can do to make sure that the clustering is good? Thank you again!

@Gaskination 6 years ago

If you only have 20 responses, and all but 1 are part of a single cluster, then this is not a good cluster solution. You might try removing the nationality variable to see if that fixes the clustering.

@Jbrandalise 6 years ago

James Gaskin I did this and resolved it. Thank you so much!

@roxy629 9 years ago

Awesome! So clear and informational :) James, what would be the major differences between cluster analysis and factor analysis? Is it the profiling aspect? Can CA do things that FA cannot? Thanks again!

@Gaskination 9 years ago

roxy629 Cluster analysis clusters rows. Factor analysis "clusters" columns.

@roxy629 9 years ago

James Gaskin ahhh!!! that's why it's called "profiling" makes so much sense thanks james :)

@arieprabowo4675 7 years ago

do u have installer for spss 13? two step cluster only can be operate in spss version 13 i guess. thx before

@Gaskination 7 years ago

13? That is very old. SPSS is now on version 24. My version 24 runs the two step just fine. I don't have an installer though, as I'm not a licensed distributor.

@kamalpreetrakhra8071 4 years ago

Can you recommend any books for two-step cluster analysis?

@Gaskination 4 years ago

I think I learned it from Hair et al 2010 Multivariate Data Analysis, but I'm not sure. It is not my primary methodology, so I'm not too familiar with the literature around it.

@kamalpreetrakhra8071 4 years ago

@@Gaskination thanks.

@kaykums5350 10 years ago

very helpful and easy to understand. Can I use a multiple non-unique ID for Data Mining?

@Gaskination 10 years ago

I don't think I understand your question. Do you mean you want to do datamining on a dataset that has multiple IDs that are the same? If so, then no, you should combine those rows that have the same ID (if they are actually the same case and not a different one with just a duplicate ID), or create unique IDs for unique cases.

@kaykums5350 10 жыл бұрын

James Gaskin Thanks, James, for the prompt response. That answers my question. Can I send you my sample data so you can guide me further?

@kaykums5350 10 жыл бұрын

Kayode Kumapayi They are not exactly the same case. Some duplicates have different field records from the order in a given ID.

@Gaskination 10 жыл бұрын

Kayode Kumapayi If you have issues you simply cannot resolve, I might be able to guide you a bit. I receive dozens of requests per day though, so please only email me if you are really stuck. Thanks!

@alfonspriessner8556 8 жыл бұрын

Hi James! Very helpful video - you saved me a lot of time. :-) I have two additional questions, and it would be great if you could help me. I am sure you are the expert who can help me! 1) Let's assume SPSS proposes 3 clusters based on a set of variables. Which statistical tests does it use in the background to select 3 clusters instead of 2 or 4? I read in some papers that the likelihood-ratio statistic (L2) and its p-value, the Bayesian Information Criterion (BIC), and the number of parameters (Npar) could be examples of these tests (there are surely others). If SPSS runs some of these tests in the background, is there a way to produce an output chart of these statistics in SPSS? In other words, since SPSS tells me 3 clusters, I would like to show why 3 clusters and not 4, based on a few statistical tests. 2) Let's assume we still have the 3 clusters from question 1, which were created from a set of variables, but I have another variable (e.g., age) that I did not use in the cluster analysis. How (if SPSS has such an option) can I calculate the mean of age for each of the 3 identified clusters and show it in an output table (ideally for more than one additional variable)? I hope my questions are clear. I would appreciate your help and guidance! Thanks a lot in advance! Regards, Alfons

@Gaskination 8 жыл бұрын

1. SPSS lets you choose the AIC or the BIC as the clustering criterion, or you can use the silhouette measure shown in the output. The silhouette is considered fairly robust. You can also force it to 2 or 4 clusters to see what the silhouette score is for those. 2. Watch this video at the 2:16 mark. It shows how to do this using the Output button.
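
To illustrate comparing silhouette scores across forced cluster counts, here is a minimal Python sketch of the standard silhouette coefficient. The points and label assignments are made up, and SPSS's own silhouette measure may be computed differently in detail, so treat this only as an illustration of the idea.

```python
from math import dist

def silhouette(points, labels):
    """Mean silhouette coefficient; assumes at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            # Convention: a single-member cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items() if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical 2-D data with three visibly separate groups.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
    (9.0, 1.0), (9.2, 1.1), (8.8, 0.9),
]
three_clusters = [0, 0, 0, 1, 1, 1, 2, 2, 2]
two_clusters = [0, 0, 0, 0, 0, 0, 1, 1, 1]  # first two groups forced together

score_3 = silhouette(points, three_clusters)
score_2 = silhouette(points, two_clusters)
```

Here the three-cluster labeling scores higher than the forced two-cluster labeling, which is the kind of comparison that can back up a statement like "3 clusters rather than 2 or 4".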

@jameszanzarelli9255 6 жыл бұрын

is it possible to score new customers using an existing clustering model?

@Gaskination 6 жыл бұрын

If you mean to assign them to an existing cluster, yes. You can do this with Multiple Discriminant Analysis. Here is a video on it: kzbin.info/www/bejne/bWHZeIKaetuMl68
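
As a rough illustration of scoring new cases against existing clusters, the Python sketch below assigns each new customer to the nearest cluster centroid. This is a simplified stand-in, not Multiple Discriminant Analysis, and the centroids and customer values are hypothetical. It also assumes the variables are on comparable scales.

```python
from math import dist

# Hypothetical centroids: the mean of each clustering variable per cluster.
centroids = {
    1: (2.0, 3.0),
    2: (8.0, 7.5),
    3: (5.0, 1.0),
}

def assign_cluster(case, centroids):
    """Return the label of the centroid closest to the case."""
    return min(centroids, key=lambda label: dist(case, centroids[label]))

# New customers measured on the same two variables.
new_customers = [(2.5, 2.8), (7.0, 8.0), (5.5, 0.5)]
assignments = [assign_cluster(c, centroids) for c in new_customers]
```

A discriminant analysis would instead estimate classification functions from the already clustered cases, which also accounts for the spread and correlation of the variables rather than distance alone.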

@jameszanzarelli9255 6 жыл бұрын

Thanks James! :)

@wassdepp1 8 жыл бұрын

Thank you, It made my day

@Byzantic 11 жыл бұрын

I get 'predictor importance' instead of 'variable importance'. Is there a difference?

@Gaskination 11 жыл бұрын

No difference

@olofreichenberg6885 12 жыл бұрын

Very helpful!

@arieprabowo4675 8 жыл бұрын

Where can I get the data used in your video? Thanks in advance :)

@Gaskination 8 жыл бұрын

The sohana data is on the homepage of the statwiki.

@juliaworldwide 8 жыл бұрын

Thank you very much for that !

@TheAnthology09 12 жыл бұрын

Thank you very much for the video. I have a specific question about using cluster analysis in my data. Can I contact you via email?

@Gaskination 12 жыл бұрын

Oh. That's bizarre... I'm not sure. I would google it, or email IBM.

@isabellalapenna8118 5 жыл бұрын

how do i find out which participants are in each cluster?

@Gaskination 5 жыл бұрын

Check the video at 2:39, and then at 10:18.
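
Assuming a cluster membership value has been saved next to each case, listing the participants in each cluster is a simple grouping step. The Python sketch below uses made-up participant IDs and cluster numbers.

```python
from collections import defaultdict

# Hypothetical cases, each with a saved cluster membership number.
cases = [
    {"participant": "P01", "cluster": 1},
    {"participant": "P02", "cluster": 2},
    {"participant": "P03", "cluster": 1},
    {"participant": "P04", "cluster": 3},
    {"participant": "P05", "cluster": 2},
]

# Group participant IDs by their cluster number.
members = defaultdict(list)
for case in cases:
    members[case["cluster"]].append(case["participant"])

# Cluster sizes are a quick check for lopsided solutions.
cluster_sizes = {cluster: len(ids) for cluster, ids in members.items()}
```

The same grouping also gives the cluster sizes, which helps spot a solution where nearly every case lands in one cluster.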

@isabellalapenna8118 5 жыл бұрын

thank you @@Gaskination

@ntaalya 10 жыл бұрын

Thank You very much!

@medosman23 10 жыл бұрын

great video thank you

@DrMMRaziq 12 жыл бұрын

Good tutorial

@123canuckfan 11 жыл бұрын

God I wish you were my stats teacher!

@aboalinoona9211 10 жыл бұрын

Thanks for this video. Would you please upload this example file?

@Gaskination 10 жыл бұрын

It is available on the homepage of my wiki: statwiki.kolobkreations.com It's the Sohana dataset.

@aboalinoona9211 10 жыл бұрын

James Gaskin thanks