Validating K-means cluster analysis in SPSS

48,818 views


Comments

@Ana-zi4mk 8 years ago

Hi, James. Thank you for this video. I also watched your other video on K-means cluster analysis in SPSS, where you mentioned: "If we can't converge in 10 iterations, then we probably don't have good data for clustering." I am trying to learn how to do cluster analysis, and I am using some of my own data. I have followed your suggestions on how to determine the number of clusters and how to validate them. I ran k-means cluster analysis specifying 2, 3, 4, and 5 clusters. For the 3-cluster solution, the post hoc tests in the Multiple Comparisons table showed significant differences, but it took 14 iterations for the change in cluster centers to reach 0.000 for all three clusters. For the 4-cluster solution, it took 10 iterations to reach 0.000 for all four clusters, but in the Multiple Comparisons table two clusters were not significantly different on a few variables. In your opinion, is my data not suitable for cluster analysis?

@Gaskination 8 years ago

+Ana It might be suitable. The more variables you include, the harder it is to converge. So, if there are lots of variables, then more than 10 iterations is fine. I don't know if there is a published threshold or guideline.
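To make "converging" concrete: k-means alternates between assigning each case to its nearest center and moving each center to the mean of its cases, and it has converged once the centers stop moving. The minimal sketch below is plain Lloyd's k-means in Python, assuming numeric variables and Euclidean distance. It counts iterations until the largest change in any center is exactly 0, roughly analogous to the "change in cluster centers" values in SPSS's iteration history. SPSS chooses its initial centers differently, so the iteration counts here will not match SPSS output; the function name `kmeans` and the `max_iter=10` default (echoing the 10-iteration rule of thumb quoted above) are illustrative choices.

```python
import math
import random


def kmeans(points, k, max_iter=10, seed=0):
    """Plain Lloyd's k-means (illustrative sketch, not SPSS's QUICK CLUSTER).

    Returns (centers, labels, iterations_used, converged).
    """
    rng = random.Random(seed)
    # Hypothetical initialization: k randomly chosen cases as starting centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for iteration in range(1, max_iter + 1):
        # Assignment step: each case joins its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's old center
        # Largest distance any center moved in this iteration.
        change = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if change == 0.0:
            return centers, labels, iteration, True
    return centers, labels, max_iter, False
```

With many variables or overlapping groups, the centers keep shifting slightly for more iterations, which is consistent with the reply above that more variables make convergence harder.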

@eboamuah6811 3 years ago

@@Gaskination Hi James. Your work has been very helpful. I have read about the silhouette as a method of validation in k-means cluster analysis. However, I don't know how to obtain it in SPSS. Is there any index in SPSS that can be used to validate the number of clusters chosen in k-means cluster analysis? Thank you

@Gaskination 3 years ago

@@eboamuah6811 Silhouette is used in two-step cluster analysis in SPSS, but I don't know of a way to produce it for K-means.
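Outside SPSS, the silhouette can be computed for any k-means solution once the cluster memberships are exported (the K-Means dialog's Save option can write cluster membership as a new variable). For each case, a is its mean distance to the other members of its own cluster, b is the smallest mean distance from it to the members of any other cluster, and s = (b - a) / max(a, b). Averaging s over all cases gives a value between -1 and 1, where values near 1 suggest well-separated clusters. The pure-Python sketch below assumes numeric variables, Euclidean distance, and at least two clusters; scoring singleton clusters as 0 is a common convention, not something SPSS documents for K-means.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette width over all cases (sketch; assumes >= 2 clusters)."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        own = labels[i]
        if labels.count(own) == 1:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # Mean distance from case i to the members of each cluster (excluding itself).
        mean_dist = {}
        for c in clusters:
            dists = [math.dist(p, q) for j, q in enumerate(points)
                     if labels[j] == c and j != i]
            mean_dist[c] = sum(dists) / len(dists)
        a = mean_dist[own]  # cohesion: own cluster
        b = min(mean_dist[c] for c in clusters if c != own)  # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

Comparing the average silhouette across the 2-, 3-, 4-, and 5-cluster solutions is one way to support the choice of k alongside the checks shown in the video.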

@justagirl713 2 years ago

Thank you for this! Nice last name!

@zhalehmohammadalipour3542 3 years ago

Great tutorial! It helped a lot. Thanks.

@jdemontre 4 years ago

Hey James, I enjoy your videos, especially the ones about SEM and now cluster analysis. Thank you! I ran my data and everything went well (10 variables and ca. 100 observations). The 3-cluster solution was the best on all criteria. But the Bonferroni test was not significant in 2 (out of 60) comparisons (p-value slightly higher than 0.1). Does this mean the solution was not validated?

@Gaskination 4 years ago

If it is just 2 out of 60 comparisons, then this is strong evidence that it is a good clustering solution. Nice!
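For context on what the Bonferroni post hoc test does: it multiplies each unadjusted pairwise p-value by the number of comparisons being made (capping the result at 1) and then compares the adjusted value with alpha. In SPSS's one-way ANOVA output this adjustment is applied per variable across the pairs of clusters. The minimal sketch below assumes you already have the unadjusted pairwise p-values for one variable; the example values in the usage note are made up for illustration and do not come from this thread.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni-adjust a family of p-values (sketch).

    Returns (adjusted_p_values, significant_flags).
    """
    m = len(p_values)  # number of comparisons in the family
    adjusted = [min(1.0, p * m) for p in p_values]
    return adjusted, [p < alpha for p in adjusted]
```

For example, `bonferroni([0.001, 0.02, 0.04])` (three cluster pairs on one variable) keeps only the first comparison significant, because the other two adjusted values exceed 0.05. Counting the non-significant flags across all variables gives a tally like the "2 out of 60" above.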

@statsmadeeasy7233 1 year ago

Hi James, can we get a copy of the file that you used? I want to practice with it.

@Gaskination 1 year ago

It's the burgers dataset available on the homepage of statwiki.gaskination.com/

@kanika8123 4 years ago

Thanks a lot. Very helpful video.

@009kishor 6 years ago

Very helpful video 👍🏻

@kieramillar-brandt2854 3 years ago

Hi James, thanks for this video. Is there a paper that can be referenced to support that a lower number of iterations is better? Or maybe a paper that indicates best practice in general for reporting the results of k-means clustering? Many thanks. Kiera

@Gaskination 3 years ago

Chapter nine of Hair et al. (2010), "Multivariate Data Analysis," is all about clustering methods.

@kieramillar-brandt2854 3 years ago

@@Gaskination thanks very much. That's really appreciated. Your videos are great!

@marcelbeermann1036 4 years ago

Thanks for the video. How can I see if a cluster actually is underrepresented?

@Gaskination 4 years ago

It's just a subjective judgment. If the sample size of the cluster is small, then perhaps it is under-represented. You can see what the profile of members of that cluster looks like to determine if it is a legitimate cluster, or just an odd outlier.
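The two checks described here, cluster size and the profile of each cluster's members, can be tabulated together. The minimal sketch below assumes each case is a tuple of numeric variables and that cluster memberships have been saved alongside the data. The `min_share` cutoff is a hypothetical parameter: as the reply says, there is no fixed threshold, so the flag only marks clusters worth a closer look.

```python
from collections import Counter


def cluster_profiles(rows, labels, min_share=0.05):
    """Size, share of sample, small-cluster flag, and variable means per cluster.

    min_share is a hypothetical, user-chosen cutoff, not a published rule.
    """
    n = len(rows)
    sizes = Counter(labels)
    report = {}
    for c, size in sorted(sizes.items()):
        members = [r for r, lab in zip(rows, labels) if lab == c]
        # Profile: mean of each variable among the cluster's members.
        means = [sum(col) / size for col in zip(*members)]
        report[c] = {
            "size": size,
            "share": size / n,
            "small": size / n < min_share,  # candidate for closer inspection
            "means": means,
        }
    return report
```

A flagged cluster whose means look like an extreme version of a neighboring cluster may be an outlier group, while one with a distinct, interpretable profile may be a legitimate but small segment.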

@masharifulamin5682 4 years ago

Hello James, I'm new here. Is it possible to get the dataset to practice with? Please share it with us.

@Gaskination 4 years ago

The dataset is available on the homepage of statwiki: statwiki.kolobkreations.com/

@mayurgo10 7 years ago

My data contains 900 observations, and I tried the k-means method. It converges after 15 iterations for the 4-cluster solution and after 16 iterations for the 10-cluster solution. Can you suggest a good test to check which cluster solution is better?

@Gaskination 7 years ago

Check the AIC or BIC if that is an option. You want to minimize these. Also, check to see which solution is more helpful. Usually 3-5 clusters is most useful and anything more than 5 begins to be difficult to interpret or distinguish.
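SPSS's K-Means procedure does not report AIC or BIC (its TwoStep cluster procedure does). For a rough information-criterion comparison of k-means solutions, one heuristic sometimes used treats the clusters as spherical Gaussians with a shared variance and plugs the within-cluster sum of squares (SSE) into n·ln(SSE/n), plus a penalty of 2kd for AIC or kd·ln(n) for BIC, where n is the number of cases, d the number of variables, and k the number of clusters. This is an assumption-laden approximation, not the formula SPSS uses, and the function names below are illustrative. As noted above, lower values are better.

```python
import math


def within_ss(points, labels):
    """Total within-cluster sum of squared Euclidean distances (SSE)."""
    total = 0.0
    for c in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == c]
        center = [sum(col) / len(members) for col in zip(*members)]
        total += sum(math.dist(p, center) ** 2 for p in members)
    return total


def approx_aic_bic(points, labels):
    """Heuristic (AIC, BIC) for a k-means partition; assumes SSE > 0."""
    n, d, k = len(points), len(points[0]), len(set(labels))
    fit = n * math.log(within_ss(points, labels) / n)  # goodness-of-fit term
    aic = fit + 2 * k * d                              # AIC-style penalty
    bic = fit + k * d * math.log(n)                    # BIC-style penalty
    return aic, bic
```

Because the penalty grows with k, this kind of criterion can push back against solutions like 10 clusters when the extra clusters barely reduce SSE; interpretability, as the reply says, still matters.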

@henrypritchard4911 4 years ago

Hi James, This has been very helpful, so firstly thank you! I was wondering if there is a way to validate/find a statistical difference between two clusters, since post hoc tests after a one-way ANOVA cannot be performed on fewer than 3 groups/clusters of data? Kind Regards, Henry

@Gaskination 4 years ago

You can just use a t-test instead.
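With only two clusters, the one-way ANOVA itself still runs (only the post hoc tests are unavailable), and the pooled-variance independent-samples t-test ("equal variances assumed" in SPSS's output) gives the same answer: F equals t squared, with n1 + n2 - 2 degrees of freedom. The sketch below computes both from scratch with Python's `statistics` module to show that identity. It returns only the test statistics and degrees of freedom, so the p-value would still come from SPSS's Independent-Samples T Test or a t table; the function names are illustrative.

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Student's t for two independent groups with pooled variance; returns (t, df)."""
    nx, ny = len(x), len(y)
    # Pooled variance from the two sample variances (n - 1 denominators).
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2


def one_way_f(*groups):
    """One-way ANOVA F statistic for any number of groups."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

For example, with one variable's scores split into cluster 1 `[1, 2, 3, 4]` and cluster 2 `[3, 5, 6, 8]`, `pooled_t` gives t² equal to the F from `one_way_f` on the same two groups.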

@henrypritchard4911 4 years ago

@@Gaskination Thank you!

@henrypritchard4911 4 years ago

@@Gaskination Hi James, I am sorry to be a pain with another question. I was also wondering why in these instances there is no need to test for normality of distribution before performing the ANOVA with post hoc tests? Thank you in advance and Kind regards, Henry

@Gaskination 4 years ago

@@henrypritchard4911 Normality of distribution is not required for cluster membership. We really just need sufficient sample size in each group.