Hi, James. Thank you for this video. I also watched your other video on K-means cluster analysis in SPSS, where you mentioned: "If we can't converge in 10 iterations, then we probably don't have good data for clustering." I am trying to learn how to do cluster analysis using some of my own data. I followed your suggestions on how to determine the number of clusters and how to validate them. I ran k-means cluster analysis specifying 2, 3, 4 and 5 clusters. For the 3-cluster solution, the post hoc tests in the Multiple Comparisons table were all significantly different, but it took 14 iterations for the change in cluster centers to reach 0.000 for all three clusters. For the 4-cluster solution, on the other hand, it took 10 iterations to reach 0.000 for all three clusters, but in the Multiple Comparisons table two clusters were not significantly different on a few variables. What is your opinion: is my data not suitable for cluster analysis?
@Gaskination 8 years ago
+Ana It might be suitable. The more variables you include, the harder it is to converge. So, if there are lots of variables, then more than 10 iterations is fine. I don't know if there is a published threshold or guideline.
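To make "converging in N iterations" concrete, here is a minimal pure-Python sketch of k-means that counts iterations until the cluster centers stop moving. It is not SPSS's implementation: the function name `kmeans`, the toy points, and the starting centers are all made up for illustration, and the `max_iter=10` default only mirrors the 10-iteration figure discussed above.

```python
def kmeans(points, initial_centers, max_iter=10, tol=0.0):
    """Plain Lloyd-style k-means (illustrative, not SPSS's code).

    Returns (centers, labels, iterations_used, converged).
    """
    centers = [list(c) for c in initial_centers]
    k = len(centers)
    labels = [0] * len(points)
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # leave an empty cluster's center in place
        # Largest distance that any single center moved in this iteration.
        shift = max(
            sum((a - b) ** 2 for a, b in zip(old, new)) ** 0.5
            for old, new in zip(centers, new_centers)
        )
        centers = new_centers
        if shift <= tol:
            return centers, labels, iteration, True
    return centers, labels, max_iter, False


# Hypothetical toy data: two well-separated groups in two dimensions.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers, labels, iterations, converged = kmeans(points, [(0, 0), (10, 10)])
print(iterations, converged, centers)
```

With more variables or less clearly separated groups, the centers usually keep shifting for more iterations, which is consistent with the point that exceeding 10 iterations is not by itself a sign of bad data.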
@eboamuah6811 3 years ago
@@Gaskination Hi James. Your work has been very helpful. I have read about the silhouette as a validation method in K-means cluster analysis. However, I don't know how to obtain it in SPSS. Is there any index in SPSS that can be used to validate the number of clusters chosen in K-means cluster analysis? Thank you
@Gaskination 3 years ago
@@eboamuah6811 silhouette is used in two-step cluster analysis in SPSS, but I don't know of a way to produce it for K-means.
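For anyone who wants a silhouette for a K-means solution anyway, one option is to export the cluster memberships and compute it outside SPSS. The sketch below is a minimal pure-Python version of the standard silhouette definition using Euclidean distance; the function name `silhouette_scores` and the toy data are hypothetical, and it assumes at least two clusters.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def silhouette_scores(points, labels):
    """Return one silhouette value per point (requires at least two clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's distance to itself is 0, so it does not affect the sum).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Hypothetical example: the same four points under a sensible and a poor labeling.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_scores(points, [0, 0, 1, 1])
poor = silhouette_scores(points, [0, 1, 0, 1])
print(sum(good) / len(good), sum(poor) / len(poor))
```

Values near 1 indicate points that sit well inside their own cluster, while values near 0 or below suggest overlapping or misassigned clusters. The average across all points can then be compared between, say, the 3-cluster and 4-cluster solutions.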
@justagirl713 2 years ago
Thank you for this! Nice last name!
@zhalehmohammadalipour3542 3 years ago
Great tutorial! It helped a lot. Thanks.
@jdemontre 4 years ago
Hey James, I enjoy your videos, especially the ones about SEM and now cluster analysis. Thank you! I ran my data and everything went well (10 variables and ca. 100 observations). The 3-cluster solution was the best on all criteria. But the Bonferroni test was not significant in 2 (out of 60) comparisons (p-value slightly higher than 0.1). Does that mean the solution was not validated?
@Gaskination 4 years ago
If it is just 2 out of 60 comparisons, then this is strong evidence that it is a good clustering solution. Nice!
@statsmadeeasy7233 1 year ago
Hi James, can we get a copy of the file that you used? I wanted to practice with it.
@Gaskination 1 year ago
It's the burgers dataset available on the homepage of statwiki.gaskination.com/
@kanika8123 4 years ago
Thanks a lot. Very helpful video.
@009kishor 6 years ago
Very helpful video 👍🏻
@kieramillar-brandt2854 3 years ago
Hi James, thanks for this video. Is there a paper that can be referenced to support that a lower number of iterations is better? Or maybe a paper that indicates best practice in general for reporting the results of k-means clustering? Many thanks. Kiera
@Gaskination 3 years ago
Chapter nine of Hair et al 2010 ("Multivariate Data Analysis") is all about clustering methods.
@kieramillar-brandt2854 3 years ago
@@Gaskination thanks very much. That's really appreciated. Your videos are great!
@marcelbeermann1036 4 years ago
Thanks for the video. How can I see if a cluster actually is underrepresented?
@Gaskination 4 years ago
It's just a subjective judgment. If the sample size of the cluster is small, then perhaps it is under-represented. You can see what the profile of members of that cluster looks like to determine if it is a legitimate cluster, or just an odd outlier.
@masharifulamin5682 4 years ago
Hello James, I'm new here. Is it possible to get the dataset to practice with? Please share it with us.
@Gaskination 4 years ago
The dataset is available on the homepage of statwiki: statwiki.kolobkreations.com/
@mayurgo10 7 years ago
My data contains 900 observations and I tried the k-means method. It converges at 15 iterations for the 4-cluster solution and at 16 iterations for the 10-cluster solution. Can you suggest a good test to check which cluster solution is better?
@Gaskination 7 years ago
Check the AIC or BIC if that is an option. You want to minimize these. Also, check to see which solution is more helpful. Usually 3-5 clusters is most useful and anything more than 5 begins to be difficult to interpret or distinguish.
@henrypritchard4911 4 years ago
Hi James, this has been very helpful, so firstly thank you! I was wondering if there is a way to test for a statistical difference between just two clusters, since post hoc tests for a one-way ANOVA cannot be performed on fewer than 3 groups/clusters of data? Kind regards, Henry
@Gaskination 4 years ago
You can just use a t-test instead.
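As a rough illustration of comparing two clusters on one variable, the sketch below computes a t statistic and its degrees of freedom in pure Python. It uses Welch's unequal-variance form, which is an assumption here since the reply does not say which variant to use; the function name `welch_t` and the sample values are hypothetical. Turning the statistic into a p-value still requires a t distribution table or statistical software.

```python
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic and approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group (sample variance / n).
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / (va + vb) ** 0.5
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical scores on one clustering variable for the members of two clusters.
cluster_1 = [1, 2, 3, 4, 5]
cluster_2 = [6, 7, 8, 9, 10]
t, df = welch_t(cluster_1, cluster_2)
print(t, df)
```

Repeating this for each clustering variable gives the two-cluster counterpart of the Multiple Comparisons table used for three or more clusters.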
@henrypritchard4911 4 years ago
@@Gaskination Thank you!
@henrypritchard4911 4 years ago
@@Gaskination Hi James, I am sorry to be a pain with another question. I was also wondering why in these instances there is no need to test for normality of distribution before performing the ANOVA with post hoc tests? Thank you in advance and Kind regards, Henry
@Gaskination 4 years ago
@@henrypritchard4911 Normality of distribution is not required for cluster membership. We really just need sufficient sample size in each group.
@najeebullahahmadzai5160 5 months ago
Thank you sir!
@shantanuchakrabory5527 4 years ago
K-means cluster analysis using SPSS is a really special one