Cluster analysis

92,571 views

Comments: 103

@Margoth195 2 years ago

Sir, you are a saint! Thank you thank you thank you!!! Not only did you make this easy but you gave me peace of mind. If we ever meet in person, I hope you will give me the honor of buying you a drink.

@oberoiHimanshu 5 years ago

Best video I ever saw on clustering algorithms. Great work. Thanks for posting!

@hannukoistinen5329 9 months ago

This is not an algorithm.

@yashgourav 3 years ago

just to like this video and add a comment, I logged in from my google account.. awesome work Hefin... really appreciate your efforts.. :)

@WarmHeartedP 6 years ago

Very helpful! I've found the k-means and hierarchical algorithms most useful for my specific data. Thumbs up for this video!

@hefinrhys8572 6 years ago

Thanks! I'm glad you've been able to apply the algorithms and get meaningful results from your data.

@liviaaraujo94 6 years ago

Please do not stop posting videos! Congratulations on the explanations. Brazil here

@hefinrhys8572 6 years ago

Thank you! I'm glad you liked it!

@gssmytube 2 years ago

Dr Hefin Rhys, your sessions on clustering are truly amazing and well explained. Thank you! Please bring some on PAM and CLARA.

@jackpumpunifrimpong-manso6523 4 years ago

Wonderful! I'm impressed. You're very bright! God has used you to bless me. Thank you! Keep on making more videos. Congrats & Cheers!

@Legogostar456 3 years ago

wow! very well explained, thank you so much. I appreciate the details and the beginner-friendliness of your tutorial!

@Insanesibak 5 years ago

This is a great video on Clustering. Thank you for putting it together.

@gabrieleinguglia2314 5 years ago

Congratulations on how you made these tutorials! They are really clear and helpful! Thank you

@mikeybratkovic 2 years ago

Wow! Absolutely great video, well explained + really helpful tips and tricks! Thank you for that!

@muhammedwalugembe7142 4 years ago

Best video and explanation on clustering

@amourlafleur1762 2 years ago

You are a great teacher. Thanks a MILLION

@nausheenfatima6523 4 years ago

So clear and concise. Thank you!

@user-xc9ih8gv4h 5 years ago

This is high quality content. Thanks.

@hefinrhys8572 5 years ago

Thanks Glenn!

@blaeandblack547 3 years ago

Best on the web re clustering, thanks.

@nolevel433 2 years ago

Wonderful explanation. Thank you so much!!!!

@StockSpotlightPodcast 4 years ago

Fantastic video! The only thing that would have been nice to see is how do you take these clustering solutions and create a new column with the cluster number for each observation that could then be exported to Excel and used to create a PPT slide. This would have really been helpful for work situations.

@edwart83 4 years ago

At 30:37: in theory, the lower the BIC, the better the model fits, not the other way round as you said.

@hefinrhys8572 4 years ago

Nice spot. Using the usual definition of BIC, yes, lower is better. The mclust package flips the sign and defines BIC = 2 * ln(L) - p * ln(n), where L is the maximised likelihood, p the number of estimated parameters and n the number of observations, such that the expression should be maximised instead of minimised.

@edwart83 4 years ago

@hefinrhys8572 OK, I didn't know that definition. Maybe they shouldn't call it BIC, so as not to confuse people. This part is for anyone reading this reply who may not know what we are talking about: BIC = k ln(n) - 2 ln(L), where k is the number of parameters estimated by the model, L is the maximised value of the model's likelihood function and n is the number of observations (sample size). The lower the BIC, the less information we lose.
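
The two sign conventions in this exchange can be compared in base R. The sketch below is only an illustration and is not taken from the video: the intercept-only model fitted to one iris column is an arbitrary choice made here. It computes the usual BIC by hand, checks it against stats::BIC(), and then flips the sign to get the "higher is better" form.

```r
# Illustrative model (an assumption of this sketch): a single Gaussian
# fitted to one iris column via an intercept-only linear model.
fit <- lm(Sepal.Length ~ 1, data = iris)

ll <- logLik(fit)        # maximised log-likelihood, ln(L)
k  <- attr(ll, "df")     # number of estimated parameters (mean and sd)
n  <- nobs(fit)          # number of observations

bic_usual   <- k * log(n) - 2 * as.numeric(ll)   # lower is better
bic_flipped <- 2 * as.numeric(ll) - k * log(n)   # higher is better

bic_usual
bic_flipped
all.equal(bic_usual, BIC(fit))   # the hand calculation matches stats::BIC()
```

Both forms rank a set of candidate models identically; only the direction of "better" changes.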

@angushenderson2020 3 years ago

❤️ Thank you so much! this was brilliant!

@albertcardoso1383 6 years ago

What a brilliant tutorial, thank you!

@hefinrhys8572 6 years ago

Thank you! Glad to be of help :)

@parth1211 2 years ago

Ty for the quality content brother, I am a beginner and this has been very helpful. Can you please provide more videos 🙂 Thank you

@NAVEENKUMARS12 4 years ago

Very good introductions to clustering!!

@thejuhulikal6290 3 years ago

Thank you for the information. Please help me understand how the results of PCA are used in clustering, because the PCA video and the clustering video do not follow on from each other. If we want to continue to clustering with the results of PCA, how should we do that? Again, thank you for the detailed information.

@hefinrhys8572 3 years ago

If you wish to cluster the results of a PCA, simply select the number of principal components you wish to retain (the first few that explain most of the variance), and use the data projected onto these components as the input data to your clustering algorithm. But I would suggest you compare the clusters based on the original data, and the clusters based on the reduced dimensional data, to see which performs better.
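
The workflow described in this reply might look like the following in base R. This is a sketch under assumptions of its own: the iris data, keeping two components, and k-means with three clusters are all arbitrary choices made here for illustration.

```r
# Centre and scale the four numeric iris columns, then run PCA.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cumsum(var_explained)

# Keep the first two components (an arbitrary cut-off for this sketch).
n_keep <- 2
scores <- pca$x[, 1:n_keep]

set.seed(123)   # k-means starts from random centres
km_pca  <- kmeans(scores, centers = 3, nstart = 25)
km_orig <- kmeans(scale(iris[, 1:4]), centers = 3, nstart = 25)

# Cross-tabulate the two solutions to see how far they agree.
table(pca = km_pca$cluster, original = km_orig$cluster)
```

The cluster numbers themselves are arbitrary, so agreement shows up as each row of the table having one dominant cell rather than as a filled diagonal.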

@thejuhulikal6290 3 years ago

Thank u so much

@sophielong8937 3 years ago

Hi, I keep getting the error "Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' and 'y' lengths differ" when I try to plot the graph to see the number of k with this code: plot(1:10, betweenss_totss, type = "b", ylab = "Between SS / Total SS", xlab = "Clusters (k)") Do you know how I can solve this? I have tried looking online and have found how to make the x and y axes the same, however for this graph we don't need them to be the same. Any information you could spare would be great!!

@hefinrhys8572 3 years ago

Hi Sophie, the first two arguments to the plot() function need to be vectors representing the x and y axes, respectively. At the moment, you only supply a vector for the x axis (1:10) that only contains values 1 through 10. What are you trying to plot here? The easiest way to plot the clusters is to create a new column in the data.frame indicating cluster membership, plotting two variables against each other and colouring by this cluster variable.
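
The last suggestion in this reply, a cluster-membership column used to colour a scatter plot, can be sketched as below. The object names (irisScaled, iris_clustered), the choice of k = 3 and the two plotted variables are assumptions made for this illustration.

```r
set.seed(123)
irisScaled <- scale(iris[, 1:4])
km <- kmeans(irisScaled, centers = 3, nstart = 25)

# New column holding each observation's cluster, stored as a factor.
iris_clustered <- iris
iris_clustered$cluster <- factor(km$cluster)

# Two variables plotted against each other, coloured by cluster.
plot(iris_clustered$Petal.Length, iris_clustered$Petal.Width,
     col = as.integer(iris_clustered$cluster), pch = 19,
     xlab = "Petal length", ylab = "Petal width")
legend("topleft", legend = levels(iris_clustered$cluster),
       col = seq_along(levels(iris_clustered$cluster)), pch = 19,
       title = "Cluster")
```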

@jacobthomsen2248 3 years ago

Incredible, so well explained, thanks!

@lingzhao242 4 years ago

You are amazing! really helpful tutorials! Thank you!

@OgunCakr 6 years ago

Great explanation 👍 you have such a soothing voice btw 😘

@hefinrhys8572 6 years ago

Thanks! Glad you enjoyed the voice ;)

@djangoworldwide7925 1 year ago

Fantastic. Subscribed!

@reshmirajeev5770 4 years ago

Hello dear, how do I find the gene cluster of a secondary metabolite if the genome of the organism is not sequenced? Can you help me?

@rounakagarwal2134 4 years ago

Very Nice Explanation. Learnt a lot of things :)

@leandanielvillareal3352 4 years ago

This is the best video I have watched about cluster analysis. I subscribed immediately after watching this. Here's my question. I don't know if I skipped this in the video but how do I extract the vector containing the specific cluster each observation belongs to? I tried doing the model-based method.

@hefinrhys8572 4 years ago

I didn't show this in the video sorry, but you can extract the vector of most probable clusters by accessing the $classification component of your mclust model. If you want more detailed information, i.e. the matrix of probabilities for each datum belonging to each class, extract the $z component. In R, it's always useful to call str() on your model objects so you can understand and inspect their structure. Using ?Mclust and reading the Value section, also shows you what each component of the model object means.
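
A short sketch of the components named in this reply. It assumes the mclust package is installed and that irisScaled is the scaled numeric part of iris. The guard simply skips the example when mclust is unavailable.

```r
# Assumes mclust is installed (install.packages("mclust")); skipped otherwise.
if (requireNamespace("mclust", quietly = TRUE)) {
  library(mclust)

  irisScaled <- scale(iris[, 1:4])
  fit <- Mclust(irisScaled)

  str(fit, max.level = 1)    # overview of the components in the model object

  head(fit$classification)   # most probable cluster for each observation
  head(round(fit$z, 3))      # probability of each observation per cluster
}
```

Each row of fit$z sums to 1, and fit$classification is the column with the largest probability in that row.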

@leandanielvillareal3352 4 years ago

@hefinrhys8572 Oh okay, thanks. I didn't think of using ?Mclust. I'm going to use your videos as a reference. Very informative and understandable. Thanks again.

@fernandoguerrerozurita4716 5 years ago

What a very useful tutorial video!!! Thank you so much!!

@tarkatirtha 3 years ago

Great video!

@Rafael.a.f 3 years ago

Is there a clustering method recommended for a particular number of variables? I have 32 variables in my study and I'm wondering whether there is a specific procedure for that many variables. Thanks for sharing your knowledge.

@Olivia-rd6ce 3 years ago

Is there a way that you could figure out what metrics contribute to the hierarchical break?

@topfundus1093 4 years ago

Hello Hefin, another great tutorial. Allow me a question: how can I assign the appropriate cluster to a value (or value pair x, y) from the data set? One-dimensional example: values 1 - 10 = cluster 1, 11 - 20 = cluster 2, 21 - 30 = cluster 3. Which cluster does Var = 19 belong to? Answer: Var (= 19) belongs to cluster 2. How do I calculate this with kmeans or R (ggplot2)?
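
One possible approach to this question, sketched in base R: fit k-means, then give a new value the label of its nearest cluster centre. The helper assign_cluster() is hypothetical and written for this illustration; it is not part of kmeans() or of any package.

```r
# Toy one-dimensional data from the example above: the values 1 to 30.
x <- matrix(1:30, ncol = 1)

set.seed(123)
km <- kmeans(x, centers = 3, nstart = 25)

# Hypothetical helper: for each row of `newdata` (one observation per row),
# return the index of the nearest cluster centre by Euclidean distance.
assign_cluster <- function(newdata, centers) {
  newdata <- as.matrix(newdata)
  unname(apply(newdata, 1, function(row) {
    which.min(colSums((t(centers) - row)^2))
  }))
}

new_cluster <- assign_cluster(19, km$centers)
new_cluster                 # label of the cluster that 19 falls into
km$centers[new_cluster, ]   # centre of that cluster
```

The numeric labels that kmeans() returns are arbitrary, so the group covering 11 to 20 is not necessarily called cluster 2. For a value pair (x, y), pass a one-row matrix such as matrix(c(x, y), nrow = 1) together with the centres of a two-column fit.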

@vishnunath1524 6 years ago

Thank you for this excellent tutorial !

@hefinrhys8572 6 years ago

Thank you! Glad you enjoyed.

@s.m.m8006 4 years ago

Thank you for this great video, but what if I need to use the GMM clustering algorithm with them? Could you help me do that, please?

@hannahredders4442 4 years ago

How can I perform cluster analysis on data that I specify as survey data first (with svydesign)? thanks!

@rima3088 4 years ago

Big thumbs up ! Huge thumbs up... Really

@muhammadazharnadeem2682 4 years ago

Excellent job. Please provide the sample data file publicly so that we can arrange our data easily. Thanks in advance.

@tomsteffen1882 4 years ago

Do you know about Ward's method for cluster analysis?

@AnaCvejic 3 years ago

You help me :) Tnx for video

@hitoshinishizawa1868 3 years ago

Thank you for the wonderful tutorial!!! I have one question. Is there any way to create different data frames based on the clusters identified? For instance, having a new column for 'cluster' with the cluster # for each row. I am doing the following but am not sure if it is right. kc

@EUrunner 4 years ago

Thanks, you've helped me a lot 👍

@johnnybravo86 6 years ago

absolutely amazing!! thank you!. please do more. i'm trying to learn r to do CFA to fit theory of planned behavior models. can you do one on this?

@hefinrhys8572 6 years ago

You're very welcome. I'm sorry for the late reply to this; I'm not familiar with theory of planned behaviour models, but if you describe the problem you're trying to solve, I may be able to help.

@charithkrish 4 years ago

Hi sir, im new to R as im from a different background, is clustering available for Panel data?

@jaaadeeeeful 4 years ago

Anyone else thinks his voice and accent sound like the man at Headspace? I cannot help deepening my breath when learning this...

@sahaywrestling5497 5 years ago

Thank you very much, extremely helpful

@ejiet-igolatemmanuel7376 4 months ago

Excellent study material, BUT IT BECAME BLURRED FROM 6:45 TO ABOUT 16:05. Please, how can I study the entire video clearly? Thank you for such a summary.

@nssSmooge 6 years ago

I am just curious: how would I go about removing a trend in my data and then using all the observations for clustering? I have several countries and several variables, but observed over time [annually]. Picking a single year to do the clustering is an option, but not a great one, since the explanation can be faulty if it's based on only one year, if I am not mistaken xD

@hefinrhys8572 5 years ago

Hi Wildfox, sorry for the late reply. So I presume you are looking for clusters of countries? One approach would be to model the relationship between time and a dependent variable (if you have one) for each country using a linear model. Then, use the estimated marginal means (also called least squares means) to cluster the countries. Estimated marginal means are the predicted means of variables in a linear model after accounting for all the other variables (i.e. after removing the effect of time).
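
A rough base R sketch of this idea follows. Everything in it is an assumption made for illustration: the panel is simulated toy data, the variable names (gdp, life) are made up, and the estimated marginal means are obtained by predicting each country at the average year from an additive model rather than with a dedicated package such as emmeans.

```r
# Simulated toy panel (made up for this sketch): 6 countries, 10 years,
# 2 variables that both drift upwards over time.
set.seed(42)
panel <- expand.grid(country = paste0("C", 1:6), year = 2001:2010)
level <- rep(c(0, 0, 0, 5, 5, 5), times = 10)   # two groups of countries
panel$gdp  <- level + 0.5 * (panel$year - 2001) + rnorm(nrow(panel))
panel$life <- level + 0.2 * (panel$year - 2001) + rnorm(nrow(panel))

# For one variable: fit value ~ year + country, then predict each country's
# mean at the average year, i.e. with the time trend held fixed.
adjusted_mean <- function(var, data) {
  fit <- lm(reformulate(c("year", "country"), response = var), data = data)
  grid <- data.frame(
    country = factor(levels(data$country), levels = levels(data$country)),
    year = mean(data$year)
  )
  setNames(predict(fit, newdata = grid), levels(data$country))
}

adj <- sapply(c("gdp", "life"), adjusted_mean, data = panel)
adj   # one row per country, one column per variable

# Cluster the countries on their time-adjusted means.
hc <- hclust(dist(scale(adj)), method = "average")
groups <- cutree(hc, k = 2)
groups
```

Because every country contributes one adjusted value per variable, all years are used and no single year has to be picked.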

@hannahredders4442 4 years ago

another question: what's a recommended number of variables for a cluster analysis?

@navjotsingh2251 11 months ago

Honestly, it depends on what you are clustering. But there is a risk that too many variables will cause problems for the clustering. I think this comes down to trial and error, plus domain knowledge about which variables to include.

@SachinSingh-wr5yv 6 years ago

Thank you sir for the videos...!!!!!

@hefinrhys8572 6 years ago

You are welcome!

@hersil2012 6 years ago

Very well explained! Thanks

@hefinrhys8572 6 years ago

Thank you! I hope it was useful.

@nikhiljamisetti7139 6 years ago

CAN ANYBODY TELL ME HOW TO FIND THE DEFECT CLUSTERS IN THE ABOVE DATASET

@hefinrhys8572 6 years ago

Hi Nikhil, I'm not quite sure I understand what you are trying to do. Do you mean defect clustering in application testing? This isn't a statistical computing application per se, but here is some information which may be of use: www.pitsolutions.ch/blog/defect-clustering-and-pesticide-paradox/

@ghdoia 5 years ago

can you provide me with the data you used (iris) ?

@hefinrhys8572 5 years ago

Hi ghdoia, so R and its packages come with datasets built in. To list them all, simply run data(), to load one (such as the iris dataset), run data(iris). Then, you can access the iris dataset by referring to it by name. I hope that helps.
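
A small illustration of the commands mentioned in this reply. Restricting the listing to the datasets package and showing only the first few entries are choices made here to keep the output short.

```r
# data() with no arguments lists every dataset in the attached packages;
# here the listing is limited to the built-in "datasets" package.
datasets <- data(package = "datasets")$results
head(datasets[, c("Item", "Title")])

data(iris)   # load iris into the workspace
str(iris)    # structure: 150 observations of 5 variables
head(iris)   # first six rows
```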

@ghdoia 5 years ago

@hefinrhys8572 Thank you very much. I learnt a lot from you. I've just had an issue when I ran the code below:
plot(1:10, betweenss_totalss, type = "b", ylab = "Between SS/Total SS", xlab = "Cluster(K)")
It shows this message:
Error in plot.window(...) : need finite 'ylim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
Please can you tell me what that means and how I can solve this problem?

@delt19 6 years ago

is there an easy way to output the Mclust cluster assignments to a csv file?

@hefinrhys8572 6 years ago

Sorry for the slow reply to this. I hope you worked it out, but I would extract the assigned groups using something like this: # ADD NEW COLUMN TO DATAFRAME WITH MCLUST GROUP ASSIGNMENTS irisScaled$Group
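
The code in this reply is cut off in the page capture, so the following is a separate sketch of one way to do it, not a reconstruction of the original. The helper export_clusters() and the file name are hypothetical, and k-means labels stand in for the Mclust ones so that the example runs with base R alone.

```r
# Hypothetical helper: attach a vector of cluster labels to the data
# and write the result to a CSV file.
export_clusters <- function(data, clusters, file) {
  out <- as.data.frame(data)   # a scaled matrix has no `$` columns
  out$Group <- clusters
  write.csv(out, file, row.names = FALSE)
  invisible(out)
}

irisScaled <- scale(iris[, 1:4])

# Stand-in labels from k-means so the sketch needs no extra packages;
# for an Mclust model called `fit`, pass fit$classification instead.
set.seed(123)
cluster_labels <- kmeans(irisScaled, centers = 3, nstart = 25)$cluster

csv_path <- file.path(tempdir(), "iris_clusters.csv")
export_clusters(irisScaled, cluster_labels, csv_path)
head(read.csv(csv_path))
```

The resulting CSV file opens directly in spreadsheet software such as Excel.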

@anooshmitadas3597 6 years ago

It helped a lot. well done.

@hefinrhys8572 6 years ago

Glad it was of use :)

@priyanwadaatapattu5900 6 years ago

> betweenss_totss

@hefinrhys8572 6 years ago

Hi Priyanwada, I'm reading this on my phone so cannot check in R right now, but it looks like you have 'list()' in front of 'for(i in 10)'. This 'list()' function is empty and I'm not sure you need it anyway.
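
For reference, a minimal sketch of a loop that gives plot() one finite value per k. It is not the code from the video: irisScaled being the scaled numeric columns of iris, and the nstart value, are assumptions. Errors such as "'x' and 'y' lengths differ" and "need finite 'ylim' values" arise when the y vector is shorter than 1:10 or contains no finite values, which is what happens if the loop runs over 10 alone rather than 1:10, or never fills the vector.

```r
irisScaled <- scale(iris[, 1:4])

# Pre-allocate one slot per value of k so that x and y have the same length.
ks <- 1:10
betweenss_totss <- numeric(length(ks))

set.seed(123)
for (i in ks) {   # note 1:10, not just 10
  km <- kmeans(irisScaled, centers = i, nstart = 25)
  betweenss_totss[i] <- km$betweenss / km$totss
}

length(betweenss_totss)   # must equal length(ks)
plot(ks, betweenss_totss, type = "b",
     ylab = "Between SS / Total SS", xlab = "Clusters (k)")
```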

@123gregery 6 years ago

Congratulations for your videos. One question: how can I get the means of the columns of irisScaled? (irisScaled is a list).

@hefinrhys8572 6 years ago

Thank you! Sorry for the late reply. So irisScaled is a matrix (which I think I omitted to mention in the video), and a succinct way to get the means of each column would be to use the apply() function: apply(irisScaled, 2, mean) where the first argument is the data, the second is an index (either 1 to iterate over rows or 2 for columns) and the third argument is the function to apply. You could also find the mean for each column individually like this: mean(irisScaled[, 1])
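
The options in this reply, run end to end. The sketch assumes irisScaled was created with scale() on the four numeric iris columns; colMeans() is added here as a base R shortcut that the reply does not mention.

```r
irisScaled <- scale(iris[, 1:4])   # scale() returns a numeric matrix
is.matrix(irisScaled)

apply(irisScaled, 2, mean)   # mean of every column (2 = iterate over columns)
colMeans(irisScaled)         # same result from the dedicated base R function
mean(irisScaled[, 1])        # a single column
```

Because the columns have been centred, every mean is zero up to floating-point rounding.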