Silhouette (clustering)- Validating Clustering Models- Unsupervised Machine Learning

Views: 59,722

Comments: 67

@arjundev4908 4 years ago

So, to summarize clustering: 1. It is an unsupervised machine learning technique. 2. To find the number of clusters we use the "Elbow method". 3. The Silhouette (si-lo-wet) score helps check whether the data points in a cluster really belong to that cluster. The formula is (b - a) / max(a, b), where 'a' is the mean distance to the other points within the same cluster and 'b' is the mean distance to the points of the nearest other cluster. If a < b the score is positive and the point sits well in its cluster; if a > b the score is negative and it is highly likely that the clustering formed is incorrect. Please add points if I missed any, or correct me if I am wrong!
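For reference, the formula described in this comment is usually written per data point as sketched here. The symbols s(i), a(i), b(i), C_i, C_k and d follow common textbook notation and are assumptions, not something quoted from the video; |C| simply denotes the number of points in cluster C.

```latex
a(i) = \frac{1}{|C_i| - 1} \sum_{j \in C_i,\ j \neq i} d(i, j), \qquad
b(i) = \min_{C_k \neq C_i} \frac{1}{|C_k|} \sum_{j \in C_k} d(i, j), \qquad
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\ b(i)\}}
```

With these definitions s(i) always lies between -1 and 1: values near 1 mean the point is much closer to its own cluster than to the nearest other one, and negative values mean the opposite.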

@shubhamthapa7586 4 years ago

What do you think of DBSCAN and HDBSCAN? Won't the silhouette coefficient fail there, because we don't have the number of clusters as a parameter?

@sampadkar19 1 month ago

@@shubhamthapa7586 Do we actually need k? Hierarchical clustering and DBSCAN both form clusters, so we can simply apply the formula to the clusters they produce.

@NashatJumaah 6 months ago

Thanks for the easy to follow tutorials...Big Love from Iraq

@chintansoni6370 2 years ago

Thanks Krish. Much better understanding. Really appreciate your efforts to provide knowledge by creating videos.

@ramendrachaudhary9784 4 years ago

Pretty much simple and pretty much amazing explanation. Thanks Krish

@mprasad3661 4 years ago

Awesome explanation bro

@DeepGamingAI 4 years ago

Loved it, very nicely explained :)

@matheusgoes640 2 years ago

Thank you so much for this video! Great explanation!👏

@kushagrak4903 3 years ago

Sir, I have a doubt: in your example the silhouette value for k=2 is greater than for k=4, so how can we decide that k=4 is better than k=2 without plotting any diagram?

@sihammohamed7480 4 years ago

Thanks Krish Awesome explanation :)

@thankyouthankyou1172 4 years ago

8:18: why is there no mention of the min()?

@krishnag5734 3 years ago

The min is used to find the nearest neighboring cluster to the current cluster Ci. To get it, take a point in the current cluster, compute its mean distance to each of the other clusters, and take the min value; that minimum is b(i). Whichever cluster holds that min value is the nearest neighbor to the current cluster. Let's say you have 5 different clusters and you are computing the mean distance from a point in the 1st cluster to the other 4 clusters. You will have 4 different values. Whichever value is the minimum, the corresponding cluster is the closest to the first cluster.
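The role of min() described here can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function names and the toy coordinates are made up, Euclidean distance is assumed, and the point's own cluster is passed without the point itself.

```python
import math

def mean_distance(point, others):
    """Average Euclidean distance from `point` to every point in `others`."""
    return sum(math.dist(point, q) for q in others) / len(others)

def silhouette_of_point(point, own_cluster, other_clusters):
    """Silhouette value s(i) of a single point.

    own_cluster: the other members of the point's cluster (point excluded).
    other_clusters: list of clusters, each given as a list of points.
    """
    a = mean_distance(point, own_cluster)
    # b(i): smallest mean distance to any other cluster, i.e. the nearest neighbor cluster
    b = min(mean_distance(point, cluster) for cluster in other_clusters)
    return (b - a) / max(a, b)

# Toy data, for illustration only
point = (0.0, 0.0)
own = [(0.0, 1.0), (1.0, 0.0)]       # mean distance 1.0, so a = 1.0
near = [(3.0, 0.0), (5.0, 0.0)]      # mean distance 4.0
far = [(10.0, 0.0), (12.0, 0.0)]     # mean distance 11.0
s = silhouette_of_point(point, own, [near, far])
print(s)  # b = min(4.0, 11.0) = 4.0, so s = (4 - 1) / 4 = 0.75
```

Without the min(), the far cluster would inflate b(i) and make the point look better separated than it is relative to its closest competitor.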

@naufalsiregar9662 2 years ago

I just saw your video, this is great... Thanksss

@DineshBabu-gn8cm 3 years ago

Excellent explanation Krish

@stevemungai3542 2 years ago

excellent explanation

@kamilc9286 4 years ago

Sir, if you count the negative samples, k=4 has 1 (not visible on the chart). So k=2 has a score of 0.74 with 0 negative values, and k=4 has a score of 0.65 with 1 negative value. Only the elbow method suggests k=4.
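A check like this one can be sketched as follows, assuming the per-sample silhouette values are already available (for example from scikit-learn's `silhouette_samples`). The helper name and the numbers are hypothetical and are not the video's data.

```python
from collections import defaultdict

def summarize_silhouette(sample_values, labels):
    """Return the mean silhouette score and the count of negative samples per cluster."""
    overall_mean = sum(sample_values) / len(sample_values)
    negatives = defaultdict(int)
    for value, label in zip(sample_values, labels):
        if value < 0:
            negatives[label] += 1
    return overall_mean, dict(negatives)

# Made-up per-sample silhouette values and cluster labels, for illustration only
values = [0.8, 0.7, -0.1, 0.6]
labels = [0, 0, 1, 1]
mean_score, negative_counts = summarize_silhouette(values, labels)
print(mean_score, negative_counts)
```

Counting negatives this way catches points that are too few to show up in the silhouette plot but that still indicate misassigned samples.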

@prasadshiva3538 1 year ago

That's a great lecture, Krish, but what if we get k=5 from Silhouette and k=4 from Elbow? How do we conclude which is the correct k value?

@1potdish271 3 years ago

Very nice explanation. (y)

@anbarasanpm3295 9 months ago

How is the point chosen in Cluster 1?

@iqbalsaviola6052 3 years ago

Something that im looking for

@MrChudhi 3 years ago

Please can you explain how we can use the adjusted Rand score for a K-Means model?

@amitgupta-tb9td 4 years ago

Sir, please make videos on more validation techniques for unsupervised learning!

@rishisingh5581 4 years ago

Sir, will the community class launch today or not?

@sukshithshetty4847 2 years ago

has this code been shown in some other video from scratch ?

@DevanshKhandekar 4 years ago

Can be better explained with terms like 'intracluster distance' and 'intercluster distance'

@krishnaik06 4 years ago

Yes absolutely right

@sajidchoudhary1165 4 years ago

Sir Please make video on xgboost math intuition for Regression and Classification Please Sir Please

@sajidchoudhary1165 4 years ago

@Karthik Vishwanath yes i was watching but i couldn't understand some hyperparameters like gamma, lambda, cover

@ashulohar8948 2 years ago

Please make videos on performance metrics for regression algorithms

@adotac 2 years ago

Can someone tell me what software he used to draw on screen?

@divyamadhuri126 2 years ago

That was super helpful. My doubt is that if my clusters are overlapping, what should I interpret? Are my data points poorly clustered?

@yashsethi2402 2 years ago

What is the name of the plot that you created?

@sakshamshivhare2474 2 years ago

Hi Krish, in an interview I was asked how to interpret the results of K-means clustering and how to label the results. Can you or anyone help me out with this question?

@sambit123sahu 3 years ago

what is the value of |C| ?

@पंकजकुलड़िया 4 years ago

at 18:02 there is a negative value for Cluster no.-2, so I think k should be 2, Please clarify sir, Also thanks a lot for making such kind of video and Virtual Interview sessions...

@mohammedameen3249 3 years ago

But the elbow method, as used, shows k=4 is the best

@sandipansarkar9211 2 years ago

finished watching

@rhiothelab5251 4 years ago

Thanks, Sir, for all the content. Suppose I am using DBSCAN and it gives only one cluster; how do I then measure the correctness of the clustering?

@rezamohammadi3096 3 years ago

Hi @Krish, thanks for the amazing tutorial. I'm using the k-prototypes library (for mixed numerical and nominal data types) and I want to calculate the Silhouette Index to compare my clustering results with previous studies (e.g. k-medoid). Could you please give me a clue on how to calculate the Silhouette Index in my case?

@saisidhartha2855 3 years ago

This is the same case with me. Kindly let me know whether it is possible. I have used the elbow method for k-prototypes to determine the k value. Looking forward to the silhouette method also

@zama-sarib 2 years ago

@@saisidhartha2855 We can use Gower distance, with the gower_distance matrix passed as a precomputed metric to silhouette in sklearn. Gower distance typically works well with mixed data such as numerical and categorical data types.
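A rough sketch of the Gower distance mentioned here, in plain Python. It assumes the common definition (range-normalized absolute difference for numeric features, 0/1 mismatch for categorical ones, averaged over features); the function name `gower_matrix` and the toy records are illustrative and not taken from any particular library.

```python
def gower_matrix(records, numeric_columns):
    """Pairwise Gower distances for mixed-type records.

    records: list of equal-length tuples.
    numeric_columns: set of column indices holding numeric features;
    every other column is treated as categorical.
    """
    n_features = len(records[0])
    # Range of each numeric column, used to scale differences into [0, 1]
    ranges = {}
    for col in numeric_columns:
        column = [r[col] for r in records]
        ranges[col] = max(column) - min(column)

    def distance(r1, r2):
        total = 0.0
        for col in range(n_features):
            if col in numeric_columns:
                # A constant numeric column contributes 0 to the distance
                total += abs(r1[col] - r2[col]) / ranges[col] if ranges[col] else 0.0
            else:
                total += 0.0 if r1[col] == r2[col] else 1.0
        return total / n_features

    return [[distance(a, b) for b in records] for a in records]

# Toy records (age, colour), for illustration only
records = [(20, "red"), (30, "red"), (40, "blue")]
D = gower_matrix(records, numeric_columns={0})
print(D[0][1], D[0][2], D[1][2])  # 0.25 1.0 0.75
```

A square matrix like `D` is what scikit-learn's `silhouette_score` expects when it is called with `metric="precomputed"`, together with the cluster labels from k-prototypes.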

@vignesh7687 3 years ago

Thank you Krish, But I could see a small negative value for K = 4 in the plot.

@noobgamerstamilyt9963 3 years ago

yes bro there is some small negative value in it bro

@varnikareshma1873 2 years ago

Sir, instead of using Euclidean or Manhattan distance, can we use a cosine-based distance? If it is possible, can you please give me a hint on how to use it?

@ashulohar8948 2 years ago

Cosine is for text data
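For anyone wondering what a cosine-based distance looks like, here is a minimal sketch in plain Python. It assumes the usual definition, 1 minus cosine similarity; it is common for text vectors but nothing in the formula is specific to text. The function name and sample vectors are illustrative.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; undefined (division by zero) for all-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)

# Vectors pointing the same way are at distance ~0 regardless of length
same_direction = cosine_distance((1.0, 2.0), (2.0, 4.0))
# Perpendicular vectors are at distance 1
orthogonal = cosine_distance((1.0, 0.0), (0.0, 3.0))
print(same_direction, orthogonal)
```

In scikit-learn, `silhouette_score` takes a `metric` argument, so passing `metric="cosine"` is one way to score a clustering with this distance instead of Euclidean.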

@ayushgoel9584 4 years ago

What happened to the community classes?

@krishnaik06 4 years ago

Will be live in some time

@sandipansarkar9211 4 years ago

Superb explanation. Need to get my hands dirty with Jupyter notebook. Thanks

@padmanabh7031 4 years ago

Bhudde

@soniakashyap001 4 years ago

Hi Krish, when i used to search to find any method by which we could check validity of cluster i used to get only elbow methods in search. I never came across links related to silhouettes. How did you find this method?

@mranaljadhav8259 4 years ago

Try to check all the link provided by google, i think you opened only one link. I search like you "validity of cluster" , I got the right methods like silhouette, dunn index etc

@rohitkamra1628 4 years ago

What can we expect on discord server?

@chandinisaikumar2736 4 years ago

Can we use this evaluation method for DBSCAN

@shubhamthapa7586 4 years ago

no i dont think so

@chandinisaikumar2736 4 years ago

@@shubhamthapa7586 Can you please let me know the evaluation method for DBSCAN? Thanks in advance

@shubhamthapa7586 4 years ago

@@chandinisaikumar2736 At present there is no single best evaluation metric for DBSCAN. However, you can use the silhouette coefficient as a reference, but you need to optimize the parameters of DBSCAN first, which is hard compared to KMeans clustering. honeynet.github.io/cuckooml/2016/07/19/clustering-evaluation/ Here they make good use of the silhouette coefficient with DBSCAN
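One practical detail when scoring DBSCAN output this way can be sketched as follows. scikit-learn's DBSCAN marks noise points with the label -1, and the silhouette coefficient is only defined when there are at least 2 clusters and fewer clusters than samples. The helper name `drop_noise` is hypothetical, and whether to exclude noise at all is a judgment call, not a rule.

```python
def drop_noise(points, labels, noise_label=-1):
    """Remove noise points and report whether a silhouette score is defined."""
    kept = [(p, l) for p, l in zip(points, labels) if l != noise_label]
    kept_points = [p for p, _ in kept]
    kept_labels = [l for _, l in kept]
    n_clusters = len(set(kept_labels))
    # Silhouette needs at least 2 clusters and at most n_samples - 1 clusters
    can_score = 2 <= n_clusters <= len(kept_labels) - 1
    return kept_points, kept_labels, can_score

# Toy labels as DBSCAN might return them, for illustration only
points = [(0, 0), (0, 1), (50, 50), (10, 10), (10, 11)]
labels = [0, 0, -1, 1, 1]
clean_points, clean_labels, ok = drop_noise(points, labels)
print(clean_labels, ok)  # [0, 0, 1, 1] True

# A run that yields a single cluster cannot be scored with silhouette
_, _, ok_single = drop_noise([(0, 0), (0, 1), (1, 0), (9, 9)], [0, 0, 0, -1])
print(ok_single)  # False
```

The second case is the "only one cluster" situation raised elsewhere in the thread: there is no neighboring cluster to compare against, so the score has no meaning and the DBSCAN parameters need revisiting instead.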

@aimem246 3 years ago

thx

@hicodeguru 3 years ago

X has 500 features, but KMeans is expecting 1000 features as input. ??

@oscarfamousdarteh189 4 years ago

Sir , please could you show another video with an uploaded csv data ? I'm finding some difficulties

@sagemaker 10 months ago

Pronunciation: Sil-who-at

@raviyadav2552 4 years ago

si-low -et

@manikprabhug412 4 years ago

Hi sir

@TJ-wo1xt 3 years ago

dont pay wikipedia, not everything that is there is correct.

@diosmorbodiosmorbo9547 3 years ago

It is indeed. That thing of "in Wikipedia anyone can write anything" is false. You have a very strict and dedicated community validating every time someone writes something. And with new topics like ML it is even more so. So stfu