So, to summarize clustering evaluation:
1. Clustering is an unsupervised machine learning technique.
2. To find the number of clusters, we use the "elbow method".
3. The silhouette (si-lo-wet) score helps determine whether the data points in each cluster really belong to that cluster. The formula is (b - a) / max(a, b), where 'a' is the mean distance from a point to the other points in its own cluster and 'b' is the mean distance from that point to the points of the nearest neighboring cluster. If a > b, the silhouette score is negative, and it is highly likely that the point has been assigned to the wrong cluster.
Please add points if I missed any, or correct me if I am wrong!
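Written with explicit parentheses, the silhouette of a single point is s = (b - a) / max(a, b). A minimal Python sketch of that per-point formula (the function name `silhouette_value` is hypothetical, and returning 0 when a = b = 0 is an assumed convention) shows the sign behaviour: positive when a < b, zero on the boundary, negative when a > b.

```python
def silhouette_value(a: float, b: float) -> float:
    """Silhouette of one point: s = (b - a) / max(a, b).

    a: mean distance to the other points in the point's own cluster
    b: mean distance to the points of the nearest other cluster
    """
    if max(a, b) == 0:
        return 0.0  # assumption: treat the degenerate a = b = 0 case as 0
    return (b - a) / max(a, b)


print(silhouette_value(1.0, 4.0))  # 0.75: well inside its own cluster
print(silhouette_value(2.0, 2.0))  # 0.0: on the border between two clusters
print(silhouette_value(4.0, 1.0))  # -0.75: closer to the neighbouring cluster
```

Because of the division by max(a, b), s always lies between -1 and 1.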
@shubhamthapa7586 · 4 years ago
What do you think about DBSCAN and HDBSCAN? The silhouette coefficient won't work there, because we don't have the number of clusters as a parameter.
@sampadkar19 · 5 days ago
@@shubhamthapa7586 Do we actually need k? Hierarchical clustering and DBSCAN both produce cluster labels, so we can still compute the score using the formula.
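On the DBSCAN question: the silhouette score only needs the data and the cluster labels, not k. A sketch with scikit-learn, assuming synthetic, well-separated blobs and untuned `eps`/`min_samples` values. It drops DBSCAN's noise label (-1) before scoring; whether to exclude noise or keep it is a judgment call.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data: three well-separated blobs (assumption, not the video's data)
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [5, 5], [5, -5]],
                  cluster_std=0.5, random_state=0)

# eps and min_samples are illustrative, untuned values (assumption)
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# DBSCAN marks noise points with -1; exclude them so noise is not scored as a cluster
mask = labels != -1
n_clusters = len(set(labels[mask]))

if n_clusters >= 2:
    score = silhouette_score(X[mask], labels[mask])
    print(f"{n_clusters} clusters found, silhouette = {score:.3f}")
else:
    score = None
    print("fewer than 2 clusters: silhouette is undefined")
```

No k is passed anywhere: DBSCAN decides how many clusters there are, and the silhouette is computed from whatever labels it returns.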
@NashatJumaah · 5 months ago
Thanks for the easy-to-follow tutorials... Big love from Iraq
@chintansoni6370 · 2 years ago
Thanks Krish. Much better understanding now. I really appreciate your efforts to share knowledge by creating videos.
@ramendrachaudhary9784 · 4 years ago
Pretty simple and pretty amazing explanation. Thanks Krish
@kushagrak4903 · 3 years ago
Sir, I have a doubt: in your example the silhouette value for k = 2 is greater than for k = 4, so how can we decide whether k = 4 is better than k = 2 without plotting any diagram?
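One way to compare k values without a plot is to compute the mean silhouette for each candidate k and take the largest. A sketch on synthetic data (the dataset and the k range are assumptions). The mean alone can favour a smaller k, so it is usually read alongside the per-cluster silhouette plot and the elbow curve rather than on its own.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with 4 generated blobs (assumption, not the video's data)
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=1.0, random_state=1)

scores = {}
for k in range(2, 7):  # silhouette needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # mean s(i) over all points
    print(f"k={k}: mean silhouette = {scores[k]:.3f}")

best_k = max(scores, key=scores.get)
print("k with the highest mean silhouette:", best_k)
```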
@matheusgoes640 · 2 years ago
Thank you so much for this video! Well explained!👏
@sihammohamed7480 · 4 years ago
Thanks Krish, awesome explanation :)
@mprasad3661 · 4 years ago
Awesome explanation bro
@stevemungai3542 · 2 years ago
Excellent explanation
@DeepGamingAI · 4 years ago
Loved it, very nicely explained :)
@DineshBabu-gn8cm · 3 years ago
Excellent explanation Krish
@naufalsiregar9662 · 2 years ago
I just saw your video, this is great... Thanks!
@amitgupta-tb9td · 4 years ago
Sir, please make more videos on validation techniques for unsupervised learning!
@kamilc9286 · 3 years ago
Sir, if you count the negative samples, k = 4 has 1 (not visible on the chart). So k = 2 has a score of 0.74 and 0 negative samples, while k = 4 has a score of 0.65 and 1 negative sample. Only the elbow method suggests k = 4.
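The negative-sample count mentioned above can be reproduced with scikit-learn's `silhouette_samples`, which returns one s(i) per point; `silhouette_score` is the mean of those values. A sketch on synthetic blobs (not the video's data, so the counts and scores will differ):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples

# Synthetic data (assumption, not the video's data)
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=1.0, random_state=1)

neg_counts = {}
for k in (2, 4):
    labels = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(X)
    s = silhouette_samples(X, labels)     # one silhouette value per point
    neg_counts[k] = int(np.sum(s < 0))    # points that sit closer to another cluster
    print(f"k={k}: mean={s.mean():.3f}, negative points={neg_counts[k]}, min s(i)={s.min():.3f}")
```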
@1potdish271 · 3 years ago
Very nice explanation. (y)
@DevanshKhandekar · 4 years ago
It could be explained better with terms like 'intracluster distance' and 'intercluster distance'.
@krishnaik06 · 4 years ago
Yes, absolutely right
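In the usual notation, the intracluster distance a(i) and the intercluster distance b(i) for a point i in cluster C_I are defined as below, where |C_I| is the number of points in C_I and d is the chosen distance; by convention s(i) = 0 when |C_I| = 1. The video's notation may differ slightly.

```latex
a(i) = \frac{1}{\lvert C_I \rvert - 1} \sum_{j \in C_I,\; j \neq i} d(i, j)
\qquad
b(i) = \min_{J \neq I} \frac{1}{\lvert C_J \rvert} \sum_{j \in C_J} d(i, j)

s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}
```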
@iqbalsaviola6052 · 3 years ago
Just what I was looking for
@ashulohar8948 · 1 year ago
Please make videos on performance metrics for regression algorithms
@पंकजकुलड़िया · 4 years ago
At 18:02 there is a negative value for cluster no. 2, so I think k should be 2. Please clarify, sir. Also, thanks a lot for making videos like this and for the virtual interview sessions...
@mohammedameen3249 · 3 years ago
But the elbow method he used shows that k = 4 is the best.
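For the elbow method referred to above, a common sketch is to fit K-Means for a range of k and record `inertia_` (the within-cluster sum of squared distances, WCSS); the elbow is the k after which the curve stops dropping sharply. The synthetic data and the k range are assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data (assumption, not the video's data)
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=1.0, random_state=1)

inertias = {}
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=1).fit(X)
    inertias[k] = km.inertia_  # within-cluster sum of squared distances (WCSS)

for k, wcss in inertias.items():
    print(f"k={k}: WCSS={wcss:.1f}")
```

The elbow is normally read from a plot of these values; the silhouette and the elbow can disagree, as in this thread, because they measure different things.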
@sajidchoudhary1165 · 4 years ago
Sir, please make a video on the math intuition of XGBoost for regression and classification. Please, sir!
@sajidchoudhary1165 · 4 years ago
@Karthik Vishwanath Yes, I was watching, but I couldn't understand some hyperparameters like gamma, lambda, and cover.
@prasadshiva3538 · 1 year ago
That's a great lecture, Krish, but what if we get k = 5 from the silhouette method but k = 4 from the elbow method? How do we conclude which k value is correct?
@sandipansarkar9211 · 2 years ago
Finished watching
@thankyouthankyou1172 · 4 years ago
8:18: Why is there no mention of the min()?
@krishnag5734 · 3 years ago
The min is used to find the nearest neighboring cluster of point i, which belongs to cluster Ci. For that point, compute the mean distance to the points of each other cluster, then take the minimum: that minimum is b(i), and the cluster that produces it is the nearest neighbor. Say you have 5 clusters and point i is in the 1st one: you get 4 mean distances, one for each of the other 4 clusters. The smallest of them is b(i), and its cluster is the one closest to point i.
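The min step described above can be sketched from scratch in Python: compute the mean distance from point i to every other cluster, take the smallest as b(i), and check the result against scikit-learn's `silhouette_samples`. The helper `a_and_b` and the toy data are hypothetical; Euclidean distance is assumed, and point i's own cluster must contain at least 2 points.

```python
import numpy as np
from sklearn.metrics import silhouette_samples


def a_and_b(X, labels, i):
    """Return a(i), b(i) and the nearest other cluster for point i (Euclidean).
    Assumes point i's own cluster has at least 2 points."""
    d = np.linalg.norm(X - X[i], axis=1)      # distance from point i to every point
    own = labels[i]
    same = labels == own
    same[i] = False                           # exclude the point itself
    a = d[same].mean()                        # intracluster distance a(i)
    # mean distance from point i to each *other* cluster
    mean_to = {c: d[labels == c].mean() for c in np.unique(labels) if c != own}
    nearest = min(mean_to, key=mean_to.get)   # the min(): nearest neighbouring cluster
    return a, mean_to[nearest], nearest


# Toy data: three pairs of points along the x-axis (hypothetical)
X = np.array([[0, 0], [0, 1], [5, 0], [5, 1], [10, 0], [10, 1]], dtype=float)
labels = np.array([0, 0, 1, 1, 2, 2])

a, b, nearest = a_and_b(X, labels, 0)
s_manual = (b - a) / max(a, b)
s_sklearn = silhouette_samples(X, labels)[0]
print(a, b, nearest, s_manual, s_sklearn)
```

For point 0, cluster 1 is nearer than cluster 2, so the min picks cluster 1 and b(0) is the mean distance to the two points at x = 5.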
@sandipansarkar9211 · 4 years ago
Superb explanation. I need to get my hands dirty with a Jupyter notebook. Thanks
@padmanabh7031 · 3 years ago
Bhudde
@divyamadhuri126 · 1 year ago
That was super helpful. My doubt is: if my clusters are overlapping, how should I interpret that? Are my data points poorly clustered?
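Overlap shows up directly in the score: points in the overlap region have a(i) close to b(i), so their s(i) is near 0 or negative, and the mean silhouette drops. A sketch comparing two synthetic blobs with the same centres but a small versus a large spread (all values are illustrative assumptions; the helper `mean_silhouette` is hypothetical):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def mean_silhouette(cluster_std):
    """Cluster two blobs centred 4 units apart and return the mean silhouette."""
    X, _ = make_blobs(n_samples=300, centers=[[0, 0], [4, 0]],
                      cluster_std=cluster_std, random_state=0)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    return silhouette_score(X, labels)


separated = mean_silhouette(0.5)    # centres 8 standard deviations apart
overlapping = mean_silhouette(2.0)  # centres 2 standard deviations apart, heavy overlap
print(f"separated: {separated:.3f}, overlapping: {overlapping:.3f}")
```

A low mean with many values near 0 suggests the clusters overlap, not necessarily that the algorithm failed; the data may simply not have sharp boundaries.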
@vignesh7687 · 3 years ago
Thank you Krish, but I could see a small negative value for k = 4 in the plot.
@noobgamerstamilyt9963 · 3 years ago
Yes bro, there is a small negative value in it.
@rhiothelab5251 · 3 years ago
Thanks, sir, for all the content. Suppose I am using DBSCAN and it gives only one cluster; then how do I measure the correctness of the clustering?
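With a single cluster the silhouette is undefined, since b(i) needs at least one other cluster, and scikit-learn's `silhouette_score` raises a `ValueError` unless the number of labels is between 2 and n_samples - 1. A defensive wrapper (the name `safe_silhouette` is hypothetical) that also drops DBSCAN noise might look like this; when it returns None, it may be more useful to revisit `eps`/`min_samples` than to score the result.

```python
import numpy as np
from sklearn.metrics import silhouette_score


def safe_silhouette(X, labels):
    """Mean silhouette, or None when it is undefined after dropping noise (-1)."""
    labels = np.asarray(labels)
    mask = labels != -1                        # DBSCAN noise label
    n_clusters = len(np.unique(labels[mask]))
    # scikit-learn requires 2 <= n_labels <= n_samples - 1
    if n_clusters < 2 or n_clusters >= mask.sum():
        return None
    return silhouette_score(X[mask], labels[mask])


X = np.random.RandomState(0).normal(size=(50, 2))
print(safe_silhouette(X, np.zeros(50, dtype=int)))  # one cluster -> None
```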
@MrChudhi · 3 years ago
Could you please explain how we can use the adjusted Rand score for K-Means models?
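The adjusted Rand index compares two labelings, so it needs ground-truth labels, which real unsupervised problems often lack. A sketch with scikit-learn's `adjusted_rand_score` on synthetic blobs, where the generating labels stand in for ground truth (an assumption); the score ignores how cluster IDs are numbered.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic data; y_true plays the role of known ground-truth labels (assumption)
X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)
y_pred = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

ari = adjusted_rand_score(y_true, y_pred)  # 1.0 = identical partitions, ~0 = random
print(f"ARI = {ari:.3f}")

# Swapping cluster IDs does not change the score
perm = adjusted_rand_score([0, 0, 1, 1], [1, 1, 0, 0])
print(perm)  # 1.0
```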
@sakshamshivhare2474 · 2 years ago
Hi Krish, in an interview I was asked how to interpret the results of K-means clustering and how to label the results. Can you or anyone help me with this question?
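One common way to interpret and label K-means output is to profile each cluster (feature means and sizes) and then name clusters by what distinguishes them. A sketch with pandas on made-up data; the columns, values, and segment names are hypothetical, and on real data the features would usually be scaled before clustering.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical customer data: two made-up groups
rng = np.random.RandomState(0)
df = pd.DataFrame({
    "annual_income": np.r_[rng.normal(30, 5, 100), rng.normal(90, 10, 100)],
    "spending_score": np.r_[rng.normal(70, 8, 100), rng.normal(20, 8, 100)],
})

features = ["annual_income", "spending_score"]
df["cluster"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(df[features])

# Profile each cluster: the mean of every feature, plus the cluster sizes
profile = df.groupby("cluster")[features].mean()
sizes = df["cluster"].value_counts().sort_index()

# Name clusters by what distinguishes them (naming rule is illustrative only)
high_income = profile["annual_income"].idxmax()
names = {c: "high income / low spend" if c == high_income else "low income / high spend"
         for c in profile.index}
df["segment"] = df["cluster"].map(names)

print(profile.round(1))
print(sizes)
```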
@rishisingh5581 · 4 years ago
Sir, will the community class launch today or not?
@sukshithshetty4847 · 2 years ago
Has this code been shown from scratch in some other video?
@rezamohammadi3096 · 3 years ago
Hi @Krish, thanks for the amazing tutorial. I'm using the k-prototypes library (for mixed numerical and nominal data types) and I want to calculate the silhouette index to compare my clustering results with previous studies (e.g. k-medoids). Could you please give me a clue on how to calculate the silhouette index in my case?
@saisidhartha2855 · 3 years ago
I have the same case; kindly let me know whether it is possible. I have used the elbow method for k-prototypes to determine the k value. Looking forward to the silhouette method as well.
@zama-sarib · 2 years ago
@@saisidhartha2855 We can compute a Gower distance matrix and pass it to sklearn's silhouette score with metric="precomputed". Gower distance typically works well with mixed data, such as numerical and categorical data types.
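Following the Gower suggestion, a hand-rolled, simplified Gower distance (range-normalized absolute difference for numeric columns, 0/1 mismatch for categorical ones, averaged over all columns) can be passed to scikit-learn's `silhouette_score` with `metric="precomputed"`. The helper `simple_gower`, the toy data, and the labels are assumptions; in practice the labels would come from something like `KPrototypes.fit_predict` in the `kmodes` package.

```python
import numpy as np
from sklearn.metrics import silhouette_score


def simple_gower(num, cat):
    """Simplified Gower distance matrix for mixed data (hypothetical helper)."""
    num = np.asarray(num, dtype=float)
    cat = np.asarray(cat)
    ranges = num.max(axis=0) - num.min(axis=0)
    ranges[ranges == 0] = 1.0  # avoid division by zero for constant columns
    num_part = np.abs(num[:, None, :] - num[None, :, :]) / ranges  # shape (n, n, p_num)
    cat_part = (cat[:, None, :] != cat[None, :, :]).astype(float)  # shape (n, n, p_cat)
    return np.concatenate([num_part, cat_part], axis=2).mean(axis=2)


# Toy mixed data (hypothetical): [age, income] and [colour, city]
num = [[25, 30_000], [27, 32_000], [60, 90_000], [62, 95_000]]
cat = [["red", "A"], ["red", "A"], ["blue", "B"], ["blue", "B"]]
labels = np.array([0, 0, 1, 1])  # assumed cluster labels

D = simple_gower(num, cat)  # diagonal is 0, as "precomputed" requires
score = silhouette_score(D, labels, metric="precomputed")
print(f"silhouette on Gower distances = {score:.3f}")
```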
@adotac · 2 years ago
Can someone tell me what software he used to draw on the screen?
@anbarasanpm3295 · 8 months ago
How is the point chosen in cluster 1?
@rohitkamra1628 · 4 years ago
What can we expect on the Discord server?
@yashsethi2402 · 2 years ago
What is the name of the plot that you created?
@sagemaker · 9 months ago
Pronunciation: Sil-who-at
@varnikareshma1873 · 2 years ago
Sir, instead of using Euclidean or Manhattan distance, can we use cosine distance? If it is possible, can you please give me a hint on how to use it?
@ashulohar8948 · 2 years ago
Cosine is for text data
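scikit-learn's `silhouette_score` accepts `metric="cosine"`. `KMeans` itself only uses Euclidean distance, so one workaround sketched here (a rough stand-in for spherical k-means, not an exact equivalent) is to L2-normalize the rows first: for unit vectors, the squared Euclidean distance equals 2 × the cosine distance. The data are synthetic assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import normalize

rng = np.random.RandomState(0)
# Two groups that differ in direction; vector lengths vary randomly within each group
A = rng.normal([5, 1], 0.3, size=(50, 2)) * rng.uniform(1, 10, size=(50, 1))
B = rng.normal([1, 5], 0.3, size=(50, 2)) * rng.uniform(1, 10, size=(50, 1))
X = np.vstack([A, B])

# On L2-normalized rows, Euclidean distance is a monotonic function of cosine distance
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(normalize(X))

# Evaluate the clustering with cosine distance instead of Euclidean
score = silhouette_score(X, labels, metric="cosine")
print(f"cosine silhouette = {score:.3f}")
```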
@soniakashyap001 · 4 years ago
Hi Krish, whenever I searched for a method to check the validity of clusters, I only got results about the elbow method. I never came across links about silhouettes. How did you find this method?
@mranaljadhav8259 · 4 years ago
Try checking all the links Google provides; I think you opened only one. I searched for "validity of cluster" like you did and got the right methods, such as silhouette, Dunn index, etc.
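The Dunn index mentioned above is simple to compute by hand: the smallest distance between points in different clusters, divided by the largest cluster diameter (higher is better). A sketch of that original form using SciPy's `pdist`/`cdist`; several variants with other separation and diameter definitions exist, and the helper name `dunn_index` is hypothetical. It assumes at least 2 clusters and at least one cluster with 2 or more points.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist


def dunn_index(X, labels):
    """Min inter-cluster point distance / max intra-cluster diameter."""
    clusters = [X[labels == c] for c in np.unique(labels)]
    # Diameter: largest pairwise distance inside a cluster (0 for singletons)
    max_diameter = max(pdist(c).max() if len(c) > 1 else 0.0 for c in clusters)
    # Separation: smallest distance between points of two different clusters
    min_separation = min(
        cdist(clusters[i], clusters[j]).min()
        for i in range(len(clusters))
        for j in range(i + 1, len(clusters))
    )
    return min_separation / max_diameter


X = np.array([[0, 0], [0, 1], [10, 0], [10, 1]], dtype=float)
labels = np.array([0, 0, 1, 1])
print(dunn_index(X, labels))  # 10.0
```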
@aimem246 · 3 years ago
thx
@hicodeguru · 3 years ago
X has 500 features, but KMeans is expecting 1000 features as input??
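That error usually means the data passed to `predict` was vectorized differently from the training data, for example by fitting a new text vectorizer on the new documents. Assuming that is the cause here, a sketch with `TfidfVectorizer` that reproduces the mismatch and shows the fix (reuse the already-fitted vectorizer's `transform`); the documents are made up.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

train_docs = ["cats purr", "dogs bark loudly", "cats sleep", "dogs fetch sticks"]
new_docs = ["cats purr loudly", "dogs fetch"]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_docs)  # vocabulary (feature columns) learned here
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)

# Wrong: a new vectorizer builds a different vocabulary, so the column count differs
mismatch_caught = False
try:
    km.predict(TfidfVectorizer().fit_transform(new_docs))
except ValueError as err:
    mismatch_caught = True
    print("mismatch:", err)

# Right: reuse the fitted vectorizer so new data gets the same columns
X_new = vectorizer.transform(new_docs)
pred = km.predict(X_new)
print(pred)
```

The same applies to any fitted preprocessing step (scalers, PCA, encoders): fit once on the training data, then only call `transform` on new data.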
@sambit123sahu · 3 years ago
What is the value of |C|?
@ayushgoel9584 · 4 years ago
What happened to the community classes?
@krishnaik06 · 4 years ago
It will be live in some time.
@chandinisaikumar2736 · 4 years ago
Can we use this evaluation method for DBSCAN?
@shubhamthapa7586 · 4 years ago
No, I don't think so.
@chandinisaikumar2736 · 4 years ago
@@shubhamthapa7586 Can you please tell me the evaluation method for DBSCAN? Thanks in advance.
@shubhamthapa7586 · 4 years ago
@@chandinisaikumar2736 Currently there is no single best evaluation metric for DBSCAN. However, you can use the silhouette coefficient as a reference, but you need to optimize DBSCAN's parameters first, which is harder than with K-Means clustering. Here they make good use of the silhouette coefficient with DBSCAN: honeynet.github.io/cuckooml/2016/07/19/clustering-evaluation/
@manikprabhug412 · 4 years ago
Hi sir
@oscarfamousdarteh189 · 4 years ago
Sir, could you please make another video using an uploaded CSV dataset? I'm having some difficulties.
@raviyadav2552 · 3 years ago
si-low-et
@TJ-wo1xt · 3 years ago
Don't rely on Wikipedia; not everything there is correct.
@diosmorbodiosmorbo9547 · 3 years ago
It is, indeed. The idea that "anyone can write anything on Wikipedia" is false. There is a very strict and dedicated community validating every edit someone makes, and even more so with new topics like ML. So stfu