Data Analysis 7: Clustering

Data Analysis 7: Clustering - Computerphile

74,458 views

5 years ago

Grouping similar things together - either users with similar habits, or products in an online shop. Dr Mike Pound on Clustering. This is part 7 of the Data Analysis Learning Playlist: • Data Analysis with Dr ...
This Learning Playlist was designed by Dr Mercedes Torres-Torres & Dr Michael Pound of the University of Nottingham Computer Science Department. Find out more about Computer Science at Nottingham here: bit.ly/2IqwtNg
This series was made possible by sponsorship from Google.
This video was filmed and edited by Sean Riley.
Computer Science at the University of Nottingham: bit.ly/nottscomputer
Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com

Comments: 43

@Computerphile 5 years ago

Check out the full Data Analysis Learning Playlist: kzbin.info/aero/PLzH6n4zXuckpfMu_4Ff8E7Z1behQks5ba

@heyandy889 4 years ago

13:58 "It's worked pretty well; it's not perfect." I feel like that should be the slogan for this course and for data science in general.

@AubreyBarnard 5 years ago

To hopefully make it clearer to everyone, the iris labels were hidden from the clustering algorithms and then only used after the fact to see how well the clusters recovered the true labels. So the task was still unsupervised because the supervision was hidden. This is a standard technique for evaluating unsupervised machine learning algorithms.

@williamchamberlain2263 4 years ago

Thanks - was wondering
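
The evaluation described above can be sketched in a few lines of Python. This is only an illustration, not the code from the video: the function names `contingency` and `purity` and the six toy labels are assumptions made for the example. The clustering step never sees `true_labels`; they are used afterwards to count how often each cluster lines up with each species.

```python
from collections import Counter

def contingency(true_labels, cluster_ids):
    """Count how often each (cluster, true label) pair occurs."""
    return Counter(zip(cluster_ids, true_labels))

def purity(true_labels, cluster_ids):
    """Sum each cluster's largest label count and divide by the number of points."""
    majority = {}
    for (cluster, _label), count in contingency(true_labels, cluster_ids).items():
        majority[cluster] = max(majority.get(cluster, 0), count)
    return sum(majority.values()) / len(true_labels)

# Toy data (hypothetical, not the real iris results).
true_labels = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
cluster_ids = [0, 0, 1, 1, 1, 2]  # what a clustering algorithm might have produced

print(contingency(true_labels, cluster_ids))
print(purity(true_labels, cluster_ids))
```

Purity is one simple score among several; it rewards clusters that are dominated by a single true label but does not penalise splitting one species across many clusters.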

@Wolves2314 5 years ago

I always click on a new Mike Pound Computerphile video at first sight.

@LittleLightCZ 1 year ago

I love videos like these. I believe that one day people won't say "I learned that in college", but "I went on KZbin and it was all there" instead.

@PaulMuston 4 years ago

How meta. This WAS the video that was recommended for me to watch.

@sajadmalik9097 3 years ago

I love this guy! He is amazing at these things... Thanks, friend, this is a lifetime charity. Everyone is going to learn from this free of cost.

@lumine2205 5 years ago

You're gonna be so happy after google recommends you the Saw movie! A whole movie dedicated to a SAW!

@kieranklaassen 5 years ago

best series ever! :) thanks so much for this

@ec92009y 2 years ago

Wood turning videos are fascinating. Can’t wait to see what the KZbin algorithm recommends next for me.

@JohnsonLobster 5 years ago

I watch Computerphile videos the same way Mike watches woodturning videos.

@jacques5301 5 years ago

Just submitted my last paper for my masters degree that HEAVILY relies on clustering algorithms. I really wish this video was released 2 years ago.

@jacques5301 5 years ago

Also, please consider making a video on big O complexity. I think it would go really well with this topic.

@HighlyShifty 3 years ago

Thanks so much for this, a great look at clustering

@rnarith855 2 years ago

Very clear explanation

@williamchamberlain2263 4 years ago

Depending on which language(s) you're using, DBSCAN libraries can be worth a look

@hans-edwardhoene8333 3 years ago

If you use the PAM clustering algorithm with an outlier, as in your example, is it possible that PAM would assign the outlier to its own group? In other words, is it possible that the lowest error would be achieved by assigning the outlier to one group and everything else to another group?
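
The question above can be explored with a small experiment. The sketch below is not PAM itself (PAM uses a build-and-swap heuristic); it is a brute-force search over every pair of candidate medoids on a hypothetical one-dimensional data set, assuming the usual k-medoids objective of summed distance from each point to its nearest medoid. The names `total_cost` and `best_medoids` and the data are made up for the example.

```python
from itertools import combinations

def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(abs(p - m) for m in medoids) for p in points)

def best_medoids(points, k):
    """Exhaustively pick the k data points that minimise the total cost."""
    return min(combinations(points, k), key=lambda ms: total_cost(points, ms))

# Hypothetical data: five nearby values and one far-away outlier.
points = [0, 1, 2, 3, 4, 100]
medoids = best_medoids(points, 2)
print(medoids, total_cost(points, medoids))
```

On this toy data the lowest-cost pair of medoids is 2 and 100, so the outlier does end up as a cluster of its own. With a less extreme outlier, or more structure in the rest of the data, splitting the main group can be cheaper instead.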

@Ma8t 4 years ago

Hi, thanks for these great videos. At 13:35, I'm missing maximise_diag function, in which package is it?

@sebastiangilbert9105 4 years ago

I am having the same problem, did you get an answer?

@jasper939393 4 years ago

I can't find any function for this either. I'll come back if I find it.

@user-hq8tl4oc9o 4 years ago

Hello. I'm having the same issue, did you get an answer?

@xfactor7923 2 years ago

Same question...No answers yet
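
The thread does not say where `maximise_diag` comes from, and nothing here identifies its package. Going only by the name, a function like that presumably reorders the columns of a cluster-versus-label table so that the largest counts sit on the diagonal, which matters because cluster numbers are arbitrary. The Python sketch below is a hypothetical stand-in under that assumption, not the function from the video; `maximise_diagonal_sketch` and the example table are invented for illustration, and the brute-force search is only sensible for a handful of clusters.

```python
from itertools import permutations

def maximise_diagonal_sketch(table):
    """Reorder the columns of a square table to maximise the diagonal sum.

    Hypothetical helper: tries every column order, so use it only for small tables.
    """
    n = len(table)
    best = max(
        permutations(range(n)),
        key=lambda perm: sum(table[i][perm[i]] for i in range(n)),
    )
    return [[row[j] for j in best] for row in table]

# Hypothetical table: rows are true classes, columns are cluster ids.
table = [
    [0, 50, 0],
    [2, 0, 48],
    [36, 0, 14],
]
for row in maximise_diagonal_sketch(table):
    print(row)
```

After reordering, the diagonal holds the counts that agree with the best one-to-one matching of clusters to classes, and the off-diagonal entries are the points that ended up in the "wrong" cluster.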

@vsandu 1 year ago

Brilliant, cheers!

@Andrewsarcus 5 years ago

I purchased a wood turning lathe

@adamcetinkent 5 years ago

Is there a degree to which the dimensions are weighted when you cluster? Or would you apply the weighting to your data before clustering them?

@Jupiter__001_ 5 years ago

I assume you would do this with the dimensions that come out of PCA, which implies that they have already been weighted.
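
For distance-based clustering with Euclidean distance, the two options in the question come to the same thing: multiplying each dimension by a weight before clustering gives the same distances as using a weighted distance on the raw data. A minimal Python sketch of that equivalence follows; the names `apply_weights` and `weighted_distance` and the example weights are assumptions for illustration, not anything shown in the video.

```python
import math

def apply_weights(points, weights):
    """Scale every dimension of every point by its weight."""
    return [tuple(w * x for w, x in zip(weights, p)) for p in points]

def weighted_distance(a, b, weights):
    """Euclidean distance in which dimension i counts weights[i] times as much."""
    return math.sqrt(sum((w * (x - y)) ** 2 for w, x, y in zip(weights, a, b)))

# Hypothetical example: the second dimension is given half the influence.
a, b = (0.0, 0.0), (3.0, 4.0)
weights = (1.0, 0.5)

scaled_a, scaled_b = apply_weights([a, b], weights)
print(math.dist(scaled_a, scaled_b))     # distance after pre-scaling the data
print(weighted_distance(a, b, weights))  # weighted distance on the raw data
```

Standardising each column to unit variance before clustering is a special case of this, with each weight set to one over that column's standard deviation.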

@MartinMaat 2 years ago

If you ever wondered about the expression "high brow", here's your example.

@Peter-fy3zj 1 year ago

Could I use kmeans for hyperspectral images ?

@guitarislife01 3 years ago

This video was recommended to me after watching an MIT lecture on clustering lol

@rabidbigdog 5 years ago

The joy of wood? Dr Mike could become the Nick Offerman of the UK.

@alecksandrborovkov7602 2 years ago

:) thanks

@SM-vo5gj 1 year ago

We all end up watching the wood turning videos, lol

@ArunKumar-yb2jn 2 years ago

If KZbin can't subtitle properly a British accent, I have no hope in Artificial Intelligence.

@darceysinclair8929 2 years ago

PSA: KZbin uses a mix between association analysis and clustering analysis

@ElPasoJoe1 4 years ago

K nearest neighbors...

@ramixnudles7958 5 years ago

Only at 11 min in, but, erm, confused. You're clustering, but you don't have any idea of what you're clustering on? You aren't clustering on a dimension, e.g. music genre? What gives you a red/blue division in the first place - "it just looks like here's a cluster... And that, there, is a cluster. Now, let's make 'em fit..."? I'm not understanding.

@AubreyBarnard 5 years ago

Clustering is just defining groups based on similarity. That similarity could be based on one or on any number of attributes. It is unsupervised which means there are no labels in the data. Now, the algorithms label each point with the same label as the closest cluster center, thereby labeling each cluster. Then the cluster centers are adjusted to better reflect the cluster members, and then the labels are updated based on the new cluster centers (which can assign points to different clusters compared to the previous round). Repeat until convergence, which is when no points are assigned to a different cluster and when the cluster centers stay the same. This is when the "best fit" is achieved.
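
The loop described above (label each point by its nearest center, move the centers, repeat until nothing changes) is the standard k-means iteration, and it can be sketched in plain Python. This is a minimal illustration rather than the code used in the video: the function name `kmeans`, the toy points, and the hand-picked starting centers are all assumptions. Real implementations usually pick starting centers at random and rerun several times.

```python
import math

def kmeans(points, centers, max_iter=100):
    """Alternate assignment and center-update steps until nothing changes."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: label each point with the index of its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Update step: move each center to the mean of the points assigned to it.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, new_labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(old)  # an empty cluster keeps its old center
        # Convergence: no point changed cluster and no center moved.
        if new_labels == labels and new_centers == centers:
            break
        labels, centers = new_labels, new_centers
    return labels, centers

# Hypothetical toy data: two well-separated groups in two dimensions.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, centers=[(0, 0), (10, 10)])
print(labels)
print(centers)
```

Nothing in the loop refers to a named attribute such as genre; the only input is the distance between points, which is why the resulting groups have to be interpreted after the fact.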