High Quality, High Performance Clustering with HDBSCAN | SciPy 2016 | Leland McInnes

Views: 21,297

Enthought

Comments: 20

@DouglasDuhaime 5 years ago

Leland is truly a gentleman and a scholar

@kevon217 7 months ago

Been on a Leland yt binge as of late, saw this comment, and truly agree.

@lelandmcinnes9501 8 years ago

Thanks to the great people at conda-forge, hdbscan is now available as a conda package (which is by far the easiest way to install it): conda install -c conda-forge hdbscan

@zwitter689 8 years ago

Thanks, very nicely done. I installed hdbscan and am trying to mimic the examples you give, but I can't find the data for the example on "Getting More Information About a Clustering". I like to follow the examples exactly, so a copy of the actual data set you used would be great. Can you help me with this?

@lelandmcinnes9501 8 years ago

It's in the GitHub repository with the notebooks: github.com/scikit-learn-contrib/hdbscan/blob/master/notebooks/clusterable_data.npy

@zwitter689 8 years ago

Thank you, especially for the quick response.

@chengchu88 7 years ago

Dr McInnes, thanks for the great video. I am using HDBSCAN on a large dataset, and I know how to set the 'memory' parameter to cache the hard computation. My question is: after I cache the computation during fitting, how do I change min_cluster_size and min_samples and re-label the same data without going through the time-consuming fitting again? Could you provide a few sample Python lines? Thank you, Cheng

@elivazquez7582 6 years ago

Great video! Great presentation - thanks for doing this!

@rajeshbalakrishnan2228 4 years ago

Wowwww!! One of the best clustering discussions

@shyamsbox 6 years ago

Very nice! We will try HDBSCAN.

@wexwexexort 3 years ago

Great talk!

@enthought 8 years ago

More info on HDBSCAN here: github.com/lmcinnes/hdbscan. See the complete SciPy 2016 Conference talk & tutorial playlist here: kzbin.info/aero/PLYx7XA2nY5Gf37zYZMw6OqGFRPjB1jCy6

@Marin-ct5my 3 years ago

HDBSCAN seems to be capable of producing clusters that share overlapping nodes. Since clustering, for me, is about identifying points shared between clusters, what would I have to do to the algorithm to get those? I was surprised that nobody asked about this and that nothing was said about it, despite it being a possible feature of the algorithm.
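
A possible route to shared points, offered as a sketch rather than the library's prescribed method: the flat labelling from HDBSCAN assigns each point to at most one cluster, but the hdbscan library also provides soft clustering. Fitting with prediction_data=True and calling hdbscan.all_points_membership_vectors(clusterer) yields one membership strength per cluster for every point. The plain-Python helper below assumes such vectors are already available; the function name, the 0.3 threshold, and the example values are hypothetical.

```python
def shared_points(membership, threshold=0.3):
    """Map point index -> clusters in which its membership is >= threshold.

    Only points that qualify for two or more clusters are returned.
    """
    shared = {}
    for index, row in enumerate(membership):
        strong = [cluster for cluster, p in enumerate(row) if p >= threshold]
        if len(strong) >= 2:
            shared[index] = strong
    return shared


# Made-up membership vectors for four points and two clusters.
example_membership = [
    [0.90, 0.05],  # clearly cluster 0
    [0.45, 0.40],  # split between both clusters
    [0.10, 0.10],  # weak everywhere (likely noise)
    [0.35, 0.50],  # split between both clusters
]
overlapping = shared_points(example_membership)
```

The threshold controls how generous the notion of "shared" is, and a sensible value depends on the data.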

@grygoriyzolotarov3228 6 years ago

What is the font you use in your presentations (very appealing)?

@karthik-ex4dm 6 years ago

Great video. Since clustering does not work well in high-dimensional spaces, a pairwise distance matrix should be fine if we are working in high dimensions, right? But even computing the pairwise distances will be computationally expensive in a very high-dimensional space, right? So the best choice must be to find the best features using something like forward feature selection and then run HDBSCAN, right?
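
To make the cost raised above concrete: a full pairwise distance matrix over n points in d dimensions needs n(n-1)/2 distance evaluations, each taking time proportional to d, plus n^2 entries of memory. The number of points, not only the dimension, therefore drives the expense. The hdbscan library accepts such a matrix when constructed with metric='precomputed'. The sketch below is plain Python on made-up points; the function name is hypothetical.

```python
import math


def pairwise_distances(points):
    """Return the symmetric Euclidean distance matrix as nested lists."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Only the upper triangle is computed; the lower one is mirrored.
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


# Made-up 2-D points chosen so the distances are easy to check by hand.
example_points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
distance_matrix = pairwise_distances(example_points)
```

For real workloads a vectorised routine would be far faster than these nested loops, but the quadratic growth in n stays the same.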

@rednax3788 8 years ago

HDBSCAN IS KING

@jennifermew8386 7 years ago

How do you identify noise in HDBSCAN? How does the algorithm tell the difference between outliers and noise?

@ashishkannad3021 6 years ago

The points that are not assigned to any cluster are the noise!
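
In the hdbscan library this shows up in the labels_ attribute of a fitted clusterer, where unclustered points carry the label -1. A separate outlier_scores_ attribute gives a per-point outlier score. A minimal plain-Python sketch of pulling out the noise points, using a made-up label list and a hypothetical helper name:

```python
def split_noise(labels):
    """Split point indices into (clustered, noise) using the -1 convention."""
    clustered = [i for i, label in enumerate(labels) if label != -1]
    noise = [i for i, label in enumerate(labels) if label == -1]
    return clustered, noise


# Made-up labels in the style of clusterer.labels_: two clusters plus noise.
example_labels = [0, 0, -1, 1, 1, -1, 0]
clustered_idx, noise_idx = split_noise(example_labels)
```

With a real clusterer, list(clusterer.labels_) could be passed in place of example_labels.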

@andrewdennis6976 6 years ago

I am running your example code just to play around and keep getting an error: TypeError: descriptor 'get_metric' requires a 'hdbscan.dist_metrics.DistanceMetric' object but received a 'str'. Unfortunately there is not much documentation on this, so it's hard to find fixes. Any help?