Been on a Leland yt binge as of late, saw this comment, and truly agree.
@lelandmcinnes9501 8 years ago
Thanks to the great people at conda-forge, hdbscan is now available as a conda package (which is by far the easiest way to install it): conda install -c conda-forge hdbscan
@zwitter689 8 years ago
Thanks, very nicely done. I installed hdbscan and am trying to mimic the examples you give but I can't find the data for the example on "Getting More Information About a Clustering". I like to follow the examples exactly so a copy of the actual data set you used would be great, can you help me with this?
@lelandmcinnes9501 8 years ago
It's in the github repository with the notebooks: github.com/scikit-learn-contrib/hdbscan/blob/master/notebooks/clusterable_data.npy
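For anyone following along, here is a minimal sketch of loading that file and clustering it. It assumes clusterable_data.npy has been downloaded next to the script and holds a 2-D array of points. The helper names and min_cluster_size=15 are illustrative choices, not something stated in this thread.

```python
import numpy as np


def load_points(path):
    """Load a point cloud saved with np.save and check that it is 2-D."""
    data = np.load(path)
    if data.ndim != 2:
        raise ValueError(f"expected (n_samples, n_features), got shape {data.shape}")
    return data


def fit_hdbscan(data, min_cluster_size=15):
    """Fit HDBSCAN; min_cluster_size=15 is an assumed, illustrative value."""
    import hdbscan  # imported lazily so load_points works without it

    return hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit(data)


# Usage (assumes the file sits in the working directory):
#   points = load_points("clusterable_data.npy")
#   clusterer = fit_hdbscan(points)
#   print(np.unique(clusterer.labels_))  # -1, if present, marks noise
```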
@zwitter689 8 years ago
Thank you and especially for the quick response.
@chengchu88 7 years ago
Dr McInnes, thanks for the great video. I am using the HDBSCAN on a large dataset, and I know how to set 'memory' parameter to cache the hard computation. My question is, after I cache the computation during fitting, how do I change the min_cluster_size and min_sample_size and re-label the same data without going through the time-consuming fitting again? Could you provide a few sample python lines? thank you, Cheng
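A hedged sketch of one way to do this: as far as I understand the library, the `memory` cache stores the expensive tree-building step, which depends on `min_samples` (the parameter the question calls min_sample_size) but not on `min_cluster_size`. So refitting with the same data, the same cache directory, and a fixed `min_samples` should reuse the cached tree, while changing `min_samples` triggers a fresh computation. Note that `min_samples` defaults to `min_cluster_size` when left unset, so it is pinned explicitly here. The helper names, the value 10, and the cache directory name are all assumptions for illustration.

```python
def sweep_kwargs(cluster_sizes, min_samples, cache_dir):
    """Build HDBSCAN keyword arguments that differ only in min_cluster_size."""
    return [
        {"min_cluster_size": size, "min_samples": min_samples, "memory": cache_dir}
        for size in cluster_sizes
    ]


def relabel(data, cluster_sizes, min_samples=10, cache_dir="hdbscan_cache"):
    """Return {min_cluster_size: labels}, refitting against the same cache."""
    import hdbscan  # imported lazily; only needed when actually clustering

    labels = {}
    for kwargs in sweep_kwargs(cluster_sizes, min_samples, cache_dir):
        # Same data + same min_samples + same cache dir: the tree should be
        # read back from disk, and only the cluster extraction is redone.
        labels[kwargs["min_cluster_size"]] = hdbscan.HDBSCAN(**kwargs).fit_predict(data)
    return labels


# Usage (data is an (n_samples, n_features) array):
#   labels_by_size = relabel(data, cluster_sizes=[5, 15, 50])
```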
@enthought 8 years ago
More info on HDBSCAN here: github.com/lmcinnes/hdbscan. See the complete SciPy 2016 Conference talk & tutorial playlist here: kzbin.info/aero/PLYx7XA2nY5Gf37zYZMw6OqGFRPjB1jCy6
@elivazquez7582 6 years ago
Great video! Great presentation - thanks for doing this!
@rajeshbalakrishnan2228 4 years ago
Wowwww!! One of the best clustering discussions
@grygoriyzolotarov3228 6 years ago
What is the font you use in your presentations (very appealing)?
@Marin-ct5my 3 years ago
HDBSCAN seems to be capable of producing clusters which share overlapping nodes. Given that clustering, for me, is about identifying shared points between clusters, what would I have to do to the algorithm to get those? I was surprised that nobody asked about this and that nothing was said about it, despite it being a possible feature of the algorithm.
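One possible route, sketched under assumptions: the `labels_` attribute gives each point a single label, but the hdbscan library also offers soft clustering, where `hdbscan.all_points_membership_vectors` returns a per-cluster membership strength for every point (the clusterer has to be fit with `prediction_data=True`). Points with noticeable membership in two or more clusters could then be treated as "shared". The helper names and the 0.2 threshold are hypothetical choices, not part of the library.

```python
import numpy as np


def shared_points(membership, threshold=0.2):
    """Indices of points whose membership is >= threshold in 2+ clusters."""
    strong = np.asarray(membership) >= threshold
    return np.flatnonzero(strong.sum(axis=1) >= 2)


def soft_memberships(data, min_cluster_size=15):
    """Fit HDBSCAN and return an (n_samples, n_clusters) membership array."""
    import hdbscan  # imported lazily; only needed when actually clustering

    clusterer = hdbscan.HDBSCAN(
        min_cluster_size=min_cluster_size,  # assumed, illustrative value
        prediction_data=True,  # required for membership vectors
    ).fit(data)
    return hdbscan.all_points_membership_vectors(clusterer)


# Usage:
#   membership = soft_memberships(data)
#   overlap = shared_points(membership, threshold=0.2)
```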
@karthik-ex4dm 6 years ago
Great video... Since clustering does not do well in high-dimensional spaces, the pairwise distance matrix should be fine if we are working in high-dimensional spaces, right? But even computing the pairwise distances will be computationally expensive for very high-dimensional spaces, right? So the best choice must be to find the best features using something like forward feature selection and then run hdbscan, right?
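To make the cost concrete, here is a rough sketch of the precomputed-distance route. For n points in d dimensions the matrix takes on the order of n² · d operations to build and n² entries to store, so it is the number of points, more than the dimension, that tends to hurt. hdbscan accepts such a matrix via `metric="precomputed"`. The helper names are illustrative, and the broadcasting trick below also allocates an n × n × d intermediate, so it is only meant for small inputs.

```python
import numpy as np


def pairwise_euclidean(X):
    """Dense (n, n) Euclidean distance matrix; O(n^2 * d) time and memory."""
    X = np.asarray(X, dtype=np.float64)
    diff = X[:, None, :] - X[None, :, :]  # shape (n, n, d)
    return np.sqrt((diff ** 2).sum(axis=-1))


def cluster_precomputed(distances, min_cluster_size=15):
    """Cluster from a distance matrix instead of raw features."""
    import hdbscan  # imported lazily; only needed when actually clustering

    clusterer = hdbscan.HDBSCAN(
        metric="precomputed",
        min_cluster_size=min_cluster_size,  # assumed, illustrative value
    )
    return clusterer.fit_predict(distances)


# Usage:
#   labels = cluster_precomputed(pairwise_euclidean(X))
```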
@jennifermew8386 7 years ago
How do you identify noise in HDBSCAN? How does the algorithm tell the difference between outliers and noise?
@ashishkannad3021 6 years ago
The points that are not assigned to any cluster are the noise!
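In code, that looks roughly like the sketch below. In the hdbscan library, points left out of every cluster get the label -1 in `labels_`, and a fitted clusterer also exposes `outlier_scores_`, a per-point score where higher means more outlier-like. The helper names and the 0.9 quantile cutoff are illustrative assumptions.

```python
import numpy as np


def noise_mask(labels):
    """Boolean mask that is True where a point was left unclustered (-1)."""
    return np.asarray(labels) == -1


def strongest_outliers(scores, quantile=0.9):
    """Indices of points whose outlier score exceeds the given quantile."""
    scores = np.asarray(scores, dtype=float)
    cutoff = np.quantile(scores, quantile)
    return np.flatnonzero(scores > cutoff)


# Usage with a fitted clusterer (hdbscan.HDBSCAN(...).fit(data)):
#   noise = data[noise_mask(clusterer.labels_)]
#   outliers = data[strongest_outliers(clusterer.outlier_scores_)]
```

The two views can disagree: a point can belong to a cluster and still have a high outlier score if it sits at the sparse edge of that cluster.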
@wexwexexort 3 years ago
great talk!
@andrewdennis6976 6 years ago
I am running your example code to just play around and keep getting an error: TypeError: descriptor 'get_metric' requires a 'hdbscan.dist_metrics.DistanceMetric' object but received a 'str'. Unfortunately there is not much documentation on this, so it's hard to find fixes. Any help?
@shyamsbox 6 years ago
Very nice! We will try HDBSCAN.
@rednax3788 8 years ago
HDBSCAN IS KING
@KeshavDial 5 years ago
For anyone who was looking for Christian Hennig's PyData talk: kzbin.info/www/bejne/g5eZfqR_iJekopY