HDBSCAN, Fast Density Based Clustering, the How and the Why - John Healy

  61,798 views

PyData


Comments: 40
@benhurrodriguez1807 8 months ago
Presentation Skills: 100000/10
@RajatSaxena35 2 years ago
Presentation Skills: 10/10
@stanleykurniawan3053 2 months ago
+ 1000 aura
@reocam8918 2 years ago
Nice presentation, I see 200% confidence and eloquence
@alexanderdevaux661 3 years ago
this is exactly what I have been looking for! great presentation.
@-beee- 1 year ago
Wow, what a great talk! Love the intuitive explanations and visuals. Super helpful. Thank you!
@21rufus21 1 year ago
Absolutely fantastic presentation, thank you
@MrRaisin56 2 years ago
Wow I love the enthusiasm! It really makes it so much nicer to watch. Very insightful as well thank you very much!
@vunder8737 4 months ago
This truly was a wonderful presenter, would love to listen to him on other presentations
@hannahnelson4569 8 months ago
A very impressive presentation and algorithm! Thank you for teaching all this!
@alaaelhadba7310 1 year ago
Thank you so much. It was exactly what I was looking for 🎉🎉
@valeryzuev3957 3 years ago
15:30 there might be a misprint in the formula: d(X_i, X_j), not d(X_j, X_j)
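For reference, a sketch of what the corrected formula would look like, assuming the formula at 15:30 is the mutual reachability distance used by HDBSCAN (an assumption, since the slide is not reproduced here). The symbols $\mathrm{core}_k$ and $d_{\mathrm{mreach}\text{-}k}$ follow the notation commonly used for HDBSCAN and may differ from the slide:

```latex
d_{\mathrm{mreach}\text{-}k}(X_i, X_j)
  = \max\bigl\{\, \mathrm{core}_k(X_i),\; \mathrm{core}_k(X_j),\; d(X_i, X_j) \,\bigr\}
```

Here $\mathrm{core}_k(X)$ is the distance from $X$ to its $k$-th nearest neighbor. With the misprint $d(X_j, X_j)$ the last term would always be zero, so the distinct indices matter.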
@jiayangcheng 4 months ago
Love the presentation. Great work!
@opelfrost 2 months ago
Thanks a lot, learned a lot from this presentation.
@pankajgoikar4158 2 years ago
Awesome presentation.
@honey-py9pj 2 years ago
what an amazing speaker!
@maximillianweil2672 1 year ago
Thank you for the super interesting talk! I was wondering if you have worked with the new HDBSCAN integrated in sklearn 1.3.0? Is it possible to draw the cluster tree with this implementation?
@RoulDukeGonzo 7 months ago
Any luck?
@vampierkill 2 years ago
Sorry, had to comment because of the kiiiiiiick-ass animation! Brilliant.
@danaizenberg2402 1 year ago
great talk
@TrixieFromSanFran 2 years ago
The coloring of the tree at 14:00 is needlessly confusing. See figure 3a in their paper (McInnes & Healy 2017) to clarify things.
@sushilkhadka-iu3gf 1 year ago
that was a great talk!
@edwardmalthouse973 1 month ago
Thank you for your presentation. It was very helpful. I'm not sure about the claim that k-means requires small amounts of data. I believe k-means is O(n) (assuming a small number of dimensions and iterations) and I have used it on very large data sets without problems. I would also like to respectfully push back on the spherical cow comment. While it certainly depends on the domain, in social science and business applications with large, noisy data sets, the spherical, or at least elliptical, assumption often works very well, and produces better assumptions than the more nonparametric algorithms. It's easy to construct mathematical examples with odd-shaped clusters, but I've not encountered them in practice, although it could just be due to the domains I work in.
@daisyondwari9795 1 month ago
👀
@nihshrey 1 year ago
Amazing
@ahmedayman2380 1 year ago
Can someone tell me his LinkedIn or his full name, please, or how to connect with him?
@RoulDukeGonzo 7 months ago
0:24 name and email
@RoulDukeGonzo 7 months ago
Any idea why the GPU version of this method can't take a pre-computed distance matrix?
@scatteredvideos1 5 months ago
There is a RAPIDS version of HDBSCAN. I'm personally struggling to get the dependencies working together, but it does exist.
@RoulDukeGonzo 5 months ago
@scatteredvideos1 I think that's what I used... Anyway, I'll give it another go.
@scatteredvideos1 5 months ago
To be honest, the speed-up really isn't even that great; it's only partially parallelized with GPUs. It's better just to reduce the dimensionality of your data: PCA to 95% of explained variance, then UMAP to 10 or so dims, then cluster using HDBSCAN. I've found doing a grid search over a bunch of different HDBSCAN parameters can be helpful if you aren't getting perfect clustering.
@scatteredvideos1 5 months ago
With 10 UMAP dims and 184k data points, my clustering is done in about 7 s on a Google Colab high-RAM CPU instance.
@RoulDukeGonzo 5 months ago
@scatteredvideos1 I haven't tried GPU-accelerated HDBSCAN, but for other clustering algorithms, the difference between CPU and GPU is night and day (so I was expecting it to be so here). I'm clustering embedding data from LLMs so it's extremely dense and uncorrelated, so PCA hasn't been much use (at least in my hands).
@pahulhallan 2 years ago
27:50 Installation
@0MVR_0 7 months ago
Clustering is highly driven by the formatting of how the data relates to itself and is near impossible to accomplish using a single method of approach.
@RoulDukeGonzo 7 months ago
Agree, but in practical terms, where do you start?
@0MVR_0 7 months ago
@RoulDukeGonzo An intimate descriptive knowledge of the data is recommended.
@laughingsaeed 3 months ago
I don't know why he's talking so fast! Is someone after him and he needs to run away?!