THE best explanation I have found for this concept. Very well explained with examples, which is exactly what I was looking for. Everyone else was explaining what the curse of dimensionality is and why it happens only theoretically. Thank you.
@karimelmokhtari9559 4 years ago
The curse of dimensionality is often overlooked and is behind unexpected predictions. Thank you for explaining this issue through a simple approach.
@ritvikmath 4 years ago
no problem!
@RajaSekharGowda 4 years ago
I just saw your playlist... It was wonderful for data science enthusiasts like me... Finally I found the best channel 👍
@ritvikmath 4 years ago
Wow, thanks!
@treelight1707 4 years ago
First time I've gotten this explanation of the topic. I always assumed it was just about the computation time, which can increase exponentially, not that the algorithms fail and a LOT more samples are needed.
@rakkaalhazimi3672 3 years ago
You explain it precisely, in simple and subtle ways. Now I have an idea of what to write on my blog :D thanks.
@derBolide 3 years ago
Thank you for this well-structured video! I have two questions though: 1. Why does the data form these peaks in the histogram? More precisely, why are there exactly so many points at, for example, a distance of 10 to each other and at a distance of 40? 2. How come that, if we have such clearly separated groups of points, e.g. at a distance of 10 and a distance of 40, we can't separate them with nearest neighbor? Isn't it easy to say that if a point has a lot of points around it at distance 10, then it belongs to class A?
@gmermoud 2 years ago
You are absolutely right. This video is actually quite wrong. If you differentiate the histograms of intra- and inter-cluster distance, you see this clearly. If anything, the curse of dimensionality is here a blessing, as it makes the inter- and intra-cluster distances more sharply differentiated. What is true, however, is that, for a given cluster, the intra-cluster distances become all essentially the same as dimensionality increases. Check out Zimek, A.; Schubert, E.; Kriegel, H.-P. (2012). "A survey on unsupervised outlier detection in high-dimensional numerical data". Statistical Analysis and Data Mining. 5 (5): 363-387 for some discussions about when the curse of dimensionality is really a problem (and when it is actually a blessing). The Wikipedia article is also a good reference.
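If anyone wants to check this themselves, here is a minimal sketch of my own (assuming numpy and scipy are available; it is not the notebook from the video) that compares intra- vs inter-cluster pairwise distances as the dimensionality grows:

import numpy as np
from scipy.spatial.distance import cdist, pdist

rng = np.random.default_rng(0)
n = 200  # points per cluster

for d in [2, 10, 100, 1000]:
    # two Gaussian clusters whose centres are shifted in every dimension
    a = rng.normal(loc=0.0, scale=1.0, size=(n, d))
    b = rng.normal(loc=3.0, scale=1.0, size=(n, d))

    intra = pdist(a)                # distances within cluster a (unique pairs)
    inter = cdist(a, b).ravel()     # distances between clusters a and b

    print(f"d={d:5d}  intra mean={intra.mean():7.2f} std={intra.std():4.2f}  "
          f"inter mean={inter.mean():7.2f} std={inter.std():4.2f}")

With clusters like these you should see both families of distances grow roughly like sqrt(d) while their spreads stay roughly constant, so each peak becomes relatively sharper, yet the inter-cluster peak stays well above the intra-cluster one.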
@younique9710 1 year ago
@@gmermoud That means, even though we use a high dimension, if we observe different distances between different pairs, we would be good to go ahead with an analysis, because the different distances imply that the distances between data points in the high dimension are not all equal?
@JT-js3uf 1 year ago
@gmermoud: Interesting point. Doesn't the notion of similarity break down, especially if one is relying on distance measures such as Euclidean distance, as instances from all classes become so spread out that instances from different classes end up relatively very close together (unless the signal is located on lower-dimensional manifolds)? If not, why do kNN and self-organising maps scale so poorly to very high-dimensional data?
@modakad 3 months ago
Amazing explanation. The #variables vs. #samples-needed graph is an eye opener. I have a question on the initial part: in the first part, on Euclidean pairwise distance vs. #dimensions, even though the max-min distance is shrinking, the ranking of distances will (or might) still hold, irrespective of #dimensions. If that's the case, the algorithms should not lose any discriminative power in theory. In practice, yes, the strain this might put on compute requirements can make it impractical, and hence the need to reduce dimensions. Does this make sense? Would love to know everyone's thoughts.
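For anyone curious, the shrinkage I mean can be checked with a rough sketch like this (my own, assuming numpy and scipy): compute the relative contrast (max - min) / min of the distances from one query point to a set of random points as the number of dimensions grows.

import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
n = 500  # random points, uniform in the unit hypercube

for d in [2, 10, 100, 1000]:
    points = rng.uniform(size=(n, d))
    query = rng.uniform(size=(1, d))
    dists = cdist(query, points).ravel()    # distances from the query to every point
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:5d}  (max - min) / min = {contrast:.3f}")

For i.i.d. features this contrast tends toward 0 as d grows, so the ranking of neighbours still exists but becomes increasingly dominated by noise; that, rather than just compute cost, is the usual discriminative-power worry.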
@exmanitor 1 year ago
Very good explanation, exactly what I was looking for.
@samwhite4284 4 years ago
These videos are super helpful for intuition, cheers
@ritvikmath 4 years ago
Glad you like them!
@diegososa5280 4 years ago
Brilliant once again, cheers
@ritvikmath 4 years ago
thanks :)
@honeyBadger582 4 years ago
Great video as always! instructive and very easy to understand
@ritvikmath 4 years ago
Glad to hear it!
@fadouamassaoudy7832 5 months ago
Can you give us the link to the notebook, please?
@grandthruadversity 4 years ago
Hey, can you make a video on regularization, like the lasso, etc.?
@san_lowkey 5 days ago
thank you for this clear explanation
@SeyiTopeOgunji 1 year ago
Is this only applicable to (or more pronounced with) the KNN classifier, or does it apply to other classifiers as well?
@christiansetzkorn6241 3 years ago
great intuitive presentation
@sanjaykrish8719 3 years ago
Whooo!! phenomenal explanation.. ❤
@ritvikmath 3 years ago
Glad you liked it!
@kyleciantar307 4 years ago
Can you clarify for me what the y-axis in the histograms represent? My understanding is that the x-axis is the distance between two measured points, and the height of the graph represents the number of times that distance occurs. My question is then why does the y-axis use decimal values if we are counting the number of times the same measurement is made?
@srividhyasainath9297 3 years ago
Hey, I have the same question. Were you able to crack it? Let me know if you did
@lukestorer4399 3 years ago
@@srividhyasainath9297 It's the probability density function, i.e. how often that value occurs.
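For example (a small sketch of my own, assuming numpy and matplotlib): with density=True the y-axis is a probability density, so the bars are scaled so the total area is 1 and the heights can be decimals; with the default you get raw counts.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
distances = rng.normal(loc=10, scale=2, size=1000)  # stand-in for the pairwise distances

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(distances, bins=30)                 # y-axis: raw counts (integers)
ax1.set_title("counts")
ax2.hist(distances, bins=30, density=True)   # y-axis: probability density (area sums to 1)
ax2.set_title("density")
plt.show()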
@saggarwal01 8 months ago
Mind blown! Thank you Data god
@redherring0077 3 years ago
💜💜💜. Excellent excellent excellent. I am sharing these videos with my colleagues left and right 😛😋. Can you please do a video on multivariate adaptive regression splines and also a series on image processing? Love your videos.
@elvykamunyokomanunebo1441 2 years ago
What happens if you standardize the features beforehand in the case of KNN?
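For reference, this is roughly what I mean by standardizing beforehand (a sketch with made-up toy data, assuming scikit-learn): a pipeline that applies StandardScaler before KNeighborsClassifier, so every feature contributes on the same scale to the Euclidean distance.

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# toy data where one feature is on a wildly different scale
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X[:, 0] *= 1000

knn_raw = KNeighborsClassifier(n_neighbors=5)
knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

print("raw   :", cross_val_score(knn_raw, X, y, cv=5).mean())
print("scaled:", cross_val_score(knn_scaled, X, y, cv=5).mean())

My understanding is that scaling fixes the problem of one feature dominating the distance, but it does not remove the curse of dimensionality itself: with many (scaled) dimensions the distances still concentrate.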
@Phil-oy2mr 4 years ago
Can you further explain why there are 2 (or 3) peaks on the histogram? I was thinking there would be n peaks, where n is the number of dimensions, assuming there are prevalent clusters.
@kdhlkjhdlk 4 years ago
There should be one peak, but he chose a silly example where there are two clusters that are perfectly separated in every dimension.
@jairjuliocc 4 years ago
Thank you. Very intuitive
@ritvikmath 4 years ago
You're very welcome!
@peterw8780 10 months ago
Excellent explanation
@ritvikmath 10 months ago
Glad it was helpful!
@keshavsharma267 4 years ago
Clearly explained. Thanks
@ritvikmath 4 years ago
You are welcome!
@pushkarparanjpe 3 years ago
Great explanation. Thanks! The dataset used to demonstrate the curse was randomly generated. Does this curse hold true even for "regular" datasets, say something like a term-count matrix or a tf-idf matrix computed from actual real-world documents? The vocabulary sizes of common NLP problems often run into many thousands, yet tf-idf is applied and seems to work decently. Is it immune to this curse?
@salarbasiri5959 4 years ago
Great explanation thanks
@chyldstudios 4 years ago
Very nice!
@ritvikmath 4 years ago
Thanks!
@paulfaulker4201 3 years ago
Hi, could you share the notebook in this video? I have tried finding it on your GitHub, but couldn't find it. Thanks.
@gorgolyt 1 year ago
I don't think the first example is great. Why are there two strong peaks? You didn't explain this. Also, why is this a problem? For a k-NN classifier with 2 clusters, isn't this exactly what you want? You want there to be a clear distinction between points in the same cluster that are close (the first peak) and points in different clusters (the second peak). Also it doesn't make sense that you always seem to have a significant residual of points with zero distance.
@senthilkumaran6812 3 months ago
The plot shows the distance between each point and every other point. According to the plot, every point is a large distance (like 5 or 7) from all the others, so no point is meaningfully a nearest neighbour of any other point.
@EW-mb1ih 3 years ago
Maybe you could have added a slide with the "takeaway". Anyway, nice video