For a Ball tree, you would also need to scan through every point in region 7, since region 7 is a leaf node. For example, this would matter if a point (7, 6) existed.
@learndataa A year ago
Thank you for your comment. I appreciate your engagement and your point about the Ball tree algorithm. Yes, if additional points existed, they would have to be considered during the search process.
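To make "considered during the search" concrete, here is a minimal sketch (my own, not from the video) of the standard Ball tree pruning test: a leaf ball must be scanned only when it could contain a point closer than the best neighbor found so far.

```python
import math

def must_scan_leaf(query, centroid, radius, best_dist):
    # A ball can contain a closer neighbor only if it intersects the
    # sphere of radius best_dist around the query point.
    dist_to_centroid = math.dist(query, centroid)
    return dist_to_centroid - radius < best_dist

# Say region 7 is a ball with centroid (6, 5) and radius 2, and our
# current best neighbor is 1.5 away from the query (5, 5): the ball
# overlaps the search sphere, so its points must be scanned.
print(must_scan_leaf((5, 5), (6, 5), 2, 1.5))   # True -> scan the leaf

# A far-away ball gets pruned without scanning any of its points.
print(must_scan_leaf((0, 0), (10, 0), 2, 1.0))  # False -> skip it
```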
@sand9282 A year ago
Please write out the conditions you use when splitting the trees. Additionally, I would suggest using letters like A, B, C, or C1, C2, C3 to designate circle names. Writing numbers when designating circles can confuse first-time learners about their actual meaning. One more thing: I would suggest elaborating a bit more on finding the centroid. While we know that the median is the middle value in a set of data points, it would be helpful to explain how to find the centroid, since we will be applying all these concepts to a dataset. Overall, the video is short and nice.
@learndataa A year ago
Thank you for your feedback. Hoping the below helps.

Ball tree circle condition:
- Draw a circle centred at the centroid, with a radius equal to the distance from that centroid to the farthest point in that half.

How to calculate the centroid of the points (1, 2), (3, 4), (5, 6), (7, 8):

x_cord = (1+3+5+7)/4 = 4.0
y_cord = (2+4+6+8)/4 = 5.0

Thus, the centroid would be (4.0, 5.0).

Try it in Python:

###############
# Calculate centroid
###############
def calculate_centroid(points):
    x_sum = 0
    y_sum = 0
    num_points = len(points)
    for point in points:
        x_sum += point[0]
        y_sum += point[1]
    centroid_x = x_sum / num_points
    centroid_y = y_sum / num_points
    return centroid_x, centroid_y

# Example
points = [(1, 2), (3, 4), (5, 6), (7, 8)]
centroid = calculate_centroid(points)
print("Centroid:", centroid)

Output of the code would be:

Centroid: (4.0, 5.0)
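To complete the circle condition above: once you have the centroid, the ball's radius is just the largest centroid-to-point distance in that half. A small sketch (my own addition, following the same style as the centroid example):

```python
import math

def ball_radius(points, centroid):
    # Radius = distance from the centroid to the farthest point in the set
    return max(math.dist(p, centroid) for p in points)

points = [(1, 2), (3, 4), (5, 6), (7, 8)]
radius = ball_radius(points, (4.0, 5.0))
print("Radius:", radius)  # sqrt(18), reached by both (1, 2) and (7, 8)
```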
@Astro_Rohan 2 years ago
Thank you for the awesome video. I was able to use this to actually speed up computing distances between elements in a very large dataset (6418). Would you know why the tree-based approach is faster than brute force, since it still seems to be finding distances between pairs just like brute force does? Your channel is very underrated. It should have so many more subscribers for such quality content!
@BalaMurugan-cm6ev 3 years ago
Clearly Explained !!!
@RanjitSingh-rq1qx A year ago
Sir, which one is best? Could you also explain how to choose between them based on the scenario?
@learndataa A year ago
Thank you. According to the scikit-learn docs (N = number of samples; D = number of features, i.e. dimensionality):

> Brute force
- fast computation
- scales as O[D N^2]
- good for small datasets (N < 30 or so)

> K-D tree
- addresses the computational inefficiencies of brute force
- reduces the required number of distance calculations by efficiently encoding aggregate distance information for the sample
- scales as O[D N log(N)]
- very fast for low-dimensional (D < 20) neighbor searches
- becomes inefficient as D grows very large

> Ball tree
- addresses the inefficiencies of KD trees in higher dimensions
- scales as O[D log(N)]

Link: scikit-learn.org/stable/modules/neighbors.html#nearest-neighbor-algorithms
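For what it's worth, scikit-learn lets you try all three directly: NearestNeighbors takes an `algorithm` parameter ('brute', 'kd_tree', 'ball_tree', or 'auto' to let it choose based on the data). A minimal sketch:

```python
from sklearn.neighbors import NearestNeighbors

X = [[1, 2], [3, 4], [5, 6], [7, 8]]

# All three algorithms return the same neighbors; they differ only in
# how the search is organized, and therefore in speed for a given N and D.
for algo in ("brute", "kd_tree", "ball_tree"):
    nn = NearestNeighbors(n_neighbors=1, algorithm=algo).fit(X)
    dist, idx = nn.kneighbors([[4, 4]])
    print(algo, int(idx[0][0]), float(dist[0][0]))
```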
@EvgenyMozolev 2 years ago
Great explanation!
@RanjitSingh-rq1qx A year ago
Very well explained. ❤️
@learndataa A year ago
Thank you. Glad it was helpful.
@BalaMurugan-cm6ev 3 years ago
Sir, how are KD trees or Ball trees used in DBSCAN clustering?
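Not an answer from the video, but in scikit-learn terms: DBSCAN's region queries (finding all points within `eps` of a point) are delegated to a neighbor index, and DBSCAN exposes the same `algorithm` parameter to pick it. A sketch:

```python
from sklearn.cluster import DBSCAN

X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [25, 80]]

# The ball tree answers each radius query without scanning every point.
labels = DBSCAN(eps=2, min_samples=2, algorithm="ball_tree").fit(X).labels_
print(labels)  # two clusters, plus one noise point labeled -1
```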
@cristhiancastro7984 2 years ago
Hi, very good video! I have a doubt: in the Ball tree, when calculating the centroid and then the two farthest points, how do we come up with the farthest point? Is it sort of brute force?
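A common heuristic (my own sketch, not necessarily the video's exact method) avoids the O(N^2) brute force over all pairs: take the point farthest from the centroid, then the point farthest from that point. That is just two linear scans:

```python
import math

def pick_split_points(points):
    # Two O(N) scans instead of comparing every pair of points.
    n = len(points)
    centroid = (sum(p[0] for p in points) / n,
                sum(p[1] for p in points) / n)
    p1 = max(points, key=lambda p: math.dist(p, centroid))  # farthest from centroid
    p2 = max(points, key=lambda p: math.dist(p, p1))        # farthest from p1
    return p1, p2

points = [(1, 2), (3, 4), (5, 6), (7, 8)]
print(pick_split_points(points))
```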
@lukalicina8230 2 years ago
@@learndataa Yeah, I mean you'd have to compute the distances at every iteration for that set, and then the set gets divided. But what about the iterations after that? I think the first run-through, while you assemble the tree, will take about the same time as a for-loop. But after you have the tree, inserting and comparing every new input is as simple as finding the nearest set point and computing the few distances inside that set/node/leaf. You're technically operating on the test data, not on the 'training data', better called the knowledge base or corpus. I may be wrong though, let me know if this makes any sense, or if someone really knows the correct answer. Seems really interesting!
@BalaMurugan-cm6ev 3 years ago
Sir, how do we find the centroid after we split the data using the median?