For a Ball tree, you would also need to scan through every point in region 7, since region 7 is a leaf node. For example, this would matter if a point (7, 6) existed.
@learndataa A year ago
Thank you for your comment. I appreciate your engagement and your point about the Ball tree algorithm. Yes, if additional points existed, they would have to be considered during the search process.
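To make "considered during the search" concrete, here is a minimal sketch (my own, not from the video) of the standard Ball tree pruning test: a leaf ball must be scanned only when it could contain a point closer than the best neighbor found so far.

```python
import math

def must_scan_leaf(query, centroid, radius, best_dist):
    # A ball can contain a closer neighbor only if it intersects the
    # sphere of radius best_dist around the query point.
    dist_to_centroid = math.dist(query, centroid)
    return dist_to_centroid - radius < best_dist

# Say region 7 is a ball with centroid (6, 5) and radius 2, and our
# current best neighbor is 1.5 away from the query (5, 5): the ball
# overlaps the search sphere, so its points must be scanned.
print(must_scan_leaf((5, 5), (6, 5), 2, 1.5))   # True -> scan the leaf

# A far-away ball gets pruned without scanning any of its points.
print(must_scan_leaf((0, 0), (10, 0), 2, 1.0))  # False -> skip it
```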
@sand9282 A year ago
Please write out the conditions you use when splitting the trees. Additionally, I would suggest using letters like A, B, C, or C1, C2, C3 to designate circle names. Writing numbers when designating circles can confuse first-time learners about their actual meaning. One more thing: I would suggest elaborating a bit more on finding the centroid. While we know that the median is the middle value in a set of data points, it would be helpful to explain how to find the centroid, since we will be applying all these concepts to a dataset. Overall, the video is short and nice.
@learndataa A year ago
Thank you for your feedback. Hoping the below helps.

Ball tree circle condition:
- Draw a circle centred at the centroid, with a radius equal to the distance from that centroid to the farthest point in that half.

How to calculate the centroid of the points (1, 2), (3, 4), (5, 6), (7, 8):

x_cord = (1+3+5+7)/4 = 4.0
y_cord = (2+4+6+8)/4 = 5.0

Thus, the centroid would be (4.0, 5.0).

Try it in Python:

###############
# Calculate centroid
###############
def calculate_centroid(points):
    x_sum = 0
    y_sum = 0
    num_points = len(points)
    for point in points:
        x_sum += point[0]
        y_sum += point[1]
    centroid_x = x_sum / num_points
    centroid_y = y_sum / num_points
    return centroid_x, centroid_y

# Example
points = [(1, 2), (3, 4), (5, 6), (7, 8)]
centroid = calculate_centroid(points)
print("Centroid:", centroid)

Output of the code would be:

Centroid: (4.0, 5.0)
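To complete the circle condition above: once you have the centroid, the ball's radius is just the largest centroid-to-point distance in that half. A small sketch (my own addition, following the same style as the centroid example):

```python
import math

def ball_radius(points, centroid):
    # Radius = distance from the centroid to the farthest point in the set
    return max(math.dist(p, centroid) for p in points)

points = [(1, 2), (3, 4), (5, 6), (7, 8)]
radius = ball_radius(points, (4.0, 5.0))
print("Radius:", radius)  # sqrt(18), reached by both (1, 2) and (7, 8)
```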
@Astro_Rohan 2 years ago
Thank you for the awesome video. I was able to use this to actually speed up computing distances between elements in a very large dataset (6418). Would you know why the tree-based approach is faster than brute force, since it still seems to be finding distances between pairs just like brute force does? Your channel is very underrated. It should have so many more subscribers for such quality content!
@BalaMurugan-cm6ev 3 years ago
Clearly Explained !!!
@RanjitSingh-rq1qx A year ago
Sir, which one is best? Could you also explain how to choose between them based on the scenario?
@learndataa A year ago
Thank you. According to the scikit-learn docs (N = number of samples; D = number of features, i.e. dimensionality):

> Brute force
- fast computation
- scales as O[D N^2]
- good for small datasets (N < 30 or so)

> K-D tree
- addresses the computational inefficiencies of brute force
- reduces the required number of distance calculations by efficiently encoding aggregate distance information for the sample
- scales as O[D N log(N)]
- very fast for low-dimensional (D < 20) neighbor searches
- becomes inefficient as D grows very large

> Ball tree
- addresses the inefficiencies of KD trees in higher dimensions
- scales as O[D log(N)]

Link: scikit-learn.org/stable/modules/neighbors.html#nearest-neighbor-algorithms
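For what it's worth, scikit-learn lets you try all three directly: NearestNeighbors takes an `algorithm` parameter ('brute', 'kd_tree', 'ball_tree', or 'auto' to let it choose based on the data). A minimal sketch:

```python
from sklearn.neighbors import NearestNeighbors

X = [[1, 2], [3, 4], [5, 6], [7, 8]]

# All three algorithms return the same neighbors; they differ only in
# how the search is organized, and therefore in speed for a given N and D.
for algo in ("brute", "kd_tree", "ball_tree"):
    nn = NearestNeighbors(n_neighbors=1, algorithm=algo).fit(X)
    dist, idx = nn.kneighbors([[4, 4]])
    print(algo, int(idx[0][0]), float(dist[0][0]))
```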
@EvgenyMozolev 2 years ago
Great explanation!
@RanjitSingh-rq1qx A year ago
Very well explained. ❤️
@learndataa A year ago
Thank you. Glad it was helpful.
@BalaMurugan-cm6ev 3 years ago
Sir, how are KD trees or Ball trees used in DBSCAN clustering?
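Not an answer from the video, but in scikit-learn terms: DBSCAN's region queries (finding all points within `eps` of a point) are delegated to a neighbor index, and DBSCAN exposes the same `algorithm` parameter to pick it. A sketch:

```python
from sklearn.cluster import DBSCAN

X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [25, 80]]

# The ball tree answers each radius query without scanning every point.
labels = DBSCAN(eps=2, min_samples=2, algorithm="ball_tree").fit(X).labels_
print(labels)  # two clusters, plus one noise point labeled -1
```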
@cristhiancastro7984 2 years ago
Hi, very good video! I have a doubt: in the Ball tree, when calculating the centroid and then the two farthest points, how do we come up with the farthest point? Is it sort of brute force?
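A common heuristic (my own sketch, not necessarily the video's exact method) avoids the O(N^2) brute force over all pairs: take the point farthest from the centroid, then the point farthest from that point. That is just two linear scans:

```python
import math

def pick_split_points(points):
    # Two O(N) scans instead of comparing every pair of points.
    n = len(points)
    centroid = (sum(p[0] for p in points) / n,
                sum(p[1] for p in points) / n)
    p1 = max(points, key=lambda p: math.dist(p, centroid))  # farthest from centroid
    p2 = max(points, key=lambda p: math.dist(p, p1))        # farthest from p1
    return p1, p2

points = [(1, 2), (3, 4), (5, 6), (7, 8)]
print(pick_split_points(points))
```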
@lukalicina8230 2 years ago
@@learndataa Yeah, I mean you'd have to compute the distances at every iteration for that set, and then the set gets divided. But what about the iterations after that? I think the first run-through, while you assemble the tree, will take about the same time as a for-loop. But after you have the tree, inserting and comparing every new input is as simple as finding the nearest set point and computing the few distances inside that set/node/leaf. You're technically operating on the test data, not on the 'training data', better called the knowledge base or corpus. I may be wrong though, let me know if this makes any sense, or if someone really knows the correct answer. Seems really interesting!
@BalaMurugan-cm6ev 3 years ago
Sir, how do we find the centroid after we split the data using the median?