Talk:K-d tree: Difference between revisions

From Rosetta Code
==New task==
A couple of people have encouraged me at times to contribute something from my work. While I don't actually maintain any k-d tree code, I do know k-d trees are used in various ways in astronomy, and it seems they have become well-accepted data structures. I found the WP nearest neighbor description a bit too cursory to code from directly, but the Moore pseudocode relatively easy to implement. While Moore acknowledges some inefficiencies in his presented code, I thought the simplicity of it made it a good starting point for someone coding a k-d tree for the first time in a new language. I first tried a data set of 1e6 points but found the tree construction took a couple of seconds. That sure showed the motivation for the n log n algorithms! Rather than lead the task in that direction though, I thought I'd initially show the simpler, if slower, algorithm and just scale back the data set. The more interesting part, after all, is the nearest neighbor search, which is log n and returns the answer in a flash. —[[User:Sonia|Sonia]] 19:38, 6 March 2012 (UTC)
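:For readers who don't have Moore's paper at hand, here is a minimal Python sketch of the same simple approach the comment above describes: the slower sort-at-every-level construction (rather than the n log n variants), followed by a pruned depth-first nearest-neighbour search. All function and field names are illustrative, not Moore's.

<syntaxhighlight lang="python">
import random

def build(points, depth=0, k=2):
    """Build a k-d tree by splitting on the median of the coordinate
    that cycles with depth. Sorting at every level makes construction
    slow for large inputs, which motivates the n log n algorithms."""
    if not points:
        return None
    axis = depth % k
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {'point': points[mid],
            'left':  build(points[:mid], depth + 1, k),
            'right': build(points[mid + 1:], depth + 1, k)}

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(node, target, depth=0, k=2, best=None):
    """Depth-first search that descends the near side first, then visits
    the far subtree only if the splitting plane is closer than the best
    point found so far -- the pruning that makes the search fast."""
    if node is None:
        return best
    point = node['point']
    if best is None or sq_dist(target, point) < sq_dist(target, best):
        best = point
    axis = depth % k
    diff = target[axis] - point[axis]
    near, far = ((node['left'], node['right']) if diff < 0
                 else (node['right'], node['left']))
    best = nearest(near, target, depth + 1, k, best)
    if diff ** 2 < sq_dist(target, best):  # plane may hide a closer point
        best = nearest(far, target, depth + 1, k, best)
    return best

random.seed(1)
pts = [(random.random(), random.random()) for _ in range(200)]
tree = build(pts)
found = nearest(tree, (0.5, 0.5))
</syntaxhighlight>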


I was reading through the linked WP page and came across this paragraph:
<blockquote>k-d trees are not suitable for efficiently finding the nearest neighbour in high dimensional spaces. As a general rule, if the dimensionality is k, the number of points in the data, <math>N</math>, should be <math>N \gg 2^k</math>. Otherwise, when k-d trees are used with high-dimensional data, most of the points in the tree will be evaluated and the efficiency is no better than exhaustive search, and approximate nearest-neighbour methods are used instead.</blockquote>
It would be nice to have some sort of note along those lines here too, as it is a major criterion for algorithm/data-structure selection. (Astronomy is mostly low-dimensional, 2D or occasionally 3D, so k-d trees make plenty of sense there. Alas, the work I've done in an astronomy-allied field recently was all very high-dimensional, with some dimensions not being standard-numeric, so we couldn't make good use of this sort of thing — and anyway didn't need it, as “nearest neighbour” wasn't a problem we had to solve. Instead, we use ''lots'' of relational databases. But I'm rambling…) –[[User:Dkf|Donal Fellows]] 06:10, 7 March 2012 (UTC)
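:The rule of thumb quoted above is easy to illustrate numerically: 2<sup>k</sup> quickly dwarfs any realistic data set as k grows. The snippet below is just an illustration of the inequality (using a bare ">" where the rule really wants "much greater"), not part of any proposed task.

<syntaxhighlight lang="python">
# Rule of thumb N >> 2^k: even with a million points, the bare
# inequality N > 2^k already fails at k = 20, well before "much
# greater" would.
N = 1_000_000
results = {}
for k in (2, 3, 10, 20, 30):
    results[k] = N > 2 ** k
    print(f"k={k:2d}  2^k={2 ** k:>10d}  N > 2^k: {results[k]}")
</syntaxhighlight>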
