K-d tree: Difference between revisions

nodes visited: 25
</pre>
 
=={{header|J}}==
 
As a general rule, tree algorithms are a bad idea in J. That said, here's an implementation:
 
<lang J>NB. dead simple reference implementation
NB. for testing correctness
 
build0=:3 :0
data=: y
)
 
distance=: +/&.:*:@:-"1
 
nearest0=:3 :0
nearest=. data {~ (i. <./) |data distance y
(nearest distance y);(#data);nearest
)
 
NB. kd tree implementation
 
coclass 'kdnode'
create=:3 :0
Axis=: ({:$y)|<.2^.#y
if. 4>#y do.
Leaf=:1
Points=: y
else.
Leaf=:0
data=. y /: Axis|."1 y
n=. >.-:#data
Points=: ,:n{data
Left=: conew&'kdnode' n{.data
Right=: conew&'kdnode' (1+n)}.data
end.
)
 
distance=: +/&.:*:@:-"1
 
nearest=:3 :0
_ 0 nearest y
:
'd n'=. 0 1+x
dists=. Points distance y
ndx=. (i. <./) dists
nearest=. ndx { Points
range=. ndx { dists
if. Leaf do.
range;n;nearest return.
else.
d0=. d <. range
p0=. nearest
if. d0=0 do. 0;n;y return. end.
if. 0={./:Axis|."1 y,Points do.
'dist n pnt'=.(x+0 1) nearest__Left y
if. dist > d0 do. d0;n;p0 return. end.
if. dist < d do.
'dist2 n pnt2'=. (x+0 1) nearest__Right y
if. dist2 < dist do. dist2;n;pnt2 return. end.
end.
else.
'dist n pnt'=. (x+0 1) nearest__Right y
if. dist > d0 do. d0;n;p0 return. end.
if. dist < d do.
'dist2 n pnt2'=. (x+0 1) nearest__Left y
if. dist2 < dist do. dist2;n;pnt2 return. end.
end.
end.
end.
dist;n;pnt return.
)
 
coclass 'kdtree'
create=:3 :0
root=: conew&'kdnode' y
)
nearest=:3 :0
nearest__root y
)</lang>
 
And here's an example of its use:
 
<lang J> tree=:conew&'kdtree' (2,3), (5,4), (9,6), (4,7), (8,1),: (7,2)
nearest__tree 9 2
┌───────┬─┬───┐
│1.41421│2│8 1│
└───────┴─┴───┘</lang>
 
The first box is the distance from the argument point to the selected point. The second box is the number of nodes visited. The third box is the selected point.
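The same idea can be sketched in Python (a hypothetical translation, not the J code's exact semantics): a median-split tree with small leaf buckets, whose query returns the same three pieces of information — distance, nodes visited, and the selected point. The names `build` and `nearest` and the visit-counting convention are illustrative assumptions.

```python
import math

def build(points, depth=0):
    # Leaf buckets of fewer than 4 points, mirroring the J version's cutoff.
    if len(points) < 4:
        return ('leaf', points)
    axis = depth % len(points[0])          # cycle through the coordinates
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return ('node', axis, points[mid],
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))

def nearest(tree, target):
    best = [math.inf, 0, None]             # distance, nodes visited, point
    def visit(node):
        best[1] += 1
        if node[0] == 'leaf':
            for p in node[1]:
                d = math.dist(p, target)
                if d < best[0]:
                    best[0], best[2] = d, p
            return
        _, axis, pivot, left, right = node
        d = math.dist(pivot, target)
        if d < best[0]:
            best[0], best[2] = d, pivot
        near, far = (left, right) if target[axis] < pivot[axis] else (right, left)
        visit(near)
        # Cross the split only if the current best ball straddles it.
        if abs(target[axis] - pivot[axis]) < best[0]:
            visit(far)
    visit(tree)
    return tuple(best)

tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(nearest(tree, (9, 2)))   # (1.4142135623730951, 2, (8, 1))
```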
 
Here's the bigger problem:
 
<lang J> tree=:conew&'kdtree' dataset=:?1000 3$0
nearest__tree pnt=: ?3$0
┌─────────┬──┬──────────────────────────┐
│0.0387914│10│0.978082 0.767632 0.392523│
└─────────┴──┴──────────────────────────┘</lang>
 
So, what's the problem here?
 
First off, that's a lot of code, and it took time to write. Let's assume that time was free. Let's also assume that the time taken to build the tree structure was free; we're going to use this tree billions of times, so the build cost is amortized away. Now what?
 
Well, let's compare the above implementation against a brute force implementation for speed. Here's a "visit all nodes" implementation. It should give the same kind of results, except that we count each candidate point as a "node", so the visit count will be much higher:
 
<lang J>build0=:3 :0
data=: y
)
 
distance=: +/&.:*:@:-"1
 
nearest0=:3 :0
nearest=. data {~ (i. <./) |data distance y
(nearest distance y);(#data);nearest
)</lang>
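For comparison, the brute-force scan is a hypothetical one-liner in most languages; a Python sketch using the standard library's `math.dist` (the name `nearest_brute` is invented):

```python
import math

def nearest_brute(points, target):
    # Visit every point; the "nodes visited" count is just the dataset size.
    best = min(points, key=lambda p: math.dist(p, target))
    return math.dist(best, target), len(points), best

print(nearest_brute([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)], (9, 2)))
# (1.4142135623730951, 6, (8, 1))
```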
 
Here are the numbers we get:
 
<lang J> build0 (2,3), (5,4), (9,6), (4,7), (8,1),: (7,2)
nearest0 9 2
┌───────┬─┬───┐
│1.41421│6│8 1│
└───────┴─┴───┘
build0 dataset
nearest0 pnt
┌─────────┬────┬──────────────────────────┐
│0.0387914│1000│0.978082 0.767632 0.392523│
└─────────┴────┴──────────────────────────┘</lang>
 
But what about timing?
 
<lang J> tree=:conew&'kdtree' (2,3), (5,4), (9,6), (4,7), (8,1),: (7,2)
timespacex 'nearest__tree 9 2'
0.000174795 16640
build0 (2,3), (5,4), (9,6), (4,7), (8,1),: (7,2)
timespacex 'nearest0 9 2'
3.62419e_5 6016</lang>
 
The kdtree implementation is almost five times slower than the brute force implementation for this small dataset. How about the bigger dataset?
 
<lang J> tree=:conew&'kdtree' dataset
timespacex 'nearest__tree pnt'
0.0134688 54272
build0 dataset
timespacex 'nearest0 pnt'
0.00140702 22144</lang>
 
On the bigger dataset, the kdtree implementation is almost ten times slower than the brute force implementation.
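To see how this comparison plays out outside J, here is a hypothetical pure-Python rerun of the timing experiment with a minimal median-split tree (the names `build`, `nearest`, and `nearest_brute` are assumptions). The absolute numbers, and even which side wins, will differ from the J figures above: J's brute force is a single fast array operation, while in pure Python the tree's smaller visit count may dominate instead.

```python
import math
import random
import timeit

def build(points, depth=0):
    # Median split on cycling axes; buckets of fewer than 4 points are leaves.
    if len(points) < 4:
        return ('leaf', points)
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return ('node', axis, points[mid],
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))

def nearest(tree, target):
    best = [math.inf, None]
    def visit(node):
        if node[0] == 'leaf':
            for p in node[1]:
                d = math.dist(p, target)
                if d < best[0]:
                    best[:] = [d, p]
            return
        _, axis, pivot, left, right = node
        d = math.dist(pivot, target)
        if d < best[0]:
            best[:] = [d, pivot]
        near, far = (left, right) if target[axis] < pivot[axis] else (right, left)
        visit(near)
        if abs(target[axis] - pivot[axis]) < best[0]:
            visit(far)
    visit(tree)
    return best

def nearest_brute(points, target):
    p = min(points, key=lambda q: math.dist(q, target))
    return [math.dist(p, target), p]

random.seed(42)
dataset = [tuple(random.random() for _ in range(3)) for _ in range(1000)]
pnt = tuple(random.random() for _ in range(3))
tree = build(dataset)          # build cost excluded, as in the article

# Both searches must agree on the nearest distance before we time them.
assert abs(nearest(tree, pnt)[0] - nearest_brute(dataset, pnt)[0]) < 1e-12
t_tree = timeit.timeit(lambda: nearest(tree, pnt), number=1000)
t_brute = timeit.timeit(lambda: nearest_brute(dataset, pnt), number=1000)
print(f"kd-tree: {t_tree:.4f}s  brute force: {t_brute:.4f}s  per 1000 queries")
```

Either way, the article's underlying point stands: the asymptotic win of the tree only pays off once constant factors and dataset size justify it.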
 
See also: [[wp:KISS_principle]]
 
=={{header|Perl 6}}==