K-d tree: Difference between revisions

(J: fix a bug in node visit counting, and redo timings)
As a general rule, tree algorithms are a bad idea in J. That said, here's an implementation:


<lang J>NB. dead simple reference implementation
NB. for testing correctness

build0=:3 :0
data=: y
)

distance=: +/&.:*:@:-"1

nearest0=:3 :0
nearest=. data {~ (i. <./) |data distance y
(nearest distance y);(#data);nearest
)

NB. kd tree implementation

coclass 'kdnode'
create=:3 :0
Axis=: ({:$y)|<.2^.#y
NB. ...
_ 0 nearest y
:
N_base_=:N_base_+1
dists=. Points distance y
ndx=. (i. <./) dists
NB. ...
range=. ndx { dists
if. Leaf do.
range;nearest return.
else.
d0=. x <. range
p0=. nearest
if. d0=0 do. 0;y return. end.
if. 0={./:Axis|."1 y,Points do.
'dist pnt'=.x nearest__Left y
if. dist > d0 do. d0;p0 return. end.
if. dist < x do.
'dist2 pnt2'=. x nearest__Right y
if. dist2 < dist do. dist2;pnt2 return. end.
end.
else.
'dist pnt'=. x nearest__Right y
if. dist > d0 do. d0;p0 return. end.
if. dist < x do.
'dist2 pnt2'=. x nearest__Left y
if. dist2 < dist do. dist2;pnt2 return. end.
end.
end.
end.
dist;pnt return.
)


NB. ...
)
nearest=:3 :0
N_base_=:0
'dist point'=. nearest__root y
dist;N_base_;point
)</lang>
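For readers who do not speak J, the search logic above can be sketched in Python. This is a hypothetical re-implementation for illustration, not part of the J solution; it uses the conventional depth-mod-k axis choice, where the J code derives the axis from the point count, and a closed-over counter stands in for the global N_base_.

```python
import math

def distance(a, b):
    # Euclidean distance, like the J verb  +/&.:*:@:-"1
    return math.dist(a, b)

def build(points, depth=0):
    """Median-split k-d tree as nested tuples (point, axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])   # conventional axis choice (assumption)
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (pts[mid], axis,
            build(pts[:mid], depth + 1),
            build(pts[mid + 1:], depth + 1))

def nearest(tree, query):
    """Return (best_distance, best_point, nodes_visited)."""
    visits = [0]   # plays the role of N_base_ in the J version

    def search(node, best):
        if node is None:
            return best
        visits[0] += 1
        point, axis, left, right = node
        d = distance(point, query)
        if d < best[0]:
            best = (d, point)
        # Descend first into the half-space containing the query...
        near, far = (left, right) if query[axis] < point[axis] else (right, left)
        best = search(near, best)
        # ...then probe the other half only if the best radius crosses the plane.
        if abs(query[axis] - point[axis]) < best[0]:
            best = search(far, best)
        return best

    d, p = search(tree, (math.inf, None))
    return d, p, visits[0]
```

On the six sample points used below, `nearest(build(pts), (9, 2))` reproduces the same 1.41421 / 8 1 answer, and with this split rule it also happens to visit 3 nodes.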


For example, using the task's sample data:

<lang J> tree=:conew&'kdtree' (2,3), (5,4), (9,6), (4,7), (8,1),: (7,2)
nearest__tree 9 2
┌───────┬─┬───┐
│1.41421│3│8 1│
└───────┴─┴───┘</lang>
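The boxed result is distance; node-visit count; nearest point. The answer can be cross-checked against a brute-force scan in the spirit of nearest0 above — a rough Python equivalent (the function name is mine, not from the J code):

```python
import math

def nearest_brute(points, query):
    """Scan all points, as nearest0 does: compute every distance, take the
    smallest.  Returns (distance, points examined, nearest point),
    echoing the three boxes of the J output."""
    best = min(points, key=lambda p: math.dist(p, query))
    return math.dist(best, query), len(points), best
```

For the six sample points and query (9, 2) this returns the (8, 1) point at distance 1.41421, having examined all 6 points.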




<lang J> tree=:conew&'kdtree' dataset=:?1000 3$0
nearest__tree pnt=: ?3$0
┌─────────┬───┬──────────────────────────┐
│0.0387914│561│0.978082 0.767632 0.392523│
└─────────┴───┴──────────────────────────┘</lang>


So, why are trees "generally a bad idea in J"?
<lang J> tree=:conew&'kdtree' (2,3), (5,4), (9,6), (4,7), (8,1),: (7,2)
timespacex 'nearest__tree 9 2'
0.000174795 16640
build0 (2,3), (5,4), (9,6), (4,7), (8,1),: (7,2)
timespacex 'nearest0 9 2'
3.62419e_5 6016</lang>


The kdtree implementation is almost five times slower than the brute force implementation for this small dataset. How about the bigger dataset?


<lang J> tree=:conew&'kdtree' dataset
timespacex 'nearest__tree pnt'
0.0134688 54272
build0 dataset
timespacex 'nearest0 pnt'
0.00140702 22144</lang>


On the bigger dataset, the kdtree implementation is almost ten times slower than the brute force implementation.
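As a sanity check on the methodology (not a claim about J), the same benchmark can be sketched in Python. Note that the outcome can flip: CPython's brute-force scan is an interpreted loop, so the tree tends to win there, whereas J's brute-force scan is a single fast array operation — which is exactly this section's point. Absolute numbers depend entirely on the machine and runtime.

```python
import math, random, timeit

def build(points, depth=0):
    """Compact median-split k-d tree (depth-mod-k axis, an assumption)."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (pts[mid], axis,
            build(pts[:mid], depth + 1), build(pts[mid + 1:], depth + 1))

def nearest(node, q, best=(math.inf, None)):
    if node is None:
        return best
    point, axis, left, right = node
    d = math.dist(point, q)
    if d < best[0]:
        best = (d, point)
    near, far = (left, right) if q[axis] < point[axis] else (right, left)
    best = nearest(near, q, best)
    if abs(q[axis] - point[axis]) < best[0]:   # prune the far half when possible
        best = nearest(far, q, best)
    return best

random.seed(42)
dataset = [tuple(random.random() for _ in range(3)) for _ in range(1000)]
pnt = tuple(random.random() for _ in range(3))
tree = build(dataset)

# Both strategies must agree on the answer before timing means anything.
assert nearest(tree, pnt)[1] == min(dataset, key=lambda p: math.dist(p, pnt))

t_tree = timeit.timeit(lambda: nearest(tree, pnt), number=1000)
t_brute = timeit.timeit(lambda: min(dataset, key=lambda p: math.dist(p, pnt)),
                        number=1000)
print(f"tree: {t_tree:.4f}s  brute: {t_brute:.4f}s  ratio: {t_tree / t_brute:.2f}")
```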


Exercise for the student: work out the cost of cpu time and decide how big of a dataset you need for the time it takes to implement the kdtree to pay back the investment of time needed to implement it. Also work out how many times the tree has to be accessed to more than cover the time needed to build it.
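To make the exercise concrete, the break-even arithmetic looks like this, with purely hypothetical figures (these are assumptions, not measurements from above):

```python
import math

# Hypothetical figures: suppose building the tree costs 0.5 s, a brute-force
# query costs 1.4 ms, and a tree query costs 0.4 ms.
build_cost = 0.5       # seconds, assumed
t_brute = 1.4e-3       # seconds per query, assumed
t_tree = 0.4e-3        # seconds per query, assumed

saving = t_brute - t_tree                    # 1 ms saved per query
break_even = math.ceil(build_cost / saving)  # queries needed to repay the build
print(break_even)
```

With the timings actually measured above, the per-query "saving" is negative — the tree query is slower than the scan — so at these dataset sizes there is no payback at all.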