K-d tree: Difference between revisions

Content added Content deleted
(J: fix a bug in node visit counting, and redo timings)
(bugfix: need to test distance from point to dividing plane to decide if checking other side is relevant)
Line 830: Line 830:
create=:3 :0
create=:3 :0
Axis=: ({:$y)|<.2^.#y
Axis=: ({:$y)|<.2^.#y
if. 4>#y do.
Mask=: Axis~:i.{:$y
if. 3>#y do.
Leaf=:1
Leaf=:1
Points=: y
Points=: y
Line 836: Line 837:
Leaf=:0
Leaf=:0
data=. y /: Axis|."1 y
data=. y /: Axis|."1 y
n=. >.-:#data
n=. <.-:#data
Points=: ,:n{data
Points=: ,:n{data
Left=: conew&'kdnode' n{.data
Left=: conew&'kdnode' n{.data
Line 848: Line 849:
_ 0 nearest y
_ 0 nearest y
:
:
N_base_=:N_base_+1
n=.' ',~":N_base_=:N_base_+1
dists=. Points distance y
dists=. Points distance y
ndx=. (i. <./) dists
ndx=. (i. <./) dists
Line 860: Line 861:
if. d0=0 do. 0;y return. end.
if. d0=0 do. 0;y return. end.
if. 0={./:Axis|."1 y,Points do.
if. 0={./:Axis|."1 y,Points do.
'dist pnt'=.x nearest__Left y
'dist pnt'=.d0 nearest__Left y
if. dist > d0 do. d0;p0 return. end.
if. dist > d0 do.
if. dist < x do.
d0;p0 return.
'dist2 pnt2'=. x nearest__Right y
end.
if. dist2 < dist do. dist2;pnt2 return. end.
if. dist < d0 do.
if. dist > (Mask#pnt) distance Mask#,Points do.
'dist2 pnt2'=. d0 nearest__Right y
if. dist2 < dist do. dist2;pnt2 return. end.
end.
end.
end.
else.
else.
'dist pnt'=. x nearest__Right y
'dist pnt'=. d0 nearest__Right y
if. dist > d0 do. d0;p0 return. end.
if. dist > d0 do.
if. dist < x do.
d0;p0 return.
'dist2 pnt2'=. x nearest__Left y
end.
if. dist2 < dist do. dist2;pnt2 return. end.
if. dist < d0 do.
if. dist > (Mask#pnt) distance Mask#,Points do.
'dist2 pnt2'=. d0 nearest__Left y
if. dist2 < dist do. dist2;pnt2 return. end.
end.
end.
end.
end.
end.
Line 893: Line 902:
nearest__tree 9 2
nearest__tree 9 2
┌───────┬─┬───┐
┌───────┬─┬───┐
│1.41421│3│8 1│
│1.41421│4│8 1│
└───────┴─┴───┘</lang>
└───────┴─┴───┘</lang>


Line 902: Line 911:
<lang J> tree=:conew&'kdtree' dataset=:?1000 3$0
<lang J> tree=:conew&'kdtree' dataset=:?1000 3$0
nearest__tree pnt
nearest__tree pnt
┌─────────┬──┬──────────────────────────┐
┌─────────┬───┬──────────────────────────┐
│0.0387914│561│0.978082 0.767632 0.392523│
│0.0387914│12│0.978082 0.767632 0.392523│
└─────────┴──┴──────────────────────────┘</lang>
└─────────┴───┴──────────────────────────┘</lang>


So, why are trees "generally a bad idea in J"?
So, why are trees "generally a bad idea in J"?
Line 940: Line 949:
<lang J> tree=:conew&'kdtree' (2,3), (5,4), (9,6), (4,7), (8,1),: (7,2)
<lang J> tree=:conew&'kdtree' (2,3), (5,4), (9,6), (4,7), (8,1),: (7,2)
timespacex 'nearest__tree 9 2'
timespacex 'nearest__tree 9 2'
0.000487181 19328
0.000262674 15616
build0 (2,3), (5,4), (9,6), (4,7), (8,1),: (7,2)
build0 (2,3), (5,4), (9,6), (4,7), (8,1),: (7,2)
timespacex 'nearest0 9 2'
timespacex 'nearest0 9 2'
3.62419e_5 6016</lang>
3.62419e_5 6016</lang>


The kdtree implementation is almost ten times slower than the brute force implementation for this small dataset. How about the bigger dataset?
The kdtree implementation is over ten times slower than the brute force implementation for this small dataset. How about the bigger dataset?


<lang J> tree=:conew&'kdtree' dataset
<lang J> tree=:conew&'kdtree' dataset
timespacex 'nearest__tree pnt'
0.0169044 48128
0.00141408 45312
build0 dataset
build0 dataset
timespacex 'nearest0 pnt'
timespacex 'nearest0 pnt'
0.00140702 22144</lang>
0.00140702 22144</lang>


On the bigger dataset, the kdtree implementation is over ten times slower than the brute force implementation.
On the bigger dataset, the kdtree implementation is about the same speed as the brute force implementation.

Exercise for the student: work out the cost of cpu time and decide how big of a dataset you need for the time it takes to implement the kdtree to pay back the investment of time needed to implement it. Also work out how many times the tree has to be accessed to more than cover the time needed to build it.


See also: [[wp:KISS_principle]]
See also: [[wp:KISS_principle]]