K-d tree: Difference between revisions

278 bytes removed, 10 years ago
J: fix a bug in node visit counting, and redo timings
Line 827:
As a general rule, tree algorithms are a bad idea in J. That said, here's an implementation:
 
<lang J>NB. dead simple reference implementation
NB. for testing correctness
 
build0=:3 :0
data=: y
)
 
distance=: +/&.:*:@:-"1
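NB. Euclidean distance: subtract corresponding coordinates, square,
NB. sum, then take the square root (&.: applies the inverse of *: after +/)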
 
nearest0=:3 :0
nearest=. data {~ (i. <./) |data distance y
(nearest distance y);(#data);nearest
)
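NB. sanity check, e.g. on the six wikipedia example points:
NB.    build0 (2,3), (5,4), (9,6), (4,7), (8,1),: (7,2)
NB.    nearest0 9 2
NB. should give distance 1.41421, 6 points examined, and point 8 1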
 
NB. kd tree implementation
 
coclass 'kdnode'
create=:3 :0
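NB. split axis is chosen from the subtree size: #y roughly halves at each
NB. level, so <.2^.#y acts as a depth counter and ({:$y)| wraps it around
NB. the coordinate axes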
Axis=: ({:$y)|<.2^.#y
Line 864 ⟶ 848:
_ 0 nearest y
:
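NB. x is (best distance so far, nodes visited); y is the query point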
'd n'=. 0 1+x
dists=. Points distance y
ndx=. (i. <./) dists
Line 870 ⟶ 854:
range=. ndx { dists
if. Leaf do.
range;n;nearest return.
else.
d0=. d <. range
p0=. nearest
if. d0=0 do. 0;n;y return. end.
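NB. does the query sort ahead of this node's point on the current axis?
NB. (stack the query on the point, rotate rows so Axis leads, grade up)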
if. 0={./:Axis|."1 y,Points do.
'dist n pnt'=. (x+0 1) nearest__Left y
if. dist > d0 do. d0;n;p0 return. end.
if. dist < d do.
'dist2 n pnt2'=. (x+0 1) nearest__Right y
if. dist2 < dist do. dist2;n;pnt2 return. end.
end.
else.
'dist n pnt'=. (x+0 1) nearest__Right y
if. dist > d0 do. d0;n;p0 return. end.
if. dist < x do.
'dist2 n pnt2'=. (x+0 1) nearest__Left y
if. dist2 < dist do. dist2;n;pnt2 return. end.
end.
end.
end.
dist;n;pnt return.
)
 
Line 899 ⟶ 883:
)
nearest=:3 :0
nearest__root y
)</lang>
 
Line 907 ⟶ 893:
nearest__tree 9 2
┌───────┬─┬───┐
│1.41421│2│8 1│
└───────┴─┴───┘</lang>
 
Line 915 ⟶ 901:
 
<lang J> tree=:conew&'kdtree' dataset=:?1000 3$0
nearest__tree pnt=: ?3$0
┌─────────┬──┬──────────────────────────┐
│0.0387914│10│0.978082 0.767632 0.392523│
└─────────┴──┴──────────────────────────┘</lang>
 
So, why are trees "generally a bad idea in J"?
Line 954 ⟶ 940:
<lang J> tree=:conew&'kdtree' (2,3), (5,4), (9,6), (4,7), (8,1),: (7,2)
timespacex 'nearest__tree 9 2'
0.000174795 16640
build0 (2,3), (5,4), (9,6), (4,7), (8,1),: (7,2)
timespacex 'nearest0 9 2'
3.62419e_5 6016</lang>
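For a rough sense of scale (timings vary from run to run), the ratio of the two timings can be computed directly:

<lang J>   0.000174795 % 3.62419e_5   NB. roughly 4.8
</lang>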
 
The kdtree implementation is almost five times slower than the brute force implementation for this small dataset. How about the bigger dataset?
 
<lang J> tree=:conew&'kdtree' dataset
timespacex 'nearest__tree pnt'
0.0134688 54272
build0 dataset
timespacex 'nearest0 pnt'
0.00140702 22144</lang>
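The same rough comparison for the larger dataset:

<lang J>   0.0134688 % 0.00140702   NB. roughly 9.6
</lang>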
 
On the bigger dataset, the kdtree implementation is almost ten times slower than the brute force implementation.
 
Exercise for the student: work out the cost of CPU time and decide how big a dataset you would need before the kd-tree's search speed pays back the time invested in implementing it. Also work out how many times the tree would have to be accessed to more than cover the time needed to build it.
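A sketch of that second calculation, using the timings above (the names saving and b are hypothetical, and the tree build time is not measured here, so it stays symbolic). Because the kd tree is slower per query than brute force at this size, the per-query saving comes out negative and the build cost is never recovered:

<lang J>   NB. per-query saving of the kd tree over brute force
   NB. (negative here: roughly 0.012 seconds lost per query, not saved)
   saving=. 0.00140702 - 0.0134688
   NB. queries needed to cover a build time of b seconds would be  b % saving
</lang>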