Talk:Boyer-Moore string search: Difference between revisions

m (update historical comment to link to historical view of what was being commented on)
 
(6 intermediate revisions by 2 users not shown)
Line 11:
;;(print (bm_compile_pattern "abcdb"))
and replit just seems to run an editor... Then again I've never used Emacs, so what do I know? --[[User:Petelomax|Pete Lomax]] ([[User talk:Petelomax|talk]]) 15:55, 9 July 2022 (UTC)
 
: https://www.gnu.org/software/emacs/manual/html_node/efaq/Evaluating-Emacs-Lisp-code.html explains how to run emacs lisp code. --[[User:Rdm|Rdm]] ([[User talk:Rdm|talk]]) 15:28, 17 July 2022 (UTC)
 
:: That said, currently, the Emacs Lisp implementation (and the current J implementation which copies it) does not implement the [[wp:Boyer–Moore_string-search_algorithm#The_good_suffix_rule|good suffix rule]]. So the algorithm exhibits poor performance in certain contexts.
:: This isn't the sort of thing which impacts the correctness of the returned result -- it's strictly a performance issue (it's a performance improvement when "good suffixes" are rare or absent in the text being searched, it's a performance problem when relatively long "good suffixes" appear relatively often in the text being searched (lazy construction of the corresponding lookup tables would tend to reduce the cost of that part of the implementation, but would also tend to increase cache pressure)). I've got other things to do right now, but I'll circle back on this when I have time. --[[User:Rdm|Rdm]] ([[User talk:Rdm|talk]]) 13:53, 18 July 2022 (UTC)
::: And... my simplistic tests suggest that the performance advantages of the full Boyer-Moore algorithm are significant (performance improvement greater than a factor of 2) in at least some example cases. So, the emacs lisp implementation should probably be marked as incorrect. (But that leaves us with the question of how to determine whether an arbitrary implementation in an arbitrary programming language is correct -- this isn't about the correctness of the result, it's about the relative performance of the implementation.)
::: Near as I can tell, to properly judge implementations, we would have to work out search patterns which give us adequate code coverage (like 'ctcat') and ask for a display of some relevant intermediate results. --[[User:Rdm|Rdm]] ([[User talk:Rdm|talk]]) 17:54, 19 July 2022 (UTC)
::::What do you think about the character comparison counts as shown by Julia/Phix, as a proof/measure of correctness? --[[User:Petelomax|Pete Lomax]] ([[User talk:Petelomax|talk]]) 16:23, 21 July 2022 (UTC)
:::::I guess it can be a decent indicator given well-chosen test cases, though it's arguably mislabeled -- bad_character_rule, for example, engages in slightly abstracted character comparisons, but these are not counted in those implementations -- so this count and its use would need careful description, if it were made to be part of the task. --[[User:Rdm|Rdm]] ([[User talk:Rdm|talk]]) 17:06, 21 July 2022 (UTC)
6,951

edits