Burrows–Wheeler transform: Difference between revisions

Content added Content deleted
(→‎{{header|TXR}}: Replace collect-each in bwt with mapcar.)
(→‎{{header|TXR}}: Add missing definition of eof!)
Line 2,666: Line 2,666:
We use the U+DC00 code point as the EOF sentinel. In TXR terminology, this code is called the <i>pseudo-null</i>. It plays a special significance in that when a NUL byte occurs in UTF-8 external data, TXR's decoder maps it the U+DC00 point. When a string containing U+DC00 is converted to UTF-8, that code becomes a NUL again.
We use the U+DC00 code point as the EOF sentinel. In TXR terminology, this code is called the <i>pseudo-null</i>. It plays a special significance in that when a NUL byte occurs in UTF-8 external data, TXR's decoder maps it the U+DC00 point. When a string containing U+DC00 is converted to UTF-8, that code becomes a NUL again.


<syntaxhighlight lang="txrlisp">(defun bwt (str)
<syntaxhighlight lang="txrlisp">(defvarl eof "\xDC00")

(defun bwt (str)
(if (contains eof str)
(if (contains eof str)
(error "~s: input may not contain ~a" %fun% eof))
(error "~s: input may not contain ~a" %fun% eof))