Jump to content

Burrows–Wheeler transform: Difference between revisions

→‎{{header|TXR}}: New section.
(→‎{{header|J}}: concise implementation)
(→‎{{header|TXR}}: New section.)
Line 2,662:
SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES -> |STEXYDST.E.IXXIIXXSSMPPS.B..EE.^.USFXDIIOIIIT -> SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES
^ABC| -> error -> error</pre>
=={{header|TXR}}==
 
We use the U+DC00 code point as the EOF sentinel. In TXR terminology, this code is called the <i>pseudo-null</i>. It plays a special significance in that when a NUL byte occurs in UTF-8 external data, TXR's decoder maps it the U+DC00 point. When a string containing U+DC00 is converted to UTF-8, that code becomes a NUL again.
 
<syntaxhighlight lang="txrlisp">(defun bwt (str)
(let* ((seof `@str\xDC00`)
(n (len seof)))
(flow (collect-each ((i 0..n))
(rot seof i))
sort
(mappend last))))
 
(defun ibwt (str)
(let* ((n (len str))
(row (vector n "")))
(dotimes (i n)
(set [row i] (mkstring 0)))
(dotimes (i n)
(each ((c str)
(j 0))
(set [row j] `@c@[row j]`))
(nsort row))
[(find-if (op ends-with "\xDC00") row) 0..-1]))</syntaxhighlight>
 
At the REPL:
 
<pre>1> (bwt "^BANANA")
"BNN^AA\xDC00;A"
2> (ibwt *1)
"^BANANA"
3> (bwt "SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES")
"TEXYDST.E.IXIXIXXSSMPPS.B..E.\xDC00.UESFXDIIOIIITS"
4> (ibwt *3)
"SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES"</pre>
 
=={{header|Visual Basic .NET}}==
{{trans|C#}}
Line 2,788 ⟶ 2,823:
--> ERROR: Input can't contain STX or ETX
--></pre>
 
=={{header|Wren}}==
{{trans|Go}}
543

edits

Cookies help us deliver our services. By using our services, you agree to our use of cookies.