Burrows–Wheeler transform: Difference between revisions

Content added Content deleted
m (syntax highlighting fixup automation)
m (Automated syntax highlighting fixup (second round - minor fixes))
Line 14: Line 14:
Source: [[wp:Burrows–Wheeler_transform|Burrows–Wheeler transform]]
Source: [[wp:Burrows–Wheeler_transform|Burrows–Wheeler transform]]
<br><br>
<br><br>

=={{header|11l}}==
=={{header|11l}}==
{{trans|Python}}
{{trans|Python}}


<syntaxhighlight lang=11l>F bwt(String =s)
<syntaxhighlight lang="11l">F bwt(String =s)
‘Apply Burrows-Wheeler transform to input string.’
‘Apply Burrows-Wheeler transform to input string.’
assert("\002" !C s & "\003" !C s, ‘Input string cannot contain STX and ETX characters’)
assert("\002" !C s & "\003" !C s, ‘Input string cannot contain STX and ETX characters’)
Line 70: Line 69:


</pre>
</pre>

=={{header|BQN}}==
=={{header|BQN}}==
<syntaxhighlight lang=BQN>stx←@+2
<syntaxhighlight lang="bqn">stx←@+2
BWT ← { # Burrows-Wheeler Transform and its inverse as an invertible function
BWT ← { # Burrows-Wheeler Transform and its inverse as an invertible function
𝕊: "Input contained STX"!⊑stx¬∘∊𝕩 ⋄ (⍋↓𝕩) ⊏ stx∾𝕩;
𝕊: "Input contained STX"!⊑stx¬∘∊𝕩 ⋄ (⍋↓𝕩) ⊏ stx∾𝕩;
Line 79: Line 77:
</syntaxhighlight>
</syntaxhighlight>
Example use:
Example use:
<syntaxhighlight lang=text> BWT "banana"
<syntaxhighlight lang="text"> BWT "banana"
"annb␂aa"
"annb␂aa"
BWT⁼ BWT "banana"
BWT⁼ BWT "banana"
Line 107: Line 105:
Error: Input contained STX
Error: Input contained STX
</syntaxhighlight>
</syntaxhighlight>

=={{header|C}}==
=={{header|C}}==
{{trans|Python}}
{{trans|Python}}
<syntaxhighlight lang=c>#include <string.h>
<syntaxhighlight lang="c">#include <string.h>
#include <stdio.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdlib.h>
Line 236: Line 233:
-->
-->
</pre>
</pre>

=={{header|C sharp|C#}}==
=={{header|C sharp|C#}}==
{{trans|D}}
{{trans|D}}
<syntaxhighlight lang=csharp>using System;
<syntaxhighlight lang="csharp">using System;
using System.Collections.Generic;
using System.Collections.Generic;
using System.Linq;
using System.Linq;
Line 363: Line 359:
--> ERROR: Input can't contain STX or ETX
--> ERROR: Input can't contain STX or ETX
--></pre>
--></pre>

=={{header|C++}}==
=={{header|C++}}==
{{trans|C#}}
{{trans|C#}}
<syntaxhighlight lang=cpp>#include <algorithm>
<syntaxhighlight lang="cpp">#include <algorithm>
#include <iostream>
#include <iostream>
#include <vector>
#include <vector>
Line 489: Line 484:
--> Error Input can't contain STX or ETX
--> Error Input can't contain STX or ETX
--></pre>
--></pre>

=={{header|D}}==
=={{header|D}}==
{{trans|Kotlin}}
{{trans|Kotlin}}
<syntaxhighlight lang=d>import std.algorithm.iteration;
<syntaxhighlight lang="d">import std.algorithm.iteration;
import std.algorithm.mutation;
import std.algorithm.mutation;
import std.algorithm.searching;
import std.algorithm.searching;
Line 590: Line 584:
--> ERROR: Input can't contain STX or ETX
--> ERROR: Input can't contain STX or ETX
--></pre>
--></pre>

=={{header|Factor}}==
=={{header|Factor}}==
Factor has a Burrows-Wheeler transform implementation in its standard library. In addition to the transformed sequence, the <code>bwt</code> word also outputs an index for use with the inverse <code>ibwt</code> word.
Factor has a Burrows-Wheeler transform implementation in its standard library. In addition to the transformed sequence, the <code>bwt</code> word also outputs an index for use with the inverse <code>ibwt</code> word.
<syntaxhighlight lang=factor>USING: formatting io kernel math.transforms.bwt sequences ;
<syntaxhighlight lang="factor">USING: formatting io kernel math.transforms.bwt sequences ;
{
{
"banana" "dogwood" "TO BE OR NOT TO BE OR WANT TO BE OR NOT?"
"banana" "dogwood" "TO BE OR NOT TO BE OR WANT TO BE OR NOT?"
Line 620: Line 613:
ibwt-> "SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES"
ibwt-> "SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES"
</pre>
</pre>

=={{header|Go}}==
=={{header|Go}}==
{{trans|Python}}
{{trans|Python}}
<syntaxhighlight lang=go>package main
<syntaxhighlight lang="go">package main


import (
import (
Line 725: Line 717:
-->
-->
</pre>
</pre>

=={{header|Groovy}}==
=={{header|Groovy}}==
{{trans|Java}}
{{trans|Java}}
<syntaxhighlight lang=groovy>class BWT {
<syntaxhighlight lang="groovy">class BWT {
private static final String STX = "\u0002"
private static final String STX = "\u0002"
private static final String ETX = "\u0003"
private static final String ETX = "\u0003"
Line 826: Line 817:
--> ERROR: String cannot contain STX or ETX
--> ERROR: String cannot contain STX or ETX
--> </pre>
--> </pre>

=={{header|Haskell}}==
=={{header|Haskell}}==
<syntaxhighlight lang=haskell>-- A straightforward, inefficient implementation of the Burrows–Wheeler
<syntaxhighlight lang="haskell">-- A straightforward, inefficient implementation of the Burrows–Wheeler
-- transform, based on the description in the Wikipedia article.
-- transform, based on the description in the Wikipedia article.
--
--
Line 895: Line 885:
SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES
SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES
</pre>
</pre>

=={{header|J}}==
=={{header|J}}==
<pre>
<pre>
Line 929: Line 918:
</pre>
</pre>


<syntaxhighlight lang=text>
<syntaxhighlight lang="text">
NB. articulate definition
NB. articulate definition
NB. local names (assignment =.) won't pollute the namespace
NB. local names (assignment =.) won't pollute the namespace
Line 969: Line 958:
)
)
</syntaxhighlight>
</syntaxhighlight>

=={{header|Java}}==
=={{header|Java}}==
{{trans|Kotlin}}
{{trans|Kotlin}}
<syntaxhighlight lang=java>import java.util.ArrayList;
<syntaxhighlight lang="java">import java.util.ArrayList;
import java.util.List;
import java.util.List;


Line 1,073: Line 1,061:
--> ERROR: String cannot contain STX or ETX
--> ERROR: String cannot contain STX or ETX
--> </pre>
--> </pre>

=={{header|jq}}==
=={{header|jq}}==
{{trans|Wren}}
{{trans|Wren}}
{{works with|jq}}
{{works with|jq}}
'''Works with gojq, the Go implementation of jq'''
'''Works with gojq, the Go implementation of jq'''
<syntaxhighlight lang=jq># substitute ^ for STX and | for ETX
<syntaxhighlight lang="jq"># substitute ^ for STX and | for ETX
def makePrintable:
def makePrintable:
if . == null then null
if . == null then null
Line 1,105: Line 1,092:
</syntaxhighlight>
</syntaxhighlight>
'''Tests'''
'''Tests'''
<syntaxhighlight lang=jq>
<syntaxhighlight lang="jq">
def tests:
def tests:
(
(
Line 1,147: Line 1,134:
--> ERROR: String can't contain STX or ETX
--> ERROR: String can't contain STX or ETX
</pre>
</pre>

=={{header|Julia}}==
=={{header|Julia}}==
<syntaxhighlight lang=julia>bwsort(vec) = sort(vec, lt = (a, b) -> string(a) < string(b))
<syntaxhighlight lang="julia">bwsort(vec) = sort(vec, lt = (a, b) -> string(a) < string(b))


function burrowswheeler_encode(s)
function burrowswheeler_encode(s)
Line 1,196: Line 1,182:
ERROR: LoadError: "String for Burrows-Wheeler input cannot contain STX or ETX"
ERROR: LoadError: "String for Burrows-Wheeler input cannot contain STX or ETX"
</pre>
</pre>

=={{header|Kotlin}}==
=={{header|Kotlin}}==
{{trans|Python}}
{{trans|Python}}
<syntaxhighlight lang=scala>// Version 1.2.60
<syntaxhighlight lang="scala">// Version 1.2.60


const val STX = "\u0002"
const val STX = "\u0002"
Line 1,287: Line 1,272:
-->
-->
</pre>
</pre>

=={{header|Ksh}}==
=={{header|Ksh}}==
<syntaxhighlight lang=ksh>
<syntaxhighlight lang="ksh">
#!/bin/ksh
#!/bin/ksh


Line 1,420: Line 1,404:
|ABC^
|ABC^
ERROR: string cannot contain ^ or |</pre>
ERROR: string cannot contain ^ or |</pre>

=={{header|Lua}}==
=={{header|Lua}}==
{{trans|Java}}
{{trans|Java}}
<syntaxhighlight lang=lua>STX = string.char(tonumber(2,16))
<syntaxhighlight lang="lua">STX = string.char(tonumber(2,16))
ETX = string.char(tonumber(3,16))
ETX = string.char(tonumber(3,16))


Line 1,532: Line 1,515:
--> ERROR: bwt.lua:6: String cannot contain STX
--> ERROR: bwt.lua:6: String cannot contain STX
--></pre>
--></pre>

=={{header|Mathematica}} / {{header|Wolfram Language}}==
=={{header|Mathematica}} / {{header|Wolfram Language}}==
<syntaxhighlight lang=Mathematica>ClearAll[BurrowWheeler, InverseBurrowWheeler]
<syntaxhighlight lang="mathematica">ClearAll[BurrowWheeler, InverseBurrowWheeler]
BurrowWheeler[sin_String, {bdelim_, edelim_}] := Module[{s},
BurrowWheeler[sin_String, {bdelim_, edelim_}] := Module[{s},
s = Characters[bdelim <> sin <> edelim];
s = Characters[bdelim <> sin <> edelim];
Line 1,563: Line 1,545:
|do^oodwg
|do^oodwg
dogwood</pre>
dogwood</pre>

=={{header|Nim}}==
=={{header|Nim}}==
Translation of the Wikipedia Python algorithm. The program uses characters '¹' and '²' to display STX and ETX.
Translation of the Wikipedia Python algorithm. The program uses characters '¹' and '²' to display STX and ETX.


<syntaxhighlight lang=Nim>import algorithm
<syntaxhighlight lang="nim">import algorithm
from sequtils import repeat
from sequtils import repeat
import strutils except repeat
import strutils except repeat
Line 1,656: Line 1,637:
After transformation: ²STEXYDST.E.IXXIIXXSSMPPS.B..EE.¹.USFXDIIOIIIT
After transformation: ²STEXYDST.E.IXXIIXXSSMPPS.B..EE.¹.USFXDIIOIIIT
After inverse transformation: SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES</pre>
After inverse transformation: SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES</pre>

=={{header|Pascal}}==
=={{header|Pascal}}==
A console program in Free Pascal, created with the Lazarus IDE. It doesn't use special characters, but records the encoded string along with an index to enable decoding (as in the original paper by Burrows and Wheeler).
A console program in Free Pascal, created with the Lazarus IDE. It doesn't use special characters, but records the encoded string along with an index to enable decoding (as in the original paper by Burrows and Wheeler).


The first character in a Pascal string has index 1, but it's more convenient to describe the algorithm in terms of zero-based indices. The constant STR_BASE = 1 is used to make it clear where Pascal usage has been taken into account.
The first character in a Pascal string has index 1, but it's more convenient to describe the algorithm in terms of zero-based indices. The constant STR_BASE = 1 is used to make it clear where Pascal usage has been taken into account.
<syntaxhighlight lang=pascal>
<syntaxhighlight lang="pascal">
program BurrowsWheeler;
program BurrowsWheeler;


Line 1,883: Line 1,863:
---> SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES
---> SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES
</pre>
</pre>

=={{header|Perl}}==
=={{header|Perl}}==
{{trans|Raku}}
{{trans|Raku}}
<syntaxhighlight lang=perl>use utf8;
<syntaxhighlight lang="perl">use utf8;
binmode STDOUT, ":utf8";
binmode STDOUT, ":utf8";


Line 1,938: Line 1,917:
Transformed: OOORREEETTRTW BBB ATTT NNOOONOO👍 ?
Transformed: OOORREEETTRTW BBB ATTT NNOOONOO👍 ?
Inverse transformed: TO BE OR NOT TO BE OR WANT TO BE OR NOT?</pre>
Inverse transformed: TO BE OR NOT TO BE OR WANT TO BE OR NOT?</pre>

=={{header|Phix}}==
=={{header|Phix}}==
An efficient implementation, based mainly on http://spencer-carroll.com/an-easy-to-understand-explanation-of-the-burrows-wheeler-transform/ <br>
An efficient implementation, based mainly on http://spencer-carroll.com/an-easy-to-understand-explanation-of-the-burrows-wheeler-transform/ <br>
Takes around about ten seconds to transform and invert a 100K string. Note: requires 0.8.0+
Takes around about ten seconds to transform and invert a 100K string. Note: requires 0.8.0+
<!--<syntaxhighlight lang=Phix>(phixonline)-->
<!--<syntaxhighlight lang="phix">(phixonline)-->
<span style="color: #000080;font-style:italic;">-- demo\rosetta\burrows_wheeler.exw
<span style="color: #000080;font-style:italic;">-- demo\rosetta\burrows_wheeler.exw
--/*
--/*
Line 2,100: Line 2,078:
inverse: SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES
inverse: SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES
</pre>
</pre>

=={{header|Python}}==
=={{header|Python}}==
This Python implementation sacrifices speed for simplicity: the program is short, but takes more than the linear time that would be desired in a practical implementation.
This Python implementation sacrifices speed for simplicity: the program is short, but takes more than the linear time that would be desired in a practical implementation.
Line 2,108: Line 2,085:
Ref: [https://www.codespeedy.com/burrows-wheeler-transform-in-python/ Burrows Wheeler Transform in Python]
Ref: [https://www.codespeedy.com/burrows-wheeler-transform-in-python/ Burrows Wheeler Transform in Python]


<syntaxhighlight lang=Python>
<syntaxhighlight lang="python">
def bwt(s):
def bwt(s):
"""Apply Burrows-Wheeler transform to input string."""
"""Apply Burrows-Wheeler transform to input string."""
Line 2,126: Line 2,103:
return s.rstrip("\003").strip("\002") # Get rid of start and end markers
return s.rstrip("\003").strip("\002") # Get rid of start and end markers
</syntaxhighlight>
</syntaxhighlight>

=={{header|Raku}}==
=={{header|Raku}}==
(formerly Perl 6)
(formerly Perl 6)
{{works with|Rakudo|2018.06}}
{{works with|Rakudo|2018.06}}


<syntaxhighlight lang=raku line># STX can be any character that doesn't appear in the text.
<syntaxhighlight lang="raku" line># STX can be any character that doesn't appear in the text.
# Using a visible character here for ease of viewing.
# Using a visible character here for ease of viewing.
Line 2,178: Line 2,154:
Original: Oops👍
Original: Oops👍
String can't contain STX character.</pre>
String can't contain STX character.</pre>

=={{header|REXX}}==
=={{header|REXX}}==
Programming note: &nbsp; a little bit of code was added to support more (legal) characters in the input string for the '''BWT'''
Programming note: &nbsp; a little bit of code was added to support more (legal) characters in the input string for the '''BWT'''
<br>function. &nbsp; The error recovery and error messages are rudimentary when an illegal character in the input is detected.
<br>function. &nbsp; The error recovery and error messages are rudimentary when an illegal character in the input is detected.
<syntaxhighlight lang=rexx>/*REXX program performs a Burrows─Wheeler transform (BWT) on a character string(s). */
<syntaxhighlight lang="rexx">/*REXX program performs a Burrows─Wheeler transform (BWT) on a character string(s). */
$.= /*the default text for (all) the inputs*/
$.= /*the default text for (all) the inputs*/
parse arg $.1 /*obtain optional arguments from the CL*/
parse arg $.1 /*obtain optional arguments from the CL*/
Line 2,269: Line 2,244:
***error*** BWT: The input string contains an invalid character at position 15.
***error*** BWT: The input string contains an invalid character at position 15.
</pre>
</pre>

=={{header|Ruby}}==
=={{header|Ruby}}==
{{trans|C#}}
{{trans|C#}}
<syntaxhighlight lang=ruby>STX = "\u0002"
<syntaxhighlight lang="ruby">STX = "\u0002"
ETX = "\u0003"
ETX = "\u0003"


Line 2,366: Line 2,340:
--> Input can't contain STX or ETX
--> Input can't contain STX or ETX
--></pre>
--></pre>

=={{header|Rust}}==
=={{header|Rust}}==
<syntaxhighlight lang=rust>
<syntaxhighlight lang="rust">
use core::cmp::Ordering;
use core::cmp::Ordering;


Line 2,479: Line 2,452:
Inverse BWT: TO BE OR NOT TO BE OR WANT TO BE OR NOT?
Inverse BWT: TO BE OR NOT TO BE OR WANT TO BE OR NOT?
</pre>
</pre>

=={{header|Scala}}==
=={{header|Scala}}==
{{trans|Kotlin}}
{{trans|Kotlin}}
<syntaxhighlight lang=scala>import scala.collection.mutable.ArrayBuffer
<syntaxhighlight lang="scala">import scala.collection.mutable.ArrayBuffer


object BWT {
object BWT {
Line 2,571: Line 2,543:
^ABC|
^ABC|
--> ERROR: String can't contain STX or ETX</pre>
--> ERROR: String can't contain STX or ETX</pre>

=={{header|Sidef}}==
=={{header|Sidef}}==
{{trans|Python}}
{{trans|Python}}
<syntaxhighlight lang=ruby>class BurrowsWheelerTransform (String L = "\002") {
<syntaxhighlight lang="ruby">class BurrowsWheelerTransform (String L = "\002") {


method encode(String s) {
method encode(String s) {
Line 2,611: Line 2,582:
BWT("SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES") = "STEXYDST.E.IXXIIXXSSMPPS.B..EE.$.USFXDIIOIIIT"
BWT("SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES") = "STEXYDST.E.IXXIIXXSSMPPS.B..EE.$.USFXDIIOIIIT"
</pre>
</pre>

=={{header|Swift}}==
=={{header|Swift}}==


{{trans|Kotlin}}
{{trans|Kotlin}}


<syntaxhighlight lang=swift>import Foundation
<syntaxhighlight lang="swift">import Foundation


private let stx = "\u{2}"
private let stx = "\u{2}"
Line 2,680: Line 2,650:
SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES -> |STEXYDST.E.IXXIIXXSSMPPS.B..EE.^.USFXDIIOIIIT -> SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES
SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES -> |STEXYDST.E.IXXIIXXSSMPPS.B..EE.^.USFXDIIOIIIT -> SIX.MIXED.PIXIES.SIFT.SIXTY.PIXIE.DUST.BOXES
^ABC| -> error -> error</pre>
^ABC| -> error -> error</pre>

=={{header|Visual Basic .NET}}==
=={{header|Visual Basic .NET}}==
{{trans|C#}}
{{trans|C#}}
<syntaxhighlight lang=vbnet>Module Module1
<syntaxhighlight lang="vbnet">Module Module1


ReadOnly STX As Char = Chr(&H2)
ReadOnly STX As Char = Chr(&H2)
Line 2,807: Line 2,776:
--> ERROR: Input can't contain STX or ETX
--> ERROR: Input can't contain STX or ETX
--></pre>
--></pre>

=={{header|Wren}}==
=={{header|Wren}}==
{{trans|Go}}
{{trans|Go}}
{{libheader|Wren-sort}}
{{libheader|Wren-sort}}
<syntaxhighlight lang=ecmascript>import "/sort" for Sort
<syntaxhighlight lang="ecmascript">import "/sort" for Sort


var stx = "\x02"
var stx = "\x02"
Line 2,896: Line 2,864:
-->
-->
</pre>
</pre>

=={{header|zkl}}==
=={{header|zkl}}==
<syntaxhighlight lang=zkl>class BurrowsWheelerTransform{
<syntaxhighlight lang="zkl">class BurrowsWheelerTransform{
fcn init(chr="$"){ var special=chr; }
fcn init(chr="$"){ var special=chr; }
fcn encode(str){
fcn encode(str){
Line 2,915: Line 2,882:
}
}
}</syntaxhighlight>
}</syntaxhighlight>
<syntaxhighlight lang=zkl>BWT:=BurrowsWheelerTransform();
<syntaxhighlight lang="zkl">BWT:=BurrowsWheelerTransform();
//BWT.encode("$"); // --> assert(bbb.zkl:25): String cannot contain char "$"
//BWT.encode("$"); // --> assert(bbb.zkl:25): String cannot contain char "$"