Factorize string into Lyndon words

Revision as of 01:08, 31 January 2024 by Aspen138 (talk | contribs) (Add Scala implementation)

Given a finite alphabet, we can lexicographically order all strings in the alphabet. If two strings have different lengths, then pad the shorter string on the right with the smallest letter. For example, we have 01 > 001, but 01 = 010. As we see, this order is a total preorder, but not a total order, since it identifies different strings.

Task
Factorize string into Lyndon words
You are encouraged to solve this task according to the task description, using any language you may know.

A Lyndon word is a non-empty string that is strictly lower in lexicographic order than all its circular rotations. In particular, if a string is equal to a circular rotation, then it is not a Lyndon word. For example, since 0101 = 0101 (rotation by 2), it is not a Lyndon word.

The first Lyndon words on the binary alphabet {0, 1} are:

0, 1, 01, 001, 011, 0001, 0011, 0111, 00001, 00011, 00101, 00111, 01011, 01111, ...

Some basic properties:

  • The only Lyndon word that ends with 0 is 0.
  • Proof. If s0 is a Lyndon word, and s is not empty, then s0 < 0s. If s contains 1 somewhere, then s0 > 0s. Therefore s has only 0. But then s0 = 0s, contradiction.
  • The lexicographic order is a total order on the Lyndon words.
  • Proof. For, the only way for two different strings s, s' to have the same lexicographic ordering is for one of them to pad to the other. We can assume that s00...0 = s'. If that is so, then s00...0 is a Lyndon word that ends with 0, so it is just 0, and so s is a Lyndon word that is also empty, contradiction.

The Chen–Fox–Lyndon theorem states that any string is uniquely factorizable into a sequence of Lyndon words non-decreasing in lexicographic order. Duval's algorithm computes this in O(length of input) time and and O(1) space.[1] See [2] for a description of Duval's algorithm.

C

Copied from [2], under Creative Commons Attribution Share Alike 4.0 International License.

vector<string> duval(string const& s) {
    int n = s.size();
    int i = 0;
    vector<string> factorization;
    while (i < n) {
        int j = i + 1, k = i;
        while (j < n && s[k] <= s[j]) {
            if (s[k] < s[j])
                k = i;
            else
                k++;
            j++;
        }
        while (i <= k) {
            factorization.push_back(s.substr(i, j - k));
            i += j - k;
        }
    }
    return factorization;
}


MATLAB

Translation of: Python
clear all;close all;clc;
m = '0';
for i = 1:7
    m0 = m;
    m = strrep(m, '0', 'a');
    m = strrep(m, '1', '0');
    m = strrep(m, 'a', '1');
    m = strcat(m0, m);
end

factorization = chenFoxLyndonFactorization(m);
for index=1:length(factorization)
    disp(factorization(index));
end


function factorization = chenFoxLyndonFactorization(s)
    n = length(s);
    i = 1;
    factorization = {};
    while i <= n
        j = i + 1;
        k = i;
        while j <= n && s(k) <= s(j)
            if s(k) < s(j)
                k = i;
            else
                k = k + 1;
            end
            j = j + 1;
        end
        while i <= k
            factorization{end+1} = s(i:i + j - k - 1);
            i = i + j - k;
        end
    end
    assert(strcmp(join(factorization, ''), s));
end
Output:
    {'011'}

    {'01'}

    {'0011'}

    {'00101101'}

    {'0010110011010011'}

    {'00101100110100101101001100101101'}

    {'001011001101001011010011001011001101001100101101'}

    {'001011001101'}

    {'001'}

Python

Duval's algorithm:

def chen_fox_lyndon_factorization(s):
    n = len(s)
    i = 0
    factorization = []
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            if s[k] < s[j]:
                k = i
            else:
                k += 1
            j += 1
        while i <= k:
            factorization.append(s[i:i + j - k])
            i += j - k
    assert ''.join(factorization) == s
    return factorization

Example use with Thue-Morse sequence

m='0'
for i in range(0,7):
     m0=m
     m=m.replace('0','a')
     m=m.replace('1','0')
     m=m.replace('a','1')
     m=m0+m

print(chen_fox_lyndon_factorization(m))

Output:

['011', '01', '0011', '00101101', '0010110011010011', '00101100110100101101001100101101', '001011001101001011010011001011001101001100101101', '001011001101', '001']


Rust

Translation of: Python
fn chen_fox_lyndon_factorization(s: &str) -> Vec<String> {
    let n = s.len();
    let mut i = 0;
    let mut factorization = Vec::new();
    while i < n {
        let (mut j, mut k) = (i + 1, i);
        while j < n && s.as_bytes()[k] <= s.as_bytes()[j] {
            if s.as_bytes()[k] < s.as_bytes()[j] {
                k = i;
            } else {
                k += 1;
            }
            j += 1;
        }
        while i <= k {
            factorization.push(s[i..i + j - k].to_string());
            i += j - k;
        }
    }
    factorization
}

fn main() {
    let mut m = String::from("0");
    for _ in 0..7 {
        let m0 = m.clone();
        m = m.replace("0", "a");
        m = m.replace("1", "0");
        m = m.replace("a", "1");
        m = m0 + &m;
    }

    let factorization = chen_fox_lyndon_factorization(&m);
    println!("{:?}", factorization);
}
Output:
["011", "01", "0011", "00101101", "0010110011010011", "00101100110100101101001100101101", "001011001101001011010011001011001101001100101101", "001011001101", "001"]


Scala

Translation of: Python
object ChenFoxLyndonFactorization extends App {
  def chenFoxLyndonFactorization(s: String): List[String] = {
    val n = s.length
    var i = 0
    var factorization = List[String]()
    while (i < n) {
      var j = i + 1
      var k = i
      while (j < n && s.charAt(k) <= s.charAt(j)) {
        if (s.charAt(k) < s.charAt(j)) {
          k = i
        } else {
          k += 1
        }
        j += 1
      }
      while (i <= k) {
        factorization = factorization :+ s.substring(i, i + j - k)
        i += j - k
      }
    }
    assert(s == factorization.mkString)
    factorization
  }

  var m = "0"
  for (i <- 0 until 7) {
    val m0 = m
    m = m.replace('0', 'a').replace('1', '0').replace('a', '1')
    m = m0 + m
  }

  println(chenFoxLyndonFactorization(m))
}
Output:
List(011, 01, 0011, 00101101, 0010110011010011, 00101100110100101101001100101101, 001011001101001011010011001011001101001100101101, 001011001101, 001)

  1. Duval, Jean-Pierre (1983), "Factorizing words over an ordered alphabet", Journal of Algorithms, 4 (4): 363–381, doi:10.1016/0196-6774(83)90017-2
  2. 2.0 2.1 https://cp-algorithms.com/string/lyndon_factorization.html