Word break problem

From Rosetta Code
Word break problem is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.
Task
Given an input string and a dictionary of words, segment the input string into a space-separated sequence of dictionary words if possible.

Perl 6[edit]

Works with: Rakudo version 2017.04

This implementation does not necessarily find every combination, it returns the one with the longest matching tokens.

my @words = <a bc abc cd b>;
my $regex = @words.join('|');
 
put "$_: ", word-break($_) for <abcd abbc abcbcd acdbc abcdd>;
 
sub word-break (Str $word) { ($word ~~ / ^ (<$regex>)+ $ /)[0] // "Not possible" }
Output:
abcd: a b cd
abbc: a b bc
abcbcd: abc b cd
acdbc: a cd bc
abcdd: Not possible

REXX[edit]

This REXX version allows the words to be tested (and the dictionary words) to be specified on the command line.

/*REXX program breaks up a  word (or string)  into a  list of words  from a dictionary. */
parse arg a '/' x; $=space($); x=space(x) /*get optional args; elide extra blanks*/
if a=='' | a=="," then a='abcd abbc abcbcd acdbc abcdd' /*maybe use the defaults ? */
if x=='' | x=="," then x='a bc abc cd b' /* " " " " */
na=words(a) /*the number of words to be tested. */
nx=words(x) /* " " " " " the dictionary*/
say nx 'dictionary words:' x /*display the words in the dictionary. */
say /*display a blank line to the terminal.*/
aw=0; do i=1 for na; _=word(a,i) /*obtain a word that will be tested. */
aw=max(aw,length(_)) /*find widest width word being tested. */
end /*i*/ /* [↑] AW is used to align the output*/
@.=0 /*initialize the dictionary to "null". */
xw=0; do i=1 for nx; _=word(x,i) /*obtain a word from the dictionary. */
xw=max(xw,length(_)); @._=1 /*find widest width dictionary word. */
end /*i*/ /* [↑] define a dictionary word. */
p=0 /* [↓] process a word in the A list.*/
do j=1 for na; yy=word(a,j) /*YY: test a word from the A list.*/
do t=(nx+1)**(xw+1) by -1 to 1 until y==''; y=yy /*try a word possibility.*/
$= /*nullify the (possible) result list. */
do try=t while y\='' /*keep testing until Y is exhausted. */
p=(try+p)//xw+1 /*use a possible width for this attempt*/
p=fw(y,p); if p==0 then iterate t /*is this part of the word not found ? */
$=$ ? /*It was found. Add partial to the list*/
y=substr(y,p+1) /*now, use and test the rest of word. */
end /*try*/
end /*t*/
 
if t==0 then $= '(not possible)' /*indicate that the word isn't possible*/
say right(yy,aw) '───►' strip($) /*display the result to the terminal. */
end /*j*/
exit /*stick a fork in it, we're all done. */
/*──────────────────────────────────────────────────────────────────────────────────────*/
fw: parse arg z,L; do k=L by -1 for L; ?=left(z,k); if @.? then leave; end; return k
output   when using the default inputs:
5 dictionary words: a bc abc cd b

  abcd ───► a b cd
  abbc ───► a b bc
abcbcd ───► abc b cd
 acdbc ───► a cd bc
 abcdd ───► (not possible)

Rust[edit]

Dynamic programming[edit]

use std::collections::HashSet;
fn create_string(s: &str, v: Vec<Option<usize>>) -> String {
let mut idx = s.len();
let mut slice_vec = vec![];
while let Some(prev) = v[idx] {
slice_vec.push(&s[prev..idx]);
idx = prev;
}
slice_vec.reverse();
slice_vec.join(" ")
 
 
}
 
fn word_break(s: &str, dict: HashSet<&str>) -> Option<String> {
let size = s.len() + 1;
let mut possible = vec![None; size];
 
let check_word = |i,j| dict.get(&s[i..j]).map(|_| i);
 
for i in 1..size {
possible[i] = possible[i].or_else(|| check_word(0,i));
 
if possible[i].is_some() {
for j in i+1..size {
possible[j] = possible[j].or_else(|| check_word(i,j));
}
 
if possible[s.len()].is_some() {
return Some(create_string(s, possible));
}
 
};
}
None
}
 
fn main() {
let mut set = HashSet::new();
set.insert("a");
set.insert("bc");
set.insert("abc");
set.insert("cd");
set.insert("b");
println!("{:?}", word_break("abcd", set).unwrap());
}
Output:
"a b cd"