Word break problem: Difference between revisions
m
syntax highlighting fixup automation
Thundergnat (talk | contribs) m (syntax highlighting fixup automation) |
|||
Line 9:
{{trans|D}}
<
String val
[String] parsed
Line 44:
d = [‘abc’, ‘a’, ‘ac’, ‘b’, ‘c’, ‘cb’, ‘d’]
process(d, [‘abcd’, ‘abbc’, ‘abcbcd’, ‘acdbc’, ‘abcdd’])</
{{out}}
Line 78:
=={{header|Aime}}==
<
wordbreak(record dict, text phrase, integer p, list words)
{
Line 133:
return 0;
}</
{{Out}}
<pre>abcd: a b cd
Line 143:
=={{header|C++}}==
{{trans|Rust}}
<
#include <iostream>
#include <optional>
Line 229:
return 0;
}
</syntaxhighlight>
{{out}}
Line 238:
=={{header|D}}==
{{trans|Java}}
<
import std.range;
import std.stdio;
Line 297:
parsed = p;
}
}</
{{out}}
<pre>String = aab, Dictionary = ["a", "aa", "b", "ab", "aab"]. Solutions = 4
Line 330:
{{libheader| System.Generics.Collections}}
{{Trans|Go}}
<syntaxhighlight lang="delphi">
program Word_break_problem;
Line 418:
Readln;
end.</
=={{header|Go}}==
<
import (
Line 471:
}
}
}</
{{out}}
<pre>
Line 483:
=={{header|Haskell}}==
{{Trans|Javascript}}
<
import Data.Tree (Tree(..))
Line 516:
main :: IO ()
main = (putStrLn . unlines) $ wordBreaks ws <$> texts</
{{Out}}
<pre>abcd:
Line 537:
With such short sentences we can find the partition sets, then check that all are words.
<syntaxhighlight lang="j">
all_partitions=: <@(<;.1)"1 _~ (1,.[:#:[:i.2^<:@:#) NB. all_partitions 'abcd'
word_break=: ([ #~ 0 = [: #&>@:] -.L:_1 _)~ all_partitions
main=: (] , (;:inv L:_1@:word_break >))"_ 0 boxopen
</syntaxhighlight>
<pre>
Line 595:
=={{header|Java}}==
Accept string to be parsed, and dictionary of words, as method arguments.
<syntaxhighlight lang="java">
import java.util.ArrayList;
import java.util.Arrays;
Line 665:
}
</syntaxhighlight>
{{out}}
<pre>
Line 700:
Composing a solution from generic functions.
{{Trans|Python}}
<
'use strict';
Line 811:
// MAIN ---
return main();
})();</
{{Out}}
<pre>abcd:
Line 842:
used to determine the number of possible parses of the string "totalreturn".
<syntaxhighlight lang="jq">
def words: ["a", "bc", "abc", "cd", "b"];
def strings: ["abcd", "abbc", "abcbcd", "acdbc", "abcdd"];
Line 888:
demo1, " ", demo2, "", demo3
</syntaxhighlight>
{{out}}
<pre>
Line 908:
=={{header|Julia}}==
Some extra loops to record and print all solutions.<
words = ["a", "bc", "abc", "cd", "b"]
strings = ["abcd", "abbc", "abcbcd", "acdbc", "abcdd"]
Line 935:
end
wordbreak()</
{{output}}<pre>
1 Solution(s) for abcd:
Line 950:
=={{header|Kotlin}}==
I've downloaded the free dictionary at http://www.puzzlers.org/pub/wordlists/unixdict.txt for this task. All single letters from 'a' to 'z' are considered to be words by this dictionary but 'bc' and 'cd' which I'd have expected to be present are not.
<
import java.io.File
Line 997:
println()
}
}</
{{out}}
Line 1,023:
=={{header|Lua}}==
<
-- possible candidates for this particalur problem
function genDict(ws)
Line 1,088:
for i=1,#test do
print(test[i],wordbreak(test[i]))
end</
{{out}}
<pre>abcd a b cd
Line 1,099:
We provide two ways to initialize the dictionary: either explicitly by providing the list of words or by providing a file name from which extract the words. In the second case, it is possible to specify the minimal length of the words to keep. This is useful mainly to eliminate all the one letter words we get with “unixdict.txt”.
<
type
Line 1,146:
dict = initDict("unixdict.txt", 2)
dict.breakWord("because")
dict.breakWord("software")</
{{out}}
Line 1,174:
=={{header|Perl}}==
<
use warnings;
Line 1,186:
@matches = $word =~ /^ ($one_of) ($one_of)? ($one_of)? ($one_of)? $/x; # sub-optimal: limited number of matches
return join(' ', grep {$_} @matches) || "(not possible)";
}</
{{out}}
<pre>a: a
Line 1,203:
=={{header|Phix}}==
The distributed version also contains a few experiments with applying a real dictionary, largely unsuccessful.
<!--<
<span style="color: #000080;font-style:italic;">--
-- demo\rosetta\Word_break_problem.exw
Line 1,316:
<span style="color: #0000FF;">{}</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">wait_key</span><span style="color: #0000FF;">()</span>
<!--</
{{out}}
<pre>
Line 1,332:
===Non-determinism using member/2===
<code>s/2</code> checks all possible combinations. It uses <code>member/2</code> to select some element from Dict and backtracks if no solution is found.
<
Tests = [["aab", ["a", "aa", "b", "ab", "aab"]],
["abba", ["a", "aa", "b", "ab", "aab"]],
Line 1,356:
S := S ++ [D]
end,
S.flatten==String.</
{{out}}
Line 1,385:
===Non-determinism using member/2 and append/3===
<code>s2/2</code> is more efficient. It uses <code>append/3</code> to extract the found prefix from the string.
<
String2 = copy_term(String),
S = [],
Line 1,400:
S := S ++ [D]
end,
S.flatten==String.</
===Using recursion===
<code>s3/2</code> use the same idea as <code>s2/2</code> but use recursion. Neater and more efficient.
<
s3(Dict,String,S).
Line 1,411:
member(E,Dict),
append(E,String2,String),
s3(Dict,String2,S).</
===Some tests===
Here is the test engine.
<
garbage_collect(300_000_000),
_ = random2(),
Line 1,435:
end,
println(len=All.len),
nl.</
As can be seen by this summary, <code>s3/2</code> is the most efficient of these versions. <code>s/2</code> is too slow but for the simplest examples.
Line 1,476:
=={{header|PicoLisp}}==
<
(setq *Dict2
(quote
Line 1,516:
(println (word "samsungandmango" *Dict2))
(println (word "samsungandmangok" *Dict2))
(println (word "ksamsungandmango" *Dict2))</
{{out}}
<pre>
Line 1,538:
{{Works with|Python|3.7}}
<
from itertools import (chain)
Line 1,640:
# MAIN ---
if __name__ == '__main__':
main()</
{{Out}}
<pre>abcd:
Line 1,657:
This returns all the possible splits (and null list if none is possible). Who's to say which is the best?
<
(define render-phrases pretty-print)
Line 1,688:
(render-phrases (word-splits "samsungandmango" dict-2))
(render-phrases (word-splits "samsungandmangok" dict-2))
(render-phrases (word-splits "ksamsungandmango" dict-2)))</
{{out}}
<pre>'("a b cd")
Line 1,710:
This implementation does not necessarily find ''every'' combination, it returns the one with the longest matching tokens.
<syntaxhighlight lang="raku"
my $regex = @words.join('|');
put "$_: ", word-break($_) for <abcd abbc abcbcd acdbc abcdd>;
sub word-break (Str $word) { ($word ~~ / ^ (<$regex>)+ $ /)[0] // "Not possible" }</
{{out}}
<pre>abcd: a b cd
Line 1,725:
=={{header|REXX}}==
This REXX version allows the words to be tested (and the dictionary words) to be specified on the command line.
<
parse arg a '/' x; a=space(a); x=space(x) /*get optional args; elide extra blanks*/
if a=='' | a=="," then a= 'abcd abbc abcbcd acdbc abcdd' /*Not specififed? Use default*/
Line 1,759:
exit /*stick a fork in it, we're all done. */
/*──────────────────────────────────────────────────────────────────────────────────────*/
fw: parse arg z,L; do k=L by -1 for L; ?=left(z,k); if @.? then leave; end; return k</
{{out|output|text= when using the default inputs:}}
<pre>
Line 1,772:
=={{header|Ring}}==
<
# Project : Word break problem
Line 1,837:
next
next
</syntaxhighlight>
Output:
<pre>
Line 1,848:
=={{header|Ruby}}==
<
def split_text_with_dict(text, dict, splited=[])
solutions = []
Line 1,865:
return solutions
end
</syntaxhighlight>
{{out}}
Line 1,883:
=={{header|Rust}}==
===Dynamic programming===
<
fn create_string(s: &str, v: Vec<Option<usize>>) -> String {
let mut idx = s.len();
Line 1,928:
set.insert("b");
println!("{:?}", word_break("abcd", set).unwrap());
}</
{{output}}
<pre>"a b cd"</pre>
Line 1,935:
===First solution===
Finds all possible solutions recursively, using a trie representation of the dictionary:
<
def add(s: String): TrieNode = s match {
case "" => copy(isWord = true)
Line 1,968:
def wordBreak(s: String, dict: TrieNode): List[List[String]] = {
wordBreakRec(s, dict, dict, "")
}</
Calling it with some example strings:
<
val testCases = List("abcd", "abbc", "abcbcd", "acdbc", "abcdd")
for (s <- testCases) {
Line 1,978:
println("\t" + words.mkString(" "))
}
}</
{{out}}
<pre>abcd has 1 solution(s):
Line 1,992:
===Combined solution===
{{Out}}Best seen running in your browser either by [https://scalafiddle.io/sf/49YwsD5/1 ScalaFiddle (ES aka JavaScript, non JVM)] or [https://scastie.scala-lang.org/L47OzgmjSkOnQ5wfrZ5snQ Scastie (remote JVM)].
<
val dict = buildTrie("a", "bc", "abc", "cd", "b")
lazy val empty = TrieNode(isWord = false, Map.empty) // lazy or in a companion object
Line 2,041:
})
}</
=={{header|Seed7}}==
<
const func boolean: wordBreak (in string: stri, in array string: words, in string: resultList) is func
Line 2,077:
end if;
end for;
end func;</
{{out}}
Line 2,091:
=={{header|Sidef}}==
{{trans|zkl}}
<
var r = ->(str, arr=[]) {
Line 2,113:
for str in (strs) {
printf("%11s: %s\n", str, word_break(str, words) \\ '(not possible)')
}</
{{out}}
<pre>
Line 2,130:
{{trans|Rust}}
<
@inlinable
Line 2,201:
print("\(test):")
print(" \(wordBreak(str: test, dict: words) ?? "did not parse with given words")")
}</
{{out}}
Line 2,218:
=={{header|Tailspin}}==
Does a depth-first search (after generating all possibilities on the current level). We could stop looking further after one is found, just add a condition to do nothing if a done-flag is set.
<
templates wordBreaks&{dict:}
composer starts&{with:}
Line 2,243:
'ababab' -> wordBreaks&{dict: ['a', 'ab', 'bab']} -> !OUT::write
'abcbab' -> wordBreaks&{dict: ['a', 'ab', 'bab']} -> !OUT::write
</syntaxhighlight>
{{out}}
<pre>
Line 2,257:
=={{header|Visual Basic .NET}}==
{{trans|Java}}
<
Structure Node
Line 2,329:
End Sub
End Module</
{{out}}
<pre>String = aab, Dictionary = System.Collections.Generic.List`1[System.String]. Solutions = 4
Line 2,362:
{{trans|Go}}
{{libheader|Wren-fmt}}
<
class Prefix {
Line 2,401:
System.print("can't break")
}
}</
{{out}}
Line 2,413:
=={{header|zkl}}==
<
words=words.split(" "); // to list of words
r:=fcn(str,words,sink){ // recursive search, easy to collect answer
Line 2,427:
if(False==r) return("not possible");
r.reverse().concat(" ")
}</
<
println(text,": ",wordBreak(text,"a bc abc cd b"))
}</
{{out}}
<pre>
|