Multisplit

From Rosetta Code
You are encouraged to solve this task according to the task description, using any language you may know.
It is often necessary to split a string into pieces based on several different (potentially multi-character) separator strings, while still retaining the information about which separators were present in the input. This is particularly useful when doing small parsing tasks. The task is to write code to demonstrate this.

The function (or procedure or method, as appropriate) should take an input string and an ordered collection of separators. The order of the separators is significant: The delimiter order represents priority in matching, with the first defined delimiter having the highest priority. In cases where there would be an ambiguity as to which separator to use at a particular point (e.g., because one separator is a prefix of another) the separator with the highest priority should be used. Delimiters can be reused and the output from the function should be an ordered sequence of substrings.

Test your code using the input string “a!===b=!=c” and the separators “==”, “!=” and “=”.

For these inputs the string should be parsed as "a" (!=) "" (==) "b" (=) "" (!=) "c", where matched delimiters are shown in parentheses, and separated strings are quoted, so our resulting output is "a", empty string, "b", empty string, "c". Note that the quotation marks are shown for clarity and do not form part of the output.

Extra Credit: provide information that indicates which separator was matched at each separation point and where in the input string that separator was matched.
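As a language-neutral reference for the specification above, here is a minimal Python sketch. The function name `multisplit_with_info` is hypothetical. At each position it tries the separators in priority order. The first one that matches ends the current substring, and its text and start offset are recorded for the extra credit.

```python
def multisplit_with_info(text, separators):
    """Split text on prioritized separators.

    Returns (pieces, matches), where matches holds (separator, start_index)
    for every separation point, so len(pieces) == len(matches) + 1.
    """
    pieces, matches = [], []
    start = i = 0
    while i < len(text):
        # First separator, in priority order, that matches at position i.
        sep = next((s for s in separators if s and text.startswith(s, i)), None)
        if sep is None:
            i += 1
            continue
        pieces.append(text[start:i])
        matches.append((sep, i))
        i += len(sep)
        start = i
    pieces.append(text[start:])
    return pieces, matches


pieces, matches = multisplit_with_info("a!===b=!=c", ["==", "!=", "="])
print(pieces)   # ['a', '', 'b', '', 'c']
print(matches)  # [('!=', 1), ('==', 3), ('=', 6), ('!=', 7)]
```

Reversing the separator order to `["=", "!=", "=="]` makes the `==` at offset 3 split as two `=` matches, giving `['a', '', '', 'b', '', 'c']`.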


Ada

multisplit.adb:

with Ada.Containers.Indefinite_Doubly_Linked_Lists;
with Ada.Text_IO;
 
procedure Multisplit is
package String_Lists is new Ada.Containers.Indefinite_Doubly_Linked_Lists
(Element_Type => String);
use type String_Lists.Cursor;
 
function Split
(Source  : String;
Separators : String_Lists.List)
return String_Lists.List
is
Result  : String_Lists.List;
Next_Position  : Natural := Source'First;
Prev_Position  : Natural := Source'First;
Separator_Position : String_Lists.Cursor;
Separator_Length  : Natural;
Changed  : Boolean;
begin
loop
Changed  := False;
Separator_Position := Separators.First;
while Separator_Position /= String_Lists.No_Element loop
Separator_Length :=
String_Lists.Element (Separator_Position)'Length;
if Next_Position + Separator_Length - 1 <= Source'Last
and then Source
(Next_Position .. Next_Position + Separator_Length - 1) =
String_Lists.Element (Separator_Position)
then
if Next_Position > Prev_Position then
Result.Append
(Source (Prev_Position .. Next_Position - 1));
end if;
Result.Append (String_Lists.Element (Separator_Position));
Next_Position := Next_Position + Separator_Length;
Prev_Position := Next_Position;
Changed  := True;
exit;
end if;
Separator_Position := String_Lists.Next (Separator_Position);
end loop;
if not Changed then
Next_Position := Next_Position + 1;
end if;
if Next_Position > Source'Last then
Result.Append (Source (Prev_Position .. Source'Last));
exit;
end if;
end loop;
return Result;
end Split;
 
Test_Input  : constant String := "a!===b=!=c";
Test_Separators : String_Lists.List;
Test_Result  : String_Lists.List;
Pos  : String_Lists.Cursor;
begin
Test_Separators.Append ("==");
Test_Separators.Append ("!=");
Test_Separators.Append ("=");
Test_Result := Split (Test_Input, Test_Separators);
Pos  := Test_Result.First;
while Pos /= String_Lists.No_Element loop
Ada.Text_IO.Put (" " & String_Lists.Element (Pos));
Pos := String_Lists.Next (Pos);
end loop;
Ada.Text_IO.New_Line;
-- other order of separators
Test_Separators.Clear;
Test_Separators.Append ("=");
Test_Separators.Append ("!=");
Test_Separators.Append ("==");
Test_Result := Split (Test_Input, Test_Separators);
Pos  := Test_Result.First;
while Pos /= String_Lists.No_Element loop
Ada.Text_IO.Put (" " & String_Lists.Element (Pos));
Pos := String_Lists.Next (Pos);
end loop;
end Multisplit;

output:

 a != == b = != c
 a != = = b = != c

AutoHotkey

Str := "a!===b=!=c"
Sep := ["==","!=", "="]
Res := StrSplit(Str, Sep)
for k, v in Res
Out .= (Out?",":"") v
MsgBox % Out
for k, v in Sep
N .= (N?"|":"") "\Q" v "\E"
MsgBox % RegExReplace(str, "(.*?)(" N ")", "$1 {$2}")
Outputs:
a,,b,,c
a {!=} {==}b {=} {!=}c


AWK

 
# syntax: GAWK -f MULTISPLIT.AWK
BEGIN {
str = "a!===b=!=c"
sep = "(==|!=|=)"
printf("str: %s\n",str)
printf("sep: %s\n\n",sep)
n = split(str,str_arr,sep,sep_arr)
printf("parsed: ")
for (i=1; i<=n; i++) {
printf("'%s'",str_arr[i])
if (i<n) { printf(" '%s' ",sep_arr[i]) }
}
printf("\n\nstrings: ")
for (i=1; i<=n; i++) {
printf("'%s' ",str_arr[i])
}
printf("\n\nseparators: ")
for (i=1; i<n; i++) {
printf("'%s' ",sep_arr[i])
}
printf("\n")
exit(0)
}
 

output:

str: a!===b=!=c
sep: (==|!=|=)

parsed: 'a' '!=' '' '==' 'b' '=' '' '!=' 'c'

strings: 'a' '' 'b' '' 'c'

separators: '!=' '==' '=' '!='

BBC BASIC

DIM sep$(2)
sep$() = "==", "!=", "="
PRINT "String splits into:"
PRINT FNmultisplit("a!===b=!=c", sep$(), FALSE)
PRINT "For extra credit:"
PRINT FNmultisplit("a!===b=!=c", sep$(), TRUE)
END
 
DEF FNmultisplit(s$, d$(), info%)
LOCAL d%, i%, j%, m%, p%, o$
p% = 1
REPEAT
m% = LEN(s$)
FOR i% = 0 TO DIM(d$(),1)
d% = INSTR(s$, d$(i%), p%)
IF d% IF d% < m% m% = d% : j% = i%
NEXT
IF m% < LEN(s$) THEN
o$ += """" + MID$(s$, p%, m%-p%) + """"
IF info% o$ += " (" + d$(j%) + ") " ELSE o$ += ", "
p% = m% + LEN(d$(j%))
ENDIF
UNTIL m% = LEN(s$)
= o$ + """" + MID$(s$, p%) + """"

Output:

String splits into:
"a", "", "b", "", "c"
For extra credit:
"a" (!=) "" (==) "b" (=) "" (!=) "c"

Bracmat

This is a surprisingly difficult task to solve in Bracmat, because in a naive solution using an alternating pattern ("=="|"!="|"=") the shorter pattern "=" would have precedence over "==". In the solution below the function oneOf iterates (by recursion) over the operators, trying to match the start of the current subject string sjt with one operator at a time, until it succeeds or reaches the end of the list of operators, whichever comes first. If no operator is found at the start of the current subject string, the variable nonOp is extended by one byte, thereby shifting the start of the current subject string one byte to the right. Then a new attempt is made to find an operator. This is repeated until either an operator is found, in which case the unparsed string is restricted to the part of the input after the found operator, or no operator is found, in which case the whl loop terminates.

( ( oneOf
= operator
.  !arg:%?operator ?arg
& ( @(!sjt:!operator ?arg)&(!operator.!arg)
| oneOf$!arg
)
)
& "a!===b=!=c":?unparsed
& "==" "!=" "=":?operators
& whl
' ( @( !unparsed
 : ?nonOp [%(oneOf$!operators:(?operator.?unparsed))
)
& put$(!nonOp str$("{" !operator "} "))
)
& put$!unparsed
& put$\n
);

Output:

a {!=} {==} b {=} {!=} c
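The oneOf/nonOp search strategy described for the Bracmat solution can be sketched in Python as follows. The names `one_of`, `non_op` and `unparsed` mirror the Bracmat variables but are otherwise hypothetical, and the braces in the result only imitate the Bracmat output format. `one_of` recurses over the operator list and returns the first operator that prefixes the subject; the outer loop grows `non_op` by one character whenever nothing matches.

```python
def one_of(operators, subject):
    """Return (operator, rest) for the first operator that prefixes subject, else None."""
    if not operators:
        return None
    head, *tail = operators
    if head and subject.startswith(head):
        return head, subject[len(head):]
    return one_of(tail, subject)  # try the next operator (recursion, as in oneOf)


def bracmat_style_split(unparsed, operators):
    out = []
    while True:
        non_op = ""
        # Shift the start of the subject right, one character at a time.
        while True:
            found = one_of(operators, unparsed[len(non_op):])
            if found or len(non_op) == len(unparsed):
                break
            non_op += unparsed[len(non_op)]
        if not found:
            out.append(unparsed)  # no operator left: the rest is the final piece
            return out
        operator, unparsed = found
        out.extend([non_op, "{" + operator + "}"])


print(" ".join(bracmat_style_split("a!===b=!=c", ["==", "!=", "="])))
```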

C

This version does not build a list of substrings; it prints the input with each matched separator enclosed in braces.

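#include <stdio.h>
#include <string.h>

/* Print str, wrapping each matched separator in braces.
   pat holds len separators in priority order. */
void parse_sep(const char *str, const char *const *pat, int len)
{
    int i, slen;
    while (*str != '\0') {
        /* Try each separator in order. If none matches (i reaches len),
           the loop condition prints the current character and advances. */
        for (i = 0; i < len || !putchar(*(str++)); i++) {
            slen = strlen(pat[i]);
            if (strncmp(str, pat[i], slen)) continue;
            printf("{%.*s}", slen, str);
            str += slen;
            break;
        }
    }
}

int main()
{
    const char *seps[] = { "==", "!=", "=" };
    parse_sep("a!===b=!=c", seps, 3);

    return 0;
}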
output
a{!=}{==}b{=}{!=}c

C++

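Using the Boost library tokenizer. Note that boost::char_separator treats its arguments as sets of single delimiter characters rather than as multi-character separators, and drops empty tokens by default, so this version only recovers the non-empty substrings.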

#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>
 
int main( ) {
std::string str( "a!===b=!=c" ) , output ;
typedef boost::tokenizer<boost::char_separator<char> > tokenizer ;
boost::char_separator<char> separator ( "==" , "!=" ) , sep ( "!" ) ;
tokenizer mytok( str , separator ) ;
tokenizer::iterator tok_iter = mytok.begin( ) ;
for ( ; tok_iter != mytok.end( ) ; ++tok_iter )
output.append( *tok_iter ) ;
tokenizer nexttok ( output , sep ) ;
for ( tok_iter = nexttok.begin( ) ; tok_iter != nexttok.end( ) ;
++tok_iter )
std::cout << *tok_iter << " " ;
std::cout << '\n' ;
return 0 ;
}

Output:

a b c

C#

Extra Credit Solution

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
 
namespace Multisplit
{
internal static class Program
{
private static void Main(string[] args)
{
foreach (var s in "a!===b=!=c".Multisplit(true, "==", "!=", "=")) // Split the string and return the separators.
{
Console.Write(s); // Write the returned substrings and separators to the console.
}
Console.WriteLine();
}
 
private static IEnumerable<string> Multisplit(this string s, bool returnSeparators = false,
params string[] delimiters)
{
var currentString = new StringBuilder(); // Initialize the StringBuilder; it holds the current substring until we find a separator.
 
int index = 0; // Initialize the index at 0; it is our current read position in the string.
 
while (index < s.Length) // Loop through the string.
{
// This will get the highest priority separator found at the current index, or null if there are none.
string foundDelimiter =
(from delimiter in delimiters
where s.Length >= index + delimiter.Length &&
s.Substring(index, delimiter.Length) == delimiter
select delimiter).FirstOrDefault();
 
if (foundDelimiter != null)
{
yield return currentString.ToString(); // Return the current string.
if (returnSeparators) // Return the separator, if the user specified to do so.
yield return
string.Format("{{\"{0}\", ({1}, {2})}}",
foundDelimiter,
index, index + foundDelimiter.Length);
currentString.Clear(); // Clear the current string.
index += foundDelimiter.Length; // Move the index past the current separator.
}
else
{
currentString.Append(s[index++]); // Add the character at this index to the current string.
}
}
 
if (currentString.Length > 0)
yield return currentString.ToString(); // If we have anything left over, return it.
}
}
}

Sample Output

a{"!=", (1, 3)}{"==", (3, 5)}b{"=", (6, 7)}{"!=", (7, 9)}c

CoffeeScript

 
multi_split = (text, separators) ->
  # Split text up, using separators to break up text and discarding
  # separators.
  #
  # Returns an array of strings, which can include empty strings when
  # separators are found either adjacent to each other or at the
  # beginning/end of the text.
  #
  # Separators have precedence, according to their order in the array,
  # and each separator should be at least one character long.
  result = []
  i = 0
  s = ''
  while i < text.length
    found = false
    for separator in separators
      if text.substring(i, i + separator.length) == separator
        found = true
        i += separator.length
        result.push s
        s = ''
        break
    if !found
      s += text[i]
      i += 1
  result.push s
  result

console.log multi_split 'a!===b=!=c', ['==', '!=', '='] # [ 'a', '', 'b', '', 'c' ]
console.log multi_split '', ['whatever'] # [ '' ]
 

D

import std.stdio, std.array, std.algorithm;
 
string[] multiSplit(in string s, in string[] divisors) pure nothrow {
string[] result;
auto rest = s.idup;
 
while (true) {
bool done = true;
string delim;
{
string best;
foreach (const div; divisors) {
const maybe = rest.find(div);
if (maybe.length > best.length) {
best = maybe;
delim = div;
done = false;
}
}
}
result.length++;
if (done) {
result.back = rest.idup;
return result;
} else {
const t = rest.findSplit(delim);
result.back = t[0].idup;
rest = t[2];
}
}
}
 
void main() {
"a!===b=!=c"
.multiSplit(["==", "!=", "="])
.join(" {} ")
.writeln;
}

Output (separator locations indicated by braces):

a {}  {} b {}  {} c

Erlang

20> re:split("a!===b=!=c", "==|!=|=",[{return, list}]).
["a",[],"b",[],"c"]

F#

If we ignore the "Extra Credit" requirement, this is exactly what one of the overloads of .NET's String.Split method does. As the two calls below show, when separators overlap at a position, the one listed first takes precedence. Using F# Interactive:

> "a!===b=!=c".Split([|"=="; "!="; "="|], System.StringSplitOptions.None);;
val it : string [] = [|"a"; ""; "b"; ""; "c"|]
 
> "a!===b=!=c".Split([|"="; "!="; "=="|], System.StringSplitOptions.None);;
val it : string [] = [|"a"; ""; ""; "b"; ""; "c"|]

System.StringSplitOptions.None specifies that empty strings should be included in the result.
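For comparison, a Python sketch (the helper name `ordered_split` is hypothetical) gets the same two results from `re.split`. Python's `re` engine tries the alternatives of `a|b|c` from left to right and takes the first one that matches at a given position, so joining the escaped separators in priority order makes that order act as priority.

```python
import re


def ordered_split(text, separators):
    # Escape each separator and join them in priority order; re tries
    # alternatives left to right at each position.
    pattern = "|".join(re.escape(s) for s in separators)
    return re.split(pattern, text)


print(ordered_split("a!===b=!=c", ["==", "!=", "="]))  # ['a', '', 'b', '', 'c']
print(ordered_split("a!===b=!=c", ["=", "!=", "=="]))  # ['a', '', '', 'b', '', 'c']
```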

Go

package main
 
import (
"fmt"
"strings"
)
 
func ms(txt string, sep []string) (ans []string) {
for txt > "" {
sepMatch := ""
posMatch := len(txt)
for _, s := range sep {
if p := strings.Index(txt, s); p >= 0 && p < posMatch {
sepMatch = s
posMatch = p
}
}
ans = append(ans, txt[:posMatch])
txt = txt[posMatch+len(sepMatch):]
}
return
}
 
func main() {
fmt.Printf("%q\n", ms("a!===b=!=c", []string{"==", "!=", "="}))
}

Output:

["a" "" "b" "" "c"]

Icon and Unicon

procedure main()
s := "a!===b=!=c"
# just list the tokens
every writes(multisplit(s,["==", "!=", "="])," ") | write()
 
# list tokens and indices
every ((p := "") ||:= t := multisplit(s,sep := ["==", "!=", "="])) | break write() do
if t == !sep then writes(t," (",*p+1-*t,") ") else writes(t," ")
 
end
 
procedure multisplit(s,L)
s ? while not pos(0) do {
t := =!L | 1( arb(), match(!L)|pos(0) )
suspend t
}
end
 
procedure arb()
suspend .&subject[.&pos:&pos <- &pos to *&subject + 1]
end

Sample Output:

a != == b = != c
a != (2) == (4) b = (7) != (8) c

J

multisplit=: 4 :0
'sep begin'=. |: t=. y /:~&.:(|."1)@;@(i.@#@[ ,.L:0"0 I.@E.L:0) x
end=. begin + sep { #@>y
last=. next=. 0
r=. 2 0$0
while. next<#begin do.
r=. r,.(last}.x{.~next{begin);next{t
last=. next{end
next=. 1 i.~(begin>next{begin)*.begin>:last
end.
r=. r,.'';~last}.x
)

Explanation:

First find all potentially relevant separator instances, and sort them in increasing order by starting location and then by separator index. In the code, sep is the separator index, begin is the starting location and end is the ending location.

Then, loop through the possibilities, skipping over those separators which would overlap with previously used separators.

The result consists of two rows: The first row is the extracted substrings, the second row is the "extra credit" part -- for each extracted substring, the numbers in the second row are the separator index (0 for the first index, 1 for the second, ...), and the location in the original string where the separator appeared. Note that the very last substring does not have a separator following it, so the extra credit part is blank for that substring.

Example use:

   S=: 'a!===b=!=c'
   S multisplit '==';'!=';'='
┌───┬───┬───┬───┬─┐
│a  │   │b  │   │c│
├───┼───┼───┼───┼─┤
│1 1│0 3│2 6│1 7│ │
└───┴───┴───┴───┴─┘
   S multisplit '=';'!=';'=='
┌───┬───┬───┬───┬───┬─┐
│a  │   │   │b  │   │c│
├───┼───┼───┼───┼───┼─┤
│1 1│0 3│0 4│0 6│1 7│ │
└───┴───┴───┴───┴───┴─┘
   'X123Y' multisplit '1';'12';'123';'23';'3'
┌───┬───┬─┐
│X  │   │Y│
├───┼───┼─┤
│0 1│3 2│ │
└───┴───┴─┘
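The two-phase idea explained for the J solution (collect every candidate match, sort by start position and then by separator index, then walk the candidates skipping any that overlap the previously accepted separator) can be sketched in Python. This is an illustrative translation under that reading, not the J code itself; `candidate_split` is a hypothetical name. It returns the substrings plus `(separator_index, start)` pairs, like the second row of the J result.

```python
def candidate_split(text, separators):
    # Phase 1: every (start, separator_index) where a separator occurs,
    # sorted by start position, then by priority.
    candidates = sorted(
        (start, idx)
        for idx, sep in enumerate(separators) if sep
        for start in range(len(text) - len(sep) + 1)
        if text.startswith(sep, start)
    )
    pieces, info = [], []
    last = 0  # end of the previously accepted separator
    # Phase 2: accept candidates in order, skipping any that start
    # inside the last accepted separator (this also drops lower-priority
    # candidates at the same start position).
    for start, idx in candidates:
        if start < last:
            continue
        pieces.append(text[last:start])
        info.append((idx, start))
        last = start + len(separators[idx])
    pieces.append(text[last:])
    return pieces, info


print(candidate_split("X123Y", ["1", "12", "123", "23", "3"]))
# (['X', '', 'Y'], [(0, 1), (3, 2)])
```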

JavaScript

Based on the Ruby example.

Library: Underscore.js
RegExp.escape = function(text) {
return text.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, "\\$&");
}
 
multisplit = function(string, seps) {
var sep_regex = RegExp(_.map(seps, function(sep) { return RegExp.escape(sep); }).join('|'));
return string.split(sep_regex);
}

Mathematica

Just use the built-in function "StringSplit":

StringSplit["a!===b=!=c", {"==", "!=", "="}]
Output:
{a,,b,,c}

Nimrod

import strutils

iterator tokenize(text, sep): tuple[token: string, isSep: bool] =
  var i, lastMatch = 0
  while i < text.len:
    for j, s in sep:
      if text[i..text.high].startsWith s:
        if i > lastMatch: yield (text[lastMatch .. <i], false)
        yield (s, true)
        lastMatch = i + s.len
        i += s.high
        break
    inc i
  if i > lastMatch: yield (text[lastMatch .. <i], false)

for token, isSep in "a!===b=!=c".tokenize(["==", "!=", "="]):
  if isSep: stdout.write '{',token,'}'
  else: stdout.write token
echo ""

Output:

a{!=}{==}b{=}{!=}c

Perl

sub multisplit {
my ($sep, $string, %opt) = @_ ;
$sep = join '|', map quotemeta($_), @$sep;
$sep = "($sep)" if $opt{keep_separators};
split /$sep/, $string, -1;
}
 
print "'$_' " for multisplit ['==','!=','='], "a!===b=!=c";
print "\n";
print "'$_' " for multisplit ['==','!=','='], "a!===b=!=c", keep_separators => 1;
print "\n";
Output:
'a' '' 'b' '' 'c' 
'a' '!=' '' '==' 'b' '=' '' '!=' 'c' 

Perl 6

sub multisplit($str, @seps) { $str.split(/ ||@seps /, :all) }
 
my @chunks = multisplit( 'a!===b=!=c==d', < == != = > );
 
# Print the strings.
say @chunks».Str.perl;
 
# Print the positions of the separators.
for grep Match, @chunks -> $s {
say " $s from $s.from() to $s.to()";
}

Output:

("a", "!=", "", "==", "b", "=", "", "!=", "c", "==", "d")
  !=	from 1 to 3
  ==	from 3 to 5
  =	from 6 to 7
  !=	from 7 to 9
  ==	from 10 to 12

Using the array @seps in a pattern automatically does alternation. By default this would do longest-token matching (that is, | semantics), but we can force it to do left-to-right matching by embedding the array in a short-circuit alternation (that is, || semantics). As it happens, with the task's specified list of separators, it doesn't make any difference.
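The difference between the two alternation styles only shows up when a lower-priority separator is longer than a higher-priority one that also matches, for example `=` listed before `==`. The Python sketch below approximates the two rules for plain literal separators; `split_with` and its `longest` flag are hypothetical names. At each position it picks either the first listed separator that matches (`||`-style) or the longest one (`|`-style).

```python
def split_with(text, separators, longest=False):
    pieces, start, i = [], 0, 0
    while i < len(text):
        hits = [s for s in separators if s and text.startswith(s, i)]
        if not hits:
            i += 1
            continue
        # '||'-style: first listed match wins; '|'-style: longest match wins
        # (max keeps the earlier-listed separator on equal lengths).
        sep = max(hits, key=len) if longest else hits[0]
        pieces.append(text[start:i])
        i += len(sep)
        start = i
    pieces.append(text[start:])
    return pieces


print(split_with("a==b", ["=", "=="]))                # ['a', '', 'b']
print(split_with("a==b", ["=", "=="], longest=True))  # ['a', 'b']
```

With the task's separators `["==", "!=", "="]` both rules give `['a', '', 'b', '', 'c']`, because the longest candidate is always listed first.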

Perl 6 automatically returns Match objects that will stringify to the matched pattern, but can also be interrogated for their match positions, as illustrated above by post-processing the results two different ways.
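Python's `re` module supports the same kind of post-processing. Each `Match` object produced by `re.finditer` carries its matched text and its `start()`/`end()` offsets. Here is a sketch with a hypothetical `separator_spans` helper, run on the same input as the Perl 6 example and relying on Python's left-to-right alternation for priority:

```python
import re


def separator_spans(text, separators):
    """Return (separator, start, end) for each separator match, left to right."""
    # Alternatives in priority order; re takes the first that matches.
    pattern = "|".join(re.escape(s) for s in separators)
    return [(m.group(), m.start(), m.end()) for m in re.finditer(pattern, text)]


for sep, start, end in separator_spans("a!===b=!=c==d", ["==", "!=", "="]):
    print(f"{sep}\tfrom {start} to {end}")
```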

PicoLisp

(de multisplit (Str Sep)
(setq Sep (mapcar chop Sep))
(make
(for (S (chop Str) S)
(let L
(make
(loop
(T (find head Sep (circ S))
(link
(list
(- (length Str) (length S))
(pack (cut (length @) 'S)) ) ) )
(link (pop 'S))
(NIL S (link NIL)) ) )
(link (pack (cdr (rot L))))
(and (car L) (link @)) ) ) ) )
 
(println (multisplit "a!===b=!=c" '("==" "!=" "=")))
(println (multisplit "a!===b=!=c" '("=" "!=" "==")))

Output:

("a" (1 "!=") NIL (3 "==") "b" (6 "=") NIL (7 "!=") "c")
("a" (1 "!=") NIL (3 "=") NIL (4 "=") "b" (6 "=") NIL (7 "!=") "c")

Pike

string input = "a!===b=!=c";
array sep = ({"==", "!=", "=" });
 
array result = replace(input, sep, `+("\0", sep[*], "\0"))/"\0";
result;
Result: ({ "a", "!=", "", "==", "b", "=", "", "!=", "c" })
 
int pos = 0;
foreach(result; int index; string data)
{
if ((<"==", "!=", "=">)[data])
result[index] = ({ data, pos });
pos+=sizeof(data);
}
 
result;
Result: ({"a", ({"!=", 1}), "", ({"==", 3}), "b", ({"=", 6}), "", ({"!=", 7}), "c"})

Prolog

Works with SWI-Prolog.

multisplit(_LSep, '') -->
{!},
[].
 
multisplit(LSep, T) -->
{next_sep(LSep, T, [], Token, Sep, T1)},
( {Token \= '' },[Token], {!}; []),
( {Sep \= '' },[Sep], {!}; []),
multisplit(LSep, T1).
 
next_sep([], T, Lst, Token, Sep, T1) :-
% if we can't find any separator, the game is over
( Lst = [] ->
Token = T, Sep = '', T1 = '';
 
% we sort the list to get nearest longest separator
predsort(my_sort, Lst, [(_,_, Sep)|_]),
atomic_list_concat([Token|_], Sep, T),
atom_concat(Token, Sep, Tmp),
atom_concat(Tmp, T1, T)).
 
next_sep([HSep|TSep], T, Lst, Token, Sep, T1) :-
sub_atom(T, Before, Len, _, HSep),
next_sep(TSep, T, [(Before, Len,HSep) | Lst], Token, Sep, T1).
 
next_sep([_HSep|TSep], T, Lst, Token, Sep, T1) :-
next_sep(TSep, T, Lst, Token, Sep, T1).
 
 
my_sort(<, (N1, _, _), (N2, _, _)) :-
N1 < N2.
 
my_sort(>, (N1, _, _), (N2, _, _)) :-
N1 > N2.
 
my_sort(>, (N, N1, _), (N, N2, _)) :-
N1 < N2.
 
my_sort(<, (N, N1, _), (N, N2, _)) :-
N1 > N2.
 

Output:

?- multisplit(['==', '!=', '='], 'ax!===b=!=c', Lst, []).
Lst = [ax,'!=',==,b,=,'!=',c] .

Python

Using regular expressions

>>> import re
>>> def ms2(txt="a!===b=!=c", sep=["==", "!=", "="]):
...     if not txt or not sep:
...         return []
...     ans = m = []
...     for m in re.finditer('(.*?)(?:' + '|'.join('('+re.escape(s)+')' for s in sep) + ')', txt):
...         ans += [m.group(1), (m.lastindex-2, m.start(m.lastindex))]
...     if m and txt[m.end(m.lastindex):]:
...         ans += [txt[m.end(m.lastindex):]]
...     return ans
...
 
>>> ms2()
['a', (1, 1), '', (0, 3), 'b', (2, 6), '', (1, 7), 'c']
>>> ms2(txt="a!===b=!=c", sep=["=", "!=", "=="])
['a', (1, 1), '', (0, 3), '', (0, 4), 'b', (0, 6), '', (1, 7), 'c']

Not using regular expressions

Inspired by the C version.

def multisplit(text, sep):
    lastmatch = i = 0
    matches = []
    while i < len(text):
        for j, s in enumerate(sep):
            if text[i:].startswith(s):
                if i > lastmatch:
                    matches.append(text[lastmatch:i])
                matches.append((j, i))  # Replace the string containing the matched separator with a tuple of which separator and where in the string the match occurred
                lastmatch = i + len(s)
                i += len(s)
                break
        else:
            i += 1
    if i > lastmatch:
        matches.append(text[lastmatch:i])
    return matches
 
>>> multisplit('a!===b=!=c', ['==', '!=', '='])
['a', (1, 1), (0, 3), 'b', (2, 6), (1, 7), 'c']
>>> multisplit('a!===b=!=c', ['!=', '==', '='])
['a', (0, 1), (1, 3), 'b', (2, 6), (0, 7), 'c']
 

Alternative version

def min_pos(List):
    return List.index(min(List))

def find_all(S, Sub, Start = 0, End = -1, IsOverlapped = 0):
    Res = []
    if End == -1:
        End = len(S)
    if IsOverlapped:
        DeltaPos = 1
    else:
        DeltaPos = len(Sub)
    Pos = Start
    while True:
        Pos = S.find(Sub, Pos, End)
        if Pos == -1:
            break
        Res.append(Pos)
        Pos += DeltaPos
    return Res

def multisplit(S, SepList):
    SepPosListList = []
    SLen = len(S)
    SepNumList = []
    ListCount = 0
    for i, Sep in enumerate(SepList):
        SepPosList = find_all(S, Sep, 0, SLen, IsOverlapped = 1)
        if SepPosList != []:
            SepNumList.append(i)
            SepPosListList.append(SepPosList)
            ListCount += 1
    if ListCount == 0:
        return [S]
    MinPosList = []
    for i in range(ListCount):
        MinPosList.append(SepPosListList[i][0])
    SepEnd = 0
    MinPosPos = min_pos(MinPosList)
    Res = []
    while True:
        Res.append( S[SepEnd : MinPosList[MinPosPos]] )
        Res.append([SepNumList[MinPosPos], MinPosList[MinPosPos]])
        SepEnd = MinPosList[MinPosPos] + len(SepList[SepNumList[MinPosPos]])
        while True:
            MinPosPos = min_pos(MinPosList)
            if MinPosList[MinPosPos] < SepEnd:
                del SepPosListList[MinPosPos][0]
                if len(SepPosListList[MinPosPos]) == 0:
                    del SepPosListList[MinPosPos]
                    del MinPosList[MinPosPos]
                    del SepNumList[MinPosPos]
                    ListCount -= 1
                    if ListCount == 0:
                        break
                else:
                    MinPosList[MinPosPos] = SepPosListList[MinPosPos][0]
            else:
                break
        if ListCount == 0:
            break
    Res.append(S[SepEnd:])
    return Res
 
 
S = "a!===b=!=c"
multisplit(S, ["==", "!=", "="]) # output: ['a', [1, 1], '', [0, 3], 'b', [2, 6], '', [1, 7], 'c']
multisplit(S, ["=", "!=", "=="]) # output: ['a', [1, 1], '', [0, 3], '', [0, 4], 'b', [0, 6], '', [1, 7], 'c']

Racket

 
#lang racket
(regexp-match* #rx"==|!=|=" "a!===b=!=c" #:gap-select? #t #:match-select values)
;; => '("a" ("!=") "" ("==") "b" ("=") "" ("!=") "c")
 

REXX

/*REXX program to split a string based on different separator strings.  */
parse arg ? /*get string from command line. */
if ?=='' then ? = 'a!===b=!=c' /*None specified? Use default.*/
say 'old string='? /*echo the old string to screen. */
zz = '0'x /*null char, can be most anything*/
seps = '== != =' /*a list of separators to be used*/
 
do j=1 for words(seps) /*parse string with all the seps.*/
sep=word(seps,j) /*pick a separator to use now. */
 
do k=1 for length(sep) /*parse for various sep versions.*/
sep=strip(insert(zz,sep,k),,zz) /*allow embedded "nulls" in sep. */
 ?=changestr(sep,?,zz) /* ··· but not trailing "nulls". */
 
do until ?==??;  ??=? /*keep changing until no more chg*/
 ?=changestr(zz || zz, ?, zz) /*reduce replicated "nulls". */
end /*until···*/
 
sep=changestr(zz, sep, '') /*remove true nulls from the sep.*/
end /*k*/
end /*j*/
 
showNull = ' {} ' /*one last change, allow the ... */
?=changestr(zz,?,showNull) /*showing of "null" characters. */
say 'new string='? /*now, show and tell time. */
/*stick a fork in it, we're done.*/

Some older REXX interpreters lack the changestr built-in function (BIF); a version of it is available as CHANGESTR.REX.

output when using the default input:

old string=a!===b=!=c
new string=a {} b {} c

Ruby

The simple method, using a regular expression to split the text.

text = 'a!===b=!=c'
separators = ['==', '!=', '=']
 
def multisplit_simple(text, separators)
sep_regex = Regexp.new(separators.collect {|sep| Regexp.escape(sep)}.join('|'))
text.split(sep_regex)
end
 
p multisplit_simple(text, separators)
# => ["a", "", "b", "", "c"]
p multisplit_simple(text, ['=', '!=', '=='])
# => ["a", "", "", "b", "", "c"]

The version that also returns the information about the separations.

def multisplit(text, separators)
sep_regex = Regexp.new(separators.collect {|sep| Regexp.escape(sep)}.join('|'))
separator_info = []
pieces = []
i = prev = 0
while i = text.index(sep_regex, i)
separator = Regexp.last_match(0)
pieces << text[prev .. i-1]
separator_info << [separator, i]
i = i + separator.length
prev = i
end
pieces << text[prev .. -1]
[pieces, separator_info]
end
 
p multisplit(text, separators)
# => [["a", "", "b", "", "c"], [["!=", 1], ["==", 3], ["=", 6], ["!=", 7]]]

Also demonstrating a method to rejoin the string given the separator information.

def multisplit_rejoin(info)
str = info[0].zip(info[1])[0..-2].inject("") {|str, (piece, (sep, idx))| str << piece << sep}
str << info[0].last
end
 
p multisplit_rejoin(multisplit(text, separators)) == text
# => true

Run BASIC

str$ = "a!===b=!=c" 
sep$ = "=== != =! b =!="
 
while word$(sep$,i+1," ") <> ""
i = i + 1
theSep$ = word$(sep$,i," ")
split$ = word$(str$,1,theSep$)
print i;" ";split$;" Sep By: ";theSep$
wend
Output:
1 a!     Sep By: ===
2 a      Sep By: !=
3 a!===b Sep By: =!
4 a!===  Sep By: b
5 a!===b Sep By: =!=

Scala

import scala.annotation.tailrec
def multiSplit(str:String, sep:Seq[String])={
def findSep(index:Int)=sep find (str startsWith (_, index))
@tailrec def nextSep(index:Int):(Int,Int)=
if(index>str.size) (index, 0) else findSep(index) match {
case Some(sep) => (index, sep.size)
case _ => nextSep(index + 1)
}
def getParts(start:Int, pos:(Int,Int)):List[String]={
val part=str slice (start, pos._1)
if(pos._2==0) List(part) else part :: getParts(pos._1+pos._2, nextSep(pos._1+pos._2))
}
getParts(0, nextSep(0))
}
 
println(multiSplit("a!===b=!=c", Seq("!=", "==", "=")))

Output:

List(a, , b, , c)

Tcl

This simple version does not retain information about what the separators were:

proc simplemultisplit {text sep} {
set map {}; foreach s $sep {lappend map $s "\uffff"}
return [split [string map $map $text] "\uffff"]
}
puts [simplemultisplit "a!===b=!=c" {"==" "!=" "="}]
Output:
a {} b {} c

However, keeping the match information calls for a more sophisticated technique. The most natural result here returns the split substrings as a list separate from the match information, because the two collections have different lengths.

proc multisplit {text sep} {
foreach s $sep {lappend sr [regsub -all {\W} $s {\\&}]}
set sepRE [join $sr "|"]
set pieces {}
set match {}
set start 0
while {[regexp -indices -start $start -- $sepRE $text found]} {
lassign $found x y
lappend pieces [string range $text $start [expr {$x-1}]]
lappend match [lsearch -exact $sep [string range $text {*}$found]] $x
set start [expr {$y + 1}]
}
return [list [lappend pieces [string range $text $start end]] $match]
}

Demonstration code:

set input "a!===b=!=c"
set matchers {"==" "!=" "="}
lassign [multisplit $input $matchers] substrings matchinfo
puts $substrings
puts $matchinfo

Output:

a {} b {} c
1 1 0 3 2 6 1 7
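As the Tcl output shows, the two lists have different lengths: there is always one more substring than there are separator matches. The Python sketch below, with hypothetical `split_and_info` and `rejoin` helpers, returns them separately using the same (separator index, start offset) pairs as the Tcl matchinfo list, and interleaving them restores the input.

```python
def split_and_info(text, separators):
    pieces, info, start, i = [], [], 0, 0
    while i < len(text):
        # Index of the first separator, in priority order, matching at i.
        idx = next((k for k, s in enumerate(separators)
                    if s and text.startswith(s, i)), None)
        if idx is None:
            i += 1
            continue
        pieces.append(text[start:i])
        info.append((idx, i))  # separator index and start offset
        i += len(separators[idx])
        start = i
    pieces.append(text[start:])
    return pieces, info


def rejoin(pieces, info, separators):
    # Interleave: piece, separator, piece, ..., final piece.
    out = [pieces[0]]
    for piece, (idx, _) in zip(pieces[1:], info):
        out += [separators[idx], piece]
    return "".join(out)


seps = ["==", "!=", "="]
pieces, info = split_and_info("a!===b=!=c", seps)
print(pieces, info)
print(rejoin(pieces, info, seps) == "a!===b=!=c")
```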

TXR

Using text-extraction pattern language

Here, the separators are embedded into the syntax rather than appearing as a datum. Nevertheless, this illustrates how to do that small tokenizing task with various separators.

The clauses of choose are applied in parallel, and all potentially match at the current position in the text. However :shortest tok means that only that clause survives (gets to propagate its bindings and position advancement) which minimizes the length of the string which is bound to the tok variable. The :gap 0 makes the horizontal collect repetitions strictly adjacent. This means that coll will quit when faced with a nonmatching suffix portion of the data rather than scan forward (no gap allowed!). This creates an opportunity for the tail variable to grab the suffix which remains, which may be an empty string.

@(next :args)
@(coll :gap 0)@(choose :shortest tok)@\
@tok@{sep /==/}@\
@(or)@\
@tok@{sep /!=/}@\
@(or)@\
@tok@{sep /=/}@\
@(end)@(end)@tail
@(output)
@(rep)"@tok" {@sep} @(end)"@tail"
@(end)

Runs:

$ ./txr multisplit.txr 'a!===b=!=c'
"a" {!=} "" {==} "b" {=} "" {!=} "c"
$ ./txr  multisplit.txr 'a!===!==!=!==b'
"a" {!=} "" {==} "" {!=} "" {=} "" {!=} "" {!=} "" {=} "b"
$ ./txr  multisplit.txr ''
""
$ ./txr  multisplit.txr 'a'
"a"
$ ./txr  multisplit.txr 'a='
"a" {=} ""
$ ./txr  multisplit.txr '='
"" {=} ""
$ ./txr  multisplit.txr '=='
"" {==} ""
$ ./txr  multisplit.txr '==='
"" {==} "" {=} ""
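Keeping the clause whose tok is shortest amounts to choosing the separator whose next occurrence starts earliest. The Python sketch below applies that rule and reproduces the runs above. The name `shortest_token_split` is hypothetical. Breaking ties between separators that start at the same position in favour of the one listed first is an assumption made here, chosen to match the task's priority rule.

```python
def shortest_token_split(text, separators):
    pieces, seps = [], []
    while True:
        best = None  # (position, separator) giving the shortest token so far
        for s in separators:
            pos = text.find(s) if s else -1
            # Strict '<' keeps the earlier-listed separator on ties (assumption).
            if pos >= 0 and (best is None or pos < best[0]):
                best = (pos, s)
        if best is None:
            pieces.append(text)  # like @tail: whatever remains, possibly ""
            return pieces, seps
        pos, s = best
        pieces.append(text[:pos])
        seps.append(s)
        text = text[pos + len(s):]


print(shortest_token_split("a!===!==!=!==b", ["==", "!=", "="]))
```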

Using the tok-str function

Translation of: Racket
$ txr -p '(tok-str "a!===b=!=c" #/==|!=|=/ t)'
("a" "!=" "" "==" "b" "=" "" "!=" "c")

Here the third boolean argument means "keep the material between the tokens", which in the Racket version seems to be requested by the argument #:gap-select? #t.
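In Python, the same "keep the gaps and the tokens" result can be obtained from `re.split` by wrapping the alternation in a capturing group, because `re.split` includes the text of captured groups in its result list. Here is a minimal sketch with a hypothetical `tokens_and_gaps` helper:

```python
import re


def tokens_and_gaps(text, separators):
    # The capturing group makes re.split return each matched separator
    # between the surrounding substrings; alternation order is priority.
    pattern = "(" + "|".join(re.escape(s) for s in separators) + ")"
    return re.split(pattern, text)


print(tokens_and_gaps("a!===b=!=c", ["==", "!=", "="]))
# ['a', '!=', '', '==', 'b', '=', '', '!=', 'c']
```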

zkl

Translation of: Python
fcn multisplit(text, sep){
lastmatch := i := 0; matches := List();
while(i < text.len()){
foreach j,s in ([0..].zip(sep)){
if(i == text.find(s,i)){
if(i > lastmatch) matches.append(text[lastmatch,i-lastmatch]);
matches.append(T(j,i)); # Replace the string containing the matched separator with a tuple of which separator and where in the string the match occurred
lastmatch = i + s.len();
i += s.len()-1;
break;
}
}
i += 1;
}
if(i > lastmatch) matches.append(text[lastmatch,i-lastmatch]);
return(matches);
}
multisplit("a!===b=!=c", T("==", "!=", "=")).println();
multisplit("a!===b=!=c", T("!=", "==", "=")).println();
Output:
L("a",L(1,1),L(0,3),"b",L(2,6),L(1,7),"c")
L("a",L(0,1),L(1,3),"b",L(2,6),L(0,7),"c")
