Multisplit: Difference between revisions

From Rosetta Code
Revision as of 16:06, 27 February 2011

Multisplit is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

Write code that splits a string on any of several separators.

Input: string, list of separators
Output: [Sub0, [Sep0Num, Sep0Pos], Sub1, [Sep1Num, Sep1Pos], ..., SubN]
Note: Sub is a substring, SepNum is the index of the separator in the input list, and SepPos is the position of the separator in the input string.
The order of the separators in the input list matters: they are considered in that order.
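
To make the output format above concrete, here is a minimal Python sketch that rebuilds the input string from a result list. It is not part of the task: the name rejoin is hypothetical, and the sketch assumes the result alternates substrings and [SepNum, SepPos] pairs exactly as described.

```python
def rejoin(parts, separators):
    """Rebuild the input string from a multisplit-style result list.

    Assumes parts == [Sub0, [Sep0Num, Sep0Pos], Sub1, ..., SubN].
    """
    pieces = []
    for item in parts:
        if isinstance(item, str):
            pieces.append(item)  # a substring: copy it through
        else:
            sep_num, sep_pos = item
            # SepPos should equal the length of everything emitted so far.
            assert sep_pos == sum(len(p) for p in pieces)
            pieces.append(separators[sep_num])
    return "".join(pieces)


result = ['a', [1, 1], '', [0, 3], 'b', [2, 6], '', [1, 7], 'c']
print(rejoin(result, ["==", "!=", "="]))  # prints a!===b=!=c
```

The assertion inside the loop doubles as a consistency check: each SepPos must match the position at which its separator is reinserted.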

J

<lang j>multisplit=:4 :0
  NB. x: string to split, y: boxed list of separators
  'sep begin'=.|:t=. y /:~&.:(|."1)@;@(i.@#@[ ,.L:0"0 I.@E.L:0) x
  end=. begin + sep { #@>y
  last=.next=.0
  r=.2 0$0
  while.next<#begin do.
    r=.r,.(last}.x{.~next{begin);next{t
    last=.next{end
    next=.1 i.~(begin>next{begin)*.begin>:last
  end.
  r=.r,.'';~last}.x
)</lang>

Explanation:

First, find all potentially relevant separator instances and sort them in increasing order of starting location, breaking ties by separator index. Here sep is the separator index, begin is the starting location, and end is the ending location (begin plus the separator's length).

Then loop through the candidates, skipping any that conflict with the separators already selected.
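
The two steps above can also be sketched in Python. This is only an illustration of the approach described in the explanation, not a translation of the J code: the name multisplit_sorted is hypothetical, and the sketch assumes every separator is a non-empty string.

```python
def multisplit_sorted(s, separators):
    """Split s on separators using a 'collect, sort, then scan' strategy."""
    # Step 1: record every (start, separator index) match, overlaps included.
    matches = []
    for sep_num, sep in enumerate(separators):
        start = s.find(sep)
        while start != -1:
            matches.append((start, sep_num))
            start = s.find(sep, start + 1)
    # Tuples sort by starting location first, then by separator index.
    matches.sort()

    # Step 2: walk the sorted matches, skipping any that begin before the
    # end of the separator selected most recently.
    result = []
    last_end = 0
    for start, sep_num in matches:
        if start < last_end:
            continue  # conflicts with the previously selected separator
        result.append(s[last_end:start])
        result.append([sep_num, start])
        last_end = start + len(separators[sep_num])
    result.append(s[last_end:])
    return result


print(multisplit_sorted('X123Y', ['1', '12', '123', '23', '3']))
# prints ['X', [0, 1], '', [3, 2], 'Y']
```

Sorting the (start, index) tuples gives the tie-breaking rule for free: when two separators start at the same location, the one listed earlier is seen first and the other is skipped as a conflict.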

Example use:

<lang j>   S multisplit '==';'!=';'='
┌───┬───┬───┬───┬─┐
│a  │   │b  │   │c│
├───┼───┼───┼───┼─┤
│1 1│0 3│2 6│1 7│ │
└───┴───┴───┴───┴─┘
   S multisplit '=';'!=';'=='
┌───┬───┬───┬───┬───┬─┐
│a  │   │   │b  │   │c│
├───┼───┼───┼───┼───┼─┤
│1 1│0 3│0 4│0 6│1 7│ │
└───┴───┴───┴───┴───┴─┘
   'X123Y' multisplit '1';'12';'123';'23';'3'
┌───┬───┬─┐
│X  │   │Y│
├───┼───┼─┤
│0 1│3 2│ │
└───┴───┴─┘</lang>

Python

<lang python>def min_pos(List):
    # Index of the first occurrence of the smallest element.
    return List.index(min(List))

def find_all(S, Sub, Start = 0, End = -1, IsOverlapped = 0):
    # Positions of every occurrence of Sub in S[Start:End].
    Res = []
    if End == -1:
        End = len(S)
    if IsOverlapped:
        DeltaPos = 1
    else:
        DeltaPos = len(Sub)
    Pos = Start
    while 1:
        Pos = S.find(Sub, Pos, End)
        if Pos == -1:
            break
        Res.append(Pos)
        Pos += DeltaPos
    return Res

def multisplit(S, SepList):
    # One list of match positions per separator that occurs in S.
    SepPosListList = []
    SLen = len(S)
    SepNumList = []
    ListCount = 0
    for i in range(len(SepList)):
        Sep = SepList[i]
        SepPosList = find_all(S, Sep, 0, SLen, IsOverlapped = 1)
        if SepPosList != []:
            SepNumList.append(i)
            SepPosListList.append(SepPosList)
            ListCount += 1
    if ListCount == 0:
        return [S]
    # Next unused match position of each remaining separator.
    MinPosList = []
    for i in range(ListCount):
        MinPosList.append(SepPosListList[i][0])
    SepEnd = 0
    MinPosPos = min_pos(MinPosList)
    Res = []
    while 1:
        Res.append( S[SepEnd : MinPosList[MinPosPos]] )
        Res.append([SepNumList[MinPosPos], MinPosList[MinPosPos]])
        SepEnd = MinPosList[MinPosPos] + len(SepList[SepNumList[MinPosPos]])
        # Discard matches that start before the end of the chosen separator.
        while 1:
            MinPosPos = min_pos(MinPosList)
            if MinPosList[MinPosPos] < SepEnd:
                del(SepPosListList[MinPosPos][0])
                if len(SepPosListList[MinPosPos]) == 0:
                    del(SepPosListList[MinPosPos])
                    del(MinPosList[MinPosPos])
                    del(SepNumList[MinPosPos])
                    ListCount -= 1
                    if ListCount == 0:
                        break
                else:
                    MinPosList[MinPosPos] = SepPosListList[MinPosPos][0]
            else:
                break
        if ListCount == 0:
            break
    Res.append(S[SepEnd:])
    return Res


S = "a!===b=!=c"
multisplit(S, ["==", "!=", "="]) # output: ['a', [1, 1], '', [0, 3], 'b', [2, 6], '', [1, 7], 'c']
multisplit(S, ["=", "!=", "=="]) # output: ['a', [1, 1], '', [0, 3], '', [0, 4], 'b', [0, 6], '', [1, 7], 'c']
</lang>