Multisplit: Difference between revisions
Revision as of 16:06, 27 February 2011
Code to split a string on several separators.
Input: string, list of separators
Output: [Sub0, [Sep0Num, Sep0Pos], Sub1, [Sep1Num, Sep1Pos], ..., SubN]
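Note: Sub is a substring, SepNum is the separator's index in the input list, and SepPos is the separator's position in the input string.
The input order of the separators matters: they are considered in that order, so when several separators match at the same position, the one listed first is used.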
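To make the output layout concrete, here is a minimal sketch that rebuilds the input string from a result list. The helper name <code>rejoin</code> is hypothetical and not part of the task; it assumes substrings are Python strings and separator entries are two-element lists, as in the Python solution on this page.

```python
def rejoin(parts, seps):
    """Rebuild the original string from a multisplit-style result.

    parts alternates substrings and [SepNum, SepPos] pairs;
    seps is the separator list that was given to the splitter.
    (Hypothetical helper, for illustration only.)
    """
    out = []
    for item in parts:
        if isinstance(item, str):
            out.append(item)            # a substring: copy it through
        else:
            sep_num, sep_pos = item
            # SepPos equals the number of characters emitted so far
            assert sep_pos == sum(len(piece) for piece in out)
            out.append(seps[sep_num])   # a separator: look it up by index
    return "".join(out)


result = ['a', [1, 1], '', [0, 3], 'b', [2, 6], '', [1, 7], 'c']
print(rejoin(result, ["==", "!=", "="]))
```

Because each SepPos is checked against the running length, the helper also serves as a quick sanity check on a result list.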
=={{header|J}}==
<lang j>multisplit=:4 :0
  'sep begin'=.|:t=. y /:~&.:(|."1)@;@(i.@#@[ ,.L:0"0 I.@E.L:0) x
  end=. begin + sep { #@>y
  last=.next=.0
  r=.2 0$0
  while.next<#begin do.
    r=.r,.(last}.x{.~next{begin);next{t
    last=.next{end
    next=.1 i.~(begin>next{begin)*.begin>:last
  end.
  r=.r,.'';~last}.x
)</lang>
Explanation:
First find all potentially relevant separator instances, and sort them in increasing order, by starting location and separator index. <code>sep</code> is separator index, and <code>begin</code> is starting location. <code>end</code> is ending location.
Then, loop through the possibilities, skipping over those which conflict with the currently selected sequence.
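For readers who do not read J, the two steps above can be sketched in Python. This is an illustrative transcription of the described approach, not the J code itself: the name <code>multisplit_sorted</code> is made up here, and the sketch assumes every separator is a non-empty string.

```python
def multisplit_sorted(s, seps):
    """Sketch of the sort-then-scan approach (assumes non-empty separators)."""
    # Step 1: collect every (start, separator index) match, overlaps included.
    hits = []
    for k, sep in enumerate(seps):
        pos = s.find(sep)
        while pos != -1:
            hits.append((pos, k))
            pos = s.find(sep, pos + 1)
    hits.sort()  # increasing start, then increasing separator index

    # Step 2: walk the candidates, skipping any that start before the
    # end of the separator selected most recently.
    result = []
    last = 0  # index just past the previously selected separator
    for begin, k in hits:
        if begin < last:
            continue
        result.append(s[last:begin])
        result.append([k, begin])
        last = begin + len(seps[k])
    result.append(s[last:])
    return result


print(multisplit_sorted("a!===b=!=c", ["==", "!=", "="]))
```

Sorting the pairs as tuples gives the tie-break for free: among separators that start at the same position, the one with the lowest index comes first and the rest are skipped as conflicts.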
Example use:
<lang j>   S=: 'a!===b=!=c'
   S multisplit '==';'!=';'='
┌───┬───┬───┬───┬─┐
│a  │   │b  │   │c│
├───┼───┼───┼───┼─┤
│1 1│0 3│2 6│1 7│ │
└───┴───┴───┴───┴─┘
   S multisplit '=';'!=';'=='
┌───┬───┬───┬───┬───┬─┐
│a  │   │   │b  │   │c│
├───┼───┼───┼───┼───┼─┤
│1 1│0 3│0 4│0 6│1 7│ │
└───┴───┴───┴───┴───┴─┘
   'X123Y' multisplit '1';'12';'123';'23';'3'
┌───┬───┬─┐
│X  │   │Y│
├───┼───┼─┤
│0 1│3 2│ │
└───┴───┴─┘</lang>
=={{header|Python}}==
<lang python>def min_pos(List):
    return List.index(min(List))

def find_all(S, Sub, Start = 0, End = -1, IsOverlapped = 0):
    Res = []
    if End == -1:
        End = len(S)
    if IsOverlapped:
        DeltaPos = 1
    else:
        DeltaPos = len(Sub)
    Pos = Start
    while 1:
        Pos = S.find(Sub, Pos, End)
        if Pos == -1:
            break
        Res.append(Pos)
        Pos += DeltaPos
    return Res

def multisplit(S, SepList):
    SepPosListList = []
    SLen = len(S)
    SepNumList = []
    ListCount = 0
    for i in range(len(SepList)):
        Sep = SepList[i]
        SepPosList = find_all(S, Sep, 0, SLen, IsOverlapped = 1)
        if SepPosList != []:
            SepNumList.append(i)
            SepPosListList.append(SepPosList)
            ListCount += 1
    if ListCount == 0:
        return [S]
    MinPosList = []
    for i in range(ListCount):
        MinPosList.append(SepPosListList[i][0])
    SepEnd = 0
    MinPosPos = min_pos(MinPosList)
    Res = []
    while 1:
        Res.append( S[SepEnd : MinPosList[MinPosPos]] )
        Res.append([SepNumList[MinPosPos], MinPosList[MinPosPos]])
        SepEnd = MinPosList[MinPosPos] + len(SepList[SepNumList[MinPosPos]])
        while 1:
            MinPosPos = min_pos(MinPosList)
            if MinPosList[MinPosPos] < SepEnd:
                del(SepPosListList[MinPosPos][0])
                if len(SepPosListList[MinPosPos]) == 0:
                    del(SepPosListList[MinPosPos])
                    del(MinPosList[MinPosPos])
                    del(SepNumList[MinPosPos])
                    ListCount -= 1
                    if ListCount == 0:
                        break
                else:
                    MinPosList[MinPosPos] = SepPosListList[MinPosPos][0]
            else:
                break
        if ListCount == 0:
            break
    Res.append(S[SepEnd:])
    return Res

S = "a!===b=!=c"
multisplit(S, ["==", "!=", "="]) # output: ['a', [1, 1], '', [0, 3], 'b', [2, 6], '', [1, 7], 'c']
multisplit(S, ["=", "!=", "=="]) # output: ['a', [1, 1], '', [0, 3], '', [0, 4], 'b', [0, 6], '', [1, 7], 'c']
</lang>