Category talk:Wren-pattern: Difference between revisions

Added limited support for 'quantified group' matches.
(Added limited support for 'lazy' matches.)
(Added limited support for 'quantified group' matches.)
Line 72:
Although the ''standard'' character classes should suffice for most purposes, the user can redefine up to three of them (i, j and k) for each pattern to deal with special cases, such as a limited range of letters or digits.
 
An upper case character class represents the complement of the lower case version. For example /A matches any character other than a-z or A-Z, including non-ASCII characters. Note that /Z normally just matches Z itself as it's not possible, of course, to match no characters. However, it does have a special meaning for 'lazy' and 'quantified group' matches which will be covered later.
 
/ followed by any character other than a letter just matches the character itself. This allows the 12 meta-characters to be treated as ''literal'' characters without their special meaning. So /| is a literal vertical bar.
Line 101:
The upper case version of an extended class represents the complement of the lower case version. For example &N matches any character other than a sign.
 
& followed by any other letter or character behaves exactly the same as if it were preceded by / except that &Z has a special meaning for 'lazy' and 'quantified group' matches which will be covered later.
 
;Complements
Line 162:
 
These are patterns which are used in the appropriate methods as replacements for a matching pattern. They are treated as normal text except that they can contain back-references ($0 always refers to the whole match) and a literal $ must be escaped with $$ (not the usual /$).
 
;Quantification of groups of characters
 
Quantifiers always qualify ''singles''. Any departure from this is an error and groups of characters cannot therefore be directly quantified.
 
There may sometimes be ways to quantify them indirectly either by simply repeating the pattern or by using captures and back-references. For example "[abab|ab|]" would match 'ab' repeated 2, 1 or 0 times and "[abcd]$1$1" would match 'abcd' repeated exactly three times.
 
However, this sort of approach clearly has its limitations and there is no way to match a group of characters repeated an indefinite number of times.
 
;Examples
Line 199 ⟶ 191:
 
These methods do not actually change the 'greedy' nature of the engine but use a hack (replacing text with rarely used control characters and back again after matching) to simulate lazy matching to a limited extent.
 
;Quantification of groups of characters
 
Quantifiers always qualify ''singles''. Any departure from this is an error and groups of characters cannot therefore be directly quantified.
 
There may sometimes be ways to quantify them indirectly either by simply repeating the pattern or by using captures and back-references. For example "[abab|ab|]" would match 'ab' repeated 2, 1 or 0 times and "[abcd]$1$1" would match 'abcd' repeated exactly three times. However, this sort of approach clearly has its limitations.
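
As a rough illustration of the "repeat the pattern" idea, the following sketch builds such an alternation for any group and maximum repeat count. The helper name repeatedAlternatives is hypothetical and is not part of the module. It also assumes the group contains no meta-characters that would need escaping.

```wren
// Hypothetical helper (not part of the module): builds an alternation that
// matches 'group' repeated max, max-1, ..., 0 times, longest alternative first.
// Assumes 'group' contains no meta-characters that would need escaping.
var repeatedAlternatives = Fn.new { |group, max|
    var alts = []
    var n = max
    while (n >= 0) {
        var alt = ""
        for (k in 0...n) alt = alt + group
        alts.add(alt)
        n = n - 1
    }
    return "[" + alts.join("|") + "]"
}

System.print(repeatedAlternatives.call("ab", 2))  // [abab|ab|]
```

The pattern grows quadratically with the maximum count, which is one reason this approach does not scale to large or unbounded repetition.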
 
A different and usually better approach is to use the 'findWithGroup' or 'findWithGroup2' methods, which work analogously to 'findLazy' and 'findLazy2'. This time, however, '/Z' and '&Z' match the parameter strings 't' and 'u' themselves, respectively, rather than any characters other than those strings, so each can be quantified as though it were a 'single'.
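
The substitution idea behind these methods can be sketched in a standalone script that does not use the pattern engine at all. Each occurrence of the group is collapsed into one placeholder character, an index map records where each collapsed character sat in the original text, and the result is expanded again afterwards. The simple run-counting loop below merely stands in for a quantified '/Z'. The sample strings are made up and ASCII-only input is assumed.

```wren
var SO = "\x0e"               // placeholder; must not already occur in the text
var s  = "xxababab!"          // made-up sample text
var t  = "ab"                 // the group to be treated as a 'single'

// Collapse every occurrence of 't' into a single placeholder character.
var s2 = s.replace(t, SO)

// indexMap[i] = index in 's' of the i-th character of 's2'.
var indexMap = List.filled(s2.count, 0)
var i = 0
var j = 0
var d = t.count - 1
for (c in s2) {
    indexMap[i] = j
    if (c == SO) j = j + d
    i = i + 1
    j = j + 1
}
System.print(indexMap)        // [0, 1, 2, 4, 6, 8]

// Stand-in for a quantified match: the first run of placeholders.
var start = s2.indexOf(SO)
var n = 0
while (start + n < s2.count && s2[start + n] == SO) n = n + 1

// Expand the placeholders again and translate the index back.
var restored  = s2[start...(start + n)].replace(SO, t)
var origIndex = indexMap[start]
System.print("%(restored) at index %(origIndex)")  // ababab at index 2
```

Because the group becomes a single character, any quantifier that applies to a 'single' can apply to it. The cost is that the placeholder must never appear in the text being searched.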
 
===Source code===
Line 960:
        var text2 = m.text.replace(rep1, t).replace(rep2, u)
        return Match.new_(text2, m.index, captures2)
    }

    // As the 'find' method but can simulate quantified group matching by treating '/Z' within the pattern
    // as matching the string of literal characters 't'.
    // Should not be used if 's' might contain the SO (shift out) character '0x0e'.
    findWithGroup(s, t) {
        var SO = "\x0e"
        // Collapse each occurrence of 't' into the single placeholder character.
        s = s.replace(t, SO)
        // indexMap[i] is the index in the original 's' of the i'th character of the collapsed string.
        var indexMap = List.filled(s.count, 0)
        var i = 0
        var j = 0
        var d = t.count - 1
        for (c in s) {
            indexMap[i] = j
            if (c == SO) j = j + d
            i = i + 1
            j = j + 1
        }
        // Make '/Z', and any escaped literal 't', in the pattern match the placeholder.
        var pattern2 = _pattern.replace("/Z", SO).replace(Pattern.escape(t), SO)
        var p2 = Pattern.new(pattern2, _type, _i, _j, _k)
        var m = p2.find(s)
        if (!m) return null
        // Restore 't' in the captures and the match, translating indices back to the original string.
        var captures2 = []
        for (c in m.captures) {
            captures2.add(Capture.new_(c.text.replace(SO, t), indexMap[c.index]))
        }
        var text2 = m.text.replace(SO, t)
        return Match.new_(text2, indexMap[m.index], captures2)
    }

    // As the 'find' method but can simulate quantified group matching by treating '/Z' within the pattern
    // as matching the string of literal characters 't' and '&Z' within the pattern as matching
    // the string of literal characters 'u'.
    // Should not be used if 's' might contain the SO (shift out) character '0x0e' or the
    // SI (shift in) character '0x0f'.
    findWithGroup2(s, t, u) {
        var SO = "\x0e"
        var SI = "\x0f"
        // Collapse each occurrence of 't' and 'u' into its own placeholder character.
        s = s.replace(t, SO).replace(u, SI)
        // indexMap[i] is the index in the original 's' of the i'th character of the collapsed string.
        var indexMap = List.filled(s.count, 0)
        var i = 0
        var j = 0
        var d1 = t.count - 1
        var d2 = u.count - 1
        for (c in s) {
            indexMap[i] = j
            if (c == SO) {
                j = j + d1
            } else if (c == SI) {
                j = j + d2
            }
            i = i + 1
            j = j + 1
        }
        // Make '/Z' and '&Z', and any escaped literal 't' or 'u', in the pattern match the placeholders.
        var pattern2 = _pattern.replace("/Z", SO).replace(Pattern.escape(t), SO)
                               .replace("&Z", SI).replace(Pattern.escape(u), SI)
        var p2 = Pattern.new(pattern2, _type, _i, _j, _k)
        var m = p2.find(s)
        if (!m) return null
        // Restore 't' and 'u' in the captures and the match, translating indices back to the original string.
        var captures2 = []
        for (c in m.captures) {
            captures2.add(Capture.new_(c.text.replace(SO, t).replace(SI, u), indexMap[c.index]))
        }
        var text2 = m.text.replace(SO, t).replace(SI, u)
        return Match.new_(text2, indexMap[m.index], captures2)
    }
 