Regular expressions
Revision as of 00:44, 5 May 2010
You are encouraged to solve this task according to the task description, using any language you may know.
The goal of this task is
- to match a string against a regular expression
- to substitute part of a string using a regular expression
AppleScript
<lang applescript>try
find text ".*string$" in "I am a string" with regexp
on error message
return message
end try
try
change "original" into "modified" in "I am the original string" with regexp
on error message
return message
end try</lang>
ALGOL 68
The routines grep in string and sub in string are not part of ALGOL 68's standard prelude.
<lang algol68>INT match=0, no match=1, out of memory error=2, other error=3;

STRING str := "i am a string";

# Match: #
STRING m := "string$";
INT start, end;
IF grep in string(m, str, start, end) = match THEN printf(($"Ends with """g""""l$, str[start:end])) FI;

# Replace: #
IF sub in string(" a ", " another ",str) = match THEN printf(($gl$, str)) FI;</lang>
Output:
Ends with "string"
i am another string
Standard ALGOL 68 does have a primordial form of pattern matching called a format. It is designed to extract values from input data, but it can also be used for outputting (and transputting) the original data.
For example:
<lang algol68>FORMAT pattern = $ddd" "c("cats","dogs")$;
FILE file; STRING book; associate(file, book);
on value error(file, (REF FILE f)BOOL: stop);
on format error(file, (REF FILE f)BOOL: stop);

book := "100 dogs";
STRUCT(INT count, type) dalmatians;

getf(file, (pattern, dalmatians));
print(("Dalmatians: ", dalmatians, new line));
count OF dalmatians +:=1;
printf(($"Gives: "$, pattern, dalmatians, $l$))</lang>
Output:
Dalmatians: +100 +2
Gives 101 dogs
AutoHotkey
<lang AutoHotkey>MsgBox % foundpos := RegExMatch("Hello World", "World$")
MsgBox % replaced := RegExReplace("Hello World", "World$", "yourself")</lang>
AWK
AWK supports regular expressions, which are typically delimited by slashes, and the "~" match operator:
<lang awk>$ awk '{if($0~/[A-Z]/)print "uppercase detected"}'
abc
ABC
uppercase detected</lang>
As shorthand, a regular expression in the condition part fires if it matches an input line:
<lang awk>$ awk '/[A-Z]/{print "uppercase detected"}'
def
DeF
uppercase detected</lang>
For substitution, the first argument can be a regular expression, while the replacement string is constant (except that any '&' in it is replaced by the matched text):
<lang awk>$ awk '{gsub(/[A-Z]/,"*");print}'
abCDefG
ab**ef*
$ awk '{gsub(/[A-Z]/,"(&)");print}'
abCDefGH
ab(C)(D)ef(G)(H)</lang>
This variant matches one or more uppercase letters in one round:
<lang awk>$ awk '{gsub(/[A-Z]+/,"(&)");print}'
abCDefGH
ab(CD)ef(GH)</lang>
C
As far as I can see, POSIX defines functions for regex matching but nothing for substitution, so the substitution has to be done by hand. The complex-looking code could be turned into a function.
<lang c>#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <regex.h>
#include <string.h>

int main()
{
  regex_t preg;
  regmatch_t substmatch[1];
  const char *tp = "string$";
  const char *t1 = "this is a matching string";
  const char *t2 = "this is not a matching string!";
  const char *ss = "istyfied";

  /* test whether each string ends with "string" */
  regcomp(&preg, tp, REG_EXTENDED);
  printf("'%s' %smatched with '%s'\n", t1,
         (regexec(&preg, t1, 0, NULL, 0) == 0) ? "" : "did not ", tp);
  printf("'%s' %smatched with '%s'\n", t2,
         (regexec(&preg, t2, 0, NULL, 0) == 0) ? "" : "did not ", tp);
  regfree(&preg);

  /* replace the first match of "a[a-z]+" with ss */
  regcomp(&preg, "a[a-z]+", REG_EXTENDED);
  if (regexec(&preg, t1, 1, substmatch, 0) == 0) {
    size_t so = substmatch[0].rm_so;    /* start of the match */
    size_t eo = substmatch[0].rm_eo;    /* one past the end of the match */
    size_t sslen = strlen(ss);
    size_t tail = strlen(&t1[eo]);      /* length of the text after the match */
    char *ns = malloc(so + sslen + tail + 1);

    if (ns == NULL) {
      regfree(&preg);
      return 1;
    }
    memcpy(ns, t1, so);                          /* text before the match */
    memcpy(&ns[so], ss, sslen);                  /* the replacement */
    memcpy(&ns[so + sslen], &t1[eo], tail + 1);  /* the rest, including '\0' */
    printf("mod string: '%s'\n", ns);
    free(ns);
  } else {
    printf("the string '%s' is the same: no matching!\n", t1);
  }
  regfree(&preg);
  return 0;
}</lang>
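The note above says the substitution code could be turned into a function. The following is a minimal sketch of such a function, assuming only POSIX <regex.h>. The names regex_replace_all and buf_append are hypothetical helpers and are not part of POSIX. Unlike the example above, this version replaces every non-overlapping match (like AWK's gsub) instead of only the first one. It inserts the replacement text literally, so back-references such as \1 are not expanded.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: append n bytes of src to the growable buffer *buf
   (current length *len, capacity *cap), keeping it NUL-terminated.
   Returns 0 on success, -1 if memory runs out. */
static int buf_append(char **buf, size_t *len, size_t *cap,
                      const char *src, size_t n)
{
  if (*len + n + 1 > *cap) {
    size_t newcap = 2 * (*len + n + 1);
    char *tmp = realloc(*buf, newcap);
    if (tmp == NULL)
      return -1;
    *buf = tmp;
    *cap = newcap;
  }
  memcpy(*buf + *len, src, n);
  *len += n;
  (*buf)[*len] = '\0';
  return 0;
}

/* Hypothetical helper: replace every non-overlapping match of the POSIX
   extended regex `pattern` in `subject` with the literal text `repl`.
   Returns a malloc'd string that the caller must free, or NULL if the
   pattern does not compile or memory runs out. */
char *regex_replace_all(const char *pattern, const char *subject,
                        const char *repl)
{
  regex_t re;
  regmatch_t m;
  const char *p = subject;
  size_t len = 0, cap = strlen(subject) + 1;
  int eflags = 0;
  char *out;

  if (regcomp(&re, pattern, REG_EXTENDED) != 0)
    return NULL;
  out = malloc(cap);
  if (out == NULL) {
    regfree(&re);
    return NULL;
  }
  out[0] = '\0';

  while (regexec(&re, p, 1, &m, eflags) == 0) {
    /* copy the text before the match, then the replacement */
    if (buf_append(&out, &len, &cap, p, (size_t)m.rm_so) != 0 ||
        buf_append(&out, &len, &cap, repl, strlen(repl)) != 0)
      goto fail;
    if (m.rm_eo > m.rm_so) {
      p += m.rm_eo;                   /* skip past a non-empty match */
    } else if (p[m.rm_eo] != '\0') {
      /* empty match: copy one character so the loop makes progress */
      if (buf_append(&out, &len, &cap, p + m.rm_eo, 1) != 0)
        goto fail;
      p += m.rm_eo + 1;
    } else {
      p += m.rm_eo;                   /* empty match at the very end */
      break;
    }
    eflags = REG_NOTBOL;              /* later searches do not start a line */
  }
  /* copy whatever follows the last match */
  if (buf_append(&out, &len, &cap, p, strlen(p)) != 0)
    goto fail;
  regfree(&re);
  return out;

fail:
  free(out);
  regfree(&re);
  return NULL;
}
```

For example, regex_replace_all(" a ", "I am a string", " another ") returns "I am another string". Likewise, regex_replace_all("[A-Z]", "abCDefG", "*") returns "ab**ef*", the same result as the AWK gsub example. The caller must free() the result. After the first search, the function sets REG_NOTBOL so that a pattern anchored with ^ does not match again in the middle of the subject. After an empty match, it steps forward one character so that patterns such as x* cannot loop forever.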
C++
<lang cpp>#include <iostream>
#include <string>
#include <iterator>
#include <boost/regex.hpp>

int main()
{
  boost::regex re(".* string$");
  std::string s = "Hi, I am a string";

  // match the complete string
  if (boost::regex_match(s, re))
    std::cout << "The string matches.\n";
  else
    std::cout << "Oops - not found?\n";

  // match a substring
  boost::regex re2(" a.*a");
  boost::smatch match;
  if (boost::regex_search(s, match, re2))
  {
    std::cout << "Matched " << match.length()
              << " characters starting at " << match.position() << ".\n";
    std::cout << "Matched character sequence: \""
              << match.str() << "\"\n";
  }
  else
  {
    std::cout << "Oops - not found?\n";
  }

  // replace a substring
  std::string dest_string;
  boost::regex_replace(std::back_inserter(dest_string),
                       s.begin(), s.end(),
                       re2,
                       "'m now a changed");
  std::cout << dest_string << std::endl;
}</lang>
C#
<lang csharp>using System;
using System.Text.RegularExpressions;

class Program {
    static void Main(string[] args) {
        string str = "I am a string";

        if (new Regex("string$").IsMatch(str)) {
            Console.WriteLine("Ends with string.");
        }

        str = new Regex(" a ").Replace(str, " another ");
        Console.WriteLine(str);
    }
}</lang>
Clojure
<lang clojure>(let [s "I am a string"]
  ;; match
  (when (re-find #"string$" s)
    (println "Ends with 'string'."))
  (when-not (re-find #"^You" s)
    (println "Does not start with 'You'."))

  ;; substitute (using Java)
  (println (.replaceAll s " a " " another "))
)</lang>
Common Lisp
Uses CL-PPCRE - Portable Perl-compatible regular expressions for Common Lisp.
<lang lisp>(let ((string "I am a string"))
(when (cl-ppcre:scan "string$" string) (write-line "Ends with string")) (unless (cl-ppcre:scan "^You" string ) (write-line "Does not start with 'You'")))</lang>
Substitute
<lang lisp>(let* ((string "I am a string")
(string (cl-ppcre:regex-replace " a " string " another "))) (write-line string))</lang>
Test and Substitute
<lang lisp>(let ((string "I am a string"))
(multiple-value-bind (string matchp) (cl-ppcre:regex-replace "\\bam\\b" string "was") (when matchp (write-line "I was able to find and replace 'am' with 'was'."))))</lang>
D
<lang d>import std.stdio, std.regexp;
void main() {
string s = "I am a string";
    // Test:
    if (search(s, r"string$"))
        writefln("Ends with 'string'");

    // Test, storing the regular expression:
    auto re1 = RegExp(r"string$");
    if (re1.search(s).test)
        writefln("Ends with 'string'");

    // Substitute:
    writefln(sub(s, " a ", " another "));

    // Substitute, storing the regular expression:
    auto re2 = RegExp(" a ");
    writefln(re2.replace(s, " another "));
}</lang>
Note that in std.string there are string functions to perform those string operations in a faster way.
Erlang
<lang erlang>match() ->
    String = "This is a string",
    case re:run(String, "string$") of
        {match,_} -> io:format("Ends with 'string'~n");
        _ -> ok
    end.

substitute() ->
    String = "This is a string",
    NewString = re:replace(String, " a ", " another ", [{return, list}]),
    io:format("~s~n",[NewString]).</lang>
Forth
Test/Match <lang forth>include ffl/rgx.fs
\ Create a regular expression variable 'exp' in the dictionary
rgx-create exp
\ Compile an expression
s" Hello (World)" exp rgx-compile [IF]
.( Regular expression successfully compiled.) cr
[THEN]
\ (Case sensitive) match a string with the expression
s" Hello World" exp rgx-cmatch? [IF]
.( String matches with the expression.) cr
[ELSE]
.( No match.) cr
[THEN]</lang>
Haskell
Test <lang haskell>import Text.Regex

str = "I am a string"

main = case matchRegex (mkRegex ".*string$") str of
  Just _  -> putStrLn $ "ends with 'string'"
  Nothing -> return ()</lang>
Substitute <lang haskell>import Text.Regex

orig = "I am the original string"
result = subRegex (mkRegex "original") orig "modified"

main = putStrLn $ result</lang>
HicEst
<lang hicest>CHARACTER string*100/ "The quick brown fox jumps over the lazy dog" /
REAL, PARAMETER :: Regex=128, Count=256

characters_a_m = INDEX(string, "[a-m]", Regex+Count) ! counts 16

vocals_changed = EDIT(Text=string, Option=Regex, Right="[aeiou]", RePLaceby='**', DO=LEN(string) ) ! changes 11
WRITE(ClipBoard) string ! Th** q****ck br**wn f**x j**mps **v**r th** l**zy d**g</lang>
J
J's regex support is built on top of PCRE.
<lang j>load'regex'               NB. Load regex library
str =: 'I am a string'    NB. String used in examples.</lang>
Matching:
<lang j>   '.*string$' rxeq str   NB. 1 is true, 0 is false
1</lang>
Substitution:
<lang j>   ('am';'am still') rxrplc str
I am still a string</lang>
Note: use <lang J>   open'regex'</lang> to read the source code for the library. The comments list 6 main definitions and a dozen utility definitions.
Java
Test
<lang java>String str = "I am a string";
if (str.matches(".*string")) { // note: matches() tests if the entire string is a match
System.out.println("ends with 'string'");
}</lang>
Substitute
<lang java>String orig = "I am the original string";
String result = orig.replaceAll("original", "modified");
// result is now "I am the modified string"</lang>
JavaScript
Test/Match <lang javascript>var subject = "Hello world!";

// Two different ways to create the RegExp object
// Both examples use the exact same pattern... matching "hello world" case-insensitively
var re_PatternToMatch = /Hello (World)/i; // creates a RegExp literal with case-insensitivity
var re_PatternToMatch2 = new RegExp("Hello (World)", "i");

// Test for a match - return a bool
var isMatch = re_PatternToMatch.test(subject);

// Get the match details
// Returns an array with the match's details
// matches[0] == "Hello world"
// matches[1] == "world"
var matches = re_PatternToMatch2.exec(subject);</lang>
Substitute <lang javascript>var subject = "Hello world!";
var re_PatternToMatch = /Hello (World)/i; // same pattern as in the Test/Match example

// Perform a string replacement
// newSubject == "Replaced!"
var newSubject = subject.replace(re_PatternToMatch, "Replaced");</lang>
M4
<lang M4>regexp(`GNUs not Unix', `\<[a-z]\w+')
regexp(`GNUs not Unix', `\<[a-z]\(\w+\)', `a \& b \1 c')</lang>
Output:
5
a not b ot c
MIRC Scripting Language
<lang mirc>alias regular_expressions {
  var %string = This is a string
  var %re = string$
  if ($regex(%string,%re) > 0) {
    echo -a Ends with string.
  }
  %re = \ba\b
  if ($regsub(%string,%re,another,%string) > 0) {
    echo -a Result 1: %string
  }
  %re = \b(another)\b
  echo -a Result 2: $regsubex(%string,%re,yet \1)
}</lang>
Output:
Ends with string.
Result 1: This is another string
Result 2: This is yet another string
Objective-C
Test
<lang objc>NSString *str = @"I am a string";
NSString *regex = @".*string$";
NSPredicate *pred = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", regex];
if ([pred evaluateWithObject:str]) {
NSLog(@"ends with 'string'");
}</lang>
Unfortunately, this NSPredicate approach cannot find the location of the match or perform a substitution.
OCaml
With the standard library
Test <lang ocaml>#load "str.cma";;
let str = "I am a string";;
try
  ignore(Str.search_forward (Str.regexp ".*string$") str 0);
  print_endline "ends with 'string'"
with Not_found -> ()
;;</lang>
Substitute <lang ocaml>#load "str.cma";;
let orig = "I am the original string";;
let result = Str.global_replace (Str.regexp "original") "modified" orig;;
(* result is now "I am the modified string" *)</lang>
Using Pcre
Library: ocaml-pcre
<lang ocaml>let matched pat str =
  try ignore(Pcre.exec ~pat str); (true)
  with Not_found -> (false)

let () =
  Printf.printf "matched = %b\n" (matched "string$" "I am a string");
  Printf.printf "Substitute: %s\n"
    (Pcre.replace ~pat:"original" ~templ:"modified" "I am the original string")
;;</lang>
Oz
<lang oz>declare
  [Regex] = {Module.link ['x-oz://contrib/regex']}
  String = "This is a string"
in
if {Regex.search "string$" String} \= false then
   {System.showInfo "Ends with string."}
end
{System.showInfo {Regex.replace String " a " fun {$ _ _} " another " end}}</lang>
Perl
Test <lang perl>$string = "I am a string";
if ($string =~ /string$/) {
print "Ends with 'string'\n";
}
if ($string !~ /^You/) {
print "Does not start with 'You'\n";
}</lang>
Substitute <lang perl>$string = "I am a string";
$string =~ s/ a / another /; # makes "I am a string" into "I am another string"
print $string;</lang>
Test and Substitute <lang perl>$string = "I am a string";
if ($string =~ s/\bam\b/was/) { # \b is a word boundary
print "I was able to find and replace 'am' with 'was'\n";
}</lang>
Options <lang perl># add the following just after the last / for additional control
# g = global (match or replace every occurrence, not just the first)
# i = case-insensitive
# s = single-line mode: "." also matches newlines
# m = multi-line mode: "^" and "$" also match at the start and end of each line
$string =~ s/i/u/ig; # would change "I am a string" into "u am a strung"</lang>
PHP
<lang php> $string = 'I am a string';</lang>
Test
<lang php>if (preg_match('/string$/', $string)) {
echo "Ends with 'string'\n";
}</lang>
Replace
<lang php>$string = preg_replace('/\ba\b/', 'another', $string);
echo "Found 'a' and replaced it with 'another', resulting in this string: $string\n";</lang>
PicoLisp
Calling the C library
PicoLisp has no built-in regex functionality, but it is easy to call the native C library.
<lang PicoLisp>(let (Pat "a[0-9]z"  String "a7z")
   (use Preg
      (native "@" "regcomp" 'I '(Preg (64 B . 64)) Pat 1)  # Compile regex
      (when (=0 (native "@" "regexec" 'I (cons NIL (64) Preg) String 0 0 0))
         (prinl "String \"" String "\" matches regex \"" Pat "\"") ) ) )</lang>
Output:
String "a7z" matches regex "a[0-9]z"
Using Pattern Matching
Regular expressions are static and inflexible. Another possibility is dynamic pattern matching, where arbitrary conditions can be programmed.
<lang PicoLisp>(let String "The number <7> is incremented"
   (use (@A @N @Z)
      (and
         (match '(@A "<" @N ">" @Z) (chop String))
         (format (pack @N))
         (prinl @A "<" (inc @) ">" @Z) ) ) )</lang>
Output:
The number <8> is incremented
PowerShell
<lang powershell>"I am a string" -match '\bstr' # true
"I am a string" -replace 'a\b','no' # I am no string</lang>
By default, both the -match and -replace operators are case-insensitive. They can be made case-sensitive by using the -cmatch and -creplace operators.
PureBasic
<lang PureBasic>String$ = "<tag>some text consisting of Roman letters spaces and numbers like 12</tag>"
regex$ = "<([a-z]*)>[a-z,A-Z,0-9, ]*</\1>"
regex_replace$ = "letters[a-z,A-Z,0-9, ]*numbers[a-z,A-Z,0-9, ]*"
If CreateRegularExpression(1, regex$) And CreateRegularExpression(2, regex_replace$)
  If MatchRegularExpression(1, String$)
    Debug "Tags correct, and only alphanumeric or space characters between them"
  EndIf
  Debug ReplaceRegularExpression(2, String$, "char stuff")
EndIf</lang>
Python
<lang python>import re

string = "This is a string"

if re.search('string$', string):
    print("Ends with string.")

string = re.sub(" a ", " another ", string)
print(string)</lang>
R
First, define some strings. <lang R>pattern <- "string"
text1 <- "this is a matching string"
text2 <- "this does not match"</lang>
Matching with grep. The indices of the texts containing matches are returned. <lang R>grep(pattern, c(text1, text2))  # 1</lang>
Matching with regexpr. The positions of the starts of the matches are returned, along with the lengths of the matches. <lang R>regexpr(pattern, c(text1, text2))</lang>
[1] 20 -1
attr(,"match.length")
[1]  6 -1
Replacement <lang R>gsub(pattern, "pair of socks", c(text1, text2))</lang>
[1] "this is a matching pair of socks" "this does not match"
Raven
<lang raven>'i am a string' as str</lang>
Match:
<lang raven>str m/string$/ if "Ends with 'string'\n" print</lang>
Replace:
<lang raven>str r/ a / another / print</lang>
REBOL
<lang REBOL>REBOL [
	Title: "Regular Expression Matching"
	Author: oofoe
	Date: 2009-12-06
	URL: http://rosettacode.org/wiki/Regular_expression_matching
]

string: "This is a string."

; REBOL doesn't use a conventional Perl-compatible regular expression
; syntax. Instead, it uses a variant Parsing Expression Grammar with
; the 'parse' function. It's also not limited to just strings. You can
; define complex grammars that actually parse and execute program
; files.

; Here, I provide a rule to 'parse' that specifies searching through
; the string until "string." is found, then the end of the string. If
; the subject string satisfies the rule, the expression will be true.

if parse string [thru "string." end] [
	print "Subject ends with 'string.'"]

; For replacement, I take advantage of the ability to call arbitrary
; code when a pattern is matched -- everything in the parens will be
; executed when 'to " a "' is satisfied. This marks the current string
; location, then removes the offending word and inserts the replacement.

parse string [
	to " a " ; Jump to target.
	mark: (
		remove/part mark 3 ; Remove target.
		mark: insert mark " another " ; Insert replacement.
	)
	:mark ; Pick up where I left off.
]
print [crlf "Parse replacement:" string]

; For what it's worth, the above operation is more conveniently done
; with the 'replace' function:

replace string " another " " a " ; Change string back.
print [crlf "Replacement:" string]</lang>
Output:
Subject ends with 'string.'

Parse replacement: This is another string.

Replacement: This is a string.
Ruby
Test <lang ruby>string="I am a string"
puts "Ends with 'string'" if string[/string$/]
puts "Does not start with 'You'" if !string[/^You/]</lang>
Substitute <lang ruby>puts string.gsub(/ a /,' another ')
# or
string[/ a /]=' another '
puts string</lang>
Substitute using block <lang ruby>puts(string.gsub(/\bam\b/) do |match|
       puts "I found #{match}"
       #place "was" instead of the match
       "was"
     end)</lang>
Scala
Define <lang scala>val Bottles1 = "(\\d+) bottles of beer".r // syntactic sugar
val Bottles2 = """(\d+) bottles of beer""".r // using triple-quotes to preserve backslashes
val Bottles3 = new scala.util.matching.Regex("(\\d+) bottles of beer") // standard
val Bottles4 = new scala.util.matching.Regex("""(\d+) bottles of beer""", "bottles") // with named groups</lang>
Search and replace with string methods: <lang scala>"99 bottles of beer" matches "(\\d+) bottles of beer" // the full string must match
"99 bottles of beer" replace ("99", "98") // literal (non-regex) replacement
"99 bottles of beer" replaceAll ("b", "B") // regex replacement of every match</lang>
Search with regex methods: <lang scala>"\\d+".r findFirstIn "99 bottles of beer" // returns first partial match, or None
"\\w+".r findAllIn "99 bottles of beer" // returns all partial matches as an iterator
"\\s+".r findPrefixOf "99 bottles of beer" // returns a matching prefix, or None
Bottles4 findFirstMatchIn "99 bottles of beer" // returns a "Match" object, or None
Bottles4 findPrefixMatchOf "99 bottles of beer" // same thing, for prefixes
val bottles = (Bottles4 findFirstMatchIn "99 bottles of beer").get.group("bottles") // Getting a group by name</lang>
Using pattern matching with regex: <lang scala>val Some(bottles) = Bottles4 findPrefixOf "99 bottles of beer" // throws a MatchError if the string does not start with a match
for {
  line <- """|99 bottles of beer on the wall
             |99 bottles of beer
             |Take one down, pass it around
             |98 bottles of beer on the wall""".stripMargin.lines
} line match {
  case Bottles1(bottles) => println("There are still "+bottles+" bottles.") // full string must match, so this will match only once
  case _ =>
}
for {
  matched <- "(\\w+)".r findAllIn "99 bottles of beer" matchData // matchData converts to an Iterator of Match
} println("Matched from "+matched.start+" to "+matched.end)</lang>
Replacing with regex: <lang scala>Bottles2 replaceFirstIn ("99 bottles of beer", "98 bottles of beer")
Bottles3 replaceAllIn ("99 bottles of beer", "98 bottles of beer")</lang>
Slate
This library is still in its early stages. There isn't currently a feature to replace a substring.
<lang slate>
'http://slatelanguage.org/test/page?query' =~ '^(([^:/?#]+)\\:)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?'.
" ==> {'http:'. 'http'. '//slatelanguage.org'. 'slatelanguage.org'. '/test/page'. '?query'. 'query'. Nil} " </lang>
Smalltalk
<lang smalltalk>|re s s1|
re := Regex fromString: '[a-z]+ing'.
s := 'this is a matching string'.
s1 := 'this does not match'.
(s =~ re) ifMatched: [ :b |
b match displayNl
]. (s1 =~ re) ifMatched: [ :b |
'Strangely matched!' displayNl
] ifNotMatched: [
'no match!' displayNl
].
(s replacingRegex: re with: 'modified') displayNl.</lang>
Tcl
Test using regexp:
<lang tcl>set theString "I am a string"
if {[regexp -- {string$} $theString]} {
puts "Ends with 'string'"
}
if {![regexp -- {^You} $theString]} {
puts "Does not start with 'You'"
}</lang>
Extract substring using regexp
<lang tcl>set theString "This string has >123< a number in it"
if {[regexp -- {>(\d+)<} $theString -> number]} {
puts "Contains the number $number"
}</lang>
Substitute using regsub
<lang tcl>set theString "I am a string"
puts [regsub -- { +a +} $theString { another }]</lang>
Toka
Toka's regular expression library allows for matching, but does not yet provide for replacing elements within strings.
<lang toka>#! Include the regex library
needs regex

#! The two test strings
" This is a string" is-data test.1
" Another string" is-data test.2

#! Create a new regex named 'expression' which tries
#! to match strings beginning with 'This'.
" ^This" regex: expression

#! An array to store the results of the match
#! (Element 0 = starting offset, Element 1 = ending offset of match)
2 cells is-array match

#! Try both test strings against the expression.
#! try-regex will return a flag. -1 is TRUE, 0 is FALSE
expression test.1 2 match try-regex .
expression test.2 2 match try-regex .</lang>
Vedit macro language
Vedit can perform searches and matching with either regular expressions, pattern matching codes or plain text. These examples use regular expressions.
Match text at cursor location: <lang vedit>if (Match(".* string$", REGEXP)==0) {
Statline_Message("This line ends with 'string'")
}</lang>
Search for a pattern: <lang vedit>if (Search("string$", REGEXP+NOERR)) {
Statline_Message("'string' at end of line found")
}</lang>
Replace: <lang vedit>Replace(" a ", " another ", REGEXP+NOERR)</lang>