Regular expressions
From Rosetta Code
The goal of this task is
- to match a string against a regular expression
- to substitute part of a string using a regular expression
AppleScript
Library: Satimage.osax
try
find text ".*string$" in "I am a string" with regexp
on error message
return message
end try
try
change "original" into "modified" in "I am the original string" with regexp
on error message
return message
end try
ALGOL 68
The routines grep in string and sub in string are not part of ALGOL 68's standard prelude.
Works with: ALGOL 68G version Any - tested with release mk15-0.8b.fc9.i386
INT match=0, no match=1, out of memory error=2, other error=3;
STRING str := "i am a string";
# Match: #
STRING m := "string$";
INT start, end;
IF grep in string(m, str, start, end) = match THEN printf(($"Ends with """g""""l$, str[start:end])) FI;
# Replace: #
IF sub in string(" a ", " another ",str) = match THEN printf(($gl$, str)) FI;
Output:
Ends with "string"
i am another string
Standard ALGOL 68 does have a primordial form of pattern matching called a format. It is designed to extract values from input data, but it can also be used for outputting (and transputting) the original data.
Works with: ALGOL 68 version Standard - But declaring book as flex[]flex[]string
Works with: ALGOL 68G version Any - tested with release mk15-0.8b.fc9.i386
For example:
FORMAT pattern = $ddd" "c("cats","dogs")$;
FILE file; STRING book; associate(file, book);
on value error(file, (REF FILE f)BOOL: stop);
on format error(file, (REF FILE f)BOOL: stop);
book := "100 dogs";
STRUCT(INT count, type) dalmatians;
getf(file, (pattern, dalmatians));
print(("Dalmatians: ", dalmatians, new line));
count OF dalmatians +:=1;
printf(($"Gives: "$, pattern, dalmatians, $l$))
Output:
Dalmatians: +100 +2
Gives: 101 dogs
AutoHotkey
MsgBox % foundpos := RegExMatch("Hello World", "World$")
MsgBox % replaced := RegExReplace("Hello World", "World$", "yourself")
AWK
AWK supports regular expressions, which are typically delimited by slashes, together with the "~" (match) operator:
$ awk '{if($0~/[A-Z]/)print "uppercase detected"}'
abc
ABC
uppercase detected
As shorthand, a regular expression in the condition part fires if it matches an input line:
$ awk '/[A-Z]/{print "uppercase detected"}'
def
DeF
uppercase detected
For substitution, the first argument to gsub can be a regular expression, while the replacement string is constant, except that any '&' in it is replaced by the matched text:
$ awk '{gsub(/[A-Z]/,"*");print}'
abCDefG
ab**ef*
$ awk '{gsub(/[A-Z]/,"(&)");print}'
abCDefGH
ab(C)(D)ef(G)(H)
This variant matches runs of one or more uppercase letters, so each run is replaced in a single step:
$ awk '{gsub(/[A-Z]+/,"(&)");print}'
abCDefGH
ab(CD)ef(GH)
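For comparison, here is a minimal Python 3 sketch (not part of the AWK entry) of the same substitutions. In Python's re.sub, \g<0> in the replacement string stands for the whole match, playing the role of AWK's '&'. The helper name bracket_uppercase is hypothetical.

```python
import re

def bracket_uppercase(line: str, runs: bool = False) -> str:
    # runs=False: wrap each uppercase letter separately, like /[A-Z]/
    # runs=True: wrap each run of uppercase letters, like /[A-Z]+/
    pattern = r"[A-Z]+" if runs else r"[A-Z]"
    # \g<0> inserts the whole match, like "&" in AWK's gsub
    return re.sub(pattern, r"(\g<0>)", line)

print(re.sub(r"[A-Z]", "*", "abCDefG"))          # ab**ef*
print(bracket_uppercase("abCDefGH"))             # ab(C)(D)ef(G)(H)
print(bracket_uppercase("abCDefGH", runs=True))  # ab(CD)ef(GH)
```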
C
Works with: POSIX
POSIX defines functions for regex matching (regcomp and regexec) but none for substitution, so the replacement has to be done by hand. The complex-looking substitution code could be wrapped in a function.
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <regex.h>
#include <string.h>
int main()
{
regex_t preg;
regmatch_t substmatch[1];
const char *tp = "string$";
const char *t1 = "this is a matching string";
const char *t2 = "this is not a matching string!";
const char *ss = "istyfied";
regcomp(&preg, tp, REG_EXTENDED);
printf("'%s' %smatched with '%s'\n", t1,
(regexec(&preg, t1, 0, NULL, 0)==0) ? "" : "did not ", tp);
printf("'%s' %smatched with '%s'\n", t2,
(regexec(&preg, t2, 0, NULL, 0)==0) ? "" : "did not ", tp);
regfree(&preg);
/* replace the first match of "a[a-z]+" in t1 with ss */
regcomp(&preg, "a[a-z]+", REG_EXTENDED);
if ( regexec(&preg, t1, 1, substmatch, 0) == 0 )
{
size_t so = substmatch[0].rm_so;   /* offset where the match starts */
size_t eo = substmatch[0].rm_eo;   /* offset just past the match */
size_t tail = strlen(t1) - eo;     /* length of the text after the match */
size_t sslen = strlen(ss);
char *ns = malloc(so + sslen + tail + 1);
if (ns == NULL) { regfree(&preg); return 1; }
memcpy(ns, t1, so);                      /* text before the match */
memcpy(ns + so, ss, sslen);              /* the replacement */
memcpy(ns + so + sslen, t1 + eo, tail);  /* text after the match */
ns[so + sslen + tail] = '\0';
printf("mod string: '%s'\n", ns);
free(ns);
} else {
printf("the string '%s' is the same: no matching!\n", t1);
}
regfree(&preg);
return 0;
}
C++
Works with: g++ version 4.0.2
Library: Boost
#include <iostream>
#include <string>
#include <iterator>
#include <boost/regex.hpp>
int main()
{
boost::regex re(".* string$");
std::string s = "Hi, I am a string";
// match the complete string
if (boost::regex_match(s, re))
std::cout << "The string matches.\n";
else
std::cout << "Oops - not found?\n";
// match a substring
boost::regex re2(" a.*a");
boost::smatch match;
if (boost::regex_search(s, match, re2))
{
std::cout << "Matched " << match.length()
<< " characters starting at " << match.position() << ".\n";
std::cout << "Matched character sequence: \""
<< match.str() << "\"\n";
}
else
{
std::cout << "Oops - not found?\n";
}
// replace a substring
std::string dest_string;
boost::regex_replace(std::back_inserter(dest_string),
s.begin(), s.end(),
re2,
"'m now a changed");
std::cout << dest_string << std::endl;
}
C#
using System;
using System.Text.RegularExpressions;
class Program {
static void Main(string[] args) {
string str = "I am a string";
if (new Regex("string$").IsMatch(str)) {
Console.WriteLine("Ends with string.");
}
str = new Regex(" a ").Replace(str, " another ");
Console.WriteLine(str);
}
}
Clojure
(let [s "I am a string"]
;; match
(when (re-find #"string$" s)
(println "Ends with 'string'."))
(when-not (re-find #"^You" s)
(println "Does not start with 'You'."))
;; substitute (using Java)
(println (.replaceAll s " a " " another "))
)
Common Lisp
Translation of: Perl
Uses CL-PPCRE - Portable Perl-compatible regular expressions for Common Lisp.
(let ((string "I am a string"))
(when (cl-ppcre:scan "string$" string)
(write-line "Ends with string"))
(unless (cl-ppcre:scan "^You" string )
(write-line "Does not start with 'You'")))
Substitute
(let* ((string "I am a string")
(string (cl-ppcre:regex-replace " a " string " another ")))
(write-line string))
Test and Substitute
(let ((string "I am a string"))
(multiple-value-bind (string matchp)
(cl-ppcre:regex-replace "\\bam\\b" string "was")
(when matchp
(write-line "I was able to find and replace 'am' with 'was'."))))
D
import std.stdio, std.regexp;
void main() {
string s = "I am a string";
// Test:
if (search(s, r"string$"))
writefln("Ends with 'string'");
// Test, storing the regular expression:
auto re1 = RegExp(r"string$");
if (re1.search(s).test)
writefln("Ends with 'string'");
// Substitute:
writefln(sub(s, " a ", " another "));
// Substitute, storing the regular expression:
auto re2 = RegExp(" a ");
writefln(re2.replace(s, " another "));
}
Note that std.string provides ordinary string functions that can perform these particular operations faster, since they involve only fixed text rather than a pattern.
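The same trade-off can be sketched in Python 3 (an analogy, not D code): when the pattern is just fixed text, plain string methods give the same results as the regular-expression calls without involving a regex engine at all.

```python
import re

s = "I am a string"

# Regular-expression versions
ends_re = re.search(r"string$", s) is not None
replaced_re = re.sub(" a ", " another ", s)

# Plain string-method versions: no pattern to compile or interpret
ends_plain = s.endswith("string")
replaced_plain = s.replace(" a ", " another ")

print(ends_re, ends_plain)          # True True
print(replaced_re, replaced_plain)  # I am another string I am another string
```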
Erlang
match() ->
String = "This is a string",
case re:run(String, "string$") of
{match,_} -> io:format("Ends with 'string'~n");
_ -> ok
end.
substitute() ->
String = "This is a string",
NewString = re:replace(String, " a ", " another ", [{return, list}]),
io:format("~s~n",[NewString]).
Forth
Library: Forth Foundation Library Test/Match
include ffl/rgx.fs
\ Create a regular expression variable 'exp' in the dictionary
rgx-create exp
\ Compile an expression
s" Hello (World)" exp rgx-compile [IF]
.( Regular expression successfully compiled.) cr
[THEN]
\ (Case sensitive) match a string with the expression
s" Hello World" exp rgx-cmatch? [IF]
.( String matches with the expression.) cr
[ELSE]
.( No match.) cr
[THEN]
Haskell
Test
import Text.Regex
main :: IO ()
main = do
  let str = "I am a string"
  case matchRegex (mkRegex ".*string$") str of
    Just _  -> putStrLn "ends with 'string'"
    Nothing -> return ()
Substitute
import Text.Regex
main :: IO ()
main = do
  let orig = "I am the original string"
      result = subRegex (mkRegex "original") orig "modified"
  putStrLn result
J
J's regex support is built on top of PCRE.
load'regex' NB. Load regex library
str =: 'I am a string' NB. String used in examples.
Matching:
'.*string$' rxeq str NB. 1 is true, 0 is false
1
Substitution:
('am';'am still') rxrplc str
I am still a string
Note: use open'regex' to read the source code for the library. The comments list 6 main definitions and a dozen utility definitions.
Java
Works with: Java version 1.5+
Test
String str = "I am a string";
if (str.matches(".*string")) { // note: matches() tests if the entire string is a match
System.out.println("ends with 'string'");
}
Substitute
String orig = "I am the original string";
String result = orig.replaceAll("original", "modified");
// result is now "I am the modified string"
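Python draws the same distinction as the comment on Java's matches() in the test above. In this minimal Python 3 sketch, re.fullmatch must consume the entire string, while re.search finds the pattern anywhere and needs a $ anchor to insist on the end of the string.

```python
import re

s = "I am a string"

# fullmatch() must consume the whole string, like Java's String.matches()
print(bool(re.fullmatch(r".*string", s)))  # True
print(bool(re.fullmatch(r"string", s)))    # False: "string" is only a suffix

# search() accepts a match anywhere, so the $ anchor is what
# requires "string" to appear at the end
print(bool(re.search(r"string$", s)))      # True
print(bool(re.search(r"am", s)))           # True: "am" is in the middle
```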
JavaScript
Test/Match
var subject = "Hello world!";
// Two different ways to create the RegExp object
// Both examples use the exact same pattern... matching "hello"
var re_PatternToMatch = /Hello (World)/i; // creates a RegExp literal with case-insensitivity
var re_PatternToMatch2 = new RegExp("Hello (World)", "i");
// Test for a match - return a bool
var isMatch = re_PatternToMatch.test(subject);
// Get the match details
// Returns an array with the match's details
// matches[0] == "Hello world"
// matches[1] == "world"
var matches = re_PatternToMatch2.exec(subject);
Substitute
var subject = "Hello world!";
var re_PatternToMatch = /Hello (World)/i; // same case-insensitive pattern as in the match example
// Perform a string replacement
// newSubject == "Replaced!"
var newSubject = subject.replace(re_PatternToMatch, "Replaced");
M4
regexp(`GNUs not Unix', `\<[a-z]\w+')
regexp(`GNUs not Unix', `\<[a-z]\(\w+\)', `a \& b \1 c')
Output:
5
a not b ot c
Objective-C
Test
Works with: Mac OS X version 10.4+
NSString *str = @"I am a string";
NSString *regex = @".*string$";
NSPredicate *pred = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", regex];
if ([pred evaluateWithObject:str]) {
NSLog(@"ends with 'string'");
}
Unfortunately this method cannot find the location of the match or do substitution.
OCaml
With the standard library
Test
#load "str.cma";;
let str = "I am a string";;
try
ignore(Str.search_forward (Str.regexp ".*string$") str 0);
print_endline "ends with 'string'"
with Not_found -> ()
;;
Substitute
#load "str.cma";;
let orig = "I am the original string";;
let result = Str.global_replace (Str.regexp "original") "modified" orig;;
(* result is now "I am the modified string" *)
Using Pcre
Library: ocaml-pcre
let matched pat str =
try ignore(Pcre.exec ~pat str); (true)
with Not_found -> (false)
;;
let () =
Printf.printf "matched = %b\n" (matched "string$" "I am a string");
Printf.printf "Substitute: %s\n"
(Pcre.replace ~pat:"original" ~templ:"modified" "I am the original string")
;;
Oz
declare
[Regex] = {Module.link ['x-oz://contrib/regex']}
String = "This is a string"
in
if {Regex.search "string$" String} \= false then
{System.showInfo "Ends with string."}
end
{System.showInfo {Regex.replace String " a " fun {$ _ _} " another " end}}
Perl
Works with: Perl version 5.8.8
Test
$string = "I am a string";
if ($string =~ /string$/) {
print "Ends with 'string'\n";
}
if ($string !~ /^You/) {
print "Does not start with 'You'\n";
}
Substitute
$string = "I am a string";
$string =~ s/ a / another /; # makes "I am a string" into "I am another string"
print $string;
Test and Substitute
$string = "I am a string";
if ($string =~ s/\bam\b/was/) { # \b is a word boundary
print "I was able to find and replace 'am' with 'was'\n";
}
Options
# add the following just after the last / for additional control
# g = globally (match as many as possible)
# i = case-insensitive
# s = single-line mode: . also matches newlines (useful when $string contains line breaks)
# m = multi-line mode: ^ and $ match at the start and end of each line, not only of the whole string
$string =~ s/i/u/ig; # would change "I am a string" into "u am a strung"
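For comparison, Python's re module exposes the same options as flags. This minimal Python 3 sketch relies on the usual correspondences: re.IGNORECASE for /i, re.MULTILINE for /m, re.DOTALL for /s, and re.sub's default of replacing every match for /g (count=1 limits it to the first match, like leaving /g off).

```python
import re

s = "I am a string"

# Like s/i/u/ig: case-insensitive, every match replaced
print(re.sub(r"i", "u", s, flags=re.IGNORECASE))           # u am a strung
# Like s/i/u/i: only the first match replaced
print(re.sub(r"i", "u", s, count=1, flags=re.IGNORECASE))  # u am a string

text = "first line\nsecond line"
# Like /m: ^ matches at the start of every line
print(re.findall(r"^\w+", text, flags=re.MULTILINE))       # ['first', 'second']
# Like /s: . is allowed to match the newline
print(bool(re.search(r"line.second", text)))                   # False
print(bool(re.search(r"line.second", text, flags=re.DOTALL)))  # True
```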
PHP
Works with: PHP version 5.2.0
$string = 'I am a string';
Test
if (preg_match('/string$/', $string))
{
echo "Ends with 'string'\n";
}
Replace
$string = preg_replace('/\ba\b/', 'another', $string);
echo "Found 'a' and replaced it with 'another', resulting in this string: $string\n";
PowerShell
"I am a string" -match '\bstr' # true
"I am a string" -replace 'a\b','no' # I am no string
By default both the -match and -replace operators are case-insensitive. They can be made case-sensitive by using the -cmatch and -creplace operators.
PureBasic
String$ = "<tag>some text consisting of Roman letters spaces and numbers like 12</tag>"
regex$ = "<([a-z]*)>[a-zA-Z0-9 ]*</\1>"
regex_replace$ = "letters[a-zA-Z0-9 ]*numbers[a-zA-Z0-9 ]*"
If CreateRegularExpression(1, regex$) And CreateRegularExpression(2, regex_replace$)
If MatchRegularExpression(1, String$)
Debug "Tags correct, and only alphanumeric or space characters between them"
EndIf
Debug ReplaceRegularExpression(2, String$, "char stuff")
EndIf
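The \1 in regex$ is a backreference: the closing tag must repeat whatever name the group ([a-z]*) captured in the opening tag. A minimal Python 3 sketch of the same check follows; it assumes the whole string should be a single tag pair and therefore uses re.fullmatch (re.search would accept a matching pair anywhere in the string).

```python
import re

s = "<tag>some text consisting of Roman letters spaces and numbers like 12</tag>"

# \1 must equal the tag name captured by ([a-z]*) in the opening tag
tag_pair = re.compile(r"<([a-z]*)>[a-zA-Z0-9 ]*</\1>")

print(bool(tag_pair.fullmatch(s)))              # True
print(bool(tag_pair.fullmatch("<b>text</i>")))  # False: the tag names differ
```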
Python
import re
string = "This is a string"
if re.search('string$', string):
    print("Ends with string.")
string = re.sub(" a ", " another ", string)
print(string)
R
First, define some strings.
pattern <- "string"
text1 <- "this is a matching string"
text2 <- "this does not match"
Matching with grep. The indices of the texts containing matches are returned.
grep(pattern, c(text1, text2)) # 1
Matching with regexpr. The positions of the starts of the matches are returned, along with the lengths of the matches.
regexpr(pattern, c(text1, text2))
[1] 20 -1
attr(,"match.length")
[1]  6 -1
Replacement
gsub(pattern, "pair of socks", c(text1, text2))
[1] "this is a matching pair of socks" "this does not match"
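A rough Python 3 counterpart of the three R calls, as a sketch: first_match is a hypothetical helper that mimics regexpr's start and length, returning -1 for both when there is no match. Python offsets are 0-based, so the start reported for text1 is 19 rather than R's 1-based 20.

```python
import re

pattern = "string"
texts = ["this is a matching string", "this does not match"]

# Like grep(): indices of the texts that contain a match (0-based)
print([i for i, t in enumerate(texts) if re.search(pattern, t)])  # [0]

def first_match(pat, text):
    """Hypothetical helper: (start, length) of the first match, or (-1, -1)."""
    m = re.search(pat, text)
    return (m.start(), len(m.group())) if m else (-1, -1)

# Like regexpr(): start and length of each first match
print([first_match(pattern, t) for t in texts])  # [(19, 6), (-1, -1)]

# Like gsub(): replace every match in every text
print([re.sub(pattern, "pair of socks", t) for t in texts])
```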
Raven
'i am a string' as str
Match:
str m/string$/
if "Ends with 'string'\n" print
Replace:
str r/ a / another / print
REBOL
rebol [
Title: "Regular Expression Matching"
Author: oofoe
Date: 2009-12-06
URL: http://rosettacode.org/wiki/Regular_expression_matching
]
string: "This is a string."
; REBOL doesn't use a conventional Perl-compatible regular expression
; syntax. Instead, it uses a variant Parsing Expression Grammar with
; the 'parse' function. It's also not limited to just strings. You can
; define complex grammars that actually parse and execute program
; files.
; Here, I provide a rule to 'parse' that specifies searching through
; the string until "string." is found, then the end of the string. If
; the subject string satisfies the rule, the expression will be true.
if parse string [thru "string." end] [
print "Subject ends with 'string.'"]
; For replacement, I take advantage of the ability to call arbitrary
; code when a pattern is matched -- everything in the parens will be
; executed when 'to " a "' is satisfied. This marks the current string
; location, then removes the offending word and inserts the replacement.
parse string [
to " a " ; Jump to target.
mark: (
remove/part mark 3 ; Remove target.
mark: insert mark " another " ; Insert replacement.
)
:mark ; Pick up where I left off.
]
print [crlf "Parse replacement:" string]
; For what it's worth, the above operation is more conveniently done
; with the 'replace' function:
replace string " another " " a " ; Change string back.
print [crlf "Replacement:" string]
Output:
Subject ends with 'string.'
Parse replacement: This is another string.
Replacement: This is a string.
Ruby
Test
string="I am a string"
puts "Ends with 'string'" if string[/string$/]
puts "Does not start with 'You'" if !string[/^You/]
Substitute
puts string.gsub(/ a /,' another ')
#or
string[/ a /] = ' another '
puts string
Substitute using block
puts(string.gsub(/\bam\b/) do |match|
puts "I found #{match}"
#place "was" instead of the match
"was"
end)
Scala
Define
val Bottles1 = "(\\d+) bottles of beer".r // syntactic sugar
val Bottles2 = """(\d+) bottles of beer""".r // using triple-quotes to preserve backslashes
val Bottles3 = new scala.util.matching.Regex("(\\d+) bottles of beer") // standard
val Bottles4 = new scala.util.matching.Regex("""(\d+) bottles of beer""", "bottles") // with named groups
Search and replace with string methods:
"99 bottles of beer" matches "(\\d+) bottles of beer" // the full string must match
"99 bottles of beer" replace ("99", "98") // Single replacement
"99 bottles of beer" replaceAll ("b", "B") // Multiple replacement
Search with regex methods:
"\\d+".r findFirstIn "99 bottles of beer" // returns first partial match, or None
"\\w+".r findAllIn "99 bottles of beer" // returns all partial matches as an iterator
"\\s+".r findPrefixOf "99 bottles of beer" // returns a matching prefix, or None
Bottles4 findFirstMatchIn "99 bottles of beer" // returns a "Match" object, or None
Bottles4 findPrefixMatchOf "99 bottles of beer" // same thing, for prefixes
val bottles = (Bottles4 findFirstMatchIn "99 bottles of beer").get.group("bottles") // Getting a group by name
Using pattern matching with regex:
val Bottles4(bottles) = "99 bottles of beer" // throws a MatchError if the matching fails; full string must match
for {
line <- """|99 bottles of beer on the wall
|99 bottles of beer
|Take one down, pass it around
|98 bottles of beer on the wall""".stripMargin.lines
} line match {
case Bottles1(bottles) => println("There are still "+bottles+" bottles.") // full string must match, so this will match only once
case _ =>
}
for {
matched <- "(\\w+)".r findAllIn "99 bottles of beer" matchData // matchData converts to an Iterator of Match
} println("Matched from "+matched.start+" to "+matched.end)
Replacing with regex:
Bottles2 replaceFirstIn ("99 bottles of beer", "98 bottles of beer")
Bottles3 replaceAllIn ("99 bottles of beer", "98 bottles of beer")
Slate
This library is still in its early stages. There isn't currently a feature to replace a substring.
(Regex Matcher newOn: '^(([^:/?#]+)\\:)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?')
`>> [match: 'http://slatelanguage.org/test/page?query'. subexpressionMatches]
"==> {"Dictionary traitsWindow" 0 -> 'http:'. 1 -> 'http'. 2 -> '//slatelanguage.org'.
3 -> 'slatelanguage.org'. 4 -> '/test/page'. 5 -> '?query'. 6 -> 'query'. 7 -> Nil}"
Smalltalk
|re s s1|
re := Regex fromString: '[a-z]+ing'.
s := 'this is a matching string'.
s1 := 'this does not match'.
(s =~ re)
ifMatched: [ :b |
b match displayNl
].
(s1 =~ re)
ifMatched: [ :b |
'Strangely matched!' displayNl
]
ifNotMatched: [
'no match!' displayNl
].
(s replacingRegex: re with: 'modified') displayNl.
Tcl
Test using regexp:
set theString "I am a string"
if {[regexp -- {string$} $theString]} {
puts "Ends with 'string'"
}
if {![regexp -- {^You} $theString]} {
puts "Does not start with 'You'"
}
Extract substring using regexp
set theString "This string has >123< a number in it"
if {[regexp -- {>(\d+)<} $theString -> number]} {
puts "Contains the number $number"
}
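A minimal Python 3 sketch of the same extraction: the parenthesised group captures the digits between > and <, and match.group(1) returns them, much as Tcl's regexp stores the first submatch in the variable number.

```python
import re

s = "This string has >123< a number in it"

# Group 1 captures the run of digits between ">" and "<"
m = re.search(r">(\d+)<", s)
if m:
    print("Contains the number", m.group(1))  # Contains the number 123
```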
Substitute using regsub
set theString "I am a string"
puts [regsub -- { +a +} $theString { another }]
Toka
Toka's regular expression library allows for matching, but does not yet provide for replacing elements within strings.
#! Include the regex library
needs regex
#! The two test strings
" This is a string" is-data test.1
" Another string" is-data test.2
#! Create a new regex named 'expression' which tries
#! to match strings beginning with 'This'.
" ^This" regex: expression
#! An array to store the results of the match
#! (Element 0 = starting offset, Element 1 = ending offset of match)
2 cells is-array match
#! Try both test strings against the expression.
#! try-regex will return a flag. -1 is TRUE, 0 is FALSE
expression test.1 2 match try-regex .
expression test.2 2 match try-regex .
Vedit macro language
Vedit can perform searches and matching with either regular expressions, pattern matching codes or plain text. These examples use regular expressions.
Match text at cursor location:
if (Match(".* string$", REGEXP)==0) {
Statline_Message("This line ends with 'string'")
}
Search for a pattern:
if (Search("string$", REGEXP+NOERR)) {
Statline_Message("'string' at end of line found")
}
Replace:
Replace(" a ", " another ", REGEXP+NOERR)







