Find URI in text: Difference between revisions
m (→{{header|Icon}} and {{header|Unicon}}: added other example text) |
(→Tcl: Added implementation) |
})</lang>

=={{header|Tcl}}==
This uses regular expressions to do the matching. It does not match a URL without a scheme (too problematic in general text), and it requires more than ''just'' the scheme. Apart from that, it matches a slightly too broad range of strings, though rarely problematically so. It also matches some IRIs correctly, but does not tackle the <tt>&lt;bracketed&gt;</tt> form (especially not if it includes extra spaces).
<lang tcl>proc findURIs {text args} {
    set URI {(?x)
        [a-z][-a-z0-9+.]*:                  # Scheme...
        (?=[/\w])                           # ... but not just the scheme
        (?://[-\w.@:]+)?                    # Host
        [-\w.~/%!$&'()*+,;=]*               # Path
        (?:\?[-\w.~%!$&'()*+,;=/?]*)?       # Query
        (?:[#][-\w.~%!$&'()*+,;=/?]*)?      # Fragment
    }
    regexp -inline -all {*}$args -- $URI $text
}</lang>
;Demonstrating<nowiki>:</nowiki>
Note that the last line of output shows that we can obtain the match positions within the text, not just the extracted URI substrings.
<lang tcl>set sample {
this URI contains an illegal character, parentheses and a misplaced full stop:
http://en.wikipedia.org/wiki/Erich_Kästner_(camera_designer). (which is handled by http://mediawiki.org/).
and another one just to confuse the parser: http://en.wikipedia.org/wiki/-)
")" is handled the wrong way by the mediawiki parser.
ftp://domain.name/path(balanced_brackets)/foo.html
ftp://domain.name/path(balanced_brackets)/ending.in.dot.
ftp://domain.name/path(unbalanced_brackets/ending.in.dot.
leading junk ftp://domain.name/path/embedded?punct/uation.
leading junk ftp://domain.name/dangling_close_paren)
}
puts [join [findURIs $sample] \n]
puts [findURIs $sample -indices]</lang>
{{out}}
<pre>
http://en.wikipedia.org/wiki/Erich_Kästner_(camera_designer).
http://mediawiki.org/).
http://en.wikipedia.org/wiki/-)
ftp://domain.name/path(balanced_brackets)/foo.html
ftp://domain.name/path(balanced_brackets)/ending.in.dot.
ftp://domain.name/path(unbalanced_brackets/ending.in.dot.
ftp://domain.name/path/embedded?punct/uation.
ftp://domain.name/dangling_close_paren)
{80 140} {163 185} {231 261} {317 366} {368 423} {425 481} {496 540} {555 593}
</pre>
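The index pairs reported by <code>-indices</code> are inclusive character offsets, so each one can be passed straight to <code>string range</code> to recover the matched text. The following is a minimal sketch of that pairing, not part of the solution above: the helper name <code>withPositions</code>, the short sample string and the deliberately simplified pattern are all assumptions made for illustration.

<lang tcl># Hypothetical helper: turn a list of {first last} index pairs into
# {first last substring} triples for the given text.
proc withPositions {text ranges} {
    set result {}
    foreach range $ranges {
        lassign $range first last
        # Both ends are inclusive, which is what [string range] expects
        lappend result [list $first $last [string range $text $first $last]]
    }
    return $result
}

# Illustrative input and a simplified stand-in pattern (assumptions)
set text "see http://example.org/a and ftp://example.org/b"
set ranges [regexp -inline -all -indices -- {[a-z]+://\S+} $text]
foreach item [withPositions $text $ranges] {
    lassign $item first last uri
    puts "$first-$last $uri"
}</lang>

With the definitions from this section loaded, the same helper should accept the real matcher's output, e.g. <code>withPositions $sample [findURIs $sample -indices]</code>. This works because the pattern contains no capturing groups, so each match contributes exactly one index pair.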
=={{header|TXR}}== |
<lang txr>@(define path (path))@\
@(local x y)@\