Find URI in text: Difference between revisions
m (→{{header|Tcl}}: add comment) |
(→{{header|Ruby}}: Added Ruby Header and sample) |
||
Line 242: | Line 242: | ||
})</lang>
=={{header|Ruby}}==
<lang ruby>
require 'uri'
str = 'this URI contains an illegal character, parentheses and a misplaced full stop:
http://en.wikipedia.org/wiki/Erich_Kästner_(camera_designer). (which is handled by http://mediawiki.org/).
and another one just to confuse the parser: http://en.wikipedia.org/wiki/-)
")" is handled the wrong way by the mediawiki parser.
ftp://domain.name/path(balanced_brackets)/foo.html
ftp://domain.name/path(balanced_brackets)/ending.in.dot.
ftp://domain.name/path(unbalanced_brackets/ending.in.dot.
leading junk ftp://domain.name/path/embedded?punct/uation.
leading junk ftp://domain.name/dangling_close_paren)
if you have other interesting URIs for testing, please add them here:'
puts URI.extract(str)
puts "\nFiltered for HTTP and HTTPS:"
puts URI.extract(str, ["http", "https"])
puts "\nThis is the (extendible) list of supported schemes: #{URI.scheme_list.keys}"</lang>
{{Output}}
<pre>
stop:
http://en.wikipedia.org/wiki/Erich_K
http://mediawiki.org/).
parser:
http://en.wikipedia.org/wiki/-)
ftp://domain.name/path(balanced_brackets)/foo.html
ftp://domain.name/path(balanced_brackets)/ending.in.dot.
ftp://domain.name/path(unbalanced_brackets/ending.in.dot.
ftp://domain.name/path/embedded?punct/uation.
ftp://domain.name/dangling_close_paren)
here:
Filtered for HTTP and HTTPS:
http://en.wikipedia.org/wiki/Erich_K
http://mediawiki.org/).
http://en.wikipedia.org/wiki/-)
This is the (extendible) list of supported schemes: ["FTP", "HTTP", "HTTPS", "LDAP", "LDAPS", "MAILTO"]
</pre>
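The output above keeps trailing sentence punctuation and stray closing parentheses (for example <tt>http://mediawiki.org/).</tt>) and also reports bare words such as <tt>stop:</tt>. One possible refinement, shown here only as a sketch, is to require <tt>://</tt> after the scheme and then trim each candidate. The helper names <tt>trim_candidate</tt> and <tt>find_uris</tt> and the candidate pattern are assumptions made for illustration; they are not part of Ruby's <tt>uri</tt> library.

```ruby
# Hypothetical helper: strip trailing sentence punctuation and any
# closing parenthesis that has no matching opening parenthesis.
def trim_candidate(candidate)
  uri = candidate.dup
  loop do
    if uri.end_with?('.', ',', ';', ':', '!', '?')
      uri.chop!                       # drop trailing punctuation
    elsif uri.end_with?(')') && uri.count(')') > uri.count('(')
      uri.chop!                       # drop an unbalanced ")"
    else
      break
    end
  end
  uri
end

# Hypothetical helper: collect candidates of the assumed form
# scheme://non-space-characters, then trim each one.
def find_uris(text)
  text.scan(%r{[a-z][a-z0-9+.-]*://\S+}i).map { |c| trim_candidate(c) }
end

sample = 'handled by http://mediawiki.org/). and ' \
         'ftp://domain.name/path(balanced_brackets)/ending.in.dot.'
puts find_uris(sample)
```

This is a heuristic with a trade-off: balanced parentheses inside a path survive, but the final <tt>)</tt> of <tt>http://en.wikipedia.org/wiki/-)</tt> is also removed, which may not be the intended reading of that test case.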
=={{header|Tcl}}==
This uses regular expressions to do the matching. It doesn't match a URL without a scheme (too problematic in general text), and it requires more than ''just'' the scheme. Apart from that, it matches a slightly too broad range of strings, though rarely to a problematic degree. It also matches some IRIs correctly, but does not tackle the <tt>&lt;bracketed&gt;</tt> form (especially not if it includes extra spaces).