Find URI in text: Difference between revisions

})</lang>

=={{header|Ruby}}==

<lang ruby>require 'uri'

# Sample text containing well-formed, malformed and misleading URI candidates.
str = 'this URI contains an illegal character, parentheses and a misplaced full stop:
http://en.wikipedia.org/wiki/Erich_Kästner_(camera_designer). (which is handled by http://mediawiki.org/).
and another one just to confuse the parser: http://en.wikipedia.org/wiki/-)
")" is handled the wrong way by the mediawiki parser.
ftp://domain.name/path(balanced_brackets)/foo.html
ftp://domain.name/path(balanced_brackets)/ending.in.dot.
ftp://domain.name/path(unbalanced_brackets/ending.in.dot.
leading junk ftp://domain.name/path/embedded?punct/uation.
leading junk ftp://domain.name/dangling_close_paren)
if you have other interesting URIs for testing, please add them here:'

# Print every substring that looks like a URI, whatever its scheme.
puts URI.extract(str)

# The optional second argument restricts the matches to the listed schemes.
puts "\nFiltered for HTTP and HTTPS:"
puts URI.extract(str, ["http", "https"])

puts "\nThis is the (extendible) list of supported schemes: #{URI.scheme_list.keys}"</lang>

<pre>
stop:
http://en.wikipedia.org/wiki/Erich_K
http://mediawiki.org/).
parser:
http://en.wikipedia.org/wiki/-)
ftp://domain.name/path(balanced_brackets)/foo.html
ftp://domain.name/path(balanced_brackets)/ending.in.dot.
ftp://domain.name/path(unbalanced_brackets/ending.in.dot.
ftp://domain.name/path/embedded?punct/uation.
ftp://domain.name/dangling_close_paren)
here:

Filtered for HTTP and HTTPS:
http://en.wikipedia.org/wiki/Erich_K
http://mediawiki.org/).
http://en.wikipedia.org/wiki/-)

This is the (extendible) list of supported schemes: ["FTP", "HTTP", "HTTPS", "LDAP", "LDAPS", "MAILTO"]
</pre>
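
The output above still contains scheme-only matches such as <tt>stop:</tt> and trailing sentence punctuation such as <tt>).</tt>. The following is a minimal sketch of one possible post-processing step, not part of the solution above. The helper names <tt>scheme_only?</tt> and <tt>trim_uri_candidate</tt> are hypothetical and not part of Ruby's URI library. The trimming is a heuristic: it assumes that a trailing full stop or an unbalanced closing parenthesis belongs to the surrounding sentence, so it would wrongly shorten a URI that really ends that way, such as <tt>http://en.wikipedia.org/wiki/-)</tt>.

```ruby
# Hypothetical helper: true when the candidate is nothing but a scheme and a
# colon (for example "stop:"), which is unlikely to be a real URI.
def scheme_only?(candidate)
  candidate.match?(/\A[A-Za-z][A-Za-z0-9+.\-]*:\z/)
end

# Hypothetical helper: strips trailing characters that are more likely to be
# sentence punctuation than part of the URI. This is a heuristic only.
def trim_uri_candidate(candidate)
  uri = candidate.dup
  loop do
    if uri.end_with?('.', ',', ';', ':', '!', '?')
      uri.chop!   # trailing sentence punctuation
    elsif uri.end_with?(')') && uri.count(')') > uri.count('(')
      uri.chop!   # closing parenthesis without an opening partner
    else
      break
    end
  end
  uri
end

# A few candidates copied from the output above.
candidates = [
  'stop:',
  'http://mediawiki.org/).',
  'ftp://domain.name/path(balanced_brackets)/ending.in.dot.',
  'ftp://domain.name/dangling_close_paren)'
]

cleaned = candidates.reject { |c| scheme_only?(c) }
                    .map { |c| trim_uri_candidate(c) }
puts cleaned
```

Balanced parentheses inside a path are left alone because only a closing parenthesis in excess of the opening ones is removed.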

=={{header|Tcl}}==

This uses regular expressions to do the matching. It does not match a URL without a scheme (too error-prone in general text), and it requires more than ''just'' the scheme. Apart from that, it matches a slightly too broad range of strings, though rarely to a problematic degree. It also matches some IRIs correctly, but it does not handle the <tt><bracketed></tt> form (especially not if it includes extra spaces).