Find URI in text: Difference between revisions

→‎{{header|Ruby}}: Added Ruby Header and sample
m (→‎{{header|Tcl}}: add comment)
(→‎{{header|Ruby}}: Added Ruby Header and sample)
Line 242:
})</lang>
 
=={{header|Ruby}}==
<lang ruby>
require 'uri'
 
str = 'this URI contains an illegal character, parentheses and a misplaced full stop:
http://en.wikipedia.org/wiki/Erich_Kästner_(camera_designer). (which is handled by http://mediawiki.org/).
and another one just to confuse the parser: http://en.wikipedia.org/wiki/-)
")" is handled the wrong way by the mediawiki parser.
ftp://domain.name/path(balanced_brackets)/foo.html
ftp://domain.name/path(balanced_brackets)/ending.in.dot.
ftp://domain.name/path(unbalanced_brackets/ending.in.dot.
leading junk ftp://domain.name/path/embedded?punct/uation.
leading junk ftp://domain.name/dangling_close_paren)
if you have other interesting URIs for testing, please add them here:'
 
 
puts URI.extract(str)
 
puts "\nFiltered for HTTP and HTTPS:"
puts URI.extract(str, ["http", "https"])
 
puts "\nThis is the (extendible) list of supported schemes: #{URI.scheme_list.keys}"</lang>
{{Output}}
<pre>
stop:
http://en.wikipedia.org/wiki/Erich_K
http://mediawiki.org/).
parser:
http://en.wikipedia.org/wiki/-)
ftp://domain.name/path(balanced_brackets)/foo.html
ftp://domain.name/path(balanced_brackets)/ending.in.dot.
ftp://domain.name/path(unbalanced_brackets/ending.in.dot.
ftp://domain.name/path/embedded?punct/uation.
ftp://domain.name/dangling_close_paren)
here:
 
Filtered for HTTP and HTTPS:
http://en.wikipedia.org/wiki/Erich_K
http://mediawiki.org/).
http://en.wikipedia.org/wiki/-)
 
This is the (extendible) list of supported schemes: ["FTP", "HTTP", "HTTPS", "LDAP", "LDAPS", "MAILTO"]
</pre>
=={{header|Tcl}}==
This uses regular expressions to do the matching. It doesn't match a URL without a scheme (too problematic in general text) and it requires more than ''just'' the scheme too, but apart from that it matches slightly too broad a range of strings (though not usually problematically much). Matches some IRIs correctly too, but does not tackle the <tt>&lt;bracketed&gt;</tt> form (especially not if it includes extra spaces).
1,149

edits