Find URI in text: Difference between revisions

foo://domain.hld/
</pre>
 
=={{header|jq}}==
{{works with|jq|with regex}}
 
The following uses essentially the same regular expression as the [[#Tcl]] entry (as of June 2015), and it produces identical results for the given input text. Note in particular that scheme-only strings such as "stop:" are not extracted, because the lookahead after the scheme requires a following "/" or word character.
<lang jq># input: a string
# output: a stream of the URIs found in that string
# Each input string may contain more than one URI.
def findURIs:
match( "
[a-z][-a-z0-9+.]*: # Scheme...
(?=[/\\w]) # ... but not just the scheme
(?://[-\\w.@:]+)? # Host
[-\\w.~/%!$&'()*+,;=]* # Path
(?:\\?[-\\w.~%!$&'()*+,;=/?]*)? # Query
(?:[#][-\\w.~%!$&'()*+,;=/?]*)? # Fragment
"; "gx")
| .string ;
 
# Example: read in a file of arbitrary text and
# produce a stream of the URIs that are identified.
split("\n")[] | findURIs</lang>
 
{{out}}
<lang sh>$ jq -R -r -f Find_URI_in_text.jq Find_URI_in_text.txt
http://en.wikipedia.org/wiki/Erich_Kästner_(camera_designer).
http://mediawiki.org/).
http://en.wikipedia.org/wiki/-)
ftp://domain.name/path(balanced_brackets)/foo.html
ftp://domain.name/path(balanced_brackets)/ending.in.dot.
ftp://domain.name/path(unbalanced_brackets/ending.in.dot.
ftp://domain.name/path/embedded?punct/uation.
ftp://domain.name/dangling_close_paren)</lang>
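
The effect of the lookahead that rejects scheme-only strings can be checked independently of jq. The following is a minimal sketch in Python, not part of the jq entry: it assumes that Python's <code>re.VERBOSE</code> mode treats the pattern the same way as jq's "x" flag, and the name <code>find_uris</code> and the sample strings are made up for illustration.

```python
import re

# Python port of the pattern used in findURIs above (assumption: re.VERBOSE
# handles whitespace and "#" comments the same way as jq's "x" flag).
URI_RE = re.compile(r"""
    [a-z][-a-z0-9+.]*:                 # Scheme...
    (?=[/\w])                          # ... but not just the scheme
    (?://[-\w.@:]+)?                   # Host
    [-\w.~/%!$&'()*+,;=]*              # Path
    (?:\?[-\w.~%!$&'()*+,;=/?]*)?      # Query
    (?:[#][-\w.~%!$&'()*+,;=/?]*)?     # Fragment
    """, re.VERBOSE)


def find_uris(text):
    """Return every non-overlapping match of URI_RE in text, left to right."""
    return [m.group(0) for m in URI_RE.finditer(text)]


if __name__ == "__main__":
    # Hypothetical sample: "stop:" is followed by a space, so the lookahead fails.
    for uri in find_uris("stop: see http://example.org/a?b=1#frag now"):
        print(uri)
```

As in the jq version, trailing punctuation that is legal in a URI (a final "." or ")") remains part of the match, since the pattern alone cannot tell whether it belongs to the URI.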
 
 
=={{header|Perl 6}}==