Find URI in text: Difference between revisions

Revision as of 18:04, 25 November 2019

546 bytes added, 4 years ago

m

→‎{{header|REXX}}: used a template for the output section, added/changed whitespace and comments,

rosettacode>Gerard Schildberger

← Older edit: Revision as of 17:53, 25 November 2019, rosettacode>Gerard Schildberger m (added whitespace to the task's preamble, added a ;Task:, added whitespace before the TOC.)
Newer edit →: Revision as of 18:04, 25 November 2019, rosettacode>Gerard Schildberger m (→{{header|REXX}}: used a template for the output section, added/changed whitespace and comments,)
Line 637:

=={{header|REXX}}==
<lang rexx>/*REXX program scans a text (contained within the REXX program) to extract URIs and IRIs.*/
$$= 'this URI contains an illegal character, parentheses and a misplaced full stop:',
    'http://en.wikipedia.org/wiki/Erich_Kästner_(camera_designer). (which is handled by http://mediawiki.org/).',
    'and another one just to confuse the parser: http://en.wikipedia.org/wiki/-)',
    '")" is handled the wrong way by the mediawiki parser.',
    'ftp://domain.name/path(balanced_brackets)/foo.html',
    'ftp://domain.name/path(balanced_brackets)/ending.in.dot.',
    'ftp://domain.name/path(unbalanced_brackets/ending.in.dot.',
    'leading junk ftp://domain.name/path/embedded?punct/uation.',
    'leading junk ftp://domain.name/dangling_close_paren)',
    'if you have other interesting URIs for testing, please add them here:'

@abc= 'abcdefghijklmnopqrstuvwxyz';              /*construct lowercase (Latin) alphabet.*/
@abcU= @abc;   upper @abcU;   @abcs= @abc || @abcU  /*    "     lower & uppercase    "     */
@scheme=     @abcs || 0123456789 || '+-.'        /*add decimal digits & some punctuation*/
@unreserved= @abcs || 0123456789 || '-._~'       /* "     "       "   "   "       "     */
@reserved= @unreserved"/?#[]@!$&)(*+,;=\'"       /*add other punctuation & special chars*/
$= space($$)' '                                  /*variable  $  is a working copy of $$ */
#= 0                                             /*the count of URI's  found  (so far). */
                                                 /*▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄*/
  do  while $\='';   y= pos(':', $)              /*locate a colon (:) in the text body. */
  if y==0  then leave                            /*Was a colon found?  Nope, we're done.*/
  if y==1  then do;  parse var  $  .  $          /*handle a bare colon by itself.       */
                     iterate                     /*go and keep scanning for a colon.    */
                end                              /* [↑]  (a rare special case.)         */
  sr= reverse( left($, y - 1) )                  /*extract the scheme and reverse it.   */
  se= verify(sr, @scheme)                        /*locate the  end  of the scheme.      */
  $= substr($, y + 1)                            /*assign an adjusted new text.         */
  if se\==0  then sr= left(sr, se - 1)           /*possibly "crop" the scheme name.     */
  s= reverse(sr)                                 /*reverse it again to rectify the name.*/
  he= verify($, @reserved)                       /*locate the  end  of the hier─part.   */
  s= s':'left($, he - 1)                         /*extract and append the hier─part.    */
  $= substr($, he)                               /*assign an adjusted new part of text. */
  #= # + 1                                       /*bump the URI counter.                */
  !.#= s                                         /*assign the URI to an array  (!.)     */
  end   /*while*/                                /* [↑]  scan the text for URI's.       */
                                                 /*▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀*/
  do k=1  for #;   say !.k;   end                /*stick a fork in it,  we're all done. */</lang>
{{out|output|text=  when using the internal default inputs:}}
<pre>
stop: