Find URI in text: Difference between revisions

Line 9:

Consider the following issues:

* <code>. , ; ? ( )</code> are legal characters in a URI, but they are often used in plain text as a delimiter.

* <code>. , ; ' ? ( )</code> are legal characters in a URI, but they are often used in plain text as a delimiter.

* a user may type a URI as seen in the browser location bar, with non-ASCII characters (which are not legal).

* URIs can use schemes other than http:// or https://

Line 18:

Regular expressions to solve the task are fine, but alternative approaches are welcome too (otherwise, this task would degrade into 'how to apply a regular expression').

=={{header|Pike}}==

<lang Pike>string uritext = "this URI contains an illegal character, parentheses and a misplaced full stop:\n"
    "http://en.wikipedia.org/wiki/Erich_Kästner_(camera_designer). (which is handled by http://mediawiki.org).";

array find_uris(string uritext)
{
    array uris = ({});
    int pos = 0;
    // Locate each "://" and expand outwards from it in both directions.
    while ((pos = search(uritext, "://", pos+1)) > 0)
    {
        // Length of the scheme: scan backwards over legal scheme characters.
        int prepos = sizeof(array_sscanf(reverse(uritext[pos-20..pos-1]), "%[a-zA-Z0-9+.-]%s")[0]);
        // Length of the remainder: everything up to a space, angle bracket or double quote.
        int postpos = sizeof(array_sscanf(uritext[pos+3..], "%[^ <>\"]%s")[0]);
        // If the URI is wrapped in parentheses, drop the closing one.
        if (uritext[pos-prepos-1]=='(' && uritext[pos+postpos+2]==')')
            postpos--;
        uris += ({ uritext[pos-prepos..pos+postpos+2] });
    }
    return uris;
}</lang>
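
A minimal usage sketch, assuming the definitions above are in the same file. The <code>main</code> function is an illustrative addition and not part of the original entry:

<lang Pike>int main()
{
    // Print each URI found in the sample text on its own line.
    foreach (find_uris(uritext), string uri)
        write(uri + "\n");
    return 0;
}</lang>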