Find URI in text: Difference between revisions
Content added Content deleted
(something practical for web developers) |
(Pike) |
||
Line 9: | Line 9: | ||
Consider the following issues: |
Consider the following issues: |
||
* <code>. , ; ? ( )</code> are legal characters in a URI, but they are often used in plain text as delimiters. |
* <code>. , ; ' ? ( )</code> are legal characters in a URI, but they are often used in plain text as delimiters. |
||
* A user may type a URI as seen in the browser location bar, with non-ASCII characters (which are not legal in a URI). |
* A user may type a URI as seen in the browser location bar, with non-ASCII characters (which are not legal in a URI). |
||
* URIs may use schemes other than http:// or https:// |
* URIs may use schemes other than http:// or https:// |
||
Line 18: | Line 18: | ||
Regular expressions to solve the task are fine, but alternative approaches are welcome too; otherwise, this task would degrade into 'how to apply a regular expression'. |
Regular expressions to solve the task are fine, but alternative approaches are welcome too; otherwise, this task would degrade into 'how to apply a regular expression'. |
||
=={{header|Pike}}== |
|||
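<lang Pike>// Sample text; adjacent string literals are concatenated.
string uritext = "this URI contains an illegal character, parentheses and a misplaced full stop:\n"
    "http://en.wikipedia.org/wiki/Erich_Kästner_(camera_designer). (which is handled by http://mediawiki.org).";

array find_uris(string uritext)
{
    array uris = ({});
    int pos = 0;
    // Every "://" marks a candidate URI; a match at index 0 has no scheme and is skipped.
    while ((pos = search(uritext, "://", pos+1)) > 0)
    {
        // Scheme length: scan backwards (at most 20 characters) over legal scheme characters.
        int prepos = sizeof(array_sscanf(reverse(uritext[pos-20..pos-1]), "%[a-zA-Z0-9+.-]%s")[0]);
        // Remainder length: everything up to a space, '<', '>' or '"'.
        int postpos = sizeof(array_sscanf(uritext[pos+3..], "%[^ <>\"]%s")[0]);
        // If the URI is wrapped in parentheses, drop the closing one.
        // The pos > prepos guard avoids a negative index, which would wrap to the end of the string.
        if (pos > prepos && uritext[pos-prepos-1] == '(' && uritext[pos+postpos+2] == ')')
            postpos--;
        uris += ({ uritext[pos-prepos..pos+postpos+2] });
    }
    return uris;
}</lang>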