Find URI in text: Difference between revisions
added whitespace to the task's preamble, added a ;Task:, added whitespace before the TOC.
Line 1:
{{Draft task|Text processing}}
;Task:
Write a function to search plain text for URIs or IRIs.
Line 10 ⟶ 12:
The abbreviation IRI is less well known than URI. In short, an IRI is an alternate form of a URI that supports internationalization and hence Unicode. While many specifications support both forms, this isn't universal.
Consider the following issues:
:* <big> <code>. , ; ' ? ( )</code> </big> are legal characters in a URI, but they are often used in plain text as delimiters.
:* IRIs allow most (but not all) Unicode characters.
:* URIs can use schemes other than http:// or https://
this URI contains an illegal character, parentheses and a misplaced full stop:
http://en.wikipedia.org/wiki/Erich_Kästner_(camera_designer). (which is handled by http://mediawiki.org/).
and another one just to confuse the parser: http://en.wikipedia.org/wiki/-)
ftp://domain.name/path(balanced_brackets)/foo.html
")" is handled the wrong way by the mediawiki parser.
if you have other interesting URIs for testing, please add them here:
Regular expressions to solve the task are fine, but alternative approaches are welcome too (otherwise, this task would degrade into 'finding and applying the best regular expression').
'''Extra Credit:''' implement the parser to match the IRI specification in RFC 3987.
<br><br>
=={{header|Go}}==