Talk:Find URI in text: Difference between revisions

: not sure about "stop:". for one, new schemes can be made up, and some applications have internal schemes that are not known to us. the task only asks to find URIs, not to process them, so the decision whether to deal with "stop:" can be left to the processing stage. for example, in some cases you may only be interested in http, https, and maybe ftp. in such a case you'd go through the list of matches and remove anything that is not of interest. of course one could write the parser so that it takes a list of schemes to decide which ones should be found, but by default there is no harm in finding too much.
: nothing in the task indicates that parentheses must be balanced either. unbalanced parentheses are certainly valid, and are what the author intended too. please look at the live example i found on wikipedia: [http://en.wikipedia.org/wiki/-) http://en.wikipedia.org/wiki/-)] (and note how mediawiki parses it wrong :-).--[[User:EMBee|eMBee]] 03:57, 8 January 2012 (UTC)
:: I had another look, and the RFC definitions also allow '-' and '.' through 'unreserved', 'pchar', and 'segment', so "http://en.wikipedia.org/wiki/-)" and "http://en.wikipedia.org/wiki/-" are valid, as you indicated, as is "http://mediawiki.org/).". Also, the URI with the illegal character is valid up until that character, so "http://en.wikipedia.org/wiki/Erich_K" is valid. Appendix C doesn't help much, as none of the sample URIs are cleanly delineated. --[[User:Dgamey|Dgamey]] 04:37, 8 January 2012 (UTC)
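To make the two points above concrete, here is a minimal Python sketch, not taken from any of the task's solutions. It assumes a deliberately permissive reading of RFC 3986: a scheme, a colon, then any run of characters the RFC permits (unreserved, gen-delims, sub-delims, and '%'). The names <code>find_uris</code> and <code>keep_schemes</code> are hypothetical. Unbalanced parentheses and trailing punctuation are kept, a match stops at the first illegal character, and scheme filtering is left to a separate step.

```python
import re

# Characters RFC 3986 permits inside a URI: unreserved, gen-delims,
# sub-delims, and '%' (for percent-encoding). This is an assumption of
# the sketch; it does not validate the URI's internal structure.
URI_CHARS = r"A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%"

# scheme ":" followed by zero or more permitted characters, so a bare
# "stop:" is also reported and can be discarded later if unwanted.
URI_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:[" + URI_CHARS + r"]*")


def find_uris(text):
    """Return every permissive URI candidate found in text, in order."""
    return URI_RE.findall(text)


def keep_schemes(uris, schemes):
    """Processing stage: keep only URIs whose scheme is in schemes."""
    wanted = {s.lower() for s in schemes}
    return [u for u in uris if u.split(":", 1)[0].lower() in wanted]


if __name__ == "__main__":
    sample = ("see http://en.wikipedia.org/wiki/-) and "
              "http://en.wikipedia.org/wiki/Erich_Kästner "
              "stop: here, ftp://example.org/x")
    found = find_uris(sample)
    print(found)
    print(keep_schemes(found, ["http", "https", "ftp"]))
```

With this approach the finder never has to know which schemes matter; a caller interested only in http, https, and ftp filters the list afterwards.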
== Expected Output Needed ==
A list of expected output should be given to avoid confusion. Some of the examples are clearly wrong.
* Pike is incomplete and includes the illegal char
* TXR also includes the illegal character
At this time, that means all of the examples are wrong. --[[User:Dgamey|Dgamey]] 04:37, 8 January 2012 (UTC)