Talk:Find URI in text: Difference between revisions

Line 20:

* TXR also includes the illegal character

At this time that would be all of the examples are wrong. --[[User:Dgamey|Dgamey]] 04:37, 8 January 2012 (UTC)

: this is maybe not clear from the task description, handling unicode characters is intentional in order to allow a user to write an url as they see it. (look at how http://en.wikipedia.org/wiki/Erich_Kästner_(camera_designer) is displayed in the browser when you follow the link.)

:it is not necessary to copy the example input exactly. if you can think of other examples that are worth testing, please include them too.

:as for the expected output, this is a question of the balance beween following the rfc and handling user expectations. for example, a <code> . </code> or <code> , </code> at the end of a URI is most likely not part of the URI according to user expectation, but it is a legal character in the RFC. which rule is better? i don't know. until someone can show a live URI that has <code> . </code> or <code> , </code> at the end i am inclined to remove them. in contrast the <code>()</code> case is somewhat easier to decide. if there is a <code>(</code> before the URI, then clearly the <code>)</code> at the end is also not part of the URI, but there are edge-cases too.--[[User:EMBee|eMBee]] 06:58, 8 January 2012 (UTC)