Talk:Find URI in text: Difference between revisions
* TXR also includes the illegal character |
At this time, that would mean all of the examples are wrong. --[[User:Dgamey|Dgamey]] 04:37, 8 January 2012 (UTC)
: this is maybe not clear from the task description: handling unicode characters is intentional, in order to allow a user to write a URL as they see it. (look at how http://en.wikipedia.org/wiki/Erich_Kästner_(camera_designer) is displayed in the browser when you follow the link.)
:it is not necessary to copy the example input exactly. if you can think of other examples that are worth testing, please include them too.
:as for the expected output, this is a question of the balance between following the rfc and handling user expectations. for example, a <code> . </code> or <code> , </code> at the end of a URI is most likely not part of the URI according to user expectation, but it is a legal character in the RFC. which rule is better? i don't know. until someone can show a live URI that has <code> . </code> or <code> , </code> at the end i am inclined to remove them. in contrast the <code>()</code> case is somewhat easier to decide. if there is a <code>(</code> before the URI, then clearly the <code>)</code> at the end is also not part of the URI, but there are edge cases too.--[[User:EMBee|eMBee]] 06:58, 8 January 2012 (UTC)
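:the trimming heuristic discussed above could be sketched roughly as follows. this is only an illustration under stated assumptions, not the task's reference solution: the names <code>CANDIDATE</code>, <code>trim_uri</code> and <code>find_uris</code> are hypothetical, the candidate pattern is deliberately loose (any scheme followed by <code>://</code> and a run of non-whitespace, so unicode such as <code>ä</code> is kept), and the "<code>(</code> before the URI" rule is approximated by dropping a trailing <code>)</code> only when the candidate has more closing than opening parentheses.

```python
import re

# Hypothetical candidate pattern (an assumption, not taken from the RFC):
# a scheme, "://", then any run of non-whitespace characters.
# Being this loose keeps Unicode characters such as "ä" in the match.
CANDIDATE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://\S+")


def trim_uri(candidate):
    """Drop trailing characters a reader would probably not count as part of the URI."""
    uri = candidate
    while uri:
        last = uri[-1]
        if last in ".,":
            # Legal in the RFC, but most likely sentence punctuation.
            uri = uri[:-1]
        elif last == ")" and uri.count(")") > uri.count("("):
            # An unmatched ")" most likely closes a "(" that sits outside the URI.
            uri = uri[:-1]
        else:
            break
    return uri


def find_uris(text):
    """Return every candidate URI found in text, after trimming."""
    return [trim_uri(m.group()) for m in CANDIDATE.finditer(text)]


if __name__ == "__main__":
    sample = (
        "look at http://en.wikipedia.org/wiki/Erich_Kästner_(camera_designer). "
        "(or at http://example.com/page), too."
    )
    for uri in find_uris(sample):
        print(uri)
```

:with this sketch a balanced <code>(camera_designer)</code> survives, while a <code>)</code> that closes a parenthesis opened in the surrounding text is dropped. a live URI that really ends in <code> . </code> or <code> , </code> would be truncated, which is exactly the trade-off raised above.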