Find URI in text: Difference between revisions

(add IRI from RFC 3987, extra credit)
This example follows RFC 3986 very closely (see Talk page for discussion). For better IP parsing see [[Parse_an_IP_Address]].
This solution doesn't handle IRIs per RFC 3987. Neither Icon nor Unicon natively supports Unicode, although ObjectIcon does.
This solution doesn't currently handle delimitation explicitly. Examples of the form ''<URI>'' or ''"URI"'' need no special handling, as they parse correctly in any event. Ambiguous examples like ''(URI)'', whose delimiters are valid URI characters, currently parse as ''URI)'' and not ''URI''. URIs are returned per the RFC; for example, URIs ending in dots are currently returned with the dot. Once that information is lost the user must guess and reconstruct it, whereas it's far easier to remove a character if the URI doesn't work.
 
Filtering of URIs for disambiguation and delineation would be best handled in the 'findURItext' procedure. It might also be a good idea to return both unfiltered and filtered URIs here.
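As a rough illustration of what such a filter could look like, here is a minimal sketch in Python rather than Icon. It is not part of the solution: the names <code>filter_uri</code> and <code>unfiltered_and_filtered</code> and the choice of trailing characters to strip are assumptions made for the example. It drops trailing sentence punctuation and any closing parenthesis that has no matching opening parenthesis inside the URI, and it returns both forms so the caller can fall back to the unfiltered URI.

```python
# Characters assumed (for this sketch only) to be sentence punctuation
# rather than part of the URI when they appear at the very end.
TRAILING_PUNCTUATION = ".,;:!?"


def filter_uri(uri):
    """Return uri without trailing punctuation or unmatched ')' characters."""
    filtered = uri
    while filtered:
        last = filtered[-1]
        if last in TRAILING_PUNCTUATION:
            # e.g. a full stop that ends the surrounding sentence
            filtered = filtered[:-1]
        elif last == ")" and filtered.count(")") > filtered.count("("):
            # a ')' with no partner inside the URI most likely closes
            # a parenthesis opened in the surrounding text
            filtered = filtered[:-1]
        else:
            break
    return filtered


def unfiltered_and_filtered(uri):
    """Return the URI exactly as parsed together with its filtered form."""
    return uri, filter_uri(uri)
```

For example, <code>filter_uri("http://example.com/path)")</code> drops the closing parenthesis, while a URI that contains its own balanced pair of parentheses is left unchanged. The heuristic can still remove a character that really belongs to the URI, which is the reason for returning the unfiltered form as well.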
Anonymous user