Talk:XML/Input: Difference between revisions

:::::No. Numeric references, a small set of predefined entities, and the permitted character set are [http://www.w3.org/TR/xml11/ part of the XML specification]. All XML parsers must support them. Practically, I think it is better for Rosetta Code if our examples show ''robust'', fully general solutions rather than just-enough-for-the-example-at-hand ones. Don't spread code that will break when someone with an accent in their name comes along. --[[User:Kevin Reid|Kevin Reid]] 12:23, 2 June 2009 (UTC)
::::::The purpose of Rosetta Code is ''not'' to provide robust, full applications such as XML parsers. Such an application would require thousands of lines of code, and nobody would write them. We should have simple, clearly defined tasks that solve some specific problem or can be used as a (small) part of an application. The task has to be specified clearly enough that the implementations actually solve the problem instead of taking a shortcut to a known answer. I think this task should be about extracting information from an XML file. Character conversion is an entirely different task. Often you would not even want the conversion to be done. And the conversion has nothing to do with code breaking. I have now added a Vedit example that extracts the data but does not do the conversion. It does not break because of this; it just extracts the data as expected. --[[User:PauliKL|PauliKL]] 16:33, 2 June 2009 (UTC)
::::::: Certainly no RC example should contain a full XML parser. This one, however, should ''use'' a conformant XML parser library. In a task that is about processing XML, it is misleading to demonstrate half-baked solutions. This is not a matter of doing some "translation" -- it is an inherent part of ''parsing XML at all''. The XML specification [http://www.w3.org/TR/xml11/#entproc states] that when a character reference occurs in an attribute value, the "REQUIRED" behavior of an XML processor is that "the indicated character is processed in place of the reference itself". --[[User:Kevin Reid|Kevin Reid]] 17:18, 2 June 2009 (UTC)
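To make the difference concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` alongside a plain regex scan. The sample document and the function names `names_with_parser` and `names_with_regex` are hypothetical, chosen only for illustration. A conformant parser resolves the numeric character reference `&#x00C9;` and the predefined entity `&amp;` in attribute values, while the text scan returns the raw, unresolved reference text.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document (an assumption for illustration): one attribute
# uses a numeric character reference, another uses a predefined entity.
SAMPLE = """<Students>
  <Student Name="&#x00C9;mily" />
  <Student Name="Tom &amp; Jerry" />
</Students>"""


def names_with_parser(xml_text):
    """Use a conformant XML parser: references in attribute values are resolved."""
    root = ET.fromstring(xml_text)
    return [student.get("Name") for student in root.iter("Student")]


def names_with_regex(xml_text):
    """Naive text scan: returns the attribute text exactly as written, unresolved."""
    return re.findall(r'Name="([^"]*)"', xml_text)


print(names_with_parser(SAMPLE))  # ['Émily', 'Tom & Jerry']
print(names_with_regex(SAMPLE))   # ['&#x00C9;mily', 'Tom &amp; Jerry']
```

The regex version "works" on this input in the sense that it does not crash, but the values it returns are not the attribute values the XML document actually specifies.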
 
::Donal, the problem is that the AWK implementation does not interpret the structure at all. It is quite possible to do some parsing even if there are no ready-made library routines for it. But that does not mean that we should implement a full XML parser. The task should be kept relatively simple.
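As an illustration of what "interpreting the structure" buys, here is a minimal sketch, again assuming Python's standard `xml.etree.ElementTree` and a hypothetical sample in which a nested `Pet` element also carries a `Name` attribute. The function names `student_names` and `every_name_attribute` are hypothetical. The parser-based version asks only for `Student` children of the root, while a line-oriented pattern match, with no notion of nesting, also picks up the pet's name.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample (an assumption): a nested Pet element also has a Name attribute.
SAMPLE = """<Students>
  <Student Name="April" />
  <Student Name="Dave">
    <Pet Type="dog" Name="Rover" />
  </Student>
</Students>"""


def student_names(xml_text):
    """Structure-aware: only Student elements that are direct children of the root."""
    root = ET.fromstring(xml_text)
    return [student.get("Name") for student in root.findall("Student")]


def every_name_attribute(xml_text):
    """Structure-blind pattern match: every Name attribute, whatever element owns it."""
    return re.findall(r'Name="([^"]*)"', xml_text)


print(student_names(SAMPLE))         # ['April', 'Dave']
print(every_name_attribute(SAMPLE))  # ['April', 'Dave', 'Rover']
```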