Talk:Text to HTML

: Anyway, this is a plausible start, but some aspects of it still seem overly open-ended. --[[User:Rdm|Rdm]] 22:11, 5 January 2012 (UTC)
:: The requirements are currently open-ended precisely because I want to bring out questions like yours, so thank you. We can discuss these questions and then use them to formulate more concrete requirements. --[[User:EMBee|eMBee]] 02:33, 6 January 2012 (UTC)
::: This task is ill-conceived. You want to derive markup information from text "without format information", yet the only part that fits that description is URL extraction. For everything else, such as indentation, paragraph separation, and bullets, you always need ''some'' format information from the so-called "plain" text source: you have to venture guesses based on whitespace and special characters. The problem is that the simple markup people use in a text file is almost always informal and ambiguous: a paragraph beginning with an asterisk might be a bulleted list item, but it could also be a footnote, bold type, or any other kind of thing people may fancy. --[[User:Ledrug|Ledrug]] 08:31, 7 January 2012 (UTC)
:::: Could you come up with some examples? If we restrict the input to short texts, such as what you'd write in a forum, an email, or a blog post, rather than large documents, would that simplify the problem? --[[User:EMBee|eMBee]] 09:52, 7 January 2012 (UTC)
 
::: When you make a guess at a piece of ambiguous text and put some HTML tags around it, you eliminate all but one interpretation, which can be horribly wrong. If you were lucky enough to make all the right guesses, you wouldn't be adding any useful information to the text anyway, and if you made any wrong guesses, you would end up assigning the document a structure that mangles the meaning of the text. It's OK to want to make some text look more pleasing, but not at the cost of losing information (or the chance for a human reader to guess the correct information). Without a well-defined convention for text markup, such as wiki text, this whole exercise seems pointless to me. --[[User:Ledrug|Ledrug]] 08:31, 7 January 2012 (UTC)
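URL extraction, mentioned above as the one part that needs no formatting conventions, can be sketched in a few lines. This is a minimal Python sketch, assuming that only bare `http://` and `https://` URLs need linking and that trailing sentence punctuation is not part of the URL; `linkify`, `URL_RE` and `TRAILING` are hypothetical names, not taken from any existing solution on the task page.

```python
import html
import re

# Assumption: a URL is "http://" or "https://" followed by non-space characters.
# This is a heuristic, not a full RFC 3986 parser.
URL_RE = re.compile(r"https?://[^\s<>\"]+")
# Characters that usually end a sentence rather than a URL (also a heuristic).
TRAILING = ".,;:!?)"


def linkify(text: str) -> str:
    """Escape plain text for HTML and wrap bare http(s) URLs in <a> tags."""
    out = []
    pos = 0
    for m in URL_RE.finditer(text):
        url = m.group(0).rstrip(TRAILING)
        # Escape the plain text between the previous URL and this one.
        out.append(html.escape(text[pos:m.start()]))
        href = html.escape(url, quote=True)
        out.append(f'<a href="{href}">{href}</a>')
        # Stripped punctuation is left for the next plain-text chunk.
        pos = m.start() + len(url)
    out.append(html.escape(text[pos:]))
    return "".join(out)


print(linkify("See https://rosettacode.org/wiki/Text_to_HTML."))
# prints: See <a href="https://rosettacode.org/wiki/Text_to_HTML">https://rosettacode.org/wiki/Text_to_HTML</a>.
```

Even here a guess is involved: stripping a trailing `)` breaks URLs that legitimately end in a parenthesis.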
:::: You are right, it won't work without conventions. But conventions do not imply markup, unless you count whitespace as markup. What I mean is the difference between:
title without markup
and
==title with markup==
:::: Of course there need to be formatting rules. The parser's interpretation is what makes something a rule: if the parser's interpretation is wrong, then the input text doesn't fit the rules.
:::: The goal of this discussion is to work out which rules or conventions make sense, are commonly found in the wild, and would not surprise a user whose text is being processed.
 
:::: The conventions applied in the Pike solution would be:
* isolated lines are titles
* bullets and numbers are lists
* empty lines end or begin a paragraph
:::: The table format discussed above is also a convention. Is it a good one? We don't know yet. So we are already discussing conventions; we just haven't called them conventions or rules.
:::: Which other conventions can we find? --[[User:EMBee|eMBee]] 09:52, 7 January 2012 (UTC)
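The three conventions listed above (isolated lines are titles, bullets and numbers are lists, empty lines end or begin a paragraph) can be sketched as a small block-based converter. This is a minimal Python sketch, not the Pike solution itself; it assumes `*` or `-` as bullet markers, `1.` or `1)` as number markers, and `<h2>` for titles, and the names `blocks`, `list_items` and `text_to_html` are hypothetical.

```python
import html
import re

# Assumed markers: "*" or "-" bullets, "1." or "1)" numbered items.
BULLET_RE = re.compile(r"^\s*[*-]\s+(.*)$")
NUMBER_RE = re.compile(r"^\s*\d+[.)]\s+(.*)$")


def blocks(text):
    """Yield runs of non-empty lines; empty lines end or begin a block."""
    block = []
    for line in text.splitlines():
        if line.strip():
            block.append(line)
        elif block:
            yield block
            block = []
    if block:
        yield block


def list_items(block, pattern):
    """Return the item texts if every line in the block matches pattern, else None."""
    matches = [pattern.match(line) for line in block]
    if all(matches):
        return [m.group(1) for m in matches]
    return None


def text_to_html(text):
    out = []
    for block in blocks(text):
        for pattern, tag in ((BULLET_RE, "ul"), (NUMBER_RE, "ol")):
            items = list_items(block, pattern)
            if items is not None:
                lis = "".join(f"<li>{html.escape(item)}</li>" for item in items)
                out.append(f"<{tag}>{lis}</{tag}>")
                break
        else:
            if len(block) == 1:
                # An isolated line is taken to be a title (assumed <h2>).
                out.append(f"<h2>{html.escape(block[0].strip())}</h2>")
            else:
                # Anything else is a paragraph; line breaks become spaces.
                body = " ".join(html.escape(line.strip()) for line in block)
                out.append(f"<p>{body}</p>")
    return "\n".join(out)


sample = "Title\n\nfirst line\nsecond line\n\n* a\n* b\n\n1. x\n2) y\n"
print(text_to_html(sample))
```

Under these rules a one-line paragraph cannot be told apart from a title, and a block starting with `*` becomes a list only if every line in it does; that is the kind of ambiguity Ledrug describes above.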