Text to HTML: Difference between revisions


<p>The formatter can render text with <i>italics</i>, <b>bold</b> and in a <tt>typewriter</tt> font. It also does the right thing with &lt;angle brackets&gt; and &amp;ersands, but relies on the encoding of the characters to be conveyed separately.</p>

</body></html></syntaxhighlight>

=={{header|Vim Script}}==

The problem description is quite open-ended, so this example considers the following as criteria for this Vim Script solution:

* The initial line has the title, which will also be treated as heading level 1 and centred.

* Centred lines (i.e., preceded by more than one space) will be treated as heading level 2 and also centred.

* There is no '''markup''' (as you would see in Markdown, Asciidoc, or other light markup languages). However, this excludes...

* Bulleted and numbered lists, which are determined by lines starting with asterisk-space and numeral-period-space respectively (as you would expect in "plain text").

* Tables in the input are identified by text delimited by tab characters (in contiguous lines), with the first line treated as the table's header.

* Since the output is XHTML, (a) the XML declaration, DOCTYPE, and XML namespace should be as per XHTML 1.0 Strict, and (b) the XML predefined entities should be used where appropriate, i.e., &amp;, &apos;, &gt;, &lt;, and &quot;. Character references already present in the text file should be left as-is.

* External hyperlinks will be handled: each URI becomes a link whose text repeats the URI.
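
The entity criterion above can be sketched as a standalone function. This is only an illustration, not the script used below: the name <code>EscapeXml</code> is hypothetical, and the ampersand test here is an assumed, stricter variant that skips only well-formed numeric or named references.

<syntaxhighlight lang="vim">
" Hypothetical helper (not part of the solution script): escape the five
" XML predefined entities in a string.
function! EscapeXml(text) abort
  " Escape an ampersand unless it already starts a character reference
  " such as &#128518; or &#x1F606; or a named entity such as &amp;
  let l:out = substitute(a:text, '&\%(#\d\+;\|#x\x\+;\|\a\w*;\)\@!', '\&amp;', 'g')
  " The remaining four entities can then be replaced in any order
  let l:out = substitute(l:out, '<', '\&lt;', 'g')
  let l:out = substitute(l:out, '>', '\&gt;', 'g')
  let l:out = substitute(l:out, '"', '\&quot;', 'g')
  let l:out = substitute(l:out, "'", '\&apos;', 'g')
  return l:out
endfunction

echo EscapeXml('Fish & chips <hot>')
" echoes: Fish &amp; chips &lt;hot&gt;
</syntaxhighlight>

The ampersand has to be handled first; otherwise the ampersands introduced by the other four replacements would themselves be escaped.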

'''Input file'''

<pre>

Text to HTML using Vim Script

Introduction

This is an example of converting plain text to HTML which demonstrates extracting a title and escaping certain characters within bulleted and numbered lists.

Lists

A 'normal' paragraph before a list.

* This is a bulleted list with a less than sign (<)

* And this is its second line with a greater than sign (>)

A 'normal' paragraph between the lists.

1. This is a numbered list with an ampersand (&), but DO NOT substitute the ampersands within character references like &#x1F606; (😆)

2. "Second line" in double quotes, with “smart” quotes

3. 'Third line' in single quotes, with ‘smart’ ones too, and

4. This, https://rosettacode.org/wiki/Text_to_HTML, is a URI.

Tables

A normal paragraph before a table, which has been formulated with U+0009 tab characters:

Head cell 1	Head cell 2	Head cell 3

Row 2 col 1	Row 2 col 2	Row 2 col 3

Row 3 col 1	Row 3 col 2	Row 3 col 3

Row 4 col 1	Row 4 col 2	Row 4 col 3

The HTML output may be checked against https://validator.w3.org/check to validate that it is valid XHTML.

Conclusion

That's all folks.

</pre>

<small>

NB: &#x1F606; in the input file needed to have &amp; added to it to display correctly.</small>

'''Vim Script (and running it)'''

The following Vim Script has been written to be run from the command line with:

<pre>vim -c "source Text_to_HTML.vim" Text_to_HTML.xhtml</pre>

where:

* ''Text_to_HTML.xhtml'' is the input file (a copy of the .txt file to convert), above, which will be overwritten by

* ''Text_to_HTML.vim'', the Vim Script, reproduced below.

<syntaxhighlight lang="vim">
" Substitute the XML predefined character entities

%s/&\ze\([^A-z#]\)/\&amp;/g

%s/>/\&gt;/g

%s/</\&lt;/g

%s/"/\&quot;/g

%s/'/\&apos;/g

" Substitute URIs: presumes ! $ & ' ( ) * + , ; = : will be %xx escaped

%s/http[s]\?:\/\/[A-z0-9._~:/-]\+\ze[^.:]/<a href="\0">\0<\/a>

" Substitute simple tables, which use tabs (U+0009)

%s/\([^\t]\+\t.\+\n\n\?\)\+/<table>\r\0<\/table>\r/

%s/\([^\t]\+\t.\+\n\n\?\)\+/<thead>\0<\/tbody>/

%s/\(<thead>\)\(.\+\)/\1\r<tr>\2<\/tr>\r<\/thead>\r<tbody>/

%s/^\([^<][^\t]\+\t.\+\)\n\n\?\(<\/tbody>\)/<tr>\1<\/tr>\r\2\r/

%s/^\([^<][^\t]\+\t.\+\)\n\n\?/<tr>\1<\/tr>\r/

%s/<tr>\zs.*\ze<\/tr>/\=substitute(submatch(0), '\t', '<\/td><td>', 'g')/g

%s/<tr>/&<td>/

%s/<\/tr>/<\/td>&/

" Substitute the unordered list items, and temporarily precede them with <!--ul-->

%s/* \(.\+\)\n\n*/<!--ul--><li>\1<\/li>\r/

" Substitute the ordered list items, and temporarily precede them with <!--ol-->

%s/\d[.] \(.\+\)\n\n*/<!--ol--><li>\1<\/li>\r/

" Clean up <!--ol--> contiguous lines, wrapping them in <ol>

%s/\(<!--ol--><li>.\+\n\)\+/<ol>\r&<\/ol>\r/

" Clean up <!--ul--> contiguous lines, wrapping them in <ul>

%s/\(<!--ul--><li>.\+\n\)\+/<ul>\r&<\/ul>\r/

" Clean up - remove the placeholder comments

%s/<!--[ou]l-->//g

" Add the XML declaration, XHTML strict DOCTYPE, <head> and <title> block (with <script> and CSS for the tables), putting the text within <title>...</title>

1s/\s\+\(.\+\)\n\n\?/<\?xml version="1.0" encoding="UTF-8"\?>\r<!DOCTYPE html PUBLIC "-\/\/W3C\/\/DTD XHTML 1.0 Strict\/\/EN" "http:\/\/www.w3.org\/TR\/xhtml1\/DTD\/xhtml1-strict.dtd">\r<html xmlns="http:\/\/www.w3.org\/1999\/xhtml" xml:lang="en" lang="en">\r<head><title>\1<\/title>\r<style type="text\/css">\rh1, h2 { font-weight: bold; text-align: center; }\rtable, th, td { border: 1px solid black; }\r<\/style>\r<\/head>\r<body>\r<h1>\1<\/h1>\r/

" Substitute paragraphs starting with space+ A-Z and wrap within a <h2>...</h2>

%s/^\s\+\([A-Z].\+\)\n/<h2>\1<\/h2>\r/

" Substitute paragraphs starting with A-Z and wrap within a <p>...</p>

%s/^\([A-Z].\+\)\n/<p>\1<\/p>\r/

" Add the </body> and </html> to the end of the buffer

$s/\n/&<\/body>\r<\/html>/

" Substitute double returns with single returns

%s/\n\n/\r/

" Write the file and quit Vim

wq!

</syntaxhighlight>

<pre>

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

<head><title>Text to HTML using Vim Script</title>

<style type="text/css">

h1, h2 { font-weight: bold; text-align: center; }

table, th, td { border: 1px solid black; }

</style>

</head>

<body>

<h1>Text to HTML using Vim Script</h1>

<h2>Introduction</h2>

<p>This is an example of converting plain text to HTML which demonstrates extracting a title and escaping certain characters within bulleted and numbered lists.</p>

<h2>Lists</h2>

<p>A &apos;normal&apos; paragraph before a list.</p>

<ul>

<li>This is a bulleted list with a less than sign (&lt;)</li>

<li>And this is its second line with a greater than sign (&gt;)</li>

</ul>

<p>A &apos;normal&apos; paragraph between the lists.</p>

<ol>

<li>This is a numbered list with an ampersand (&amp;), but DO NOT substitute the ampersands within character references like &#x1F606; (😆)</li>

<li>&quot;Second line&quot; in double quotes, with “smart” quotes</li>

<li>&apos;Third line&apos; in single quotes, with ‘smart’ ones too, and</li>

<li>This, <a href="https://rosettacode.org/wiki/Text_to_HTML">https://rosettacode.org/wiki/Text_to_HTML</a>, is a URI.</li>

</ol>

<h2>Tables</h2>

<p>A normal paragraph before a table, which has been formulated with U+0009 tab characters:</p>

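<table>

<thead>

<tr><td>Head cell 1</td><td>Head cell 2</td><td>Head cell 3</td></tr>

</thead>

<tbody>

<tr><td>Row 2 col 1</td><td>Row 2 col 2</td><td>Row 2 col 3</td></tr>

<tr><td>Row 3 col 1</td><td>Row 3 col 2</td><td>Row 3 col 3</td></tr>

<tr><td>Row 4 col 1</td><td>Row 4 col 2</td><td>Row 4 col 3</td></tr>

</tbody>

</table>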

<p>The HTML output may be checked against <a href="https://validator.w3.org/check">https://validator.w3.org/check</a> to validate that it is valid XHTML.</p>

<h2>Conclusion</h2>

<p>That&apos;s all folks.</p>

</body>

</html>

</pre>

<small>NB: Again, &#x1F606; in the output file needed to have &amp; added to it to display correctly.</small>

This output validates (checked, as noted in the penultimate paragraph of the output, at https://validator.w3.org/check).
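
The per-row part of the table step (splitting a line on tabs and wrapping each cell in <code>&lt;td&gt;</code>) can also be sketched as a function, which may be easier to follow than the chained substitutions. The name <code>TabsToRow</code> is hypothetical and the function is not part of the script above; like the script, it emits <code>&lt;td&gt;</code> cells for every row, including the header row.

<syntaxhighlight lang="vim">
" Hypothetical helper (not part of the solution script): turn one
" tab-delimited line into an XHTML table row.
function! TabsToRow(line) abort
  " Split on U+0009, keeping empty cells (third argument = 1)
  let l:cells = split(a:line, "\t", 1)
  " Join the cells with closing/opening td tags and wrap the whole row
  return '<tr><td>' . join(l:cells, '</td><td>') . '</td></tr>'
endfunction

echo TabsToRow("Row 2 col 1\tRow 2 col 2\tRow 2 col 3")
" echoes: <tr><td>Row 2 col 1</td><td>Row 2 col 2</td><td>Row 2 col 3</td></tr>
</syntaxhighlight>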

=={{header|Wren}}==