Syntax highlighting using Mediawiki formatting
You are encouraged to solve this task according to the task description, using any language you may know.
- Introduction
When formatting a page for display, Mediawiki allows the page to include bold and italic text by placing the bold/italic text within paired repeated-single quote characters - 3 single quotes for bold and 2 for italic, 5 for bold italic.
E.g.:
'''bold-word''' and ''italic-word'' appears as bold-word and italic-word.
This could be used to provide simple syntax-highlighting without the use of the relatively more expensive <syntaxhighlight> tags or for languages not currently supported by Pygments.
A few languages on Rosetta Code are currently using schemes like this.
- Task
The task is to write a syntax highlighter that given a source in your language will output a wiki formatted version of the source with the keywords/reserved words in bold and the comments in italics.
Note that each source line (including blank lines) should be output with a leading space, to ensure the source is treated as a single block.
Additionally, translate the following characters:
- single-quote (') to '
- ampersand (&) to &
- less-than (<) to <
- greater-than (>) to >
If your language doesn't have keywords/reserved words or comments, use your judgement on what to highlight in bold or italic : )
- Presenting your source
Instead of showing your source within syntaxhighlight tags and having a separate output block, just show the source that would be output from your program when given its own source to process.
I.e., don't use syntaxhighlight tags.
- See also
https://www.mediawiki.org/wiki/Help:Formatting
ALGOL 68
Handles upper-stropping Algol 68 sources (as used by ALGOL 68G and most other compilers).
CO Convert an upper-stropped Algol 68 source to "wiki" format each line is preceeded by a space, bold words are enclosed in ''' and ''' and comments in '' and '' ', &, < and > are converted to ' & < and > everything else if output as is quote-stropping, point-stropping and res-stropping is not suppoered the source is read from stand in and written to stand out the last line in the file must end with a newline { and } are assumed to be alternatives for ( and ), if { } should be treated as comments ( as in ALGOL68RS/algol68toc ) change rs style brief comments to TRUE CO BEGIN BOOL in string := FALSE; BOOL in brief comment := FALSE; INT rs comment depth := 0; STRING comment delimiter := ""; # TRUE if {} delimits a nestable brief comment, as in ALGOL 68RS and # # algol68toc, FALSE if {} are alternatives to () as in ALGOL 68G # BOOL rs style brief comments = FALSE; BOOL at eof := FALSE; # TRUE if EOF has been reached, FALSE otherwise # on logical file end( stand in # set EOF handler for stand in # , ( REF FILE f )BOOL: # note that we reached EOF on the latest read # # and return TRUE so processing can continue # at eof := TRUE ); CHAR nl = REPR 10; # newline character # STRING line := nl; # current source line # INT pos := LWB line; # current position in line # CHAR c := " "; # current source character # PROC next char = VOID: # gets the next source character, stores it in c # IF pos <= UPB line THEN c := line[ pos ]; # not past the end of the source line # pos +:= 1 ELIF # past the end of the current source line - get the next # at eof := FALSE; read( ( line, newline ) ); NOT at eof THEN line +:= nl; # have another line # c := line[ pos := LWB line ]; pos +:= 1 ELSE line := ""; # reached eof # c := REPR 0 FI # next char # ; PROC out char = ( CHAR ch )VOID: # conveerts and outputs ch # IF ch = nl THEN IF NOT in brief comment AND rs comment depth = 0 AND comment delimiter = "" THEN print( ( newline, " " ) ) # newline not in a comment # ELSE # newline in a comment # italic delimiter; print( ( newline, " " ) ); italic delimiter FI ELIF ch = "<" THEN print( ( "<" ) ) ELIF ch = ">" THEN print( ( ">" ) ) ELIF ch = "&" THEN print( ( "&" ) ) ELIF ch = "'" THEN print( ( "'" ) ) ELSE print( ch ) FI # out char # ; # outputs the current character and gets the next # PROC out and next char = VOID: BEGIN out char( c ); next char END; # outputs a wiki start/end italic delimiter # PROC italic delimiter = VOID: print( ( "''" ) ); # returns TRUE if the current character can start a bold word # PROC have bold = BOOL: c >= "A" AND c <= "Z"; PROC get bold word = STRING: # gets a bold word from then source # BEGIN STRING result := ""; WHILE have bold OR c = "_" DO result +:= c; next char OD; result END # get bold word # ; # copy the source to stand out, conveerting to wiki format # next char; WHILE NOT at eof DO IF in string THEN # currently in a string # in string := c /=""""; out and next char ELIF in brief comment THEN # currently in a brief comment # in brief comment := c /= "#"; out and next char; IF NOT in brief comment THEN italic delimiter FI ELIF rs comment depth > 0 THEN # currently in a nesting {...} comment # IF c = "}" THEN rs comment depth -:= 1 FI; out and next char; IF rs comment depth < 1 THEN italic delimiter FI ELIF comment delimiter /= "" THEN # in a CO/COMMENT comment # IF NOT have bold THEN out and next char # haven't reached a bold word # ELSE STRING word = get bold word; # at the start of a bold word # print( ( word ) ); IF word = comment delimiter THEN # reached the end of the comment # italic delimiter; comment delimiter := "" FI FI ELIF c = """" THEN # start of a string or character denotation # out and next char; in string := TRUE ELIF c = "#" THEN # start of a brief comment such as this one # italic delimiter; out and next char; in brief comment := TRUE ELIF c = "{" AND rs style brief comments THEN # nestable brief # italic delimiter; # comment ( ALGOL 68RS and algol68toc ) # out and next char; rs comment depth := 1 ELIF have bold THEN # have a bold word # STRING word = get bold word; IF word /= "CO" AND word /= "COMMENT" THEN print( ( "'''", word, "'''" ) ) # non-comment bold word # ELSE italic delimiter; # start of a bold comment # print( ( word ) ); comment delimiter := word FI ELSE # anything else # out and next char FI OD; IF in string THEN print( ( "**** unterminated string", newline ) ) ELIF in brief comment THEN print( ( "**** unterminated brief comment", newline ) ) ELIF rs comment depth > 0 THEN print( ( "**** unterminated {...} comment", newline ) ) ELIF comment delimiter /= "" THEN print( ( "**** unterminated ", comment delimiter, newline ) ) FI END
ALGOL W
begin comment syntax highlight an Algol W source using Mediawiki formatting the source is read from standard input and written to standard output ; % Algol W strings are limited to 256 characters in length so source lines % % are limited to 256 characters % integer lineWidth, errorCount, lowerA, upperA, linePos, kwMax; integer MAX_TOKEN_LENGTH; string(1) nl; string(256) line; string(1) currChar, commentEnd; string(9) array kw ( 1 :: 64 ); logical inString, inComment; % returns true if currChar is in the inclusive range low to high, false otherwise % logical procedure range( string(1) value low, high ) ; currChar >= low and currChar <= high; procedure nextChar ; % gets the next source character % if linePos = lineWidth then begin currChar := nl; linePos := linePos + 1 end else if linePos > lineWidth then begin readcard( line ); lineWidth := 256; while lineWidth > 1 and line( lineWidth - 1 // 1 ) = " " do lineWidth := lineWidth - 1; linePos := 1; currChar := line( 0 // 1 ) end else begin currChar := line( linePos // 1 ); linePos := linePos + 1 end nextChar ; % returns true if the current character can start an identifier, false otherwise % logical procedure identifierStartChar ; range( "a", "z" ) or range( "A", "Z" ); % returns true if the current character can be pat of an identifier, false otherwise % logical procedure identifierChar ; identifierStartChar or range( "0", "9" ) or currChar = "_"; procedure outAndNextChar ; begin % output currChar and get the next % if currChar = "'" then writeon( "'" ) else if currChar = "&" then writeon( "&" ) else if currChar = "<" then writeon( "<" ) else if currChar = ">" then writeon( ">" ) else if currChar = nl then begin if inComment then writeon( "''" ); write( " " ); if inComment then writeon( "''" ) end else writeon( currChar ); nextChar end outAndNextChar ; procedure identifierOrKeyword ; begin % handle an indentifier or keyword % string(9) word, lWord; integer wLength; % recursive keyword binary search % logical procedure isKeyword ( integer value low, high ) ; if high < low then false else begin integer mid; mid := ( low + high ) div 2; if kw( mid ) > lWord then isKeyword( low, mid - 1 ) else if kw( mid ) = lWord then true else isKeyword( mid + 1, high ) end binarySearchR ; wLength := 0; for chPos := 0 until 8 do begin if identifierChar then begin word( chPos // 1 ) := currChar; lWord( chPos // 1 ) := if range( "A", "Z" ) then code( ( decode( currChar ) - upperA ) + lowerA ) else currChar;; wLength := wLength + 1; nextChar end else begin lWord( chPos // 1 ) := " "; word( chPos // 1 ) := " " end if_identifierChar__ end for_chPos ; if identifierChar then begin % all keywords are <= 9 characters long so this must be an identifier % writeon( word ); while identifierChar do outAndNextChar end else if lWord = "comment" then begin writeon( "''comment" ); commentEnd := ";"; inComment := true end else if isKeyword( 1, kwMax ) then begin writeon( "'''" ); for chPos := 0 until wLength - 1 do writeon( word( chPos // 1 ) ); writeon( "'''" ) end else begin % identifier % for chPos := 0 until wLength - 1 do writeon( word( chPos // 1 ) ) end if_various_words end identifierOrKeyword ; s_w := 0; i_w := 1; % output formarting % MAX_TOKEN_LENGTH := 256; nl := code( 10 ); lowerA := decode( "a" ); upperA := decode( "A" ); % allow the program to continue after reaching end-of-file % ENDFILE := EXCEPTION( false, 1, 0, false, "EOF" ); % ensure the first call to nextChar reads the first line % lineWidth := 256; linePos := lineWidth + 1; currChar := " "; begin % keywords % procedure K ( string(9) value kwStr ) ; begin kwMax := kwMax + 1; kw( kwMax ) := kwStr end K ; kwMax := 0; K("abs");K("algol");K("and");K("array");K("assert");K("begin");K("bits"); K("case");K("complex");K("div");K("do");K("else");K("end");K("false"); K("for");K("fortran");K("go");K("goto");K("if");K("integer");K("is"); K("logical");K("long");K("not");K("null");K("of");K("or"); K("procedure");K("real");K("record");K("reference");K("rem");K("result"); K("shl");K("short");K("shr");K("step");K("string"); K("then");K("to");K("true");K("until");K("value");K("while") end keywords ; inString := inComment := false; outAndNextChar; while not XCPNOTED(ENDFILE) do begin if inString then begin % in a string % inString := currChar not = """"; outAndNextChar; end else if inComment then begin % in a comment % inComment := currChar not = ";" and currChar not = commentEnd; outAndNextChar; if not inComment then writeon( "''" ); end else if identifierStartChar then identifierOrKeyword else if currChar = """" then begin % string literal % outAndNextChar; inString := true end else if currChar = "%" then begin % brief comment % writeon( "''" ); commentEnd := "%"; inComment := true; outAndNextChar end else outAndNextChar end while_not_ar_eof ; if inComment then write( "**** unterminated comment" ) else if inString then write( "**** unterminated string" ) end.
AWK
Parsing of patterns may not be correct in all cases.
# convert an AWK source to wiki format # each line is preceeded by a space, # reserved words are enclosed in ''' and ''' and comments in '' and '' # ', &, < and > are converted to ' & < and > # everything else if output as is # the wiki source is written to stdout BEGIN \ { # reserved word list as in gawk and treating getline as reserved kw = "BEGIN/BEGINFILE/END/ENDFILE/" \ "break/case/continue/default/delete/do/while/else/" \ "exit/for/in/function/func/if/next/nextfile/switch/" \ "getline"; n = split( kw, reservedWords, "/" ); for( w = 1; w <= n; w ++ ) { reserved[ reservedWords[ w ] ] = w; } } # BEGIN { printf( " " ); line = $0; gsub( /&/, "\\&", line ); gsub( /</, "\\<", line ); gsub( />/, "\\>", line ); gsub( /'/, "\\'", line ); if( line != "" ) { c = ""; nextChar(); do { if ( c == "#" ) { # comment printf( "''#%s''", line ); c = ""; } else if( c == "\"" ) { # string literal do { if( c == "\\" ) { outAndNextChar(); } outAndNextChar(); } while( c != "\"" && c != "" ); if( c != "\"" ) { printf( "**** Unterminated string\n" ); } else { nextChar(); } printf( "\"" ); } else if( c == "/" && lastC !~ /[A-Za-z0-9_.]/ ) { # pattern bracketDepth = 0; outAndNextChar(); while( c != "" && ( c != "/" || bracketDepth > 0 ) ) { if( c == "\\" || c == "[" ) { if( c == "[" ) { bracketDepth ++; } outAndNextChar(); } else if( c == "]" ) { bracketDepth --; } outAndNextChar(); } if( c != "/" ) { printf( "**** Unterminated pattern\n" ); } else { nextChar(); } printf( "/" ); } else if( c ~ /[A-Za-z]/ ) { # have a reserved word or identifier word = ""; do { word = word c; nextChar(); } while( c ~ /[A-Za-z0-9_]/ ); if( word in reserved ) { word = "'''" word "'''"; } printf( "%s", word ); } else { # something else outAndNextChar(); } } while( c != "" ); } printf( "\n" ); } function outAndNextChar() { printf( "%s", c ); nextChar(); } function nextChar() { if( c != " " ) { # the last character wasn't a space, save it so we can recognise patterns lastC = c; } if( line == "" ) { # at end of line lastC = c = ""; } else { # not end of line c = substr( line, 1, 1 ); line = substr( line, 2 ); } } # nextChar
ed
The output has some inexactness due to primitiveness of regex, but that's probably the best one can get to with simple ed script (EDIT: RosettaCode seems to use a similarly simplistic formatting, highlighting even escaped slashes, so the script is not that bad it seems). Commands are highlighted with bold, regex patterns with italics.
# by Artyom Bologov - with corrections 27/07/2024 H g/&/s//\&/g g/'/s//\'/g g/</s//\</g g/>/s//\>/g g/^\([acdeEfgGhHijklmnpPqQrtuvVwx=yz]\)/s//'''\1'''/ g/\/\([^/]*\)\//s//\/''\1''\// g/^/s/^/ / ,p Q
FreeBASIC
' Syntax highlight a FreeBASIC source with Mediawiki markup ' ' each line is preceeded by a space, ' keywords are enclosed in ''' and ''' and comments in '' and '' ' ', &, < and > are converted to ' & < and > ' everything else if output as is ' the source is read from standard input and written to standard output ' ' /'...'/ comments are not handled Dim As Const String italic = "''", bold = "'''" ' markup for italic and bold Dim Shared As String kw( 1 To 150 ) ' Keyword table Dim Shared As Integer kwMax = 0 ' Number of keywords ' adds word to the table of keywords Sub addKw( ByVal word As String ) kwMax += 1 kw( kwMax ) = word End Sub ' (incomplete) list of keywords - must be added in sorted order addKw( "ABSTRACT" ): addKw( "ACCESS" ): addKw( "ALIAS" ): addKw( "AND" ) addKw( "ANDALSO" ): addKw( "ANY" ): addKw( "APPEND" ): addKw( "AS" ) addKw( "ASM" ): addKw( "BASE" ): addKw( "BINARY" ): addKw( "BOOLEAN" ) addKw( "BYREF" ): addKw( "BYTE" ): addKw( "BYVAL" ): addKw( "CALL" ) addKw( "CASE" ): addKw( "CAST" ): addKw( "CLASS" ): addKw( "CLOSE" ) addKw( "COMMON" ): addKw( "CONST" ): addKw( "CONSTRUCTOR" ): addKw( "CONTINUE" ) addKw( "DATA" ): addKw( "DECLARE" ): addKw( "DELETE" ): addKw( "DESTRUCTOR" ) addKw( "DIM" ): addKw( "DO" ): addKw( "DOUBLE" ): addKw( "ELSE" ) addKw( "ELSEIF" ): addKw( "END" ): addKw( "ENDIF" ): addKw( "ENUM" ) addKw( "EQV" ): addKw( "ERASE" ): addKw( "ERROR" ): addKw( "EXIT" ) addKw( "EXPLICIT" ): addKw( "EXPORT" ): addKw( "EXTENDS" ): addKw( "EXTERN" ) addKw( "FOR" ): addKw( "FUNCTION" ): addKw( "GET" ): addKw( "GOSUB" ) addKw( "GOTO" ): addKw( "IF" ): addKw( "IIF" ): addKw( "IMP" ) addKw( "IMPLEMENTS" ): addKw( "IMPORT" ): addKw( "INCLUDE" ): addKw( "INPUT" ) addKw( "INTEGER" ): addKw( "IS" ): addKw( "LET" ): addKw( "LIB" ) addKw( "LINE" ): addKw( "LOCK" ): addKw( "LONG" ): addKw( "LONGINT" ) addKw( "LOOP" ): addKw( "LPRINT" ): addKw( "MOD" ): addKw( "NAMESPACE" ) addKw( "NEW" ): addKw( "NEXT" ): addKw( "NOT" ): addKw( "ON" ) addKw( "OPEN" ): addKw( "OPERATOR" ): addKw( "OPTION" ): addKw( "OR" ) addKw( "ORELSE" ): addKw( "OUTPUT" ): addKw( "OVERLOAD" ): addKw( "PEEK" ) addKw( "POINTER" ): addKw( "POKE" ): addKw( "PRESERVE" ): addKw( "PRINT" ) addKw( "PRIVATE" ): addKw( "PROCPTR" ): addKw( "PROPERTY" ): addKw( "PROTECTED" ) addKw( "PTR" ): addKw( "PUBLIC" ): addKw( "PUT" ): addKw( "RANDOM" ) addKw( "READ" ): addKw( "REDIM" ): addKw( "REM" ): addKw( "RESTORE" ) addKw( "RESUME" ): addKw( "RETURN" ): addKw( "SCOPE" ): addKw( "SEEK" ) addKw( "SELECT" ): addKw( "SHARED" ): addKw( "SHL" ): addKw( "SHORT" ) addKw( "SHR" ): addKw( "SINGLE" ): addKw( "STATIC" ): addKw( "STEP" ) addKw( "STRING" ): addKw( "SUB" ): addKw( "SWAP" ): addKw( "THEN" ) addKw( "TO" ): addKw( "TYPE" ): addKw( "TYPEOF" ): addKw( "UBYTE" ) addKw( "UINTEGER" ): addKw( "ULONG" ): addKw( "ULONGINT" ): addKw( "UNION" ) addKw( "UNSIGNED" ): addKw( "UNTIL" ): addKw( "USHORT" ): addKw( "USING" ) addKw( "VAR" ): addKw( "VIRTUAL" ): addKw( "WEND" ): addKw( "WHILE" ) addKw( "WITH" ): addKw( "WRITE" ): addKw( "WSTRING" ): addKw( "XOR" ) addKw( "ZSTRING" ) ' returns True if word is a keyword, False otherwise Function isKeyword( ByVal word As String, ByVal low As Integer, ByVal high As Integer ) As Boolean If high < low Then Return False Else Dim As Integer m = ( low + high ) \ 2 If kw( m ) > word Then Return isKeyword( word, low, m - 1 ) ElseIf kw( m ) = word Then Return True Else Return isKeyword( word, m + 1, high ) End If End If End Function ' returns ch translated to an XML entity, if necessary. Function tran1( ByVal ch As String ) As String Select Case ch Case "<" Return "<" Case ">" Return ">" Case "&" Return "&" Case "'" Return "'" Case Else Return ch End Select End Function ' read the source from standard input, ' write the highlighted source to standard output Dim As String sLine Open Cons For Input As #1 Do Line Input #1, sLine If Eof( 1 ) Then Exit Do End If Dim As String oLine = " " Dim As Integer sPos = 1 Dim As Const Integer sMax = Len( sLine ) Do While sPos <= sMax Dim As String ch = Mid( sLine, sPos, 1 ) Select Case ch Case "'" ' Comment oLine &= italic Do oLine &= tran1( ch ) sPos += 1 If sPos <= sMax Then ch = Mid( sLine, sPos, 1 ) End If Loop Until sPos > sMax oLine &= italic Case """" ' String Do oLine &= tran1( ch ) sPos += 1 If sPos <= sMax Then ch = Mid( sLine, sPos, 1 ) End If Loop Until sPos > sMax Or ch = """" If ch = """" Then oLine &= """" End If Case Else If ( ch >= "a" And ch <= "z" ) Or ( ch >= "A" And ch <= "Z" ) Then ' Identifier or keyword Dim As String word = "" Do word &= ch sPos += 1 If sPos <= sMax Then ch = Mid( sLine, sPos, 1 ) End If Loop While sPos <= sMax _ And ( ( ch >= "a" And ch <= "z" ) _ Or ( ch >= "A" And ch <= "Z" ) _ Or ( ch >= "0" ANd ch <= "9" ) _ Or ch = "_" _ ) If isKeyword( UCase( word ), LBound( kw ), kwMax ) Then word = bold & word & bold End If oLine &= word sPos -= 1 Else ' Something else oLine &= tran1( ch ) End If End Select sPos += 1 Loop Print oLine Loop
Julia
#= Keywords in Julia. Handles two word reserved keywords. #= Also #= handles nested comments such as this. =# =# =# const KEYWORDS = map( w -> Regex("^" * w * "\\W"), sort( [ raw"abstract\s+type", "baremodule", "begin", "break", "catch", "const", "continue", "do", "else", "elseif", "end", "export", "false", "finally", "for", "function", "global", "if", "import", "in", "isa", "let", "local", "macro", "module", raw"mutable\s+struct", "outer", raw"primitive\s+type", "quote", "return", "struct", "true", "try", "using", "while", "where", ], rev = true, by = length), ) # reorder to largest first then convert to Regex """ Find the #= =# delineated comment, including nested versions """ function nestedcommentlimits(s::AbstractString, startcomment = "#=", stopcomment = "=#") either = Regex("$startcomment|$stopcomment", "sa") depth, startpos, stoppos = 0, 0, 0 for m in eachmatch(either, s) if m.match == startcomment startpos = startpos == 0 ? m.match.offset : startpos depth += 1 else stoppos = max(stoppos + 1, m.match.offset + 2) depth -= 1 end depth <= 0 && break end return startpos, stoppos end """ Given a string, output a string that has been modified by adding surrounding \'\' or \'\'\' bracketing for syntax highlighting of keywords and comments """ function partialhighlight(txt) outtxt = Char[] idx, len = 1, length(txt) while idx <= len if !isvalid(txt, idx) # exclude internal positions of multibyte Char idx += 1 continue end c = txt[idx] if c == '\\' # escape the next char, send as is push!(outtxt, c, txt[idx+1]) idx += 2 elseif c == '\"' # quotation start if idx < len - 2 && c == txt[idx+1] == txt[idx+2] # """ quotes """ qlen = findfirst(r"(?<!\\)\"\"\""sa, txt[idx+3:end]) qlen == nothing && error("error with terminator of quote at $idx") append!(outtxt, collect(replace(txt[idx:idx+qlen.stop+2], "\n" => "\n "))) idx += qlen.stop + 3 else # " quote " qlen = findfirst(r"(?<!\\)\"", txt[idx+1:end]) qlen == nothing && error("error with terminator of quote at $idx") append!(outtxt, collect(replace(txt[idx:idx+qlen.stop+1], "\n" => "\n "))) idx += qlen.stop + 2 end elseif c == '#' && txt[max(1, idx - 1)] != ''' # start comment if idx < len && txt[idx+1] == '=' #= comment =# start, stop = nestedcommentlimits(txt[idx:end]) s = replace(txt[idx:idx+stop-1], "\n" => "\n ") append!(outtxt, collect("\'\'$s\'\'")) idx += stop else # found a line comment, like this comment newlinepos = something(findfirst(==('\n'), txt[idx+1:end]), len - idx) append!(outtxt, collect("\'\'$(txt[idx:idx+newlinepos-1])\'\'")) idx += newlinepos end elseif c ∈ 'a':'z' # lowercase char so check for keyword match for (j, reg) in enumerate(KEYWORDS) m = match(reg, txt[idx:end]) if m != nothing wlen = m.match.ncodeunits - 2 append!(outtxt, collect("\'\'\'$(txt[idx:idx+wlen])\'\'\'")) idx += wlen + 1 break elseif j == lastindex(KEYWORDS) # no keyword found, send char to output push!(outtxt, c) idx += 1 end end elseif c in [''', '&', '<', '>'] # \x26 is char & for HTML entity translation s = c == ''' ? "\x26apos;" : c == '&' ? "\x26amp;" : c == '<' ? "\x26lt;" : "\x26gt;" append!(outtxt, collect(s)) idx += 1 else # nothing special found, so pass char to output and increment index into input push!(outtxt, c) idx += 1 end outtxt[end] == '\n' && push!(outtxt, ' ') end return String(outtxt) end println(partialhighlight(read(PROGRAM_FILE, String)))
Oberon-07
The pathname of the source to process should be specified on the command line. The formatted source is written to standard output.
Tested with Oberonc (Oberon-07 compiler for the JVM).
(* ToWiki: format an Oberon-07 source using MediaWiki markup *) (* the source is read from the file specified on the command line *) (* the output is written to standard output *) MODULE ToWiki; IMPORT Files, Out; CONST cr = 0AX; (* carriage-return character *) nl = 0DX; (* newline character *) quote = 22X; (* quote character *) maxRwLength = 10; (* maximum length of a reservced word *) maxRw = 36; (* maximum number of reserved words *) VAR sourcePath : ARRAY 256 OF CHAR; sourceFile : Files.File; ch : CHAR; ioStatus : INTEGER; inString, inComment : BOOLEAN; rwCount : INTEGER; rw : ARRAY maxRw OF ARRAY maxRwLength + 2 OF CHAR; PROCEDURE InitReservedWords; PROCEDURE AddRw( name : ARRAY OF CHAR ); BEGIN rw[ rwCount ] := name; INC( rwCount ) END AddRw; BEGIN (* includes "DEFINITION" - an oberonc extension *) rwCount := 0; AddRw("ARRAY");AddRw("BEGIN");AddRw("BY");AddRw("CASE");AddRw("CONST"); AddRw("DEFINITION");AddRw("DIV");AddRw("DO");AddRw("ELSE"); AddRw("ELSIF");AddRw("END");AddRw("FALSE");AddRw("FOR");AddRw("IF"); AddRw("IMPORT");AddRw("IN");AddRw("IS");AddRw("MOD");AddRw("MODULE"); AddRw("NIL");AddRw("OF");AddRw("OR");AddRw("POINTER"); AddRw("PROCEDURE");AddRw("RECORD");AddRw("REPEAT");AddRw("RETURN"); AddRw("THEN");AddRw("TO");AddRw("TRUE");AddRw("TYPE");AddRw("UNTIL"); AddRw("VAR");AddRw("WHILE") END InitReservedWords; PROCEDURE IsReservedWord( word : ARRAY OF CHAR ) : BOOLEAN; PROCEDURE SearchReservedWords( word : ARRAY OF CHAR ; low, high : INTEGER ) : BOOLEAN; VAR result : BOOLEAN; mid : INTEGER; BEGIN result := FALSE; IF high >= low THEN mid := ( low + high ) DIV 2; IF rw[ mid ] > word THEN result := SearchReservedWords( word, low, mid - 1 ) ELSIF rw[ mid ] = word THEN result := TRUE ELSE result := SearchReservedWords( word, mid + 1, high ) END END RETURN result END SearchReservedWords ; RETURN SearchReservedWords( word, 0, rwCount - 1 ) END IsReservedWord; PROCEDURE HaveLowerCase() : BOOLEAN; BEGIN RETURN ( ch >= "a" ) & ( ch <= "z" ) END HaveLowerCase; PROCEDURE HaveUpperCase() : BOOLEAN; BEGIN RETURN ( ch >= "A" ) & ( ch <= "Z" ) END HaveUpperCase; PROCEDURE HaveDigit() : BOOLEAN; BEGIN RETURN ( ch >= "0" ) & ( ch <= "9" ) END HaveDigit; PROCEDURE MarkComment; BEGIN Out.String( "''" ) END MarkComment; PROCEDURE MarkReservedWord; BEGIN Out.String( "'''" ) END MarkReservedWord; PROCEDURE NextChar; BEGIN ch := Files.ReadChar( sourceFile ); ioStatus := Files.Status( sourceFile ) END NextChar; PROCEDURE OutChar( c : CHAR ); BEGIN IF c = nl THEN (* newline *) IF inComment THEN MarkComment END; Out.Ln; Out.Char( " " ); IF inComment THEN MarkComment END ELSIF c = cr THEN (* carriage-return - ignore *) ELSIF c = "&" THEN Out.String( "&" ) ELSIF c = "<" THEN Out.String( "<" ) ELSIF c = ">" THEN Out.String( ">" ) ELSIF c = "'" THEN Out.String( "'" ) ELSE Out.Char( c ) END END OutChar; PROCEDURE OutAndNextChar; BEGIN OutChar( ch ); NextChar END OutAndNextChar; PROCEDURE IdentifierOrReservedWord; VAR word : ARRAY maxRwLength + 3 OF CHAR; wLength : INTEGER; BEGIN (* note: all keywords are at most 10 characters in length *) wLength := 0; WHILE ( HaveUpperCase() OR HaveLowerCase() OR HaveDigit() ) & ( wLength <= ( maxRwLength + 1 ) ) DO word[ wLength ] := ch; INC( wLength ); NextChar END; word[ wLength ] := 0X; (* null-terminate *) IF wLength > maxRwLength THEN (* to long to be a reserved word *) Out.String( word ); WHILE HaveUpperCase() OR HaveLowerCase() OR HaveDigit() DO OutAndNextChar END ELSIF IsReservedWord( word ) THEN (* have a reserved word *) MarkReservedWord; Out.String( word ); MarkReservedWord; ELSE (* identifier *) Out.String( word ) END END IdentifierOrReservedWord; PROCEDURE ProcessSourceChar; BEGIN IF inString THEN (* currently in a string *) inString := ch # quote; OutAndNextChar ELSIF inComment THEN (* currently in a comment *) IF ch # "*" THEN OutAndNextChar ELSE WHILE ch = "*" DO OutAndNextChar END; inComment := ch # ")"; IF ~ inComment THEN OutAndNextChar; MarkComment END END ELSIF ch = quote THEN (* start of a string *) inString := TRUE; OutAndNextChar ELSIF ch = "(" THEN (* comment starting or "just" "(" *) NextChar; inComment := ch = "*"; IF ~ inComment THEN OutChar( "(" ) ELSE MarkComment; OutChar( "(" ); OutAndNextChar END ELSIF HaveLowerCase() THEN (* identifier *) (* All reserved words are in upper case so this must be an *) (* identifier *) WHILE HaveLowerCase() OR HaveUpperCase() OR HaveDigit() DO OutAndNextChar END ELSIF HaveUpperCase() THEN (* identifier or keyword *) IdentifierOrReservedWord ELSE (* anything else *) OutAndNextChar END END ProcessSourceChar; PROCEDURE Main; (* have to have a Main so ARGNUM and ARGS work *) VAR argCount : INTEGER; argText : ARRAY 32 OF CHAR; BEGIN argCount := ARGNUM(); IF argCount < 1 THEN Out.String( "**** Filename expected on the command line" ); Out.Ln ELSIF argCount > 1 THEN ARGS( 1, argText ); Out.String( "**** Unexpected text on the command line: " ); Out.String( argText ); Out.Ln ELSE (* have a source file name *) ARGS( 0, sourcePath ); sourceFile := Files.Open( sourcePath ); IF sourceFile = NIL THEN Out.String( "**** Unable to open: " ); Out.String( sourcePath ); Out.Ln ELSE (* file opened OK *) InitReservedWords; inString := FALSE; inComment := FALSE; NextChar; OutChar( " " ); WHILE ioStatus = Files.OK DO ProcessSourceChar END; Out.Ln; IF ioStatus # Files.EOF THEN Out.String( "**** I/O error on " ); Out.String( sourcePath ); Out.Ln ELSIF inString THEN Out.String( "**** Unterminated string" ); Out.Ln ELSIF inComment THEN Out.String( "**** Unterminated comment" ); Out.Ln END; Files.Close( sourceFile ) END END END Main; END ToWiki.
Phix
Note the utility I use for this on a day-to-day basis (pwa/p2js.exw/<Ctrl M>) must be easily over 50,000 lines of code by now...
The following is deliberately the simplest possible thing that gets the job done, and there are of course 1,001 things missing:
No support for [multiline] shebangs, C-style comments, nested block comments, or (as noted) Eu-compatible block comments; and keywords c/should easily be several hundred entries long, and tested/constructed using A-Z and 0-9, ...
-- -- demo\rosetta\syntax_highlight.exw -- ================================= -- string pgm = substitute(get_text(command_line()[$]),"\r\n","\n") -- or(/for javascript compatibility) specify constant pgm = """...""" constant qqq = `""`&`"`, /* (split to assist with permitting ^^^) */ keywords = {`and`,`assert`,`bool`,`command_line`,`constant`,`do`,`else`,`elsif`,`end`, `find`,`for`,`function`,`get_text`,`if`,`iff`,`in`,`integer`,`length`,`match`,`not`, `procedure`,`puts`,`return`,`sequence`,`string`,`substitute`,`then`,`wait_key`,`while`}, htmlify = {"'&<>",{`apos`,`amp`,`lt`,`gt`}} integer i = 1, l = length(pgm), word_start = 0 string out = " " procedure spacenl(sequence s) for ch in s do integer k = find(ch,htmlify[1]) if k then ch = '&' & htmlify[2][k] & ';' end if out &= ch if ch='\n' then out &= ' ' end if end for end procedure function do_string(integer i, ni, l, string stype) assert(ni>0,"%d quoted string not closed",{stype}) ni += l spacenl(pgm[i..ni]) return ni end function while i<=l do integer ch = pgm[i] if (ch>='a' and ch<='z') or ch='_' then if not word_start then word_start := i end if else if word_start then string one_word = pgm[word_start..i-1] bool is_key = find(one_word,keywords) if is_key then out &= `'''` end if out &= one_word if is_key then out &= `'''` end if word_start = 0 end if if ch='-' and i<l and pgm[i+1]='-' then -- nb: does not handle --/* style comments integer line_comment = i while i<l and pgm[i+1]!='\n' do i += 1 end while out &= `''` spacenl(pgm[line_comment..i]) out &= `''` elsif ch='/' and i<l and pgm[i+1]='*' then -- nb: does not handle nested block comments integer block_comment = i i = match(`*/`,pgm,i+2)+1 assert(i>1,"missing closing block comment") out &= `''` spacenl(pgm[block_comment..i]) out &= `''` elsif ch='"' then if i+1<l and pgm[i..i+2]=qqq then i = do_string(i,match(qqq,pgm,i+3),2,"triple") else i = do_string(i,find('"',pgm,i+1),0,"double") end if elsif find(ch,"`'") then string stype = iff(ch='`'?"backtick":"single") i = do_string(i,find(ch,pgm,i+1),0,stype) else spacenl({ch}) end if end if i += 1 end while puts(1,out) {} = wait_key()
PL/M
... under CP/M (or an emulator)
Note that PL/M doesn't have in-built I/O or standard libraries, hence the need to define the various BDOS system calls.
As CP/M doesn't have redirection, the source file and output file names must be specified on the command line, e.g. if the source is in D:SYNTAX.PLM and the desired output file is D:SYNTAX.OUT and the program is compiled to D:SYNTAX.COM, then the command:
D:SYNTAX D:SYNTAX.PLM D:SYNTAX.OUT
will create SYNTAX.OUT as a copy of SYNTAX.PLM with the markup for the highlighting. Note the output file must not exist before running the program.
The output is also echoed to the console.
100H: /* SYNTAX HIGHLIGHT A PL/M SOURCE USING MEDIAWIKI MARKUP */ DECLARE FALSE LITERALLY '0', TRUE LITERALLY '0FFH'; DECLARE NL$CHAR LITERALLY '0AH'; /* NEWLINE: CHAR 10 */ DECLARE CR$CHAR LITERALLY '0DH'; /* CARRIAGE RETURN, CHAR 13 */ DECLARE EOF$CHAR LITERALLY '26'; /* EOF: CTRL-Z */ DECLARE AMP LITERALLY '026H'; /* AMPERSAND */ DECLARE LCA LITERALLY '061H'; /* LOWER CASE 'A' */ DECLARE LCG LITERALLY '067H'; /* LOWER CASE 'G' */ DECLARE LCL LITERALLY '06CH'; /* LOWER CASE 'L' */ DECLARE LCM LITERALLY '06DH'; /* LOWER CASE 'M' */ DECLARE LCO LITERALLY '06FH'; /* LOWER CASE 'O' */ DECLARE LCP LITERALLY '070H'; /* LOWER CASE 'P' */ DECLARE LCS LITERALLY '073H'; /* LOWER CASE 'S' */ DECLARE LCT LITERALLY '074H'; /* LOWER CASE 'T' */ /* CP/M BDOS SYSTEM CALL, RETURNS A VALUE */ BDOS: PROCEDURE( FN, ARG )BYTE; DECLARE FN BYTE, ARG ADDRESS; GOTO 5; END; /* CP/M BDOS SYSTEM CALL, NO RETURN VALUE */ BDOS$P: PROCEDURE( FN, ARG ); DECLARE FN BYTE, ARG ADDRESS; GOTO 5; END; EXIT: PROCEDURE; CALL BDOS$P( 0, 0 ); END; /* CP/M SYSTEM RESET */ PR$CHAR: PROCEDURE( C ); DECLARE C BYTE; CALL BDOS$P( 2, C ); END; PR$STRING: PROCEDURE( S ); DECLARE S ADDRESS; CALL BDOS$P( 9, S ); END; PR$NL: PROCEDURE; CALL PR$STRING( .( 0DH, NL$CHAR, '$' ) ); END; FL$EXISTS: PROCEDURE( FCB )BYTE; /* RETURNS TRUE IF THE FILE NAMED IN THE */ DECLARE FCB ADDRESS; /* FCB EXISTS */ RETURN ( BDOS( 17, FCB ) < 4 ); END FL$EXISTS ; FL$OPEN: PROCEDURE( FCB )BYTE; /* OPEN THE FILE WITH THE SPECIFIED FCB */ DECLARE FCB ADDRESS; RETURN ( BDOS( 15, FCB ) < 4 ); END FL$OPEN; FL$MAKE: PROCEDURE( FCB )BYTE; /* CREATE AND OPEN THE FILE WITH THE */ DECLARE FCB ADDRESS; /* SPECIFIED FCB */ RETURN ( BDOS( 22, FCB ) < 4 ); END FL$MAKE; FL$READ: PROCEDURE( FCB )BYTE; /* READ THE NEXT RECORD FROM FCB */ DECLARE FCB ADDRESS; RETURN ( BDOS( 20, FCB ) = 0 ); END FL$READ; FL$WRITE: PROCEDURE( FCB )BYTE; /* WRITE A RECORD TO FCB */ DECLARE FCB ADDRESS; RETURN ( BDOS( 21, FCB ) = 0 ); END FL$WRITE; FL$CLOSE: PROCEDURE( FCB )BYTE; /* CLOSE THE FILE WITH THE SPECIFIED FCB */ DECLARE FCB ADDRESS; RETURN ( BDOS( 16, FCB ) < 4 ); END FL$CLOSE; DMA$SET: PROCEDURE( DMA ); /* SET THE DMA BUFFER ADDRESS FOR I/O */ DECLARE DMA ADDRESS; CALL BDOS$P( 26, DMA ); END DMA$SET; /* I/O USES FILE CONTROL BLOCKS CONTAINING THE FILE-NAME, POSITION, ETC. */ /* WHEN THE PROGRAM IS RUN, THE CCP WILL FIRST PARSE THE COMMAND LINE AND */ /* PUT THE FIRST PARAMETER IN FCB1, THE SECOND PARAMETER IN FCB2 */ /* BUT FCB2 OVERLAYS THE END OF FCB1 AND THE DMA BUFFER OVERLAYS THE END */ /* OF FCB2 */ DECLARE FCB$SIZE LITERALLY '36'; /* SIZE OF A FCB */ DECLARE FCB1 LITERALLY '5CH'; /* ADDRESS OF FIRST FCB */ DECLARE FCB2 LITERALLY '6CH'; /* ADDRESS OF SECOND FCB */ DECLARE DMA$BUFFER LITERALLY '80H'; /* DEFAULT DMA BUFFER ADDRESS */ DECLARE DMA$SIZE LITERALLY '128'; /* SIZE OF THE DMA BUFFER */ INIT$FCB: PROCEDURE( FCB ); /* INITIALISE A FILE-CONTROL-BLOCK */ DECLARE FCB ADDRESS; DECLARE F$PTR ADDRESS; DECLARE F BASED F$PTR BYTE, P BYTE; F$PTR = FCB; F = 0; /* DEFAULT DRIVE */ DO F$PTR = FCB + 1 TO FCB + 11; /* NO NAME */ F = ' '; END; DO F$PTR = FCB + 12 TO FCB + ( FCB$SIZE - 1 ); /* OTHER FIELDS */ F = 0; END; END INIT$FCB; MOVE$FCB: PROCEDURE( FROM$FCB, TO$FCB ); /* MOVE THE CONTENTS OF AN FCB */ DECLARE ( FROM$FCB, TO$FCB ) ADDRESS; DECLARE ( F$PTR, T$PTR ) ADDRESS; DECLARE F BASED F$PTR BYTE, T BASED T$PTR BYTE, P BYTE; CALL INIT$FCB( TO$FCB ); F$PTR = FROM$FCB; T$PTR = TO$FCB; DO P = 0 TO 11; /* COPY DRIVE, FILENAME AND EXTENSION */ T = F; F$PTR = F$PTR + 1; T$PTR = T$PTR + 1; END; END MOVE$FCB; SHOW$FCB: PROCEDURE( FCB ); /* SHOW THE CONTENTS OF AN FCB */ DECLARE FCB ADDRESS; DECLARE F$PTR ADDRESS; DECLARE F BASED F$PTR BYTE, P BYTE; F$PTR = FCB; DO P = 0 TO 11; /* DRIVE, FILENAME AND EXTENSION */ IF P = 9 THEN CALL PR$CHAR( '.' ); IF P = 1 THEN CALL PR$CHAR( ':' ); CALL PR$CHAR( F ); F$PTR = F$PTR + 1; END; END SHOW$FCB; DECLARE F$PTR ADDRESS, F$CHAR BASED F$PTR BYTE; DECLARE W$PTR ADDRESS, W$CHAR BASED W$PTR BYTE; DECLARE FCB$OUT$DATA ( FCB$SIZE )BYTE; DECLARE OUT$DMA ( DMA$SIZE )BYTE; DECLARE OUT$BUFFER LITERALLY '.OUT$DMA'; DECLARE FCB$IN LITERALLY 'FCB1'; DECLARE FCB$OUT LITERALLY '.FCB$OUT$DATA'; DECLARE K LITERALLY 'CALL ADDKW'; DECLARE KW ( 34 )ADDRESS; DECLARE KW$MAX BYTE; KW$MAX = -1; ADDKW: PROCEDURE( ADDR ); /* ADDS A KEYWORD TO KW */ DECLARE ADDR ADDRESS; KW( KW$MAX := KW$MAX + 1 ) = ADDR; END ADDKW ; K(.'ADDRESS$');K(.'AND$');K(.'BASED$');K(.'BY$');K(.'BYTE$');K(.'CALL$'); K(.'CASE$');K(.'DATA$');K(.'DECLARE$');K(.'DISABLE$');K(.'DO$');K(.'ELSE$'); K(.'ENABLE$');K(.'END$');K(.'EOF$');K(.'GO$');K(.'GOTO$');K(.'HALT$'); K(.'IF$');K(.'INITIAL$');K(.'INTERRUPT$');K(.'LABEL$');K(.'LITERALLY$'); K(.'MINUS$');K(.'MOD$');K(.'NOT$');K(.'OR$');K(.'PLUS$');K(.'PROCEDURE$'); K(.'RETURN$');K(.'THEN$');K(.'TO$');K(.'WHILE$');K(.'XOR$'); /* MOVE THE SECOND FCB TO A NEW PLACE SO IT ISN'T OVERWRITTEN BY FCB1 */ CALL MOVE$FCB( FCB2, FCB$OUT ); /* CLEAR THE PARTS OF FCB1 OVERLAYED BY FCB2 */ DO F$PTR = FCB1 + 12 TO FCB1 + ( FCB$SIZE - 1 ); F$CHAR = 0; END; STR$EQUAL: PROCEDURE( S1, S2 )BYTE; /* RETURN TRUE IF S1 = S2 */ DECLARE ( S1, S2 ) ADDRESS; DECLARE ( S1$PTR, S2$PTR ) ADDRESS; DECLARE C1 BASED S1$PTR BYTE, C2 BASED S2$PTR BYTE, SAME BYTE; S1$PTR = S1; S2$PTR = S2; DO WHILE ( SAME := C1 = C2 ) AND C1 <> '$' AND C2 <> '$'; S1$PTR = S1$PTR + 1; S2$PTR = S2$PTR + 1; END; RETURN SAME; END STR$EQUAL ; IS$WORD$CHAR: PROCEDURE( CH )BYTE; /* RETURN TRUE IF CH IS PART OF A WORD */ DECLARE CH BYTE; RETURN ( CH >= 'A' AND CH <= 'Z' ) OR CH = '$' OR ( CH >= '0' AND CH <= '9' ); END IS$WORD$CHAR ; IF NOT FL$EXISTS( FCB$IN ) THEN DO; CALL SHOW$FCB( FCB$IN ); CALL PR$STRING( .': INPUT FILE NOT FOUND$' );CALL PR$NL; END; ELSE IF FL$EXISTS( FCB$OUT ) THEN DO; CALL SHOW$FCB( FCB$OUT ); CALL PR$STRING( .': OUTPUT FILE ALREADY EXISTS$' );CALL PR$NL; END; ELSE IF NOT FL$OPEN( FCB$IN ) THEN DO; CALL PR$STRING( .'UNABLE TO OPEN THE INPUT FILE$' );CALL PR$NL; END; ELSE IF NOT FL$MAKE( FCB$OUT ) THEN DO; CALL PR$STRING( .'UNABLE TO OPEN THE OUTPUT FILE$' );CALL PR$NL; IF NOT FL$CLOSE( FCB$IN ) THEN DO; CALL PR$STRING( .'UNABLE TO CLOSE THE INPUT FILE$' ); CALL PR$NL; END; END; ELSE DO; /* FILES OPENED OK - ATTEMPT TO FORMAT THE SOURCE */ DECLARE ( GOT$RCD, IS$HEADING ) BYTE, ( DMA$END, OUT$END ) ADDRESS; DECLARE IN$STRING BYTE, COMMENT$STATE ADDRESS, GOT$NEXT BYTE; IN$CHAR: PROCEDURE; F$PTR = F$PTR + 1; IF F$PTR > DMA$END THEN DO; /* END OF BUFFER */ GOT$RCD = FL$READ( FCB$IN ); /* GET THE NEXT RECORDD */ IF NOT GOT$RCD THEN F$CHAR = EOF$CHAR; F$PTR = DMA$BUFFER; END; END IN$CHAR ; OUT$CHAR: PROCEDURE( CH ); /* OUTPUT A CHARECTER TO THE OUTPOUT FILE */ DECLARE CH BYTE; IF CH <> EOF$CHAR THEN CALL PR$CHAR( CH ); W$CHAR = CH; W$PTR = W$PTR + 1; IF W$PTR > OUT$END OR CH = EOF$CHAR THEN DO; /* THE OUTPUT BUFFER IS FULL OR WE ARE WRITTING EOF */ IF CH = EOF$CHAR THEN DO; /* EOF - FILL THE BUFFER WITH NULS */ DO WHILE W$PTR <= OUT$END; W$CHAR = 0; W$PTR = W$PTR + 1; END; END; CALL DMA$SET( OUT$BUFFER ); /* SWITCH DMA TO THE OUTOUT BUFFER */ IF NOT FL$WRITE( FCB$OUT ) THEN DO; /* I/O ERROR */ CALL PR$STRING( .'I/O ERROR ON WRITING $' ); CALL SHOW$FCB( FCB$OUT ); CALL PR$NL; CALL EXIT; END; CALL DMA$SET( DMA$BUFFER ); /* RESET DMA TO THE DEFAULT BUFFER */ W$PTR = OUT$BUFFER; END; END OUT$CHAR; OUT$STRING: PROCEDURE( STR ); /* OUTPUT A STRING */ DECLARE STR ADDRESS; DECLARE S$PTR ADDRESS; DECLARE S$CHAR BASED S$PTR BYTE; S$PTR = STR; DO WHILE S$CHAR <> '$'; CALL OUT$CHAR( S$CHAR ); S$PTR = S$PTR + 1; END; END OUT$STRING; DMA$END = DMA$BUFFER + ( DMA$SIZE - 1 ); OUT$END = OUT$BUFFER + ( DMA$SIZE - 1 ); GOT$RCD = FL$READ( FCB$IN ); /* GET THE FIRST RECORD */ F$PTR = DMA$BUFFER; W$PTR = OUT$BUFFER; IN$STRING = FALSE; GOT$NEXT = FALSE; COMMENT$STATE = 0; CALL OUT$CHAR( ' ' ); DO WHILE GOT$RCD; IF F$CHAR = CR$CHAR THEN DO; /* CARRIAGE RETURN */ IF COMMENT$STATE > 1 THEN DO; COMMENT$STATE = 2; CALL OUT$STRING( .'''''$' ); END; CALL OUT$CHAR( F$CHAR ); END; ELSE IF F$CHAR = NL$CHAR THEN DO; CALL OUT$CHAR( F$CHAR ); CALL OUT$CHAR( ' ' ); IF COMMENT$STATE > 1 THEN CALL OUT$STRING( .'''''$' ); END; ELSE IF F$CHAR = AMP THEN DO; CALL OUT$STRING( .( AMP, LCA, LCM, LCP, ';$' ) ); END; ELSE IF F$CHAR = '''' THEN DO; CALL OUT$STRING( .( AMP, LCA, LCP, LCO, LCS, ';$' ) ); IN$STRING = COMMENT$STATE = 0 AND NOT IN$STRING; END; ELSE IF F$CHAR = '<' THEN DO; CALL OUT$STRING( .( AMP, LCL, LCT, ';$' ) ); END; ELSE IF F$CHAR = '>' THEN DO; CALL OUT$STRING( .( AMP, LCG, LCT, ';$' ) ); END; ELSE IF IN$STRING THEN CALL OUT$CHAR( F$CHAR ); ELSE IF COMMENT$STATE = 1 THEN DO; /* HAVE A CHARACTER AFTER / */ IF F$CHAR = '*' THEN DO; COMMENT$STATE = 2; CALL OUT$STRING( .'''''/*$' ); END; ELSE DO; COMMENT$STATE = 0; CALL OUT$CHAR( '/' ); CALL OUT$CHAR( F$CHAR ); END; END; ELSE IF COMMENT$STATE = 2 THEN DO; /* IN A COMMENT */ IF F$CHAR = '*' THEN COMMENT$STATE = 3; CALL OUT$CHAR( F$CHAR ); END; ELSE IF COMMENT$STATE = 3 THEN DO; /* IN A COMMENT, EXPECTING / */ IF F$CHAR = '/' THEN DO; /* END OF COMMENT */ CALL OUT$STRING( .'/''''$' ); COMMENT$STATE = 0; END; ELSE DO; /* NOT END OF COMMENT */ CALL OUT$CHAR( F$CHAR ); IF F$CHAR <> '*' THEN COMMENT$STATE = 2; END; END; ELSE IF F$CHAR = '/' THEN DO; IF COMMENT$STATE = 0 THEN COMMENT$STATE = 1; ELSE IF COMMENT$STATE = 3 THEN DO; /* END OF COMMENT */ CALL OUT$STRING( .'/''''$' ); COMMENT$STATE = 0; END; ELSE CALL OUT$CHAR( F$CHAR ); END; ELSE IF F$CHAR = EOF$CHAR THEN GOT$RCD = FALSE; /* END OF FILE */ ELSE IF F$CHAR >= 'A' AND F$CHAR <= 'Z' THEN DO; /* WORD */ DECLARE W ( 10 )BYTE, W$POS BYTE, HAS$DOLLAR BYTE; OUT$WORD: PROCEDURE; /* OUTPUT W (WHICH MAY CONTAIN $ */ DECLARE I BYTE; DO I = 0 TO W$POS - 1; CALL OUT$CHAR( W( I ) ); END; END OUT$WORD ; W$POS = 0; HAS$DOLLAR = FALSE; DO WHILE W$POS < 9 AND IS$WORD$CHAR( F$CHAR ); IF F$CHAR = '$' THEN HAS$DOLLAR = TRUE; W( W$POS ) = F$CHAR; W$POS = W$POS + 1; CALL IN$CHAR; END; W( W$POS ) = '$'; IF IS$WORD$CHAR( F$CHAR ) THEN DO; /* WORD IS TOO LONG FOE A */ CALL OUT$WORD; /* KEYWORD */ DO WHILE IS$WORD$CHAR( F$CHAR ); CALL OUT$CHAR( F$CHAR );CALL IN$CHAR; END; END; ELSE IF HAS$DOLLAR THEN DO; /* ASSUME IT ISN'T A KEYWORD */ CALL OUT$WORD; /* I.E., THE PROGRAMMER HASN'T WRITTEN E.G.: */ END; /* RE$TURN X; */ ELSE DO; /* SHORT WORD - COULD BE A KEYWORD */ DECLARE ( IS$KW, KW$POS ) BYTE; IS$KW = FALSE; KW$POS = 0; DO WHILE NOT IS$KW AND KW$POS <= KW$MAX; IS$KW = STR$EQUAL( .W, KW( KW$POS ) ); KW$POS = KW$POS + 1; END; IF IS$KW THEN CALL OUT$STRING( .'''''''$' ); CALL OUT$WORD; IF IS$KW THEN CALL OUT$STRING( .'''''''$' ); END; GOT$NEXT = TRUE; END; ELSE DO; /* HAVE ANOTHER CHARACTER */ CALL OUT$CHAR( F$CHAR ); END; IF NOT GOT$NEXT THEN CALL IN$CHAR; GOT$NEXT = FALSE; END; CALL OUT$CHAR( EOF$CHAR ); /* CLOSE THE FILES */ IF NOT FL$CLOSE( FCB$IN ) THEN DO; CALL PR$STRING( .'UNABLE TO CLOSE THE INPUT FILE$' ); CALL PR$NL; END; IF NOT FL$CLOSE( FCB$OUT ) THEN DO; CALL PR$STRING( .'UNABLE TO CLOSE THE OUTPUT FILE$' ); CALL PR$NL; END; END; CALL EXIT; EOF
Python
This solution builds on lexers available in Pygments by defining a formatter outputting simple MediaWiki markup, and a filter to translate characters to HTML escape sequences. Note that I've taken liberties with said escaping.
"""Syntax highlighting using Mediawiki formatting.""" from html import escape from textwrap import indent from io import StringIO from pygments import highlight from pygments.filter import Filter from pygments.formatter import Formatter from pygments.lexers import get_lexer_by_name from pygments.token import Token class MediaWikiFormatter(Formatter): """Format source code using MediaWiki markup.""" name = "MediaWiki" aliases = ["mediawiki", "wiki"] filenames = [] def __init__(self, **options): super().__init__(**options) self.indent = options.get("indent", " ") self.styles = { Token: ("", ""), Token.Comment: ("''", "''"), Token.Keyword: ("'''", "'''"), Token.String.Doc: ("''", "''"), } def format(self, token_source, outfile): buffer = StringIO() last_val = "" last_type = None for token_type, value in token_source: # Work up the token hierarchy until a style is found. while token_type not in self.styles: token_type = token_type.parent # Group consecutive tokens of the same type. if token_type == last_type: last_val += value else: if last_val: style_begin, style_end = self.styles[last_type] buffer.write(style_begin + last_val + style_end) last_val = value last_type = token_type # Flush remaining values. if last_val: style_begin, style_end = self.styles[last_type] buffer.write(style_begin + last_val + style_end) # Write indented lines to the output file. outfile.write( indent( buffer.getvalue(), self.indent, lambda _: True, ) ) class HTMLEscapeFilter(Filter): """Convert the characters &, <, > and ' to HTML-safe sequences.""" def __init__(self, **options): super().__init__(**options) def filter(self, _, stream): for ttype, value in stream: yield ttype, escape(value) def main(language_name="python", infile=None): formatter = MediaWikiFormatter() lexer = get_lexer_by_name(language_name) lexer.add_filter(HTMLEscapeFilter()) with open(infile or __file__) as fd: print(highlight(fd.read(), lexer, formatter), end="") if __name__ == "__main__": main()
Wren
Note that, rightly or wrongly, this code would not highlight keywords occurring in interpolated string expressions.
// Convert a Wren source to "wiki" format: // each line is preceded by a space // keywords are enclosed in ''' and ''' and comments in '' and '' // ', &, < and > are converted to ' & < and > // everything else is output as is // The source is read from a file and written to standard output. // The file name should be passed as a command line argument. import "./ioutil" for FileUtil import "os" for Process var keywords = [ "as", "break", "class", "construct", "continue", "else", "false", "for", "foreign", "if", "in", "is", "import", "null", "return", "static", "super", "this", "true", "var", "while" ] var alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_." var highlight = Fn.new { |lines| var inStr = false // within a string literal var inRaw = false // within a raw string literal var inCom = false // within a multi-line comment var level = 0 // nesting level for multi-line comment for (line in lines) { System.write(" ") line = line.replace("&", "&").replace("'", "'") .replace("<", "<").replace(">", ">") var word = "" var chrs = line.toList // convert to list of unicode characters var cc = chrs.count var i = 0 if (inCom) System.write("''") while (i < cc) { var c = chrs[i] if (inCom) { // if inside a multi-line comment if (c == "/" && i < cc-1 && chrs[i+1] == "*") { level = level + 1 System.write("/*") i = i + 1 } else if (c == "*" && i < cc-1 && chrs[i+1] == "/") { level = level - 1 System.write("*/") i = i + 1 if (level == 0) { inCom = false System.write("''") } } else { System.write(c) } } else if (inStr && c == "\\" && i < cc-1 && chrs[i+1] == "\"") { /* escaped double quote in string literal */ System.write("\\\"") i = i + 1 } else if (c == "\"") { // any other double quote if (i > 1 && chrs[i-2] == "\"" && chrs[i-1] == "\"") { inRaw = !inRaw } else if (!inRaw) { inStr = !inStr } System.write("\"") } else if (inStr || inRaw) { // otherwise if within a string just write it System.write(c) } else if (c == "/") { // forward slash if (i < cc-1 && chrs[i+1] == c) { System.write("''" + chrs[i..-1].join() + "''") break } else if (i < cc-1 && chrs[i+1] == "*") { inCom = true level = 1 System.write("''" + "/*") i = i + 1 } else { System.write(c) } } else if (alphabet.contains(c)) { // if eligible, add to current word word = word + c } else if (keywords.contains(word)) { // if it's a keyword, embolden it System.write("'''" + word + "'''" + c) word = "" } else { // otherwise just write the word System.write(word + c) word = "" } i = i + 1 } if (inCom) { System.write("''") } else if (word != "") { if (keywords.contains(word)) { System.write("'''" + word + "'''") } else { System.write(word) } } System.print() } } var args = Process.arguments if (args.count != 1) { /* make sure double quotes and keywords in raw strings are handled properly */ Fiber.abort("""Please pass the file name to be highlighted "as" the only argument.""") } var lines = FileUtil.readLines(args[0]) highlight.call(lines) /* this code should now be saved to /* a file named */ Syntax_highlighting_using_Mediawiki_formatting.wren */
XPL0
Key words in XPL0 are easy to distinguish from variable names because they start with lowercase letters while variables start with uppercase letters. This program highlights its own code properly, but it will not properly highlight all possible cases, such as when key words appear in quoted strings.
proc CharOut(Ch); int Ch; begin case Ch of ^': Text(0, "'"); ^&: Text(0, "&"); ^<: Text(0, "<"); ^>: Text(0, ">") other ChOut(1, Ch); end; int Ch; loop begin ChOut(1, $20); \leading space Ch:= ChIn(1); loop begin while Ch <= $20 do \pass whitespace to output begin if Ch = $1A \EOF\ then return; ChOut(1, Ch); if Ch = $0A \LF\ then quit; Ch:= ChIn(1); end; if Ch = ^\ then \pass comment to output begin Text(0, "''"); \in italics ChOut(1, Ch); Ch:= ChIn(1); while Ch#^\ & Ch#$0A \LF\ do begin CharOut(Ch); Ch:= ChIn(1); end; if Ch = ^\ then ChOut(1, Ch); Text(0, "''"); if Ch = $0A \LF\ then begin ChOut(1, Ch); quit; end; Ch:= ChIn(1); end else if Ch>=^a & Ch<=^z then \pass key words to output begin Text(0, "'''"); \in bold while Ch>=^a & Ch<=^z do begin ChOut(1, Ch); Ch:= ChIn(1); end; Text(0, "'''"); end else begin \pass anything else repeat CharOut(Ch); Ch:= ChIn(1); until Ch <= $20; \until whitespace end; end; end