Compiler/code generator

From Rosetta Code
Task
Compiler/code generator
You are encouraged to solve this task according to the task description, using any language you may know.
Code Generator

A code generator translates the output of the syntax analyzer and/or semantic analyzer into lower level code, either assembly, object, or virtual.

Task[edit]

Take the output of the Syntax analyzer task - which is a flattened Abstract Syntax Tree (AST) - and convert it to virtual machine code, that can be run by the Virtual machine interpreter. The output is in text format, and represents virtual assembly code.

The program should read input from a file and/or stdin, and write output to a file and/or stdout.

Example - given the simple program (below), stored in a file called while.t, create the list of tokens, using one of the Lexical analyzer solutions
lex < while.t > while.lex
Run one of the Syntax analyzer solutions
parse < while.lex > while.ast
while.ast can be input into the code generator.
The following table shows the input to lex, lex output, the AST produced by the parser, and the generated virtual assembly code.
Run as:  lex < while.t | parse | gen
Input to lex Output from lex, input to parse Output from parse Output from gen, input to VM
count = 1;
while (count < 10) {
print("count is: ", count, "\n");
count = count + 1;
}
    1      1   Identifier      count
    1      7   Op_assign
    1      9   Integer              1
    1     10   Semicolon
    2      1   Keyword_while
    2      7   LeftParen
    2      8   Identifier      count
    2     14   Op_less
    2     16   Integer             10
    2     18   RightParen
    2     20   LeftBrace
    3      5   Keyword_print
    3     10   LeftParen
    3     11   String          "count is: "
    3     23   Comma
    3     25   Identifier      count
    3     30   Comma
    3     32   String          "\n"
    3     36   RightParen
    3     37   Semicolon
    4      5   Identifier      count
    4     11   Op_assign
    4     13   Identifier      count
    4     19   Op_add
    4     21   Integer              1
    4     22   Semicolon
    5      1   RightBrace
    6      1   End_of_input
Sequence
Sequence
;
Assign
Identifier    count
Integer       1
While
Less
Identifier    count
Integer       10
Sequence
Sequence
;
Sequence
Sequence
Sequence
;
Prts
String        "count is: "
;
Prti
Identifier    count
;
Prts
String        "\n"
;
Assign
Identifier    count
Add
Identifier    count
Integer       1
Datasize: 1 Strings: 2
"count is: "
"\n"
   0 push  1
   5 store [0]
  10 fetch [0]
  15 push  10
  20 lt
  21 jz     (43) 65
  26 push  0
  31 prts
  32 fetch [0]
  37 prti
  38 push  1
  43 prts
  44 fetch [0]
  49 push  1
  54 add
  55 store [0]
  60 jmp    (-51) 10
  65 halt
Input format

As shown in the table, above, the output from the syntax analyzer is a flattened AST.

In the AST, Identifier, Integer, and String, are terminal nodes, e.g, they do not have child nodes.

Loading this data into an internal parse tree should be as simple as:

 
def load_ast()
line = readline()
# Each line has at least one token
line_list = tokenize the line, respecting double quotes
 
text = line_list[0] # first token is always the node type
 
if text == ";"
return None
 
node_type = text # could convert to internal form if desired
 
# A line with two tokens is a leaf node
# Leaf nodes are: Identifier, Integer String
# The 2nd token is the value
if len(line_list) > 1
return make_leaf(node_type, line_list[1])
 
left = load_ast()
right = load_ast()
return make_node(node_type, left, right)
 
Output format - refer to the table above
  • The first line is the header: Size of data, and number of constant strings.
    • size of data is the number of 32-bit unique variables used. In this example, one variable, count
    • number of constant strings is just that - how many there are
  • After that, the constant strings
  • Finally, the assembly code
Registers
  • sp: the stack pointer - points to the next top of stack. The stack is a 32-bit integer array.
  • pc: the program counter - points to the current instruction to be performed. The code is an array of bytes.
Data

32-bit integers and strings

Instructions

Each instruction is one byte. The following instructions also have a 32-bit integer operand:

fetch [index]

where index is an index into the data array.

store [index]

where index is an index into the data array.

push n

where value is a 32-bit integer that will be pushed onto the stack.

jmp (n) addr

where (n) is a 32-bit integer specifying the distance between the current location and the desired location. addr is an unsigned value of the actual code address.

jz (n) addr

where (n) is a 32-bit integer specifying the distance between the current location and the desired location. addr is an unsigned value of the actual code address.

The following instructions do not have an operand. They perform their operation directly against the stack:

For the following instructions, the operation is performed against the top two entries in the stack:

add
sub
mul
div
mod
lt
gt
le
ge
eq
ne
and
or

For the following instructions, the operation is performed against the top entry in the stack:

neg
not
prtc

Print the word at stack top as a character.

prti

Print the word at stack top as an integer.

prts

Stack top points to an index into the string pool. Print that entry.

halt

Unconditional stop.

Additional examples

Your solution should pass all the test cases above and the additional tests found Here.

Reference

The C and Python versions can be considered reference implementations.

Related Tasks

ALGOL 68[edit]

Based on the Algol W sample. This generates .NET IL assembler code which can be compiled with the .NET ilasm assembler to generate an exe that can be run under Windows (and presumably Mono though I haven't tried that).
Apart from the namespace, class and method blocks surrounding the code, the main differences between IL and the task's assembly code are: no "compare-le", "compare-ge", "compare-ne", "prts", "prtc", "prti" and "not" instructions, symbolic labels are used and symbolic local variable names can be used. Some IL instructions have different names, e.g. "stloc" instead of "store". The "prt*" instructions are handled by calling the relevant System.Out.Print method. The compare and "not" instructions are handled by generating equivalent instruction sequences.
As noted in the code, the generated IL is naive - the sample focuses on simplicity.

# RC Compiler code generator #
COMMENT
this writes a .NET IL assembler source to standard output.
If the output is stored in a file called "rcsample.il",
it could be compiled the command:
ilasm /opt /out:rcsample.exe rcsample.il
(Note ilasm may not be in the PATH by default(
 
Note: The generated IL is *very* naive
COMMENT
 
# parse tree nodes #
MODE NODE = STRUCT( INT type, REF NODE left, right, INT value );
INT nidentifier = 1, nstring = 2, ninteger = 3, nsequence = 4, nif = 5, nprtc = 6, nprts = 7
, nprti = 8, nwhile = 9, nassign = 10, nnegate = 11, nnot = 12, nmultiply = 13, ndivide = 14
, nmod = 15, nadd = 16, nsubtract = 17, nless = 18, nlessequal = 19, ngreater = 20
, ngreaterequal = 21, nequal = 22, nnotequal = 23, nand = 24, nor = 25
;
# op codes #
INT ofetch = 1, ostore = 2, opush = 3, oadd = 4, osub = 5, omul = 6, odiv = 7, omod = 8
, olt = 9, ogt = 10, ole = 11, oge = 12, oeq = 13, one = 14, oand = 15, oor = 16
, oneg = 17, onot = 18, ojmp = 19, ojz = 20, oprtc = 21, oprts = 22, oprti = 23, opushstr = 24
;
[]INT ndop
= ( -1 , -1 , -1 , -1 , -1 , -1 , -1
, -1 , -1 , -1 , oneg , -1 , omul , odiv
, omod , oadd , osub , olt , -1 , ogt
, -1 , oeq , -1 , oand , oor
) ;
[]STRING ndname
= ( "Identifier" , "String" , "Integer" , "Sequence" , "If" , "Prtc" , "Prts"
, "Prti" , "While" , "Assign" , "Negate" , "Not" , "Multiply" , "Divide"
, "Mod" , "Add" , "Subtract" , "Less" , "LessEqual" , "Greater"
, "GreaterEqual" , "Equal" , "NotEqual" , "And" , "Or"
) ;
[]STRING opname
= ( "ldloc ", "stloc ", "ldc.i4 ", "add ", "sub ", "mul ", "div ", "rem "
, "clt ", "cgt ", "?le ", "?ge ", "ceq ", "?ne ", "and ", "or "
, "neg ", "?not ", "br ", "brfalse", "?prtc ", "?prts ", "?prti ", "ldstr "
) ;
# string and identifier arrays - a hash table might be better... #
INT max string number = 1024;
[ 0 : max string number ]STRING identifiers, strings;
FOR s pos FROM 0 TO max string number DO
identifiers[ s pos ] := "";
strings [ s pos ] := ""
OD;
# label number for label generation #
INT next label number := 0;
# returns the next free label number #
PROC new label = INT: next label number +:= 1;
 
# returns a new node with left and right branches #
PROC op node = ( INT op type, REF NODE left, right )REF NODE: HEAP NODE := NODE( op type, left, right, 0 );
# returns a new operand node #
PROC operand node = ( INT op type, value )REF NODE: HEAP NODE := NODE( op type, NIL, NIL, value );
 
# reports an error and stops #
PROC gen error = ( STRING message )VOID:
BEGIN
print( ( message, newline ) );
stop
END # gen error # ;
 
# reads a node from standard input #
PROC read node = REF NODE:
BEGIN
REF NODE result := NIL;
 
# parses a string from line and stores it in a string in the text array #
# - if it is not already present in the specified textElement list. #
# returns the position of the string in the text array #
PROC read string = ( REF[]STRING text list, CHAR terminator )INT:
BEGIN
# get the text of the string #
STRING str := line[ l pos ];
l pos +:= 1;
WHILE IF l pos <= UPB line THEN line[ l pos ] /= terminator ELSE FALSE FI DO
str +:= line[ l pos ];
l pos +:= 1
OD;
IF l pos > UPB line THEN gen error( "Unterminated String in node file: (" + line + ")." ) FI;
# attempt to find the text in the list of strings/identifiers #
INT t pos := LWB text list;
BOOL found := FALSE;
INT result := LWB text list - 1;
FOR t pos FROM LWB text list TO UPB text list WHILE NOT found DO
IF found := text list[ t pos ] = str THEN
# found the string #
result := t pos
ELIF text list[ t pos ] = "" THEN
# have an empty slot for ther string #
found := TRUE;
text list[ t pos ] := str;
result := t pos
FI
OD;
IF NOT found THEN gen error( "Out of string space." ) FI;
result
END # read string # ;
# gets an integer from the line - no checks for valid digits #
PROC read integer = INT:
BEGIN
INT n := 0;
WHILE line[ l pos ] /= " " DO
( n *:= 10 ) +:= ( ABS line[ l pos ] - ABS "0" );
l pos +:= 1
OD;
n
END # read integer # ;
 
STRING line, name;
INT l pos := 1, nd type := -1;
read( ( line, newline ) );
line +:= " ";
# get the node type name #
WHILE line[ l pos ] = " " DO l pos +:= 1 OD;
name := "";
WHILE IF l pos > UPB line THEN FALSE ELSE line[ l pos ] /= " " FI DO
name +:= line[ l pos ];
l pos +:= 1
OD;
# determine the node type #
nd type := LWB nd name;
IF name /= ";" THEN
# not a null node #
WHILE IF nd type <= UPB nd name THEN name /= nd name[ nd type ] ELSE FALSE FI DO nd type +:= 1 OD;
IF nd type > UPB nd name THEN gen error( "Malformed node: (" + line + ")." ) FI;
# handle the additional parameter for identifier/string/integer, or sub-nodes for operator nodes #
IF nd type = ninteger OR nd type = nidentifier OR nd type = nstring THEN
WHILE line[ l pos ] = " " DO l pos +:= 1 OD;
IF nd type = ninteger THEN result := operand node( nd type, read integer )
ELIF nd type = nidentifier THEN result := operand node( nd type, read string( identifiers, " " ) )
ELSE # nd type = nString # result := operand node( nd type, read string( strings, """" ) )
FI
ELSE
# operator node #
REF NODE left node = read node;
result := op node( nd type, left node, read node )
FI
FI;
result
END # read node # ;
 
# returns a formatted op code for code generation #
PROC operation = ( INT op code )STRING: " " + op name[ op code ] + " ";
# defines the specified label #
PROC define label = ( INT label number )VOID: print( ( "lbl_", whole( label number, 0 ), ":", newline ) );
# generates code to load a string value #
PROC gen load string = ( INT value )VOID:
BEGIN
print( ( operation( opushstr ), " ", strings[ value ], """", newline ) )
END # push string # ;
# generates code to load a constant value #
PROC gen load constant = ( INT value )VOID: print( ( operation( opush ), " ", whole( value, 0 ), newline ) );
# generates an operation acting on an address #
PROC gen data op = ( INT op, address )VOID: print( ( operation( op ), " l_", identifiers[ address ], newline ) );
# generates a nullary operation #
PROC gen op 0 = ( INT op )VOID: print( ( operation( op ), newline ) );
# generates a "not" instruction sequence #
PROC gen not = VOID:
BEGIN
gen load constant( 0 );
print( ( operation( oeq ), newline ) )
END # gen not # ;
# generates a negated condition #
PROC gen not op = ( INT op, REF NODE n )VOID:
BEGIN
gen( left OF n );
gen( right OF n );
gen op 0( op );
gen not
END # gen not op # ;
# generates a jump operation #
PROC gen jump = ( INT op, label )VOID: print( ( operation( op ), " lbl_", whole( label, 0 ), newline ) );
# generates code to output something to System.Console.Out #
PROC gen output = ( REF NODE n, STRING output type )VOID:
BEGIN
print( ( " call " ) );
print( ( "class [mscorlib]System.IO.TextWriter [mscorlib]System.Console::get_Out()", newline ) );
gen( left OF n );
print( ( " callvirt " ) );
print( ( "instance void [mscorlib]System.IO.TextWriter::Write(", output type, ")", newline ) )
END # gen output # ;
 
# generates the code header - assembly info, namespace, class and start of the Main method #
PROC code header = VOID:
BEGIN
print( ( ".assembly extern mscorlib { auto }", newline ) );
print( ( ".assembly RccSample {}", newline ) );
print( ( ".module RccSample.exe", newline ) );
print( ( ".namespace Rcc.Sample", newline ) );
print( ( "{", newline ) );
print( ( " .class public auto ansi Program extends [mscorlib]System.Object", newline ) );
print( ( " {", newline ) );
print( ( " .method public static void Main() cil managed", newline ) );
print( ( " {", newline ) );
print( ( " .entrypoint", newline ) );
# output the local variables #
BOOL have locals := FALSE;
STRING local prefix := " .locals init (int32 l_";
FOR s pos FROM LWB identifiers TO UPB identifiers WHILE identifiers[ s pos ] /= "" DO
print( ( local prefix, identifiers[ s pos ], newline ) );
local prefix := " ,int32 l_";
have locals := TRUE
OD;
IF have locals THEN
# there were some local variables defined - output the terminator #
print( ( " )", newline ) )
FI
END # code header # ;
 
# generates code for the node n #
PROC gen = ( REF NODE n )VOID:
IF n IS REF NODE( NIL ) THEN # null node #
SKIP
ELIF type OF n = nidentifier THEN # load identifier #
gen data op( ofetch, value OF n )
ELIF type OF n = nstring THEN # load string #
gen load string( value OF n )
ELIF type OF n = ninteger THEN # load integer #
gen load constant( value OF n )
ELIF type OF n = nsequence THEN # list #
gen( left OF n );
gen( right OF n )
ELIF type OF n = nif THEN # if-else #
INT else label := new label;
gen( left OF n );
gen jump( ojz, else label );
gen( left OF right OF n );
IF right OF right OF n IS REF NODE( NIL ) THEN
# no "else" part #
define label( else label )
ELSE
# have an "else" part #
INT end if label := new label;
gen jump( ojmp, end if label );
define label( else label );
gen( right OF right OF n );
define label( end if label )
FI
ELIF type OF n = nwhile THEN # while-loop #
INT loop label := new label;
INT exit label := new label;
define label( loop label );
gen( left OF n );
gen jump( ojz, exit label );
gen( right OF n );
gen jump( ojmp, loop label );
define label( exit label )
ELIF type OF n = nassign THEN # assignment #
gen( right OF n );
gen data op( ostore, value OF left OF n )
ELIF type OF n = nnot THEN # bolean not #
gen( left OF n );
gen not
ELIF type OF n = ngreaterequal THEN # compare >= #
gen not op( olt, n )
ELIF type OF n = nnotequal THEN # compare not = #
gen not op( oeq, n )
ELIF type OF n = nlessequal THEN # compare <= #
gen not op( ogt, n )
ELIF type OF n = nprts THEN # print string #
gen output( n, "string" )
ELIF type OF n = nprtc THEN # print character #
gen output( n, "char" )
ELIF type OF n = nprti THEN # print integer #
gen output( n, "int32" )
ELSE # everything else #
gen( left OF n );
gen( right OF n ); # right will be null for a unary op so no code will be generated #
print( ( operation( ndop( type OF n ) ), newline ) )
FI # gen # ;
 
# generates the code trailer - return instruction, end of Main method, end of class and end of namespace #
PROC code trailer = VOID:
BEGIN
print( ( " ret", newline ) );
print( ( " } // Main method", newline ) );
print( ( " } // Program class", newline ) );
print( ( "} // Rcc.Sample namespace", newline ) )
END # code trailer # ;
 
# parse the output from the syntax analyser and generate code from the parse tree #
REF NODE code = read node;
code header;
gen( code );
code trailer
Output:
.assembly extern mscorlib { auto }
.assembly RccSample {}
.module RccSample.exe
.namespace Rcc.Sample
{
    .class public auto ansi Program extends [mscorlib]System.Object
    {
        .method public static void Main() cil managed
        {
           .entrypoint
           .locals init (int32 l_count
                        )
            ldc.i4     1
            stloc      l_count
lbl_1:
            ldloc      l_count
            ldc.i4     10
            clt      
            brfalse    lbl_2
            call       class [mscorlib]System.IO.TextWriter [mscorlib]System.Console::get_Out()
            ldstr      "count is: "
            callvirt   instance void [mscorlib]System.IO.TextWriter::Write(string)
            call       class [mscorlib]System.IO.TextWriter [mscorlib]System.Console::get_Out()
            ldloc      l_count
            callvirt   instance void [mscorlib]System.IO.TextWriter::Write(int32)
            call       class [mscorlib]System.IO.TextWriter [mscorlib]System.Console::get_Out()
            ldstr      "\n"
            callvirt   instance void [mscorlib]System.IO.TextWriter::Write(string)
            ldloc      l_count
            ldc.i4     1
            add      
            stloc      l_count
            br         lbl_1
lbl_2:
            ret
        } // Main method
    } // Program class
} // Rcc.Sample namespace

ALGOL W[edit]

begin % code generator %
 % parse tree nodes %
record node( integer type
 ; reference(node) left, right
 ; integer iValue % nString/nIndentifier number or nInteger value %
);
integer nIdentifier, nString, nInteger, nSequence, nIf, nPrtc, nPrts
, nPrti, nWhile, nAssign, nNegate, nNot, nMultiply
, nDivide, nMod, nAdd, nSubtract, nLess, nLessEqual
, nGreater, nGreaterEqual, nEqual, nNotEqual, nAnd, nOr
 ;
string(14) array ndName ( 1 :: 25 );
integer array nOp ( 1 :: 25 );
integer MAX_NODE_TYPE;
 % string literals and identifiers - uses a linked list - a hash table might be better... %
string(1) array text ( 0 :: 4095 );
integer textNext, TEXT_MAX;
record textElement ( integer start, length; reference(textElement) next );
reference(textElement) idList, stList;
 % op codes %
integer oFetch, oStore, oPush
, oAdd, oSub, oMul, oDiv, oMod, oLt, oGt, oLe, oGe, oEq, oNe
, oAnd, oOr, oNeg, oNot, oJmp, oJz, oPrtc, oPrts, oPrti, oHalt
 ;
string(6) array opName ( 1 :: 24 );
 % code - although this is intended to be byte code, as we are going to output  %
 % an assembler source, we use integers for convenience  %
 % labelLocations are: - ( referencing location + 1 ) if they have been referenced but not defined yet, %
 % zero if they are unreferenced and undefined,  %
 % ( referencing location + 1 ) if they are defined  %
integer array byteCode ( 0 :: 4095 );
integer array labelLocation( 1 :: 4096 );
integer nextLocation, MAX_LOCATION, nextLabelNumber, MAX_LABEL_NUMBER;
 
 % returns a new node with left and right branches %
reference(node) procedure opNode ( integer value opType; reference(node) value opLeft, opRight ) ; begin
node( opType, opLeft, opRight, 0 )
end opNode ;
 
 % returns a new operand node %
reference(node) procedure operandNode ( integer value opType, opValue ) ; begin
node( opType, null, null, opValue )
end operandNode ;
 
 % reports an error and stops %
procedure genError( string(80) value message ); begin
integer errorPos;
write( s_w := 0, "**** Code generation error: " );
errorPos := 0;
while errorPos < 80 and message( errorPos // 1 ) not = "." do begin
writeon( s_w := 0, message( errorPos // 1 ) );
errorPos := errorPos + 1
end while_not_at_end_of_message ;
writeon( s_w := 0, "." );
assert( false )
end genError ;
 
 % reads a node from standard input %
reference(node) procedure readNode ; begin
reference(node) resultNode;
 
 % parses a string from line and stores it in a string in the text array %
 % - if it is not already present in the specified textElement list.  %
 % returns the position of the string in the text array  %
integer procedure readString ( reference(textElement) value result txList; string(1) value terminator ) ; begin
string(256) str;
integer sLen, sPos, ePos;
logical found;
reference(textElement) txPos, txLastPos;
 % get the text of the string %
str  := " ";
sLen := 0;
str( sLen // 1 ) := line( lPos // 1 );
sLen := sLen + 1;
lPos := lPos + 1;
while lPos <= 255 and line( lPos // 1 ) not = terminator do begin
str( sLen // 1 ) := line( lPos // 1 );
sLen := sLen + 1;
lPos := lPos + 1
end while_more_string ;
if lPos > 255 then genError( "Unterminated String in node file." );
 % attempt to find the text in the list of strings/identifiers %
txLastPos := txPos := txList;
found := false;
ePos := 0;
while not found and txPos not = null do begin
ePos  := ePos + 1;
found := ( length(txPos) = sLen );
sPos  := 0;
while found and sPos < sLen do begin
found := str( sPos // 1 ) = text( start(txPos) + sPos );
sPos  := sPos + 1
end while_not_found ;
txLastPos := txPos;
if not found then txPos := next(txPos)
end while_string_not_found ;
if not found then begin
 % the string/identifier is not in the list - add it %
ePos := ePos + 1;
if txList = null then txList := textElement( textNext, sLen, null )
else next(txLastPos) := textElement( textNext, sLen, null );
if textNext + sLen > TEXT_MAX then genError( "Text space exhausted." )
else begin
for cPos := 0 until sLen - 1 do begin
text( textNext ) := str( cPos // 1 );
textNext := textNext + 1
end for_cPos
end
end if_not_found ;
ePos
end readString ;
 
 % gets an integer from the line - no checks for valid digits %
integer procedure readInteger ; begin
integer n;
n := 0;
while line( lPos // 1 ) not = " " do begin
n  := ( n * 10 ) + ( decode( line( lPos // 1 ) ) - decode( "0" ) );
lPos := lPos + 1
end while_not_end_of_integer ;
n
end readInteger ;
 
string(256) line;
string(16) name;
integer lPos, tPos, ndType;
tPos := lPos := 0;
readcard( line );
 % get the node type name %
while line( lPos // 1 ) = " " do lPos := lPos + 1;
name := "";
while lPos < 256 and line( lPos // 1 ) not = " " do begin
name( tPos // 1 ) := line( lPos // 1 );
lPos := lPos + 1;
tPos := tPos + 1
end while_more_name ;
 % determine the node type %
ndType  := 1;
resultNode  := null;
if name not = ";" then begin
 % not a null node %
while ndType <= MAX_NODE_TYPE and name not = ndName( ndType ) do ndType := ndType + 1;
if ndType > MAX_NODE_TYPE then genError( "Malformed node." );
 % handle the additional parameter for identifier/string/integer, or sub-nodes for operator nodes %
if ndType = nInteger or ndType = nIdentifier or ndType = nString then begin
while line( lPos // 1 ) = " " do lPos := lPos + 1;
if ndType = nInteger then resultNode := operandNode( ndType, readInteger )
else if ndType = nIdentifier then resultNode := operandNode( ndType, readString( idList, " " ) )
else  % ndType = nString  % resultNode := operandNode( ndType, readString( stList, """" ) )
end
else begin
 % operator node %
reference(node) leftNode;
leftNode  := readNode;
resultNode := opNode( ndType, leftNode, readNode )
end
end if_non_null_node ;
resultNode
end readNode ;
 
 % returns the next free label number %
integer procedure newLabel ; begin
nextLabelNumber := nextLabelNumber + 1;
if nextLabelNumber > MAX_LABEL_NUMBER then genError( "Program too complex" );
nextLabelNumber
end newLabel ;
 
 % defines the specified label to be at the next location %
procedure defineLabel ( integer value labelNumber ) ; begin
if labelLocation( labelNumber ) > 0 then genError( "Label already defined" )
else begin
 % this is the first definition of the label, define it and if it has already been referenced, fill in the reference %
integer currValue;
currValue := labelLocation( labelNumber );
labelLocation( labelNumber ) := nextLocation + 1; % we store pc + 1 to ensure the label location is positive %
if currValue < 0 then % already referenced % byteCode( - ( currValue + 1 ) ) := labelLocation( labelNumber )
end
end defineLabel ;
 
 % stores a byte in the code %
procedure genByte ( integer value byteValue ) ; begin
if nextLocation > MAX_LOCATION then genError( "Program too large" );
byteCode( nextLocation ) := byteValue;
nextLocation := nextLocation + 1
end genByte ;
 
 % stores an integer in the code %
procedure genInteger ( integer value integerValue ) ; begin
 % we are storing the bytes of the code in separate integers for convenience %
genByte( integerValue ); genByte( 0 ); genByte( 0 ); genByte( 0 )
end genInteger ;
 
 % generates an operation acting on an address %
procedure genDataOp ( integer value opCode, address ) ; begin
genByte( opCode );
genInteger( address )
end genDataOp ;
 
 % generates a nullary operation %
procedure genOp0 ( integer value opCode ) ; begin
genByte( opCode )
end genOp0 ;
 
 % generates a unary/binary operation %
procedure genOp ( reference(node) value n ) ; begin
gen( left(n) );
gen( right(n) ); % right will be null for a unary op so no code will be generated %
genByte( nOp( type(n) ) )
end genOp ;
 
 % generates a jump operation %
procedure genJump ( integer value opCode, labelNumber ) ; begin
genByte( opCode );
 % if the label is not defined yet - set it's location to the negative of the referencing location %
 % so it can be resolved later %
if labelLocation( labelNumber ) = 0 then labelLocation( labelNumber ) := - ( nextLocation + 1 );
genInteger( labelLocation( labelNumber ) )
end genJump ;
 
 % generates code for the node n %
procedure gen ( reference(node) value n ) ; begin
 
if n = null then % empty node % begin end
else if type(n) = nIdentifier then genDataOp( oFetch, iValue(n) )
else if type(n) = nString then genDataOp( oPush, iValue(n) - 1 )
else if type(n) = nInteger then genDataOp( oPush, iValue(n) )
else if type(n) = nSequence then begin
gen( left(n) );
gen( right(n) )
end
else if type(n) = nIf then % if-else  % begin
integer elseLabel;
elseLabel := newLabel;
gen( left(n) );
genJump( oJz, elseLabel );
gen( left( right(n) ) );
if right(right(n)) = null then % no "else" part % defineLabel( elseLabel )
else begin
 % have an "else" part %
integer endIfLabel;
endIfLabel := newLabel;
genJump( oJmp, endIfLabel );
defineLabel( elseLabel );
gen( right(right(n)) );
defineLabel( endIfLabel )
end
end
else if type(n) = nWhile then % while-loop  % begin
integer loopLabel, exitLabel;
loopLabel := newLabel;
exitLabel := newLabel;
defineLabel( loopLabel );
gen( left(n) );
genJump( oJz, exitLabel );
gen( right(n) );
genJump( oJmp, loopLabel );
defineLabel( exitLabel )
end
else if type(n) = nAssign then % assignment  % begin
gen( right( n ) );
genDataOp( oStore, iValue(left(n)) )
end
else genOp( n )
end gen ;
 
 % outputs the generated code to standard output %
procedure emitCode ; begin
 
 % counts the number of elements in a text element list %
integer procedure countElements ( reference(textElement) value txHead ) ; begin
integer count;
reference(textElement) txPos;
count := 0;
txPos := txHead;
while txPos not = null do begin
count := count + 1;
txPos := next(txPos)
end while_txPos_not_null ;
count
end countElements ;
 
integer pc, op;
reference(textElement) txPos;
 
 % code header %
write( i_w := 1, s_w := 0
, "Datasize: ", countElements( idList )
, " Strings: ", countElements( stList )
);
 % output the string literals %
txPos := stList;
while txPos not = null do begin
integer cPos;
write( """" );
cPos := 1; % start from 1 to skip over the leading " %
while cPos < length(txPos) do begin
writeon( s_w := 0, text( start(txPos) + cPos ) );
cPos := cPos + 1
end while_not_end_of_string ;
writeon( s_w := 0, """" );
txPos := next(txPos)
end while_not_at_end_of_literals ;
 
 % code body %
pc := 0;
while pc < nextLocation do begin
op := byteCode( pc );
write( i_w := 4, s_w := 0, pc, " ", opName( op ) );
pc := pc + 1;
if op = oFetch or op = oStore then begin
 % data load/store - add the address in square brackets %
writeon( i_w := 1, s_w := 0, "[", byteCode( pc ) - 1, "]" );
pc := pc + 4
end
else if op = oPush then begin
 % push constant - add the constant %
writeon( i_w := 1, s_w := 0, byteCode( pc ) );
pc := pc + 4
end
else if op = oJmp or op = oJz then begin
 % jump - show the relative address in brackets and the absolute address %
writeon( i_w := 1, s_w := 0, "(", ( byteCode( pc ) - 1 ) - pc, ") ", byteCode( pc ) - 1 );
pc := pc + 4
end
end while_pc_lt_nextLocation
end emitCode ;
 
oFetch := 1; opName( oFetch ) := "fetch"; oStore := 2; opName( oStore ) := "store"; oPush := 3; opName( oPush ) := "push";
oAdd  := 4; opName( oAdd ) := "add"; oSub  := 5; opName( oSub ) := "sub"; oMul  := 6; opName( oMul ) := "mul";
oDiv  := 7; opName( oDiv ) := "div"; oMod  := 8; opName( oMod ) := "mod"; oLt  := 9; opName( oLt ) := "lt";
oGt  := 10; opName( oGt ) := "gt"; oLe  := 11; opName( oLe ) := "le"; oGe  := 12; opName( oGe ) := "ge";
oEq  := 13; opName( oEq ) := "eq"; oNe  := 14; opName( oNe ) := "ne"; oAnd  := 15; opName( oAnd ) := "and";
oOr  := 16; opName( oOr ) := "or"; oNeg  := 17; opName( oNeg ) := "neg"; oNot  := 18; opName( oNot ) := "not";
oJmp  := 19; opName( oJmp ) := "jmp"; oJz  := 20; opName( oJz ) := "jz"; oPrtc := 21; opName( oPrtc ) := "prtc";
oPrts  := 22; opName( oPrts ) := "prts"; oPrti  := 23; opName( oPrti ) := "prti"; oHalt := 24; opName( oHalt ) := "halt";
 
nIdentifier  := 1; ndName( nIdentifier ) := "Identifier"; nString  := 2; ndName( nString ) := "String";
nInteger  := 3; ndName( nInteger ) := "Integer"; nSequence := 4; ndName( nSequence ) := "Sequence";
nIf  := 5; ndName( nIf ) := "If"; nPrtc  := 6; ndName( nPrtc ) := "Prtc";
nPrts  := 7; ndName( nPrts ) := "Prts"; nPrti  := 8; ndName( nPrti ) := "Prti";
nWhile  := 9; ndName( nWhile ) := "While"; nAssign  := 10; ndName( nAssign ) := "Assign";
nNegate  := 11; ndName( nNegate ) := "Negate"; nNot  := 12; ndName( nNot ) := "Not";
nMultiply  := 13; ndName( nMultiply ) := "Multiply"; nDivide  := 14; ndName( nDivide ) := "Divide";
nMod  := 15; ndName( nMod ) := "Mod"; nAdd  := 16; ndName( nAdd ) := "Add";
nSubtract  := 17; ndName( nSubtract ) := "Subtract"; nLess  := 18; ndName( nLess ) := "Less";
nLessEqual  := 19; ndName( nLessEqual ) := "LessEqual"; nGreater  := 20; ndName( nGreater ) := "Greater";
nGreaterEqual  := 21; ndName( nGreaterEqual ) := "GreaterEqual"; nEqual  := 22; ndName( nEqual ) := "Equal";
nNotEqual  := 23; ndName( nNotEqual ) := "NotEqual"; nAnd  := 24; ndName( nAnd ) := "And";
nOr  := 25; ndName( nOr ) := "Or";
MAX_NODE_TYPE  := 25; TEXT_MAX := 4095; textNext := 0;
stList := idList := null;
for nPos := 1 until MAX_NODE_TYPE do nOp( nPos ) := -1;
nOp( nPrtc ) := oPrtc; nOp( nPrts ) := oPrts; nOp( nPrti ) := oPrti; nOp( nNegate ) := oNeg; nOp( nNot ) := oNot;
nOp( nMultiply ) := oMul; nOp( nDivide ) := oDiv; nOp( nMod ) := oMod; nOp( nAdd ) := oAdd; nOp( nSubtract ) := oSub;
nOp( nLess ) := oLt; nOp( nLessEqual ) := oLe; nOp( nGreater ) := oGt; nOp( nGreaterEqual ) := oGe; nOp( nEqual ) := oEq;
nOp( nNotEqual ) := oNe; nOp( nAnd ) := oAnd; nOp( nOr ) := oOr;
nextLocation  := 0; MAX_LOCATION := 4095;
for pc := 0 until MAX_LOCATION do byteCode( pc ) := 0;
nextLabelNumber := 0; MAX_LABEL_NUMBER := 4096;
for lPos := 1 until MAX_LABEL_NUMBER do labelLocation( lPos ) := 0;
 
 % parse the output from the syntax analyser and generate code from the parse tree %
gen( readNode );
genOp0( oHalt );
emitCode
end.
Output:

The While Counter example

Datasize: 1 Strings: 2
"count is: "
"\n"
   0 push  1
   5 store [0]
  10 fetch [0]
  15 push  10
  20 lt
  21 jz    (43) 65
  26 push  0
  31 prts
  32 fetch [0]
  37 prti
  38 push  1
  43 prts
  44 fetch [0]
  49 push  1
  54 add
  55 store [0]
  60 jmp   (-51) 10
  65 halt

AWK[edit]

Tested with gawk 4.1.1 and mawk 1.3.4.

 
function error(msg) {
printf("%s\n", msg)
exit(1)
}
 
function bytes_to_int(bstr, i, sum) {
sum = 0
for (i=word_size-1; i>=0; i--) {
sum *= 256
sum += code[bstr+i]
}
return sum
}
 
function make_node(oper, left, right, value) {
node_type [next_free_node_index] = oper
node_left [next_free_node_index] = left
node_right[next_free_node_index] = right
node_value[next_free_node_index] = value
return next_free_node_index ++
}
 
function make_leaf(oper, n) {
return make_node(oper, 0, 0, n)
}
 
function emit_byte(x) {
code[next_free_code_index++] = x
}
 
function emit_word(x, i) {
for (i=0; i<word_size; i++) {
emit_byte(int(x)%256);
x = int(x/256)
}
}
 
function emit_word_at(at, n, i) {
for (i=0; i<word_size; i++) {
code[at+i] = int(n)%256
n = int(n/256)
}
}
 
function hole( t) {
t = next_free_code_index
emit_word(0)
return t
}
 
function fetch_var_offset(name, n) {
if (name in globals) {
n = globals[name]
} else {
globals[name] = globals_n
n = globals_n
globals_n += 1
}
return n
}
 
function fetch_string_offset(the_string, n) {
n = string_pool[the_string]
if (n == "") {
string_pool[the_string] = string_n
n = string_n
string_n += 1
}
return n
}
 
function code_gen(x, n, p1, p2) {
if (x == 0) {
return
} else if (node_type[x] == "nd_Ident") {
emit_byte(FETCH)
n = fetch_var_offset(node_value[x])
emit_word(n)
} else if (node_type[x] == "nd_Integer") {
emit_byte(PUSH)
emit_word(node_value[x])
} else if (node_type[x] == "nd_String") {
emit_byte(PUSH)
n = fetch_string_offset(node_value[x])
emit_word(n)
} else if (node_type[x] == "nd_Assign") {
n = fetch_var_offset(node_value[node_left[x]])
code_gen(node_right[x])
emit_byte(STORE)
emit_word(n)
} else if (node_type[x] == "nd_If") {
code_gen(node_left[x]) # expr
emit_byte(JZ) # if false, jump
p1 = hole() # make room for jump dest
code_gen(node_left[node_right[x]]) # if true statements
if (node_right[node_right[x]] != 0) {
emit_byte(JMP) # jump over else statements
p2 = hole()
}
emit_word_at(p1, next_free_code_index - p1)
if (node_right[node_right[x]] != 0) {
code_gen(node_right[node_right[x]]) # else statements
emit_word_at(p2, next_free_code_index - p2)
}
} else if (node_type[x] == "nd_While") {
p1 =next_free_code_index
code_gen(node_left[x])
emit_byte(JZ)
p2 = hole()
code_gen(node_right[x])
emit_byte(JMP) # jump back to the top
emit_word(p1 - next_free_code_index)
emit_word_at(p2, next_free_code_index - p2)
} else if (node_type[x] == "nd_Sequence") {
code_gen(node_left[x])
code_gen(node_right[x])
} else if (node_type[x] == "nd_Prtc") {
code_gen(node_left[x])
emit_byte(PRTC)
} else if (node_type[x] == "nd_Prti") {
code_gen(node_left[x])
emit_byte(PRTI)
} else if (node_type[x] == "nd_Prts") {
code_gen(node_left[x])
emit_byte(PRTS)
} else if (node_type[x] in operators) {
code_gen(node_left[x])
code_gen(node_right[x])
emit_byte(operators[node_type[x]])
} else if (node_type[x] in unary_operators) {
code_gen(node_left[x])
emit_byte(unary_operators[node_type[x]])
} else {
error("error in code generator - found '" node_type[x] "', expecting operator")
}
}
 
function code_finish() {
emit_byte(HALT)
}
 
function list_code() {
printf("Datasize: %d Strings: %d\n", globals_n, string_n)
# Make sure that arrays are sorted by value in ascending order.
PROCINFO["sorted_in"] = "@val_str_asc"
# This is a dependency on GAWK.
for (k in string_pool)
print(k)
pc = 0
while (pc < next_free_code_index) {
printf("%4d ", pc)
op = code[pc]
pc += 1
if (op == FETCH) {
x = bytes_to_int(pc)
printf("fetch [%d]\n", x);
pc += word_size
} else if (op == STORE) {
x = bytes_to_int(pc)
printf("store [%d]\n", x);
pc += word_size
} else if (op == PUSH) {
x = bytes_to_int(pc)
printf("push  %d\n", x);
pc += word_size
} else if (op == ADD) { print("add")
} else if (op == SUB) { print("sub")
} else if (op == MUL) { print("mul")
} else if (op == DIV) { print("div")
} else if (op == MOD) { print("mod")
} else if (op == LT) { print("lt")
} else if (op == GT) { print("gt")
} else if (op == LE) { print("le")
} else if (op == GE) { print("ge")
} else if (op == EQ) { print("eq")
} else if (op == NE) { print("ne")
} else if (op == AND) { print("and")
} else if (op == OR) { print("or")
} else if (op == NEG) { print("neg")
} else if (op == NOT) { print("not")
} else if (op == JMP) {
x = bytes_to_int(pc)
printf("jmp (%d) %d\n", x, pc + x);
pc += word_size
} else if (op == JZ) {
x = bytes_to_int(pc)
printf("jz (%d) %d\n", x, pc + x);
pc += word_size
} else if (op == PRTC) { print("prtc")
} else if (op == PRTI) { print("prti")
} else if (op == PRTS) { print("prts")
} else if (op == HALT) { print("halt")
} else { error("list_code: Unknown opcode '" op "'")
}
} # while pc
}
 
function load_ast( line, line_list, text, n, node_type, value, left, right) {
getline line
n=split(line, line_list)
text = line_list[1]
if (text == ";")
return 0
node_type = all_syms[text]
if (n > 1) {
value = line_list[2]
for (i=3;i<=n;i++)
value = value " " line_list[i]
if (value ~ /^[0-9]+$/)
value = int(value)
return make_leaf(node_type, value)
}
left = load_ast()
right = load_ast()
return make_node(node_type, left, right)
}
 
BEGIN {
all_syms["Identifier" ] = "nd_Ident"
all_syms["String" ] = "nd_String"
all_syms["Integer" ] = "nd_Integer"
all_syms["Sequence" ] = "nd_Sequence"
all_syms["If" ] = "nd_If"
all_syms["Prtc" ] = "nd_Prtc"
all_syms["Prts" ] = "nd_Prts"
all_syms["Prti" ] = "nd_Prti"
all_syms["While" ] = "nd_While"
all_syms["Assign" ] = "nd_Assign"
all_syms["Negate" ] = "nd_Negate"
all_syms["Not" ] = "nd_Not"
all_syms["Multiply" ] = "nd_Mul"
all_syms["Divide" ] = "nd_Div"
all_syms["Mod" ] = "nd_Mod"
all_syms["Add" ] = "nd_Add"
all_syms["Subtract" ] = "nd_Sub"
all_syms["Less" ] = "nd_Lss"
all_syms["LessEqual" ] = "nd_Leq"
all_syms["Greater" ] = "nd_Gtr"
all_syms["GreaterEqual"] = "nd_Geq"
all_syms["Equal" ] = "nd_Eql"
all_syms["NotEqual" ] = "nd_Neq"
all_syms["And" ] = "nd_And"
all_syms["Or" ] = "nd_Or"
 
FETCH=1; STORE=2; PUSH=3; ADD=4; SUB=5; MUL=6;
DIV=7; MOD=8; LT=9; GT=10; LE=11; GE=12;
EQ=13; NE=14; AND=15; OR=16; NEG=17; NOT=18;
JMP=19; JZ=20; PRTC=21; PRTS=22; PRTI=23; HALT=24;
 
operators["nd_Lss"] = LT
operators["nd_Gtr"] = GT
operators["nd_Leq"] = LE
operators["nd_Geq"] = GE
operators["nd_Eql"] = EQ
operators["nd_Neq"] = NE
operators["nd_And"] = AND
operators["nd_Or" ] = OR
operators["nd_Sub"] = SUB
operators["nd_Add"] = ADD
operators["nd_Div"] = DIV
operators["nd_Mul"] = MUL
operators["nd_Mod"] = MOD
 
unary_operators["nd_Negate"] = NEG
unary_operators["nd_Not" ] = NOT
 
next_free_node_index = 1
next_free_code_index = 0
globals_n = 0
string_n = 0
word_size = 4
input_file = "-"
 
if (ARGC > 1)
input_file = ARGV[1]
n = load_ast()
code_gen(n)
code_finish()
list_code()
}
 
Output  —  count:

Datasize: 1 Strings: 2
"count is: "
"\n"
   0 push  1
   5 store [0]
  10 fetch [0]
  15 push  10
  20 lt
  21 jz     (43) 65
  26 push  0
  31 prts
  32 fetch [0]
  37 prti
  38 push  1
  43 prts
  44 fetch [0]
  49 push  1
  54 add
  55 store [0]
  60 jmp    (-51) 10
  65 halt

C[edit]

Tested with gcc 4.81 and later, compiles warning free with -Wall -Wextra

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <stdarg.h>
#include <stdint.h>
#include <ctype.h>
 
typedef unsigned char uchar;
 
typedef enum {
nd_Ident, nd_String, nd_Integer, nd_Sequence, nd_If, nd_Prtc, nd_Prts, nd_Prti, nd_While,
nd_Assign, nd_Negate, nd_Not, nd_Mul, nd_Div, nd_Mod, nd_Add, nd_Sub, nd_Lss, nd_Leq,
nd_Gtr, nd_Geq, nd_Eql, nd_Neq, nd_And, nd_Or
} NodeType;
 
typedef enum { FETCH, STORE, PUSH, ADD, SUB, MUL, DIV, MOD, LT, GT, LE, GE, EQ, NE, AND,
OR, NEG, NOT, JMP, JZ, PRTC, PRTS, PRTI, HALT
} Code_t;
 
typedef uchar code;
 
typedef struct Tree {
NodeType node_type;
struct Tree *left;
struct Tree *right;
char *value;
} Tree;
 
#define da_dim(name, type) type *name = NULL; \
int _qy_ ## name ## _p = 0; \
int _qy_ ## name ## _max = 0

 
#define da_redim(name) do {if (_qy_ ## name ## _p >= _qy_ ## name ## _max) \
name = realloc(name, (_qy_ ## name ## _max += 32) * sizeof(name[0]));} while (0)

 
#define da_rewind(name) _qy_ ## name ## _p = 0
 
#define da_append(name, x) do {da_redim(name); name[_qy_ ## name ## _p++] = x;} while (0)
#define da_len(name) _qy_ ## name ## _p
#define da_add(name) do {da_redim(name); _qy_ ## name ## _p++;} while (0)
 
FILE *source_fp, *dest_fp;
static int here;
da_dim(object, code);
da_dim(globals, const char *);
da_dim(string_pool, const char *);
 
// dependency: Ordered by NodeType, must remain in same order as NodeType enum
struct {
char *enum_text;
NodeType node_type;
Code_t opcode;
} atr[] = {
{"Identifier" , nd_Ident, -1 },
{"String" , nd_String, -1 },
{"Integer" , nd_Integer, -1 },
{"Sequence" , nd_Sequence, -1 },
{"If" , nd_If, -1 },
{"Prtc" , nd_Prtc, -1 },
{"Prts" , nd_Prts, -1 },
{"Prti" , nd_Prti, -1 },
{"While" , nd_While, -1 },
{"Assign" , nd_Assign, -1 },
{"Negate" , nd_Negate, NEG},
{"Not" , nd_Not, NOT},
{"Multiply" , nd_Mul, MUL},
{"Divide" , nd_Div, DIV},
{"Mod" , nd_Mod, MOD},
{"Add" , nd_Add, ADD},
{"Subtract" , nd_Sub, SUB},
{"Less" , nd_Lss, LT },
{"LessEqual" , nd_Leq, LE },
{"Greater" , nd_Gtr, GT },
{"GreaterEqual", nd_Geq, GE },
{"Equal" , nd_Eql, EQ },
{"NotEqual" , nd_Neq, NE },
{"And" , nd_And, AND},
{"Or" , nd_Or, OR },
};
 
void error(const char *fmt, ... ) {
va_list ap;
char buf[1000];
 
va_start(ap, fmt);
vsprintf(buf, fmt, ap);
va_end(ap);
printf("error: %s\n", buf);
exit(1);
}
 
Code_t type_to_op(NodeType type) {
return atr[type].opcode;
}
 
Tree *make_node(NodeType node_type, Tree *left, Tree *right) {
Tree *t = calloc(sizeof(Tree), 1);
t->node_type = node_type;
t->left = left;
t->right = right;
return t;
}
 
Tree *make_leaf(NodeType node_type, char *value) {
Tree *t = calloc(sizeof(Tree), 1);
t->node_type = node_type;
t->value = strdup(value);
return t;
}
 
/*** Code generator ***/
 
void emit_byte(int c) {
da_append(object, (uchar)c);
++here;
}
 
void emit_int(int32_t n) {
union {
int32_t n;
unsigned char c[sizeof(int32_t)];
} x;
 
x.n = n;
 
for (size_t i = 0; i < sizeof(x.n); ++i) {
emit_byte(x.c[i]);
}
}
 
int hole() {
int t = here;
emit_int(0);
return t;
}
 
void fix(int src, int dst) {
*(int32_t *)(object + src) = dst-src;
}
 
int fetch_var_offset(const char *id) {
for (int i = 0; i < da_len(globals); ++i) {
if (strcmp(id, globals[i]) == 0)
return i;
}
da_add(globals);
int n = da_len(globals) - 1;
globals[n] = strdup(id);
return n;
}
 
int fetch_string_offset(const char *st) {
for (int i = 0; i < da_len(string_pool); ++i) {
if (strcmp(st, string_pool[i]) == 0)
return i;
}
da_add(string_pool);
int n = da_len(string_pool) - 1;
string_pool[n] = strdup(st);
return n;
}
 
void code_gen(Tree *x) {
int p1, p2, n;
 
if (x == NULL) return;
switch (x->node_type) {
case nd_Ident:
emit_byte(FETCH);
n = fetch_var_offset(x->value);
emit_int(n);
break;
case nd_Integer:
emit_byte(PUSH);
emit_int(atoi(x->value));
break;
case nd_String:
emit_byte(PUSH);
n = fetch_string_offset(x->value);
emit_int(n);
break;
case nd_Assign:
n = fetch_var_offset(x->left->value);
code_gen(x->right);
emit_byte(STORE);
emit_int(n);
break;
case nd_If:
code_gen(x->left); // if expr
emit_byte(JZ); // if false, jump
p1 = hole(); // make room for jump dest
code_gen(x->right->left); // if true statements
if (x->right->right != NULL) {
emit_byte(JMP);
p2 = hole();
}
fix(p1, here);
if (x->right->right != NULL) {
code_gen(x->right->right);
fix(p2, here);
}
break;
case nd_While:
p1 = here;
code_gen(x->left); // while expr
emit_byte(JZ); // if false, jump
p2 = hole(); // make room for jump dest
code_gen(x->right); // statements
emit_byte(JMP); // back to the top
fix(hole(), p1); // plug the top
fix(p2, here); // plug the 'if false, jump'
break;
case nd_Sequence:
code_gen(x->left);
code_gen(x->right);
break;
case nd_Prtc:
code_gen(x->left);
emit_byte(PRTC);
break;
case nd_Prti:
code_gen(x->left);
emit_byte(PRTI);
break;
case nd_Prts:
code_gen(x->left);
emit_byte(PRTS);
break;
case nd_Lss: case nd_Gtr: case nd_Leq: case nd_Geq: case nd_Eql: case nd_Neq:
case nd_And: case nd_Or: case nd_Sub: case nd_Add: case nd_Div: case nd_Mul:
case nd_Mod:
code_gen(x->left);
code_gen(x->right);
emit_byte(type_to_op(x->node_type));
break;
case nd_Negate: case nd_Not:
code_gen(x->left);
emit_byte(type_to_op(x->node_type));
break;
default:
error("error in code generator - found %d, expecting operator\n", x->node_type);
}
}
 
void code_finish() {
emit_byte(HALT);
}
 
void list_code() {
fprintf(dest_fp, "Datasize: %d Strings: %d\n", da_len(globals), da_len(string_pool));
for (int i = 0; i < da_len(string_pool); ++i)
fprintf(dest_fp, "%s\n", string_pool[i]);
 
code *pc = object;
 
again: fprintf(dest_fp, "%5d ", (int)(pc - object));
switch (*pc++) {
case FETCH: fprintf(dest_fp, "fetch [%d]\n", *(int32_t *)pc);
pc += sizeof(int32_t); goto again;
case STORE: fprintf(dest_fp, "store [%d]\n", *(int32_t *)pc);
pc += sizeof(int32_t); goto again;
case PUSH : fprintf(dest_fp, "push  %d\n", *(int32_t *)pc);
pc += sizeof(int32_t); goto again;
case ADD : fprintf(dest_fp, "add\n"); goto again;
case SUB : fprintf(dest_fp, "sub\n"); goto again;
case MUL : fprintf(dest_fp, "mul\n"); goto again;
case DIV : fprintf(dest_fp, "div\n"); goto again;
case MOD : fprintf(dest_fp, "mod\n"); goto again;
case LT : fprintf(dest_fp, "lt\n"); goto again;
case GT : fprintf(dest_fp, "gt\n"); goto again;
case LE : fprintf(dest_fp, "le\n"); goto again;
case GE : fprintf(dest_fp, "ge\n"); goto again;
case EQ : fprintf(dest_fp, "eq\n"); goto again;
case NE : fprintf(dest_fp, "ne\n"); goto again;
case AND : fprintf(dest_fp, "and\n"); goto again;
case OR : fprintf(dest_fp, "or\n"); goto again;
case NOT : fprintf(dest_fp, "not\n"); goto again;
case NEG : fprintf(dest_fp, "neg\n"); goto again;
case JMP : fprintf(dest_fp, "jmp (%d) %d\n",
*(int32_t *)pc, (int32_t)(pc + *(int32_t *)pc - object));
pc += sizeof(int32_t); goto again;
case JZ : fprintf(dest_fp, "jz (%d) %d\n",
*(int32_t *)pc, (int32_t)(pc + *(int32_t *)pc - object));
pc += sizeof(int32_t); goto again;
case PRTC : fprintf(dest_fp, "prtc\n"); goto again;
case PRTI : fprintf(dest_fp, "prti\n"); goto again;
case PRTS : fprintf(dest_fp, "prts\n"); goto again;
case HALT : fprintf(dest_fp, "halt\n"); break;
default:error("listcode:Unknown opcode %d\n", *(pc - 1));
}
}
 
void init_io(FILE **fp, FILE *std, const char mode[], const char fn[]) {
if (fn[0] == '\0')
*fp = std;
else if ((*fp = fopen(fn, mode)) == NULL)
error(0, 0, "Can't open %s\n", fn);
}
 
NodeType get_enum_value(const char name[]) {
for (size_t i = 0; i < sizeof(atr) / sizeof(atr[0]); i++) {
if (strcmp(atr[i].enum_text, name) == 0) {
return atr[i].node_type;
}
}
error("Unknown token %s\n", name);
return -1;
}
 
char *read_line(int *len) {
static char *text = NULL;
static int textmax = 0;
 
for (*len = 0; ; (*len)++) {
int ch = fgetc(source_fp);
if (ch == EOF || ch == '\n') {
if (*len == 0)
return NULL;
break;
}
if (*len + 1 >= textmax) {
textmax = (textmax == 0 ? 128 : textmax * 2);
text = realloc(text, textmax);
}
text[*len] = ch;
}
text[*len] = '\0';
return text;
}
 
char *rtrim(char *text, int *len) { // remove trailing spaces
for (; *len > 0 && isspace(text[*len - 1]); --(*len))
;
 
text[*len] = '\0';
return text;
}
 
Tree *load_ast() {
int len;
char *yytext = read_line(&len);
yytext = rtrim(yytext, &len);
 
// get first token
char *tok = strtok(yytext, " ");
 
if (tok[0] == ';') {
return NULL;
}
NodeType node_type = get_enum_value(tok);
 
// if there is extra data, get it
char *p = tok + strlen(tok);
if (p != &yytext[len]) {
for (++p; isspace(*p); ++p)
;
return make_leaf(node_type, p);
}
 
Tree *left = load_ast();
Tree *right = load_ast();
return make_node(node_type, left, right);
}
 
int main(int argc, char *argv[]) {
init_io(&source_fp, stdin, "r", argc > 1 ? argv[1] : "");
init_io(&dest_fp, stdout, "wb", argc > 2 ? argv[2] : "");
 
code_gen(load_ast());
code_finish();
list_code();
 
return 0;
}
Output  —  While counter example:

Datasize: 1 Strings: 2
"count is: "
"\n"
    0 push  1
    5 store [0]
   10 fetch [0]
   15 push  10
   20 lt
   21 jz     (43) 65
   26 push  0
   31 prts
   32 fetch [0]
   37 prti
   38 push  1
   43 prts
   44 fetch [0]
   49 push  1
   54 add
   55 store [0]
   60 jmp    (-51) 10
   65 halt

Forth[edit]

Tested with Gforth 0.7.3

CREATE BUF 0 ,
: PEEK BUF @ 0= IF KEY BUF ! THEN BUF @ ;
: GETC PEEK 0 BUF ! ;
: SPACE? DUP BL = SWAP 9 14 WITHIN OR ;
: >SPACE BEGIN PEEK SPACE? WHILE GETC DROP REPEAT ;
: DIGIT? 48 58 WITHIN ;
: >Integer >SPACE 0
BEGIN PEEK DIGIT?
WHILE GETC [CHAR] 0 - SWAP 10 * + REPEAT ;
: SKIP ( xt --)
BEGIN PEEK OVER EXECUTE WHILE GETC DROP REPEAT DROP ;
: WORD ( xt -- c-addr) DUP >R SKIP PAD 1+
BEGIN PEEK R@ EXECUTE INVERT
WHILE GETC OVER C! CHAR+
REPEAT R> SKIP PAD TUCK - 1- PAD C! ;
: INTERN ( c-addr -- c-addr)
HERE TUCK OVER C@ CHAR+ DUP ALLOT CMOVE ;
: "? [CHAR] " = ;
: "TYPE" [CHAR] " EMIT TYPE [CHAR] " EMIT ;
: . 0 .R ;
: 3@ ( addr -- w3 w2 w1)
[ 2 CELLS ]L + DUP @ SWAP CELL - DUP @ SWAP CELL - @ ;
 
CREATE BUF' 12 ALLOT
: PREPEND ( c-addr c -- c-addr) BUF' 1+ C!
COUNT 10 MIN DUP 1+ BUF' C! BUF' 2 + SWAP CMOVE BUF' ;
: >NODE ( c-addr -- n) [CHAR] $ PREPEND FIND
IF EXECUTE ELSE ." unrecognized node " COUNT TYPE CR THEN ;
: NODE ( n left right -- addr) HERE >R , , , R> ;
 
: CONS ( a b l -- l) HERE >R , , , R> ;
: FIRST ( l -- a) [ 2 CELLS ]L + @ ;
: SECOND ( l -- b) CELL+ @ ;
: C=? ( c-addr1 c-addr2 -- t|f) COUNT ROT COUNT COMPARE 0= ;
: LOOKUP ( c-addr l -- n t | c-addr f)
BEGIN DUP WHILE OVER OVER FIRST C=?
IF NIP SECOND TRUE EXIT THEN @
REPEAT DROP FALSE ;
 
CREATE GLOBALS 0 , CREATE STRINGS 0 ,
: DEPTH ( pool -- n) DUP IF SECOND 1+ THEN ;
: FISH ( c-addr pool -- n pool') TUCK LOOKUP IF SWAP
ELSE INTERN OVER DEPTH ROT OVER >R CONS R> SWAP THEN ;
: >Identifier ['] SPACE? WORD GLOBALS @ FISH GLOBALS ! ;
: >String ['] "? WORD STRINGS @ FISH STRINGS ! ;
: >; 0 ;
: HANDLER [CHAR] @ PREPEND FIND DROP ;
: READER ( c-addr -- xt t | f)
[CHAR] > PREPEND FIND DUP 0= IF NIP THEN ;
DEFER GETAST
: READ ( c-addr -- right left) READER
IF EXECUTE 0 ELSE GETAST GETAST THEN SWAP ;
: (GETAST) ['] SPACE? WORD DUP HANDLER >R READ R> NODE ;
' (GETAST) IS GETAST
 
CREATE PC 0 ,
: i32! ( n addr --)
OVER $FF AND OVER C! 1+
OVER 8 RSHIFT $FF AND OVER C! 1+
OVER 16 RSHIFT $FF AND OVER C! 1+
OVER 24 RSHIFT $FF AND OVER C! DROP DROP ;
: i32, ( n --) HERE i32! 4 ALLOT 4 PC +! ;
: i8, ( c --) C, 1 PC +! ;
: i8@+ DUP 1+ SWAP C@ 1 PC +! ;
: i32@+ ( addr -- addr+4 n)
i8@+ >R i8@+ 8 LSHIFT R> OR >R
i8@+ 16 LSHIFT R> OR >R i8@+ 24 LSHIFT R> OR ;
 
CREATE #OPS 0 ,
: OP: CREATE #OPS @ , 1 #OPS +! DOES> @ ;
OP: fetch OP: store OP: push OP: jmp OP: jz
OP: prtc OP: prti OP: prts OP: neg OP: not
OP: add OP: sub OP: mul OP: div OP: mod
OP: lt OP: gt OP: le OP: ge
OP: eq OP: ne OP: and OP: or OP: halt
 
: GEN ( ast --) 3@ EXECUTE ;
: @; ( r l) DROP DROP ;
: @Identifier fetch i8, i32, DROP ;
: @Integer push i8, i32, DROP ;
: @String push i8, i32, DROP ;
: @Prtc GEN prtc i8, DROP ;
: @Prti GEN prti i8, DROP ;
: @Prts GEN prts i8, DROP ;
: @Not GEN not i8, DROP ;
: @Negate GEN neg i8, DROP ;
: @Sequence GEN GEN ;
: @Assign CELL+ @ >R GEN store i8, R> i32, ;
: @While PC @ SWAP GEN jz i8, HERE >R 0 i32,
SWAP GEN jmp i8, i32, PC @ R> i32! ;
: @If GEN jz i8, HERE >R 0 i32,
CELL+ DUP CELL+ @ DUP @ ['] @; = IF DROP @
ELSE SWAP @ GEN jmp i8, HERE 0 i32, PC @ R> i32! >R
THEN GEN PC @ R> i32! ;
: BINARY >R GEN GEN R> i8, ;
: @Subtract sub BINARY ;  : @Add add BINARY ;
: @Mod mod BINARY ;  : @Multiply mul BINARY ;
: @Divide div BINARY ;
: @Less lt BINARY ;  : @LessEqual le BINARY ;
: @Greater gt BINARY ;  : @GreaterEqual ge BINARY ;
: @Equal eq BINARY ;  : @NotEqual ne BINARY ;
: @And and BINARY ;  : @Or or BINARY ;
 
: REVERSE ( l -- l') 0 SWAP
BEGIN DUP WHILE TUCK DUP @ ROT ROT  ! REPEAT DROP ;
: .STRINGS STRINGS @ REVERSE BEGIN DUP
WHILE DUP FIRST COUNT "TYPE" CR @ REPEAT DROP ;
: .HEADER ( --)
." Datasize: " GLOBALS @ DEPTH . SPACE
." Strings: " STRINGS @ DEPTH . CR .STRINGS ;
: GENERATE ( ast -- addr u)
0 PC ! HERE >R GEN halt i8, R> PC @ ;
: ," [CHAR] " PARSE TUCK HERE SWAP CMOVE ALLOT ;
CREATE "OPS"
," fetch store push jmp jz prtc prti prts "
," neg not add sub mul div mod lt "
," gt le ge eq ne and or halt "
: .i32 i32@+ . ;
: .[i32] [CHAR] [ EMIT .i32 [CHAR] ] EMIT ;
: .off [CHAR] ( EMIT PC @ >R i32@+ DUP R> - . [CHAR] ) EMIT
SPACE . ;
CREATE .INT ' .[i32] , ' .[i32] , ' .i32 , ' .off , ' .off ,
: EMIT ( addr u --) >R 0 PC !
BEGIN PC @ R@ <
WHILE PC @ 5 .R SPACE i8@+
DUP 6 * "OPS" + 6 TYPE
DUP 5 < IF CELLS .INT + @ EXECUTE ELSE DROP THEN CR
REPEAT DROP R> DROP ;
GENERATE EMIT BYE

Passes all tests.

Phix[edit]

Reusing parse.e from the Syntax Analyzer task
Deviates somewhat from the task specification in that it generates executable machine code.

--
-- demo\rosetta\Compiler\cgen.e
-- ============================
--
-- The reusable part of cgen.exw
--
 
include parse.e
 
global sequence vars = {},
strings = {},
stringptrs = {}
 
global integer chain = 0
global sequence code = {}
 
function var_idx(sequence inode)
if inode[1]!=tk_Identifier then ?9/0 end if
string ident = inode[2]
integer n = find(ident,vars)
if n=0 then
vars = append(vars,ident)
n = length(vars)
end if
return n
end function
 
function string_idx(sequence inode)
if inode[1]!=tk_String then ?9/0 end if
string s = inode[2]
integer n = find(s,strings)
if n=0 then
strings = append(strings,s)
stringptrs = append(stringptrs,0)
n = length(strings)
end if
return n
end function
 
function gen_size(object t)
-- note: must be kept precisely in sync with gen_rec!
-- (relentlessly tested via estsize/actsize)
integer size = 0
if t!=NULL then
integer n_type = t[1]
string node_type = tkNames[n_type]
switch n_type do
case tk_Sequence:
size += gen_size(t[2])
size += gen_size(t[3])
case tk_assign:
size += gen_size(t[3])+6
case tk_Integer:
size += 5
case tk_Identifier:
size += 6
case tk_String:
size += 5
case tk_while:
-- emit: @@:<condition><topjmp(@f)><body><tailjmp(@b)>@@:
size += gen_size(t[2])+3
integer body = gen_size(t[3])
integer stail = iff(size+body+2>128?5:2)
integer stop = iff(body+stail >127?6:2)
size += stop+body+stail
case tk_lt:
case tk_le:
case tk_ne:
case tk_eq:
case tk_gt:
case tk_ge:
size += gen_size(t[2])
size += gen_size(t[3])
size += 10
case tk_add:
case tk_and:
case tk_sub:
size += gen_size(t[2])
size += gen_size(t[3])
size += 4
case tk_mul:
size += gen_size(t[2])
size += gen_size(t[3])
size += 5
case tk_div:
case tk_mod:
size += gen_size(t[2])
size += gen_size(t[3])
size += 6
case tk_putc:
case tk_Printi:
case tk_Prints:
size += gen_size(t[2])
size += 5
case tk_if:
size += gen_size(t[2])+3
if t[3][1]!=tk_if then ?9/0 end if
integer truesize = gen_size(t[3][2])
integer falsesize = gen_size(t[3][3])
integer elsejmp = iff(falsesize=0?0:iff(falsesize>127?5:2))
integer mainjmp = iff(truesize+elsejmp>127?6:2)
size += mainjmp+truesize+elsejmp+falsesize
case tk_not:
size += gen_size(t[2])
size += 9
case tk_neg:
size += gen_size(t[2])
size += 4
else:
 ?9/0
end switch
end if
return size
end function
 
procedure gen_rec(object t)
-- the recursive part of code_gen
if t!=NULL then
integer initsize = length(code)
integer estsize = gen_size(t) -- (test the gen_size function)
integer n_type = t[1]
string node_type = tkNames[n_type]
switch n_type do
case tk_Sequence:
gen_rec(t[2])
gen_rec(t[3])
case tk_assign:
integer n = var_idx(t[2])
gen_rec(t[3])
code &= {0o217,0o005,chain,1,n,0} -- pop [i]
chain = length(code)-3
case tk_Integer:
integer n = t[2]
code &= 0o150&int_to_bytes(n) -- push imm32
case tk_while:
-- emit: @@:<condition><topjmp(@f)><body><tailjmp(@b)>@@:
integer looptop = length(code)
gen_rec(t[2])
code &= {0o130, -- pop eax
0o205,0o300} -- test eax,eax
integer bodysize = gen_size(t[3])
-- can we use short jumps?
-- disclaimer: size calcs are not heavily tested; if in
-- doubt reduce 128/7 by 8, and if that works
-- then yep, you just found a boundary case.
integer stail = iff(length(code)+bodysize+4-looptop>128?5:2)
integer offset = bodysize+stail
integer stop = iff(offset>127?6:2)
if stop=2 then
code &= {0o164,offset} -- jz (short) end
else
code &= {0o017,0o204}&int_to_bytes(offset) -- jz (long) end
end if
gen_rec(t[3])
offset = looptop-(length(code)+stail)
if stail=2 then
code &= 0o353&offset -- jmp looptop (short)
else
code &= 0o351&int_to_bytes(offset) -- jmp looptop (long)
end if
case tk_lt:
case tk_le:
case tk_gt:
case tk_ge:
case tk_ne:
case tk_eq:
gen_rec(t[2])
gen_rec(t[3])
integer xrm
if n_type=tk_ne then xrm = 0o225 -- (#95)
elsif n_type=tk_lt then xrm = 0o234 -- (#9C)
elsif n_type=tk_ge then xrm = 0o235 -- (#9D)
elsif n_type=tk_le then xrm = 0o236 -- (#9E)
elsif n_type=tk_gt then xrm = 0o237 -- (#9F)
else ?9/0
end if
code &= { 0o061,0o300, -- xor eax,eax
0o132, -- pop edx
0o131, -- pop ecx
0o071,0o321, -- cmp ecx,edx
0o017,xrm,0o300, -- setcc al
0o120} -- push eax
case tk_add:
case tk_or:
case tk_and:
case tk_sub:
gen_rec(t[2])
gen_rec(t[3])
integer op = find(n_type,{tk_add,tk_or,0,0,tk_and,tk_sub})
op = 0o001 + (op-1)*0o010
code &= { 0o130, -- pop eax
op,0o004,0o044} -- add/or/and/sub [esp],eax
case tk_mul:
gen_rec(t[2])
gen_rec(t[3])
code &= { 0o131, -- pop ecx
0o130, -- pop eax
0o367,0o341, -- mul ecx
0o120} -- push eax
case tk_div:
case tk_mod:
gen_rec(t[2])
gen_rec(t[3])
integer push = 0o120+(n_type=tk_mod)*2
code &= { 0o131, -- pop ecx
0o130, -- pop eax
0o231, -- cdq (eax -> edx:eax)
0o367,0o371, -- idiv ecx
push} -- push eax|edx
case tk_Identifier:
integer n = var_idx(t)
code &= {0o377,0o065,chain,1,n,0} -- push [n]
chain = length(code)-3
case tk_putc:
case tk_Printi:
case tk_Prints:
gen_rec(t[2])
integer n = find(n_type,{tk_putc,tk_Printi,tk_Prints})
code &= {0o350,chain,3,n,0} -- call :printc/i/s
chain = length(code)-3
case tk_String:
integer n = string_idx(t)
code &= {0o150,chain,2,n,0} -- push RawStringPtr(string)
chain = length(code)-3
case tk_if:
-- emit: <condition><mainjmp><truepart>[<elsejmp><falsepart>]
gen_rec(t[2])
code &= {0o130, -- pop eax
0o205,0o300} -- test eax,eax
if t[3][1]!=tk_if then ?9/0 end if
integer truesize = gen_size(t[3][2])
integer falsesize = gen_size(t[3][3])
integer elsejmp = iff(falsesize=0?0:iff(falsesize>127?5:2))
integer offset = truesize+elsejmp
integer mainjmp = iff(offset>127?6:2)
if mainjmp=2 then
code &= {0o164,offset} -- jz (short) else/end
else
code &= {0o017,0o204}&int_to_bytes(offset) -- jz (long) else/end
end if
gen_rec(t[3][2])
if falsesize!=0 then
offset = falsesize
if elsejmp=2 then
code &= 0o353&offset -- jmp end if (short)
else
code &= 0o351&int_to_bytes(offset) -- jmp end if (long)
end if
gen_rec(t[3][3])
end if
case tk_not:
gen_rec(t[2])
code &= {0o132, -- pop edx
0o061,0o300, -- xor eax,eax
0o205,0o322, -- test edx,edx
0o017,0o224,0o300, -- setz al
0o120} -- push eax
case tk_neg:
gen_rec(t[2])
code &= {0o130, -- pop eax
0o367,0o330, -- neg eax
0o120} -- push eax
else:
error("error in code generator - found %d, expecting operator\n", {n_type})
end switch
integer actsize = length(code)
if initsize+estsize!=actsize then ?"9/0" end if -- (test gen_size)
end if
end procedure
 
global procedure code_gen(object t)
--
-- Generates proper machine code.
--
-- Example: i=10; print "\n"; print i; print "\n"
-- Result in vars, strings, chain, code (declared above)
-- where vars is: {"i"},
-- strings is {"\n"},
-- code is { 0o150,#0A,#00,#00,#00, -- 1: push 10
-- 0o217,0o005,0,1,1,0 -- 6: pop [i]
-- 0o150,8,2,1,0, -- 12: push ("\n")
-- 0o350,13,3,3,0, -- 17: call :prints
-- 0o377,0o065,18,1,1,0, -- 22: push [i]
-- 0o350,24,3,2,0, -- 28: call :printi
-- 0o150,29,2,1,0, -- 33: push ("\n")
-- 0o350,34,3,3,0, -- 38: call :prints
-- 0o303} -- 43: ret
-- and chain is 39 (->34->29->24->18->13->8->0)
-- The chain connects all places where we need an actual address before
-- the code is executed, with the byte after the link differentiating
-- between var(1), string(2), and builtin(3), and the byte after that
-- determining the instance of the given type - not that any of them
-- are actually limited to a byte in the above intermediate form, and
-- of course the trailing 0 of each {link,type,id,0} is just there to
-- reserve the space we will need.
--
gen_rec(t)
code = append(code,0o303) -- ret (0o303=#C3)
end procedure
 
include builtins/VM/puts1.e -- low-level console i/o routines
 
function setbuiltins()
atom printc,printi,prints
#ilASM{
jmp :setbuiltins
 ::printc
lea edi,[esp+4]
mov esi,1
call :%puts1ediesi -- (edi=raw text, esi=length)
ret 4
 ::printi
mov eax,[esp+4]
push 0 -- no cr
call :%putsint -- (nb limited to +/-9,999,999,999)
ret 4
 ::prints
mov edi,[esp+4]
mov esi,[edi-12]
call :%puts1ediesi -- (edi=raw text, esi=length)
ret 4
 ::setbuiltins
mov eax,:printc
lea edi,[printc]
call :%pStoreMint
mov eax,:printi
lea edi,[printi]
call :%pStoreMint
mov eax,:prints
lea edi,[prints]
call :%pStoreMint
}
return {printc,printi,prints}
end function
 
global constant builtin_names = {"printc","printi","prints"}
global constant builtins = setbuiltins()
 
global atom var_mem, code_mem
 
function RawStringPtr(integer n) -- (based on IupRawStringPtr from pGUI.e)
--
-- Returns a raw string pointer for s, somewhat like allocate_string(s), but using the existing memory.
-- NOTE: The return is only valid as long as the value passed as the parameter remains in existence.
--
atom res
string s = strings[n]
#ilASM{
mov eax,[s]
lea edi,[res]
shl eax,2
call :%pStoreMint
}
stringptrs[n] = res
return res
end function
 
global procedure fixup()
var_mem = allocate(length(vars)*4)
mem_set(var_mem,0,length(vars)*4)
code_mem = allocate(length(code))
poke(code_mem,code)
while chain!=0 do
integer this = chain
chain = code[this]
integer ftype = code[this+1]
integer id = code[this+2]
switch ftype do
case 1: -- vars
poke4(code_mem+this-1,var_mem+(id-1)*4)
case 2: -- strings
poke4(code_mem+this-1,RawStringPtr(id))
case 3: -- builtins
poke4(code_mem+this-1,builtins[id]-(code_mem+this+3))
end switch
end while
end procedure

And a simple test driver for the specific task:

--
-- demo\rosetta\Compiler\cgen.exw
-- ==============================
--
-- Generates 32-bit machine code (see note in vm.exw)
--
 
include cgen.e
 
function get_var_name(atom addr)
integer n = (addr-var_mem)/4+1
if n<1 or n>length(vars) then ?9/0 end if
return vars[n]
end function
 
function hxl(integer pc, object oh, string fmt, sequence args={})
-- helper routine to display the octal/hex bytes just decoded,
-- along with the code offset and the human-readable text.
if length(args) then fmt = sprintf(fmt,args) end if
sequence octhex = {}
atom base = code_mem+pc
integer len = 0
if integer(oh) then -- all octal
for i=1 to oh do
octhex = append(octhex,sprintf("0o%03o",peek(base)))
base += 1
end for
len = oh
else -- some octal and some hex
for i=1 to length(oh) by 2 do
for j=1 to oh[i] do
octhex = append(octhex,sprintf("0o%03o",peek(base)))
base += 1
end for
len += oh[i]
for j=1 to oh[i+1] do
octhex = append(octhex,sprintf("#%02x",peek(base)))
base += 1
end for
len += oh[i+1]
end for
end if
printf(output_file,"%4d: %-30s %s\n",{pc+1,join(octhex,","),fmt})
return len
end function
 
constant cccodes = {"o?" ,"no?","b?" ,"ae?","z" ,"ne" ,"be?","a?",
-- 0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 ,
"s?" ,"ns?","pe?","po?","l" ,"ge" ,"le" ,"g" }
-- 8 , 9 , 10 , 11 , 12 , 13 , 14 , 15
 
constant regs = {"eax","ecx","edx"} -- (others as/when needed)
 
procedure decode()
-- for a much more complete (and better organised) disassembler, see p2asm.e
integer pc = 0, -- nb 0-based
opcode, xrm
 
while pc<length(code) do
opcode = peek(code_mem+pc)
xrm = -1
switch opcode do
case 0o150:
atom vaddr = peek4s(code_mem+pc+1)
integer n = find(vaddr,stringptrs)
object arg = iff(n?enquote(strings[n])
 :sprintf("%d",vaddr))
pc += hxl(pc,{1,4},"push %s",{arg})
case 0o217:
case 0o377:
integer n = find(opcode,{0o217,0o377})
string op = {"pop","push"}[n]
xrm = peek(code_mem+pc+1)
if n!=find(xrm,{0o005,0o065}) then exit end if
atom addr = peek4u(code_mem+pc+2)
pc += hxl(pc,{2,4},"pop [%s]",{get_var_name(addr)})
case 0o061:
case 0o071:
case 0o205:
integer n = find(opcode,{0o061,0o071,0o205})
string op = {"xor","cmp","test"}[n]
xrm = peek(code_mem+pc+1)
if and_bits(xrm,0o300)!=0o300 then exit end if
string r1 = regs[and_bits(xrm,0o070)/0o010+1]
string r2 = regs[and_bits(xrm,0o007)+1]
pc += hxl(pc,2,"%s %s,%s",{op,r1,r2})
case 0o017:
xrm = peek(code_mem+pc+1)
switch xrm do
case 0o224:
case 0o225:
case 0o234:
case 0o235:
case 0o236:
case 0o237:
string cc = cccodes[and_bits(xrm,0o017)+1]
if peek(code_mem+pc+2)=0o300 then
pc += hxl(pc,3,"set%s al",{cc})
else
exit
end if
case 0o204:
integer offset = peek4s(code_mem+pc+2)
pc += hxl(pc,{2,4},"jz %d",{pc+6+offset+1})
else
exit
end switch
case 0o120:
case 0o122:
case 0o130:
case 0o131:
case 0o132:
string op = {"push","pop"}[find(and_bits(opcode,0o070),{0o020,0o030})]
string reg = regs[and_bits(opcode,0o007)+1]
pc += hxl(pc,1,"%s %s",{op,reg})
case 0o231:
pc += hxl(pc,1,"cdq")
case 0o164:
case 0o353:
string jop = iff(opcode=0o164?"jz":"jmp")
integer offset = peek1s(code_mem+pc+1)
pc += hxl(pc,{1,1},"%s %d",{jop,pc+2+offset+1})
case 0o351:
integer offset = peek4s(code_mem+pc+1)
pc += hxl(pc,{1,4},"jmp %d",{pc+5+offset+1})
case 0o303:
pc += hxl(pc,1,"ret")
case 0o350:
integer offset = peek4s(code_mem+pc+1)
atom addr = offset+code_mem+pc+5
integer n = find(addr,builtins)
pc += hxl(pc,{1,4},"call :%s",{builtin_names[n]})
case 0o001:
case 0o041:
case 0o051:
integer n = find(opcode,{0o001,0o041,0o051})
string op = {"add","and","sub"}[n]
xrm = peek(code_mem+pc+1)
switch xrm do
case 0o004:
if peek(code_mem+pc+2)=0o044 then
pc += hxl(pc,3,"%s [esp],eax",{op})
else
exit
end if
else
exit
end switch
case 0o367:
xrm = peek(code_mem+pc+1)
if and_bits(xrm,0o300)!=0o300 then exit end if
integer n = find(and_bits(xrm,0o070),{0o030,0o040,0o070})
if n=0 then exit end if
string op = {"neg","mul","idiv"}[n]
string reg = regs[and_bits(xrm,0o007)+1]
pc += hxl(pc,2,"%s %s",{op,reg})
else
exit
end switch
end while
if pc<length(code) then
 ?"incomplete:"
if xrm=-1 then
 ?{pc+1,sprintf("0o%03o",opcode)}
else
 ?{pc+1,sprintf("0o%03o 0o%03o",{opcode,xrm})}
end if
end if
end procedure
 
procedure main(sequence cl)
open_files(cl)
toks = lex()
object t = parse()
code_gen(t)
fixup()
decode()
free({var_mem,code_mem})
close_files()
end procedure
 
--main(command_line())
main({0,0,"gcd.c"})
Output:
   1: 0o150,#2F,#04,#00,#00          push 1071
   6: 0o217,0o005,#70,#BE,#73,#00    pop [a]
  12: 0o150,#05,#04,#00,#00          push 1029
  17: 0o217,0o005,#74,#BE,#73,#00    pop [b]
  23: 0o377,0o065,#74,#BE,#73,#00    push [b]
  29: 0o150,#00,#00,#00,#00          push 0
  34: 0o061,0o300                    xor eax,eax
  36: 0o132                          pop edx
  37: 0o131                          pop ecx
  38: 0o071,0o321                    cmp edx,ecx
  40: 0o017,0o225,0o300              setne al
  43: 0o120                          push eax
  44: 0o130                          pop eax
  45: 0o205,0o300                    test eax,eax
  47: 0o164,#32                      jz 99
  49: 0o377,0o065,#74,#BE,#73,#00    push [b]
  55: 0o217,0o005,#78,#BE,#73,#00    pop [new_a]
  61: 0o377,0o065,#70,#BE,#73,#00    push [a]
  67: 0o377,0o065,#74,#BE,#73,#00    push [b]
  73: 0o131                          pop ecx
  74: 0o130                          pop eax
  75: 0o231                          cdq
  76: 0o367,0o371                    idiv ecx
  78: 0o122                          push edx
  79: 0o217,0o005,#74,#BE,#73,#00    pop [b]
  85: 0o377,0o065,#78,#BE,#73,#00    push [new_a]
  91: 0o217,0o005,#70,#BE,#73,#00    pop [a]
  97: 0o353,#B4                      jmp 23
  99: 0o377,0o065,#70,#BE,#73,#00    push [a]
 105: 0o350,#2F,#49,#0B,#00          call :printi
 110: 0o303                          ret

Python[edit]

Tested with Python 2.7 and 3.x

from __future__ import print_function
import sys, struct, shlex, operator
 
nd_Ident, nd_String, nd_Integer, nd_Sequence, nd_If, nd_Prtc, nd_Prts, nd_Prti, nd_While, \
nd_Assign, nd_Negate, nd_Not, nd_Mul, nd_Div, nd_Mod, nd_Add, nd_Sub, nd_Lss, nd_Leq, \
nd_Gtr, nd_Geq, nd_Eql, nd_Neq, nd_And, nd_Or = range(25)
 
all_syms = {
"Identifier"  : nd_Ident, "String"  : nd_String,
"Integer"  : nd_Integer, "Sequence"  : nd_Sequence,
"If"  : nd_If, "Prtc"  : nd_Prtc,
"Prts"  : nd_Prts, "Prti"  : nd_Prti,
"While"  : nd_While, "Assign"  : nd_Assign,
"Negate"  : nd_Negate, "Not"  : nd_Not,
"Multiply"  : nd_Mul, "Divide"  : nd_Div,
"Mod"  : nd_Mod, "Add"  : nd_Add,
"Subtract"  : nd_Sub, "Less"  : nd_Lss,
"LessEqual"  : nd_Leq, "Greater"  : nd_Gtr,
"GreaterEqual": nd_Geq, "Equal"  : nd_Eql,
"NotEqual"  : nd_Neq, "And"  : nd_And,
"Or"  : nd_Or}
 
FETCH, STORE, PUSH, ADD, SUB, MUL, DIV, MOD, LT, GT, LE, GE, EQ, NE, AND, OR, NEG, NOT, \
JMP, JZ, PRTC, PRTS, PRTI, HALT = range(24)
 
operators = {nd_Lss: LT, nd_Gtr: GT, nd_Leq: LE, nd_Geq: GE, nd_Eql: EQ, nd_Neq: NE,
nd_And: AND, nd_Or: OR, nd_Sub: SUB, nd_Add: ADD, nd_Div: DIV, nd_Mul: MUL, nd_Mod: MOD}
 
unary_operators = {nd_Negate: NEG, nd_Not: NOT}
 
input_file = None
code = bytearray()
string_pool = {}
globals = {}
string_n = 0
globals_n = 0
word_size = 4
 
#*** show error and exit
def error(msg):
print("%s" % (msg))
exit(1)
 
def int_to_bytes(val):
return struct.pack("<i", val)
 
def bytes_to_int(bstr):
return struct.unpack("<i", bstr)
 
class Node:
def __init__(self, node_type, left = None, right = None, value = None):
self.node_type = node_type
self.left = left
self.right = right
self.value = value
 
#***
def make_node(oper, left, right = None):
return Node(oper, left, right)
 
#***
def make_leaf(oper, n):
return Node(oper, value = n)
 
#***
def emit_byte(x):
code.append(x)
 
#***
def emit_word(x):
s = int_to_bytes(x)
for x in s:
code.append(x)
 
def emit_word_at(at, n):
code[at:at+word_size] = int_to_bytes(n)
 
def hole():
t = len(code)
emit_word(0)
return t
 
#***
def fetch_var_offset(name):
global globals_n
 
n = globals.get(name, None)
if n == None:
globals[name] = globals_n
n = globals_n
globals_n += 1
return n
 
#***
def fetch_string_offset(the_string):
global string_n
 
n = string_pool.get(the_string, None)
if n == None:
string_pool[the_string] = string_n
n = string_n
string_n += 1
return n
 
#***
def code_gen(x):
if x == None: return
elif x.node_type == nd_Ident:
emit_byte(FETCH)
n = fetch_var_offset(x.value)
emit_word(n)
elif x.node_type == nd_Integer:
emit_byte(PUSH)
emit_word(x.value)
elif x.node_type == nd_String:
emit_byte(PUSH)
n = fetch_string_offset(x.value)
emit_word(n)
elif x.node_type == nd_Assign:
n = fetch_var_offset(x.left.value)
code_gen(x.right)
emit_byte(STORE)
emit_word(n)
elif x.node_type == nd_If:
code_gen(x.left) # expr
emit_byte(JZ) # if false, jump
p1 = hole() # make room for jump dest
code_gen(x.right.left) # if true statements
if (x.right.right != None):
emit_byte(JMP) # jump over else statements
p2 = hole()
emit_word_at(p1, len(code) - p1)
if (x.right.right != None):
code_gen(x.right.right) # else statements
emit_word_at(p2, len(code) - p2)
elif x.node_type == nd_While:
p1 = len(code)
code_gen(x.left)
emit_byte(JZ)
p2 = hole()
code_gen(x.right)
emit_byte(JMP) # jump back to the top
emit_word(p1 - len(code))
emit_word_at(p2, len(code) - p2)
elif x.node_type == nd_Sequence:
code_gen(x.left)
code_gen(x.right)
elif x.node_type == nd_Prtc:
code_gen(x.left)
emit_byte(PRTC)
elif x.node_type == nd_Prti:
code_gen(x.left)
emit_byte(PRTI)
elif x.node_type == nd_Prts:
code_gen(x.left)
emit_byte(PRTS)
elif x.node_type in operators:
code_gen(x.left)
code_gen(x.right)
emit_byte(operators[x.node_type])
elif x.node_type in unary_operators:
code_gen(x.left)
emit_byte(unary_operators[x.node_type])
else:
error("error in code generator - found %d, expecting operator" % (x.node_type))
 
#***
def code_finish():
emit_byte(HALT)
 
#***
def list_code():
print("Datasize: %d Strings: %d" % (len(globals), len(string_pool)))
 
for k in sorted(string_pool, key=string_pool.get):
print(k)
 
pc = 0
while pc < len(code):
print("%4d " % (pc), end='')
op = code[pc]
pc += 1
if op == FETCH:
x = bytes_to_int(code[pc:pc+word_size])[0]
print("fetch [%d]" % (x));
pc += word_size
elif op == STORE:
x = bytes_to_int(code[pc:pc+word_size])[0]
print("store [%d]" % (x));
pc += word_size
elif op == PUSH:
x = bytes_to_int(code[pc:pc+word_size])[0]
print("push  %d" % (x));
pc += word_size
elif op == ADD: print("add")
elif op == SUB: print("sub")
elif op == MUL: print("mul")
elif op == DIV: print("div")
elif op == MOD: print("mod")
elif op == LT: print("lt")
elif op == GT: print("gt")
elif op == LE: print("le")
elif op == GE: print("ge")
elif op == EQ: print("eq")
elif op == NE: print("ne")
elif op == AND: print("and")
elif op == OR: print("or")
elif op == NEG: print("neg")
elif op == NOT: print("not")
elif op == JMP:
x = bytes_to_int(code[pc:pc+word_size])[0]
print("jmp (%d) %d" % (x, pc + x));
pc += word_size
elif op == JZ:
x = bytes_to_int(code[pc:pc+word_size])[0]
print("jz (%d) %d" % (x, pc + x));
pc += word_size
elif op == PRTC: print("prtc")
elif op == PRTI: print("prti")
elif op == PRTS: print("prts")
elif op == HALT: print("halt")
else: error("list_code: Unknown opcode %d", (op));
 
def load_ast():
line = input_file.readline()
line_list = shlex.split(line, False, False)
 
text = line_list[0]
if text == ";":
return None
node_type = all_syms[text]
 
if len(line_list) > 1:
value = line_list[1]
if value.isdigit():
value = int(value)
return make_leaf(node_type, value)
 
left = load_ast()
right = load_ast()
return make_node(node_type, left, right)
 
#*** main driver
input_file = sys.stdin
if len(sys.argv) > 1:
try:
input_file = open(sys.argv[1], "r", 4096)
except IOError as e:
error("Can't open %s" % sys.argv[1])
 
n = load_ast()
code_gen(n)
code_finish()
list_code()
Output  —  While counter example:

Datasize: 1 Strings: 2
"count is: "
"\n"
   0 push  1
   5 store [0]
  10 fetch [0]
  15 push  10
  20 lt
  21 jz     (43) 65
  26 push  0
  31 prts
  32 fetch [0]
  37 prti
  38 push  1
  43 prts
  44 fetch [0]
  49 push  1
  54 add
  55 store [0]
  60 jmp    (-51) 10
  65 halt

Scheme[edit]

 
(import (scheme base)
(scheme file)
(scheme process-context)
(scheme write)
(only (srfi 1) delete-duplicates list-index)
(only (srfi 13) string-delete string-index string-trim))
 
(define *names* '((Add add) (Subtract sub) (Multiply mul) (Divide div) (Mod mod)
(Less lt) (Greater gt) (LessEqual le) (GreaterEqual ge)
(Equal eq) (NotEqual ne) (And and) (Or or) (Negate neg)
(Not not) (Prts prts) (Prti prti) (Prtc prtc)))
 
(define (change-name name)
(if (assq name *names*)
(cdr (assq name *names*))
(error "Cannot find name" name)))
 
;; Read AST from given filename
;; - return as an s-expression
(define (read-code filename)
(define (read-expr)
(let ((line (string-trim (read-line))))
(if (string=? line ";")
'()
(let ((space (string-index line #\space)))
(if space
(list (string->symbol (string-trim (substring line 0 space)))
(string-trim (substring line space (string-length line))))
(list (string->symbol line) (read-expr) (read-expr)))))))
;
(with-input-from-file filename (lambda () (read-expr))))
 
;; run a three-pass assembler
(define (generate-code ast)
(define new-address ; create a new unique address - for jump locations
(let ((count 0))
(lambda ()
(set! count (+ 1 count))
(string->symbol (string-append "loc-" (number->string count))))))
; define some names for fields
(define left cadr)
(define right (lambda (x) (cadr (cdr x))))
;
(define (extract-values ast)
(if (null? ast)
(values '() '())
(case (car ast)
((Integer)
(values '() '()))
((Negate Not Prtc Prti Prts)
(extract-values (left ast)))
((Assign Add Subtract Multiply Divide Mod Less Greater LessEqual GreaterEqual
Equal NotEqual And Or If While Sequence)
(let-values (((a b) (extract-values (left ast)))
((c d) (extract-values (right ast))))
(values (delete-duplicates (append a c) string=?)
(delete-duplicates (append b d) string=?))))
((String)
(values '() (list (left ast))))
((Identifier)
(values (list (left ast)) '())))))
;
(let-values (((constants strings) (extract-values ast)))
(define (constant-idx term)
(list-index (lambda (s) (string=? s term)) constants))
(define (string-idx term)
(list-index (lambda (s) (string=? s term)) strings))
;
(define (pass-1 ast asm) ; translates ast into a list of basic operations
(if (null? ast)
asm
(case (car ast)
((Integer)
(cons (list 'push (left ast)) asm))
((Identifier)
(cons (list 'fetch (constant-idx (left ast))) asm))
((String)
(cons (list 'push (string-idx (left ast))) asm))
((Assign)
(cons (list 'store (constant-idx (left (left ast)))) (pass-1 (right ast) asm)))
((Add Subtract Multiply Divide Mod Less Greater LessEqual GreaterEqual
Equal NotEqual And Or) ; binary operators
(cons (change-name (car ast))
(pass-1 (right ast) (pass-1 (left ast) asm))))
((Negate Not Prtc Prti Prts) ; unary operations
(cons (change-name (car ast))
(pass-1 (left ast) asm)))
((If)
(let ((label-else (new-address))
(label-end (new-address)))
(if (null? (right (right ast)))
(cons (list 'label label-end) ; label for end of if statement
(pass-1 (left (right ast)) ; output the 'then block
(cons (list 'jz label-end) ; jump to end when test is false
(pass-1 (left ast) asm))))
(cons (list 'label label-end) ; label for end of if statement
(pass-1 (right (right ast)) ; output the 'else block
(cons (list 'label label-else)
(cons (list 'jmp label-end) ; jump past 'else, after 'then
(pass-1 (left (right ast)) ; output the 'then block
(cons (list 'jz label-else) ; jumpt to else when false
(pass-1 (left ast) asm))))))))))
((While)
(let ((label-test (new-address))
(label-end (new-address)))
(cons (list 'label label-end) ; introduce a label for end of while block
(cons (list 'jmp label-test) ; jump back to repeat test
(pass-1 (right ast) ; output the block
(cons (list 'jz label-end) ; test failed, jump around block
(pass-1 (left ast) ; output the test
(cons (list 'label label-test) ; introduce a label for test
asm))))))))
((Sequence)
(pass-1 (right ast) (pass-1 (left ast) asm)))
(else
"Unknown token type"))))
;
(define (pass-2 asm) ; adds addresses and fills in jump locations
(define (fill-addresses)
(let ((addr 0))
(map (lambda (instr)
(let ((res (cons addr instr)))
(unless (eq? (car instr) 'label)
(set! addr (+ addr (if (= 1 (length instr)) 1 5))))
res))
asm)))
;
(define (extract-labels asm)
(let ((labels '()))
(for-each (lambda (instr)
(when (eq? (cadr instr) 'label)
(set! labels (cons (cons (cadr (cdr instr)) (car instr))
labels))))
asm)
labels))
;
(define (add-jump-locations asm labels rec)
(cond ((null? asm)
(reverse rec))
((eq? (cadr (car asm)) 'label) ; ignore the labels
(add-jump-locations (cdr asm) labels rec))
((memq (cadr (car asm)) '(jmp jz)) ; replace labels with addresses for jumps
(add-jump-locations (cdr asm)
labels
(cons (list (car (car asm)) ; previous address
(cadr (car asm)) ; previous jump type
(cdr (assq (cadr (cdar asm)) labels))) ; actual address
rec)))
(else
(add-jump-locations (cdr asm) labels (cons (car asm) rec)))))
;
(let ((asm+addr (fill-addresses)))
(add-jump-locations asm+addr (extract-labels asm+addr) '())))
;
(define (output-instruction instr)
(display (number->string (car instr))) (display #\tab)
(display (cadr instr)) (display #\tab)
(case (cadr instr)
((fetch store)
(display "[") (display (number->string (cadr (cdr instr)))) (display "]\n"))
((jmp jz)
(display
(string-append "("
(number->string (- (cadr (cdr instr)) (car instr) 1))
")"))
(display #\tab)
(display (number->string (cadr (cdr instr)))) (newline))
((push)
(display (cadr (cdr instr))) (newline))
(else
(newline))))
; generate the code and output to stdout
(display
(string-append "Datasize: "
(number->string (length constants))
" Strings: "
(number->string (length strings))))
(newline)
(for-each (lambda (str) (display str) (newline))
strings)
(for-each output-instruction
(pass-2 (reverse (cons (list 'halt) (pass-1 ast '())))))))
 
;; read AST from file and output code to stdout
(if (= 2 (length (command-line)))
(generate-code (read-code (cadr (command-line))))
(display "Error: pass an ast filename\n"))
 

Tested on all examples in Compiler/Sample programs.

zkl[edit]

Translation of: Python
// This is a little endian machine
 
const WORD_SIZE=4;
const{ var _n=-1; var[proxy]N=fcn{ _n+=1 }; } // enumerator
const FETCH=N, STORE=N, PUSH=N, ADD=N, SUB=N, MUL=N, DIV=N, MOD=N,
LT=N, GT=N, LE=N, GE=N, EQ=N, NE=N,
AND=N, OR=N, NEG=N, NOT=N,
JMP=N, JZ=N, PRTC=N, PRTS=N, PRTI=N, HALT=N;
const nd_String=N, nd_Sequence=N, nd_If=N, nd_While=N;
var all_syms=Dictionary(
"Identifier" ,FETCH, "String" ,nd_String,
"Integer" ,PUSH, "Sequence" ,nd_Sequence,
"If" ,nd_If, "Prtc" ,PRTC,
"Prts" ,PRTS, "Prti" ,PRTI,
"While" ,nd_While, "Assign" ,STORE,
"Negate" ,NEG, "Not" ,NOT,
"Multiply" ,MUL, "Divide" ,DIV,
"Mod" ,MOD, "Add" ,ADD,
"Subtract" ,SUB, "Less" ,LT,
"LessEqual" ,LE, "Greater" ,GT,
"GreaterEqual",GE, "Equal" ,EQ,
"NotEqual" ,NE, "And" ,AND,
"Or" ,OR, "halt" ,HALT);
var binOps=T(LT,GT,LE,GE,EQ,NE, AND,OR, SUB,ADD,DIV,MUL,MOD),
unaryOps=T(NEG,NOT);
 
class Node{
fcn init(_node_type, _value, _left=Void, _right=Void){
var type=_node_type, left=_left, right=_right, value=_value;
}
}
 
var vars=Dictionary(), strings=Dictionary(); // ( value:offset, ...)
fcn doVar(value){
var offset=-1; // fcn local static var
offset=_doValue(value,vars,offset)
}
fcn doString(str){ str=str[1,-1]; // str is \"text\"
var offset=-1; // fcn local static var
str=str.replace("\\n","\n");
offset=_doValue(str,strings,offset)
}
fcn _doValue(value,vars,offset){ //--> offset of value in vars
if(Void!=(n:=vars.find(value))) return(n); // fetch existing value
vars[value]=offset+=1; // store new value
}
 
fcn asm(node,code){
if(Void==node) return(code);
emitB:='wrap(n){ code.append(n) };
emitW:='wrap(n){ code.append(n.toLittleEndian(WORD_SIZE)) }; // signed
switch(node.type){
case(FETCH) { emitB(FETCH); emitW(doVar(node.value)); }
case(PUSH) { emitB(PUSH); emitW(node.value); }
case(nd_String){ emitB(PUSH); emitW(doString(node.value)); }
case(STORE){
asm(node.right,code);
emitB(STORE); emitW(doVar(node.left.value));
}
case(nd_If){
asm(node.left,code); # expr
emitB(JZ); # if false, jump
p1,p2 := code.len(),0;
emitW(0); # place holder for jump dest
asm(node.right.left,code); # if true statements
if (node.right.right!=Void){
emitB(JMP); # jump over else statements
p2=code.len();
emitW(0);
}
code[p1,WORD_SIZE]=(code.len() - p1).toLittleEndian(WORD_SIZE);
if(node.right.right!=Void){
asm(node.right.right,code); # else statements
code[p2,WORD_SIZE]=(code.len() - p2).toLittleEndian(WORD_SIZE)
}
}
case(nd_While){
p1:=code.len();
asm(node.left,code);
emitB(JZ);
p2:=code.len();
emitW(0); # place holder
asm(node.right,code);
emitB(JMP); # jump back to the top
emitW(p1 - code.len());
code[p2,WORD_SIZE]=(code.len() - p2).toLittleEndian(WORD_SIZE);
}
case(nd_Sequence){ asm(node.left,code); asm(node.right,code); }
case(PRTC,PRTI,PRTS){ asm(node.left,code); emitB(node.type); }
else{
if(binOps.holds(node.type)){
asm(node.left,code); asm(node.right,code);
emitB(node.type);
}
else if(unaryOps.holds(node.type))
{ asm(node.left,code); emitB(node.type); }
else throw(Exception.AssertionError(
"error in code generator - found %d, expecting operator"
.fmt(node.type)))
}
}
code
}
fcn code_finish(code){
code.append(HALT);
// prepend the strings to the code,
// using my magic [66,1 byte len,text], no trailing '\0' needed
idxs:=strings.pump(Dictionary(),"reverse");
idxs.keys.sort().reverse().pump(Void,'wrap(n){
text:=idxs[n];
code.insert(0,66,text.len(),text);
})
}
fcn unasm(code){
all_ops,nthString := all_syms.pump(Dictionary(),"reverse"),-1;
println("Datasize: %d bytes, Strings: %d bytes"
.fmt(vars.len()*WORD_SIZE,strings.reduce(fcn(s,[(k,v)]){ s+k.len() },0)));
word:='wrap(pc){ code.toLittleEndian(pc,WORD_SIZE,False) }; // signed
pc:=0; while(pc<code.len()){
op:=code[pc]; print("%4d: %2d ".fmt(pc,op));
pc+=1;
switch(op){
case(66){
n,str := code[pc], code[pc+=1,n].text;
println("String #%d %3d \"%s\"".fmt(nthString+=1,n,
Compiler.Asm.quotify(str)));
pc+=n;
}
case(FETCH,STORE,PUSH){
println("%s [%d]".fmt(all_ops[op],word(pc)));
pc+=WORD_SIZE;
}
case(ADD,SUB,MUL,DIV,MOD,LT,GT,LE,GE,EQ,NE,AND,OR,NEG,NOT,
PRTC,PRTI,PRTS,HALT){ println(all_ops[op]) }
case(JMP){
n:=word(pc);
println("jmp (%d) %d".fmt(n, pc + n));
pc+=WORD_SIZE;
}
case(JZ){
n:=word(pc);
println("jz (%d) %d".fmt(n, pc + n));
pc+=WORD_SIZE;
}
else throw(Exception.AssertionError("Unknown opcode %d".fmt(op)));
}
}
}
fcn load_ast(file){
line:=file.readln().strip(); // one or two tokens
if(line[0]==";") return(Void);
parts,type,value := line.split(),parts[0],parts[1,*].concat(" ");
type=all_syms[type];
if(value){
try{ value=value.toInt() }catch{}
return(Node(type,value));
}
left,right := load_ast(file),load_ast(file);
Node(type,Void,left,right)
}
ast:=load_ast(File(vm.nthArg(0)));
code:=asm(ast,Data());
code_finish(code);
unasm(code);
File("code.bin","wb").write(code);
println("Wrote %d bytes to code.bin".fmt(code.len()));

File ast.txt is the text at the start of this task.

Output:
$ zkl codeGen.zkl ast.txt 
Datasize: 4 bytes, Strings: 11 bytes
   0: 66 String #0  10 "\ncount is:"
  12: 66 String #1   1 "\n"
  15:  2 Integer [1]
  20:  1 Assign [0]
  25:  0 Identifier [0]
  30:  2 Integer [10]
  35:  8 LessEqual
  36: 19 jz     (43) 80
  41:  2 Integer [0]
  46: 21 Prts
  47:  0 Identifier [0]
  52: 22 Prti
  53:  2 Integer [1]
  58: 21 Prts
  59:  0 Identifier [0]
  64:  2 Integer [1]
  69:  3 Add
  70:  1 Assign [0]
  75: 18 jmp    (-51) 25
  80: 23 halt
Wrote 81 bytes to code.bin

$ zkl hexDump code1.bin 
   0: 42 0a 63 6f 75 6e 74 20 | 69 73 3a 20 42 01 0a 02   B.count is: B...
  16: 01 00 00 00 01 00 00 00 | 00 00 00 00 00 00 02 0a   ................
  32: 00 00 00 08 13 2b 00 00 | 00 02 00 00 00 00 15 00   .....+..........
  48: 00 00 00 00 16 02 01 00 | 00 00 15 00 00 00 00 00   ................
  64: 02 01 00 00 00 03 01 00 | 00 00 00 12 cd ff ff ff   ................
  80: 17