CloudFlare suffered a massive security issue affecting all of its customers, including Rosetta Code. All passwords not changed since February 19th 2017 have been expired, and session cookie longevity will be reduced until late March.--Michael Mol (talk) 05:15, 25 February 2017 (UTC)

Compiler/virtual machine interpreter

From Rosetta Code
Task
Compiler/virtual machine interpreter
You are encouraged to solve this task according to the task description, using any language you may know.
Virtual Machine Interpreter

A virtual machine implements a computer in software.

Task[edit]

Write a virtual machine interpreter. This interpreter should be able to run virtual assembly language programs created via the task. This is a byte-coded, 32-bit word stack based virtual machine.

The program should read input from a file and/or stdin, and write output to a file and/or stdout.

Input format:

Given the following program:

count = 1;
while (count < 10) {
    print("count is: ", count, "\n");
    count = count + 1;
}

The output from the Code generator is a virtual assembly code program:

Output from gen, input to VM
Datasize: 1 Strings: 2
"count is: "
"\n"
    0 push  1
    5 store [0]
   10 fetch [0]
   15 push  10
   20 lt
   21 jz     (43) 65
   26 push  0
   31 prts
   32 fetch [0]
   37 prti
   38 push  1
   43 prts
   44 fetch [0]
   49 push  1
   54 add
   55 store [0]
   60 jmp    (-51) 10
   65 halt

The first line of the input specifies the datasize required and the number of constant strings, in the order that they are reference via the code.

The data can be stored in a separate array, or the data can be stored at the beginning of the stack. Data is addressed starting at 0. If there are 3 variables, the 3rd one if referenced at address 2.

If there are one or more constant strings, they come next. The code refers to these strings by their index. The index starts at 0. So if there are 3 strings, and the code wants to reference the 3rd string, 2 will be used.

Next comes the actual virtual assembly code. The first number is the code address of that instruction. After that is the instruction mnemonic, followed by optional operands, depending on the instruction.

Registers:

sp:

   the stack pointer - points to the next top of stack.  The stack is a 32-bit integer
   array.

pc:

   the program counter - points to the current instruction to be performed.  The code is an
   array of bytes.

Data:

   data
   string pool

Instructions:

Each instruction is one byte. The following instructions also have a 32-bit integer operand:

fetch [index]

where index is an index into the data array.

store [index]

where index is an index into the data array.

push n

where value is a 32-bit integer that will be pushed onto the stack.

jmp (n) addr

where (n) is a 32-bit integer specifying the distance between the current location and the desired location. addr is an unsigned value of the actual code address.

jz (n) addr

where (n) is a 32-bit integer specifying the distance between the current location and the desired location. addr is an unsigned value of the actual code address.

The following instructions do not have an operand. They perform their operation directly against the stack:

For the following instructions, the operation is performed against the top two entries in the stack:

add
sub
mul
div
mod
lt
gt
le
ge
eq
ne
and
or

For the following instructions, the operation is performed against the top entry in the stack:

neg
not

Print the word at stack top as a character.

prtc

Print the word at stack top as an integer.

prti

Stack top points to an index into the string pool. Print that entry.

prts

Unconditional stop.

halt
A simple example virtual machine
def run_vm(data_size)
int stack[data_size + 1000]
set stack[0..data_size - 1] to 0
int pc = 0
while True:
op = code[pc]
pc += 1
 
if op == FETCH:
stack.append(stack[bytes_to_int(code[pc:pc+word_size])[0]]);
pc += word_size
elif op == STORE:
stack[bytes_to_int(code[pc:pc+word_size])[0]] = stack.pop();
pc += word_size
elif op == PUSH:
stack.append(bytes_to_int(code[pc:pc+word_size])[0]);
pc += word_size
elif op == ADD: stack[-2] += stack[-1]; stack.pop()
elif op == SUB: stack[-2] -= stack[-1]; stack.pop()
elif op == MUL: stack[-2] *= stack[-1]; stack.pop()
elif op == DIV: stack[-2] /= stack[-1]; stack.pop()
elif op == MOD: stack[-2] %= stack[-1]; stack.pop()
elif op == LT: stack[-2] = stack[-2] < stack[-1]; stack.pop()
elif op == GT: stack[-2] = stack[-2] > stack[-1]; stack.pop()
elif op == LE: stack[-2] = stack[-2] <= stack[-1]; stack.pop()
elif op == GE: stack[-2] = stack[-2] >= stack[-1]; stack.pop()
elif op == EQ: stack[-2] = stack[-2] == stack[-1]; stack.pop()
elif op == NE: stack[-2] = stack[-2] != stack[-1]; stack.pop()
elif op == AND: stack[-2] = stack[-2] and stack[-1]; stack.pop()
elif op == OR: stack[-2] = stack[-2] or stack[-1]; stack.pop()
elif op == NEG: stack[-1] = -stack[-1]
elif op == NOT: stack[-1] = not stack[-1]
elif op == JMP: pc += bytes_to_int(code[pc:pc+word_size])[0]
elif op == JZ: if stack.pop() then pc += word_size else pc += bytes_to_int(code[pc:pc+word_size])[0]
elif op == PRTC: print stack[-1] as a character; stack.pop()
elif op == PRTS: print the constant string referred to by stack[-1]; stack.pop()
elif op == PRTI: print stack[-1] as an integer; stack.pop()
elif op == HALT: break
Additional examples

Your solution should pass all the test cases above and the additional tests found Here.

Reference

The C and Python versions can be considered reference implementations.

Related Tasks

ALGOL W[edit]

begin % virtual machine interpreter %
 % string literals %
string(256) array stringValue ( 0 :: 256 );
integer array stringLength ( 0 :: 256 );
integer MAX_STRINGS;
 % op codes %
integer oFetch, oStore, oPush
, oAdd, oSub, oMul, oDiv, oMod, oLt, oGt, oLe, oGe, oEq, oNe
, oAnd, oOr, oNeg, oNot, oJmp, oJz, oPrtc, oPrts, oPrti, oHalt
 ;
string(6) array opName ( 1 :: 24 );
integer OP_MAX;
 % code %
string(1) array byteCode ( 0 :: 4096 );
integer nextLocation, MAX_LOCATION;
 % data %
integer array data ( 0 :: 4096 );
integer dataSize, MAX_DATA, MAX_STACK;
 % tracing %
logical trace;
 
 % reports an error and stops %
procedure rtError( string(80) value message ); begin
integer errorPos;
write( s_w := 0, "**** Runtime error: " );
errorPos := 0;
while errorPos < 80 and message( errorPos // 1 ) not = "." do begin
writeon( s_w := 0, message( errorPos // 1 ) );
errorPos := errorPos + 1
end while_not_at_end_of_message ;
writeon( s_w := 0, "." );
assert( false )
end genError ;
 
oFetch := 1; opName( oFetch ) := "fetch"; oStore := 2; opName( oStore ) := "store"; oPush := 3; opName( oPush ) := "push";
oAdd  := 4; opName( oAdd ) := "add"; oSub  := 5; opName( oSub ) := "sub"; oMul  := 6; opName( oMul ) := "mul";
oDiv  := 7; opName( oDiv ) := "div"; oMod  := 8; opName( oMod ) := "mod"; oLt  := 9; opName( oLt ) := "lt";
oGt  := 10; opName( oGt ) := "gt"; oLe  := 11; opName( oLe ) := "le"; oGe  := 12; opName( oGe ) := "ge";
oEq  := 13; opName( oEq ) := "eq"; oNe  := 14; opName( oNe ) := "ne"; oAnd  := 15; opName( oAnd ) := "and";
oOr  := 16; opName( oOr ) := "or"; oNeg  := 17; opName( oNeg ) := "neg"; oNot  := 18; opName( oNot ) := "not";
oJmp  := 19; opName( oJmp ) := "jmp"; oJz  := 20; opName( oJz ) := "jz"; oPrtc := 21; opName( oPrtc ) := "prtc";
oPrts  := 22; opName( oPrts ) := "prts"; oPrti  := 23; opName( oPrti ) := "prti"; oHalt := 24; opName( oHalt ) := "halt";
OP_MAX := oHalt;
 
trace  := false;
MAX_STACK  := 256;
MAX_LOCATION := 4096;
for pc := 0 until MAX_LOCATION do byteCode( pc ) := code( 0 );
MAX_DATA := 4096;
for dPos := 0 until MAX_DATA do data( dPos ) := 0;
MAX_STRINGS := 256;
for sPos := 0 until MAX_STRINGS do begin
stringValue( sPos ) := " ";
stringLength( sPos ) := 0
end for_sPos ;
 
 % load thge output from syntaxc analyser %
begin % readCode %
 
 % skips spaces on the source line %
procedure skipSpaces ; begin
while line( lPos // 1 ) = " " do lPos := lPos + 1
end skipSpaces ;
 
 % parses a string from line and stores it in the string literals table %
procedure readString ( integer value stringNumber ) ; begin
string(256) str;
integer sLen;
str  := " ";
sLen := 0;
lPos := lPos + 1; % skip the opening double-quote %
while lPos <= 255 and line( lPos // 1 ) not = """" do begin
str( sLen // 1 ) := line( lPos // 1 );
sLen := sLen + 1;
lPos := lPos + 1
end while_more_string ;
if lPos > 255 then rtError( "Unterminated String." );
 % store the string %
stringValue( stringNumber ) := str;
stringLength( stringNumber ) := sLen
end readString ;
 
 % gets an integer from the line - checks for valid digits %
integer procedure readInteger ; begin
integer n;
skipSpaces;
n := 0;
while line( lPos // 1 ) >= "0" and line( lPos // 1 ) <= "9" do begin
n  := ( n * 10 ) + ( decode( line( lPos // 1 ) ) - decode( "0" ) );
lPos := lPos + 1
end while_not_end_of_integer ;
n
end readInteger ;
 
 % reads the next line from standard input %
procedure readALine ; begin
lPos := 0;
readcard( line );
if trace then write( s_w := 0, ">> ", line( 0 // 32 ) )
end readALine ;
 
 % loads an instruction from the current source line %
procedure loadCodeFromLine ; begin
integer pc, opCode, operand, oPos;
string(256) op;
logical haveOperand;
 % get the code location %
pc := readInteger;
if pc > MAX_LOCATION then rtError( "Code too large." );
 % get the opCode %
skipSpaces;
oPos := 0;
op := " ";
while lPos <= 255 and line( lPos // 1 ) not = " " do begin
op( oPos // 1 ) := line( lPos // 1 );
oPos := oPos + 1;
lPos := lPos + 1
end while_more_opName ;
 % lookup the op code %
opCode := 0;
oPos  := 1;
while oPos <= OP_MAX and opCode = 0 do begin
if opName( oPos ) = op then opCode := oPos
else oPos  := oPos + 1
end while_op_not_found ;
if opCode = 0 then rtError( "Unknown op code." );
 % get the operand if there is one %
operand  := 0;
haveOperand := false;
if opCode = oFetch or opCode = oStore then begin
 % fetch or store - operand is enclosed in square brackets %
skipSpaces;
if line( lPos // 1 ) not = "[" then rtError( """["" expected after fetch/store." );
lPos  := lPos + 1;
operand  := readInteger;
if operand > dataSize then rtError( "fetch/store address out of range." );
haveOperand := true
end
else if opCode = oPush then begin
 % push integer literal instruction %
operand  := readInteger;
haveOperand := true
end
else if opCode = oJmp or opCode = oJz then begin
 % jump - the operand is the relative address enclosed in parenthesis %
 % followed by the absolute address - we use the absolute address so  %
 % the opewrand will be >= 0 %
skipSpaces;
if line( lPos // 1 ) not = "(" then rtError( """("" expected after jmp/jz." );
lPos  := lPos + 1;
if line( lPos // 1 ) = "-" then % negative relative address % lPos := lPos + 1;
operand  := readInteger;
if line( lPos // 1 ) not = ")" then rtError( """)"" expected after jmp/jz." );
lPos  := lPos + 1;
operand  := readInteger;
haveOperand := true
end if_various_opcodes ;
 % store the code %
byteCode( pc ) := code( opCode );
if haveOperand then begin
 % have an operand for the op code %
if ( pc + 4 ) > MAX_LOCATION then rtError( "Code too large." );
for oPos := 1 until 4 do begin
pc := pc + 1;
byteCode( pc ) := code( operand rem 256 );
operand := operand div 256;
end for_oPos
end if_have_operand ;
end loadCodeFromLine ;
 
string(256) line;
string(16) name;
integer lPos, tPos, stringCount;
 
 % allow us to detect EOF %
ENDFILE := EXCEPTION( false, 1, 0, false, "EOF" );
 
 % first line should be "Datasize: d Strings: s" where d = number variables %
 % and s = number of strings  %
readALine;
if line = "trace" then begin
 % extension - run in trace mode %
trace := true;
readALine
end if_line_eq_trace ;
if XCPNOTED(ENDFILE) then rtError( "Empty program file." );
if line( 0 // 10 ) not = "Datasize: " then rtError( "Header line missing." );
lPos := 10;
dataSize := readInteger;
if dataSize > MAX_DATA then rtError( "Datasize too large." );
skipSpaces;
if line( lPos // 9 ) not = "Strings: " then rtError( """Strings: "" missing on header line." );
lPos := lPos + 9;
stringCount := readInteger;
if stringCount > MAX_STRINGS then rtError( "Too many strings." );
 % read the string table %
for stringNumber := 0 until stringCount - 1 do begin
string(256) str;
integer sLen, sPos;
readALine;
if XCPNOTED(ENDFILE) then rtError( "End-of-file in string table." );
if line( lPos // 1 ) not = """" then rtError( "String literal expected." );
str  := " ";
sLen := 0;
lPos := lPos + 1; % skip the opening double-quote %
while lPos <= 255 and line( lPos // 1 ) not = """" do begin
str( sLen // 1 ) := line( lPos // 1 );
sLen := sLen + 1;
lPos := lPos + 1
end while_more_string ;
if lPos > 255 then rtError( "Unterminated String." );
 % store the string %
stringValue( stringNumber ) := str;
stringLength( stringNumber ) := sLen
end for_sPos ;
 % read the code %
readALine;
while not XCPNOTED(ENDFILE) do begin
if line not = " " then loadCodeFromLine;
readALine
end while_not_eof
end;
 % run the program %
begin
integer pc, opCode, operand, sp;
integer array st ( 0 :: MAX_STACK );
logical halted;
 % prints a string from the string pool, escape sequences are interpreted %
procedure writeOnString( integer value stringNumber ) ;
begin
integer cPos, sLen;
string(256) text;
if stringNumber < 0 or stringNumber > MAX_STRINGS then rtError( "Invalid string number." );
cPos := 0;
sLen := stringLength( stringNumber );
text := stringValue( stringNumber );
while cPos < stringLength( stringNumber ) do begin
string(1) ch;
ch := text( cPos // 1 );
if ch not = "\" then writeon( s_w := 0, ch )
else begin
 % escaped character %
cPos := cPos + 1;
if cPos > sLen then rtError( "String terminates with ""\""." );
ch := text( cPos // 1 );
if ch = "n" then % newline % write()
else writeon( s_w := 0, ch )
end;
cPos := cPos + 1
end while_not_end_of_string
end writeOnString ;
 
pc  := 0;
sp  := -1;
halted := false;
while not halted do begin;
 % get the next op code and operand %
opCode  := decode( byteCode( pc ) );
pc  := pc + 1;
operand := 0;
if opCode = oFetch or opCode = oStore or opCode = oPush or opCode = oJmp or opCode = oJz then begin
 % this opCode has an operand %
pc := pc + 4;
for bPos := 1 until 4 do begin
operand := ( operand * 256 ) + decode( byteCode( pc - bPos ) );
end for_bPos
end if_opCode_with_an_operand ;
if trace then begin
write( i_w:= 1, s_w := 0, pc, " op(", opCode, "): ", opName( opCode ), " ", operand );
write()
end if_trace ;
 % interpret the instruction %
if opCode = oFetch then begin sp := sp + 1; st( sp ) := data( operand ) end
else if opCode = oStore then begin data( operand ) := st( sp ); sp := sp - 1 end
else if opCode = oPush then begin sp := sp + 1; st( sp ) := operand end
else if opCode = oHalt then halted := true
else if opCode = oJmp then pc  := operand
else if oPCode = oJz then begin
if st( sp ) = 0 then pc := operand;
sp := sp - 1
end
else if opCode = oPrtc then begin writeon( i_w := 1, s_w := 0, code( st( sp ) ) ); sp := sp - 1 end
else if opCode = oPrti then begin writeon( i_w := 1, s_w := 0, st( sp ) ); sp := sp - 1 end
else if opCode = oPrts then begin writeonString( st( sp ) ); sp := sp - 1 end
else if opCode = oNeg then st( sp ) := - st( sp )
else if opCode = oNot then st( sp ) := ( if st( sp ) = 0 then 1 else 0 )
else begin
operand := st( sp );
sp  := sp - 1;
if opCode = oAdd then st( sp ) := st( sp ) + operand
else if opCode = oSub then st( sp ) := st( sp ) - operand
else if opCode = oMul then st( sp ) := st( sp ) * operand
else if opCode = oDiv then st( sp ) := st( sp ) div operand
else if opCode = oMod then st( sp ) := st( sp ) rem operand
else if opCode = oLt then st( sp ) := if st( sp ) < operand then 1 else 0
else if opCode = oGt then st( sp ) := if st( sp ) > operand then 1 else 0
else if opCode = oLe then st( sp ) := if st( sp ) <= operand then 1 else 0
else if opCode = oGe then st( sp ) := if st( sp ) >= operand then 1 else 0
else if opCode = oEq then st( sp ) := if st( sp ) = operand then 1 else 0
else if opCode = oNe then st( sp ) := if st( sp ) not = operand then 1 else 0
else if opCode = oAnd then st( sp ) := if st( sp ) not = 0 and operand not = 0 then 1 else 0
else if opCode = oOr then st( sp ) := if st( sp ) not = 0 or operand not = 0 then 1 else 0
else rtError( "Unknown opCode." )
end if_various_opCodes
end while_not_halted
end
end.

AWK[edit]

Tested with gawk 4.1.1 and mawk 1.3.4.

 
function error(msg) {
printf("%s\n", msg)
exit(1)
}
 
function bytes_to_int(bstr, i, sum) {
sum = 0
for (i=word_size-1; i>=0; i--) {
sum *= 256
sum += code[bstr+i]
}
return sum
}
 
function emit_byte(x) {
code[next_free_code_index++] = x
}
 
function emit_word(x, i) {
for (i=0; i<word_size; i++) {
emit_byte(int(x)%256);
x = int(x/256)
}
}
 
function run_vm(data_size) {
sp = data_size + 1
pc = 0
while (1) {
op = code[pc++]
if (op == FETCH) {
stack[sp++] = stack[bytes_to_int(pc)]
pc += word_size
} else if (op == STORE) {
stack[bytes_to_int(pc)] = stack[--sp]
pc += word_size
} else if (op == PUSH) {
stack[sp++] = bytes_to_int(pc)
pc += word_size
} else if (op == ADD ) { stack[sp-2] += stack[sp-1]; sp--
} else if (op == SUB ) { stack[sp-2] -= stack[sp-1]; sp--
} else if (op == MUL ) { stack[sp-2] *= stack[sp-1]; sp--
} else if (op == DIV ) { stack[sp-2] = int(stack[sp-2] / stack[sp-1]); sp--
} else if (op == MOD ) { stack[sp-2] %= stack[sp-1]; sp--
} else if (op == LT ) { stack[sp-2] = stack[sp-2] < stack[sp-1]; sp--
} else if (op == GT ) { stack[sp-2] = stack[sp-2] > stack[sp-1]; sp--
} else if (op == LE ) { stack[sp-2] = stack[sp-2] <= stack[sp-1]; sp--
} else if (op == GE ) { stack[sp-2] = stack[sp-2] >= stack[sp-1]; sp--
} else if (op == EQ ) { stack[sp-2] = stack[sp-2] == stack[sp-1]; sp--
} else if (op == NE ) { stack[sp-2] = stack[sp-2] != stack[sp-1]; sp--
} else if (op == AND ) { stack[sp-2] = stack[sp-2] && stack[sp-1]; sp--
} else if (op == OR ) { stack[sp-2] = stack[sp-2] || stack[sp-1]; sp--
} else if (op == NEG ) { stack[sp-1] = - stack[sp-1]
} else if (op == NOT ) { stack[sp-1] = ! stack[sp-1]
} else if (op == JMP ) { pc += bytes_to_int(pc)
} else if (op == JZ ) { if (stack[--sp]) { pc += word_size } else { pc += bytes_to_int(pc) }
} else if (op == PRTC) { printf("%c", stack[--sp])
} else if (op == PRTS) { printf("%s", string_pool[stack[--sp]])
} else if (op == PRTI) { printf("%d", stack[--sp])
} else if (op == HALT) { break
}
} # while
}
 
function str_trans(srce, dest, i) {
dest = ""
for (i=1; i <= length(srce); ) {
if (substr(srce, i, 1) == "\\" && i < length(srce)) {
if (substr(srce, i+1, 1) == "n") {
dest = dest "\n"
i += 2
} else if (substr(srce, i+1, 1) == "\\") {
dest = dest "\\"
i += 2
}
} else {
dest = dest substr(srce, i, 1)
i += 1
}
}
return dest
}
 
function load_code( n, i) {
getline line
if (line == "")
error("empty line")
n=split(line, line_list)
data_size = line_list[2]
n_strings = line_list[4]
for (i=0; i<n_strings; i++) {
getline line
gsub(/\n/, "", line)
gsub(/"/ , "", line)
string_pool[i] = str_trans(line)
}
while (getline) {
offset = int($1)
instr = $2
opcode = code_map[instr]
if (opcode == "
")
error("
Unknown instruction " instr " at " offset)
emit_byte(opcode)
if (opcode == JMP || opcode == JZ) {
p = int($4)
emit_word(p - (offset + 1))
} else if (opcode == PUSH) {
value = int($3)
emit_word(value)
} else if (opcode == FETCH || opcode == STORE) {
gsub(/\[/, "
", $3)
gsub(/\]/, "
", $3)
value = int($3)
emit_word(value)
}
}
return data_size
}
 
BEGIN {
code_map["
fetch"] = FETCH = 1
code_map["
store"] = STORE = 2
code_map["
push" ] = PUSH = 3
code_map["
add" ] = ADD = 4
code_map["
sub" ] = SUB = 5
code_map["
mul" ] = MUL = 6
code_map["
div" ] = DIV = 7
code_map["
mod" ] = MOD = 8
code_map["
lt" ] = LT = 9
code_map["
gt" ] = GT = 10
code_map["
le" ] = LE = 11
code_map["
ge" ] = GE = 12
code_map["
eq" ] = EQ = 13
code_map["
ne" ] = NE = 14
code_map["
and" ] = AND = 15
code_map["
or" ] = OR = 16
code_map["
neg" ] = NEG = 17
code_map["
not" ] = NOT = 18
code_map["
jmp" ] = JMP = 19
code_map["
jz" ] = JZ = 20
code_map["
prtc" ] = PRTC = 21
code_map["
prts" ] = PRTS = 22
code_map["
prti" ] = PRTI = 23
code_map["
halt" ] = HALT = 24
 
next_free_node_index = 1
next_free_code_index = 0
word_size = 4
input_file = "
-"
if (ARGC > 1)
input_file = ARGV[1]
data_size = load_code()
run_vm(data_size)
}
Output  —  count:

count is: 1
count is: 2
count is: 3
count is: 4
count is: 5
count is: 6
count is: 7
count is: 8
count is: 9

C[edit]

Tested with gcc 4.81 and later, compiles warning free with -Wall -Wextra

#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <string.h>
#include <stdint.h>
#include <ctype.h>
 
#define NELEMS(arr) (sizeof(arr) / sizeof(arr[0]))
 
#define da_dim(name, type) type *name = NULL; \
int _qy_ ## name ## _p = 0; \
int _qy_ ## name ## _max = 0

 
#define da_redim(name) do {if (_qy_ ## name ## _p >= _qy_ ## name ## _max) \
name = realloc(name, (_qy_ ## name ## _max += 32) * sizeof(name[0]));} while (0)

 
#define da_rewind(name) _qy_ ## name ## _p = 0
 
#define da_append(name, x) do {da_redim(name); name[_qy_ ## name ## _p++] = x;} while (0)
 
typedef unsigned char uchar;
typedef uchar code;
 
typedef enum { FETCH, STORE, PUSH, ADD, SUB, MUL, DIV, MOD, LT, GT, LE, GE, EQ, NE, AND,
OR, NEG, NOT, JMP, JZ, PRTC, PRTS, PRTI, HALT
} Code_t;
 
typedef struct Code_map {
char *text;
Code_t op;
} Code_map;
 
Code_map code_map[] = {
{"fetch", FETCH},
{"store", STORE},
{"push", PUSH },
{"add", ADD },
{"sub", SUB },
{"mul", MUL },
{"div", DIV },
{"mod", MOD },
{"lt", LT },
{"gt", GT },
{"le", LE },
{"ge", GE },
{"eq", EQ },
{"ne", NE },
{"and", AND },
{"or", OR },
{"neg", NEG },
{"not", NOT },
{"jmp", JMP },
{"jz", JZ },
{"prtc", PRTC },
{"prts", PRTS },
{"prti", PRTI },
{"halt", HALT },
};
 
FILE *source_fp;
da_dim(object, code);
 
void error(const char *fmt, ... ) {
va_list ap;
char buf[1000];
 
va_start(ap, fmt);
vsprintf(buf, fmt, ap);
va_end(ap);
printf("error: %s\n", buf);
exit(1);
}
 
/*** Virtual Machine interpreter ***/
void run_vm(const code obj[], int32_t data[], int g_size, char **string_pool) {
int32_t *sp = &data[g_size + 1];
const code *pc = obj;
 
again:
switch (*pc++) {
case FETCH: *sp++ = data[*(int32_t *)pc]; pc += sizeof(int32_t); goto again;
case STORE: data[*(int32_t *)pc] = *--sp; pc += sizeof(int32_t); goto again;
case PUSH: *sp++ = *(int32_t *)pc; pc += sizeof(int32_t); goto again;
case ADD: sp[-2] += sp[-1]; --sp; goto again;
case SUB: sp[-2] -= sp[-1]; --sp; goto again;
case MUL: sp[-2] *= sp[-1]; --sp; goto again;
case DIV: sp[-2] /= sp[-1]; --sp; goto again;
case MOD: sp[-2] %= sp[-1]; --sp; goto again;
case LT: sp[-2] = sp[-2] < sp[-1]; --sp; goto again;
case GT: sp[-2] = sp[-2] > sp[-1]; --sp; goto again;
case LE: sp[-2] = sp[-2] <= sp[-1]; --sp; goto again;
case GE: sp[-2] = sp[-2] >= sp[-1]; --sp; goto again;
case EQ: sp[-2] = sp[-2] == sp[-1]; --sp; goto again;
case NE: sp[-2] = sp[-2] != sp[-1]; --sp; goto again;
case AND: sp[-2] = sp[-2] && sp[-1]; --sp; goto again;
case OR: sp[-2] = sp[-2] || sp[-1]; --sp; goto again;
case NEG: sp[-1] = -sp[-1]; goto again;
case NOT: sp[-1] = !sp[-1]; goto again;
case JMP: pc += *(int32_t *)pc; goto again;
case JZ: pc += (*--sp == 0) ? *(int32_t *)pc : (int32_t)sizeof(int32_t); goto again;
case PRTC: printf("%c", sp[-1]); --sp; goto again;
case PRTS: printf("%s", string_pool[sp[-1]]); --sp; goto again;
case PRTI: printf("%d", sp[-1]); --sp; goto again;
case HALT: break;
default: error("Unknown opcode %d\n", *(pc - 1));
}
}
 
char *read_line(int *len) {
static char *text = NULL;
static int textmax = 0;
 
for (*len = 0; ; (*len)++) {
int ch = fgetc(source_fp);
if (ch == EOF || ch == '\n') {
if (*len == 0)
return NULL;
break;
}
if (*len + 1 >= textmax) {
textmax = (textmax == 0 ? 128 : textmax * 2);
text = realloc(text, textmax);
}
text[*len] = ch;
}
text[*len] = '\0';
return text;
}
 
char *rtrim(char *text, int *len) { // remove trailing spaces
for (; *len > 0 && isspace(text[*len - 1]); --(*len))
;
 
text[*len] = '\0';
return text;
}
 
char *translate(char *st) {
char *p, *q;
if (st[0] == '"') // skip leading " if there
++st;
p = q = st;
 
while ((*p++ = *q++) != '\0') {
if (q[-1] == '\\') {
if (q[0] == 'n') {
p[-1] = '\n';
++q;
} else if (q[0] == '\\') {
++q;
}
}
if (q[0] == '"' && q[1] == '\0') // skip trialing " if there
++q;
}
 
return st;
}
 
/* convert an opcode string into its byte value */
int findit(const char text[], int offset) {
for (size_t i = 0; i < sizeof(code_map) / sizeof(code_map[0]); i++) {
if (strcmp(code_map[i].text, text) == 0)
return code_map[i].op;
}
error("Unknown instruction %s at %d\n", text, offset);
return -1;
}
 
void emit_byte(int c) {
da_append(object, (uchar)c);
}
 
void emit_int(int32_t n) {
union {
int32_t n;
unsigned char c[sizeof(int32_t)];
} x;
 
x.n = n;
 
for (size_t i = 0; i < sizeof(x.n); ++i) {
emit_byte(x.c[i]);
}
}
 
/*
Datasize: 5 Strings: 3
" is prime\n"
"Total primes found: "
"\n"
154 jmp (-73) 82
164 jz (32) 197
175 push 0
159 fetch [4]
149 store [3]
*/

 
/* Load code into global array object, return the string pool and data size */
char **load_code(int *ds) {
int line_len, n_strings;
char **string_pool;
char *text = read_line(&line_len);
text = rtrim(text, &line_len);
 
strtok(text, " "); // skip "Datasize:"
*ds = atoi(strtok(NULL, " ")); // get actual data_size
strtok(NULL, " "); // skip "Strings:"
n_strings = atoi(strtok(NULL, " ")); // get number of strings
 
string_pool = malloc(n_strings * sizeof(char *));
for (int i = 0; i < n_strings; ++i) {
text = read_line(&line_len);
text = rtrim(text, &line_len);
text = translate(text);
string_pool[i] = strdup(text);
}
 
for (;;) {
int len;
 
text = read_line(&line_len);
if (text == NULL)
break;
text = rtrim(text, &line_len);
 
int offset = atoi(strtok(text, " ")); // get the offset
char *instr = strtok(NULL, " "); // get the instruction
int opcode = findit(instr, offset);
emit_byte(opcode);
char *operand = strtok(NULL, " ");
 
switch (opcode) {
case JMP: case JZ:
operand++; // skip the '('
len = strlen(operand);
operand[len - 1] = '\0'; // remove the ')'
emit_int(atoi(operand));
break;
case PUSH:
emit_int(atoi(operand));
break;
case FETCH: case STORE:
operand++; // skip the '['
len = strlen(operand);
operand[len - 1] = '\0'; // remove the ']'
emit_int(atoi(operand));
break;
}
}
return string_pool;
}
 
void init_io(FILE **fp, FILE *std, const char mode[], const char fn[]) {
if (fn[0] == '\0')
*fp = std;
else if ((*fp = fopen(fn, mode)) == NULL)
error(0, 0, "Can't open %s\n", fn);
}
 
int main(int argc, char *argv[]) {
init_io(&source_fp, stdin, "r", argc > 1 ? argv[1] : "");
int data_size;
char **string_pool = load_code(&data_size);
int data[1000 + data_size];
run_vm(object, data, data_size, string_pool);
}

Phix[edit]

Reusing cgen.e from the Code Generator task

--
-- demo\rosetta\Compiler\vm.exw
-- ============================
--
-- Since we have generated executable machine code, the virtual machine, such as it is, is just
-- the higher level implementations of printc/i/s, see setbuiltins() in cgen.e
-- Otherwise the only difference between this and cgen.exw is call(code_mem) instead of decode().
--
-- A quick test (calculating fib(44) 10^6 times) suggests ~500 times faster than interp.exw -
-- which is to be expected given that a single add instruction (1 clock) here is implemented as
-- at least three (and quite possibly five!) resursive calls to interp() in the other.
 
 
format PE32
--format ELF32
-- Note: cgen generates 32-bit machine code, which cannot be executed directly from a 64-bit interpreter.
-- You can however, via the magic of either the above format directives, use a 64-bit version of
-- Phix to compile this (just add a -c command line option) to a 32-bit executable, which can.
-- It would not be particularly difficult to emit 32 or 64 bit code, but some source code files
-- would, fairly obviously, then be very nearly twice as long, and a fair bit harder to read.
 
include cgen.e
 
procedure main(sequence cl)
open_files(cl)
toks = lex()
object t = parse()
code_gen(t)
fixup()
if machine_bits()=32 then
-- ^ as per note above
call(code_mem)
end if
free({var_mem,code_mem})
close_files()
end procedure
 
--main(command_line())
main({0,0,"deep.c"})
Output:
-5
10
15

Python[edit]

Tested with Python 2.7 and 3.x

from __future__ import print_function
import sys, struct
 
FETCH, STORE, PUSH, ADD, SUB, MUL, DIV, MOD, LT, GT, LE, GE, EQ, NE, AND, OR, NEG, NOT, \
JMP, JZ, PRTC, PRTS, PRTI, HALT = range(24)
 
code_map = {
"fetch": FETCH,
"store": STORE,
"push": PUSH,
"add": ADD,
"sub": SUB,
"mul": MUL,
"div": DIV,
"mod": MOD,
"lt": LT,
"gt": GT,
"le": LE,
"ge": GE,
"eq": EQ,
"ne": NE,
"and": AND,
"or": OR,
"not": NOT,
"neg": NEG,
"jmp": JMP,
"jz": JZ,
"prtc": PRTC,
"prts": PRTS,
"prti": PRTI,
"halt": HALT
}
 
input_file = None
code = bytearray()
string_pool = []
word_size = 4
 
#*** show error and exit
def error(msg):
print("%s" % (msg))
exit(1)
 
def int_to_bytes(val):
return struct.pack("<i", val)
 
def bytes_to_int(bstr):
return struct.unpack("<i", bstr)
 
#***
def emit_byte(x):
code.append(x)
 
#***
def emit_word(x):
s = int_to_bytes(x)
for x in s:
code.append(x)
 
#***
def run_vm(data_size):
stack = [0 for i in range(data_size + 1)]
pc = 0
while True:
op = code[pc]
pc += 1
 
if op == FETCH:
stack.append(stack[bytes_to_int(code[pc:pc+word_size])[0]]);
pc += word_size
elif op == STORE:
stack[bytes_to_int(code[pc:pc+word_size])[0]] = stack.pop();
pc += word_size
elif op == PUSH:
stack.append(bytes_to_int(code[pc:pc+word_size])[0]);
pc += word_size
elif op == ADD: stack[-2] += stack[-1]; stack.pop()
elif op == SUB: stack[-2] -= stack[-1]; stack.pop()
elif op == MUL: stack[-2] *= stack[-1]; stack.pop()
# use C like division semantics
elif op == DIV: stack[-2] = int(float(stack[-2]) / stack[-1]); stack.pop()
elif op == MOD: stack[-2] = int(float(stack[-2]) % stack[-1]); stack.pop()
elif op == LT: stack[-2] = stack[-2] < stack[-1]; stack.pop()
elif op == GT: stack[-2] = stack[-2] > stack[-1]; stack.pop()
elif op == LE: stack[-2] = stack[-2] <= stack[-1]; stack.pop()
elif op == GE: stack[-2] = stack[-2] >= stack[-1]; stack.pop()
elif op == EQ: stack[-2] = stack[-2] == stack[-1]; stack.pop()
elif op == NE: stack[-2] = stack[-2] != stack[-1]; stack.pop()
elif op == AND: stack[-2] = stack[-2] and stack[-1]; stack.pop()
elif op == OR: stack[-2] = stack[-2] or stack[-1]; stack.pop()
elif op == NEG: stack[-1] = -stack[-1]
elif op == NOT: stack[-1] = not stack[-1]
elif op == JMP: pc += bytes_to_int(code[pc:pc+word_size])[0]
elif op == JZ:
if stack.pop():
pc += word_size
else:
pc += bytes_to_int(code[pc:pc+word_size])[0]
elif op == PRTC: print("%c" % (stack[-1]), end=''); stack.pop()
elif op == PRTS: print("%s" % (string_pool[stack[-1]]), end=''); stack.pop()
elif op == PRTI: print("%d" % (stack[-1]), end=''); stack.pop()
elif op == HALT: break
 
def str_trans(srce):
dest = ""
i = 0
while i < len(srce):
if srce[i] == '\\' and i + 1 < len(srce):
if srce[i + 1] == 'n':
dest += '\n'
i += 2
elif srce[i + 1] == '\\':
dest += '\\'
i += 2
else:
dest += srce[i]
i += 1
 
return dest
 
#***
def load_code():
global string_pool
 
line = input_file.readline()
if len(line) == 0:
error("empty line")
 
line_list = line.split()
data_size = int(line_list[1])
n_strings = int(line_list[3])
 
for i in range(n_strings):
string_pool.append(str_trans(input_file.readline().strip('"\n')))
 
while True:
line = input_file.readline()
if len(line) == 0:
break
line_list = line.split()
offset = int(line_list[0])
instr = line_list[1]
opcode = code_map.get(instr)
if opcode == None:
error("Unknown instruction %s at %d" % (instr, offset))
emit_byte(opcode)
if opcode in [JMP, JZ]:
p = int(line_list[3])
emit_word(p - (offset + 1))
elif opcode == PUSH:
value = int(line_list[2])
emit_word(value)
elif opcode in [FETCH, STORE]:
value = int(line_list[2].strip('[]'))
emit_word(value)
 
return data_size
 
#*** main driver
input_file = sys.stdin
if len(sys.argv) > 1:
try:
input_file = open(sys.argv[1], "r", 4096)
except IOError as e:
error(0, 0, "Can't open %s" % sys.argv[1])
 
data_size = load_code()
run_vm(data_size)

Scheme[edit]

The interpreter uses recursion, representing the stack as a list; the stack pointer is the reference to the top of the list. This is a more natural solution in Scheme than a fixed stack array, and removes the danger of stack overflow. Operations on or returning booleans have been adapted to use integers, 0 for false and anything else for true.

All of the "Compiler/Sample programs" are correctly interpreted.

 
(import (scheme base)
(scheme char)
(scheme file)
(scheme process-context)
(scheme write)
(only (srfi 13) string-contains string-delete string-filter
string-replace string-tokenize))
 
(define *word-size* 4)
 
;; Mappings from operation symbols to internal procedures.
;; We define operations appropriate to virtual machine:
;; e.g. division must return an int, not a rational
;; boolean values are treated as numbers: 0 is false, other is true
(define *unary-ops*
(list (cons 'neg (lambda (a) (- a)))
(cons 'not (lambda (a) (if (zero? a) 1 0)))))
(define *binary-ops*
(let ((number-comp (lambda (op) (lambda (a b) (if (op a b) 1 0)))))
(list (cons 'add +)
(cons 'sub -)
(cons 'mul *)
(cons 'div (lambda (a b) (truncate (/ a b)))) ; int division
(cons 'mod modulo)
(cons 'lt (number-comp <))
(cons 'gt (number-comp >))
(cons 'le (number-comp <=))
(cons 'ge (number-comp >=))
(cons 'eq (lambda (a b) (if (= a b) 1 0)))
(cons 'ne (lambda (a b) (if (= a b) 0 1)))
(cons 'and (lambda (a b) ; make "and" work on numbers
(if (and (not (zero? a)) (not (zero? b))) 1 0)))
(cons 'or (lambda (a b) ; make "or" work on numbers
(if (or (not (zero? a)) (not (zero? b))) 1 0))))))
 
;; read information from file, returning vectors for data and strings
;; and a list of the code instructions
(define (read-code filename)
(define (setup-definitions str)
(values ; return vectors for (data strings) of required size
(make-vector (string->number (list-ref str 1)) #f)
(make-vector (string->number (list-ref str 3)) #f)))
(define (read-strings strings) ; read constant strings into data structure
(define (replace-newlines chars) ; replace newlines, obeying \\n
(cond ((< (length chars) 2) ; finished list
chars)
((and (>= (length chars) 3) ; preserve \\n
(char=? #\\ (car chars))
(char=? #\\ (cadr chars))
(char=? #\n (cadr (cdr chars))))
(cons (car chars)
(cons (cadr chars)
(cons (cadr (cdr chars))
(replace-newlines (cdr (cdr (cdr chars))))))))
((and (char=? #\\ (car chars)) ; replace \n with newline
(char=? #\n (cadr chars)))
(cons #\newline (replace-newlines (cdr (cdr chars)))))
(else ; keep char and look further
(cons (car chars) (replace-newlines (cdr chars))))))
(define (tidy-string str) ; remove quotes, map newlines to actual newlines
(list->string
(replace-newlines
(string->list
(string-delete #\" str))))) ; " (needed to satisfy rosettacode's scheme syntax highlighter)
;
(do ((i 0 (+ i 1)))
((= i (vector-length strings)) )
(vector-set! strings i (tidy-string (read-line)))))
(define (read-code)
(define (cleanup-code opn) ; tidy instructions, parsing numbers
(let ((addr (string->number (car opn)))
(instr (string->symbol (cadr opn))))
(cond ((= 2 (length opn))
(list addr instr))
((= 3 (length opn))
(list addr
instr
(string->number
(string-filter char-numeric? (list-ref opn 2)))))
(else ; assume length 4, jump instructions
(list addr instr (string->number (list-ref opn 3)))))))
;
(let loop ((result '()))
(let ((line (read-line)))
(if (eof-object? line)
(reverse (map cleanup-code result))
(loop (cons (string-tokenize line) result))))))
;
(with-input-from-file
filename
(lambda ()
(let-values (((data strings)
(setup-definitions (string-tokenize (read-line)))))
(read-strings strings)
(values data
strings
(read-code))))))
 
;; run the virtual machine
(define (run-program data strings code)
(define (get-instruction n)
(if (assq n code)
(cdr (assq n code))
(error "Could not find instruction")))
;
(let loop ((stack '())
(pc 0))
(let ((op (get-instruction pc)))
(case (car op)
((fetch)
(loop (cons (vector-ref data (cadr op)) stack)
(+ pc 1 *word-size*)))
((store)
(vector-set! data (cadr op) (car stack))
(loop (cdr stack)
(+ pc 1 *word-size*)))
((push)
(loop (cons (cadr op) stack)
(+ pc 1 *word-size*)))
((add sub mul div mod lt gt le eq ne and or)
(let ((instr (assq (car op) *binary-ops*)))
(if instr
(loop (cons ((cdr instr) (cadr stack) ; replace top two with result
(car stack))
(cdr (cdr stack)))
(+ pc 1))
(error "Unknown binary operation"))))
((neg not)
(let ((instr (assq (car op) *unary-ops*)))
(if instr
(loop (cons ((cdr instr) (car stack)) ; replace top with result
(cdr stack))
(+ pc 1))
(error "Unknown unary operation"))))
((jmp)
(loop stack
(cadr op)))
((jz)
(loop (cdr stack)
(if (zero? (car stack))
(cadr op)
(+ pc 1 *word-size*))))
((prtc)
(display (integer->char (car stack)))
(loop (cdr stack)
(+ pc 1)))
((prti)
(display (car stack))
(loop (cdr stack)
(+ pc 1)))
((prts)
(display (vector-ref strings (car stack)))
(loop (cdr stack)
(+ pc 1)))
((halt)
#t)))))
 
;; create and run virtual machine from filename passed on command line
(if (= 2 (length (command-line)))
(let-values (((data strings code) (read-code (cadr (command-line)))))
(run-program data strings code))
(display "Error: pass a .asm filename\n"))
 

zkl[edit]

Translation of: Python

File rvm.zkl:

// This is a little endian machine
const WORD_SIZE=4;
const{ var _n=-1; var[proxy]N=fcn{ _n+=1 } } // enumerator
const FETCH=N, STORE=N, PUSH=N, ADD=N, SUB=N, MUL=N, DIV=N, MOD=N,
LT=N, GT=N, LE=N, GE=N, EQ=N, NE=N, AND=N, OR=N, NEG=N, NOT=N,
JMP=N, JZ=N, PRTC=N, PRTS=N, PRTI=N, HALT=N;
 
var [const]
bops=Dictionary(ADD,'+, SUB,'-, MUL,'*, DIV,'/, MOD,'%,
LT,'<, GT,'>, LE,'<=, GE,'>=, NE,'!=, EQ,'==, NE,'!=),
strings=List(); // filled in by the loader
;
 
// do a binary op
fcn bop(stack,op){ a,b:=stack.pop(),stack.pop(); stack.append(bops[op](b,a)) }
 
fcn run_vm(code,stackSz){
stack,pc := List.createLong(stackSz,0), 0;
while(True){
op:=code[pc]; pc+=1;
switch(op){
case(FETCH){
stack.append(stack[code.toLittleEndian(pc,WORD_SIZE,False)]);
pc+=WORD_SIZE;
}
case(STORE){
stack[code.toLittleEndian(pc,WORD_SIZE)]=stack.pop();
pc+=WORD_SIZE;
}
case(PUSH){
stack.append(code.toLittleEndian(pc,WORD_SIZE,False)); // signed
pc+=WORD_SIZE;
}
case(ADD,SUB,MUL,DIV,MOD,LT,GT,LE,GE,EQ,NE) { bop(stack,op) }
case(AND){ stack[-2] = stack[-2] and stack[-1]; stack.pop() }
case(OR) { stack[-2] = stack[-2] or stack[-1]; stack.pop() }
case(NEG){ stack[-1] = -stack[-1] }
case(NOT){ stack[-1] = not stack[-1] }
case(JMP){ pc+=code.toLittleEndian(pc,WORD_SIZE,False); } // signed
case(JZ) {
if(stack.pop()) pc+=WORD_SIZE;
else pc+=code.toLittleEndian(pc,WORD_SIZE,False);
}
case(PRTC){ } // not implemented
case(PRTS){ print(strings[stack.pop()]) }
case(PRTI){ print(stack.pop()) }
case(HALT){ break }
else{ throw(Exception.AssertionError(
"Bad op code (%d) @%d".fmt(op,pc-1))) }
}
}
}
 
code:=File(vm.nthArg(0)).read(); // binary code file
// the string table is prepended to the code:
// 66,1 byte len,text, no trailing '\0' needed
while(code[0]==66){ // read the string table
sz:=code[1];
strings.append(code[2,sz].text);
code.del(0,sz+2);
}
run_vm(code,1000);

The binary code file code.bin:

Output:
$ zkl hexDump code.bin 
   0: 42 0a 63 6f 75 6e 74 20 | 69 73 3a 20 42 01 0a 02   B.count is: B...
  16: 01 00 00 00 01 00 00 00 | 00 00 00 00 00 00 02 0a   ................
  32: 00 00 00 08 13 2b 00 00 | 00 02 00 00 00 00 15 00   .....+..........
  48: 00 00 00 00 16 02 01 00 | 00 00 15 00 00 00 00 00   ................
  64: 02 01 00 00 00 03 01 00 | 00 00 00 12 cd ff ff ff   ................
  80: 17
Output:
$ zkl rvm code.bin 
count is: 1
count is: 2
count is: 3
count is: 4
count is: 5
count is: 6
count is: 7
count is: 8
count is: 9