Machine code

From Rosetta Code
Task
Machine code
You are encouraged to solve this task according to the task description, using any language you may know.

The task requires poking machine code directly into memory and executing it. The machine code is the architecture-specific opcodes which have the simple task of adding two unsigned bytes together and making the result available to the high-level language.

For example, the following assembly language program is given for x86 (32 bit) architectures:

mov EAX, [ESP+4]
add EAX, [ESP+8]
ret

This would translate into the following opcode bytes:

139 68 36 4 3 68 36 8 195

Or in hexadecimal:

8B 44 24 04 03 44 24 08 C3
Task

If different than 32-bit x86, specify the target architecture of the machine code for your example. It may be helpful to also include an assembly version of the machine code for others to reference and understand what is being executed. Then, implement the following in your favorite programming language:

  • Poke the necessary opcodes into a memory location.
  • Provide a means to pass two values to the machine code.
  • Execute the machine code with the following arguments: unsigned-byte argument of value 7; unsigned-byte argument of value 12; The result would be 19.
  • Perform any clean up actions that are appropriate for your chosen language (free the pointer or memory allocations, etc.)


6502 Assembly

We'll execute the following program:

LDA #$07
CLC
ADC #$0C
RTS


main:
LDX #$00        ;initialize array offset to 0

LDA #$A9        ;LDA #immediate
STA Array,x     ;store at offset 0
INX             ;next offset

LDA #$07        ;first parameter
STA Array,x     ;store at offset 1
INX             ;next offset

LDA #$18        ;CLC
STA Array,x     ;store at offset 2
INX             ;next offset

LDA #$69        ;ADC #immediate
STA Array,x     ;store at offset 3
INX             ;next offset

LDA #$0C        ;second parameter
STA Array,x     ;store at offset 4
INX             ;next offset

LDA #$60        ;RTS
STA Array,x     ;store at offset 5


JMP Array       ;assuming we used a JSR to get to main, the RTS at the end of this RAM will return us back to BASIC.
                ;if array is directly underneath this statement, we can actually omit this JMP entirely 
                ;and execution will simply fall through to the array.
 
Array:
byte 0,0,0,0,0,0

If you're going to do this in actual programming (which is somewhat common on 8-bit computers for quick interrupt handling), it may be a good idea to know ahead of time the maximum size of your RAM area for machine code and fill it with return statements to avoid crashing.

68000 Assembly

We'll execute the following program:

MOVE.B #7,D0
ADD.B #12,D0
RTS

And here is the code that sets it up:

LEA CodeArray,A0
MOVE.L #$103C0007,(A0)+  ;MOVE.B #7,D0
MOVE.L #$D03C000C,(A0)+  ;ADD.B #12,D0
MOVE.W #$4E75,(A0)+      ;RTS
JSR CodeArray

JMP $                    ;halt the cpu, we're done.

CodeArray:
DS.B 16                  ;16 bytes of padding (this is assumed to be RAM)

Action!

DEFINE ADC="$6D"
DEFINE CLC="$18"
DEFINE JSR="$20"
DEFINE LDA="$AD"
DEFINE RTS="$60"
DEFINE STA="$8D"

PROC Main()
  BYTE ARRAY buf(20)
  BYTE a=[19],b=[37],s
  CARD addr

  addr=buf
  Poke(addr,CLC) addr==+1
  Poke(addr,LDA) addr==+1
  PokeC(addr,@a) addr==+2
  Poke(addr,ADC) addr==+1
  PokeC(addr,@b) addr==+2
  Poke(addr,STA) addr==+1
  PokeC(addr,@s) addr==+2
  Poke(addr,RTS) addr==+1

  [JSR buf] ;run the machine code stored on buf

  PrintF("%B+%B=%B%E",a,b,s)
RETURN
Output:

Screenshot from Atari 8-bit computer

19+37=56

Applesoft BASIC

POKE768,169:POKE770,24:POKE771,105:POKE773,133:POKE775,96:POKE774,235:POKE769,7:POKE772,12:CALL768:?PEEK(235)

AutoHotkey

MCode Tutorial (Forum Thread)

MCode4GCC (Forum Thread | GitHub) - An MCode generator using the GCC Compiler.

MCode(Var, "8B44240403442408C3")
MsgBox, % DllCall(&Var, "Char",7, "Char",12)
Var := ""
return

; http://www.autohotkey.com/board/topic/19483-machine-code-functions-bit-wizardry/
MCode(ByRef code, hex) { ; allocate memory and write Machine Code there
   VarSetCapacity(code, StrLen(hex) // 2)
   Loop % StrLen(hex) // 2
      NumPut("0x" . SubStr(hex, 2 * A_Index - 1, 2), code, A_Index - 1, "Char")
}

BBC BASIC

Note that BBC BASIC for Windows includes an 80386/80486 assembler as standard!

      REM Claim 9 bytes of memory
      SYS "GlobalAlloc",0,9 TO code%

      REM Poke machine code into it
      P%=code%
      [OPT 0
      mov EAX, [ESP+4]
      add EAX, [ESP+8]
      ret
      ]

      REM Run code
      SYS code%,7,12 TO result%
      PRINT result%

      REM Free memory
      SYS "GlobalFree",code%
      END

C

#include <stdio.h>
#include <sys/mman.h>
#include <string.h>

int test (int a, int b)
{
  /*
       mov EAX, [ESP+4]
       add EAX, [ESP+8]
       ret
  */
  char code[] = {0x8B, 0x44, 0x24, 0x4, 0x3, 0x44, 0x24, 0x8, 0xC3};
  void *buf;
  int c;
  /* copy code to executable buffer */
  buf = mmap (0,sizeof(code),PROT_READ|PROT_WRITE|PROT_EXEC,
             MAP_PRIVATE|MAP_ANON,-1,0);

  memcpy (buf, code, sizeof(code));
  /* run code */
  c = ((int (*) (int, int))buf)(a, b);
  /* free buffer */
  munmap (buf, sizeof(code));
  return c;
}

int main ()
{
  printf("%d\n", test(7,12));
  return 0;
}

COBOL

This solution is a 64-bit adaptation of the task, using the macOS ABI and 64-bit instructions. The assembly code in question is:

pushq	%rbp
movq	%rsp, %rbp
movl	%edi, -0x4(%rbp)
movl	%esi, -0x8(%rbp)
movl	-0x4(%rbp), %esi
addl	-0x8(%rbp), %esi
movl	%esi, -0xc(%rbp)
movl	-0xc(%rbp), %eax
popq	%rbp
retq

The 64-bit "wrapper code" used by the PicoLisp and Go implementations have the parameters 7 and 12 baked into it, so I opted for a pure 64-bit implementation rather than manipulating the 64-bit stack to support the 32-bit instructions.

       >>SOURCE FORMAT IS FIXED
       IDENTIFICATION DIVISION.
       PROGRAM-ID. MC.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
           01 INSTRUCTIONS.
              03 USAGE BINARY-CHAR UNSIGNED VALUE H'55'.
              03 USAGE BINARY-CHAR UNSIGNED VALUE H'48'.
              03 USAGE BINARY-CHAR UNSIGNED VALUE H'89'.
              03 USAGE BINARY-CHAR UNSIGNED VALUE H'E5'.
              03 USAGE BINARY-CHAR UNSIGNED VALUE H'89'.
              03 USAGE BINARY-CHAR UNSIGNED VALUE H'7D'.
              03 USAGE BINARY-CHAR UNSIGNED VALUE H'FC'.
              03 USAGE BINARY-CHAR UNSIGNED VALUE H'89'.
              03 USAGE BINARY-CHAR UNSIGNED VALUE H'75'.
              03 USAGE BINARY-CHAR UNSIGNED VALUE H'F8'.
              03 USAGE BINARY-CHAR UNSIGNED VALUE H'8B'.
              03 USAGE BINARY-CHAR UNSIGNED VALUE H'75'.
              03 USAGE BINARY-CHAR UNSIGNED VALUE H'FC'.
              03 USAGE BINARY-CHAR UNSIGNED VALUE H'03'.
              03 USAGE BINARY-CHAR UNSIGNED VALUE H'75'.
              03 USAGE BINARY-CHAR UNSIGNED VALUE H'F8'.
              03 USAGE BINARY-CHAR UNSIGNED VALUE H'89'.
              03 USAGE BINARY-CHAR UNSIGNED VALUE H'75'.
              03 USAGE BINARY-CHAR UNSIGNED VALUE H'F4'.
              03 USAGE BINARY-CHAR UNSIGNED VALUE H'8B'.
              03 USAGE BINARY-CHAR UNSIGNED VALUE H'45'.
              03 USAGE BINARY-CHAR UNSIGNED VALUE H'F4'.
              03 USAGE BINARY-CHAR UNSIGNED VALUE H'5D'.
              03 USAGE BINARY-CHAR UNSIGNED VALUE H'C3'.
           01 MMAP.
              03 MMAP-ADDR   USAGE POINTER VALUE NULL.
              03 MMAP-LEN    USAGE BINARY-LONG UNSIGNED VALUE 24.
              03 MMAP-PROT   USAGE BINARY-INT VALUE H'0007'.
              03 MMAP-FLAGS  USAGE BINARY-INT VALUE H'1002'.
              03 MMAP-FD     USAGE BINARY-INT VALUE -1.
              03 MMAP-OFFSET USAGE BINARY-LONG VALUE 0.
           03 CODE-PTR USAGE PROCEDURE-POINTER.
           01 ARG-A USAGE BINARY-INT VALUE 7.
           01 ARG-B USAGE BINARY-INT VALUE 12.
           01 RESULT USAGE BINARY-INT.
       LINKAGE SECTION.
           01 MACHINE-CODE PIC X(24).
       PROCEDURE DIVISION.
       MAIN SECTION.
           PERFORM SET-UP.
           CALL CODE-PTR USING
              BY VALUE ARG-A
              BY VALUE ARG-B
              RETURNING RESULT.
           DISPLAY RESULT.
           PERFORM TEAR-DOWN.
           STOP RUN.

       SET-UP SECTION.
           CALL 'mmap' USING
              BY VALUE MMAP-ADDR
              BY VALUE MMAP-LEN
              BY VALUE MMAP-PROT
              BY VALUE MMAP-FLAGS
              BY VALUE MMAP-FD
              BY VALUE MMAP-OFFSET
              RETURNING CODE-PTR.
           SET ADDRESS OF MACHINE-CODE TO CODE-PTR.
           MOVE INSTRUCTIONS TO MACHINE-CODE.

       TEAR-DOWN SECTION.
           SET ADDRESS OF MACHINE-CODE TO NULL.
           CALL 'munmap' USING
              BY VALUE CODE-PTR
              BY VALUE MMAP-LEN.
Output:

+0000000019

Commodore BASIC

For Commodore computers, machine language routines are often written to increase speed of a program, provide functionality not otherwise provided for in BASIC, or execute code in the background. It is common to see machine language routines incorporated into BASIC programs for these purposes.

The machine code shown for adding two numbers targets the MOS 65xx/85xx architecture (6502, 6510, etc.) and is given as follows:

Assembly         Hexadecimal     Decimal

CLC              18              24
LDA $2000        AD 00 20        173 0 32
ADC $2001        6D 01 20        109 1 32
STA $2002        8D 02 20        141 2 32
RTS              60              96
  1. The numbers to be added must be poked into locations $2000 and $2001 (8192 and 8193)
  2. The machine code is called for execution at $2003 with the SYS statement. (Note, since we are using ADd with Carry, we should CLear the Carry flag first to be sure the result is accurate.)
  3. The result is stored in location $2002 (8194).
  4. The machine code must issue a RTS (ReTurn from Subroutine) to allow BASIC to continue.


Note: This example, using RAM space at $2000, provides a wide range of cross-compatibility across Commodore 8-bit models, and has been tested to work on the Commodore PET (32k), VIC-20 (28k), Commodore 64, Commodore Plus/4, and Commodore 128.

No memory management is performed in this example, which would protect the machine code from being overwritten by BASIC. There are more optimal memory locations for the machine code to reside that are not affected by BASIC, however, the locations of such are unique to each model. For example, the range from $C000 to $CFFF (49152 to 53247) on the Commodore 64 is one such ideal location since BASIC memory ends at $9FFF, and there is no overlapping ROM to interfere.

10 print chr$(147);
15 ml=8192      
20 if peek(ml+3)<>173 and peek(ml+12)<>96 then gosub 100           
30 for ad=ml to ml+2:poke ad,0:next            
40 poke ml,7:poke ml+1,12                      
50 print "before:";peek(ml+2)
60 sys ml+3                                   
70 print "after:";peek(ml+2)
80 end
100 rem machine language loader
105 for ad=ml+3 to ml+13
110 read b
115 poke ad,b
120 next
125 return
8195 data 24       :rem clc
8196 data 173,0,32 :rem lda $2000
8199 data 109,1,32 :rem adc $2001
8202 data 141,2,32 :rem sta $2002
8205 data 96       :rem rts

Notes about Program

  • Line 20 checks to see if the ML routine has already been loaded into memory. If not, visit the loader routine.
  • Line 30 clears the memory locations (sets to zero) for the two addends and the sum.


Output:
BEFORE: 0
AFTER: 19

READY.
█

Common Lisp

;;Note that by using the 'CFFI' library, one can apply this procedure portably in any lisp implementation; 
;; in this code however I chose to demonstrate only the implementation-dependent programs.

;;CCL
;; Allocate a memory pointer and poke the opcode into it
(defparameter ptr (ccl::malloc 9))

(loop for i in '(139 68 36 4 3 68 36 8 195) 
   for j from 0 do
   (setf (ccl::%get-unsigned-byte ptr j) i))

;; Execute with the required arguments and return the result as an unsigned-byte
(ccl::ff-call ptr :UNSIGNED-BYTE 7 :UNSIGNED-BYTE 12 :UNSIGNED-BYTE)

;; Output = 19

;; Free the pointer
(ccl::free ptr)

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;SBCL
(defparameter mmap (list 139 68 36 4 3 68 36 8 195))

(defparameter pointer (sb-alien:make-alien sb-alien:unsigned-char (length mmap)))

(defparameter callp (loop for byte in mmap
                          for i from 0
		       do
		       (setf (sb-alien:deref pointer i) byte)
		       finally
		       (return (sb-alien:cast pointer (function integer integer integer)))))

(sb-alien:alien-funcall callp 7 12)

(loop for i from 0 below 18 collect (sb-alien:deref ptr i))

(sb-alien:free-alien pointer)

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;CLISP
(defparameter mmap (list 139 68 36 4 3 68 36 8 195))

(defparameter POINTER (FFI:FOREIGN-ADDRESS  (FFI:FOREIGN-ALLOCATE 'FFI:UINT8 :COUNT 9)))

(loop for i in mmap
   for j from 0 do
   (FUNCALL #'(SETF FFI:MEMORY-AS) i POINTER 'FFI:INT j))

(FUNCALL
 (FFI:FOREIGN-FUNCTION POINTER
		       (LOAD-TIME-VALUE
			(FFI:PARSE-C-TYPE
			 '(FFI:C-FUNCTION (:ARGUMENTS 'FFI:INT 'FFI:INT) (:RETURN-TYPE FFI:INT) (:LANGUAGE :STDC)))))
 7 12)

(FFI:FOREIGN-FREE POINTER)

Cowgol

include "cowgol.coh";

# Run machine code at cptr, given two 32-bit arguments,
# return the value returned from EAX. 
sub RunCode(cptr: [uint8], arg1: uint32, arg2: uint32): (rslt: uint32) is
    # Inline assembly is supported, so this whole rigmarole
    # is not even necessary. 
    # Though note that this (obviously) depends on the assembly back-end used.
    # Linux as uses AT&T syntax, so that's what I'm doing here.
    # Cowgol supports many processors but this will, obviously, only work
    # on x86. 
    
    @asm "pushl (",arg1,")";      # Push the two arguments on the stack
    @asm "pushl (",arg2,")";
    @asm "call *(",cptr,")";      # Call the code at the pointer
    @asm "movl %eax, (",rslt,")"; # Store the result in rslt
    @asm "popl %eax";             # Clean up the stack
    @asm "popl %eax";
end sub;

# Store code in an array. This is enough to make it available.
var code: uint8[] := {139, 68, 36, 4, 3, 68, 36, 8, 195};

# Use the function
print_i32(RunCode(&code as [uint8], 7, 12)); # this prints 7+12 = 19
print_nl();

# As a demonstration, this shows it can be patched at runtime to multiply instead
code[4] := 247;
code[5] := 100;
print_i32(RunCode(&code as [uint8], 7, 12)); # this prints 7*12 = 84
print_nl();
Output:
19
84

D

In D you usually use a nicer asm {} statement for similar purposes.

Generally new operating systems forbid execution of any address unless it's known to contain executable code. This is a basic version that unlike the C entry executes from array memory. This may crash on some operating systems.

int test(in int a, in int b) pure nothrow @nogc {
    /*
    mov EAX, [ESP+4]
    add EAX, [ESP+8]
    ret
    */
    immutable ubyte[9] code = [0x8B, 0x44, 0x24, 0x4, 0x3, 0x44, 0x24, 0x8, 0xC3];
    alias F = extern(C) int function(int, int) pure nothrow @nogc;
    immutable f = cast(F)code.ptr;
    return f(a, b); // Run code.
}

void main() {
    import std.stdio;

    test(7, 12).writeln;
}
Output:
 19

Delphi

Works with: Delphi version 6.0

With the "Asm" directive, you can insert machine language directly into Delphi code. If you use the assembly language instructions, you don't have to know the opcode numbers. However, it you really want to insert raw binary values, you can use directives like "db" and "dw" to insert raw values.

function AddNumbers(Num1, Num2: integer): Integer;
{Add two numbers in assembly language}
asm
        PUSH    EBX
        PUSH    EDX
        MOV     ECX,Num1
        MOV     EDX,Num2
        ADD	ECX,EDX
        MOV	Result,ECX
        POP     EDX
        POP     EBX
end;



procedure TestAssembly(Memo: TMemo);
var I,J,K: integer;
begin
for I:=1 to 5 do
 for J:=1 to 5 do
	begin
	K:=AddNumbers(I,J);
	Memo.Lines.Add(IntToStr(I)+' + '+IntToStr(J)+' = '+IntToStr(K));
	end;
end;
Output:
1 + 1 = 2
1 + 2 = 3
1 + 3 = 4
1 + 4 = 5
1 + 5 = 6
2 + 1 = 3
2 + 2 = 4
2 + 3 = 5
2 + 4 = 6
2 + 5 = 7
3 + 1 = 4
3 + 2 = 5
3 + 3 = 6
3 + 4 = 7
3 + 5 = 8
4 + 1 = 5
4 + 2 = 6
4 + 3 = 7
4 + 4 = 8
4 + 5 = 9
5 + 1 = 6
5 + 2 = 7
5 + 3 = 8
5 + 4 = 9
5 + 5 = 10


Draco

It is possible to call any random area of memory as if it were a function by casting it to a function of the right type, similar to C. The following example stores the code in an array. The CP/M version of Draco passes the arguments on the stack in left-to-right order (so they are popped off in right-to-left order).

/* 8080 machine code in an array */
[6] byte add_mc = (
    0xC1,           /* POP B - get return address */
    0xD1,           /* POP D - get second argument */
    0xE1,           /* POP H - get first argument */
    0x19,           /* DAD D - add arguments */
    0xC5,           /* PUSH B - push return address back */
    0xC9            /* RET - return */
);

proc nonrec main() void:
    /* Declare a function pointer */
    type fn = proc(word a, b) word;
    fn add;
    
    /* Pretend the array is actually a function */
    add := pretend(add_mc, fn);
    
    /* Call the function and print the result */
    writeln(add(12, 7))
corp
Output:
19

Draco also supports inline machine code directly, using the code() construct. The machine code is emitted in place during compilation, and cannot be changed at runtime. It is possible to refer to variables directly, whose address will be emitted. The linker will even adjust these addresses as required.

proc nonrec main() void:
    word a, b, c; 
    
    /* assign values to the input variables */
    a := 12;
    b := 7;
    
    /* inline machine code to add A and B
     * 
     * Note that we have to cast each value to a byte,
     * because by default, numeric constants are assumed
     * to be 16-bit words, and would be emitted as two
     * bytes each.
     *
     * The intent is for the programmer to define byte 
     * constants corresponding to opcodes, and write
     * "assembly", but that is beyond the scope here. */
    code(
        make(0x2A, byte), a,      /* LHLD a  - load var A into HL */
        make(0xEB, byte),         /* XCHG    - put it in DE */
        make(0x2A, byte), b,      /* LHLD b  - load var B into HL */
        make(0x19, byte),         /* DAD D   - add DE to HL */
        make(0x22, byte), c       /* SHLD c  - store the result in var C */
    );
    
    /* print the result */
    writeln(c);
corp
Output:
19


FreeBASIC

'' This is an example for the x86 architecture.
Function test (Byval a As Long, Byval b As Long) As Long
    Asm
        mov eax, [a]
        Add eax, [b]
        mov [Function], eax
    End Asm
End Function

Print test(12, 7)
Sleep
Output:
19


Go

Translation of: C


This task requires the use of 'cgo' which enables Go to interface with C code by importing a pseudo-package called "C".

Although Go supports both 32-bit and 64-bit architectures, I'm writing this on a 64-bit Ubuntu 20.04 system. I'm therefore using x64 opcodes rather than the x86 (32-bit) opcodes listed in the task description.

There doesn't appear to be a way to cast a pointer to a native buffer to a Go function pointer so that the machine code can be run directly. I've therefore written a C function to perform this step and embedded it in the program which 'cgo' allows us to do.

package main

import "fmt"

/*
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <string.h>

typedef unsigned char byte;
typedef byte (*mcfunc) (byte, byte);

void runMachineCode(void *buf, byte a, byte b) {
    mcfunc fp = (mcfunc)buf;
    printf("%d\n", fp(a, b));
}
*/
import "C"

func main() {
    code := []byte{
        0x55, 0x48, 0x89, 0xe5, 0x89, 0x7d,
        0xfc, 0x89, 0x75, 0xf8, 0x8b, 0x75,
        0xfc, 0x03, 0x75, 0xf8, 0x89, 0x75,
        0xf4, 0x8b, 0x45, 0xf4, 0x5d, 0xc3,
    }
    le := len(code)
    buf := C.mmap(nil, C.size_t(le), C.PROT_READ|C.PROT_WRITE|C.PROT_EXEC,
        C.MAP_PRIVATE|C.MAP_ANON, -1, 0)
    codePtr := C.CBytes(code)
    C.memcpy(buf, codePtr, C.size_t(le))
    var a, b byte = 7, 12
    fmt.Printf("%d + %d = ", a, b)
    C.runMachineCode(buf, C.byte(a), C.byte(b))
    C.munmap(buf, C.size_t(le))
    C.free(codePtr)
}
Output:
7 + 12 = 19

Julia

Translation of: C

Julia cannot execute machine code directly, but can embed C and C++ with the Cxx library.

using Cxx

cxx"""
#include <stdio.h>
#include <sys/mman.h>
#include <string.h>
 
int test (int a, int b)
{
  /*
       mov EAX, [ESP+4]
       add EAX, [ESP+8]
       ret
  */
  char code[] = {0x8B, 0x44, 0x24, 0x4, 0x3, 0x44, 0x24, 0x8, 0xC3};
  void *buf;
  int c;
  /* copy code to executable buffer */
  buf = mmap (0,sizeof(code),PROT_READ|PROT_WRITE|PROT_EXEC,
             MAP_PRIVATE|MAP_ANON,-1,0);
 
  memcpy (buf, code, sizeof(code));
  /* run code */
  c = ((int (*) (int, int))buf)(a, b);
  /* free buffer */
  munmap (buf, sizeof(code));
  return c;
}
 
int main ()
{
  printf("%d\n", test(7,12));
  return 0;
}
"""

julia_function = @cxx main()
julia_function()

Kotlin

Translation of: C


This task presents a number of issues for Kotlin Native which at the time of writing (August 2017) is still in the earlier stages of development:-

1. The language doesn't (yet) have an unsigned Byte type, though this is easily solved by subtracting 256 from unsigned values between 128 and 255 inclusive and then using the signed Byte type.

2. As far as x86 is concerned, the language is currently only targetting 64-bit platforms including Ubuntu 14.04 on which I'm writing this. Rather than rewrite the task using x64 opcodes, I've used the PicoLisp entry's 'glue code' to enable the 32-bit machine code to run on a 64-bit system.

3. There doesn't appear to be a way to cast a pointer to a native buffer to a Kotlin function pointer so that the machine code can be run. I've therefore written a 'one line' C helper function (in mcode.def) to perform this step and compiled it to a library (mcode.klib) so that it can be called from Kotlin code.

// mcode.def
---

static inline unsigned char runMachineCode(void *code, unsigned char a, unsigned char b) {      
    return ((unsigned char (*) (unsigned char, unsigned char))code)(a, b);
}
// Kotlin Native version 0.3

import kotlinx.cinterop.*
import string.*
import mman.*
import mcode.*

fun main(args: Array<String>) {
    memScoped {        
        val bytes = byteArrayOf(
            144 - 256,                            // Align
            144 - 256,
            106, 12,                              // Prepare stack
            184 - 256, 7, 0, 0, 0, 
            72, 193 - 256, 224 - 256, 32,
            80,
            139 - 256, 68, 36, 4, 3, 68, 36, 8,   // Rosetta task code
            76, 137 - 256, 227 - 256,             // Get result
            137 - 256, 195 - 256,
            72, 193 - 256, 227 - 256, 4,
            128 - 256, 203 - 256, 2,
            72, 131 - 256, 196 - 256, 16,         // Clean up stack 
            195 - 256                             // Return
        )
        val len = bytes.size    
        val code = allocArray<ByteVar>(len)
        for (i in 0 until len) code[i] = bytes[i]
        val buf = mmap(null, len.toLong(), PROT_READ or PROT_WRITE or PROT_EXEC,
                       MAP_PRIVATE or MAP_ANON, -1, 0) 
        memcpy(buf, code, len.toLong())
        val a: Byte = 7
        val b: Byte = 12
        val c = runMachineCode(buf, a, b)
        munmap(buf, len.toLong()) 
        println("$a + $b = ${if(c >= 0) c.toInt() else c + 256}")
    }
}
Output:
7 + 12 = 19

M2000 Interpreter

We can execute machine code, in a buffer for code. We can't push to stack and then call, we can use a buffer for data. If eax is non zero then error raised, with error number the eax number. When execute code the code buffer can't be used to write over. so we have to use a buffer for data for read/write data. This example perform these: At Datamem(1) put 500, eax=5100, eax add Datamem(1), eax add 5, store eax to Datamem(0). We have an option to clear eax, or use it to return value as error code. We have to leave all other registers, and stack as we found it. Both running in Wine (Linux 64bit) too


Module Checkit {
      Buffer DataMem as Long*10
      Return DataMem, 1:=500    ' second Long
      Print Eval(DataMem, 1)+5100+5=5605
      \\ Now we do math executing machine code
      Buffer Code ExecMem as byte*1024
      Address=0
      EmbLong(0xb8, 5100) ' mov eax,5100
      EmbByteByte(0x83, 0xC0, 5) ' add  eax,0x5
      EmbByteLong(0x3,0x5, DataMem(1)) ' add eax, [DataMem(1)] 
      EmbLong(0xa3, DataMem(0)) ' mov [DataMem(0)], eax
      \\ split rem to execute xor eax eax (eax=0)
      Rem : EmbByte(0x31, 0xC0) ' xor eax, eax 
      Ret() ' Return
      \\  
      Try ok {
            Execute Code ExecMem, 0
      }
      \\If  Eax <>0 then we get error, so we read error as Uint()
      \\ Error read once then change to zero
      m=Uint(Error)
      \\ Hex is Print Hexadecimal for unsigned numbers
      Hex m
      Print m=5605
      Print Error=0, ok=False
      
      Print Eval(DataMem, 0)=5605,  Eval(DataMem, 0)
      \\ sub used as Exit here
      Sub Ret()
            Return ExecMem, Address:=0xC3
            Address++
      End Sub
      Sub EmbByteByte()
            Return ExecMem, Address:=Number, Address+1:=Number, Address+2:=Number
            Address+=3
      End Sub
      Sub EmbByte()
            Return ExecMem, Address:=Number, Address+1:=Number
            Address+=2
      End Sub
      Sub EmbLong()
            Return ExecMem, Address:=Number, Address+1:=Number as Long
            Address+=5
      End Sub
      Sub EmbByteLong()
            Return ExecMem, Address:=Number, Address+1:=Number, Address+2:=Number as Long
            Address+=6
      End Sub
      
}
CheckIt

Using a lambda function with closures two buffers (buffers are objects in M2000 to handle memory blocks). This also add 12 +7 as the task want (but with no pushing to stack, but poke to data buffer)

 
Function MyAdd {
      Buffer DataMem as Long*2
      Buffer Code ExecMem as byte*32
      Address=0
      EmbByte(0x31, 0xC0)
      EmbByteLong(0x3,0x5, DataMem(0)) ' add eax, [DataMem(0)] 
      EmbByteLong(0x3,0x5, DataMem(1)) ' add eax, [DataMem(1)] 
      EmbLong(0xa3, DataMem(0)) ' mov [DataMem(0)], eax
      Rem :
      EmbByte(0x31, 0xC0) ' xor eax, eax 
      Ret() ' Return
      =lambda ExecMem, DataMem (a as double, b as double)-> {
            Return DataMem, 0:=a, 1:=b
            Try ok  {
                  Execute Code ExecMem, 0
            }
            If not ok then {
                  =Uint(Error)   
            }  Else {
                  =Eval(DataMem, 0)
            }
      }
      Sub Ret()
            Return ExecMem, Address:=0xC3
            Address++
      End Sub
      Sub EmbByte()
            Return ExecMem, Address:=Number, Address+1:=Number
            Address+=2
      End Sub
      Sub EmbLong()
            Return ExecMem, Address:=Number, Address+1:=Number as Long
            Address+=5
      End Sub
      Sub EmbByteLong()
            Return ExecMem, Address:=Number, Address+1:=Number, Address+2:=Number as Long
            Address+=6
      End Sub     
}
\\ Produce a lambda function with machine code inside
UnsingedAdd=MyAdd()
Print UnsingedAdd(12, 7), UnsingedAdd(500, 100)

Nim

Translation of: C
import posix

let MAP_ANONYMOUS {.importc: "MAP_ANONYMOUS", header: "<sys/mman.h>".}: cint

proc test(a, b: cint): cint =
  # mov EAX, [ESP+4]
  # add EAX, [ESP+8]
  # ret
  var code = [0x8B'u8, 0x44, 0x24, 0x4, 0x3, 0x44, 0x24, 0x8, 0xC3]

  # create an executable buffer
  var buf = mmap(nil, sizeof(code), PROT_READ or PROT_WRITE or PROT_EXEC,
    MAP_PRIVATE or MAP_ANONYMOUS, -1, 0)

  # copy code to the buffer
  copyMem(buf, addr code[0], sizeof(code))
  # run code
  result = cast[proc(a, b: cint): cint {.nimcall.}](buf)(a, b)
  # free buffer
  discard munmap(buf, sizeof(code))

echo test(7, 12)

PARI/GP

GP can't peek and poke into memory, but PARI can add in those capabilities via C.

Translation of: C
#include <stdio.h>
#include <sys/mman.h>
#include <string.h>
#include <pari/pari.h>
 
int
test(int a, int b)
{
  char code[] = {0x8B, 0x44, 0x24, 0x4, 0x3, 0x44, 0x24, 0x8, 0xC3};
  void *buf;
  int c;
  /* copy code to executable buffer */
  buf = mmap (0,sizeof(code),PROT_READ|PROT_WRITE|PROT_EXEC,
             MAP_PRIVATE|MAP_ANON,-1,0);
 
  memcpy (buf, code, sizeof(code));
  /* run code */
  c = ((int (*) (int, int))buf)(a, b);
  /* free buffer */
  munmap (buf, sizeof(code));
  return c;
}
 
void
init_auto(void)
{
  pari_printf("%d\n", test(7,12));
  return 0;
}

Pascal

Tested under Linux with Freepascal 2.6.4-32BIt ( like the Code used ) cdecl doesn't work in Freepascal under Linux 64-bit

Program Example66;
{Inspired... program to demonstrate the MMap function. Freepascal docs }
Uses
  BaseUnix,Unix;

const
  code : array[0..9] of byte = ($8B, $44, $24, $4, $3, $44, $24, $8, $C3, $00);
  a :longInt= 12; 
  b :longInt=  7;  
type
  tDummyFunc = function(a,b:LongInt):LongInt;cdecl;
Var
    Len,k  : cint;
    P    : Pointer;

begin
  len := sizeof(code);
  P:= fpmmap(nil,
             len+1 ,
             PROT_READ OR PROT_WRITE OR PROT_EXEC,
             MAP_ANONYMOUS OR MAP_PRIVATE,
             -1, // for MAP_ANONYMOUS
             0);
  If P =  Pointer(-1) then
    Halt(4);                  

  for k := 0 to len-1 do
    pChar(p)[k] := char(code[k]);

  k := tDummyFunc(P)(a,b);

  Writeln(a,'+',b,' = ',k);
  if fpMUnMap(P,Len)<>0 Then
    Halt(fpgeterrno);
end.
output
12+7 = 19

Phix

Phix has a builtin assembler, which makes the specifics of this task "the hard way to do it".

atom mem = allocate(9)
poke(mem,{#8B,#44,#24,#04,#03,#44,#24,#08,#C3})
constant mfunc = define_c_func({},mem,{C_INT,C_INT},C_INT)
?c_func(mfunc,{12,7})
free(mem)

In Phix the #ilASM statement (which has guards to allow 32/64/WIN/LNX variants) is usually used for inline assembly, for example (but sticking to the task):

atom mem = allocate(9)
poke(mem,{#8B,#44,#24,#04,#03,#44,#24,#08,#C3})
integer res
#ilASM{ mov eax,[mem]
        call :%pLoadMint -- eax:=(int32)eax, in case mem>#3FFFFFFF
        push 12
        push 7
        call eax
        add esp,8
        mov [res],eax }
?res
free(mem)

Better yet, albeit perhaps deviating somewhat from the task, but adhering in spirit, and unlike the above runnable on both 32 and 64 bit:
(Were you to produce a list.asm from this, using p -d, it would show the x86 bytes above or the x64 equivalent, using a mix of octal and hex representations)

integer res
#ilASM{ jmp @f
      ::add
    [32]
        mov eax,[esp+4]
        add eax,[esp+8]
    [64]
        mov rax,[rsp+8]
        add rax,[rsp+16]
    []  
        ret
      @@:
        push 12
        push 7
        call :add
    [32]
        add esp,8
        mov [res],eax
    [64]
        add rsp,16
        mov [res],rax
    []
      }
?res

In practice I would omit the jmp/labels/call/ret and probably just use registers instead of the stack:

integer res
#ilASM{
    [32]
        mov eax,12
        mov edx,7
        add eax,edx
        mov [res],eax
    [64]
        mov rax,12
        mov rdx,7
        add rax,rdx
        mov [res],rax
    []
      }
?res

All four cases output 19

PicoLisp

The following runs on 64-bit PicoLisp. Therefore we need some glue code to interface to the task's 32-bit code.

(setq P
   (struct (native "@" "malloc" 'N 39) 'N
      # Align
      144                  # nop
      144                  # nop

      # Prepare stack
      106 12               # pushq $12
      184 7 0 0 0          # mov $7, %eax
      72 193 224 32        # shl $32, %rax
      80                   # pushq %rax

      # Rosetta task code
      139 68 36 4 3 68 36 8

      # Get result
      76 137 227           # mov %r12, %rbx
      137 195              # mov %eax, %ebx
      72 193 227 4         # shl $4, %rbx
      128 203 2            # orb $2, %bl

      # Clean up stack
      72 131 196 16        # add $16, %rsp

      # Return 
      195 )                # ret
   foo (>> 4 P) )

# Execute
(println (foo))

# Free memory
(native "@" "free" NIL P)

Output:

19

PL/M

This code runs on an 8080 (or Z80) processor.

PL/M does not assume the existence of any operating system features or come with a standard library, so communicating with the OS actually has to be done in the same manner as calling (any other) machine language routines.

100H:

/* 8080 MACHINE CODE TO ADD TWO BYTES:
   79       MOV A,C     ; LOAD FIRST ARG INTO ACCUMULATOR
   83       ADD E       ; ADD SECOND ARG TO ACCUMULATOR
   C9       RET         ; RETURN */
   
DECLARE ADD$8080 DATA (79H, 83H, 0C9H);

/* THE 8080 PL/M CALLING CONVENTION IS THAT THE
   NEXT-TO-LAST ARG IS PUT IN (B)C, THE LAST ARG IN (D)E.
   (THE REST ARE IN MEMORY BUT WE DO NOT NEED ANY MORE.)
   THE RETURN ARGUMENT SHOULD BE IN THE ACCUMULATOR.
   WE CAN DEFINE A WRAPPER PROCEDURE TO DECLARE THE
   TYPES OF THE ARGUMENTS. */

EXEC$ADD: PROCEDURE (A,B) BYTE;
    DECLARE (A,B) BYTE;
    /* WE CAN 'GO TO' CONSTANTS OR VARIABLES, BUT NOT TO
       EXPRESSIONS. SO WE HAVE TO FETCH THE ADDRESS FIRST. */ 
    DECLARE LOC ADDRESS;
    LOC = .ADD$8080;
    GO TO LOC;
END EXEC$ADD;

/* IN FACT, PL/M DOES NOT COME WITH ANY STANDARD LIBARIES.
   IT IS FROM BEFORE THE TIME THAT YOU COULD ASSUME THERE
   WOULD EVEN BE AN OPERATING SYSTEM, THOUGH CP/M
   (THE PREDECESSOR TO DOS) WOULD QUICKLY BECOME STANDARD.
    
   WE NEED TO USE THIS EXACT TRICK TO GET CP/M TO PRINT THE
   RESULT TO THE OUTPUT. LUCKILY (AND NOT COINCIDENTALLY), 
   THE CP/M SYSCALL ENTRY POINT IS COMPATIBLE WITH THE
   PL/M CALLING CONVENTION. */

BDOS: PROCEDURE (FUNC, ARG);
    DECLARE FUNC BYTE;
    DECLARE ARG ADDRESS;
    /* 5 IS THE CP/M BDOS ENTRY POINT */
    GO TO 5; 
END BDOS;

/* WE ALSO NEED OUR OWN NUMBER OUTPUT ROUTINE. WE CAN WRITE
   IT IN PL/M, THEN USE THE ABOVE ROUTINE TO TELL CP/M
   TO PRINT THE RESULT. */
   
PRINT$NUMBER: PROCEDURE(N);
    DECLARE S (4) BYTE INITIAL ('...$');
    DECLARE P ADDRESS;
    DECLARE (N, C BASED P) BYTE;
  
    /* EXTRACT EACH DIGIT AND WRITE THEM BACKWARDS TO A STRING */
    P = .S(3);
DIGIT:
    P = P-1;
    C = (N MOD 10) + '0';
    N = N/10;
    IF N > 0 THEN GO TO DIGIT;

    /* TELL CP/M TO PRINT THE RESULTING STRING */
    CALL BDOS(9, P);
END PRINT$NUMBER;

/* USING OUR OWN MACHINE CODE WORKS IN THE SAME WAY */
CALL PRINT$NUMBER( EXEC$ADD( 7, 12) ); /* THIS PRINTS 19 */

CALL BDOS(0,0); /* EXIT */
EOF
Output:
19

PureBasic

Using the Windows API:

CompilerIf #PB_Compiler_Processor <> #PB_Processor_x86
  CompilerError "Code requires a 32-bit processor."
CompilerEndIf


; Machine code using the Windows API

Procedure MachineCodeVirtualAlloc(a,b)
*vm = VirtualAlloc_(#Null,?ecode-?scode,#MEM_COMMIT,#PAGE_EXECUTE_READWRITE)
    If(*vm)
        CopyMemory(?scode, *vm, ?ecode-?scode)
        eax_result=CallFunctionFast(*vm,a,b)
        VirtualFree_(*vm,0,#MEM_RELEASE)
        ProcedureReturn eax_result
    EndIf
EndProcedure
 
rv=MachineCodeVirtualAlloc( 7, 12)
MessageRequester("MachineCodeVirtualAlloc",Str(rv)+Space(50),#PB_MessageRequester_Ok)
 
#HEAP_CREATE_ENABLE_EXECUTE=$00040000 
 
Procedure MachineCodeHeapCreate(a,b)
hHeap=HeapCreate_(#HEAP_CREATE_ENABLE_EXECUTE,?ecode-?scode,?ecode-?scode)
    If(hHeap)
        CopyMemory(?scode, hHeap, ?ecode-?scode)
        eax_result=CallFunctionFast(hHeap,a,b)
        HeapDestroy_(hHeap)
        ProcedureReturn eax_result
    EndIf
EndProcedure
 
rv=MachineCodeHeapCreate(7,12)
MessageRequester("MachineCodeHeapCreate",Str(rv)+Space(50),#PB_MessageRequester_Ok)
End
 
; 8B442404               mov     eax,[esp+4]
; 03442408               add     eax,[esp+8]
; C20800                 ret     8
 
DataSection
scode:
Data.a $8B,$44,$24,$04,$03,$44,$24,$08,$C2,$08,$00
ecode:
EndDataSection

Python

Works with: CPython version 3.x

The ctypes module is meant for calling existing native code from Python, but you can get it to execute your own bytes with some tricks. The bulk of the code is spent establishing an executable memory area - once that's done, the actual execution takes just a few lines.

import ctypes
import os
from ctypes import c_ubyte, c_int

code = bytes([0x8b, 0x44, 0x24, 0x04, 0x03, 0x44, 0x24, 0x08, 0xc3])

code_size = len(code)
# copy code into an executable buffer
if (os.name == 'posix'):
    import mmap
    executable_map = mmap.mmap(-1, code_size, mmap.MAP_PRIVATE | mmap.MAP_ANON, mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
    # we must keep a reference to executable_map until the call, to avoid freeing the mapped memory
    executable_map.write(code)
    # the mmap object won't tell us the actual address of the mapping, but we can fish it out by allocating
    # some ctypes object over its buffer, then asking the address of that
    func_address = ctypes.addressof(c_ubyte.from_buffer(executable_map))
elif (os.name == 'nt'):
    # the mmap module doesn't support protection flags on Windows, so execute VirtualAlloc instead
    code_buffer = ctypes.create_string_buffer(code)
    PAGE_EXECUTE_READWRITE = 0x40  # Windows constants that would usually come from header files
    MEM_COMMIT = 0x1000
    executable_buffer_address = ctypes.windll.kernel32.VirtualAlloc(0, code_size, MEM_COMMIT, PAGE_EXECUTE_READWRITE)
    if (executable_buffer_address == 0):
        print('Warning: Failed to enable code execution, call will likely cause a protection fault.')
        func_address = ctypes.addressof(code_buffer)
    else:
        ctypes.memmove(executable_buffer_address, code_buffer, code_size)
        func_address = executable_buffer_address
else:
    # for other platforms, we just hope DEP isn't enabled
    code_buffer = ctypes.create_string_buffer(code)
    func_address = ctypes.addressof(code_buffer)

prototype = ctypes.CFUNCTYPE(c_int, c_ubyte, c_ubyte) # build a function prototype from return type and argument types
func = prototype(func_address)                        # build an actual function from the prototype by specifying the address
res = func(7,12)
print(res)

Quackery

This task is a bit of a stretch, but the Quackery engine is a virtual machine, so technically Quackery is an assembly language.

Specifically the engine is a stack based processor that does not have direct access to RAM; instead a co-processor mediates requests for dynamically allocated memory. More details in the About Quackery section of Quackery's Category page. (Click on the header for this task.)

So, with the limitations that using Hex numbers to indicate opcodes is not possible, and understanding that when a chunk of memory is requested it is addressed by an offset and a reference to the start of the allocated memory (the numerical address of which is not available to the programmer), this is a walkthrough of the process in the Quackery shell.

As it is a stack machine arguments are passed via the data stack, reducing the specified task to a single operation; the MOV is not required. Therefore I have substituted a different specification that requires two operators - it is "add two numbers and negate the result".

Numbers in Quackery are signed BigIntegers; there are no unsigned numbers, so that part of the task is omitted.

Please don't flag this as incorrect - it's the best you'll get.

/O> ( check that + and negate are operators (i.e. op-codes ) 
... ' + operator? ' negate operator? and if [ say "true"]
... 
true
Stack empty.

/O> ( create a memory block large enough to hold them, )
... ( filled with zeros )
... 0 2 of
... 

Stack: [ 0 0 ] 

/O> ( poke the + operator into place )
... ' + swap 0 poke 
... 

Stack: [ + 0 ] 

/O> ( poke the negate operator into place )
... ' negate swap 1 poke
... 

Stack: [ + negate ] 

/O> ( Using the phrase               )
... (                                )
... (     ' +  ' negate  join        )
... (                                )
... ( would be more idiomatic, but   )
... ( the task specifies poking.     )

Stack: [ + negate ] 

/O> ( now put two numbers underneath it on the stack )
... 7 12 rot
... 

Stack: 7 12 [ + negate ] 

/O> ( and run the machine code )
... do
... 

Stack: -19

/O> ( ta-da! )

Racket

#lang racket/base

(require ffi/unsafe)

; set up access to racket internals
(define scheme-malloc-code
  (get-ffi-obj 'scheme_malloc_code #f (_fun (len : _intptr) -> _pointer)))
(define scheme-free-code
  (get-ffi-obj 'scheme_free_code #f (_fun _pointer -> _void)))

(define opcodes '(139 68 36 4 3 68 36 8 195))

(define code (scheme-malloc-code 64))

(for ([byte opcodes]
      [i (in-naturals)])
  (ptr-set! code _ubyte i byte))

(define function (cast code _pointer (_fun _ubyte _ubyte -> _ubyte)))

(function 7 12)

(scheme-free-code code)

Raku

I don't know how to translate this C line
c = ((int (*) (int, int))buf)(a, b);
so cannot solve the task with an idiomatic solution. I have also tried with Go's approach by adding a helper program but it also doesn't work out. Nonetheless I just present the attempt here so perhaps someone can fix that in 10 seconds.
use NativeCall;

constant PROT_READ   = 0x1;   #
constant PROT_WRITE  = 0x2;   #
constant PROT_EXEC   = 0x4;   # from local /usr/include/bits/mman.h
constant MAP_PRIVATE = 0x02;  #
constant MAP_ANON    = 0x20;  #

sub mmap(Pointer $addr, size_t $length, int32 $prot, int32 $flags,
   int32 $fd, size_t $offset --> Pointer) is native { * };
sub memcpy(Pointer $dest, Pointer $src, size_t $size --> Pointer) is native {*}
sub munmap(Pointer $addr, size_t $length) is native { * };

sub test (uint8 $a, uint8 $b) {
   my $code = CArray[uint8].new(
      0x90, 0x90, 0x6A, 0xC, 0xB8, 0x7, 0x0, 0x0, 0x0, 0x48, 0xC1, 0xE0, 0x20,
      0x50, 0x8B, 0x44, 0x24, 0x4, 0x3, 0x44, 0x24, 0x8, 0x4C, 0x89, 0xE3, 0x89,
      0xC3, 0x48, 0xC1, 0xE3, 0x4, 0x80, 0xCB, 0x2, 0x48, 0x83, 0xC4, 0x10, 0xC3   
   );
   my $buf =
      mmap(Pointer, nativesizeof($code), PROT_READ +| PROT_WRITE +| PROT_EXEC,
      MAP_PRIVATE +| MAP_ANON, -1, 0);

   memcpy($buf, nativecast(Pointer,$code), nativesizeof($code));

   my $c; # = ((int (*) (int, int))buf)(a, b);

   munmap($buf, nativesizeof($code));

   return $c = "Incomplete Attempt";
}

say test 7, 12;

In the mean time, here is a less desirable approach by writing a wrapper for the C entry, with the 64 bit instructions from PicoLisp ..

Translation of: C
test.c
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <string.h>

int test (int a, int b)
{
  char code[] = {
     0x90, 0x90, 0x6A, 0xC, 0xB8, 0x7, 0x0, 0x0, 0x0, 0x48, 0xC1, 0xE0, 0x20,
     0x50, 0x8B, 0x44, 0x24, 0x4, 0x3, 0x44, 0x24, 0x8, 0x4C, 0x89, 0xE3, 0x89,
     0xC3, 0x48, 0xC1, 0xE3, 0x4, 0x80, 0xCB, 0x2, 0x48, 0x83, 0xC4, 0x10, 0xC3
  };

  void *buf;
  int c;
  /* copy code to executable buffer */
  buf = mmap (0,sizeof(code),PROT_READ|PROT_WRITE|PROT_EXEC,
             MAP_PRIVATE|MAP_ANON,-1,0);
  memcpy (buf, code, sizeof(code));
  /* run code */
  c = ((int (*) (int, int))buf)(a, b);
  /* free buffer */
  munmap (buf, sizeof(code));
  return c;
}

mcode.raku

#!/usr/bin/env raku

# 20200501 Raku programming solution

use NativeCall;

constant LIBTEST = '/home/user/LibTest.so';

sub test(uint8 $a, uint8 $b) returns uint8 is native(LIBTEST) { * };

say test 7, 12;
Output:
gcc -Wall -fPIC -shared -o LibTest.so test.c

file LibTest.so LibTest.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=90d2695df9a56b88a57144147fb9288ac07b172f, not stripped ./mcode.raku

19

Rust

This is heavily inspired by https://www.jonathanturner.org/2015/12/building-a-simple-jit-in-rust.html
Hence, only working on Linux (the only other way to disable memory execution protection on other OSes was to use other crates, which kind of defeats the purpose.)

extern crate libc;

#[cfg(all(
    target_os = "linux",
    any(target_pointer_width = "32", target_pointer_width = "64")
))]
fn main() {
    use std::mem;
    use std::ptr;

    let page_size: usize = 4096;
    let (bytes, size): (Vec<u8>, usize) = if cfg!(target_pointer_width = "32") {
        (
            vec![0x8b, 0x44, 0x24, 0x04, 0x03, 0x44, 0x24, 0x08, 0xc3],
            9,
        )
    } else {
        (vec![0x48, 0x89, 0xf8, 0x48, 0x01, 0xf0, 0xc3], 7)
    };
    let f: fn(u8, u8) -> u8 = unsafe {
        let mut page: *mut libc::c_void = ptr::null_mut();
        libc::posix_memalign(&mut page, page_size, size);
        libc::mprotect(
            page,
            size,
            libc::PROT_EXEC | libc::PROT_READ | libc::PROT_WRITE,
        );
        let contents: *mut u8 = page as *mut u8;
        ptr::copy(bytes.as_ptr(), contents, 9);
        mem::transmute(contents)
    };

    let return_value = f(7, 12);
    println!("Returned value: {}", return_value);
    assert_eq!(return_value, 19);
}

#[cfg(any(
    not(target_os = "linux"),
    not(any(target_pointer_width = "32", target_pointer_width = "64"))
))]
fn main() {
    println!("Not supported on this platform.");
}

Scala

This example is incomplete. The text below cannot be considered a contribution that specifically addresses the task, it is rather a subjective judgement. Please either contribute code or remove and place in the talk section of this page. Regards. Please ensure that it meets all task requirements and remove this message.

PEEK, POKE and inserting machine opcode makes your system vulnerable which is not quite professional.

Considered to be more harmful than useful.

Smalltalk

I agree that this is more harmful than useful. The only target audience are compiler writers and Smalltalk-core developers (of which I guess are not too many around).

Also, this is highly cpu specific, the task is for an x86 and also assuming a particular calling convention; both is probably (definitely) not the case (most, incl. myself are on 64bit machines these days).

Anyway, as a sketch, here is how to do it (I won't waste time in making an x86_64 version for the particular calling convention used on my machine; and yes: it is even different between Unix and Windows systems!):

First we need a way to allocate executable memory (btw. we should probably also care to flush any instruction caches, which I won't go into here); This is very Smalltalk dialect specific, and probably not supported on other Smalltalks; in ST/X, where inline C-code can be compiled dynamically, we can define it as:

Works with: Smalltalk/X
!ExternalBytes class methods!

mapExecutableBytes:size
%{
#   include <sys/mman.h>

    void *mem;
    OBJ retVal;
    int nBytes = __intVal(size);

    mem = mmap(nil, nBytes, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0);
    if (mem != MAP_FAILED) {
        RETURN( __MKEXTERNALBYTES_N(mem, nBytes));
    }
%}.
    self primitiveFailed
! !

next, we need the correct code; the following presents the x86 version (but do not expect this code to NOT crash the Smalltalk VM, as the calling convention is certainly wrong..)

OperatingSystem getCPUType = #'x86' ifTrue:[
    code := #[0x8B 0x44 0x24 0x04 0x03 0x44 0x24 0x08 0xC3].
] ifFalse:[
    self error:'unsupported cpu'
].

handle := ExternalBytes mapExecutableBytes:100.
handle replaceFrom:1 with:code.

" dump it (debugging only)... "
e'code at {handle address hexPrintString} is:' printCR.
(handle copyFrom:1 to:50) asByteArray hexPrintString printCR.

" create an ExternalFunction for it "
func := ExternalLibraryFunction new code:handle address.
func name:'unnamed' module:nil returnType:#int argumentTypes:#(int int).
func beCallTypeC.
func printCR.

" now call it "
result := func invokeWithArguments:{10 . 20}

With a few more tricks, it is even possible to install that function as a method in a class; but additional code needs to be generated, to assert that the passed data is correctly boxed/unboxed.

Swift

Translation of: C


Using 64-bit glue code since Swift has limited 32-bit support on x86.

import Foundation

typealias TwoIntsOneInt = @convention(c) (Int, Int) -> Int

let code = [
  144, // Align
  144,
  106, 12, // Prepare stack
  184, 7, 0, 0, 0,
  72, 193, 224, 32,
  80,
  139, 68, 36, 4, 3, 68, 36, 8, // Rosetta task code
  76, 137, 227, // Get result
  137, 195,
  72, 193, 227, 4,
  128, 203, 2,
  72, 131, 196, 16, // Clean up stack
  195, // Return
] as [UInt8]

func fudge(x: Int, y: Int) -> Int {
  let buf = mmap(nil, code.count, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0)

  memcpy(buf, code, code.count)

  let fun = unsafeBitCast(buf, to: TwoIntsOneInt.self)
  let ret = fun(x, y)

  munmap(buf, code.count)

  return ret
}

print(fudge(x: 7, y: 12))

Tcl

Translation of: C
Library: Critcl
package require critcl

critcl::ccode {
    #include <sys/mman.h>
}

# Define a command using C. The C is embedded in Tcl, and will be
# built into a shared library at runtime. Note that Tcl does not
# provide a native way of doing this sort of thing; this thunk is
# mandatory.
critcl::cproc runMachineCode {Tcl_Obj* codeObj int a int b} int {
    int size, result;
    unsigned char *code = Tcl_GetByteArrayFromObj(codeObj, &size);
    void *buf;

    /* copy code to executable buffer */
    buf = mmap(0, (size_t) size, PROT_READ|PROT_WRITE|PROT_EXEC,
            MAP_PRIVATE|MAP_ANON, -1, 0); 
    memcpy(buf, code, (size_t) size);
    /* run code */
    result = ((int (*) (int, int)) buf)(a, b);
    /* dispose buffer */
    munmap(buf, (size_t) size);

    return result;
}

# But now we have our thunk, we can execute arbitrary binary blobs
set code [binary format c* {0x8B 0x44 0x24 0x4 0x3 0x44 0x24 0x8 0xC3}]
puts [runMachineCode $code 7 12]

Note that it would be more common to put that thunk in its own package (e.g., machineCodeThunk) and then just do something like this:

package require machineCodeThunk 1.0

set code [binary format c* {0x8B 0x44 0x24 0x4 0x3 0x44 0x24 0x8 0xC3}]
puts [runMachineCode $code 7 12]

Wren

Translation of: C

Wren is a high-level scripting language and cannot execute machine code directly.

However, it is designed for embedding and we can therefore ask the host to do this for us. Here, we use a host program written in C, the language which Wren itself is written in.

/* Machine_code.wren */
 
class C {
    // pass the machine code in string form to the host
    foreign static runMachineCode(s, a, b) 
}

var a = 7
var b = 12

// x64 opcodes for this task 
var m = [
    0x55, 0x48, 0x89, 0xe5, 0x89, 0x7d,
    0xfc, 0x89, 0x75, 0xf8, 0x8b, 0x75,
    0xfc, 0x03, 0x75, 0xf8, 0x89, 0x75,
    0xf4, 0x8b, 0x45, 0xf4, 0x5d, 0xc3
]

var s = m.map { |byte| String.fromByte(byte) }.join()
System.print("%(a) + %(b) = %(C.runMachineCode(s, a, b))")


We now embed this Wren script in the following C program, compile and run it.

/* gcc Machine_code.c -o Machine_code -lwren -lm */

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include "wren.h"

unsigned char rmc_helper(const char *code, unsigned char a, unsigned char b, int l) {
    void *buf;
    unsigned char c;

    /* copy code to executable buffer */
    buf = mmap (0, l, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0);
    memcpy(buf, code, l);

    /* run code */
    c = ((unsigned char (*) (unsigned char, unsigned char))buf)(a, b);

    /* free buffer */
    munmap(buf, l);

    /* return result to caller */
    return c;
}

void C_runMachineCode(WrenVM* vm) {
    /* unpack arguments passed from Wren */
    int len;
    const char *code = wrenGetSlotBytes(vm, 1, &len);
    unsigned char a = (unsigned char)wrenGetSlotDouble(vm, 2);
    unsigned char b = (unsigned char)wrenGetSlotDouble(vm, 3);

    /* obtain result */
    unsigned char c = rmc_helper(code, a, b, len);
 
    /* return result to Wren */
    wrenSetSlotDouble(vm, 0, (double)c);
}

WrenForeignMethodFn bindForeignMethod(
    WrenVM* vm,
    const char* module,
    const char* className,
    bool isStatic,
    const char* signature) {
    if (strcmp(module, "main") == 0) {
        if (strcmp(className, "C") == 0) {
            if (isStatic && strcmp(signature, "runMachineCode(_,_,_)") == 0) {
                return C_runMachineCode;
            }
        }
    }
    return NULL;
}

static void writeFn(WrenVM* vm, const char* text) {
    printf("%s", text);
}

void errorFn(WrenVM* vm, WrenErrorType errorType, const char* module, const int line, const char* msg) {
    switch (errorType) {
        case WREN_ERROR_COMPILE:
            printf("[%s line %d] [Error] %s\n", module, line, msg);
            break;
        case WREN_ERROR_STACK_TRACE:
            printf("[%s line %d] in %s\n", module, line, msg);
            break;
        case WREN_ERROR_RUNTIME:
            printf("[Runtime Error] %s\n", msg);
            break;
    }
}

char *readFile(const char *fileName) {
    FILE *f = fopen(fileName, "r");
    fseek(f, 0, SEEK_END);
    long fsize = ftell(f);
    rewind(f);
    char *script = malloc(fsize + 1);
    fread(script, 1, fsize, f);
    fclose(f);
    script[fsize] = 0;
    return script;
}

int main() {
    WrenConfiguration config;
    wrenInitConfiguration(&config);
    config.writeFn = &writeFn;
    config.errorFn = &errorFn;
    config.bindForeignMethodFn = &bindForeignMethod;
    WrenVM* vm = wrenNewVM(&config);
    const char* module = "main";
    const char* fileName = "Machine_code.wren";
    char *script = readFile(fileName);
    WrenInterpretResult result = wrenInterpret(vm, module, script);
    switch (result) {
        case WREN_RESULT_COMPILE_ERROR:
            printf("Compile Error!\n");
            break;
        case WREN_RESULT_RUNTIME_ERROR:
            printf("Runtime Error!\n");
            break;
        case WREN_RESULT_SUCCESS:
            break;
    }
    wrenFreeVM(vm);
    free(script);
    return 0;
}
Output:
7 + 12 = 19

X86-64 Assembly

UASM 2.52

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Linux Build:
;;   $ uasm -elf64 mexec.asm
;;   $ gcc -o mexec mexec.o -no-pie
;;  With MUSL libc
;;   $ musl-gcc -o mexec mexec.o -e main -nostartfiles -no-pie
;;
;; Windows Build: 
;;   $ uasm64 -win64 mexec.asm
;;   $ link /machine:x64 /subsystem:console /release mexec.obj 
;;         kernel32.lib msvcrt.lib
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
option casemap:none
option literals:on 

WIN64 equ 1
LIN64 equ 3

ifndef __MEMEXEC_CLASS__
__MEMEXEC_CLASS__ equ 1

  if @Platform eq WIN64 
    option dllimport:<kernel32>
      HeapAlloc       proto fd:qword, flgs:dword, hlen:qword 
      HeapFree        proto fd:qword, flgs:dword, lpmem:qword 
      GetProcessHeap  proto 
      ExitProcess     proto uexit:word 
    option dllimport:<msvcrt>
      printf      proto fmt:qword, args:VARARG
      memcpy      proto d:qword, s:qword, mlen:qword 
    option dllimport:none 
      exit equ ExitProcess 
  elseif @Platform eq LIN64 
    malloc        proto SYSTEMV len:qword
    free          proto SYSTEMV m:qword 
    printf        proto SYSTEMV fmt:qword, args:VARARG
    mprotect      proto SYSTEMV m:qword, s:qword, flgs:dword
    memcpy        proto SYSTEMV d:qword, s:qword, mlen:qword 
    exit          proto SYSTEMV uexit:word     
    
    PROT_READ     equ 01h
    PROT_WRITE    equ 02h
    PROT_EXEC     equ 04h
    PROT_NONE     equ 00h
    PROT_ALL      equ PROT_READ + PROT_WRITE + PROT_EXEC
  endif

  CLASS memexec 
    CMETHOD run 
  ENDMETHODS 
    buff     db 048h, 089h, 0F8h         ;; mov rax, rdi
            db 048h, 001h, 0F0h         ;; add rax, rsi
            db 0C3h                     ;; ret
    mem     dq ?                        ;; Memory address
    mlen    dq 0                        ;; Memory size allocated?
  ENDCLASS

  pmemexec typedef ptr memexec

  METHOD memexec, Init, <VOIDARG>, <uses rcx>
    local tmp:qword 

    mov rbx, thisPtr
    assume rbx:ptr memexec
    lea rdx, [rbx].buff 
    invoke printf, CSTR("[mexec->Init] - bytecode addr: 0x%X",10), rdx 
    mov tmp, rdx
    mov [rbx].mlen, sizeof(tmp)
    invoke printf, CSTR("[mexec->Init] - bytecode len: %i",10), [rbx].mlen 
    ;; In Built memory allocator, used by the Class extention 
    ;; Uses either HeapAlloc for windows or malloc for everything else. 
    ;; Which is why I didn't use mmap in the first place.
    MEMALLOC([rbx].mlen) 
    .if rax == -1
      invoke printf, CSTR("[exec->Init->Error] - Malloc failed with -1",10)
      mov rax, rbx 
      ret
    .endif
    mov [rbx].mem, rax 
    invoke printf, CSTR("[mexec->Init] - [rbx].mem addr: 0x%X",10), [rbx].mem 
    ;; Memory wont be executable by default from Malloc, So we make it
    ;; so with mprotect. Not sure about windows, Might need to use a VirtualProtect
    ;; call..
    if @Platform eq LIN64
      invoke mprotect, [rbx].mem, [rbx].mlen, PROT_ALL
      .if rax == -1
        invoke printf, CSTR("[exec]-Init->Error] - mprotect failed with -1",10)
        mov rax, rbx 
        ret 
      .endif
    endif
    invoke printf, CSTR("[mexec->Init] Copying [rbx].buff bytecode to 0x%X",10), [rbx].mem 
    invoke memcpy, [rbx].mem, addr [rbx].buff, [rbx].mlen 
    .if rax == -1
      invoke printf, CSTR("[mexec->Init->Error] - memcpy failed with -l",10)
      mov rax, rbx 
      ret 
    .endif 
    mov rcx, [rbx].mem
    mov rax, rbx 
    assume rbx:nothing
    ret
  ENDMETHOD 

  METHOD memexec, run, <VOIDARG>, <>, arg1:qword, arg2:qword
    mov rbx, thisPtr 
    assume rbx:ptr memexec 
    mov rdi, arg1
    mov rsi, arg2 
    call [rbx].mem
    assume rbx:nothing
    ret 
  ENDMETHOD 

  METHOD memexec, Destroy, <VOIDARG>, <>
    mov rbx, thisPtr 
    assume rbx:ptr memexec 
    mov [rbx].mlen, 0
    MEMFREE([rbx].mem)
    assume rbx:nothing 
    ret 
  ENDMETHOD 

endif      ;; __MEMEXEC_CLASS__
.data
a1   dq 7
a2   dq 12

.code
main proc
  local pmem:ptr memexec 
  
  mov pmem, _NEW(memexec)
  pmem->run(a1,a2)
  invoke printf, CSTR("[pmem->run(%i, %i)] - returned: %i",10), a1, a2, rax 
  _DELETE(pmem)
  invoke exit, 0
  ret
main endp 

end

XPL0

This is for the Raspberry Pi's ARM architecture. The opcodes are effectively "poked" by merely loading the program.

func Sum(A, B);         \Return sum of A+B
char A, B;
[asm {  ldrb r0, A
        ldrb r1, B
        add  r0, r1
        strb r0, A
     }
return A;
];

IntOut(0, Sum(7, 12))
Output:
19

Z80 Assembly

;;;;;;;;;;;;;;;;;;; HEADER   ;;;;;;;;;;;;;;;;;;;
read "\SrcCPC\winape_macros.asm"
read "\SrcCPC\MemoryMap.asm"
read "\SrcALL\winapeBuildCompat.asm"
read "\SrcALL\lib\z80_opcode_chart.asm"
;;;;;;;;;;;;;;;;;;; PROGRAM  ;;;;;;;;;;;;;;;;;;;
org &1000
ld hl,machine_code_area

;assembles the following:
;LD A,7
;ADD 12
;DAA
;CALL SHOWHEX
;RET

ld (hl),&3E ;LD A,nn
inc hl
ld (hl),7
inc hl
ld (hl),&C6 ;ADD nn
inc hl
ld (hl),12
inc hl
ld (hl),&27 ;DAA
inc hl
ld (hl),&CD ;call
inc hl
ld (hl),&00 ;low byte of address of showhex
inc hl
ld (hl),&11 ;high byte of address of showhex
inc hl
ld (hl),&C9 ;RET

;FALLTHROUGH IS INTENTIONAL
machine_code_area:
;0 = nop
byte 0,0,0,0,0,0,0,0
byte 0,0,0,0,0,0,0,0
byte 0,0,0,0,0,0,0,0
byte 0,0,0,0,0,0,0,0
byte 0,0,0,0,0,0,0,0
byte 0,0,0,0,0,0,0,0
byte 0,0,0,0,0,0,0,0
byte 0,0,0,0,0,0,0,0

org &1100
read "\SrcCPC\winape_showhex.asm"  ;showhex is at &1100 thanks to the org.
read "\SrcCPC\winape_stringop.asm"

If you were doing this for real, it's much better to define the opcodes as labels so that you don't need to memorize them. (I really wish assemblers let you use instruction names as aliases for data, but I imagine that would make parsing much more difficult. Oh well.)