Protecting Memory Secrets

From Rosetta Code
Protecting Memory Secrets is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

The object of the task is to show how to minimize the exposure of secret data, basically to remove it or render it unrecoverable at a point in time such as a specific event (e.g. authorization of a transaction, completion of a transaction, completion of a process, removal of plain text after encryption, etc.).

Some of you may find this task a bit unusual in that it doesn't demonstrate a specific algorithm, derive a specific output, or demonstrate a visible functionality. It's about how you can meet a security requirement in your language. It's also a security requirement that hasn't been explicitly thought through in the design of many languages (Hint: it may well not be possible in all languages). There isn't going to be a single right answer or solution and it isn't about translating from another language. Similar types of languages, or even different implementations of the same language, could have dissimilar solutions. The idea is to show how your language can protect secrets in memory from all comers!

Why this task? It reflects the growing need to better protect secrets as well as changes in best practices being driven by security standards.

The point is to help developers by providing patterns that they could actually use to achieve these objectives.

For the purposes of this task:

  • secrets: information like credit card or social insurance numbers, passwords, cryptographic keys or IVs, random number seeds, etc.
  • threats: things like memory scrapers that may not even be written in the same language; don't assume the only threat is code written in your language.

Task

The basic task is:

  • read a secret into memory
  • write or display the secret
  • securely erase or destroy the secret
  • In all cases you will want to be careful of temporary variables, system calls, and other things that could leave plain-text artifacts.
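The three steps above can be sketched in Python as one of many possible approaches. This is a minimal illustration under stated assumptions, not a vetted pattern: the helper names read_secret, show_secret and wipe are hypothetical, an in-memory io.BytesIO stands in for real terminal input and output, and the sketch relies on CPython's bytearray being mutable in place. Buffered streams such as sys.stdin.buffer keep their own internal copy of whatever they read, so a real program would probably want an unbuffered source.

```python
import io

MAX_LEN = 64  # hypothetical upper bound on the secret's length


def read_secret(stream, max_len=MAX_LEN):
    """Read up to max_len bytes straight into a mutable buffer.

    readinto() fills the bytearray in place, so this function itself does
    not create an immutable bytes or str object holding the secret.
    """
    buf = bytearray(max_len)
    n = stream.readinto(buf) or 0
    return buf, n


def show_secret(buf, n, out):
    """Write the first n bytes through a memoryview slice (no bytes copy here)."""
    out.write(memoryview(buf)[:n])


def wipe(buf):
    """Overwrite every byte in place; the length is unchanged, so no reallocation."""
    for i in range(len(buf)):
        buf[i] = 0


if __name__ == "__main__":
    source = io.BytesIO(b"abcdefg")  # stand-in for an unbuffered input stream
    sink = io.BytesIO()              # stand-in for the display
    secret, length = read_secret(source)
    show_secret(secret, length, sink)
    wipe(secret)
    length = 0                       # the length is itself a small leak
    print(secret == bytearray(MAX_LEN))
```

The stand-in streams in this demo keep their own copies of the data, which is exactly the kind of artifact the last bullet warns about; the sketch only shows that the program's own working buffer can be cleared.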

How you accomplish this will depend on your language and your knowledge of its memory management:

In all of these variations take care not to contaminate temporary variables, the stack, etc. with function calls or conversions.

1. Unmanaged Memory (programmer managed)

In languages like C, assembly, etc. this can be as simple as zeroing all of the data after use and before freeing. Take care not to contaminate temporary variables, the stack, etc.
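A garbage-collected language can sometimes reach programmer-managed memory through a foreign-function interface. The sketch below uses Python's standard ctypes module to hold a secret in a fixed-size block of unsigned bytes and to zero it with the C library's memset. The helper names alloc_block, store and wipe_block are hypothetical, and the approach assumes CPython. In C itself, an optimizing compiler may remove a plain memset on memory that is about to be freed, which is why functions such as C11 memset_s or explicit_bzero exist; their availability varies by platform.

```python
import ctypes


def alloc_block(size):
    """Allocate a zero-initialised, fixed-size array of unsigned bytes."""
    return (ctypes.c_ubyte * size)()


def store(block, data):
    """Copy bytes from a mutable source into the block, one value at a time."""
    if len(data) > len(block):
        raise ValueError("secret does not fit in the block")
    for i, b in enumerate(data):
        block[i] = b
    return len(data)


def wipe_block(block):
    """Zero the whole block in place with the C library's memset."""
    ctypes.memset(block, 0, ctypes.sizeof(block))


if __name__ == "__main__":
    block = alloc_block(16)
    used = store(block, bytearray(b"4111"))  # placeholder value, not a real secret
    wipe_block(block)
    print(all(b == 0 for b in block))
```

The block has a fixed size, so nothing in this sketch causes it to be reallocated; the caller still has to wipe the source it copied from.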

2. Managed Memory (garbage collectors)

In languages like Java and .NET, which manage memory for the programmer, this can be challenging. Many of these represent some types of data (e.g. strings) as immutable objects, so zeroing them isn't possible. Simply discarding them and waiting for the garbage collector to possibly sort things out doesn't meet the intent. If your language has a type for secrets, or a guaranteed destructor that would work, this is exactly what this task is for.

a) If possible switch to a mutable data type. In some languages strings are immutable but arrays of single characters or even numbers are not and can be "zeroed" out.

b) If you can't switch to a mutable type, there may be techniques to constrain the data and force a local memory cleanup.

c) Data obfuscation techniques may be possible.

d) In-memory encryption may be possible using a language or platform feature (No DIY).

e) Call out to a platform function or API that will contain and manage the secret.

f) Call out to a custom external function that will or could maintain and manage the secret (just show an external call, no need to write the external function).

g) Something completely different that achieves the same goal.
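As one illustration of option (a) combined with a guaranteed cleanup point, the Python sketch below keeps a secret in a mutable bytearray and zeroes it when a with block exits, whether normally or through an exception. The name secret_buffer is a hypothetical helper; the sketch assumes CPython and does not address copies made elsewhere (I/O buffers, conversions to str or bytes, swap).

```python
from contextlib import contextmanager


@contextmanager
def secret_buffer(size):
    """Yield a fixed-size mutable buffer and zero it on exit, even on error."""
    buf = bytearray(size)
    try:
        yield buf
    finally:
        # Overwrite in place. Callers should not resize the buffer (append,
        # extend), because a reallocation may leave an unwiped copy behind.
        for i in range(len(buf)):
            buf[i] = 0


if __name__ == "__main__":
    with secret_buffer(8) as pin:
        pin[:4] = b"1234"   # placeholder value; same-length slice, no resize
        alias = pin         # keep a reference to inspect after the block
    print(alias == bytearray(8))
```

Tying the wipe to the end of a block gives the "specific event" the task asks for, rather than leaving the timing to the garbage collector.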

Part of your solution will be to describe what you are doing and the "secret sauce" that lets it work (i.e. how you overcame challenges). References (links, book references) to supporting language features, APIs, etc. will be important to any developer needing to use this technique to comply with a standard.


References

If anyone has examples of similar regulations or standards please add them below or on the talk page.

PCI Point-to-Point-Encryption (P2PE) Standard (v3.1), see requirements 2A-2.3 & 2B-1.5

  • has two types of secrets called PAN (Primary Account Number) and SAD (Sensitive Authentication Data)
  • don't keep secrets in working memory any longer than strictly necessary
  • developers should have secure coding training for their language that includes managing sensitive data in memory

PCI Secure Software Standard (v1.2), see requirements 1.1 & 3.5

  • has a broader definition of secrets or sensitive information
  • implement methods to render transient sensitive data irretrievable and to confirm that sensitive data is unrecoverable after the process is complete even if it is only stored temporarily in program memory / variables during operation of the software
  • requires knowledge of any platform or implementation level issues that complicate the erasure of transient sensitive data and to confirm that methods have been implemented to minimize the risk posed by these complications (methods may be external to your language).

See also Reddit discussion of the issue


J

Generally speaking, it's probably best to avoid using general purpose computers in contexts where we want to enforce the early expiration of secrets. In other words, custom VLSI or FPGA hardware would be more suitable for this requirement.

That said, expediency can often force suboptimal approaches. And, here, current implementations of J are probably less suited for this requirement than certain other languages.

Still... some tactics might prove useful here. (And the usefulness of these tactics could be better assessed if we had some mechanisms to concretely measure their effectiveness in specific examples.)

(1) Incorporating "input data" as "memory mapped files" would eliminate a variety of intermediate results, as this would eliminate J's normal "copy on write" or "copy on update" semantics.

(2) Another (often conflicting) approach would be to diffuse the "secret bits" throughout memory and rely on J's ability to proceed with regular access patterns as a veil over the secret part.

(3) As a variation on (2), adopting an ongoing stream of noise, to accompany the secrets, would offer both distraction and a statistical tendency to overwrite any lingering remnants of secrets.

(4) If the J engine (libj) is built with the compiler flag MEMAUDIT=4 (or some other value which has that bit set), then J will write garbage to memory when values are freed. (Ensuring that values are freed means tracking reference counts, though it's also worth noting that names can be discarded early using erase.)

Still, if timely expiration of secrets is critical, specialized hardware is probably the way to go.

Julia

# Turn off the garbage collector to prevent the Secret from being copied to a new page
GC.enable(false)

""" struct Secret, contains a vector of Char that can be blanked with setzeros() """
struct Secret
    chars::Vector{Char}
    Secret(secret_length) = new(fill(Char(0), secret_length))
end

# Set the memory in the Secret struct to zeros
setzeros(s) = (fill!(s.chars, Char(0)); return s)

# Use C library character-based IO to avoid creating an immutable Julia String
# (_getch and _putch come from the Microsoft C runtime, so this targets Windows)
getch() = ccall(:_getch, Cint, ())
putch(ch) = (ccall(:_putch, Cint, (UInt8,), ch); flush(stdout))


""" test the function """
function testsecret(maxlength = 32)
    print("Enter secret (up to $maxlength ascii chars): ")
    secret = Secret(maxlength)

    # keep track of the actual length of the entered Secret Char array
    slen = 0

    # entry of secret does not echo chars (unix password style)
    for _ in 1:maxlength
        ch = Char(getch())
        ch in ['\r', '\n'] && break
        slen += 1
        secret.chars[slen] = ch
    end

    # now display what was entered
    print("\nDisplaying secret: ")
    for i in 1:slen
        putch(secret.chars[i])
    end

    # destroy the secret
    print("\nDestroying secret... ")
    setzeros(secret)
    slen = 0 # zero out the length, which is on the stack for this function
    println("Secret is now: $secret.")
end

testsecret()
Output:
Enter secret (up to 32 ascii chars):
Displaying secret: abcdefg
Destroying secret... Secret is now: Secret(['\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0']).

Phix

The main recommendation for Phix would be to store any sensitive data in programmer managed memory via allocate(), poke(), and peek().
Ideally you should only ever keep one or two bytes of that in hll (named) variables at any given point, but that may not be practical.
Higher-level language constructs, specifically sequences and strings, are mutable and extendable, and extending them risks leaving partial fragments of sensitive data scattered across the heap. If such local/temporary variables are needed, it is much better to retrieve them from said memory in one go than to build them up (say) one byte at a time.
They are also reference counted and by default have copy-on-write semantics. Parameters, apart from integers, are passed by reference in Phix, so it is perfectly safe to pass sensitive data to routines as long as you do not modify it; modifying it would create a clone that needs to be independently shredded.
You can also partly disable cow-cloning of sequences (but not [binary] strings) via "with javascript_semantics"; note that this applies to the entire application rather than just the parts that deal with sensitive data.
See builtins/IupRawStringPtr.e for some inline assembly that obtains the raw memory address of a string, as the basis for wiping memory, which would of course be needed for any hll variables; something rather similar could be constructed for clearing [flat] sequences of 31/63-bit integers.
Obviously it would be far safer to invoke allocate() five times than to construct one hll sequence of strings {name,password,card_number,pin_number,security_code}, but there is no additional risk in keeping those five allocated memory addresses in a single hll sequence, as long as the actual data gets wiped a.s.a.p.

Also bear in mind that any fatal error may dump local variables to an ex.err file, which is not dissimilar to a core dump, so it is probably wise to wrap the three basic steps of the task (i.e. read, process, wipe) in a try/catch to prevent that from ever happening. An ex.err would only show the raw memory address from allocate() as a meaningless (decimal) number, and none of the memory contents, whereas it would show any strings and similar peek()'d from it and stored in hll variables. It might however also be possible to suppress a bit of that using a "without debug" compiler directive on a few selected sensitive routines. It is worth noting that any "core dump" type files are, due to a combination of their persistence and relative ease of reading/copying, a much more significant security risk than transient in-memory variables and the raw process heap. Hence any programming level measures should be matched by admin level measures to locate/detect and either securely erase or otherwise properly secure any such files, since they will often be created without any access restrictions whatsoever. It may also help to deliberately include (say) "sensitive" in file names and selected routines to at least enable a quick vetting of ex.err files before sending them off-site, though that is much more "just don't tell me" than anything even vaguely "secure".

As for Phix code transpiled to JavaScript, unfortunately I cannot say anything particularly useful and would join you in waiting to see whether a JavaScript entry ever appears for this task, and what if any parts of that could possibly be reused or stolen.

Wren

Library: Wren-crypto

Wren is designed to be an embedded scripting language - 'embedded' in the sense that it is embedded in another application, not used for scripting an embedded device though the latter may be possible if the device has sufficient memory available.

As such Wren code is effectively sandboxed and can only communicate with external processes (even just printing to the terminal) to the extent that the host allows it to do so.

It is written in C (as is the embedding API) and consequently most host applications are written in C/C++ though other powerful languages with a C FFI such as Rust or Go can be used instead.

Wren itself can be thought of as a sort of mini-Java in that it is deeply object oriented, managed by its VM, garbage collected and strings are immutable.

Given this state of affairs, I think in practice most host application developers would conclude that any secret should be handled entirely by the host where it can be cleaned up after use by zeroing memory in the usual fashion or, if it does need to be shared with Wren code, then it should be passed as a (possibly encrypted) list of bytes which the Wren code would need to zero out before it was eventually garbage collected.

Wren also has a tool (Wren-CLI) for running Wren scripts directly without a host which is the main focus for solving RC tasks. In reality, there is still a host (written in C) which uses the cross-platform library 'libuv' to implement IO functionality (both terminal and file) and provides Wren with the appropriate modules, though this is transparent to the user.

The tool is a work in progress and would need considerable hardening to handle secrets in a secure fashion. Perhaps the nearest one could get to the task as described would be the following, though note that, to display anything to the terminal, Wren first converts it to a string, and as strings are immutable there is no way to remove them from memory before the GC frees them. To avoid this, we display the bytes entered only after hashing them with SHA-256, so that only the hash will remain in memory.

import "io" for Stdin
import "./crypto" for Sha256

// Ensure input is not echoed or buffered
Stdin.isRaw = true

// Obtain secret from terminal input as a byte list
System.print("Enter a secret word and press return:")
var bytes = []
while (true) {
    var b = Stdin.readByte()
    if (b == 13) break
    bytes.add(b)
}

// Hash bytes with SHA-256 and display the digest on terminal
System.print(Sha256.digest(bytes))

// Zero out bytes
for (i in 0...bytes.count) bytes[i] = 0

// Check it worked
System.print(bytes)

// Make byte list eligible for GC
bytes = null

// Force an immediate GC
System.gc()

// Restore normal input before exit
Stdin.isRaw = false
Output:

This assumes the secret word entered is 'abc'.

Enter a secret word and press return:
ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
[0, 0, 0]