Compiler/Preprocessor

From Rosetta Code
Compiler/Preprocessor is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

This task modifies the source code prior to lexical analysis, in a manner similar to the built-in C preprocessor. Each design decision favors the simplest example that shows the concept and matches the simple language the other tasks support.

Create a preprocessor for the simple programming language specified in the lexical analysis task referenced below. The program should read input from a file and/or stdin, and write output to a file and/or stdout.

The program should treat any line starting with a hashtag (#) as a command to process. There are currently two valid commands, include and define. No space is allowed between the hashtag and its command. Multiple whitespace characters are treated the same as one.
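
As a rough illustration of the rule above, the following Python sketch recognizes the two commands. The helper name `classify_line` and its return shape are hypothetical, and passing any other line through as ordinary text (for example one that begins with a macro usage) is an assumption of this sketch rather than something the task states.

```python
import re

# Hypothetical helper; the name and the (command, rest) return shape are illustrative.
# "#" must be followed immediately by the command, then at least one whitespace character.
DIRECTIVE_RE = re.compile(r"^#(include|define)\s+(.*)$")


def classify_line(line: str):
    """Return (command, rest) for a directive line, or None for ordinary text."""
    stripped = line.rstrip("\n")
    match = DIRECTIVE_RE.match(stripped)
    if match is None:
        # Covers ordinary lines, "# define ..." (space after the hashtag)
        # and lines that merely begin with a macro usage.
        return None
    command, rest = match.groups()
    return command, rest.strip()
```

Because `\s+` consumes the whole run of whitespace after the command, one space and several spaces give the same result.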

The include command must be followed by whitespace and a double-quoted string whose contents are the name of the file to read. Included files may themselves include other files, up to a recursive limit of five active header files plus the original source file.
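
One possible way to enforce the limit is to carry a depth counter through a recursive expansion, as in the Python sketch below. The function `expand_includes` and its `read_file` loader argument are hypothetical names introduced here so the idea can be tried without touching the file system. Raising `RecursionError` is likewise just one choice of error.

```python
import re
from typing import Callable, List

MAX_ACTIVE_INCLUDES = 5  # header files that may be active at once, per the task text
INCLUDE_RE = re.compile(r'^#include\s+"([^"]+)"\s*$')


def expand_includes(
    text: str,
    read_file: Callable[[str], str],
    depth: int = 0,
) -> List[str]:
    """Return the lines of `text` with each #include replaced by the file's lines.

    `read_file` is a hypothetical loader mapping a file name to its contents.
    `depth` counts the header files currently being expanded.
    """
    lines: List[str] = []
    for line in text.splitlines():
        match = INCLUDE_RE.match(line)
        if match is None:
            lines.append(line)
            continue
        if depth + 1 > MAX_ACTIVE_INCLUDES:
            raise RecursionError(f"more than {MAX_ACTIVE_INCLUDES} active includes")
        nested = read_file(match.group(1))
        lines.extend(expand_includes(nested, read_file, depth + 1))
    return lines
```

A header that includes itself is stopped once five copies are active, instead of recursing without bound.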

The define command must be followed by whitespace and a new macro name. Redefining or overloading a macro name is illegal. Macro names follow the same character convention as variable names in the language. No whitespace is required in the arguments, but it is allowed between every token. When there are no parameters, the definition and the usage must either both have an empty argument list or both omit it. The empty list is needed to avoid confusion when the definition needs parentheses to force precedence. If there is a close parenthesis, the whitespace trailing it is optional; otherwise, whitespace after the macro name is required. From that point to the end of the line is the definition, with whitespace removed from both its start and end. Whitespace between tokens within the definition must be maintained. During usage, any name within the definition is treated as a macro name first, before it is assumed to be a variable in the language.
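
The Python sketch below shows one reading of these rules. The names `parse_define` and `add_macro` are hypothetical. It also assumes, as in C, that an argument list only counts when the open parenthesis directly follows the macro name, which the task text does not spell out.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumption: identifiers are a letter or underscore followed by letters, digits
# or underscores, and "(" must directly follow the name to start an argument list.
DEFINE_RE = re.compile(
    r"^#define\s+"
    r"(?P<name>[A-Za-z_][A-Za-z0-9_]*)"
    r"(?:\((?P<params>[^)]*)\)\s*|\s+)"  # "(...)" then optional space, else required space
    r"(?P<body>.*)$"
)

Macro = Tuple[Optional[List[str]], str]


def parse_define(line: str) -> Tuple[str, Optional[List[str]], str]:
    """Split a #define line into (name, parameters or None, definition)."""
    match = DEFINE_RE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"malformed #define: {line!r}")
    raw = match.group("params")
    if raw is None:
        params = None  # no argument list at all
    elif raw.strip():
        params = [p.strip() for p in raw.split(",")]
    else:
        params = []  # explicit empty list: "()"
    # Trim only the ends; whitespace between tokens is kept as written.
    return match.group("name"), params, match.group("body").strip()


def add_macro(table: Dict[str, Macro], line: str) -> None:
    """Record a macro, rejecting redefinition."""
    name, params, body = parse_define(line)
    if name in table:
        raise ValueError(f"macro {name!r} is already defined")
    table[name] = (params, body)
```

Keeping `None` distinct from an empty list lets later code check that a usage and its definition agree on whether the argument list is present.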

To make it easier to find, a macro usage is enclosed in hashtags, and its expansion replaces the usage wherever it occurs in the files processed. These usages will be processed everywhere they are encountered without regard to the syntax of the sample language. The calling arguments replace the define's parameters as a simple string substitution. You may not assume the usage proceeds in an order to form complex combinations. Tokens detected during definition processing can remain separated during usage processing. If the contents within the double hashtags are not a valid macro usage, the entire text is written to the output as if it had not been detected. It is not required to use the ending hashtag as the start of another macro usage. Both start and end hashtags must be on the same line.
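
The substitution described above might be sketched in Python as follows. The function `expand_line` and the `(parameters, definition)` tuple layout are hypothetical choices made for this illustration. The sketch expands argument names and names in the definition only when they refer to parameterless macros, which is a simplification.

```python
import re
from typing import Dict, List, Optional, Tuple

# (parameter names, or None when no argument list was given; definition text)
Macro = Tuple[Optional[List[str]], str]

# "#name#" or "#name(args)#", all on one line.
USAGE_RE = re.compile(r"#([A-Za-z_][A-Za-z0-9_]*)(?:\(([^)#]*)\))?#")
NAME_RE = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]*")


def expand_line(line: str, macros: Dict[str, Macro]) -> str:
    """Replace every valid #usage# in `line`; anything else passes through."""

    def replace(match):
        whole, name, raw_args = match.group(0), match.group(1), match.group(2)
        if name not in macros:
            return whole
        params, body = macros[name]
        if (raw_args is None) != (params is None):
            return whole  # argument list present on one side only
        has_args = bool(raw_args and raw_args.strip())
        args = [a.strip() for a in raw_args.split(",")] if has_args else []
        names = params or []
        if len(args) != len(names):
            return whole
        bindings = dict(zip(names, args))

        def substitute(word):
            # A parameter becomes its argument; other names stay as they are.
            text = bindings.get(word.group(0), word.group(0))
            target = macros.get(text)
            # A name that is itself a parameterless macro becomes its definition.
            if target is not None and not target[0]:
                return target[1]
            return text

        return NAME_RE.sub(substitute, body)

    return USAGE_RE.sub(replace, line)
```

Since `re.sub` scans for non-overlapping matches, the closing hashtag of one usage is never reused as the opening hashtag of the next.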

There are three optional command line arguments: debug, input, and output. Debug is an implementer-dependent switch, such as -d or --debug, that lets the user choose between the commands vanishing from the output and the commands appearing as comments in the output. Debug can appear at any position on the command line after the command. Input is the file to process; when it is missing, console input is used. Input is always specified before Output. Output is the file to create; when it is missing, console output is used. If only one file is specified, it is the input file. If you wish to use an output file with console input, you must specify both arguments, whose usage is left up to the implementer. It is not required that there be a way to specify the console on the command line so that a file could be used for the output and the console for the input.
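
A minimal Python sketch of these rules follows. `Options` and `parse_command_line` are hypothetical names, and using "-" to stand for the console is an assumption of this sketch, since the task leaves that choice to the implementer.

```python
from typing import List, NamedTuple, Optional


class Options(NamedTuple):
    debug: bool
    input_path: Optional[str]   # None means console input
    output_path: Optional[str]  # None means console output


def parse_command_line(argv: List[str]) -> Options:
    """Interpret the arguments that follow the program name."""
    debug = False
    files: List[str] = []
    for arg in argv:
        if arg in ("-d", "--debug"):
            debug = True  # the switch may appear anywhere
        else:
            files.append(arg)  # first file is the input, second the output
    if len(files) > 2:
        raise ValueError("expected at most an input file and an output file")
    # Assumption of this sketch: "-" names the console.
    paths = [None if name == "-" else name for name in files] + [None, None]
    return Options(debug, paths[0], paths[1])
```

With this convention, `["-", "out.t"]` selects console input together with an output file.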

Input Specification

This is an example of the concept. Given a header and a source file, the output should be able to feed straight into the lexical analyzer task.

~~ Header.h ~~
#define area(h, w) h * w

~~ Source.t ~~
#include "Header.h"
#define width 5
#define height 6
area = #area(height, width)#;
Output Specification

If you do not support a runtime debugging flag, your code should support only the second version; otherwise, it should be able to provide either. The expected output is:

area = 6 * 5;

Or:

/* Include Header.h */
/* Define area(h, w) as h * w */
/* End Header.h */
/* Define width as 5 */
/* Define height as 6 */
/* Use area, height, and width */
area = 6 * 5;
Related Tasks




Julia

""" Rosetta Code task Compiler/Preprocessor task """

""" If the line is a macro definition, add the macro to macros. """
function addmacro!(macros, line)
    macroname, comment = "", ""
    if (m = match(r"#define\s+(\w[\w\d]*)\(([^\)]+)\)\s+(.+)", line)) !== nothing
        macroname, argstring, definition = m.captures
        comment = "/* Define $macroname($argstring) as $definition */\n"
        @assert !haskey(macros, macroname) "Duplicate macro names are not allowed"
        argv = strip.(split(argstring, r"\s*,\s*"))
        @assert allunique(argv) "Parameter argument symbols must be different from each other"
        def = " " * definition
        defstrings, argnums = String[], Int[]
        for m in reverse(collect(eachmatch(Regex(join(argv, "|")), def)))
            cutposition = m.offset + length(m.match)
            pushfirst!(defstrings, def[begin+cutposition-1:end])
            pushfirst!(argnums, findfirst(==(m.match), argv))
            def = def[begin:m.offset-1]
        end
        pushfirst!(defstrings, def)
        macros[macroname] = (defstrings, argnums)
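    elseif (m = match(r"#define\s+(\w[\w\d]*)(?:\(\))?\s+(.+)", line)) !== nothing
        macroname, definition = m.captures
        # Redefinition is illegal for parameterless macros too
        @assert !haskey(macros, macroname) "Duplicate macro names are not allowed"
        comment = "/* Define $macroname as $definition */\n"
        macros[macroname] = ([string(definition)], Int[])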
    else
        return false, ""
    end
    return true, comment
end

""" If the line contains macro or macros, substitute all and return results. """
function replaceifmacro(macros, line; withinhashtag = true)
    replacedline, allmacronames, usedmacros = line, join(keys(macros), "|"), String[]
    for m in reverse(
        collect(
            eachmatch(
                Regex(
                    withinhashtag ? "#(" * allmacronames * raw")(?:(?:\(([^\)]+)\)#)|#)" :
                    "(" * allmacronames * raw")(?:(?:\(([^\)]+)\))?)",
                ),
                replacedline,
            ),
        ),
    )
        push!(usedmacros, string(m.captures[1]))
        if m.offsets[end] != 0 # has arguments
            args = split(m.captures[end], r"\s*,\s*")
            for (i, arg) in enumerate(args)
                used, newtext = replaceifmacro(macros, arg; withinhashtag = false)
                if !isempty(used)
                    submacro = first(used)
                    push!(usedmacros, submacro)
                    args[i] = macros[submacro][1][1]
                end
            end
            strings, nums = macros[m.captures[1]]
            s =
                first(strings) *
                prod([args[n] * strings[i+1] for (i, n) in enumerate(nums)])
            replacedline =
                replacedline[begin:m.offsets[1]-2] *
                s *
                replacedline[m.offset+length(m.match):end]
        else
            replacedline =
                replacedline[begin:m.offsets[1]-2] *
                macros[m.captures[1]][1][1] *
                replacedline[m.offset+length(m.match):end]
        end
    end
    return usedmacros, replacedline
end

""" If a line starts with #include, return the lines in the include file. """
function processinclude(line)
    lines, fname = String[], ""
    if (m = match(r"#include\s+\"([^\"]+)\"", line)) !== nothing
        fname = first(m.captures)
        lines = readlines(fname, keep = true)
    end
    return fname, lines
end

""" Preprocess the file to prepare it for the Rosetta Code lexical analyzer task. """
function preprocess(instream, outstream, debug)
    lines = readlines(instream, keep = true)
    macros = Dict{String,Tuple{Vector{String},Vector{Int}}}()
    linesread = 0
    while !isempty(lines)
        line = popfirst!(lines)
        linesread += 1
        if startswith(line, '#')
            fname, includelines = processinclude(line)
            if !isempty(fname)
                if debug
                    pushfirst!(includelines, """/* Include $fname */\n""")
                    push!(includelines, """/* End $fname */\n""")
                end
                lines = append!(includelines, lines)
            elseif startswith(line, r"#define\s")
                gotmacro, comment = addmacro!(macros, line)
                gotmacro && debug && print(outstream, comment)
            else
                error("Unknown preprocessor directive in line: $line")
            end
        else
            usedmacros, replacedline = replaceifmacro(macros, line)
            if !isempty(usedmacros)
                debug && print(outstream, "/* Used " * join(usedmacros, ", ", " and ") * " */\n")
                line = replacedline
            end
            print(outstream, line)
        end
    end
    return linesread
end

""" Process command line, open files if needed, hand off to function `func`, close files """
function runwithopts(func, minargs = 0, maxargs = 3)
    minargs <= length(ARGS) <= maxargs || error("Wrong number of arguments ($minargs:$maxargs)")
    debug, infile, outfile = false, "", ""
    for arg in ARGS
        if arg == "-d" || arg == "-debug"
            debug = true
        elseif isempty(infile)
            infile = arg
        elseif isempty(outfile)
            outfile = arg
        end
    end
    ioin = isempty(infile) ? stdin : open(infile, "r")
    ioout = isempty(outfile) ? stdout : open(outfile, "w")

    func(ioin, ioout, debug)
    !isempty(infile) && close(ioin)
    !isempty(outfile) && close(ioout)
end

runwithopts(preprocess)
Output:
Same output as Phix entry.

Phix

No attempt has been made to implement command line arguments: that would be trivial on desktop/Phix but is not possible under pwa/p2js, i.e. within a browser.

--
-- demo\rosetta\Compiler\preprocess.exw
-- ====================================
--
-- Note this uses js_open() and js_gets() directly, to avoid distributing another two files.
-- Also implemented as a standalone demonstration of the general approach, and as such
-- might require a bit more work to integrate this properly into the likes of next_ch(),
-- unless of course you write it out to disk and/or add some kind of js_write() function.
-- Also as noted this won't cope particularly well with #macro("1st,first","2nd")#, etc.
-- In other words splitting up the parameters may need to be made significantly smarter.
--
with javascript_semantics
include core.e -- (see Compiler/lexical_analyzer#Phix - specifically js_io.e's Source.t)

sequence stack, includes, defines, arglst, bodies
integer stack_ptr

procedure begin(string filename)
    -- (to allow with and without comments, sequentially, and
    --  specifically not moaning about things being redefined)
    stack = repeat(0,5) -- (limited as per task description)
    includes = repeat("?",5) -- ""
    defines = {} -- eg "area(h, w) h * w" -> "area"
    arglst = {} -- -1 if () absent, else eg {"h","w"}
    bodies = {} -- eg "area(h, w) h * w" -> {1," * ",2}
    stack_ptr = 1
    stack[stack_ptr] = js_open(filename)
end procedure
        
function get_word(string line, integer k=1)
    string word = ""
    for ch in line[k..$] do
        if not find(charmap[ch],{LETTER,DIGIT}) then exit end if
        word &= ch
    end for
    return word
end function

function preprocess(string fragment, bool comments)
    string word = get_word(fragment)
    integer k = find(word,defines)
    assert(k!=0,"no such macro:%s",{word})
    sequence used = {word},
             body = deep_copy(bodies[k])
    fragment = fragment[length(word)+1..$]
    object args = arglst[k]
    if sequence(args) then
        assert(fragment[1]='(' and fragment[$]=')')
        fragment = fragment[2..$-1]
        // NB: won't cope with eg #macro("1st,first","2nd")#, etc.
        sequence params = apply(split(fragment,','),trim)
        assert(length(params)==length(args))
        for i=1 to length(body) do
            if integer(body[i]) then
                word = params[body[i]]
                k = find(word,defines)
                if k then
                    // (this /might/ want to be recursive...)
                    used = append(used,word)
                    assert(atom(arglst[k])) // placeholder
                    word = join(bodies[k],"")
                end if
                body[i] = word
            end if
        end for
    else
        assert(fragment="")     
    end if
    if comments then
        printf(1,"/* Use %s */\n",{join(used,", ",", and ")})   
    end if
    string replacement = join(body,"")
    return replacement
end function

for comments in {false,true} do
    printf(1,"with%s comments:\n",{iff(comments?"":"out")})
    begin("Source.t")
    while stack_ptr do
        object oneline = js_gets(stack[stack_ptr])
        if oneline=EOF then
            if comments and stack_ptr>1 then
                printf(1,"/* End %s */\n",{includes[stack_ptr]})
            end if
            stack_ptr -= 1
        else
            integer k = find('#',oneline)
            if k then
                string word = get_word(oneline,k+1)
                if word="include" then
                    stack_ptr += 1
                    assert(k=1)
                    -- 10 is length("#include ")+1
                    oneline = trim(oneline[10..$],` "`)
                    stack[stack_ptr] = js_open(oneline)
                    if comments then
                        printf(1,"/* Include %s */\n",{oneline})
                        includes[stack_ptr] = oneline
                    end if
                elsif word="define" then
                    assert(k=1)
                    -- 9 is length("#define ")+1
                    word = get_word(oneline,9)
                    assert(not find(word,defines))
                    defines = append(defines,word)
                    oneline = trim(oneline[9+length(word)..$])
                    sequence body = {}
                    if oneline[1]='(' then
                        k = find(')',oneline,2)
                        assert(k>0,"closing parenthesis missing")
                        sequence args = apply(split(oneline[2..k-1],','),trim)
                        oneline = trim(oneline[k+1..$])
                        string fixed = ""
                        while length(oneline) do
                            word = get_word(oneline)
                            if length(word)=0 then
                                fixed &= oneline[1]
                                oneline = oneline[2..$]
                            else
                                k = find(word,args)
                                if k then
                                    if length(fixed) then
                                        body = append(body,fixed)
                                        fixed = ""
                                    end if
                                    body = append(body,k)
                                else
                                    fixed &= word
                                end if
                                oneline = oneline[length(word)+1..$]
                            end if
                        end while
                        if length(fixed) then
                            body = append(body,fixed)
                            fixed = ""
                        end if
                        arglst = append(arglst,args)
                    else
                        body = {oneline}
                        arglst = append(arglst,-1)
                    end if
                    bodies = append(bodies,body)
                    if comments then
                        object al = arglst[$]
                        string n = defines[$],
                               a = iff(atom(al)?"":sprintf("(%s)",{join(al,',')}))
                        sequence b = deep_copy(bodies[$])
                        for i=1 to length(b) do
                            if atom(b[i]) then
                                b[i] = al[b[i]]
                            end if
                        end for
                        b = join(b,"")      
                        printf(1,"/* Define %s%s as %s */\n",{n,a,b})
                    end if
                else
                    while k do
                        integer l = find('#',oneline,k+1)
                        assert(l!=0,"missing closing #")
                        string fragment = oneline[k+1..l-1],
                            replacement = preprocess(fragment,comments)
                        oneline[k..l] = replacement
                        k = find('#',oneline,k+length(replacement)-1)
                    end while
                    printf(1,"%s\n",{oneline})
                end if
            else
                printf(1,"%s\n",{oneline})
            end if
        end if
    end while
    printf(1,"\n")
end for

--close_files()
Output:
without comments:
area = 6 * 5;

with comments:
/* Include Header.h */
/* Define area(h,w) as h * w */
/* End Header.h */
/* Define width as 5 */
/* Define height as 6 */
/* Use area, height, and width */
area = 6 * 5;
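
The Phix entry notes that its parameter splitting will not cope with a usage such as #macro("1st,first","2nd")#. One way to make the split smarter, sketched here in Python rather than Phix, is to ignore commas inside double quotes. The function name `split_arguments` is hypothetical, and the sketch assumes string literals contain no escaped double quotes.

```python
from typing import List


def split_arguments(text: str) -> List[str]:
    """Split a macro argument string on commas that lie outside double quotes."""
    args: List[str] = []
    current: List[str] = []
    in_string = False
    for ch in text:
        if ch == '"':
            in_string = not in_string  # assumption: no escaped quotes inside strings
        if ch == "," and not in_string:
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    args.append("".join(current).strip())
    return args
```

For the problematic usage, `split_arguments('"1st,first","2nd"')` yields two arguments rather than three.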

Python

#!/usr/bin/env python
"""Rosetta Code compiler/preprocessor. Requires Python >= 3.7."""
import re
import sys

from typing import Dict
from typing import Iterator
from typing import List
from typing import NamedTuple
from typing import Optional
from typing import TextIO
from typing import Tuple

MAX_INCLUDE_DEPTH = 5

TOKEN_INCLUDE = "INCLUDE"
TOKEN_CONSTANT = "CONSTANT"
TOKEN_MACRO = "MACRO"
TOKEN_CALL = "CALL"
TOKEN_STRING = "STRING"
TOKEN_COMMENT = "COMMENT"
TOKEN_LITERAL = "LITERAL"
TOKEN_ILLEGAL = "ILLEGAL"


class Token(NamedTuple):
    kind: str
    name: str
    params: str
    expr: str
    start: int
    end: int


FILENAME_PATTERN = r"[_a-zA-Z][_a-zA-Z0-9\.]*"
IDENT_PATTERN = r"[_a-zA-Z][_a-zA-Z0-9]*"
PARAMS_PATTERN = r"[_a-zA-Z0-9\., \t]*?"

TOKEN_RULES = (
    (
        TOKEN_STRING,
        r"\"[^\"\n]*?\"",
    ),
    (
        TOKEN_COMMENT,
        r"/\*.*?\*/",
    ),
    (
        TOKEN_LITERAL,
        r"[^#]+",
    ),
    (
        TOKEN_INCLUDE,
        rf"^\#include[ \t]+\"(?P<filename>{FILENAME_PATTERN})\"\s*?$",
    ),
    (
        TOKEN_CONSTANT,
        rf"^\#define[ \t]+(?P<constant>{IDENT_PATTERN}) +(?P<constant_expr>.*?)$",
    ),
    (
        TOKEN_MACRO,
        rf"^\#define[ \t]+(?P<macro>{IDENT_PATTERN})"
        rf"\((?P<macro_params>{PARAMS_PATTERN})\)[ \t]*(?P<macro_expr>.*?)$",
    ),
    (
        TOKEN_CALL,
        rf"\#(?P<call>{IDENT_PATTERN})\((?P<call_params>{PARAMS_PATTERN})\)\#",
    ),
    (
        TOKEN_ILLEGAL,
        r".",
    ),
)

RE_TOKENS = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES),
    re.MULTILINE,
)


class PreprocessorError(Exception):
    def __init__(
        self,
        *args: object,
        source: str,
        filename: str,
        token: Token,
    ) -> None:
        super().__init__(*args)
        self.source = source
        self.token = token
        self.filename = filename

    def __str__(self) -> str:
        msg = super().__str__()
        line_num = self.source[: self.token.start].count("\n") + 1
        return f"{msg} ({self.filename}:{line_num})"


def tokenize(source: str, filename: str) -> Iterator[Token]:
    for match in RE_TOKENS.finditer(source):
        kind = match.lastgroup

        if kind in (TOKEN_LITERAL, TOKEN_COMMENT, TOKEN_STRING):
            yield Token(
                TOKEN_LITERAL,
                "",
                "",
                match.group(),
                match.start(),
                match.end(),
            )
        elif kind == TOKEN_INCLUDE:
            yield Token(
                TOKEN_INCLUDE,
                "",
                "",
                match.group("filename"),
                match.start(),
                match.end(),
            )
        elif kind == TOKEN_CONSTANT:
            yield Token(
                kind,
                match.group("constant"),
                "",
                match.group("constant_expr"),
                match.start(),
                match.end(),
            )
        elif kind == TOKEN_MACRO:
            yield Token(
                kind,
                match.group("macro"),
                match.group("macro_params"),
                match.group("macro_expr"),
                match.start(),
                match.end(),
            )
        elif kind == TOKEN_CALL:
            yield Token(
                kind,
                match.group("call"),
                match.group("call_params"),
                "",
                match.start(),
                match.end(),
            )
        elif kind == TOKEN_ILLEGAL:
            # Probably part of an invalid macro call
            yield Token(
                TOKEN_LITERAL,
                "",
                "",
                match.group(),
                match.start(),
                match.end(),
            )
        else:
            raise PreprocessorError(
                f"unexpected token kind {kind} ({match.group()!r})",
                source=source,
                filename=filename,
                token=Token(
                    TOKEN_ILLEGAL,
                    "",
                    "",
                    match.group(),
                    match.start(),
                    match.end(),
                ),
            )


def preprocess(
    source: str,
    filename: str,
    stream: TextIO,
    debug: bool = False,
    constants: Optional[Dict[str, str]] = None,
    include_depth: int = 0,
    macros: Optional[Dict[str, Tuple[str, int]]] = None,
) -> None:
    constants = constants if constants is not None else {}
    macros = macros if macros is not None else {}
    left_strip = False

    for token in tokenize(source, filename):
        if token.kind == TOKEN_LITERAL:
            if left_strip:
                stream.write(_lstrip_one(token.expr))
                left_strip = False
            else:
                stream.write(token.expr)
        elif token.kind == TOKEN_CONSTANT:
            if debug:
                stream.write(f"/* Define {token.name} as {token.expr} */\n")

            if token.name in constants:
                raise PreprocessorError(
                    f"illegal constant redefinition '{token.name}'",
                    source=source,
                    filename=filename,
                    token=token,
                )

            constants[token.name] = token.expr
            left_strip = True
        elif token.kind == TOKEN_INCLUDE:
            if include_depth + 1 > MAX_INCLUDE_DEPTH:
                raise PreprocessorError(
                    "maximum include depth reached",
                    source=source,
                    filename=filename,
                    token=token,
                )

            if debug:
                stream.write(f"/* Include {token.expr} */\n")

            with open(token.expr) as fd:
                preprocess(
                    fd.read(),
                    token.expr,  # report errors against the included file
                    stream,
                    debug,
                    constants,
                    include_depth + 1,
                    macros,
                )

            if debug:
                stream.write(f"/* End {token.expr} */\n")

            left_strip = True
        elif token.kind == TOKEN_MACRO:
            if debug:
                stream.write(
                    f"/* Define {token.name}({token.params}) as {token.expr} */\n"
                )

            if token.name in macros:
                raise PreprocessorError(
                    f"illegal macro redefinition '{token.name}'",
                    source=source,
                    filename=filename,
                    token=token,
                )

            params = parse_parameters(token.params)
            expr = parse_expression(params, token.expr)
            macros[token.name] = (expr, len(params))
            left_strip = True
        elif token.kind == TOKEN_CALL:
            params = parse_parameters(token.params)
            expr, n_args = macros.get(token.name, ("", 0))

            if debug:
                if params:
                    used = [token.name, *params]
                    stream.write(f"/* Use {', '.join(used[:-1])} and {used[-1]} */ ")
                else:
                    stream.write(f"/* Use {token.name} */ ")

            if len(params) != n_args:
                # Not a valid usage: write the original text through unchanged.
                stream.write(source[token.start : token.end])
            else:
                stream.write(
                    substitute_constants(
                        constants,
                        expr.format(*params),
                    )
                )

            left_strip = False
        else:
            raise PreprocessorError(
                f"unknown token kind {token}",
                source=source,
                filename=filename,
                token=token,
            )


def parse_parameters(params: str) -> List[str]:
    return [param.strip() for param in params.split(",")]


def parse_expression(params: List[str], expr: str) -> str:
    _params = {p: str(i) for i, p in enumerate(params)}
    pattern = "|".join(rf"\b{param}\b" for param in params)
    return re.sub(
        f"({pattern})",
        lambda m: f"{{{_params[m.group(0)]}}}",
        expr,
    )


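def substitute_constants(constants: Dict[str, str], expr: str) -> str:
    if not constants:
        # An empty alternation would match everywhere and raise KeyError below.
        return expr
    pattern = "|".join(rf"\b{const}\b" for const in constants)
    return re.sub(
        f"({pattern})",
        lambda m: constants[m.group(0)],
        expr,
    )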


def _lstrip_one(s: str) -> str:
    """Strip at most one newline from the left of `s`."""
    if s and s[0] == "\n":
        return s[1:]
    return s


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description="Rosetta Code compiler preprocessor.")
    parser.add_argument(
        "infile",
        nargs="?",
        type=argparse.FileType("r"),
        default=sys.stdin,
        help="source file to preprocess, '-' means stdin (default: stdin)",
    )
    parser.add_argument(
        "outfile",
        nargs="?",
        type=argparse.FileType("w"),
        default=sys.stdout,
        help="destination file (default: stdout)",
    )
    parser.add_argument(
        "--debug",
        "-d",
        action="store_true",
        help="enable debugging output (default: false)",
    )

    args = parser.parse_args()
    preprocess(args.infile.read(), args.infile.name, args.outfile, debug=args.debug)
Output:

Command line options. Show the help message and exit.

$ ./preproc.py --help
usage: preproc.py [-h] [--debug] [infile] [outfile]

Rosetta Code compiler preprocessor.

positional arguments:
  infile       source file to preprocess, '-' means stdin (default: stdin)
  outfile      destination file (default: stdout)

options:
  -h, --help   show this help message and exit
  --debug, -d  enable debugging output (default: false)

Where Source.t and Header.h contain the source code shown in the task description.

$ ./preproc.py Source.t
area = 6 * 5;

And with debugging enabled. Notice debug output appears immediately before a macro call rather than on the line above.

$ ./preproc.py Source.t --debug
/* Include Header.h */
/* Define area(h, w) as h * w */
/* End Header.h */
/* Define width as 5 */
/* Define height as 6 */
area = /* Use area, height and width */ 6 * 5;

And piped to the lexer defined in the Lexical Analysis task.

$ ./preproc.py Source.t | ./compiler_lex.py
    1      1   Identifier      area
    1      6   Op_assign     
    1      8   Integer              6
    1     10   Op_multiply   
    1     12   Integer              5
    1     13   Semicolon     
    2      1   End_of_input

Wren

Library: Wren-ioutil
Library: Wren-str
Library: Wren-pattern
Library: Wren-seq

A fairly naive solution compared to the complexities of a modern C pre-processor.

I've made the simplifying assumption that macro parameters in a macro definition will always be separated from other tokens by at least one space.

I've also assumed that the header files will always be actual files, and never entered from the console.

Note that the program errors out if there are any syntax or other errors when defining the macros.

import "os" for Process
import "./ioutil" for FileUtil, File, Input
import "./str" for Char
import "./pattern" for Pattern
import "./seq" for Lst, Stack

var isIdentChar = Fn.new { |c| Char.isAsciiAlphaNum(c) || c == "_" }

var isIdent = Fn.new { |s|
    if (s == "") return false
    if (Char.isDigit(s[0])) return false
    return s.all { |c| isIdentChar.call(c) }
}

var clargs = Process.arguments
if (clargs.count > 3) {
    System.print("There can't be more than 3 command line arguments:
        -d     // debug mode, comments will be included in output 
        input  // filename: if absent  or  == console, gets input from console
        output // filename: if absent  or  == console, sends output to console")
    return
}
var debug = clargs.contains("-d") || clargs.contains("--debug")
if (debug) {
    clargs.remove("-d")
    clargs.remove("--debug")
}
var inputFileName = "console"
if (clargs.count > 0) inputFileName = clargs[0]
var lines
if (inputFileName != "console") {
   lines = FileUtil.readLines(inputFileName)
} else {
    var n = Input.integer("How many lines are to be entered? : ", 1)
    System.print("\nOK, enter the lines and press enter after each one.\n")
    lines = List.filled(n, null)
    for (i in 0...n) lines[i] = Input.text("")
    System.print()
}

var macros = []
var comments = []
var used = []
var includes = Stack.new()
var i = 0
while (i < lines.count) {
    var line = lines[i].trim()
    if (line == "" || !line.startsWith("#")) {
        i = i + 1
    } else if (line.startsWith("#include")) {
        var fname = line[8..-1].trimStart()
        if (fname.count < 3 || fname[0] != "\"" || fname[-1] != "\"") {
            Fiber.abort("'#include' directive must be followed by a non-empty string.")
        }
        var lines2 = FileUtil.readLines(fname[1..-2])
        if (includes.count == 5) {
            Fiber.abort("Can't have more than 5 active 'include' files.")
        } else {
            includes.push([fname, i + lines2.count - 1])
            if (debug) comments.add("/* Include Header %(fname) */")
        }
        lines = lines[0...i] + lines2 + lines[i+1..-1]
    } else if (line.startsWith("#define")) {
        line = line[7..-1].trimStart()
        if (line == "") Fiber.abort("Missing macro name.")
        var name = ""
        var j = 0
        while (j < line.count) {
            var c = line[j]
            if (isIdentChar.call(c)) name = name + c else break
            j = j + 1
        }
        if (name == "") Fiber.abort("Missing macro name.")
        if (!isIdent.call(name)) Fiber.abort("Macro name is not a valid identifier.")
        if (macros.any { |macro| macro[0] == name }) Fiber.abort("Macro '%(name)' cannot be redefined.")
        if (j == line.count) Fiber.abort("Missing macro definition.")
        var paramStr = ""
        var params = null
        if (line[j] == "(") {
            j = j + 1
            var k = line.indexOf(")", j)
            if (k == -1) Fiber.abort("Missing ')' in macro parameter list.")
            if (k == j) {
                params = []
            } else {
                paramStr = line[j...k]
                params = paramStr.split(",")
                params = params.map { |param| param.trim() }.toList
                if (!params.all { |param| isIdent.call(param) }) {
                    Fiber.abort("Macro parameter is not a valid identifier.")
                }
            }
            j = k + 1
        }
        if (j == line.count) Fiber.abort("Missing macro definition.")
        var defn = line[j..-1].trimStart()
        macros.add([name, params, defn])
        if (debug) {
            if (params == null) {
                comments.add("/* Define %(name) as %(defn) */")
            } else {
                comments.add("/* Define %(name)(%(params.toString[1...-1])) as %(defn) */")
            }
        }
        lines.removeAt(i)
    } else {
        Fiber.abort("Unknown directive.")
    }
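    // Close any include whose last line has been passed. Popping regardless of
    // the debug flag keeps the count of active include files accurate.
    while (includes.count > 0 && i >= includes.peek()[1]) {
        var ended = includes.pop()[0]
        if (debug) comments.add("/* End %(ended) */")
    }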
}
var src = lines.where { |line| line != "" }.join("\n")
for (macro in macros) {
    var name   = macro[0]
    var params = macro[1]
    var defn   = macro[2]
    var p
    if (params == null) {
        p = Pattern.new("/X[%(name)]~/X")
    } else if (params.count == 0) {
        p = Pattern.new("[/#%(name)()/#]")
    } else if (params.count > 0) {
        p = Pattern.new("[/#%(name)(+1^))/#]")
    }
    var m = null
    while (m = p.find(src)) {
        var span = m.captures[0].span
        if (params == null || params.count == 0) {
            src = src[0...span[0]] + defn + src[span[1]+1..-1]
            used.add(name)
        } else {
            var argStr = m.captures[0].text
            var ix1 = argStr.indexOf("(") + 1
            var ix2 = argStr.indexOf(")") - 1
            argStr = argStr[ix1..ix2]
            var args = argStr.split(",")
            if (args.count == params.count) {
                var temp = " " + defn + " "
                for (i in 0...args.count) {
                    temp = temp.replace(" " + params[i] + " ", " " + args[i].trim() + " ")
                }
                src = src[0...span[0]] + temp.trim() + src[span[1]+1..-1]
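                used.add(name)
            } else {
                // Without this, an unchanged 'src' would be matched again forever.
                Fiber.abort("Wrong number of arguments for macro '%(name)'.")
            }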
        }
    }
}
if (debug) {
    while (includes.count > 0) {
        comments.add("/* End %(includes.pop()[0]) */")
    }
}
used = Lst.distinct(used)
if (used.count > 0) {
    var temp = (used.count == 1) ? used[0] : used[0..-2].join(", ") + " and " + used[-1]
    if (debug) comments.add("/* Used %(temp) */")
}
if (debug) comments = comments.join("\n")

var outputFileName = "console"
if (clargs.count > 1) outputFileName = clargs[1]

if (outputFileName == "console") {
    System.print("Output:\n")
    if (debug) System.print(comments)
    System.print(src)
} else {
    File.create(outputFileName) { |file|
        if (debug) {
            file.writeBytes(comments)
            file.writeBytes("\n")
        }
        file.writeBytes(src)
        file.writeBytes("\n")
    }
}
Output:

Using the example files:

$ wren-cli Compiler_Preprocessor.wren -d
How many lines are to be entered? : 4

OK, enter the lines and press enter after each one.

#include "Header.h"
#define width 5
#define height 6
area = #area(height, width)#;

Output:

/* Include Header "Header.h" */
/* Define area(h, w) as h * w */
/* End "Header.h" */
/* Define width as 5 */
/* Define height as 6 */
/* Used area, width and height */
area = 6 * 5;

Or adding another header file to make the example slightly more interesting:

~~ Header.h ~~
#define area(h, w) h * w
#include "Header2.h"

~~ Header2.h ~~
#define depth 7
#define volume(h, w, d) h * w * d

~~ Source.t ~~
#include "Header.h"
#define width 5
#define height 6
area = #area(height, width)#;
volume = #volume(height, width, depth)#;
Output:
$ wren-cli Compiler_Preprocessor.wren -d Source.t
Output:

/* Include Header "Header.h" */
/* Define area(h, w) as h * w */
/* Include Header "Header2.h" */
/* Define depth as 7 */
/* Define volume(h, w, d) as h * w * d */
/* End "Header2.h" */
/* End "Header.h" */
/* Define width as 5 */
/* Define height as 6 */
/* Used area, depth, volume, width and height */
area = 6 * 5;
volume = 6 * 5 * 7;
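
A note on the parameter substitution used in the listing above: the definition is padded with a space on each side and every occurrence of " param " is replaced by " arg ". The following is only a minimal sketch of that one step in isolation, using core string methods. The function name `expand` is hypothetical and is not part of the solution. It suggests that the approach relies on parameters being separated by spaces inside the definition.

```wren
// Hypothetical helper, not part of the solution above: expands one macro body
// by replacing space-delimited parameter names with the supplied arguments.
var expand = Fn.new { |defn, params, args|
    var temp = " " + defn + " "   // pad so the first and last tokens can match
    for (i in 0...params.count) {
        temp = temp.replace(" " + params[i] + " ", " " + args[i].trim() + " ")
    }
    return temp.trim()
}

System.print(expand.call("h * w", ["h", "w"], ["6", " 5"]))               // 6 * 5
System.print(expand.call("h * w * d", ["h", "w", "d"], ["6", "5", "7"]))  // 6 * 5 * 7

// Parameters that are not surrounded by spaces are left untouched.
System.print(expand.call("(h+w)", ["h", "w"], ["6", "5"]))                // (h+w)
```

With this scheme a definition such as `h * w` expands as expected, whereas one written as `(h+w)` would need spaces around each parameter before it could be substituted.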