Compiler/Preprocessor: Difference between revisions

{{draft task}}
 
This task modifies the source code prior to lexical analysis, similar to the built-in C preprocessor. All design decisions favor as simple an example as possible, to show the concept and to match the simple language the other compiler tasks support.
 
{{task heading}}
The program should treat any line starting with a hashtag (#) as a command to process. There are currently two valid commands: include and define. No space is allowed between the hashtag and its command. Multiple whitespace characters are treated the same as one.
 
The include command must be followed by whitespace and a double-quoted string whose contents name the file to read. Includes may recursively include other files, to a limit of five active header files plus the original source file.
 
The define command must be followed by whitespace and a new macro name. Redefinition or overloading of a macro name is illegal. Macro names follow the same character conventions as variable names in the language. No whitespace is required between the arguments, but it is allowed between every token. When there are no parameters, the definition and its usage must either both have an empty argument list or both have none; the empty list is required to avoid confusion when the definition needs parentheses to force precedence. If there is a closing parenthesis, the whitespace trailing it is optional; otherwise it is required. From that point to the end of the line is the definition, with whitespace stripped from both its start and end. Whitespace between tokens within the definition must be maintained. During usage, any names within the definition are treated as macro names first, before being assumed to be variables in the language.
 
To make it easier to find, a usage is written between hashtags, and is replaced wherever it appears in the files processed. These usages are expanded everywhere they are encountered, without regard to the syntax of the sample language. The calling arguments replace the define's parameters as a simple string substitution. You may not assume the usages proceed in an order that forms complex combinations. Tokens detected during definition processing may remain separated during usage processing. If the contents within the double hashtags are not a valid macro usage, the entire text is written to the output as if it had not been detected. The ending hashtag is not required to serve as the start of another macro usage. Both the start and end hashtags must be on the same line.
 
There are three possible optional command line arguments: debug, input, and output. Debug is an implementer-dependent switch, such as -d or --debug, that lets the user choose whether the commands vanish from the output or appear as comments in it. Debug can be specified anywhere on the command line after the program name. Input is the file to process; when it is missing, console input is used. Input is always specified before output. Output is the file to create; when it is missing, console output is used. If only one file is specified, it is the input file. To use an output file with console input, you must specify both arguments; how to do so is left up to the implementer, and it is not required that there be a way to specify the console on the command line.
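The substitution rules above can be illustrated with a minimal sketch, using the task's area/height/width example. This is a hypothetical illustration, not one of the entries below; the names `expand` and `repl` and the exact regex are this sketch's own.

```python
import re

# Hypothetical minimal sketch: expand #name(args)# and #name# usages
# via plain string substitution, as the task describes.
def expand(line, macros):
    def repl(m):
        name, argstr = m.group(1), m.group(2)
        if name not in macros:
            return m.group(0)  # not a valid usage: write the text through unchanged
        params, body = macros[name]
        args = [a.strip() for a in argstr.split(",")] if argstr else []
        if len(args) != len(params):
            return m.group(0)  # wrong arity: also written through unchanged
        for p, a in zip(params, args):
            body = re.sub(rf"\b{p}\b", a, body)  # parameter -> argument, textually
        return body
    return re.sub(r"#(\w+)(?:\(([^)#]*)\))?#", repl, line)

# From: #define width 5  and  #define area(h, w) h * w
macros = {"width": ([], "5"), "area": (["h", "w"], "h * w")}
print(expand("area = #area(6, 5)#;", macros))  # area = 6 * 5;
print(expand("w = #width#;", macros))          # w = 5;
```

Note that the replacement is purely textual, so the expansion `6 * 5` is emitted as-is; evaluation is left to the later compiler stages.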
 
{{task heading|Input Specification}}
<hr>
__TOC__
 
=={{header|Julia}}==
<syntaxhighlight lang="julia">""" Rosetta Code Compiler/Preprocessor task """
 
""" If the line is a macro definition, add the macro to macros. """
function addmacro!(macros, line)
    macroname, comment = "", ""
    if (m = match(r"#define\s+(\w[\w\d]*)\(([^\)]+)\)\s+(.+)", line)) != nothing
        macroname, argstring, definition = m.captures
        comment = "/* Define $macroname($argstring) as $definition */\n"
        @assert !haskey(macros, macroname) "Duplicate macro names are not allowed"
        argv = strip.(split(argstring, r"\s*,\s*"))
        @assert allunique(argv) "Parameter argument symbols must be different from each other"
        def = " " * definition
        defstrings, argnums = String[], Int[]
        for m in reverse(collect(eachmatch(Regex(join(argv, "|")), def)))
            cutposition = m.offset + length(m.match)
            pushfirst!(defstrings, def[begin+cutposition-1:end])
            pushfirst!(argnums, findfirst(==(m.match), argv))
            def = def[begin:m.offset-1]
        end
        pushfirst!(defstrings, def)
        macros[macroname] = (defstrings, argnums)
    elseif (m = match(r"#define\s+(\w[\w\d]*)(?:\(\))?\s+(.+)", line)) != nothing
        macroname, definition = m.captures
        comment = "/* Define $macroname as $definition */\n"
        macros[macroname] = ([string(definition)], Int[])
    else
        return false, ""
    end
    return true, comment
end
 
""" If the line contains macro or macros, substitute all and return results. """
function replaceifmacro(macros, line; withinhashtag = true)
    replacedline, allmacronames, usedmacros = line, join(keys(macros), "|"), String[]
    for m in reverse(
        collect(
            eachmatch(
                Regex(
                    withinhashtag ? "#(" * allmacronames * raw")(?:(?:\(([^\)]+)\)#)|#)" :
                    "(" * allmacronames * raw")(?:(?:\(([^\)]+)\))?)",
                ),
                replacedline,
            ),
        ),
    )
        push!(usedmacros, string(m.captures[1]))
        if m.offsets[end] != 0 # has arguments
            args = split(m.captures[end], r"\s*,\s*")
            for (i, arg) in enumerate(args)
                used, newtext = replaceifmacro(macros, arg; withinhashtag = false)
                if !isempty(used)
                    submacro = first(used)
                    push!(usedmacros, submacro)
                    args[i] = macros[submacro][1][1]
                end
            end
            strings, nums = macros[m.captures[1]]
            s =
                first(strings) *
                prod([args[n] * strings[i+1] for (i, n) in enumerate(nums)])
            replacedline =
                replacedline[begin:m.offsets[1]-2] *
                s *
                replacedline[m.offset+length(m.match):end]
        else
            replacedline =
                replacedline[begin:m.offsets[1]-2] *
                macros[m.captures[1]][1][1] *
                replacedline[m.offset+length(m.match):end]
        end
    end
    return usedmacros, replacedline
end
 
""" If a line starts with #include, return the lines in the include file. """
function processinclude(line)
    lines, fname = String[], ""
    if (m = match(r"#include\s+\"([^\"]+)\"", line)) != nothing
        fname = first(m.captures)
        lines = readlines(fname, keep = true)
    end
    return fname, lines
end
 
""" Preprocess the file to prepare it for the Rosetta Code lexical analyzer task. """
function preprocess(instream, outstream, debug)
    lines = readlines(instream, keep = true)
    macros = Dict{String,Tuple{Vector{String},Vector{Int}}}()
    linesread = 0
    while !isempty(lines)
        line = popfirst!(lines)
        linesread += 1
        if startswith(line, '#')
            fname, includelines = processinclude(line)
            if !isempty(fname)
                if debug
                    pushfirst!(includelines, """/* Include $fname */\n""")
                    push!(includelines, """/* End $fname */\n""")
                end
                lines = append!(includelines, lines)
            elseif startswith(line, r"#define\s")
                gotmacro, comment = addmacro!(macros, line)
                gotmacro && debug && print(outstream, comment)
            else
                error("Unknown preprocessor directive in line: $line")
            end
        else
            usedmacros, replacedline = replaceifmacro(macros, line)
            if !isempty(usedmacros)
                debug && print(outstream, "/* Used " * join(usedmacros, ", ", " and ") * " */\n")
                line = replacedline
            end
            print(outstream, line)
        end
    end
    return linesread
end
 
""" Process command line, open files if needed, hand off to function `func`, close files """
function runwithopts(func, minargs = 0, maxargs = 3)
    minargs <= length(ARGS) <= maxargs || error("Wrong number of arguments ($minargs:$maxargs)")
    debug, infile, outfile = false, "", ""
    for arg in ARGS
        if arg == "-d" || arg == "-debug"
            debug = true
        elseif isempty(infile)
            infile = arg
        elseif isempty(outfile)
            outfile = arg
        end
    end
    ioin = isempty(infile) ? stdin : open(infile, "r")
    ioout = isempty(outfile) ? stdout : open(outfile, "w")

    func(ioin, ioout, debug)
    !isempty(infile) && close(ioin)
    !isempty(outfile) && close(ioout)
end
 
runwithopts(preprocess)
</syntaxhighlight>{{out}} Same output as the Phix entry.
 
=={{header|Phix}}==
No attempt has been made to implement command line arguments: trivial on desktop Phix, but not possible under pwa/p2js, i.e. within a browser.
<!--<syntaxhighlight lang="phix">(phixonline)-->
<span style="color: #000080;font-style:italic;">--
-- demo\rosetta\Compiler\preprocess.exw
<span style="color: #000080;font-style:italic;">--close_files()</span>
<!--</syntaxhighlight>-->
{{out}}
<pre>
/* Use area, height, and width */
area = 6 * 5;
</pre>
 
=={{header|Python}}==
 
<syntaxhighlight lang="python">#!/usr/bin/env python
"""Rosetta Code compiler/preprocessor. Requires Python >= 3.7."""
import re
import sys
 
from typing import Dict
from typing import Iterator
from typing import List
from typing import NamedTuple
from typing import Optional
from typing import TextIO
from typing import Tuple
 
MAX_INCLUDE_DEPTH = 5
 
TOKEN_INCLUDE = "INCLUDE"
TOKEN_CONSTANT = "CONSTANT"
TOKEN_MACRO = "MACRO"
TOKEN_CALL = "CALL"
TOKEN_STRING = "STRING"
TOKEN_COMMENT = "COMMENT"
TOKEN_LITERAL = "LITERAL"
TOKEN_ILLEGAL = "ILLEGAL"
 
 
class Token(NamedTuple):
    kind: str
    name: str
    params: str
    expr: str
    start: int
    end: int
 
 
FILENAME_PATTERN = r"[_a-zA-Z][_a-zA-Z0-9\.]*"
IDENT_PATTERN = r"[_a-zA-Z][_a-zA-Z0-9]*"
PARAMS_PATTERN = r"[_a-zA-Z0-9\., \t]*?"
 
TOKEN_RULES = (
    (
        TOKEN_STRING,
        r"\"[^\"\n]*?\"",
    ),
    (
        TOKEN_COMMENT,
        r"/\*.*?\*/",
    ),
    (
        TOKEN_LITERAL,
        r"[^#]+",
    ),
    (
        TOKEN_INCLUDE,
        rf"^\#include[ \t]+\"(?P<filename>{FILENAME_PATTERN})\"\s*?$",
    ),
    (
        TOKEN_CONSTANT,
        rf"^\#define[ \t]+(?P<constant>{IDENT_PATTERN}) +(?P<constant_expr>.*?)$",
    ),
    (
        TOKEN_MACRO,
        rf"^\#define[ \t](?P<macro>{IDENT_PATTERN})"
        rf"\((?P<macro_params>{PARAMS_PATTERN})\) +(?P<macro_expr>.*?)$",
    ),
    (
        TOKEN_CALL,
        rf"\#(?P<call>{IDENT_PATTERN})\((?P<call_params>{PARAMS_PATTERN})\)\#",
    ),
    (
        TOKEN_ILLEGAL,
        r".",
    ),
)

RE_TOKENS = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES),
    re.MULTILINE,
)
 
 
class PreprocessorError(Exception):
    def __init__(
        self,
        *args: object,
        source: str,
        filename: str,
        token: Token,
    ) -> None:
        super().__init__(*args)
        self.source = source
        self.token = token
        self.filename = filename

    def __str__(self) -> str:
        msg = super().__str__()
        line_num = self.source[: self.token.start].count("\n") + 1
        return f"{msg} ({self.filename}:{line_num})"
 
 
def tokenize(source: str, filename: str) -> Iterator[Token]:
    for match in RE_TOKENS.finditer(source):
        kind = match.lastgroup

        if kind in (TOKEN_LITERAL, TOKEN_COMMENT, TOKEN_STRING):
            yield Token(
                TOKEN_LITERAL,
                "",
                "",
                match.group(),
                match.start(),
                match.end(),
            )
        elif kind == TOKEN_INCLUDE:
            yield Token(
                TOKEN_INCLUDE,
                "",
                "",
                match.group("filename"),
                match.start(),
                match.end(),
            )
        elif kind == TOKEN_CONSTANT:
            yield Token(
                kind,
                match.group("constant"),
                "",
                match.group("constant_expr"),
                match.start(),
                match.end(),
            )
        elif kind == TOKEN_MACRO:
            yield Token(
                kind,
                match.group("macro"),
                match.group("macro_params"),
                match.group("macro_expr"),
                match.start(),
                match.end(),
            )
        elif kind == TOKEN_CALL:
            yield Token(
                kind,
                match.group("call"),
                match.group("call_params"),
                "",
                match.start(),
                match.end(),
            )
        elif kind == TOKEN_ILLEGAL:
            # Probably part of an invalid macro call
            yield Token(
                TOKEN_LITERAL,
                "",
                "",
                match.group(),
                match.start(),
                match.end(),
            )
        else:
            raise PreprocessorError(
                f"unexpected token kind {kind} ({match.group()!r})",
                source=source,
                filename=filename,
                token=Token(
                    TOKEN_ILLEGAL,
                    "",
                    "",
                    match.group(),
                    match.start(),
                    match.end(),
                ),
            )
 
 
def preprocess(
    source: str,
    filename: str,
    stream: TextIO,
    debug: bool = False,
    constants: Optional[Dict[str, str]] = None,
    include_depth: int = 0,
    macros: Optional[Dict[str, Tuple[str, int]]] = None,
) -> None:
    constants = constants if constants is not None else {}
    macros = macros if macros is not None else {}
    left_strip = False

    for token in tokenize(source, filename):
        if token.kind == TOKEN_LITERAL:
            if left_strip:
                stream.write(_lstrip_one(token.expr))
                left_strip = False
            else:
                stream.write(token.expr)
        elif token.kind == TOKEN_CONSTANT:
            if debug:
                stream.write(f"/* Define {token.name} as {token.expr} */\n")

            if token.name in constants:
                raise PreprocessorError(
                    f"illegal constant redefinition '{token.name}'",
                    source=source,
                    filename=filename,
                    token=token,
                )

            constants[token.name] = token.expr
            left_strip = True
        elif token.kind == TOKEN_INCLUDE:
            if include_depth + 1 > MAX_INCLUDE_DEPTH:
                raise PreprocessorError(
                    "maximum include depth reached",
                    source=source,
                    filename=filename,
                    token=token,
                )

            if debug:
                stream.write(f"/* Include {token.expr} */\n")

            with open(token.expr) as fd:
                preprocess(
                    fd.read(),
                    filename,
                    stream,
                    debug,
                    constants,
                    include_depth + 1,
                    macros,
                )

            if debug:
                stream.write(f"/* End {token.expr} */\n")

            left_strip = True
        elif token.kind == TOKEN_MACRO:
            if debug:
                stream.write(
                    f"/* Define {token.name}({token.params}) as {token.expr} */\n"
                )

            if token.name in macros:
                raise PreprocessorError(
                    f"illegal macro redefinition '{token.name}'",
                    source=source,
                    filename=filename,
                    token=token,
                )

            params = parse_parameters(token.params)
            expr = parse_expression(params, token.expr)
            macros[token.name] = (expr, len(params))
            left_strip = True
        elif token.kind == TOKEN_CALL:
            params = parse_parameters(token.params)
            expr, n_args = macros.get(token.name, ("", 0))

            if debug:
                if params:
                    used = [token.name, *params]
                    stream.write(f"/* Use {', '.join(used[:-1])} and {used[-1]} */ ")
                else:
                    stream.write(f"/* Use {token.name} */ ")

            if len(params) != n_args:
                # Not a known macro with this arity: emit the original text unchanged
                stream.write(source[token.start : token.end])
            else:
                stream.write(
                    substitute_constants(
                        constants,
                        expr.format(*params),
                    )
                )

            left_strip = False
        else:
            raise PreprocessorError(
                f"unknown token kind {token}",
                source=source,
                filename=filename,
                token=token,
            )
 
 
def parse_parameters(params: str) -> List[str]:
    return [param.strip() for param in params.split(",")]


def parse_expression(params: List[str], expr: str) -> str:
    _params = {p: str(i) for i, p in enumerate(params)}
    pattern = "|".join(rf"\b{param}\b" for param in params)
    return re.sub(
        f"({pattern})",
        lambda m: f"{{{_params[m.group(0)]}}}",
        expr,
    )


def substitute_constants(constants: Dict[str, str], expr: str) -> str:
    pattern = "|".join(rf"\b{const}\b" for const in constants)
    return re.sub(
        f"({pattern})",
        lambda m: constants[m.group(0)],
        expr,
    )


def _lstrip_one(s: str) -> str:
    """Strip at most one newline from the left of `s`."""
    if s and s[0] == "\n":
        return s[1:]
    return s
 
 
if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description="Rosetta Code compiler preprocessor.")
    parser.add_argument(
        "infile",
        nargs="?",
        type=argparse.FileType("r"),
        default=sys.stdin,
        help="source file to preprocess, '-' means stdin (default: stdin)",
    )
    parser.add_argument(
        "outfile",
        nargs="?",
        type=argparse.FileType("w"),
        default=sys.stdout,
        help="destination file (default: stdout)",
    )
    parser.add_argument(
        "--debug",
        "-d",
        action="store_true",
        help="enable debugging output (default: false)",
    )

    args = parser.parse_args()
    preprocess(args.infile.read(), args.infile.name, args.outfile, debug=args.debug)
</syntaxhighlight>
 
{{out}}
Command line options, showing the help message:
<pre>
$ ./preproc.py --help
usage: preproc.py [-h] [--debug] [infile] [outfile]
 
Rosetta Code compiler preprocessor.
 
positional arguments:
  infile       source file to preprocess, '-' means stdin (default: stdin)
  outfile      destination file (default: stdout)

options:
  -h, --help   show this help message and exit
  --debug, -d  enable debugging output (default: false)
</pre>
 
Where <code>Source.t</code> and <code>Header.h</code> contain the source code shown in the task description.
<pre>
$ ./preproc.py Source.t
area = 6 * 5;
</pre>
 
And with debugging enabled. Notice that debug output appears immediately before a macro call rather than on the line above.
<pre>
$ ./preproc.py source.t --debug
/* Include Header.h */
/* Define area(h, w) as h * w */
/* End Header.h */
/* Define width as 5 */
/* Define height as 6 */
area = /* Use area, height and width */ 6 * 5;
</pre>
 
And piped to the lexer defined in the Lexical Analysis task.
<pre>
$ ./preproc.py source.t | ./compiler_lex.py
1 1 Identifier area
1 6 Op_assign
1 8 Integer 6
1 10 Op_multiply
1 12 Integer 5
1 13 Semicolon
2 1 End_of_input
</pre>
 
=={{header|Wren}}==
 
Note that the program errors out if there are any syntax or other errors when defining the macros.
<syntaxhighlight lang="wren">import "os" for Process
import "./ioutil" for FileUtil, File, Input
import "./str" for Char
file.writeBytes("\n")
}
}</syntaxhighlight>
 
{{out}}
Using the example files:
<pre>
$ wren-cli Compiler_Preprocessor.wren -d
How many lines are to be entered? : 4
 
</pre>
{{out}}
<pre>
$ wren-cli Compiler_Preprocessor.wren -d Source.t
Output:
 
</pre>