
Compiler/Simple file inclusion pre processor

From Rosetta Code
Compiler/Simple file inclusion pre processor is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

Many programming languages allow file inclusion, so that for example, standard declarations or code-snippets can be stored in a separate source file and be included into the programs that require them at compile-time, without the need for cut-and-paste.

The C pre-processor is probably the best-known example.

The purpose of this task is to demonstrate how a file inclusion pre-processor could be implemented in your language, for your language. This is not meant to imply that your language needs a file inclusion pre-processor - many languages don't have such a facility.
Other languages, on the other hand, do have file inclusion, e.g. C, COBOL, PL/1.

The distinction between compiled and interpreted languages should be irrelevant - this is a specialised text processing exercise - reading a source file and producing a modified source file that contains the contents of one or more other files.

Hopefully, the solutions to this task will enable the file inclusion facilities of the languages to be compared, even for languages that don't need such facilities because they have sophisticated equivalents.

Create a simple pre-processor that implements file-inclusion for your language.
The pre-processor need not implement macros, conditional compilation, etc. (e.g. for COBOL, the REPLACING option need not be implemented).

The pre-processor need not check the validity of the resultant source. The pre-processor's job is to insert the specified files into the source at the required points. Whether the result is syntactically valid or not is a matter for the compiler/interpreter when it is presented with the source.

The syntax accepted for your pre-processor should be as per the standard for your language, should your language have such a facility. E.g. for C the pre-processor must recognise and handle "#include" directives. For PL/1, %include statements would be processed and for COBOL, COPY statements, etc.

If your language does not have a standard facility for file-inclusion, implement that used by a popular compiler for the language.
If there is no such feature, either use the C style #include directive or choose something of your own invention.

State the syntax expected and any limitations, including whether nested includes are supported and if so, how deep the nesting can be.

If possible, implement your pre-processor as a filter, i.e. read the main source file from standard input and write the pre-processed source to standard output.
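
The requirements above (a directive that names a file, nested includes with a stated depth limit, and filter-style operation) can be sketched in Python as follows. This is only an illustrative sketch, not one of the task solutions: the names expand, read_lines, INCLUDE_RE and MAX_DEPTH are invented here, only the quoted form of the C-style #include directive is recognised, and the depth limit of 10 is an arbitrary choice.

```python
import re

# Matches a C-style directive such as:  #include "name"
# (the angle-bracket form is deliberately not handled in this sketch)
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"')

# Arbitrary nesting limit chosen for this sketch
MAX_DEPTH = 10


def read_lines(path):
    """Default loader: return the lines of a file, keeping the line ends."""
    with open(path, encoding="utf-8") as handle:
        return handle.readlines()


def expand(lines, load=read_lines, depth=0):
    """Yield the lines, replacing each #include "file" line by that file's lines.

    load is any callable that maps a file name to an iterable of lines, so the
    same code works on real files or on an in-memory table of sources.
    """
    for line in lines:
        found = INCLUDE_RE.match(line)
        if found is None:
            # ordinary source line: copy it through unchanged
            yield line
        elif depth >= MAX_DEPTH:
            raise RuntimeError("includes nested too deeply: " + found.group(1))
        else:
            # include line: recurse so that nested includes are expanded too
            yield from expand(load(found.group(1)), load, depth + 1)
```

To run it as a filter, a script could import sys and end with sys.stdout.writelines(expand(sys.stdin)), then be invoked with the main source redirected to standard input. The validity of the resulting source is not checked, in keeping with the task description.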

NOTE to task implementors: The Task is about implementing a pre-processor for your language, not just describing its features.
Just as the task Calculating the value of e is not about using your language's in-built exp function but showing how e could be calculated, this is about showing how file inclusion could be implemented - even if the compiler/interpreter you are using has such a facility.

NOTE to anyone who uses the pre-processors on this page: They are supplied as-is, with no warranty - use at your own peril : )


ALGOL 68

Should work with any Algol 68 implementation that uses upper-stropping.

Implements file inclusion via pragmatic comments as in ALGOL 68G.

A pragmatic comment such as PR read "somefile.incl.a68" PR or PR include "somefile.incl.a68" PR can appear anywhere in the source and will cause the text of somefile.incl.a68 to be included at that point (Note, ALGOL 68G does not support "include" as an alternative to "read").
The PR...PR will not be recognised inside comments or string literals and cannot appear inside a symbol, i.e. 1PR...PR2 is 1 followed by a pragmatic comment followed by 2.
PR can also be written as PRAGMA.
In ALGOL 68G, PR read ... only includes the file if it has not already been included. This implementation does the same, but at most 200 different files can be included.
Includes can be nested to a depth of 10.

When run, the pre-processor will read source from standard input and write the resultant source to standard output. If standard output is re-directed to a temporary source file, it can then be compiled/interpreted with the actual Algol 68 compiler.
(NB: The source must end with a line-feed)

# Algol 68 pre-processor                                                      #
# Processes read/include pragmas in an Algol 68 upper stropping source        #
# It is assumed _ is allowed in tags and bold words                           #
# It is assumed that {} are alternatives for () as in ALGOL 68G               #
#    to use {} as ALGOL 68RS/algol68toc style nestable brief-comments,        #
#    change rs style brief comments to TRUE                                   #
# ALGOL 68G ( and probably other compilers ) allow quote-stropped bold        #
#    words to appear in an otherwise upper-stropped source,                   #
#    e.g. BEGIN 'SKIP' END would be a valid program                           #
#    this is not supported here so pragmatic comments such as:                #
#        'PR' read "someFile" 'PR' will cause problems                        #
# pragmatic comments should be disabled by PR pragmats off PR,                #
#    this is not implemented                                                  #
# the read/include must be in lower case                                      #
# ALGOL 68G's read pragmatic comment only includes the file the first time    #
#    it is mentioned in a read pragmatic comment - this is implemented by     #
#    keeping a list of the included files - the list is limited to 200        #
#    entries                                                                  #
BEGIN
    # TRUE if {} delimits a nestable brief comment, as in ALGOL 68RS and #
    # algol68toc, FALSE if {} are alternatives to () as in ALGOL 68G #
    BOOL rs style brief comments = FALSE;
    # input file information #
    MODE INFILE = STRUCT( REF FILE src  # actual source file #
                        , STRING   line # latest source line #
                        , INT      pos  # character position in line #
                        );
    # initialises the INFILE f to be associated with the FILE src #
    PRIO INIT = 1;
    OP   INIT = ( REF FILE src, REF INFILE f )REF INFILE:
         BEGIN
            line OF f := "";
            pos  OF f := 1 + UPB line OF f;
            src  OF f := src;
            set eof handler( f );
            f
         END # INIT # ;
    # TRUE if EOF has been reached, FALSE otherwise #
    BOOL at eof := FALSE;
    # current source character #
    CHAR c := " ";
    # newline character #
    CHAR nl = REPR 10;
    # maximum number of include files that can be nested #
    INT max include depth = 10;
    # source file stack #
    [ 0 : max include depth ]INFILE in stack;
    # current include depth #
    INT include depth := 0;
    # number of errors reported #
    INT error count := 0;
    # number of included files #
    INT include count := 0;
    # names of previously included files #
    [ 1 : 200 ]STRING included files;
    # sets the logical file end procedure of the specified file to a routine #
    # that allows us to detect EOF on a source file #
    PROC set eof handler = ( REF INFILE inf )VOID:
         on logical file end( src OF inf
                            , ( REF FILE f )BOOL:
                              BEGIN
                                  # note that we reached EOF on the #
                                  # latest read #
                                  IF NOT at eof
                                  THEN
                                      # first time we have spotted eof, #
                                      # we need to call newline so that #
                                      # if the last line didn't have a #
                                      # newline at the end, it is still read #
                                      # however that will call this routine #
                                      # so we have to ensure we only do it #
                                      # once #
                                      at eof := TRUE;
                                      newline( f )
                                  FI;
                                  # return TRUE so processing can continue #
                                  TRUE
                              END
                            );
    # reports an error #
    PROC error = ( STRING message )VOID:
         BEGIN
            error count +:= 1;
            print( ( newline, newline, "**** ", message, newline ) )
         END # error # ;
    # gets the next source character, handling end-of-file on include files #
    # the source character is stored in c #
    PROC next char = VOID:
         BEGIN
            WHILE
                BOOL read again := FALSE;
                REF INFILE s = in stack[ include depth ];
                IF pos OF s <= UPB line OF s THEN
                    # not past the end of the source line #
                    c := ( line OF s )[ pos OF s ];
                    pos OF s +:= 1
                ELIF
                    # past the end of the current source line - get the next #
                    at eof := FALSE;
                    get( src OF s, ( line OF s, newline ) );
                    NOT at eof
                THEN
                    # got a new line #
                    line OF s +:= nl;
                    pos  OF s := LWB line OF s;
                    read again := TRUE
                ELIF include depth = 0 THEN
                    # eof on the main source #
                    line OF s := ""
                ELSE
                    # got eof on an include file #
                    include depth -:= 1;
                    read again := TRUE;
                    at eof := FALSE;
                    close( src OF s )
                FI;
                read again
            DO SKIP OD
         END # next char # ;
    # returns TRUE if the current character is whitespace #
    PROC have whitespace = BOOL: c <= " ";
    # returns TRUE if the current character is a string delimiter #
    PROC have string delimiter = BOOL: c = """";
    # returns TRUE if the current character can start a bold word #
    PROC have bold = BOOL: c >= "A" AND c <= "Z";
    # returns TRUE if the current character can start a brief tag #
    PROC have tag = BOOL: c >= "a" AND c <= "z";
    # reports an unterminated construct ( e.g. string, comment ) #
    PROC unterminated = ( STRING construct )VOID:
         error( "Unterminated " + construct );
    # outputs ch to stand out #
    PROC put char = ( CHAR ch )VOID:
         IF ch = nl THEN print( ( newline ) ) ELSE print( ch ) FI;
    # outputs str to stand out #
    PROC put string = ( STRING str )VOID: print( ( str ) );
    # outputs a brief comment to stand out #
    # end char is the closing delimiter, #
    # nested char is the opening delimiter for nestable brief comments #
    # if nested char is blank, the brief comment does not nest #
    # this handles ALGOL 68RS and algol68toc style {} comments #
    PROC skip brief comment = ( CHAR end char, CHAR nested char )VOID:
         BEGIN
            put char( c );
            WHILE next char;
                  NOT at eof AND c /= end char
            DO
                IF c = nested char AND NOT have whitespace THEN
                    # nested brief comment #
                    skip brief comment( end char, nested char )
                ELSE
                    # normal comment char #
                    put char( c )
                FI
            OD;
            IF at eof THEN
                # unterminated comment #
                unterminated( """" + end char + """ comment" );
                c := end char
            FI;
            put char( c );
            next char
         END # skip brief comment # ;
    # gets a string of spaces from the source #
    PROC get whitespace = STRING:
         BEGIN
            STRING result := "";
            WHILE NOT at eof AND have whitespace DO result +:= c; next char OD;
            result
         END # get whitespace # ;
    # gets a string denotation from the source #
    PROC get string = STRING:
         BEGIN
            STRING result := "";
            # within a string denotation, "" denotes the " character #
            WHILE have string delimiter DO
                WHILE result +:= c;
                      next char;
                      NOT at eof AND NOT have string delimiter
                DO SKIP OD;
                IF NOT have string delimiter THEN
                    # unterminated string #
                    unterminated( "string" );
                    c := """"
                FI;
                result +:= c;
                next char
            OD;
            result
         END # get string # ;
    # returns s unquoted #
    PROC unquote string = ( STRING s )STRING:
         BEGIN
            STRING result := "";
            # within a string denotation, "" denotes the " character #
            INT c pos := LWB s + 1;
            WHILE c pos < UPB s DO
                CHAR ch = s[ c pos ];
                IF ch = """" THEN
                    # have an embedded quote - it will be doubled #
                    c pos +:= 1
                FI;
                result +:= ch;
                c pos +:= 1
            OD;
            result
         END # unquote string # ;
    # gets a bold word from the source #
    PROC get bold word = STRING:
         BEGIN
            STRING result := "";
            WHILE have bold OR c = "_" DO result +:= c; next char OD;
            result
         END # get bold word # ;
    # gets a brief tag from the source #
    PROC get tag = STRING:
         BEGIN
            STRING result := "";
            WHILE have tag OR c = "_" DO result +:= c; next char OD;
            result
         END # get tag # ;
    # copies the source to the output until a bold word is encountered #
    PROC skip to bold = STRING:
         IF at eof
         THEN
            # already at eof - there is no bold word to return #
            ""
         ELSE STRING result := "";
            WHILE put char( c );
                  next char;
                  NOT at eof
              AND NOT have bold
            DO SKIP OD;
            IF NOT at eof THEN result := get bold word FI;
            result
         FI # skip to bold # ;
    # handles a bold PRAGMA, COMMENT or other bold word #
    PROC bold word or pragment = VOID:
         IF STRING bold word := get bold word;
            bold word = "CO" OR bold word = "COMMENT"
         THEN
            # have a bold comment #
            STRING delimiter = bold word;
            WHILE put string( bold word );
                  bold word := skip to bold;
                  NOT at eof
              AND bold word /= delimiter
            DO SKIP OD;
            IF at eof THEN
                # unterminated comment #
                unterminated( """" + delimiter + """ comment" )
            FI;
            put string( delimiter )
         ELIF bold word = "PR" OR bold word = "PRAGMA"
         THEN
            # have a pragmatic comment - could be file inclusion #
            STRING delimiter = bold word;
            STRING pragment := bold word;
            STRING op := "";
            STRING file name := "";
            # skip spaces after the PR/PRAGMA #
            pragment +:= get whitespace;
            # get the operation, if there is a tag #
            IF have tag THEN
                # have an operation #
                op := get tag;
                pragment +:= op + get whitespace
            FI;
            # get the file name, if there is one #
            IF have string delimiter THEN
                # have a file name #
                file name := get string;
                pragment +:= file name + get whitespace;
                file name := unquote string( file name )
            FI;
            # should now have the closing delimiter #
            IF NOT have bold THEN
                # no bold word in/at-the-end-of the pragment #
                bold word := ""
            ELSE
                # have a bold word - could be the delimiter #
                pragment +:= ( bold word := get bold word )
            FI;
            IF ( op /= "read" AND op /= "include" )
            OR file name = ""
            OR bold word /= delimiter
            THEN
                # not a read/include pragmatic comment #
                put string( pragment );
                IF bold word /= delimiter THEN
                    # haven't got the closing delimiter yet #
                    WHILE bold word := skip to bold;
                          NOT at eof
                      AND bold word /= delimiter
                    DO put string( bold word ) OD;
                    IF at eof THEN
                        # unterminated comment #
                        unterminated( """" + delimiter + """" )
                    FI;
                    put string( delimiter )
                FI
            ELIF # check for an already included file and add the name to #
                 # the list if it hasn't been included before #
                 BOOL already included := FALSE;
                 FOR file pos TO include count
                 WHILE NOT ( already included := included files[ file pos ] = file name )
                 DO SKIP OD;
                 IF NOT already included THEN
                     # first time this file has been included #
                     # - add it to the list #
                     IF include count < UPB included files THEN
                         # room to include this file #
                         included files[ include count +:= 1 ] := file name
                     ELSE
                         # too many include files #
                         error( "Too many include files: " + file name )
                     FI
                 FI;
                 op = "read" AND already included
            THEN
                # the file is already included and the operation is "read" so #
                # the pragma should be ignored #
                SKIP
            ELIF # check the include file depth #
                 include depth >= UPB in stack
            THEN
                # max include depth exceeded #
                put string( pragment );
                error( "Include files nested too deeply: " + file name )
            ELIF # the FILE is heap allocated as it must outlive this procedure #
                 REF FILE inc = HEAP FILE;
                 open( inc, file name, stand in channel ) /= 0
            THEN
                # couldn't open the file #
                put string( pragment );
                error( "Unable to include: " + file name )
            ELSE
                # file opened OK #
                in stack[ include depth +:= 1 ] := inc INIT HEAP INFILE
            FI
         ELSE
            # some other bold word #
            put string( bold word )
         FI # bold word or pragment # ;
    # copy the source to stand out, expanding read/include pragmas #
    in stack[ include depth := 0 ] := stand in INIT HEAP INFILE;
    next char;
    WHILE NOT at eof DO
        IF c = "#" THEN
            # brief comment #
            skip brief comment( "#", " " )
        ELIF c = "{" AND rs style brief comments THEN
            # nestable brief comment ( ALGOL 68RS and algol68toc ) #
            skip brief comment( "}", "{" )
        ELIF have string delimiter THEN
            # STRING or CHAR denotation #
            put string( get string )
        ELIF have bold THEN
            # have a bold word #
            bold word or pragment
        ELSE
            # anything else #
            put char( c );
            next char
        FI
    OD;
    IF error count > 0 THEN
        # had errors processing the source #
        print( ( "**** ", whole( error count, 0 ), " errors", newline ) )
    FI
END

Pre-processing the following program:

PR include "ex1.a68" PR

where ex1.a68 contains:

    PR precision 200 PR
    INT x := 1;
    PR read "in1.incl.a68" PR

and in1.incl.a68 contains:

    IF x > 0 THEN print( ( x, newline ) ) FI

Produces the following output:

    PR precision 200 PR
    INT x := 1;

    IF x > 0 THEN print( ( x, newline ) ) FI
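
The difference between read (include a file only the first time it is named) and include (include it every time) can be shown in a few lines of Python. This is a much simplified sketch and not a translation of the program above: the names expand and PRAGMA_RE are invented here, a regular expression is used instead of a character-by-character scan, so pragmatic comments inside comments and string denotations are not protected, doubled quotes in file names are not handled, and there is no 200 file limit.

```python
import re

# PR read "f" PR  or  PR include "f" PR ; PRAGMA may be used instead of PR,
# but the closing delimiter must be the same bold word as the opening one
PRAGMA_RE = re.compile(r'\b(PR|PRAGMA)\s+(read|include)\s+"([^"]+)"\s+\1\b')


def expand(text, load, seen=None, depth=0, max_depth=10):
    """Return text with read/include pragmatic comments replaced by file text.

    load maps a file name to the text of that file.
    seen is the set of file names included so far (shared by nested calls).
    """
    seen = set() if seen is None else seen

    def substitute(found):
        operation, name = found.group(2), found.group(3)
        already_included = name in seen
        seen.add(name)
        if operation == "read" and already_included:
            # read only includes a file the first time it is mentioned
            return ""
        if depth >= max_depth:
            raise RuntimeError("Include files nested too deeply: " + name)
        return expand(load(name), load, seen, depth + 1, max_depth)

    return PRAGMA_RE.sub(substitute, text)
```

Passing a dictionary lookup as load, e.g. expand(source, files.__getitem__), lets the behaviour be tried without creating any files.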


AWK

AWK does not have file inclusion as standard; however, some implementations, particularly GNU Awk, do provide it.
This uses @include as the file inclusion directive, as in GAWK.
It differs from GAWK syntax in that the include directive can appear inside or outside functions. The file name can be quoted or not. Nested includes are not supported.
The source can be a named file or read from stdin. If it is read from stdin, -v srcName=sourceName can be specified on the AWK command line to name the file. The pre-processed source is written to stdout.

# include.awk: simple file inclusion pre-processor
#
# the command line can specify:
#    -v srcName=<source file path>

BEGIN {
    # ensure srcName is a string, even if it was not specified
    srcName = srcName "";
}

{
    if( $1 == "@include" )
    {
        # must include a file
        includeFile( $0 );
    }
    else
    {
        # normal line
        printf( "%s\n", $0 );
    }
}

function includeFile( includeLine,                           fileName,
                                                             line )
{
    # get the file name from the @include line
    fileName = includeLine;
    sub( /^ *@include */, "", fileName );
    sub( / *$/, "", fileName );
    sub( / *#.*$/, "", fileName );
    if( fileName ~ /^"/ )
    {
        # quoted file name
        sub( /^"/, "", fileName );
        sub( /"$/, "", fileName );
        gsub( /""/, "\"", fileName );
    }
    printf( "#line 1 %s\n", fileName );
    while( ( ioStat = ( getline line < fileName ) ) > 0 )
    {
        # have a source line
        printf( "%s\n", line );
    }
    if( ioStat < 0 )
    {
        # I/O error
        printf( "@include %s # not found or I/O error\n", fileName );
    }
    close( fileName );
    printf( "#line %d %s\n", NR, ( srcName != "" ? srcName : FILENAME ) );
} # includeFile
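
For comparison, the two steps the AWK program performs (extracting the file name from an @include line, then bracketing the included text with #line comments) might be written in Python as below. This is a hedged sketch rather than an exact port: include_name and expand are names invented here, load is an assumed callable that maps a file name to a list of lines without line ends, and a missing file is reported in the same way as the AWK version reports an I/O error.

```python
import re


def include_name(line):
    """Return the file named by an @include line, or None for any other line."""
    fields = line.split()
    if not fields or fields[0] != "@include":
        return None
    name = re.sub(r"^ *@include *", "", line)   # drop the directive
    name = re.sub(r" *$", "", name)             # drop trailing spaces
    name = re.sub(r" *#.*$", "", name)          # drop a trailing comment
    if name.startswith('"'):
        # quoted file name: strip the quotes, "" stands for one quote
        name = name[1:]
        name = re.sub(r'"$', "", name)
        name = name.replace('""', '"')
    return name


def expand(lines, load, src_name="stdin"):
    """Return the lines with each @include replaced by the file's lines."""
    out = []
    for number, line in enumerate(lines, start=1):
        name = include_name(line)
        if name is None:
            out.append(line)
            continue
        out.append("#line 1 %s" % name)
        try:
            out.extend(load(name))
        except (OSError, KeyError):
            out.append("@include %s # not found or I/O error" % name)
        # tell the reader which line of the main source comes next
        out.append("#line %d %s" % (number, src_name))
    return out
```

As in the AWK version, nested includes are not expanded: lines read from an included file are copied through unchanged.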


Julia

Julia implements the include function, which includes a file into the current file for processing. Julia does JIT compiling, so neither the included file nor the file included into are necessarily compiled until their code is due to be run. One exception is the precompiling of modules, which allows preprocessed, precompiled code to be imported with the using or import keyword.

Since Julia already has an include function, adding another seems superfluous. However, one can be created for the purpose, and will be shown below. Such a Julia program could be compiled and run as the file preprocess.jl.

To use the program, run julia preprocess.jl infile.jl outfile.jl and include files using the standard Julia syntax. Calls to the include function whose single argument is a string literal in parentheses will be preprocessed. Calls to include with any other arguments will not be preprocessed by preprocess.jl.

# preprocess.jl convert includes to file contents

infile = length(ARGS) > 0 ? ARGS[1] : stdin
outfile = length(ARGS) > 1 ? ARGS[2] : stdout

# replaces a matched include("file") call with the contents of the file,
# keeping the whitespace on either side; the match is returned unchanged
# if the file cannot be read
function includefile(s)
    try
        m = match(r"(\s)include\(\"([^\"]+)\"\)(\s)", s)
        return m.captures[1] * read(m.captures[2], String) * m.captures[3]
    catch y
        @warn y
        return s
    end
end

input = read(infile, String)
output = replace(input, r"\sinclude\(\"[^\"]+\"\)\s" => includefile)
write(outfile, output)


Phix

Standard feature. Phix ships with a bunch of standard files in a builtins directory, most of which it knows how to "autoinclude", but some must be explicitly included (full docs). You can explicitly specify the builtins directory or not (obviously without it will look in the project directory first), and use the same mechanism for files you have written yourself. There is no limit to the number or depth of files that can be included. Relative directories are honoured, so if you specify a (partial) directory that is where it will look first for any sub-includes. You can also use single line "stub includes" to redirect include statements to different directories/versions. Note that namespaces are not supported by pwa/p2js. You can optionally use double quotes, but may then need to escape backslashes. Includes occur at compile time, as opposed to dynamically.

include builtins/complex.e
include complex.e             -- also valid
include "builtins\\complex.e" -- ditto

If the compiler detects that some file has already been included it does not do it again (from the same directory, two or more files of the same name can be included from different directories). I should perhaps also state that include handling is part of normal compilation/interpretation, as opposed to a separate "preprocessing" step, and that each file is granted a new private scope, and while of course there is only one "global" scope, it will use the implicit include hierarchy to automatically resolve any clashes that might arise to the most appropriate one, aka "if it works standalone it should work exactly the same when included in as part of a larger application".

And so on to the task as specified: Since the task specifies it "is about implementing a pre-processor for your language, not just describing its features" and as per discussions on the talk page, and the above, a "preprocessor" for Phix would fail in so many ways it is simply not really worth attempting, and should certainly never actually be used.
The following will replace include statements with file contents, but do not expect it to work or do anything useful on any existing [Phix] code. Mutually recursive includes will cause a mutually recursive infinite loop, until you run out of memory, that is without adding some kind of "already done" stack to the following.

function preprocess(string filename)
    sequence inlines = get_text(filename,GT_LF_STRIPPED),
             outlines = {}
    for l=1 to length(inlines) do
        string line = inlines[l]
        if match("include ",line)=1 then
            line = trim(line[9..match("--",line)-1],{' ','\t','"'})
            outlines &= preprocess(line)
        else
            outlines = append(outlines,line)
        end if
    end for
    return outlines
end function
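
The "already done" stack mentioned above could look something like the following Python sketch, which stops with an error instead of looping forever when includes are mutually recursive. It is an illustration only: preprocess, load and active are names chosen here, load is an assumed callable that maps a file name to a list of lines, and none of the real Phix include rules (search directories, private scopes, backslash escapes in quoted names) are modelled.

```python
def preprocess(name, load, active=None):
    """Return the lines of file name with include statements expanded.

    active is the stack of files currently being expanded; meeting one of
    them again means the includes are (mutually) recursive.
    """
    active = [] if active is None else active
    if name in active:
        raise ValueError("recursive include: " + " -> ".join(active + [name]))
    active.append(name)
    out = []
    for line in load(name):
        if line.startswith("include "):
            # drop any trailing -- comment, then surrounding blanks and quotes
            target = line[len("include "):].split("--")[0].strip(' \t"')
            out.extend(preprocess(target, load, active))
        else:
            out.append(line)
    active.pop()
    return out
```

Because a file is only on the stack while it is being expanded, the same file can still be included several times from different places; only genuine cycles are rejected.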

As the Wren entry eloquently puts it: The above code is limited in the sense that all top-level variables of the imported module (not just the specifically "global" ones) are now visible in the outer scope. Consequently, the code will no longer compile if there are any name clashes.

In fact pwa/p2js contains some code (see insert_dollars() in p2js.exw) that renames selected pre-known top-level variables in the standard includes, eg base64.e `aleph` -> `$aleph` to minimise such disruption, however there is (as yet) no such mechanism for user-written include files, though it is on the to-do list.


Raku

A file pre-processor is very nearly pointless for Raku. Raku is nominally an interpreted language; in actuality, it does single pass, just-in-time compilation on the fly as it loads the file. For libraries and constants (or semi constant variables) there is a robust built-in library (module) include system for code that can be pre-compiled and used on demand: write use some-library; to import any objects exported by the module.

That's great for code references and constants, but isn't really intended for text inclusion of external files. Since the text file can not be pre-compiled, there isn't much point to having such a system. (For code at least, there are many different templating system modules for text templates; primarily, though not exclusively used for on-the-fly generation of dynamic HTML / XML documents.) One of those could certainly be bent into doing something like what this task is asking for, but they are probably both over- and under-powered for the job.

  1. they are mostly intended for text, not code.
  2. they tend to be more geared toward inserting the content of a variable rather than the content of a file, although that could be subverted pretty easily.

So in Raku, there isn't a readily available system to do this, there isn't much point, and it's probably a bad idea...

Ah well, that never stopped us before.

A Raku script to do source filtering / preprocessing: save it and call it 'include'

unit sub MAIN ($file-name);
my $file = slurp $file-name;
put $file.=subst(/[^^|['{{' \s*]] '#include' \s+ (\S+) \s* '}}'?/, {run(«$*EXECUTABLE-NAME $*PROGRAM-NAME $0», :out).out.slurp(:close).trim}, :g);

This will find: any line starting with '#include' followed by an (absolute or relative) path to a file, or #include ./path/to/file.name enclosed in double curly brackets anywhere in the file.

It will replace the #include notation by the contents of the file referenced, will follow nested #includes arbitrarily deeply, and echo the processed file to STDOUT.

Let's test it out. Here is a test script and a bunch of include files.

Top level named... whatever, let's call it 'preprocess.raku'

# Top level test script file for #include Rosettacode
# 'Compiler/Simple file inclusion pre processor' task
# some code
say .³ for ^10;
#include ./include1.file

include1.file:


# included #include1 file >1>
# test to ensure it only tries to execute #include of the right format

# more code
say .³³ for ^10;
# and more comments

#include    ./include2.file    # comments ok at end of include line
# <1<

include2.file:


# nested #include2.file >2>
say "Test for an nested include inside a line: {{ #include ./include3.file }}";
# pointless but why not? <2<

include3.file:


>3> Yep, it works! <3<

Invoke at a command line:

raku include preprocess.raku

# Top level test script file for #include Rosettacode
# 'Compiler/Simple file inclusion pre processor' task

# some code

say .³ for ^10;

# included #include1 file >1>
# test to ensure it only tries to execute #include of the right format

# more code
say .³³ for ^10;
# and more comments

# nested #include2.file >2>
say "Test for an nested include inside a line: >3> Yep, it works! <3<";
# pointless but why not? <2<# comments ok at end of include line
# <1<

You can either redirect that into a file, or just pass it back into the compiler to execute it:

raku include preprocess.raku | raku

Test for an nested include inside a line: >3> Yep, it works! <3<

Note that this is not very robust, (it's 3 lines of code, what do you expect?) but it satisfies the task requirements as far as I can tell.


Wren

Wren already implements file inclusion via its import statement which can appear anywhere that a variable declaration can, including within a block, and can therefore be subject to a condition.

Any number of files can be imported in this way and can contain any Wren source code including other import statements to any level of nesting. However, Wren's virtual machine (VM) keeps track of the imported files and will only load a particular file once.

All the modules listed on the language's main page need to be imported in this way.

Although Wren is an interpreted language, source code is not interpreted directly. Instead a single pass compiler first converts the source code into a simple stack-based byte-code which is then interpreted by the VM.

However, there is no facility for outputting the byte-code to a separate file which can then be examined or manipulated prior to subsequent interpretation. Compilation and interpretation are both done as a single indivisible operation.

It will therefore be seen that Wren neither has nor needs a pre-processor; it doesn't support macros or other text substitution devices and imports are always in source code rather than binary form.

Nevertheless, it is possible to write a limited pre-processor in Wren (the VM itself is written in C):

import "io" for File

var source = File.read("source.wren")
var ix
var start = 0
while ((ix = source.indexOf("import", start)) && ix >= 0) {
    var ix2 = source.indexOf("\n", ix + 6)
    if (ix2 == -1) ix2 = source.count
    start = ix + 1
    var imp = source[ix...ix2]
    var tokens = imp.split(" ").where { |s| s != "" }.toList
    var filePath = tokens[1][1...-1]
    if (filePath.startsWith("./")) {
        filePath = filePath[2..-1]
    } else if (filePath.startsWith("/")) {
        filePath = filePath[1..-1]
    } else {
        continue // leave resolution of other modules to compiler
    }
    var text = File.read(filePath + ".wren")
    source = source[0...ix] + "\n%(text)\n" + source[ix2..-1]
}
File.create("source2.wren") { |file| file.writeBytes(source) }

The above code is limited in the sense that all top-level variables of the imported module (not just the specifically imported ones) are now visible in the outer scope. Consequently, the code will no longer compile if there are any name clashes.

The obvious solution of placing the imported code in a block and then 'lifting' the specifically imported variables into the outer scope does not work because of Wren's rather strange scoping rules. If you did this, then the imported module's top level variables would no longer be top-level relative to the code as a whole and hence would no longer be visible to classes defined within the module itself! For example, if you try to run the following script, you get the error shown:

var B
{   // block starts here
    var A = 2
    class C {
        static method1() {
            System.print(A) // Error at 'A': Variable is used but not defined.
        }
    }
    B = C
}   // end of block

Other problems include dealing with import statements which have been commented out (not catered for in the above pre-processor) and resolving import file paths, which is not set in stone but depends on how Wren is being embedded. The above pre-processor code only deals with paths which are relative to the current directory.