Compiler/Simple file inclusion pre processor

Task

Introduction

Many programming languages allow file inclusion, so that for example, standard declarations or code-snippets can be stored in a separate source file and be included into the programs that require them at compile-time, without the need for cut-and-paste.

The C pre-processor is probably the best-known example.

The purpose of this task is to demonstrate how a file inclusion pre-processor could be implemented in your language, for your language. This does not mean to imply that your language needs a file inclusion pre-processor - many languages don't have such a facility.
Other languages, on the other hand, do have file inclusion, e.g. C, COBOL, PL/1.

The distinction between compiled and interpreted languages should be irrelevant - this is a specialised text processing exercise - reading a source file and producing a modified source file that also contains the contents of one or more other files.

Hopefully, the solutions to this task will enable the file inclusion facilities of the languages to be compared, even for languages that don't need such facilities because they have sophisticated equivalents.

The pre-processor

Create a simple pre-processor that implements file-inclusion for your language.

The pre-processor need not check the validity of the resultant source. The pre-processor's job is purely to insert the specified files into the source at the required points. Whether the result is syntactically valid or not is a matter for the compiler/interpreter when it is presented with the source.

The syntax accepted for your pre-processor should be as per the standard for your language, should your language have such a facility. E.g. for C the pre-processor must recognise and handle "#include" directives. For PL/1, %include statements would be processed and for COBOL, COPY statements, and so on.

If your language does not have a standard facility for file-inclusion, implement that used by a popular compiler/interpreter for the language.
If there is no such feature, either use the C style #include directive or choose something of your own invention, e.g., #include would be problematic for languages where # introduces a comment.

State the syntax expected and any limitations, including whether nested includes are supported and if so, how deep the nesting can be.

Minimum requirements

As a minimum, your pre-processor must be able to process a source file (read from a file or standard input, as you prefer) and generate another source file (written to a file or standard output, as you prefer). The file-inclusion directives in the source should be replaced by the contents of the specified files. Implementing nested file inclusion directives (i.e., if an included file contains another file-inclusion directive) is optional.
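
As an illustration only, the minimum requirement can be sketched in Python. The directive syntax (a C-style #include "file" on a line of its own) and the names expand_includes and read_from_disk are assumptions made for this sketch, not part of the task. Included text is copied verbatim, so nested directives are not expanded.

```python
import re
from pathlib import Path

# Assumed directive syntax for this sketch: #include "file" on its own line.
INCLUDE = re.compile(r'^\s*#include\s+"([^"]+)"\s*$')


def read_from_disk(name):
    """Default loader (hypothetical helper): return the text of the named file."""
    return Path(name).read_text()


def expand_includes(source, load=read_from_disk):
    """Replace each include directive in source with the named file's text.

    Included text is copied as-is, so nested directives are not expanded.
    """
    out = []
    for line in source.splitlines(keepends=True):
        m = INCLUDE.match(line)
        if m:
            text = load(m.group(1))
            # Keep the output line-oriented even if the file lacks a final newline.
            if text and not text.endswith("\n"):
                text += "\n"
            out.append(text)
        else:
            out.append(line)
    return "".join(out)
```

To use it as a filter, one could pass sys.stdin.read() to expand_includes and write the result to sys.stdout. The load parameter exists only so that the file lookup can be swapped, for example for a dictionary of in-memory files.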

Pre-processors for some languages offer additional facilities, such as macro expansion and conditional compilation. Your pre-processor need not implement such things.

Notes

implementors: The Task is about implementing a pre-processor for your language, not just describing its features. Just as the task Calculating the value of e is not about using your language's in-built exp function but showing how e could be calculated, this is about showing how file inclusion could be implemented - even if the compiler/interpreter you are using already has such a facility.
the pre-processors on this page are supplied as-is, with no warranty - use at your own peril : )

See Also

include a file

ALGOL 68

Should work with any Algol 68 implementation that uses upper-stropping.

Implements file inclusion via pragmatic comments as in ALGOL 68G.

A pragmatic comment such as PR read "somefile.incl.a68" PR or PR include "somefile.incl.a68" PR can appear anywhere in the source and will cause the text of somefile.incl.a68 to be included at that point (Note, ALGOL 68G does not support "include" as an alternative to "read").
The PR...PR will not be recognised inside comments or string literals and cannot appear inside a symbol, i.e. 1PR...PR2 is 1 followed by a pragmatic comment followed by 2.
PR can also be written as PRAGMA.
In ALGOL 68G, PR read ... only includes the file if it has not already been included. This implementation handles that, but at most 200 different files can be included.
Includes can be nested to a depth of 10.
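
The read-once and nesting-depth rules above can be restated as a short Python sketch. It is a simplification, not a port of the program below: it uses a regular expression, so pragmas inside comments and string literals are not skipped, doubled quotes in file names are not handled, and the 200-file limit is not modelled. The names expand, load and MAX_DEPTH are hypothetical.

```python
import re

MAX_DEPTH = 10  # nesting limit stated in the description above

# Simplification: a pragma is recognised anywhere, even inside comments/strings.
PRAGMA = re.compile(
    r'(?<![A-Z_])(PR|PRAGMA)\s*(read|include)\s*"([^"]+)"\s*\1(?![A-Z_])'
)


def expand(text, load, seen=None, depth=0):
    """Expand read/include pragmas; 'read' skips files that were already seen."""
    seen = set() if seen is None else seen

    def replace(m):
        op, name = m.group(2), m.group(3)
        already = name in seen
        seen.add(name)
        if op == "read" and already:
            return ""  # read-once: the pragma is dropped
        if depth >= MAX_DEPTH:
            return m.group(0)  # nested too deeply: leave the pragma in place
        return expand(load(name), load, seen, depth + 1)

    return PRAGMA.sub(replace, text)
```

Here load is any callable that maps a file name to its text. Matches are handled left to right and nested files are expanded inside the callback, so the set of seen files fills in source order.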

When run, the pre-processor will read source from standard input and write the resultant source to standard output. If standard output is re-directed to a temporary source file, it can then be compiled/interpreted with the actual Algol 68 compiler.
(NB: The source must end with a line-feed)

# Algol 68 pre-processor                                                      #
# Processes read/include pragmas in an Algol 68 upper stropping source        #
#    It is assumed _ is allowed in tags and bold words                        #
#    It is assumed that {} are alternatives for () as in ALGOL 68G            #
#       to use {} as ALGOL 68RS/algol68toc style nestable brief-comments,     #
#       change rs style brief comments to TRUE                                #
#    ALGOL 68G ( and probably other compilers ) allow quote-stropped bold     #
#       words to appear in an otherwise upper-stropped source,                #
#           e.g. BEGIN 'SKIP' END would be a valid program                    #
#       this is not supported here so pragmatic comments such as:             #
#       'PR' read "someFile" 'PR' will cause problems                         #
#    pragmatic comments should be disabled by PR pragmats off PR,             #
#       this is not implemented                                               #
#    the read/include must be in lower case                                   #
#    ALGOL 68G's read pragmatic comment only includes the file the first time #
#       it is mentioned in a read pragmatic comment - this is implemented by  #
#       keeping a list of the included files - the list is limited to 200     #
#       entries                                                               #
BEGIN

    # TRUE if {} delimits a nestable brief comment, as in ALGOL 68RS and      #
    #      algol68toc, FALSE if {} are alternatives to () as in ALGOL 68G     #
    BOOL rs style brief comments = FALSE;

    # input file information                                                  #
    MODE INFILE = STRUCT( REF FILE src         # actual source file           #
                        , STRING   line        # latest source line           #
                        , INT      pos         # character position in line   #
                        );
    # initialises the INFILE f to be associated with the FILE src             #
    PRIO INIT = 9;
    OP   INIT = ( REF FILE src, REF INFILE f )REF INFILE:
         BEGIN
            line OF f := "";
            pos  OF f := 1 + UPB line OF f;
            src  OF f := src;
            set eof handler( f );
            f
         END # INIT # ;
    # TRUE if EOF has been reached, FALSE otherwise                           #
    BOOL at eof := FALSE;
    CHAR c      := " ";
    # newline character                                                       #
    CHAR nl      = REPR 10;
    # maximum number of include files that can be nested                      #
    INT max include depth = 10;
    # source file stack                                                       #
    [ 0 : max include depth ]INFILE in stack;
    # current include depth                                                   #
    INT include depth := 0;
    # number of errors reported                                               #
    INT error count   := 0;
    # number of included files                                                #
    INT include count := 0;
    # names of previously included files                                      #
    [ 1 : 200 ]STRING included files;

    # sets the logical file end procedure of the specified file to a routine  #
    # that allows us to detect EOF on a source file                           #
    PROC set eof handler = ( REF INFILE inf )VOID:
         on logical file end( src OF inf
                            , ( REF FILE f )BOOL:
                              BEGIN
                                  # note that we reached EOF on the          #
                                  # latest read                              #
                                  IF NOT at eof
                                  THEN
                                      # first time we have spotted eof,      #
                                      # we need to call newline so that      #
                                      # if the last line didn't have a       #
                                      # newline at the end, it is still read #
                                      # however that will call this routine  #
                                      # so we have to ensure we only do it   #
                                      # once                                 #
                                      at eof := TRUE;
                                      newline( f )
                                  FI;
                                  # return TRUE so processing can continue   #
                                  TRUE
                              END
                            );

    # reports an error                                                        #
    PROC error = ( STRING message )VOID:
         BEGIN
            error count +:= 1;
            print( ( newline, newline, "**** ", message, newline ) )
         END # error # ;
    # gets the next source character, handling end-of-file on include files   #
    # the source character is stored in c                                     #
    PROC next char = VOID:
         BEGIN
            WHILE
                BOOL read again := FALSE;
                REF INFILE s = in stack[ include depth ];
                IF pos OF s <= UPB line OF s THEN
                    # not past the end of the source line                     #
                    c := ( line OF s )[ pos OF s ];
                    pos OF s +:= 1
                ELIF
                    # past the end of the current source line - get the next  #
                    at eof := FALSE;
                    get( src OF s, ( line OF s, newline ) );
                    NOT at eof
                THEN
                    # got a new line                                          #
                    line OF s +:= nl;
                    pos  OF s  := LWB line OF s;
                    read again := TRUE
                ELIF include depth = 0 THEN
                    # eof on the main source                                  #
                    line OF s := ""
                ELSE
                    # got eof on an include file                              #
                    include depth -:= 1;
                    read again     := TRUE;
                    at eof         := FALSE;
                    close( src OF s )
                FI;
                read again
            DO SKIP OD
         END # next char # ;
    # returns TRUE if the current character is whitespace                     #
    PROC have whitespace = BOOL: c <= " ";
    # returns TRUE if the current character is a string delimiter             #
    PROC have string delimiter = BOOL: c = """";
    # returns TRUE if the current character can start a bold word             #
    PROC have bold = BOOL: c >= "A" AND c <= "Z";
    # returns TRUE if the current character can start a brief tag             #
    PROC have tag  = BOOL: c >= "a" AND c <= "z";
    # reports an unterminated construct ( e.g. string, comment )              #
    PROC unterminated = ( STRING construct )VOID:
         error( "Unterminated " + construct );
    # outputs ch to stand out                                                 #
    PROC put char = ( CHAR ch )VOID:
         IF ch = nl THEN print( ( newline ) ) ELSE print( ch ) FI;
    # outputs str to stand out                                                #
    PROC put string = ( STRING str )VOID: print( ( str ) );
    # outputs a brief comment to stand out                                    #
    #    end char is the closing delimiter,                                   #
    #    nested char is the opening delimiter for nestable brief comments     #
    #        if nested char is blank, the brief comment does not nest         #
    #    this handles ALGOL 68RS and algol68toc style {} comments             #
    PROC skip brief comment = ( CHAR end char, CHAR nested char )VOID:
         BEGIN
            put char( c );
            WHILE next char;
                  NOT at eof AND c /= end char
            DO
                IF c = nested char AND NOT have whitespace THEN
                    # nested brief comment                                    #
                    skip brief comment( end char, nested char )
                ELSE
                    # normal comment char                                     #
                    put char( c )
                FI
            OD;
            IF at eof THEN
                # unterminated comment                                        #
                unterminated( """" + end char + """ comment" );
                c := end char
            FI;
            put char( c );
            next char
         END # skip brief comment # ;
    # gets a string of spaces from the source                                 #
    PROC get whitespace = STRING:
         BEGIN
            STRING result := "";
            WHILE NOT at eof AND have whitespace DO result +:= c; next char OD;
            result
         END # get whitespace # ;
    # gets a string denotation from the source                                #
    PROC get string = STRING:
         BEGIN
            STRING result := "";
            # within a string denotation, "" denotes the " character          #
            WHILE have string delimiter DO
                WHILE result +:= c;
                      next char;
                      NOT at eof AND NOT have string delimiter
                DO SKIP OD;
                IF NOT have string delimiter THEN
                    # unterminated string                                     #
                    unterminated( "string" );
                    c := """"
                FI;
                result +:= c;
                next char
            OD;
            result
         END # get string # ;
    # returns s unquoted                                                      #
    PROC unquote string = ( STRING s )STRING:
         BEGIN
            STRING result := "";
            # within a string denotation, "" denotes the " character          #
            INT c pos := LWB s + 1;
            WHILE c pos < UPB s DO
                CHAR ch = s[ c pos ];
                IF ch = """" THEN
                    # have an embedded quote - it will be doubled             #
                    c pos +:= 1
                FI;
                result +:= ch;
                c pos  +:= 1
            OD;
            result
         END # unquote string # ;
    # gets a bold word from the source                                        #
    PROC get bold word = STRING:
         BEGIN
            STRING result := "";
            WHILE have bold OR c = "_" DO result +:= c; next char OD;
            result
         END # get bold word # ;
    # gets a brief tag from the source                                        #
    PROC get tag = STRING:
         BEGIN
            STRING result := "" ;
            WHILE have tag OR c = "_" DO result +:= c; next char OD;
            result
         END # get tag # ;
    # copies the source to the output until a bold word is encountered        #
    PROC skip to bold = STRING:
         IF at eof
         THEN ""
         ELSE STRING result := "";
              WHILE put char( c );
                    next char;
                    NOT at eof
                AND NOT have bold
              DO SKIP OD;
              IF NOT at eof THEN result := get bold word FI;
              result
         FI # skip to bold # ;
    # handles a bold PRAGMA, COMMENT or other bold word                       #
    PROC bold word or pragment = VOID:
         IF STRING bold word := get bold word;
            bold word = "CO" OR bold word = "COMMENT"
         THEN
            # have a bold comment                                             #
            STRING delimiter = bold word;
            WHILE put string( bold word );
                  bold word := skip to bold;
                  NOT at eof
              AND bold word /= delimiter
            DO SKIP OD;
            IF at eof THEN
                # unterminated comment                                        #
                unterminated( """" + delimiter + """ comment" )
            FI;
            put string( delimiter )
         ELIF bold word = "PR" OR bold word = "PRAGMA"
         THEN
            # have a pragmatic comment - could be file inclusion              #
            STRING delimiter  = bold word;
            STRING pragment  := bold word;
            STRING op        := "";
            STRING file name := "";
            # skip spaces after the PR/PRAGMA                                 #
            pragment +:= get whitespace;
            # get the operation, if there is a tag                            #
            IF have tag THEN
                # have an operation                                           #
                op        := get tag;
                pragment +:= op + get whitespace
            FI;
            # get the file name, if there is one                              #
            IF have string delimiter THEN
                # have a file name                                            #
                file name := get string;
                pragment +:= file name + get whitespace;
                file name := unquote string( file name )
            FI;
            # should now have the closing delimiter                           #
            IF NOT have bold THEN
                # no bold word in/at-the-end-of the pragment                  #
                bold word := ""
            ELSE
                # have a bold word - could be the delimiter                   #
                pragment +:= ( bold word := get bold word )
            FI;
            IF ( op /= "read" AND op /= "include" )
            OR file name  = ""
            OR bold word /= delimiter
            THEN
                # not a read/include pragmatic comment                        #
                put string( pragment );
                IF bold word /= delimiter THEN
                    # haven't got the closing delimiter yet                   #
                    WHILE bold word := skip to bold;
                          NOT at eof
                      AND bold word /= delimiter
                    DO SKIP OD;
                    IF at eof THEN
                        # unterminated comment                                #
                        unterminated( """" + delimiter + """" )
                    FI;
                    put string( delimiter )
                FI
            ELIF # check for an already included file  and add the name to    #
                 # the list if it hasn't been included before                 #
                 BOOL already included := FALSE;
                 FOR file pos TO include count
                 WHILE NOT ( already included := included files[ file pos ] = file name )
                 DO SKIP OD;
                 IF NOT already included THEN
                     # first time this file has been included                 #
                     # - add it to the list                                   #
                     IF include count < UPB included files THEN
                         # room to include this file                          #
                         included files[ include count +:= 1 ] := file name
                     ELSE
                         # too many include files                             #
                         error( "Too many include files: " + file name )
                     FI
                 FI;
                 op = "read" AND already included
            THEN
                # the file is already included and the operation is "read" so #
                # the pragma should be ignored                                #
                SKIP
            ELIF
                # check the include file depth                                #
                include depth >= UPB in stack
            THEN
                # max include depth exceeded                                  #
                put string( pragment );
                error( "Include files nested too deeply: " + file name )
            ELIF REF FILE inc := HEAP FILE;
                 open( inc, file name, stand in channel ) /= 0
            THEN
                # couldn't open the file                                      #
                put string( pragment );
                error( "Unable to include: " + file name )
            ELSE
                # file opened OK                                              #
                in stack[ include depth +:= 1 ] := inc INIT HEAP INFILE
            FI
         ELSE
            # some other bold word                                            #
            put string( bold word )
         FI # bold word or pragment # ;

    # copy the source to stand out, expanding read/include pragmas            #

    in stack[ include depth := 0 ] := stand in INIT HEAP INFILE;
    next char;
    WHILE NOT at eof DO
        IF   c = "#" THEN
            # brief comment                                                   #
            skip brief comment( "#", " " )
        ELIF c = "{" AND rs style brief comments THEN
            # nestable brief comment ( ALGOL 68RS and algol68toc )            #
            skip brief comment( "}", "{" )
        ELIF have string delimiter THEN
            # STRING or CHAR denotation                                       #
            put string( get string )
        ELIF have bold THEN
            # have a bold word                                                #
            bold word or pragment
        ELSE
            # anything else                                                   #
            put char( c );
            next char
        FI
    OD;

    IF error count > 0 THEN
        # had errors processing the source                                    #
        print( ( "**** ", whole( error count, 0 ), " errors", newline ) )
    FI

END

Output:

Pre-processing the following program:

PR include "ex1.a68" PR

where ex1.a68 contains:

BEGIN
    PR precision 200 PR
    INT x := 1;
    PR read "in1.incl.a68" PR
END

and in1.incl.a68 contains:

    IF x > 0 THEN print( ( x, newline ) ) FI

Produces the following output:

BEGIN
    PR precision 200 PR
    INT x := 1;

    IF x > 0 THEN print( ( x, newline ) ) FI
END

AWK

AWK does not have file inclusion as standard; however, some implementations, particularly GNU Awk, do provide it.
This uses @include as the file inclusion directive, as in GAWK.
It differs from GAWK syntax in that the include directive can appear inside or outside functions. The file name can be quoted or not. Nested includes are not supported.
The source can be a named file or read from stdin. If it is read from stdin, -v srcName=sourceName can be specified on the AWK command line to name the file. The pre-processed source is written to stdout.

# include.awk: simple file inclusion pre-processor
#
#    the command line can specify:
#        -v srcName=<source file path>

BEGIN {
    srcName  = srcName "";
} # BEGIN

{
    if( $1 == "@include" )
    {
        # must include a file
        includeFile( $0 );
    }
    else
    {
        # normal line
        printf( "%s\n", $0 );
    }
}

function includeFile( includeLine,                                   fileName,
                                                                       ioStat,
                                                                         line )
{
    # get the file name from the @include line
    fileName = includeLine;
    sub(  /^ *@include */, "", fileName );
    sub(  / *$/,           "", fileName );
    sub(  / *#.*$/,        "", fileName );
    if( fileName ~ /^"/ )
    {
        # quoted file name
        sub(  /^"/,        "", fileName );
        sub(  /"$/,        "", fileName );
        gsub( /""/,      "\"", fileName );
    }
    printf( "#line 1 %s\n",    fileName );
    while( ( ioStat = ( getline line < fileName ) ) > 0 )
    {
        # have a source line
        printf( "%s\n", line );
    }
    if( ioStat < 0 )
    {
        # I/O error
        printf( "@include %s # not found or I/O error\n", fileName );
    }
    close( fileName );
    printf( "#line %d %s\n", NR, ( srcName != "" ? srcName : FILENAME ) );

} # includeFile
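
For comparison, the file-name extraction and the #line markers emitted by includeFile can be restated in Python. This is a sketch only: include_target, expand and load are hypothetical names, and the missing-file message of the AWK version is not reproduced.

```python
import re


def include_target(line):
    """Return the file name from an '@include' line, or None for other lines."""
    fields = line.split()
    if not fields or fields[0] != "@include":
        return None
    name = re.sub(r"^ *@include *", "", line, count=1)  # drop the directive
    name = re.sub(r" *$", "", name, count=1)            # drop trailing spaces
    name = re.sub(r" *#.*$", "", name, count=1)         # drop a trailing comment
    if name.startswith('"'):
        # quoted file name: strip the outer quotes, undouble embedded ones
        name = re.sub(r'^"', "", name, count=1)
        name = re.sub(r'"$', "", name, count=1)
        name = name.replace('""', '"')
    return name


def expand(source_lines, source_name, load):
    """Expand @include lines, bracketing each file with #line markers."""
    out = []
    for number, line in enumerate(source_lines, start=1):
        name = include_target(line)
        if name is None:
            out.append(line)
            continue
        out.append(f"#line 1 {name}")
        out.extend(load(name).splitlines())
        out.append(f"#line {number} {source_name}")
    return out
```

As in the AWK version, the second marker carries the line number of the @include line itself, and included text is not scanned for further directives.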

J

Preprocessing is a task which, if used at all, would more likely be tackled in J through monkey-patching (wrapping library definitions for names, overriding their prior definition).

Instead, here, we handle literal statements of the form load'script references' where 'load' appears at the beginning of the line (not indented) and 'script references' is a literal string, and nothing else appears on the line, and we replace any such line(s) with the content of the referenced script(s).

This approach is not recursive (and while it seems to offer little advantage over the native implementation of 'load', it does support 'load' inside multi-line string constants, as long as the reference(s) to the content being loaded would be supportable as a J script reference).

preproc=: {{
  lines=. <;.2 LF,~CR-.~fread y
  for_ndx. |.I.'load'-:"1 (4&{.@>) lines do.
    line=. ndx{::lines
    try. parse=. ;:line catch. continue. end.
    if. 3~:#parse do. continue. end.
    if. (<'load')~:{.parse do. continue. end.
    if. ''''~:(1;0){::parse do. continue. end.
    lines=. lines ndx}~ <;fread each getscripts_j_ ".1{::parse
  end.
  0!:0;lines
}}
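
The line-matching rule described above can also be sketched in Python. This is a loose illustration rather than a translation of preproc: it assumes that script references are plain whitespace-separated file names (J's own name resolution through getscripts_j_ is ignored), and it returns the expanded text instead of executing it. The names LOAD_LINE, preproc and load are hypothetical here.

```python
import re

# An unindented load followed by one quoted literal and nothing else.
# Inside the literal, '' stands for a single quote character.
LOAD_LINE = re.compile(r"^load\s*'((?:[^']|'')*)'\s*$")


def preproc(text, load):
    """Replace each qualifying load line with the text of the named scripts."""
    out = []
    for line in text.replace("\r", "").splitlines():
        m = LOAD_LINE.match(line)
        if m:
            refs = m.group(1).replace("''", "'").split()
            # Assumption: each reference is simply a file name.
            out.extend(load(ref).rstrip("\n") for ref in refs)
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

Like the approach described above, the sketch is not recursive: load lines inside the inserted scripts are left as they are.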

Julia

Julia implements the include function, which includes a file into the current file for processing. Julia does JIT compiling, so neither the included file nor the file included into are necessarily compiled until their code is due to be run. One exception is the precompiling of modules, which allows preprocessed, precompiled code to be imported with the using or import keyword.

Since Julia already has an include function, adding another seems superfluous. However, one can be created for the purpose, and will be shown below. Such a Julia program could be compiled and run as the file preprocess.jl.

To use the program, run julia preprocess.jl infile.jl outfile.jl and include files using the standard Julia syntax. Calls to the include function that contain a single argument which is a string in parentheses will be preprocessed. Other calls to include with different arguments will not be preprocessed by preprocess.jl.

# preprocess.jl convert includes to file contents

infile = length(ARGS) > 0 ? ARGS[1] : stdin
outfile = length(ARGS) > 1 ? ARGS[2] : stdout

function includefile(s)
    try
        m = match(r"(\s)include\(\"([^\"]+)\"\)(\s)", s)
        return m.captures[1] * read(m.captures[2], String) * m.captures[3]
    catch y
        @warn y
        return s
    end
end

input = read(infile, String)
output = replace(input, r"\sinclude\(\"[^\"]+\"\)\s" => includefile)
write(outfile, output)
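
The same regex-with-callback technique can be sketched in Python for comparison. The names INCLUDE_CALL, inline_includes and load are hypothetical, and the pattern is modelled on the one above: an include call with a single string literal, with whitespace on both sides.

```python
import re

# include("file") with a single string literal, whitespace on both sides.
INCLUDE_CALL = re.compile(r'(\s)include\("([^"]+)"\)(\s)')


def inline_includes(source, load):
    """Replace matching include calls with the text returned by load."""

    def replace(m):
        try:
            return m.group(1) + load(m.group(2)) + m.group(3)
        except (OSError, KeyError):
            # The file could not be read: leave the call unchanged.
            return m.group(0)

    return INCLUDE_CALL.sub(replace, source)
```

Because each match consumes the whitespace on both sides, two include calls separated by a single newline are not both replaced in this Python version: the second call has no unconsumed whitespace in front of it. The Julia pattern appears to share this property, as well as the need for whitespace before the first call, so leaving a blank line between consecutive include calls is the safer layout.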

Phix

Standard feature. Phix ships with a bunch of standard files in a builtins directory, most of which it knows how to "autoinclude", but some must be explicitly included (full docs). You can explicitly specify the builtins directory or not (obviously without it will look in the project directory first), and use the same mechanism for files you have written yourself. There is no limit to the number or depth of files that can be included. Relative directories are honoured, so if you specify a (partial) directory that is where it will look first for any sub-includes. You can also use single line "stub includes" to redirect include statements to different directories/versions. Note that namespaces are not supported by pwa/p2js. You can optionally use double quotes, but may then need to escape backslashes. Includes occur at compile time, as opposed to dynamically.

include builtins/complex.e
include complex.e             -- also valid
include "builtins\\complex.e" -- ditto

If the compiler detects that some file has already been included it does not do it again (from the same directory, two or more files of the same name can be included from different directories). I should perhaps also state that include handling is part of normal compilation/interpretation, as opposed to a separate "preprocessing" step, and that each file is granted a new private scope, and while of course there is only one "global" scope, it will use the implicit include hierarchy to automatically resolve any clashes that might arise to the most appropriate one, aka "if it works standalone it should work exactly the same when included in as part of a larger application".

And so on to the task as specified: Since the task specifies it "is about implementing a pre-processor for your language, not just describing its features" and as per discussions on the talk page, and the above, a "preprocessor" for Phix would fail in so many ways it is simply not really worth attempting, and should certainly never actually be used.
The following will replace include statements with file contents, but do not expect it to work or do anything useful on any existing [Phix] code. Mutually recursive includes will recurse until you run out of memory, unless some kind of "already done" stack is added to the following.

function preprocess(string filename)
    sequence inlines = get_text(filename,GT_LF_STRIPPED),
             outlines = {}
    for l=1 to length(inlines) do
        string line = inlines[l]
        if match("include ",line)=1 then
            line = trim(line[9..match("--",line)-1],{' ','\t','"'})
            outlines &= preprocess(line)
        else
            outlines &= line
        end if
    end for
    return outlines
end function

As the Wren entry eloquently puts it: The above code is limited in the sense that all top-level variables of the imported module (not just the specifically "global" ones) are now visible in the outer scope. Consequently, the code will no longer compile if there are any name clashes.

In fact pwa/p2js contains some code (see insert_dollars() in p2js.exw) that renames selected pre-known top-level variables in the standard includes, eg base64.e `aleph` -> `$aleph` to minimise such disruption, however there is (as yet) no such mechanism for user-written include files, though it is on the to-do list.
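
Returning to the mutual-recursion caveat, one way an "already done" record could look is sketched below in Python. This is an assumption about how such a guard might be added, not something the Phix entry implements; preprocess, load and done are hypothetical names, and escaped backslashes in quoted names are not handled.

```python
def preprocess(filename, load, done=None):
    """Return the lines of filename with include statements expanded once."""
    done = set() if done is None else done
    if filename in done:
        return []  # already included: contribute nothing, which breaks cycles
    done.add(filename)
    out = []
    for line in load(filename).splitlines():
        if line.startswith("include "):
            # Drop a trailing -- comment, then surrounding spaces, tabs, quotes.
            target = line[len("include "):].split("--", 1)[0].strip(' \t"')
            out.extend(preprocess(target, load, done))
        else:
            out.append(line)
    return out
```

Each file contributes its text at most once, which mirrors the compiler behaviour described earlier of not including a file a second time. The scoping problems noted above still apply to the flattened result.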

Raku

A file pre-processor is very nearly pointless for Raku. Raku is nominally an interpreted language; in actuality, it does single pass, just-in-time compilation on the fly as it loads the file. For libraries and constants (or semi constant variables) there is a robust built-in library (module) include system for code that can be pre-compiled and used on demand. Write use some-library; to import any objects exported by the module.

That's great for code references and constants, but isn't really intended for text inclusion of external files. Since the text file can not be pre-compiled, there isn't much point to having such a system. (For code at least, there are many different templating system modules for text templates; primarily, though not exclusively used for on-the-fly generation of dynamic HTML / XML documents.) One of those could certainly be bent into doing something like what this task is asking for, but they are probably both over- and under-powered for the job:

they are mostly intended for text, not code.
they tend to be more geared toward inserting the content of a variable rather than the content of a file, although that could be subverted pretty easily.

So in Raku, there isn't a readily available system to do this, there isn't much point, and it's probably a bad idea...

Ah well, that never stopped us before.

A Raku script to do source filtering / preprocessing: save it and call it 'include'

unit sub MAIN ($file-name);
my $file = slurp $file-name;
put $file.=subst(/[^^|['{{' \s*]] '#include' \s+ (\S+) \s* '}}'?/, {run(«$*EXECUTABLE-NAME $*PROGRAM-NAME $0», :out).out.slurp(:close).trim}, :g);

This will find any line starting with '#include' followed by an (absolute or relative) path to a file, or an #include ./path/to/file.name directive enclosed in double curly brackets anywhere in the file.

It will replace the #include notation by the contents of the file referenced, will follow nested #includes arbitrarily deeply, and echo the processed file to STDOUT.

Let's test it out. Here is a test script and a bunch of include files.

Top level named... whatever, let's call it 'preprocess.raku'

# Top level test script file for #include Rosettacode
# 'Compiler/Simple file inclusion pre processor' task

# some code

say .³ for ^10;

#include ./include1.file

include1.file

# included #include1 file >1>
# test to ensure it only tries to execute #include of the right format

# more code
say .³³ for ^10;
# and more comments

#include    ./include2.file    # comments ok at end of include line
# <1<

include2.file

# nested #include2.file >2>
say "Test for an nested include inside a line: {{ #include ./include3.file }}";
# pointless but why not? <2<

include3.file

>3> Yep, it works! <3<

Invoke at a command line:

raku include preprocess.raku

Output:

# Top level test script file for #include Rosettacode
# 'Compiler/Simple file inclusion pre processor' task

# some code

say .³ for ^10;

# included #include1 file >1>
# test to ensure it only tries to execute #include of the right format

# more code
say .³³ for ^10;
# and more comments

# nested #include2.file >2>
say "Test for an nested include inside a line: >3> Yep, it works! <3<";
# pointless but why not? <2<# comments ok at end of include line
# <1<

You can either redirect that into a file, or just pass it back into the compiler to execute it:

raku include preprocess.raku | raku

Output:

0
1
8
27
64
125
216
343
512
729
0
1
8589934592
5559060566555523
73786976294838206464
116415321826934814453125
47751966659678405306351616
7730993719707444524137094407
633825300114114700748351602688
30903154382632612361920641803529
Test for an nested include inside a line: >3> Yep, it works! <3<

Note that this is not very robust (it's 3 lines of code, what do you expect?), but it satisfies the task requirements as far as I can tell.

Wren

Wren already implements file inclusion via its import statement which can appear anywhere that a variable declaration can, including within a block, and can therefore be subject to a condition.

Any number of files can be imported in this way and can contain any Wren source code including other import statements to any level of nesting. However, Wren's virtual machine (VM) keeps track of the imported files and will only load a particular file once.
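A textual pre-processor can mimic this load-once behaviour by remembering which files it has already expanded. The Python sketch below is illustrative only and says nothing about how Wren's VM does it: the function name `expand`, the `#include name` directive syntax and the use of a dictionary in place of real files are all assumptions.

```python
import re

# Hypothetical directive syntax for this sketch: a whole line of the form  #include name
INCLUDE = re.compile(r"^#include\s+(\S+)\s*$")

def expand(name, files, seen=None):
    """Return the lines of files[name] with includes expanded, each file at most once."""
    if seen is None:
        seen = set()
    if name in seen:
        return []              # already loaded: contribute nothing the second time
    seen.add(name)
    out = []
    for line in files[name].splitlines():
        m = INCLUDE.match(line)
        if m:
            out.extend(expand(m.group(1), files, seen))
        else:
            out.append(line)
    return out

if __name__ == "__main__":
    files = {
        "main": "#include a\n#include b\nmain",
        "a": "#include c\nA",
        "b": "#include c\nB",
        "c": "C",
    }
    print(expand("main", files))   # ['C', 'A', 'B', 'main']
```

A side effect of tracking the visited set is that mutually including files no longer cause infinite recursion.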

All the modules listed on the language's main page need to be imported in this way.

Although Wren is an interpreted language, source code is not interpreted directly. Instead a single pass compiler first converts the source code into a simple stack-based byte-code which is then interpreted by the VM.

However, there is no facility for outputting the byte-code to a separate file which can then be examined or manipulated prior to subsequent interpretation. Compilation and interpretation are both done as a single indivisible operation.

It will therefore be seen that Wren neither has nor needs a pre-processor; it doesn't support macros or other text substitution devices and imports are always in source code rather than binary form.

Nevertheless, it is possible to write a limited pre-processor in Wren (the VM itself is written in C):

import "io" for File

var source = File.read("source.wren")

var ix
var start = 0 
while ((ix = source.indexOf("import", start)) && ix >= 0) {
    var ix2 = source.indexOf("\n", ix + 6)
    if (ix2 == -1) ix2 = source.count
    start = ix + 1
    var imp = source[ix...ix2]
    var tokens = imp.split(" ").where { |s| s != "" }.toList
    var filePath = tokens[1][1...-1]
    if (filePath.startsWith("./")) {
        filePath = filePath[2..-1]
    } else if (filePath.startsWith("/")) {
        filePath = filePath[1..-1]
    } else {
        continue // leave resolution of other modules to compiler
    }
    var text = File.read(filePath + ".wren")
    source = source[0...ix] + "\n%(text)\n" + source[ix2..-1]
}

File.create("source2.wren") { |file| file.writeBytes(source) }

The above code is limited in the sense that all top-level variables of the imported module (not just the specifically imported ones) are now visible in the outer scope. Consequently, the code will no longer compile if there are any name clashes.

The obvious solution of placing the imported code in a block and then 'lifting' the specifically imported variables into the outer scope does not work because of Wren's rather strange scoping rules. If you did this, then the imported module's top level variables would no longer be top-level relative to the code as a whole and hence would no longer be visible to classes defined within the module itself! For example, if you try to run the following script, you get the error shown:

var B

{ // block starts here
    var A = 2
 
    class C { 
        static method() {
            System.print(A)   // Error at 'A': Variable is used but not defined.   
        }
    }

    B = C
} // end of block

B.method()

Other problems include dealing with import statements which have been commented out (not catered for in the above pre-processor) and resolving import file paths, the rules for which are not actually set in stone but depend on how Wren is being embedded. The above pre-processor code only deals with paths which are relative to the current directory.
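One way to avoid picking up commented-out imports is to accept an import statement only when it is the first thing on its line. The following Python sketch shows the idea under stated assumptions: the name `live_imports` is hypothetical, only `//` line comments are handled (block comments are not), and an import that follows other code on the same line is missed.

```python
import re

# Matches an import statement only when it is the first thing on its line,
# so a line such as  // import "./old" for X  is not matched.
IMPORT = re.compile(r'^[ \t]*import[ \t]+"([^"\n]+)"', re.MULTILINE)

def live_imports(source):
    """Return the module paths of import statements that start a line."""
    return IMPORT.findall(source)

if __name__ == "__main__":
    sample = (
        'import "./a" for A\n'
        '// import "./b" for B\n'
        '    import "./c" for C\n'
        'var s = "import"\n'
    )
    print(live_imports(sample))   # ['./a', './c']
```

Anchoring the match at the start of a line also stops the word import inside a string literal or identifier from being mistaken for a statement.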