Compiler/Simple file inclusion pre processor
- Task
Many programming languages allow file inclusion so that, for example, standard declarations or code snippets can be stored in a separate source file and included at compile time in the programs that require them, without the need for cut-and-paste.
Probably the C pre-processor is the most well-known example.
Create a simple pre-processor that implements file-inclusion for your language.
The pre-processor need not implement macros, conditional compilation, etc.
The syntax accepted for your pre-processor should be as per the standard for your language, e.g. for C the pre-processor must recognise and handle "#include" directives. For PL/1, %include statements would be processed and for COBOL, COPY statements, etc.
If your language does not have a standard facility for file-inclusion, implement the one used by a popular compiler for the language.
If there is no such feature (e.g. more recent OO languages use import/using/etc. statements to include pre-compiled class definitions), either use the C style #include directive or choose something of your own invention.
State the syntax expected and any limitations, including whether nested includes are supported and if so, how deep the nesting can be.
If possible, implement your pre-processor as a filter, i.e. read the main source file from standard input and write the pre-processed source to standard output.
NOTE to anyone who uses the pre-processors on this page:
They are supplied as-is, with no warranty - use at your own peril : )
ALGOL 68
Should work with any Algol 68 implementation that uses upper-stropping.
Implements file inclusion via pragmatic comments as in ALGOL 68G.
A pragmatic comment such as PR read "somefile.incl.a68" PR
or PR include "somefile.incl.a68" PR
can appear anywhere in the source and will cause the text of somefile.incl.a68 to be included at that point (Note, ALGOL 68G does not support "include" as an alternative to "read").
The PR...PR will not be recognised inside comments or string literals and cannot appear inside a symbol, i.e. 1PR...PR2 is 1 followed by a pragmatic comment followed by 2.
PR can also be written as PRAGMA.
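To illustrate the rules above, the following fragment shows which pragmatic comments this pre-processor would and would not expand. The file names are hypothetical and the fragment is only a sketch of the accepted syntax, not a complete program:

<lang algol68>
PR read "decls.incl.a68" PR                   # expanded                          #
PRAGMA include "procs.incl.a68" PRAGMA        # expanded, PRAGMA is the long form #
# PR read "ignored.incl.a68" PR is left alone as it is inside a comment           #
STRING s = "PR read ""ignored.incl.a68"" PR"; # left alone, inside a string       #
</lang>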
In ALGOL 68G, PR read ... only includes the file if it has not already been included. This implementation does not check for this and so includes the file every time it is referenced.
Includes can be nested to a depth of 10.
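The read-once behaviour of ALGOL 68G that this implementation omits could perhaps be added along the following lines. This is a minimal sketch and not part of the program below: the table included, its size max included and the procedure already included are all hypothetical names. File names are compared exactly as written, so two different paths to the same file would count as different files.

<lang algol68>
BEGIN
    # hypothetical table of the names of the files included so far #
    INT max included = 100;
    [ 1 : max included ]STRING included;
    INT included count := 0;

    # returns TRUE if name has been seen before, otherwise records it and #
    # returns FALSE - if the table is full, the name is not recorded      #
    PROC already included = ( STRING name )BOOL:
         BEGIN
            BOOL found := FALSE;
            FOR i TO included count WHILE NOT found DO
                found := included[ i ] = name
            OD;
            IF NOT found AND included count < max included
            THEN
                included[ included count +:= 1 ] := name
            FI;
            found
         END # already included # ;

    # the first call yields FALSE, the second yields TRUE #
    print( ( already included( "in1.incl.a68" ), newline ) );
    print( ( already included( "in1.incl.a68" ), newline ) )
END
</lang>

One way to use this would be to call already included( file name ) when the operation is "read" and to skip opening the file when it yields TRUE, leaving "include" with the include-every-time behaviour.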
<lang algol68># Algol 68 pre-processor #
# Processes read/include pragmas in an Algol 68 upper stropping source #
# It is assumed _ is allowed in tags and bold words #
# It is assumed that {} are alternatives for () as in ALGOL 68G #
#     to use {} as ALGOL 68RS/algol68toc style nestable brief-comments, #
#     change rs style brief comments to TRUE #
# ALGOL 68G ( and probably other compilers ) allow quote-stropped bold #
#     words to appear in an otherwise upper-stropped source, #
#     e.g. BEGIN 'SKIP' END would be a valid program #
#     this is not supported here so pragmatic comments such as: #
#         'PR' read "someFile" 'PR' will cause problems #
# pragmatic comments should be disabled by PR pragmats off PR, #
#     this is not implemented #
# the read/include must be in lower case #
# ALGOL 68G's read pragmatic comment only includes the file the first time #
#     it is mentioned in a read pragmatic comment - this is not implemented #
#     here, the file is included each time #
BEGIN

    # TRUE if {} delimits a nestable brief comment, as in ALGOL 68RS and #
    #      algol68toc, FALSE if {} are alternatives to () as in ALGOL 68G #
    BOOL rs style brief comments = FALSE;

    # input file information #
    MODE INFILE = STRUCT( REF FILE src  # actual source file #
                        , STRING   line # latest source line #
                        , INT      pos  # character position in line #
                        );
    # initialises the INFILE f to be associated with the FILE src #
    PRIO INIT = 9;
    OP   INIT = ( REF FILE src, REF INFILE f )REF INFILE:
         BEGIN
            line OF f := "";
            pos  OF f := 1 + UPB line OF f;
            src  OF f := src;
            set eof handler( f );
            f
         END # INIT # ;
    # TRUE if EOF has been reached, FALSE otherwise #
    BOOL at eof := FALSE;
    CHAR c      := " ";
    # newline character #
    CHAR nl      = REPR 10;
    # maximum number of include files that can be nested #
    INT max include depth = 10;
    # source file stack #
    [ 0 : max include depth ]INFILE in stack;
    # current include depth #
    INT include depth := 0;
    # number of errors reported #
    INT error count   := 0;

    # sets the logical file end procedure of the specified file to a routine #
    # that allows us to detect EOF on a source file #
    PROC set eof handler = ( REF INFILE inf )VOID:
         on logical file end( src OF inf
                            , ( REF FILE f )BOOL:
                              BEGIN
                                  # note that we reached EOF on the #
                                  # latest read #
                                  IF NOT at eof
                                  THEN
                                      # first time we have spotted eof, #
                                      # we need to call newline so that #
                                      # if the last line didn't have a #
                                      # newline at the end, it is still read #
                                      # however that will call this routine #
                                      # so we have to ensure we only do it #
                                      # once #
                                      at eof := TRUE;
                                      newline( f )
                                  FI;
                                  # return TRUE so processing can continue #
                                  TRUE
                              END
                            );

    # reports an error #
    PROC error = ( STRING message )VOID:
         BEGIN
            error count +:= 1;
            print( ( newline, newline, "**** ", message, newline ) )
         END # error # ;

    # gets the next source character, handling end-of-file on include files #
    # the source character is stored in c #
    PROC next char = VOID:
         BEGIN
            WHILE
                BOOL read again := FALSE;
                REF INFILE s = in stack[ include depth ];
                IF pos OF s <= UPB line OF s
                THEN
                    # not past the end of the source line #
                    c := ( line OF s )[ pos OF s ];
                    pos OF s +:= 1
                ELIF
                    # past the end of the current source line - get the next #
                    at eof := FALSE;
                    get( src OF s, ( line OF s, newline ) );
                    NOT at eof
                THEN
                    # got a new line #
                    line OF s +:= nl;
                    pos  OF s  := LWB line OF s;
                    read again := TRUE
                ELIF include depth = 0
                THEN
                    # eof on the main source #
                    line OF s := ""
                ELSE
                    # got eof on an include file #
                    include depth -:= 1;
                    read again     := TRUE;
                    at eof         := FALSE;
                    close( src OF s )
                FI;
                read again
            DO SKIP OD
         END # next char # ;

    # returns TRUE if the current character is whitespace #
    PROC have whitespace = BOOL: c <= " ";
    # returns TRUE if the current character is a string delimiter #
    PROC have string delimiter = BOOL: c = """";
    # returns TRUE if the current character can start a bold word #
    PROC have bold = BOOL: c >= "A" AND c <= "Z";
    # returns TRUE if the current character can start a brief tag #
    PROC have tag = BOOL: c >= "a" AND c <= "z";
    # reports an unterminated construct ( e.g. string, comment ) #
    PROC unterminated = ( STRING construct )VOID:
         error( "Unterminated " + construct );

    # outputs ch to stand out #
    PROC put char = ( CHAR ch )VOID:
         IF ch = nl THEN print( ( newline ) ) ELSE print( ch ) FI;
    # outputs str to stand out #
    PROC put string = ( STRING str )VOID: print( ( str ) );

    # outputs a brief comment to stand out #
    #     end char is the closing delimiter, #
    #     nested char is the opening delimiter for nestable brief comments #
    #         if nested char is blank, the brief comment does not nest #
    #     this handles ALGOL 68RS and algol68toc style {} comments #
    PROC skip brief comment = ( CHAR end char, CHAR nested char )VOID:
         BEGIN
            put char( c );
            WHILE next char;
                  NOT at eof AND c /= end char
            DO
                IF c = nested char AND NOT have whitespace
                THEN
                    # nested brief comment #
                    skip brief comment( end char, nested char )
                ELSE
                    # normal comment char #
                    put char( c )
                FI
            OD;
            IF at eof
            THEN
                # unterminated comment #
                unterminated( """" + end char + """ comment" );
                c := end char
            FI;
            put char( c );
            next char
         END # skip brief comment # ;

    # gets a string of spaces from the source #
    PROC get whitespace = STRING:
         BEGIN
            STRING result := "";
            WHILE NOT at eof AND have whitespace
            DO
                result +:= c;
                next char
            OD;
            result
         END # get whitespace # ;

    # gets a string denotation from the source #
    PROC get string = STRING:
         BEGIN
            STRING result := "";
            # within a string denotation, "" denotes the " character #
            WHILE have string delimiter
            DO
                WHILE result +:= c;
                      next char;
                      NOT at eof AND NOT have string delimiter
                DO SKIP OD;
                IF NOT have string delimiter
                THEN
                    # unterminated string #
                    unterminated( "string" );
                    c := """"
                FI;
                result +:= c;
                next char
            OD;
            result
         END # get string # ;

    # gets a string denotation from the source without the quotes #
    PROC get unquoted string = STRING:
         BEGIN
            STRING result := "";
            # within a string denotation, "" denotes the " character #
            WHILE have string delimiter
            DO
                WHILE next char;
                      NOT at eof AND NOT have string delimiter
                DO
                    result +:= c
                OD;
                IF NOT have string delimiter
                THEN
                    # unterminated string #
                    unterminated( "string" )
                FI;
                next char;
                IF have string delimiter
                THEN
                    # embedded string delimiter #
                    result +:= c
                FI
            OD;
            result
         END # get unquoted string # ;

    # gets a bold word from the source #
    PROC get bold word = STRING:
         BEGIN
            STRING result := "";
            WHILE have bold OR c = "_"
            DO
                result +:= c;
                next char
            OD;
            result
         END # get bold word # ;

    # gets a brief tag from the source #
    PROC get tag = STRING:
         BEGIN
            STRING result := "" ;
            WHILE have tag OR c = "_"
            DO
                result +:= c;
                next char
            OD;
            result
         END # get tag # ;

    # copies the source to the output until a bold word is encountered #
    PROC skip to bold = STRING:
         IF at eof
         THEN
            ""
         ELSE
            STRING result := "";
            WHILE put char( c );
                  next char;
                  NOT at eof AND NOT have bold
            DO SKIP OD;
            IF NOT at eof THEN result := get bold word FI;
            result
         FI # skip to bold # ;

    # handles a bold PRAGMA, COMMENT or other bold word #
    PROC bold word or pragment = VOID:
         IF STRING bold word := get bold word;
            bold word = "CO" OR bold word = "COMMENT"
         THEN
            # have a bold comment #
            STRING delimiter = bold word;
            WHILE put string( bold word );
                  bold word := skip to bold;
                  NOT at eof AND bold word /= delimiter
            DO SKIP OD;
            IF at eof
            THEN
                # unterminated comment #
                unterminated( """" + delimiter + """ comment" )
            FI;
            put string( delimiter )
         ELIF bold word = "PR" OR bold word = "PRAGMA"
         THEN
            # have a pragmatic comment - could be file inclusion #
            STRING delimiter  = bold word;
            STRING pragment  := bold word;
            STRING op        := "";
            STRING file name := "";
            # skip spaces after the PR/PRAGMA #
            pragment +:= get whitespace;
            # get the operation, if there is a tag #
            IF have tag
            THEN
                # have an operation #
                op := get tag;
                pragment +:= op + get whitespace
            FI;
            # get the file name, if there is one #
            IF have string delimiter
            THEN
                # have a file name #
                file name := get unquoted string;
                pragment +:= file name + get whitespace
            FI;
            # should now have the closing delimiter #
            IF have bold
            THEN
                # have a bold word - could be the delimiter #
                pragment +:= ( bold word := get bold word )
            FI;
            IF ( op /= "read" AND op /= "include" )
            OR file name = ""
            OR bold word /= delimiter
            THEN
                # not a read/include pragmatic comment #
                put string( pragment );
                WHILE bold word := skip to bold;
                      NOT at eof AND bold word /= delimiter
                DO SKIP OD;
                IF at eof
                THEN
                    # unterminated comment #
                    unterminated( """" + delimiter + """ pragmatic comment" )
                FI;
                put string( delimiter )
            ELIF
                # attempt to include the file #
                include depth >= UPB in stack
            THEN
                # max include depth exceeded #
                put string( pragment );
                error( "Include files nested too deeply: " + file name )
            ELIF
                REF FILE inc := HEAP FILE;
                open( inc, file name, stand in channel ) /= 0
            THEN
                # couldn't open the file #
                put string( pragment );
                error( "Unable to include: " + file name )
            ELSE
                # file opened OK #
                in stack[ include depth +:= 1 ] := inc INIT HEAP INFILE
            FI
         ELSE
            # some other bold word #
            put string( bold word )
         FI # bold word or pragment # ;

    # copy the source to stand out, expanding read/include pragmas #
    in stack[ include depth := 0 ] := stand in INIT HEAP INFILE;
    next char;
    WHILE NOT at eof
    DO
        IF c = "#"
        THEN
            # brief comment #
            skip brief comment( "#", " " )
        ELIF c = "{" AND rs style brief comments
        THEN
            # nestable brief comment ( ALGOL 68RS and algol68toc ) #
            skip brief comment( "}", "{" )
        ELIF have string delimiter
        THEN
            # STRING or CHAR denotation #
            put string( get string )
        ELIF have bold
        THEN
            # have a bold word #
            bold word or pragment
        ELSE
            # anything else #
            put char( c );
            next char
        FI
    OD;

    IF error count > 0
    THEN
        # had errors processing the source #
        print( ( "**** ", whole( error count, 0 ), " errors", newline ) )
    FI

END</lang>
- Output:
Pre-processing the following program:
PR include "ex1.a68" PR
where ex1.a68 contains:
BEGIN PR precision 200 PR INT x := 1; PR read "in1.incl.a68" PR END
and in1.incl.a68 contains:
IF x > 0 THEN print( ( x, newline ) ) FI
Produces the following output:
BEGIN PR precision 200 PR INT x := 1; IF x > 0 THEN print( ( x, newline ) ) FI END