Talk:Compiler/Preprocessor

From Rosetta Code

Constant math sentence removed

It is not a preprocessor's job to perform constant math - you certainly do not want a compiler that resolves #area(5,6)# to 30 but leaves 5*6 as a constant math expression.
"All constant math that can be performed before defining a new macro name must be." removed from the task description. --Pete Lomax (talk) 18:29, 24 May 2022 (UTC)

That was not my intention. I was thinking of the values in macro definitions, not values in code, such as:

#define five 5
#define six five+1
/* Define five() as 5 */
/* Define six() as 6 */

--Jwells1213 (talk) 19:16, 24 May 2022 (UTC)

For a preprocessor to handle, say, (2+3*5-7)*13-12/3 it would need to implement precedence and associativity, and the task would need to specify all that, which is clearly the job of the compiler. There is absolutely no value in handling a few simple things such as 5*6 but not managing anything trickier, or for that matter anything not in a macro. Besides, the whole point of a preprocessor is to make the input easier to write and read; the output can be, and often is, as ugly as sin, and should normally only ever be read by a machine. --Pete Lomax (talk) 23:39, 24 May 2022 (UTC)

Parenthesis

Is there some special reason why you want to allow "#define width 6" and then permit it to be used as #area(width(),6)#, or vice-versa? And what would it achieve?
I believe the task should be changed such that the presence or absence of parentheses on a definition makes it require or ignore parameters.
For instance you might want to invoke (say) power(5,6) on Windows but powmod(5,6) on Linux; allowing a #define without any () leaves any args undamaged.
Mind you, there's also no mention of dealing with eg #somemacro("1st, first", "2nd, second")#, ie the ", " mid-string. --Pete Lomax (talk) 23:39, 24 May 2022 (UTC)
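
To illustrate the mid-string comma problem, here is a minimal Python sketch of an argument splitter that treats a comma as a separator only when it is outside a double-quoted string. The function name split_args is hypothetical, and the sketch assumes a string cannot contain an escaped double quote; neither is stated in the task description.

```python
def split_args(text):
    """Split a macro argument list on commas that lie outside double quotes."""
    args, current, in_string = [], [], False
    for ch in text:
        if ch == '"':
            in_string = not in_string      # every double quote toggles the state
            current.append(ch)
        elif ch == ',' and not in_string:
            args.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)             # includes commas inside strings
    args.append(''.join(current).strip())
    return args

print(split_args('"1st, first", "2nd, second"'))
# ['"1st, first"', '"2nd, second"']
```

A plain text.split(',') on the same input would produce four pieces instead of two.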

I will change it to disallow empty parentheses. --Jwells1213 (talk) 09:19, 25 May 2022 (UTC)

In my implementation, I verify that the number of arguments in a use equals the number of parameters in the definition. So () in a definition is allowed, but must be matched by () in use, while no () in the definition means that any () in use is left alone. Preprocessor output of, say, 6() is not particularly meaningful and quite probably invalid, but that is a matter for some later stage. --Pete Lomax (talk) 14:16, 25 May 2022 (UTC)

I think that makes more sense as:

#define five 5
#define six five+1
#define thirty six*five
size = #thirty#;

does not end up with 30 and:

#define five 5
#define six (five+1)
#define thirty six*five
size = #thirty#;

thinks six has an argument so:

#define five 5
#define six() (five+1)
#define thirty six()*five
size = #thirty#;

would be the easiest parsing method. --Jwells1213 (talk) 19:11, 25 May 2022 (UTC)
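
The precedence problem in the first variant can be seen with a small Python sketch of purely textual expansion. It assumes the definitions are held in a plain dictionary (the names expand, plain and wrapped are illustrative only), that bodies are rescanned for further macros, and that no macro refers to itself. It sidesteps the parsing question of whether (five+1) is a parameter list.

```python
import re

IDENT = re.compile(r'[A-Za-z_][A-Za-z0-9_]*')

def expand(text, macros):
    """Replace whole identifiers that name a macro, rescanning each replacement."""
    def repl(match):
        name = match.group(0)
        if name in macros:
            return expand(macros[name], macros)
        return name
    return IDENT.sub(repl, text)

plain = {'five': '5', 'six': 'five+1', 'thirty': 'six*five'}
wrapped = {'five': '5', 'six': '(five+1)', 'thirty': 'six*five'}

print(expand('thirty', plain))    # 5+1*5
print(expand('thirty', wrapped))  # (5+1)*5
```

With the usual precedence, 5+1*5 evaluates to 10 rather than 30, which is why the body of six needs its own parentheses.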

Why not have a Turing-complete preprocessor?

It has been claimed that arbitrary Turing-complete programs can be written in the C pre-processor:

https://www.ioccc.org/2015/muth/hint.html

But to me, such tours de force of pre-processing are far too obfuscated. They keep the reader from easily understanding the source code when it finally arrives. --Wherrera (talk) 06:07, 25 May 2022 (UTC)

Definitely too complex for the language it is supporting. --Jwells1213 (talk) 09:19, 25 May 2022 (UTC)

Not sure about these #s

Are leading *and* trailing #s really needed around the macro invocations ?
Either the macro has parameters or it doesn't, so #abc(23) wouldn't be ambiguous as the macro definition would tell the pre-processor whether (23) is a parameter or not.
What if "#define include" appeared ? Or "#define define" ? Do we need more rules ?
--Tigerofdarkness (talk) 20:50, 25 May 2022 (UTC)

I do not see the issue with using include and define as macro names, as those keywords are fixed to start at column 2 after a #.

I kind of assumed you could do multiple substitutions like:

#define five 5
#define six 6
call test(#five, six#)

but I agree it is not a huge difference to do:

call test(#five, #six)

a bigger question would be:

#define debug debug
call #debugtest()

that is, recursive definitions, or wanting to tack characters onto a symbol so that a test call wraps the real call, or defining it as empty for live code. I think that macro recursion should stop before processing a symbol twice. If we want to change a symbol name based on a define, we need the closing #. --Jwells1213 (talk) 00:47, 26 May 2022 (UTC)
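
One way to read "stop before doing a symbol twice" is sketched below in Python: while a macro is being expanded, its own name is treated as ordinary text. This is an assumption about the intended rule, not something the task states, and the names expand and active are illustrative. It is similar in spirit to the C preprocessor's rule that a macro name is not re-expanded inside its own expansion.

```python
import re

IDENT = re.compile(r'[A-Za-z_][A-Za-z0-9_]*')

def expand(text, macros, active=frozenset()):
    """Expand macros in text, never expanding a name already being expanded."""
    def repl(match):
        name = match.group(0)
        if name in macros and name not in active:
            # Remember the name while its body is rescanned.
            return expand(macros[name], macros, active | {name})
        return name
    return IDENT.sub(repl, text)

print(expand('debug', {'debug': 'debug'}))          # debug
print(expand('a', {'a': 'b + 1', 'b': 'a * 2'}))    # a * 2 + 1
```

Without the active set, both calls would recurse until Python's recursion limit is hit.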

OK thanks. I assume that a macro application can't start in column 1, otherwise
#define include print
#include "aHeader.h"
#define something somethingElse
would print aHeader.h and not define something as somethingElse ?
--Tigerofdarkness (talk) 21:23, 26 May 2022 (UTC)

If we remove the closing hashtag, your include example would be an issue so we would have to reserve those words. --Jwells1213 (talk) 04:32, 27 May 2022 (UTC)

Actually, I see why you wanted the # at the start and end, and I am no longer suggesting that the ending one be removed.
The Task description doesn't specify whether a macro application can span multiple lines. E.g.: does
#define M(a,b) a and b
#M(bbc
  ,itv
  )
#

generate an error or:

bbc
   and itv

or something else ?

If it doesn't generate an error, then the #define include ... example above would be ambiguous.
--Tigerofdarkness (talk) 19:09, 27 May 2022 (UTC)


I am assuming it should be as simple as possible, given the simple language, so all on one line. --Jwells1213 (talk) 00:16, 23 June 2022 (UTC)

Task quirks

This task delves into environment specific features, in its requirement that pipes be used. But different environments supply different kinds of pipes (and some environments -- such as web browsers -- do not provide a pipe implementation). I suspect that rosettacode should deprecate the use of "pipes" in task requirements.

Also, the task specifies that macros may have parameters, but does not provide much specification on how parameters are treated. The programmer is apparently "just supposed to know". That's probably fine for old school programmers who have extensive knowledge of C, but would be a hurdle for programmers who have never used the C language nor its preprocessor.

Also, the requirement that the macro implementation use the same naming convention as the language gets into interesting non-portable issues. (Consider perl, for example, where different variable types have a different kind of name.)

And, that raises the question of whether the strings identifying file names should use the language's string syntax or the example string syntax. Consider J, for example, which forbids the use of double quote characters as string delimiters.

Probably this task should be put into draft status, pending a resolution of these issues. --Rdm (talk) 01:27, 27 May 2022 (UTC)

I assumed two arguments, the first being the input file and the second the output file, each defaulting to the console when missing, plus a debug switch for optional, more verbose comments, which can be intermixed with the file arguments in any order.
The substitution is meant to be string replacement, where each use of a parameter is replaced with the corresponding argument. In my mind, they should be unique tokens, so that substitution does not break into an existing name, though there is a question as to whether the double hashtag is needed for usage.
The compiler sample language is already using double quoted strings so this task is being consistent with it as another process the source goes through.
It's also worth thinking about whether macro expansion should happen inside strings of the target language or (worse) inside strings of the preprocessor's language. C's preprocessor treats strings specially, but that might not make sense here. C's preprocessor also handles C comments specially, if I remember right... Anyways, this isn't a full replication of C's preprocessor, but if we're filling in the blanks to fit the pattern, that can lead to a variety of issues in the implementations. --Rdm (talk) 05:20, 27 May 2022 (UTC)
As I understand it, the task is to create a pre-processor for the language the other Compiler/... tasks compile (i.e. the cut-down, declaration free C like language), not the language the pre-processor is written in, but that's a good point about strings, comments and character literals, all of which the language has. That macros are only expanded outside of strings, comments etc. should probably be specified in the task description (or if they *are* to be expanded inside strings, etc. it should say that).
--Tigerofdarkness (talk) 19:27, 27 May 2022 (UTC)
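
If macros are to be expanded only outside strings, comments and character literals, one approach is to let a single regular expression match those constructs first and pass them through untouched. The Python sketch below is only an illustration under stated assumptions: it handles object-like uses such as #PI# only, assumes literals cannot contain their own delimiter, and the name expand_outside_literals is hypothetical.

```python
import re

TOKEN = re.compile(
    r'"[^"\n]*"'       # double-quoted string literal
    r"|'[^'\n]*'"      # character literal
    r'|/\*.*?\*/'      # /* ... */ comment, possibly spanning lines
    r'|#(\w+)#',       # object-like macro use, e.g. #PI#
    re.DOTALL,
)

def expand_outside_literals(source, macros):
    """Expand #name# uses, leaving strings, comments and char literals alone."""
    def repl(match):
        name = match.group(1)              # None unless the macro branch matched
        if name is not None and name in macros:
            return macros[name]
        return match.group(0)
    return TOKEN.sub(repl, source)

src = 'print("#PI# is ", #PI#); /* #PI# */'
print(expand_outside_literals(src, {'PI': '3.14'}))
# print("#PI# is ", 3.14); /* #PI# */
```
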
If I had known a draft status existed, I would have used it, because in my mind I was looking for input on it. It is the first time I have written such a complex document by phone, so it was hard to remember what I said and what I only believed I said.
--Jwells1213 (talk) 04:25, 27 May 2022 (UTC)
  • Conceptually, the task would support an implementation with multiple preprocessor definitions for the same name, which are distinguished by the length of the argument list (and two potential definitions for an empty argument list: one with parentheses, one without). (This might be unintended.)
Personally, I think redefinition of a name should be an error, as we are going for as simple as possible and overloading a name adds complexity.
--Jwells1213 (talk) 21:36, 11 August 2022 (UTC)
Shouldn't this kind of thing go into the task definition? (The task declares that redefinition is illegal, but we have plenty of real programming languages where redefinition is illegal which support non-conflicting definitions for the same name - Haskell, for example). --Rdm (talk) 23:22, 11 August 2022 (UTC)
Task definition has been updated to disallow name overloading. --Jwells1213 (talk) 15:15, 18 August 2022 (UTC)
  • The task does not go into detail about arguments. So, for example, a single right parenthesis might be interpreted as an argument. (This might be intended but is worth mentioning).
The intention was anything that would fit the grammar of the language after substitution. I have some ideas, but limited usage experience, so I was hoping for input from those with experience.
--Jwells1213 (talk) 21:36, 11 August 2022 (UTC)
  • The task's specification always loses track of the original input file name, which influences downstream error reporting. (This is probably an acceptable simplification for rosettacode, but is worth mentioning.)
Since you could view the output file to have the details, I felt it was an acceptable method.
--Jwells1213 (talk) 21:36, 11 August 2022 (UTC)
  • The task's specification always loses track of original line numbers following any #include statement, which also influences downstream error reporting. (This is probably an acceptable simplification for rosettacode, but is worth mentioning.)
Since you could view the output file to have the details, I felt it was an acceptable method.
--Jwells1213 (talk) 21:36, 11 August 2022 (UTC)

--Rdm (talk) 17:23, 11 August 2022 (UTC)

Parameter lists are currently underspecified

After thinking about this a bit more:

I suspect the intent is that parameter lists for this language are enclosed in parentheses, but as currently specified, parentheses in the expansion could be optional. Similarly, it's not clear if commas are required -- given the ## delimiters, it should be safe to treat all non-variable-name characters from inside the macro reference as whitespace. These are not necessarily defects in the specification...

That said... I suspect that the intent here is that parameter names in the substituted text in the #define line are valid tokens in the target lexical analyzer syntax, but in some contexts it might be more convenient if they were not. In either case, this aspect should be specified.

It's possible that the intent here is that only one macro expansion can appear on a line, but probably that's not a requirement. --Rdm (talk) 04:00, 23 June 2022 (UTC)

I will address defining them better. The intent is to allow multiple expansions on a line. The intent was valid lexical tokens, with the user being responsible for uniqueness to prevent incorrect substitutions. The tokens in the parameter list are separated by commas. --Jwells1213 (talk) 20:47, 28 June 2022 (UTC)

Please don't just change the task now, but instead detail any proposed changes here so they can be discussed. I can't see what the problem is, I can't see how the current specification makes parentheses optional, and I have no idea what might be "more convenient"... --Pete Lomax (talk) 01:36, 29 June 2022 (UTC)
The current spec would allow #example 1 2 3# or maybe even #example<<1; 2; 3>># for macro expansions. And, generally speaking, it doesn't address which characters and/or context get handled in different ways during macro expansion. That said, it might instead allow nested expansions such as #F(#G(1)#, 2)# if parentheses are significant in parameter lists. And then there are issues like languages where # is valid in an identifier (perhaps some Forth dialect, for example). --Rdm (talk) 02:07, 29 June 2022 (UTC)

Nested would be handled by #F(G(1), 2)#, as everything between the #'s is treated as a macro first, so if F and/or G are defined they would be substituted. Since the only language we are supporting with this is the one supported by the lexical analyzer, Forth or any other language is irrelevant. Parentheses are optional when no parameters exist.

This is my suggestion to add to the draft:
Parameters, when present, must be stated within parentheses as comma-separated valid names in our sample language. Each such name in the definition is replaced with the corresponding argument from the usage, and both must have the same number of comma-separated items. If any of the replacements are additional macros, these are processed too. --Jwells1213 (talk) 02:56, 29 June 2022 (UTC)

The current specification does not distinguish between the target language and the host language when talking about "variable names" in the macros. But "variable names in the target language" would indeed be a valid interpretation and would address that ambiguity.
I'm not sure that nested macro substitution is desirable here -- I was raising that as an issue to point out that the specification could be adapted to allow them, without removing any part of the current text.
That said, there's a sentence in the specification -- "You may not assume the usage proceeds in an order to form complex combinations" -- whose implications are not clear to me. --Rdm (talk) 04:08, 29 June 2022 (UTC)
I took that to mean that if, say, a "capacity" macro resolves to "area", you cannot assume that the area macro is checked for/applied after the capacity macro. In other words while "#area(height, width)#" is fine, "#define area height * width" is explicitly not? Then again some of the comments on this page suggest otherwise. I might argue that any such substitutions must occur at macro definition, and not at macro application time, and it w/c/should perhaps be "#define area #height# * #width#"?? Either way, this should be a simple starting point and not something intended to be used in anger. --Pete Lomax (talk) 14:21, 29 June 2022 (UTC)

I meant it to address:

#define test(h) hrea(5,6)
#test(a)#

The substitution would replace all 'h' with 'a' making area(5,6). This is not guaranteed to get detected or processed. Only macros in the original that are fully detected as a unit would be.

#define area(h, w) (h * w)
#define five 5
#define test(h) area(h, 6)
#test(five)#

should detect and replace test, five, area, h, and w, yielding:

(5 * 6)

--Jwells1213 (talk) 22:53, 30 June 2022 (UTC)

I do not have an issue with

#define area(#h#, #w#) (#h# * #w#)
#define five 5
#define test(#h#) #area(#h#, 6)#
#test(#five#)#

or some other form that makes it more understandable. But I feel that the usage must handle buried macros to make the feature useful. Otherwise, you would end up with code that does not compile because the macro definitions are gone but their usage stayed around. --Jwells1213 (talk) 23:23, 30 June 2022 (UTC)

If parameter substitution is textual rather than based on delimited variable names in the defined macro body, that introduces issues with definitions like #define ex(hh,h,hhh) hhhhhhhhhhhhhhhhh --Rdm (talk) 00:21, 1 July 2022 (UTC)
Trust me, I get your point, but let's be honest, that specific example is utter nonsense and completely ambiguous - and if I can't figure out what it is supposed to mean there is no way that a preprocessor ever could. Perhaps a better (pseudo) example might be #define test(h) hrea(width,height) and #test(a)# where you want hrea -> area without mangling width or height, and of course without delimiters there is no way to (use a 'h' to) specify that. Or you could just stipulate macro definitions must be unambiguous and (eg) define it instead as #define test(H) Hrea(width,height) --Pete Lomax (talk) 14:17, 1 July 2022 (UTC)
It was definitely a quick and dirty example, meant only to highlight an ambiguity. The "useful" cases tend to be either: choosing identifiers, building strings or building expressions. That said, for rosettacode purposes, having an unambiguous specification of the desired result is probably more important than the result being useful. (We need unambiguous specifications for the implementations to be comparable with each other. We need the specifications to focus on results to allow room for the implementations to provide relevant information about how people natively approach these kinds of problems.) --Rdm (talk) 14:32, 1 July 2022 (UTC)
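
The difference between the two readings of parameter substitution can be made concrete with a short Python sketch. Both function names are hypothetical, and the identifier pattern is an assumption modelled on C-like identifiers rather than taken from the lexical analyzer task.

```python
import re

def substitute_textual(body, params, args):
    """Replace every occurrence of each parameter, even inside longer names."""
    for param, arg in zip(params, args):
        body = body.replace(param, arg)
    return body

def substitute_tokens(body, params, args):
    """Replace a parameter only where it is a whole identifier."""
    mapping = dict(zip(params, args))
    return re.sub(r'[A-Za-z_][A-Za-z0-9_]*',
                  lambda m: mapping.get(m.group(0), m.group(0)), body)

body = 'hrea(width,height)'          # from: #define test(h) hrea(width,height)
print(substitute_textual(body, ['h'], ['a']))   # area(widta,aeigat)
print(substitute_tokens(body, ['h'], ['a']))    # hrea(width,height)
```

Textual replacement produces area but mangles width and height; token replacement leaves everything alone because h never appears as a whole identifier. Neither yields area(width,height), which is the ambiguity being discussed.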

Macros by example

Hi. This is an attempt at specifying preprocessor macros for this task. Hopefully it will enable us to improve the task description, once a consensus has been reached.

Please also think about edge cases, like:
   
#define PI 3.14159
#define area(r) 2*#PI#*r*r
#define area(h,w) h*w
area = #area(9)#/#PI#
square = area(3,4)

Note that the task as currently written would allow multiple "non-conflicting" definitions for the same name -- it only forbids redefinition.--Rdm (talk) 12:45, 14 August 2022 (UTC)
Line 3 from your example would be considered an error. Anything other than an identical macro redefinition is an error condition. I've tried to cover this case below with "Redefine a function-like macro where arguments differ" and "Redefine a function-like macro where expressions differ". This is broadly similar to the behaviour of the GNU C preprocessor, although I believe cpp only shows a warning in such cases. --Jgrprior (talk) 14:54, 14 August 2022 (UTC)

Refer to the lexical analyzer task for what makes a valid identifier. Macro substitution only occurs on whole identifier "tokens".


Function-like macro definition and usage.

Input

#define area(h, w) h * w
area = #area(5, 6)#;

Expected output

area = 5 * 6;

Object-like macro definition and usage.

Input

#define PI 3.14
circumference = 2 * #PI# * r;

Expected output

circumference = 2 * 3.14 * r;

Object-like macro definition with parentheses. Whitespace between the macro name and first parenthesis is significant.

Input

#define f ((30 * 9/5) + 32)
result = 5 * #f#;

Expected output

result = 5 * ((30 * 9/5) + 32);

Call an object-like macro as if it were a function-like macro.

Input

#define f ((30 * 9/5) + 32)
result = 5 * #f()#;

Expected output

result = 5 * ((30 * 9/5) + 32)();

Call an object-like macro as if it were a function-like macro, including arguments.

Input

#define f ((30 * 9/5) + 32)
result = 5 * #f(7, 42)#;

Expected output

result = 5 * ((30 * 9/5) + 32)(7, 42);

Use a function-like macro as if it were an object-like macro.

Input

#define area(h, w) h * w
area = #area#;

Expected output

area = #area#;

Call a function-like macro with too many arguments.

Input

#define area(h, w) h * w
area = #area(2, 3, 4)#;

Expected error

error: macro "area" passed 3 arguments, but takes just 2

Call a function-like macro with too few arguments.

Input

#define area(h, w) h * w
area = #area(2)#;

Expected error

error: macro "area" requires 2 arguments, but only 1 given
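
The two arity errors above could be produced by a check along these lines. This is a Python sketch only; MacroError and check_arity are hypothetical names, and the message wording is copied from the examples above.

```python
class MacroError(Exception):
    """Raised when a macro is used with the wrong number of arguments."""

def check_arity(name, params, args):
    """Compare the argument count of a use with the parameter count of the definition."""
    if len(args) > len(params):
        raise MacroError(
            f'macro "{name}" passed {len(args)} arguments, but takes just {len(params)}')
    if len(args) < len(params):
        raise MacroError(
            f'macro "{name}" requires {len(params)} arguments, but only {len(args)} given')

for args in (['2', '3', '4'], ['2'], ['5', '6']):
    try:
        check_arity('area', ['h', 'w'], args)
        print('ok')
    except MacroError as err:
        print(f'error: {err}')
```

Note that under the blank-argument example further down, #area(5,)# counts as two arguments, so the count has to be taken after splitting on commas rather than after discarding empty pieces.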

Redefine a function-like macro with matching arguments and expression (think including the same header multiple times).

Input

#define area(h, w) h * w
#define area(h, w) h * w
area = #area(5, 6)#;

Expected output

area = 5 * 6;

Redefine a function-like macro where arguments differ.

Input

#define area(h, w) h * w
#define area(x, y) h * w
area = #area(5, 6)#;

Expected error

error: "area" redefined

Redefine a function-like macro where expressions differ.

Input

#define area(h, w) h * w
#define area(h, w) h * w * 2
area = #area(5, 6)#;

Expected error

error: "area" redefined

Redefine an object-like macro with identical constant (think including the same header multiple times).

Input

#define PI 3.14
#define PI 3.14
circumference = 2 * #PI# * r;

Expected output

circumference = 2 * 3.14 * r;

Redefine an object-like macro with different constant.

Input

#define PI 3.14
#define PI 42
circumference = 2 * #PI# * r;

Expected error

error: "PI" redefined
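
The redefinition cases above amount to "identical is fine, anything else is an error". A minimal Python sketch of that rule follows, assuming a macro is stored as a (parameters, body) pair with parameters set to None for object-like macros. The names define and MacroError are hypothetical, and bodies are compared as exact strings, so a difference in whitespace alone would count as a redefinition here.

```python
class MacroError(Exception):
    """Raised when a macro is redefined with a different meaning."""

def define(macros, name, params, body):
    """Record a definition, accepting only an identical redefinition."""
    new = (params, body)
    old = macros.get(name)
    if old is not None and old != new:
        raise MacroError(f'"{name}" redefined')
    macros[name] = new

macros = {}
define(macros, 'PI', None, '3.14')
define(macros, 'PI', None, '3.14')       # identical: accepted silently
try:
    define(macros, 'PI', None, '42')     # different body: rejected
except MacroError as err:
    print(f'error: {err}')               # error: "PI" redefined
```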

Function-like macro usage with blank arguments.

Input

#define area(h, w) h * w
area = #area(5,)#;
area = #area(,)#;

Expected output

area = 5 * ;
area =  * ;

Function-like macro usage with object-like macro arguments. Substitution occurs when an Identifier token (as defined in the lexical analyzer task) matches a defined macro name.

Input

#define area(h, w) h * w
#define width 5
#define height 6
area = #area(width, height)#;
area = #area(widthwidth, height)#;
area = #area(x, height)#;

Expected output

area = 5 * 6;
area = widthwidth * 6;
area = x * 6;

Function-like macro definition with macros in the expression.

Input

#define width 5
#define area(h) h * width
#define height 6
area = #area(height)#;

Expected output

area = 6 * 5;

Substitution occurs when an Identifier "token" (as defined in the lexical analyzer task) matches a defined macro name.

Input

#define w 5
#define area(h) h * wwww
#define height 6
area = #area(height)#;

Expected output

area = 6 * wwww;

Substitution occurs when a macro is called, not when it is defined.

Input

#define area(h) h * width
#define width 5
#define height 6
area = #area(height)#;

Expected output

area = 6 * 5;
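
Call-time substitution can be sketched as a small line-oriented preprocessor in Python. This is an illustration under several assumptions, not a reference implementation: definitions are stored verbatim, object-like macros in arguments and bodies are expanded one level only, arguments are split on every comma, empty parameter lists and redefinition checks are not handled, and all names (preprocess, subst, DEFINE, USE) are hypothetical.

```python
import re

IDENT = r'[A-Za-z_][A-Za-z0-9_]*'
DEFINE = re.compile(rf'#define\s+({IDENT})(?:\(([^)]*)\))?\s+(.*)$')
USE = re.compile(rf'#({IDENT})(?:\(([^#]*)\))?#')

def subst(text, table):
    """Replace whole identifiers that are keys of table; leave the rest alone."""
    return re.sub(IDENT, lambda m: table.get(m.group(0), m.group(0)), text)

def preprocess(lines):
    macros = {}   # name -> (parameter list or None, body exactly as written)
    output = []

    def objects():
        # Built at each use, so macros defined after the one being expanded are seen.
        return {n: body for n, (params, body) in macros.items() if params is None}

    def use(match):
        name, arg_text = match.group(1), match.group(2)
        if name not in macros:
            return match.group(0)                  # unknown name: leave untouched
        params, body = macros[name]
        if params is None:                         # object-like macro
            tail = '' if arg_text is None else f'({arg_text})'
            return subst(body, objects()) + tail
        if arg_text is None:                       # function-like name without ( )
            return match.group(0)
        args = [subst(a.strip(), objects()) for a in arg_text.split(',')]
        if len(args) != len(params):
            raise ValueError(f'macro "{name}" takes {len(params)} arguments, got {len(args)}')
        # Parameters shadow object-like macros of the same name.
        return subst(body, {**objects(), **dict(zip(params, args))})

    for line in lines:
        m = DEFINE.match(line)
        if m:
            name, param_text, body = m.groups()
            params = None if param_text is None else [p.strip() for p in param_text.split(',')]
            macros[name] = (params, body)          # no expansion at definition time
        else:
            output.append(USE.sub(use, line))
    return output

source = [
    '#define area(h) h * width',
    '#define width 5',
    '#define height 6',
    'area = #area(height)#;',
]
print(preprocess(source))  # ['area = 6 * 5;']
```

Because width is looked up only when #area(height)# is expanded, it does not matter that it is defined after area.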

--Jgrprior (talk) 08:11, 14 August 2022 (UTC)