From Rosetta Code
Gotchas is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

In programming, a gotcha is a valid construct in a system, program or programming language that works as documented but is counter-intuitive and almost invites mistakes because it is both easy to invoke and unexpected or unreasonable in its outcome.


Give an example or examples of common gotchas in your programming language and what, if anything, can be done to defend against it or them without using special tools.

6502 Assembly

Numeric Literals

Integer literals used in instruction operands need to begin with a #; otherwise, the assembler treats them as memory addresses. This applies to any integer representation, even ASCII character literals.

LDA 'J'  ;load the 8-bit value stored at memory address 0x004A into the accumulator.
ORA 3  ;bitwise OR the accumulator with the 8-bit value stored at memory address 0x0003
LDA #'7' ;load the ASCII code for the numeral 7 (which is 0x37) into the accumulator.

However, data blocks do not get the # treatment:

byte $45  ;this is the literal constant value $45, not "the value stored at memory address 0x0045"

Memory-Mapped Hardware

Memory-mapped hardware is a huge source of gotchas in and of itself. These hardware ports are addressed as though they were memory, but are not actually memory.

  • Reading a port doesn't necessarily give you the last value written to it, unlike genuine RAM.
  • Some ports are read-only, while others are write-only.
  • Certain instructions, such as INC and DEC, read a value and then write the result back. This can count as two accesses to a memory-mapped port (if the port cares about that; not all do). A simple LDA or STA represents a single access.
  • Generally speaking, you'll only be able to use STA, STX, or STY to write to ports. You'll need to read the documentation for your hardware.

Some examples of bad memory-mapped port I/O for various hardware:

INC $2005 ;the intent was to scroll the NES's screen to the right one pixel. That's not gonna happen. 
;What actually happens? Who knows! (I made this mistake once long ago.)

Inverted Carry

The carry flag is "backwards" compared to most other CPUs with regard to comparisons and subtractions. On most CPUs, carry set after a comparison means "less than" (a borrow occurred), and carry clear means "greater than or equal." The 6502 is the opposite!

LDA #$20
CMP #$19
BCS foo  ;this branch is always taken, since $20 >= $19. On most other CPUs (e.g. Z80, x86), the equivalent branch would never be taken!

Not only that, a clear carry indicates a borrow when subtracting, which is also the opposite of most CPUs. To do a normal subtraction, you need to set the carry with SEC before subtracting.

LDA #8
SEC    ;no borrow
SBC #4 ;eight minus four, A = 4


The 6502 (and even its later revisions) has no way to add or subtract without involving the carry flag. The "carry" is essentially the same as "carrying the one" that we all learned in elementary school arithmetic. ADC adds an extra 1 if the carry flag was set when the ADC was executed. SBC subtracts an extra 1 if the carry was clear when the SBC was executed (as stated before, on most other CPUs the equivalent of SBC behaves the opposite way).

Therefore, any time you want to add two numbers without the carry flag affecting the result, you have to clear it first:

CLC
ADC ___ ;your operand/addressing mode of choice goes here

Failure to correctly use the carry flag can often result in unexpected "off-by-one" errors.
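As a rough sketch (a simplified Python model, binary mode only, ignoring the N, V and Z flags; the helper names adc, sbc and cmp are just illustrative), the carry rules above look like this:

```python
def adc(a: int, operand: int, carry: int) -> tuple[int, int]:
    """Simplified 6502 ADC (binary mode): A + operand + C -> (new A, carry out)."""
    total = a + operand + carry
    return total & 0xFF, 1 if total > 0xFF else 0


def sbc(a: int, operand: int, carry: int) -> tuple[int, int]:
    """Simplified 6502 SBC (binary mode): A - operand - (1 - C).

    A clear carry means "borrow", so it subtracts an extra 1.
    Carry out is 1 when no borrow was needed (the result did not go below 0).
    """
    diff = a - operand - (1 - carry)
    return diff & 0xFF, 1 if diff >= 0 else 0


def cmp(a: int, operand: int) -> int:
    """Carry flag after 6502 CMP: set when A >= operand (unsigned)."""
    return 1 if a >= operand else 0


print(cmp(0x20, 0x19))  # 1, so BCS is taken
print(sbc(8, 4, 1))     # (4, 1): SEC then SBC #4
print(sbc(8, 4, 0))     # (3, 1): forgot SEC, off by one
print(adc(8, 4, 0))     # (12, 0): CLC then ADC #4
print(adc(8, 4, 1))     # (13, 0): forgot CLC, off by one
```

Calling sbc(8, 4, 0), which is what happens if you forget SEC, gives 3 instead of 4: exactly the off-by-one error described above.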

Bit Rotates

Bit rotates are always performed "through carry." This chart will illustrate the concept:

Before     Carry Before   Op    After       Carry After
%11000000  0              ROL   %10000000   1
%10000000  1              ROL   %00000001   1
%00000001  1              ROL   %00000011   0

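If you were expecting ROL to turn %10000000 directly into %00000001, there is no single 6502 instruction that does this. However, it can be done with two instructions (which can be wrapped in a macro): CMP #$80 copies bit 7 into the carry, and ROL A then rotates it into bit 0.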

As the above chart implied, all bit rotates depend on the value of the carry before the rotate, so make sure you take that into account.
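Here is a small Python model of ROL that reproduces the chart above, plus the CMP #$80 / ROL A idiom for a plain 8-bit rotate. It is a hedged sketch that tracks only the accumulator and the carry; the function names are hypothetical:

```python
def rol(value: int, carry: int) -> tuple[int, int]:
    """6502 ROL: shift left one bit; the old carry enters bit 0, old bit 7 becomes the carry."""
    new_carry = (value >> 7) & 1
    result = ((value << 1) | carry) & 0xFF
    return result, new_carry


def rotate8_left(value: int) -> int:
    """Model of CMP #$80 followed by ROL A: an 8-bit rotate that ignores the old carry."""
    carry = 1 if value >= 0x80 else 0  # CMP #$80 sets the carry exactly when bit 7 is 1
    result, _ = rol(value, carry)
    return result


# Reproduce the chart: each row feeds its result and carry into the next ROL.
value, carry = 0b11000000, 0
for _ in range(3):
    after, carry_after = rol(value, carry)
    print(f"%{value:08b}  C={carry}  ROL  %{after:08b}  C={carry_after}")
    value, carry = after, carry_after

print(f"%{rotate8_left(0b10000000):08b}")  # the 'expected' rotate of %10000000
```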

Page Boundaries

A "page" is a 256-byte region of memory, spanning from $xx00 to $xxFF. This will often be referred to as "page xx." (e.g. page 03 = the $0300 to $03FF memory range.)

When an indexed addressing mode adds X or Y to a base address and the resulting address lands on a different page from the base (for example, LDA $20F0,X with X = $20 reads from $2110), the access is said to "cross a page boundary." This leads to a minor performance hit: the instruction may take an extra clock cycle that it wouldn't otherwise need. A taken branch whose destination is on a different page also costs an extra cycle.

However, the 6502 has a couple of quirks with the following addressing modes, and they are very similar in how they operate.

  • When using $nn,X or $nn,Y, if $nn + X or $nn + Y exceeds 255, the address wraps around back to $00 rather than advancing to $0100. For a more concrete example:
LDA $80,X ;with X = $FF, loads from address $007F, not $017F

In other words, an indexed zero page addressing mode cannot exit the zero page.

It should be noted that the above bug does not apply to indexed absolute addressing.

LDA $2080,X ;loads from the correct address regardless of the value of X.

A similar bug affects the indirect JMP instruction, which jumps to the 16-bit address stored (low byte first) at the specified address. Used normally, it works as expected:

LDA #$20
STA $3000
LDA #$40
STA $3001
JMP ($3000) ;evaluates to JMP $4020

The following, however, will not execute the way you expect, because JMP indirect doesn't advance to the next page when fetching the high byte of the address.

LDA #$20
STA $30FF
LDA #$40
STA $3100
JMP ($30FF) ;rather than taking the high byte from $3100, the high byte is taken from $3000 instead.

As long as the pointer address in the parentheses never has the form $xxFF, as shown above, you can avoid this bug entirely.
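A small Python sketch of how these effective addresses come out, assuming an NMOS 6502. The helper names are hypothetical, and the memory dict stands in for RAM, with unwritten addresses reading as 0 purely for illustration; page_crossed shows the extra-cycle case for absolute indexed access:

```python
def zero_page_indexed(base: int, index: int) -> int:
    """Effective address for $nn,X / $nn,Y: the sum wraps around inside page zero."""
    return (base + index) & 0xFF


def absolute_indexed(base: int, index: int) -> int:
    """Effective address for $nnnn,X / $nnnn,Y: the sum may carry into the next page."""
    return (base + index) & 0xFFFF


def page_crossed(base: int, index: int) -> bool:
    """True when an absolute indexed access lands on a different page than its base."""
    return (base & 0xFF00) != (absolute_indexed(base, index) & 0xFF00)


def jmp_indirect_target(memory: dict[int, int], pointer: int) -> int:
    """Destination of JMP ($nnnn) on an NMOS 6502.

    The low byte comes from the pointer address; the high byte comes from the next
    address *within the same page*, so a pointer at $xxFF takes its high byte from $xx00.
    """
    low = memory.get(pointer, 0)
    high = memory.get((pointer & 0xFF00) | ((pointer + 1) & 0xFF), 0)
    return (high << 8) | low


print(hex(zero_page_indexed(0x80, 0xFF)))   # 0x7f, not 0x17f
print(hex(absolute_indexed(0x2080, 0xFF)))  # 0x217f
print(hex(jmp_indirect_target({0x3000: 0x20, 0x3001: 0x40}, 0x3000)))  # 0x4020
buggy = {0x30FF: 0x20, 0x3100: 0x40, 0x3000: 0x12}
print(hex(jmp_indirect_target(buggy, 0x30FF)))  # 0x1220, not 0x4020
```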

68000 Assembly

Numeric Literals

Integer literals used in instruction operands need to begin with a #; otherwise, the assembler treats them as memory addresses. This applies to any integer representation, even ASCII character literals.

MOVE.L $12345678,D0  ;move the 32-bit value stored at memory address $12345678 into D0
MOVE.L #$12345678,D0 ;load the D0 register with the constant value $12345678

However, data blocks do not get the # treatment:

DC.B $45  ;this is the literal constant value $45, not "the value stored at memory address 0x0045"

LEA Does Not Dereference

When dereferencing a pointer, it is necessary to use parentheses. For example:

MOVEA.L #$A04000,A0  ;load the address $A04000 into A0
MOVE.L A0,D0 ;move the quantity $A04000 into D0
MOVE.B (A0),D0 ;get the 8-bit value stored at memory address $A04000, and store it into the lowest byte of D0.
MOVE.W (4,A0),D1 ;get the 16-bit value stored at memory address $A04004, and store it into the low word of D1.

However, the LEA instruction (load effective address) uses this parentheses syntax, but does not dereference! For extra weirdness, you don't put a # in front of a literal operand either.

LEA $A04000,A0  ;effectively MOVEA.L #$A04000,A0
LEA (4,A0),A0 ;effectively ADDA.L #4,A0

Partitioned Registers

One key feature of the 68000 is that its instructions can operate at different "lengths" (8-bit, 16-bit, or 32-bit). When performing an operation at the "word length" (16-bit), only the least significant 16 bits are affected, and the rest of the register is left unchanged. The flags also only reflect the result of the calculation with respect to the instruction's "length", not the entire register. For example:

MOVE.L #$12345678,D0 ;set the entire register to a known value for demonstration purposes.
MOVE.W #$7FFF,D0 ;D0 = $12347FFF
ADD.W #1,D0 ;D0 = $12348000
TRAPV ;the above operation set the overflow flag, so this instruction will call the signed overflow handler.
;Even though the entire register didn't overflow, the portion we were operating on did, so that counts.

As implied by the previous example, loading a value at a length less than 32 bits into a register will leave the "high bits" the same. This can often cause subtle errors that lead to your program failing unexpectedly.

MOVE.B #16-1,D0 ;intended to loop 16 times
forloop:
; loop body goes here
DBRA D0,forloop

The above code is flawed in that DBRA (and its cousins) operate at word length. Given that, and the fact that we only loaded the loop counter at byte length, the loop will execute $nn10 times instead of the intended $10 times, where $nn is the prior value of bits 8-15 of the register being used as the loop counter.

On most RISC architectures, loading a value smaller than 32 bits zero-extends or sign-extends it to fill the whole register. This is NOT the case on the 68000. Often, you'll need to "sanitize" the register you're using by clearing its upper bits yourself, using AND.W #$FF or AND.L #$FFFF, or by clearing the whole register first with MOVEQ #0,Dn.
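To make the arithmetic concrete, here is a hedged Python model of the three 68000 behaviours discussed above: ADD.W touching only the low word, MOVE.B leaving bits 8-31 alone, and DBRA counting with the low word. Registers are plain unsigned 32-bit integers, the helper names are illustrative, and only the V flag is modelled:

```python
def to_signed16(x: int) -> int:
    """Interpret a 16-bit value as two's complement."""
    return x - 0x10000 if x & 0x8000 else x


def add_w(reg: int, operand: int) -> tuple[int, bool]:
    """ADD.W #operand,Dn: only the low word changes; V reflects the 16-bit signed result."""
    low = reg & 0xFFFF
    result = (low + operand) & 0xFFFF
    overflow = to_signed16(low) + to_signed16(operand & 0xFFFF) != to_signed16(result)
    return (reg & 0xFFFF0000) | result, overflow


def move_b(reg: int, value: int) -> int:
    """MOVE.B #value,Dn: only the low byte is replaced; bits 8-31 keep their old contents."""
    return (reg & 0xFFFFFF00) | (value & 0xFF)


def dbra_iterations(reg: int) -> int:
    """How many times a loop body runs when the loop is closed by DBRA Dn.

    DBRA decrements only the low word and branches back until it wraps to -1 ($FFFF).
    """
    counter = reg & 0xFFFF
    runs = 0
    while True:
        runs += 1                          # the loop body executes
        counter = (counter - 1) & 0xFFFF   # DBRA decrements the low word
        if counter == 0xFFFF:              # ...and falls through at -1
            return runs


d0, v = add_w(0x12347FFF, 1)
print(hex(d0), v)                    # 0x12348000 True, so TRAPV would trap

d0 = move_b(0x12345678, 16 - 1)      # stale upper bits: D0 = $1234560F
print(dbra_iterations(d0))           # 22032 ($5610) rather than 16
print(dbra_iterations(move_b(0, 16 - 1)))  # 16 once the register was cleared first
```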

Automatic Sign-Extension

When moving values into address registers at word length, the value is sign-extended first.

MOVEA.W #$8000,A0 ;equivalent to MOVEA.L #$FFFF8000,A0

There is no sign-extension when moving values into data registers.

MOVE.W #$FF,D0   ;MOVE.W #$00FF,D0
MOVE.L #$8000,D2 ;MOVE.L #$00008000,D2

If you want sign-extension on data registers, you'll need to do it manually:

EXT.W D0 ;D0 = $????FFFF
MOVE.L #$8000,D1
EXT.L D1 ;D1 = $FFFF8000
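The sign-extension rules above can be modelled in a few lines of Python. This is a hedged sketch in which registers are plain unsigned 32-bit integers and the helper names are made up for illustration:

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Copy bit (from_bits - 1) into every higher bit, up to to_bits; result is unsigned."""
    value &= (1 << from_bits) - 1                     # keep only the source width
    if value & (1 << (from_bits - 1)):                # negative in the source width?
        value |= ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)  # fill upper bits with 1s
    return value


def movea_w(value: int) -> int:
    """MOVEA.W #value,An: the word is always sign-extended to the full 32-bit register."""
    return sign_extend(value, 16, 32)


def ext_w(reg: int) -> int:
    """EXT.W Dn: sign-extend the low byte into the low word; the upper word is untouched."""
    return (reg & 0xFFFF0000) | sign_extend(reg, 8, 16)


def ext_l(reg: int) -> int:
    """EXT.L Dn: sign-extend the low word into the whole 32-bit register."""
    return sign_extend(reg, 16, 32)


print(hex(movea_w(0x8000)))    # 0xffff8000
print(hex(ext_w(0x000000FF)))  # 0xffff
print(hex(ext_l(0x00008000)))  # 0xffff8000
```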

MIPS Assembly

Delay Slots

Due to the way the MIPS instruction pipeline works, the instruction placed immediately after a branch (the "branch delay slot") is always executed, whether or not the branch is taken.

move $t0,$zero   ;load 0 into $t0
beqz $t0,myLabel ;branch if $t0 equals 0
addiu $t0,1  ;add 1 to $t0. This happens during the branch, even though the program counter never reaches this instruction.

Now, you may think that the 1 gets added first and therefore the branch isn't taken, since the condition is no longer true. However, this is not the case. The condition is already "locked in" before addiu $t0,1 executes. If you compared again immediately upon arriving at myLabel, the condition would be false.

The easiest way to fix this is to put a NOP (which does nothing) after every branch.
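To see why the condition is "locked in", here is a toy Python interpreter with a single branch delay slot. It is a hypothetical, heavily simplified model (only move, addiu, beqz and nop; registers are a dict; labels map to instruction indices), not a MIPS emulator:

```python
def run(program: list[tuple], labels: dict[str, int], regs: dict[str, int]) -> list[str]:
    """Run a toy MIPS-like program that has one branch delay slot; return the executed ops."""
    trace = []
    pc = 0
    delayed_target = None  # branch target that takes effect after the delay slot
    while pc < len(program):
        op, *args = program[pc]
        new_target = None
        if op == "move":
            regs[args[0]] = regs[args[1]]
        elif op == "addiu":
            regs[args[0]] += args[1]
        elif op == "beqz" and regs[args[0]] == 0:
            # The condition is evaluated here, BEFORE the delay-slot instruction runs.
            new_target = labels[args[1]]
        trace.append(op)  # "nop" (or an untaken beqz) simply does nothing
        if delayed_target is not None:
            # The instruction just executed was a delay slot: now the branch happens.
            pc, delayed_target = delayed_target, None
        else:
            pc += 1
        if new_target is not None:
            delayed_target = new_target
    return trace


program = [
    ("move", "t0", "zero"),     # $t0 = 0
    ("beqz", "t0", "myLabel"),  # taken, because $t0 is 0 right now
    ("addiu", "t0", 1),         # delay slot: runs anyway
    ("addiu", "t0", 100),       # skipped by the branch
]
labels = {"myLabel": 4}         # myLabel sits just past the last instruction
regs = {"zero": 0, "t0": 7}
print(run(program, labels, regs), regs["t0"])
```

The trace shows the delay-slot addiu running and the branch still being taken, leaving $t0 equal to 1 on arrival at myLabel.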

On earlier versions of MIPS, this also happened when loading from memory. The register you loaded into wouldn't have the new value during the instruction after the load. This "load delay slot" doesn't exist on MIPS III (which the Nintendo 64 uses) but it does exist on the PlayStation 1.

la $a0,0x80100000
lw $t0,($a0) ;load the 32-bit value at memory address 0x80100000 (lw needs a 4-byte-aligned address)
addiu $t0,5  ;5 is actually added BEFORE the register has gotten its new value from the memory load above. It will be clobbered.

Like with branches, putting a NOP after a load will solve this problem.


J

Issues with array rank and type should perhaps be classified as gotchas. J's display forms are not serialized forms, and thus different results can look the same.

   ex1=: 1 2 3 4 5
   ex2=: '1 2 3 4 5'
   ex1
1 2 3 4 5
   ex2
1 2 3 4 5
   10+ex1
11 12 13 14 15
   10+ex2
|domain error

Also, constant values with a single element are "rank 0" arrays (they have zero dimensions) while constant values with some other count of elements are "rank 1" arrays (they have one dimension -- the count of their elements -- they are lists).

Thus, 'a' is type character, while 'abc' is type list of characters (or type list of 3 characters, depending on how you view type systems). This can lead to surprises for people who are inexperienced with the language and who are working from example (todo: list some examples of this).

Another gotcha with J has to do with function composition and J's concept of "rank". Many operations, such as +, are defined on individual numbers, and J automatically maps them over larger collections of numbers. This matters when composing functions, so a variety of J's function composition operators come in pairs. One member of the pair composes at the rank of the initial function; the other composes at the rank of the entire collection. Picking the wrong compose operation can be confusing for beginners.

For example:
   1 2 3 + 4 5 6
5 7 9
   +/ 1 2 3 + 4 5 6
21
   1 2 3 +/@:+ 4 5 6
21
   1 2 3 +/@+ 4 5 6
5 7 9

Here, we are adding two lists and then (after the first sentence) summing the result. But as you can see in the last sentence, summing the individual numbers by themselves doesn't accomplish anything useful.
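J itself is the best place to experiment, but the distinction can also be sketched in Python as a loose analogy. This only models + and +/ on two flat lists, not J's general rank rules, and all names are hypothetical:

```python
def plus(x, y):
    """Like J's dyadic +, which has rank 0: on two equal-length lists it works atom by atom."""
    if isinstance(x, list):
        return [plus(a, b) for a, b in zip(x, y)]
    return x + y


def total(x):
    """Stand-in for J's +/ : sums a list; 'summing' a single atom just returns that atom."""
    return sum(x) if isinstance(x, list) else x


def compose_whole(f, g):
    """Like J's u@:v : f sees the entire result of g."""
    return lambda x, y: f(g(x, y))


def compose_at_rank0(f, g):
    """Like J's u@v when v has rank 0 (as + does): f is applied to each atom's result."""
    return lambda x, y: [f(g(a, b)) for a, b in zip(x, y)]


x, y = [1, 2, 3], [4, 5, 6]
print(plus(x, y))                           # [5, 7, 9]
print(compose_whole(total, plus)(x, y))     # 21, like +/@:+
print(compose_at_rank0(total, plus)(x, y))  # [5, 7, 9], like +/@+
```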


Julia

There are several "gotchas" in Julia related to when and how a variable or object is considered constant versus non-constant. In Julia, a global object declared as `const` is constant:

const x = 2
x = 1 # will trigger a JIT Julia compiler error

However, arrays in Julia are mutable even if the variable name of the array is constant:

const a = [1, 2]
push!(a, 4); a[2] = 0; # No error, and `a` is now [1, 0, 4]
a = [0, 0] # compiler error triggered by this, since we are assigning `a` itself, not its mutable contents

If you want the contents of a list to be immutable, make a tuple instead of an array:

t = (1, 2, 3)  # now t[2] = 0 is flagged as an error

In Julia, a type declared with `struct` is immutable, yet it can contain arrays that remain mutable even as part of the immutable struct:

struct S
    x::Int
    a::Vector{Int}
end

s = S(5, [3, 6])
s.x = 2 # ERROR
s.a[1] = 2 # Not an error!

but a `struct` declared as `mutable` is fully mutable:

mutable struct SM
    x::Int
    a::Vector{Int}
end

s = SM(5, [3, 6])
s.x = 2 # Not an error
s.a[1] = 2 # Not an error

In Julia, a non-`const` variable declared in global scope (outside of a function) can be changed in global scope without any issue (although global variables need extra bookkeeping and may be slow). However, inside a function such a variable can only be read. Assigning to it inside a function does not change the global at all: it silently creates a new local variable of the same name (and an update such as `h += b` fails with an UndefVarError), unless the variable is declared within the function with the `global` keyword:

h = 5  # h is a global variable

function triangle(b)
    return h * b / 2
end

triangle(10) # returns 25.0, no error

function changeh(b)
    h = b  # no error, but this h is a new local variable; the global h is still 5
end

function change_declared_h(b)
    global h
    h = b  # changes the global h
end


Perl

Perl has lists (which are data, and ephemeral) and arrays (which are data structures, and persistent). They are distinct entities but tend to be thought of as interchangeable. Combine this with the idea of context, which can be 'scalar' or 'list', and the results might not be as expected. Consider the handling of results from a subroutine in scalar context:

use feature 'say';

sub array1 { return @{ [ 1, 2, 3 ] } }
sub list1 { return qw{ 1 2 3 } }
# both print '3', but why exactly?
say scalar array1();
say scalar list1();
sub array2 { return @{ [ 3, 2, 1 ] } }
sub list2 { return qw{ 3 2 1 } }
say scalar array2(); # prints '3', number of elements in array
say scalar list2(); # prints '1', last item in list

The behavior is documented, but it does provide an evergreen topic for Stack Overflow questions and blog posts.


Phix

Once I hear about a gotcha, I usually just fix it, so this might be a bit of a struggle...

There are, however, a few things to bear in mind when running under pwa/p2js (and sadly I can't "just fix JavaScript"):

In JavaScript, true is not strictly equal to 1, nor false to 0 (true === 1 is false). Thankfully, there are very few places anyone ever actually compares bools against 0 and 1 using an infix operator, but occasionally you may need to use equal() and compare() instead, for true compatibility between desktop/Phix and p2js.

Likewise, negative subscripts simply do not work in JavaScript array destructuring, but you can still use $ and negative subscripts in non-destructuring operations.
[To be clear, I am specifically talking about working desktop/Phix code that gets transpiled to JavaScript, as opposed to making any claim about negative subscripts in hand-written JavaScript.]

One case that proved very difficult to track down was the statement tree[node+direction] = insertNode(tree[node+direction], key). As said elsewhere, you should never attempt to modify the same thing twice in the same line. Breaking it into atom tnd = insertNode(tree[node+direction], key) and tree[node+direction] = tnd was needed to fix the issue.

See also Variable_declaration_reset - in particular the Phix and JavaScript entries.

Some fairly common minor mishaps:

Novice users often confuse a &= b with a = append(a,b):

  • When b is an atom (aka number) they mean the same thing.
  • When b is a sequence (or string) they are not the same:
      • a &= b can increase the length of a by any amount, including 0. [good for building strings]
      • a = append(a,b) always increases the length of a by 1. [usually bad/wrong for building strings]
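Python lists can stand in for Phix sequences to sketch the difference. This is a hypothetical model: concat plays the role of &= and append the role of Phix's append(); it is not Phix code:

```python
def concat(a: list, b) -> list:
    """Like Phix a &= b: an atom is added as one element, a sequence's elements are spliced in."""
    return a + (b if isinstance(b, list) else [b])


def append(a: list, b) -> list:
    """Like Phix append(a, b): b always becomes exactly one new element."""
    return a + [b]


a = [1, 2]
print(concat(a, 3), append(a, 3))              # same result for an atom
print(concat(a, [3, 4]), append(a, [3, 4]))    # [1, 2, 3, 4] vs [1, 2, [3, 4]]
print(len(concat(a, [])), len(append(a, [])))  # 2 vs 3: &= can add zero elements
```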

Forward calls may thwart constant setup, eg:

forward procedure p()
function f(object o) return o end function
constant hello = f("hello")
procedure p()
    ?hello  -- fatal error: hello has not been assigned a value
end procedure

Not a problem if the first executed statement in your program is a final main(), or more accurately not a problem after such a last statement has been reached.
Quite a few of the standard builtins avoid a similar issue, at the cost of things not officially being "constant" anymore, using a simple flag and setup routine.

Somewhat more tongue in cheek:

There is no difference between if a=b then and if a==b then, and neither modifies a.
It is not possible to compose a dangling else in Phix.
Phix has no macros. Or any that can make innocent-looking code "mean just about anything".
041 is the same number as 41, whereas 0o41 is the octal representation of 33 decimal.
"Hello" is not really the same as {'H','e','l','l','o'} but they are treated pretty much as if they are.
An expression such as s+1 may, with a suitable warning message, get auto-corrected to sq_add(s,1).
Block comments can be nested, and `a and b or c` is illegal and demands extra parentheses.
Zero minus one is always -1 instead of the traditionally expected 4,294,967,295.
Indexes are 1-based and s[0] triggers a run-time error.
(It surprises me to read Andrew Koenig in ctraps.pdf saying "In most languages, an array with n elements normally has subscripts ranging from 1 to n inclusive.")


Raku

Raku embraces the philosophy of DWIM or "Do What I Mean". A DWIMmy language tends to avoid lots of boilerplate code and accepts a certain amount of imprecision to return the result that, the vast majority of the time, is what the user intended.

For example, a numeric string may be treated as either a numeric value or a string literal, without needing to explicitly coerce it to one or the other. This makes for easy translation of ideas into code, and is generally a good thing. HOWEVER, it doesn't always work out. Sometimes it leads to unexpected behavior, commonly referred to as WAT.

It is something of a running joke in the Raku community that "For every DWIM, there is an equal and opposite WAT".

Larry Wall, the author and designer of Perl and lead designer of Raku, coined a term to describe this DWIM / WAT continuum: the Waterbed Theory of Computational Complexity.

The Waterbed theory is the observation that complicated systems contain a minimum amount of complexity, and that attempting to "push down" the complexity of such a system in one place will invariably cause complexity to "pop up" elsewhere.

It is much like a waterbed mattress: you can push the mattress down in one place, but the displaced water will always make it rise elsewhere, because water does not compress. It is impossible to push down the waterbed everywhere at once.

There is a whole chapter in the Raku documentation about "Traps to Avoid" when beginning in Raku, most of which are due, at least partially, to WATs arising from DWIMmy behavior someplace else.

Expanding on the numeric string example cited above: numeric values and numeric strings may be used almost interchangeably in most cases.

say 123 ~ 456; # join two integers together
say "12" + "5.7"; # add two numeric strings together
say .sqrt for <4 6 8>; # take the square root of several allomorphic numerics

You can run into problems though with certain constructs that are more strict about object typing.

A Bag is a "counting" construct. It takes a collection and counts how many of each object are within. Works great for strings.

say my $bag = <a b a c a b d>.Bag;
say $bag{'a'}; # a count?
say $bag< a >; # another way
Output:
Bag(a(3) b(2) c d)
3
3

But numerics can present unobvious problems.

say my $bag = (1, '1', '1', <1 1 1>).Bag;
say $bag{ 1 }; # how many 1s?
say $bag{'1'}; # wait, how many?
say $bag< 1 >; # WAT
dd $bag; # The different numeric types LOOK the same, but are different types behind the scenes
Output:
Bag(1 1(2) 1(3))
1
2
3
Bag $bag = (1=>1,"1"=>2,IntStr.new(1, "1")=>3).Bag

The different '1's are distinct to the type system even though they look identical when printed to the console. They all have a value of 1 but are, respectively, an Int, a Str, and an IntStr allomorph. Many of the "collective" objects have this property (Bags, Sets, Maps, etc.). This behavior is correct, but it can be very jarring when you are used to using numeric strings and numeric values nearly interchangeably.

Another such DWIMmy construct, which can trip up even experienced Raku programmers is The Single Argument Rule.

The Single Argument Rule is a special exception to the usual parameter rules: when only a single iterable object is passed to an iterating construct such as for, it is automatically flattened.

For example, say we have two small lists, (1,2,3) and (4,5,6), and we want to print their contents to the console.

.say for (1,2,3), (4,5,6);

However, if we only pass a single list to the iterator (for), it will flatten the object due to the single argument rule.

.say for (1,2,3);

If we want the multiple object non-flattening behavior, we need to "fake it out" by adding a trailing comma to signal to the compiler that this should be treated like multiple object parameters even if there is only one. (Note the trailing comma after the list.)

.say for (1,2,3),;

Conversely, if we want the flattening behavior when passing multiple objects, we need to manually, explicitly flatten the objects.

.say for flat (1,2,3), (4,5,6);

The Single Argument Rule mostly arose to make Raku act more like Perl 5, which Raku was originally conceived as a replacement for. Perl 5 flattens collective object parameters by default, and the original non-flattening behavior was extremely confusing to early Perl / Raku crossover programmers. Larry Wall came up with the Single Argument Rule to reduce confusion and increase DWIMiness, and overall it has, but it still leads to the occasional WAT when programmers first bump up against it.


Wren

There are 3 gotchas in Wren which immediately spring to mind because they've bitten me more than once in the past.

1. The classic 'if (a = b) code' problem which everyone who's familiar with C/C++ will know about and which is already adequately described in the linked Wikipedia article together with the standard remedy.

2. In Wren, class fields (unlike variables) are never declared; instead, their existence is deduced from usage, which can be in any method of the class. This is possible because fields (which are always private to the class) are prefixed with a single underscore if instance or a double underscore if static, and no other identifiers are allowed to begin with an underscore. A field holds the value null until it is first assigned.

Normally this works fine, but the problem is that if you misspell a field name, the compiler won't pick it up; it will simply allocate a slot for the misspelled field in the class's field symbol table. You may therefore end up assigning to or referencing a field which has been created by mistake!

The only defence against this gotcha is to try to keep field names short, which reduces the chance of misspelling them.

3. Wren's compiler is single pass, and if it comes across a method call within a class which has not been previously defined, it assumes that it will be defined later. Consequently, if you forget to define this method later, or have already defined it but then misspelled or misused it, the compiler won't alert you to this and at some point a runtime error will arise.

The same defence as in 2. above can be used to defend against this gotcha though, if the methods are public (and hence need self-explanatory names), it may not be practical to keep them short.

I've tried to construct an example below which illustrates these pitfalls.

class Rectangle {
    construct new(width, height) {
        // Create two fields.
        _width = width
        _height = height
    }

    area {
        // Here we misspell _width.
        return _widht * _height
    }

    isSquare {
        // We inadvertently use '=' rather than '=='.
        // This sets _width to _height and will always return true
        // because any number (even 0) is considered 'truthy' in Wren.
        if (_width = _height) return true
        return false
    }

    diagonal {
        // We use 'sqrt' instead of the Math.sqrt method.
        // The compiler thinks this is an instance method of Rectangle
        // which will be defined later.
        return sqrt(_width * _width + _height * _height)
    }
}

var rect = Rectangle.new(80, 100)
System.print(rect.isSquare) // prints true, which it isn't!
System.print(rect.area)     // runtime error: Null does not implement *(_)
System.print(rect.diagonal) // runtime error (if previous line commented out):
                            // Rectangle does not implement 'sqrt(_)'

X86 Assembly

Don't use LOOP

This doesn't affect the 16-bit 8086, but on modern x86 processors LOOP is slower than the equivalent two-instruction sequence:

;loop body goes here
DEC ECX
JNZ label

This is very ironic, considering that LOOP was originally designed to be a more efficient version of the above construct. (It was more efficient on the original 8086, but not on today's processors.) Thankfully, compilers are "aware" of this and generally don't emit LOOP.

Z80 Assembly

JP (HL)

For every other instruction, parentheses indicate a dereference of a pointer.

LD HL,(&C000)  ;load the word at address &C000 into HL
LD A,(HL) ;treating the value in HL as a memory address, load the byte at that address into A.
EX (SP),HL ;exchange HL with the top two bytes of the stack.
JP (HL) ;set the program counter equal to HL. Nothing is loaded from memory pointed to by HL.

Strangely enough, 8080 Assembly uses a more sensible PCHL (set Program Counter equal to HL) to describe this function. So this gotcha is actually exclusive to Z80.


Depending on how the Z80 is wired, ports can be either 8-bit or 16-bit. This creates somewhat confusing syntax with the IN and OUT commands. During IN r,(C) and OUT (C),r, the Z80 puts B on the upper half of the address bus, so a system that decodes 16-bit port addresses effectively uses BC, even though the assembler syntax only mentions C. Luckily, this isn't something that's going to change at runtime. The documentation will tell you how to use your ports.

ld a,&46
ld bc,&0734
out (C),a ;write &46 to port &0734 if the ports are 16-bit. Otherwise, it writes to port &34.

Unfortunately, this means that instructions like OTIR and INIR aren't always useful, since the B register performs double duty as both the high byte of the port and the loop counter. On systems with 16-bit ports, your port destination is therefore constantly moving! Not good!
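A hedged Python sketch of which port addresses an OTIR loop drives, assuming (as the Zilog documentation describes for OUTI/OTIR) that B is decremented before each output and that the decremented B appears on the upper eight address lines. otir_ports is a hypothetical helper:

```python
def otir_ports(b: int, c: int) -> list[int]:
    """Return the 16-bit port address used for each byte an OTIR loop sends."""
    ports = []
    while True:
        b = (b - 1) & 0xFF            # B is the byte counter and is decremented first
        ports.append((b << 8) | c)    # B drives A8-A15, C drives A0-A7
        if b == 0:                    # OTIR repeats until B reaches 0
            return ports


ports = otir_ports(3, 0x34)
print([hex(p) for p in ports])         # ['0x234', '0x134', '0x34'] with 16-bit port decoding
print({hex(p & 0xFF) for p in ports})  # {'0x34'}: an 8-bit decoder sees one port throughout
```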


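Here's one I didn't learn until recently. RETI (return from interrupt) does not re-enable interrupts by itself. Apart from returning, its only extra effect is a signal that Z80-family peripherals (such as the PIO and CTC) watch for, so on a system without them it behaves just like a plain RET. RETN (return from Non-Maskable Interrupt) only restores the interrupt-enable state that was saved when the NMI occurred. This means you usually have to use the following (which just makes anyone else reading your code think that you don't know what RETI does.)

EI   ;RETI doesn't enable interrupts by itself.
RETI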

Fortunately, there's a bit of a "reverse gotcha" that helps us out. EI doesn't actually enable interrupts until the instruction after it has finished, so no interrupt can sneak in between an EI and the return instruction that follows it.