Category:Z80 Assembly: Difference between revisions

===Inlined bytecode===
Branches cost a fair number of clock cycles, even when they're not taken. A conditional JR takes 7 cycles when not taken and 12 when taken (JP is the exception, taking 10 cycles regardless of the outcome), so either way you're taking a performance hit. It's especially frustrating when you have to branch around a single instruction:
<syntaxhighlight lang="Z80">
jr nc,foo
ld a,&44
jr done
foo:
ld a,&40
done:
pop hl
ret
</syntaxhighlight>
 
But in this (admittedly contrived) example, there's an esoteric way to avoid the <code>jr done</code> while keeping the same functionality. It has to do with the <code>LD a,&40</code>. As it turns out, &40 is the opcode for <code>LD b,b</code>, and since register-to-register loads on the Z80 don't affect flags, this <code>LD b,b</code> instruction will have no effect on our program. We can actually trick the CPU into executing the operand &40 as an instruction, like so:
 
<syntaxhighlight lang="Z80">
jr nc,foo
ld a,&44
byte &2E ;opcode for LD L,n (the next byte becomes its operand), so it swallows the &3E below. Stands in for "JR done"
foo:
ld a,&40
done:
pop hl
ret
</syntaxhighlight>
 
Since the Z80 is an Intel-like CISC architecture with variable-length instructions, the same sequence of bytes can be interpreted in different ways depending on where the Program Counter starts decoding. So even though our source code appears to have a random data byte in the middle of the instructions, what the CPU actually executes is this, in the event that <code>jr nc,foo</code> is <i>not</i> taken:
 
<syntaxhighlight lang="Z80">
ld a,&44
ld L,&3e ;&3E is the opcode for LD A,__ (next byte is operand)
ld b,b ;do nothing
done:
pop hl
ret
</syntaxhighlight>
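
Laid out byte by byte, the overlap is easier to see. The following is only a sketch: the load address &8000 and the <code>org</code> line are assumptions made for illustration, and the opcode values are taken from the standard Z80 opcode table, where &2E is <code>LD L,n</code> (&26 is <code>LD H,n</code>, which would be just as harmless here because HL is popped straight afterwards).

<syntaxhighlight lang="Z80">
org &8000 ;hypothetical load address, chosen only for illustration
byte &30  ;&8000: jr nc,foo
byte &03  ;&8001: offset +3, counted from &8002, lands on &8005
byte &3E  ;&8002: ld a,n
byte &44  ;&8003: operand of the ld a,n above
byte &2E  ;&8004: ld l,n (takes the next byte as its operand)
byte &3E  ;&8005: foo. Jumped to: ld a,n. Fallen into: operand of ld l,n
byte &40  ;&8006: Jumped to: operand of ld a,n. Fallen into: ld b,b
byte &E1  ;&8007: done. pop hl
byte &C9  ;&8008: ret
</syntaxhighlight>

Both paths meet again at &8007, so only the bytes at &8005 and &8006 are ever read two different ways.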
 
Since we're popping HL anyway, it won't hurt to load &3E into L beforehand, as it's about to be overwritten. Effectively, we skipped the <code>LD a,&40</code> without branching. Now you may be wondering why you'd want to go through all this trouble. As it turns out, using <code>byte &2E</code> (the opcode for <code>LD L,n</code>; &26 is <code>LD H,n</code>, which would be just as harmless here) saves 1 byte compared to using <code>jr done</code>, and by the standard Z80 timings it is also slightly faster: <code>LD L,&3E</code> (7 cycles) plus <code>LD b,b</code> (4 cycles) comes to 11 cycles, against 12 for the <code>JR</code>. The only thing you lose in this exchange is readability, which is why comments are so essential with tricks like these. I'd recommend commenting the "correct" instruction beside it so it's clear what you're optimizing. Avoid the temptation to abuse these tricks because they make you feel clever. It's just another tool in your toolbox, like anything else.
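
As a rough tally of the path where the branch is <i>not</i> taken, here are the sizes and timings side by side. The cycle counts are the standard documented Z80 figures and assume no wait states, so treat them as an estimate; a machine that stretches instructions will give different totals.

<syntaxhighlight lang="Z80">
;with "jr done"             bytes  cycles
jr nc,foo                  ;  2      7   (not taken)
ld a,&44                   ;  2      7
jr done                    ;  2     12
;subtotal up to "done"     ;  6     26

;with the inlined opcode    bytes  cycles
jr nc,foo                  ;  2      7   (not taken)
ld a,&44                   ;  2      7
ld l,&3e                   ;  1      7   (only the opcode byte is new; the operand is foo's first byte)
ld b,b                     ;  0      4   (reuses the operand byte of ld a,&40)
;subtotal up to "done"     ;  5     25
</syntaxhighlight>

The path where the branch is taken is unchanged by the trick: <code>jr nc,foo</code> (12 cycles) followed by <code>ld a,&40</code> (7 cycles) in both versions.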