Category:Z80 Assembly: Difference between revisions

===Bit Shifting===
The Z80 does have bit shifting, but thanks to RLCA and RRCA it's often faster to rotate instead. Compare the following two code snippets:
<syntaxhighlight lang="Z80">
SLA A
SLA A
SLA A
SLA A
; ...
RLCA
RLCA
RLCA ;6 bytes, 23 cycles total
</syntaxhighlight>
 
Not only is the second method shorter, it's also faster. The accumulator-specific rotates take 1 byte and 4 clock cycles each, versus 2 bytes and 8 cycles for the two-byte shifts and rotates. One difference is that, unlike the two-byte versions, these <i>do not affect the zero flag.</i> This is rarely a problem, as more often than not if you're rotating the accumulator you're not expecting to get zero as the output anyway.
 
Note, however, that <code>AND</code> <i>does</i> affect the Z flag, so a rotate sequence that ends in an <code>AND</code> still leaves Z reflecting the result. The rotates themselves leave the Z flag untouched, and zero rotated is still zero.

If you need to know whether the result is zero, you can use this:
 
<syntaxhighlight lang="Z80">
RLCA
RLCA
RLCA
RLCA
AND %11110000 ;6 bytes, 23 cycles total
</syntaxhighlight>
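
As a rough worked trace, assume A starts as %10110101 (a value picked purely for illustration; the label <code>was_zero</code> is likewise hypothetical). The rotate-and-mask sequence ends with the same value four left shifts would give, and the final <code>AND</code> leaves the Z flag usable for a branch:

<syntaxhighlight lang="Z80">
ld a,%10110101 ;illustrative starting value
RLCA ;A = %01101011
RLCA ;A = %11010110
RLCA ;A = %10101101
RLCA ;A = %01011011 (the two nibbles have swapped places)
AND %11110000 ;A = %01010000, the same as four SLA A; Z is set only if A is now zero
jr z,was_zero ;not taken here, since A is nonzero
ret
was_zero: ;hypothetical handler for a zero result
ret
</syntaxhighlight>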
 
For 16-bit shifts, use A in place of one half of the register pair for faster results (unless you're checking for equality to zero, or you need the accumulator for something else). <code>RRA</code> takes 1 byte and 4 cycles, where <code>RR L</code> takes 2 bytes and 8 cycles.
 
<syntaxhighlight lang="Z80">
rept 4 ;inline the following 4 times, back to back:
SRL H
RR L
; ...
SRL H
RRA
endr
</syntaxhighlight>
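
A fuller sketch of the accumulator version, assuming the value to shift is in HL and that A is free to clobber. The two <code>LD</code> instructions around the loop are an assumption about how the low byte gets into and out of A, not something taken from the snippet above:

<syntaxhighlight lang="Z80">
ld a,l ;work on the low byte in A
rept 4 ;inline the following 4 times, back to back:
SRL H ;bit 0 of H falls into the carry flag
RRA ;carry enters bit 7 of A, bit 0 of A is shifted out
endr
ld l,a ;HL now holds the original HL shifted right by 4
</syntaxhighlight>

The two extra loads cost 2 bytes and 8 cycles in total, while each iteration saves 1 byte and 4 cycles over <code>RR L</code>, so the gain grows with the shift count.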
 
===Inlined bytecode===
Branches take a fair number of clock cycles, even if they're not taken. A branch not taken is faster than one that is (except <code>JP</code>, which takes 10 cycles regardless of the outcome), but either way you're taking a performance hit. It's especially frustrating when you have to branch around a single instruction:
<syntaxhighlight lang="Z80">
jr nc,foo
ld a,&44
jr done
foo:
ld a,&40
done:
pop hl
ret
</syntaxhighlight>
 
But in this (admittedly contrived) example, there's an esoteric way to avoid the <code>jr done</code> while still having the same functionality. It has to do with the <code>LD a,&40</code>. As it turns out, &40 is the opcode for <code>LD b,b</code>, and since loading a register on the Z80 doesn't affect flags, this <code>LD b,b</code> instruction will have no effect on our program. We can actually trick the CPU into executing the operand &40 as an instruction, like so:
 
<syntaxhighlight lang="Z80">
jr nc,foo
ld a,&44
byte &26 ;opcode for LD H,# (next byte is operand.) Functionally identical to "JR done" here
foo:
ld a,&40
done:
pop hl
ret
</syntaxhighlight>
 
Since Z80 is an Intel-like CISC architecture with variable-length instructions, the same sequence of bytes can be interpreted in different ways depending on where the Program Counter starts decoding. So even though our source code appears to have a random data byte in the middle of the instructions, what the CPU actually executes is this, in the event that <code>jr nc,foo</code> is <i>not</i> taken:
 
<syntaxhighlight lang="Z80">
ld a,&44
ld H,&3e ;&3E is the opcode for LD A,__ (next byte is operand)
ld b,b ;do nothing
done:
pop hl
ret
</syntaxhighlight>
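
To make the overlap concrete, here is a byte-level view of the same routine. The offsets are relative to the first instruction and are purely illustrative; the opcode bytes are the standard Z80 encodings, in which &26 is <code>LD H,n</code>:

<syntaxhighlight lang="Z80">
;offset bytes   as written       as executed on fall-through
;+0     30 03   jr nc,foo        jr nc,foo (not taken)
;+2     3E 44   ld a,&44         ld a,&44
;+4     26      byte &26         ld h,&3E (swallows the byte at +5 as its operand)
;+5     3E 40   foo: ld a,&40    ld b,b (only the &40 at +6 is decoded)
;+7     E1      done: pop hl     pop hl
;+8     C9      ret              ret
</syntaxhighlight>

When the <code>jr nc,foo</code> is taken, decoding starts at offset +5 instead, so the bytes 3E 40 are read as <code>ld a,&40</code> as written.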
 
Since we're popping HL anyway, it won't hurt to load &3E into H beforehand, as it's just going to get overwritten. Effectively, we skipped the <code>LD a,&40</code> without branching. Now you may be wondering why you'd want to go through all this trouble. As it turns out, using <code>byte &26</code> in this situation saves 1 byte and 1 clock cycle compared to using <code>jr done</code>: the <code>JR</code> is 2 bytes and 12 cycles, while the single extra byte costs 7 cycles for the <code>LD H,n</code> plus 4 for the <code>LD b,b</code>. The only thing you lose in this exchange is readability (which is why comments are so essential with tricks like these - I'd recommend commenting the "correct" instruction beside it so it's clear what you're optimizing.) Avoid the temptation to abuse these tricks because it makes you feel clever - it's just another tool in your toolbox, like anything else.

Some programmers have taken this to its logical extreme by filling several kilobytes' worth of memory with <code>LDI</code> instructions, then picking how many they want to execute by offsetting a pointer into that section of memory. If you have plenty of bytes to burn and the need for speed, it can be a viable option.
 
<syntaxhighlight lang="Z80">
align 256 ;ensures that the first LDI begins at address &xx00
rept &7FF
LDI ;LDI takes up two bytes each, so by storing &7FF of them we fill up &0FFE bytes, nearly 4k!
endr
RET
</syntaxhighlight>
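
One possible way to enter such a block partway through is sketched below. It assumes a label, here called <code>ldi_chain_end</code>, has been placed on the final <code>RET</code> of the block; both that label and the routine name <code>copy_unrolled</code> are hypothetical and not part of the code above:

<syntaxhighlight lang="Z80">
;assumes the block above ends with:
;ldi_chain_end:
;RET
copy_unrolled: ;in: HL = source, DE = destination, BC = byte count (0 to &7FF)
push hl ;save the source pointer
ld hl,ldi_chain_end
or a ;clear carry before the first SBC
sbc hl,bc ;cannot borrow: the chain alone puts ldi_chain_end above &7FF
sbc hl,bc ;HL = ldi_chain_end - 2*BC, since each LDI is two bytes
ex (sp),hl ;HL = source again; the entry address is now on top of the stack
ret ;"return" into the chain; its final RET then returns to our caller
</syntaxhighlight>

Each <code>LDI</code> takes 16 cycles, against 21 per repeated byte for <code>LDIR</code>, which is where the speed gain comes from. Entering through the stack keeps HL, DE and BC free for the copy itself.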
 
==References==