Category:Z80 Assembly: Difference between revisions

m (changed tags to display code correctly, corrected align 8 to 256)
 
(6 intermediate revisions by the same user not shown)
Line 46:
* & or $ represents hexadecimal, and % represents binary.
* A register or a 16-bit value in parentheses represents a pointer being dereferenced. The size of the data pointed to will be the same as the size of the destination.
<syntaxhighlight lang="Z80">
LD A,($4000) ;load the BYTE at memory address $4000 into the accumulator.
LD HL,($6000) ;load the WORD at memory address $6000 into HL.
;Or, more accurately, load the BYTE at $6000 into L and the BYTE at $6001 into H.
LD A,(HL) ;The value stored in HL is treated as a memory address, and the BYTE at that address is loaded into the accumulator.
</syntaxhighlight>
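
As a quick illustration of the notation listed above, the first four loads below all put the same value into the accumulator, and the last two show the difference the parentheses make. This is only a sketch, and it assumes an assembler that accepts all of these prefixes; many accept only <code>&</code> or only <code>$</code> for hexadecimal.
<syntaxhighlight lang="Z80">
LD A,&2A ;hexadecimal with the & prefix
LD A,$2A ;the same value with the $ prefix
LD A,%00101010 ;the same value in binary
LD A,42 ;the same value in decimal
LD HL,$6000 ;no parentheses: HL now holds the number $6000
LD HL,($6000) ;parentheses: HL now holds the WORD stored at address $6000
</syntaxhighlight>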
 
==Efficient Coding Tricks==
Line 56 ⟶ 58:
===Everyone's favorite===
This is where you start optimizing Z80 code:
<syntaxhighlight lang="Z80">
XOR A ; set A to zero
; Compared to LD A,0 this saves 1 byte and 3 cycles, but also changes the flags.
</syntaxhighlight>
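
Because <code>XOR A</code> changes the flags, it is not a drop-in replacement everywhere. The sketch below shows one case where it is not safe. The names <code>limit</code> and <code>below</code> are hypothetical, made up for this illustration.
<syntaxhighlight lang="Z80">
CP limit ;compare A with a (hypothetical) constant; carry is set if A < limit
LD A,0 ;2 bytes, 7 cycles. Leaves the flags from CP intact
JR c,below ;still tests the result of the CP

CP limit
XOR A ;1 byte, 4 cycles. Clears carry and sets zero
JR c,below ;never taken: the result of the CP has been overwritten
</syntaxhighlight>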
 
===Fast Checking for Odd or Even===
Suppose you have a byte at some memory location and you want to know if it's odd or even.
 
<syntaxhighlight lang=z80>
LD A,(HL) ; 1 byte, 7 cycles
BIT 0,A ; 2 bytes, 8 cycles
JR nz,odd
</syntaxhighlight>

If you don't need the actual value, you can use:
<syntaxhighlight lang=z80>
BIT 0,(HL) ; 2 bytes, 12 cycles
JR nz,odd
</syntaxhighlight>
 
But there are also a few other ways to do it, which are faster and/or take fewer bytes to encode. <code>BIT 0,A</code> takes 2 bytes to encode. So does <code>AND 1</code>, but it's one cycle faster (7 cycles instead of 8). It does destroy the accumulator, but if you don't care about that, <code>AND 1</code> is the better choice.
 
<syntaxhighlight lang=z80>
LD A,(HL) ; 1 byte, 7 cycles
AND 1 ; 2 bytes, 7 cycles
JR nz,odd
</syntaxhighlight>
 
Or is it? There's an even faster way than this that takes only ONE byte to encode:
<syntaxhighlight lang=z80>
LD A,(HL) ; 1 byte, 7 cycles
RRCA ; 1 byte, 4 cycles
JR c,odd
</syntaxhighlight>
 
If you don't care about destroying the accumulator (i.e. you only need the result of the test and not the number being tested) then <code>RRCA</code> outperforms <code>AND 1</code> in every way. <code>AND 1</code> in this case still has its uses if you need bit 0 to reflect the result of the test. If you don't, use <code>RRCA</code> instead.
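
As an example of needing bit 0 to reflect the result of the test, the following sketch counts the odd bytes in a buffer without branching on each one. It assumes that HL points to the buffer, that B holds its length (1 to 255), and that the tally is kept in C. The setup and the label <code>count_loop</code> are invented for this illustration.
<syntaxhighlight lang=z80>
LD C,0 ;tally of odd bytes
count_loop:
LD A,(HL) ;fetch the next byte
AND 1 ;A = 1 if the byte is odd, 0 if it is even
ADD A,C
LD C,A ;C = C + (1 or 0)
INC HL
DJNZ count_loop ;repeat B times
</syntaxhighlight>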
 
===Bit Shifting===
The Z80 does have bit shifting, but thanks to RLCA and RRCA it's often faster to rotate instead. Compare the following two code snippets:
<syntaxhighlight lang="Z80">
SLA A
SLA A
SLA A
SLA A
Line 94 ⟶ 107:
RLCA
RLCA
RLCA ;6 bytes, 23 cycles total
</syntaxhighlight>
 
Not only is the second method shorter, it's also faster. The accumulator-specific rotates take 1 byte and 4 clock cycles each. Unlike the two-byte versions, however, these <i>do not affect the zero flag.</i> This is rarely a problem, as more often than not if you're rotating the accumulator you're not expecting to get zero as the output anyway.
 
Note that the <code>AND</code> <i>does</i> affect the Z flag, and the Z flag is preserved through the rotates. Since zero rotated is still zero, the end result is the same.
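
A minimal sketch of what this means in practice, assuming the mask is applied before the rotates (the label <code>was_zero</code> is hypothetical): the conditional jump at the end still sees the zero flag produced by <code>AND</code>.
<syntaxhighlight lang="Z80">
AND %00001111 ;keep the low nibble; Z is set if it is zero
RLCA
RLCA
RLCA
RLCA ;low nibble moved into the high nibble; Z is unchanged
JR z,was_zero ;taken if the nibble, and therefore the result, was zero
</syntaxhighlight>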
 
If you need to know whether the result is zero, you could also use this:

<syntaxhighlight lang="Z80">
RLCA
RLCA
RLCA
RLCA
AND %11110000 ;6 bytes, 23 cycles total
</syntaxhighlight>
 
For 16-bit bit shifting, use A instead of the other half of the register pair for faster results (unless you're checking for equality to zero, or you need the accumulator for something else.)
 
<syntaxhighlight lang="Z80">
rept 4 ;inline the following 4 times, back to back:
SRL H
RR L
Line 118 ⟶ 126:
SRL H
RRA
endr
</syntaxhighlight>
 
===Inlined bytecode===
Branches take a decent amount of clock cycles, even if they're not taken. A branch not taken is faster than one that is (except JP which takes 10 cycles regardless of the outcome), but either way you're taking a performance hit. It's really frustrating when you have to branch around a single instruction:
<syntaxhighlight lang="Z80">
jr nc,foo
ld a,&44
jr done
Line 129 ⟶ 139:
done:
pop hl
ret
</syntaxhighlight>
 
But in this (admittedly contrived) example, there's an esoteric way to avoid the <code>jr done</code> while still having the same functionality. It has to do with the <code>LD a,&40</code>. As it turns out, &40 is the opcode for <code>LD b,b</code>, and since loading a register on the Z80 doesn't affect flags, this <code>LD b,b</code> instruction will have no effect on our program. We can actually trick the CPU into executing the operand &40 as an instruction, like so:
 
<syntaxhighlight lang="Z80">
jr nc,foo
ld a,&44
byte &26 ;opcode for LD L,# (next byte is operand.) Functionally identical to "JR done"
Line 140 ⟶ 152:
done:
pop hl
ret
</syntaxhighlight>
 
Since Z80 is an Intel-like CISC architecture with variable-length instructions, the same sequence of bytes can be interpreted in different ways depending on where the Program Counter starts reading them. So even though our source code appears to have a random data byte in the middle of the instructions, what the CPU actually executes is this, in the event that <code>jr nc,foo</code> is <i>not</i> taken:
 
<syntaxhighlight lang="Z80">
ld a,&44
ld L,&3e ;&3E is the opcode for LD A,__ (next byte is operand)
ld b,b ;do nothing
done:
pop hl
ret
</syntaxhighlight>
 
Since we're popping HL anyway, it won't hurt to load &3E into L beforehand, as it's just going to get wiped anyway. Effectively, we skipped the <code>LD a,&40</code> without branching. Now you may be wondering why you'd want to go through all this trouble. As it turns out, using <code>byte &26</code> in this situation actually saves 1 byte and 3 clock cycles compared to using <code>jr done</code>! The only thing you lose in this exchange is readability (which is why comments are so essential with tricks like these - I'd recommend commenting the "correct" instruction beside it so it's clear what you're optimizing.) Avoid the temptation to abuse these tricks because it makes you feel clever - it's just another tool in your toolbox, like anything else.
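
Following that advice, a commented version might look like the sketch below. The placement of the <code>foo</code> label and its <code>ld a,&40</code> is pieced together from the description above, so treat the layout as an assumption.
<syntaxhighlight lang="Z80">
jr nc,foo
ld a,&44
byte &26 ;stands in for "jr done". Executes as LD L,&3E, swallowing the next opcode
foo:
ld a,&40 ;encoded as &3E,&40. After the byte above, only the &40 runs, as LD B,B
done:
pop hl ;overwrites the junk value left in L
ret
</syntaxhighlight>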
Line 157 ⟶ 172:
Some programmers have taken this to its logical extreme by filling several kilobytes' worth of memory with <code>LDI</code> instructions, then picking how many they want to execute by offsetting a pointer into that section of memory. If you have plenty of bytes to burn and the need for speed, it can be a viable option.
 
<syntaxhighlight lang="Z80">
align 256 ;ensures that the first LDI begins at address &xx00
rept &7FF
LDI ;LDI takes up two bytes each, so by storing &7FF of them we fill up &0FFE bytes, nearly 4k!
endr
RET
</syntaxhighlight>
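
One way to pick the entry point is sketched below. It assumes a hypothetical label <code>ldi_end</code> placed on the <code>RET</code> that closes the block, a byte count of 1 to 127 in A, and that IX is free to use. HL and DE must already hold the source and destination, since <code>LDI</code> needs them. BC is borrowed for the address arithmetic, which is harmless here because <code>LDI</code> merely decrements it.
<syntaxhighlight lang="Z80">
copy_n: ;hypothetical routine: call with A = number of bytes (1 to 127)
ADD A,A ;each LDI is two bytes, so the offset is 2*A
NEG ;low byte of -(2*A)
LD C,A
LD B,&FF ;BC = -(2*A) as a 16-bit value
LD IX,ldi_end ;hypothetical label on the final RET
ADD IX,BC ;step back over the last A LDIs
JP (IX) ;run them; the RET at the end returns to copy_n's caller
</syntaxhighlight>
With A = 16, for instance, <code>CALL copy_n</code> enters the block 32 bytes before <code>ldi_end</code> and copies 16 bytes.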
 
==References==