 
==Efficient Coding Tricks==
As with most assembly languages, there are techniques that bend the "rules" of the CPU to squeeze out as much performance as you can. For some people this becomes a bit of a game, seeing just how far they can optimize their code. There also seems to be a law of nature in assembly programming: almost anything you do to make your code faster takes up more memory, and vice versa, and optimizing for either speed or size will also make your code harder to read. It's unfortunate, but it's true more often than not. Thankfully, comments can make up for the lost readability. Let's explore a few ways to make your code faster and/or more compact.
 
===Fast Checking for Odd or Even===
Suppose you have a byte at some memory location and you want to know if it's odd or even.
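
One common approach is to mask off bit 0 with <code>AND</code>; another is to rotate bit 0 into the carry flag with <code>RRCA</code>. Here's a minimal sketch of the two, assuming the address of the byte is in HL (the <code>is_even</code> label is just a placeholder):
<lang z80>;Method 1: mask off bit 0 - AND &01 is 2 bytes, 7 T-states
ld a,(hl)
and &01
jr z,is_even ;bit 0 clear, so the value is even

;Method 2: rotate bit 0 into the carry flag - RRCA is 1 byte, 4 T-states
ld a,(hl)
rrca
jr nc,is_even ;carry clear means bit 0 was clear, so the value is even</lang>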
 
Not only is the second method shorter, it's also faster.
 
===Inlined bytecode===
Branches cost a fair number of clock cycles, even when they're not taken. A branch that isn't taken is faster than one that is, but either way you take a performance hit. It's especially frustrating when you have to branch around a single instruction:
<lang z80>jr nc,foo
ld a,&44
jr done
foo:
ld a,&40
done:
pop hl
ret</lang>
 
But in this (admittedly contrived) example, there's an esoteric way to avoid the <code>jr done</code> while keeping the same functionality. It has to do with the <code>LD a,&40</code>. As it turns out, &40 is also the opcode for <code>LD b,b</code>, and since register-to-register loads on the Z80 don't affect flags, executing a <code>LD b,b</code> would have no effect on our program. Unfortunately, in our code that &40 byte is an operand, not an instruction - but we can trick the program counter into executing it as one:
<lang z80>jr nc,foo
ld a,&44
byte &2E ;opcode for LD L,# (next byte is the operand.) Functionally identical to "JR done"
foo:
ld a,&40
done:
pop hl
ret</lang>
 
Since the Z80, like its Intel ancestors, uses variable-length instructions, the same sequence of bytes can be decoded in different ways depending on where the Program Counter enters it. So even though our source code looks as though there's a stray data byte in the middle of the instructions, what the CPU actually executes, in the event that <code>jr nc,foo</code> is <i>not</i> taken, is this:
 
<lang z80>ld a,&44
ld L,&3e ;&3E is the opcode for LD A,__ (next byte is operand)
ld b,b ;do nothing
done:
pop hl
ret</lang>
 
Since we're popping HL afterwards, it won't hurt to load &3E into L beforehand - it's just going to get overwritten anyway. Now you may be wondering why you'd want to go through all this trouble. As it turns out, using <code>byte &2E</code> in this situation saves a byte and a clock cycle compared to using <code>jr done</code>: the two-byte, 12-cycle <code>jr</code> becomes a single byte, and the <code>ld l,&3e</code> plus <code>ld b,b</code> executed in its place take only 11 cycles between them. The only thing you lose in this exchange is readability (which is why comments are so essential with tricks like these - I'd recommend writing the "correct" instruction in a comment beside it so it's clear what you're optimizing.) Avoid the temptation to abuse these tricks just because they make you feel clever - they're just another tool in your toolbox, like anything else.
 
===LDIR is slow===
The only real advantage of instructions like <code>LDIR</code> is that they take up just two bytes, regardless of how much work they'll be doing. The equivalent number of inlined <code>LDI</code> instructions will outspeed <code>LDIR</code> every time, as the comparison below shows.
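
For concreteness, here's a quick sketch of copying four bytes from HL to DE both ways (the labels are just for illustration; the timings are the standard Z80 figures of 21 T-states per repeat of <code>LDIR</code>, 16 for its final step, and a flat 16 per <code>LDI</code>):
<lang z80>;Using LDIR - 5 bytes, 89 T-states in total:
with_ldir:
ld bc,4 ;10 T-states
ldir ;21+21+21+16 = 79 T-states

;Unrolled LDI - 8 bytes, but only 64 T-states (BC is still decremented, just never tested):
with_ldi:
ldi ;16 T-states
ldi ;16
ldi ;16
ldi ;16</lang>
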
Some programmers have taken this to its logical extreme by filling several kilobytes' worth of memory with <code>LDI</code> instructions, then picking how many of them to execute by jumping into that block of memory at a calculated offset. If you have plenty of bytes to burn and a need for speed, it can be a viable option.
 
<lang z80>
align 8 ;align to a 256-byte page so the 0th LDI begins at address &xx00 (align syntax varies between assemblers)
rept &7FF
LDI ;each LDI takes up two bytes, so &7FF of them fill &0FFE bytes - nearly 4K!
endr
RET</lang>
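
If the <code>RET</code> at the end of the slide above is given a label - call it <code>ldi_slide_end</code>; it isn't in the listing above - the entry point for copying a chosen number of bytes can be computed at run time. The routine name and register convention below are assumptions, not part of the original listing, but the idea is the standard one: step back two bytes per <code>LDI</code> you want executed, then jump in.
<lang z80>;Assumed: ldi_slide_end labels the RET that terminates the slide above.
;In: A = number of bytes to copy (1-127 here, so the doubling below stays within a byte),
;    HL = source, DE = destination. BC is clobbered (each LDI decrements it, but nothing tests it).
copy_a_bytes:
push hl ;save the source pointer while HL does the address math
add a,a ;A = count*2, since each LDI in the slide is two bytes (also leaves carry clear)
ld c,a
ld b,0 ;BC = distance to step back from the terminating RET
ld hl,ldi_slide_end
sbc hl,bc ;HL = address of the first LDI we actually want to run
ex (sp),hl ;HL = source again; the computed entry address takes its place on the stack
ret ;"returns" into the slide - its final RET then returns to our real caller</lang>
The caller just does an ordinary <code>call copy_a_bytes</code> with A, HL and DE set up; the slide's own <code>RET</code> handles getting back.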
 