Category:Z80 Assembly: Difference between revisions

Revision as of 17:06, 13 May 2023

changed tags to display code correctly, corrected align 8 to 256

Livingroomcoder

===Bit Shifting===
The Z80 does have bit-shift instructions, but thanks to RLCA and RRCA it's often faster to rotate instead. Compare the following two code snippets:
<syntaxhighlight lang="Z80">
SLA A
SLA A
SLA A
SLA A
; [...]
RLCA
RLCA
RLCA ;6 bytes, 23 cycles total
</syntaxhighlight>

Not only is the second method shorter, it's also faster. The accumulator-specific bit rotates take 1 byte and 4 clock cycles each. Unlike the two-byte versions, however, these <i>do not affect the zero flag.</i> This is rarely a big deal, as more often than not if you're rotating the accumulator you're not expecting to get zero as the output anyway. It matters even less here: the <code>AND</code> <i>does</i> set the Z flag, the rotates preserve it, and zero rotated is still zero, so the Z flag still reflects whether the result is zero.

For 16-bit bit shifting, use A instead of the other half of the register pair for faster results (unless you're checking for equality to zero, or you need the accumulator for something else). <code>RRA</code> takes 1 byte and 4 cycles, whereas <code>RR L</code> takes 2 bytes and 8 cycles.
<syntaxhighlight lang="Z80">
rept 4 ;inline the following 4 times, back to back:
SRL H
RR L
; [...]
SRL H
RRA
endr
</syntaxhighlight>

===Inlined bytecode===
Branches take a decent number of clock cycles, even if they're not taken. A branch not taken is faster than one that is (except JP, which takes 10 cycles regardless of the outcome), but either way you're taking a performance hit.
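As a rough reference for those costs, the sketch below annotates a few branch instructions with their sizes and standard Z80 T-state counts. The label <code>skip</code> is a placeholder introduced for illustration, the listing is meant to be read rather than run as a routine, and hardware that inserts wait states may show different effective timings.
<syntaxhighlight lang="Z80">
jr nc,skip ;2 bytes: 12 cycles if taken, 7 if not
jr skip    ;2 bytes: 12 cycles (always taken)
jp nc,skip ;3 bytes: 10 cycles whether taken or not
djnz skip  ;2 bytes: 13 cycles if taken, 8 if not
skip:      ;hypothetical label, for illustration only
ret        ;1 byte: 10 cycles
</syntaxhighlight>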
It's really frustrating when you have to branch around a single instruction:
<syntaxhighlight lang="Z80">
jr nc,foo
ld a,&44
jr done
foo:
ld a,&40
done:
pop hl
ret
</syntaxhighlight>

But in this (admittedly contrived) example, there's an esoteric way to avoid the <code>jr done</code> while keeping the same functionality. It has to do with the <code>LD a,&40</code>. As it turns out, &40 is the opcode for <code>LD b,b</code>, and since loading one register from another on the Z80 doesn't affect flags, this <code>LD b,b</code> instruction will have no effect on our program. We can actually trick the CPU into executing the operand &40 as an instruction, like so:
<syntaxhighlight lang="Z80">
jr nc,foo
ld a,&44
byte &26 ;opcode for LD H,# (next byte is operand.) Functionally identical to "JR done"
foo:
ld a,&40
done:
pop hl
ret
</syntaxhighlight>

Since the Z80 is an Intel-like CISC architecture with variable-length instructions, the same sequence of bytes can be interpreted in different ways depending on where the Program Counter starts reading them. So even though our source code appears to have a random data byte in the middle of the instructions, what the CPU actually executes is this, in the event that <code>jr nc,foo</code> is <i>not</i> taken:
<syntaxhighlight lang="Z80">
ld a,&44
ld H,&3e ;&3E is the opcode for LD A,# (next byte is operand)
ld b,b ;do nothing
done:
pop hl
ret
</syntaxhighlight>

Since we're popping HL anyway, it won't hurt to load &3E into H beforehand, as it's just going to get overwritten. Effectively, we skipped the <code>LD a,&40</code> without branching. Now you may be wondering why you'd want to go through all this trouble. As it turns out, using <code>byte &26</code> in this situation saves 1 byte and 1 clock cycle compared to using <code>jr done</code>: the <code>jr</code> takes 2 bytes and 12 cycles, while the <code>LD H,#</code> and <code>LD b,b</code> that replace it take 7 + 4 = 11 cycles and add only 1 byte to the program.
The only thing you lose in this exchange is readability, which is why comments are so essential with tricks like these. I'd recommend commenting the "correct" instruction beside it so it's clear what you're optimizing. Avoid the temptation to abuse these tricks just because they make you feel clever; this is just another tool in your toolbox, like anything else.

[...]

Some programmers have taken this to its logical extreme by filling several kilobytes' worth of memory with <code>LDI</code> instructions, then picking how many they want to execute by offsetting a pointer into that section of memory. If you have plenty of bytes to burn and the need for speed, it can be a viable option.
<syntaxhighlight lang="Z80">
align 256 ;ensures that the first LDI begins at address &xx00
rept &7FF
LDI ;LDI takes up two bytes each, so by storing &7FF of them we fill up &0FFE bytes, nearly 4k!
endr
RET
</syntaxhighlight>

==References==