Category:Z80 Assembly: Difference between revisions

Revision as of 17:06, 13 May 2023 by Livingroomcoder (minor edit, 385 bytes added): changed tags to display code correctly, corrected align 8 to 256

Revision as of 18:57, 20 August 2022 by rosettacode>Livingroomcoder (→Registers)
Latest revision as of 17:06, 13 May 2023 by Livingroomcoder
(6 intermediate revisions by the same user not shown)
Line 46:
* & or $ represents hexadecimal, and % represents binary.
* A register or a 16-bit value in parentheses represents a pointer being dereferenced. The size of the data pointed to will be the same as the size of the destination.
<syntaxhighlight lang="Z80">
LD A,($4000)   ;load the BYTE at memory address $4000 into the accumulator.
LD HL,($6000)  ;load the WORD at memory address $6000 into HL.
               ;Or, more accurately, load the BYTE at $6000 into L and the BYTE at $6001 into H.
LD A,(HL)      ;The value stored in HL is treated as a memory address, and the BYTE at that address is loaded into the accumulator.
</syntaxhighlight>

==Efficient Coding Tricks==
Line 56 ⟶ 58:
===Everyone's favorite===
This is where you start optimizing Z80 code:
<syntaxhighlight lang="Z80">
XOR A   ; set A to zero
        ; This saves 1 byte and 3 cycles compared with LD A,0, but also changes the flags.
</syntaxhighlight>

===Fast Checking for Odd or Even===
Suppose you have a byte at some memory location and you want to know if it's odd or even.
<syntaxhighlight lang=z80>
LD A,(HL)   ; 1 byte, 7 cycles
BIT 0,A     ; 2 bytes, 8 cycles
JR nz,odd
</syntaxhighlight>
If you don't need the actual value, you can use:
<syntaxhighlight lang=z80>
BIT 0,(HL)  ; 2 bytes, 12 cycles
JR nz,odd
</syntaxhighlight>
But there are also a few other ways to do it, which are faster and/or take fewer bytes to encode. <code>BIT 0,A</code> takes 2 bytes to encode. So does <code>AND 1</code>, but it's a little faster.
It does destroy the accumulator, but if you don't care about that, it's better to use <code>AND 1</code>.
<syntaxhighlight lang=z80>
LD A,(HL)   ; 1 byte, 7 cycles
AND 1       ; 2 bytes, 7 cycles
JR nz,odd
</syntaxhighlight>
Or is it? There's an even faster way than this that takes only ONE byte to encode:
<syntaxhighlight lang=z80>
LD A,(HL)   ; 1 byte, 7 cycles
RRCA        ; 1 byte, 4 cycles
JR c,odd
</syntaxhighlight>
If you don't care about destroying the accumulator (i.e. you only need the result of the test and not the number being tested) then <code>RRCA</code> outperforms <code>AND 1</code> in every way. <code>AND 1</code> still has its uses if you need the accumulator to hold the result of the test (0 or 1) in bit 0. If you don't, use <code>RRCA</code> instead.

===Bit Shifting===
The Z80 does have bit shifting, but thanks to RLCA and RRCA it's often faster to rotate instead. Compare the following two code snippets:
<syntaxhighlight lang="Z80">
SLA A
SLA A
SLA A
SLA A
Line 94 ⟶ 107:
RLCA
RLCA
RLCA
;6 bytes, 23 cycles total
</syntaxhighlight>
Not only is the second method shorter, it's also faster. The accumulator-specific bit rotates take 1 byte and 4 clock cycles each. They are different, however, because unlike the two-byte versions, these <i>do not affect the zero flag.</i> This isn't a big deal, as more often than not if you're rotating the accumulator you're not expecting to get zero as the output anyway. Correction: the <code>AND</code> that follows the rotates <i>does</i> affect the Z flag, so the Z flag ends up the same as with the shifts.
Note that the rotates themselves leave the Z flag unchanged, and zero rotated is still zero.

For 16-bit bit shifting, use A instead of the other half of the register pair for faster results (unless you're checking for equality to zero, or you need the accumulator for something else).
<syntaxhighlight lang="Z80">
rept 4   ;inline the following 4 times, back to back:
SRL H
RR L
Line 118 ⟶ 126:
SRL H
RRA
endr
</syntaxhighlight>

===Inlined bytecode===
Branches take a decent amount of clock cycles, even if they're not taken. A branch not taken is faster than one that is (except JP which takes 10 cycles regardless of the outcome), but either way you're taking a performance hit. It's really frustrating when you have to branch around a single instruction:
<syntaxhighlight lang="Z80">
jr nc,foo
ld a,&44
jr done
Line 129 ⟶ 139:
done:
pop hl
ret
</syntaxhighlight>
But in this (admittedly contrived) example, there's an esoteric way to avoid the <code>jr done</code> while still having the same functionality. It has to do with the <code>LD a,&40</code>. As it turns out, &40 is the opcode for <code>LD b,b</code>, and since loading a register on the Z80 doesn't affect flags, this <code>LD b,b</code> instruction will have no effect on our program. We can actually trick the CPU into executing the operand &40 as an instruction, like so:
<syntaxhighlight lang="Z80">
jr nc,foo
ld a,&44
byte &26 ;opcode for LD H,# (next byte is operand.) Functionally identical to "JR done"
Line 140 ⟶ 152:
done:
pop hl
ret
</syntaxhighlight>
Since the Z80 is an Intel-like CISC architecture with variable-length instructions, the same sequence of bytes can be decoded in different ways depending on where the Program Counter starts reading them.
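As a rough illustration of that overlap, the example can be written out byte by byte. This is only a sketch: it assumes <code>foo</code> labels the <code>LD a,&40</code> immediately before <code>done</code> (that part of the listing is not shown here), and it reuses the <code>byte</code> directive from the example above. On the Z80, &26 decodes as <code>LD H,n</code>.
<syntaxhighlight lang="Z80">
      jr nc,foo   ;if taken, decoding resumes at foo
      byte &3E    ;opcode of LD A,n
      byte &44    ;operand: A = &44
      byte &26    ;opcode of LD H,n: the next byte becomes its operand
foo:  byte &3E    ;entered at foo: opcode of LD A,n / fallen into: operand of LD H,n
      byte &40    ;entered at foo: operand of LD A,n / fallen into: opcode of LD B,B
done: byte &E1    ;POP HL
      byte &C9    ;RET
</syntaxhighlight>
The two bytes at <code>foo</code> are the only ones whose meaning changes; everything from <code>done</code> onward decodes the same way on both paths.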
So even though our source code would appear as though there's a random data byte in the middle of instructions, what the CPU actually executes is this, in the event that <code>jr nc,foo</code> is <i>not</i> taken:
<syntaxhighlight lang="Z80">
ld a,&44
ld h,&3e ;&3E is the opcode for LD A,__ (next byte is operand)
ld b,b   ;do nothing
done:
pop hl
ret
</syntaxhighlight>
Since we're popping HL anyway, it won't hurt to load &3E into H beforehand, as it's just going to get overwritten. Effectively, we skipped the <code>LD a,&40</code> without branching.

Now you may be wondering why you'd want to go through all this trouble. As it turns out, using <code>byte &26</code> in this situation saves 1 byte compared to using <code>jr done</code>, and by the cycle counts used above it is also slightly faster: <code>LD H,&3E</code> (7 cycles) plus <code>LD B,B</code> (4 cycles) comes to 11 cycles, against 12 for <code>jr done</code>. The only thing you lose in this exchange is readability (which is why comments are so essential with tricks like these - I'd recommend commenting the "correct" instruction beside it so it's clear what you're optimizing.) Avoid the temptation to abuse these tricks because it makes you feel clever - it's just another tool in your toolbox, like anything else.
Line 157 ⟶ 172:
Some programmers have taken this to its logical extreme by filling several kilobytes' worth of memory with <code>LDI</code> instructions, then picking how many they want to execute by offsetting a pointer into that section of memory. If you have plenty of bytes to burn and the need for speed, it can be a viable option.
<syntaxhighlight lang="Z80">
align 256 ;ensures that the first LDI begins at address &xx00
rept &7FF
LDI       ;LDI takes up two bytes each, so by storing &7FF of them we fill up &0FFE bytes, nearly 4k!
endr
RET
</syntaxhighlight>

==References==