To quickly test bit 6 or 7 of a byte in memory, use the BIT instruction, which does not affect A, X, or Y. Regardless of what's in A, BIT FOOBAR will put bit 7 of FOOBAR's contents into the N flag, and bit 6 into the V flag, so you can branch on these bits without loading first. A good example is flag variables, where you use a byte as just a "yes" or "no" record. "Branch if the 'ACKnowledge' record is a yes," without affecting A, X, or Y, is as simple as:
BIT ACK_FLAG
BMI _____
(If you have a CMOS 6502 made by Rockwell or WDC and the byte is in ZP, you can even do it in a single BBS7 instruction.)
To clear the flag, use STZ (STore Zero) on the CMOS 6502, so you don't have to LDA #0 first:
STZ ACK_FLAG
To set it, if you know there's a byte in one of the registers that has its high bit set, use that, and you won't need to use LDA #$FF first:
STA ACK_FLAG ; (or store X or Y, whichever has the high bit set)
If you know for sure it's already 0, just decrement it:
DEC ACK_FLAG
Actually, it doesn't have to start out as 0 if you just make sure you don't decrement it so many times that the high bit gets cleared.
BIT is really nice for testing port bits. If you will need to quickly test an I/O line, such as the data line when you're bit-banging a synchronous-serial connection, put it on bit 6 or bit 7 of a parallel port, so you can just do:
BIT PORT_A
BMI ____ ; (or BPL or BVC or BVS, as appropriate)
This prevents the need to load the port byte first (which would affect one of the registers, probably the accumulator), or load a bit mask like you would have to do for other bits with the BIT instruction, or even load and AND like you'd have to do if there were no BIT instruction. (Obviously you'll change the port name as appropriate.)
For the CMOS 6502, TSB and TRB let you test and then set or clear bits in memory in a single instruction, after the desired mask is in the accumulator.
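As a rough sketch of how that might look, assume a hypothetical RAM variable STATUS in which bit 2 is the bit of interest (the variable name, the bit choice, and the branch labels are all made up for illustration). Both instructions set the Z flag from A AND memory as it was before the change, and leave N and V alone:

```asm
        ; Fragment 1: test bit 2, then set it
        LDA  #%00000100   ; mask for bit 2 (arbitrary choice for this sketch)
        TSB  STATUS       ; Z = 1 if bit 2 was 0 beforehand; bit 2 is now set
        BNE  already_set  ; taken if bit 2 was already 1 before the TSB

        ; Fragment 2: test bit 2, then clear it
        LDA  #%00000100
        TRB  STATUS       ; Z = 1 if bit 2 was 0 beforehand; bit 2 is now clear
        BEQ  was_clear    ; taken if bit 2 was already 0 before the TRB
```

The mask can have more than one bit set, in which case Z tells you whether any of the masked bits were set beforehand.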
If you will need to quickly toggle an I/O bit that has a known value, put it on bit 0 of a parallel port. If the bit normally sits at 0 (low voltage),
INC PORT_A
DEC PORT_A
will produce a fast positive pulse with only two instructions and without affecting A, X, Y, or the other bits of the port. Similarly, if it normally sits at 1 (high voltage),
DEC PORT_A
INC PORT_A
will produce a fast low pulse. For a bonus, the final INC or DEC will put bit 7 in the N flag, so you can pulse bit 0 and test bit 7 at the same time. (Note that bit directions don't have to all be the same for a port. You can have some pins be inputs while other ones are outputs, at the same time.) If you needed bits 0 and 1 toggling out of phase, you could INC and DEC the port between values 1 and 2 (01 and 10 in binary).
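A minimal sketch of the pulse-and-test idea, assuming a hypothetical wiring where a clock output sits on bit 0 of PORT_A (idling low) and a data input sits on bit 7; the label got_one is made up for illustration:

```asm
        INC  PORT_A       ; bit 0 goes 0 -> 1 (rising clock edge)
        DEC  PORT_A       ; bit 0 goes 1 -> 0; N takes bit 7 of the result
        BMI  got_one      ; taken if the data line on bit 7 read as 1
```

Because bit 0 is 1 when the DEC reads the port, no borrow reaches the upper bits, so the N flag reflects bit 7 as the DEC read it.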
Use the single-byte instruction PHA, PHX, or PHY to save a register's value temporarily while using the register for something else. When you're ready to bring it back, use the corresponding single-byte PLA, PLX, or PLY. Remember it's a last-on first-off stack though, so mind the order if you're putting multiple values there. Make sure branching won't foul you up with stack programming errors.
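A small sketch of the ordering, assuming a 65C02 (PHX and PLX do not exist on the NMOS 6502):

```asm
        PHA               ; save A first
        PHX               ; then X, which ends up on top of the stack
        ; (use A and X for something else here)
        PLX               ; restore in reverse order: X comes off first
        PLA               ; then A
```

If a branch inside the middle section can leave without going through the pulls, the stack will be left unbalanced, which is the kind of error to watch for.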
Avoid commonly wasted instructions:
1. An automatic compare-to-zero instruction is built into the following 65c02 instructions: LDA, LDX, LDY, INC, INX, INY, DEC, DEX, DEY, INA, DEA, AND, ORA, EOR, ASL, LSR, ROL, ROR, PLA, PLX, PLY, SBC, ADC, TAX, TXA, TAY, TYA, and TSX. This means that, for example, a CMP #0 after an LDA is redundant, a wasted instruction. The only time a 65c02 (CMOS) needs a compare-to-zero instruction after one of these is if you want to compare a register that was not involved in the previous instruction; for example,
DEY
CPX #0
(Note the Y and the X are not the same register.) If you can spare a register to which you can transfer the one you want to test, you can save a byte with the transfer instead of a compare instruction. The example above, if the contents of A don't need to be kept, could be changed to:
DEY
TXA
and then you can branch on the N or Z flag, which tell whether X was negative or zero. The TXA isn't any faster (both TXA and CPX #0 take two clocks), but TXA takes only one byte, whereas the CPX #0 takes two bytes.
The NMOS 6502 did have a bug in that the flags weren't always correct after a decimal-mode operation like ADC, so then you might have to follow it with the CMP #0 to get the N and Z flags right. It's best to just use the CMOS processor.
2. Similarly, if you want a compare to $80 strictly for branching on the N flag results, you can omit the compare-to-$80 instruction and branch on the opposite state of the N flag. For example
DEA ; (same thing as DEC A)
CMP #$80
BMI <label>
can be replaced with
DEA ; (same thing as DEC A)
BPL <label>
3. If you have a CMOS 6502 (65C02), take advantage of the extra CMOS instructions and addressing modes. The 65C02 is not just a low-power version of the NMOS 6502. Besides having more instructions and addressing modes, the CMOS version fixed all of the NMOS 6502 bugs. There's a list of them in Table 7-1, "Microprocessor Operational Enhancements," which in the October 19, 2010 W65C02S data sheet from WDC is on page 30.
4. When the end of a routine has JSR immediately followed by RTS, replace the pair with JMP, and put in the comments,
JMP <subroutine_addr> ; (JSR, RTS)
JSR plus RTS takes 12 clocks. JMP absolute takes 3. The single jump above, together with the RTS at the end of the other subroutine, gives you the same execution effect in most circumstances but saves 9 clocks and one byte (i.e., the RTS instruction). Something else you can take advantage of is that there's also a JMP (addr), and on the CMOS 6502 a JMP (addr,X).
5. The 6502 interrupt sequence automatically pushes the processor status register on the stack, and restores it as part of the return-from-interrupt (RTI) instruction. There is no need to start an interrupt-service routine (ISR) with PHP and end it with PLP. There is also no need to set the interrupt-disable flag at the beginning of an interrupt (using SEI). That is automatic too, part of the interrupt sequence, immediately after pushing the processor status register P onto the stack. And, since the previous status is restored by the RTI, do not re-enable interrupts just before RTI.
6. When practical, set up loops such that the counter ends at 00 or decrements to FF to finish the loop, so you can branch on the Z or N flag condition and don't need to add a compare-immediate instruction.
7. In ISRs, don't waste time saving and restoring registers the ISR itself won't use and disturb. Also, don't waste time polling interrupt sources that are not enabled. (This is covered much more thoroughly in my interrupts primer.)
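The loop setup described in item 6 above might be sketched like this. The labels SRC and DEST and the counts are hypothetical, chosen only for illustration:

```asm
        ; Count down to 00 and branch on Z
        LDX  #8           ; do the body 8 times
loop1:  ; (loop body goes here)
        DEX
        BNE  loop1        ; DEX sets Z when X reaches 0, so no CPX is needed

        ; Index down through 0 and branch on N
        LDY  #7           ; copy 8 bytes, indexes 7 down to 0
loop2:  LDA  SRC,Y
        STA  DEST,Y
        DEY
        BPL  loop2        ; DEY sets N when Y wraps from 0 to $FF
```

The BPL form only works if the starting index is $80 or less, since a larger value still has bit 7 set after the first DEY and would end the loop early.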
And as with any programming language:
If you still have a dot-matrix impact printer that uses fanfold paper, it will be nice for printing long program listings. Get it out. You'll be glad you haven't gotten rid of it yet. When our daughter-in-law who's majoring in computer science complained about the page breaks with the school's laser printers being a pain for programming, I suggested cutting off the bottom margin of each page and taping the pages together, bottom of one to top of the next, which is what she ended up doing.
For explanation of the V (oVerflow) flag, see the tutorial on it on 6502.org.
You can find an explanation of the B (Break) flag and its usage in this forum topic, which also links to a couple of other discussions on BRK.
BigEd on the forum observed, "With 6502, I suspect more than one beginner has wondered why they can't do arithmetic or logic operations on X or Y, or struggled to remember which addressing modes use which of the two. And then the intermediate 6502 programmer will be loading and saving X and Y while the expert always seems to have the right values already in place."
You can do temporary storage or pass parameters to subroutines by way of the hardware stack (in page 1) if it helps reduce RAM variable needs. The subroutine does not need to pull all the stack items off to access a byte some number of levels down. To index into the stack, you can do for example:
TSX
LDA $105,X
Repeating TSX won't be necessary for continued accesses to various stack items throughout the routine. Just change the number before the ",X" above, and you won't have to keep incrementing and decrementing X either. For knowing what that number should be, it will be important to note that the stack pointer is decremented immediately after storing a byte onto the stack, so it points to the next available byte. (Remember that the stack grows down, not up.) LDA $101,X has the same effect as PLA PHA, getting the top stack byte into the accumulator without removing it from the stack. It takes one more instruction byte (three versus two, assuming you already did TSX) but is faster (4 clocks versus 7). The tactic becomes all the more valuable when you want to reach further into the stack.
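As a sketch of how the offsets work out, suppose a caller pushes two one-byte parameters and then does a JSR. The routine name add2 and the parameter values are hypothetical, the page-1 addresses are written in hex with a $ prefix, and a 65C02 is assumed for the PLX used in cleanup:

```asm
        LDA  #$12
        PHA               ; parameter 1 (pushed first, so it ends up deeper)
        LDA  #$34
        PHA               ; parameter 2
        JSR  add2         ; pushes the return address (2 bytes) on top
        PLX               ; discard the two parameters, leaving the
        PLX               ; result in A untouched

add2:   TSX               ; X = stack pointer; $101,X is the top stack byte
        CLC               ; (assumes decimal mode is off)
        LDA  $104,X       ; parameter 1
        ADC  $103,X       ; plus parameter 2
        RTS               ; sum comes back in A
```

Here $101,X and $102,X hold the low and high bytes of the return address that JSR pushed, so the parameters start at $103,X.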
A common criticism of the 6502 is that the stack space is so limiting. A few higher-level languages (notoriously Pascal) do put very large pieces of data and entire functions and procedures on the stack instead of just their addresses. For most programming though, the 6502's stack is much roomier than you'll ever need. When you know you're accessing the stacks constantly but don't know what the maximum depth is you're using, the tendency is to go overboard and keep upping your estimation, "just to be sure." I did this for years myself, and finally decided to do some tests to find out. I filled the 6502 stack area with a constant value (maybe it was 00; I don't remember), ran a heavy-ish application with all the interrupts going too, did compiling, assembling, and interpreting while running other things in the background on interrupts, and after a while looked to see how much of the stack area had been written on. It wasn't really much: less than 20% of each of page 1 (return stack) and page 0 (data stack). This was in Forth, which makes heavy use of the stacks. The IRQ interrupt handlers were in Forth too, although the software RTC (run off a timer on NMI) was in assembly language.
last updated Mar 9, 2013