Forum posts continue to show misunderstandings and fears about the 65816 processor, ones that make 6502 enthusiasts shy about taking the
step up to the better processor. This article seeks to address those, and to clarify things. It is subject to constant improvement,
like my others.
The '816 opens up new freedoms, without any obligation to use features you're not ready for. It's not a package deal. Instead,
there's a lot of freedom to order "à la carte" or treat it like a buffet table. Choose what you do want, and leave the rest alone.
First, the common misunderstandings:
When you enter '816-native mode, A, X, and Y start out as 8-bit; but then you can choose, and change as often as you like, whether the accumulator is 8- or 16-bit, and separately, whether the index registers are 8- or 16-bit, via the m and x status bits, which are manipulated primarily through the REP (REset Processor status bit(s)) and SEP (SEt Processor status bit(s)) instructions. A single instruction sets or clears whichever bit(s) you specify in the bit mask in the operand byte. I always recommend putting these instructions in macros to make them more friendly and less cryptic. I call mine ACCUM_8, ACCUM_16, INDEX_8, and INDEX_16. I've seen others call them LONG_A, SHORT_A, etc. The only user-accessible registers that are fixed at 16 bits are the stack pointer S (sometimes called SP for added clarity) and the direct-page (like zero page, but movable) pointer, D (sometimes called DP for added clarity).
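As a rough sketch of what such macros might look like, here is one way to write them. The macro and directive syntax (.p816, .macro, .a16, .i16 and so on) is ca65-style and is an assumption on my part; your assembler's syntax will probably differ. The bit masks themselves are fixed by the processor: m is bit 5 ($20) and x is bit 4 ($10) of the status register.

```asm
; Sketch only: ca65-style syntax is assumed here; adapt to your assembler.
.p816                   ; enable 65816 instructions

.macro ACCUM_16
        REP #$20        ; clear m: accumulator and memory accesses are 16-bit
        .a16            ; tell the assembler immediate operands for A are 2 bytes
.endmacro

.macro ACCUM_8
        SEP #$20        ; set m: accumulator and memory accesses are 8-bit
        .a8
.endmacro

.macro INDEX_16
        REP #$10        ; clear x: X and Y are 16-bit
        .i16            ; tell the assembler immediate operands for X/Y are 2 bytes
.endmacro

.macro INDEX_8
        SEP #$10        ; set x: X and Y are 8-bit (their high bytes are zeroed)
        .i8
.endmacro
```

The assembler has to be told the register width separately because the size of an immediate operand depends on it, and the assembler cannot see the run-time state of the m and x bits.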
I've seen a lot of criticism of the "mode bits." The e bit is really the only mode bit (besides the decimal mode d bit which the 6502 also has), and as mentioned above, you can clear it at boot-up and never touch it again unless you need to run both 6502 and '816 software simultaneously (multitasking). The m and x bits are register-size-control bits which come into play if you're in native mode. For many applications, you can set those up one way or another and seldom touch those again either. (m and x aren't really mode bits, as they don't change the various instructions' behavior, only the size of data they operate on.) So no, it's not burdensome. It's more efficient than prefix bytes or having separate op codes for different-size operations. (Remember that the op-code table is full, so adding more op codes means going into a second op-code byte, making programs bigger and slower.)
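For reference, clearing the e bit at boot-up is just two instructions. The e bit cannot be written directly; XCE exchanges it with the carry flag. This is a minimal sketch of the usual sequence, not the only way to arrange your reset code.

```asm
        CLC             ; carry = 0
        XCE             ; swap carry and e: e = 0 (native mode); carry now holds the old e
```

After this, m and x are both 1, so A, X, and Y are still 8-bit until you change them with REP.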
These are some of the attractions the '816 offers:
For the '02, a loop with a 16-bit index variable counting from $64 up to $1388 takes:
LDA #$64
STA index
STZ index+1
loop: <loop_innards>
INC index
BNE skip
INC index+1
skip: LDA index
CMP #$88
BNE loop
LDA index+1
CMP #$13
BNE loop
For the '816, if you have the index registers set to 16-bit, it's just:
LDX #$64
loop: <loop_innards>
INX
CPX #$1388
BNE loop
and using the index in the loop is much simpler on the '816 too. You don't need the variable. Actually, if you still wanted the variable (like to leave the index registers free), you could put it on the stack and handle it much more efficiently than the '02 could on the stack.
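As a sketch of the stack-variable idea, the same loop could keep its counter on the stack and reach it with stack-relative addressing. This assumes the accumulator is 16-bit and that the loop body leaves the stack balanced; the REP is shown only to make that assumption explicit.

```asm
        REP #$20        ; 16-bit accumulator (m = 0)
        PEA $0064       ; push the starting count as a 16-bit local variable
loop:   <loop_innards>  ; must leave the stack as it found it
        LDA 1,S         ; fetch the counter (1,S is its low byte, 2,S its high byte)
        INC A
        STA 1,S         ; store the incremented counter back
        CMP #$1388      ; done yet?
        BNE loop
        PLA             ; discard the counter (pulls two bytes with A 16-bit)
```

X and Y stay free for the loop body, and no fixed memory location is tied up.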
Here's the simple example of Forth's @ (pronounced "fetch," given such a short name because it's one of the things
used most), which takes a 16-bit address placed on the top of the ZP data stack and replaces it with the 16-bit contents of that address.
First for 6502:
LDA (0,X)
PHA
INC 0,X
BNE fet1
INC 1,X
fet1: LDA (0,X)
JMP PUT
; and elsewhere, PUT which is used in so many places is:
PUT: STA 1,X
PLA
STA 0,X
Can you tell at a glance what it's doing? Probably not. But the same thing for the '816, with the accumulator set to 16-bit, is only two instructions:
LDA (0,X)
STA 0,X ; For the '816, PUT is only one 2-byte instruction anyway, so there's no sense in jumping to it.
This kind of thing makes the '816 easier to program than the '02, not harder, and it's easier to see what you're doing.
Pushing a 16-bit constant onto the stack on the '02 takes:
LDA #<high_byte>
PHA
LDA #<low_byte>
PHA
but on the '816, it's only:
PEA <value>
and does not disturb A or the processor-status register P. (P is often called SR for "status register," although that could be confused with the shift register in the 6522 VIA. Do not confuse P with PC which is the program counter.) PEA makes it much easier to pass parameters to a subroutine on the stack, where they can be handled easily with the stack-relative addressing modes mentioned above. PEA is always 16-bit, regardless of accumulator or index-register sizes.
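Here is a small sketch of passing one parameter that way. The subroutine name show and the value $1234 are made up for illustration, and the accumulator is assumed to be 16-bit throughout.

```asm
        PEA $1234       ; push a 16-bit parameter (hypothetical value)
        JSR show        ; hypothetical subroutine
        PLA             ; caller removes the parameter afterward (this overwrites A)

show:   LDA 3,S         ; 1,S and 2,S hold the return address, so the
                        ; parameter starts at 3,S
        ; ... use the parameter here ...
        RTS
```

The subroutine never has to pull the return address out of the way to get at its input, which is what makes this so awkward on the '02.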
PER (not to be confused with PEA) is a three-byte instruction that adds the next instruction's 16-bit runtime address to the signed offset given by the operand, and pushes the result onto the stack, without affecting A, X, Y, or status. Doing this on the '02 requires five instructions including a JSR to an 11-instruction subroutine, plus more instructions if you need to preserve A, X, or status.
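A typical use is getting the run-time address of something in your own code without knowing where the code was loaded. In this sketch, table is a hypothetical label somewhere in the same bank, and the index registers are assumed to be 16-bit.

```asm
        REP #$10        ; 16-bit index registers (x = 0)
        PER table       ; the assembler stores the offset from the next instruction
                        ; to table; at run time the '816 pushes the actual address
        PLX             ; X = address of table, wherever the code is running
```

You write the label, not the offset; the assembler works out the offset the same way it does for a branch.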
Related to PER is BRL (Branch Relative Long), which has a 16-bit
operand. Both PER and BRL can reach anywhere in the 64K
bank that the code is running in, even if the target is more than 32K away, since they wrap at the bank boundaries.
BSR (Branch to SubRoutine, with a 16-bit offset) can be synthesized with:
PER $+5
BRL <subroutine_addr>
(the subroutine address of course being turned into an offset by the assembler). These are components in writing code that's
relocatable even after it's loaded and has been running, as long as you're careful about return addresses on the stack.
Although the '816 does not have an MMU, "address 0" may still refer to several different things:
If you're intimidated by the '816, you can initially program it just like a 6502, then add new instructions and addressing modes and use the wider
registers little by little as you're comfortable.
See also:
(The rest of these are a little more advanced.)