65816 myths

Forum posts continue to show misunderstandings and fears about the 65816 processor, ones that make 6502 enthusiasts shy about taking the step up to the better processor. This article seeks to address those, and to clarify things. It is subject to constant improvement, like my others.

The '816 opens up new freedoms, without any obligation to use features you're not ready for. It's not a package deal. Instead, there's a lot of freedom to order "à la carte" or treat it like a buffet table. Choose what you do want, and leave the rest alone.

First, the common misunderstandings:

People talk about "16-bit mode," obviously confusing that with native mode and thinking that native mode can't do 8-bit operations. This might be the #1 software misunderstanding. The processor comes out of reset in 65c02-emulation mode; but then if you don't need to run any '02 software, you can put it in '816 native mode and leave it there, never touching that e bit again. (Note the use of lower case for status-register bit names, and upper case for register names, according to the Lichty and Eyes programming manual.)
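As a minimal sketch, the usual way to leave emulation mode after reset is to exchange a cleared carry flag with the e bit. Nothing else is required, and A, X, and Y are still 8-bit afterward:
```
       CLC              ; carry = 0
       XCE              ; exchange carry with e: e = 0 selects '816 native mode
                        ; (carry now holds the old e, which was 1 after reset)
```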

When you enter '816-native mode, A, X, and Y initially start out as 8-bit; but then you can choose, and change as often as you like, whether the accumulator is 8- or 16-bit, and separately, whether the index registers are 8- or 16-bit. This is done via the m and x status bits, which are manipulated primarily through the REP (REset Processor status bit(s)) and SEP (SEt Processor status bit(s)) instructions. A single instruction sets or clears whichever bit(s) you specify in the bit mask in the operand byte. I always recommend putting these instructions in macros to make them more friendly and less cryptic. I call mine ACCUM_8, ACCUM_16, INDEX_8, and INDEX_16. I've seen others call them LONG_A, SHORT_A, etc. The only user-accessible registers that are fixed at 16 bits are the stack pointer S (sometimes called SP for added clarity) and the direct-page (like zero page, but movable) pointer, D (sometimes called DP for added clarity).
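For illustration, these are the bare instructions such macros would wrap. The m bit is bit 5 of the status register (mask $20) and the x bit is bit 4 (mask $10). Macro syntax differs among assemblers, and many assemblers also need a directive (not shown, since its name varies) telling them the new register size so that immediate operands get assembled with the right length:
```
       REP  #$20        ; clear m: accumulator becomes 16-bit   (ACCUM_16)
       SEP  #$20        ; set m:   accumulator becomes 8-bit    (ACCUM_8)
       REP  #$10        ; clear x: X and Y become 16-bit        (INDEX_16)
       SEP  #$10        ; set x:   X and Y become 8-bit         (INDEX_8)
       REP  #$30        ; clear m and x together: A, X, and Y all 16-bit
```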
I've seen a lot of criticism of the "mode bits." The e bit is really the only mode bit (besides the decimal-mode d bit, which the 6502 also has), and as mentioned above, you can clear it at boot-up and never touch it again unless you need to run both 6502 and '816 software simultaneously (multitasking). The m and x bits are register-size-control bits which come into play if you're in native mode. For many applications, you can set those up one way or another and seldom touch those again either. (m and x aren't really mode bits, as they don't change the various instructions' behavior, only the size of data they operate on.) So no, it's not burdensome. It's more efficient than prefix bytes or having separate op codes for different-size operations. (Remember that the op-code table is full; so adding more op codes would mean going into a second op-code byte, making programs bigger and slower.)
Although bus utilization is slightly more complex than the 6502's, people don't realize it is not necessary to latch, decode, or use the bank address byte if you only need 64K of address space; yet you will still get a ton of benefits. (If you want to use more than one 64K bank and are uneasy about certain hardware aspects, see the helpful forum topics linked at the end of this page.)
They don't realize the D (direct-page) register is 16-bit. Direct page can start at any address in the first 64K of the memory map, and does not need to be page-aligned. (Direct page is like the 6502's zero page, but can be relocated, on the fly, and can even overlap the stack, making all the direct-page addressing modes available in the stack too.)
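A minimal sketch of relocating direct page, assuming native mode. The address $1234 is an arbitrary illustration, deliberately not page-aligned:
```
       REP  #$20        ; 16-bit accumulator
       LDA  #$1234      ; arbitrary new direct-page base in bank 0
       TCD              ; D = $1234
       LDA  $10         ; direct-page access now reads $1244 (and $1245, since A is 16-bit)
```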
They see it as just a wider 6502, not realizing that it has lots of new instructions and addressing modes, like the stack-relative ones that make it so much better than the '02 at dealing with stack frames.
They don't realize that the '816 offers a lot of new instructions, and access to the 16MB memory space, even in '02-emulation mode.
They don't realize that the 65816 is much better than the 6502 for:
- relocatable code (meaning it can be moved even after it's loaded, even within the same 64K bank),
- multitasking,
- multithreading,
- multiprocessing,
- DMA,
- cache,
- virtual memory, and
- selecting and prioritizing interrupts from several sources by modifying the vector addresses, using the VP output.
There's no obligation to do any of these; but the possibility is there, available to grow into. There really are no penalties for all this capability.
Compared to the '02, the '816 is a hundred times as good at banking. Banks are 64K each, and the data bank and the program bank don't have to match. If the data or routine you need to access is not in the current bank, you can use bank-agnostic 24-bit addressing and get there in a single instruction. Indexing can cross bank boundaries. However, the stack, direct page, vectors, and interrupt routine entry points will always be in bank 0, ie, in the first 64K, regardless of bank registers. If banks seem like a bad thing, see BDD's post here. He says, "If you understand the following, you and the MPU's banks will get along just fine."
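To make the banking points concrete, here is a small sketch. The bank numbers and addresses are arbitrary illustrations, and how an assembler is told to pick the long form over the absolute form (here, simply by the number of hex digits) is an assumption that varies among assemblers:
```
       SEP  #$20        ; 8-bit accumulator, so PHA pushes a single byte
       LDA  #$02
       PHA
       PLB              ; data bank register = $02; the program bank is unaffected
       STA  $2000       ; absolute: stores to $2000 in the data bank, ie, $022000
       LDA  $7E2000     ; absolute long: reads bank $7E directly, whatever the data bank is
       LDA  $7EFFF0,X   ; long indexed: X of $10 or more carries the access into bank $7F
```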

These are some of the attractions the '816 offers:

The '816 is not just a 6502 with potentially wider registers. It has lots of improvements that make it able to do things the '02 simply cannot.
The stack can be thousands of bytes deep, since the stack pointer is 16-bit.
Stack-relative addressing makes it much more graceful at handling stack frames and accessing bytes in the stack in random order, and there's no need to use X like the '02 has to for this.
16-bit quantities don't have to be taken apart and handled 8 bits at a time; so code gets denser and faster. For example, a loop counting from 100 (ie, $64) to 5000 ($1388) on the '02 requires something like the following, where index is a 2-byte variable:
```
       LDA  #$64
       STA  index
       STZ  index+1

loop:  <loop_innards>

       INC  index
       BNE  skip
       INC  index+1

skip:  LDA  index
       CMP  #$88
       BNE  loop

       LDA  index+1
       CMP  #$13
       BNE  loop
```
For the '816, if you have the index registers set to 16-bit, it's just:
```
       LDX  #$64
loop:  <loop_innards>
       INX
       CPX  #$1388
       BNE  loop
```
and using the index in the loop is much simpler on the '816 too. You don't need the variable. Actually, if you still wanted the variable (like to leave the index registers free), you could put it on the stack and handle it much more efficiently than the '02 could on the stack.
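As a sketch of that stack-based variant (assuming native mode, a 16-bit accumulator, and loop innards that leave the stack balanced), the counter can live on top of the stack and be reached with stack-relative addressing:
```
       REP  #$20        ; 16-bit accumulator
       PEA  $0064       ; counter = 100, kept on the stack
loop:  <loop_innards>   ; free to use A, X, and Y
       LDA  1,S         ; fetch the counter
       INC  A           ; (some assemblers spell this INA)
       STA  1,S         ; put it back
       CMP  #$1388      ; reached 5000?
       BNE  loop
       PLA              ; discard the counter
```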

Here's the simple example of Forth's @ (pronounced "fetch," given such a short name because it's one of the things used most), which takes a 16-bit address placed on the top of the ZP data stack and replaces it with the 16-bit contents of that address.
First for 6502:
```
       LDA  (0,X)
       PHA
       INC  0,X
       BNE  fet1
       INC  1,X
fet1:  LDA  (0,X)
       JMP  PUT
; and elsewhere, PUT which is used in so many places is:
PUT:   STA  1,X
       PLA
       STA  0,X
```
Can you tell at a glance what it's doing? Probably not. But the same thing for the '816 is only two instructions:
```
       LDA  (0,X)
       STA  0,X         ; For the '816, PUT is only one 2-byte instruction anyway, so there's no sense in jumping to it.
```
This kind of thing makes the '816 easier to program than the '02, not harder, and it's easier to see what you're doing.
New instructions and addressing modes further improve efficiency. For example, pushing a 16-bit immediate value onto the stack as a parameter to pass to a subroutine with the '02 requires:
```
       LDA  #<high_byte>
       PHA
       LDA  #<low_byte>
       PHA
```
but on the '816, it's only:
```
       PEA  <value>
```
and does not disturb A or the processor-status register P. (P is often called SR for "status register," although that could be confused with the shift register in the 6522 VIA. Do not confuse P with PC which is the program counter.) PEA makes it much easier to pass parameters to a subroutine on the stack, where they can be handled easily with the stack-relative addressing modes mentioned above. PEA is always 16-bit, regardless of accumulator or index-register sizes.
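A minimal sketch of that parameter passing, assuming native mode and a 16-bit accumulator. The value $1234 and the subroutine name show are hypothetical:
```
       REP  #$20        ; 16-bit accumulator
       PEA  $1234       ; push a 16-bit parameter (arbitrary value)
       JSR  show
       PLA              ; caller discards the parameter (this overwrites A)
       <rest_of_caller>

show:  LDA  3,S         ; 1,S and 2,S hold the return address; the parameter is at 3,S and 4,S
       <use_the_value>
       RTS
```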
PER (not to be confused with PEA) is a three-byte instruction that adds the next instruction's 16-bit runtime address to the signed offset given by the operand, and pushes the result onto the stack, without affecting A, X, Y, or status. Doing this on the '02 requires five instructions including a JSR to an 11-instruction subroutine, plus more instructions if you need to preserve A, X, or status.
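One possible use, sketched under the assumptions of native mode, an 8-bit accumulator, and 16-bit index registers: push the runtime address of some data that travels with the code, then read it through the pointer on the stack. The label msg is hypothetical. The indirect access goes through the data bank, so the sketch first makes the data bank match the program bank:
```
       SEP  #$20        ; 8-bit accumulator
       REP  #$10        ; 16-bit index registers
       PHK
       PLB              ; data bank = program bank
       PER  msg         ; push the runtime address of msg, wherever the code was loaded
       LDY  #$0000
       LDA  (1,S),Y     ; fetch the first byte of msg through the pointer on the stack
       PLX              ; discard the pointer (16 bits, since the index registers are 16-bit)
       <rest_of_code>
msg:   <message_bytes>
```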
Related to PER is BRL (Branch Relative Long), which has a 16-bit operand. Both PER and BRL can reach anywhere in the 64K bank that the code is running in, even if the target is more than 32K away, since they wrap at the bank boundaries. BSR (Branch to SubRoutine, with a 16-bit offset) can be synthesized with:
```
       PER  $+5
       BRL  <subroutine_addr>
```
(the subroutine address of course being turned into an offset by the assembler). These are components in writing code that's relocatable even after it's loaded and has been running, as long as you're careful about return addresses on the stack.
My '816 Forth runs two to three times as fast as my '02 Forth at a given clock rate; and the clock rate is not at any disadvantage either, as forum member "Windfall" has an '816 running at 24MHz @ 3.3V, three times the speed the data sheet says you can get at that voltage, and forum member "plasmo" got one running at 40MHz @ 5.3V.
ZP is now called "direct page," or "DP," because it can be relocated on the fly, so each task can have its own "ZP." It does not have to start on a page boundary either (although it's slightly more efficient if it does). You can also make it overlap the hardware stack, making it possible to use all the ZP addressing modes in the stack.
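One way to sketch the overlap, assuming native mode: point D at the stack pointer, and direct-page offset n then names the same byte as n,S. That opens up instructions that have no stack-relative form, such as INC:
```
       PHD              ; save the current direct-page pointer (2 bytes on the stack)
       TSC              ; copy S into the 16-bit accumulator (overwriting it)
       TCD              ; D = S, so direct-page offset n is the same byte as n,S
       INC  3           ; increment, in place, what was on top of the stack before PHD
       PLD              ; restore the previous direct page
```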
More vectors, including that BRK now has its own in native mode, so if you use BRK, your IRQ ISR does not need to test the b bit.
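For reference, a sketch of the native-mode vector table in bank 0. The handler labels are hypothetical, and the directive spellings (.org, .word) are assumptions that vary by assembler. The reset vector stays at $FFFC, since reset always starts in emulation mode:
```
       .org  $FFE4
       .word cop_isr    ; $FFE4  COP
       .word brk_isr    ; $FFE6  BRK, separate from IRQ in native mode
       .word abort_isr  ; $FFE8  ABORT
       .word nmi_isr    ; $FFEA  NMI
       .word 0          ; $FFEC  (reserved)
       .word irq_isr    ; $FFEE  IRQ
```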
The VDA (valid data address) and VPA (valid program address) outputs can be used to identify dead bus cycles. (The simplest designs can ignore them; so don't let them weigh you down. If you use only one of them, it will likely be the VDA, not VPA.) The data sheet says these are for managing separate data and program caches; but another use is to have DMA access the bus in those cycles, so that DMA does not pause or slow the program execution. BDD used them to solve a problem with a 2692 DUART.
It is far better suited to self-relocatable code (meaning you can move it even after it's loaded, even within the same 64K bank), multi-user, multitasking, and multithreading than the '02.
RAM is available at much faster speeds than ROM; so if you want to pre-load RAM from ROM and then switch out the slower ROM to run only in fast RAM, you can do the pre-load in 6502-emulation mode and then go into '816-native mode. The E output can then tell the glue logic to disable the ROM and enable the RAM reads in that address range, and to kick up the clock speed (or eliminate the wait states that the ROM needed). This more or less assumes you won't be needing 6502-emulation mode after that.

Although the '816 does not have an MMU, "address 0" may still refer to several different things:

the address pointed to by the direct-page register D in bank 0. (The '816's direct page is like the 6502's ZP, but you can position it anywhere in the first 64K of the memory space, and it doesn't have to start on a page boundary.) When you run several programs at once (in multitasking), each program can have its own direct page, without concerning itself with other programs' direct-page usage. Most programs won't need a whole page for their direct page; so you can reserve as little memory for it as you like.
the address pointed to by the stack-pointer register S in bank 0 (in stack-relative addressing)
the first address in the bank selected by the data-bank register DBR (or, when used as part of a mnemonic, B) (in absolute addressing) Each program can have its own data bank which it views as starting at its own address 0, regardless of other simultaneously running programs' data space.
the first address in the bank selected by the program-bank register PBR (or K when used as part of a mnemonic) (in program addressing) Each program can have its own program bank which it views as starting at its own address 0, regardless of other simultaneously running programs' program space. A program's program bank(s) and data bank(s) can match or be separate.
the first address in the entire memory map (long addressing, both program and data, with no bank boundaries)
and then there are the address-relative ones too, PER & BRL mentioned above. This is of course in addition to the 65c02's BRA and conditional-branch instructions. The first byte of the next instruction is considered 0.
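The cases above can be lined up as follows. These are separate one-line illustrations, not a routine. How the assembler tells the direct-page, absolute, and long forms apart (here, by the number of hex digits) is an assumption that varies among assemblers, and $ stands for the address of the current instruction:
```
       LDA  $00         ; direct page: offset 0 from D, in bank 0
       LDA  0,S         ; stack-relative: offset 0 from S, in bank 0
       LDA  $0000       ; absolute: address 0 of the bank in the data bank register
       JMP  $0000       ; absolute jump: address 0 of the bank in the program bank register
       LDA  $000000     ; absolute long: address 0 of the entire memory map
       BRL  $+3         ; relative: offset 0 from the first byte of the next instruction
```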

If you're intimidated by the '816, you can initially program it just like a 6502, then add new instructions and addressing modes and use the wider registers little by little as you're comfortable.

See also:

BDD's parallel article on 65C816 myths The first few paragraphs include some '816 background which I have omitted above. (Updated 9/11/21)
Chapter 13 of my 6502 stacks treatise, "65816's instructions and capabilities relevant to stacks, and 65c02 code which partially synthesizes some of them"
BDD's '816 interrupts article (it's also on 6502.org)
Not much love for the poor 65816? (forum topic, where the original poster got his objections answered)
What to do with the 65816's "mystery" pins (forum topic)
65816 pinout compared to 6502 (forum topic. Note that it is a rather simple matter to make a PCB that can accept either the '02 or the '816, with a few jumper options to select which.)
65C816 Opcodes, by Bruce Clark (on 6502.org, but not its forum)
Sketches please: a fewest-chips logic-only 816 system (forum topic)
(The rest of these are a little more advanced.)
The 65816 does NOT drive the bank address while RDY is low (forum post, measurement)
When is a Latch not a Latch? (Capturing the '816 Bank Addr) (forum topic)
Managing the 65816 multiplexed bus (forum topic, with animated .gif's)
RDY vs CLOCK STRETCHING. Includes 2 very simple circuits. (forum topic)
Generating Wait-States with Clock Trickery, with diagrams (forum topic)
On the usefulness of 65816 as a 65C02 alternative (forum topic)
Object-Oriented Dispatch on 65816, by Samuel Falvo
65816 system engineering pros and cons (forum topic)
How to demonstrate advantages of 65816? (forum topic, for demo'ing the added capabilities for the beeb816 project)
...and of course you can always do a search on the 6502.org forum for "65816" or "65c816" or "'816" or "65265" (which is an off-the-shelf '816-based microcontroller).

As usual, I must recommend the excellent programming manual, "Programming the 65816 including the 6502, 65C02, and 65802" by David Eyes and Ron Lichty. This is a .pdf file of a rather large book that is well laid out and is much better than the description there lets on. Note: There were many problems with the earlier .pdf version that were not in the original paper manual; but in late March 2015, WDC scanned and OCR'ed the paper manual and posted the new, repaired .pdf.

Thanks to forum members BigDumbDinosaur, Dr Jefyll, Chromatix, Drogon, and floobydust for help for this article.

last updated Apr 27, 2024 Garth Wilson wilsonminesBdslextremeBcom (replace the B's with @ and .)