The 65816 microprocessor: Common Misunderstandings, Plus Attractions
Forum posts continue to show misunderstandings and fears about the 65816 processor, ones that make 6502 enthusiasts shy about taking the
step up to the better processor. This article seeks to address those, and to clarify things. It is subject to constant improvement,
like my others.
The '816 opens up new freedoms, without any obligation to use features you're not ready for. It's not a package deal. Instead,
there's a lot of freedom to order "à la carte" or treat it like a buffet table. Choose what you do want, and leave the rest alone.
First, the common misunderstandings:
- People talk about "16-bit mode," obviously confusing that with native mode and thinking that native mode can't do 8-bit operations.
This might be the #1 software misunderstanding. The processor comes out of reset in 65c02-emulation mode; but then if you don't need
to run any '02 software, you can put it in '816 native mode and leave it there, never touching that e bit again.
(Note the use of lower case for status-register bit names, and upper case for register names, according to the
Lichty and Eyes programming manual.)
When you enter '816-native mode, A, X, and Y initially start out as 8-bit; but then
you can choose, and change as often as you like, whether the accumulator is 8- or 16-bit, and whether the index registers are 8- or 16-bit,
via the m and x status bits, which are manipulated primarily through the REP (REset
Processor status bit(s)) and SEP (SEt Processor status bit(s)) instructions. A single instruction sets
or clears whichever bit(s) you specify in the bit mask in the operand byte. I always recommend putting these instructions in macros to
make them more friendly and less cryptic. I call mine ACCUM_8, ACCUM_16, INDEX_8, and INDEX_16.
I've seen others call them LONG_A, SHORT_A, etc. The only user-accessible registers that are fixed to 16 bits are the
stack pointer S (sometimes called SP for added clarity) and the direct-page (like zero page, but movable)
pointer, D (sometimes called DP for added clarity).
I've seen a lot of criticism of the "mode bits." The e bit is really the only mode bit (besides the decimal mode
d bit which the 6502 also has), and as mentioned above, you can clear it at boot-up and never touch it again unless you
need to run both 6502 and '816 software simultaneously (multitasking). The m and x bits are
register-size-control bits which come into play if you're in native mode. For many applications, you can set those up one way or
another and seldom touch those again either. So no, it's not burdensome. It's more efficient than prefix bytes or having
separate op codes for different-size operations. (Remember that the op-code table is full; so adding more op codes means going into a
second op-code byte, making programs bigger and slower.)
- Although bus utilization is slightly more complex than the 6502's, people don't realize it is not necessary to latch, decode, or use
the bank address byte if you only need 64K of address space; yet you will still get a ton of benefits. (If you want to use more than
one 64K bank and are uneasy about certain hardware aspects, see the helpful forum topics linked at the end of this page.)
- They don't realize the D (direct-page) register is 16-bit. Direct page can start at any address in
the first 64K of the memory map, and does not need to be page-aligned. (Direct page is like the 6502's zero page, but can be
relocated, on the fly, and can even overlap the stack, making all the direct-page addressing modes available in the stack too.)
- They see it as just a wider 6502, not realizing that it has lots of new instructions and addressing modes, like the stack-relative ones
that make it so much better than the '02 at dealing with stack frames.
- They don't realize that the '816 offers a lot of new instructions, and access to the 16MB memory space, even in '02-emulation mode.
- They don't realize that the 65816 is much better than the 6502 for:
- relocatable code (meaning it can be moved even after it's loaded, even within the same 64K bank),
- virtual memory, and
- selecting and prioritizing interrupts from several sources by modifying the vector addresses, using the
There's no obligation to do any of these; but the possibility is there, available to grow into. There really are no
penalties for all this capability.
- Compared to the '02, the '816 is a hundred times as good at banking. Banks are 64K each, and the data bank and the program bank
don't have to match. If the data or routine you need to access is not in the current bank, you can use bank-agnostic 24-bit
addressing and get there in a single instruction. Indexing can cross bank boundaries. However, the stack, direct page,
vectors, and interrupt routine entry points will always be in bank 0, i.e., in the first 64K, regardless of bank registers. If
banks seem like a bad thing, see BDD's post here.
He says, "If you understand the following, you and the MPU's banks will get along just fine."
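The register-size macros and the one-time switch to native mode described in the first item above can be sketched as follows. This is only a minimal sketch, assuming ca65 syntax (the `.P816`, `.A16`/`.A8`, `.I16`/`.I8` and `.macro` directives are ca65's; other assemblers spell these differently). The macro names are the ones mentioned above; the `reset` label and the immediate values are made up for illustration.

```asm
        .P816                   ; assumption: ca65, with 65816 op codes enabled

.macro ACCUM_16
        REP #$20                ; clear m: accumulator and memory accesses are 16-bit
        .A16                    ; tell the assembler that A immediates are 2 bytes
.endmacro

.macro ACCUM_8
        SEP #$20                ; set m: accumulator and memory accesses are 8-bit
        .A8
.endmacro

.macro INDEX_16
        REP #$10                ; clear x: X and Y are 16-bit
        .I16
.endmacro

.macro INDEX_8
        SEP #$10                ; set x: X and Y are 8-bit
        .I8
.endmacro

reset:                          ; hypothetical entry point after reset
        CLC                     ; carry clear, so XCE will clear e
        XCE                     ; swap carry and e: native mode, A/X/Y still 8-bit
        ACCUM_16
        INDEX_16
        LDA #$1234              ; 16-bit immediate, 3-byte instruction
        ACCUM_8
        LDA #$12                ; back to an 8-bit immediate, 2-byte instruction
```

A single `REP #$30` or `SEP #$30` changes both sizes at once, since the operand is just a bit mask. The size-hint directives matter because the assembler cannot see the m and x bits at run time and has to be told how long each immediate operand is.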
These are some of the attractions the '816 offers:
- The '816 is not just a 6502 with potentially wider registers. It has lots of improvements that make it able to do things the '02
- The stack can be thousands of bytes deep, since the stack pointer is 16-bit.
- Stack-relative addressing makes it much more graceful at handling stack frames and accessing bytes in the stack in random order, and
there's no need to use X like the '02 has to for this.
- 16-bit quantities don't have to be taken apart and handled 8 bits at a time; so code gets denser and faster. For example,
a loop counting from 100 (i.e., $64) to 5000 ($1388) on the '02 requires something like the following, where index is a 2-byte variable:
skip: LDA index
For the '816, if you have the index registers set to 16-bit, it's just:
and using the index in the loop is much simpler on the '816 too. You don't need the variable. Actually, if you still
wanted the variable (like to leave the index registers free), you could put it on the stack and handle it much more efficiently than the
'02 could on the stack.
Here's the simple example of Forth's @ (pronounced "fetch," given such a short name because it's one of the things
used most), which takes a 16-bit address placed on the top of the ZP data stack and replaces it with the 16-bit contents of that address.
First for 6502:
fet1: LDA (0,X)
; and elsewhere, PUT which is used in so many places is:
PUT: STA 1,X
Can you tell at a glance what it's doing? Probably not. But the same thing for the '816 is only two instructions:
STA 0,X ; For the '816, PUT is only one 2-byte instruction anyway, so there's no sense in jumping to it.
This kind of thing makes the '816 easier to program than the '02, not harder, and it's easier to see what you're doing.
- New instructions and addressing modes further improve efficiency. For example, pushing a 16-bit immediate value onto the stack as a
parameter to pass to a subroutine with the '02 requires:
but on the '816, it's only:
and does not disturb A or the processor-status register P. (P is often
called SR for "status register," although that could be confused with the shift register in the 6522 VIA. Do not
confuse P with PC which is the program counter.) PEA
makes it much easier to pass parameters to a subroutine on the stack, where they can be handled easily with the stack-relative addressing
modes mentioned above. PEA is always 16-bit, regardless of accumulator or index-register sizes.
PER (not to be confused with PEA) is a three-byte
instruction that adds the next instruction's 16-bit runtime address to the signed offset given by the operand, and pushes the
result onto the stack, without affecting A, X, Y, or status. Doing this on the
'02 requires five instructions including a JSR to an 11-instruction subroutine, plus more instructions
if you need to preserve A, X, or status.
Related to PER is BRL (Branch Relative Long), which has a 16-bit
operand. Both PER and BRL can reach anywhere in the 64K
bank that the code is running in, even if the target is more than 32K away, since they wrap at the bank boundaries.
BSR (Branch to SubRoutine, with a 16-bit offset) can be synthesized with:
(the subroutine address of course being turned into an offset by the assembler). These are components in writing code that's
relocatable even after it's loaded and has been running, as long as you're careful about return addresses on the stack.
- My '816 Forth runs two to three times as fast as my '02 Forth at a given clock rate; and the clock rate is not at any disadvantage
either, as forum member "Windfall" has an '816 running at
24MHz @ 3.3V, three times the speed the data sheet says you can get at that voltage, and forum member "plasmo" got one running at
40MHz @ 5.3V.
- ZP is now called "direct page," or "DP," because it can be relocated on the fly, so each task can have its own "ZP." It does not have
to start on a page boundary either (although it's slightly more efficient if it does). You can also make it overlap the hardware stack,
making it possible to use all the ZP addressing modes in the stack.
- More vectors, including that BRK now has its own, so if you use
BRK, your IRQ ISR does not need to test the b bit.
- The VDA (valid data address) and VPA (valid program address) outputs can be used to identify dead bus cycles. (The simplest designs can
ignore them; so don't let them weigh you down. If you use only one of them, it will likely be the VDA, not VPA.) The data sheet
says these are for managing separate data and program caches; but another use is to have DMA access the bus in those cycles, so that DMA does
not pause or slow the program execution. BDD used them to solve a problem with a 2692 DUART.
- It is far better suited to self-relocatable code (meaning you can move it even after it's loaded, even within the same 64K bank), multi-user,
multitasking, and multithreading than the '02.
- RAM is available in much faster speeds than ROM; so if you want to pre-load RAM from ROM and then switch out the slower ROM to run only in
fast RAM, you can do the pre-load in 6502-emulation mode, then go into '816-native mode and have the E output tell the glue logic to disable
the ROM and enable the RAM reads in that address range, and kick up the clock speed (or eliminate the wait states that the ROM needed).
This more or less assumes you won't be needing 6502-emulation mode after that.
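Three of the items above (the 16-bit counting loop, passing a parameter with PEA and reading it with stack-relative addressing, and the synthesized BSR) can be sketched together. This is a minimal sketch, not a listing from this article. It assumes ca65 syntax, native mode with A, X, and Y already set to 16-bit, and an assembler that turns the PER and BRL operands into offsets. The labels `loop`, `show`, `leaf`, `back`, and `done`, and the constant $1234, are made up for illustration.

```asm
        .P816                   ; assumption: ca65, with 65816 op codes enabled
        .A16                    ; assumption: REP #$30 was done earlier,
        .I16                    ;   so A, X and Y are all 16-bit here

; 1. Counting loop with the index kept in X; no 2-byte variable needed.
        LDX #100
loop:   NOP                     ; placeholder for the loop body; X is the index
        INX
        CPX #5000
        BNE loop                ; body runs for X = 100 through 4999

; 2. Pass a 16-bit constant on the stack and read it stack-relative.
        PEA $1234               ; pushes the constant; A, X, Y and P are untouched
        JSR show
        PLX                     ; caller drops the parameter (X is free here)

; 3. Synthesized BSR: push (return address - 1), then branch long.
        PER back-1              ; RTS adds 1 to the address it pulls
        BRL leaf
back:
done:   BRA done                ; park here so execution doesn't run into the subroutines

show:   LDA 3,S                 ; 1,S and 2,S hold the return address; parameter is at 3,S
        RTS

leaf:   RTS                     ; returns to 'back'
```

Use `CPX #5001` instead if the body should also run with the index equal to 5000. Parts 1 and 3 use only relative branches; part 2's JSR is absolute, so it would have to be replaced with the PER/BRL pair from part 3 for the code to stay valid after being moved within the bank.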
Although the '816 does not have an MMU, "address 0" may still refer to several different things:
- the address pointed to by the direct-page register D in bank 0. (The '816's direct page is like the 6502's ZP, but you can
position it anywhere in the first 64K of the memory space, and it doesn't have to start on a page boundary.) When you run several programs
at once (in multitasking), each program can have its own direct page, without concerning itself with other programs' direct-page usage.
Most programs won't need a whole page for their direct page; so you can reserve as little memory for it as you like.
- the address pointed to by the stack-pointer register S in bank 0 (in stack-relative addressing)
- the first address in the bank selected by the data-bank register DBR (or, when used as part of a mnemonic,
B) (in absolute addressing). Each program can have its own data bank which it views as starting at its own address
0, regardless of other simultaneously running programs' data space.
- the first address in the bank selected by the program-bank register PBR (or K when used as part of a
mnemonic) (in program addressing). Each program can have its own program bank which it views as starting at its own address 0,
regardless of other simultaneously running programs' program space. A program's program bank(s) and data bank(s) can match or be separate.
- the first address in the entire memory map (long addressing, both program and data, with no bank boundaries)
- and then there are the address-relative ones too, PER & BRL mentioned above. This is of course in addition to the
65c02's BRA and conditional-branch instructions. The first byte of the next instruction is considered 0.
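To make the list above concrete, here is one instruction per meaning of "address 0." It is a sketch only, assuming ca65 syntax (including its `a:` and `f:` address-size prefixes), native mode, and a 16-bit accumulator. The direct-page base $1000 is an arbitrary value chosen for illustration. The lines are shown for their addressing modes, not as a useful program.

```asm
        .P816                   ; assumption: ca65, with 65816 op codes enabled
        .A16                    ; assumption: accumulator already 16-bit

        LDA #$1000              ; arbitrary example base for the direct page
        TCD                     ; D = $1000: direct page now starts at $00:1000

        LDA $00                 ; direct page:    reads at D + 0, i.e. $00:1000
        LDA 0,S                 ; stack relative: reads bank 0 at the address held in S
        LDA a:$0000             ; absolute:       reads address $0000 of the bank in DBR
        LDA f:$000000           ; long:           reads $00:0000, regardless of DBR
        BRL next                ; relative:       offset 0 is the next instruction's first byte
next:   JMP $0000               ; program:        jumps to address $0000 of the bank in PBR
```

With the accumulator at 16 bits, each LDA reads two bytes starting at the address shown. In a multitasking setup, a task switch would reload D, S, and DBR, so the same operand bytes would then refer to that task's own memory.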
If you're intimidated by the '816, you can initially program it just like a 6502, then add new instructions and addressing modes and use the wider
registers little by little as you're comfortable.
As usual, I must recommend the excellent programming manual,
"Programming the 65816 including the 6502, 65C02, and 65802"
by David Eyes and Ron Lichty. This is a .pdf file of a rather large book that is well laid out and is much better than the
description there lets on. Note: There were many problems with the earlier .pdf version that were not in the original paper
manual; but in late March 2015, WDC scanned and OCR'ed the paper manual and posted the new, repaired .pdf.
Thanks to forum members BigDumbDinosaur, Dr Jefyll, Chromatix, Drogon, and floobydust for help for this article.
last updated Mar 2, 2022 Garth Wilson wilsonminesBdslextremeBcom (replace
the B's with @ and .)