Note: Principles here can be used on any assembler with macro capability, but this is primarily aimed at
the 65c02 microprocessor. I'm using the C32 assembler (Cross-32) from Universal Cross Assemblers, now
distributed by Data Sync Engineering and
MPE Forth. (Edit, late 2023: C32 seems to have
become unavailable, at least from the sources I know; so I'm trying to find out where one can get it, or if I can
distribute it myself.) The one assembler is good for lots of processors. It is down to $99 now, but
there's a list of free 6502/c02/816 assemblers here and
here, and I have a section
on assemblers on my links page, here.
BDD has macros to assemble 65816 code on Kowalski's 65(c)02 macro assembler,
here (although since he wrote that, Daryl
Rictor, "8BIT" on the 6502.org forum, has extended
the Kowalski assembler to work for the '816 also). (A related article on this site is
"Assembly Language: Still Relevant Today.")
Expanded 5/11/13. This material is subject to constant improvement. I have tested these 65c02 macros but have not used them extensively yet like I have the corresponding ones for PIC16 microcontrollers on Microchip's MPASM assembler. I've done a few large (for PIC16) projects using them. What a pleasure to get away from the spaghetti that's so typical of assembly! The code to do it on PIC16 is here. On 5/5/14 I added alternate versions of CASE_OF and END_OF to the PIC code. They save an instruction every time they can be used (although PIC16's new CASE_OF is still not as efficient as the 6502's old one.) See the notes at their source code for the frequent conditions under which they can be used in place of the original version. I have not added these to the 6502 code yet. The next ambition is the ability to have multiple WHILEs between BEGIN and REPEAT.
1/12/18: I just found out Dave Keenan did the same kind of thing for the IAR assembler and the MSP430 processor. We were unaware of each other's work until now.
10/6/23: Also, Anton Grigorev
adapted these for the DASM assembler.
"Beauty is more important in computing than anywhere else in technology because software is so complicated. Beauty is the ultimate defense against complexity." —David Gelernter
"Good programmers know what's beautiful and bad ones don't." —David Gelernter
As you write an assembly-language program, you may see repeating patterns. If it's exactly the same all the time, you can make it a subroutine. That incurs a 12-clock performance penalty for the subroutine call (JSR) and return (RTS), but program memory is saved because the code for the subroutine is not repeated over and over.
There will be other times however where the repeating pattern is the same but internal details are not, so you can't just use a JSR. The differences from one occurrence to another might be an operand, a string or other data, an address, a condition, etc. It would be helpful to be able to tell the assembler, "Do this sequence here; except when you get down to this part, substitute-in such-and-such," or, "under such-and-such condition, assemble this alternate code." That's where it's time for a macro. "White Flame" on the 6502.org forum wrote, "Macros are assembly-time function calls, whose return value is source code."
The repeating, possibly messy-looking sequences that clutter your code can be replaced with a macro call that takes a single line each time, optionally with parameters. Since you write the macro (or at least you can edit it if you want to, even if someone else wrote it), you have complete control of every bit of machine code that gets laid down. After the internal details have been ironed out, you shouldn't have to keep being bothered with them. If you can hide them with macros, you can see the big picture more clearly, have better control of the project, have fewer bugs, and become more productive without losing any performance or taking more memory.
A macro may replace a piece of assembly-language code as short as a line or two, to give more clarity to what is being done
there. An example of a single-line macro is where you want to replace the 65816's
cryptic REP and SEP instructions:
INDEX_16: MACRO ; Make index registers X & Y to be 16-bit.
REP #00010000B
; NOP ; NOP was necessary for early versions
ENDM ; of '802/'816 >4MHz.
;-------------------
INDEX_8: MACRO ; Make index registers X & Y to be 8-bit.
SEP #00010000B
; NOP ; NOP was necessary for early versions
ENDM ; of '802/'816 >4MHz.
;-------------------
INDEX_16 above is far more clear than REP #00010000B which it replaces, yet
it lays down exactly the same machine code, C2 10, which takes 3 clocks' execution time at runtime.
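One way to see that such a macro is purely an assembly-time convenience is to model it in Python. In this model each "macro" is a function that returns the bytes it lays down. This is a sketch, not C32 syntax. The REP ($C2) and SEP ($E2) opcodes and the M (bit 5) and X (bit 4) status bits are the standard 65816 ones. ACCUM_16 and ACCUM_8 are hypothetical companion macros for the accumulator width.

```python
# Python model (not C32 syntax): each "macro" is a function evaluated at
# assembly time that returns the machine code it lays down.
REP = 0xC2          # REP #imm: clear the status bits that are 1 in imm
SEP = 0xE2          # SEP #imm: set the status bits that are 1 in imm
X_BIT = 0b00010000  # bit 4 (X): 1 = index registers X and Y are 8-bit
M_BIT = 0b00100000  # bit 5 (M): 1 = accumulator and memory are 8-bit


def index_16() -> bytes:
    """INDEX_16: make X and Y 16-bit (REP #00010000B)."""
    return bytes([REP, X_BIT])


def index_8() -> bytes:
    """INDEX_8: make X and Y 8-bit (SEP #00010000B)."""
    return bytes([SEP, X_BIT])


def accum_16() -> bytes:
    """Hypothetical companion macro: make A 16-bit (REP #00100000B)."""
    return bytes([REP, M_BIT])


def accum_8() -> bytes:
    """Hypothetical companion macro: make A 8-bit (SEP #00100000B)."""
    return bytes([SEP, M_BIT])


if __name__ == "__main__":
    print(index_16().hex(" ").upper())  # C2 10
```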
Here's another short but useful one (and you can modify it to get related ones), for branching more than half a page away.
BEQlong: MACRO LBL
BNE bel1
JMP LBL
bel1:
ENDM
;-------------------
The JMP makes it no longer relative but absolute, but most applications can use it. The 65816 has
a BRL (Branch Relative Long, or BRanch Long) instruction, but doing the same kind of thing on the 6502
for relocatable code requires more steps. (You can store the offset, and then, since JSR puts the
current address on the stack, the subroutine can add the offset to it before the RTS.)
With the macro above, if you want to do a BEQ to someplace more than half a page away, you can do
for example:
BEQlong FOOBAR
and it will assemble:
BNE 3 ; ie, to the instruction after the 3-byte JMP instruction
JMP FOOBAR
<continue>
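BEQlong's relatives (BNElong, BCClong, and so on; those names are hypothetical) all follow one pattern: invert the condition and hop over a 3-byte JMP. Here is a minimal Python sketch of that pattern. It assumes the standard opcodes for the eight conditional branches and for JMP absolute ($4C). `branch_long` is a hypothetical name; it returns the five bytes such a macro would lay down.

```python
# Opcodes of the eight 6502 conditional branches, and each one's inverse.
BRANCH = {"BPL": 0x10, "BMI": 0x30, "BVC": 0x50, "BVS": 0x70,
          "BCC": 0x90, "BCS": 0xB0, "BNE": 0xD0, "BEQ": 0xF0}
INVERSE = {"BPL": "BMI", "BMI": "BPL", "BVC": "BVS", "BVS": "BVC",
           "BCC": "BCS", "BCS": "BCC", "BNE": "BEQ", "BEQ": "BNE"}
JMP_ABS = 0x4C


def branch_long(cond: str, target: int) -> bytes:
    """Bytes laid down by a long form of `cond` (e.g. BEQlong for "BEQ"):
    the inverted branch skips the 3-byte JMP when the condition is false."""
    if not 0 <= target <= 0xFFFF:
        raise ValueError("target must be a 16-bit address")
    inverted = BRANCH[INVERSE[cond]]
    return bytes([inverted, 0x03,                        # branch over the JMP
                  JMP_ABS, target & 0xFF, target >> 8])  # JMP operand is low byte first


if __name__ == "__main__":
    print(branch_long("BEQ", 0x1234).hex(" ").upper())  # D0 03 4C 34 12
```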
A macro doesn't necessarily have to lay down any code at all. I use a couple for paragraphs of comments. Although you can
comment any line by putting a ; in front of it, this becomes a pain if you find you need to insert or delete a few words and adjust
all the lines after it. Why not do:
COMMENT
Every flight or checklist is a file. There is no directory. The computer
can find a file by starting at the first file in memory (which starts at
FIRST_FL_ADR), calculating the address of the 2nd file by using the file-
length bytes of the first file, etc.. There's an ENDRAM byte after the
END_OF_FL byte of the last file. The address of the ENDRAM byte is also
stored in ENDRAM_ADR. The number of files is stored in NR_FLs. All the
RAM used by the system is at lower addresses than the file chain.
END_COMMENT
Besides making it easier to re-format the paragraph after changes (<alt>R in the MultiEdit text editor even preserves the
left margin), it looks nicer. Just be sure no line starts with something that the assembler interprets as
an ELSE or the ENDIF.
Here's how to do it. IF 0 is a condition that will never be met, so the assembler skips
to the ENDIF (or actually ENDI in the C32 assembler I used the portion below on).
COMMENT: MACRO ; COMMENT and END_COMMENT here relieve us from the load
IF 0 ; of semicolons where we have many consecutive lines of
ENDM ; comments. Since the IF is looking for an ELSE or ENDI
;---------------- ; (either cap.s or lower case), be sure none of the lines
; commented-out start with one of these words that could
END_COMMENT: MACRO ; fool it. If there is, that line will still need a ; .
ENDI ; Also, if a line starts with a macro name which is
ENDM ; followed by illegal parameters for that macro (as to
;---------------- ; discuss it), you will still need the ; .
Of course if your assembler already has a COMMENT directive (like the 2500AD one does),
you won't need this one. Not all do however, and the above still shows what can be done.
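A COMMENT block is really IF 0 ... ENDI, so the main hazard is a prose line whose first word the assembler reads as ELSE or ENDI. A small Python checker can scan a block before you wrap it. It is a hypothetical helper, not part of any assembler. Following the notes in the macro above, it assumes that only the first word of a line matters and that case is ignored.

```python
import re

# Words that would end (ENDI) or split (ELSE) the IF 0 ... ENDI skip.
STOP_WORDS = {"ELSE", "ENDI"}
FIRST_WORD = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*)")


def risky_comment_lines(lines, extra_words=()):
    """Return (line_number, text) for each line of a COMMENT block whose
    first word matches a stop word, ignoring case.  extra_words can list
    macro names that would be expanded with illegal parameters."""
    words = STOP_WORDS | {w.upper() for w in extra_words}
    hits = []
    for number, text in enumerate(lines, start=1):
        match = FIRST_WORD.match(text)
        if match and match.group(1).upper() in words:
            hits.append((number, text))
    return hits


if __name__ == "__main__":
    block = ["Every flight or checklist is a file.",
             "else the computer walks the file chain",
             "COPY2 is used to move the length bytes."]
    for number, text in risky_comment_lines(block, extra_words=["COPY2"]):
        print(number, text)
```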
Here's an only slightly more complex example, where we want the computer to display an immediate string and wait for the
response:
DISPLAY_IMM "Press CONTINUE when ready"
WAIT_FOR_KEY CONT_KEY
where the macro DISPLAY_IMM is defined as:
DISPLAY_IMM: MACRO STR
JSR DISP_QUOTE
BYTE dim2#-dim1# ; Lay down the string length byte,
dim1#: BYTE STR ; followed by the string. (Counted string, not nul-terminated.)
dim2#: ; (Must not put ENDM on same line with the label.)
ENDM
;------------------
(This is from an application I did with the 2500AD assembler which used the # at the end of a label to mean it's local to the
macro.) It assembles:
JSR DISP_QUOTE
BYTE 25
BYTE "Press CONTINUE when ready"
The DISP_QUOTE subroutine looks at the return address to get the string length, then continues that far
to get the string it's supposed to display, then also uses the length byte to adjust the return address on the 6502's stack so
the RTS takes it to the first instruction after the string instead of trying to execute data.
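The return-address arithmetic DISP_QUOTE has to do is easy to get off by one. JSR pushes the address of its own last byte, and RTS adds one to whatever it pulls. This Python model only checks that arithmetic on a 64K memory image. How the real routine pulls and pushes the address (PLA/PHA, a zero-page pointer, etc.) is not shown and is left as an assumption, as is the $F000 address of DISP_QUOTE in the usage example.

```python
def disp_quote(mem, stacked: int):
    """Model of DISP_QUOTE's return-address handling.

    `stacked` is the 16-bit value the JSR pushed: the address of the JSR's
    own last byte.  The counted string's length byte follows at stacked+1.
    Returns the string and the value to push back, chosen so that RTS
    (which adds 1 to what it pulls) resumes just past the string."""
    length = mem[(stacked + 1) & 0xFFFF]
    start = (stacked + 2) & 0xFFFF
    text = bytes(mem[start:start + length]).decode("ascii")
    new_stacked = (stacked + 1 + length) & 0xFFFF  # last byte of the string
    return text, new_stacked


if __name__ == "__main__":
    msg = b"Press CONTINUE when ready"
    mem = bytearray(0x10000)
    mem[0x0300:0x0303] = bytes([0x20, 0x00, 0xF0])  # JSR DISP_QUOTE (assumed at $F000)
    mem[0x0303] = len(msg)                          # BYTE 25
    mem[0x0304:0x0304 + len(msg)] = msg             # BYTE "Press CONTINUE when ready"
    text, ret = disp_quote(mem, 0x0302)             # JSR at $0300 pushes $0302
    print(text, hex(ret + 1))                       # RTS resumes at 0x31d
```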
The WAIT_FOR_KEY macro used above is defined as:
WAIT_FOR_KEY: MACRO KEY
wfk1#: JSR SCAN_KEYPAD
CMP #KEY
BNE wfk1#
ENDM
;------------------
and will assemble:
JSR SCAN_KEYPAD
CMP #CONT_KEY
BNE $F9 ; Branch back to the JSR.
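The $F9 above is the branch offset counted from the address just after the 2-byte BNE. The JSR, CMP #, and BNE together take 7 bytes, and -7 is $F9. Here is a short Python sketch of the calculation (`rel_offset` is a hypothetical helper name). It includes the range check that makes long forms like BEQlong necessary.

```python
def rel_offset(branch_addr: int, target: int) -> int:
    """Operand byte for a 6502 relative branch whose opcode is at branch_addr.
    The offset is counted from the address just after the 2-byte branch."""
    offset = target - (branch_addr + 2)
    if not -128 <= offset <= 127:
        raise ValueError(f"offset {offset} out of range; use a long form like BEQlong")
    return offset & 0xFF  # two's-complement byte


if __name__ == "__main__":
    # WAIT_FOR_KEY at $1000: JSR (3 bytes) at $1000, CMP # (2) at $1003, BNE at $1005
    print(f"${rel_offset(0x1005, 0x1000):02X}")  # $F9
```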
Macros can make a routine take fewer pages and make it easier to wrap your head around
it. A piece from a routine goes as follows:
; Initialize software flags
LDA #0
STA ICNT
STA IHEAD
STA ITAIL
STA OCNT
STA OHEAD
STA OTAIL
STA OIE
One thing I liked about the 2500AD assembler is that it allowed a variable number of parameters in the macro call, which was
nice for setting or clearing a list of flag variables that won't always have the same number of them listed. Hence the above
could be replaced with:
CLR_FLAG ICNT, IHEAD, ITAIL, OCNT, OHEAD, OTAIL, OIE ; Init the software flags.
and lay down exactly the same code. (Of course with the 65c02 you could make it eliminate the LDA #0 and
use STZ instead of STA.) There are seven flags to clear, so you would need at
least that many parts in the macro. In the 2500AD assembler, IFMA 5 for example meant "If there's a
5th macro parameter, do this part." In this case IFMA 8 would be the first one that would not lay down
any code, since there's no 8th flag listed to clear. If your assembler does not have that capability, you might have to just pad
the unused parameter positions with 0's in order to accomplish the above.
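A variable-parameter macro behaves like a variadic function run at assembly time. The Python sketch below models CLR_FLAG that way and returns source lines rather than bytes. The `cmos` switch mirrors the STZ option mentioned above; it is an assumption, not a 2500AD feature.

```python
def clr_flag(*flags: str, cmos: bool = False) -> list:
    """Model of a variable-parameter CLR_FLAG macro: returns the source
    lines it would expand to.  With cmos=True it uses the 65c02's STZ and
    needs no LDA #0."""
    if not flags:
        raise ValueError("CLR_FLAG needs at least one variable")
    if cmos:
        return [f"STZ {flag}" for flag in flags]
    return ["LDA #0"] + [f"STA {flag}" for flag in flags]


if __name__ == "__main__":
    for line in clr_flag("ICNT", "IHEAD", "ITAIL", "OCNT", "OHEAD", "OTAIL", "OIE"):
        print(line)
```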
You can use macros to simplify countless things in your code. Take something as mundane as copying the value of one two-byte
variable to another, normally done this way when not using macros:
LDA ACCb
STA KEY_TIME
LDA ACCb + 1
STA KEY_TIME + 1
Why not shorten it to:
COPY2 ACCb, TO, KEY_TIME
where COPY2 is defined as:
COPY2: MACRO variable1, preposition, variable2
LDA variable1
STA variable2
LDA variable1 + 1
STA variable2 + 1
ENDM
;------------------
Here preposition is not actually used by the macro. It's just there to make the line more English-like, and
I define TO as TO: EQU 0 since the assembler does want it defined. Note
again that using the macro does not change the resulting code run by the 6502. It's the same. The extra work is handled at assembly
time, not at run time.
You know exactly what code gets produced. If there were conditionals and you wanted to see what code resulted, you could
look at the assembler's .LST (list) output file which shows the actual addresses along the left edge, followed
by the actual bytes of op codes, operands, and data, all to the left of the corresponding lines of source code. If you assign
constants and assembler variables, the list file will show the exact numeric values that resulted from those too.
At the rare extreme, a macro can replace even pages of code. Before I did the program-structure macros, I think my longest macro was 54 lines, with much of that being conditionals, and the macro actually assembled only two to seven instructions depending on the conditions which in this case were set in the parameters in the line calling the macro.
A macro, when defined, is only kept in the assembler. It is like a subroutine for the assembler itself to run. No machine code is produced at the time the macro is defined; so unused macros take no memory at all in the final program. When you do call the macro later in your code, the assembler itself will execute that "assembler subroutine" and generate machine code at that time, per the parameters given in the macro call, and put it where the program pointer indicates. If you look at the resulting .LST (list) file, you can see the code in the macro expanded out at each point the macro is invoked. It can have conditionals and so on, just like non-macro code.
Since a macro expansion can take varying amounts of space in the output machine code, the macro must be defined before it is called. Forward references to macros are not allowed. Forward references in JSRs are ok because the entire JSR instruction is always the same length; but a macro may take differing amounts of code space from one call to the next because of things like conditional assembly in the macro and conditions being different each time, varying lengths of text strings, etc. The assembler would have no idea how much address space to reserve to expand a macro that is not defined yet.
Myth #1: "Macros breed inefficiency."
An example given in a forum post regarded having a macro MOV that would put a value in an address:
MOV #0, $D020
MOV #0, $D021
(although none of the assemblers I've used would parse the "#" in front of a macro parameter), where the example would
lay down an LDA #0 twice when once is enough. One way to handle it is that if your assembler allows
a variable number of parameters, you could write the macro to handle for example (and I'll use PUT to handle the problem
of immediates rather than COPYing one memory location to another),
PUT 0, in, $D020, and, $D021
or, since the two addresses are contiguous, my preference, have a separate macro "PUT2" for two-byte operations,
PUT2 0, in, $D020
where PUT2 views the first parameter as a 16-bit quantity, and if the high byte and low byte are the same, it does not
reload it. Otherwise, the low byte of your specified 16-bit number goes in address $D020 and the high byte in $D021.
Now suppose the accumulator is not available but Y is. You could either use a separate macro for that case; or if your
assembler allows varying numbers of macro parameters, specify what you want in the invocation something like this:
PUT2 0, in, $D020, using_Y
where "using_Y" is an EQUate, and the macro tests that parameter, and if it's absent, it would use the
accumulator, or if it's this one it would use Y, and if it's "using_X," it would use X. The EQUates for
X and Y might be 1 and 2, or anything else you want them to be, as long as you write the macro to interpret them correctly.
In the case of the 0 above, the 65c02 (ie, CMOS) can use STZ and then you don't need an LDA, LDX, or
LDY at all. The way to write that for the C32 assembler would be:
PUT2: MACRO num, preposition, addr
      IF num != 0
         IF {num & $FF} != {num >> 8}
            LDA #num & $FF
            STA addr
            LDA #{num >> 8}
            STA addr + 1
         ELSE
            LDA #num & $FF
            STA addr
            STA addr + 1
         ENDI
      ELSE
         STZ addr
         STZ addr + 1
      ENDI
      ENDM
;-------------
which covers all the conditions to get the desired results with the fewest possible machine-language instructions,
whether two, three, or four instructions. "preposition" is just a dummy equate that does not actually get used by
the macro. It's only there to make the line more English-like, as in "PUT2 $28BE, in, FOOBAR".
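The decision PUT2 makes can be checked outside the assembler. This Python sketch mirrors the conditional assembly in the macro. It also adds the optional register choice described earlier: the `reg` parameter is hypothetical, and STX or STY stands in for STA when X or Y is chosen. Zero still goes through STZ, which only the 65c02 and later have.

```python
def put2(num: int, addr: str, reg: str = "A") -> list:
    """Mirror of PUT2's conditional assembly: store the 16-bit num at
    addr (low byte) and addr+1 (high byte).  reg ("A", "X", or "Y") is the
    hypothetical using_X / using_Y option; it is ignored for zero, which
    uses STZ (65c02 only)."""
    if not 0 <= num <= 0xFFFF:
        raise ValueError("num must fit in 16 bits")
    if reg not in ("A", "X", "Y"):
        raise ValueError("reg must be A, X, or Y")
    lo, hi = num & 0xFF, num >> 8
    if num == 0:
        return [f"STZ {addr}", f"STZ {addr}+1"]
    load, store = f"LD{reg}", f"ST{reg}"
    if lo != hi:
        return [f"{load} #${lo:02X}", f"{store} {addr}",
                f"{load} #${hi:02X}", f"{store} {addr}+1"]
    return [f"{load} #${lo:02X}", f"{store} {addr}", f"{store} {addr}+1"]
```

Each branch yields two, three, or four instructions, matching the macro.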
You can do loads of conditional assembly in macros; and then what they'll do automatically may be an optimization
you'd forget to do if you were to write it all out by hand, or they may propagate a change you make elsewhere even if you forget
to come back and improve the code here.
Myth #2: "Macros will introduce more bugs."
Properly implemented, macros make it easier to see what you're doing. One of the effects is that you'll spot bugs sooner,
meaning the macros will eliminate some debugging. Source code becomes more clear and concise. Once the macro
definitions themselves are debugged, they'll work right every time. You do however need to avoid doing things like making
the macro invisibly or unexpectedly store data in a variable, overwriting data that's still needed by another part of the
program. Yes, that could be difficult to debug; but it won't happen with good macro technique.
Myth #3: "Macros make the code less readable."
Again, when properly implemented, macros will make your code more readable, not less. Use names that don't leave
anyone guessing. You can also use dummy parameters that aren't even used by the macro but make the line more like an
English sentence, as shown in the examples above.
Myth #4: "Macros defeat the purpose of assembly language."
Doing things the difficult and cryptic way is not the purpose of assembly language, although some who turn up their noses
at macros seem to think it is. We do assembly language for maximum performance and control of the processor.
If we can use macros to further add the benefits of productivity, maintainability, fewer bugs, and keeping control of a large
project, without forfeiting the benefits of assembly, that's a good thing! Macros may help you stay in
"assembly-language land" when it might otherwise be impractical.
High-level languages (HLLs) were invented to improve productivity, reduce source-code length, and improve
portability. In assembly language, I commonly work with two entirely different processor families, and I use many of
the same macros in my code for both. This improves portability, even though it's still assembly language. If I
have a routine for something working on one, and want to write it for the other, I don't have to completely re-write it.
Myth (or objection) #5: "You're making up your own language!"
Good implementation of macros will make the code more like English (or whatever your native tongue is). The result should
be rather intuitive to anyone else who speaks your language and has any understanding of the application at all, even if they're
not very familiar with 65xx assembly language.
If you've programmed much in non-structured programming, you have experienced situations with lots of branches that just drive you nuts. I have, many times, printed out the routine on fanfold paper and laid the strip out on the floor and drawn arrows showing all the spaghetti, ie, the tangle of branches. It even gets hard to come up with short labels that are semi-descriptive, especially if a part is branched to by different conditions.
Here's a short-ish more general-purpose piece of code from the 6502.org source-code repository, from Bruce Clark. Without laying a macro foundation for structured
programming, it was indeed appropriate for him to do it unstructured and use labels.
ORG 0
FROM: DFS 2 ; "DFS" in C32 is like "BLKB" in the 2500AD assembler.
TO: DFS 2 ; It stands for "DeFine Storage", and in this case
SIZE: DFS 2 ; allots two bytes for each ZP variable here.
SIZEL: EQU SIZE ; SIZEL and SIZEH are the low and high bytes of
SIZEH: EQU SIZE+1 ; variable SIZE above.
ORG $8000
; +-----------------------+
; |   ORIGINAL VERSION    |
; +-----------------------+
MOVEDOWN: LDY #0
          LDX SIZEH
          BEQ MD2
MD1:      LDA (FROM),Y        ; Move a page at a time.
          STA (TO),Y
          INY
          BNE MD1
          INC FROM+1
          INC TO+1
          DEX
          BNE MD1
MD2:      LDX SIZEL
          BEQ MD4
MD3:      LDA (FROM),Y        ; Move the remaining bytes
          STA (TO),Y
          INY
          DEX
          BNE MD3
MD4:      RTS
;----------------
; +-------------------------+
; |   STRUCTURED VERSION    |
; +-------------------------+
MOVEDOWN:
   LDY #0
   LDX SIZEH               ; Get the high byte of the size of block to move.
   IF_NOT_ZERO             ; Do this 1st part if there's at least one full page to move.
      BEGIN                ; Do this loop once for each full page to move.
         BEGIN             ; Do this loop once for each byte in the page.
            LDA (FROM),Y
            STA (TO),Y
            INY
         UNTIL_ZERO        ; UNTIL_ZERO assembles the BNE up to the BEGIN four lines up.
         INC FROM+1        ; Increment the high byte of the source
         INC TO+1          ; and destination addresses, and
         DEX               ; decrement the number of full pages left to do.
      UNTIL_ZERO           ; UNTIL_ZERO assembles the BNE up to the corresponding BEGIN.
   END_IF                  ; END_IF puts the branch distance in the BEQ assembled by the
                           ; IF_NOT_ZERO above, whose operand's addr was on the macro stack.
   LDX SIZEL               ; After all full pages have been moved, see if there's _part_
   IF_NOT_ZERO             ; of one left to do. If there is, do the following.
      BEGIN                ; Do this loop once for each byte left.
         LDA (FROM),Y
         STA (TO),Y        ; After transferring each byte,
         INY               ; increment the index,
         DEX               ; and decrement the number of bytes left to do.
      UNTIL_ZERO           ; UNTIL_ZERO assembles the BNE up to the BEGIN 5 lines up.
   END_IF                  ; END_IF puts the branch distance in the BEQ assembled
                           ; by the IF_NOT_ZERO above, so a branch taken goes to the RTS below.
   RTS
;----------------
Or, saving a few lines of source code:
MOVEDOWN:
   LDY #0
   LDX SIZEH                     ; Get the high byte of the size of block to move.
   IF_NOT_ZERO                   ; Do this 1st part if there's at least one full page to move.
      FOR_X X_REG, DOWN_TO, 0    ; Do this loop once for each full page to move. Start w/ current X contents.
         FOR_Y Y_REG, UP_TO, 0   ; Do this loop once for each byte in the page. Start w/ current Y contents.
            LDA (FROM),Y
            STA (TO),Y
         NEXT_Y                  ; NEXT_Y assembles the INY, then the BNE up to the LDA (FROM),Y two lines up.
         INC FROM+1              ; Increment the high byte of the source and
         INC TO+1                ; destination addresses. In next line, decr the number of full pages left to do.
      NEXT_X                     ; NEXT_X does the DEX, and assembles a BNE up to the first line after FOR_X above.
   END_IF                        ; END_IF puts the branch distance in the BEQ assembled by the
                                 ; IF_NOT_ZERO above, whose operand's addr was on the macro stack.
   LDX SIZEL                     ; After all full pages have been moved, see if there's _part_
   IF_NOT_ZERO                   ; of one left to do. If there is, do the following.
      FOR_X X_REG, DOWN_TO, 0    ; Do this loop once for each byte left.
         LDA (FROM),Y
         STA (TO),Y              ; After transferring each byte,
         INY                     ; increment the index. In next line, decr the number of bytes left to do.
      NEXT_X                     ; NEXT_X does the DEX, then assembles the BNE up to the first line after FOR_X above.
   END_IF                        ; END_IF puts the branch distance in the BEQ assembled
                                 ; by the IF_NOT_ZERO above, so a branch taken goes to the RTS below.
   RTS
;----------------
The three versions result in exactly the same machine code, but the program structures make it more intuitive what's happening.
Here's another one, my hex-to-decimal routine from http://6502.org/source/integers/hex2dec.htm:
HTD_IN: DFS 1 ; Input and output variables. DFS is DeFine Storage.
HTD_OUT: DFS 2 ; Output is low-byte-first.
TABLE: DWL 1, 2, 4, 8, 16H, 32H, 64H, 128H ; DWL is Define Word, Low byte first.
; +-----------------------+
; |   ORIGINAL VERSION    |
; +-----------------------+
HTD:    SED               ; Output gets added up in decimal.
        STZ HTD_OUT       ; Initialize output word as 0.
        STZ HTD_OUT+1     ; (NMOS 6502 will need LDA#0, STA ...)
        LDX #0EH          ; $E is 14 for 2x7 bits. (0-7 is 8 positions.)
loop:   ASL HTD_IN        ; Look at next high bit. If it's 0,
        BCC htd1          ; don't add anything to the output for this bit.
        LDA HTD_OUT       ; Otherwise get the running output sum
        CLC
        ADC TABLE,X       ; and add the appropriate value for this bit
        STA HTD_OUT       ; from the table, and store the new sum.
        LDA HTD_OUT+1     ; After low byte, do high byte.
        ADC TABLE+1,X
        STA HTD_OUT+1
htd1:   DEX               ; Go down to next bit value to loop again.
        DEX
        BPL loop          ; If still not done, go back for another loop.
        CLD
        RTS
;----------------
; +-------------------------+
; |   STRUCTURED VERSION    |
; +-------------------------+
HTD:    SED                  ; Output gets added up in decimal.
        STZ HTD_OUT          ; Initialize output word as 0.
        STZ HTD_OUT+1        ; (NMOS 6502 will need LDA#0, STA ...)
        LDX #0EH             ; $E is 14 for 2x7 bits. (0-7 is 8 positions.)
        BEGIN
           ASL HTD_IN        ; Look at next high bit. If it's 0,
           IF_C_SET          ; don't add anything to the output for this bit.
              LDA HTD_OUT    ; Otherwise get the running output sum
              CLC
              ADC TABLE,X    ; and add the appropriate value for this bit
              STA HTD_OUT    ; from the table, and store the new sum.
              LDA HTD_OUT+1  ; After low byte, do high byte.
              ADC TABLE+1,X
              STA HTD_OUT+1
           END_IF
           DEX               ; Go down to next bit value to loop again.
           DEX
        UNTIL_NEG            ; If still not done, go back for another loop.
        CLD
        RTS
;----------------
Or, a few lines shorter with FOR_X and NEXT_X:
HTD:    SED                  ; Output gets added up in decimal.
        STZ HTD_OUT          ; Initialize output word as 0.
        STZ HTD_OUT+1        ; (NMOS 6502 will need LDA#0, STA ...)
        FOR_X 0EH, DOWN_TO, NEG_NRs   ; $E is 14 for 2x7 bits. (0-7 is 8 positions.)
           ASL HTD_IN        ; Look at next high bit. If it's 0,
           IF_C_SET          ; don't add anything to the output for this bit.
              LDA HTD_OUT    ; Otherwise get the running output sum
              CLC
              ADC TABLE,X    ; and add the appropriate value for this bit
              STA HTD_OUT    ; from the table, and store the new sum.
              LDA HTD_OUT+1  ; After low byte, do high byte.
              ADC TABLE+1,X
              STA HTD_OUT+1
           END_IF
           DEX               ; Go down to next bit value to loop again. Need two DEX's, so add one here.
        NEXT_X               ; If still not done, go back for another loop.
                             ; In this case, NEXT_X will assemble a DEX, BPL up to the line with the ASL.
        CLD
        RTS
;----------------
Again, the three versions assemble exactly the same machine code. (One reader commented that since they assemble the
same machine code, it means the structure was already there before, just not visible.) Note that the local labels are gone.
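The table holds 16H, 32H, and so on because, with SED in effect, ADC adds packed BCD. Each entry is the decimal weight of its bit written as BCD digits. Here is a Python sketch of the same algorithm, with a digit-by-digit BCD add standing in for decimal-mode ADC. The add drops any carry out of the top digit, which never happens for 8-bit input since 255 fits in four BCD digits.

```python
# BCD weight of each input bit, as in TABLE (16H means $16, i.e. BCD "16").
TABLE = [0x0001, 0x0002, 0x0004, 0x0008, 0x0016, 0x0032, 0x0064, 0x0128]


def bcd_add(a: int, b: int) -> int:
    """Add two 4-digit packed-BCD words digit by digit, the way ADC does
    with the decimal flag set; any carry out of the top digit is dropped."""
    result, carry = 0, 0
    for shift in range(0, 16, 4):
        digit = ((a >> shift) & 0xF) + ((b >> shift) & 0xF) + carry
        carry = 1 if digit > 9 else 0
        if carry:
            digit -= 10
        result |= digit << shift
    return result


def htd(value: int) -> int:
    """One binary byte to a packed-BCD word, same algorithm as HTD."""
    if not 0 <= value <= 0xFF:
        raise ValueError("HTD takes one byte")
    out = 0                          # STZ HTD_OUT / STZ HTD_OUT+1
    for bit in range(7, -1, -1):     # ASL shifts out bit 7 first; X runs 14, 12, ... 0
        if value & (1 << bit):       # carry set: add this bit's weight
            out = bcd_add(out, TABLE[bit])
    return out


if __name__ == "__main__":
    print(f"{htd(0xFF):04X}")  # 0255
```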
I will be modeling some structures here partly after common Forth structures. I supply the macros for them, in the form needed by the C32 assembler, in the files STRUCMAC.ASM, STAKPUSH.ASM, STKPUSH2.ASM, STKPUSH3.ASM, STACKPOP.ASM, STAKPOP2.ASM, STAKPOP3.ASM, and STAKSWAP.ASM, or, all zipped together, STRUCMAC.ZIP, (named such because I did them in DOS). You can rename the structures after the equivalents in other languages if you wish. Keep in mind too that if any names clash with the names of assembler directives in your assembler, you will have to change the macro names.
The As65 assembler (written by Andrew, known as "BitWise" on the
6502.org forum, and
taken over by Bill Chatfield after Andrew's untimely death) has
structure capabilities built in, without the user adding macros. It even automatically chooses branch versus jump instructions to keep
the code compact in most cases while still reaching targets more than 127 bytes away. Most assemblers don't have
the built-in structure capability, so I will continue here. [Edit, 1/1/13: Anton Treuenfels added the structures here to his
HXA 6502 assembler.]
   CMP #14
   IF_EQ            ; clear enough that it really needs no comments
      <actions>
      <actions>
      <actions>
   END_IF
No label is needed. The IF_EQ lays down a BNE instruction to branch around the code if the Z flag in the status register is not set. It leaves the operand byte blank (or invalid) since it does not know yet how far the branch will be, but records the address of the operand so the END_IF macro can fill it in. END_IF records the address the next instruction will be at, sets the pointer (* in some assemblers, $ in others) to what IF_EQ recorded, fills in the operand, then sets the pointer back to where assembly will be resumed. The internal details are shown in STRUCMAC.ASM.
Ok, so we said the address of the operand byte to be filled in will be "recorded." Where? It will be on a stack held in the assembler, which I'm calling the "macro stack." It will get explained here just a little, but I go into it further in chapter 17 of the 6502 stacks treatise, showing in more detail how stacks can be used to form nestable program structures during assembly or compilation.
Note: The operand of a forward branch will initially appear incorrect in the list file (usually as $FE), but will be corrected further down when the corresponding macro goes back to fill it in. It may even be wrong initially in the hex file, but if so, the hex file will come back to that address and overwrite it with the right value.
The next step would be to add an ELSE in
the IF...END_IF. The name may need to be changed slightly to keep it from
colliding with names of assembler directives; and in fact the C32 assembler does use ELSE in
conditional assembly, so I add the underscore for this macro, ELSE_, which should be
easy to remember since there's an underscore after IF and END above.
   CMP #14
   IF_EQ
      <actions>
      <actions>
      <actions>
   ELSE_
      <actions>
      <actions>
      <actions>
   END_IF
This time the IF_EQ lays down a BNE instruction to branch down to the first instruction
after the ELSE_ if the Z flag in the status register is not set. It leaves the operand byte blank since it
does not know yet how far the branch will be, but records the address of the operand so the ELSE_ macro can fill it
in when the assembler gets down to it.
Similarly, ELSE_ lays down a BRA instruction to unconditionally branch down to the first instruction after the END_IF. It leaves the operand byte blank since it does not know yet how far the branch will be, but records the address of the operand so the END_IF macro can fill it in when the assembler gets down there. END_IF records the address the next instruction will be at, sets the pointer to what ELSE_ recorded, fills in the operand, then sets the pointer back to where assembly will be resumed.
Whether you use ELSE_ or not, END_IF only fills in the operand of a previous branch
instruction. It does not lay down any additional code.
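Here is a toy Python model of IF_EQ, ELSE_, and END_IF that makes the macro-stack bookkeeping concrete. It is not STRUCMAC.ASM. It only reproduces the behavior described above:

- lay down a branch with a placeholder operand ($FE);
- push the address of that operand;
- later pop it and patch in the distance.

The opcodes are the 65c02's BNE ($D0) and BRA ($80). NOPs stand in for `<actions>`.

```python
class StructAsm:
    """Toy assembler: lays down bytes and keeps a 'macro stack' of branch
    operands whose distances are not known yet."""
    BNE, BRA = 0xD0, 0x80   # 65c02 opcodes
    PLACEHOLDER = 0xFE      # operand value until the branch is patched

    def __init__(self, origin: int = 0):
        self.origin = origin
        self.code = bytearray()
        self.stack = []

    @property
    def pc(self) -> int:
        return self.origin + len(self.code)

    def emit(self, *values: int) -> None:
        self.code.extend(values)

    def _patch_forward(self, operand_addr: int) -> None:
        # Branch distance is counted from the byte after the operand.
        distance = self.pc - (operand_addr + 1)
        if distance > 127:
            raise ValueError("structure too long for a branch")
        self.code[operand_addr - self.origin] = distance

    def if_eq(self) -> None:
        self.emit(self.BNE, self.PLACEHOLDER)  # skip the block when Z is clear
        self.stack.append(self.pc - 1)         # record the operand's address

    def else_(self) -> None:
        if_operand = self.stack.pop()
        self.emit(self.BRA, self.PLACEHOLDER)  # end of the "true" part
        self.stack.append(self.pc - 1)
        self._patch_forward(if_operand)        # IF_EQ's branch lands after the BRA

    def end_if(self) -> None:
        self._patch_forward(self.stack.pop())  # lays down no code


if __name__ == "__main__":
    asm = StructAsm(0x1000)
    asm.emit(0xC9, 14)          # CMP #14
    asm.if_eq()
    asm.emit(0xEA, 0xEA, 0xEA)  # NOPs stand in for <actions>
    asm.else_()
    asm.emit(0xEA, 0xEA)
    asm.end_if()
    print(asm.code.hex(" ").upper())  # C9 0E D0 05 EA EA EA 80 02 EA EA
```

Because the stack is last-in, first-out, nested IFs pair up correctly without any labels.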
   BEGIN
      <actions>
      <actions>
      <actions>
   AGAIN
This sets up an endless loop, with the last instruction being a branch back to the beginning of the loop. BEGIN only records the address of the top of the loop so that AGAIN can figure out the correct operand to use in a BRA (Branch Relative Always) or JMP instruction to make the loop repeat again. The way out of this kind of structure is often an RTS taken under a certain condition somewhere inside the loop. (In at least one language, AGAIN is called FOREVER, which I'm not fond of because the life of a computer is an insignificant speck in the span of forever.) Notice again that no labels are needed, and the loop stands out clearly.
BTW, I do recommend that each level of indenting be at least three spaces. Using only one especially
makes it look like you meant to align things vertically and just got sloppy. It's harder to see the structure.
   BEGIN
      <actions>
      <actions>
      <actions>
   WHILE_<condition>
      <actions>
      <actions>
      <actions>
   REPEAT
It begins the loop with some pre-processing, and continues WHILE the given condition is still met (WHILE_EQ, WHILE_NEG, etc.); otherwise it branches to the first instruction after the REPEAT, ie, after the end of the structure. If the WHILE condition is still being met, the instructions in the last half of the structure are executed, and the REPEAT assembles a BRA or JMP to send the program counter back up to the top of the loop. Obviously the BEGIN has to record the address there so the REPEAT macro knows what operand to put in the BRA or JMP instruction. Also, the WHILE macro needs to record the address of the branch instruction it assembles so that the REPEAT macro can fill in its operand.
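BEGIN...WHILE...REPEAT needs two stack entries:

- the loop top pushed by BEGIN, a backward target that is known immediately;
- the WHILE branch's operand, a forward target that REPEAT patches.

The Python sketch below is a toy assembler under simplifying assumptions. Only WHILE_EQ is modeled, BRA is used for the backward jump, and the class and method names are hypothetical.

```python
class LoopAsm:
    """Toy model of BEGIN ... WHILE_EQ ... REPEAT using a macro stack."""
    BNE, BRA = 0xD0, 0x80   # 65c02 opcodes

    def __init__(self, origin: int = 0):
        self.origin = origin
        self.code = bytearray()
        self.stack = []

    @property
    def pc(self) -> int:
        return self.origin + len(self.code)

    def emit(self, *values: int) -> None:
        self.code.extend(values)

    @staticmethod
    def rel(branch_addr: int, target: int) -> int:
        offset = target - (branch_addr + 2)
        if not -128 <= offset <= 127:
            raise ValueError("too far for a branch; a JMP would be needed")
        return offset & 0xFF

    def begin(self) -> None:
        self.stack.append(self.pc)           # top of loop; no code laid down

    def while_eq(self) -> None:
        self.emit(self.BNE, 0xFE)            # leave the loop when Z is clear
        self.stack.append(self.pc - 1)       # operand to patch at REPEAT

    def repeat(self) -> None:
        exit_operand = self.stack.pop()
        top = self.stack.pop()
        self.emit(self.BRA, self.rel(self.pc, top))   # back to the top
        branch_addr = exit_operand - 1                # the WHILE's BNE opcode
        self.code[exit_operand - self.origin] = self.rel(branch_addr, self.pc)


if __name__ == "__main__":
    asm = LoopAsm(0x2000)
    asm.begin()
    asm.emit(0xA5, 0x10)     # LDA $10 (pre-processing)
    asm.while_eq()
    asm.emit(0xEA)           # NOP stands in for the loop body
    asm.repeat()
    print(asm.code.hex(" ").upper())  # A5 10 D0 03 EA 80 F9
```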
The WHILE part could be made to take on as many conditions as you like. The condition could be
a macro parameter to use with conditional assembly to lay down the right branch instruction (BNE,
BMI, etc.). For most situations, I've taken the route of forming separate macros
for WHILE_NEG, WHILE_C_SET, WHILE_EQ, etc.; but
I do have WHILE_BIT, so you can have for example:
WHILE_BIT VIA3PA, 4, IS_LOW
for the condition in this example to be that VIA3's Port A's bit 4 is low.
A planned future addition is to add the ability to have more than
one WHILE between BEGIN and REPEAT.
   BEGIN
      <actions>
      <actions>
      <actions>
   UNTIL_<condition>
It is similar to BEGIN...AGAIN, but it lets execution drop out of the loop when the condition is met. UNTIL_EQ for example assembles BNE ___ to go back to the top of the loop. UNTIL_MINUS assembles BPL ___ to the top of the loop, and so on.
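UNTIL_<condition> and AGAIN only ever branch backward. Their operands can therefore be computed on the spot from the address BEGIN pushed, and nothing has to be patched later. The Python sketch below maps each exit condition to the branch that loops back while that condition is not yet met. The condition keys are illustrative; only some correspond to macros named in this article. The usage example is a loop body of just DEX, DEX, like the tail of HTD.

```python
# Branch that loops back while the exit condition is NOT yet met.
LOOP_BACK = {"EQ": 0xD0,     # UNTIL_EQ    -> BNE
             "NE": 0xF0,     # UNTIL_NE    -> BEQ
             "NEG": 0x10,    # UNTIL_NEG   -> BPL
             "PLUS": 0x30,   # UNTIL_PLUS  -> BMI
             "C_SET": 0x90,  # UNTIL_C_SET -> BCC
             "C_CLR": 0xB0}  # UNTIL_C_CLR -> BCS
BRA = 0x80                   # AGAIN (65c02); a JMP is needed beyond -128 bytes


class UntilAsm:
    def __init__(self, origin: int = 0):
        self.origin = origin
        self.code = bytearray()
        self.stack = []

    @property
    def pc(self) -> int:
        return self.origin + len(self.code)

    def emit(self, *values: int) -> None:
        self.code.extend(values)

    def begin(self) -> None:
        self.stack.append(self.pc)       # just remember the top of the loop

    def _branch_back(self, opcode: int) -> None:
        top = self.stack.pop()
        offset = top - (self.pc + 2)     # known now: the target is behind us
        if offset < -128:
            raise ValueError("loop too long for a branch")
        self.emit(opcode, offset & 0xFF)

    def again(self) -> None:
        self._branch_back(BRA)

    def until(self, condition: str) -> None:
        self._branch_back(LOOP_BACK[condition])


if __name__ == "__main__":
    asm = UntilAsm(0x3000)
    asm.begin()
    asm.emit(0xCA, 0xCA)     # DEX, DEX
    asm.until("NEG")
    print(asm.code.hex(" ").upper())  # CA CA 10 FC
```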
   CASE ACCUM           ; Test the accumulator against the following cases.
      CASE_OF $0A       ; In the case of it containing the linefeed character,
         <actions>      ; execute these instructions,
         <actions>
      END_OF            ; then jump to the first instruction after END_CASE.
      CASE_OF $0D       ; If it has the carriage-return character,
         <actions>      ; execute these instructions,
         <actions>
      END_OF            ; then jump to the first instruction after END_CASE.
      CASE_OF $08       ; If it has the backspace character,
         <actions>      ; execute these instructions,
         <actions>
      END_OF            ; then jump to the first instruction after END_CASE.
      <actions>         ; If the character is anything else, do these default
      <actions>         ; actions to feed it to the display as display data.
   END_CASE
CASE_OF $0A above assembles CMP #$0A, BNE ___, with the BNE operand invalid until the corresponding END_OF fills it in. That makes the BNE branch down to the next part, which checks whether the accumulator has the carriage-return character, $0D. Each END_OF also assembles a JMP down to the code just after the END_CASE and leaves a record of where it is, so the END_CASE macro can fill in all their operands without requiring a second pass and without requiring labels.
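The CASE bookkeeping differs from IF in one way: END_CASE has to patch a whole list of JMP operands, one per END_OF. Here is a toy Python model of that. It assumes CASE ACCUM with immediate-operand CASE_OFs only. CMP # ($C9), BNE ($D0), and JMP absolute ($4C) are the standard opcodes. The JMP operands stay $0000 until END_CASE fills them in, low byte first. The class name is hypothetical.

```python
class CaseAsm:
    """Toy model of CASE / CASE_OF / END_OF / END_CASE on a macro stack."""
    CMP_IMM, BNE, JMP = 0xC9, 0xD0, 0x4C

    def __init__(self, origin: int = 0):
        self.origin = origin
        self.code = bytearray()
        self.stack = []

    @property
    def pc(self) -> int:
        return self.origin + len(self.code)

    def emit(self, *values: int) -> None:
        self.code.extend(values)

    def case(self) -> None:
        self.stack.append([])                  # JMP operands to fill at END_CASE

    def case_of(self, value: int) -> None:
        self.emit(self.CMP_IMM, value, self.BNE, 0xFE)
        self.stack.append(self.pc - 1)         # this CASE_OF's BNE operand

    def end_of(self) -> None:
        bne_operand = self.stack.pop()
        self.emit(self.JMP, 0x00, 0x00)        # target unknown until END_CASE
        self.stack[-1].append(self.pc - 2)     # low byte of the JMP operand
        distance = self.pc - (bne_operand + 1) # BNE lands on the next CASE_OF
        if distance > 127:
            raise ValueError("case body too long for BNE")
        self.code[bne_operand - self.origin] = distance

    def end_case(self) -> None:
        end = self.pc
        for operand in self.stack.pop():
            i = operand - self.origin
            self.code[i], self.code[i + 1] = end & 0xFF, end >> 8


if __name__ == "__main__":
    asm = CaseAsm(0x4000)
    asm.case()
    asm.case_of(0x0A)
    asm.emit(0xEA)       # NOP stands in for the linefeed actions
    asm.end_of()
    asm.case_of(0x0D)
    asm.emit(0xEA)
    asm.end_of()
    asm.emit(0xEA)       # default actions
    asm.end_case()
    print(asm.code.hex(" ").upper())
```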
Internally, the CASE structure here is basically the same as a series of IFs and ELSEs. This is not always true of higher-level languages. In my 65816 Forth, the set of CASE words is both faster and more memory-efficient than IFs and ELSEs. Regardless, when appropriate, the CASE statement is still clearer to read in the source code than a deeply nested series of IFs and ELSEs.
Note that the code at the end of the structure in the example above gets run if none of the watched-for cases exist. It has been suggested that a do-nothing DEFAULT line of code be put above them to say so; but I would say that the indentation (or lack of it) should tell. If you do use a DEFAULT line, I would recommend also ending the default section with END_DEFAULT, and indenting the code between the two.
Note also that code can be put between any END_OF and the following CASE_OF. You may for example want to take some kind of action if it's neither case A nor case B, regardless of whether it will later be found to be case C, D, or E (or none of the above). In other words, defaults don't have to go at the end, and you can have multiple default sections.
I originally made the END_OFs assemble BRA instructions down to the END_CASE, but the branch distance was sometimes too far if the CASE structure was a long one, so I had to change them to JMPs. Another possible slight inefficiency is that the last END_OF also has the jump to the end. If there are no instructions like the ones shown in the last two lines before the END_CASE above, that jump effectively becomes a 3-byte, 3-clock NOP. IOW, it would just jump to the next instruction anyway. So I added an END_OF_ (note the trailing _) alternate version which omits the no-longer-needed jump to END_CASE. It is for times when the END_OF is immediately preceded by an unconditional jump or branch, or is immediately followed by END_CASE anyway.
If the cases were consecutive numbers, and especially if there were a lot of them, it would be much faster and more memory-efficient to use a jump table instead. A jump table is just a list of addresses. It has no op codes in it. A short routine would make sure the input is valid, then if you have at least a 65c02, double the input with ASL, transfer it to X (with TAX), and use JMP(table,X). The NMOS 6502 does not have that addressing mode, so you might have to use self-modifying code to do it. It does have JMP(addr), but not JMP(addr,X) like the 65c02 has.
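A minimal Python model of that 65c02 dispatch shows the two pieces: a table that is nothing but low-byte/high-byte address pairs, and the index doubling that ASL and TAX perform before JMP (table,X). The four handler addresses and the function names are made up for illustration.

```python
def build_table(handlers: list[int]) -> bytes:
    """Lay down each handler address as two bytes, low byte first (no op codes)."""
    if len(handlers) > 128:
        raise ValueError("ASL doubles within 8 bits, so X can index at most 128 entries")
    out = bytearray()
    for addr in handlers:
        out += bytes((addr & 0xFF, addr >> 8))
    return bytes(out)


def dispatch(table: bytes, n: int) -> int:
    """Return the address JMP (table,X) would reach for input n."""
    if not 0 <= n < len(table) // 2:     # the short validity check
        raise ValueError("input out of range for the table")
    x = (n << 1) & 0xFF                  # ASL, then TAX: X = 2n
    return table[x] | (table[x + 1] << 8)


TABLE = build_table([0x8100, 0x8120, 0x8145, 0x8190])   # hypothetical handlers
print(hex(dispatch(TABLE, 2)))                          # 0x8145
```

ASL works within 8 bits, so an input above 127 would lose its high bit. That is one reason the validity check has to come before the ASL.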
9/22/14: The maximum number of cases for a CASE statement was increased from 10 to 16 in the 6502
code. (I had done it earlier in the PIC code.)
10/1/15: There's more explanation of how the insides of the CASE statement are formed in the assembler
in section 17 of the stacks treatise, "Forming nestable
program structures," about 90% of the way down the page. You can mouse over each line and get an explanation of what
the assembler does on that line.
Initial index values for either X or Y can be:
 * pre-existing accumulator contents (specifying "ACCUM").
   This makes FOR_X or FOR_Y assemble a TAX or TAY.
 * pre-existing X-register contents (specifying "X_REG").
   This makes FOR_X lay down no code at all (it only marks the address of the top of the loop for NEXT_X to branch to), but it makes FOR_Y assemble PHX, PLY.
 * pre-existing Y-register contents (specifying "Y_REG").
   This makes FOR_Y lay down no code at all (it only marks the address of the top of the loop for NEXT_Y to branch to), but it makes FOR_X assemble PHY, PLX.
 * a specified constant between 0 and $FF inclusive.
   This makes FOR_X assemble an LDX# and makes FOR_Y assemble an LDY#.
You can:
 * count down one at a time (by specifying "DOWN_TO").
   This makes NEXT_X or NEXT_Y assemble DEX or DEY before the conditional branch to the top of the loop.
 * count up one at a time (by specifying "UP_TO").
   This makes NEXT_X or NEXT_Y assemble INX or INY before the conditional branch to the top of the loop.
If you want two at a time, you would have to precede the NEXT_X or NEXT_Y with an extra INX/INY/DEX/DEY. For other step sizes, you can of course alter X or Y inside the loop.
The limit (ie, target count) can be:
 * a specified constant between 0 and $FF inclusive.
   This makes NEXT_X or NEXT_Y assemble CPX# or CPY# between the INX/DEX/INY/DEY and the conditional branch instruction. If the limit is 0, the CPX #0 or CPY #0 is omitted, since the INX/DEX/INY/DEY already sets the Z flag according to whether the result is 0.
 * the contents of a non-ZP variable above $102 (since $101 and $102 are the numerical representations of NEG_NRs and POS_NRs).
   This makes NEXT_X or NEXT_Y assemble a CPX or CPY abs between the INX/DEX/INY/DEY and the conditional branch instruction.
 * or you can specify that it loop until the index becomes negative or positive (watching bit 7) by specifying:
      UP_TO, NEG_NRs     This makes NEXT_X or NEXT_Y assemble INX/INY, BPL <top_of_loop>.
      DOWN_TO, NEG_NRs   This makes NEXT_X or NEXT_Y assemble DEX/DEY, BPL <top_of_loop>.
      UP_TO, POS_NRs     This makes NEXT_X or NEXT_Y assemble INX/INY, BMI <top_of_loop>.
      DOWN_TO, POS_NRs   This makes NEXT_X or NEXT_Y assemble DEX/DEY, BMI <top_of_loop>.
The limitations of FOR_X...NEXT_X and FOR_Y...NEXT_Y are:
FOR_X...NEXT_X and FOR_Y...NEXT_Y can be nested, unlike FOR...NEXT for 2-byte variables further down which allow looping 65536 times with a single loop structure. (Nesting will be discussed in a minute.)
LEAVE_LOOP could be implemented, but the complexity is probably not justified considering the rare need. I'm leaving it out for now. If there's a need, you could use a BEGIN...WHILE...REPEAT instead, or handle it in more-conventional ways, like a branch instruction to a label after the loop. Otherwise, you could use an additional stack level and have FOR_X or FOR_Y initialize it as 0. A LEAVE_LOOP would then store the address of its branch instruction in that stack cell, and NEXT_X or NEXT_Y would test whether that cell is non-0 and, if so, fill it in with a branch to the end. You would have to be careful not to put the LEAVE_LOOP inside another structure that might be using the macro structure stack. Also, allowing more than one LEAVE_LOOP would complicate things further. And as always, "compiler" security is up to the programmer.
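The parameter rules listed earlier for FOR_X/FOR_Y and NEXT_X/NEXT_Y amount to a small decision table. The Python sketch below models which instructions each combination would lay down. It returns mnemonics only and uses a placeholder label "top" for the top of the loop. It assumes BNE is the conditional branch for constant and variable limits, although the text names BNE explicitly only for counting to 0. The function names are hypothetical.

```python
def for_code(reg: str, start) -> list[str]:
    """Instructions FOR_X (reg="X") or FOR_Y (reg="Y") would lay down."""
    other = "Y" if reg == "X" else "X"
    if start == "ACCUM":
        return [f"TA{reg}"]
    if start == f"{reg}_REG":
        return []                               # only marks the loop top
    if start == f"{other}_REG":
        return [f"PH{other}", f"PL{reg}"]       # 65c02 PHX/PLY or PHY/PLX
    if isinstance(start, int) and 0 <= start <= 0xFF:
        return [f"LD{reg} #${start:02X}"]
    raise ValueError(f"bad initial value: {start!r}")


def next_code(reg: str, direction: str, limit) -> list[str]:
    """Instructions NEXT_X or NEXT_Y would lay down (BNE is assumed)."""
    step = {"UP_TO": f"IN{reg}", "DOWN_TO": f"DE{reg}"}[direction]
    if limit == "NEG_NRs":
        return [step, "BPL top"]                # loop while bit 7 is clear
    if limit == "POS_NRs":
        return [step, "BMI top"]                # loop while bit 7 is set
    if isinstance(limit, int) and 0 <= limit <= 0xFF:
        code = [step]
        if limit != 0:                          # INx/DEx already set Z for 0
            code.append(f"CP{reg} #${limit:02X}")
        return code + ["BNE top"]
    if isinstance(limit, str):                  # a non-ZP variable's contents
        return [step, f"CP{reg} {limit}", "BNE top"]
    raise ValueError(f"bad limit: {limit!r}")


print(for_code("X", 8), next_code("X", "DOWN_TO", 0))
# ['LDX #$08'] ['DEX', 'BNE top']
print(for_code("Y", "X_REG"), next_code("Y", "UP_TO", "NEG_NRs"))
# ['PHX', 'PLY'] ['INY', 'BPL top']
```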
The number of clock cycles taken for a loop which loads its own index (call it "N") into X or Y and decrements it to 0 is:
     2                   for loading X or Y immediate (Omit this if you're starting with what was already there.)
  +  N * loop_contents   your code in the loop, plus the 2 clocks for DEX or DEY, meaning an empty loop still has N * 2.
  +  (N-1) * 3           for BNE top_of_loop. The 3 turns to 4 if the loop straddles a page boundary. (Usually it won't.)
  +  2                   for the final BNE that does not branch.
So for:
   FOR_X 8, DOWN_TO, 0
   NEXT_X
you have 2 + 16 + 21 + 2 = 41 clocks. (The PIC16 takes 100 to do the same thing.)
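The same arithmetic can be written as a small Python function for checking other loop sizes. The parameter names are hypothetical. body is the cycle count of your own instructions per pass, and page_cross models the extra clock when the branch back to the top crosses a page boundary.

```python
def loop_cycles(n: int, body: int = 0, load_index: bool = True,
                page_cross: bool = False) -> int:
    """Clock cycles for an N-pass loop that counts X or Y down to 0 with BNE."""
    if not 1 <= n <= 256:
        raise ValueError("an 8-bit index gives 1 to 256 passes")
    taken = 4 if page_cross else 3
    total = 2 if load_index else 0        # LDX #n or LDY #n
    total += n * (body + 2)               # loop contents plus DEX/DEY
    total += (n - 1) * taken              # BNE back to the top, taken
    total += 2                            # final BNE, not taken
    return total


print(loop_cycles(8))    # empty FOR_X 8, DOWN_TO, 0 ... NEXT_X: 41
```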
FOR var1, 1, TO, 5000    ; (Loop 5,000 times. C32 requires commas between parameters.)
   <actions>
   <actions>
   <actions>
NEXT var1
On the 65816, which has 16-bit index registers, doing something 5,000 times as shown above can be done as efficiently as the 6502 handles counts under 256.
One way around it is to use repeats of the same structure macros with names which differ only slightly, like IF_EQ, IF_EQ., IF_EQ.., etc. (note the different numbers of dots after the ends), each one using its own assembler variables. Then you just have to make sure you use the right one. There's a better way.
Ideally the addresses would go on a stack, so you could nest structures all you want. Unfortunately, assemblers don't usually let you have a variable array with a way to index into it, which is what you would need to synthesize a stack. There's a way around it. It takes an awful lot of lines in the macros, but fortunately these extra lines do not actually lay down any machine code. The voluminous macro code is only a problem if you don't keep the macros in a separate INCLude file, if you want to print the list file (although your assembler might let you turn the listing off and on), or if disc or memory space is limited (which is unlikely in today's PCs!)
Here's the idea, illustrated to five stack levels. (In reality you'll probably want more, to make sure you don't run
out.) Here's how you would add a cell to the stack:
STK_LVL_5: SETL STK_LVL_4 ; SETL stands for "SET Label value" in the C32
STK_LVL_4: SETL STK_LVL_3 ; assembler, and you can do it as many times as
STK_LVL_3: SETL STK_LVL_2 ; you want for any given assembler variable,
STK_LVL_2: SETL STK_LVL_1 ; unlike EQU which only allows defining one time.
<now assign the desired value to STK_LVL_1 as the top-of-stack>
and to pop a level off the stack, do:
STK_LVL_1: SETL STK_LVL_2 ; STK_LVL_1 is always the top of the stack, regardless of depth.
STK_LVL_2: SETL STK_LVL_3
STK_LVL_3: SETL STK_LVL_4
STK_LVL_4: SETL STK_LVL_5
This, carried to 20 levels (increased from 16 when I added the
FOR_X...NEXT_X and FOR_Y...NEXT_Y since
these take three levels per structure), is what is in my INCLude
files STAKPUSH.ASM and STACKPOP.ASM.
STAKPOP2.ASM pops two levels off at once, and STAKPOP3.ASM pops three levels off at once.
STKPUSH2.ASM and STKPUSH3.ASM push two and three cells on the stack, respectively.
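The shifting idea in STAKPUSH.ASM and STACKPOP.ASM can be modeled in Python with a fixed set of named variables and no indexing, just as the assembler sees them. This sketch uses the five-level depth of the illustration above, and the function names push and pop are hypothetical. The copy order matters: a push must copy the deepest level first, and a pop must copy the top level first, or values get overwritten before they are moved.

```python
DEPTH = 5                        # the illustration's depth; the .ASM files use 20
labels = {f"STK_LVL_{i}": 0 for i in range(1, DEPTH + 1)}   # like SETL labels


def push(value: int) -> None:
    # STK_LVL_5: SETL STK_LVL_4 ... STK_LVL_2: SETL STK_LVL_1 (deepest first)
    for i in range(DEPTH, 1, -1):
        labels[f"STK_LVL_{i}"] = labels[f"STK_LVL_{i - 1}"]
    labels["STK_LVL_1"] = value  # new top of stack


def pop() -> int:
    top = labels["STK_LVL_1"]
    # STK_LVL_1: SETL STK_LVL_2 ... STK_LVL_4: SETL STK_LVL_5 (top first)
    for i in range(1, DEPTH):
        labels[f"STK_LVL_{i}"] = labels[f"STK_LVL_{i + 1}"]
    return top


push(0x8003)
push(0x8010)
print(hex(labels["STK_LVL_1"]), hex(labels["STK_LVL_2"]))   # 0x8010 0x8003
print(hex(pop()), hex(labels["STK_LVL_1"]))                 # 0x8010 0x8003
```

Pushing a sixth value in this model silently drops the bottom cell. That is why it pays to carry more levels than you expect to need.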
Repetition of the lengthy process in the list file can be avoided by turning the listing off and
on for this portion, which in the C32 assembler is done by bracketing the portion
with LIST "OFF" and LIST "ON".
If your assembler allows nested macros, put the stack push and pop in macros that get called by the structure macros, just to keep the structure macro source code shorter. If it does not allow nested macros, you might still be able to put the push and pop code in a separate .ASM file that you bring in at the appropriate places with the INCL (include) directive. This is what I ended up doing here.
Unfortunately I ran into another little problem with C32: if you have an INCL line in a macro, the assembler doesn't do the INCLuding until after the macro is done. To get around it, whenever I wanted to deepen the macro stack and add a cell, I did TO_PUSH: SETL ___ in the macro, and put STK_LVL_1: SETL TO_PUSH at the end of STAKPUSH.ASM. (This did not affect the machine code output.)
BTW, my file names are limited to 8 letters since I still do a few applications in DOS, with a 132-column, 60-line monitor, point-and-click interface, and I've had up to 34 files open at once, with all kinds of windowing and tiling. This web page was also done with that, since my DOS-based text editor is far better than any I've seen for a GUI.
The top stack level is always STK_LVL_1, regardless of how many levels under it are being used, and you address it as such with no concern for how deeply nested you have your structures.
Now you can do for example:
CMP #14
IF_EQ
   <actions>
   <actions>
   IF_NEG
      <actions>
      <actions>
      IF_EQ
         <actions>
         <actions>
      END_IF
   END_IF
   <actions>
   <actions>
END_IF
and the structure macros will all keep to themselves and not step on each other's variables.
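A Python model of IF_xx ... END_IF shows why the nesting works. It keeps a stack of pending branch operands, with a Python list standing in for the STK_LVL_ variables. The model assumes IF_EQ lays down BNE ($D0) and IF_NEG lays down BPL ($10), each skipping its body when the condition is false, and it uses NOP ($EA) for each <actions> line. The class name is hypothetical. Each END_IF pops and fills in only the innermost open IF, so the three structures in the example above do not interfere.

```python
IF_BRANCH = {"IF_EQ": 0xD0, "IF_NEG": 0x10}   # BNE, BPL: skip body if false
NOP = 0xEA                                    # stands in for <actions>


class IfAssembler:
    """Hypothetical model of nestable IF_xx ... END_IF (ELSE_ not modeled)."""

    def __init__(self, org: int):
        self.org = org
        self.code = bytearray()
        self.stack = []            # operand addresses of still-open IFs

    @property
    def pc(self) -> int:
        return self.org + len(self.code)

    def emit(self, *values: int) -> None:
        self.code.extend(values)

    def if_(self, macro: str) -> None:
        self.emit(IF_BRANCH[macro], 0x00)      # operand not known yet
        self.stack.append(self.pc - 1)         # push its address

    def end_if(self) -> None:
        operand = self.stack.pop()             # innermost open IF only
        offset = self.pc - (operand + 1)
        if offset > 127:
            raise ValueError("IF body too long for a branch")
        self.code[operand - self.org] = offset


# The nested example above, two NOPs per pair of <actions>:
asm = IfAssembler(org=0x0200)
asm.emit(0xC9, 14)         # CMP #14
asm.if_("IF_EQ")
asm.emit(NOP, NOP)
asm.if_("IF_NEG")
asm.emit(NOP, NOP)
asm.if_("IF_EQ")
asm.emit(NOP, NOP)
asm.end_if()
asm.end_if()
asm.emit(NOP, NOP)
asm.end_if()
print(asm.code.hex(" "))
# c9 0e d0 0c ea ea 10 06 ea ea d0 02 ea ea ea ea
```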
The CASE structure, if made nestable, increases the assembler macro stack operations' complexity far more than the other structures do, especially since there will be different numbers of cases to handle. Fortunately, there's almost never any need to nest CASE statements! For these reasons, I decided to make the CASE structure non-nestable here. It can be nested with non-CASE statements, but one CASE statement cannot be nested inside another CASE statement. The non-nestability also applies to the FOR...NEXT macros provided here (without the _X or _Y), as indicated above.
I have a 6502-oriented treatise on stacks (plural—not just the page-1 hardware stack) here which starts out with the definition and very basics, but then gets into deeper applications, including how to use stacks in the forming of program structures in assembly and compiled languages, doing the nesting as well as going further into compiler security which is discussed briefly below.
Each structure-starting macro would put its compiler security number on the stack, and the matching words would check that number on the stack to make sure it's the right one. (Again, this all happens in the assembler's processing, and has no effect on the machine code output except to help catch human errors.) In my Forth kernel, BEGIN gets a 1, IF gets a 2, DO gets a 3, CASE gets a 6, OF gets a 7, etc. The nature of the stack makes it all nestable, so nested structures are fine, but they have to be completed before finishing up structure levels that are further out. I have elected not to implement it here. Since assembly language requires one instruction per line, it should be easy enough to keep things straight and matched by using indentation and vertical alignment appropriately.
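The check is not implemented in STRUCMAC.ASM, but it is short when modeled in Python. This sketch reuses the Forth kernel's tag numbers quoted above (BEGIN 1, IF 2, CASE 6, OF 7). Folding IF_EQ, WHILE_C_SET, UNTIL_NEG and so on into IF, WHILE, and UNTIL is an assumption for illustration, and the names check and kind are hypothetical.

```python
OPENS = {"BEGIN": 1, "IF": 2, "CASE": 6, "CASE_OF": 7}
MIDDLES = {"WHILE": 1, "ELSE_": 2}          # must sit inside their structure
CLOSES = {"AGAIN": 1, "REPEAT": 1, "UNTIL": 1,
          "END_IF": 2, "END_OF": 7, "END_CASE": 6}


def kind(word: str) -> str:
    """Fold IF_EQ, WHILE_C_SET, UNTIL_NEG, etc. into IF, WHILE, UNTIL."""
    for prefix in ("IF_", "WHILE_", "UNTIL_"):
        if word.startswith(prefix):
            return prefix[:-1]
    return word


def check(words: list[str]) -> None:
    """Raise ValueError at the first mismatched structure word."""
    stack = []
    for lineno, word in enumerate(words, 1):
        k = kind(word)
        if k in OPENS:
            stack.append(OPENS[k])
        elif k in MIDDLES:
            if not stack or stack[-1] != MIDDLES[k]:
                raise ValueError(f"line {lineno}: {word} is out of place")
        elif k in CLOSES:
            if not stack or stack.pop() != CLOSES[k]:
                raise ValueError(f"line {lineno}: {word} does not match")
        # anything else is an ordinary instruction and is ignored
    if stack:
        raise ValueError("a structure was left open")


check(["BEGIN", "LDA $10", "IF_EQ", "INX", "END_IF", "UNTIL_MINUS"])  # passes
try:
    check(["IF_EQ", "BEGIN", "END_IF", "AGAIN"])
except ValueError as err:
    print(err)    # line 3: END_IF does not match
```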
I am mostly using branch instructions which are limited to hops of -128 to +127 bytes. If you want longer hops,
you will have to modify the code to use the JMP instruction, sometimes making it longer because of conditional
branches around the JMPs. I find I don't normally exceed the branch instructions' branch distance
limitation though. The structure that's likely to be longest might be a long CASE structure, and the
first END_OF has to branch clear down past a lot of other cases to the END_CASE,
so I did use JMPs for that.
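The reach limit comes from the relative branch's one-byte signed offset, counted from the address just after the 2-byte branch instruction. The following Python sketch shows the range check that the structure macros themselves do not perform on forward branches. The function name is hypothetical.

```python
def branch_offset(branch_addr: int, target: int) -> int:
    """Offset byte for a 2-byte relative branch at branch_addr, or raise."""
    offset = target - (branch_addr + 2)   # relative to the next instruction
    if not -128 <= offset <= 127:
        raise ValueError(f"target ${target:04X} is {offset} bytes away; use JMP")
    return offset & 0xFF                  # two's-complement byte


print(hex(branch_offset(0x8010, 0x8000)))   # backward 18 bytes: 0xee
```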
There is no reason you can't make other structures or modify the accompanying ones to suit your purposes. This is by no means an exhaustive list. The following macros are defined in STRUCMAC.ASM (and BTW, I have a similar set for PIC16, here):
IF_EQ              (using Z flag)
IF_ZERO            (using Z flag)
IF_NEQ             (using Z flag)
IF_NOT_ZERO        (using Z flag)
IF_PLUS            (using N flag)
IF_MINUS           (using N flag)
IF_NEG             (using N flag)
IF_C_SET           (using C flag)
IF_C_CLR           (using C flag)
IF_GE              (using C flag)
IF_LT              (using C flag)
IF_V_SET           (using V flag)
IF_V_CLR           (using V flag)
IF_FLAG_VAR        (added May 2013)
IF_BIT             (added May 2013)
IF_MEM_BYTE_NEG    (added May 2013)
IF_MEM_BYTE_POS    (added May 2013)
ELSE_
END_IF
CASE               (using A, X, or Y)
CASE_OF            (using A, X, or Y)
END_OF
END_CASE
FOR                (16-bit. Overwrites A)
NEXT               (16-bit. Overwrites A)
FOR_X              (added May 2013)
NEXT_X             (added May 2013)
FOR_Y              (added May 2013)
NEXT_Y             (added May 2013)

BEGIN
WHILE_EQ           (using Z flag)
WHILE_NEQ          (using Z flag)
WHILE_ZERO         (using Z flag)
WHILE_NOT_ZERO     (using Z flag)
WHILE_PLUS         (using N flag)
WHILE_MINUS        (using N flag)
WHILE_NEG          (using N flag)
WHILE_C_CLR        (using C flag)
WHILE_C_SET        (using C flag)
WHILE_GE           (using C flag)
WHILE_LT           (using C flag)
WHILE_V_CLR        (using V flag)
WHILE_V_SET        (using V flag)
WHILE_BIT          (added May 2013)
REPEAT
AGAIN
UNTIL_EQ           (using Z flag)
UNTIL_ZERO         (using Z flag)
UNTIL_NEQ          (using Z flag)
UNTIL_NOT_ZERO     (using Z flag)
UNTIL_PLUS         (using N flag)
UNTIL_MINUS        (using N flag)
UNTIL_NEG          (using N flag)
UNTIL_C_CLR        (using C flag)
UNTIL_C_SET        (using C flag)
UNTIL_GE           (using C flag)
UNTIL_LT           (using C flag)
UNTIL_V_CLR        (using V flag)
UNTIL_V_SET        (using V flag)
UNTIL_BIT          (added May 2013)

In May 2013 I added a group of accessory macros I find useful:
RTS_IF_EQ          RTS if Z flag is set
RTS_IF_NEQ         RTS if Z flag is clear
RTS_IF_PLUS        RTS if N flag is clear
RTS_IF_MINUS       RTS if N flag is set
RTS_IF_FLAG_VAR    RTS on a flag variable's condition. The variable name and target condition are given in the parameter list.
RTS_IF_BIT         RTS if the specified bit of the specified byte in memory meets the target condition. These are given in the parameter list.
RTS_IF_MEM_LOC     RTS if the value in the specified memory location is positive | negative | zero | non-zero, again per the parameter list.
Labels   C32 requires labels to start with a letter in the range A-Z, or with "_", ".", or "?". The remaining characters in a label can come from the same set or from 0-9. Unfortunately labels are not case-sensitive. If the first character of the label is not in column 1 of the line, the label must be followed immediately by a colon. Putting a colon after every label is good practice anyway, as it makes searches for actual labels (as opposed to references to the label) much quicker. A label can be any length, and it can stand alone on a line. $ is the current program counter value. It can be used in label definitions or in expressions for operands, etc. Some assemblers use * instead.
EQU      EQUate. Sets an assembler constant. It cannot be changed after its first definition.
SETL     SET Label value. It's basically a variable for the assembler itself, in that its value can be changed countless times after it is defined. It always requires a label and an expression to assign the value to the label.
DFB      DeFine Byte(s). Lay the following bytes down in the code. If there are two or more, they are separated by commas, except of course when they are consecutive bytes in a quoted string. Some assemblers use .DB, .BYTE, or simply BYTE.
DFS      DeFine Storage. Skip the specified number of bytes, with the beginning of the block taking on the label. Used for variables in the target system. The 2500AD assembler uses BLKB for "define a BLocK of Bytes."
ORG      ORiGin. Force the program counter to the desired value. We have to do this a lot in the program-structure macros, to go back to earlier places and fill in addresses and forward branch distances after they become known, then go forward again ("back to the future" :) ) to pick up assembly at the end of the previously laid code. All the back-and-forth makes for varying line lengths in C32's Intel Hex output, but it's not a problem.
IF       For conditional assembly. If the expression following it evaluates to something other than 0, do the following lines. If it's 0, the assembler ignores them, down to the ELSE or ENDI, or anything that evaluates to those (like the END_COMMENT macro). To keep the IF assembler directive separate from the structure macro names, I've put conditions on the latter, referring to flags to branch on, like IF_EQ, IF_C_SET, etc.
ELSE     Optional. If the "IF" portion was false, pick up the assembly process here. To keep the assembler directive separate from the structure macro name, I've called the latter ELSE_ (note the underscore at the end, and don't forget to put it in).
ENDI     Regardless of the outcome of the "IF" above, assembly will definitely be back on after this point. The macro name for the end of the IF_xx structure is END_IF.
INCL     INCLude. Bring another source-code file in at this point. You can have various files that are used as modules for many different projects, and it is not necessary to show all that code again in every one. Added flexibility might be exercised with the "IF" directive, ie, that you assemble the other file at this point only if certain conditions are true. INCL can be nested; ie, an INCLuded file can have INCLude directives to bring in other files, and so on. A file can be brought in with INCL at many different points.
MACRO    Begins a macro definition. MACRO must always follow a label. MACRO can be followed by parameter expressions, as many as you want, as long as they all fit on the one line. If there are two or more parameters, they must be separated by commas. ENDM signals the end of the macro definition.
operators in expressions (There are others, but they're either obvious or not used here):
   ==       equal to
   !=       not equal to
   $        value in program counter. Some assemblers use * instead.
   &        bitwise AND
   X >> Y   Shift X right by the number of bits specified by Y.
You can have as many parameters in the macro definition as you want, but C32 then requires that the macro call have the same number of parameters. One thing I liked about the 2500AD assembler was that the number of parameters didn't have to match, and you could say in effect, "If there's a fourth parameter, do this with it; and if there's a fifth one, do that with it..." Macro parameters can be expressions. Macro parameters, both in the macro definition and in macro calls, must be separated by commas if there are two or more. C32 does not allow the nesting of macros. Some assemblers do.
10/6/23: Anton Grigorev adapted these for the DASM assembler.
BRA: Apologies—if you're using an NMOS 6502 which does not have the
Branch-Relative-Always instruction, you will have to modify the accompanying code. Unless you're using something like the
Commodore 64 whose 6510 processor never was available in CMOS, I would encourage switching to the 65c02 which
has a lot of advantages over the NMOS 6502.
Structures seldom need branch distances that cannot be achieved with branch instructions, so the JMP is seldom used here. In the rare case that you think you're getting close to the maximum reach of branches, you might need to check the list file. The forward branches in the structure macros will not give error messages if you try to branch more than 127 bytes away.
I suspect changes will have to be made to implement the idea if a linker is used. If you have ideas or knowledge about that, email me.
I'm sure there are still more and greater techniques that could be carried out with any good macroassembler, techniques that
we still haven't thought of, even at this late date.
STRUCMAC.ASM program-structure macros
STAKPUSH.ASM include file
STKPUSH2.ASM include file
STKPUSH3.ASM include file
STACKPOP.ASM include file
STAKPOP2.ASM include file
STAKPOP3.ASM include file
STAKSWAP.ASM include file
STRUCMAC.ZIP, all .ASM files zipped
Related: PIC_stru_MAC.ASM file for PIC16
last updated Mar 24, 2024 (New macros added 5/11/13, and, to the PIC code, 5/5/14.
Max nr of cases increased in 6502 code 9/22/14.) contact: Garth Wilson,
wilsonminesBdslextremeBcom (replacing the B's with @ and .)