

STRUCMAC.ASM program-structure macros   |   STAKPUSH.ASM include file   |   STKPUSH2.ASM include file   |   STKPUSH3.ASM include file   |   STACKPOP.ASM include file   |   STAKPOP2.ASM include file   |   STAKPOP3.ASM include file   |   STAKSWAP.ASM include file   |   STRUCMAC.ZIP, all .ASM files zipped
Related: PIC_stru_MAC.ASM file for PIC16



Macros:  more clarity, better control, fewer bugs, with no memory or performance penalty.

Program structures with macros:  ditto!


Note: Principles here can be used on any assembler with macro capability, but this is primarily aimed at the 65c02 microprocessor.  I'm using the C32 assembler (Cross-32) from Universal Cross Assemblers, now distributed by Data Sync Engineering and MPE Forth.  (Edit, late 2023:  C32 seems to have become unavailable, at least from the sources I know; so I'm trying to find out where one can get it, or if I can distribute it myself.)  The one assembler is good for lots of processors.  It is down to $99 now, but there's a list of free 6502/c02/816 assemblers here and here, and I have a section on assemblers on my links page, here.  BDD has macros to assemble 65816 code on Kowalski's 65(c)02 macro assembler, here (although since he wrote that, Daryl Rictor, "8BIT" on the 6502.org forum, has extended the Kowalski assembler to work for the '816 also).  (A related article on this site is "Assembly Language:  Still Relevant Today.")

Expanded 5/11/13.  This material is subject to constant improvement.  I have tested these 65c02 macros but have not used them extensively yet like I have the corresponding ones for PIC16 microcontrollers on Microchip's MPASM assembler.  I've done a few large (for PIC16) projects using them.  What a pleasure to get away from the spaghetti that's so typical of assembly!  The code to do it on PIC16 is here.

On 5/5/14 I added alternate versions of CASE_OF and END_OF to the PIC code.  They save an instruction every time they can be used (although PIC16's new CASE_OF is still not as efficient as the 6502's old one).  See the notes at their source code for the frequent conditions under which they can be used in place of the original version.  I have not added these to the 6502 code yet.  The next ambition is the ability to have multiple WHILEs between BEGIN and REPEAT.

1/12/18:  I just found out Dave Keenan did the same kind of thing for the IAR assembler and the MSP430 processor.  We were unaware of each other's work until now.

10/6/23:  Also, Anton Grigorev adapted these for the DASM assembler.






"Beauty is more important in computing than anywhere else in technology because software is so complicated.  Beauty is the ultimate defense against complexity."  —David Gelernter
"Good programmers know what's beautiful and bad ones don't."  —David Gelernter


First, what is an assembly-language macro?

As you write an assembly-language program, you may see repeating patterns.  If it's exactly the same all the time, you can make it a subroutine.  That incurs a 12-clock performance penalty for the subroutine call (JSR) and return (RTS), but program memory is saved because the code for the subroutine is not repeated over and over.

There will be other times, however, where the repeating pattern is the same but internal details are not, so you can't just use a JSR.  The differences from one occurrence to another might be an operand, a string or other data, an address, a condition, etc.  It would be helpful to be able to tell the assembler, "Do this sequence here; except when you get down to this part, substitute in such-and-such," or, "under such-and-such condition, assemble this alternate code."  That's where it's time for a macro.  "White Flame" on the 6502.org forum wrote, "Macros are assembly-time function calls, whose return value is source code."

The repeating, possibly messy-looking sequences that clutter your code can be replaced with a macro call that takes a single line each time, optionally with parameters.  Since you write the macro (or at least you can edit it if you want to, even if someone else wrote it), you have complete control of every bit of machine code that gets laid down.  After the internal details have been ironed out, you shouldn't have to keep being bothered with them.  If you can hide them with macros, you can see the big picture more clearly, have better control of the project, have fewer bugs, and become more productive without losing any performance or taking more memory.

A macro may replace a piece of assembly-language code as short as a line or two, to give more clarity to what is being done there.  An example of a single-line macro is where you want to replace the 65816's cryptic REP and SEP instructions:



INDEX_16: MACRO               ; Make index registers X & Y to be 16-bit.
          REP    #00010000B
 ;        NOP                 ; NOP was necessary for early versions
          ENDM                ; of '802/'816 >4MHz.
 ;-------------------

INDEX_8:  MACRO               ; Make index registers X & Y to be 8-bit.
          SEP    #00010000B
 ;        NOP                 ; NOP was necessary for early versions
          ENDM                ; of '802/'816 >4MHz.
 ;-------------------


INDEX_16 above is far more clear than REP #00010000B which it replaces, yet it lays down exactly the same machine code, C2 10, which takes 3 clocks' execution time at runtime.
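
The same treatment works for the accumulator width, which is controlled by the M bit (bit 5) of the '816's status register, just as the index-register width is controlled by the X bit (bit 4).  Here is a sketch; the names ACCUM_16 and ACCUM_8 are only illustrative, so use whatever names read best to you:


ACCUM_16: MACRO               ; Make the accumulator to be 16-bit.
          REP    #00100000B   ; Clear the M bit.  Lays down C2 20.
          ENDM
 ;-------------------

ACCUM_8:  MACRO               ; Make the accumulator to be 8-bit.
          SEP    #00100000B   ; Set the M bit.  Lays down E2 20.
          ENDM
 ;-------------------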

Here's another short but useful one (and you can modify it to get related ones), for branching more than half a page away.



BEQlong: MACRO  LBL
         BNE    bel1
         JMP    LBL
 bel1:
         ENDM
 ;-------------------


The JMP makes it no longer relative but absolute, but most applications can use it.  The 65816 has a BRL (Branch Relative Long, or BRanch Long) instruction, but doing the same kind of thing on the 6502 for relocatable code requires more steps.  (You can store the offset, and then, since JSR puts the current address on the stack, the subroutine can add the offset to it before the RTS.)

With the macro above, if you want to do a BEQ to someplace more than half a page away, you can do for example:



         BEQlong  FOOBAR


and it will assemble:


         BNE   3        ; ie, to the instruction after the 3-byte JMP instruction
         JMP   FOOBAR
         <continue>
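

The related ones follow the same pattern: reverse the sense of the short branch so it hops over the JMP.  Here is a sketch of two more.  The names BNElong and BCClong and their internal labels are only illustrative, and if your assembler does not automatically make labels inside a macro local, the labels will need whatever local-label marker it uses.


BNElong: MACRO  LBL
         BEQ    bnl1          ; If Z is set, skip over the JMP.
         JMP    LBL
 bnl1:
         ENDM
 ;-------------------

BCClong: MACRO  LBL
         BCS    bcl1          ; If C is set, skip over the JMP.
         JMP    LBL
 bcl1:
         ENDM
 ;-------------------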


A macro doesn't necessarily have to lay down any code at all.  I use a couple for paragraphs of comments.  Although you can comment any line by putting a ; in front of it, this becomes a pain if you find you need to insert or delete a few words and adjust all the lines after it.  Why not do:


  COMMENT
      Every flight or checklist is a file.  There is no directory.  The computer
      can find a file by starting at the first file in memory (which starts at
      FIRST_FL_ADR), calculating the address of the 2nd file by using the file-
      length bytes of the first file, etc..  There's an ENDRAM byte after the
      END_OF_FL byte of the last file.  The address of the ENDRAM byte is also
      stored in ENDRAM_ADR.  The number of files is stored in NR_FLs.  All the
      RAM used by the system is at lower addresses than the file chain.
  END_COMMENT


Besides making it easier to re-format the paragraph after changes (<alt>R in the MultiEdit text editor even preserves the left margin), it looks nicer.  Just be sure no line starts with something that the assembler interprets as the ENDIF.

Here's how to do it.  IF 0 is a condition that will never be met, so the assembler skips to the ENDIF (or actually ENDI in the C32 assembler I used the portion below on).



COMMENT: MACRO       ; COMMENT and END_COMMENT here relieve us from the load
         IF 0        ; of semicolons where we have many consecutive lines of
         ENDM        ; comments.  Since the IF is looking for an ELSE or ENDI
 ;----------------   ; (either cap.s or lower case), be sure none of the lines
                     ; commented-out start with one of these words that could
END_COMMENT: MACRO   ; fool it.  If there is, that line will still need a ; .
         ENDI        ; Also, if a line starts with a macro name which is
         ENDM        ; followed by illegal parameters for that macro (as to
 ;----------------   ; discuss it), you will still need the ; .


Of course if your assembler already has a COMMENT directive (like the 2500AD one does), you won't need this one.  Not all do however, and the above still shows what can be done.


Here's an only slightly more complex example, where we want the computer to display an immediate string and wait for the response:



        DISPLAY_IMM   "Press CONTINUE when ready"
        WAIT_FOR_KEY  CONT_KEY


where the macro DISPLAY_IMM is defined as:


DISPLAY_IMM: MACRO STR
        JSR   DISP_QUOTE
        BYTE  dim2#-dim1#   ; Lay down the string length byte,
 dim1#: BYTE  STR           ; followed by the string.  (Counted string, not nul-terminated.)
 dim2#:                     ; (Must not put ENDM on same line with the label.)
        ENDM
 ;------------------


(This is from an application I did with the 2500AD assembler which used the # at the end of a label to mean it's local to the macro.)  It assembles:


        JSR   DISP_QUOTE
        BYTE  25
        BYTE  "Press CONTINUE when ready"


The DISP_QUOTE subroutine looks at the return address to get the string length, then continues that far to get the string it's supposed to display, then also uses the length byte to adjust the return address on the 6502's stack so the RTS takes it to the first instruction after the string instead of trying to execute data.
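
As a rough sketch of how such a routine can work (this is not the actual routine from that application), suppose STR_PTR is a two-byte zero-page variable, STR_CNT is a one-byte variable, and OUT_CHAR is a routine that displays the character in A and preserves Y.  All three names are assumed here for illustration.  Since Y is used as the index, this version handles strings of up to 254 characters.


DISP_QUOTE:
        PLA                  ; JSR pushed the address of its own last byte.
        STA   STR_PTR        ; Pull it, low byte first, into a ZP pointer.
        PLA
        STA   STR_PTR+1

        LDY   #1             ; The length byte is 1 past that address.
        LDA   (STR_PTR),Y
        STA   STR_CNT
        BEQ   dq2            ; Empty string?  Then there's nothing to display.

 dq1:   INY                  ; Step to the next character,
        LDA   (STR_PTR),Y
        JSR   OUT_CHAR       ; display it,
        DEC   STR_CNT        ; and count it off.
        BNE   dq1

 dq2:   TYA                  ; Y now indexes the last byte of the in-line data.
        CLC                  ; Add Y to the old return address
        ADC   STR_PTR        ; to get the new one.
        TAX                  ; (TAX and LDA leave the carry flag alone.)
        LDA   STR_PTR+1
        ADC   #0
        PHA                  ; Push the high byte first, like JSR does,
        TXA
        PHA                  ; then the low byte.
        RTS                  ; RTS adds 1, landing on the first instruction after the string.
 ;------------------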

The WAIT_FOR_KEY macro used above is defined as:



WAIT_FOR_KEY:  MACRO  KEY
 wfk1#:  JSR   SCAN_KEYPAD
         CMP   #KEY
         BNE   wfk1#
         ENDM
 ;------------------


and will assemble:


        JSR   SCAN_KEYPAD
        CMP   #CONT_KEY
        BNE   $F9           ; Branch back to the JSR.


Macros can make a routine take fewer pages and make it easier to wrap your head around it.  A piece from a routine goes as follows:


        ; Initialize software flags
        LDA   #0
        STA   ICNT
        STA   IHEAD
        STA   ITAIL
        STA   OCNT
        STA   OHEAD
        STA   OTAIL
        STA   OIE


One thing I liked about the 2500AD assembler is that it allowed a variable number of parameters in the macro call, which was nice for setting or clearing a list of flag variables that won't always have the same number of them listed.  Hence the above could be replaced with:


        CLR_FLAG    ICNT, IHEAD, ITAIL, OCNT, OHEAD, OTAIL, OIE       ; Init the software flags.


and lay down exactly the same code.  (Of course with the 65c02 you could make it eliminate the LDA #0 and use STZ instead of STA.)  There are seven flags to clear, so you would need at least that many parts in the macro.  In the 2500AD assembler, IFMA 5 for example meant "If there's a 5th macro parameter, do this part."  In this case IFMA 8 would be the first one that would not lay down any code, since there's no 8th flag listed to clear.  If your assembler does not have that capability, you might have to just pad the unused parameter positions with 0's in order to accomplish the above.
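
A sketch of the zero-padding approach for the 65c02 follows, using the IF...ENDI conditional-assembly syntax of the C32 assembler and shortened to four parameters to save space.  This is only an illustration, and it carries one assumption worth stating loudly: since 0 marks an unused position, a flag that actually lives at address 0 could only be given as the first parameter.


CLR_FLAG: MACRO  f1, f2, f3, f4
          STZ  f1             ; The first parameter is always required.
          IF  f2 != 0         ; Each later one is cleared only if it's not a 0 pad.
              STZ  f2
          ENDI
          IF  f3 != 0
              STZ  f3
          ENDI
          IF  f4 != 0
              STZ  f4
          ENDI
          ENDM
 ;------------------


With that, CLR_FLAG  OCNT, OHEAD, OTAIL, 0  would lay down three STZ instructions and nothing for the fourth position.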

You can use macros to simplify countless things in your code.  Take something as mundane as copying the value of one two-byte variable to another, normally done this way when not using macros:



        LDA  ACCb
        STA  KEY_TIME
        LDA  ACCb + 1
        STA  KEY_TIME + 1


Why not shorten it to:


        COPY2  ACCb, TO, KEY_TIME


where COPY2 is defined as:


COPY2:  MACRO  variable1, preposition, variable2
        LDA  variable1
        STA  variable2
        LDA  variable1 + 1
        STA  variable2 + 1
        ENDM
 ;------------------


Here preposition is not actually used by the macro.  It's just there to make the line more English-like, and I define TO as TO: EQU 0 since the assembler does want it defined.  Note again that using the macro does not change the resulting code run by the 6502.  It's the same.  The extra work is handled at assembly time, not at run time.



You know exactly what code gets produced.  If there were conditionals and you wanted to see what code resulted, you could look at the assembler's .LST (list) output file which shows the actual addresses along the left edge, followed by the actual bytes of op codes, operands, and data, all to the left of the corresponding lines of source code.  If you assign constants and assembler variables, the list file will show the exact numeric values that resulted from those too.

At the rare extreme, a macro can replace even pages of code.  Before I did the program-structure macros, I think my longest macro was 54 lines, with much of that being conditionals, and the macro actually assembled only two to seven instructions depending on the conditions which in this case were set in the parameters in the line calling the macro.

A macro, when defined, is only kept in the assembler.  It is like a subroutine for the assembler itself to run.  No machine code is produced at the time the macro is defined; so unused macros take no memory at all in the final program.  When you do call the macro later in your code, the assembler itself will execute that "assembler subroutine" and generate machine code at that time, per the parameters given in the macro call, and put it where the program pointer indicates.  If you look at the resulting .LST (list) file, you can see the code in the macro expanded out at each point the macro is invoked.  It can have conditionals and so on, just like non-macro code.

Since a macro expansion can take varying amounts of space in the output machine code, the macro must be defined before it is called.  Forward references to macros are not allowed.  Forward references in JSR's are ok because the entire JSR instruction is always the same length; but a macro may take differing amounts of code space from one call to the next because of things like conditional assembly in the macro and conditions being different each time, varying lengths of text strings, etc.  The assembler would have no idea how much address space to reserve to expand a macro that is not defined yet.



Answering objections

You may hear, or have, various objections to using macros.  All the objections I've ever heard are valid only when the macros are poorly written, poorly named, or improperly used.  These are alluded to above; but I'll explain further here.

Myth #1:  "Macros breed inefficiency."
An example given in a forum post regarded having a macro MOV that would put a value in an address:



        MOV  #0, $D020
        MOV  #0, $D021


(although none of the assemblers I've used would parse the "#" in front of a macro parameter), where the example would lay down an LDA #0 twice when once is enough.  One way to handle it is that if your assembler allows a variable number of parameters, you could write the macro to handle for example (and I'll use PUT to handle the problem of immediates rather than COPYing one memory location to another),


        PUT  0, in, $D020, and, $D021


or, since the two addresses are contiguous, my preference, have a separate macro "PUT2" for two-byte operations,


        PUT2  0, in, $D020


where PUT2 views the first parameter as a 16-bit quantity, and if the high byte and low byte are the same, it does not reload it.  Otherwise, the low byte of your specified 16-bit number goes in address $D020 and the high byte in $D021.

Now suppose the accumulator is not available but Y is.  You could either use a separate macro for that case; or if your assembler allows varying numbers of macro parameters, specify what you want in the invocation something like this:



        PUT2  0, in, $D020, using_Y


where "using_Y" is an EQUate, and the macro tests that parameter, and if it's absent, it would use the accumulator, or if it's this one it would use Y, and if it's "using_X," it would use X.  The EQUates for X and Y might be 1 and 2, or anything else you want them to be, as long as you write the macro to interpret them correctly.  In the case of the 0 above, the 65c02 (ie, CMOS) can use STZ and then you don't need an LDA, LDX, or LDY at all.  The way to write that for the C32 assembler would be:


PUT2:   MACRO  num, preposition, addr
        IF  num != 0
            IF  {num & $FF} != {num >> 8}
                LDA  #num & $FF
                STA  addr
                LDA  #{num >> 8}
                STA  addr + 1
            ELSE
                LDA  #num & $FF
                STA  addr
                STA  addr + 1
            ENDI
        ELSE
            STZ  addr
            STZ  addr + 1            
        ENDI
        ENDM
 ;-------------


which covers all the conditions to get the desired results with the fewest possible machine-language instructions, whether two, three, or four instructions.  "preposition" is just a dummy equate that does not actually get used by the macro.  It's only there to make the line more English-like, as in "PUT2  $28BE, in, FOOBAR".
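
To make the three cases concrete, here is what each kind of invocation is intended to expand to.  FOOBAR is just a placeholder two-byte variable, and the values are arbitrary examples.


        PUT2  $28BE, in, FOOBAR    ; High and low bytes differ:  four instructions.
                                   ;    LDA  #$BE
                                   ;    STA  FOOBAR
                                   ;    LDA  #$28
                                   ;    STA  FOOBAR+1

        PUT2  $4141, in, FOOBAR    ; High and low bytes match:  three instructions.
                                   ;    LDA  #$41
                                   ;    STA  FOOBAR
                                   ;    STA  FOOBAR+1

        PUT2  0, in, FOOBAR        ; Zero:  two instructions, and A is left untouched.
                                   ;    STZ  FOOBAR
                                   ;    STZ  FOOBAR+1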

You can do loads of conditional assembly in macros; and then what they'll do automatically may be an optimization you'd forget to do if you were to write it all out by hand, or they may propagate a change you make elsewhere even if you forget to come back and improve the code here.

Myth #2:  "Macros will introduce more bugs."
Properly implemented, macros make it easier to see what you're doing.  One of the effects is that you'll spot bugs sooner, meaning the macros will eliminate some debugging.  Source code becomes more clear and concise.  Once the macro definitions themselves are debugged, they'll work right every time.  You do however need to avoid doing things like making the macro invisibly or unexpectedly store data in a variable, overwriting data that's still needed by another part of the program.  Yes, that could be difficult to debug; but it won't happen with good macro technique.

Myth #3:  "Macros make the code less readable."
Again, when properly implemented, macros will make your code more readable, not less.  Use names that don't leave anyone guessing.  You can also use dummy parameters that aren't even used by the macro but make the line more like an English sentence, as shown in the examples above.

Myth #4:  "Macros defeat the purpose of assembly language."
Doing things the difficult and cryptic way is not the purpose of assembly language, as some seem to think it is, and they turn up their noses at it.  We do assembly language for maximum performance and control of the processor.  If we can use macros to further add the benefits of productivity, maintainability, fewer bugs, and keeping control of a large project, without forfeiting the benefits of assembly, that's a good thing!  Macros may help you stay in "assembly-language land" when it might otherwise be impractical.

High-level languages (HLLs) were invented to improve productivity, reduce source-code length, and improve portability.  In assembly language, I commonly work with two entirely different processor families, and I use many of the same macros in my code for both.  This improves portability, even though it's still assembly language.  If I have a routine for something working on one, and want to write it for the other, I don't have to completely re-write it.

Myth (or objection) #5:  "You're making up your own language!"
Good implementation of macros will make the code more like English (or whatever your native tongue is).  The result should be rather intuitive to anyone else who speaks your language and has any understanding of the application at all, even if they're not very familiar with 65xx assembly language.



Now on to the program structures, starting with examples.

If you've programmed much in non-structured programming, you have experienced situations with lots of branches that just drive you nuts.  I have, many times, printed out the routine on fanfold paper and laid the strip out on the floor and drawn arrows showing all the spaghetti, ie, the tangle of branches.  It even gets hard to come up with short labels that are semi-descriptive, especially if a part is branched to by different conditions.


Here's a short-ish more general-purpose piece of code from the 6502.org source-code repository, from Bruce Clark.  Without laying a macro foundation for structured programming, it was indeed appropriate for him to do it unstructured and use labels.



 ORG 0

FROM:    DFS   2           ; "DFS" in C32 is like "BLKB" in the 2500AD assembler.
TO:      DFS   2           ; It stands for "DeFine Storage", and in this case
SIZE:    DFS   2           ; allots two bytes for each ZP variable here.
SIZEL:   EQU   SIZE        ; SIZEL and SIZEH are the low and high bytes of
SIZEH:   EQU   SIZE+1      ; variable SIZE above.


        ORG  $8000

;         +-----------------------+
;         |   ORIGINAL VERSION    |
;         +-----------------------+


MOVEDOWN: LDY  #0
          LDX  SIZEH
          BEQ  MD2
 MD1:     LDA  (FROM),Y    ; Move a page at a time.
          STA  (TO),Y
          INY
          BNE  MD1
          INC  FROM+1
          INC  TO+1
          DEX
          BNE  MD1
 MD2:     LDX  SIZEL
          BEQ  MD4
 MD3:     LDA  (FROM),Y    ; Move the remaining bytes
          STA  (TO),Y
          INY
          DEX
          BNE  MD3
 MD4:     RTS
 ;----------------


;         +-------------------------+
;         |   STRUCTURED VERSION    |
;         +-------------------------+


MOVEDOWN:
   LDY  #0

   LDX  SIZEH              ; Get the high byte of the size of block to move.
   IF_NOT_ZERO             ; Do this 1st part if there's at least one full page to move.
      BEGIN                ; Do this loop once for each full page to move.
         BEGIN             ; Do this loop once for each byte in the page.
            LDA  (FROM),Y
            STA  (TO),Y
            INY
         UNTIL_ZERO        ; UNTIL_ZERO assembles the BNE up to the BEGIN four lines up.
         INC  FROM+1       ; Increment the high byte of the source
         INC  TO+1         ; and destination addresses, and
         DEX               ; decrement the number of full pages left to do.
      UNTIL_ZERO           ; UNTIL_ZERO assembles the BNE up to the corresponding BEGIN.
   END_IF                  ; END_IF puts the branch distance in the BEQ assembled by the
                           ; IF_NOT_ZERO above, whose operand's addr was on the macro stack.

   LDX  SIZEL              ; After all full pages have been moved, see if there's _part_
   IF_NOT_ZERO             ; of one left to do.  If there is, do the following.
      BEGIN                ; Do this loop once for each byte left.
         LDA  (FROM),Y
         STA  (TO),Y       ; After transferring each byte,
         INY               ; increment the index,
         DEX               ; and decrement the number of bytes left to do.
      UNTIL_ZERO           ; UNTIL_ZERO assembles the BNE up to the BEGIN 5 lines up.
   END_IF                  ; END_IF puts the branch distance in the BEQ assembled
                           ; by the IF_NOT_ZERO above, so a branch taken goes to the RTS below.
   RTS
 ;----------------


Or, saving a few lines of source code:



MOVEDOWN:
   LDY  #0

   LDX  SIZEH                   ; Get the high byte of the size of block to move.
   IF_NOT_ZERO                  ; Do this 1st part if there's at least one full page to move.
      FOR_X   X_REG, DOWN_TO, 0 ; Do this loop once for each full page to move.  Start w/ current X contents.
         FOR_Y  Y_REG, UP_TO, 0 ; Do this loop once for each byte in the page.   Start w/ current Y contents.
            LDA  (FROM),Y
            STA  (TO),Y
         NEXT_Y                 ; NEXT_Y does the INY, and assembles the BNE up to the LDA (FROM),Y two lines up.
         INC  FROM+1            ; Increment the high byte of the source and
         INC  TO+1              ; destination addresses.  In next line, decr the number of full pages left to do.
      NEXT_X                    ; NEXT_X does the DEX, and assembles a BNE up to the first line after FOR_X above.
   END_IF                       ; END_IF puts the branch distance in the BEQ assembled by the
                                ; IF_NOT_ZERO above, whose operand's addr was on the macro stack.

   LDX  SIZEL                   ; After all full pages have been moved, see if there's _part_
   IF_NOT_ZERO                  ; of one left to do.  If there is, do the following.
      FOR_X   X_REG, DOWN_TO, 0 ; Do this loop once for each byte left.
         LDA  (FROM),Y
         STA  (TO),Y            ; After transferring each byte,
         INY                    ; increment the index.  In next line, decr the number of bytes left to do.
      NEXT_X                    ; NEXT_X does the DEX, then assembles the BNE up to the first line after FOR_X above.
   END_IF                       ; END_IF puts the branch distance in the BEQ assembled
                                ; by the IF_NOT_ZERO above, so a branch taken goes to the RTS below.
   RTS
 ;----------------


The three versions result in exactly the same machine code, but the program structures make it more intuitive what's happening.


Here's another one, my hex-to-decimal routine from http://6502.org/source/integers/hex2dec.htm:



HTD_IN:   DFS  1   ; Input and output variables.  DFS is DeFine Storage.
HTD_OUT:  DFS  2   ; Output is low-byte-first.

TABLE:    DWL  1, 2, 4, 8, 16H, 32H, 64H, 128H  ; DWL is Define Word, Low byte first.


;         +-----------------------+
;         |   ORIGINAL VERSION    |
;         +-----------------------+


HTD:    SED              ; Output gets added up in decimal.
        STZ  HTD_OUT     ; Initialize output word as 0.
        STZ  HTD_OUT+1   ; (NMOS 6502 will need LDA#0, STA ...)

        LDX  #0EH        ; $E is 14 for 2x7 bits.  (0-7 is 8 positions.)
 loop:  ASL  HTD_IN      ; Look at next high bit.  If it's 0,
        BCC  htd1        ; don't add anything to the output for this bit.
        LDA  HTD_OUT     ; Otherwise get the running output sum
        CLC
        ADC  TABLE,X     ; and add the appropriate value for this bit
        STA  HTD_OUT     ; from the table, and store the new sum.
        LDA  HTD_OUT+1   ; After low byte, do high byte.
        ADC  TABLE+1,X
        STA  HTD_OUT+1

 htd1:  DEX              ; Go down to next bit value to loop again.
        DEX
        BPL  loop        ; If still not done, go back for another loop.

        CLD
        RTS
 ;----------------


;         +-------------------------+
;         |   STRUCTURED VERSION    |
;         +-------------------------+


HTD: SED                   ; Output gets added up in decimal.
     STZ  HTD_OUT          ; Initialize output word as 0.
     STZ  HTD_OUT+1        ; (NMOS 6502 will need LDA#0, STA ...)

     LDX  #0EH             ; $E is 14 for 2x7 bits.  (0-7 is 8 positions.)
     BEGIN
        ASL  HTD_IN        ; Look at next high bit.  If it's 0,
        IF_C_SET           ; don't add anything to the output for this bit.
           LDA  HTD_OUT    ; Otherwise get the running output sum
           CLC
           ADC  TABLE,X    ; and add the appropriate value for this bit
           STA  HTD_OUT    ; from the table, and store the new sum.
           LDA  HTD_OUT+1  ; After low byte, do high byte.
           ADC  TABLE+1,X
           STA  HTD_OUT+1
        END_IF
        DEX                ; Go down to next bit value to loop again.
        DEX
     UNTIL_NEG             ; If still not done, go back for another loop.

     CLD
     RTS
 ;----------------


Or, a few lines shorter with FOR_X and NEXT_X:


HTD: SED                          ; Output gets added up in decimal.
     STZ  HTD_OUT                 ; Initialize output word as 0.
     STZ  HTD_OUT+1               ; (NMOS 6502 will need LDA#0, STA ...)

     FOR_X  0EH, DOWN_TO, NEG_NRs ; $E is 14 for 2x7 bits.  (0-7 is 8 positions.)
        ASL  HTD_IN               ; Look at next high bit.  If it's 0,
        IF_C_SET                  ; don't add anything to the output for this bit.
           LDA  HTD_OUT           ; Otherwise get the running output sum
           CLC
           ADC  TABLE,X           ; and add the appropriate value for this bit
           STA  HTD_OUT           ; from the table, and store the new sum.
           LDA  HTD_OUT+1         ; After low byte, do high byte.
           ADC  TABLE+1,X
           STA  HTD_OUT+1
        END_IF
        DEX                       ; Go down to next bit value to loop again.  Need two DEX's, so add one here.
     NEXT_X                       ; If still not done, go back for another loop.
                                  ; In this case, NEXT_X will assemble a DEX, BPL up to the line with the ASL.
     CLD
     RTS
 ;----------------


Again, the three versions assemble exactly the same machine code.  (One reader commented that since they assemble the same machine code, it means the structure was already there before, just not visible.)  Note that the local labels are gone.

I will be modeling some structures here partly after common Forth structures.  I supply the macros for them, in the form needed by the C32 assembler, in the files STRUCMAC.ASM, STAKPUSH.ASM, STKPUSH2.ASM, STKPUSH3.ASM, STACKPOP.ASM, STAKPOP2.ASM, STAKPOP3.ASM, and STAKSWAP.ASM, or, all zipped together, STRUCMAC.ZIP (named such because I did them in DOS).  You can rename the structures after the equivalents in other languages if you wish.  Keep in mind too that if any names clash with the names of assembler directives in your assembler, you will have to change the macro names.

The As65 assembler (written by Andrew, "BitWise" on the 6502.org forum, and taken over by Bill Chatfield after Andrew's untimely death) has structure capabilities without the user adding macros.  It even automatically chooses branch versus jump instructions to keep the code compact in most cases but still able to make the jump when the distances exceed 127 bytes.  Most assemblers don't have the built-in structure capability, so I will continue here.  [Edit, 1/1/13: Anton Treuenfels added the structures here to his HXA 6502 assembler.]

IF...[ELSE]...END_IF

The IF... structure is probably the most basic of the program structures.


        CMP  #14
        IF_EQ            ; clear enough that it really needs no comments
           <actions>
           <actions>
           <actions>
        END_IF


No label is needed.  The IF_EQ lays down a BNE instruction to branch around the code if the Z flag in the status register is not set.  It leaves the operand byte blank (or invalid) since it does not know yet how far the branch will be, but records the address of the operand so the END_IF macro can fill it in.  END_IF records the address the next instruction will be at, sets the pointer ( * in some assemblers, $ in some) to what IF_EQ recorded, fills in the operand, then sets the pointer back to where assembly will be resumed.  The internal details are shown in STRUCMAC.ASM.

Ok, so we said the address of the operand byte to be filled in will be "recorded."  Where?  It will be on a stack held in the assembler, which I'm calling the "macro stack."  It will get explained here just a little, but I go into it further in chapter 17 of the 6502 stacks treatise, showing in more detail how stacks can be used to form nestable program structures during assembly or compilation.

Note: The operand of a forward branch will initially appear incorrect in the list file (usually as $FE), but will be corrected further down when the corresponding macro goes back to fill it in.  It may even be wrong initially in the hex file, but if so, the hex file will come back to that address and overwrite it with the right value.

The next step would be to add an ELSE in the IF...END_IF.  The name may need to be changed slightly to keep it from colliding with names of assembler directives; and in fact the C32 assembler does use ELSE in conditional assembly, so I add the underscore for this macro, ELSE_, which should be easy to remember since there's an underscore after IF and END above.



        CMP  #14
        IF_EQ
           <actions>
           <actions>
           <actions>
        ELSE_
           <actions>
           <actions>
           <actions>
        END_IF


This time the IF_EQ lays down a BNE instruction to branch down to the first instruction after the ELSE_ if the Z flag in the status register is not set.  It leaves the operand byte blank since it does not know yet how far the branch will be, but records the address of the operand so the ELSE_ macro can fill it in when the assembler gets down to it.

Similarly, ELSE_ lays down a BRA instruction to unconditionally branch down to the first instruction after the END_IF.  It leaves the operand byte blank since it does not know yet how far the branch will be, but records the address of the operand so the END_IF macro can fill it in when the assembler gets down there.  END_IF records the address the next instruction will be at, sets the pointer to what IF_EQ recorded, fills in the operand, then sets the pointer back to where assembly will be resumed.

Whether you use ELSE_ or not, END_IF only fills in the operand of a previous branch instruction.  It does not lay down any additional code.
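
For comparison, here is a sketch of the equivalent hand-written code for the IF_EQ...ELSE_...END_IF example.  The labels else1 and end1 are hypothetical, invented for the illustration.

        CMP  #14
        BNE  else1       ; From IF_EQ.  If A was not 14, go to the alternative actions.
           <actions>     ; Run only if A was 14.
        BRA  end1        ; From ELSE_.  Skip over the alternative actions.
else1:     <actions>     ; Run only if A was not 14.
end1:                    ; END_IF lays down no code.  It only fills in the BRA operand.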

BEGIN...AGAIN

Another set of structures starts with BEGIN.  Here's the simplest:


        BEGIN
           <actions>
           <actions>
           <actions>
        AGAIN


This sets up an endless loop, with the last instruction being a branch back to the beginning of the loop.  BEGIN only records the address of the top of the loop so that AGAIN can figure out the correct operand to use in a BRA (Branch Relative Always) or JMP instruction to make the loop repeat again.  The way out of this kind of structure is often an RTS taken under a certain condition somewhere inside the loop.  (In at least one language, AGAIN is called FOREVER, which I'm not fond of because the life of a computer is an insignificant speck in the span of forever.)  Notice again that no labels are needed, and the loop stands out clearly.
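
A sketch of the equivalent hand-written loop follows, assuming the loop is short enough for BRA to reach.  The label top1 is made up for the illustration.

top1:                    ; BEGIN lays down no code.  It only records this address.
           <actions>
           <actions>
           <actions>
        BRA  top1        ; From AGAIN.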

BTW, I do recommend that each level of indenting be at least three spaces.  Using only one especially makes it look like you meant to align things vertically and just got sloppy.  It's harder to see the structure.

BEGIN...WHILE...REPEAT

Another structure is:


        BEGIN
           <actions>
           <actions>
           <actions>
        WHILE_<condition>
           <actions>
           <actions>
           <actions>
        REPEAT


It begins the loop with some pre-processing, and continues WHILE the given condition is still met (WHILE_EQ, WHILE_NEG, etc.), otherwise branches to the first instruction after the REPEAT, ie, after the end of the structure.  If the WHILE condition is still being met, the instructions in the last half of the structure are executed, and the REPEAT assembles a BRA or JMP to send the program counter back up to the top of the loop.  Obviously the BEGIN has to record the address there so the REPEAT macro knows what operand to put in the BRA or JMP instruction.  Also, the WHILE macro needs to record the address of the branch instruction it assembles so that the REPEAT macro can fill in its operand.
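
Taking WHILE_EQ as the condition, a sketch of the equivalent hand-written code would be as follows.  The labels top2 and done2 are hypothetical, and BRA is assumed to reach.

top2:      <actions>     ; BEGIN lays down no code.  These actions leave the flags set up for the test.
        BNE  done2       ; From WHILE_EQ.  Leave the loop as soon as Z is clear.
           <actions>
           <actions>
        BRA  top2        ; From REPEAT, which also fills in the BNE operand.
done2: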

The WHILE part could be made to take on as many conditions as you like.  The condition could be a macro parameter to use with conditional assembly to lay down the right branch instruction (BNE, BMI, etc.).  For most situations, I've taken the route of forming separate macros for WHILE_NEG, WHILE_C_SET, WHILE_EQ, etc.; but I do have WHILE_BIT, so you can have for example:


         WHILE_BIT  VIA3PA, 4, IS_LOW


for the condition in this example to be that VIA3's Port A's bit 4 is low.

A planned future addition is to add the ability to have more than one WHILE between BEGIN and REPEAT.

BEGIN...UNTIL

Another structure is:


        BEGIN
           <actions>
           <actions>
           <actions>
        UNTIL_<condition>


It is similar to BEGIN...AGAIN, but it lets execution drop out of the loop when the condition is met.  UNTIL_EQ for example assembles BNE ___ to go back to the top of the loop.  UNTIL_MINUS assembles BPL ___ to the top of the loop, and so on.
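
As a small illustration, here is a countdown written with the macros, followed by a sketch of the equivalent hand-written code.  The starting count of 10 and the label top3 are made up for the example.

        LDX  #10
        BEGIN
           <actions>
           DEX
        UNTIL_EQ         ; Keep looping until DEX brings X to 0.

        LDX  #10
top3:      <actions>
           DEX
        BNE  top3        ; From UNTIL_EQ.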

CASE...CASE_OF...END_OF...[...]...END_CASE

The CASE statement (BASIC calls it SELECT CASE, C calls it switch, and Forth, which most of these structures are patterned after, calls it CASE) is great for taking different courses of action based on an input number, particularly where the possibilities for that number are non-consecutive.  Take the example of treating different special output characters in a display:


        CASE  ACCUM       ; Test the accumulator against the following cases.
           CASE_OF  $0A   ; In the case of it containing the linefeed character,
              <actions>   ; execute these instructions,
              <actions>
           END_OF         ; then jump to the first instruction after END_CASE.


           CASE_OF  $0D   ; If it has the carriage-return character,
              <actions>   ; execute these instructions,
              <actions>
           END_OF         ; then jump to the first instruction after END_CASE.


           CASE_OF  $08   ; If it has the backspace character,
              <actions>   ; execute these instructions,
              <actions>
           END_OF         ; then jump to the first instruction after END_CASE.


           <actions>      ; If the character is anything else, do these default
           <actions>      ; actions to feed it to the display as display data.
        END_CASE


CASE_OF $0A above assembles CMP #$0A, BNE ___, with the BNE operand invalid until the corresponding END_OF fills it in, making the BNE branch down to the next part, which is to see if the accumulator has the carriage-return character, $0D.  All the END_OFs also assemble a JMP down to the code just after the END_CASE, and leave a record of where they are so the END_CASE macro can fill in their operands, without requiring a second pass, and without requiring labels.
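
Written out by hand, the example above would come to something like the following sketch.  All of the labels are hypothetical; the macros need none of them.

        CMP  #$0A        ; From CASE_OF $0A.
        BNE  case2
           <actions>
        JMP  end_case    ; From END_OF.
case2:  CMP  #$0D        ; From CASE_OF $0D.
        BNE  case3
           <actions>
        JMP  end_case    ; From END_OF.
case3:  CMP  #$08        ; From CASE_OF $08.
        BNE  default
           <actions>
        JMP  end_case    ; From END_OF.
default:   <actions>     ; Default actions, for any other character.
end_case:                ; END_CASE fills in the three JMP operands.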

Internally, the CASE structure here is basically the same as a series of IFs and ELSEs.  This is not always true of higher-level languages.  In my 65816 Forth, the set of CASE words is both faster and more memory-efficient than IFs and ELSEs.  Regardless, when appropriate, the CASE statement is still more clear to look at in the source code than a deeply nested series of IFs and ELSEs.

Note that the code at the end of the structure in the example above gets run if none of the watched-for cases exist.  It has been suggested that a do-nothing DEFAULT line of code be put above them to say so; but I would say that the indentation (or lack of it) should tell.  If you do use a DEFAULT line, I would recommend also ending the default section with END_DEFAULT, and indenting the code between the two.

Note also that code can be put between any END_OF and the following CASE_OF.  You may for example want to take some kind of action if it's neither case A nor case B, regardless of whether it will later be found to be case C, D, or E (or none of the above).  In other words, defaults don't have to go at the end, and you can have multiple default sections.

I originally made the END_OFs assemble BRA instructions down to the END_CASE, but the branch distance was sometimes too far if the CASE structure was a long one, so I had to change them to JMPs.  Another slight inefficiency is that the last END_OF also assembles the jump to the end.  If there are no default instructions between it and the END_CASE (like the last two lines before END_CASE in the example above), that jump just goes to the next instruction anyway, making it effectively a 3-byte, 3-clock NOP.  So I added an alternate version, END_OF_ (note the trailing _), which leaves out the unneeded JMP to END_CASE.  Use it when the END_OF is immediately preceded by an unconditional jump, branch, or return, or is immediately followed by END_CASE.

If the cases were consecutive numbers, and especially if there were a lot of them, it would be much faster and more memory-efficient to use a jump table instead.  A jump table is just a list of addresses.  It has no op codes in it.  A short routine would make sure the input is valid, then if you have at least a 65c02, double the input with ASL, transfer it to X (with TAX), and use JMP(table,X).  The NMOS 6502 does not have that addressing mode, so you might have to use self-modifying code to do it.  It does have JMP(addr), but not JMP(addr,X) like the 65c02 has.
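
Here is a minimal sketch of such a jump table for the 65c02, assuming three consecutive cases numbered 0 to 2 with the input in the accumulator.  The names table, do_0, do_1, do_2, and bad_input are all hypothetical, and the accumulator shift is written ASL A here although some assemblers want plain ASL.

        CMP  #3              ; Make sure the input is valid (0, 1, or 2 here).
        BCS  bad_input       ; 3 or above:  go to a (hypothetical) error handler.
        ASL  A               ; Double it, since each table entry takes two bytes.
        TAX
        JMP  (table,X)       ; 65c02 and up only.

table:  DFB  do_0 & $FF, do_0 >> 8     ; Each address is stored low byte first.
        DFB  do_1 & $FF, do_1 >> 8
        DFB  do_2 & $FF, do_2 >> 8

Since the doubled index has to fit in X, a table used this way can hold at most 128 addresses.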

9/22/14:  The maximum number of cases for a CASE statement was increased from 10 to 16 in the 6502 code.  (I had done it earlier in the PIC code.)
10/1/15:  there's more explanation of how the insides of the CASE statement are formed in the assembler in section 17 of the stacks treatise, "Forming nestable program structures," about 90% of the way down the page.  You can mouse over each line and get an explanation of what the assembler does on that line.


FOR_X...NEXT_X and FOR_Y...NEXT_Y

I offer two classes of FOR...NEXT loop here.  Added May 2013 is FOR_X...NEXT_X and FOR_Y...NEXT_Y, and there's the FOR...NEXT I had earlier for 2-byte variables.  FOR_X...NEXT_X and FOR_Y...NEXT_Y cover most of the scenarios you could want for looping with X or Y as the counter, just as efficiently as you would do without the macros.

Initial index values for either X or Y can be:
     * pre-existing accumulator contents (specifying "ACCUM")
             This makes FOR_X or FOR_Y assemble a TAX or TAY.
     * pre-existing X-register contents  (specifying "X_REG")
             This makes FOR_X lay down no code at all (only mark the address of the top of
             the loop for NEXT_X to branch to); but it makes FOR_Y assemble PHX, PLY.
     * pre-existing Y-register contents  (specifying "Y_REG")
             This makes FOR_Y lay down no code at all (only mark the address of the top of
             the loop for NEXT_Y to branch to); but it makes FOR_X assemble PHY, PLX.
     * a specified constant between 0 and $FF inclusive
             This makes FOR_X assemble an LDX# and makes FOR_Y assemble an LDY#.

You can:
     * count down one at a time (by specifying "DOWN_TO")
             This makes NEXT_X or NEXT_Y assemble DEX or DEY before the conditional branch to the top of the loop.
     * count  up  one at a time (by specifying   "UP_TO")
             This makes NEXT_X or NEXT_Y assemble INX or INY before the conditional branch to the top of the loop.


If you want two at a time, you would have to precede the NEXT_X or NEXT_Y with an extra
INX/INY/DEX/DEY.  For other step sizes, you can of course alter X or Y inside the loop.

The limit (ie, target count) can be:
     * a specified constant between 0 and $FF inclusive.
             This makes NEXT_X or NEXT_Y assemble CPX# or CPY# between the INX/DEX/INY/DEY and the conditional branch instruction.
             If the limit is 0, the CPX #0 or CPY #0 will be skipped, since the INX/DEX/INY/DEY already sets the Z flag when the result is 0.
     * the contents of a non-ZP variable above $102 (since $101 and $102 are the numerical representation for NEG_NRs and POS_NRs).
             This makes NEXT_X or NEXT_Y assemble a CPX or CPY abs between the INX/DEX/INY/DEY and the conditional branch instruction.
     * or you can specify that it loop until the index becomes negative or positive (watching bit 7) by specifying:
             UP_TO,   NEG_NRs.  This makes NEXT_X or NEXT_Y assemble INX/INY, BPL <top_of_loop>.
             DOWN_TO, NEG_NRs.  This makes NEXT_X or NEXT_Y assemble DEX/DEY, BPL <top_of_loop>.
             UP_TO,   POS_NRs.  This makes NEXT_X or NEXT_Y assemble INX/INY, BMI <top_of_loop>.
             DOWN_TO, POS_NRs.  This makes NEXT_X or NEXT_Y assemble DEX/DEY, BMI <top_of_loop>.

The limitations of FOR_X...NEXT_X and FOR_Y...NEXT_Y are:
  1. The counter must be 8-bit index reg X or Y, not a variable.  They can count up or down though, from/to any 8-bit value.
  2. The initial index can be a constant, or it can be what's already in A, X, or Y.  The macro itself won't fetch it from a variable.  If it's a constant, it can of course be calculated by the assembler.
  3. The limit can be any 8-bit constant, or it can be in a non-ZP variable above address $102.  The loop can alter the variable.  If you use a constant, it can of course be computed by the assembler.
  4. NEXT_X and NEXT_Y do the comparison of the index to the limit after the increment or decrement, and drop through if there's a match; so the loop will not be run with the final "to" value.  IOW,  "FOR_X  8, DOWN_TO, 0"  will run 8 times, not 9:   8, 7, 6, 5, 4, 3, 2, and 1, but not 0.  If you want 9, do  "9, DOWN_TO, 0" or "8, DOWN_TO, NEG_NRs".
  5. The loop must be short enough that a relative branch at the end will reach the top.  It is rare that loops are too long for that.
The NEXT_X and NEXT_Y macros are 44 lines long and yet might assemble only two instructions, like DEX, BNE.
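
To pull the options above together, here are a few sample invocations with the code each macro should lay down shown in the comments, going by the descriptions above.  The parameter order follows the "FOR_X  8, DOWN_TO, 0" form; the limit of 40 is made up for the illustration, and STRUCMAC.ASM remains the authority on exact usage.

        FOR_X  ACCUM, DOWN_TO, 0         ; TAX
           <actions>
        NEXT_X                           ; DEX, BNE <top_of_loop>

        FOR_Y  0, UP_TO, 40              ; LDY #0
           <actions>
        NEXT_Y                           ; INY, CPY #40, BNE <top_of_loop>

        FOR_X  Y_REG, DOWN_TO, NEG_NRs   ; PHY, PLX
           <actions>
        NEXT_X                           ; DEX, BPL <top_of_loop>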

FOR_X...NEXT_X and FOR_Y...NEXT_Y can be nested, unlike the FOR...NEXT for 2-byte variables further down, which allows looping 65536 times with a single loop structure.  (Nesting will be discussed in a minute.)

LEAVE_LOOP could be implemented, but the complexity is probably not justified considering the rare need.  I'm leaving it out for now, and if there's a need, you could use a BEGIN...WHILE...REPEAT instead, or handle it in more-conventional ways, like a branch instruction to a label after the loop.  Otherwise, what you could do is use an additional stack level, and have FOR_X or FOR_Y initialize it as 0.  Then if there's a LEAVE_LOOP, it would store the address of its branch instruction in that stack cell, and NEXT_X or NEXT_Y would test it to see if that cell is non-0 and fill it in with a branch to the end if so.  You would have to be careful not to put the LEAVE_LOOP inside another structure that might be using the macro structure stack.  Also, allowing more than one LEAVE_LOOP would complicate things further.  And as always, "compiler" security is up to the programmer.

The number of clock cycles taken for a loop which loads its own index (call it "N") into X or Y and decrements it to 0 is:


      2                    for loading X or Y immediate  (Omit this if you're starting with what was already there.)
    + N * loop_contents    your code in the loop, plus the 2 clocks for DEX or DEY, meaning an empty loop still has N * 2.
    + (N-1) * 3            for BNE top_of_loop.  The 3 turns to 4 if the loop straddles a page boundary.  (Usually it won't.)
    + 2                    for final BNE that does not branch.

So for:

    FOR_X  8, DOWN_TO, 0
    NEXT_X


you have 2 + 16 + 21 + 2 = 41 clocks.  (The PIC16 takes 100 to do the same thing.)
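
The same count can be seen instruction by instruction in this sketch of the code that the empty loop comes to.  The label top7 is made up, and the loop is assumed not to straddle a page boundary.

        LDX  #8          ; 2 clocks, once.
top7:   DEX              ; 2 clocks, 8 times = 16.
        BNE  top7        ; 3 clocks when taken, 7 times = 21;  2 clocks the last time, when it falls through.
                         ; Total:  2 + 16 + 21 + 2 = 41.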


FOR...NEXT (16-bit loop control on 6502)

The above covers most of the loop situations; but what if you want a counter of more than 8 bits?  Without going to the 65816 (which I would encourage you to look into anyway), it gets more complex on the 6502.  The FOR...NEXT macros provided here (without the _X or _Y) hide that complexity.  They use constants for the beginning index (ie, counter) value and the limit, and count by ones.  If you want your program to change the index somewhere in the loop, there's nothing preventing that of course, but keep in mind that NEXT will increment it before comparing to the limit + 1 for a match.  It uses a two-byte variable for the index, one that the user specifies when invoking the FOR macro.  As supplied here, this 16-bit looping structure, like the CASE structure above, is not nestable with others of its type; but the need for nesting one 16-bit FOR...NEXT loop inside another would be rare.  Do be aware of its limitations.  Here's the form of usage:


        FOR  var1, 1, TO, 5000   ; (Loop 5,000 times.  C32 requires commas between parameters.)
           <actions>
           <actions>
           <actions>
        NEXT  var1


For the 65816 which has 16-bit index registers, doing something 5,000 times as shown above becomes as efficient as the 6502 handles numbers under 256.
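
To give an idea of the complexity being hidden, here is one way the FOR...NEXT example above could be written by hand on the 6502.  This is only a sketch of an equivalent, under the assumption that the index is compared to the limit + 1 one byte at a time; it is not necessarily what the macros in STRUCMAC.ASM assemble.  The labels top4 and cmp4 are hypothetical.

        LDA  #1 & $FF        ; FOR:  start the index at 1, low byte first,
        STA  var1
        LDA  #1 >> 8         ; then high byte.
        STA  var1+1
top4:      <actions>
           <actions>
        INC  var1            ; NEXT:  add 1 to the 16-bit index,
        BNE  cmp4
        INC  var1+1          ; carrying into the high byte when the low byte wraps.
cmp4:   LDA  var1
        CMP  #5001 & $FF     ; Compare to the limit + 1, low byte first.
        BNE  top4
        LDA  var1+1
        CMP  #5001 >> 8
        BNE  top4            ; Drop through after the 5,000th pass.

Note that A gets overwritten, as it does with the FOR and NEXT macros.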


Nesting structures

Now suppose you want structures nested.  (You will.)  If you have assembler variables to store the addresses where the structure ending macros should fill in branch operands, and then you nest another structure inside the first structure, you can see that it would step on those variables, ie, overwrite them before you're done with them.

One way around it is to use repeats of the same structure macros with names which differ only slightly, like IF_EQ, IF_EQ., IF_EQ.., etc. (note the different numbers of dots after the ends), each one using its own assembler variables.  Then you just have to make sure you use the right one.  There's a better way.

Ideally the addresses would go on a stack, so you could nest structures all you want; but unfortunately assemblers don't usually let you have a variable array and provide a way to do indexing into the array so you can synthesize a stack.  There's a way around it.  It takes an awful lot of lines in the macros, but fortunately these extra lines do not actually lay down any machine code.  The voluminous macro code is only a problem if you don't keep the macros in a separate INCLude file, or if you want to print the list code (although your assembler might let you turn the listing off and on), or if disc or memory space is limited (which is unlikely in today's PCs!)

Here's the idea, illustrated to five stack levels.  (In reality you'll probably want more, to make sure you don't run out.)  Here's how you would add a cell to the stack:



STK_LVL_5:  SETL  STK_LVL_4   ; SETL stands for "SET Label value" in the C32
STK_LVL_4:  SETL  STK_LVL_3   ; assembler, and you can do it as many times as
STK_LVL_3:  SETL  STK_LVL_2   ; you want for any given assembler variable,
STK_LVL_2:  SETL  STK_LVL_1   ; unlike EQU which only allows defining one time.
<now assign the desired value to STK_LVL_1 as the top-of-stack>


and to pop a level off the stack, do:


STK_LVL_1:  SETL  STK_LVL_2   ; STK_LVL_1 is always the top of the stack, regardless of depth.
STK_LVL_2:  SETL  STK_LVL_3
STK_LVL_3:  SETL  STK_LVL_4
STK_LVL_4:  SETL  STK_LVL_5


This, carried to 20 levels (increased from 16 when I added the FOR_X...NEXT_X and FOR_Y...NEXT_Y since these take three levels per structure), is what is in my INCLude files STAKPUSH.ASM and STACKPOP.ASM.  STAKPOP2.ASM pops two levels off at once, and STAKPOP3.ASM pops three levels off at once.  STKPUSH2.ASM and STKPUSH3.ASM push two and three cells on the stack, respectively.  Repetition of the lengthy process in the list file can be avoided by turning the listing off and on for this portion, which in the C32 assembler is done by bracketing the portion with LIST "OFF" and LIST "ON".

If your assembler allows nested macros, put the stack push and pop in macros that get called by the structure macros, just to keep the structure macro source code shorter.  If it does not allow nested macros, you might still be able to have the push and pop code in a separate .ASM file that you can bring in at the appropriate places with the INCL (include) directive.  This is what I ended up doing here.

Unfortunately I ran into another little problem with C32, which is that if you have an INCL line in a macro, the assembler doesn't do the INCLuding until after the macro is done.  All I had to do to get around it was that when I wanted to deepen the macro stack and add a cell, I just did TO_PUSH: SETL ___ and then put STK_LVL_1: SETL TO_PUSH at the end of STAKPUSH.ASM.  (This did not affect the machine code output.)

BTW, my file names are limited to 8 letters since I still do a few applications in DOS, with a 132-column, 60-line monitor, point-and-click interface, and I've had up to 34 files open at once, with all kinds of windowing and tiling.  This web page was also done with that, since my DOS-based text editor is far better than any I've seen for a GUI.

The top stack level is always STK_LVL_1, regardless of how many levels under it are being used, and you address it as such with no concern for how deeply nested you have your structures.
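
To show how a pair of structure macros might use the stack, here is a simplified, untested sketch of IF_EQ and END_IF in C32-style syntax.  The real definitions are in STRUCMAC.ASM and may well differ.  The variable name RESUME and the quoting of the INCL file names are assumptions made for the illustration.

IF_EQ:      MACRO
            BNE  $                    ; Placeholder branch to itself, so the operand byte is $FE for now.
TO_PUSH:    SETL $-1                  ; Address of the operand byte just laid down.
            INCL "STAKPUSH.ASM"       ; Deepen the stack.  Its last line does STK_LVL_1: SETL TO_PUSH.
            ENDM

END_IF:     MACRO
RESUME:     SETL $                    ; Where assembly will be picked up again.
            ORG  STK_LVL_1            ; Go back to the operand byte that was recorded.
            DFB  RESUME-STK_LVL_1-1   ; The branch distance is counted from the byte after the operand.
            ORG  RESUME               ; Return to the end of the code laid down so far.
            INCL "STACKPOP.ASM"       ; Drop the used cell off the macro stack.
            ENDM

Because END_IF only looks at STK_LVL_1, it works the same way whether the operand it fills in belongs to an IF_xx or an ELSE_.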

Now you can do for example:



        CMP  #14
        IF_EQ
            <actions>
            <actions>
            IF_NEG
                <actions>
                <actions>
                IF_EQ
                    <actions>
                    <actions>
                END_IF
            END_IF
            <actions>
            <actions>
        END_IF


and the structure macros will all keep to themselves and not step on each other's variables.

The CASE structure, if made nestable, increases the assembler macro stack operations' complexity far more than the other structures do, especially since there will be different numbers of cases to handle.  Fortunately, there's almost never any need to nest CASE statements!  For these reasons, I decided to make the CASE structure non-nestable here.  It can be nested with non-CASE statements, but one CASE statement cannot be nested inside another CASE statement.  The non-nestability also goes for the FOR...NEXT macros provided here (without the _X or _Y) as indicated above.

I have a 6502-oriented treatise on stacks (plural—not just the page-1 hardware stack) here which starts out with the definition and very basics, but then gets into deeper applications, including how to use stacks in the forming of program structures in assembly and compiled languages, doing the nesting as well as going further into compiler security which is discussed briefly below.


"Compiler" security (although we're talking about assemblers here, not compilers)

(I borrowed the term from a compiled language.)  As you might have anticipated, there could be a problem if you don't match up the structure parts correctly:  for example, an ELSE_ without an IF_xx, two ELSE_'s in a row, a BEGIN followed by END_IF, or two CASE_OF's without an END_OF between them.  There's a way to add what higher-level languages might call "compiler security," which would generate error messages if you mess up; but it again uses the stack, making the macros even longer, especially with the inefficient workarounds we have to do for stack operations in the assembler.

Each structure-starting macro would put its compiler security number on the stack, then the matching words check that number on the stack to make sure it's the right one.  (Again, this all happens in the assembler's processing, and has no effect on the machine code output except to help catch human errors.)  In my Forth kernel, BEGIN gets a 1, IF gets a 2, DO gets a 3, CASE gets a 6, OF gets a 7, etc..  The nature of the stack makes it all nestable, so nested structures are fine, but they have to be completed before finishing up in structure levels that are further out.  I have elected not to implement it here.  Since assembly language requires one instruction per line, it should be easy enough to keep things straight and matched by using indentation and vertical alignment appropriately.




I am mostly using branch instructions which are limited to hops of -128 to +127 bytes.  If you want longer hops, you will have to modify the code to use the JMP instruction, sometimes making it longer because of conditional branches around the JMPs.  I find I don't normally exceed the branch instructions' branch distance limitation though.  The structure that's likely to be longest might be a long CASE structure, and the first END_OF has to branch clear down past a lot of other cases to the END_CASE, so I did use JMPs for that.
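
As a sketch of what such a modification comes to, a long-reach version of IF_EQ would have to assemble something like the following.  The labels near5 and far_end5 are made up for the illustration.  It costs five bytes where the plain branch costs two.

        BEQ  near5       ; Short branch on the opposite condition from the usual BNE,
        JMP  far_end5    ; so that the long hop can be made by the JMP.
near5:     <actions>     ; More code here than a branch could reach across.
           <actions>
far_end5: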


List of macros provided in the .ASM files

There is no reason you can't make other structures or modify the accompanying ones to suit your purposes.  This is by no means an exhaustive list.  The following macros are defined in STRUCMAC.ASM (and BTW, I have a similar set for PIC16, here):



  IF_EQ       (using Z flag)           BEGIN
  IF_ZERO     (using Z flag)           
  IF_NEQ      (using Z flag)           WHILE_EQ       (using Z flag)
  IF_NOT_ZERO (using Z flag)           WHILE_NEQ      (using Z flag)
  IF_PLUS     (using N flag)           WHILE_ZERO     (using Z flag)
  IF_MINUS    (using N flag)           WHILE_NOT_ZERO (using Z flag)
  IF_NEG      (using N flag)           WHILE_PLUS     (using N flag)
  IF_C_SET    (using C flag)           WHILE_MINUS    (using N flag)
  IF_C_CLR    (using C flag)           WHILE_NEG      (using N flag)
  IF_GE       (using C flag)           WHILE_C_CLR    (using C flag)
  IF_LT       (using C flag)           WHILE_C_SET    (using C flag)
  IF_V_SET    (using V flag)           WHILE_GE       (using C flag)
  IF_V_CLR    (using V flag)           WHILE_LT       (using C flag)
  IF_FLAG_VAR     (added May 2013)     WHILE_V_CLR    (using V flag)
  IF_BIT          (added May 2013)     WHILE_V_SET    (using V flag)
  IF_MEM_BYTE_NEG (added May 2013)     WHILE_BIT      (added May 2013)
  IF_MEM_BYTE_POS (added May 2013)     
                                       REPEAT
  ELSE_                                AGAIN
  END_IF                               
                                       
                                       UNTIL_EQ       (using Z flag)
  CASE      (using A, X, or Y)         UNTIL_ZERO     (using Z flag)
  CASE_OF   (using A, X, or Y)         UNTIL_NEQ      (using Z flag)
  END_OF                               UNTIL_NOT_ZERO (using Z flag)
  END_CASE                             UNTIL_PLUS     (using N flag)
                                       UNTIL_MINUS    (using N flag)
                                       UNTIL_NEG      (using N flag)
  FOR       (16-bit.  Overwrites A)    UNTIL_C_CLR    (using C flag)
  NEXT      (16-bit.  Overwrites A)    UNTIL_C_SET    (using C flag)
                                       UNTIL_GE       (using C flag)
  FOR_X     (added May 2013)           UNTIL_LT       (using C flag)
  NEXT_X    (added May 2013)           UNTIL_V_CLR    (using V flag)
  FOR_Y     (added May 2013)           UNTIL_V_SET    (using V flag)
  NEXT_Y    (added May 2013)           UNTIL_BIT      (added May 2013)


In May 2013 I added a group of accessory macros I find useful:

  RTS_IF_EQ          RTS if Z flag is set
  RTS_IF_NEQ         RTS if Z flag is clear
  RTS_IF_PLUS        RTS if N flag is clear
  RTS_IF_MINUS       RTS if N flag is set
  RTS_IF_FLAG_VAR    RTS on a flag variable's condition.  The variable name and target condition are given in the parameter list.
  RTS_IF_BIT         RTS if the specified bit of the specified byte in memory meets the target condition.  These are given in the parameter list.
  RTS_IF_MEM_LOC     RTS if the value in the specified memory location is positive | negative | zero | non-zero, again per the parameter list.
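
As an example of what these save you from writing, RTS_IF_EQ should be equivalent to the hand-written sketch below.  The label skip6 is hypothetical; the macro needs none.

        BNE  skip6       ; Z clear:  carry on.
        RTS              ; Z set:  return to the caller.
skip6: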


C32 assembler rules, for clarification so you can adapt the .ASM files to other assemblers

I supply these macros, in the form needed by the C32 assembler, in the files STRUCMAC.ASM, STAKPUSH.ASM, STKPUSH2.ASM, STKPUSH3.ASM, STACKPOP.ASM, STAKPOP2.ASM, STAKPOP3.ASM, and STAKSWAP.ASM, or, all zipped together, STRUCMAC.ZIP (named such because I did them in DOS).  To clarify the meanings of things in these accompanying .ASM files, here are the relevant rules of the C32 assembler I'm using.  Your assembler will probably be similar enough that a quick search-and-replace operation will be sufficient.  There's a list of free 6502/c02/816 assemblers here and here.

Labels  C32 requires labels to start with a letter in the range of A-Z, or "_", ".", or "?".
        Individual characters within a label can come from the same set or 0-9.  Unfortunately
        it's not case-sensitive.  If the first character of the label is not in column 1 of the
        line, the label must be followed immediately by a colon.  Putting a colon in after every
        label is a good practice anyway, as it makes searches for actual labels (as opposed to
        references to the label) much quicker.  A label can be any length, and it can stand
        alone on a line.

$       is the current program counter value.  It can be used in labels or expressions for them
        or for operands, etc..  Some assemblers use * instead.

EQU     EQUate.  Sets an assembler constant.  It cannot be changed after its first definition.

SETL    SET Label value.  It's basically a variable for the assembler itself, in that its value
        can be changed countless times after it is defined.  It always requires a label and an
        expression to assign the value to the label.

DFB     DeFine Byte(s).  Lay the following bytes down in the code.  If there are two or more,
        they are separated by commas, except of course when they are consecutive bytes in a
        quoted string.  Some assemblers use .DB , .BYTE , or simply BYTE.

DFS     DeFine Storage.  Skip the specified number of bytes, with the beginning of the block
        taking on the label.  Used for variables in the target system.  The 2500AD assembler
        uses BLKB for "define a BLocK of Bytes."

ORG     ORiGin.  Force the program counter to the desired value.  We have to do this a lot in
        the program-structure macros, to go back to earlier places and fill in addresses and
        forward branch distances after they become known, then go forward again ("back to the
        future" :) ) to pick up assembly at the end of the previously laid code.  All the
        back-and-forth makes for varying line lengths in C32's Intel Hex output, but it's not
        a problem.

IF      For conditional assembly.  If the expression following it comes to something other than
        0, do the following lines.  If it's 0, the assembler ignores them, down to the ELSE or
        ENDI, or anything that evaluates to those (like the END_COMMENT macro).  To keep the IF
        assembler directive separate from the structure macro names, I've put in conditions on
        the latter, referring to flags to branch on, like IF_EQ, IF_C_SET, etc..

ELSE    Optional.  If the "IF" portion was false, pick up the assembly process here.  To keep
        the assembler directive separate from the structure macro name, I've called the latter
        ELSE_ (note the underscore at the end, and don't forget to put it in).

ENDI    Regardless of the outcome of the "IF" above, assembly will definitely be back on after
        this point.  The macro name of the end of the IF_xx structure is END_IF .

INCL    INCLude.  Bring another source-code file in at this point.  You can have various files
        that are used as modules for many different projects, and it is not necessary to show
        all that code again in every one.  Added flexibility might be exercised with the "IF"
        directive, ie, that you assemble the other file at this point if certain conditions are
        true.  INCL can be nested; ie, an INCLuded file can have INCLude directives to bring in
        other files, and so on.  A file can be brought in with INCL at many different points.

MACRO   Begins a macro definition.  MACRO must always follow a label.  MACRO can be followed by
        parameter expressions, as many as you want, as long as they all fit on the one line.
        If there are two or more parameters, they must be separated by commas.

ENDM    signals the end of the macro definition.

operators in expressions (There are others, but they're either obvious or not used here):
==      equal to
!=      not equal to
$       value in program counter.  Some assemblers use the * .
&       bitwise AND
X >> Y  Shift X right by the number of bits specified by Y.

You can have as many parameters in the macro definition as you want, but then C32 requires that the macro call have the same number of parameters.  One thing I liked about the 2500AD assembler was that the number of parameters didn't have to match, and you could say in effect, "If there's a fourth parameter, do this with it; and if there's a fifth one, do that with it..."  Macro parameters can be expressions.  Macro parameters, both in the macro definition and in macro calls, must be separated by commas if there are two or more.  C32 does not allow the nesting of macros.  Some assemblers do.  10/6/23:  Anton Grigorev adapted these for the DASM assembler.




BRA:  My apologies if you're using an NMOS 6502, which does not have the BRA (BRanch Always) instruction; you will have to modify the accompanying code.  Unless you're using something like the Commodore 64, whose 6510 processor was never available in CMOS, I would encourage switching to the 65c02, which has a lot of advantages over the NMOS 6502.
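
For anyone who does have to stay with NMOS, these are the two usual stand-ins for BRA, shown only as a sketch.  The label "target" is hypothetical, and which substitute fits a given structure macro depends on how that macro fills in its branch operand, which is not worked out here.

```asm
        ; 65c02:
        BRA   target    ; 2 bytes, relative, affects no flags

        ; NMOS 6502, option 1: absolute jump
        JMP   target    ; 3 bytes, absolute address, no range limit

        ; NMOS 6502, option 2: force a flag, then branch on it
        CLV             ; clear the overflow flag (note that this does alter V)
        BVC   target    ; always taken since V was just cleared; 3 bytes total
```

Option 2 keeps the code relative like BRA does, but it costs an extra byte and changes V.  If the state of some flag is already known at that point, a single conditional branch on that flag does the job in two bytes.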

Structures seldom need branch distances that cannot be achieved with branch instructions, so JMP is seldom used here.  In the rare case that you think you're getting close to the maximum reach of branches, you might need to check the distance in the list file.  The forward branches in the structure macros will not give error messages if you try to branch more than 127 bytes away.
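
When checking the list file, keep in mind that the branch offset is counted from the address just past the two-byte branch instruction.  A small worked example, with made-up addresses (in hex) and a hypothetical label "target":

```asm
; Branch opcode at 0300, operand byte at 0301; the offset is counted from 0302.
;   farthest forward target  = 0302 + 127 = 0381
;   farthest backward target = 0302 - 128 = 0282
        BNE   target    ; reaches only if target lies in 0282 through 0381
```

So for a forward branch, subtract the address of the instruction following the branch from the target address shown in the list file; the result must not exceed 127.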


Some macro topics on the 6502.org forum

assembler macros!
desirable assembler features
Passing a variable as an argument to a macro
ANN: HXA0.190 teamtempest's (Anton Treuenfels') topic on program structures in his HXA 65c02/816 assembler.  It was an honor that he copied my structures here and implemented them in the assembler he provides.
A sensible macro engine  In this one, enso proposes and explores the idea of a macro pre-processor that could then be used with various assemblers including ones with no macro capability.
Also mentioned on the forum, by HansO, is "Macross 6502, an assembler for people who hate assembly language," a very C-like macro assembler for the 6502 (NMOS only).  (Note the two s's in "Macross.")


Future improvements

As time allows, I plan to add more structures, more options (starting with multiple WHILEs between BEGIN and REPEAT), more diagrams, and make other minor improvements.  Note that the 6502 stacks treatise covers in greater detail how the assembler itself uses a stack in the formation and nesting of program structures.

I suspect changes would have to be made to make these structure macros work if a linker is used.  If you have ideas or knowledge about that, email me.

I'm sure there are still more and greater techniques that could be carried out with any good macroassembler, techniques that we still haven't thought of, even at this late date.




last updated Mar 24, 2024           (New macros added 5/11/13, and, to the PIC code, 5/5/14.  Max nr of cases increased in 6502 code 9/22/14.)           contact: Garth Wilson, wilsonminesBdslextremeBcom (replacing the B's with @ and .)