The principles here can be used on any assembler with macro capability, but they are primarily aimed at the 65c02. I'm using the C32 assembler (Cross-32) from Universal Cross Assemblers, now distributed by Data Sync Engineering. The one assembler is good for lots of processors. It is down to $99 now, but there's a list of free 6502/c02/816 assemblers here and here. BDD has macros to assemble 65816 code on Kowalski's 65(c)02 macro assembler, here.
Revised 5/11/13. This material is subject to constant improvement. I have tested these 65c02 macros but have not used them extensively yet like I have the corresponding ones for PIC16 microcontrollers on Microchip's MPASM assembler. I've done an entire large-ish (for PIC16) project using them. What a pleasure to get away from the spaghetti that's so typical of assembly! The code to do it on PIC16 is here.
First, what is a macro?
As you write an assembly-language program, you may see repeating patterns. If it's exactly the same all the time, you can make it a subroutine. That incurs a 12-clock performance penalty for the subroutine call and return, but program memory is saved because the code for the subroutine is not repeated over and over.
There will be other times however where the repeating pattern is the same but internal details are not, so you can't just use a JSR. The differences from one occurrence to another might be an operand, a string or other data, an address, a condition, etc. It would be helpful to be able to tell the assembler, "Do this sequence here, except when you get down to this part, substitute in such-and-such." That's where it's time for a macro.
The repeating, possibly messy-looking sequences that clutter your code can be replaced with a macro call that takes a single line each time, optionally with parameters. Since you write the macro (or at least you can edit it if you want to, even if someone else wrote it), you have complete control of every bit of machine code that gets laid down. After the internal details have been ironed out, you shouldn't have to keep being bothered with them. If you can hide them with macros, you can see the big picture more clearly, have better control of the project, have fewer bugs, and become more productive without losing any performance or taking more memory.
A macro may replace a piece of assembly-language code as short as a line or two, to give more clarity to what is being done there. An example of a single-line macro is where you want to replace the 65816's cryptic REP and SEP instructions:

    INDEX_16: MACRO              ; Put X & Y into 16-bit mode.
              REP  #00010000B
     ;        NOP                ; NOP was necessary for early versions
              ENDM               ; of '802/'816 >4MHz.
     ;-------------------

    INDEX_8:  MACRO              ; Put X & Y into 8-bit mode.
              SEP  #00010000B
     ;        NOP                ; NOP was necessary for early versions
              ENDM               ; of '802/'816 >4MHz.
     ;-------------------

INDEX_16 above is far more clear than REP #00010000B which it replaces, yet it lays down exactly the same machine code, C2 10, which takes 3 clocks' execution time at runtime.
Here's another short but useful one (and you can modify it to get related ones), for branching more than half a page away.

    BEQlong:  MACRO  LBL
              BNE  bel1
              JMP  LBL
     bel1:
              ENDM
     ;-------------------

The JMP makes it no longer relative but absolute, but most applications can use it. The 65816 has a BRL (Branch Relative Long, or BRanch Long) instruction, but doing the same kind of thing on the 6502 for relocatable code requires more steps. (You can store the offset, then, since JSR puts the current address on the stack, the subroutine can add the offset to it before the RTS.)
With the macro above, if you want to do a BEQ to someplace more than half a page away, you can do for example:

            BEQlong  FOOBAR

and it will assemble:

            BNE  3         ; ie, to the instruction after the 3-byte JMP instruction
            JMP  FOOBAR
     <continue>

A macro doesn't necessarily have to lay down any code at all. I use a couple for paragraphs of comments. Although you can comment any line by putting a ; in front of it, this becomes a pain if you find you need to insert or delete a few words and adjust all the lines after it. Why not do:

            COMMENT
       Every flight or checklist is a file.  There is no directory.  The
       computer can find a file by starting at the first file in memory (which
       starts at FIRST_FL_ADR), calculating the address of the 2nd file by using
       the file-length bytes of the first file, etc.  There's an ENDRAM byte
       after the END_OF_FL byte of the last file.  The address of the ENDRAM
       byte is also stored in ENDRAM_ADR.  The number of files is stored in
       NR_FLs.  All the RAM used by the system is at lower addresses than the
       file chain.
            END_COMMENT

Besides making it easier to re-format the paragraph after changes (<alt>R in the MultiEdit text editor even preserves the left margin), it looks nicer. Just be sure no line starts with something that the assembler interprets as the ENDIF.
Here's how to do it. IF 0 is a condition that will never be met, so the assembler skips to the ENDIF (or actually ENDI in the C32 assembler I used the portion below on).

    COMMENT:     MACRO        ; COMMENT and END_COMMENT here relieve us from the load
                 IF 0         ; of semicolons where we have many consecutive lines of
                 ENDM         ; comments.  Since the IF is looking for an ELSE or ENDI
     ;----------------        ; (either cap.s or lower case), be sure none of the lines
    END_COMMENT: MACRO        ; commented-out start with one of these words that could
                 ENDI         ; fool it.  If there is, that line will still need a ; .
                 ENDM         ; Also, if a line starts with a macro name which is
     ;----------------        ; followed by illegal parameters for that macro (as to
                              ; discuss it), you will still need the ; .

Of course if your assembler already has a COMMENT directive (like the 2500AD one does), you won't need this one. Not all do however, and the above still shows what can be done.
Here's an only slightly more complex example, where we want the computer to display an immediate string and wait for the response:

            DISPLAY_IMM   "Press CONTINUE when ready"
            WAIT_FOR_KEY  CONT_KEY

where the macro DISPLAY_IMM is defined as:

    DISPLAY_IMM: MACRO  STR
                 JSR   DISP_QUOTE
                 BYTE  dim2#-dim1#    ; Lay down the string length byte,
     dim1#:      BYTE  STR            ; followed by the string.  (Counted string, not nul-terminated.)
     dim2#:                           ; (Must not put ENDM on same line with the label.)
                 ENDM
     ;-------------------

(This is from an application I did with the 2500AD assembler which used the # at the end of a label to mean it's local to the macro.) It assembles:

            JSR   DISP_QUOTE
            BYTE  25
            BYTE  "Press CONTINUE when ready"

The DISP_QUOTE subroutine looks at the return address to get the string length, then continues that far to get the string it's supposed to display, then also uses the length byte to adjust the return address on the 6502's stack so the RTS takes it to the first instruction after the string instead of trying to execute data.
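The address arithmetic DISP_QUOTE has to do can be modeled in a few lines of Python. This is only a sketch of the bookkeeping, not the actual subroutine. The function name, the $8000 load address and the JSR operand bytes are assumptions made up for the illustration. It relies on the fact that JSR stacks the address of its own last byte, and RTS adds 1 to whatever it pulls.

```python
def disp_quote(memory: bytes, stacked_return: int) -> tuple[str, int]:
    """Model the address bookkeeping of a DISP_QUOTE-style routine.

    stacked_return is the value JSR left on the stack.  On the 6502 that is
    the address of the last byte of the JSR instruction, not of the byte
    after it.
    """
    length_addr = stacked_return + 1           # the length byte follows the JSR
    length = memory[length_addr]
    text_start = length_addr + 1
    text = memory[text_start:text_start + length].decode("ascii")
    # RTS adds 1 to the address it pulls, so leave the address of the last
    # string byte on the stack to resume just past the string.
    new_return = stacked_return + 1 + length
    return text, new_return


# Hypothetical layout: JSR DISP_QUOTE assembled at $8000 (address assumed),
# followed by the length byte, the string, and a NOP as the "next instruction".
message = b"Press CONTINUE when ready"
image = bytearray(0x8000) + bytes([0x20, 0x34, 0x12, len(message)]) + message + b"\xEA"
text, new_return = disp_quote(bytes(image), 0x8002)
print(text, hex(new_return))
```

With these assumed addresses the adjusted return address is the last byte of the string, so the RTS resumes at the NOP right after it.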
The WAIT_FOR_KEY macro used above is defined as:

    WAIT_FOR_KEY: MACRO  KEY
     wfk1#:       JSR  SCAN_KEYPAD
                  CMP  #KEY
                  BNE  wfk1#
                  ENDM
     ;-------------------

and will assemble:

            JSR  SCAN_KEYPAD
            CMP  #CONT_KEY
            BNE  $F9            ; Branch back to the JSR.

Macros can make a routine take fewer pages and make it easier to wrap your head around it. A piece from a routine goes as follows:

            ; Initialize software flags
            LDA  #0
            STA  ICNT
            STA  IHEAD
            STA  ITAIL
            STA  OCNT
            STA  OHEAD
            STA  OTAIL
            STA  OIE

One thing I liked about the 2500AD assembler is that it allowed a variable number of parameters in the macro call, which was nice for setting or clearing a list of flag variables that won't always have the same number of them listed. Hence the above could be replaced with:

            CLR_FLAG  ICNT, IHEAD, ITAIL, OCNT, OHEAD, OTAIL, OIE     ; Init the software flags.

and lay down exactly the same code. (Of course with the 65c02 you could make it eliminate the LDA #0 and use STZ instead of STA.) There are seven flags to clear, so you would need at least that many parts in the macro. In the 2500AD assembler, IFMA 5 for example meant "If there's a 5th macro parameter, do this part." In this case IFMA 8 would be the first one that would not lay down any code, since there's no 8th flag listed to clear. If your assembler does not have that capability, you might have to just pad the unused parameter positions with 0's in order to accomplish the above.
You know exactly what gets laid down. If there were conditionals and you wanted to see what code resulted, you could look at the assembler's .LST (list) output file which shows the actual addresses along the left edge, followed by the actual bytes of op codes, operands, and data, all to the left of the corresponding lines of source code. If you assign constants and assembler variables, the list file will show the exact numeric values that resulted from those too.
At the other rare extreme, a macro can replace even pages of code. Before I did the program-structure macros, I think my longest macro was 54 lines, with much of that being conditionals, and the macro actually assembled only two to seven instructions depending on the conditions which in this case were set in the parameters in the line calling the macro.
A macro, when defined, is only kept in the assembler. It is like a subroutine for the assembler itself to run. No machine code is produced at the time the macro is defined. When you call the macro later in your code, the assembler will execute that "assembler subroutine" and generate machine code at that time, per the parameters given in the macro call, and put it where the program pointer indicates. If you look at the resulting .LST (list) file, you can see the code in the macro expanded out at each point the macro is called. It can have conditionals and so on, just like non-macro code.
Since a macro expansion can take a variable amount of space in the output machine code, the macro must be defined before it is called. Forward references to macros are not allowed. Forward references in JSR's are ok because the entire JSR instruction is always the same length; but a macro may take differing amounts of code space from one call to the next because of things like conditional assembly in the macro and conditions being different each time, varying lengths of text strings, etc. The assembler would have no idea how much address space to reserve to expand a macro that is not defined yet.
Now on to the program structures.
If you've programmed much in non-structured programming, you have experienced situations with lots of branches that just drive you nuts. I have, many times, printed out the routine on fanfold paper and laid the strip out on the floor and drawn arrows showing all the spaghetti, ie, the tangle of branches. It even gets hard to come up with short labels that are semi-descriptive, especially if a part is branched to by different conditions.
Here's a short-ish more general-purpose piece of code from the 6502.org source-code repository, from Bruce Clark. Without laying a macro foundation for structured programming, it was indeed appropriate for him to do it unstructured and use labels.

              ORG  0
    FROM:     DFS  2            ; "DFS" in C32 is like "BLKB" in the 2500AD assembler.
    TO:       DFS  2            ; It stands for "DeFine Storage", and in this case
    SIZE:     DFS  2            ; allots two bytes for each ZP variable here.
    SIZEL:    EQU  SIZE         ; SIZEL and SIZEH are the low and high bytes of
    SIZEH:    EQU  SIZE+1       ; variable SIZE above.

              ORG  $8000

                                ; +-----------------------+
                                ; |   ORIGINAL VERSION    |
                                ; +-----------------------+
    MOVEDOWN: LDY  #0
              LDX  SIZEH
              BEQ  MD2
    MD1:      LDA  (FROM),Y     ; Move a page at a time.
              STA  (TO),Y
              INY
              BNE  MD1
              INC  FROM+1
              INC  TO+1
              DEX
              BNE  MD1
    MD2:      LDX  SIZEL
              BEQ  MD4
    MD3:      LDA  (FROM),Y     ; Move the remaining bytes
              STA  (TO),Y
              INY
              DEX
              BNE  MD3
    MD4:      RTS
     ;----------------

                                ; +-------------------------+
                                ; |   STRUCTURED VERSION    |
                                ; +-------------------------+
    MOVEDOWN: LDY  #0
              LDX  SIZEH              ; Get the high byte of the size of block to move.
              IF_NOT_ZERO             ; Do this 1st part if there's at least one full page to move.
                 BEGIN                ; Do this loop once for each full page to move.
                    BEGIN             ; Do this loop once for each byte in the page.
                       LDA  (FROM),Y
                       STA  (TO),Y
                       INY
                    UNTIL_ZERO        ; UNTIL_ZERO assembles the BNE up to the BEGIN four lines up.
                    INC  FROM+1       ; Increment the high byte of the source
                    INC  TO+1         ; and destination addresses, and
                    DEX               ; decrement the number of full pages left to do.
                 UNTIL_ZERO           ; UNTIL_ZERO assembles the BNE up to the corresponding BEGIN.
              END_IF                  ; END_IF puts the branch distance in the BEQ assembled by the
                                      ; IF_NOT_ZERO above, whose operand's addr was on the macro stack.
              LDX  SIZEL              ; After all full pages have been moved, see if there's _part_
              IF_NOT_ZERO             ; of one left to do.  If there is, do the following.
                 BEGIN                ; Do this loop once for each byte left.
                    LDA  (FROM),Y
                    STA  (TO),Y       ; After transferring each byte,
                    INY               ; increment the index,
                    DEX               ; and decrement the number of bytes left to do.
                 UNTIL_ZERO           ; UNTIL_ZERO assembles the BNE up to the BEGIN 5 lines up.
              END_IF                  ; END_IF puts the branch distance in the BEQ assembled
                                      ; by the IF_NOT_ZERO above, so a branch taken goes to the RTS below.
              RTS
     ;----------------
Or, saving a few lines of source code:

    MOVEDOWN: LDY  #0
              LDX  SIZEH                       ; Get the high byte of the size of block to move.
              IF_NOT_ZERO                      ; Do this 1st part if there's at least one full page to move.
                 FOR_X  X_REG, DOWN_TO, 0      ; Do this loop once for each full page to move.  Start w/ current X contents.
                    FOR_Y  Y_REG, UP_TO, 0     ; Do this loop once for each byte in the page.  Start w/ current Y contents.
                       LDA  (FROM),Y
                       STA  (TO),Y
                    NEXT_Y                     ; NEXT_Y does the INY, and assembles the BNE up to the LDA (FROM),Y two lines up.
                    INC  FROM+1                ; Increment the high byte of the source and
                    INC  TO+1                  ; destination addresses.  In next line, decr the number of full pages left to do.
                 NEXT_X                        ; NEXT_X does the DEX, and assembles a BNE up to the first line after FOR_X above.
              END_IF                           ; END_IF puts the branch distance in the BEQ assembled by the
                                               ; IF_NOT_ZERO above, whose operand's addr was on the macro stack.
              LDX  SIZEL                       ; After all full pages have been moved, see if there's _part_
              IF_NOT_ZERO                      ; of one left to do.  If there is, do the following.
                 FOR_X  X_REG, DOWN_TO, 0      ; Do this loop once for each byte left.
                    LDA  (FROM),Y
                    STA  (TO),Y                ; After transferring each byte,
                    INY                        ; increment the index.  In next line, decr the number of bytes left to do.
                 NEXT_X                        ; NEXT_X does the DEX, then assembles the BNE up to the first line after FOR_X above.
              END_IF                           ; END_IF puts the branch distance in the BEQ assembled
                                               ; by the IF_NOT_ZERO above, so a branch taken goes to the RTS below.
              RTS
     ;----------------

The three versions result in exactly the same machine code, but the program structures make it more intuitive what's happening.
Here's another one, my hex-to-decimal routine from http://6502.org/source/integers/hex2dec.htm :

    HTD_IN:   DFS  1            ; Input and output variables.  DFS is DeFine Storage.
    HTD_OUT:  DFS  2            ; Output is low-byte-first.

    TABLE:    DWL  1, 2, 4, 8, 16H, 32H, 64H, 128H     ; DWL is Define Word, Low byte first.

                                ; +-----------------------+
                                ; |   ORIGINAL VERSION    |
                                ; +-----------------------+
    HTD:      SED               ; Output gets added up in decimal.
              STZ  HTD_OUT      ; Initialize output word as 0.
              STZ  HTD_OUT+1    ; (NMOS 6502 will need LDA#0, STA ...)
              LDX  #0EH         ; $E is 14 for 2x7 bits.  (0-7 is 8 positions.)
     loop:    ASL  HTD_IN       ; Look at next high bit.  If it's 0,
              BCC  htd1         ; don't add anything to the output for this bit.
              LDA  HTD_OUT      ; Otherwise get the running output sum
              CLC
              ADC  TABLE,X      ; and add the appropriate value for this bit
              STA  HTD_OUT      ; from the table, and store the new sum.
              LDA  HTD_OUT+1    ; After low byte, do high byte.
              ADC  TABLE+1,X
              STA  HTD_OUT+1
     htd1:    DEX               ; Go down to next bit value to loop again.
              DEX
              BPL  loop         ; If still not done, go back for another loop.
              CLD
              RTS
     ;----------------

                                ; +-------------------------+
                                ; |   STRUCTURED VERSION    |
                                ; +-------------------------+
    HTD:      SED                     ; Output gets added up in decimal.
              STZ  HTD_OUT            ; Initialize output word as 0.
              STZ  HTD_OUT+1          ; (NMOS 6502 will need LDA#0, STA ...)
              LDX  #0EH               ; $E is 14 for 2x7 bits.  (0-7 is 8 positions.)
              BEGIN
                 ASL  HTD_IN          ; Look at next high bit.  If it's 0,
                 IF_C_SET             ; don't add anything to the output for this bit.
                    LDA  HTD_OUT      ; Otherwise get the running output sum
                    CLC
                    ADC  TABLE,X      ; and add the appropriate value for this bit
                    STA  HTD_OUT      ; from the table, and store the new sum.
                    LDA  HTD_OUT+1    ; After low byte, do high byte.
                    ADC  TABLE+1,X
                    STA  HTD_OUT+1
                 END_IF
                 DEX                  ; Go down to next bit value to loop again.
                 DEX
              UNTIL_NEG               ; If still not done, go back for another loop.
              CLD
              RTS
     ;----------------
Or, a few lines shorter with FOR_X and NEXT_X:

    HTD:      SED                           ; Output gets added up in decimal.
              STZ  HTD_OUT                  ; Initialize output word as 0.
              STZ  HTD_OUT+1                ; (NMOS 6502 will need LDA#0, STA ...)
              FOR_X  0EH, DOWN_TO, NEG_NRs  ; $E is 14 for 2x7 bits.  (0-7 is 8 positions.)
                 ASL  HTD_IN                ; Look at next high bit.  If it's 0,
                 IF_C_SET                   ; don't add anything to the output for this bit.
                    LDA  HTD_OUT            ; Otherwise get the running output sum
                    CLC
                    ADC  TABLE,X            ; and add the appropriate value for this bit
                    STA  HTD_OUT            ; from the table, and store the new sum.
                    LDA  HTD_OUT+1          ; After low byte, do high byte.
                    ADC  TABLE+1,X
                    STA  HTD_OUT+1
                 END_IF
                 DEX                        ; Go down to next bit value to loop again.  Need two DEX's, so add one here.
              NEXT_X                        ; If still not done, go back for another loop.
                                            ; In this case, NEXT_X will assemble a DEX, BPL up to the line with the ASL.
              CLD
              RTS
     ;----------------

Again, the three versions assemble exactly the same machine code. (One reader commented that since they assemble the same machine code, it means the structure was already there before, just not visible.)
I will be modeling some structures here partly after common Forth structures. I supply the macros for them, in the form needed by the C32 assembler, in the files STRUCMAC.ASM, STAKPUSH.ASM, STKPUSH2.ASM, STKPUSH3.ASM, STACKPOP.ASM, STAKPOP2.ASM, STAKPOP3.ASM, and STAKSWAP.ASM, or, all zipped together, STRUCMAC.ZIP, (named such because I did them in DOS). You can rename the structures after the equivalents in other languages if you wish. Keep in mind too that if any names clash with the names of assembler directives in your assembler, you will have to change the macro names.
The As65 assembler (written by BitWise on the 6502.org forum) has structure capabilities without the user adding macros. His even automatically chooses branch versus jump instructions to get the code compact in most cases but still able to make the jump when the distances exceed 127 bytes. Most assemblers don't have the built-in structure capability, so I will continue here. [Edit, 1/1/13: Anton Treuenfels added the structures here to his HXA 6502 assembler.]
First, take an example of the most basic one, the IF.

            CMP  #14
            IF_EQ             ; clear enough that it really needs no comments
               <actions>
               <actions>
               <actions>
            END_IF

No label is needed. The IF_EQ lays down a BNE instruction to branch around the code if the Z flag in the status register is not set. It leaves the operand byte blank since it does not know yet how far the branch will be, but records the address of the operand so the END_IF macro can fill it in. END_IF records the address the next instruction will be at, sets the pointer ( * in some assemblers, $ in some) to what IF_EQ recorded, fills in the operand, then sets the pointer back to where assembly will be resumed. The internal details are shown in STRUCMAC.ASM.
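To make the fill-it-in-later idea concrete, here is a toy model in Python of what IF_EQ and END_IF do to the output image. It is only a sketch. The Emitter class, the $8000 origin and the dummy operand value are assumptions for the illustration, and the real macros keep the operand address in assembler variables rather than passing it around like this.

```python
class Emitter:
    """Toy model of the assembler's output image and program pointer."""

    def __init__(self, origin: int) -> None:
        self.origin = origin
        self.code = bytearray()

    @property
    def here(self) -> int:
        """Address the next emitted byte will land at."""
        return self.origin + len(self.code)

    def emit(self, *values: int) -> None:
        self.code.extend(values)

    def patch(self, address: int, value: int) -> None:
        self.code[address - self.origin] = value & 0xFF


def if_eq(out: Emitter) -> int:
    """Lay down BNE with a dummy operand and return the operand's address."""
    out.emit(0xD0, 0x00)            # $D0 = BNE; the 0 is only a placeholder
    return out.here - 1


def end_if(out: Emitter, operand_addr: int) -> None:
    """Fill in the branch distance, measured from the byte after the operand."""
    distance = out.here - (operand_addr + 1)
    if distance > 127:
        raise ValueError("forward branch out of range")
    out.patch(operand_addr, distance)


out = Emitter(0x8000)               # origin chosen only for illustration
out.emit(0xC9, 14)                  # CMP #14
fixup = if_eq(out)                  # BNE ??
out.emit(0xEA, 0xEA, 0xEA)          # three NOPs stand in for <actions>
end_if(out, fixup)
print(out.code.hex(" "))
```

After end_if runs, the BNE operand holds 3, the number of bytes to skip to get past the three stand-in instructions.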
Ok, so we said the address of the operand byte to be filled in will be "recorded." Where? It will be on a stack held in the assembler, which I'm calling the "macro stack." It will get explained here just a little, but I plan to go into it more in the upcoming primer on stacks, showing in more detail how stacks can be used to form nestable program structures during assembly or compilation.
Note: The operand of a forward branch will initially appear incorrect in the list file (usually as $FE), but will be corrected further down when the corresponding macro goes back to fill it in. It may even be wrong initially in the hex file, but if so, the hex file will come back to that address and overwrite it with the right value.
The next step would be to add an ELSE in the IF...END_IF. The name may need to be changed slightly to keep it from colliding with names of assembler directives; and in fact the C32 assembler does use ELSE in conditional assembly, so I add the underscore for this macro, ELSE_, which should be easy to remember since there's an underscore after IF and END above.
Another set of structures begins with BEGIN. Here's the simplest:

            BEGIN
               <actions>
               <actions>
               <actions>
            AGAIN

BEGIN only records the address of the top of the loop so that AGAIN can figure out the correct operand to use in a BRA (Branch Relative Always) or JMP instruction to make the loop repeat again. The way out of this kind of structure is often an RTS taken under a certain condition somewhere inside the loop. (In at least one language, AGAIN is called FOREVER, which I'm not fond of because the life of a computer is an insignificant speck in the span of forever.) Notice again that no labels are needed, and the loop stands out clearly.
BTW, I do recommend that each level of indenting be at least three spaces. Using only one especially makes it look like you meant to align things vertically and just got sloppy. It's harder to see the structure.
Another one is:

            BEGIN
               <actions>
               <actions>
               <actions>
            WHILE_<condition>
               <actions>
               <actions>
               <actions>
            REPEAT

It begins the loop with some pre-processing, and continues WHILE the given condition is still met (WHILE_EQ, WHILE_NEG, etc.), otherwise branches to the first instruction after the REPEAT, ie, after the end of the structure. If the WHILE condition is still being met, the instructions in the last half of the structure are executed, and the REPEAT assembles a BRA or JMP to send the program counter back up to the top of the loop. Obviously the BEGIN has to record the address there so the REPEAT macro knows what operand to put in the BRA or JMP instruction. Also, the WHILE macro needs to record the address of the branch instruction it assembles so that the REPEAT macro can fill in the operand.
The WHILE part could be made to take on as many conditions as you like. The condition could be a macro parameter to use with conditional assembly to lay down the right branch instruction (BNE, BMI, etc.). For most situations, I've taken the route of forming separate macros for WHILE_NEG, WHILE_C_SET, WHILE_EQ, etc.; but I do have WHILE_BIT, so you can have for example:

            WHILE_BIT  VIA3PA, 4, IS_LOW

for the condition in this example to be that VIA3's Port A's bit 4 is low.
Another structure is:

            BEGIN
               <actions>
               <actions>
               <actions>
            UNTIL_<condition>

It is similar to BEGIN...AGAIN, but it lets execution drop out of the loop when the condition is met. UNTIL_EQ for example assembles BNE___ to go back to the top of the loop. UNTIL_MINUS assembles BPL___ to the top of the loop, and so on.
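The operand for a backward branch like the one UNTIL_EQ assembles can be worked out as below. This is a small Python sketch of the arithmetic only. The function name and the $8000 address are assumptions for the illustration. The check at the end uses the WAIT_FOR_KEY expansion shown earlier, where the BNE operand came out as $F9.

```python
def branch_operand(branch_addr: int, target: int) -> int:
    """Return the one-byte operand for a relative branch at branch_addr.

    The 6502 measures the offset from the address after the two-byte
    branch instruction and stores it as a signed 8-bit value.
    """
    offset = target - (branch_addr + 2)
    if not -128 <= offset <= 127:
        raise ValueError(f"branch distance {offset} out of range")
    return offset & 0xFF


# WAIT_FOR_KEY from earlier: JSR (3 bytes), CMP # (2 bytes), then the BNE.
top = 0x8000                        # address assumed for illustration
bne_addr = top + 3 + 2
operand = branch_operand(bne_addr, top)
print(f"${operand:02X}")
```

The range check is also why a long CASE or a big loop body can force a JMP in place of a branch.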
The CASE statement (BASIC calls it SELECT CASE) is great for taking different courses of action based on an input number, particularly where the possibilities for that number are non-consecutive. Take the example of treating different special output characters in a display:

            CASE  ACCUM            ; Test the accumulator against the following cases.
               CASE_OF  $0A        ; If it has the linefeed character,
                  <actions>        ; execute these instructions,
                  <actions>
               END_OF              ; then branch to the first instruction after END_CASE.

               CASE_OF  $0D        ; If it has the carriage-return character,
                  <actions>        ; execute these instructions,
                  <actions>
               END_OF              ; then branch to the first instruction after END_CASE.

               CASE_OF  $08        ; If it has the backspace character,
                  <actions>        ; execute these instructions,
                  <actions>
               END_OF              ; then branch to the first instruction after END_CASE.

               <actions>           ; If the character is anything else, do these actions
               <actions>           ; to feed it to the display as display data.
            END_CASE

CASE_OF $0A above assembles CMP #$0A, BNE___, with the BNE operand invalid until the corresponding END_OF fills it in, making the BNE branch down to the next part which is to see if the accumulator has the carriage-return character, $0D. All the END_OFs also assemble a JMP down to the code just after the END_CASE, and leave a record of where they are so the END_CASE macro can fill them in, without requiring a second pass, and without requiring labels.
Internally, the CASE structure here is basically the same as a series of IFs and ELSEs. This is not always true of higher-level languages. In my 65816 Forth, the set of CASE words is both faster and more memory-efficient than IFs and ELSEs. Regardless, when appropriate, the CASE statement is still more clear to look at in the source code than a series of IFs and ELSEs.
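A rough Python model of the bookkeeping may help show how CASE_OF, END_OF and END_CASE cooperate. It is a sketch under stated assumptions, not the contents of STRUCMAC.ASM. The Python lists stand in for the macro stack, the $8000 origin is made up, and a single NOP stands in for each group of <actions>.

```python
ORIGIN = 0x8000                     # load address assumed for illustration
code = bytearray()
pending_bne = []                    # operand address of the open CASE_OF's BNE
pending_jmps = []                   # operand addresses of the END_OF jumps


def here() -> int:
    """Address the next emitted byte will land at."""
    return ORIGIN + len(code)


def case_of(value: int) -> None:
    code.extend([0xC9, value, 0xD0, 0x00])       # CMP #value, BNE ??
    pending_bne.append(here() - 1)


def end_of() -> None:
    code.extend([0x4C, 0x00, 0x00])              # JMP ????
    pending_jmps.append(here() - 2)
    operand = pending_bne.pop()
    # The BNE skips this case's actions and its JMP, landing on the next test.
    code[operand - ORIGIN] = here() - (operand + 1)


def end_case() -> None:
    target = here()
    while pending_jmps:
        operand = pending_jmps.pop()
        code[operand - ORIGIN] = target & 0xFF   # low byte first
        code[operand - ORIGIN + 1] = target >> 8


for char in (0x0A, 0x0D):
    case_of(char)
    code.append(0xEA)               # one NOP stands in for <actions>
    end_of()
code.append(0xEA)                   # default actions
end_case()
print(code.hex(" "))
```

Both JMP operands end up pointing at the first address after the default actions, and each BNE operand is 4 here: one stand-in byte plus the 3-byte JMP.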
This is one of the few program structures which, done with macros, may be slightly less efficient than going without macros, as the machine-language code laid down won't be quite as good as you could do by hand. (Still, I think the clarity and programming productivity gained with the CASE macros make them well worth the minuscule penalty.) I originally made the END_OFs assemble BRA instructions down to the END_CASE, but the branch distance was sometimes too far if the CASE structure was a long one, so I had to change them to JMPs. Another possible slight inefficiency is that the last END_OF also has the jump to the end, which effectively becomes a 3-byte, 3-clock NOP if there are no default instructions before the END_CASE like the ones shown in the last two lines above. IOW, it would just jump to the next instruction anyway.
If the cases were consecutive numbers, and especially if there were a lot of them, it would be much faster and more memory-efficient to use a jump table instead. A jump table is just a list of addresses. It has no op codes in it. A short routine would make sure the input is valid, then if you have at least a 65c02, double the input with ASL, transfer it to X (with TAX), and use JMP(table,X). The NMOS 6502 does not have that addressing mode, so you might have to use self-modifying code to do it. It does have JMP(addr), but not JMP(addr,X) like the 65c02 has.
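The table lookup itself is simple enough to model in Python. This only sketches the index arithmetic that the ASL, TAX and JMP (table,X) sequence performs. The function name and the handler addresses are hypothetical.

```python
def jump_target(table: bytes, selector: int) -> int:
    """Return the address a JMP (table,X) style dispatch would go to."""
    entries = len(table) // 2
    if not 0 <= selector < entries:
        raise ValueError("selector out of range")   # the validity check
    index = selector * 2                            # what ASL does before TAX
    return table[index] | (table[index + 1] << 8)   # addresses are low byte first


# Hypothetical handler addresses for selectors 0, 1 and 2.
handlers = [0x9000, 0x9040, 0x90A3]
table = b"".join(addr.to_bytes(2, "little") for addr in handlers)
print(hex(jump_target(table, 2)))
```

The table costs two bytes per case and the dispatch time is the same for every case, which is where the advantage over a chain of compares comes from.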
I offer two classes of FOR...NEXT loop here. Added May 2013 is FOR_X...NEXT_X and FOR_Y...NEXT_Y, and there's the FOR...NEXT I had earlier for 2-byte variables. FOR_X...NEXT_X and FOR_Y...NEXT_Y cover most of the scenarios you could want for looping with X or Y as the counter, just as efficiently as you would do without the macros.
Initial index values for either X or Y can be:
  * pre-existing accumulator contents (specifying "ACCUM")
    This makes FOR_X or FOR_Y assemble a TAX or TAY.
  * pre-existing X-register contents (specifying "X_REG")
    This makes FOR_X lay down no code at all (only mark the address of the top of the loop for NEXT_X to branch to); but it makes FOR_Y assemble PHX, PLY.
  * pre-existing Y-register contents (specifying "Y_REG")
    This makes FOR_Y lay down no code at all (only mark the address of the top of the loop for NEXT_Y to branch to); but it makes FOR_X assemble PHY, PLX.
  * a specified constant between 0 and $FF inclusive
    This makes FOR_X assemble an LDX# and makes FOR_Y assemble an LDY#.

You can:
  * count down one at a time (by specifying "DOWN_TO")
    This makes NEXT_X or NEXT_Y assemble DEX or DEY before the conditional branch to the top of the loop.
  * count up one at a time (by specifying "UP_TO")
    This makes NEXT_X or NEXT_Y assemble INX or INY before the conditional branch to the top of the loop.

If you want two at a time, you would have to precede the NEXT_X or NEXT_Y with an extra INX/INY/DEX/DEY. For other step sizes, you can of course alter X or Y inside the loop.

The limit (ie, target count) can be:
  * a specified constant between 0 and $FF inclusive.
    This makes NEXT_X or NEXT_Y assemble CPX# or CPY# between the INX/DEX/INY/DEY and the conditional branch instruction. If the limit is 0, the CPX #0 or CPY #0 will be skipped since it is already automatically implied in the INX or DEX.
  * the contents of a non-ZP variable above $102 (since $101 and $102 are the numerical representation for NEG_NRs and POS_NRs).
    This makes NEXT_X or NEXT_Y assemble a CPX or CPY abs between the INX/DEX/INY/DEY and the conditional branch instruction.
  * or you can specify that it loop until the index becomes negative or positive (watching bit 7) by specifying:
      UP_TO, NEG_NRs.      This makes NEXT_X or NEXT_Y assemble INX/INY, BPL <top_of_loop>.
      DOWN_TO, NEG_NRs.    This makes NEXT_X or NEXT_Y assemble DEX/DEY, BPL <top_of_loop>.
      UP_TO, POS_NRs.      This makes NEXT_X or NEXT_Y assemble INX/INY, BMI <top_of_loop>.
      DOWN_TO, POS_NRs.    This makes NEXT_X or NEXT_Y assemble DEX/DEY, BMI <top_of_loop>.
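The choices NEXT_X makes can be summarized as a small decision function. The Python below is a hedged sketch of the selection logic as described in the list above, not a transcription of STRUCMAC.ASM. The function name and the "top" label are made up. Where the text does not say which conditional branch follows a CPX, the sketch writes a generic "Bcc" instead of guessing.

```python
def next_x(direction: str, limit) -> list[str]:
    """List the instructions NEXT_X lays down, per the description above."""
    step = {"DOWN_TO": "DEX", "UP_TO": "INX"}[direction]
    if limit == "NEG_NRs":
        return [step, "BPL top"]            # loop until bit 7 of X becomes set
    if limit == "POS_NRs":
        return [step, "BMI top"]            # loop until bit 7 of X becomes clear
    if not isinstance(limit, int):
        raise ValueError(f"unrecognized limit {limit!r}")
    if limit == 0:
        return [step, "BNE top"]            # INX/DEX already sets the Z flag
    if limit <= 0xFF:
        # Constant limit.  The exact branch condition is not stated in the
        # text, so "Bcc" is only a placeholder here.
        return [step, f"CPX #${limit:02X}", "Bcc top"]
    if limit > 0x102:
        # Address of a non-ZP variable holding the limit.
        return [step, f"CPX ${limit:04X}", "Bcc top"]
    raise ValueError("values $100-$102 are not usable as limits in this sketch")


print(next_x("DOWN_TO", 0))
print(next_x("UP_TO", 0x20))
```

NEXT_Y would follow the same pattern with DEY/INY and CPY.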
The limitations of FOR_X...NEXT_X and FOR_Y...NEXT_Y are:
FOR_X...NEXT_X and FOR_Y...NEXT_Y can be nested, unlike FOR...NEXT for 2-byte variables further down which allow looping 65536 times with a single loop structure. (Nesting will be discussed in a minute.)
LEAVE_LOOP could be implemented, but the complexity is probably not justified considering the rare need. I'm leaving it out for now, and if there's a need, it can be handled in more-conventional ways, like a branch instruction to a label after the loop. Otherwise, what you could do is use an additional stack level, and have FOR_X or FOR_Y initialize it as 0. Then if there's a LEAVE_LOOP, it would store the address of its branch instruction in that stack cell, and NEXT_X or NEXT_Y would test it to see if that cell is non-0 and fill it in with a branch to the end if so. You would have to be careful not to put the LEAVE_LOOP inside another structure that might be using the macro structure stack. Also, allowing more than one LEAVE_LOOP would complicate things further. And as always, "compiler" security is up to the programmer.
The number of clock cycles taken for a loop which loads its own index (call it "N") into X or Y and decrements it to 0 is:

        2                     for loading X or Y immediate  (Omit this if you're starting with what was already there.)
      + N * loop_contents     your code in the loop, plus the 2 clocks for DEX or DEY, meaning an empty loop still has N * 2.
      + (N-1) * 3             for BNE top_of_loop.  The 3 turns to 4 if the loop straddles a page boundary.  (Usually it won't.)
      + 2                     for final BNE that does not branch.

So for:

            FOR_X  8, DOWN_TO, 0
            NEXT_X

you have 2 + 16 + 21 + 2 = 41 clocks. (The PIC16 takes 100 to do the same thing.)
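The same tally can be written as a short Python function for trying other loop counts and body lengths. It is just the formula above restated. The function and parameter names are made up for the illustration, and N is limited to 1 through 255 to keep to the "load N and count down to 0" case.

```python
def loop_clocks(n: int, body_clocks: int = 0, load_index: bool = True,
                straddles_page: bool = False) -> int:
    """Clock count for a loop that counts X or Y down from n to 0."""
    if not 1 <= n <= 255:
        raise ValueError("n must be between 1 and 255")
    taken_branch = 4 if straddles_page else 3
    total = 2 if load_index else 0       # LDX# or LDY#
    total += n * (body_clocks + 2)       # loop contents plus DEX or DEY
    total += (n - 1) * taken_branch      # BNE back to the top of the loop
    total += 2                           # final BNE that does not branch
    return total


print(loop_clocks(8))   # the empty FOR_X 8, DOWN_TO, 0 loop above: 41
```

Passing load_index=False covers the X_REG and Y_REG cases where the FOR macro lays down no load instruction.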
The above covers most of the loop situations; but what if you want a counter of more than 8 bits? Without going to the 65816 (which I would encourage you to look into anyway), it gets more complex on the 6502. The FOR...NEXT macros provided here (without the _X or _Y) hide that complexity. They use constants for the beginning index (ie, counter) value and the limit, and count by ones. If you want your program to change the index somewhere in the loop, there's nothing preventing that of course, but keep in mind that NEXT will increment it before comparing to the limit + 1 for a match. It uses a two-byte variable for the index, one that the user specifies when invoking the FOR macro. As supplied here, this 16-bit looping structure, like the CASE structure above, is not nestable with others of its type; but the need for nesting one 16-bit FOR...NEXT loop inside another would be rare. Do be aware of its limitations. Here's the form of usage:

            FOR  var1, 1, TO, 5000      ; (Loop 5,000 times.  C32 requires commas between parameters.)
               <actions>
               <actions>
               <actions>
            NEXT  var1

On the 65816, which has 16-bit index registers, doing something 5,000 times as shown above becomes as efficient as the 6502's loops are for counts under 256.
Now suppose you want structures nested.
You will. If you have assembler variables to store the addresses where the structure ending macros should fill in branch operands, and then you nest another structure inside the first structure, you can see that it would step on those variables, ie, overwrite them before you're done with them.
One way around it is to use repeats of the same structure macros with names that differ only slightly, like IF_EQ, IF_EQ., IF_EQ.., etc. (note the different numbers of dots after the ends), each one using its own assembler variables. Then you just have to make sure you use the right one. There's a better way.
Ideally the addresses would go on a stack, so you could nest structures all you want; but unfortunately assemblers don't usually let you have a variable array and provide a way to do indexing into the array so you can synthesize a stack. There's a way around it. It takes an awful lot of lines in the macros, but fortunately these extra lines do not actually lay down any machine code. The voluminous macro code is only a problem if you don't keep the macros in a separate INCLude file, or if you want to print the list code (although your assembler might let you turn the listing off and on), or if disc or memory space is limited (which is unlikely in today's PCs!)
Here's the idea, illustrated to five stack levels. (In reality you'll probably want more, to make sure you don't run out.) Here's how you would add a cell to the stack:
STK_LVL_5:  SETL  STK_LVL_4     ; SETL stands for "SET Label value" in the C32
STK_LVL_4:  SETL  STK_LVL_3     ; assembler, and you can do it as many times as
STK_LVL_3:  SETL  STK_LVL_2     ; you want for any given assembler variable,
STK_LVL_2:  SETL  STK_LVL_1     ; unlike EQU which only allows defining one time.
<now assign the desired value to STK_LVL_1 as the top-of-stack>

and to pop a level off the stack, do:
STK_LVL_1:  SETL  STK_LVL_2     ; STK_LVL_1 is always the top of the stack, regardless of depth.
STK_LVL_2:  SETL  STK_LVL_3
STK_LVL_3:  SETL  STK_LVL_4
STK_LVL_4:  SETL  STK_LVL_5

This, carried to 20 levels (increased from 16 when I added the FOR_X...NEXT_X and FOR_Y...NEXT_Y since these take three levels per structure), is what is in my INCLude files STAKPUSH.ASM and STACKPOP.ASM. STAKPOP2.ASM pops two levels off at once, and STAKPOP3.ASM pops three levels off at once. STKPUSH2.ASM and STKPUSH3.ASM push two and three cells on the stack, respectively. Repetition of the lengthy process in the list file can be avoided by turning the listing off and on for this portion, which in the C32 assembler is done by bracketing the portion with LIST "OFF" and LIST "ON".
If your assembler allows nested macros, put the stack push and pop in macros that get called by the structure macros, just to keep the structure macro source code shorter. If it does not allow nested macros, you might still be able to have the push and pop code in a separate .ASM file that you can bring in at the appropriate places with the INCL (include) directive. This is what I ended up doing here.
Unfortunately I ran into another little problem with C32, which is that if you have an INCL line in a macro, the assembler doesn't do the INCLuding until after the macro is done. To get around it, when I wanted to deepen the macro stack and add a cell, I just did TO_PUSH: SETL ___ and then put STK_LVL_1: SETL TO_PUSH at the end of STAKPUSH.ASM. (This did not affect the machine code output.)
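Put together, the workaround might look something like the following, cut down to five stack levels. This is only a sketch under assumptions: the value pushed (the current address, $) is just an example of what a structure-starting macro might want to save, and the quotes around the file name in the INCL line are a guess based on how LIST takes its argument, so check your assembler's manual.

```asm
; Inside a structure-starting macro (hypothetical excerpt):
TO_PUSH:    SETL  $                 ; Park the value that is to become the new top of stack.
            INCL  "STAKPUSH.ASM"    ; C32 does not act on this until the macro is done.

; STAKPUSH.ASM, shown here with only five levels:
            LIST  "OFF"             ; Keep all this out of the list file.
STK_LVL_5:  SETL  STK_LVL_4         ; Move every cell down one level, deepest first,
STK_LVL_4:  SETL  STK_LVL_3
STK_LVL_3:  SETL  STK_LVL_2
STK_LVL_2:  SETL  STK_LVL_1
STK_LVL_1:  SETL  TO_PUSH           ; then bring the parked value in as the top of stack.
            LIST  "ON"
```

Because the parked value sits in TO_PUSH rather than in $ or a macro parameter, it does not matter that the INCLuded lines are processed after the macro body has finished.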
BTW, I limited my file names to 8 letters since I still do a few applications in DOS, with a 132-column, 60-line monitor, point-and-click interface, and I've had up to 34 files open at once, with all kinds of windowing and tiling. This web page was also done with that, since my DOS-based text editor is far better than any I've seen for a GUI.
The top stack level is always STK_LVL_1, regardless of how many levels under it are being used, and you address it as such with no concern for how deeply nested you have your structures.
Now you can do for example:
        CMP  #14
        IF_EQ
           <actions>
           <actions>
           IF_NEG
              <actions>
              <actions>
              IF_EQ
                 <actions>
                 <actions>
              END_IF
           END_IF
           <actions>
           <actions>
        END_IF

and the structure macros will all keep to themselves and not step on each other's variables.
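For comparison, here is roughly what you would write by hand to get the same machine code as the nested example above. The labels ENDIF_1 to ENDIF_3 are hypothetical (the macros fill in the branch distances themselves and need no labels), NOP stands in for each group of <actions>, and I am assuming that IF_EQ lays down a BNE and IF_NEG a BPL, which is what their flag conditions imply.

```asm
          CMP  #14
          BNE  ENDIF_1     ; IF_EQ:   skip the body unless Z is set.
          NOP              ;   (stands in for <actions>)
          BPL  ENDIF_2     ; IF_NEG:  skip the body unless N is set.
          NOP
          BNE  ENDIF_3     ; Innermost IF_EQ.
          NOP
ENDIF_3:                   ; The END_IFs lay down no code of their own.
ENDIF_2:
          NOP              ; The <actions> after the two inner structures.
ENDIF_1:
```

Each END_IF only needs the address saved by the most recent unfinished IF_xx, which is exactly what a stack hands back: the innermost END_IF pops the innermost address, and so on outward.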
The CASE structure, if made nestable, increases the assembler macro stack operations complexity far more than the other structures do, especially since there will be different numbers of cases to handle. Fortunately, there's almost never any need to nest CASE statements! For these reasons, I decided to make the CASE structure non-nestable here. It can be nested with non-CASE statements, but one CASE statement cannot be nested inside another CASE statement. The non-nestability also goes for the FOR...NEXT macros provided here (without the _X or _Y) as indicated above.
I am writing (as limited time allows) a primer on stacks (plural-- not just the page-1 hardware stack) to be the next major feature on my website. Actually, "primer" may not be the best name for it, because although it will start out with the definition and very basics, it will get into deeper applications, including how to use stacks in the forming of program structures in assembly and compiled languages, doing the nesting as well as going further into compiler security which is discussed briefly below.
"Compiler" security (although we're talking about assemblers, not compilers)
As you might have anticipated, there could be a problem if you don't match up the structure parts correctly. For example having an ELSE_ without an IF_xx, or two ELSE_'s in a row, a BEGIN followed by END_IF, two CASE_OF's without an END_OF between them, etc.. There's a way to add what higher-level languages might call "compiler security," which would generate error messages if you mess up; but it again uses the stack, making the macros even more super-long especially with the inefficient workarounds we have to do for stack operations in the assembler.
Each structure-starting macro would put its compiler security number on the stack, then the matching words check that number on the stack to make sure it's the right one. (Again, this all happens in the assembler's processing, and has no effect on the machine code output except to help catch human errors.) In my Forth kernel, BEGIN gets a 1, IF gets a 2, DO gets a 3, CASE gets a 6, OF gets a 7, etc.. The nature of the stack makes it all nestable, so nested structures are fine, but they have to be completed before finishing up in structure levels that are further out. I have elected not to implement it here. Since assembly language requires one instruction per line, it should be easy enough to keep things straight and matched by using indentation and vertical alignment appropriately.
I am mostly using branch instructions which are limited to hops of -128 to +127 bytes. If you want longer hops, you will have to modify the code to use the JMP instruction, sometimes making it longer because of conditional branches around the JMPs. I find I don't normally exceed the branch instructions' branch distance limitation though. The structure that's likely to be longest might be a long CASE structure, and the first END_OF has to branch clear down past a lot of other cases to the END_CASE, so for that, I did use JMPs.
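As a sketch of that kind of modification, this is the usual way to get an IF_EQ-style test past the branch limit: a branch on the opposite condition hops over a JMP. The labels BODY and FAR_END are hypothetical, and NOP stands in for a body too long for a branch to clear.

```asm
; A branch-only IF_EQ would lay down   BNE FAR_END   which only reaches 127 bytes forward.
; The long version tests for the opposite condition instead:

          BEQ  BODY        ; Condition true:  hop over the JMP and into the body.
          JMP  FAR_END     ; Condition false: skip the body, however long it is.
BODY:     NOP              ; (stands in for more than 127 bytes of <actions>)
FAR_END:
```

This costs five bytes instead of two, and a macro doing it has to go back and fill in a two-byte address for the JMP rather than a one-byte branch distance.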
There is no reason you can't make other structures or modify the accompanying ones to suit your purposes. This is by no means an exhaustive list. The following macros are defined in STRUCMAC.ASM (and BTW, I have a similar set for PIC16, here):
IF_EQ            (using Z flag)          BEGIN
IF_ZERO          (using Z flag)
IF_NEQ           (using Z flag)          WHILE_EQ         (using Z flag)
IF_NOT_ZERO      (using Z flag)          WHILE_NEQ        (using Z flag)
IF_PLUS          (using N flag)          WHILE_ZERO       (using Z flag)
IF_MINUS         (using N flag)          WHILE_NOT_ZERO   (using Z flag)
IF_NEG           (using N flag)          WHILE_PLUS       (using N flag)
IF_C_SET         (using C flag)          WHILE_MINUS      (using N flag)
IF_C_CLR         (using C flag)          WHILE_NEG        (using N flag)
IF_GE            (using C flag)          WHILE_C_CLR      (using C flag)
IF_LT            (using C flag)          WHILE_C_SET      (using C flag)
IF_V_SET         (using V flag)          WHILE_GE         (using C flag)
IF_V_CLR         (using V flag)          WHILE_LT         (using C flag)
IF_FLAG_VAR      (added May 2013)        WHILE_V_CLR      (using V flag)
IF_BIT           (added May 2013)        WHILE_V_SET      (using V flag)
IF_MEM_BYTE_NEG  (added May 2013)        WHILE_BIT        (added May 2013)
IF_MEM_BYTE_POS  (added May 2013)        REPEAT
ELSE_                                    AGAIN
END_IF                                   UNTIL_EQ         (using Z flag)
CASE             (using A, X, or Y)      UNTIL_ZERO       (using Z flag)
CASE_OF          (using A, X, or Y)      UNTIL_NEQ        (using Z flag)
END_OF                                   UNTIL_NOT_ZERO   (using Z flag)
END_CASE                                 UNTIL_PLUS       (using N flag)
                                         UNTIL_MINUS      (using N flag)
                                         UNTIL_NEG        (using N flag)
FOR              (16-bit. Overwrites A)  UNTIL_C_CLR      (using C flag)
NEXT             (16-bit. Overwrites A)  UNTIL_C_SET      (using C flag)
                                         UNTIL_GE         (using C flag)
FOR_X            (added May 2013)        UNTIL_LT         (using C flag)
NEXT_X           (added May 2013)        UNTIL_V_CLR      (using V flag)
FOR_Y            (added May 2013)        UNTIL_V_SET      (using V flag)
NEXT_Y           (added May 2013)        UNTIL_BIT        (added May 2013)

In May 2013 I added a group of accessory macros I find useful:
RTS_IF_EQ         RTS if Z flag is set
RTS_IF_NEQ        RTS if Z flag is clear
RTS_IF_PLUS       RTS if N flag is clear
RTS_IF_MINUS      RTS if N flag is set
RTS_IF_FLAG_VAR   RTS on a flag variable's condition. The variable name and target condition are given in the parameter list.
RTS_IF_BIT        RTS if the specified bit of the specified byte in memory meets the target condition. These are given in the parameter list.
RTS_IF_MEM_LOC    RTS if the value in the specified memory location is positive | negative | zero | non-zero, again per the parameter list.

I supply these macros, in the form needed by the C32 assembler, in the files STRUCMAC.ASM, STAKPUSH.ASM, STKPUSH2.ASM, STKPUSH3.ASM, STACKPOP.ASM, STAKPOP2.ASM, STAKPOP3.ASM, and STAKSWAP.ASM, or, all zipped together, STRUCMAC.ZIP, (named such because I did them in DOS). To clarify the meanings of things in these accompanying .ASM files, here are the relevant rules of the C32 assembler I'm using. Your assembler will probably be similar enough that a quick search-and-replace operation will be sufficient. There's a list of free 6502/c02/816 assemblers here and here.
Labels
    C32 requires labels to start with a letter in the range of A-Z, or "_", ".", or "?". Individual characters within a label can come from the same set or 0-9. Unfortunately it's not case-sensitive. If the first character of the label is not in column 1 of the line, the label must be followed immediately by a colon. Putting a colon in after every label is a good practice anyway, as it makes searches for actual labels (as opposed to references to the label) much quicker. A label can be any length, and it can stand alone on a line.

$
    is the current program counter value. It can be used in labels or expressions for them or for operands, etc.. Some assemblers use * instead.

EQU
    EQUate. Sets an assembler constant. It cannot be changed after its first definition.

SETL
    SET Label value. It's basically a variable for the assembler itself, in that its value can be changed countless times after it is defined. It always requires a label and an expression to assign the value to the label.

DFB
    DeFine Byte(s). Lay the following bytes down in the code. If there are two or more, they are separated by commas, except of course when they are consecutive bytes in a quoted string. Some assemblers use .BYTE , or simply BYTE.

DFS
    DeFine Storage. Skip the specified number of bytes, with the beginning of the block taking on the label. Used for variables in the target system. The 2500AD assembler uses BLKB for "define a BLocK of Bytes."

ORG
    ORiGin. Force the program counter to the desired value. We have to do this a lot in the program-structure macros, to go back to earlier places and fill in addresses and forward branch distances after they become known, then go forward again ("back to the future" :) ) to pick up assembly at the end of the previously laid code. All the back-and-forth makes for varying line lengths in C32's Intel Hex output, but it's not a problem.

IF
    For conditional assembly. If the expression following it comes to something other than 0, do the following lines. If it's 0, the assembler ignores them, down to the ELSE or ENDI, or anything that evaluates to those (like the END_COMMENT macro). To keep the IF assembler directive separate from the structure macro names, I've put in conditions on the latter, referring to flags to branch on, like IF_EQ, IF_C_SET, etc..

ELSE
    Optional. If the "IF" portion was false, pick up the assembly process here. To keep the assembler directive separate from the structure macro name, I've called the latter ELSE_ (note the underscore at the end, and don't forget to put it in).

ENDI
    Regardless of the outcome of the "IF" above, assembly will definitely be back on after this point. The macro name of the end of the IF_xx structure is END_IF .

INCL
    INCLude. Bring another source-code file in at this point. You can have various files that are used as modules for many different projects, and it is not necessary to show all that code again in every one. Added flexibility might be exercised with the "IF" directive, ie, that you assemble the other file at this point if certain conditions are true. INCL can be nested; ie, an INCLuded file can have INCLude directives to bring in other files, and so on. A file can be brought in with INCL at many different points.

MACRO
    Begins a macro definition. MACRO must always follow a label. MACRO can be followed by parameter expressions, as many as you want, as long as they all fit on the one line. If there are two or more parameters, they must be separated by commas.

ENDM
    signals the end of the macro definition.

operators in expressions
    (There are others, but they're either obvious or not used here):
    ==        equal to
    !=        not equal to
    $         value in program counter. Some assemblers use the * .
    &         bitwise AND
    X >> Y    Shift X right by the number of bits specified by Y.

You can have as many parameters in the macro definition as you want, but then C32 requires that the macro call have the same number of parameters. One thing I liked about the 2500AD assembler was that the number of parameters didn't have to match, and you could say in effect, "If there's a fourth parameter, do this with it; and if there's a fifth one, do that with it..." Macro parameters can be expressions. Macro parameters, both in the macro definition and in macro calls, must be separated by commas if there are two or more. C32 does not allow the nesting of macros. Some assemblers do.
BRA: Apologies-- if you're using an NMOS 6502 which does not have the Branch-Relative-Always instruction, you will have to modify the accompanying code. Unless you're using something like the Commodore 64 whose 6510 processor never was available in CMOS, I would encourage switching to the 65c02 which has a lot of advantages over the NMOS 6502.
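If you do have to stay with the NMOS part, these are the two usual stand-ins for BRA. This is only a sketch of the instruction-level substitution, with TARGET as a hypothetical label; the macros that fill in operands after the fact would need matching changes.

```asm
; 65c02 only:
          BRA  TARGET      ; 2 bytes, relative, touches no flags.

; NMOS substitute 1:
          JMP  TARGET      ; 3 bytes, absolute address, no distance limit.

; NMOS substitute 2:
          CLV              ; Force V clear so that the BVC is always taken.
          BVC  TARGET      ; 3 bytes for the pair, still relative, but V is lost.
```

The JMP version changes the operand from a one-byte distance to a two-byte address, so any macro that goes back with ORG to fill it in has to lay down two bytes. The CLV, BVC version keeps the one-byte distance but is no good where the code following depends on the V flag.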
Structures seldom need branch distances that cannot be achieved with branch instructions, so the JMP is seldom used here. In the rare case that you think you're getting close to the maximum reach of branches, you might need to check in the list file. The forward branches in the structure macros will not give error messages if you try to branch more than 127 bytes away.
Some macro topics on the 6502.org forum:
STRUCMAC.ASM program-structure macros | STAKPUSH.ASM include file | STKPUSH2.ASM include file | STKPUSH3.ASM include file | STACKPOP.ASM include file | STAKPOP2.ASM include file | STAKPOP3.ASM include file | STAKSWAP.ASM include file | STRUCMAC.ZIP, all .ASM files zipped
Related: PIC_stru_MAC.ASM file for PIC16
last updated May 13, 2013 (New macros added May 11, 2013) contact: Garth Wilson, email@example.com