Inlining subroutine data

One method of specifying data for a subroutine is to put it in line in the program, immediately after the subroutine call.  The subroutine uses the return address on the stack to find the data, then adjusts the return address so that the RTS instruction resumes execution in the main program just past the data.  Otherwise the processor would drop into the data area immediately following the JSR, try to execute data bytes as if they were instructions, and crash.  If the data length is not always the same, the subroutine looks for a delimiter or a length byte in the data to know where the data ends and how much to add to the return address before executing the RTS.

This method of passing parameters to a subroutine would get used most in situations where the data is a constant or set of constants (including string constants), especially when there's too much to pass by way of the processor registers.  (Actually, if it's in ROM, it can only be constants, not variables.)  The data doesn't even need a label with this method, unless the data is used elsewhere also.
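
As a minimal sketch of the mechanism (not taken from any particular program), here is a hypothetical subroutine GET_INLINE that returns a single inlined byte in A.  It assumes a 65c02 and a two-byte ZP variable called temp to use as a pointer:

        JSR  GET_INLINE
        BYTE $2A          ; The one data byte.  Execution resumes after it.

GET_INLINE:
        TSX
        INC  $101,X       ; The return address points to the last byte of the
        BNE  gi1          ; JSR, so increment it to point to the data byte.
        INC  $102,X       ; (Carry into the high byte if the low byte rolled over.)
 gi1:   LDA  $101,X       ; Copy the address to ZP
        STA  temp         ; so we can use it
        LDA  $102,X       ; as an indirect pointer.
        STA  temp+1
        LDA  (temp)       ; Read the inlined byte.
        RTS               ; RTS adds 1, landing on the instruction after the data.

Since the return address has already been advanced by the length of the data (one byte here), the RTS needs no other help.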

If you've already been using the BRK instruction with its signature byte, you already know that the signature byte is a type of inlined data, and you probably know how it gets read.  Push A and X in the ISR (and possibly Y, if the ISR will use Y), then determine whether the interrupt was caused by the BRK instruction, and if so, branch to the BRK handler:

ISR:    PHA          ; (Later you may need to push Y also, with PHY, if the
        PHX          ; ISR uses Y anywhere.  NMOS will have to go through A.)
        TSX          ; Get the stack pointer into X so we can index into the stack.
        LDA  $103,X  ; Get the stacked status, and
        AND  #$10    ; see if the B bit is set (meaning a BRK instruction
        BNE  break   ; caused the interrupt).  If B bit is set, branch.

        <service hardware interrupt>


break:  LDA  $104,X  ; The return addr is at $104,X and $105,X, and the
        SEC          ; signature byte is at the addr they point to, minus 1.
        SBC  #1      ; Use SBC here, not DEA, because we want the C flag.
        STA  temp    ; Do low byte first (temp is in ZP),

        LDA  $105,X  ; then high byte.
        BCS  brk1    ; If decrementing the low byte did a borrow,
        DEA          ; then decrement the high byte too.
 brk1:  STA  temp+1

        LDA  (temp)  ; Read the signature byte
         .           ; which will determine the next actions.

A note to whet your appetite for the 65816, which is the 16-bit step up from the 65c02:  The '816, operating in native mode, has a separate BRK vector, so there's no need to test for a BRK instruction having caused the interrupt.  It also offers the option to do a 16-bit decrement (and other arithmetic and logic operations) all in a single instruction, and stack-relative addressing modes, further simplifying operations, plus a lot of other attractions.  It makes a big reduction in the number of instructions needed for the job.

There's more about BRK in the interrupts primer.

http://6502.org/source/io/primm.htm on 6502.org shows three ways to do a print-immediate subroutine with an example usage of:

        JSR   PRIMM
        BYTE  "This will be printed!", $00

You could go further and make a macro that puts it all in one line, defined as:

PRINT:  MACRO  STR        ; (Macro syntax varies from one assembler to another.)
        JSR    PRIMM
        BYTE   STR, 0
        ENDM

and then use this way in an assembly-language program:

        PRINT  "This will be printed!"

I will add another version now, to avoid the need for variables and keep everything on the stacks.  (Again, the structure macros used here, BEGIN, IF_ZERO, WHILE_NOT_ZERO, and REPEAT, are covered in the portion of this website on program-structure macros in 6502 assembly.  There's no penalty in program size or runtime speed; the macros just make the source code more clear and maintainable.)

PRIMM:  DEX               ; Put another cell on the data stack for a temporary
        DEX               ; variable that can be used for indirect addressing.
        PHX               ; Save the data stack pointer value on the return stack.
           TSX            ; These five lines here (PHX, TSX, TXA, TAY, PLX) are
           TXA            ; similar to a non-existent TSY.
           TAY            ; We'll index into the hardware stack using Y, not X.
        PLX               ; X will be used to index into the ZP data stack.

        LDA  $102,Y       ; Transfer the low byte of the return address
        STA  0,X          ; to the data stack in ZP,
        LDA  $103,Y       ; then do the same with the high byte.
        STA  1,X          ; (It's $102,Y and $103,Y instead of $101,Y and $102,Y
                          ; because the TSX above is before the PLX.)
        BEGIN             ; (BEGIN just tells the assembler where the loop top is.)
           INC  0,X       ; Increment the low byte of the return address, to the
           IF_ZERO        ; first byte of the data.  If that made it roll over to 0,
              INC  1,X    ; increment the high byte also.
           END_IF
           LDA  (0,X)     ; Get the next data byte.
        WHILE_NOT_ZERO    ; Cont loop while LDA reads non-0 bytes, else branch past REPEAT.
           JSR  CHAROUT   ; (CHAROUT must leave X and Y intact.)
        REPEAT            ; Jump to top of loop (to the BEGIN).

        LDA  0,X          ; When we've found the 00 byte, transfer the address back
        STA  $102,Y       ; to the return address space on the hardware stack.  RTS
        LDA  1,X          ; automatically increments it to the first instruction
        STA  $103,Y       ; following the data.

        INX               ; (See notes below about the INX's and DEX's.)
        INX
        RTS

That's 43 bytes, 5 bytes less than Lee Davison's version, 10 bytes more than Ross Archer's version, but does not require any permanent variable space at all, ZP or otherwise.  You might choose to keep a pair of ZP bytes as a virtual processor register to use as an indirect pointer as was done at the linked page on 6502.org, and any subroutines or ISRs that might use it will have to save this virtual register on the stack and later restore it.  It's your choice.  This is just another way you can do it.  Things you should consider when choosing a method would include (but not be limited to) ZP space available for variables, plus code length and bugs.  Slightly longer code length is usually justifiable to reduce the number of variables (especially ZP ones) and bugs.  There's a cute cartoon on page 227 of the .pdf of the book "Thinking Forth" (available here) about accidents caused by too many variables.  I'm sure copyright rules forbid me to post it here.

If neither interrupt service nor CHAROUT will overwrite the two unclaimed ZP bytes just beyond the top of the data stack, you could eliminate the two DEX and two INX instructions (using $FE,X and $FF,X and ($FE,X) for the ZP instructions, since they wrap to stay in ZP), and save four bytes and 8 clocks.  Note: The 65816 has a stack-relative indirect indexed (sr,S),Y addressing mode that would make this more efficient.  We'll get to that in section 13.  (It also has a TXY so we don't have to do TXA, TAY, and it can handle 16-bit quantities in one gulp.)  Many of the '816's extra instructions can even be used in 6502-emulation mode.

Another way to save some bytes would be to use Ross Archer's version but replace his DPL and DPH ZP variables with N and N+1 (or other bytes in N), N being the virtual-registers ZP scratchpad space of usually about 8 bytes mentioned about 75% of the way down the page in the last section.  You just have to make sure that CHAROUT doesn't use the same bytes in N, or you're back to pushing it before calling CHAROUT and pulling it later.  Again the normal rules for N are that it is really only for use internal to a subroutine, and the subroutine must be completely finished with N when it exits, and it cannot call another subroutine that might use N while N's contents are still needed.

The above examples always used a null byte to mean the end of the data.  In strings, I have more frequently used counted strings rather than null-terminated strings.  In a counted string, the first byte tells the length, so you know up front rather than having to look for a 00 byte.  The following example is a modification of my first subroutine using this method, in 1987.  LCDDISP (called in the middle of DISPIMMEDIATE) was a subroutine used also in many places where the data was not inlined, for strings used repeatedly.  It requires ADH to be in A and ADL in Y, and that the first byte of the data field be the length.

                      ; The 1st byte after the JSR instruction tells
DISPIMMEDIATE:        ; the string's length.  There is no delimiter.
       TSX            ; The addr needs to be inc'ed from the last
       LDA  $101,X    ; byte of the JSR to the first byte of data.
       TAY            ; Put low byte in Y for LCDDISP,
       LDA  $102,X    ; and high byte in A.
       INY            ; INY _after_ the LDA so we can branch on Z.
       BNE  d1        ; If inc'ing the low byte made it roll over,
       INC  A         ; increment high byte too.

 d1:   JSR  LCDDISP   ; LCDDISP ends with the string length in A.
                      ; (It must also leave X intact.)
       CLC            ; Needed unless LCDDISP always returns with C clear.
       ADC  $101,X    ; Add content of 1st (length) byte to return addr.
       BCC  d2        ; If that made the low byte roll over to 00,
       INC  $102,X    ; then increment the high byte too.
 d2:   INC  A         ; Now add 1 for the length byte itself.
       STA  $101,X
       BNE  d4        ; If that made it (the low byte) roll over to 00,
       INC  $102,X    ; increment the high byte of the return addr too.

 d4:   RTS

This would be suitable for non-text, variable-length data also, where you might want null bytes in the sequence before reaching the end.
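
For illustration, a call with a five-character string might look like the following, assuming the assembler's BYTE directive accepts a mix of numbers and quoted strings as it does in the PRIMM usage further up:

        JSR   DISPIMMEDIATE
        BYTE  5, "Hello"      ; Length byte first, then the text.  No delimiter.
        <continue execution>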

Another possibility is to write a subroutine that expects the calling program to have the two address bytes of the string (rather than the string itself) immediately following the JSR instruction, if the string is used elsewhere too.  The address itself will of course always be two bytes in length.
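
A sketch of that idea follows, using the hypothetical name DISPADR and a hypothetical string label greeting.  It assumes a 65c02 (for PLY), the same LCDDISP as above (ADH in A, ADL in Y), a DWL-type directive that lays down a 16-bit word low byte first, and the first two bytes of the ZP scratchpad N as an indirect pointer.  None of it comes from actual project code:

        JSR  DISPADR
        DWL  greeting     ; Address of the counted string, low byte first.

DISPADR:
        TSX
        CLC
        LDA  $101,X       ; Copy the return address to N for use as a pointer,
        STA  N            ; and at the same time add 2 to the copy on the stack
        ADC  #2           ; so RTS will return past the two address bytes.
        STA  $101,X
        LDA  $102,X
        STA  N+1
        ADC  #0
        STA  $102,X

        LDY  #1           ; N points to the last byte of the JSR, so the
        LDA  (N),Y        ; inlined address is at offsets 1 and 2.
        PHA               ; Save ADL for a moment.
        INY
        LDA  (N),Y        ; ADH goes in A,
        PLY               ; and ADL in Y.
        JMP  LCDDISP      ; LCDDISP's RTS returns to the caller, past the data.

DISPADR is finished with N before it jumps to LCDDISP, so the normal rules for N are kept.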

Whichever way you do it, strings displayed this way can be a template that gets modified after being put in the display.  In a recent project, I had the string to put in a 2x16 LCD,

      Treat#___, Rx#_,
       _:__  AGL=10

The underscores and the 10 got replaced with other numerals immediately after the string was initially displayed.  I used a carriage-return character embedded in the string in the source code to start the 2nd line.  "Treat" here is short for "treatment," and "Rx" for "prescription".

There will be times that the line will have an area for the user to edit, whether or not you supply default characters, and you control which cursor positions the user can edit, for example:

      Enter SS number:
        ___-__-____

and the program only lets them type where the underscores are, and the cursor skips over the hyphens.  The program can also ignore characters that are not numerals (but accept <Enter>, <Esc>, cursor keys, etc.).

Caveat:  Even if you want the data to be right there with the subroutine call (to make it easier to bundle in a macro for example), it will in many situations be more efficient to pass the data's address by way of the registers, then have an unconditional branch around the data.  This way, the subroutine does not have to take so much execution time finding the data and incrementing the return address.  You might have for example:

           LDA  #>data_adr   ; Get high byte of data_adr.
           LDY  #<data_adr   ; Get  low byte of data_adr.
           JSR  do_stuff
           BRA  data_end
data_adr:  <data>
data_end:  <continue execution>

and put it in a macro so it all goes on one intuitive line as shown further up.  (Other parameters can still be passed simultaneously via the stacks, as already discussed.)  You might also be able to put the return address on the stack first so as to avoid requiring the subroutine to increment it past the data; and then use JMP rather than JSR to call the subroutine.
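
The last idea there, pushing the return address yourself, might be sketched like this.  The expression syntax for taking the high and low bytes of data_end-1 is an assumption and varies between assemblers:

           LDA  #>(data_end-1)  ; RTS adds 1 to the address it pulls, so push
           PHA                  ; the target address minus 1, high byte first
           LDA  #<(data_end-1)  ; and then low byte, just as JSR would.
           PHA
           LDA  #>data_adr      ; Get high byte of data_adr.
           LDY  #<data_adr      ; Get  low byte of data_adr.
           JMP  do_stuff        ; do_stuff's RTS returns to data_end.
data_adr:  <data>
data_end:  <continue execution>

That is six bytes of pushes in place of the two-byte BRA.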

These methods will take more program memory where the subroutine is called, but they make the subroutine itself shorter and faster, and you can still avoid unnecessary variables and eliminate a source of bugs.  Which way turns out best for the job will depend on things like how many times you'll call the subroutine, memory limitations, and speed requirements.  With that said, we'll continue here, just to show the possibilities, to give you tools for the occasions where they might be appropriate.

If the data is always guaranteed to be the same length, then there's no need for a null byte or a count byte.  Here's an example of just pushing a 16-bit number on the ZP data stack:

          JSR  PUSH_LIT
          DWL  $1A60     ; C32 assembler sees DWL as "Define Word, Low byte first."

and you might put it in a macro LITERAL and use it like this:

          LITERAL  $1A60

and PUSH_LIT might be defined like this  (Note the double indirection!  :) )

PUSH_LIT:
        DEX           ; Make another cell available
        DEX           ; on the ZP data stack.
        PHX           ; Keep a record of the data-
           TSX        ; stack pointer while we get
           TXA        ; the return-stack pointer
           TAY        ; into Y.
        PLX           ; Restore the data-stack pointer.

        CLC
        LDA  $102,Y   ; Transfer low byte of return
        STA  0,X      ; address to data stack,
        ADC  #2       ; and add 2 to it to get the
        STA  $102,Y   ; updated return address.

        LDA  $103,Y   ; Now transfer the high addr
        STA  1,X      ; byte to the data stack, and
        ADC  #0       ; inc it for the return addr
        STA  $103,Y   ; if low byte rolled over.

        INC  0,X      ; The copy on the data stack
        BNE  pl1      ; points to the last byte of
        INC  1,X      ; the JSR, so inc it to point
                      ; to the data itself.
 pl1:   JMP  FETCH    ; (60% of the way down the page in section 4)
                      ; (Or just put FETCH next, and forgo the JMP.)

It's not efficient on the 6502, but if you use it enough times in a program, it will pay for itself in program memory, and the speed may not be much of an issue if it's part of a process that involves 16- and 32-bit multiplications and divisions which will take hundreds of clock cycles each.  If you do need the speed, you can revert to LDA#, DEX, STA 0,X twice (once for high byte, once for low byte) which would perform better (at 16 clocks total) but with a 5-byte penalty for each occurrence.
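
Written out for the same $1A60 literal, that faster in-line sequence would be something like the following (assuming, as in the code above, that the data stack grows downward in ZP with the low byte of the top cell at 0,X and the high byte at 1,X):

        LDA  #$1A     ; High byte first, since it goes at the higher address.
        DEX
        STA  0,X
        LDA  #$60     ; Then the low byte,
        DEX           ; which ends up at 0,X
        STA  0,X      ; with the high byte now at 1,X.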

On the 65816, the whole thing, including the FETCH, is a lot shorter; but there you might as well take the two-byte hit and just do LDA#, DEX, DEX, STA 0,X with A in 16-bit mode.  It will take seven bytes and 12 clocks.  On a related note:  To just put a literal on the 65816's hardware stack, all it takes is:

        PEA  $1A60

More on that in section 13, on synthesizing '816 special stack instructions on the 6502.

Similarly, if you had a floating-point stack of four-byte cells, you could use a subroutine with inlined data to put floating-point literals on the stack.  The number 5280 (the number of feet in a mile), according to this IEEE floating-point conversion page (thanks, Rob Finch!), is 45A50000 in a single-precision IEEE float in hex, so when you need to put it on the stack, you would have:

        DFB  $00, $00, $A5, $45    ; fp for 5280, low byte first

To put it in a macro that would do the conversion and assemble the instruction and the data and hide the details, you might have something like:

        FLOAT  5280, "E0"

Depending on the assembler and its macro capabilities, it might be a pretty lengthy macro to do the conversion.  Otherwise, do the conversion beforehand and just put in the comments what it is.

last updated Apr 2, 2021