The basis for the separate data stack and its use were laid out in section 4, along with the term "Reverse Polish Notation," or "RPN," from the 1920's parentheses-free notation invention of Polish mathematician Jan Łukasiewicz and the fact that RPN puts the numbers on the stack first and then tells what to do with them, instead of vice-versa. RPN was used starting in the early 1960's to make computers more efficient. Philip J. Koopman, Jr.'s book "Stack Computers: The New Wave" came out in 1989, and is available to read online.
Is it better than algebraic notation? It may initially seem less intuitive, and it requires a short time to get the hang of it; but many users then come to prefer it for most things in programming. I'm one of them. My first programmable calculator, a TI-58c in 1981, used algebraic, and next was the HP-41 which used RPN. The algebraic required more keystrokes, but for a short time I could have gone either way. As I gained more experience with RPN, particularly in situations where I was using the calculator/hand-held computer to control lab equipment and take data, I came to prefer RPN. In equations, algebraic shows more directly what you get, while RPN shows how you get there. When I want both, I do the programming in RPN and then put the algebraic notation in the comments.
The argument in school calculators is "Algebraic lets you enter the equation just as you see it on paper!" Well, for one thing, a lot of programming work falls outside of actual equations. For another, even in the field of calculators, in real life we don't usually have an equation in front of us. I think through it, "Let's see—I need A, and then square that; now take B, multiply it by C and add that result to the earlier one. Now I need D raised to the power of E, and divide the earlier result by that. Done. Oops, no, I still need to take the log of that..." That's the way much of real engineering is. It's only in school that you get canned problems and have the equation in the book and you're supposed to just drop the numbers in the chute and turn the crank—then we wonder why we get graduates who passed all their classes and yet show a disconnect between that knowledge and any understanding of what the problem is in their circuit on the workbench! I've hired a lot of electronics technicians and a few engineers, and I gave all the applicants a circuit-analysis test with a dozen simple problems. Not one of them ever got it all correct. It was kind of frustrating.
It's not just about math of course. Take the analogous example from baseball, "Throw the ball to the second-baseman." In
English, this sentence is expressed in algebraic order (although that's not true of all spoken languages.) It starts with what to do,
which is to throw. Throw what? You're told to throw, but you haven't yet been told what to throw. Even if you
did know, you haven't been told what direction yet. In RPN, "ball" would come first, then the direction, and finally
after the needed parameters are in place, the instruction is given. So you have:
ball direction throw
which makes a lot more sense in most programming situations, and in fact it's also much easier for compilers to handle. "Ball" is put
on the data stack, followed by the direction, and finally there's a command to handle them. You might have the following to type (ie,
display or print) a string for example:
string_address string_length type
The inputs on the data stack are usually consumed by the command (so make a DUPlicate first if you'll need it again later). Many commands
will return one or more results on the data stack when they're finished.
In this section, we will further develop some functions and show how to use them. Source code for hundreds
of functions, dealing with two-byte cells on the data stack, is in Appendix A. (Some will appear to be quite inefficient in the number of instructions
required. It's not because of RPN, but because the 6502 is not so efficient at handling 16-bit quantities which we will mostly be
dealing with. The 65816 is far better at that.) A few of those functions will be repeated here with extra explanation;
otherwise, we want to cover more application-oriented examples here with more explanation of the whys, to help become more fluent in
RPN. I try to keep them short to avoid getting bogged down.
If you have an application where you only need single-byte cells on the data stack, many of the stack functions in Appendix A might become short enough to appropriately be made into short macros instead of subroutines. This would be a rather limited situation since you can't have full addresses in a single byte, but I have definitely had such occasions in the past, needing to handle a lot of boolean inputs and functions with intermediate results that had to be left pending for further handling later.
The subroutines below will serve as examples, with explanation, of various RPN operations. One thing that keeps coming up is the need to put a 16-bit literal on the data stack, so we make a macro, called LITERAL, which was introduced in the last section and just lays down the JSR PUSH_LIT instruction followed by the 16-bit number to put on the stack. (Several commonly used literals might be used often enough to justify making them their own subroutines so it's not necessary to put data after the JSR. These might include, but not necessarily be limited to, -1, 0, 1, 2, 3, 4, $20 (ASCII space), and $8000, plus many others that are specific to the application.) A list of basic stack operations was introduced briefly about 70% of the way down the page in section 4. Note also the stack-effect comments at the first line of each subroutine below, in the parentheses, with the dash, introduced about 80% of the way down the page in section 4. Again, the part to the left of the "--" is what the subroutine needs upon entry, and the part to the right of the "--" is what it leaves on the stack after the subroutine is executed. In both cases, the TOS (top-of-stack cell) is toward the right end of the set.
Let's look at an example to convert degrees Fahrenheit to degrees centigrade, scaled by ten so we have
0.1° resolution. °C=5/9(°F-32). We'll introduce a subroutine
STAR_SLASH. MINUS was mentioned earlier, and is self-explanatory:
DegFtoC: ; ( °F*10 -- °C*10 )
LITERAL 320 ; Put 32.0 (in decimal) on the
JSR MINUS ; data stack, and subtract.
LITERAL 5 ; Below, we'll multiply by 5
LITERAL 9 ; and divide by 9. Remember
JMP STAR_SLASH ; (JSR, RTS) */ has a 32-
;------------------ ; bit intermediate result.
STAR_SLASH (ie, */) has a stack effect of ( n1 n2 n3 -- n1*n2/n3 ),
with a double-precision (ie, 32-bit) intermediate result (to avoid overflow) after the multiplication and before the division. If
you wanted to save a few bytes, you could put the 5 and 9 on the stack all at once as a double (ie, 32-bit) number instead,
with D_LITERAL. Since a double has the high byte on top, 9 looks like the high byte of a double, so
you would enter $90005, or 589,829 in decimal. (Be careful when doing that with one or both numbers being negative though, to keep
the signs right.) Then DegFtoC will look like this:
DegFtoC: LITERAL 320 ; Put 32.0 (in decimal) on the data
JSR MINUS ; stack, and subtract.
D_LITERAL $90005 ; Like LITERAL 5 followed by LITERAL 9.
JMP STAR_SLASH ; Multiply by 5 and divide by 9.
;------------------
: U*/_RND ( U1 U2 U3 -- U1*U2/U3 ) \ Fractional part of answer is rounded in, not truncated. >R UM* R@ -1 SHIFT 0 D+ R> UM/MOD NIP ;whose assembly-language equivalent is, more or less:
U_STAR_SLASH_RND: ; ( U1 U2 U3 -- U1*U2/U3 ) Fractional part of answer is rounded in, not truncated. JSR to_R ; Transfer the number to divide by (ie, U3) from data stack to return stack. JSR UM_STAR ; Multiply U1 by U2, getting a double-precision (32-bit) answer. JSR Rfetch ; Get a copy of U3 from the return stack and put it on the data stack. JSR NEG1 ; Put -1 on the data stack because we want to logical shift right (not left, JSR SHIFT ; hence -1, not 1) 1 bit, ie, to get half of the 16-bit divisor's value. JSR ZERO ; Make it double-precision (32-bit) with a high cell of 0 put on the stack. JSR DPLUS ; Add those two double-precision numbers. JSR Rfrom ; Now get U3 off the return stack and put it back on the data stack. JSR UM_SLASH_MOD ; Do the standard UM/MOD, getting 16-bit quotient and 16-bit remainder, JMP NIP ; and get rid of the remainder which ended up just under the quotient. ;------------------We'll get to the to_R, UM_STAR, Rfetch, and Rfrom further down. There's also a 2/ or _2SLASH (to divide by two) which looks simpler than -1 SHIFT, but it's for signed numbers; so for example 35,129 (ie, $8939, or 1000100100111001B), which in this case we want to come out to 17,564 ($449C, or 0100010010011100B) instead gets interpreted as -30,407, and 2/ turns it into -15,204 ($C49C or 1100010010011100B) so the negative sign is preserved.
The way to make it round the fractional part of the answer instead of truncating it is to add half the divisor's value to the numerator right before dividing. If you calculate with guard digits and then reduce the number of digits for display, you will normally want to do this trick again when converting for display. <end side note>
Again, the same idea can be carried further in scaled-integer applications for work that is often thought to be floating-point
territory. I had situations in the automated test
equipment (ATE) where for example a 12-bit A/D converter produced a 4096-count output for a 0-5V input (after signal conditioning and
ranging), meaning each count is 1.221mV. You can adjust the output this way:
LITERAL 1221 ; (in decimal, not hex)
LITERAL 1000
JSR STAR_SLASH ; Mult by 1221 and div by 1000 (w/ 32-bit intermediate result).
or shorten the literals with a double again, converting to hex to be able to separate the two 16-bit numbers visually into $4C5 and $3E8:
D_LITERAL $3E804C5 ; Like LITERAL 1221 followed by LITERAL 1000.
JSR STAR_SLASH ; (Or use JMP if you make this a subroutine.)
(or you could just leave it in decimal, as 65,537,221, or 1000*65,536 + 1221.) Or, why not make another macro that does the
same thing but makes it more clear:
RATIONAL 1221, 1000 ; Numerator and denominator, to
JSR STAR_SLASH ; multiply by 1221/1000.
or even:
MULT_BY_RATIONAL 1221, 1000
Rationals can give good, unexpected accuracy with few digits. For example, you can represent pi as 355/113, which is accurate to
about 8 digits. You can represent √2 as 19601/13860, which is accurate to about nine digits, even though both numbers still fit
comfortably in 16-bit cells, even as signed numbers. I have a list of commonly used rational numbers, along with their accuracies
and how I calculated them, here.
Note: You can avoid the division by using more bits in the multiplication and multiplying by a number that will give the 16-bit answer in the high cell of a double-precision (ie, 32-bit) number instead of the low cell. Then you either dump the low cell, or round it into the high cell. So to multiply by the 1.221 above, you could multiply by 1.221*65,536 (which is 80,019, or $1:3893), and use JSR NIP to get rid of the low cell of the result.
The entire ATE project was done without floating-point, including logarithms. But to do the above in floating-point, you
might have something like:
FLOAT 1.221, "E0"
JSR FP_MULT
Now suppose you want a random number generator. Something that helps make it random is to use the
free-running timer/counter in a 6522 VIA or similar I/O IC, since you don't know what its value will be when you read it at more-or-less random
times. VIA1T1CL below stands for VIA1 (ie, the first 6522 VIA chip), Timer 1, Counter Low byte. If
you read a 16-bit value onto the data stack, you'll get the high byte too, ie, VIA1T1CH. Read the byte
pair and do some scrambling by multiplying by $7ABD (which will cause overflow and we don't care about losing those high bits, since we're
after random numbers anyway) and adding $1B0F (which may cause addition overflow, and again, we don't care). The result will be
MOD(T1CL*$7ABD+$1B0F, $10000).
This function by itself is not very good for randomizing if you were to input 1, 2, 3, 4, etc.; but keep in mind that even the multiply
takes differing numbers of cycles based on the inputs, so T1's count becomes more random. It definitely won't be multiples of some
number, even if you do a loop to generate an array of random numbers. You also don't know what T1's starting value will be.
; ( -- u ) Produces a random number, range 0-$FFFF.
RANDOM: JSR _VIA1T1CL ; Get address of the counter byte pair on the stack.
JSR FETCH ; Get the contents of the counter. It's ok if T1CH
LITERAL $7ABD ; increments after you read T1CL. Put literal on
JSR STAR ; the ZP data stack and multiply. Overflow is ok.
LITERAL $1B0F ; Now put another literal on the data stack, and
JMP PLUS ; (JSR, RTS) add. Again, overflow is ok.
;------------------
You could of course optionally go further with the scrambling and with saving one or both bytes or portions of past outputs to involve in
the formation of future ones.
Now when you want random numbers in a certain range, you start with the unsigned maximum number on the stack, and
call CHOOSE (below). A subroutine mentioned above is UM_STAR ("you-emm-star,"
for "unsigned, mixed-precision multiply," "mixed precision" meaning that it takes in two 16-bit numbers and gives 32-bit result). Another
is NIP which removes 2OS, ie, the second-on-stack cell, meaning the one under TOS, the top-of-stack
cell. NIP does the same thing as JSR SWAP followed by
INX INX (ie, what the DROP macro lays down), but is more efficient.
; ( u1 -- u2 ) Produce random nr, range 0 to (u1-1).
CHOOSE: JSR RANDOM ; Get the whole-range random number from above.
JSR UM_STAR ; Multiply by your input number to scale the range,
JMP NIP ; (JSR, RTS) giving 32-bit output. Dump the low 16.
;------------------
The reason for NIP instead of DROP here is that CHOOSE's
input number is scaled by $10000, meaning $FFFF multiplies RANDOM's output by just a hair less than 1; so for
example if you wanted a range of 0-$400 (minus 1, so 0-$3FF inclusive), and you enter $400, the output will have a maximum of $FFFF*$400, or
$3FFFC00. DROP would take off the high byte and leave you with $FC00 which is not what you
want. NIP OTOH takes off the low cell and leaves you with $3FF.
Let's take an iterative process: Square root. For the square root subroutine below, maximum
input is $3FFFFFFE (half the maximum positive number for 32 bits, minus 1), or 1,073,741,822 in decimal. If you enter a higher number,
it returns the root of this number, which is 32767d. The method used is r=(x+r²)/2r, repeated until you converge on an
answer. The first guess is $7FFF. Lower first guesses can get by on fewer iterations on low input numbers, but can cause
overflow errors with high input numbers. First I used a BEGIN...UNTIL
structure to watch for when you get the same answer twice in
a row and exit; but if the input number is a perfect square minus 1, like 99, the odd- and even-numbered iterations never agreed, and
looping continued. The short-code solution was to always loop enough to finish any root (√0 took 15 iterations). The
number of iterations has to be odd, because on the square-minus-1 problem, an even number of iterations gave the approximate root minus
1, which means √0 gave -1. This method is not lightning fast, but I went for short code.
A subroutine not introduced until now is DU_MIN which takes two double-precision (ie, 32-bit) unsigned
numbers on the stack and deletes the higher one, keeping only the lower one. Another is PICK, which looks
back in the stack to the depth specified, and puts a duplicate of that cell at the top. 0 PICK is the
same as DUP, and 1 PICK is the same as OVER.
Another is _2STAR (2*) which just uses shifting to quickly multiply the cell by
two. The last one is UM_SLASH (UM/) which takes in an unsigned double number
and an unsigned single, and outputs an unsigned single quotient, discarding the remainder. The stack effect is
( ud u -- u_quot ). MINUS_ROT does exactly the opposite
of ROT, so its stack effect is
( a b c -- c a b ). The "^" below starts a comment on what's on
the data stack after that line has executed.
; ( ud -- u )
SQRT: D_LITERAL $3FFFFFFE ; Between $3FFFFFFE and the 32-bit input
JSR DU_MIN ; number, take the lower of the two.
LITERAL $7FFF ; Put initial estimate of $7FFF on stack.
; Use Y to loop 15x. It drops thru when
FOR_Y $0F, DOWN_TO, 0 ; DEY in NEXT_Y makes Y become 0. ^ D n1
JSR DUP ; To square the estimate, just put two
JSR DUP ; duplicates of it on the stack, and
JSR UM_STAR ; multiply them, getting a 32-bit result.
LITERAL 4 ; 4 PICK twice reaches back into the stack
JSR PICK ; and gets a full copy of the two cells of
LITERAL 4 ; the double number input to put at the
JSR PICK ; top of the stack.
JSR D_PLUS ; Add it. Now you have: ^ D n1 (x+r²)
JSR ROT ; ROT changes that to: ^ D (x+r²) n1
JSR _2STAR ; Now double the n1 (the new estimate)
JSR UM_SLASH ; to divide the (x+r²) by. ^ D n2
NEXT_Y ; DEY, BNE. Loop until Y becomes 0.
JSR MINUS_ROT ; ^ n2 D
_2DROP ; This is a macro in StackOps.ASM. ^ n2
RTS
;------------------
The stack-effect comment on the JSR ROT line above makes it look like we just did a
SWAP; but (x+r²) is a double, ie, 32 bits, which is why n1 became the TOS after the ROT.
I'm no algorithms expert, so there might be better ways to get a square root. I got this method from a magazine, probably Embedded Systems Programming. Lee Davison (RIP) has one here which is faster, but longer, and only handles 16-bit inputs (giving an 8-bit output), and requires variables (unlike the subroutine above).
The same article I got the above method from also had a very simple, short one that requires no multiplication or division, but is extreeeeeemely slow. There will be situations where that's ok though. If there were any reason to have a square-root function in a sprinkler controller or thermostat, I'm sure the slowness would be no problem. I tell about it, with an assembly-language program suitable for 15-bit inputs, here. It uses variables, but could be modified to use the data stack instead.
String manipulation can be handled in RPN too. The innards may get a bit tricky to follow, but we still
eliminate the need for variables; and once the components (SWAP, CMOVE, etc.) are
in place, the string subroutines will be shortened. Below are two subroutine samples, STR_STO (string store)
and STR_ADD (to add one string to another), for counted strings, meaning that instead of using a null-byte
delimiter, the first byte tells the length of the string. With a minor change, they could be made to handle strings with two
bytes for telling the length, so strings could be thousands of bytes long, with very little added complexity.
If you have only the address of the string, COUNT takes that as an input and returns two cells: the address
of the first data byte, and the length of the string; and it can be defined this way:
COUNT: LDA (0,X) ; ( addr -- addr+1 len ) Get the length.
INC 0,X ; Increment the low addr byte, to get to the start of the data.
BNE c4 ; If that made it roll over to 00,
INC 1,X ; then increment the high byte too. ^ addr+1 (Length is still in A.)
c4: JMP C_PUSH ; (JSR, RTS) Push the length byte as a new TOS cell w/ high byte=0.
;------------------
Subroutines used below that have not been introduced until now are C_STO (8-bit character
store), INCR (increment the data the TOS points to by 1), DECR (decrement
the data the TOS points to by 1), and _2PICK which is just LITERAL 2
and PICK merged since it gets enough use to justify that. (Again, PICK
makes a copy of the data-stack cell that's at the specified depth, put puts that copy at the top of the stack, replacing the
depth specifier.) And again, the "^" below starts a comment on what's on the data stack after
that line (or the last line of actual code) has executed.
STR_STO: ; S! ( str_addr length dest_addr -- ) Store counted string.
JSR _2DUP ; DUPlicate the length & the destination address for C_STO, and
JSR C_STO ; store the length byte (not word) at the desired destination.
JSR INCR ; Go dest_adr+1, as text will start at the byte after the length.
JSR SWAP ; ^ str_source_adr dest_adr len
JMP CMOVE ; (JSR, RTS) ^ <empty>
;------------------
; S+ Copy the 2nd counted string to the end of the 1st.
STR_ADD: ; ( addr1 length1 addr2 length2 -- addr1 length1+length2 )
JSR SWAP ; Get addr2 on TOS. ^ adr1 len1 len2 adr2
JSR _2OVER ; Make a copy of adr1 & len1 ^ adr1 len1 len2 adr2 adr1 len1
JSR PLUS ; and add these two to get addr where to move the 2nd string to.
; ^ adr1 len1 len2 adr2 adr1+len1
JSR _2PICK ; Copy length2. ^ adr1 len1 len2 adr2 adr1+len1 len2
JSR CMOVE ; Concatenate strings. CMOVE's from,to,count inputs are the
; last 3 param.s shown in the previous line. ^ adr1 len1 len2
JSR PLUS ; Add the lengths of the two strings to get total length.
JSR DUP ; ^ adr1 tot_len tot_len
JSR _2PICK ; ^ adr1 tot_len tot_len adr1
JSR DECR ; ^ adr1 tot_len tot_len adr1-1
JMP C_STO ; (JSR, RTS) Store new length of string 1. ^ adr1 len1+len2
;------------------
Note that the strings themselves do not go on either stack. Only the address and often the count does. This is normal for longer
pieces of data. Some other languages (notoriously Pascal) put very large pieces of data on the stack, even whole procedures; but for
those large sets, it is usually better to only put their addresses on the stack.
Appendices A and B * * * * * have various subroutines for handling strings, including forming formatted number output (called "pictured numeric output") in strings. The number is turned into text in a string variable called the "pad," and then you can add it into other strings you are forming. A nice thing about the way we'll do it is that it's just as easy to do it in one base as another; in fact, the same routines are used, unmodified, and a variable called BASE (which, not surprisingly, holds the base of the numbers to read from text, or form into a text string) makes the difference. You can even change the base between digits that are converted, which is practical for things like outputting a time string, like "12:41:23", where the seconds' and minutes' units are in base ten, but the tens are in base 6; so if you're converting a number of seconds, 45,683 or $B273 in this case, the representation in the variable is in hex but you convert a digit at a time (right to left) in base ten, then six, then ten again, the six again, the ten two more times, adding the "hold" characters (":" in this case) at the appropriate times.
How to do the conversion: We will do this all in hex, regardless of the desired output base. We will not use the 6502's decimal mode. Take your input number and divide by BASE. The integer part of the quotient goes back where the input number was, and the remainder, converted to ASCII, becomes the right-most digit. Now do it again, even changing BASE first (like going from 10 to 6 above) if desired. The remainder becomes the next digit, moving left. In our "12:41:23" example above, we would add the ":" after the seconds' digit (using subroutine HOLD), before continuing on to do the minutes' digit, and again between the minutes' pair and the hours' pair. If a sign in involved, you first push the sign on the return stack with toR, then take the absolute value of the number and convert it, then pull the sign back with Rfrom and add it to the left end of the pictured numeric output when you're done.
If you wanted to convert the degrees output near the top of this section, you would convert the first digit, then insert the radix ("." in the U.S., and usually "," in Europe) using HOLD, then continue until the number you're dividing becomes zero. If you get more than three digits to the left of the decimal point and want to separate groups of three, you can insert the separator (we use the comma in the U.S.), again using HOLD. HOLD takes an argument from the top of the data stack, the ASCII value of whatever you want in insert. It could be / , . : ' or anything else.
I've thought about using a string stack, but my own need for it has been too small to justify it. It seems best to just have two or three string variables to form strings in. Strings can of course be any length; so if you make a string stack, being able to find the next string on the stack requires you to either make every cell to be the maximum length, or examine the string-length byte (in the case of counted strings) of each string or the end-of-string byte (in the case of null-terminated strings). You would be more likely to do stack operations in place with strings than with numbers, since you can't pull whole strings into processor registers to concatenate, compare, search for the occurrence of one in the other, etc..
For more on forming numeric output strings, see chapter 7 of "Starting Forth," especially from about the middle of the page on down. (We're not really doing Forth here, but the trail has already been blazed in good stack operations, and we can do them in assembly language.) There's more on interpreting numbers in text input in chapter 10 of the same online book.
If you've been experimenting with ideas, you might have found that there's sometimes a need to get something at the TOS
out of the way temporarily to make stack manipulation easier; but you don't want to use a variable. So what do you do? You can
use the other stack—the return (ie, hardware) stack. toR removes a cell from the top of the data stack
and transfers it to the return stack. Rfrom goes the other direction, removing a cell
from the return stack and transferring it to the data stack. They can be defined in the ways shown below.
However! To shorten the routines, here we use a scratchpad space, called "N", of usually 8 ZP bytes,
mentioned 75% of the way down section 6. So why not just use N to save the top data-stack cell(s)? It's because we set rules
that N is to only be used within a subroutine, and any usage of N must be finished and N must be freed up before another subroutine
is called, and before the return from the current subroutine. (Obviously then if an interrupt hits and the ISR uses N, it must preserve
it; but nothing else should be having to.) If these rules are kept, it works out quite well.
; clocks:
; The JSR calling it takes 6.
toR: PLA ; First pull the return addr off 4
STA N ; the return stack and put it in 3
PLA ; N, the temporary area mentioned 4
STA N+1 ; 3/4 of the way down section 6. 3
LDA 1,X ; Now push the top data-stack 4
PHA ; cell onto the return stack. 3
LDA 0,X ; 4
PHA ; 3
INX ; Get rid of the now-unused cell 2
INX ; that was at the top of the 2
; data stack.
LDA N+1 ; 3
PHA ; 3
LDA N ; Push the return addr back 3
PHA ; onto the return stack. 3
RTS ; 6
;------------------ total: 56
; The JSR calling it takes 6.
R_from: PLY ; Pull return addr low off stack 4
PLA ; followed by return addr high. 4
INY ; Increment return addr low and 2
STY N ; store it for indirect JMP below. 3
BNE rf1 ; If INY made it roll over to 00, 3|2
INA ; increment return addr high, too. 0|2
rf1: STA N+1 ; 3
DEX ; Prepare a new cell space on 2
DEX ; the data stack. 2
PLA ; Take desired data off the return 4
STA 0,X ; stack and put in on the data 4
PLA ; stack, low byte then high byte. 4
STA 1,X ; 4
JMP (N) ; 6
;------------------ total: usually 51
As I was discussing this with Jeff Laughton
during the writing, he brought up some unorthodox ways of shortening it and speeding it up, as he has a knack for doing both in hardware and
sofware. However, it's still not very efficient on 6502. It's much more efficient on the 65816.
If you don't need them often, or you want better performance, you could make them
macros, at 8 bytes each (which is 5 more than a JSR takes), and 18 and 20 clocks:
toR: MACRO ; ( n -- ) clocks:
LDA 1,X ; Copy high byte to return 4
PHA ; (hardware) stack, then 3
LDA 0,X ; do the low byte. 4
PHA ; 3
INX ; Finally, free-up that 2
INX ; cell on the data stack. 2
ENDM ; total: 18
;------------------
Rfrom: MACRO ; ( -- n )
DEX ; First, prepare a new cell 2
DEX ; space on the data stack. 2
PLA ; Get low byte off return stack 4
STA 0,X ; and put it on the data stack, 4
PLA ; then to the same with the 4
STA 1,X ; high byte. 4
ENDM ; total: 20
;------------------
Let's look at using the data stack to calculate an address of a field of a record in an array, and then copy
a string variable to that field. Suppose:
File_The_String: ; ( Record# -- ) (Record number previously calc'ed.) LITERAL 23 ; Multiply the record number by JSR STAR ; the number of bytes in each record. JSR _ArrayBase ; Add that to the JSR PLUS ; array's base address. LITERAL 5 ; Add the field offset to get the JSR PLUS ; actual address of the desired field. JSR _STR1 ; Put the address of STR1 on the stack, JSR SWAP ; then change the parameter order for CMOVE. LITERAL 16 ; ^ From_addr To_addr length (the order CMOVE needs) JMP CMOVE ; (JSR, RTS) Done. ^ <empty> ;------------------I just put in 16 as the number of bytes for CMOVE to move, as it's easier than finding the string count. If the string is shorter, extra bytes past the end will get moved, but it won't do any damage. There's no requirement that it be a string—it's just that that was the assumption here.
Modifications can be made, for example to have the offset into a record be an input parameter on the data stack:
File_The_String: ; ( Record# offset -- )
JSR SWAP ; Put the record number at the top to handle first.
LITERAL 23 ; Multiply the record number by
JSR STAR ; the number of bytes in each record.
JSR _ArrayBase ; Add that to the
JSR PLUS ; array's base address.
; Add the field offset to get the
JSR PLUS ; actual address of the desired field.
JSR _STR1 ; Put the address of STR1 on the stack,
JSR SWAP ; then change the parameter order for CMOVE.
LITERAL 16 ; ^ From_addr To_addr length (the order CMOVE needs)
JMP CMOVE ; (JSR, RTS) Done. ^ <empty>
;------------------
or to merge the addition of the array base and the addition of the field offset into the record:
File_The_String: ; ( Record# -- ) (Record number previously calc'ed.)
LITERAL 23 ; Multiply the record number by
JSR STAR ; the number of bytes in each record.
LITERAL ArrayBase+5 ; Add that to the array's base
JSR PLUS ; address and the field offset.
JSR _STR1 ; Put the address of STR1 on the stack,
JSR SWAP ; then change the parameter order for CMOVE.
LITERAL 16 ; ^ From_addr To_addr length (the order CMOVE needs)
JMP CMOVE ; (JSR, RTS) Done. ^ <empty>
;------------------
Note that I'm differentiating the following this way:
Or how about finding the address of a record in a two-dimensional array, and then reading it and putting its
contents on the data stack. Suppose:
; ( row# col# -- n ) JSR SWAP ; Get row# on top, and LITERAL 90 ; multiply it by the number JSR STAR ; of columns in a row, then JSR PLUS ; add the desired col# number ; to get the record number. JSR _2STAR ; Now multiply by 2, since there ; are two bytes per record. LITERAL ArrayBase ; Add address of base of array JSR PLUS ; to get address of desired record. JSR FETCH ; Replace that addr with its contents.
Loops with 16-bit index, limit, and increment size can be done using the data stack too. The 6502 is not
very efficient at the 16-bit stuff, but sometimes the job needs to be done and the added programmer productivity of having a ready-made set of
tools for it is well worth it. The 65816 is of course much more efficient at handling 16-bit quantities, and you can have
for example:
LDY #1240
loop: <do_stuff>
<do_stuff>
INY
CPY #5804
BNE loop
It takes far more instructions to do it on the 6502 though. I have FOR...NEXT macros to do that in
the section of this website on program-structure macros, at
http://wilsonminesco.com/StructureMacros/
and the form that the above example would take on is:
FOR_Y 1240, UP_TO, 5804 ; (C32 requires commas between parameters.)
<do_stuff>
<do_stuff>
NEXT_Y
and the macros hide the ugly internal details (although you're still in complete control of them). But now suppose you want to use
the data stack with 16-bit cells for loop parameters. There are tools in Appendix A for
that, called DO and LOOP and a few related ones. The general form (again
referring to the example above) is:
LITERAL 5804 ; Put the loop limit on the data stack.
LITERAL 1240 ; Put the initial loop index on the data stack.
DO
<do_stuff>
<do_stuff>
LOOP
where DO and LOOP are macros that assemble do
and loop and the branch addresses, not needing labels. (How to have the assembler do
this will be covered in section 17, and we will discuss how to set up a stack in the assembler itself to be able to nest
structures.) Loop limit and index don't have to be literals of course. They will often be calculated on the data
stack before arriving at DO instead. Note the order, which is like a tag on a gift, which might say,
"To Emily, From Bob", with the loop limit going on the data stack before the initial loop index. The index will get
incremented or decremented until it crosses the line between the limit and limit minus 1. We'll see a little more on this
near the bottom of section 11. At run time, do takes the parameters off the data stack and puts them
on the return stack, optionally following the loop end address. Since the looping controls are using the stacks and not registers
or variables, loops can be nested several levels deep without trouble. Again it's the nature of stacks.
Appendix C is a slightly off-topic piece about RPN calculators, a production test
application of controlling instrumentation on the workbench, plus assembly language, and Forth, taken from an email I sent to a smart nephew
who was thinking about what direction he wanted to go in programming and robotics as he started college.
7. inlined data <--Previous |
Next--> 9. RPN efficiency
last updated Feb 4, 2024