assembly today

Why is assembly language still relevant today?
What is assembly language, what is machine language, and what is an assembler?
Sidebar: Similarity to calculator keystroke programming
Links

Assembly Language: Still Relevant Today (No, it won't ever be dead.)

First, no, I don't do everything in assembly. I use other languages too. Assembly language does have its place though. Second, a few of my points may not apply to highly complex processors; for example, self-modifying code may not work on a processor with an instruction cache.

Jeff Laughton (Dr Jefyll on the 6502.org forum) says, "I recall hanging out with a programmer pal o' mine and a younger fella who was in college. The young fella was complaining, 'We have to take assembly language,' and Len corrected him immediately, saying, 'You get to take assembly language!' :o) " Part of the reason is to learn machine organization, the subject of volume 1 of the book "Write Great Code" by assembly-language expert Randall Hyde, writer of the essay linked at the bottom of this web page.

asveikau, who had been a developer at Microsoft said, "From a certain vantage point, in these languages, if you aren't seeing assembly from time to time, you're not seeing reality."

Unfortunately even Dave Jones went really hard on assembly-language programmers in a microcontroller episode of his EEVBlog. I have a lot of respect for him, but I don't always agree with him. Part of the point of this article is to show that the reasons for this animosity against assembly language fail to consider some key points.

As food is said to be the way to a man's heart, assembly language is the way to the computer's heart. Assembly language, or more accurately, the machine language the assembler turns out, is the only language the microprocessor understands. High-level languages (HLLs) must be either compiled or interpreted; the processor cannot handle them directly.

Assembly language yields maximum control and execution speed. Things can be done in assembly that might not be doable in HLLs, and it is definitely the way to get the maximum performance out of a processor (with possible exceptions for very complex processors). That's not to say an application has to be 100% in assembly language to get the speed benefit. You might find that you can effectively get 95% of max performance by writing only a tiny portion of the code in assembly, provided it's the speed-critical portion.

On a 6502, assembly language-language capability benefits include, but are not limited to:

getting maximum interrupt performance, especially with single-cycle latency with the WAI instruction on WDC's 65c02, as discussed in my 6502 interrupts article
self-modifying code, sometimes appropriate to solve certain problems that have no other solution, or for improving efficiency, as in double-indirect addressing
using BRK with its signature byte in a multitasking OS, or for a mini-interpreter
using the BIT or CMP op codes without an operand, as discussed in this forum topic, and as shown in the NMOS op codes reference, for conditionally skipping certain subsequent instructions without branching, in order to shorten the machine code. On the CMOS 65c02, illegal op codes make it possible to do this without modifying flags.
On the NMOS 6502, the "illegal" op codes (which perform strange operations, undocumented by manufacturers) were popular, and were used in the GEOS GUI for the Commodore 64, to meet the speed and memory requirements to make GEOS a viable software product on that machine.
On the CMOS 65c02, illegal op codes initially act as NOPs, but can be useful for effectively extending the processor's instruction set; for example, Jeff Laughton's fast (1-cycle) 65c02 I/O methods, his KimKlone 65c02 w/ pointer-arithmetic-friendly extended address space and 9-cycle ITC Forth NEXT instruction, and the memory-mapping ops of the Hudson 65c02-based processor used in the TurboGrafx-16 Entertainment SuperSystem.

Writing an OS or getting hard realtime control of I/O hardware will require at least some assembly-language programming. Writing a compiler will require knowing the assembly language of the processor you're compiling for.

An assembly-language program will usually take less memory than an equivalent HLL program. This point is made just over halfway down this page, in the example of compiled C code. Besides the visible length, note that the C compiler's output also calls subroutines that are not shown, whereas the equivalent loop written directly in assembly language is complete. Forth, a language I use a lot, produces very compact code, but you still need the kernel which will be an absolute minimum of a couple of KB, up to ten times that much if it's quite full-featured. If the application is large (for a 6502), the overall Forth code will be more compact than the assembly-language equivalent; but for small applications, the size of the required Forth kernel will tip the memory-economy advantage to assembly language. Very inexpensive microcontrollers often have only a couple KB of program memory, and in those situations, assembly language is the only way to get a substantial program to fit. They also often are required to run on very low battery power, and assembly language gets you the best ratio of computing power to battery power.

High-level languages (HLLs) were invented to:

improve portability (ie, to avoid having to re-write the program every time you go to a different processor)
improve programmer productivity (since instructions in an HLL, especially long equations, can be made more intuitive to humans)
reduce the length of source code

However, I have found that in embedded systems where the program is constantly twiddling I/O bits and making simple decisions, an HLL may yield little or no benefit to offset the losses it incurs in performance— especially if, doing it in assembly, you make good use of macros.

When I write software for the microcontrollers that control the features on our aircraft intercoms and related products, there's almost no math beyond addition and subtraction, but instead, a constant stream of watching the status of various sensors and timings, and controlling the various circuits accordingly, while I have a timer set up to seize control even tens of thousands of times per second for particular jobs, on a very consistent time interval. They are multitasking, but without a multitasking operating system. The microcontrollers we've used do not have the resources or the computing power to do the job in an HLL.

I got a load of free Commodore 64's and accessories for our sons many years ago when the elementary school my wife teaches at was getting rid of them. A common complaint from the teachers was they they were so slow. I looked at the software they were using though, and it was all written in interpreted BASIC, very unprofessionally. Although the days for a C64 to be an appropriate classroom computer are gone, I still wonder how much more it could have delivered if better development tools had been available to the average Joe in the early years.

BTW, it's "assembly language," not "assembler." Despite the slang, "assembler" is not a language, but is instead the tool, the piece of software that converts assembly language into machine language. Its basic action is described 80% of the way down this page. It's like that at work, we have assemblers, ie, people who assemble our electronics products. They follow assembly instructions written out for them.

Fortunately today we have had some great macro assemblers, text editors, and other helps that run on PCs. Such was not the case in the late 1970's and early 80's, even though most computer users back then expected to do at least some of their own programming. A skilled assembly-language programmer can use the macro capability of a good macro assembler to:

gain program structures, abstraction, and other qualities normally associated with HLLs,
do more, with shorter source code,
develop cleaner, more-readable code,
develop code faster,
keep better control of the project,
get fewer bugs, and
make code more maintainable,
all without forfeiting any of the benefits of assembly language. In most cases, there's absolutely zero penalty in run speed or memory taken, because it's still assembly language— you just have more tools for how to control it.

The main argument I seem to hear against assembly language is that there's no structure in it. However, you can form structures through macros. The spaghetti (ie, the maze of GOTOs or equivalent) is gone, and the need for local labels is mostly eliminated, whether for looping, conditional jumps, etc.. You can have for example (and yes, this is assembly language):



        FOR_X  $7F, DOWN_TO, 0          ; Assembles LDX #$7F, and tells NEXT_X to assemble DEX, BNE.
            <do_stuff>                  ; (assembly-language instructions here)
            <do_stuff>
            IF_BIT  VIA_PB, 6, IS_SET   ; Assembles BIT VIA_PB, BVC.
                <do_this>               ; (more assembly-language instructions here)
                <do_this>
            ELSE_                       ; Assembles BRA, and fills in BVC's operand above.
                <do_that>               ; (assembly-language instructions, yada yada)
                <do_that>
            END_IF                      ; Assembles nothing, but fills in BRA's operand above.
        NEXT_X                          ; Assembles DEX, BNE to top of loop.

and you can nest the structures, many levels deep, with others of the same or different type. (Note that theTOpart of the FOR_X line is slightly modified from BASIC in order to more closely match what we do in assembly, which is to increment or decrement the counter at the end of the loop and compare to the limit and drop through when you reach it. IOW, the above loop will be run $7F times, for $7F...1 but not 0.) You can define all the program and data structures you could possibly want.

Here's a short example of an actual useful routine which converts a single-byte hex input to a two-byte (four-digit) binary-coded decimal output in the range of 0-255:



HTD_IN:   DFS  1     ; Input and output variables.  DFS is DeFine Storage, number of bytes.
HTD_OUT:  DFS  2     ; BCD output is two bytes, low-byte-first.


TBL_LO:   DFB  0, 1, 3, 7, 15H, 31H, 63H, 27H    ; DFB is DeFine Byte.  Using C flag set below adds 1
TBL_HI:   DFB  0, 0, 0, 0,  0,   0,   0,   1     ; to the low byte. (Many assemblers use DB, not DFB.)
                                                 ; Table has powers of 2, minus 1.


HTD:  SED                         ; Output gets added up in decimal.
      STZ  HTD_OUT                ; Initialize output word as 0.
      STZ  HTD_OUT+1              ; (NMOS 6502 will need LDA#0, STA ...)

      FOR_X  7, DOWN_TO, NEG_NRs  ; 7...0 is for 8 bits.  NEXT_X drops through when DEX results in -1.
          ASL  HTD_IN             ; Look at next high bit.  If it's a 1, it will set carry (C) flag,
          IF_C_SET                ; so add to the output for this bit.
              LDA  HTD_OUT        ; Get the running output sum and
              ADC  TBL_LO, X      ; add the appropriate value for this bit (using Carry set to add 1)
              STA  HTD_OUT        ; from the low-byte table, and store the new sum.

              LDA  HTD_OUT+1      ; After low byte, do high byte.
              ADC  TBL_HI, X
              STA  HTD_OUT+1
          END_IF
      NEXT_X                      ; Go down to next bit value to loop again.  If not done, loop again.
                                  ; In this case, NEXT_X assembles DEX, BPL up to the line after FOR_X.
      CLD
      RTS
 ;----------------

The macro version is every bit as efficient as you would do without the macros, just more clear. There are longer examples near the bottom of the page in my multitasking article. The program structures become more valuable as the routines get longer.

A macro is like a subroutine that the assembler itself executes in order to assemble portions of code exactly the way you defined in the macro, taking in the various parameters you specify when you invoke the macro in your source code. Macros offer a ton of flexibility—far more than initially meets the eye—and leave you in full control of the internal details without making you write them out or look at them every time you need them. They allow you to effectively raise the level of the language a lot, without losing any of the benefits of assembly. The article linked above tells about macros in general before getting into using macros to make program structures.

I did my first big project using the structure macros in 2014, and it was a breath of fresh air compared to the old way. It was a multitasking system, switching tasks 10,000-15,000 times per second plus interrupts hitting at 39,000 per second, and it maxed out the resources of the 40-pin microcontroller. I tell about the simple method for doing the multitasking without a multitasking OS, for systems that lack the resources for a such an OS, or where hard realtime requirements may rule one out anyway, here. The sample code listings near the end show the structure macros in use. Both articles are 6502-oriented.

Another objection is that assembly language lacks portability. While that's true, it's also true that if you maximize a processor's performance by using assembly, you're less likely to have to go to a more-powerful processor for the next project. The same processor (or microcontroller family) can be used for a wider array of applications. If you do need to move to another processor and you define the same macros for that one too, any needed translation or re-writing goes faster.

Another objection is that assembly language requires more lines of source code to get a job done. Again, macros to the rescue. Here's a simple example of copying two bytes in an array called simply ARRAY to a phase accumulator's 16-bit incrementer variable called INCR:


        COPY2  ARRAY+5, TO, INCR

which lays down the same machine code as:


        LDA  ARRAY+5
        STA  INCR
        LDA  ARRAY+6
        STA  INCR +1

if that's what you defined the macro to do. The macro does the same thing with a quarter as many lines of source code.

And another objection says modern compilers generate better code than your hand-crafted assembly language. That's probably true for modern 32- and 64-bit multi-core high-end processors that have super-complex instructions, deep pipelines that must be kept full, multi-level cache, out-of order execution, and other things that simply don't apply to the 65's. The argument is not true for the simpler processors though.

The x86-oriented book "Art of Assembly Language," downloadable for free in this .pdf, answers many more objections to assembly language, in the Foreword on pages 25-27 of the .pdf.

So why use assembly language today?

Suppose you need events on output pins to be .00000055 seconds (.55µs, or 550ns), ±10%, apart. This might be irrelevant to most web, desktop, and other human-I/O-oriented programming, but not to controlling circuitry on the workbench, or in industrial environments, or in other embedded systems whose raison d'être is something other than human I/O. If the processor is fast enough and a compiler for it produces efficient code, you may have plenty of speed to do it; but you will not have control of the exact amount of time taken with an HLL. For that, you need assembly language. (Again, that's not to say the entire application needs to be written in assembly. Often it takes only a little assembly to meet the requirements, and the rest can be in a higher-level language.)

Knowledge of assembly language helps better understand how the computer works. The above-referenced book tells the benefit that "Your knowledge of assembly language will help you write better programs, even when using HLLs;" and a page later, "Good assembly language programmers make better HLL programmers because they understand the limitations of the compiler and they know what it's doing with their code. Those who don't know assembly language will accept the poor performance their compiler produces and simply shrug it off."

In this video, posted June 5, 2022, Daniel Thornburgh shows why C compilers have a hard time generating efficient code for the 6502. He's the main codegen author behind the LLVM-MOS project to make a 6502 back end for LLVM.

Wikipedia has a long list of situations where assembly language might be appropriately chosen, here, followed by a list of typical applications.

Some of the highest-performing microprocessors however have assembly languages that are totally impractical for a casual user to learn well. Mainly only compiler writers can justify the learning curve; then they write compilers so their customers (who are programmers) can use a common HLL like C and not bother with the assembly. It seems to partly defeat the purpose, since assembly is the way to get maximum performance, yet to get maximum performance they have designed a processor you almost can't program in assembly! Still, good compilers can bring out some pretty good performance. (That's not to say all compilers do. For some processors, I've read of performance ratios of 5:1 or more from the best compiler to the worst, all for the same processor!) I've heard it said that a good compiler can produce better code than a good assembly-language programmer; but I dare say that while that may sometimes be true with the more complex processors, it won't be true with something like the 6502. And if you ever want to write a compiler for a processor, you will of course need to know its assembly language.

Since processors only understand their own assembly language (or, more accurately, the machine language that the assembler produces directly from the assembly-language source code), higher-level languages (e.g., BASIC, C, Pascal, etc.) have to be either interpreted or compiled. If interpreted, the HLL instructions are decoded at runtime by a machine-language program. If compiled, the instructions are figured out at compile time, producing either a machine-language program to run, or a list of addresses of machine-language routines to run. Obviously then the compiled one will run faster than the interpreted one. A compiler is a piece of software similar to an assembler, but produces processor-usable code out of an HLL instead of out of assembly language.

Here's an example of compiled C code, an empty count-to-100 loop, the line


for (i = 0; i < 100; i++);

with i defined as a char (1 byte). The equivalent BASIC line would be


FOR I = 0 to 99 : NEXT I

The popular cc65 C compiler, which admittedly is not the best-optimizing, produces the following 6502 code from it. (The rr's are for address bytes that will get filled in by the linker.) The comments were added by hand, not by the compiler.



                                jsr     decsp1    ; make 1 byte space on the stack

000003r 1  A2 00                ldx     #$00
000005r 1  A9 00                lda     #$00
000007r 1  A0 00                ldy     #$00
000009r 1  91 rr                sta     (sp),y    ; initialize i to 0
00000Br 1  A0 00        L0003:  ldy     #$00
00000Dr 1  A2 00                ldx     #$00
00000Fr 1  B1 rr                lda     (sp),y
000011r 1  C9 64                cmp     #$64      ; cmp i to 100    
000013r 1  20 rr rr             jsr     boolult   ; do a less than comparison
000016r 1  F0 03 4C rr          jne     L0005     ; if less than
00001Ar 1  rr
00001Br 1  4C rr rr             jmp     L0004     ; if equal to 100
00001Er 1  A0 00        L0005:  ldy     #$00
000020r 1  A2 00                ldx     #$00
000022r 1  B1 rr                lda     (sp),y    ; get i again
000024r 1  48                   pha
000025r 1  18                   clc
000026r 1  69 01                adc     #$01      ; increment i     
000028r 1  A0 00                ldy     #$00
00002Ar 1  91 rr                sta     (sp),y    ; store i
00002Cr 1  68                   pla                                        
00002Dr 1  4C rr rr             jmp     L0003
000030r 1  20 rr rr     L0004:  jsr     incsp1    ; restore stack

all for what can be done in assembly language with only:


        LDX  #0
L0003:  INX
        CPX  #100
        BNE  L0003

which, besides being less than one-seventh as many bytes, has no subroutine calls to eat up even more time like the compiled C version did.

Since the index value is not used inside the loop, we could further shorten the code by starting at 100 and counting down, to zero, to get the same number of loop iterations. We'll use DEX, which like many other instructions, has an automatic, implied, compare-to-zero instruction built in, so we can omit the CPX #0:


        LDX  #100
L0003:  DEX
        BNE  L0003

Clearly the assembly-language version will execute much, much faster than the compiled C-language version. Two different people now have pointed out to me that if you know the insides of cc65, you could write the source code line slightly differently to get a better result; but needing to know that stuff partly defeats the purpose of using C anyway, IMO.

With the optimer, cc65 yields the following code. (Again, the rr's are for address bytes that will get filled in by the linker.)


000000r 1  20 rr rr             jsr     decsp1     ; make 1 byte space on the stack
000003r 1  A9 00                lda     #$00
000005r 1  A8                   tay
000006r 1  92 rr                sta     (sp)       ; initialize i to 0
000008r 1  AA                   tax
000009r 1  B1 rr        L000C:  lda     (sp),y     ; get i
00000Br 1  C9 64                cmp     #$64       ; cmp i to 100
00000Dr 1  B0 05                bcs     L000D
00000Fr 1  1A                   ina                ; increase i
000010r 1  91 rr                sta     (sp),y     ; store i
000012r 1  80 F5                bra     L000C
000014r 1  A9 00        L000D:  lda     #$00
000016r 1  4C rr rr             jmp     incsp1     ; restore stack

There's still the subroutine call at the beginning, and the JMP (replacing a JSR-RTS pair) at the end.

Edit, 2/14/21: I just came across this page about benchmarking the various C compilers for the 6502. CC65 produced much slower, more bloated code than the other C options, although it was more solid.

The old PCB CAD I use, Easy-PC Pro, is written in C. Before buying, I tried the demo versions of both the Pro and the non-Pro versions. The non-Pro version was written in assembly language, and was very fast compared to the Pro version which cost four times as much and was written in C. I needed the extra capabilities of the Pro version though, so I got that. Even on a 16MHz '286, the non-Pro version gave nearly instant screen re-draws, and there was no waiting for anything that could be done with the demo version.

I started with hand assembly in the 6502 college class in 1982, where we used AIM-65 computers. The computer had a rudimentary assembler onboard, but the teacher had us do the assembly by hand so we would gain a better understanding of what happens under the hood right from the start. It was very valuable. It is common for the beginner to want all the fancy tools too soon—not just the assembler (which can hardly be considered a luxury), but HLL compilers, simulators, debuggers, etc.—and these tools insulate him too much from really learning what goes on at the heart of the machine, and lead him to think it's ok to be less thorough, giving him a disadvantage that takes much longer to overcome. A result down the road may be more bugs, some of which he won't be aware of until after the bugs have been a secret cause of a lot of inconvenience, equipment damage, or worse. It's kind of like giving young kids a calculator to multiply and divide with before they really have an understanding of what these procedures even are. The appropriate thing is to get them well acquainted with, and practiced at, doing it with pencil and paper, before moving on. A very smart engineer I worked with marveled at how I can quickly do logarithms (for decibels) in my head. He was just slightly too young to have used a slide rule, having started with calculators. Same kind of thing.

I mainly use the Forth language on the workbench for controlling processes and taking data, because of its unmatched development speed and interactiveness; but it is easy to mix some assembly in when I need maximum performance and maximum control of microsecond timing. I've even changed an assembly-language interrupt-service routine (ISR) between interrupts that were coming at over 40,000 per second on my 5MHz 65c02 workbench computer, without pausing the interrupts. (To do it, you get the new ISR ready, then have a little piece of code that watches for the ISR having run, and right after it has run, when you know there's enough time before the interrupt hits again, you change the vector to point to the new ISR.) For my 6502 work, the assembler I use most is part of my ROM-resident Forth kernel. It's tiny and is not intended to assemble whole applications; but being part of the Forth system, it naturally allows macros, can assemble on the fly while the same computer is doing something else at the same time, etc..

The empty count-to-100 loop above could look like this in Forth:


   100 0 DO LOOP

Of the several methods of doing Forth, ITC (indirect-threaded code) and DTC (direct-threaded code) Forth would compile the following:



XX XX   ; Two-byte address of a routine, the internal called lit (for "literal"),
64 00   ; followed by the 16-bit literal itself (in this case, 100) to put on the data stack.
XX XX   ; Address of the much-used constant 0 routine.  It puts a 16-bit 0 on the data stack.
XX XX   ; Address of the internal called do , which puts the two numbers on the return stack.
XX XX   ; Address of the internal called loop ,
XX XX   ; followed by the address to branch up to if the index has not met the limit yet.

It's 12 bytes, and there is a short threading routine called NEXT that runs the short routines at the addresses in the list. STC (subroutine-threaded code) is a hotter-performing method to do Forth, and you could optionally tell it the balance you want between performance and code density. Theoretically, you could end up with the same as one of the last assembly-language examples above except that it would be bracketed with PHX and PLX to preserve X which is used as the ZP data-stack pointer (or using Y instead, to avoid losing the value in X). Regardless of threading method, you can easily insert sections of code in assembly language if you need to. BTW, there are stack processors whose assembly language basically is Forth, and they typically average more than one Forth instruction per clock cycle.

What is assembly language? What is machine language? What is an assembler?

For readers new to the field who might not know what this "assembly language" is: It is the only language any microprocessor understands and executes, as it is what is "wired" into its logic gates in its instruction decoder and deals directly with the processor's registers, ALU (arithmetic logic unit), etc.. That means it will be different from one processor family to another; but basic principles apply, so some of what you learn of one processor's assembly language will carry over when you take up another processor.

Actually, I should clarify— the processor executes its own machine language, which just looks like a bunch of hexadecimal (base-16) numbers which don't have much meaning for humans looking at it, like this:


A2 FF A0 FF 88 D0 FD CA D0 F8 9C 01 90 AD 00 90 A9 1E 8D 03 90 A9 0B 8D 02 90 A9 02 8D 03 A0 9C 01 A0

while assembly language is the human-readable counterpart which has basically a 1:1 relationship to machine language. The machine-language code above came from assembly-language source code that goes like this (after a lot of explanations that don't actually go into the target computer's memory are added, and after constants and other things are defined):



SETUP:  LDX      #$FF    ; First delay to make sure ACIA & VIA are out of reset.
 su2:     LDY    #$FF    ; (The indentation marks two loops, one inside the other.)
 su1:       DEY          ; At the end of this delay, X & Y = 0, so they are already
          BNE    su1     ; initialized as receive and send buffer pointers, respectively.
          DEX            ; Actually, today I would use my more-readable FOR...NEXT macros 
        BNE      su2     ; to get exactly the same machine code for these counting loops.
                         ; "ACIA" below is Asynchronous Serial Interface Adapter; a UART.
        STZ  ACIA_STAT   ; Do software reset of the ACIA by storing 0 to status register.
        LDA  ACIA_DATA   ; Read the data register to clear parity and framing errors.
        LDA  #00011110B  ; Set the ACIA for 1 stop bit, 8 data bits, 9600 baud,
        STA  ACIA_CTRL   ; by storing the above number into the ACIA's control register.
        LDA  #00001011B  ; No parity, no echo, transmitter on, RTS true, no transmitter
        STA  ACIA_COMM   ; interrupt, no receiver interrupt, enable receiver, DTR true.
                         ; (ACIA_COMM is the ACIA's command register.)
        LDA  #00000010B  ; Make PA1 an output.  We'll use it to tell the PC that it's
        STA  DDRA        ; cleared to send.  (DDRA is data-direction register A.)
        STZ  PA          ; Set it true by storing zero in port A.
           < . . . >

It's part of something that I have working on the workbench. The "Program-Writing: Where Do I Start?" page of the 6502 Primer on this website tells how to start writing your assembly-language program in a text editor, for the assembler to assemble, because you'll need to know things like how to specify starting addresses, vectors, variables, constants, etc..

Comparing the machine language with the assembly language, you can see the A2 FF A0 FF at the beginning that corresponds to the LDX #$FF (load X-register with hexadecimal number FF) andLDY #$FF(load Y-register with hexadecimal number FF). Even if you're not familiar with the 65c02 instruction set or with the hardware that the program is setting up, you can see that it's really two forms of the same thing, the assembly language being much more human-readable than the machine language is. Each line here begins with a three-letter mnemonic (the leading m is silent; so say, "neh-mon-ik") (STA for STore Accumulator register, DEX for DEcrement X register, STZ for STore Zero to the specified memory location, etc.) which are easily memorized when you're learning to program for the particular microprocessor. After most mnemonics, there's an operand, telling what data or address or branch distance (etc.) the instruction is supposed to operate on. Operands usually have names that are meaningful to humans, and the assembler looks up or figures out the number that needs to go there. After the semicolons is just comments, which do not get put into the resulting machine-language program.

Further showing that the above are two versions of the same thing, you can even count exactly how many clock cycles (and therefore the exact amount of time to execute, assuming you know the clock speed in MHz) that the various parts of the program will take to run, from looking even at the human-readable version. In the case above, the number of clock cycles doesn't really matter, but there are times that it definitely will. The instructions above each take two to four clock cycles (or "T states" in the parlance of other processors). The exact number is given in WDC's excellent programming manual, "Programming the 65816 including the 6502, 65C02, and 65802" by David Eyes and Ron Liechty. With experience, you get to know them without having to look them up anymore.

The assembler is a tool, a piece of software that takes the assembly-language program you write and produces the machine-language version that the processor can use. It also produces outputs for humans to use such as the.lst(list) text file which puts the resulting machine code next to the assembly code so you can see them side by side if necessary (like for debugging), and adds line numbers and addresses. Here's the .lst output from the section of code above, showing the source-code line number, address, machine code laid down, and the assembly-language source code:



120 F000 A2 FF     SETUP:  LDX      #$FF    ; First delay to make sure ACIA & VIA are out of reset.
121 F002 A0 FF      su2:     LDY    #$FF    ; (The indentation marks two loops, one inside the other.)
122 F004 88         su1:       DEY          ; At the end of this delay, X & Y = 0, so they are already
123 F005 D0 FD               BNE    su1     ; initialized as receive and send buffer pointers, respectively.
124 F007 CA                  DEX            ; Actually, today I would use my more-readable FOR...NEXT macros 
125 F008 D0 F8             BNE      su2     ; to get exactly the same machine code for these counting loops.
126                                         ; "ACIA" below is Asynchronous Serial Interface Adapter; a UART.
127 F00A 9C 01 90          STZ  ACIA_STAT   ; Do software reset of the ACIA by storing 0 to status register.
128 F00D AD 00 90          LDA  ACIA_DATA   ; Read the data register to clear parity and framing errors.
129 F010 A9 1E             LDA  #00011110B  ; Set the ACIA for 1 stop bit, 8 data bits, 9600 baud,
130 F012 8D 03 90          STA  ACIA_CTRL   ; by storing the above number into the ACIA's control register.
131 F015 A9 0B             LDA  #00001011B  ; No parity, no echo, transmitter on, RTS true, no transmitter
132 F017 8D 02 90          STA  ACIA_COMM   ; interrupt, no receiver interrupt, enable receiver, DTR true.
133                                         ; (ACIA_COMM is the ACIA's command register.)
134 F01A A9 02             LDA  #00000010B  ; Make PA1 an output.  We'll use it to tell the PC that it's
135 F01C 8D 03 A0          STA  DDRA        ; cleared to send.  (DDRA is data-direction register A.)
136 F01F 9C 01 A0          STZ  PA          ; Set it true by storing zero in port A.
     < . . . >

Macros are typically expanded out so you can see exactly what they produced. Error messages will be inlined. The assembler may also produce a separate .err (error) file. The main file output, with just the resulting machine code, is often an Intel Hex or Motorola S-record file, which are text files using an ASCII representation with some error-detection built in. Here's an example of a very short Intel Hex (.hex) file:


:020000020000FC
:20F00000A2FFA0FF88D0FDCAD0F89C0190AD0090A91E8D0390A90B8D0290A9028D03A09C2E
:20F0200001A09C0BA0A9148D0BA0A9078D08A09C09A0A9028D0CA0A9FF8D0DA064018D020F
:20F04000A09C00A060AD00909D0002E860AD0DA02901D004A501D00EA9FF85018D0DA0B953
:20F0600000028D0AA0C8608400E4003009AD01A029FD8D01A060AD01A009028D01A0608421
:20F0800000E40060A2FF9A2000F0AD01902908F0032045F02067F0207FF0F0EE204DF08069
:02F0A000E94045
:06FFFA00A1F084F0A1F06B
:00000001FF

There are other types, but these two (Intel Hex and Motorola S-record) are probably the most common. The error-correction and housekeeping bytes are removed before the machine code goes into the memory where finally it will be run. SRecord 1.65 for Linux and Windows lets you transform/convert EPROM file types, concatenate, split, etc. if you need to.

In a microprocessor class in 1982, we had to learn a processor's internal registers and its way of doing things and its instruction set, and write programs and assemble them by hand (on paper) and then type the machine code into the computer to try it. There was an assembler onboard, but the teacher wanted us to really understand the lowest levels of the machine, including the assembly process, to layer knowledge upon knowledge, instead of starting directly at higher language levels that would have insulated us from the nitty-gritty and kept us from learning what really goes on down there.

Sidebar:

Similarity to calculator keystroke programming

Early programmable calculators used "keystroke programming." Each program line held only one operation. For example, as you key in a program and need the cosine function, you press the COS key and a line is added with that single instruction. At execute time, that COS instruction acts on the number in the X register (which was like the 6502's accumulator). Each program line then was like a key stroke (although there were exceptions), and running the program was like the calculator was just pressing the buttons for you, just much faster than you could do it by hand, except that it could have things like conditional branches.

I suppose it's called "keystroke programming" because before programmable calculators had way too many functions to give them all their own key (like my HP-41cx with potentially thousands of functions), you'd just press the key (for example SIN, COS, LOG, etc.) giving the desired function when you're in program mode and it would be put in the program as a single step, without having to spell things out like you have to in other programming environments. Syntax requirements were minimal.

My very first programmable calculator (a TI-58c, shown at right) did not have an alphanumeric display; so in program mode, the display would show the program step number followed by only a number representing the function by the row and column of its key on the keyboard. In that sense, it was kind of like machine language with each line showing an address followed by an op code or operand byte. (I don't know anything about its processor and its true assembly and machine languages, but you can see the analogy I'm making.) It did not take long to learn the keymapped function numbers. In the photo at the right, it's in program mode and showing that step number 418 has a 23 in it, which is the lnx (natural log of X) function by its key being in the 2nd row, 3rd key in the row. The printer was alphanumeric and would have printed out 418 23 LNX for that line in a program listing. The calculator offered indirect addressing too, again leading into assembly-language programming.

Registers and flags always had numbers, never names; and text, which could only be printed on the printer but not displayed on this TI calculator, was strictly in numbers in the calculator's display when in program mode. For example, an "R" was a 35 (obviously not ASCII). When I took that first processor class in 1982 where we used the AIM-65's, we also entered text as numbers, but they were hex ASCII numbers. The TI calculator had a crude system of labels, but programs ran faster if jumps could be made to program step numbers, because labels were always searched for at run time, which took time.

It would be less of a stretch to call my HP-41 calculator a computer, as it does have boolean functions, powerful (albeit slow) I/O, string-handling capability, alarm interrupts, files, text editor, direct RAM hex editing (although you better really know what you're doing to use that one!), etc., and allows the user to program it in assembly language. At left, it is showing a program step with the bit-twiddling function to rotate the 32-bit number in register Y by the number of bits given in register X. Its alphanumeric LCD shows the actual function names, with any operands on the same line, like 247 DSE 10 (decrement and skip if equal, on data register 10) or 174 RCLALM (recall the parameters of the alarm whose number is in X), or text113^TFREQ=? . Registers, flags, and local variables have only numbers or single letters, but global labels and files use names.

When checking or debugging calculator programs, I used to print them out on long, continuous strips of paper, and draw arrows to clarify where all the jumps and conditional branches went. Sometimes it was quite a mess, resembling spaghetti (hence the term, "spaghetti code"). I did the same for years in assembly language, before figuring out a way to make the program structures with assembly macros.

In the same grain, see "Using a Programmable Calculator to Introduce Fundamental Concepts of Assembly Language," by H. D. Schwetman in the computer-science department of Purdue University, CSD TR 171, December 1975, 28 pgs.

Calculator keystroke programming was easy to pick up, and it seems to bring a benefit later for grasping assembly language. Today, the grasp of assembly seems to be getting lost, and the common mentality is that cheap processing power always justifies wasting it. The article here aims to show that there's more to it than that.

Links:

Randall Hyde, author of "Write Great Code (No Starch)" has an essay online, "Why Learning Assembly Language Is Still a Good Idea" (also available here, and archived here) which has a lot of good comments about how knowledge of assembly helps you write more-efficient high-level-language code. He is an instructor at the University of California who laments the two decades of unwise "assembly-is-dead" teaching that has been in the schools.

An essay that keeps challenging me is Low-Fat Computing (A politically incorrect essay by Jeff Fox). He and Chuck Moore (inventor of Forth), taking an entirely different programming philosophy, plus Forth hardware and software, have improved the compactness and speed of code by factors of anywhere from 100 to 1000. It's not particularly about assembly language, nevertheless seems to be appropriate to mention here.

In this 9½-minute video posted July 25, 2022, "Programmers Aren't Productive Anymore," apparently an excerpt from a seminar, Jonathan Blow says that all these modern high-level languages (HLLs) are making programmers less productive now, and that the HLLs are failing to deliver the promised benefits. He says that as we go up the ladder of HLLs, "somewhere through this chain, it becomes wrong." He does say we don't want to go all the way back to assembly language, because of the lack of abstraction; but I address that criticism above and elsewhere on this site, regarding macros.

Next--> My requests for if you write an assembler

last updated Mar 6, 2024 Garth Wilson email wilsonminesBdslextremeBcom (replace the B's with @ and .)