program-writing: where to start

Note: This addresses only assembly language, not higher-level languages like BASIC, Forth, Pascal, C, Lisp, Oberon, COBOL, etc. Assembly language is the lowest-level, down-to-the-metal language for maximum control and performance. Machine language (the numeric form that assembly language maps onto 1:1) is all the processor runs, so higher-level languages must get compiled into it before running, or interpreted by a machine-language program, or some combination. For someone who really wants to know what's under the hood, assembly language is a good place to start. (Related on this site is the article "Assembly Language: Still Relevant Today.")

(O.T.) For the fun of it, there's a humorous brief history of programming languages here.

A quote from a forum post many years ago: "and no one really ever tells you in what program do you put the code."

You will typically use a text editor to write the source code in a text file. Virtually any text editor will do, but some will be far better than others. I really like MultiEdit. A word processor generally won't do, because of all the hidden control bytes that instruct it for fonts and sizes and colors and graphics and so on, unless it has a non-document mode that gives you straight ASCII text with nothing hidden. That's what the assembler software needs.

With some source code written in a text file, typically with a ".asm" extension, you run the assembler to turn that into the form the microprocessor can use. For the really basic definition, an "assembler" is a program that takes the more human-readable assembly-language instructions with mnemonics (like "LDA PORT_A") in a text file and turns them into machine-language instructions like AD 01 60 for the microprocessor to execute. (In this example, PORT_A is at address $6001.) The assembler is the tool, not the language. The language is "assembly language." Assembly language and machine language are, in a sense, the same thing, but assembly language is more readable for humans and machine language is the form the microprocessor understands. There's a 1:1 relationship between the two, unlike the much more distant relationship between higher-level languages and machine language.
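To make the 1:1 relationship concrete, here is a small sketch of a few source lines with the bytes an assembler would lay down for each shown in the comments. The $8000 start address and the particular instructions are assumptions chosen only for illustration; PORT_A at $6001 is the example from the paragraph above.

```asm
PORT_A: EQU  $6001      ; Port address from the example above.

        ORG  $8000      ; Assumed start address, for illustration only.

                        ; Address  Bytes      What the assembler did
        LDA  PORT_A     ; $8000    AD 01 60   AD = LDA absolute.  Operand goes low byte first.
        ORA  #$80       ; $8003    09 80      09 = ORA immediate.
        STA  PORT_A     ; $8005    8D 01 60   8D = STA absolute.
        RTS             ; $8008    60         60 = RTS (no operand).
```

Each line of source becomes exactly one machine-language instruction, and the addresses simply count up by the length of each instruction.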

6502.org has a list of free assemblers here. The Kowalski assembler, IDE, and simulator seem to be popular on the forum. The final version released by Michael Kowalski is here (.zip download, with executable for DOS/Windows, but can be used with Mac & Linux under WINE). More info here. It's all on github too. However, Daryl Rictor ("8BIT" on the 6502.org forum) subsequently modified it and produced V1.2.15. This was the final version that recognizes only the 8-bit processors. Version 1.4.0.9 (again by Daryl, and available on his page) recognizes the 65816 for assembly, and now its simulation extends to the '816 also (starting 7/18/24), no longer being limited to the '02. He will no longer be developing the 1.2 series though, and says that although he will continue to do bug fixes on the 1.4 series, he's done adding features. (BTW, if it's software only, it's a simulator, not an emulator. More on that near the bottom of the page.)

Jeff Parsons has an in-browser 6502 simulator. See this forum topic.

The assembler might run on a computer that's different from the one that the final program runs on. In that case, it would be called a "cross-assembler," or sometimes a "meta-assembler." For this discussion, we will lump them all together. The assembler's output is commonly another text file in an Intel hex or Motorola S19 format, keeping all the machine-language instructions and related data in neat rows of hexadecimal numbers along with a few characters at the beginning and end of each line for error-checking and telling what address the line starts at. The error-checking is used by the EPROM programmer or other device or program that the information is sent to. SRecord 1.65 for Linux and Windows lets you transform/convert EPROM file types, concatenate, split, etc. if you need to.
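As a rough illustration of what those rows look like, here is the Intel hex output one would expect for a single three-byte instruction. The $8000 address is an assumption for the example, and your assembler may pack more bytes per line than this.

```asm
        ORG  $8000
        LDA  $6001      ; Assembles to the three bytes AD 01 60 at $8000-$8002.

; Intel hex for those three bytes, followed by the end-of-file record:
;
;   :03800000AD01606F
;   :00000001FF
;
; Reading the first record from left to right:
;   :         start of record
;   03        number of data bytes in this record
;   8000      address where the first data byte goes
;   00        record type (00 = data)
;   AD 01 60  the data bytes themselves
;   6F        checksum
;
; Checksum: 03+80+00+00+AD+01+60 = $191.  Keep the low byte, $91,
; and take its two's complement:  $100 - $91 = $6F.
```

The EPROM programmer adds up every byte in the record including the checksum, and if the low byte of the total is not zero, it knows the line got corrupted.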

In 1985, I started out assembling by hand, on paper, when I had no working computer yet to even run an assembler on (which in a way is actually good training for understanding the lowest-level nuts & bolts better), and I can tell you that you do quickly learn the op codes; but that's the least of the problems. You can imagine one big problem is that if you realize you're missing an instruction at line 200 of a 500-line file, inserting it will change the addresses of the code after it, meaning now you have to go through and fix the addresses in jump instructions, maybe references to tables that got scooted to higher or lower addresses, strings, variables, etc. It's an extremely time-consuming mess, and prone to human error. There are ways to reduce the re-work (which involve leaving a lot of chunks of unused memory space), but you won't get far before you decide it's time for an assembler so that all of that gets handled automatically, in seconds.
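Here is a tiny sketch of that address-shifting problem. The routine, the $0200 load address, and the port address are all made up for illustration.

```asm
PORT_A: EQU  $6001          ; Hypothetical port address.

        ORG  $0200          ; Assumed load address.

        JSR  SETUP          ; $0200  20 06 02   Operand is the address of SETUP.
LOOP:   JMP  LOOP           ; $0203  4C 03 02   Operand is the address of LOOP.
SETUP:  LDA  #$FF           ; $0206  A9 FF
        STA  PORT_A         ; $0208  8D 01 60
        RTS                 ; $020B  60

; Now suppose you need to add a three-byte instruction (say, a second JSR)
; between the JSR and LOOP.  LOOP moves from $0203 to $0206, and SETUP moves
; from $0206 to $0209.  Assembling by hand, you would have to find and change
; 20 06 02 to 20 09 02, and 4C 03 02 to 4C 06 02, yourself.  With labels, the
; assembler recalculates both operands the next time you run it.
```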

What To Put In Your .ASM File

Assemblers give you the option of typing in comments. The common way is that anything following a semicolon on a line gets ignored by the assembler. Comments do not take any room in the program that gets formed for your target computer to run. Begin your file by typing comments (using the semicolon to begin each comment line), telling what the project is and anything else you feel is important including for example a table of what I/O bits are dedicated to what functions. If you have more than one version because goals changed, document which one the file is, right near the top. Neglecting this causes tons of confusion and wasted time later.

Then, if the assembler offers the option of assembling for more than one processor, you'll need to tell it what processor you want. The assembler's manual will tell you what you need in order to do that. The one I use now for 6502 when I do it on the PC is Cross-32 (C32 for short), formerly from Universal Cross-Assemblers but now sold by Data Sync Engineering, and the line would be:


        CPU  "6502.TBL"

(Hmmm... I had forgotten that this assembler makes no distinction between 6502 and 65C02 which has more instructions and addressing modes. I guess it just assumes that you know not to use the extra ones if you're using an NMOS 6502 which doesn't have them.) This assembler also works for 1802, Z80, 68HC11, 8086, etc., dozens of non-65-family processors, so you have to tell it which one you have. Anyway, this, like any assembler directive, is needed by the assembler, not the computer you made and are writing code for.

When I used the 2500AD assembler in the late 1980s, it was:


        .CHIP 65C02

indicating I was using the CMOS 6502 (65C02). That assembler was only for 6502 and 65C02. This line, like any other line, can have comments after the assembler directive, but not before it.

You may need a line in your source code to tell the assembler what kind of output file you want, whether binary, Intel hex, Motorola S19, or whatever. If you don't tell it, it will probably default to one, maybe Intel Hex. On mine, it takes the form for example:


        HOF "INT8"    ; for 8-bit Intel Hex.   "HOF" stands for "hexadecimal output format."

Next, you'll start with the general equates, including fixed addresses. A major thing to put in here is the addresses of various registers in your I/O ICs, typically with the assembler directive "EQU", short for "equate". For example:



;                           +----------------------+
;                           |   General Equates    |
;                           +----------------------+

ACIA:       EQU  $9000    ; The base address of the 6551 Asynchronous Communications Interface Adapter is $9000.
ACIA_DATA:  EQU  ACIA+0   ; Its data I/O register is at $9000.
ACIA_STAT:  EQU  ACIA+1   ; Its  status  register is at $9001.
ACIA_COMM:  EQU  ACIA+2   ; Its command  register is at $9002. 
ACIA_CTRL:  EQU  ACIA+3   ; Its control  register is at $9003.

VIA:        EQU  $A000    ; The base address of the 6522 Versatile Interface Adapter is $A000.
PB:         EQU  VIA      ; Its port B is at that address.
PA:         EQU  VIA+1    ; Its port A is at address $A001.
DDRB:       EQU  VIA+2    ; Its data-direction register for port B is at $A002.
DDRA:       EQU  VIA+3    ; Its data-direction register for port A is at $A003.
T2CL:       EQU  VIA+8    ; Its timer-2 counter's low  byte is at $A008.
T2CH:       EQU  VIA+9    ; Its timer-2 counter's high byte is at $A009.
SR:         EQU  VIA+10   ; The shift register is at $A00A.
ACR:        EQU  VIA+11   ; The auxiliary  control register is at $A00B.
PCR:        EQU  VIA+12   ; The peripheral control register is at $A00C.
IFR:        EQU  VIA+13   ; The interrupt  flag  register is at $A00D.
IER:        EQU  VIA+14   ; The interrupt enable register is at $A00E.

I made it extra wordy for this tutorial, just to explain what may not be obvious yet. Note that even the sign at the top is preceded by semicolons, so the assembler will ignore it and not give you error messages saying it can't understand it. (Wow, I sure prefer the old DOS/ANSI (Code Page 437) characters. They let you draw smooth lines, boxes, diagrams, and charts.)

Note that I did not assign anything for addresses $A004 through $A007, which was because I didn't use those registers in the project I copied this from.

Note also that the addresses are in hex, with the dollar sign to tell the assembler that, but the offsets are in decimal. You can leave it in hexadecimal all the time if you like, with an assembler directive like


        RADIX HEX

(although it might be different on yours—you'll have to check the manual), but then if you want to use a decimal number, you might have to express it something like D'231' since D231 and 231D are valid hex numbers. It's probably best to leave it in decimal and specify hex with the "$" or "H". I absolutely hate the C form, "0xff" for example, since "x" means a "don't care" digit in non-C-like languages, and numerals 0-9 are always "capital" and I do not mix them with lower-case a-f! Ok, call me stubborn, but I have the privilege to be that way about certain things. :)

The directives may be a little different from one assembler to another, but the assembly-language code will be pretty much the same as long as it's for the same processor. How to make macros won't be much different—more on that later though. I'd like to get you into macros pretty early, but not right at the beginning. They are not necessary yet, but they will enable you to dramatically improve your code and your productivity later on.

The EQUates can be nearly anything where you want to use an easy-to-remember name instead of the number the processor will require. Not only is it for making your code more intelligible to humans, but if you have to change something, you will only have to change the number assigned to that name in the one place, instead of every place it gets used in your code. The assembler takes care of that, substituting the number in anywhere it finds the name used.

You can put equates anywhere in your code. They don't have to all be at the beginning. For example, when the program reads the keypad, it will return numbers representing the various keys, and you will probably want to substitute names for them, like ENTER_KEY, UP_KEY, DOWN_KEY, etc. You would probably want to put those EQUates right above the routine that reads the keypad.
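A fragment along those lines might look like the following. The key-code numbers are made up for illustration, and READ_KEYPAD, DO_ENTER, DO_UP, and DO_DOWN are hypothetical routines assumed to be defined elsewhere in the file.

```asm
ENTER_KEY:  EQU  $0D            ; Key codes returned by the keypad routine.
UP_KEY:     EQU  $01            ; (These particular numbers are only examples.)
DOWN_KEY:   EQU  $02

GET_CMD:    JSR  READ_KEYPAD    ; Hypothetical routine.  Returns the key code in A.
            CMP  #ENTER_KEY     ; Was it the ENTER key?
            BEQ  DO_ENTER       ; If so, go handle it.
            CMP  #UP_KEY        ; Same idea for the other keys.
            BEQ  DO_UP
            CMP  #DOWN_KEY
            BEQ  DO_DOWN
            BRA  GET_CMD        ; Anything else:  ignore it and read again.  (BRA is 65C02-only.)
```

If the keypad hardware changes and ENTER starts returning a different number, only the one EQU line changes.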

After the general equates near the top of your source-code file, you'll declare your zero-page variables, telling the assembler how many bytes in each. An example in the C32 assembler I use is:


FOOBAR:  DFS   2

"FOOBAR" is a nonsense name given to variables, constants, routines, etc. in programming discussions where the function itself is not the issue. "DFS" in the C32 assembler means "define storage," and the "2" tells it to reserve two bytes for that variable which in this case is called FOOBAR. The 2500AD assembler calls it "BLKB":


FOOBAR:  BLKB  2

meaning reserve a block of bytes, specifically two bytes in this case. Often it will be only one byte, and sometimes you might leave enough for a string or a buffer of some sort or even a large array. Any number is fair as long as you have enough memory for it.

I generally declare most of my non-ZP variables right after the ZP variables. If your program will run in ROM (instead of RAM), the variables will have to be separate from it anyway, and the RAM (i.e., read/write memory which can handle variables) will generally be in low memory, closer to address $0000, and the ROM will generally be in high memory, closer to address $FFFF.

It is generally good to declare the variables before they are used, to avoid phase errors in the assembler from zero-page versus non-zero-page addresses as operands for instructions. I won't explain a "phase error" here, only say that it is totally recoverable but can take the assembler longer if you have a bunch of them. (The assembler manual will explain it.)
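The sketch below shows why the order matters. It assumes, as is typical, that the assembler picks the shorter zero-page form of an instruction when it already knows the operand's address is below $100. The variable names and the $0200 and $8000 addresses are only for illustration.

```asm
        ORG  $0000          ; Zero page.
FOOBAR: DFS  2              ; Two-byte ZP variable at $0000-$0001.

        ORG  $0200          ; Non-ZP RAM, above the stack page.
BUFFER: DFS  16             ; 16-byte buffer at $0200-$020F.

        ORG  $8000          ; Program code.
        LDA  FOOBAR         ; A5 00      Zero-page addressing:  two bytes.
        LDA  BUFFER         ; AD 00 02   Absolute  addressing:  three bytes.
```

If FOOBAR were declared after the code instead, the assembler would not yet know on its first pass whether "LDA FOOBAR" needs two bytes or three, and a wrong guess shifts every address after it.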

I like to put my general macros next, after the general equates. You won't be using macros to start, but when you do use them, you'll need to define them before they're used. No exceptions! To see what assembler macros can do to improve your programming with no penalties in either performance or code size, see my article on them, which starts with the basics and then goes on to show how you can do program structures too, like IF...ELSE...ENDIF, FOR...NEXT, BEGIN...WHILE...REPEAT, CASE statements, etc. I hope it'll get you excited about the possibilities. I put the structure macros in a separate INCLude file. The INCL assembler directive (as usual, check your assembler manual for exact syntax) is used in a line in your source code where you want to bring in (include) another source-code file, something like:


        INCL  "STRUCTURE_MACROS.ASM"

Keep in mind again that although macros don't have to necessarily go at the beginning, they always have to be defined before they are invoked, which also means a structure macro INCLude file will have to be "INCLuded" before the macros are invoked, unlike the situation with variables, constants, routine addresses, etc. which don't have that stringent requirement.

One of the few assembler directives you will need to know up front is ORG for "origin". It tells the assembler to start laying down code (or variables, tables, whatever) at the address specified by the parameter following the "ORG", like this:


        ORG  $200

You can use this directive as many times as you want, but in your first project it probably won't appropriately occur more than a few times in your entire assembly code text file. Until the assembler encounters the next ORG directive, it will just keep filling memory in order of increasing addresses, always filling the next available byte. It will not pull any surprises on you and put something where you didn't expect it.

Since the 6502 generally has RAM in low addresses (starting at $0000 and going up) and ROM ending at $FFFF, you will, if you're assembling a program to put into ROM so it can run immediately upon power-up, put that at the higher addresses, starting, let's say, at $8000, if that is where your ROM starts for example. Let's say you start your reset routine there.


        ORG  $8000

RESET:  LDX  #$FF
        TXS                 ; Initialize stack pointer.  If this were an NMOS 6502, you'd need to add CLD.
                            ; Start by initializing the 6551.
        STZ  ACIA_STAT      ; Do a software reset on the 6551.
        LDA  ACIA_DATA      ; Read the data register to clear parity and framing errors.
        LDA  #00011110B     ; Tell it you want 1 stop bit, 8 data bits, and 9600 baud.  (The "B" at the end
        STA  ACIA_CTRL      ;                   of the number means it's binary, not decimal, hex, or octal.)
        LDA  #00001011B     ; Also, no parity, no echo, transmitter on, RTS true, no transmitter
        STA  ACIA_COMM      ; interrupt (yet), no receiver interrupt, enable receiver, DTR true.

        <bla bla bla>

When you have things set up and ready to begin the main program, the RESET routine will lead into, or jump to, the main program, instead of ending with RTS like most other routines do.

Comment your programs profusely. Explain what you're doing. Even after a routine appears to be working, adding more explanation in the comments sometimes makes you catch bugs that had not shown up yet but later would have. I comment as if trying to explain it to someone else who hasn't been following my train of thought on it. If I come back to change it a year or more later, I'll need the comments anyway. Comments do not take any space in the machine-language output that the assembler produces. In fact, if you look at the resulting code, you can't see any evidence of comments.

Now— How does the computer know to start execution at address $8000 for the RESET routine above? It's because you tell it in the vectors. Assuming your code fits in the available ROM (you can run code from RAM too, but so far we're talking about what runs in ROM immediately upon start-up), there will always be some unused space in the ROM, before the vectors at the end. So you have to use the ORG directive again:


        ORG  $FFFA  ; Lay down the vectors.  They must start at address $FFFA and go in this order:
        DWL  NMI    ; "NMI" is the name of my NMI interrupt-service routine.      (DWL in C32 assembler
        DWL  RESET  ; "RESET" is the name of my reset (and set-up) routine.        directives means "define
        DWL  IRQ    ; "IRQ" is the name of my IRQ interrupt-service routine.       word, low byte first.")

        END         ; Your assembler might not need the END directive if this is the only file.

Since the RESET label is defined in the program (and, not shown, the IRQ and NMI labels too), the assembler knows their addresses and lays them down in the order you put here, at the six bytes starting at address $FFFA. Remember the reason for the six bytes is that each byte is 8 bits, but an address takes 16 bits, or two bytes. Low byte is first.
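Putting the pieces together, a bare-bones skeleton might look like this, with the resulting vector bytes shown in the comments. The do-nothing main loop and the placeholder interrupt routines are assumptions just to give the labels something to point at.

```asm
        CPU  "6502.TBL"

        ORG  $8000
RESET:  LDX  #$FF           ; $8000  A2 FF
        TXS                 ; $8002  9A         Initialize the stack pointer.
MAIN:   JMP  MAIN           ; $8003  4C 03 80   Placeholder main loop.
NMI:    RTI                 ; $8006  40         Placeholder NMI service routine.
IRQ:    RTI                 ; $8007  40         Placeholder IRQ service routine.

        ORG  $FFFA
        DWL  NMI            ; $FFFA  06 80      NMI   vector = $8006, low byte first.
        DWL  RESET          ; $FFFC  00 80      RESET vector = $8000.
        DWL  IRQ            ; $FFFE  07 80      IRQ   vector = $8007.

        END
```

On power-up or reset, the processor reads the two bytes at $FFFC and $FFFD, puts them together as $8000, and starts executing there.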

So you've probably figured out what a label is by now. It's basically a name you give to an address or a constant or a macro or... The name makes it much easier for humans to deal with than a bunch of numbers that might keep changing and don't have any description with them.

Although many assemblers don't require labels to be followed by a colon, I would encourage you to use the colon in the label definition anyway, because otherwise a search for the label may turn up a lot of references to it before you find the label itself. Searching for the name with the colon takes you straight to the definition, which can save time in that situation.

There's nothing keeping you from having more than one label to the same place in memory if you like. A reason might be for example that a set of bytes starts out as one thing at the beginning of a process, and, in the course of the process, gets transformed into something else, justifying having another name (although you might want to put a reminder in the comments that they take the same space). Another reason might be that you're short on variable space in RAM and there are particular variables that are never needed at the same time, so they can share the same space because the part of the program that uses one of the variables is always completely done with it before the part that uses the other variable runs again, and vice-versa. Another reason might be that the same place in the code is the top of two different loops, one nested within the other, and you want to label them to make that fact clear.
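One way to give the same bytes two names, which should work in nearly any assembler, is to EQUate the second label to the first. The names and the $0010 address below are hypothetical.

```asm
        ORG  $0010              ; ZP address assumed for illustration.

RAW_READING:  DFS  2            ; Holds the raw A/D reading when the conversion routine starts...
SCALED_VAL:   EQU  RAW_READING  ; ...and the scaled result when it's done.  Same two bytes!

SCRATCH_A:    DFS  1            ; Used only by one routine.
SCRATCH_B:    EQU  SCRATCH_A    ; Used only by another routine that never runs at the
                                ; same time, so the two can share the one byte.
```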

Write your code in modules. Write basic building blocks (i.e., subroutines, and later, macros), referring to them in subsequent bigger building blocks. Make your code modular. Getting and keeping control of it as the project size grows is something that needs to start at the beginning. If you're sloppy at the beginning "just to get things going" with the idea that the computer doesn't care about neatness, or that you'll neaten it up later if necessary, it will just turn into a monster that bites you later.
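As a small sketch of building blocks stacked on building blocks, here is a character-output subroutine for the 6551, and then a slightly bigger routine built from it. The routine names are made up, and this assumes a 6551 whose transmitter-empty status bit behaves as the data sheet describes.

```asm
ACIA_DATA:  EQU  $9000          ; Same 6551 addresses as in the equates earlier.
ACIA_STAT:  EQU  $9001

PUT_CHAR:   PHA                 ; Basic block:  send the character in A out the serial port.
PC_WAIT:    LDA  ACIA_STAT      ; Read the status register,
            AND  #00010000B     ; and look at bit 4, "transmitter data register empty."
            BEQ  PC_WAIT        ; Loop until the 6551 is ready for another byte.
            PLA                 ; Get the character back,
            STA  ACIA_DATA      ; and send it.
            RTS

PUT_CRLF:   LDA  #$0D           ; Bigger block, built from the one above:
            JSR  PUT_CHAR       ; send a carriage return,
            LDA  #$0A
            JSR  PUT_CHAR       ; then a line feed.
            RTS
```

A string-printing routine would in turn call PUT_CHAR and PUT_CRLF, and nothing above that level would need to know the 6551 exists.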

So far we've been talking about code only going into ROM. That will be the only possibility with many microcontrollers, but not with the 6502 (or most any other non-microcontroller computer for that matter). For example, you might get a monitor program going in ROM, and then it can be used to load subsequently assembled code from the PC over a serial port into its RAM to run there, partly in order to try programs without having to program a ROM for every iteration.

If you get to the point of having an assembler on the computer you made, you won't need the one that runs on the PC as much anymore. I still use the PC for its hi-res monitor and full keyboard and disc drives; but it just sends source code text to my workbench computer which takes it and compiles or assembles or interprets, as appropriate, on the fly. The resulting executable code from its own compilation and assembly gets put in RAM of course, not ROM, since it can't program its own ROM.

Until you have the basics working in ROM, there's usually the write-assemble-program-test cycle. You

write code,
assemble it,
program an EPROM or EEPROM,
and test the program on your home-made computer.

You observe problems, and go back to the top of the cycle to make fixes in the code, re-assemble, etc. This may seem awfully tedious; but after you have enough running in ROM, you can in many situations move the development to RAM and get much better interactiveness.

Simulators exist, as the name says, to simulate the behavior of the program in the PC so you don't have to keep going through the ROM-programming cycle. They also let you single-cycle through instructions and look at the status and the contents of each register, which can be useful when you are not totally familiar with the instruction set yet. They are software only. Simulators are not very good at simulating non-human I/O, and for the applications I do which are 99% non-human I/O, the simulator is too slow to be of any value.

A real emulator is one computer that takes the place of another; so in contrast to a simulator, the emulator includes the associated hardware to connect to, taking the place of whatever it is emulating, acting like the real thing. In the case of most calculators, there might be no real difference between a simulator and an emulator, because they have no I/O other than the keyboard and display. That might also be the case even with a pure game machine. A slightly better example of the difference would be a Commodore 64 computer, where a real emulator would not only run C64 software, but you could also plug real C64 cartridges (not just their software images) and Commodore accessories into it too, to test them, because it would have the same connectors, with the same pinouts, voltages, and signal timings and protocols as the real thing. The concept of emulating my workbench computer emphasizes the point all the more, because it has almost no human I/O, but nearly a hundred bits of I/O, plus analog-to-digital and digital-to-analog converters, RS-232 line drivers & receivers, and more, all for controlling experiments and processes on the workbench, and taking data.

An in-circuit emulator (ICE) has a pod that plugs into the processor socket in your board to interface to the actual I/O on the board itself, showing you on the screen what is going on inside the processor. It can help debug hardware, not just software.

A ROM emulator plugs into your ROM socket and has a cable going to the host computer and it lets you change the program without programming an actual ROM every time you want to try a program modification or addition. Your board will think it is ROM, but your data is actually in RAM.

Microprocessor emulators are pieces of equipment that can be very expensive (particularly ICEs), whereas there are gobs of simulators available for free download.

Unfortunately those who are only focused on things like video games and office applications keep confusing and abusing the term "emulator." Repeated abuse doesn't make it correct, in spite of how hostile and insistent they may become.

If you want to see exactly what code was produced by the assembler, look at the .lst (list) file. This is another text file that duplicates your source code but along the left side it will add the addresses and what bytes were laid down there, and, if you have conditional assembly or macros, those will be expanded out, and you can see exactly what it did with them.

Debugging in the cycle above will be addressed in the next section, "Debugging". It's not difficult. You just divide and conquer.


last updated Feb 12, 2025