Each line of the file is a record and starts with a : (colon). Lines not starting with a colon should be ignored by the loader. That makes it sound as if a hex file could contain embedded comments. I have never seen any, though, or had any reason to add them, so when I wrote my PIC programmer software, I made a line not starting with a colon stop the loading and give an error message. Intel Hex records contain no spaces (ASCII $20).
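Here is a minimal Python sketch of the strict policy just described. A line that does not start with a colon is treated as an error instead of being skipped. The function name `check_record_line` is hypothetical, and tolerating either DOS or Unix line endings is an assumption.

```python
def check_record_line(line: str, line_number: int) -> str:
    """Return the record text after the colon, or raise ValueError.

    Strict policy: a line that does not start with ':' stops the load
    instead of being silently ignored.
    """
    line = line.rstrip("\r\n")  # assumption: accept CR/LF or LF endings
    if not line.startswith(":"):
        raise ValueError(f"line {line_number}: record does not start with ':'")
    if " " in line:
        raise ValueError(f"line {line_number}: Intel Hex records contain no spaces")
    return line[1:]
```

A loader that follows the "ignore it" rule would simply `continue` to the next line in place of the first `raise`.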
On each line, the first pair of digits following the colon tells how many data bytes are in the line, i.e., how many pairs of hex digits are actual data. The other bytes (length, address, record type, and checksum) are not included in this count; only data bytes are. A line with actual data has a count of at least 01, and the count is usually 10 or 20 (hex, so 16 or 32 in decimal). Nothing says all data lines must be the same length. Most assemblers make all the data lines the same length, but they don't have to. When I wrote an assembler years ago, I made it finish each machine-language instruction before ending the line. In other words, if the line had reached $10 data bytes but the operand of the current instruction was not there yet, it would go ahead and put the operand on the same line before inserting the data byte count, calculating the two-hex-digit checksum, and finishing the line.
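Below is a sketch of the simpler fixed-length approach most assemblers take. The data is cut into type-00 records of at most `max_len` bytes, and the final line is shorter if the data runs out. This sketch does not keep instructions together the way the assembler described above does. The names `make_record` and `data_records` are hypothetical, and the sketch assumes all data stays within one 64K address range.

```python
def make_record(record_type: int, address: int, data: bytes) -> str:
    """Build one Intel Hex record line (without the line ending)."""
    # Length, address high byte, address low byte, record type, then data.
    body = bytes([len(data), (address >> 8) & 0xFF, address & 0xFF, record_type]) + data
    checksum = (-sum(body)) & 0xFF  # makes the byte sum come out to 00
    return ":" + (body + bytes([checksum])).hex().upper()


def data_records(start_address: int, data: bytes, max_len: int = 0x10):
    """Yield type-00 records of at most max_len data bytes each."""
    for offset in range(0, len(data), max_len):
        chunk = data[offset:offset + max_len]
        yield make_record(0x00, start_address + offset, chunk)
```

For example, 17 bytes with the default `max_len` of $10 give one $10-byte line followed by a 1-byte line.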
The next four hex digits are the address where the first data byte on the line will be put in the EPROM. This address is not expressed low-byte-first: address $1234 is put on the line as 1234, not 3412. If the line contains no actual data for the EPROM, these four digits are 0000. There is one exception to using the address as-is: an offset could be added to it, resulting in a different final address. The offset is typically not used in small systems with only a 16-bit address bus, though. More on this in a minute.
Next are two digits telling what kind of record the line is.
00 = data to program into the ROM
01 = end of file
02 = extended segment address record, i.e., an address offset (typically unused in small systems with a 16-bit address bus)
03 = start segment address record (used with x86 processors)
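The header fields described so far can be pulled out of the record text (everything after the colon) as in the Python sketch below. The names `RecordType` and `parse_header` are hypothetical. The enum covers only the four record types listed above, so any other type value raises ValueError.

```python
from enum import IntEnum


class RecordType(IntEnum):
    DATA = 0x00
    END_OF_FILE = 0x01
    EXTENDED_SEGMENT_ADDRESS = 0x02  # the address-offset record
    START_SEGMENT_ADDRESS = 0x03     # used with x86 processors


def parse_header(record: str):
    """Return (data byte count, address, record type) from the text after ':'."""
    count = int(record[0:2], 16)
    address = int(record[2:6], 16)  # high byte first: "1234" means $1234
    rtype = RecordType(int(record[6:8], 16))  # unknown types raise ValueError
    return count, address, rtype
```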
Next are the actual data bytes (or, in a type-02 line, the four-digit address offset). Actual data can be anywhere from 1 byte (two hex digits in the data field) up to $FF bytes (510 hex digits in the data field), since the count is a single byte. The most common lengths are $10 or $20 bytes (32 or 64 hex digits in the data field, respectively).
Last on each line is the checksum byte. If you add up all the bytes after the ":", including the checksum byte, the last two hex digits of the resulting sum should be 00.
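The checksum rule just stated can be computed and verified in a few lines of Python. The function names are hypothetical. The calculation takes $100 minus the byte sum MOD $100, then applies MOD $100 once more so that a sum that is already a multiple of $100 gives 00 instead of $100.

```python
def record_checksum(record: str) -> int:
    """Checksum for the text after ':', ignoring its last two digits."""
    body = bytes.fromhex(record[:-2])
    return (0x100 - sum(body) % 0x100) % 0x100


def checksum_ok(line: str) -> bool:
    """True if all bytes after ':' (checksum included) sum to 00 modulo $100."""
    return sum(bytes.fromhex(line[1:])) % 0x100 == 0
```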
Now some examples:
End-of-file line (the last record of the file):
:00000001FF
Length is 00 (no data bytes).
Address is 0000 (since the address doesn't matter here).
Record type is 01 (end of file).
The checksum is $FF, since 00+00+00+01+$FF = $100, whose last two digits are 00.
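You can calculate it from $100-MOD(00+00+00+01,$100). (If the sum is already a multiple of $100, this gives $100, so take the result MOD $100 again to get a checksum of 00.)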
Short data line:
:020000001E28B8
(The data bytes are the four digits 1E28, between the record type 00 and the checksum B8.)
Length is 02 (only two data bytes, $1E and $28).
The $1E goes at address 0000, and the $28 goes at address 0001.
The record type is 00 (data).
The checksum is $B8, since 02+00+00+00+$1E+$28+$B8 = $100, whose last two digits are 00.
You can calculate it from $100-MOD(02+00+00+00+$1E+$28,$100).
Longer data line:
:100260000F08FA3C0318DA280800931B0800FF3037
(The data bytes are the 32 digits 0F08FA3C0318DA280800931B0800FF30, between the record type 00 and the checksum 37.)
This has $10 data bytes, starting with $0F and ending with $30.
The first data byte, $0F, gets put at address $0260.
Record type is again 00 (data), and the checksum is $37.
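To tie the fields together, here is a Python sketch that decodes one type-00 line into a map from address to byte. It checks the count and checksum along the way. The name `decode_data_record` is hypothetical, and the sketch ignores any type-02 offset, so the addresses are the plain 16-bit ones from the line.

```python
def decode_data_record(line: str) -> dict:
    """Map each data byte of a type-00 record to its 16-bit address."""
    raw = bytes.fromhex(line[1:])       # every field after the colon
    count, addr_hi, addr_lo, rtype = raw[:4]
    data = raw[4:-1]                    # everything between header and checksum
    if rtype != 0x00:
        raise ValueError("not a data record")
    if len(data) != count:
        raise ValueError("byte count does not match data field")
    if sum(raw) & 0xFF:                 # all bytes incl. checksum must sum to 00
        raise ValueError("bad checksum")
    start = (addr_hi << 8) | addr_lo    # address is high byte first
    return {start + i: b for i, b in enumerate(data)}
```

For the longer line, the result maps $0260 to $0F and $026F to $30.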
As you can see, the address field limits the number of possible addresses to 64K. The type-02 record extends this out to 1M addresses. The offset given in this kind of record is only four hex digits, but it actually stands for a five-digit address whose last digit is forced to be 0. This offset is added to the address field of each subsequent record.
You would use this record type to give bank numbers for data for the 65816 processor. The lines that give the bank number read :02000002xxxxcs, where xxxx is the offset and cs is the checksum. For example, to go into bank 1, the offset is $1:0000, so you put 1000, dropping the last 0. An offset record line to put subsequent data into bank 7 (addresses $7:0000-$7:FFFF) would look like this:
:0200000270008C
(Here the data field is 7000, the bank-7 offset $7:0000 with its last 0 dropped.)
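The bank arithmetic above can be sketched in Python. One function builds the type-02 line for a 64K bank, and the other computes the final address of a record's first byte by appending a 0 digit to the offset (shifting it left by 4 bits) and adding the record's address field. The names are hypothetical. The sketch does not model any wraparound within a 64K segment.

```python
def segment_record(bank: int) -> str:
    """Type-02 line that puts subsequent data into 64K bank `bank`."""
    if not 0 <= bank <= 0xF:
        raise ValueError("type-02 offsets only reach banks 0 to $F")
    offset = bank << 12  # e.g. bank 7 -> $7000 ($7:0000 with the last 0 dropped)
    body = bytes([0x02, 0x00, 0x00, 0x02, offset >> 8, offset & 0xFF])
    checksum = (-sum(body)) & 0xFF
    return ":" + (body + bytes([checksum])).hex().upper()


def effective_address(offset: int, record_address: int) -> int:
    """Final address of a record's first byte under a type-02 offset."""
    return (offset << 4) + record_address
```

The largest reachable address, `effective_address(0xFFFF, 0xFFFF)`, is $10FFEF, which is why the type-02 record gets you only to just over 1M.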
This still only gets you to just over 1M addresses. The 65816 can address 16M, but it's highly unlikely you'd want that much EPROM; the largest byte-wide EPROMs I know of are 2Mx8. Intel 32 (the 32-bit variant of Intel Hex) does address more memory but will not be discussed here.
Most commercially available programmers allow various combinations of splits and so on. For example, you can use four 128Kx8 EPROMs to get 128Kx32. Or, possibly for older equipment, you can use four 8Kx8 EPROMs to get 32Kx8, where each EPROM's address 0 is at a different address as far as the target computer is concerned.
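The two kinds of split just mentioned can be sketched in Python on a plain binary image. The function names are hypothetical. The byte-lane order is an assumption: lane 0 is taken as the lowest-addressed byte of each 32-bit word, but which EPROM that actually is depends on how the board is wired.

```python
def split_byte_lanes(image: bytes, width: int = 4) -> list:
    """Split an image for a `width`-byte-wide bus into one image per EPROM.

    EPROM k gets bytes k, k+width, k+2*width, ...; e.g. four x8 EPROMs
    side by side make a x32 bus. (Lane order is an assumption.)
    """
    return [image[lane::width] for lane in range(width)]


def split_consecutive(image: bytes, chip_size: int) -> list:
    """Split an image into consecutive chip_size pieces, one per EPROM.

    E.g. four 8Kx8 parts covering 32K of address space.
    """
    return [image[i:i + chip_size] for i in range(0, len(image), chip_size)]
```

Each resulting piece starts at that EPROM's own address 0, so it could then be written out as a separate hex file.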
SRecord 1.65 for Linux and Windows lets you transform and convert EPROM file types, concatenate, split, etc.
last updated Oct 10, 2023 contact: Garth Wilson, wilsonminesBdslextremeBcom
(replacing the B's with @ and .)