ASM Diaries 1: Type Identifier Equals Data Length

A few years ago I reverse engineered Amstrad CPC BASIC. I'm currently writing a compiler for my Pascal-like language (called Quiche). This is an occasional series of articles where I note observations about assembly programming and, in particular, Z80 assembly language.

This is a trick I learnt while reverse engineering the Amstrad BASIC interpreter. The interpreter features an ‘execution stack’. When it encounters a FOR..NEXT loop, WHILE..WEND loop or a GOSUB it stores some data on the execution stack. This data is for things such as the line number, return address and loop variable. The size of the data which needs needs to be stored depends on the control statement and, for for loops, whether the control variable is a real or integer.

Below is the detail of the execution stack and the items stored onto it.

====EXECUTION STACK
----
&AE70	&AE8C	&1FF	execution stack. GOSUB, FOR and WHILE stack. Entries are added above any existing ones in use (mixed as encountered) at address given by &B06F and must be used up in the opposite order. Completed entries are not deleted, just overwritten by the next new entry:

Offset  Length  Usage
GOSUB entry:
0       1	    GOSUB type: &00=normal, &01=event (EVERY, AFTER), &02=ON BREAK
1       2	    address of end of program line byte or ':' after GOSUB statement (the point to RETURN to)
3       2	    address of line number of line containing GOSUB
5       1	    byte of &06, ie the length of the GOSUB entry

FOR entry:
0       2	    address of current value of control variable (in Variables area)
2       5/2	    value of limit (ie the TO value) - five bytes if control variable is a real, two if it's an integer
7/4     5/2	    value of STEP - five bytes if control variable is a real, two if it's an integer
12/6    1	    sign byte (&01 for positive; &ff for negative)
13/8    2	    address of the end of program line byte, or ':' after the FOR statement (ie the address for the NEXT loop to restart at)
15/10   2	    address of line number of line containing FOR
17/12   2	    if NEXT specifies variables then the address of the byte after the variable name in the next statement, otherwise the byte after the next statement
19/14   2	    if NEXT specifies variables then the address of the first byte of the variable name in the NEXT statement, otherwise the byte after the NEXT statement
20/15   1	    length byte (&16 for Real FORs; &10 for Integer FORs)

WHILE entry: (66 max capacity):
0       2	    address of line number LB of line containing WHILE
2       2	    address of the end of program line byte or ':' after WEND statement (ie the address to continue at when the condition is false)
4       2	    address of condition after the WHILE command
6       1	    length byte of &07 - end of WHILE entry proper but:
+5	value of condition (0 or -1 as a floating point value) only while the WHILE entry is the last on the stack

==&B06E   &B08A   1   last byte of execution stack

When BASIC enters a loop of calls a gosub if appends data on the execution stack (which grows upwards in memory). At the end of the loop, or a RETURN statement it needs to step backwards down the stack. This could be implemented using fixed size data blocks. The start of the previous block could easily be found by subtracting the block size from the list-head system variable. But, as noted above, the data size varies significantly between control structires.

The programmers chose instead to append an identifier to the end of every item on the list to specify that item’s type. The interpreter can then map from the entry type to data size and, therefore, determine where the entry starts. However a simple implementation of this at strategy would require some kind of look up table to map from block type to data size. So instead they chose to make the type identifiers equal to the byte length of the entry. Now when stepping back through the list the interpreter reads the byte at top-of-list and subtracts it from the top-of-list address to get the entry address. And it can use the same identifier to process the list entry.

Look at the above listing and you’ll see it in practice.

I’ve adopted a similar strategy in the Quiche-Z80 compiler. The compiler needs to store a list of identifiers and data items such as types, variables and functions. As with the BASIC example above the amount of data to be stored varies depending on the item (type, variable, function etc)1. The compiler needs to be able to step through this list when searching for a given identifier. The identifier list takes the form of an identifier in ASCII7 string format (ie with bit 7 of the last character set) followed by byte to identify the kind of identifier. By making this identifier equal to the length of data I remove the need for an extra byte to specify the length of the data.

Here is the current set of identifier types in Quiche-Z80. Those with a zero value are yet to be defined or coded.

itType      equ 9   ;TYPE declaration - See Types.asm
itEnumItem  equ 1   ;For enumerations - See Types.asm
itField     equ 0   ;Of a record      - See Types.asm
itConst     equ 7   ;CONST declaration
itVar       equ 0   ;VAR declaration
itFunction  equ 0   ;FUNCTION or PROCEDURE declaration
itParam     equ 0   ;Of a function definition

This kind of strategy does have some downsides too. You need to make sure every kind of thing has a differently sized data block, which might mean adding a padding byte to some of them. And it could slow down code which checks the kind of the data block. This latter point deserves an article or two of it’s own so I’ll save more discussion on this for now.

Footnotes

  1. I want to be able to say the data varies with the ‘type’ of item but, of course, the word type has a specific meaning in programming. This is one of those challenges you only appreciate when you start writing language tools. I tend to use the work ‘kind’ for such purposes but even that has a special meaning in some programming circles :sigh: