Why Your Amstrad CPC Might Crash in 2104!

Recently I was reading the source code of the firmware routine which updates the system TIME in the Amstrad CPC. This code contains a bug which will eventually crash your machine! Read on for the full gory details!

Updating the Timer

The TIME counter is a 32-bit integer (four bytes) which is updated 300 times per second via an interrupt (generated by the gate array). This is not a real-time clock (in hours, minutes and seconds) and is not battery-backed or persistent in any way when the machine is switched off. Instead it is zeroed at boot time and counts the number of ‘ticks’ since switch-on[1].

The code initially piqued my interest because it shows a couple of quirks in the Z80 instruction set. Here is the code in question:

update_TIME:
        ld      hl,TIME_         ;Address of low byte of counter
_update_time_1:
        inc     (hl)             ;Increment counter byte
        inc     hl               ;Next byte of counter
        jr      z,_update_time_1 ;Loop if overflow

Firstly, 16-bit INC (and DEC) instructions don’t affect any of the CPU flags[2], so the JR Z is operating on a flag set by the 8-bit INC (HL).

Secondly, operating on multi-byte integers would normally entail using the carry flag to determine whether the next most significant byte should be operated on, i.e. the carry flag indicates whether the addition ‘carries over’ into the next byte. But the 8-bit INC (and DEC) instructions don’t affect the carry flag! Instead the programmers have used the (Z)ero flag. When the least significant byte ‘wraps around’ from $ff to $00 the Z flag will be set, the JR will loop and the next byte will be incremented.

Overflow!

And this is what causes the 2104 bug: there is nothing in the code to stop the counter from continuing through the following bytes in the system variables area and wreaking destruction!
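To make the runaway behaviour concrete, here is a rough Python model of the loop. It is only a sketch: the function name `update_time` and the six-byte memory slice are my own inventions for illustration, not anything from the firmware.

```python
def update_time(mem, addr):
    """Model of the firmware loop: INC (HL) / INC HL / JR Z."""
    while True:
        mem[addr] = (mem[addr] + 1) & 0xFF  # INC (HL): 8-bit increment, wraps $ff -> $00
        zero = mem[addr] == 0               # Z flag as left by INC (HL)
        addr += 1                           # INC HL: 16-bit increment, flags untouched
        if not zero:                        # JR Z only loops when the byte wrapped
            return

# Hypothetical slice of RAM: four counter bytes followed by two unrelated bytes.
mem = bytearray([0xFF, 0x00, 0x00, 0x00, 0x00, 0x55])
update_time(mem, 0)
print(mem.hex())  # 000100000055 (ordinary ripple into the second byte)

mem = bytearray([0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x55])
update_time(mem, 0)
print(mem.hex())  # 000000000155 (the increment spills past the four counter bytes)
```

Nothing in the loop knows the counter is four bytes long; it simply stops at the first byte that does not wrap to zero.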

How long will this take? The counter is four bytes and updates 300 times per second:

  • Four bytes = 2^32 ticks
  • = 4,294,967,296 ticks (roughly 4.3 billion)
  • = 14,316,557 seconds
  • = 238,609 minutes
  • = 3976 hours
  • = 165.7 days
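The figures above are easy to check. This is a small sketch of the arithmetic, assuming exactly 300 ticks per second with none missed; the helper name `wrap_time` is mine.

```python
TICKS_PER_SECOND = 300  # interrupt rate quoted above

def wrap_time(counter_bits):
    """Return (ticks, seconds, minutes, hours, days) for a counter to wrap."""
    ticks = 2 ** counter_bits
    seconds = ticks / TICKS_PER_SECOND
    return ticks, seconds, seconds / 60, seconds / 3600, seconds / 86400

ticks, seconds, minutes, hours, days = wrap_time(32)
print(f"{ticks:,} ticks")           # 4,294,967,296 ticks
print(f"{int(seconds):,} seconds")  # 14,316,557 seconds
print(f"{int(minutes):,} minutes")  # 238,609 minutes
print(f"{int(hours):,} hours")      # 3,976 hours
print(f"{days:.1f} days")           # 165.7 days
```

The whole-number figures are truncated rather than rounded, which matches the list above.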

What Gets Trashed?

So I looked to see what system data was going to be trashed in six months. Thankfully those clever Amstrad engineers already thought about this. The TIME data starts at address &B8B4 (on a ‘6128) so the data to be ‘trashed’ is at &B8B8. The Amstrad firmware zeroes that data whenever the TIME value is set (as it is at system startup):

KL_TIME_SET:
        di                   ;Disable interrupts
        xor     a            ;A=0
        ld      (RAM_b8b8),a ;'Guard byte' address
        ld      ($b8b6),de   ;Write high word of new value
        ld      (TIME_),hl   ;Write low word of new value
        ei                   ;Enable interrupts
        ret

So memory address &B8B8 is a ‘guard byte’ which doesn’t contain anything of value[3].
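For anyone who finds the assembly hard to follow, here is a loose Python stand-in for the routine. It assumes a flat 64K memory image (a simplification of the real machine), and the constant names are mine; the addresses are the ones quoted above for the 6128. The Z80 stores 16-bit registers low byte first, hence the little-endian writes.

```python
TIME_ADDR = 0xB8B4   # low byte of the counter (6128 address quoted above)
GUARD_ADDR = 0xB8B8  # byte immediately after the four counter bytes

def kl_time_set(mem, de, hl):
    """Python stand-in for KL_TIME_SET; DE = high word, HL = low word."""
    mem[GUARD_ADDR] = 0                                          # xor a / ld (RAM_b8b8),a
    mem[TIME_ADDR + 2:TIME_ADDR + 4] = de.to_bytes(2, "little")  # ld ($b8b6),de
    mem[TIME_ADDR:TIME_ADDR + 2] = hl.to_bytes(2, "little")      # ld (TIME_),hl

mem = bytearray(0x10000)  # hypothetical flat 64K memory image
mem[GUARD_ADDR] = 0x7F    # pretend the guard byte holds stale data
kl_time_set(mem, 0xFFFF, 0xFFFE)
print(mem[TIME_ADDR:GUARD_ADDR + 1].hex())  # feffffff00
```

Whatever value is written to the counter, the guard byte ends up as zero.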

The Next Byte…

Except, of course, that even this byte will overflow at some point and the data at &B8B9 will be the one to get trashed. And what is at &B8B9? This appears to be the data block for the system’s frame flyback event handler:

;API routine to add a frame flyback event handler
KL_ADD_FRAME_FLY:
        ld      de,RAM_b8b9
        jp      add_event_to_an_event_list

Without digging deeply into the firmware event routines I’m not entirely sure what will happen when this gets corrupted. I’m expecting that the chain of event handler data blocks will be corrupted, the system will read what it expects to be event handler addresses from ‘random’ memory addresses and the machine will crash when those event handlers are called.

And when will this happen? That extra byte effectively extends the 32-bit counter to 40 bits. Multiply the original 165.7 days by 256 and you get 42,419 days, or 116 years.
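Here is the same sum as a quick sketch. It assumes the counter and guard byte both start at zero, that no ticks are ever missed, and an average year of 365.25 days; the helper name is hypothetical.

```python
TICKS_PER_SECOND = 300         # interrupt rate quoted in the article
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365.25         # assumption: average year length

def ticks_until_first_write(offset):
    """Ticks from an all-zero start until the byte at `offset` is first incremented."""
    return 256 ** offset

# Offset 5 from &B8B4 is &B8B9, the byte after the guard byte.
ticks = ticks_until_first_write(5)
days = ticks / TICKS_PER_SECOND / SECONDS_PER_DAY
years = days / DAYS_PER_YEAR
print(f"{ticks:,} ticks = {days:,.0f} days = {years:.1f} years")
# 1,099,511,627,776 ticks = 42,419 days = 116.1 years
```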

So, if you switched your computer on in 1988 when you first received it and have kept it switched on ever since, you can expect your machine to crash 116 years later in 2104!

Is It Really a Bug?

Is this really a bug? Clearly no-one expects any such machine to be left on continuously for over 100 years. And I doubt anyone even expected these machines to be in regular use almost forty years later, let alone at nearly three times that span. So I’d think of this more as a case of saving a couple of bytes of code and a few microseconds of execution time by making a very reasonable assumption about real-world usage. But that’s not nearly as fun as thinking of the ticking time-bomb waiting to destroy your machine, is it?

Footnotes

  1. Ticks will be missed when interrupts are disabled for any extended period, as they are when reading from or writing to tape.
  2. The Z80 uses the address counter (as opposed to the ALU) for these instructions, the same counter used to increment or decrement the program counter, the stack pointer, and the (R)efresh register. The address counter can increment (or decrement) a 16-bit address in a single clock cycle, whereas the ALU would require four clock cycles to do the same (plus the cycles required to move registers to the counter/ALU and return the result(s)).
  3. And the fact it is cleared every time TIME is set ruins any thoughts I had of setting TIME to an unreasonably high value and waiting for the chaos to ensue.