This is the second of my development logs as I work to create a replacement for the Amstrad CPC gate array based around a Raspberry Pi RP2350 microcontroller.
In the previous article I divided the gate array’s input and output signals into a number of blocks depending on their function. In this article I implement the FSIGS signals. These are signals which have the same timings on every gate array cycle (i.e. they have fixed timings).
My design uses an array of integers in memory, with each entry corresponding to one step in the gate array cycle and with one bit of each entry corresponding to a specific gate array signal. I will then use a DMA (direct memory access) to copy each array entry in turn to a PIO. The PIO will in turn copy those data values to the relevant GPIO pins. The PIO will update the pin states at the correct time interval. The DMA will be paced by the PIO such that it will only send it data when the PIO is ready to receive it, thereby avoiding an ‘overflow’ of data.
The PIO Program
There are six signals in the FSIGS ‘block’. I have assigned these to six contiguous GPIO pins of the RP2350 starting at GPIO 0. This enables the PIO program to be trivially simple.
.program fsigs
pull block ;Get data
out pins,6 [3] ;Copy to output pins (there are six of them)
;Wait for 3 cycles, so each loop takes 5 cycles
The pull instruction reads an entry from the TX FIFO (transmit FIFO – data sent into the PIO to be ‘transmitted’ out via pins) into the OSR (Output Shift Register). The block specifier tells the PIO to wait until data is available in the FIFO. Using block in this way ensures the PIO will pause at startup until we have the DMA configured and running to send it data. It also allows us to easily pause the host computer in a known state simply by stopping the DMA which feeds it. This enables interesting possibilities to reconfigure the RP2350 ‘on the fly’ to different signal timings.
The out instruction moves data from the OSR to one of a number of destinations. In this case it’s moving six bits of data to six contiguous GPIO pins (see below for how the pins are configured). To output data at the correct intervals to match the CPC’s gate array the PIO needs to output data every fifth clock cycle (I’ll discuss system clock frequency below). Each instruction in a PIO executes in a single cycle. The [3] after the instruction tells the PIO to pause for three clock cycles after executing the instruction, thus bringing the total execution time for the loop to five cycles.
The PIO program loops automatically back to the first instruction once the end of the program has been reached (this can be changed if required) so this program will run eternally.
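The cycle count can be checked with a little arithmetic. The following is a host-side sketch in plain C rather than PIO or Pico SDK code. The names pio_loop_cycles and fsigs_delays are hypothetical, and the model assumes the TX FIFO is never empty, so that pull block never stalls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: total cycles for one pass through a PIO program,
 * given the [delay] value attached to each instruction.
 * Each instruction costs one cycle plus its delay. */
unsigned pio_loop_cycles(const unsigned *delays, size_t count)
{
    unsigned total = 0;
    for (size_t i = 0; i < count; i++) {
        assert(delays[i] <= 31);  /* the delay field is at most five bits */
        total += 1u + delays[i];
    }
    return total;
}

/* Delays for the fsigs program: "pull block" has none, "out pins,6" has [3]. */
static const unsigned fsigs_delays[2] = { 0, 3 };
```

Calling pio_loop_cycles(fsigs_delays, 2) adds up 1 + (1 + 3) cycles, which is the five-cycle loop described above.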
PIO Initialisation Function
The PIO also needs a small initialisation function to set the configuration.
void fsigs_program_init(PIO pio, uint sm, uint offset, uint first_pin) {
    for (int i=0;i<6;i++) {
        pio_gpio_init(pio, first_pin+i);
    }
    pio_sm_set_consecutive_pindirs(pio, sm, first_pin, 6, true);
    pio_sm_config c = fsigs_program_get_default_config(offset);
    sm_config_set_out_pins(&c, first_pin, 6);
    pio_sm_init(pio, sm, offset, &c);
}
The loop ‘claims’ the six pins which will be used by the PIO and initialises them. The next line sets the pin directions to output.
The code then initialises a default PIO configuration, changes that configuration as required and initialises the PIO SM (state machine).
The only configuration option I need to change here is to specify the pins to be used by the out instruction, by setting the first GPIO index (first_pin) and the number of (contiguous) pins to use (6).
PIO Creation
I then move to the core code to create and enable the PIO. I first define a couple of constants to specify the PIO and the state machine within that PIO which will run the program (each PIO has four state machines, so can run up to four programs concurrently), and the index of the first GPIO to use.
#define SIGS_PIO pio0
#define FSIGS_SM 0
#define FSIGS_FIRST_PIN 0
And then some code to load the PIO program and set it running.
uint offset = pio_add_program(SIGS_PIO, &fsigs_program);
printf("FSIGS program loaded at %d\n", offset);
fsigs_program_init(SIGS_PIO, FSIGS_SM, offset, FSIGS_FIRST_PIN);
pio_sm_set_enabled(SIGS_PIO, FSIGS_SM, true);
FSIGS Data
That completes the PIO program itself. Now for the table of data to feed to it.
A full Amstrad gate array cycle takes 16 cycles of the 16MHz oscillator to execute, with signals transitioning on both the rising and falling edges of that oscillator. I therefore need to use a 32-step cycle to emulate that.
I created the data in the array below after analysing an original 40007 gate array and aligning signal transitions to the nearest clock edge. The bits in each array entry correspond to, from MSB to LSB: READY, /RAS, /CASAD, /CPU, /CCLK, and PHI.
The first two numbers in each comment specify the index of PHI clock at that point, and the index of steps within each PHI clock cycle. This was very helpful when adding the signal data.
The remaining comments detail any signal transitions in that data point. (The data shown here incorporates a couple of tweaks which I deemed necessary when checking the oscilloscope trace at the end of this article).
const int8_t fsigs40007[FSIGS_COUNT] = {
0b010010, //4:7 CPU low, RAS high
0b011010, //4:8 CASAD high
0b011001, //1:1 CCLK low
0b101001, //1:2 READY high, RAS low
0b101001, //1:3
0b100001, //1:4 CASAD low
0b100000, //1:5
0b100000, //1:6
0b100000, //1:7
0b100000, //1:8
0b100001, //2:1
0b100001, //2:2
0b010101, //2:3 READY low, RAS high, CPU high
0b011101, //2:4 CASAD high
0b011100, //2:5
0b011100, //2:6
0b001100, //2:7 RAS low
0b000100, //2:8 CASAD low
0b000101, //3:1
0b000101, //3:2
0b000101, //3:3
0b000101, //3:4
0b000100, //3:5
0b000100, //3:6
0b000110, //3:7 CCLK high
0b000110, //3:8
0b000111, //4:1
0b000111, //4:2
0b000111, //4:3
0b000111, //4:4
0b000110, //4:5
0b000110, //4:6
};
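To make the bit layout and the comment labels easier to follow, here is a small host-side sketch in plain C. The enum and the helpers fsig_level and fsig_label are hypothetical names of my own, not part of the project code. The bit order is the one given above (READY as the most significant of the six bits, PHI as the least), and the label arithmetic assumes the table starts at position 4:7 as shown.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions within one table entry, counting from the LSB. */
enum fsig_bit {
    FSIG_PHI   = 0,
    FSIG_CCLK  = 1,  /* /CCLK  */
    FSIG_CPU   = 2,  /* /CPU   */
    FSIG_CASAD = 3,  /* /CASAD */
    FSIG_RAS   = 4,  /* /RAS   */
    FSIG_READY = 5
};

/* Return the level (0 or 1) of one signal in a table entry. */
int fsig_level(uint8_t entry, enum fsig_bit bit)
{
    assert(entry < 64);  /* only six bits are used */
    return (entry >> bit) & 1;
}

/* Convert an array index (0..31) to the 1-based "PHI index : step"
 * label used in the table comments. The table starts at 4:7, which is
 * 30 steps into the 32-step cycle. */
void fsig_label(unsigned index, unsigned *phi, unsigned *step)
{
    assert(index < 32);
    unsigned pos = (index + 30u) % 32u;  /* pos 0 corresponds to 1:1 */
    *phi  = pos / 8u + 1u;
    *step = pos % 8u + 1u;
}
```

For example, the first entry (0b010010, or 0x12) has /RAS high and /CPU low, and fsig_label(0, ...) gives 4:7, matching its comment.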
There are actually two different original Amstrad gate arrays, each running the same design but with different pinouts. To allow my hardware to use different pinouts itself (to make PCB routing easier) I defined a second array into which the above data is copied at startup. (This would also be helpful if using alternative timings, which would be possible, for example, if using different hardware or even to optimise original hardware for higher speeds).
int8_t fsigs[FSIGS_COUNT] __aligned(FSIGS_COUNT);
This declaration also specifies a memory alignment which will be required by the DMA (see below).
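One way the startup copy could handle a different pinout is to move each bit to a new position while copying. The sketch below is an illustration only and not the code used in this project: fsigs_remap and the pin_map table are hypothetical, and it uses uint8_t rather than int8_t to keep the bit operations unsigned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FSIG_BITS 6

/* Hypothetical: copy a timing table while moving each source bit
 * (the index into pin_map) to a board-specific output bit
 * (the value stored in pin_map). */
void fsigs_remap(uint8_t *dst, const uint8_t *src, size_t count,
                 const uint8_t pin_map[FSIG_BITS])
{
    for (size_t i = 0; i < count; i++) {
        uint8_t out = 0;
        for (unsigned bit = 0; bit < FSIG_BITS; bit++) {
            assert(pin_map[bit] < FSIG_BITS);
            if (src[i] & (1u << bit))
                out |= (uint8_t)(1u << pin_map[bit]);
        }
        dst[i] = out;
    }
}
```

With an identity map of {0,1,2,3,4,5} the copy is unchanged. A map of {1,0,2,3,4,5} would swap the two lowest signals, for a board where those two pins are routed the other way round.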
At this point I wrote some temporary code to feed each array element to the PIO at a frequency of 1Hz. This allowed me to use a voltmeter to test that the PIO was correctly driving each pin.
The DMA
I then turned to creating the DMA which will copy values from the fsigs array to the TX FIFO of the PIO. The DMA will be set to eternally loop over the data in fsigs, copying one byte per step to the PIO.
The code for this begins by finding an unused DMA channel and then configuring a few settings.
// Get a free channel, panic() if there are none
fsigs_dma_chan = dma_claim_unused_channel(true);
dma_channel_config c = dma_channel_get_default_config(fsigs_dma_chan);
//8-bit transfers
channel_config_set_transfer_data_size(&c, DMA_SIZE_8);
//Inc read address (data array)
channel_config_set_read_increment(&c, true);
//Don't inc write address (PIO FIFO)
channel_config_set_write_increment(&c, false);
//Wrap the read address every 32 bytes (2^5)
channel_config_set_ring(&c, false, 5);
//Pace to PIO DREQ
channel_config_set_dreq(&c, FSIGS_DREQ);
dma_channel_configure(
fsigs_dma_chan, // Channel to be configured
&c, // The configuration we just created
&SIGS_PIO->txf[FSIGS_SM], // Write address (PIO FIFO)
&fsigs[0], // The initial read address - fsigs array
-1, //Run endlessly
true // Start immediately.
);
The items being configured in the above code are:
- A DMA can copy 1, 2 or 4 bytes per step. There are only six signals to drive which obviously fits within a single byte, therefore I have defined the arrays above to use one byte per entry (int8_t type) and the DMA will need to be configured to copy one byte per step.
- The DMA is reading from an array in memory, so it needs to increment the read address after each step.
- The DMA is writing to the TX FIFO of the PIO. As with all hardware on the RP2350 the I/O address is memory mapped. Thus I need the DMA to write to a fixed address and need to configure it to not increment (or indeed decrement) that address.
- We need the DMA to constantly loop over the fsigs array when reading data. A DMA can easily be configured to loop if the number of entries is a power of two. Thankfully our array contains 32 entries. The channel_config_set_ring function configures the read address (second parameter, false) to only increment the bottom five bits (third parameter). Note that this also requires the data to be aligned in memory, which is why I declared the fsigs array with the __aligned(32) attribute (FSIGS_COUNT is 32).
- As mentioned above we need the DMA to be paced by the PIO. In other words the DMA must only send data to the PIO when the PIO can accept data. To enable this the PIO has a DREQ (data request) flag which it sets whenever its TX FIFO can accept data. The DMA is configured with the appropriate DREQ flag for the TX FIFO of SM (state machine) zero of PIO zero. Here FSIGS_DREQ is a constant assigned elsewhere in the code to the value DREQ_PIO0_TX0.
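The effect of the ring setting, and the reason for the alignment requirement, can be modelled on a host machine. This is a sketch of the address arithmetic only, not of the DMA hardware itself. The function ring_next is a hypothetical name and the addresses in the example are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Model of an address that increments by `step` bytes but wraps within
 * a window of 2^ring_bits bytes: only the low ring_bits bits change,
 * the upper bits stay fixed. */
uint32_t ring_next(uint32_t addr, unsigned ring_bits, uint32_t step)
{
    assert(ring_bits > 0 && ring_bits < 32);
    uint32_t mask = (1u << ring_bits) - 1u;
    return (addr & ~mask) | ((addr + step) & mask);
}
```

With ring_bits set to 5, an array at the 32-byte aligned address 0x20000000 wraps from its last byte (0x2000001F) back to 0x20000000. If the array instead started at 0x20000010, the address would still wrap to 0x20000000 after 16 bytes, which is before the start of the array. Hence the alignment.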
The last function call in this code configures:
- The address to write to (the PIO’s TX FIFO – note that we need to use the address of the FIFO).
- The initial read address (the first element of the fsigs array – again we need to be careful to give the address rather than the value).
- The number of bytes to copy. The TRANSFER_COUNT value on the RP2350 is slightly different to that on the RP2040. The RP2350 uses the four highest bits to specify a count mode. A count mode of $f signifies ENDLESS – the transfer count never decrements and, therefore, the transfer continues indefinitely. Using a value of -1 is a simple way of setting those count mode bits.
- And triggers the DMA to start running (true).
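The transfer count encoding can be illustrated with a few lines of plain C. This is a sketch based on the description above (top four bits for the mode, the rest for the count). The macro and function names are hypothetical and are not the SDK's own definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout as described in the text: bits 31..28 select the
 * count mode, bits 27..0 hold the count. */
#define TC_MODE_SHIFT   28u
#define TC_COUNT_MASK   0x0FFFFFFFu
#define TC_MODE_ENDLESS 0xFu

/* Extract the count mode from an encoded transfer count. */
uint32_t tc_mode(uint32_t encoded)
{
    return encoded >> TC_MODE_SHIFT;
}

/* Extract the count itself. */
uint32_t tc_count(uint32_t encoded)
{
    return encoded & TC_COUNT_MASK;
}

/* Build an encoded value from a mode and a count. */
uint32_t tc_encode(uint32_t mode, uint32_t count)
{
    assert(mode <= 0xFu);
    assert(count <= TC_COUNT_MASK);
    return (mode << TC_MODE_SHIFT) | count;
}
```

Passing -1 as an unsigned 32-bit value gives 0xFFFFFFFF, so tc_mode((uint32_t)-1) is 0xF, the ENDLESS mode. A more explicit alternative would be something like tc_encode(TC_MODE_ENDLESS, 32).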
System Clock Frequency
The final step (although the first one to be executed in the final code) is to configure the system clock frequency of the RP2350.
As noted above the original Amstrad gate array runs from a 16MHz crystal oscillator and has signals transitioning on both clock edges of the oscillator, i.e. at 32MHz. To keep the code simple I want the RP2350 to be running at a frequency which is an integer multiple of that 32MHz.
The RP2350 is rated for a maximum system clock speed of 150MHz. The two closest multiples of 32MHz to that are 128MHz and 160MHz. Given that the previous generation RP2040 microchip copes very well with being overclocked I have chosen to run the system at the higher frequency of 160MHz. This is a little under 7% above the rated maximum speed so I’m not expecting any issues. If there are then it shouldn’t be too difficult to rework the design to use a 128MHz clock frequency.
Setting the system clock frequency entails configuring the PLL which generates it. The reference design for an RP2350 uses a 12MHz oscillator. This frequency needs to be multiplied and then divided to generate the required frequency. I leave you to read my source comments regarding the values chosen.
//Set the system clock to 160MHz
//(Crystal is 12MHz. Multiply that by 80 to give 960MHz VCO frequency,
//divide by 6 to get 160MHz sys_clk)
//This frequency means we can easily divide down to the 16MHz crystal
//frequency of the CPC, giving us 10 ticks for each tick of the CPC.
//(And 5 ticks for each state (high/low) of the CPC crystal)
//Note the slight overclocking here. The next lower even multiple of
//the CPC crystal would be 128MHz (12*64 = 768MHz / 6 = 128MHz).
set_sys_clock_pll(12*80 * 1000000, 6, 1);
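The numbers in those comments can be verified with some simple host-side arithmetic. The sketch below is plain C, not Pico SDK code, and the function names pll_output_hz and step_rate_hz are hypothetical. It assumes the PLL output is the reference frequency multiplied by the feedback divider and then divided by the two post dividers.

```c
#include <assert.h>
#include <stdint.h>

/* PLL output frequency: reference * feedback divider gives the VCO
 * frequency, which is then divided by the two post dividers. */
uint32_t pll_output_hz(uint32_t ref_hz, uint32_t fbdiv,
                       uint32_t post_div1, uint32_t post_div2)
{
    assert(post_div1 >= 1 && post_div1 <= 7);
    assert(post_div2 >= 1 && post_div2 <= 7);
    uint64_t vco_hz = (uint64_t)ref_hz * fbdiv;
    return (uint32_t)(vco_hz / (post_div1 * post_div2));
}

/* Rate at which the PIO loop updates the pins, given the system clock
 * and the number of system clock cycles per loop. */
uint32_t step_rate_hz(uint32_t sys_hz, uint32_t cycles_per_step)
{
    assert(cycles_per_step > 0);
    return sys_hz / cycles_per_step;
}
```

With a 12MHz crystal, pll_output_hz(12000000, 80, 6, 1) gives 160MHz. At five cycles per PIO loop, step_rate_hz(160000000, 5) gives the 32MHz step rate, and with eight steps per PHI cycle that works out to a 4MHz PHI clock.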
The Results
Below is an oscilloscope trace of the signals generated by the RP2350. The PHI signal is running at 4MHz and the others match the signals I previously measured on an original 40007 gate array. However, since recording the image I have slid the READY and /RAS signals forward a step to better align with those measured signals. The fsigs array code above shows the updated data.