Writing a Compiler 2: Conditionals

In part one I showed how my compiler is parsing expressions and converting this into an intermediate language format suitable for optimisation and code generation. The code in part one was purely linear with no branches or loops. In this second article I begin to deal with code which has multiple execution paths. I’ll start with conditionals (i.e. IF statements) and the additions necessary in the intermediate language to be able to process them.

An IF statement takes the form:

if <test> then
  <true-code>
else 
  <false-code>

The ELSE clause is, of course, optional.

Basic Blocks

There are a few things we need to add to the intermediate language to be able to encode this. The first is Basic Blocks (which I’ll refer to as Blocks).

A Basic Block is a section of code with a single execution path. If we look at the IF statement above we have the following Basic Blocks:

The code up to and including the THEN keyword (which will probably include statements preceding the IF statement to the preceding branching operation).
The <true-code> section.
The <false-code> section.
The statement(s) after the ELSE block, where these paths merge together. Again this will probably include the following statements until the next branching operation.

Also, note that the <true-code> and <false-code> sections may consist of multiple Basic Blocks if they contain further branching operations.

Each Basic Block needs a unique identifier. I have implemented this by adding a BlockID field to the TILItem record. This is an integer value which is simply incremented for each successive Block:

type TILItem = record
    BlockID: Integer;       //Contains +ve value on first line of a block, otherwise -1
    Op: TILOperation;       //The operation

    Param1Loc: TILLocation; //Location of parameter 1
    Param1Data: Integer;    //Data for parameter 1.
    Param1Sub: Integer;     //Subscript for parameter 1 (i.e. the variable write index)

    Param2Loc: TILLocation; //Location for parameter 2
    Param2Data: Integer;    //Data for parameter 2
    Param2Sub: Integer;     //Subscript for parameter 2 (i.e. the variable write index)

    case DestType: TDestType of
      dtData: (
        DestLoc: TILLocation;
        DestData: Integer;      //Destination (a temp result index). -1 if operation has no result
        DestSub: Integer;
      );
      dtBranch:
      (
        TrueBlock: Integer;     //Block number to branch to if condition is true
        FalseBlock: Integer;    //Block number to branch to if condition is false
      );
  end;

You’ll note the CASE statement towards the end of the record declaration. This is a useful Pascal feature, called variant records, which allows a single record to map data in different ways. In this case there is a field called DestType. The next section of the record will store different data depending on the value of DestType. The two variants here either store data for an assignment (i.e. to a variable) or for a branch (see the next section).

Branches

We clearly need a way to define branches (jumps) between these Blocks. For example the <test> needs to be able to branch to the <true-code> or <false-code> Blocks, and the <true-code> section needs to be able to branch to the Block where the paths merge.

I’m also setting a requirement for every Block to end with an explicit branch to another Block. In other words we can’t simply allow a Block to ‘fall-through’ to the next Block. This is a requirement for the optimiser, which needs the ability to re-order code, and perform other pieces of magic.

So, we need a primitive in the intermediate language for branches.

It’s pretty easy to see how we can implement this in a compiler:

Parse the conditional test
Add a branch operation for the conditional test, branching to either <true-code> or <false-code> (If there is no ELSE clause then the branches will be to either <true-code> or the merge Block).
Parse the <true-code>.
Add a branch to the merge Block.
Parse the <false-code> (if there is an ELSE clause).
Add a branch to the merge Block (if there is an ELSE clause).
Do any processing needed for the merge Block.

You might notice that at several stages of this process we need information which is not available yet. For example when we add the branch for the conditional test we need to know the IDs of the <true-code> and <false-code> Blocks.

The <true-code> Block will immediate follow the Block containing the test, so it’s Block ID will be one more than the current Block ID. But we won’t know the Block ID of the <false-code> section until the the <true-code> section has been processed. For example if it contains a nested IF statement it will contain multiple Blocks. Therefore we need to wait until these Block IDs are known and them return to ‘fix-up’ the branch primitives.

Thankfully there’s a solution to this in compiler design. If you’ve ever heard the term ‘recursive descent compiler’ this is what it refers to. We make a recursive call to the parser for the <true-code> and <false-code> sections. These calls might then make their own recursive calls and so on. Eventually the parser will return and we can make those those fix-ups to the branch code.

Here’s the full code for parsing an IF statement. Ignore any variable relating to Indexes – those will be explained in future articles when I describe the ‘branch fixup’ and ‘phi nodes’.

function DoIF: TAssembleError;
var Branch: PILItem;
  ThenBranch: PILItem;
  ElseBranch: PILItem;
  //BlockIDs
  BranchID: Integer; //Block ID of the conditional branch
  ThenLastID: Integer; //Last block of the THEN branch
  ElseLastID: Integer; //Last block of the ELSE branch 
  //ILItem indexes
  BranchIndex: Integer; //Item containing the conditional branch (i.e. end of previous code)
  ThenLastIndex: Integer; //Last item in the THEN path
  ElseLastIndex: Integer; //Last item in the ELSE path. -1 if no ELSE path 
begin
  Result := ParseExpression(Branch);
  if Result <> errNone then
    EXIT;

  //Convert item to a branch
  Branch.DestType := dtBranch;

  //Test for THEN
  Parser.SkipWhiteSpace;
  if not TestForIdent('then') then
    EXIT(errTHENExpected);

  BranchID := GetCurrBlockID;
  BranchIndex := ILGetCount-1;
  //THEN Block
  NewBlock := True;
  Branch.TrueBlock := GetCurrBlockID + 1;
  Result := ParseBlock(bsSingle);
  if Result <> errNone then
    EXIT;
  //Branch to merge block
  ThenBranch := ILAlloc(dtBranch);
  ThenBranch.FalseBlock := -1;
  //Unconditional branch
  ThenLastID := GetCurrBlockID;
  ThenLastIndex := ILGetCount-1;

  //Test for ELSE
  Parser.SkipWhiteSpaceAll;
  if TestForIdent('else') then
  begin
    //If so, ELSE block
    NewBlock := True;
    Branch.FalseBlock := GetCurrBlockID + 1; 

    //ParseBlock    Result := ParseBlock(bsSingle);
    if Result <> errNone then      EXIT;
    //Branch to merge block    ElseBranch := ILAlloc(dtBranch);    ElseBranch.FalseBlock := -1; //Unconditional branch
    ElseLastID := GetCurrBlockID;    ElseLastIndex := ILGetCount-1;
    //Branch to merge block    NewBlock := True;    ElseBranch.TrueBlock := GetCurrBlockID + 1;
    Parser.SkipWhiteSpaceAll;  end
  else
  //No ELSE - if condition failed it jumps straight to the merge block 
  begin 
    NewBlock := True;
    ElseLastID := -1;
    ElseLastIndex := -1; 

    Branch.FalseBlock := GetCurrBlockID + 1;  end;

  //Fixup branches at end of THEN and ELSE blocks NewBlock := True;            
  ThenBranch.TrueBlock := GetCurrBlockID + 1;

Examples

Lets have a look at how this works in practice. The first example is a basic IF … THEN … ELSE statement. The source code is:

if a=1 then
  a:=a+1
else
  a:=a-1

and the IL output is:

1:
Branch {2,3} Equal %a_0, 1 
2: 
%a_1 = Add %a_0, 1 
Branch {4} 
3: 
%a_2 = Subtract %a_1, 1 
Branch {4} 
4:

The numbers followed by colons are the Block identifiers.

We begin with Block 1 which is the conditional test and branch. The Branch tells us that the code jumps to either Block 2 or 3 depending on the result of the Equal test.
Block 2 is the <true-code> and jumps to Block 4 when done.
Block 3 is the <false-code> and, again, jumps to Block 4 when done.
Block 4 is where the two paths merge together.

Example 2

Here’s an example without the ELSE clause:

1: 
Branch {2,3} Equal %a_0, 1 
2: 
%a_1 = Add %a_0, 1 
Branch {3} 
3:

Now the paths merge together in Block 3 with the branches update appropriately.

Example 3

And a final example with a nested IF statement:

if a=1 then
  a:=a+1 
else 
begin
  if b=1 then
    a:=a+b
  b:=0 
end

and the IL code:

1: 
Branch {2,3} Equal %a_0, 1 
2: 
%a_1 = Add %a_0, 1 
Branch {6} 
3: 
Branch {4,5} Equal %b_0, 1 
4: 
%a_2 = Add %a_1, %b_0 
Branch {5} 
5: 
%b_1 = 0 
Branch {6} 
6:

This time the <else-code> consists of Blocks 3 to 5 with Block 4 being the <true-code> of the nested conditional. Block 5 is where the nested conditional merges. The outer conditional merges together in Block 6.

A Last Minute Bug

However, there is a bug in the last two tests output above. In both cases the ELSE clause should be read a value from the %a variable, but the read the version assigned in the ELSE clause. In other words their using %a_1 when they should be using %a_0 – the version assigned (or, in this case assumed) before the conditional. This is a tricky issue to resolve, and I’ll describe my solution in the next post.

You can view the code so far in the repository.

Bread80.com

8 Bit Playground