Writing a Compiler 3: The Branch Fixup

I left the last article by talking about a bug. Look at this source code:

if a=1 then
  a:=a+1
else
  a:=a-1

and the IL which is being generated from it:

1:
Branch {2,3} Equal %a_0, 1
2:
%a_1 = Add %a_0, 1
Branch {4}
3:
%a_2 = Subtract %a_1, 1
Branch {4}
4:

Assignments to a variable are versioned. I.e. every variable reference has a ‘subscript’, which is incremented for every write. So in this example we have %a_0, %a_1, and %a_2, all of which refer to assignments the variable %a.

  • %a_0 is the subscript version before we reach the conditional.
  • %a_1 is the subscript version for the assignment in the THEN clause.
  • %a_2 is the subscript version for the assignment in the ELSE clause.

The value of %a_1 (THEN clause) is based on the value of %a before the conditional – %a_0.
The value of %a_2 (ELSE clause) should also be based on %a_0. But my code is wrongly trying to base the value on %a_1. This is not only the wrong value, but since the THEN clause won’t have been executed (the ELSE clause being executed instead) this version of %a won’t even exist.

This is happening because each variable (during parsing) stores the current assignment subscript version. When a read of a variable is parsed the most recent subscript version is read from the variable data and stored in the IL code. When an assignment to a variable is parsed the subscript version stored in the variable is incremented and stored in the IL code.

Solutions

There are a number of possible solutions to this problem. One would be to store a copy the current versions of each variable at the start of the conditional (or store copies of old data as variables are assigned). The downside with this is that the parser is recursive and the code needs to be re-entrant. We would end up having to store multiple data sets at every branch. This would require complex data structures and consume more storage than I want to use. It would also be necessary store both the write subscript version and the read subscript version for every variable.

Instead I prefer a solution which does a ‘fixup’ of the IL code after the conditional has been processed and the multiple branches have merged back together.

To do this I need to find references to variable reads which access the version predating the start of the branch, then update their subscript version to the correct one.

My algorithm is as follows:

I step through each line in the left (THEN) branch.
  For every variable assignment,
    if it's the first assignment to that variable
      the version prior to the conditional will be one less than this assignment version
I then step though each line of the right (ELSE) branch.
  For every variable assignment,
    if it's the first assignment to that variable
      make a note of the subscript version - we need to update reads of the version prior to this

  For every variable read
    if we have a version stored after stepping through the left branch (from the first part of the algorithm)
      if we have a version stored from an assignment in this branch
        and the read version is the version prior to this
      or if we have yet to encounter a write to this variable in this branch
        update the subscript read version to that stored when stepping through the left branch

Code

The final code for this breaks down into three fairly simple functions. The first scans down the first code path:

function BranchFixupLeft(Index, StopIndex: Integer): Integer;
var ILItem: PILItem;
  Variable: PVariable;
begin
  Result := 0;
  while Index <= StopIndex do
  begin
    ILItem := ILIndexToData(Index);
    if ILItem.DestType = dtData then
      if ILItem.DestLoc = locVar then
      begin
        Variable := VarIndexToData(ILItem.DestData);
        if Variable.AdjustSubTo = -1 then
        begin
          Variable.AdjustSubTo := ILItem.DestSub - 1;
          inc(Result);
        end;
      end;

    inc(Index);
  end;
end;

The second function scans down the second code path:

procedure BranchFixupRight(Index, StopIndex: Integer);
var ILItem: PILItem;
  Variable: PVariable;
begin
  while Index <= StopIndex do
  begin
    ILItem := ILIndexToData(Index);
    if ILItem.DestType = dtData then
      if ilItem.DestLoc = locVar then
      begin
        Variable := VarIndexToData(ILItem.DestData);
        if Variable.AdjustSubTo >= 0 then
          if Variable.AdjustSubFrom = -1 then
            Variable.AdjustSubFrom := ILItem.DestSub;
      end;

    if ILItem.Param1Loc = locVar then
    begin
      Variable := VarIndexToData(ILItem.Param1Data);
      if Variable.AdjustSubTo >= 0 then
        if (Variable.AdjustSubFrom = -1) or (ILItem.Param1Sub < Variable.AdjustSubFrom) then
          ILItem.Param1Sub := Variable.AdjustSubTo;
    end;

    if ILItem.Param2Loc = locVar then
    begin
      Variable := VarIndexToData(ILItem.Param2Data);
      if Variable.AdjustSubTo >= 0 then
        if (Variable.AdjustSubFrom = -1) or (ILItem.Param2Sub < Variable.AdjustSubFrom) then
          ILItem.Param2Sub := Variable.AdjustSubTo;
    end;

    inc(Index);
  end;
end;

And the final function wraps up these two to do our bidding. Notice that the function for the left branch returns a count of the number of variable assignments found. If this number is zero we don’t need to call the right-hand function:

procedure BranchFixup(LeftIndex, LeftStopIndex, RightIndex, RightStopIndex: Integer);
begin
  VarClearAdjust;

  //Short circuit if no assignments in the other path.
  if BranchFixupLeft(LeftIndex, LeftStopIndex) > 0 then
    BranchFixupRight(RightIndex, RightStopIndex);
end;

Example

Revisiting the example at the start of this article now gives the following output:

1:  
0- Branch {2,3} Equal %a_0, 1
2:  
1- %a_1 = Add %a_0, 1
2- Branch {4} 
3:  
3- %a_2 = Subtract %a_0, 1
4- Branch {4} 
4:  
5- %a_3 = phi [%a_1 {2}] [%a_2 {3}]

Note now that the subtraction in Block 3 correctly uses %a_0.

As usual you can find the code for this project in the repository.

One Reply on “Writing a Compiler 3: The Branch Fixup”

Comments are closed.