dart-sdk/runtime/docs/compiler/data_dep_for_control_dep.md
Aart Bik aae0f7997d [vm/compiler] Express control dependence as data dependence.
Rationale:
This provides a more robust way of expressing control
dependences between bounds checks and their uses, while
still allowing for CSE and LICM where possible.
This also fixes the original bug that started this
whole redesign: invalid LICM of checkbound.

Note:
Deals with bounds check, null check still TBD.

Reland of:
https://dart-review.googlesource.com/c/sdk/+/85470

Mini design doc:
runtime/docs/compiler/data_dep_for_control_dep.md

https://github.com/dart-lang/sdk/issues/35139
https://github.com/dart-lang/sdk/issues/34684
https://github.com/dart-lang/sdk/issues/30633

Change-Id: I24b994344ed28c9f5fbf0ed033698be75448df1c
Reviewed-on: https://dart-review.googlesource.com/c/85720
Commit-Queue: Aart Bik <ajcbik@google.com>
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
2018-12-03 18:07:39 +00:00

6.5 KiB

Dart Mini Design Doc

Using Data Dependence to Preserve Control Dependence

Introduction

The DartFuzzer found bug 34684 with the current LICM implementation. We need a fix urgently, since this bug may break user code, emerges for pretty much every nightly fuzz test run (making investigating new fuzz test divergences cumbersome; most of the time, it is a "same as"), and blocks moving towards more shards per nightly fuzz test run (the Q4 objective is 100 shards with 8 isolates each). Fixing the LICM bug, however, requires a bit more redesign in the way control dependence is preserved in our IR.

The Problem

The bug itself consists of an issue with applying LICM to an array bound check with loop invariant values. In its simplest form, LICM should not be applied to the bounds check that belongs to l[j] below since the loop condition (for a situation k <= 0) could logically protect OOB exceptions for particular values of n (for example, calling foo(2,0) should not throw).

void foo(int n, int k) {
  var l = new List<int>(1);
  var j = n - 1;
  for (var i = 0; i < k; i++) {
     l[j] = 10;
  }
}

Hosting the seemingly loop invariant bounds check out of the loop would break the control dependence between the loop condition and potentially move an OOB exception into the always-taken path.

An obvious fix would simply disable hoisting CheckBound constructs of out loops, thereby respecting the control dependence between condition and check. However, this exposes another omission in our IR with respect to control dependence. Leaving CheckBound in place would not prevent hoisting the IndexLoad that follows, as illustrated below.

              ←-----------------+
for loop                         |
    Checkbound (length, index)   |      <- pinned in the loop
    IndexLoad (base, index)   ---+

In this case, this problem is caused by the fact that the control dependence from CheckBound to IndexLoad is not made explicit. Trending on the same path would require us to disable hoisting such instructions as well….

At first glance, it may seem that a Redefinition could help here, as in:

for loop
  v0 = get array
  v1 = get length of v0
  v2 = get index
  CheckBound:id(v1, v2)
  v3 <- Redefinition(v2)
  IndexLoad(v0, v3)

This would indeed keep the index load instruction inside the loop, thereby relying on not-too-well-documented assumptions on unmovable redefinitions and the now explicit data dependence on v3. However, this approach would be too prohibitive, since cases that would allow for hoisting or eliminating CheckBound from the IR would not break the newly introduced chain on v3 (nothing in the IR expresses the relation between the Redefinition and the CheckBound).

Alternatively, we could introduce control dependence as an explicit, but orthogonal order in our compiler. However, this would require introducing a new data structure to our IR as well as inspecting all code to ensure that this new order is indeed respected. The next section introduces a simpler solution.

Proposed Solution

The proposed solution is making the control dependence between any check and all its uses explicit with a data dependence, as shown below.

  v3 <- CheckBound:id(v1, v2)   // returns the index value v2 in v3
  IndexLoad(v0, v3)

The semantics of the new node is that the returned value (v3) is a safe index (viz. checked values v2) into any load that uses that value. Any optimization that hoists or removes the CheckBound automatically exposes the opportunity for further optimization of the load by hoisting or breaking the dependence on v3. Common subexpression elimination can also be applied to equal-valued CheckBound nodes.

For completeness, the same approach will be taken for null checks.

The construct

 CheckNull:id(v2)
 v100 <- LoadField(v2, …)

will be replaced by

 v3 = CheckNull:id(v2)  // returns the reference value v2 in v3
 v100 <- LoadField(v3, …)

Here, the value v3 denotes a safe, null-checked reference of v2.

The explicit data dependence ensures that all passes and transformations automatically preserve the required order, without the need to make adjustments anywhere else. In contrast, introducing an explicit control dependence as a new concept in our compiler would require a careful inspection of all code to make sure the new dependence is respected. A drawback of the new "indirection" through the check is that it may break some optimizations and simplifications that inspect the inputs directly. Although cumbersome, since it also involves looking at a lot of code, this is easily remedied by "looking under the hood" of checks (as is done for redefinitions). Missed opportunities for optimizations are preferable over missed correctness.

The proposed solution will have no impact on the register allocator or any of our backends, since the new data dependence will be removed in the FinalizeGraph pass, similar to what is already done now for redefinitions in RemoveRedefinitions(), except that this method will only redirect the inputs across checks, but obviously not remove the checks themselves. Nothing that runs after the register allocator should move code around too freely, an assumption that is already made in our current implementation with respect to redefinitions.

This approach was used successfully in the ART optimization compiler. Here, check-based control dependence was made explicit with data dependence. All passes and transformations were aware that data dependence should always be satisfied (inter- and intra-basic block), whereas all optimizations that crossed basic blocks were aware of implicit control dependence (e.g. using dominance relation). In combination with the actual LICM fix, the proposed solution will result in a previously proven robust framework for null and bounds checks.

IR Nodes

Our IR currently has the following "check" instructions. Although potentially others could benefit form this new scheme too, the first CL will focus on the ones marked below only. As the implementation progresses, we may introduce some "object-orientedness" for check instructions that return their safe value.

CheckEitherNonSmiInstr
CheckClassInstr
CheckSmiInstr
CheckNullInstr                 // returns safe non-null reference
CheckClassIdInstr
CheckArrayBoundInstr           // returns safe non-OOBE index
GenericCheckBoundInstr         // returns safe non-OOBE index
CheckConditionInstr