dart-sdk/pkg/dart2wasm/tool
Ömer Sinan Ağacan 0f54180b51 [dart2wasm] New typed data implementation
New typed data implementation that optimizes the common cases.

The fast case uses the best possible representation, a list class backed
directly by the matching Wasm array:

    class _I32List implements Int32List {
      final WasmIntArray<WasmI32> _data;

      int operator [](int index) {
        // range check
        return _data.read(index);
      }

      void operator []=(int index, int value) {
        // range check
        _data.writeSigned(index, value);
      }

      ...
    }

This gives us the best possible runtime performance in the common cases
of:

- The list is used directly.
- The list is used via a view of the same Wasm element type (e.g. a
  `Uint32List` view of a `Int32List`) and with aligned byte offset.

All other classes (`ByteBuffer`, `ByteData`, and view classes) are
implemented to support this representation.
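The two common cases can be illustrated with the public `dart:typed_data`
API. This is a minimal sketch: the helper name `firstAsUint32` is
hypothetical, and which internal class backs each view is an assumption
drawn from the description in this message, not something the example
can observe.

```dart
import 'dart:typed_data';

/// Reinterprets the first element of [ints] as an unsigned 32-bit value
/// through a view with the same element width (hypothetical helper).
int firstAsUint32(Int32List ints) => Uint32List.view(ints.buffer)[0];

void main() {
  final ints = Int32List(4);
  ints[0] = -1;

  // Same element width, byte offset 0: presumably the fast case.
  print(firstAsUint32(ints)); // 4294967295

  // Same element width, aligned byte offset 4: the view starts at ints[1].
  final tail = Uint32List.view(ints.buffer, 4, 2);
  tail[0] = 7;
  print(ints[1]); // 7

  // Different element width: by the description above, this view would
  // presumably go through the slower `ByteData`-based path.
  final bytes = Uint8List.view(ints.buffer);
  print(bytes[0]); // 255
}
```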

Summary of classes:

- One list class per Dart typed data list, with the matching Wasm array
  as the buffer (as shown in the example above): `_I8List`, `_U8List`,
  `_U8ClampedList`, `_I16List`, `_U16List`, ...

- One list class per Dart typed data list, with mismatching Wasm array
  as the buffer. These classes are used when a view is created from a
  list, and the original list has a Wasm array with different element
  type than the view needs. `_SlowI8List`, `_SlowU8List`, ...

  These classes use the `ByteData` interface to update the buffer.

- One list class for each of the classes listed above, for immutable
  views. `_UnmodifiableI32List`, `_UnmodifiableSlowU64List`, ...

  These classes inherit from their modifiable list classes and override
  update methods using a mixin.

- One `ByteData` class for each Wasm array type: `_I8ByteData`,
  `_I16ByteData`,
  ...

- One immutable `ByteData` view for each `ByteData` class.

- One `ByteBuffer` class for each Wasm array type: `_I8ByteBuffer`,
  `_I16ByteBuffer`, ...

- A single `ByteBuffer` class for the immutable view of a byte buffer.

  We don't need one immutable `ByteBuffer` view class per Wasm array
  type as the `ByteBuffer` API does not provide direct access to the
  buffer.
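The "inherit and override update methods using a mixin" pattern used for
the unmodifiable list classes can be sketched as follows. All names here
(`I32ListSketch`, `UnmodifiableSketch`, `UnmodifiableI32ListSketch`) are
hypothetical, and a plain `List<int>` stands in for the Wasm array. This
is not the code from this CL.

```dart
/// Stand-in for a modifiable list class; a plain Dart list replaces the
/// Wasm array used by the real implementation.
class I32ListSketch {
  final List<int> _data;
  I32ListSketch(this._data);

  int operator [](int index) => _data[index];

  void operator []=(int index, int value) {
    _data[index] = value;
  }
}

/// Overrides the update method so that every write throws.
mixin UnmodifiableSketch on I32ListSketch {
  @override
  void operator []=(int index, int value) {
    throw UnsupportedError('Cannot modify an unmodifiable list');
  }
}

/// Inherits reads from the modifiable class and writes from the mixin.
class UnmodifiableI32ListSketch extends I32ListSketch
    with UnmodifiableSketch {
  UnmodifiableI32ListSketch(List<int> data) : super(data);
}

void main() {
  final data = [1, 2, 3];
  final view = UnmodifiableI32ListSketch(data);
  print(view[1]); // 2
  try {
    view[1] = 9;
  } on UnsupportedError catch (e) {
    print(e.message); // Cannot modify an unmodifiable list
  }
}
```

Only the write path is overridden, so reads on the immutable view run the
same code as reads on the modifiable class.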

Other optimizations:

- `setRange` now uses `array.copy` when possible, which causes a huge
  performance win in some benchmarks.

- The new implementation is pure Dart and needs no support or special
  cases from the compiler other than the Wasm array type support and
  intrinsics like `array.copy`. As a result this removes a bunch of
  `entry-point` pragmas and significantly reduces code size in some
  cases.
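For reference, a `setRange` call of the kind that could benefit looks like
the sketch below. The helper name `copyMiddle` is hypothetical, and
whether a given call is actually lowered to `array.copy` depends on the
conditions described above, so treat that as an assumption.

```dart
import 'dart:typed_data';

/// Copies src[1..3] into dst[2..4] and returns dst (hypothetical helper).
Int32List copyMiddle(Int32List src, Int32List dst) {
  // Source and destination have the same element type, which is the
  // situation in which a bulk copy is plausible.
  dst.setRange(2, 5, src, 1);
  return dst;
}

void main() {
  final src = Int32List.fromList([1, 2, 3, 4, 5]);
  final dst = Int32List(8);
  print(copyMiddle(src, dst)); // [0, 0, 2, 3, 4, 0, 0, 0]
}
```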

Other changes:

- Patch and implementation files for typed data and SIMD types are split
  into separate files. `typed_data_patch.dart` and `simd_patch.dart` now
  only contain patched factories. Implementation classes are moved to
  `typed_data.dart` and `simd.dart` as libraries `dart:_typed_data` and
  `dart:_simd`.

Benchmark results:

This CL significantly improves common cases. The new implementation is only
slower than the current implementation when a view uses a Wasm array
type with incompatible element type (for example, `Uint32List` created
from a `Uint64List`).

These cases can still be improved by overriding the relevant `ByteData`
methods. For example, for a `Uint32List` view of a `Uint64List`,
`_I64ByteData.getUint32` could be overridden to do a single read when
the requested bytes don't cross element boundaries in the Wasm array.
These optimizations are left as future work.
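The idea can be sketched in plain Dart. This is an illustration only, not
the deferred optimization itself: `getUint32LE` is a hypothetical helper,
a `Uint64List` stands in for the 64-bit Wasm array, and the byte layout
within each element is assumed to be little-endian.

```dart
import 'dart:typed_data';

/// Reads an unsigned 32-bit value at [byteOffset] from 64-bit elements,
/// treating each element as eight little-endian bytes.
int getUint32LE(Uint64List words, int byteOffset) {
  final index = byteOffset ~/ 8;
  final within = byteOffset % 8;
  if (within <= 4) {
    // All four bytes live in one element: a single read plus a shift.
    return (words[index] >>> (within * 8)) & 0xFFFFFFFF;
  }
  // The value crosses an element boundary: assemble it byte by byte.
  var result = 0;
  for (var i = 0; i < 4; i++) {
    final b = byteOffset + i;
    final byte = (words[b ~/ 8] >>> ((b % 8) * 8)) & 0xFF;
    result |= byte << (i * 8);
  }
  return result;
}

void main() {
  final words =
      Uint64List.fromList([0x1122334455667788, 0x0102030405060708]);
  print(getUint32LE(words, 0).toRadixString(16)); // 55667788
  print(getUint32LE(words, 4).toRadixString(16)); // 11223344
  print(getUint32LE(words, 6).toRadixString(16)); // 7081122
}
```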

Some sample benchmarks:

vector_math matrix_bench before:

    Binary size: 133,104 bytes.
    MatrixMultiply(RunTime): 201 us.
    SIMDMatrixMultiply(RunTime): 3,608 us.
    VectorTransform(RunTime): 94 us.
    SIMDVectorTransform(RunTime): 833 us.
    setViewMatrix(RunTime): 506 us.
    aabb2Transform(RunTime): 987 us.
    aabb2Rotate(RunTime): 721 us.
    aabb3Transform(RunTime): 1,710 us.
    aabb3Rotate(RunTime): 1,156 us.
    Matrix3.determinant(RunTime): 171 us.
    Matrix3.transform(Vector3)(RunTime): 8,550 us.
    Matrix3.transform(Vector2)(RunTime): 3,924 us.
    Matrix3.transposeMultiply(RunTime): 201 us.

vector_math matrix_bench after:

    Binary size: 135,198 bytes.
    MatrixMultiply(RunTime): 42 us.
    SIMDMatrixMultiply(RunTime): 2,068 us.
    VectorTransform(RunTime): 12 us.
    SIMDVectorTransform(RunTime): 272 us.
    setViewMatrix(RunTime): 82 us.
    aabb2Transform(RunTime): 167 us.
    aabb2Rotate(RunTime): 147 us.
    aabb3Transform(RunTime): 194 us.
    aabb3Rotate(RunTime): 199 us.
    Matrix3.determinant(RunTime): 70 us.
    Matrix3.transform(Vector3)(RunTime): 726 us.
    Matrix3.transform(Vector2)(RunTime): 504 us.
    Matrix3.transposeMultiply(RunTime): 53 us.

FluidMotion before:

    Binary size: 121,130 bytes.
    FluidMotion(RunTime): 270,625 us.

FluidMotion after:

    Binary size: 110,674 bytes.
    FluidMotion(RunTime): 71,357 us.

With bounds checks omitted (not in this CL), FluidMotion becomes
competitive with `dart2js -O4`:

FluidMotion dart2js -O4:

    FluidMotion(RunTime): 47,813 us.

FluidMotion this CL + bounds checks omitted:

    FluidMotion(RunTime): 51,289 us.

Fixes #52710.

Tested: With existing tests.
Change-Id: I33bf5585c3be5d3919a99af857659cf7d9393df0
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/312907
Reviewed-by: Joshua Litt <joshualitt@google.com>
Commit-Queue: Ömer Ağacan <omersa@google.com>
2023-07-20 09:47:39 +00:00
..
compile_benchmark [dart2wasm] New typed data implementation 2023-07-20 09:47:39 +00:00
run_benchmark [dart2wasm] New async implementation 2023-05-22 08:32:12 +00:00