mirror of
https://github.com/dart-lang/sdk
synced 2024-11-05 18:22:09 +00:00
0f54180b51
New typed data implementation that optimizes the common cases.

This uses the best possible representation for the fast case, with a representation like:

    class _I32List implements Int32List {
      final WasmIntArray<WasmI32> _data;

      int operator [](int index) {
        // range check
        return _data.read(index);
      }

      void operator []=(int index, int value) {
        // range check
        _data.writeSigned(index, value);
      }

      ...
    }

This gives us the best possible runtime performance in the common cases of:

- The list is used directly.
- The list is used via a view of the same Wasm element type (e.g. a `Uint32List` view of an `Int32List`) and with an aligned byte offset.

All other classes (`ByteBuffer`, `ByteData`, and view classes) are implemented to support this representation.

Summary of classes:

- One list class per Dart typed data list, with the matching Wasm array as the buffer (as shown in the example above): `_I8List`, `_U8List`, `_U8ClampedList`, `_I16List`, `_U16List`, ...
- One list class per Dart typed data list, with a mismatching Wasm array as the buffer. These classes are used when a view is created from a list and the original list has a Wasm array with a different element type than the view needs: `_SlowI8List`, `_SlowU8List`, ... These classes use the `ByteData` interface to update the buffer.
- One list class for each of the classes listed above, for immutable views: `_UnmodifiableI32List`, `_UnmodifiableSlowU64List`, ... These classes inherit from their modifiable list classes and override update methods using a mixin.
- One `ByteData` class for each Wasm array type: `_I8ByteData`, `_I16ByteData`, ...
- One immutable `ByteData` view for each `ByteData` class.
- One `ByteBuffer` class for each Wasm array type: `_I8ByteBuffer`, `_I16ByteBuffer`, ...
- A single `ByteBuffer` class for the immutable view of a byte buffer. We don't need one immutable `ByteBuffer` view class per Wasm array type, as the `ByteBuffer` API does not provide direct access to the buffer.
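To make the "slow" view classes described above more concrete, here is a minimal sketch using only the public `dart:typed_data` API. The class `SlowU32View` is hypothetical and is not the SDK's `_SlowU32List`; it only illustrates the idea of a 32-bit view whose every access goes through `ByteData` because the backing store has a different element type. It assumes the Dart VM, where `Uint64List` is available.

```dart
import 'dart:typed_data';

/// Hypothetical illustration, not SDK code: an unsigned 32-bit view over a
/// buffer whose native element type differs, so each access goes through
/// the `ByteData` interface instead of a direct array read.
class SlowU32View {
  final ByteData _bytes;
  final int length;

  SlowU32View(ByteBuffer buffer)
      : _bytes = ByteData.view(buffer),
        length = buffer.lengthInBytes ~/ 4;

  int operator [](int index) {
    // Range check, then a byte-level read in host byte order.
    RangeError.checkValidIndex(index, this, 'index', length);
    return _bytes.getUint32(index * 4, Endian.host);
  }

  void operator []=(int index, int value) {
    // Range check, then a byte-level write in host byte order.
    RangeError.checkValidIndex(index, this, 'index', length);
    _bytes.setUint32(index * 4, value, Endian.host);
  }
}

void main() {
  // Backing store with 64-bit elements: 2 elements, 16 bytes.
  final backing = Uint64List(2);
  final slow = SlowU32View(backing.buffer);
  // The built-in view over the same bytes, used here only for comparison.
  final builtin = Uint32List.view(backing.buffer);

  slow[1] = 0xDEADBEEF;
  slow[2] = 7;

  print(slow.length); // 4
  print(builtin[1] == 0xDEADBEEF); // true
  print(slow[2] == builtin[2]); // true
}
```

Both views observe the same bytes, so the public behavior is identical; per the commit message, the difference on dart2wasm is only in how directly each access reaches the underlying Wasm array.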
Other optimizations:

- `setRange` now uses `array.copy` when possible, which gives a huge performance win in some benchmarks.
- The new implementation is pure Dart and needs no support or special cases from the compiler other than the Wasm array type support and intrinsics like `array.copy`. As a result, this removes a bunch of `entry-point` pragmas and significantly reduces code size in some cases.

Other changes:

- Patch and implementation files for typed data and SIMD types are split into separate files. `typed_data_patch.dart` and `simd_patch.dart` now only contain patched factories. Implementation classes are moved to `typed_data.dart` and `simd.dart` as libraries `dart:_typed_data` and `dart:_simd`.

Benchmark results:

This CL significantly improves common cases. The new implementation is only slower than the current implementation when a view uses a Wasm array with an incompatible element type (for example, a `Uint32List` created from a `Uint64List`).

These cases can still be improved by overriding the relevant `ByteData` methods. For example, for a `Uint32List` view of a `Uint64List`, `_I64ByteData.getUint32` can be overridden to do a single read when the requested bytes don't cross element boundaries in the Wasm array. These optimizations are left as future work.

Some sample benchmarks:

vector_math matrix_bench:

Metric | Before | After
---|---|---
Binary size | 133,104 bytes | 135,198 bytes
MatrixMultiply(RunTime) | 201 us | 42 us
SIMDMatrixMultiply(RunTime) | 3,608 us | 2,068 us
VectorTransform(RunTime) | 94 us | 12 us
SIMDVectorTransform(RunTime) | 833 us | 272 us
setViewMatrix(RunTime) | 506 us | 82 us
aabb2Transform(RunTime) | 987 us | 167 us
aabb2Rotate(RunTime) | 721 us | 147 us
aabb3Transform(RunTime) | 1,710 us | 194 us
aabb3Rotate(RunTime) | 1,156 us | 199 us
Matrix3.determinant(RunTime) | 171 us | 70 us
Matrix3.transform(Vector3)(RunTime) | 8,550 us | 726 us
Matrix3.transform(Vector2)(RunTime) | 3,924 us | 504 us
Matrix3.transposeMultiply(RunTime) | 201 us | 53 us

FluidMotion:

Metric | Before | After
---|---|---
Binary size | 121,130 bytes | 110,674 bytes
FluidMotion(RunTime) | 270,625 us | 71,357 us

With bound checks omitted (not in this CL), FluidMotion becomes competitive with `dart2js -O4`:

Configuration | FluidMotion(RunTime)
---|---
dart2js -O4 | 47,813 us
This CL + bound checks omitted | 51,289 us

Fixes #52710.

Tested: With existing tests.
Change-Id: I33bf5585c3be5d3919a99af857659cf7d9393df0
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/312907
Reviewed-by: Joshua Litt <joshualitt@google.com>
Commit-Queue: Ömer Ağacan <omersa@google.com>
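The future-work idea for `_I64ByteData.getUint32` mentioned in the commit message (a single read when the four requested bytes fall inside one 64-bit element) can be sketched roughly as below. This is not the SDK's code: `getUint32FromU64` is a hypothetical helper, it uses a plain `Uint64List` in place of the Wasm array, it assumes the Dart VM's 64-bit `int`, and it treats each element as holding its bytes in little-endian order.

```dart
import 'dart:typed_data';

/// Hypothetical helper, not part of the SDK: reads an unsigned 32-bit value
/// starting at [byteOffset] from storage made of 64-bit elements, treating
/// each element's bytes as little-endian.
int getUint32FromU64(Uint64List data, int byteOffset) {
  // The four bytes must lie entirely inside the storage.
  RangeError.checkValueInInterval(
      byteOffset, 0, data.length * 8 - 4, 'byteOffset');

  final element = byteOffset ~/ 8;
  final byteInElement = byteOffset % 8;

  // Logical shift moves the first requested byte to the low end.
  final low = data[element] >>> (byteInElement * 8);
  if (byteInElement <= 4) {
    // All four bytes live in one element: a single read suffices.
    return low & 0xFFFFFFFF;
  }

  // The value straddles two elements: combine the tail of this element
  // with the head of the next one.
  final high = data[element + 1] << ((8 - byteInElement) * 8);
  return (low | high) & 0xFFFFFFFF;
}

void main() {
  // Bytes 0x01..0x10 laid out in increasing order across two elements.
  final data =
      Uint64List.fromList([0x0807060504030201, 0x100F0E0D0C0B0A09]);

  print(getUint32FromU64(data, 0) == 0x04030201); // true, single read
  print(getUint32FromU64(data, 4) == 0x08070605); // true, single read
  print(getUint32FromU64(data, 6) == 0x0A090807); // true, two reads
}
```

The aligned cases (offsets 0 and 4 within an element) are the ones a `Uint32List` view over a `Uint64List` would hit, which is why the commit message singles them out as the case worth a fast path.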
Name |
---|
.. |
compile_benchmark |
run_benchmark |